# BAYES' THEOREM - FLOOD ANALYSIS

- Bayes' theorem, named after the 18th-century British mathematician Thomas Bayes, is a formula for computing conditional probability.
- Conditional probability is the probability of an event occurring, given that another event is known to have occurred.  
$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$
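
As a minimal sketch, the formula can be wrapped in a small helper. The function name `bayes_posterior` and the input numbers are illustrative assumptions and are not taken from the flood data.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must lie in (0, 1]")
    return p_b_given_a * p_a / p_b


# Made-up inputs for illustration: P(B|A) = 0.8, P(A) = 0.5, P(B) = 0.5
posterior = bayes_posterior(0.8, 0.5, 0.5)
print(posterior)
```

With P(A) equal to P(B), the posterior P(A|B) equals the likelihood P(B|A), which is a quick sanity check on the helper.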

### Modules

In [1]:
import pandas as pd
import numpy as np

### Initializing dataframe

In [2]:
df = pd.read_csv('kerala_floods.csv')
df.head()

Unnamed: 0,SUBDIVISION,YEAR,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,ANNUAL RAINFALL,FLOODS
0,KERALA,1901,28.7,44.7,51.6,160.0,174.7,824.6,743.0,357.5,197.7,266.9,350.8,48.4,3248.6,YES
1,KERALA,1902,6.7,2.6,57.3,83.9,134.5,390.9,1205.0,315.8,491.6,358.4,158.3,121.5,3326.6,YES
2,KERALA,1903,3.2,18.6,3.1,83.6,249.7,558.6,1022.5,420.2,341.8,354.1,157.0,59.0,3271.2,YES
3,KERALA,1904,23.7,3.0,32.2,71.5,235.7,1098.2,725.5,351.8,222.7,328.1,33.9,3.3,3129.7,YES
4,KERALA,1905,1.2,22.3,9.4,105.9,263.3,850.2,520.5,293.6,217.2,383.5,74.4,0.2,2741.6,NO


In [6]:
df.describe()

Unnamed: 0,YEAR,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,ANNUAL RAINFALL
count,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0,118.0
mean,1959.5,12.218644,15.633898,36.670339,110.330508,228.644915,651.617797,698.220339,430.369492,246.207627,293.207627,162.311017,40.009322,2925.405085
std,34.207699,15.473766,16.40629,30.063862,44.633452,147.548778,186.181363,228.988966,181.980463,121.901131,93.705253,83.200485,36.67633,452.169407
min,1901.0,0.0,0.0,0.1,13.1,53.4,196.8,167.5,178.6,41.3,68.5,31.5,0.1,2068.8
25%,1930.25,2.175,4.7,18.1,74.35,125.05,535.55,533.2,316.725,155.425,222.125,93.025,10.35,2613.525
50%,1959.5,5.8,8.35,28.4,110.4,184.6,625.6,691.65,386.25,223.55,284.3,152.45,31.1,2934.3
75%,1988.75,18.175,21.4,49.825,136.45,264.875,786.975,832.425,500.1,334.5,355.15,218.325,54.025,3170.4
max,2018.0,83.5,79.0,217.2,238.0,738.8,1098.2,1526.5,1398.9,526.7,567.9,365.6,202.3,4473.0


In [8]:
df.isnull().sum()

SUBDIVISION         0
YEAR                0
JAN                 0
FEB                 0
MAR                 0
APR                 0
MAY                 0
JUN                 0
JUL                 0
AUG                 0
SEP                 0
OCT                 0
NOV                 0
DEC                 0
 ANNUAL RAINFALL    0
FLOODS              0
dtype: int64

In [24]:
# creating a new dataframe to store the new data
newDf = pd.DataFrame()

# FLOODS holds the labels 'YES' and 'NO' (check with df['FLOODS'].unique());
# converting the column into binary format
newDf['Floods'] = df['FLOODS'].map({'YES':1, 'NO':0})
newDf['Year'] = df['YEAR']
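
One caveat with `Series.map` and a dictionary: labels that are missing from the dictionary become `NaN` instead of raising an error. The sketch below shows one way to catch that, using a toy series as a stand-in for `df['FLOODS']`; the lowercase `'yes'` is a deliberately bad label added for illustration.

```python
import pandas as pd

# Toy stand-in for df['FLOODS']; 'yes' is a deliberate bad label.
floods = pd.Series(['YES', 'NO', 'yes', 'NO'], name='FLOODS')

mapped = floods.map({'YES': 1, 'NO': 0})

# Labels not present in the dictionary are mapped to NaN.
unmapped = floods[mapped.isna()].unique().tolist()
print(unmapped)
```

An empty list here would mean every label was encoded; `df.isnull().sum()` above already shows the real column has no missing values before mapping.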

### Question: What is the probability that June rainfall exceeded 500mm, given that it flooded that year?

In [25]:
# creating an indicator column: 1 if June rainfall exceeded 500mm that year, else 0
newDf['June_more_than_500'] = (df['JUN'] > 500).astype('int')
newDf.head()

Unnamed: 0,Floods,Year,June_more_than_500
0,1,1901,1
1,1,1902,0
2,1,1903,1
3,1,1904,1
4,0,1905,1


In [29]:
cross = pd.crosstab(newDf['Floods'], newDf['June_more_than_500'], margins=True)
cross

Floods \ June_more_than_500,0,1,All
0,19,39,58
1,6,54,60
All,25,93,118


In [37]:
# Probability that June rainfall exceeded 500mm:
p_Rainfall = 93/118

# Probability of a flood in a given year:
p_Flood = 60/118

# Joint probability, i.e. probability of a flood year in which June rainfall exceeded 500mm:
p_Rainfall_and_Flood = 54/118
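
The counts in this cell are typed in by hand from the cross-tabulation. A hedged alternative is to read them from the table with `.loc`, so the probabilities stay in sync if the data change. To keep the sketch self-contained, `cross` is rebuilt here from the counts shown above instead of from the CSV.

```python
import pandas as pd

# Stand-in for the crosstab above (rows: Floods, columns: June_more_than_500).
cross = pd.DataFrame(
    {0: [19, 6, 25], 1: [39, 54, 93], 'All': [58, 60, 118]},
    index=pd.Index([0, 1, 'All'], name='Floods'),
)

total = cross.loc['All', 'All']

p_Rainfall = cross.loc['All', 1] / total          # June rainfall > 500mm
p_Flood = cross.loc[1, 'All'] / total             # flood year
p_Rainfall_and_Flood = cross.loc[1, 1] / total    # both together

print(p_Rainfall, p_Flood, p_Rainfall_and_Flood)
```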

In [40]:
# Conditional Probability:

# Probability of a flood year, given that June rainfall exceeded 500mm:
p_Flood_given_Rainfall = p_Rainfall_and_Flood / p_Rainfall
p_Flood_given_Rainfall

0.5806451612903226

In [43]:
# Complement

# Probability that June rainfall was 500mm or less:
p_noRainfall = 25/118

# Joint probability of a flood year and June rainfall of 500mm or less:
p_Flood_and_noRainfall = 6/118

# Probability of a flood year, given that June rainfall was 500mm or less
p_Flood_given_noRainfall = p_Flood_and_noRainfall / p_noRainfall
p_Flood_given_noRainfall

0.24000000000000002
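
By the law of total probability, the two conditional probabilities computed so far combine to recover the overall flood probability. Writing $F$ for a flood year and $R$ for June rainfall above 500mm (shorthand introduced here, not used in the notebook code):

$$P(F) = P(F \mid R)\,P(R) + P(F \mid \bar{R})\,P(\bar{R}) = \frac{54}{93}\cdot\frac{93}{118} + \frac{6}{25}\cdot\frac{25}{118} = \frac{54 + 6}{118} = \frac{60}{118}$$

This matches the 60 flood years out of 118 in the cross-tabulation.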

In [49]:
# Now, probability of June rainfall exceeding 500mm, given that it flooded that year.
# Using Bayes' theorem, with P(Flood) in the denominator expanded by the law of total probability;
# equivalently: p_Rainfall_given_flood = (p_Flood_given_Rainfall*p_Rainfall) / p_Flood

p_Rainfall_given_flood = (p_Flood_given_Rainfall*p_Rainfall) / ((p_Rainfall*p_Flood_given_Rainfall) + (p_noRainfall*p_Flood_given_noRainfall))
p_Rainfall_given_flood

0.8999999999999999
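
As a cross-check, the same conditional probability can be read directly from the counts: 54 of the 60 flood years had June rainfall above 500mm, and 54/60 = 0.9. The sketch below does this with `pd.crosstab(..., normalize='index')`. The frame `toy` is a hypothetical reconstruction of the 118 yearly observations from the four cell counts in the cross-tabulation, not the original CSV.

```python
import pandas as pd

# (Floods, June_more_than_500) -> number of years, taken from the crosstab.
counts = {(0, 0): 19, (0, 1): 39, (1, 0): 6, (1, 1): 54}
rows = [pair for pair, n in counts.items() for _ in range(n)]
toy = pd.DataFrame(rows, columns=['Floods', 'June_more_than_500'])

# normalize='index' divides each row by its row total, giving P(column | row).
cond = pd.crosstab(toy['Floods'], toy['June_more_than_500'], normalize='index')

p_direct = cond.loc[1, 1]  # P(June rainfall > 500mm | flood year)
print(round(p_direct, 4))
```

The Bayes' theorem result above differs from 0.9 only by floating-point rounding.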

We can conclude that in years when Kerala flooded, June rainfall exceeded 500mm about 90% of the time (54 of the 60 flood years in this dataset).