<h1>Introduction</h1>
Rain water's most well-known and most important effect is providing you with water to drink. According to the United States Geological Survey, rain water seeps into the ground in a process called infiltration. Some of the water seeps deep beneath the top layers of soil where it fills up the space between subsurface rocks--it becomes ground water, also called the water table. Less than 2 percent of the earth's water is ground water, but it provides 30 percent of our fresh water. Without rain water's continued replenishment of the water table, potable water would become scarcer than it already is. [[source]](https://www.livestrong.com/article/179964-importance-of-rain-water/)<br/><br/>
The Climate of India comprises a wide range of weather conditions across a vast geographic scale and varied topography, making generalisations difficult. Based on the Köppen system, India hosts six major climatic subtypes, ranging from arid desert in the west, alpine tundra and glaciers in the north, and humid tropical regions supporting rainforests in the southwest and the island territories. Many regions have starkly different microclimates. The country's meteorological department follows the international standard of four climatological seasons with some local adjustments: winter (December, January and February), summer (March, April and May), a monsoon rainy season (June to September), and a post-monsoon period (October to November). [[source]](https://en.wikipedia.org/wiki/Climate_of_India)<br/> <br/>
In this kernel, we analyze rainfall data for India, extract the key information, and present the results in a clear and friendly manner.

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plot
import seaborn as sns

df = pd.read_csv("../input/Sub_Division_IMD_2017.csv")
df.head()

<h1>Sneak Peek</h1>
Let's take a sneak peek at the data to understand its basic context. 
<br/>We have **19 columns** and **4188 rows**.

In [None]:
print("Columns:",df.shape[1],"   Rows:",df.shape[0])

Next, we check the data types to understand the columns and the kind of values each one contains.

In [None]:
print("Data Types\n",df.dtypes)

Since Indian territory is divided into meteorological sub-divisions, we list the unique sub-divisions to see which regions the data covers. We have data for **36 sub-divisions**.

In [None]:
print("India Subdivisions Unique Values:",df.SUBDIVISION.unique())
print("Total Sub-divisions:",len(df.SUBDIVISION.unique()))
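
To see how much data each sub-division contributes, one option is to count the rows per sub-division and look at the range of years covered. The sketch below uses a small made-up table (`toy`) in place of the real file; only the column names `SUBDIVISION` and `YEAR` are taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the rainfall table; the values are made up for illustration.
toy = pd.DataFrame({
    "SUBDIVISION": ["A", "A", "A", "B", "B"],
    "YEAR": [1901, 1902, 1903, 1901, 1902],
})

# Number of rows (one per year of record) for each sub-division
rows_per_subdivision = toy["SUBDIVISION"].value_counts()

# First and last year on record for each sub-division
year_span = toy.groupby("SUBDIVISION")["YEAR"].agg(["min", "max"])

print(rows_per_subdivision)
print(year_span)
```

Running the same two statements on `df` instead of `toy` would show whether every sub-division covers the same period.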

Now we compute the minimum, maximum and average of each numerical column, including the year column.<br>Averaged over all sub-divisions and years, the annual rainfall is 1409.45 mm.

In [None]:
# Skip the first column (SUBDIVISION), which is not numeric
for index, column in enumerate(df.columns):
    if (index == 0):
        continue
    print(column,": Max:",df[column].max(),"  Average:",round(df[column].mean(),2),"  Min:",df[column].min())
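
The same summary can also be obtained as a single table, which is easier to scan than printed lines. This is a minimal sketch on a made-up table; the helper name `summarize` and the `toy` values are illustrative assumptions, not part of the original kernel.

```python
import pandas as pd

# Toy stand-in for the rainfall table; the values are made up for illustration.
toy = pd.DataFrame({
    "SUBDIVISION": ["A", "A", "B", "B"],
    "YEAR": [1901, 1902, 1901, 1902],
    "JAN": [10.0, 20.0, 30.0, 40.0],
    "ANNUAL": [1000.0, 1200.0, 800.0, 1400.0],
})

def summarize(frame):
    """Return max, mean and min (rounded to 2 decimals) for every numeric column."""
    numeric = frame.select_dtypes(include="number")  # drops SUBDIVISION
    return numeric.agg(["max", "mean", "min"]).round(2).T

summary = summarize(toy)
print(summary)
```

Calling `summarize(df)` on the real data should produce one row per numeric column with the three statistics side by side.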

<h1>Exploratory Data Analysis</h1>
Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA. [[Source]](https://en.wikipedia.org/wiki/Exploratory_data_analysis)

**Box Plot for Months**<br/>
The following figure shows box plots for each month across all sub-divisions of India. July, August and June clearly receive the most rainfall, with averages of 347.02, 289.74 and 230.12 mm respectively.

In [None]:
fig = plot.figure(figsize=(20, 10))
ax = sns.boxplot(data=df[df.columns[2:14]])
ax.set_title("Rainfall in Months")
plot.xlabel("Months",size='18')
plot.ylabel("Rainfall (mm)",size='18')
plot.show()
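
A box plot shows medians rather than means, so the monthly averages quoted above are easier to check numerically. The sketch below ranks months by mean rainfall on a made-up table; `rank_months` and the `toy` values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in with a few month columns; the values are made up for illustration.
toy = pd.DataFrame({
    "JUN": [200.0, 260.0],
    "JUL": [340.0, 360.0],
    "AUG": [280.0, 300.0],
    "DEC": [10.0, 20.0],
})

def rank_months(frame, month_columns):
    """Return the mean of each month column, wettest month first."""
    return frame[month_columns].mean().sort_values(ascending=False).round(2)

ranking = rank_months(toy, ["JUN", "JUL", "AUG", "DEC"])
print(ranking)
```

On the real data, `rank_months(df, df.columns[2:14])` would list the twelve months in the same order as the figure's columns are selected.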

The following figure shows box plots of the cumulative (seasonal) totals across all sub-divisions of India. JJAS (June, July, August and September) clearly has the most rainfall. The box plots also make it easy to compare the median and spread of each seasonal group.

In [None]:
fig = plot.figure(figsize=(20, 10))
ax = sns.boxplot(data=df[df.columns[15:19]])
ax.set_title("Rainfall in Cumulative Months")
plot.xlabel("Cumulative Months",size='18')
plot.ylabel("Rainfall (mm)",size='18')
plot.show()

This box plot summarizes the rainfall over a whole year. Because a box plot shows the median, the quartiles and the extreme values at a glance, the distribution is easy to read.

In [None]:
fig = plot.figure(figsize=(20, 10))
ax = sns.boxplot(data=df.ANNUAL, orient='h')
ax.set_title("Annual Rainfall")
plot.xlabel("Rainfall (mm)",size='18')
plot.ylabel("Annual",size='18')
plot.show()
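
For readers who want the numbers behind a box plot, the sketch below computes the quartiles and counts the points beyond the whiskers, assuming the usual 1.5 × IQR whisker rule (seaborn's default). The helper `box_stats` and the sample values are illustrative, not taken from the dataset.

```python
import pandas as pd

def box_stats(values):
    """Return the quartiles, IQR and number of outliers under the 1.5 * IQR rule."""
    s = pd.Series(values).dropna()
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # points below this are drawn as outliers
    upper = q3 + 1.5 * iqr  # points above this are drawn as outliers
    outliers = s[(s < lower) | (s > upper)]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "n_outliers": len(outliers)}

stats = box_stats([1, 2, 3, 4, 100])
print(stats)
```

Applied to the annual column, `box_stats(df.ANNUAL)` would give the values that the box and whiskers in the figure represent.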

This is a simple bar chart of annual rainfall by sub-division. It shows that the most rain falls in Arunachal Pradesh and Coastal Karnataka.

In [None]:
fig = plot.figure(figsize =(20,30))
ax = sns.barplot(y=df.SUBDIVISION, x=df.ANNUAL,color="#db2959")
ax.set_title("Annual Rainfall in Subdivisions")
plot.yticks(size="14")
plot.show()
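
By default `sns.barplot` draws the mean of the values in each category, so the ranking in the chart can be reproduced with a group-by. The sketch below does this on a made-up table; `wettest` and the `toy` values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the rainfall table; the values are made up for illustration.
toy = pd.DataFrame({
    "SUBDIVISION": ["X", "X", "Y", "Y", "Z", "Z"],
    "ANNUAL": [3000.0, 3200.0, 500.0, 700.0, 1500.0, 1700.0],
})

def wettest(frame, n=2):
    """Return the n sub-divisions with the highest mean annual rainfall."""
    means = frame.groupby("SUBDIVISION")["ANNUAL"].mean()
    return means.sort_values(ascending=False).head(n)

top = wettest(toy)
print(top)
```

On the real data, `wettest(df, n=5)` would list the five wettest sub-divisions with their mean annual rainfall.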

This graph shows how annual rainfall changes over the years. For each year it plots the minimum, maximum and mean annual rainfall across sub-divisions.

In [None]:
# Mean, maximum and minimum annual rainfall across sub-divisions for each year
year_rainfall = df.groupby("YEAR")["ANNUAL"].agg(Mean="mean", Max="max", Min="min")

fig = plot.figure(figsize=(20,10))
ax = sns.lineplot(data = year_rainfall[year_rainfall.columns[0:3]])
plot.ylabel("Rainfall (mm)",size = "18")
plot.xlabel("Year",size = "18")
plot.show()

**Cleaning**

Last but not least, we compare the rainfall in every month of the year using strip plots. Before plotting, missing values in the rainfall columns are filled with the column mean. From the plots we can see that the heaviest rainfall is concentrated in the monsoon months (June to September).

In [None]:
# Fill missing values with the column mean, skipping SUBDIVISION and YEAR.
# Assigning the result back avoids the unreliable chained inplace fillna.
for column in df.columns[2:]:
    df[column] = df[column].fillna(df[column].mean())
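
Before filling, it helps to know how many values are actually missing in each column. The sketch below counts missing values and then applies the same mean-fill in one step on a made-up table; the `toy` values are illustrative, and only the column layout (two identifier columns followed by rainfall columns) mirrors the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the rainfall table; the values are made up for illustration.
toy = pd.DataFrame({
    "SUBDIVISION": ["A", "A", "B"],
    "YEAR": [1901, 1902, 1901],
    "JAN": [10.0, np.nan, 30.0],
    "FEB": [5.0, 15.0, np.nan],
})

# Count missing values per column before cleaning
missing_before = toy.isna().sum()

# Fill every rainfall column with its own mean (columns are matched by name)
value_columns = toy.columns[2:]
filled = toy.copy()
filled[value_columns] = filled[value_columns].fillna(filled[value_columns].mean())

missing_after = filled.isna().sum()
print(missing_before)
print(missing_after)
```

Note that a global column mean ignores regional differences; filling with each sub-division's own mean would be a possible refinement, but it is not what the original kernel does.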

In [None]:
months = df.columns[2:14]

# One strip plot per month, laid out on a 4x3 grid
fig, axes = plot.subplots(4, 3, figsize=(20, 20))

for axis, month in zip(axes.flat, months):
    sns.stripplot(x=df[month], ax=axis)
plot.show()