# PROBLEM STATEMENT
'''1. Import the data into the Python environment as a Pandas DataFrame.
2. Check for missing values, if any, and drop the corresponding rows.
3. Find the district that gets the highest annual rainfall.
4. Display the top 5 states that get the highest annual rainfall.
5. Drop the columns 'Jan-Feb', 'Mar-May', 'Jun-Sep', 'Oct-Dec'.
6. Display the state-wise mean rainfall for all the months using a pivot table.
7. Display the count of districts in each state.
8. For each state, display the district that gets the highest rainfall in May. Also display the recorded rainfall.'''

In [25]:
import pandas as pd

# Step 1: Import the data into Python as a Pandas DataFrame
url = 'rainfall.csv'  # Replace with the actual URL or file path
df = pd.read_csv(url)

In [4]:
# Step 2: Drop every row that has at least one missing value
# (df.isnull().sum() lists the missing count per column if a check is wanted first)
df.dropna(inplace=True)
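
Step 2 asks for a check before the drop, and the cell above only performs the drop. A minimal sketch of both parts follows, using a small made-up sample in place of rainfall.csv; the values and the name `sample` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; the real rainfall.csv has many more rows and columns.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["GOA", "GOA", "KERALA"],
    "DISTRICT": ["NORTH GOA", "SOUTH GOA", "IDUKKI"],
    "JAN": [0.2, np.nan, 13.4],
    "FEB": [0.1, 0.0, 22.1],
})

# Count missing values per column before removing anything.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value.
cleaned = sample.dropna()
print(f"Rows dropped: {len(sample) - len(cleaned)}")
```

Comparing the row count before and after the drop shows how much data was lost, which is worth knowing before any ranking is computed.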

In [6]:
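# Step 3: Find the district that gets the highest annual rainfall
# Caution: iloc[:, 2:] sums every column after DISTRICT, including the dataset's
# own ANNUAL total, so 'Annual' is inflated relative to the true yearly figure.
# Summing only the months, df.loc[:, 'JAN':'DEC'].sum(axis=1), avoids the double count.
df['Annual'] = df.iloc[:, 2:].sum(axis=1)
max_rainfall_district = df.loc[df['Annual'].idxmax(), 'DISTRICT']
print(f"District with the highest annual rainfall: {max_rainfall_district}")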


District with the highest annual rainfall: TAMENGLONG
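
The lookup in Step 3 combines a row-wise sum with `idxmax`. The sketch below shows the same pattern on a toy frame, restricted to the month columns so that a precomputed total is not counted twice. The data and the column name `annual_from_months` are hypothetical.

```python
import pandas as pd

# Hypothetical values, for illustration only.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["A", "A", "B"],
    "DISTRICT": ["D1", "D2", "D3"],
    "JAN": [10.0, 20.0, 5.0],
    "FEB": [15.0, 5.0, 50.0],
    "ANNUAL": [25.0, 25.0, 55.0],  # precomputed total shipped with the data
})

month_cols = ["JAN", "FEB"]  # the real data would list all twelve months

# Sum only the month columns, so the precomputed ANNUAL is not counted twice.
sample["annual_from_months"] = sample[month_cols].sum(axis=1)

# idxmax returns the row label of the maximum; loc then looks up the district.
wettest = sample.loc[sample["annual_from_months"].idxmax(), "DISTRICT"]
print(wettest)  # D3
```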


In [8]:
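# Step 4: Display the top 5 states that get the highest annual rainfall
# Note: this ranks states by the sum over their districts, so states with many
# districts (for example UTTAR PRADESH with 71) rank high; .mean() would rank by
# the average district instead.
state_rainfall = df.groupby('STATE_UT_NAME')['Annual'].sum().sort_values(ascending=False)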
top_5_states = state_rainfall.head(5)
print("Top 5 states with the highest annual rainfall:")
print(top_5_states)

Top 5 states with the highest annual rainfall:
STATE_UT_NAME
UTTAR PRADESH        407019.6
ASSAM                397606.2
MADHYA PRADESH       309693.0
ARUNACHAL PRADESH    281028.0
BIHAR                273726.6
Name: Annual, dtype: float64
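
Summing district totals per state rewards states that simply have more districts. The sketch below contrasts a sum-based ranking with a mean-based one on made-up numbers; the state names `BIG` and `WET` are hypothetical, and which ranking Step 4 intends is an open interpretation.

```python
import pandas as pd

# Hypothetical data: one state with three moderate districts, one with a single wet district.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["BIG", "BIG", "BIG", "WET"],
    "DISTRICT": ["B1", "B2", "B3", "W1"],
    "ANNUAL": [800.0, 900.0, 1000.0, 2500.0],
})

by_state = sample.groupby("STATE_UT_NAME")["ANNUAL"]

# Ranking by total: driven partly by how many districts a state has.
total_ranking = by_state.sum().sort_values(ascending=False)

# Ranking by mean: rainfall of the average district in each state.
mean_ranking = by_state.mean().sort_values(ascending=False)

top_by_total = total_ranking.index[0]
top_by_mean = mean_ranking.index[0]
print(top_by_total, top_by_mean)  # BIG WET
```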


In [15]:
df.columns

Index(['STATE_UT_NAME', 'DISTRICT', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
       'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC', 'ANNUAL', 'Annual'],
      dtype='object')

In [16]:
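# Step 5: Drop the columns 'Jan-Feb', 'Mar-May', 'Jun-Sep', 'Oct-Dec'
# The df.columns output above shows the seasonal columns are already absent in this
# session, so errors='ignore' lets the cell run whether or not they exist.
# 'Annual' is the helper column added in Step 3 and is dropped as well.
columns_to_drop = ['Jan-Feb', 'Mar-May', 'Jun-Sep', 'Oct-Dec', 'Annual']
df.drop(columns=columns_to_drop, inplace=True, errors='ignore')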

In [18]:
# Step 6: Display the state-wise mean rainfall for all the months using a pivot table
# Adjust the column names if necessary
monthly_columns = [col for col in df.columns if col not in ['DISTRICT', 'STATE_UT_NAME', 'ANNUAL']]
pivot_table = pd.pivot_table(df, index='STATE_UT_NAME', values=monthly_columns, aggfunc='mean')
print("State-wise mean rainfall for all the months:")
print(pivot_table)

State-wise mean rainfall for all the months:
                                    APR         AUG         DEC        FEB  \
STATE_UT_NAME                                                                
ANDAMAN And NICOBAR ISLANDS   86.966667  385.300000  159.733333  33.266667   
ANDHRA PRADESH                19.873913  179.426087   15.565217   7.352174   
ARUNACHAL PRADESH            275.162500  378.600000   35.956250  93.293750   
ASSAM                        181.266667  377.370370   11.440741  31.714815   
BIHAR                         16.865789  289.481579    5.786842   9.278947   
CHANDIGARH                    14.800000  287.500000   23.400000  38.900000   
CHATISGARH                    13.116667  375.338889    5.811111  10.472222   
DADAR NAGAR HAVELI             0.000000  655.900000    0.000000   0.300000   
DAMAN AND DUI                  0.100000  394.600000    0.450000   0.500000   
DELHI                          8.900000  245.500000    8.600000  16.300000   
GOA                
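
In the output above the month columns come back in alphabetical order (APR, AUG, DEC, ...). A minimal sketch of restoring calendar order by selecting the columns explicitly after pivoting is shown below; the sample data and the three-month `month_order` list are assumptions, and the real list would name all twelve months.

```python
import pandas as pd

# Hypothetical values, for illustration only.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["A", "A", "B"],
    "DISTRICT": ["D1", "D2", "D3"],
    "JAN": [10.0, 20.0, 5.0],
    "FEB": [14.0, 6.0, 50.0],
    "MAR": [1.0, 3.0, 7.0],
})

month_order = ["JAN", "FEB", "MAR"]  # extend to all twelve months for the real data

# Mean rainfall per state for each month.
state_means = pd.pivot_table(
    sample, index="STATE_UT_NAME", values=month_order, aggfunc="mean"
)

# Select the columns in calendar order rather than the order pivot_table returned.
state_means = state_means[month_order]
print(state_means)
```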

In [20]:
# Step 7: Display the count of districts in each state
district_count = df['STATE_UT_NAME'].value_counts()
print("Count of districts in each state:")
print(district_count)

Count of districts in each state:
STATE_UT_NAME
UTTAR PRADESH                  71
MADHYA PRADESH                 50
BIHAR                          38
MAHARASHTRA                    35
RAJASTHAN                      33
TAMIL NADU                     32
KARNATAKA                      30
ORISSA                         30
ASSAM                          27
GUJARAT                        26
JHARKHAND                      24
ANDHRA PRADESH                 23
JAMMU AND KASHMIR              22
HARYANA                        21
PUNJAB                         20
WEST BENGAL                    19
CHATISGARH                     18
ARUNACHAL PRADESH              16
KERALA                         14
UTTARANCHAL                    13
HIMACHAL                       12
NAGALAND                       11
MIZORAM                         9
MANIPUR                         9
DELHI                           9
MEGHALAYA                       7
SIKKIM                          4
TRIPURA                         4
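
`value_counts` counts rows per state, which equals the number of districts only when each district appears exactly once. The sketch below, on made-up data with a deliberate duplicate, compares it with counting distinct district names; whether the real file contains duplicates is not known from the notebook.

```python
import pandas as pd

# Hypothetical data; D2 appears twice on purpose.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["A", "A", "A", "B"],
    "DISTRICT": ["D1", "D2", "D2", "D3"],
})

# Rows per state: counts the duplicate D2 twice.
row_counts = sample["STATE_UT_NAME"].value_counts()

# Distinct district names per state: counts D2 once.
district_counts = sample.groupby("STATE_UT_NAME")["DISTRICT"].nunique()

print(row_counts)
print(district_counts)
```

If the two results agree on the real data, the simpler `value_counts` call is sufficient.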


In [21]:
# Step 8: For each state, display the district that gets the highest rainfall in May and the recorded rainfall
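# idxmax gives, per state, the row label of the largest MAY value; loc selects those rows
may_max_idx = df.groupby('STATE_UT_NAME')['MAY'].idxmax()
highest_may_rainfall = df.loc[may_max_idx, ['STATE_UT_NAME', 'DISTRICT', 'MAY']]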
print("District with the highest rainfall in May for each state:")
print(highest_may_rainfall)

District with the highest rainfall in May for each state:
                   STATE_UT_NAME       DISTRICT    MAY
1    ANDAMAN And NICOBAR ISLANDS  SOUTH ANDAMAN  374.4
544               ANDHRA PRADESH  VISAKHAPATNAM   96.6
10             ARUNACHAL PRADESH     PAPUM PARE  453.0
31                         ASSAM      KARIMGANJ  604.0
194                        BIHAR     KISHANGANJ  155.7
306                   CHANDIGARH     CHANDIGARH   30.1
519                   CHATISGARH         BASTAR   38.6
479           DADAR NAGAR HAVELI            DNH    7.4
480                DAMAN AND DUI          DAMAN    7.4
307                        DELHI      NEW DELHI   19.3
488                          GOA      NORTH GOA   94.3
458                      GUJARAT          DANGS   12.5
303                      HARYANA      PANCHKULA   27.9
341                     HIMACHAL  LAHUL & SPITI   91.7
349            JAMMU AND KASHMIR      BARAMULLA  111.4
154                    JHARKHAND          PAKUR   86.1
598    
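
The per-state lookup in Step 8 relies on `groupby(...).idxmax()` returning one row label per group. A small sketch on hypothetical data makes the mechanics visible; setting the state as the index is an optional presentation choice, not something the notebook does.

```python
import pandas as pd

# Hypothetical values, for illustration only.
sample = pd.DataFrame({
    "STATE_UT_NAME": ["A", "A", "B", "B"],
    "DISTRICT": ["D1", "D2", "D3", "D4"],
    "MAY": [30.0, 45.0, 12.0, 8.0],
})

# One row label per state: the row where MAY is largest within that state.
idx = sample.groupby("STATE_UT_NAME")["MAY"].idxmax()

# Pull those rows and index the result by state for easier reading.
wettest_in_may = (
    sample.loc[idx, ["STATE_UT_NAME", "DISTRICT", "MAY"]]
    .set_index("STATE_UT_NAME")
)
print(wettest_in_may)
```

When two districts in a state tie for the maximum, `idxmax` returns the first one it encounters, so only one district per state is reported.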