# Problem Statement
# Title: Flood risk prediction in India using machine learning

# Statement:
### Floods in India cause severe damage due to unpredictable rainfall, rising river discharge,
### and growing population pressure. Existing models mostly rely on basic factors like rainfall and water levels,
### which limits accuracy. This project aims to build an enhanced flood risk prediction system using historical climate,
### hydrological, geographical, and socio-environmental data. Along with core features (rainfall, river discharge,
### water level, land cover, population density, soil type), it introduces new features such as Rainfall Anomaly,
### Flood Risk Index, and Urbanization Pressure Score. By combining these, the system can provide early,
### location-specific flood warnings, supporting disaster management authorities in reducing economic loss and
### safeguarding communities.
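
The statement names three engineered features but does not define them. The sketch below shows one way they might be derived from the dataset's columns. The formulas are assumptions chosen for illustration, not the project's actual definitions: Rainfall Anomaly as a z-score of rainfall, Flood Risk Index as the mean of min-max scaled rainfall, river discharge, and water level, and Urbanization Pressure Score as min-max scaled population density. The helper names `min_max_scale` and `add_engineered_features` are hypothetical.

```python
import pandas as pd


def min_max_scale(s: pd.Series) -> pd.Series:
    """Scale a numeric series to [0, 1]; a constant series maps to all zeros."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three illustrative engineered columns.

    All three formulas are assumptions made for this sketch.
    """
    out = df.copy()
    rain = out["Rainfall (mm)"]

    # Assumed definition: deviation from mean rainfall in units of
    # population standard deviation (NaN if rainfall is constant).
    out["Rainfall Anomaly"] = (rain - rain.mean()) / rain.std(ddof=0)

    # Assumed definition: unweighted mean of three scaled hydrological drivers.
    out["Flood Risk Index"] = (
        min_max_scale(rain)
        + min_max_scale(out["River Discharge (m³/s)"])
        + min_max_scale(out["Water Level (m)"])
    ) / 3

    # Assumed definition: scaled population density as a proxy for urban pressure.
    out["Urbanization Pressure Score"] = min_max_scale(out["Population Density"])
    return out


# Toy data for illustration only; these are not rows from the real dataset.
toy = pd.DataFrame(
    {
        "Rainfall (mm)": [0.0, 100.0, 200.0],
        "River Discharge (m³/s)": [0.0, 50.0, 100.0],
        "Water Level (m)": [0.0, 5.0, 10.0],
        "Population Density": [100.0, 200.0, 300.0],
    }
)
result = add_engineered_features(toy)
print(result[["Rainfall Anomaly", "Flood Risk Index", "Urbanization Pressure Score"]])
```

An unweighted mean keeps the index easy to interpret, but domain-informed weights, or weights learned from the `Flood Occurred` label, may suit the task better.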

In [8]:
#  Step 1: Import libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#  Step 2: Load dataset

df = pd.read_csv("flood_risk_dataset_india.csv")

# Preview the first 5 rows (rendered only when this is the last expression in a cell)
df.head()

#  Step 3: Basic Exploration
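# Dataset info (df.info() prints its report directly and returns None,
# so wrapping it in print() would add a stray "None" line)
print("\nDataset Info:")
df.info()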

# Summary statistics
print("\nSummary Statistics:")
print(df.describe())

# Missing values check
print("\nMissing Values:")
print(df.isnull().sum())


Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Latitude                10000 non-null  float64
 1   Longitude               10000 non-null  float64
 2   Rainfall (mm)           10000 non-null  float64
 3   Temperature (°C)        10000 non-null  float64
 4   Humidity (%)            10000 non-null  float64
 5   River Discharge (m³/s)  10000 non-null  float64
 6   Water Level (m)         10000 non-null  float64
 7   Elevation (m)           10000 non-null  float64
 8   Land Cover              10000 non-null  object 
 9   Soil Type               10000 non-null  object 
 10  Population Density      10000 non-null  float64
 11  Infrastructure          10000 non-null  int64  
 12  Historical Floods       10000 non-null  int64  
 13  Flood Occurred          10000 non-null  int64  
dtypes: float64(9), int64(3),