# DISASTER RISK PREDICTION

*   Overview: Predicts flood probability using environmental and human-influenced features.
*   Problem Statement: Estimates flood risk for better disaster preparedness and planning.
*   Dataset: flood.csv with 20 features and a target FloodProbability.
*   Features: Includes MonsoonIntensity, TopographyDrainage, RiverManagement, Urbanization, ClimateChange, DamsQuality, Siltation, AgriculturalPractices, Encroachments, PoliticalFactors, InadequatePlanning, WetlandLoss, etc.
*   Target: FloodProbability (0–1)
*   Preprocessing: Numerical features scaled using StandardScaler; no categorical features currently.
*   Model: Linear Regression to predict flood probability.
*   Train/Test Split: 80% training, 20% testing.
*   Evaluation: Mean Squared Error (MSE) and R-squared (R²) score.
*   Predictions: Takes feature values as input and returns the flood probability as a decimal and as a percentage.
*   CLI Interface: Users can enter values interactively to get real-time predictions.
*   Dependencies: Python 3.8+, pandas, scikit-learn, numpy, and gradio (for the web demo cell).
*   Future Enhancements: Use Random Forest/XGBoost, add satellite data, web interface, extend to multiple disaster types.

In [17]:
!pip install -q gradio pandas scikit-learn numpy

# Task
Analyze the dataset located at "/content/flood.csv", preprocess it, build a machine learning model to predict floods, and evaluate the model's performance.

## Load the data

### Subtask:
Load the data from "/content/flood.csv" into a dataframe.


**Reasoning**:
Import pandas and load the data into a DataFrame, then display the head of the DataFrame.



In [16]:
import pandas as pd

df = pd.read_csv("/content/flood.csv")
df.head()

Unnamed: 0,MonsoonIntensity,TopographyDrainage,RiverManagement,Deforestation,Urbanization,ClimateChange,DamsQuality,Siltation,AgriculturalPractices,Encroachments,...,DrainageSystems,CoastalVulnerability,Landslides,Watersheds,DeterioratingInfrastructure,PopulationScore,WetlandLoss,InadequatePlanning,PoliticalFactors,FloodProbability
0,3,8,6,6,4,4,6,2,3,2,...,10,7,4,2,3,4,3,2,6,0.45
1,8,4,5,7,7,9,1,5,5,4,...,9,2,6,2,1,1,9,1,3,0.475
2,3,10,4,1,7,5,4,7,4,9,...,7,4,4,8,6,1,8,3,6,0.515
3,4,4,2,7,3,4,1,4,6,4,...,4,2,6,6,8,8,6,6,10,0.52
4,3,7,5,2,5,8,5,2,7,5,...,7,6,5,3,3,4,4,3,4,0.475


## Analyze the data

### Subtask:
Perform basic data analysis to understand the data structure, summary statistics, and identify missing values.


**Reasoning**:
Display the data structure, summary statistics, and missing values of the dataframe.



In [7]:
df.info()
display(df.describe())  # display() is needed: a cell only renders its last expression
df.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 21 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   MonsoonIntensity                 50000 non-null  int64  
 1   TopographyDrainage               50000 non-null  int64  
 2   RiverManagement                  50000 non-null  int64  
 3   Deforestation                    50000 non-null  int64  
 4   Urbanization                     50000 non-null  int64  
 5   ClimateChange                    50000 non-null  int64  
 6   DamsQuality                      50000 non-null  int64  
 7   Siltation                        50000 non-null  int64  
 8   AgriculturalPractices            50000 non-null  int64  
 9   Encroachments                    50000 non-null  int64  
 10  IneffectiveDisasterPreparedness  50000 non-null  int64  
 11  DrainageSystems                  50000 non-null  int64  
 12  CoastalVulnerabili

Unnamed: 0,0
MonsoonIntensity,0
TopographyDrainage,0
RiverManagement,0
Deforestation,0
Urbanization,0
ClimateChange,0
DamsQuality,0
Siltation,0
AgriculturalPractices,0
Encroachments,0


## Preprocess the data

### Subtask:
Handle missing values, encode categorical features, and scale numerical features as needed.


**Reasoning**:
Identify and encode categorical features, separate features and target variable, and scale numerical features.



In [8]:
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Identify categorical and numerical features
categorical_features = df.select_dtypes(include='object').columns
numerical_features = df.select_dtypes(include=['int64', 'float64']).columns

# The target variable is 'FloodProbability', so remove it from features
numerical_features = numerical_features.drop('FloodProbability')

# Create a column transformer for preprocessing
# No categorical features found, so only numerical features are processed
# We can skip encoding if there are no categorical features
if len(categorical_features) == 0:
    preprocessor = StandardScaler()
    X = df[numerical_features]
else:
    # This part is for handling categorical features, but based on previous analysis, there are none.
    # We keep it here for completeness in case the data changes.
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', StandardScaler(), numerical_features),
            # Add other transformers for categorical features if they exist
        ],
        remainder='passthrough' # Keep other columns (if any)
    )
    X = df[numerical_features.tolist() + categorical_features.tolist()]

y = df['FloodProbability']

# Apply preprocessing (scaling) to features
if len(categorical_features) == 0:
    X_processed = preprocessor.fit_transform(X)
    X_processed = pd.DataFrame(X_processed, columns=numerical_features)
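else:
    X_processed = preprocessor.fit_transform(X)
    # ColumnTransformer returns an array; restore column names so .head() works below
    X_processed = pd.DataFrame(
        X_processed, columns=preprocessor.get_feature_names_out(), index=X.index
    )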

display(X_processed.head())
display(y.head())

Unnamed: 0,MonsoonIntensity,TopographyDrainage,RiverManagement,Deforestation,Urbanization,ClimateChange,DamsQuality,Siltation,AgriculturalPractices,Encroachments,IneffectiveDisasterPreparedness,DrainageSystems,CoastalVulnerability,Landslides,Watersheds,DeterioratingInfrastructure,PopulationScore,WetlandLoss,InadequatePlanning,PoliticalFactors
0,-0.890321,1.342509,0.441028,0.446084,-0.440927,-0.443851,0.438597,-1.338607,-0.897767,-1.341169,-0.002255,2.231345,0.89008,-0.441806,-1.334944,-0.891125,-0.440066,-0.898457,-1.342769,0.449446
1,1.345004,-0.438066,-0.007144,0.895983,0.896486,1.801586,-1.788597,0.005106,-0.002739,-0.448954,0.44697,1.784535,-1.335031,0.455973,-1.334944,-1.787539,-1.780394,1.790031,-1.791202,-0.886231
2,-0.890321,2.232796,-0.455316,-1.803411,0.896486,0.005236,-0.452281,0.900915,-0.450253,1.781585,-1.349932,0.890914,-0.444987,-0.441806,1.353026,0.453496,-1.780394,1.34195,-0.894336,0.449446
3,-0.443256,-0.438066,-1.351659,0.895983,-0.886732,-0.443851,-1.788597,-0.442798,0.444776,-0.448954,1.794647,-0.449518,-1.335031,0.455973,0.457036,1.34991,1.347039,0.445787,0.450962,2.230349
4,-0.890321,0.897365,-0.007144,-1.353512,0.004877,1.352498,-0.006842,-1.338607,0.89229,-0.002846,0.896196,0.890914,0.445058,0.007083,-0.886949,-0.891125,-0.440066,-0.450375,-0.894336,-0.441005


Unnamed: 0,FloodProbability
0,0.45
1,0.475
2,0.515
3,0.52
4,0.475


## Build the model

### Subtask:
Split the data into training and testing sets and train a machine learning model to predict floods.


**Reasoning**:
Split the data into training and testing sets, import and instantiate a Linear Regression model, and train the model.



In [9]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)

# Instantiate and train the model
model = LinearRegression()
model.fit(X_train, y_train)

print("Model training complete.")

Model training complete.
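
The split above operates on features that were already scaled with statistics computed from all rows, so the test set influences the scaler. A minimal sketch of one alternative, assuming a small synthetic stand-in for `flood.csv` (the columns `f0`..`f3`, the target formula, and the names `X_demo`, `y_demo`, `pipe` are made up for illustration), wraps the scaler and the model in a scikit-learn `Pipeline` so the scaler is fit on the training rows only:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for flood.csv (hypothetical columns, not the real data)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.integers(0, 11, size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
y_demo = X_demo.sum(axis=1) / 40 + rng.normal(0, 0.01, size=200)

# Split first, so the test rows stay unseen while fitting
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on X_tr only and reuses it at predict time
pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])
pipe.fit(X_tr, y_tr)

# Raw (unscaled) inputs go straight into the pipeline
demo_r2 = pipe.score(X_te, y_te)
print(f"Demo R2: {demo_r2:.3f}")
```

With this layout, later prediction code can pass raw feature values to `pipe.predict` without calling the scaler separately.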


## Evaluate the model

### Subtask:
Evaluate the performance of the trained model using appropriate metrics.


**Reasoning**:
Use the trained Linear Regression model to make predictions on the test set (`X_test`) and then calculate the Mean Squared Error (MSE) and R-squared score to evaluate the model's performance.



In [10]:
from sklearn.metrics import mean_squared_error, r2_score

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)

# Calculate R-squared score
r2 = r2_score(y_test, y_pred)

# Print the evaluation metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R2) Score: {r2}")

Mean Squared Error (MSE): 1.2317015329136353e-32
R-squared (R2) Score: 1.0
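
Because the model was fit on standardized features, `model.coef_` is expressed per standard deviation rather than per raw unit. The following is a minimal sketch of converting such coefficients back to raw units, assuming a synthetic dataset with known weights (the names `true_w`, `scaler_demo`, and `model_demo` are illustrative and not part of this notebook). Applied to the notebook's own `preprocessor` and `model`, the same arithmetic would show how much each raw feature contributes to `FloodProbability`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with made-up weights and an exact linear target (no noise)
rng = np.random.default_rng(1)
true_w = np.array([0.02, 0.01, 0.005])
X_raw = rng.integers(0, 11, size=(500, 3)).astype(float)
y_demo = X_raw @ true_w + 0.1

# Fit the scaler, then fit the regression on the standardized features
scaler_demo = StandardScaler().fit(X_raw)
model_demo = LinearRegression().fit(scaler_demo.transform(X_raw), y_demo)

# z = (x - mean) / scale, so the raw-unit weight is w_scaled / scale
coef_raw = model_demo.coef_ / scaler_demo.scale_
# Move the "- mean / scale" part of each term into the intercept
intercept_raw = model_demo.intercept_ - np.sum(coef_raw * scaler_demo.mean_)

print(np.round(coef_raw, 4), round(float(intercept_raw), 4))
```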


In [11]:
import numpy as np
import pandas as pd

# Create a sample input dictionary
sample_input = {
    'MonsoonIntensity': 3, 'TopographyDrainage': 5, 'RiverManagement': 7, 'Deforestation': 2,
    'Urbanization': 6, 'ClimateChange': 4, 'DamsQuality': 7, 'Siltation': 3, 'AgriculturalPractices': 5,
    'Encroachments': 2, 'IneffectiveDisasterPreparedness': 6, 'DrainageSystems': 8, 'CoastalVulnerability': 7,
    'Landslides': 4, 'Watersheds': 5, 'DeterioratingInfrastructure': 4, 'PopulationScore': 3,
    'WetlandLoss': 2, 'InadequatePlanning': 6, 'PoliticalFactors': 7
}

# Convert the dictionary to a DataFrame
sample_input_df = pd.DataFrame([sample_input])

# Ensure the columns are in the same order as the training data
feature_names = X_processed.columns
sample_input_df = sample_input_df[feature_names]

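# Preprocess the sample input using the same scaler; keep the column names,
# because the model was fit on a DataFrame and otherwise warns about missing feature names
sample_input_processed = pd.DataFrame(
    preprocessor.transform(sample_input_df), columns=feature_names
)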

# Make a prediction
predicted_probability = model.predict(sample_input_processed)

# Convert the probability to a percentage
predicted_percentage = predicted_probability[0] * 100

print(f"Predicted Flood Probability: {predicted_percentage:.2f}%")

Predicted Flood Probability: 48.00%
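
The 48.00% result above can be reproduced by hand. This sketch assumes, based only on the rows and predictions displayed in this notebook and not on any dataset documentation, that `FloodProbability` equals the sum of the 20 feature values divided by 200. The values entered in the CLI session later in this notebook are included as a second check. If the relationship holds across the whole file, it would account for the near-zero MSE.

```python
# Feature values copied from the sample input cell above
sample_input = {
    'MonsoonIntensity': 3, 'TopographyDrainage': 5, 'RiverManagement': 7, 'Deforestation': 2,
    'Urbanization': 6, 'ClimateChange': 4, 'DamsQuality': 7, 'Siltation': 3,
    'AgriculturalPractices': 5, 'Encroachments': 2, 'IneffectiveDisasterPreparedness': 6,
    'DrainageSystems': 8, 'CoastalVulnerability': 7, 'Landslides': 4, 'Watersheds': 5,
    'DeterioratingInfrastructure': 4, 'PopulationScore': 3, 'WetlandLoss': 2,
    'InadequatePlanning': 6, 'PoliticalFactors': 7,
}

# Values typed into the CLI session, in feature order
cli_values = [3, 3, 10, 2, 8, 2, 3, 5, 8, 1, 3, 6, 2, 6, 1, 4, 3, 6, 2, 4]

# Assumed relationship (inferred from the displayed outputs): probability = sum / 200
sample_total = sum(sample_input.values())
cli_total = sum(cli_values)

print(sample_total, sample_total / 200)  # 96 0.48, matching the 48.00% prediction
print(cli_total, cli_total / 200)        # 82 0.41, matching the CLI output of 0.410
```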




## Summary:

### Data Analysis Key Findings

*   The dataset contains 50,000 entries and 21 columns with no missing values.
*   All columns are numerical: the features are integer-valued (`int64`), and the target variable `FloodProbability` is `float64`.
*   The preprocessing step involved scaling the numerical features using `StandardScaler`.
*   A Linear Regression model was trained on 80% of the data and tested on the remaining 20%.
*   The trained model achieved a Mean Squared Error (MSE) of approximately 1.23e-32 and an R-squared score of 1.0 on the test set.

### Insights or Next Steps

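*   The perfect R-squared score of 1.0 and near-zero MSE (about 1.23e-32, which is at floating-point precision) suggest that the target is an exact linear function of the features, as can happen with synthetically generated data, or that information about the target has leaked into the features. Overfitting alone is unlikely to produce a perfect score on held-out data. Further investigation is needed to understand this result.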
*   Consider evaluating the model on a separate, unseen dataset or using cross-validation to get a more robust measure of performance and check for overfitting.
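
The cross-validation suggested above could look like the following. This is a minimal sketch on a synthetic stand-in (the data and the names `X_demo`, `y_demo`, `pipe`, and `cv` are assumptions, not the notebook's variables). With the real data, the unscaled `X` and `y` from the preprocessing cell would take their place.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: five integer features and a noisy linear target
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 11, size=(300, 5)).astype(float)
y_demo = X_demo.sum(axis=1) / 50 + rng.normal(0, 0.02, size=300)

# Scaling lives inside the pipeline, so each fold fits its own scaler
pipe = make_pipeline(StandardScaler(), LinearRegression())

# Five shuffled folds; each row is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="r2")

print(f"R2 per fold: {np.round(scores, 3)}")
print(f"Mean R2: {scores.mean():.3f} (std {scores.std():.3f})")
```

A tight spread across folds would indicate that the score does not depend on one particular split.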


In [13]:
import numpy as np
import pandas as pd

# Assuming model, preprocessor, feature_names exist from previous steps

def cli_flood_predict():
    print("🌊 DisasterNet - Flood Risk Predictor (CLI)")
    print("Enter values for the following features:\n")

    # Create dictionary to hold user inputs
    user_input = {}

    for feature in feature_names:
        while True:
            try:
                val = float(input(f"{feature}: "))
                user_input[feature] = val
                break
            except ValueError:
                print("Invalid input! Please enter a numeric value.")

    # Convert input to DataFrame
    input_df = pd.DataFrame([user_input])

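    # Scale using the same preprocessor; keep the column names,
    # because the model was fit on a DataFrame
    input_scaled = pd.DataFrame(preprocessor.transform(input_df), columns=feature_names)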

    # Predict flood probability
    pred_prob = model.predict(input_scaled)[0]

    # Clip probability between 0 and 1
    pred_prob = np.clip(pred_prob, 0, 1)

    print(f"\nPredicted Flood Probability: {pred_prob:.3f} (~{pred_prob*100:.1f}%)\n")

# Run the CLI predictor
cli_flood_predict()


🌊 DisasterNet - Flood Risk Predictor (CLI)
Enter values for the following features:

MonsoonIntensity: 3
TopographyDrainage: 3
RiverManagement: 10
Deforestation: 2
Urbanization: 8
ClimateChange: 2
DamsQuality: 3
Siltation: 5
AgriculturalPractices: 8
Encroachments: 1
IneffectiveDisasterPreparedness: 3
DrainageSystems: 6
CoastalVulnerability: 2
Landslides: 6
Watersheds: 1
DeterioratingInfrastructure: 4
PopulationScore: 3
WetlandLoss: 6
InadequatePlanning: 2
PoliticalFactors: 4

Predicted Flood Probability: 0.410 (~41.0%)





In [15]:
import gradio as gr
import numpy as np
import pandas as pd

# Assumes model, preprocessor, and feature_names exist from previous steps.
# predict_flood_ui is not defined in the cells shown here, so this is a
# minimal version that mirrors the CLI predictor above.
def predict_flood_ui(*values):
    # Gradio passes one positional argument per input component, in order
    input_df = pd.DataFrame([values], columns=feature_names)
    input_scaled = pd.DataFrame(preprocessor.transform(input_df), columns=feature_names)
    pred_prob = float(np.clip(model.predict(input_scaled)[0], 0, 1))
    return f"{pred_prob:.3f} (~{pred_prob * 100:.1f}%)"

with gr.Blocks() as demo:
    gr.Markdown("## 🌊 DisasterNet - Flood Risk Predictor")

    # One Number input per feature: three per column, two columns per row,
    # so all 20 features are covered without listing them by hand
    inputs = []
    names = list(feature_names)
    for row_start in range(0, len(names), 6):
        with gr.Row():
            for col_start in range(row_start, min(row_start + 6, len(names)), 3):
                with gr.Column():
                    for name in names[col_start:col_start + 3]:
                        inputs.append(gr.Number(label=name))

    output = gr.Textbox(label="Flood Probability")
    submit_btn = gr.Button("Predict")

    # Connect function
    submit_btn.click(fn=predict_flood_ui, inputs=inputs, outputs=output)

demo.launch()




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://315d56e1a7b2ce612e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


