In [12]:
import matplotlib
matplotlib.use('Agg')  # Use a non-interactive backend
import matplotlib.pyplot as plt
print("Matplotlib imported successfully with Agg backend!")


# Intuitive Supervised Learning for Flood Prediction

This notebook demonstrates how we can use machine learning to predict flood probability based on various environmental and human factors. We'll walk through each step, explaining how it relates to our flood prediction goal.

In [None]:
# First, we import the tools we need for our flood prediction project
import pandas as pd  # For handling our flood data
import numpy as np  # For numerical operations
from sklearn.model_selection import train_test_split  # To split our flood data
from sklearn.ensemble import RandomForestRegressor  # Our flood prediction model
from sklearn.metrics import mean_squared_error, r2_score  # To evaluate our flood predictions

import matplotlib.pyplot as plt  # For visualizing flood risk factors
import seaborn as sns  # For prettier visualizations

# We set a random seed to make our flood predictions reproducible
np.random.seed(42)


## Step 1: Loading and Exploring Our Flood Data

First, we need to load and examine our flood-related data to understand what information we have to work with.

In [None]:
# Load the flood data from our CSV file
flood_data = pd.read_csv('flood_kaggle.csv')

# Let's look at the first few rows of our flood data
print("Here are the first few rows of our flood data:")
print(flood_data.head())

# And get some basic information about our flood dataset
print("\nHere's some information about our flood dataset:")
flood_data.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()

# This gives us an overview of our dataset. We can see all the factors that might influence flood probability,
# like MonsoonIntensity, TopographyDrainage, etc., and our target variable 'FloodProbability' at the end.
# Understanding these factors is crucial for predicting flood risk.
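Before modelling, it is often worth checking each column for missing values and looking at its range. The sketch below uses a small DataFrame with invented values standing in for `flood_kaggle.csv`. The column names match ones mentioned above, but the numbers and the choice to simply drop incomplete rows are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for flood_kaggle.csv: invented values, familiar column names
demo = pd.DataFrame({
    "MonsoonIntensity": [3, 7, 5, np.nan, 6],
    "TopographyDrainage": [4, 2, 6, 5, 3],
    "FloodProbability": [0.45, 0.60, 0.50, 0.55, np.nan],
})

# Count missing values per column so we know where gaps are before modelling
missing_per_column = demo.isna().sum()
print(missing_per_column)

# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
print(demo.describe())

# One simple option among several: drop any row that has a missing value
clean = demo.dropna()
print(f"{len(demo) - len(clean)} of {len(demo)} rows dropped")
```

Dropping rows is the bluntest choice; imputing values is an alternative when many rows have gaps.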

## Step 2: Preparing Our Flood Data

Now that we've loaded our data, we need to prepare it for our machine learning model. First, we separate the columns into two parts:
1. The features (X) - all the factors that might influence flooding
2. The target (y) - the flood probability we want to predict

In [None]:
# Separate our flood risk factors (X) and flood probability (y)
X = flood_data.drop('FloodProbability', axis=1)  # All columns except FloodProbability
y = flood_data['FloodProbability']  # Just the FloodProbability column

# Now we split our data into training and testing sets
# We'll use 80% of the data to train our flood prediction model, and 20% to test it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"We have {X_train.shape[0]} flood scenarios to train our model")
print(f"And {X_test.shape[0]} flood scenarios to test it")

# This split allows us to train our model on one set of flood data and then test how well it performs on data it hasn't seen before.
# This helps us understand if our model can generalize to new flood scenarios, which is crucial for predicting future flood risks.
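To make the idea of a random hold-out split concrete, here is a simplified sketch: shuffle the row indices, then reserve a fraction of them for testing. It rounds the test count up, mirroring how scikit-learn treats a fractional `test_size`, but it is an illustration rather than scikit-learn's implementation, so it will not select the same rows as `train_test_split`. The function name `holdout_split` is hypothetical.

```python
import math

import numpy as np


def holdout_split(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)       # random ordering of 0..n_rows-1
    n_test = math.ceil(test_size * n_rows)   # round the test count up
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]
    return train_idx, test_idx


train_idx, test_idx = holdout_split(1000, test_size=0.2)
print(len(train_idx), len(test_idx))  # 800 200
```

Because every row lands in exactly one of the two index sets, the test rows are never seen during training.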

## Step 3: Training Our Flood Prediction Model

Now we'll use a Random Forest model to learn patterns from our training data. Think of this like the model studying many examples of past flood scenarios to understand what factors lead to higher flood probabilities.

In [None]:
# Create our Random Forest model for flood prediction
# n_estimators=100 means it builds 100 decision trees and averages their predictions
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model on our flood data
model.fit(X_train, y_train)

print("Our flood prediction model has finished learning from the training data!")

# The model has now learned patterns from the training data. It's like it has studied many past flood scenarios
# and understands how different factors (like monsoon intensity, topography, etc.) relate to flood probability.
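By default, scikit-learn's `RandomForestRegressor` trains each tree on a bootstrap sample (rows drawn with replacement) and averages the trees' outputs. The sketch below reproduces that core "bagging" idea with individual `DecisionTreeRegressor` models on synthetic data. It is a simplified illustration under stated assumptions: it leaves out per-split feature subsampling (the `max_features` parameter) and other details, and `bagged_trees_predict` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_trees_predict(X_train, y_train, X_new, n_trees=100, seed=42):
    """Fit trees on bootstrap resamples and average their predictions (simplified sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n row indices with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor(random_state=int(rng.integers(1_000_000)))
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_new))
    # The ensemble prediction is the mean over all trees
    return np.mean(all_preds, axis=0)


# Synthetic data: the target depends only on the first feature, plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 0.05 * X_demo[:, 0] + rng.normal(0, 0.02, size=200)

preds = bagged_trees_predict(X_demo[:150], y_demo[:150], X_demo[150:], n_trees=25)
print(preds.shape)  # (50,)
```

Averaging many trees that each saw a slightly different sample smooths out the quirks of any single tree, which is why the forest usually generalizes better than one deep tree.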

## Step 4: Evaluating Our Flood Prediction Model

Now that our model has learned, let's see how well it can predict flood probabilities for scenarios it hasn't seen before.

In [None]:
# Use our trained model to make flood probability predictions on the test data
y_pred = model.predict(X_test)

# Calculate how well our predictions match the actual flood probabilities
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared Score: {r2:.4f}")

# The Mean Squared Error (MSE) is the average of the squared differences between predicted and actual flood probabilities. Lower is better.
# The R-squared score is the fraction of the variance in flood probability that the model explains: 1 is a perfect fit, and 0 means no better than always predicting the mean. Closer to 1 is better.
# These metrics help us understand how reliable our flood predictions might be for new areas or future scenarios.
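Both metrics follow directly from their definitions: MSE = (1/n) Σ (y_i − ŷ_i)², and R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes them by hand with NumPy on a few made-up flood probabilities (invented for illustration, not taken from the dataset). It also reports the RMSE, the square root of the MSE, which is in the same units as FloodProbability and so is easier to read.

```python
import numpy as np


def mse_and_r2(y_true, y_pred):
    """Compute mean squared error and R^2 directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # average squared error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2


# Made-up flood probabilities, for illustration only
y_true = [0.50, 0.60, 0.40, 0.55]
y_pred = [0.52, 0.58, 0.45, 0.50]
mse, r2 = mse_and_r2(y_true, y_pred)
print(f"MSE: {mse:.5f}, RMSE: {np.sqrt(mse):.4f}, R^2: {r2:.4f}")
```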

## Step 5: Understanding Important Factors for Flood Prediction

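One benefit of a Random Forest is that it reports an importance score for each factor, based on how much the splits on that factor reduced prediction error across all trees. These scores show which factors matter most for predicting flood probability and can help focus flood prevention efforts.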

In [None]:
# Get the importance of each flood risk factor
feature_importance = model.feature_importances_
features = X.columns

# Sort flood risk factors by importance
feature_importance_sorted = sorted(zip(feature_importance, features), reverse=True)

# Create a bar chart of flood risk factor importances
plt.figure(figsize=(12, 8))
sns.barplot(x=[imp for imp, _ in feature_importance_sorted], 
            y=[feat for _, feat in feature_importance_sorted])
plt.title("Which Factors Are Most Important for Predicting Floods?")
plt.xlabel("Importance Score")
plt.tight_layout()
plt.show()

# Print the top 5 most important flood risk factors
print("The 5 most important factors for predicting flood probability are:")
for imp, feat in feature_importance_sorted[:5]:
    print(f"{feat}: {imp:.4f}")

# This analysis helps us understand which factors contribute most to flood risk.
# It could guide where to focus flood prevention efforts or what to monitor most closely for early warning systems.
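The `feature_importances_` scores used above are impurity-based. They are computed from the training data and, according to the scikit-learn documentation, can be misleading for features with many distinct values. A common cross-check is permutation importance, available as `sklearn.inspection.permutation_importance`: shuffle one column of held-out data at a time and measure how much the model's score (R² for a regressor) drops. The sketch below runs it on synthetic data where one feature carries the signal and the other is pure noise. The data and parameter choices are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: feature 0 drives the target, feature 1 is pure noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(500, 2))
y_demo = 0.05 * X_demo[:, 0] + rng.normal(0, 0.02, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle each column of the held-out data in turn and record the drop in R^2
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42)
for name, mean, std in zip(["signal", "noise"], result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.4f} +/- {std:.4f}")
```

If the two methods broadly agree on the top factors, that adds some confidence to the ranking. If they disagree, the permutation scores on held-out data are usually the more trustworthy guide.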

## Conclusion

We've now built a model that can predict flood probability based on various environmental and human factors. This model could be used to:
1. Estimate flood risk for new areas
2. Identify the most critical factors contributing to flood risk
3. Guide decision-making for flood prevention and preparedness

Remember, this is a simplified model and real-world flood prediction is very complex. But this gives us a starting point for understanding and predicting flood risks. As you continue your project, you might want to consider:
- Collecting more detailed local data to improve predictions
- Incorporating time-based data to predict flood risks over time
- Exploring other machine learning models to see if they perform better
- Creating a user-friendly interface for local authorities to use these predictions
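When exploring other models, a fair comparison scores every candidate on the same cross-validation folds rather than on a single train/test split. The sketch below compares `LinearRegression` and `RandomForestRegressor` with 5-fold `cross_val_score`. The data is synthetic and deliberately linear, so it favours the linear model and says nothing about which model suits the real flood data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: a linear signal across 4 features plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 4))
y_demo = X_demo @ np.array([0.02, 0.01, 0.015, 0.005]) + rng.normal(0, 0.02, size=300)

candidates = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=42),
}
# Same shuffled folds for every model, so the scores are directly comparable
cv = KFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for name, estimator in candidates.items():
    # R^2 on each of the 5 held-out folds
    fold_scores = cross_val_score(estimator, X_demo, y_demo, cv=cv, scoring="r2")
    scores[name] = fold_scores
    print(f"{name}: mean R^2 = {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

The spread across folds is as informative as the mean: a model whose advantage is smaller than its fold-to-fold variation has not clearly won.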