# AI Flood Model Development

This notebook is our "digital lab." In it, we:
1.  Load our sample data (`historical_floods.csv`).
2.  Analyze and prepare the data for training.
3.  Train a `RandomForestClassifier` model.
4.  Test the model's accuracy.
5.  Save the final, trained model to a file (`flood_model.pkl`) for our Django app to use.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib # This is the tool we use to save our model

print("All libraries imported successfully!")

All libraries imported successfully!


## Step 1: Load the Data

We load our sample CSV file from the `data` folder. (Note: `../` means "go up one directory")

In [3]:
# Load the dataset from the 'data' folder
data_path = "../data/historical_floods.csv"
df = pd.read_csv(data_path, comment='#')

# Show the first 5 rows to make sure it loaded correctly
df.head()

Unnamed: 0,rainfall_mm,river_level_m,did_it_flood
0,10,1.2,0
1,5,1.1,0
2,25,1.5,0
3,40,1.8,0
4,50,2.1,1


## Step 2: Prepare Data for Training

We split our data into two parts:
* `X`: The "features" or "inputs" (the data we *have*, e.g., rainfall, river level)
* `y`: The "target" or "output" (the data we *want to predict*, e.g., `did_it_flood`)

In [None]:
# Define our features (X) and target (y)
features = ["rainfall_mm", "river_level_m"]
target = "did_it_flood"

X = df[features]
y = df[target]

print("Features (X):")
print(X.head())
print("\nTarget (y):")
print(y.head())
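
Before training, it can help to look at missing values and at how many flood and non-flood rows there are. The sketch below is only illustrative: it retypes the five rows shown by `df.head()` in Step 1 so that it runs on its own, and the names `sample`, `missing_per_column`, and `class_counts` are made up for this example.

```python
import pandas as pd

# The five rows shown by df.head() in Step 1, retyped so this runs standalone.
sample = pd.DataFrame(
    {
        "rainfall_mm": [10, 5, 25, 40, 50],
        "river_level_m": [1.2, 1.1, 1.5, 1.8, 2.1],
        "did_it_flood": [0, 0, 0, 0, 1],
    }
)

# Count missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Count how many rows fall in each class (0 = no flood, 1 = flood).
class_counts = sample["did_it_flood"].value_counts()
print(class_counts)
```

On the real notebook data, the same two calls could be run directly on `df`. A heavily lopsided class count is worth noting, because it affects how the accuracy score should be read later.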

## Step 3: Split Data for Training and Testing

We shouldn't test our model on the same data we use to train it, because the score would look better than it really is. That's like giving a student the answers to a test *before* they take it. We split our data: 80% for training, 20% for testing.

In [None]:
# Split the data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
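
If floods are rare in the data, a purely random split can leave the test set with very few flood rows. One possible variation, shown here as a sketch on made-up data rather than as what this notebook does, is to pass `stratify` to `train_test_split` so both parts keep roughly the same share of floods. The names `X_demo`, `y_demo`, and the 80/20 class mix are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data for illustration: 80 "no flood" rows and 20 "flood" rows.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(
    {
        "rainfall_mm": rng.uniform(0, 100, size=100),
        "river_level_m": rng.uniform(1.0, 3.0, size=100),
    }
)
y_demo = pd.Series([0] * 80 + [1] * 20, name="did_it_flood")

# stratify=y_demo keeps the flood/no-flood ratio the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Flood share in training set: {y_tr.mean():.2f}")
print(f"Flood share in test set:     {y_te.mean():.2f}")
```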

## Step 4: Train the AI Model

Now we create our AI model (`RandomForestClassifier`) and "show" it the training data so it can learn the patterns.

In [None]:
# Create the model
model = RandomForestClassifier(random_state=42)

# Train the model on our training data
model.fit(X_train, y_train)

print("Model trained successfully!")
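
To get a feel for which patterns a random forest picked up, one option is to read its `feature_importances_` attribute after fitting. The sketch below uses made-up data with a deliberately simple rule (a flood whenever the river level is above 2.0 m), so `X_demo`, `y_demo`, `demo_model`, and that threshold are assumptions for illustration, not properties of our real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up data: rainfall is random noise, the label depends only on river level.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    {
        "rainfall_mm": rng.uniform(0, 100, size=200),
        "river_level_m": rng.uniform(1.0, 3.0, size=200),
    }
)
y_demo = (X_demo["river_level_m"] > 2.0).astype(int)

demo_model = RandomForestClassifier(random_state=42)
demo_model.fit(X_demo, y_demo)

# One importance score per feature; the scores sum to 1.
importances = pd.Series(demo_model.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

On the notebook's own model, the equivalent would be `pd.Series(model.feature_importances_, index=features)`.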

## Step 5: Test the Model's Accuracy

Now we use our 20% "test" data to see how well the model learned. We ask it to make predictions on data it's never seen before, and then we compare its predictions to the real answers (`y_test`).

In [None]:
# Use the trained model to make predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Our model's predictions: {y_pred}")
print(f"The real answers:        {y_test.values}")
print(f"\nModel Accuracy: {accuracy * 100:.2f}%")
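# Only claim success if the score actually clears the 85% goal
if accuracy > 0.85:
    print("(This meets our FR goal of >85% accuracy!)")
else:
    print("(This does not yet meet our FR goal of >85% accuracy.)")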
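
Accuracy alone can hide missed floods when most days have no flood. As a hedged illustration, the sketch below uses hand-written labels (not output from our model) to show how `confusion_matrix` and `recall_score` from scikit-learn can sit alongside `accuracy_score`. The lists `y_true` and `y_hat` are invented for this example.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hand-written labels for illustration: 8 dry days, 2 flood days.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Invented predictions that miss one of the two floods.
y_hat = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_hat)
rec = recall_score(y_true, y_hat)  # share of real floods that were caught
cm = confusion_matrix(y_true, y_hat)  # rows = real class, columns = predicted

print(f"Accuracy: {acc:.2f}")  # 0.90
print(f"Recall:   {rec:.2f}")  # 0.50
print(cm)
```

Here the accuracy is 90% even though half of the floods were missed, which is why recall is worth checking for a flood-warning model. For the notebook's own results, the same functions can be called with `y_test` and `y_pred`.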

## Step 6: Save the Final Model

With the model trained and tested, we save it to a single file using `joblib.dump`. Our Django app will load this file to make live predictions.

In [None]:
# Define the file path (go up one directory and save as 'flood_model.pkl')
model_filename = "../flood_model.pkl"

# Save the model to the file
joblib.dump(model, model_filename)

print(f"Success! Model saved to {model_filename}")
print("\nYou can now shut down Jupyter Lab and return to your terminal.")
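
For reference, here is a rough sketch of how the saved file could be loaded again and used for a single prediction, for example from the Django side. It is not the app's actual code: `predict_flood` is a hypothetical helper, the stand-in model is trained on the five sample rows from Step 1, and a temporary folder is used so the example does not touch the real `flood_model.pkl`.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on the five sample rows from Step 1.
train = pd.DataFrame(
    {
        "rainfall_mm": [10, 5, 25, 40, 50],
        "river_level_m": [1.2, 1.1, 1.5, 1.8, 2.1],
        "did_it_flood": [0, 0, 0, 0, 1],
    }
)
features = ["rainfall_mm", "river_level_m"]
stand_in = RandomForestClassifier(random_state=42)
stand_in.fit(train[features], train["did_it_flood"])

# Save and reload in a temporary folder (the real path is ../flood_model.pkl).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "flood_model.pkl"
    joblib.dump(stand_in, model_path)
    loaded_model = joblib.load(model_path)


def predict_flood(model, rainfall_mm, river_level_m):
    """Hypothetical helper: return the predicted class (0 or 1) for one reading."""
    # Use the same column names and order the model was trained with.
    reading = pd.DataFrame(
        [{"rainfall_mm": rainfall_mm, "river_level_m": river_level_m}]
    )
    return int(model.predict(reading)[0])


prediction = predict_flood(loaded_model, rainfall_mm=45, river_level_m=2.0)
print(f"Predicted class: {prediction}")
```

A file written by `joblib.dump` should only be loaded from a trusted source, and ideally with the same scikit-learn version that was used to train the model.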