### Load the Dataset

# Task
Load the dataset "dengue-dataset-with-alert-epidemic.csv", preprocess it, build a CNN-LSTM model, train it, and evaluate its performance.

## Load the dataset

### Subtask:
Load the dengue dataset from the provided CSV file into a pandas DataFrame.


**Reasoning**:
Load the dataset into a pandas DataFrame and display the first few rows.



In [248]:
import pandas as pd

df = pd.read_csv('/content/dengue-dataset-with-alert-epidemic.csv')
display(df.head())

Unnamed: 0,CITY,YEAR_WEEK,CASES,DEATHS,RAINFALL,TMAX,TMIN,TMEAN,RH,SUNSHINE,...,TMEAN_roll2_sum,TMEAN_roll4_sum,RH_roll2_mean,RH_roll4_mean,RH_roll2_sum,RH_roll4_sum,INCIDENCE_per_100k,RISK_LEVEL,ALERT,EPIDEMIC
0,CALOOCAN CITY,2016-W02,27,0,0.0,32.0,21.8,26.9,73.0,6.4,...,,,,,,,1.690776,Low,False,False
1,CALOOCAN CITY,2016-W03,19,0,0.0,32.3,23.0,27.65,67.0,8.3,...,,,,,,,1.189623,Low,False,False
2,CALOOCAN CITY,2016-W04,43,0,0.0,30.6,23.8,27.2,65.0,3.9,...,54.55,,70.0,,140.0,,2.691891,Moderate,False,False
3,CALOOCAN CITY,2016-W05,30,0,0.0,32.2,22.6,27.4,67.0,6.4,...,54.85,,66.0,,132.0,,1.877776,Low,False,False
4,CALOOCAN CITY,2016-W06,28,0,0.0,28.3,19.4,23.85,70.0,1.6,...,54.6,109.15,66.0,68.0,132.0,272.0,1.752322,Low,False,False


In [249]:
df.shape

(4403, 65)

## Preprocess the data

### Subtask:
Prepare the data for the CNN-LSTM model. This may involve handling missing values, scaling numerical features, encoding categorical features, and splitting the data into training and testing sets.


**Reasoning**:
Identify and handle missing values, identify categorical features and apply one-hot encoding, separate features and target, and split the data into training and testing sets as per the instructions.



In [250]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd # Ensure pandas is imported

# 1. Identify and handle missing values
# Check for missing values
print("Missing values before handling:")
print(df.isnull().sum())

# Impute missing numerical values with the mean
numerical_cols = df.select_dtypes(include=np.number).columns
for col in numerical_cols:
    if df[col].isnull().any():
        df[col] = df[col].fillna(df[col].mean())

# For categorical columns with missing values, imputation with mode is a common strategy.
categorical_cols = df.select_dtypes(include='object').columns
for col in categorical_cols:
     if df[col].isnull().any():
         df[col] = df[col].fillna(df[col].mode()[0])

print("\nMissing values after handling all types:")
print(df.isnull().sum())


# 2. Identify categorical features and apply appropriate encoding
# 'CITY' is categorical. 'RISK_LEVEL' is categorical with defined levels. 'ALERT' and 'EPIDEMIC' are boolean.

# Define the categories for 'RISK_LEVEL' including 'High' and 'Very High' as observed in the data
risk_level_categories = ['Low', 'Moderate', 'High', 'Very High']
df['RISK_LEVEL'] = pd.Categorical(df['RISK_LEVEL'], categories=risk_level_categories, ordered=False)


categorical_cols_to_encode = ['CITY', 'RISK_LEVEL']
# Use get_dummies without drop_first=True to keep all risk level columns for classification
df = pd.get_dummies(df, columns=categorical_cols_to_encode, drop_first=False)

# Convert boolean columns to integer (0 or 1)
df['ALERT'] = df['ALERT'].astype(int)
df['EPIDEMIC'] = df['EPIDEMIC'].astype(int)

# Convert 'YEAR_WEEK' to numerical format YYYYww
# Handle potential errors during conversion
def convert_year_week_to_numerical(year_week_str):
    try:
        year_str, week_str = year_week_str.split('-W')
        return int(year_str) * 100 + int(week_str)
    except (AttributeError, ValueError):
        return np.nan  # Non-string input or unexpected format

df['YEAR_WEEK_numerical'] = df['YEAR_WEEK'].apply(convert_year_week_to_numerical)

# Drop the original 'YEAR_WEEK' column
df = df.drop('YEAR_WEEK', axis=1, errors='ignore') # Add errors='ignore'

# Impute any NaNs created during numerical conversion of YEAR_WEEK
if df['YEAR_WEEK_numerical'].isnull().any():
     df['YEAR_WEEK_numerical'] = df['YEAR_WEEK_numerical'].fillna(df['YEAR_WEEK_numerical'].mean())


# 3. Separate the target variable from the features
# Set the one-hot encoded 'RISK_LEVEL' columns as the target for classification.
# Identify the one-hot encoded 'RISK_LEVEL' columns after get_dummies.
risk_level_cols = [col for col in df.columns if 'RISK_LEVEL_' in col]
y_classification = df[risk_level_cols]

# Separate the features for classification
# Drop the original 'CASES' and the one-hot encoded 'RISK_LEVEL' columns from features
X_classification = df.drop(['CASES'] + risk_level_cols, axis=1)

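# Identify and remove any remaining non-numerical columns from X_classification.
# Note: pd.get_dummies produced boolean 'CITY_*' columns here, and select_dtypes(exclude=np.number)
# treats bool as non-numerical, so the one-hot city columns are dropped as well (see the output below).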
non_numerical_cols_in_X_classification = X_classification.select_dtypes(exclude=np.number).columns
if len(non_numerical_cols_in_X_classification) > 0:
    print(f"\nRemoving non-numerical columns from features: {list(non_numerical_cols_in_X_classification)}")
    X_classification = X_classification.drop(non_numerical_cols_in_X_classification, axis=1, errors='ignore') # Add errors='ignore'


# Print columns in X_classification after removing non-numerical ones
print("\nColumns in X_classification after removing non-numerical columns:")
print(X_classification.columns)


# Scale numerical features in X_classification
# Identify numerical columns in X_classification after dropping any non-numerical ones
numerical_cols_classification = X_classification.select_dtypes(include=np.number).columns
scaler_classification = StandardScaler()
X_classification[numerical_cols_classification] = scaler_classification.fit_transform(X_classification[numerical_cols_classification])


# 4. Split the data into training and testing sets
X_train_classification, X_test_classification, y_train_classification, y_test_classification = train_test_split(
    X_classification, y_classification, test_size=0.2, random_state=42
)

# Reshape data for CNN-LSTM input (samples, timesteps, features)
# Since we are treating each row as a single timestep with multiple features,
# we reshape to (samples, 1, features)
X_train_classification_reshaped = X_train_classification.values.reshape((X_train_classification.shape[0], 1, X_train_classification.shape[1]))
X_test_classification_reshaped = X_test_classification.values.reshape((X_test_classification.shape[0], 1, X_test_classification.shape[1]))


print("\nShape of training features for classification:", X_train_classification_reshaped.shape)
print("Shape of testing features for classification:", X_test_classification_reshaped.shape)
print("Shape of training target for classification:", y_train_classification.shape)
print("Shape of testing target for classification:", y_test_classification.shape)
print("\nTarget columns after one-hot encoding:", y_classification.columns.tolist())

Missing values before handling:
CITY                   0
YEAR_WEEK              0
CASES                  0
DEATHS                 0
RAINFALL               0
                      ..
RH_roll4_sum          68
INCIDENCE_per_100k     0
RISK_LEVEL             0
ALERT                  0
EPIDEMIC               0
Length: 65, dtype: int64

Missing values after handling all types:
CITY                  0
YEAR_WEEK             0
CASES                 0
DEATHS                0
RAINFALL              0
                     ..
RH_roll4_sum          0
INCIDENCE_per_100k    0
RISK_LEVEL            0
ALERT                 0
EPIDEMIC              0
Length: 65, dtype: int64

Removing non-numerical columns from features: ['CITY_CALOOCAN CITY', 'CITY_LAS PINAS CITY', 'CITY_MAKATI CITY', 'CITY_MALABON CITY', 'CITY_MANDALUYONG CITY', 'CITY_MANILA CITY', 'CITY_MARIKINA CITY', 'CITY_MUNTINLUPA CITY', 'CITY_NAVOTAS CITY', 'CITY_PARANAQUE CITY', 'CITY_PASAY CITY', 'CITY_PASIG CITY', 'CITY_PATEROS', 'CITY_QUEZON C
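
The truncated output above shows the one-hot `CITY_*` columns being removed as non-numerical. A minimal sketch of why this happens, on a toy frame with hypothetical values: `pd.get_dummies` can produce boolean columns, and a numeric-only `select_dtypes` filter does not count `bool` as a number. Passing `dtype=float` is one way to keep such columns in the feature matrix. Whether the city indicators should be kept is a modelling choice that the notebook does not settle.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) mimicking the CITY column and one numeric feature.
toy = pd.DataFrame({
    "CITY": ["CALOOCAN CITY", "MAKATI CITY", "PATEROS"],
    "RAINFALL": [0.0, 1.2, 3.4],
})

# Boolean dummies: excluded by a numeric-only dtype filter.
bool_dummies = pd.get_dummies(toy, columns=["CITY"], dtype=bool)
dropped = list(bool_dummies.select_dtypes(exclude=np.number).columns)

# Float dummies: treated as numeric, so they survive the same filter.
float_dummies = pd.get_dummies(toy, columns=["CITY"], dtype=float)
kept = list(float_dummies.select_dtypes(include=np.number).columns)

print("Dropped as non-numerical:", dropped)
print("Kept as numerical:", kept)
```

With float dummies, the later `StandardScaler` step would also rescale the indicator columns unless they are excluded from scaling.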

## Modify CNN-LSTM model for classification

### Subtask:
Change the architecture of the CNN-LSTM model to be suitable for multi-class classification, including adjusting the output layer and activation function.


**Reasoning**:
Design and implement the architecture of the CNN-LSTM model for multi-class classification using TensorFlow/Keras, adjusting the output layer and activation function, and printing the model summary.



In [251]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, LSTM, Dense, Dropout, MaxPooling1D, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Define the CNN-LSTM model for classification
model_classification = Sequential()

# CNN layers - Input shape should match X_train_classification_reshaped
model_classification.add(Conv1D(filters=64, kernel_size=1, activation='relu', input_shape=(X_train_classification_reshaped.shape[1], X_train_classification_reshaped.shape[2])))
model_classification.add(MaxPooling1D(pool_size=1))
model_classification.add(Dropout(0.2))

# LSTM layers
model_classification.add(LSTM(50, return_sequences=True))
model_classification.add(Dropout(0.2))
model_classification.add(LSTM(50))
model_classification.add(Dropout(0.2))

# Dense layers
model_classification.add(Dense(50, activation='relu'))
# Output layer for multi-class classification
# Number of units should be 4 for the four risk levels (Low, Moderate, High, Very High)
# Use 'softmax' activation for probability distribution over classes
model_classification.add(Dense(4, activation='softmax'))

# Print the model summary
model_classification.summary()

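
Because each row is reshaped to a single timestep, the `Conv1D` (kernel size 1) and `LSTM` layers above operate on sequences of length 1. If multi-week sequences were wanted instead, one possible approach is a sliding window over consecutive weeks. The sketch below is an assumption about how that could look, not what the notebook does: `make_windows` and `window` are hypothetical names, and the rows are assumed to be sorted by week within a single city.

```python
import numpy as np

def make_windows(features, labels, window=4):
    """Stack `window` consecutive rows into one sample.

    features: array of shape (n_rows, n_features), sorted by time for one city.
    labels:   array of shape (n_rows, ...) aligned with features.
    Returns X of shape (n_rows - window + 1, window, n_features) and the
    label of the last row in each window.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    n_samples = len(features) - window + 1
    if n_samples <= 0:
        raise ValueError("Not enough rows for the requested window size")
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = labels[window - 1:]
    return X, y

# Toy data: 6 weeks, 3 features (hypothetical values).
toy_features = np.arange(18, dtype=float).reshape(6, 3)
toy_labels = np.array([0, 0, 1, 1, 2, 3])
X_win, y_win = make_windows(toy_features, toy_labels, window=4)
print(X_win.shape, y_win.shape)
```

With windows like these, the model's `input_shape` would become `(window, n_features)`, and a `kernel_size` larger than 1 would let the convolution mix information across weeks.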


## Compile model for classification

### Subtask:
Compile the modified model with an appropriate loss function (e.g., categorical cross-entropy) and metrics for classification.


**Reasoning**:
Compile the CNN-LSTM model for classification using the Adam optimizer, categorical cross-entropy loss, and the accuracy metric, then train it with early stopping on the validation loss.



In [252]:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Compile the classification model
# Use categorical_crossentropy for multi-class classification
model_classification.compile(optimizer=Adam(learning_rate=0.001),
                             loss='categorical_crossentropy',
                             metrics=['accuracy'])

print("Classification model compilation complete.")

# Train the classification model
# Using EarlyStopping to prevent overfitting
early_stopping_classification = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history_classification = model_classification.fit(X_train_classification_reshaped, y_train_classification,
                                                  epochs=300, batch_size=32, validation_split=0.2,
                                                  callbacks=[early_stopping_classification])

print("Classification model training complete.")

Classification model compilation complete.
Epoch 1/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 6s 13ms/step - accuracy: 0.5245 - loss: 1.1839 - val_accuracy: 0.7617 - val_loss: 0.5804
Epoch 2/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.7591 - loss: 0.5390 - val_accuracy: 0.8113 - val_loss: 0.4127
Epoch 3/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.7989 - loss: 0.4414 - val_accuracy: 0.8539 - val_loss: 0.3423
Epoch 4/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8130 - loss: 0.4077 - val_accuracy: 0.8270 - val_loss: 0.3560
Epoch 5/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8250 - loss: 0.3662 - val_accuracy: 0.8894 - val_loss: 0.2745
Epoch 6/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8566 - loss: 0.3354 - val_accuracy: 0.8809 - val_los

**Reasoning**:
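The previous cell already compiled and trained the model, so this cell calls `fit` a second time and continues training from the weights reached there. That is why the first epoch in the log below already starts at about 0.92 accuracy.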
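
The `EarlyStopping(monitor='val_loss', patience=10)` callback used in the training call stops once the validation loss has not improved for 10 consecutive epochs. The function below is a simplified pure-Python model of that rule, intended only to illustrate the patience logic. The name `early_stopping_epoch` is hypothetical, and the real Keras callback has further options (for example `min_delta` and `restore_best_weights`) that this sketch ignores.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 1-based.

    Simplified model of patience-based early stopping: training stops once
    the monitored loss has failed to improve for `patience` epochs in a row.
    stop_epoch is None if that never happens within the given history.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# First five val_loss values from the first training log.
logged = [0.5804, 0.4127, 0.3423, 0.3560, 0.2745]
result_logged = early_stopping_epoch(logged, patience=10)

# Hypothetical history with a short patience to show a stop being triggered.
result_short = early_stopping_epoch([0.5, 0.4, 0.45, 0.41, 0.42], patience=2)
print(result_logged, result_short)
```

In the five logged epochs the loss is still improving, so no stop is triggered. In the hypothetical second history the best epoch is 2 and training would stop at epoch 4.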



In [253]:
# Train the classification model
# Using EarlyStopping to prevent overfitting
early_stopping_classification = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history_classification = model_classification.fit(X_train_classification_reshaped, y_train_classification,
                                                  epochs=300, batch_size=32, validation_split=0.2,
                                                  callbacks=[early_stopping_classification])

print("Classification model training complete.")

Epoch 1/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9192 - loss: 0.1859 - val_accuracy: 0.9191 - val_loss: 0.1970
Epoch 2/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9301 - loss: 0.1764 - val_accuracy: 0.9234 - val_loss: 0.1817
Epoch 3/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9169 - loss: 0.1851 - val_accuracy: 0.9277 - val_loss: 0.1812
Epoch 4/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9309 - loss: 0.1697 - val_accuracy: 0.9206 - val_loss: 0.1874
Epoch 5/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9181 - loss: 0.1810 - val_accuracy: 0.9305 - val_loss: 0.1810
Epoch 6/300
89/89 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9285 - loss: 0.1611 - val_accuracy: 0.9248 - val_loss: 0.1906
Epoch 7/300
89/89 ━━━

## Evaluate the classification model

### Subtask:
Evaluate the performance of the classification model using appropriate metrics (e.g., accuracy, precision, recall, F1-score).


**Reasoning**:
Evaluate the trained classification model on the testing data using appropriate metrics including accuracy, precision, recall, and F1-score.



In [254]:
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

# Evaluate the classification model
loss_classification, accuracy_classification = model_classification.evaluate(X_test_classification_reshaped, y_test_classification, verbose=0)
print(f'Test Loss (Categorical Crossentropy): {loss_classification:.4f}')
print(f'Test Accuracy: {accuracy_classification:.4f}')

# Make predictions on the testing data
predictions_classification = model_classification.predict(X_test_classification_reshaped)

# Convert predicted probabilities to class labels (indices)
predicted_classes = np.argmax(predictions_classification, axis=1)

# Convert actual one-hot encoded test labels to class labels (indices)
actual_classes = np.argmax(y_test_classification.values, axis=1)

# Generate and print the classification report
# Get the target names from the columns of y_test_classification
target_names_classification = y_test_classification.columns.tolist()
print("\nClassification Report:")
print(classification_report(actual_classes, predicted_classes, target_names=target_names_classification))

# Generate and print the confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(actual_classes, predicted_classes))

Test Loss (Categorical Crossentropy): 0.2453
Test Accuracy: 0.9205
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step

Classification Report:
                      precision    recall  f1-score   support

      RISK_LEVEL_Low       0.96      0.94      0.95       452
 RISK_LEVEL_Moderate       0.87      0.92      0.89       217
     RISK_LEVEL_High       0.89      0.97      0.93       158
RISK_LEVEL_Very High       0.88      0.67      0.76        54

            accuracy                           0.92       881
           macro avg       0.90      0.87      0.88       881
        weighted avg       0.92      0.92      0.92       881


Confusion Matrix:
[[423  29   0   0]
 [ 18 199   0   0]
 [  0   0 153   5]
 [  0   0  18  36]]
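
The per-class figures in the report can be recomputed directly from the confusion matrix, which helps when reading where the errors fall. The sketch below copies the matrix from the output above and assumes the usual scikit-learn layout (rows are actual classes, columns are predicted classes).

```python
import numpy as np

labels = ["Low", "Moderate", "High", "Very High"]
# Confusion matrix copied from the output above (rows: actual, columns: predicted).
cm = np.array([
    [423,  29,   0,   0],
    [ 18, 199,   0,   0],
    [  0,   0, 153,   5],
    [  0,   0,  18,  36],
])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)     # share of each actual class found
precision = true_positives / cm.sum(axis=0)  # share of each predicted class that is right
accuracy = true_positives.sum() / cm.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>9}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The low recall for Very High comes from the 18 Very High rows predicted as High. All errors are between neighbouring levels, and the matrix shows no confusion between the Low/Moderate pair and the High/Very High pair.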


## Make risk level predictions

### Subtask:
Use the trained classification model to predict the risk level on new data.


**Reasoning**:
Use the trained classification model to make predictions on the testing data, convert the predicted probabilities and actual one-hot encoded labels to class labels, and display the first few predictions and actuals for comparison.



In [255]:
import numpy as np

# Make predictions on the testing data
predictions_classification = model_classification.predict(X_test_classification_reshaped)

# Convert predicted probabilities to class labels (indices)
predicted_classes = np.argmax(predictions_classification, axis=1)

# Convert actual one-hot encoded test labels to class labels (indices)
actual_classes = np.argmax(y_test_classification.values, axis=1)

# Display the first 10 predicted class labels and their corresponding actual class labels
print("Sample Classification Predictions vs Actuals (Class Indices):")
for i in range(10):
    print(f"Predicted: {predicted_classes[i]}, Actual: {actual_classes[i]}")


28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Sample Classification Predictions vs Actuals (Class Indices):
Predicted: 0, Actual: 0
Predicted: 0, Actual: 0
Predicted: 2, Actual: 2
Predicted: 0, Actual: 0
Predicted: 2, Actual: 2
Predicted: 1, Actual: 1
Predicted: 0, Actual: 0
Predicted: 1, Actual: 1
Predicted: 1, Actual: 1
Predicted: 2, Actual: 2
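
The indices above are easier to read as risk-level names. A small sketch of that mapping, assuming the class order shown in the classification report (Low, Moderate, High, Very High); the array `sample_predicted` just repeats the ten printed predictions.

```python
import numpy as np

# Order taken from the target columns printed in the classification report.
target_columns = ["RISK_LEVEL_Low", "RISK_LEVEL_Moderate",
                  "RISK_LEVEL_High", "RISK_LEVEL_Very High"]
class_names = np.array([c.replace("RISK_LEVEL_", "", 1) for c in target_columns])

# First ten predicted indices from the output above.
sample_predicted = np.array([0, 0, 2, 0, 2, 1, 0, 1, 1, 2])

# Fancy indexing turns each class index into its label.
sample_labels = class_names[sample_predicted]
print(sample_labels.tolist())
```

In the notebook itself, the same indexing could be applied to `predicted_classes` with names derived from `y_test_classification.columns`.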


In [256]:
df.head()

Unnamed: 0,CASES,DEATHS,RAINFALL,TMAX,TMIN,TMEAN,RH,SUNSHINE,POPULATION,LAND AREA,...,CITY_PATEROS,CITY_QUEZON CITY,CITY_SAN JUAN CITY,CITY_TAGUIG CITY,CITY_VALENZUELA CITY,RISK_LEVEL_Low,RISK_LEVEL_Moderate,RISK_LEVEL_High,RISK_LEVEL_Very High,YEAR_WEEK_numerical
0,27,0,0.0,32.0,21.8,26.9,73.0,6.4,1596900,55.8,...,False,False,False,False,False,True,False,False,False,201602
1,19,0,0.0,32.3,23.0,27.65,67.0,8.3,1597145,55.8,...,False,False,False,False,False,True,False,False,False,201603
2,43,0,0.0,30.6,23.8,27.2,65.0,3.9,1597390,55.8,...,False,False,False,False,False,False,True,False,False,201604
3,30,0,0.0,32.2,22.6,27.4,67.0,6.4,1597635,55.8,...,False,False,False,False,False,True,False,False,False,201605
4,28,0,0.0,28.3,19.4,23.85,70.0,1.6,1597880,55.8,...,False,False,False,False,False,True,False,False,False,201606


## Export the Classification Model

### Subtask:
Save the trained classification model in a suitable format for deployment.

**Reasoning**:
Save the trained classification model (`model_classification`) to a single HDF5 (`.h5`) file, which the Streamlit app can load with `tf.keras.models.load_model`.

In [257]:
# Export the classification model to an HDF5 (.h5) file
model_classification_path = 'dengue_risk_level_classification_model.h5'
model_classification.save(model_classification_path)
print(f"Classification model saved to {model_classification_path}")



Classification model saved to dengue_risk_level_classification_model.h5
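
The saved model alone is not enough to reproduce predictions: the fitted scaler and the exact feature order are needed as well. The sketch below shows one way to persist both, using `joblib` for the scaler and JSON for the column names. It runs on a toy frame with hypothetical values; in the notebook the objects to save would be `scaler_classification` and `X_classification.columns`. The file names follow the ones mentioned in the app's comments and are otherwise arbitrary.

```python
import json
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for X_classification (hypothetical columns and values).
X_toy = pd.DataFrame({"RAINFALL": [0.0, 2.0, 4.0], "TMAX": [30.0, 32.0, 34.0]})
scaler_toy = StandardScaler().fit(X_toy)

out_dir = tempfile.mkdtemp()
scaler_path = os.path.join(out_dir, "scaler.pkl")
features_path = os.path.join(out_dir, "feature_names.json")

# Persist the fitted scaler and the exact column order used for training.
joblib.dump(scaler_toy, scaler_path)
with open(features_path, "w") as f:
    json.dump(list(X_toy.columns), f)

# Later (for example inside the app): load both and transform a new row.
loaded_scaler = joblib.load(scaler_path)
with open(features_path) as f:
    loaded_features = json.load(f)

new_row = pd.DataFrame([[2.0, 32.0]], columns=loaded_features)
scaled = loaded_scaler.transform(new_row)
print(loaded_features, scaled)
```

The new row equals the column means of the toy data, so it scales to zeros, which is a quick check that the loaded scaler carries the fitted statistics.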


## Create a Simple Streamlit GUI for Testing

### Subtask:
Provide the Python code for a basic Streamlit application to load the model and make predictions.

**Reasoning**:
Generate a Python script that uses the `streamlit` library to create a simple web interface for testing. This script will load the saved classification model, take input features from the user, make a prediction using the model, and display the predicted risk level. It includes placeholders for loading the scaler and feature names, which would be necessary in a real application.

In [261]:
%%writefile streamlit_app.py
import streamlit as st
import tensorflow as tf
import numpy as np
import pandas as pd
# import json # No longer needed if not loading feature_names.json
# import joblib # No longer needed if not loading scaler.pkl
# import requests # Uncomment if you will call a weather API directly

# Load the trained classification model
@st.cache_resource
def load_model():
    # Use the correct path to your saved model file
    model = tf.keras.models.load_model('dengue_risk_level_classification_model.h5')
    return model

model = load_model()

# --- Scaler and Feature Names (Placeholders - NOT FOR REAL USE) ---
# This section uses placeholders as requested. In a real application,
# you MUST load the fitted scaler and the exact list of feature names
# used during model training.

# Determine the number of features from your trained model's input shape
# Use the actual input shape of your trained model
num_features = 62 # Replace with the actual number of features your model expects
feature_names = [f'feature_{i}' for i in range(num_features)] # Placeholder feature names

# Dummy scaler - REPLACE WITH YOUR LOADED FITTED SCALER IN A REAL APP
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# This dummy scaler is NOT fitted and will not correctly scale your data.
# In a real application, you need to load the scaler fitted on your training data.


# --- Streamlit App Layout ---
st.title('Dengue Risk Level Prediction with Weather Data')

st.write("Enter feature values, including weather data, to predict the dengue risk level.")

# Create input fields for features
input_data = {}
st.subheader("Input Features")

# You need to create input widgets for ALL num_features features
# (62 features in this case) and ensure they are in the correct order and data type.
# This is a simplified example.

# Example input fields for some features - YOU NEED TO ADD ALL 62 features
# Grouping inputs logically is recommended.

st.subheader("Weather Inputs")
# Example numerical weather inputs - replace with actual feature names
# These feature names must match the order and names used in your model's training data
# For this example, using generic names
if 'feature_0' in feature_names: # Replace 'feature_0' with the actual name of your first weather feature
    input_data['feature_0'] = st.number_input('Current Week Rainfall', value=0.0)
if 'feature_1' in feature_names: # Replace 'feature_1' with the actual name of your second weather feature
    input_data['feature_1'] = st.number_input('Current Week Max Temperature', value=30.0)
# Add inputs for all other weather features (current, lagged, rolled)...

st.subheader("Location")
# Example for one-hot encoded city - replace with actual city names and feature names
city_names = ['CALOOCAN CITY', 'LAS PINAS CITY', 'MAKATI CITY', 'MALABON CITY', 'MANDALUYONG CITY', 'MANILA CITY', 'MARIKINA CITY', 'MUNTINLUPA CITY', 'NAVOTAS CITY', 'PARANAQUE CITY', 'PASAY CITY', 'PASIG CITY', 'PATEROS', 'QUEZON CITY', 'SAN JUAN CITY', 'TAGUIG CITY', 'VALENZUELA CITY'] # Replace with your actual city names
selected_city = st.selectbox("Select City", city_names)

# You will need to map the selected_city to the correct one-hot encoded column name and set its value to 1.
# This requires knowing the exact column names from your training data.
# Example:
# if 'CITY_MAKATI CITY' in feature_names: input_data['CITY_MAKATI CITY'] = 1.0 # pd.get_dummies keeps the space in the city name


st.subheader("Other Features")
# Example other features - replace with actual feature names
if 'feature_20' in feature_names: # Replace 'feature_20' with the actual name of one of your other features
    input_data['feature_20'] = st.number_input('Population', value=1600000)
# Add inputs for all other features (lagged cases, population density, year_week, etc.)...


# --- API Call Integration (Example - Uncomment and modify) ---
# If you want to fetch data from an API directly in the app:
# api_key = "YOUR_WEATHER_API_KEY" # Get this securely, e.g., from Streamlit secrets
# city_for_api = selected_city # Or map selected_city to an API-compatible location name
# api_url = f"YOUR_WEATHER_API_ENDPOINT?location={city_for_api}&apikey={api_key}"
#
# if st.button('Fetch Weather Data from API'):
#     try:
#         response = requests.get(api_url)
#         response.raise_for_status() # Raise an exception for bad status codes
#         weather_data_from_api = response.json()
#
#         # Process weather_data_from_api to extract relevant features and update input_data
#         # This is where you map API response keys to your model's feature names.
#         # Example:
#         # input_data['feature_0'] = weather_data_from_api.get('current_rainfall', 0.0) # Map API data to placeholder feature names
#         # input_data['feature_1'] = weather_data_from_api.get('current_temp', 0.0)
#         # ... and calculate lagged/rolled features using historical data if needed.
#
#         st.success("Weather data fetched successfully from API (example). Please fill in other features and predict.")
#     except requests.exceptions.RequestException as e:
#         st.error(f"Error fetching weather data from API: {e}")
#         st.warning("Using manually entered weather data.")


if st.button('Predict Risk Level'):
    # --- Prepare Input Data for Prediction ---
    # Create a DataFrame with all expected features, initialized to 0.0
    # Use the placeholder feature_names for column names
    input_df = pd.DataFrame(0.0, index=[0], columns=feature_names)

    # Populate the DataFrame with user inputs
    # Map input_data from widgets to the correct placeholder feature_names columns
    for feature_name, value in input_data.items():
        if feature_name in input_df.columns:
             input_df[feature_name] = value

    # Handle one-hot encoded city: set the value to 1 for the selected city's column.
    # pd.get_dummies names the columns 'CITY_<name>' and keeps spaces (e.g. 'CITY_MAKATI CITY').
    # Note: in the training notebook the boolean CITY_* columns were dropped from the features
    # (see the preprocessing output), so this only has an effect if you retrain with them included.
    city_col_name = f'CITY_{selected_city}'
    if city_col_name in input_df.columns:
         input_df[city_col_name] = 1.0


    # Ensure correct data types (e.g., float for numerical, int/float for binary)
    # input_df = input_df.astype(X_train_classification.dtypes) # Load and use actual dtypes


    # Scale numerical features using the dummy scaler
    # This will NOT produce correct results as the scaler is not fitted.
    # In a real app, load the fitted scaler and apply it correctly.
    numerical_cols_in_input = input_df.select_dtypes(include=np.number).columns.tolist()
    # Filter numerical_cols_in_input to exclude binary/one-hot encoded columns if necessary.
    # Example: numerical_cols_to_scale = [col for col in numerical_cols_in_input if not col.startswith('CITY_')]
    try:
         input_df[numerical_cols_in_input] = scaler.transform(input_df[numerical_cols_in_input])
    except Exception as e:
         st.error(f"Error scaling input features (using dummy scaler): {e}.")
         st.info("Please ensure you load and use the actual fitted scaler from training in a real application.")
         # Continue without scaling or stop depending on desired behavior


    # Ensure the final input_df has the EXACT same columns in the EXACT same order as the model expects
    # This is CRITICAL for the model prediction.
    # You must ensure the placeholder feature_names list matches the training features exactly.
    # If input_df is missing any columns from feature_names, add them with a default value (e.g., 0).
    # If input_df has extra columns, drop them.
    # Then, reindex input_df to match the order of feature_names.

    # Example: Reindex to match training features
    # if list(input_df.columns) != feature_names:
    #     st.warning("Input DataFrame columns do not exactly match the expected feature names or order. Attempting to reindex.")
    #     try:
    #         input_df = input_df.reindex(columns=feature_names, fill_value=0.0)
    #         st.info("Attempted to reindex input features.")
    #     except Exception as e:
    #         st.error(f"Failed to reindex input features: {e}. Cannot proceed with prediction.")
    #         st.stop() # Stop execution if reindexing fails
    # else:
    #      st.info("Input features match expected features and order.")


    # Reshape the input data for the model (samples, timesteps, features)
    # Assuming a single timestep as in our training data
    input_reshaped = input_df.values.reshape((input_df.shape[0], 1, input_df.shape[1]))


    # Make prediction
    try:
        prediction = model.predict(input_reshaped)
    except Exception as e:
        st.error(f"Error during model prediction: {e}. Ensure input shape matches model input shape.")
        st.stop()


    # Get the predicted class index
    predicted_class_index = np.argmax(prediction, axis=1)[0]

    # Map the predicted class index back to the original risk level label
    # This mapping MUST match the order of your one-hot encoded target columns (y_classification)
    # after preprocessing with drop_first=False.
    # Assuming the order is ['RISK_LEVEL_Low', 'RISK_LEVEL_Moderate', 'RISK_LEVEL_High', 'RISK_LEVEL_Very High']
    risk_level_map = {
        0: 'Low',
        1: 'Moderate',
        2: 'High',
        3: 'Very High'
    }
    predicted_risk_level = risk_level_map.get(predicted_class_index, 'Unknown')


    st.subheader('Prediction Result:')
    st.write(f'The predicted risk level is: **{predicted_risk_level}**')

    st.subheader('Prediction Probabilities:')
    # Display probabilities for each class
    # Ensure these column names match your model's output order
    col_names = ['Low', 'Moderate', 'High', 'Very High']
    prob_df = pd.DataFrame(prediction, columns=col_names)
    st.dataframe(prob_df)

# Add instructions on how to run the Streamlit app
st.markdown("---")
st.markdown("To run this Streamlit app:")
st.markdown("1. Save the code above as `streamlit_app.py` (which `%%writefile` does).")
st.markdown("2. Open a terminal in your Colab environment or local machine where the file is saved.")
st.markdown("3. Run the command: `streamlit run streamlit_app.py`")
st.markdown("4. If running in Colab, the app is not reachable directly; expose its port with a tunneling tool to get a public URL.")

Overwriting streamlit_app.py
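
The app's comments stress that the input must have exactly the training columns in exactly the training order. A minimal sketch of that alignment step with `DataFrame.reindex`, using hypothetical feature names and values; in a real app, `feature_names` would be the list saved at training time.

```python
import pandas as pd

# Hypothetical training column order; in practice, load the saved list.
feature_names = ["RAINFALL", "TMAX", "TMIN", "POPULATION"]

# Values collected from input widgets, in arbitrary order and incomplete.
user_inputs = {"TMAX": 30.0, "RAINFALL": 0.0, "UNUSED_EXTRA": 1.0}

row = pd.DataFrame([user_inputs])
# reindex drops unknown columns, adds missing ones with the fill value,
# and puts everything in the training order.
aligned = row.reindex(columns=feature_names, fill_value=0.0)

# Shape expected by the model: (samples, timesteps, features).
model_input = aligned.to_numpy(dtype=float).reshape(1, 1, len(feature_names))
print(aligned.columns.tolist(), model_input.shape)
```

Filling missing features with `0.0` is only a placeholder choice here; a sensible default depends on the feature and on whether the fill happens before or after scaling.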
