# Model Training

In this notebook, we will train a neural network model to predict air quality levels based on the features in the dataset.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical
import joblib


## Load the Dataset

We will load the modified Air Quality UCI dataset from a local Excel file.

In [2]:
# Load the dataset
file_path = 'C:/Users/murug/Desktop/Projects/AIR_QUALITY_PREDICTION/notebooks/data/Modified_AirQualityUCI.xlsx'
air_quality_data = pd.read_excel(file_path)
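The absolute Windows path above only resolves on the original author's machine. A minimal sketch of a relative path built with `pathlib`, assuming the spreadsheet sits in a `data` folder next to the notebook (the `DATA_DIR` name and that layout are assumptions, not something the notebook states):

```python
from pathlib import Path

# Hypothetical layout: the spreadsheet sits in a "data" folder next to the notebook.
DATA_DIR = Path("data")
file_path = DATA_DIR / "Modified_AirQualityUCI.xlsx"

# pd.read_excel accepts a Path object, so the loading call would not change:
# air_quality_data = pd.read_excel(file_path)
print(file_path.as_posix())
```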

## Prepare the Data

We will define the features (X) and the target (y) variables. Then, we will encode the target variable and split the data into training and testing sets.

In [3]:
# Define features (X) and target (y)
X = air_quality_data.drop(columns=['Pollutant_Level'])
y = air_quality_data['Pollutant_Level']

# Encode the target variable
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
y_categorical = to_categorical(y_encoded)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_categorical, test_size=0.2, random_state=42)
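To make the encoding step concrete, here is a small sketch on toy labels. It assumes the `Pollutant_Level` column holds the strings `Normal`, `Mid`, and `High`; the real values may differ. `LabelEncoder` sorts its classes, so the integer codes follow alphabetical order rather than severity order, which matters when mapping the network's output indices back to labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Assumed label strings; the real values in 'Pollutant_Level' may differ.
toy_labels = ["Normal", "High", "Mid", "Normal", "Mid"]

encoder = LabelEncoder()
codes = encoder.fit_transform(toy_labels)

# LabelEncoder sorts the classes, so codes follow alphabetical order.
print(list(encoder.classes_))  # ['High', 'Mid', 'Normal']
print(codes.tolist())          # [2, 0, 1, 2, 1]

# One-hot encode with NumPy; for integer codes this yields the same
# 0/1 pattern as Keras' to_categorical.
one_hot = np.eye(len(encoder.classes_))[codes]

# Recover the original strings from one-hot rows (or from softmax outputs).
decoded = encoder.inverse_transform(one_hot.argmax(axis=1))
```

With these assumed labels, output index 0 of the softmax layer would correspond to `High`, not `Normal`.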

## Standardize the Features

We will standardize the feature columns using StandardScaler.

In [4]:
# Standardize the feature columns
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
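The scaler is fitted on the training rows only and then applied to both sets, so no test-set statistics leak into training. The following NumPy sketch reproduces that arithmetic on toy data (the arrays and their values are hypothetical stand-ins for the real features):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the real feature matrices (hypothetical values).
toy_train = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
toy_test = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

# "fit": learn per-column mean and standard deviation from the training rows only.
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets.
train_scaled = (toy_train - mean) / std
test_scaled = (toy_test - mean) / std
```

After scaling, each training column has mean 0 and standard deviation 1; the test columns will be close to, but generally not exactly, those values.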

## Define and Train the Neural Network Model

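We will define a feed-forward network in Keras with two hidden layers (64 and 32 units, ReLU activation), each followed by 50% dropout, and a 3-unit softmax output layer. We then compile it with the Adam optimizer and train it for 50 epochs with a batch size of 32.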

In [5]:
# Define the neural network model
def create_model():
    model = Sequential()
    model.add(Input(shape=(X_train.shape[1],)))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='softmax'))  # 3 output classes: Normal, Mid, High
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # categorical_crossentropy pairs with the one-hot encoded targets
    return model

# Create and train the model
model = create_model()
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.6147 - loss: 0.8527 - val_accuracy: 0.8488 - val_loss: 0.3918
Epoch 2/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8048 - loss: 0.4599 - val_accuracy: 0.9017 - val_loss: 0.2724
Epoch 3/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8503 - loss: 0.3612 - val_accuracy: 0.9129 - val_loss: 0.2213
Epoch 4/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8779 - loss: 0.3061 - val_accuracy: 0.9279 - val_loss: 0.1844
Epoch 5/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8847 - loss: 0.2730 - val_accuracy: 0.9407 - val_loss: 0.1541
Epoch 6/50
234/234 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9093 - loss: 0.2251 - val_accuracy: 0.9525 - val_loss: 0.1335
Epoch 7/50
234/234

<keras.src.callbacks.history.History at 0x1af65144e60>
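The training call above passes the test set as `validation_data`, so the `val_accuracy` values in the log are computed on the same rows later used for testing. One alternative, sketched here on toy arrays, is to carve a separate validation split out of the training data. The names `X_fit`, `X_val`, `y_fit`, and `y_val` are hypothetical, and the 20% fraction is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the scaled training features and one-hot targets (hypothetical data).
rng = np.random.default_rng(42)
toy_X_train = rng.normal(size=(200, 4))
toy_y_train = np.eye(3)[rng.integers(0, 3, size=200)]

# Hold out 20% of the training rows for validation; the test set stays untouched.
X_fit, X_val, y_fit, y_val = train_test_split(
    toy_X_train, toy_y_train, test_size=0.2, random_state=42
)

# The training call would then look like:
# model.fit(X_fit, y_fit, epochs=50, batch_size=32, validation_data=(X_val, y_val))
print(X_fit.shape, X_val.shape)  # (160, 4) (40, 4)
```

Keras also offers a `validation_split` argument to `fit`, which holds out the last fraction of the supplied training data without a separate split call.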

## Save the Model

We will save the trained model to a file.

In [6]:
# Save the model
model.save('air_quality_model.h5')
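The model file stores only the network. Turning a prediction back into a label string also requires the fitted `label_encoder`, which this notebook does not write to disk (the scaler is saved later as `scaler.pkl`). A minimal sketch of persisting an encoder with `joblib`, using a toy encoder with assumed label strings and a hypothetical file name:

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.preprocessing import LabelEncoder

# Toy encoder standing in for the notebook's label_encoder (label strings are assumed).
toy_encoder = LabelEncoder().fit(["Normal", "Mid", "High"])

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'label_encoder.pkl' is a hypothetical file name, not one used in the notebook.
    encoder_path = Path(tmp_dir) / "label_encoder.pkl"
    joblib.dump(toy_encoder, encoder_path)
    restored = joblib.load(encoder_path)

# The restored encoder maps predicted class indices back to label strings.
predicted_indices = [2, 0, 1]
predicted_labels = restored.inverse_transform(predicted_indices).tolist()
print(predicted_labels)  # ['Normal', 'High', 'Mid']
```

In the notebook itself, the equivalent call would be `joblib.dump(label_encoder, ...)` with a path of your choosing.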



## Cross-Validation

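We will run 5-fold cross-validation to evaluate the model's performance. The Keras model is wrapped with scikeras and placed in a pipeline with its own StandardScaler, so the scaler is refitted on the training portion of each fold.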

In [7]:
# Perform cross-validation using scikeras
from scikeras.wrappers import KerasClassifier
from sklearn.pipeline import Pipeline

# One-hot encode the target variable for cross-validation
y_categorical = to_categorical(y_encoded)

keras_model = KerasClassifier(model=create_model, epochs=50, batch_size=32, verbose=0)
pipeline = Pipeline([('scaler', StandardScaler()), ('classifier', keras_model)])
cv_scores = cross_val_score(pipeline, X, y_categorical, cv=5)
print(f'Cross-Validation Accuracy: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}')

# Save the scaler fitted on X_train (In [4]) for reuse at inference time
joblib.dump(scaler, 'scaler.pkl')

Cross-Validation Accuracy: 0.9411 ± 0.0948


['scaler.pkl']
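The fold-to-fold spread reported above (± 0.0948) is fairly wide. One possible contributor, offered as a hypothesis rather than a diagnosis, is how scikit-learn chooses the splitter for `cv=5`: it stratifies only when the target is binary or multiclass, and a one-hot matrix is instead treated as `multilabel-indicator`, so plain unshuffled `KFold` is used. The sketch below checks this on toy targets (the arrays are hypothetical):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv
from sklearn.utils.multiclass import type_of_target

# Toy integer class codes and their one-hot form (hypothetical data).
toy_codes = np.array([0, 1, 2] * 10)
toy_one_hot = np.eye(3, dtype=int)[toy_codes]

print(type_of_target(toy_codes))    # multiclass
print(type_of_target(toy_one_hot))  # multilabel-indicator

# check_cv is the helper scikit-learn uses to turn cv=5 into a splitter object.
cv_for_codes = check_cv(5, toy_codes, classifier=True)
cv_for_one_hot = check_cv(5, toy_one_hot, classifier=True)
print(type(cv_for_codes).__name__)    # StratifiedKFold
print(type(cv_for_one_hot).__name__)  # KFold
```

Passing an explicit splitter such as `KFold(n_splits=5, shuffle=True, random_state=42)` as `cv` would shuffle rows before splitting. Whether shuffling is appropriate depends on whether the rows are time-ordered, which the notebook does not state.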