# Laboratory exercise 4

## Warm-Up Mode (2 points)

**Task Description**  
Using the given dataset, design and train **3** different neural networks that predict the **air quality level**. The networks should differ from one another in the following ways:  

- **layer configurations** - use different numbers and types of layers;
- **activation functions** - try different activation functions;
- **neurons per layer** - experiment with different numbers of neurons in each layer; and
- **number of layers** - build networks with varying depths.

After developing the models, evaluate and compare the performance of all **3** approaches.

**About the Dataset**  
This dataset supports air quality assessment across various regions. It contains 5,000 samples and records the environmental and demographic factors that influence pollution levels.

**Features**:  
- **Temperature (°C)**: Average temperature of the region.  
- **Humidity (%)**: Relative humidity recorded in the region.  
- **PM2.5 Concentration (µg/m³)**: Levels of fine particulate matter.  
- **PM10 Concentration (µg/m³)**: Levels of coarse particulate matter.  
- **NO2 Concentration (ppb)**: Nitrogen dioxide levels.  
- **SO2 Concentration (ppb)**: Sulfur dioxide levels.  
- **CO Concentration (ppm)**: Carbon monoxide levels.  
- **Proximity to Industrial Areas (km)**: Distance to the nearest industrial zone.  
- **Population Density (people/km²)**: Number of people per square kilometer in the region.  

**Target Variable**: **Air Quality**  
- **Good**: Clean air with low pollution levels.  
- **Moderate**: Acceptable air quality but with some pollutants present.  
- **Poor**: Noticeable pollution that may cause health issues for sensitive groups.  
- **Hazardous**: Highly polluted air posing serious health risks to the population.  

In [19]:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

In [4]:
df = pd.read_csv('./pollution_dataset.csv')

In [5]:
df.head()

| | Temperature | Humidity | PM2.5 | PM10 | NO2 | SO2 | CO | Proximity_to_Industrial_Areas | Population_Density | Air Quality |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.8 | 59.1 | 5.2 | 17.9 | 18.9 | 9.2 | 1.72 | 6.3 | 319 | Moderate |
| 1 | 28.3 | 75.6 | 2.3 | 12.2 | 30.8 | 9.7 | 1.64 | 6.0 | 611 | Moderate |
| 2 | 23.1 | 74.7 | 26.7 | 33.8 | 24.4 | 12.6 | 1.63 | 5.2 | 619 | Moderate |
| 3 | 27.1 | 39.1 | 6.1 | 6.3 | 13.5 | 5.3 | 1.15 | 11.1 | 551 | Good |
| 4 | 26.5 | 70.7 | 6.9 | 16.0 | 21.9 | 5.6 | 1.01 | 12.7 | 303 | Good |


In [6]:
df['Air Quality'].value_counts()

Air Quality
Good         2000
Moderate     1500
Poor         1000
Hazardous     500
Name: count, dtype: int64
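
The counts above show an imbalanced target: 40% of the samples are Good and only 10% are Hazardous. As a rough sketch of what that implies, the snippet below computes the accuracy of a majority-class baseline and a set of inverse-frequency class weights. The weight formula `n_samples / (n_classes * count)` is the heuristic scikit-learn calls "balanced"; using such weights during training is an assumption here, not something this notebook does.

```python
# Class counts copied from the value_counts() output above.
counts = {"Good": 2000, "Moderate": 1500, "Poor": 1000, "Hazardous": 500}

n_samples = sum(counts.values())
n_classes = len(counts)

# Accuracy of a model that always predicts the most frequent class.
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / n_samples

# "Balanced" inverse-frequency weights: rarer classes get larger weights.
class_weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.2f}")
for label, weight in class_weights.items():
    print(f"{label:<10} weight = {weight:.3f}")
```

Any model worth keeping should clearly beat the baseline accuracy, and accuracy alone can hide poor performance on the rare Hazardous class.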

In [7]:
df.isnull().sum()

Temperature                      0
Humidity                         0
PM2.5                            0
PM10                             0
NO2                              0
SO2                              0
CO                               0
Proximity_to_Industrial_Areas    0
Population_Density               0
Air Quality                      0
dtype: int64

In [20]:
encoder = LabelEncoder()

In [21]:
df['Air Quality'] = encoder.fit_transform(df['Air Quality'])
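
`LabelEncoder` numbers the classes in sorted order of the unique labels, so the codes are alphabetical rather than ordered by severity. The plain-Python sketch below mirrors that behaviour on a few made-up labels; the names `labels`, `mapping`, and `inverse` are illustrative only.

```python
# Plain-Python sketch of how LabelEncoder numbers the classes:
# unique labels are sorted, then numbered from 0.
labels = ["Moderate", "Moderate", "Moderate", "Good", "Good", "Hazardous", "Poor"]

classes = sorted(set(labels))
mapping = {label: code for code, label in enumerate(classes)}
inverse = {code: label for label, code in mapping.items()}

encoded = [mapping[label] for label in labels]

print(mapping)
print(encoded)
```

In the notebook itself, `encoder.classes_` holds the same ordering and `encoder.inverse_transform` maps predicted codes back to class names.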

In [22]:
df.head()

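| | Temperature | Humidity | PM2.5 | PM10 | NO2 | SO2 | CO | Proximity_to_Industrial_Areas | Population_Density | Air Quality |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.8 | 59.1 | 5.2 | 17.9 | 18.9 | 9.2 | 1.72 | 6.3 | 319 | 2 |
| 1 | 28.3 | 75.6 | 2.3 | 12.2 | 30.8 | 9.7 | 1.64 | 6.0 | 611 | 2 |
| 2 | 23.1 | 74.7 | 26.7 | 33.8 | 24.4 | 12.6 | 1.63 | 5.2 | 619 | 2 |
| 3 | 27.1 | 39.1 | 6.1 | 6.3 | 13.5 | 5.3 | 1.15 | 11.1 | 551 | 0 |
| 4 | 26.5 | 70.7 | 6.9 | 16.0 | 21.9 | 5.6 | 1.01 | 12.7 | 303 | 0 |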


In [23]:
features = df.drop('Air Quality', axis=1)

In [24]:
features

| | Temperature | Humidity | PM2.5 | PM10 | NO2 | SO2 | CO | Proximity_to_Industrial_Areas | Population_Density |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.8 | 59.1 | 5.2 | 17.9 | 18.9 | 9.2 | 1.72 | 6.3 | 319 |
| 1 | 28.3 | 75.6 | 2.3 | 12.2 | 30.8 | 9.7 | 1.64 | 6.0 | 611 |
| 2 | 23.1 | 74.7 | 26.7 | 33.8 | 24.4 | 12.6 | 1.63 | 5.2 | 619 |
| 3 | 27.1 | 39.1 | 6.1 | 6.3 | 13.5 | 5.3 | 1.15 | 11.1 | 551 |
| 4 | 26.5 | 70.7 | 6.9 | 16.0 | 21.9 | 5.6 | 1.01 | 12.7 | 303 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 40.6 | 74.1 | 116.0 | 126.7 | 45.5 | 25.7 | 2.11 | 2.8 | 765 |
| 4996 | 28.1 | 96.9 | 6.9 | 25.0 | 25.3 | 10.8 | 1.54 | 5.7 | 709 |
| 4997 | 25.9 | 78.2 | 14.2 | 22.1 | 34.8 | 7.8 | 1.63 | 9.6 | 379 |
| 4998 | 25.3 | 44.4 | 21.4 | 29.0 | 23.7 | 5.7 | 0.89 | 11.6 | 241 |


In [25]:
target = df['Air Quality']

In [26]:
target

0       2
1       2
2       2
3       0
4       0
       ..
4995    1
4996    2
4997    2
4998    0
4999    2
Name: Air Quality, Length: 5000, dtype: int32

In [123]:
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2)
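
The split above is random and unseeded, so the class proportions in the test set, and therefore the reported scores, can change from run to run. `train_test_split` accepts `random_state` and `stratify` arguments for exactly this purpose. The hand-rolled sketch below is not the scikit-learn implementation; it only illustrates what a seeded, stratified 80/20 split does, using the class counts of this dataset. The function name `stratified_split` is hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) keeping each class's share in both parts."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Encoded labels with the same class counts as the dataset
# (0=Good, 1=Hazardous, 2=Moderate, 3=Poor).
labels = [0] * 2000 + [1] * 500 + [2] * 1500 + [3] * 1000
train_idx, test_idx = stratified_split(labels)

test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
print(sorted(test_counts.items()))
```

With stratification, every class keeps its 40/10/30/20 share in both the training and the test part, which matters most for the small Hazardous class.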

In [124]:
scaler = StandardScaler()

In [125]:
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
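
The scaler is fitted on the training set only and then applied to the test set, which keeps test-set statistics out of the preprocessing. The sketch below works through the underlying formula, z = (x - mean) / std with the population standard deviation, on the five Temperature values shown by `df.head()`. The value `test_value` is made up for illustration.

```python
from statistics import fmean, pstdev

# Temperature values from the first five rows, standing in for a training column.
train_column = [29.8, 28.3, 23.1, 27.1, 26.5]
test_value = 25.0  # hypothetical unseen measurement

# "fit": estimate mean and population standard deviation on training data only.
mean = fmean(train_column)
std = pstdev(train_column)

# "transform": apply the training statistics to both training and test values.
train_scaled = [(x - mean) / std for x in train_column]
test_scaled = (test_value - mean) / std

print([round(z, 3) for z in train_scaled])
print(round(test_scaled, 3))
```

After scaling, the training column has mean 0 and standard deviation 1; the test value is expressed in the same units, so it need not fall inside that range.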

In [101]:
x_train

array([[-1.37961906,  0.65819414, -0.51751803, ..., -0.89399498,
        -0.80924977,  0.48780816],
       [-1.63229926,  0.4487828 , -0.49714065, ..., -0.91217208,
         1.47088808, -1.9496115 ],
       [ 0.37427884,  1.20393281, -0.32189525, ...,  0.6874123 ,
        -0.86419285,  1.23728212],
       ...,
       [ 0.15132572,  0.13149288, -0.75797103, ...,  0.36022458,
        -0.91913593, -1.08282857],
       [-0.19053574,  0.69626893, -0.09366867, ..., -0.14873408,
         0.56432725, -0.95248527],
       [ 2.57408298,  1.16585801,  3.45606968, ...,  2.1961112 ,
        -0.06751818,  0.97659552]])

In [126]:
x_test

array([[ 1.8202474 ,  1.35579585, -0.80357901, ...,  1.19574161,
        -0.70537789,  1.23735957],
       [-1.67960997,  0.14120274, -0.46081569, ..., -1.19081246,
         1.30039641,  0.01611547],
       [-1.08134375, -0.21121806,  0.05740979, ...,  0.44305917,
        -0.87252575, -0.98728508],
       ...,
       [-0.3933376 ,  0.80828497, -0.281273  , ..., -0.49320435,
        -0.67751992,  0.28676957],
       [-0.1241178 , -1.55796897, -0.39960796, ..., -1.0623057 ,
         0.74323688, -1.82565265],
       [-1.3655202 , -0.48812011, -0.51794291, ..., -0.98887327,
         1.30039641, -1.32395238]])

In [127]:
y_train

1667    0
2199    3
4714    3
2189    2
3038    1
       ..
3265    0
2263    2
4526    3
561     2
3436    0
Name: Air Quality, Length: 4000, dtype: int32

In [141]:
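def make_model(x_train, activation_functions, neurons, kernel_initializer):
    """Build and compile a fully connected classifier.

    The three lists are zipped together, so entry i of each list configures hidden layer i.
    """
    model = Sequential()
    for i, (a, n, k) in enumerate(zip(activation_functions, neurons, kernel_initializer)):
        if i == 0:
            # The first layer also declares the input size (number of feature columns).
            model.add(Dense(n, input_dim=x_train.shape[1], kernel_initializer=k, activation=a))
        else:
            model.add(Dense(n, kernel_initializer=k, activation=a))
    # Output layer: one unit per air quality class; softmax yields class probabilities.
    model.add(Dense(4, activation='softmax', kernel_initializer='uniform'))
    # The sparse loss accepts the integer labels produced by LabelEncoder directly.
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model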
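
The output layer uses softmax over the four classes, and `sparse_categorical_crossentropy` consumes the integer labels from `LabelEncoder` without one-hot encoding. As a worked sketch of what these two pieces compute for a single sample, assuming made-up raw scores (`logits`):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_class, probabilities):
    """Loss for one sample: negative log-probability of the true class index."""
    return -math.log(probabilities[true_class])

# Hypothetical raw scores for one sample; index order 0=Good, 1=Hazardous, 2=Moderate, 3=Poor.
logits = [0.5, -1.0, 2.0, 0.1]
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(round(sparse_categorical_crossentropy(2, probs), 4))  # true label: Moderate

# A model that spreads probability evenly over 4 classes has loss ln(4).
uniform_loss = sparse_categorical_crossentropy(0, softmax([0.0] * 4))
print(round(uniform_loss, 4))
```

The value ln(4) is a useful reference point when reading training logs: a loss near it means the network is doing no better than guessing uniformly.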


In [142]:
model1 = make_model(
    x_train=x_train,
    activation_functions=['relu', 'relu'],
    neurons=[32, 16],
    kernel_initializer=['uniform', 'uniform']
)

In [143]:
model2 = make_model(
    x_train=x_train,
    activation_functions=['relu', 'relu', 'relu'],
    neurons=[64, 32, 16],
    kernel_initializer=['uniform', 'uniform', 'uniform']
)

In [144]:
model3 = make_model(
    x_train=x_train,
    activation_functions=['relu', 'relu', 'relu', 'relu'],
    neurons=[128, 64, 32, 16],
    kernel_initializer=['uniform', 'uniform', 'uniform', 'uniform']
)
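
The three networks differ in depth and width, which translates directly into model capacity. A minimal sketch of the arithmetic, assuming the 9 feature columns and 4 output classes used in this notebook; the helper `dense_param_count` is hypothetical and only counts weights and biases of fully connected layers:

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights + biases for a stack of fully connected layers."""
    total = 0
    previous = n_inputs
    for units in layer_sizes:
        total += previous * units + units  # weight matrix plus one bias per unit
        previous = units
    return total

N_FEATURES = 9  # feature columns in the dataset
N_CLASSES = 4   # units in the softmax output layer

architectures = {
    "model1": [32, 16],
    "model2": [64, 32, 16],
    "model3": [128, 64, 32, 16],
}

param_counts = {
    name: dense_param_count(N_FEATURES, hidden + [N_CLASSES])
    for name, hidden in architectures.items()
}
for name, count in param_counts.items():
    print(f"{name}: {count} trainable parameters")
```

Calling `model.summary()` on each Keras model should report matching totals. The largest network has roughly 13 times as many parameters as the smallest, while all three see the same 4,000 training samples.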

In [151]:
model1.fit(x_train, y_train, epochs=30, batch_size=32, verbose=2, validation_split=0.1)

Epoch 1/30
113/113 - 1s - 10ms/step - accuracy: 0.9469 - loss: 0.1335 - val_accuracy: 0.9275 - val_loss: 0.2407
Epoch 2/30
113/113 - 1s - 9ms/step - accuracy: 0.9483 - loss: 0.1312 - val_accuracy: 0.9200 - val_loss: 0.2273
Epoch 3/30
113/113 - 1s - 11ms/step - accuracy: 0.9497 - loss: 0.1316 - val_accuracy: 0.9250 - val_loss: 0.2233
Epoch 4/30
113/113 - 1s - 8ms/step - accuracy: 0.9486 - loss: 0.1306 - val_accuracy: 0.9300 - val_loss: 0.2287
Epoch 5/30
113/113 - 1s - 10ms/step - accuracy: 0.9478 - loss: 0.1301 - val_accuracy: 0.9325 - val_loss: 0.2211
Epoch 6/30
113/113 - 1s - 7ms/step - accuracy: 0.9469 - loss: 0.1289 - val_accuracy: 0.9275 - val_loss: 0.2230
Epoch 7/30
113/113 - 1s - 6ms/step - accuracy: 0.9481 - loss: 0.1288 - val_accuracy: 0.9275 - val_loss: 0.2242
Epoch 8/30
113/113 - 1s - 9ms/step - accuracy: 0.9483 - loss: 0.1287 - val_accuracy: 0.9250 - val_loss: 0.2269
Epoch 9/30
113/113 - 1s - 9ms/step - accuracy: 0.9500 - loss: 0.1288 - val_accuracy: 0.9275 - val_loss: 0.225

<keras.src.callbacks.history.History at 0x157d83a5cf0>

In [None]:
model2.fit(x_train, y_train, epochs=30, batch_size=32, verbose=2, validation_split=0.1)

Epoch 1/30
100/100 - 5s - 45ms/step - accuracy: 0.5403 - loss: 1.0953 - val_accuracy: 0.6762 - val_loss: 0.6796
Epoch 2/30
100/100 - 1s - 10ms/step - accuracy: 0.8141 - loss: 0.5074 - val_accuracy: 0.8338 - val_loss: 0.3491
Epoch 3/30
100/100 - 1s - 9ms/step - accuracy: 0.9056 - loss: 0.2748 - val_accuracy: 0.9075 - val_loss: 0.2249
Epoch 4/30
100/100 - 1s - 7ms/step - accuracy: 0.9325 - loss: 0.1996 - val_accuracy: 0.9212 - val_loss: 0.2004
Epoch 5/30
100/100 - 1s - 7ms/step - accuracy: 0.9416 - loss: 0.1666 - val_accuracy: 0.9275 - val_loss: 0.1951
Epoch 6/30
100/100 - 1s - 8ms/step - accuracy: 0.9434 - loss: 0.1534 - val_accuracy: 0.9300 - val_loss: 0.1841
Epoch 7/30
100/100 - 1s - 7ms/step - accuracy: 0.9456 - loss: 0.1482 - val_accuracy: 0.9350 - val_loss: 0.1835
Epoch 8/30
100/100 - 1s - 9ms/step - accuracy: 0.9438 - loss: 0.1484 - val_accuracy: 0.9325 - val_loss: 0.1810
Epoch 9/30
100/100 - 1s - 8ms/step - accuracy: 0.9497 - loss: 0.1426 - val_accuracy: 0.9312 - val_loss: 0.1824

<keras.src.callbacks.history.History at 0x157d83a5f90>

In [157]:
model3.fit(x_train, y_train, epochs=30, batch_size=32, verbose=2, validation_split=0.1)

Epoch 1/30
113/113 - 1s - 10ms/step - accuracy: 0.9739 - loss: 0.0710 - val_accuracy: 0.9400 - val_loss: 0.2082
Epoch 2/30
113/113 - 1s - 9ms/step - accuracy: 0.9697 - loss: 0.0781 - val_accuracy: 0.9425 - val_loss: 0.1883
Epoch 3/30
113/113 - 1s - 9ms/step - accuracy: 0.9722 - loss: 0.0754 - val_accuracy: 0.9425 - val_loss: 0.1895
Epoch 4/30
113/113 - 1s - 9ms/step - accuracy: 0.9750 - loss: 0.0728 - val_accuracy: 0.9375 - val_loss: 0.2086
Epoch 5/30
113/113 - 1s - 7ms/step - accuracy: 0.9700 - loss: 0.0806 - val_accuracy: 0.9325 - val_loss: 0.2315
Epoch 6/30
113/113 - 1s - 12ms/step - accuracy: 0.9747 - loss: 0.0688 - val_accuracy: 0.9375 - val_loss: 0.2469
Epoch 7/30
113/113 - 1s - 8ms/step - accuracy: 0.9725 - loss: 0.0718 - val_accuracy: 0.9300 - val_loss: 0.2090
Epoch 8/30
113/113 - 1s - 10ms/step - accuracy: 0.9758 - loss: 0.0670 - val_accuracy: 0.9475 - val_loss: 0.1847
Epoch 9/30
113/113 - 1s - 10ms/step - accuracy: 0.9767 - loss: 0.0666 - val_accuracy: 0.9400 - val_loss: 0.19

<keras.src.callbacks.history.History at 0x157ddc77760>
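
In the log above, the training loss of `model3` stays low while `val_loss` moves up and down, which is a common sign that further epochs mostly fit the training set. Keras provides an `EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights` arguments) for this situation. The sketch below is not that callback; it is a simplified, hand-written version of the patience rule, applied to the eight complete `val_loss` values logged for `model3`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch, best_loss) using 1-based epoch numbers."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_loss
    return len(val_losses), best_epoch, best_loss

# val_loss for epochs 1-8 of model3, copied from the log above.
model3_val_loss = [0.2082, 0.1883, 0.1895, 0.2086, 0.2315, 0.2469, 0.2090, 0.1847]

stop_epoch, best_epoch, best_loss = early_stopping_epoch(model3_val_loss, patience=3)
print(f"stop after epoch {stop_epoch}; best epoch {best_epoch} (val_loss={best_loss})")
```

With a patience of 3, this rule would stop before epoch 8, where the log shows a slightly lower `val_loss` of 0.1847, so the patience value is itself a setting worth tuning.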

In [158]:
for i, model in enumerate([model1, model2, model3], 1):
    print(f"Evaluating Model {i}:")
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}\n")


Evaluating Model 1:
Loss: 0.1530, Accuracy: 0.9370

Evaluating Model 2:
Loss: 0.1517, Accuracy: 0.9350

Evaluating Model 3:
Loss: 0.1694, Accuracy: 0.9420
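
To judge whether these accuracies really differ, it helps to estimate how much an accuracy measured on 1,000 test samples can vary by chance. The sketch below uses the binomial standard error, sqrt(p(1 - p) / n), and assumes the test samples are independent; it is a rough check, not a formal significance test.

```python
import math

N_TEST = 1000  # 20% of 5,000 samples

# Test accuracies copied from the evaluation output above.
accuracies = {"Model 1": 0.9370, "Model 2": 0.9350, "Model 3": 0.9420}

def standard_error(accuracy, n):
    """Binomial standard error of an accuracy estimated from n samples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)

for name, acc in accuracies.items():
    se = standard_error(acc, N_TEST)
    correct = round(acc * N_TEST)
    print(f"{name}: {correct}/{N_TEST} correct, accuracy {acc:.3f} +/- {se:.3f}")

spread = max(accuracies.values()) - min(accuracies.values())
largest_se = max(standard_error(acc, N_TEST) for acc in accuracies.values())
print(f"spread between best and worst: {spread:.3f}, largest standard error: {largest_se:.3f}")
```

The gap between the best and worst model is 7 test samples, which is smaller than one standard error. On this single split the three architectures perform about equally well, so the smallest network is a reasonable choice unless repeated or stratified splits show a consistent difference.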

