# Neural Network Classification task - Room occupancy

The goal of this task is to predict room occupancy from Temperature, Humidity, Light and CO2 measurements using a neural network in Keras. Ground-truth occupancy was obtained from time-stamped pictures taken every minute.

## Data source
[http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+](http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+)

## Feature description
* **Date** - time stamp in the following format: year-month-day hour:minute:second
* **Temperature** - temperature in degrees Celsius
* **Relative Humidity** - Relative humidity in %
* **Light** - light intensity in Lux
* **CO2** - amount of CO2 in the air, measured in ppm
* **Humidity Ratio** - humidity ratio derived from temperature and relative humidity, in kg water vapor per kg air
* **Occupancy** - a target binary value, 0 for not occupied, 1 for occupied status
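The humidity ratio is described above as a quantity derived from temperature and relative humidity. A minimal sketch of one common way to compute it follows. The Magnus approximation for saturation vapor pressure, its coefficients, and the standard pressure of 101325 Pa are assumptions, because the dataset description does not say which formula was used. The function name `humidity_ratio` is hypothetical.

```python
import math

def humidity_ratio(temp_c, rel_humidity_pct, pressure_pa=101325.0):
    """Approximate humidity ratio in kg water vapor per kg dry air."""
    # Saturation vapor pressure over water (Magnus approximation), in Pa.
    p_sat = 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    # Partial pressure of water vapor at the given relative humidity.
    p_vapor = rel_humidity_pct / 100.0 * p_sat
    # 0.622 is approximately the molar mass ratio of water to dry air.
    return 0.622 * p_vapor / (pressure_pa - p_vapor)

# First data row of this dataset: 23.18 degrees Celsius, 27.272 % humidity.
w = humidity_ratio(23.18, 27.272)
print(w)
```

For that row the sketch lands within about 1% of the recorded `HumidityRatio` of 0.004793, which suggests the column was produced by a similar psychrometric formula.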

In [8]:
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/mlcollege/introduction-to-ml/master/data/occupancy.csv', sep=',')
data.head()

Unnamed: 0,Date,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,2015-02-04 17:51:00,23.18,27.272,426.0,721.25,0.004793,1
1,2015-02-04 17:51:59,23.15,27.2675,429.5,714.0,0.004783,1
2,2015-02-04 17:53:00,23.15,27.245,426.0,713.5,0.004779,1
3,2015-02-04 17:54:00,23.15,27.2,426.0,708.25,0.004772,1
4,2015-02-04 17:55:00,23.1,27.2,426.0,704.5,0.004757,1


## Neural Network Classifier
Implement a neural network classifier based on all numerical features.

### Data preparation

In [9]:
from sklearn.model_selection import train_test_split

X_all = data[['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio']]
y_all = data['Occupancy']

X_train, X_test, y_train, y_test = train_test_split(
    X_all,
    y_all,
    random_state=1,
    test_size=0.1)

print('Train size: {}'.format(len(X_train)))
print('Test size: {}'.format(len(X_test)))

Train size: 18504
Test size: 2056


Standardize the features

In [10]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
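What the scaler does can be sketched in plain NumPy. The example below is only an illustration on a handful of Temperature and CO2 values, not the notebook's pipeline: the mean and standard deviation are computed on the training rows only and then reused for the test rows.

```python
import numpy as np

# A few Temperature/CO2 rows, used here purely as illustrative values.
train = np.array([[23.18, 721.25],
                  [23.15, 714.00],
                  [23.15, 713.50],
                  [23.15, 708.25]])
test = np.array([[23.10, 704.50]])

# Statistics come from the training split only, so no information
# about the test split leaks into the preprocessing.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
```

The scaled test rows generally do not have zero mean or unit variance, which is expected: they are expressed in units of the training distribution.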

Since the target values are binary, we don't need to encode them in a one-hot representation; a single sigmoid output can be trained directly on the 0/1 labels.

In [11]:
print(y_test[:5])

16483    0
4625     0
14896    0
213      0
2052     0
Name: Occupancy, dtype: int64


### Training a classifier

Design and train a classification model. Use the [binary crossentropy](https://keras.io/losses/) loss function and a sigmoid output activation. Experiment with various architectures and [optimizers](https://keras.io/optimizers/).
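To make the loss concrete, here is a small NumPy sketch of the sigmoid function and the binary crossentropy averaged over a batch. The helper names, the example logits, and the clipping constant `eps` are assumptions chosen for illustration; Keras has its own implementation.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    # Clip to avoid log(0) for saturated predictions.
    p = np.clip(p_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(losses))

logits = np.array([2.0, -1.0, 0.0])  # made-up network outputs
y_true = np.array([1, 0, 1])         # made-up labels

probs = sigmoid(logits)
loss = binary_crossentropy(y_true, probs)
print(probs, loss)
```

A confident correct prediction (logit 2.0 for label 1) contributes little to the loss, while the undecided prediction (logit 0.0, probability 0.5) contributes log 2.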

In [26]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Input, Dropout

model = Sequential([
    Input(shape=(5,)),      # Input layer: 5 features (Temperature, Humidity, Light, CO2, HumidityRatio)
    Dense(128),             # First hidden layer: 128 neurons
    Activation('relu'),     # ReLU activation
    #Dropout(0.3),          # Dropout for regularization (30% of units dropped during training)
    #Dense(64),             # Second hidden layer: 64 neurons
    #Activation('relu'),    # ReLU activation
    #Dropout(0.2),          # Dropout for regularization (20% of units dropped)
    Dense(1),               # Output layer: 1 neuron (probability of the occupied class)
    Activation('sigmoid')   # Sigmoid activation maps the output to [0, 1]
])
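As a quick sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand. The helper `dense_params` below is a hypothetical name used only for this sketch; it counts one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(5, 128)   # 5 * 128 weights + 128 biases
output = dense_params(128, 1)   # 128 weights + 1 bias
total = hidden + output

print(hidden, output, total)    # 768 129 897
```

The activation layers add no parameters, so the total should match what `model.summary()` reports for the active layers.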

Compile the model

In [27]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])


Train the model

In [35]:
from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5)
model.fit(X_train, y_train,
          batch_size = 16, epochs = 10, verbose=1,
          validation_data=(X_test, y_test), callbacks=[early_stopping])

Epoch 1/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9892 - loss: 0.0407 - val_accuracy: 0.9859 - val_loss: 0.0519
Epoch 2/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9881 - loss: 0.0461 - val_accuracy: 0.9859 - val_loss: 0.0507
Epoch 3/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9880 - loss: 0.0439 - val_accuracy: 0.9859 - val_loss: 0.0498
Epoch 4/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9898 - loss: 0.0371 - val_accuracy: 0.9859 - val_loss: 0.0496
Epoch 5/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9878 - loss: 0.0452 - val_accuracy: 0.9859 - val_loss: 0.0493
Epoch 6/10
1157/1157 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9881 - loss: 0.0410 - val_accuracy: 0.9869 - val_loss: 0.0510
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x7ad6c63514d0>
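The effect of `patience=5` on the log above can be illustrated with a simplified patience rule. This is a sketch, not Keras's implementation: it assumes that higher scores are better and that only a strict improvement resets the counter. The function name `stopping_epoch` is hypothetical.

```python
def stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_accuracy values from the first six logged epochs.
logged = [0.9859, 0.9859, 0.9859, 0.9859, 0.9859, 0.9869]
print(stopping_epoch(logged, patience=5))        # None
print(stopping_epoch([0.9859] * 6, patience=5))  # 6
```

Under this rule the improvement at epoch 6 arrives just before the patience budget runs out, so training continues. Because the validation data in this notebook is the test split, any stopping decision is informed by the same data that is later used for evaluation; a separate validation split would avoid that.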

### Evaluate the model

Predict target values and convert probabilities to binary values.

In [36]:
y_pred = model.predict(X_test)

print(y_pred.shape)

import numpy as np
y_test_class = y_test  # y_test is already binary (0 or 1)
y_pred_class = (y_pred > 0.5).astype(int)  # Threshold: probabilities > 0.5 become 1, otherwise 0
print(y_pred_class.shape)

65/65 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
(2056, 1)
(2056, 1)
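The predictions have shape `(n, 1)` while the labels are one-dimensional. The scikit-learn metrics used in this notebook accept that, but a manual NumPy comparison would silently broadcast. A small sketch with made-up probabilities shows the thresholding step and the effect of flattening with `ravel()`:

```python
import numpy as np

y_prob = np.array([[0.02], [0.97], [0.50], [0.61]])  # made-up probabilities, shape (4, 1)
y_true = np.array([0, 1, 0, 1])                      # labels, shape (4,)

# Strict threshold: exactly 0.5 maps to class 0.
y_hat_2d = (y_prob > 0.5).astype(int)
y_hat = y_hat_2d.ravel()  # flatten to shape (4,)

# Comparing (4, 1) with (4,) broadcasts to a (4, 4) matrix.
broadcast_shape = (y_hat_2d == y_true).shape
accuracy = float(np.mean(y_hat == y_true))

print(y_hat, broadcast_shape, accuracy)
```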


Print evaluation metrics

In [37]:
from sklearn import metrics
from sklearn.metrics import accuracy_score


print("Test accuracy: {:.4f}".format(accuracy_score(y_test_class, y_pred_class)))
print()
print(metrics.classification_report(y_test_class, y_pred_class, digits=4))

Test accuracy: 0.9864

              precision    recall  f1-score   support

           0     0.9955    0.9866    0.9910      1570
           1     0.9580    0.9856    0.9716       486

    accuracy                         0.9864      2056
   macro avg     0.9768    0.9861    0.9813      2056
weighted avg     0.9866    0.9864    0.9864      2056
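The figures in the report can be cross-checked from confusion-matrix counts. The counts below are not printed by the notebook; they are reconstructed as the integers consistent with the reported supports, precision and recall, so treat them as an assumption.

```python
# Counts for the "occupied" class (1), reconstructed from the report above.
tp, fp, fn, tn = 479, 21, 7, 1549
total = tp + fp + fn + tn

precision = tp / (tp + fp)   # share of predicted "occupied" that is correct
recall = tp / (tp + fn)      # share of truly occupied samples that is found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

# Accuracy of always predicting the majority class (not occupied).
majority_baseline = (tn + fp) / total

print(f"{precision:.4f} {recall:.4f} {f1:.4f} {accuracy:.4f}")
print(f"{majority_baseline:.4f}")
```

Since roughly three quarters of the test samples are unoccupied, accuracy alone overstates the margin over a trivial classifier; the per-class precision and recall are the more informative numbers here.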

