# **Imports**

In [None]:
# Common
import os 
import keras 
import numpy as np
import tensorflow as tf

# Data
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Model
from keras.models import Sequential, load_model
from keras.layers import Dense

# Callbacks
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Metrics
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, precision_score, recall_score, classification_report

# **Data**

The first task is to **load the data and understand it**.

In [None]:
data = pd.read_csv('../input/smoke-detection-dataset/smoke_detection_iot.csv')
data.head()

Great, all the **data is numerical**, which will simplify **preprocessing**. The problem is that the features are not on the same **scale**. We will fix that later, but first let's drop the columns we don't need and explore the data through **data visualization**.

In [None]:
# Drop the leftover index column and the UTC timestamp; neither describes the air sample itself
data.drop(['Unnamed: 0', 'UTC'], axis=1, inplace=True)

# **EDA**

Let's start with the **Class Distribution**.

In [None]:
sns.catplot(
    data=data,
    x='Fire Alarm',
    kind='count'
)
plt.show()

The data is heavily skewed towards **class "1"**: **class "0"** is **not even 50%** of the **whole dataset**. This is not good news, because it can result in a **biased model**. **Stratified splitting** helps to an extent: it keeps the class ratio the same in every split, so the model is trained and evaluated on the same class distribution.
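
To put a number on the imbalance instead of reading it off the bar chart, pandas' `value_counts(normalize=True)` returns each class's share. The small label series below is made up purely for illustration; in the notebook it would be `data['Fire Alarm']`.

```python
import pandas as pd

# Made-up labels for illustration; in the notebook use data['Fire Alarm'] instead
labels = pd.Series([1, 1, 1, 0, 1, 0, 1, 1, 0, 1], name="Fire Alarm")

# Share of each class (fractions sum to 1)
class_share = labels.value_counts(normalize=True)
print(class_share)

# Majority-to-minority ratio: a simple, single-number imbalance indicator
imbalance_ratio = class_share.max() / class_share.min()
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```

A ratio close to 1 means balanced classes; the further above 1 it is, the more a model can score well by favouring the majority class.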

---
Let's have a look at the pairwise **correlations** among the **features**.

In [None]:
# Calculate
corr = data.corr()

# Visualize
plt.figure(figsize=(15,8))
sns.heatmap(
    data=corr,
    annot=True,
)
plt.show()
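
Picking the strongest pairs out of a large annotated heatmap by eye is error-prone. The helper below, a hypothetical `top_correlated_pairs` written for this sketch, lists each feature pair once and ranks the pairs by absolute Pearson correlation while keeping the sign. It is demonstrated on a toy frame; in the notebook it could be called as `top_correlated_pairs(data, k=10)` (at this point `data` still includes `Fire Alarm`, so correlations with the label are listed too).

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, k=5):
    """Return the k feature pairs with the largest absolute Pearson correlation."""
    corr = df.corr()
    cols = corr.columns
    # Visit each unordered pair once (upper triangle, diagonal excluded)
    pairs = pd.Series({
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    })
    # Rank by strength regardless of sign, but keep the signed value
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)

# Toy frame: b is a linear function of a, c is its negation, d is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": -a, "d": rng.normal(size=200)})

top = top_correlated_pairs(toy, k=3)
print(top)
```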

Keep in mind that the correlation matrix (Pearson, the default for `DataFrame.corr`) **only captures linear relations**. If two features are related quadratically, cubically, or in any other nonlinear way, that relation may **not show up in the correlation matrix**. We can easily spot some **strongly negative** and some **strongly positive** relations. Let's visualize each of them.
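
A tiny synthetic example makes the point concrete: below, `y_quadratic` is completely determined by `x`, yet because the inputs are symmetric around zero its Pearson correlation with `x` is essentially zero, while the linear relation scores 1.

```python
import numpy as np

# Inputs spaced symmetrically around zero
x = np.linspace(-1, 1, 201)

y_linear = 3 * x + 2   # perfectly linear in x
y_quadratic = x ** 2   # fully determined by x, but not linear

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

This is why the scatter plots below are worth drawing even for pairs that look weak in the heatmap.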

Pressure and Humidity have a correlation of about **0.7**, a fairly **strong linear relation**: as **pressure increases**, **humidity tends to increase** as well.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='Humidity[%]',
    x='Pressure[hPa]',
    hue='Fire Alarm',
)
plt.show()

Now that is interesting. At **lower pressure**, the points mostly belong to **Class 1**, and at **higher pressure combined with higher humidity** they **also belong to Class 1**. **Class 0** occupies the remaining region.

This relation is useful, but it's **not perfect**.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='eCO2[ppm]',
    x='TVOC[ppb]',
    hue="Fire Alarm"
)
plt.show()

This also reveals something: when **TVOC values** are **close to zero**, the data mostly belongs to **Class 1**, and as **TVOC increases**, the **class changes to 0**.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='PM1.0',
    x='TVOC[ppb]',
    hue="Fire Alarm"
)
plt.show()

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='NC0.5',
    x='TVOC[ppb]',
    hue="Fire Alarm"
)
plt.show()

All of the charts above show the same kind of relation.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='Raw H2',
    x='TVOC[ppb]',
    hue="Fire Alarm"
)
plt.show()

Now, this is interesting: **Raw H2** tends to fall as **TVOC** rises.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='Raw Ethanol',
    x='TVOC[ppb]',
    hue="Fire Alarm"
)
plt.show()

One thing that could be a problem in both of the charts above is **saturation**: many readings pile up at a **constant value of 60,000 ppb TVOC**. Both charts show a **negative relation**, which **seems plausible**.
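
How much of the data sits at that ceiling can be measured directly. The helper `saturation_share` is a hypothetical name for this sketch, and the readings are illustrative; in the notebook it could be called as `saturation_share(data['TVOC[ppb]'], cap=60000)`, with the 60,000 ppb cap taken from the charts above.

```python
import pandas as pd

def saturation_share(values, cap):
    """Fraction of readings that sit at (or above) a sensor's ceiling value."""
    values = pd.Series(values)
    return float((values >= cap).mean())

# Illustrative readings only, not taken from the dataset
tvoc = [120, 0, 60000, 60000, 3500, 60000, 15, 800]
print(saturation_share(tvoc, cap=60000))
```

A large share would suggest the true values above the ceiling are unknown for those rows, which is worth keeping in mind when interpreting the TVOC relations.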

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='Raw H2',
    x='eCO2[ppm]',
    hue="Fire Alarm"
)
plt.show()

There is a **clear divergence** between the classes as **Raw H2 decreases**, and the **strong negative relation** between Raw H2 and eCO2 is easy to spot.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='PM1.0',
    x='eCO2[ppm]',
    hue="Fire Alarm"
)
plt.show()

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='NC0.5',
    x='eCO2[ppm]',
    hue="Fire Alarm"
)
plt.show()

The two charts above plot **different features** (PM1.0 and NC0.5), yet they look almost **exactly the same**.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='Raw H2',
    x='Raw Ethanol',
    hue="Fire Alarm"
)
plt.show()

Now, this looks like a **good linear relation** that is **easy to spot**, and the **classes** are well separated. It gets a little messy, though, when both values **increase simultaneously**.

In [None]:
plt.figure(figsize=(10,8))
sns.scatterplot(
    data=data,
    y='PM1.0',
    x='NC0.5',
    hue="Fire Alarm"
)
plt.show()

The relation is clearly linear, but the classes are not well separated. The other relations look much the same, and there is **no single feature** that can decide the **fire alarm** on its own.
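
The "no single feature" claim can be checked numerically with a one-feature threshold rule (a decision stump): for each feature, find the cut that best separates the classes and record its accuracy. `best_threshold_accuracy` is a hypothetical helper written for this sketch; it sorts the values once and scores every cut between distinct values using cumulative counts, so it stays fast on large columns.

```python
import numpy as np

def best_threshold_accuracy(x, y):
    """Best accuracy of a single rule 'class 1 above t' or 'class 1 below t' on one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(int)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)
    # Counts among the i smallest values, for i = 0..n
    zeros_below = np.concatenate([[0], np.cumsum(ys == 0)])
    ones_below = np.concatenate([[0], np.cumsum(ys)])
    ones_total = ys.sum()
    # Valid cuts: before everything, after everything, or between two distinct values
    cuts = np.concatenate([[0], np.nonzero(xs[1:] != xs[:-1])[0] + 1, [n]])
    # "Predict 1 above the cut": correct = class-0 rows below + class-1 rows above
    acc_up = (zeros_below[cuts] + (ones_total - ones_below[cuts])) / n
    # The opposite orientation gets exactly the complement right
    return float(np.maximum(acc_up, 1 - acc_up).max())

print(best_threshold_accuracy([1, 2, 3, 4], [0, 0, 1, 1]))  # separable
print(best_threshold_accuracy([1, 2, 3, 4], [0, 1, 0, 1]))  # interleaved
```

In the notebook it could be applied per column, e.g. `{col: best_threshold_accuracy(data[col], data['Fire Alarm']) for col in data.columns.drop('Fire Alarm')}`. Every score is at least the majority-class share (the cut after all values predicts a single class), so the scores are best compared with `data['Fire Alarm'].value_counts(normalize=True).max()`.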

# **Data Preprocessing**

The data is **numerical**, which is a **plus point** because we don't have to encode any **categorical values**. However, the features are on very different scales, so we need to bring all of them to the **same scale**.
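
`StandardScaler` standardizes each column as $z = (x - \mu)/\sigma$, where $\mu$ is the column mean and $\sigma$ its population standard deviation (`ddof=0`). A quick check with made-up numbers on two very different scales shows the manual formula and the scaler agree, and that every scaled column ends up with mean 0 and standard deviation 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales
X_demo = np.array([
    [930.0, 10.0],
    [935.0, 2000.0],
    [940.0, 50.0],
    [945.0, 9000.0],
])

# Manual standardization: subtract each column's mean, divide by its population std
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

scaled = StandardScaler().fit_transform(X_demo)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```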

In [None]:
y = data.pop('Fire Alarm').to_numpy()
X = data

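Before scaling, we **split the data into training, validation and testing sets**. Stratifying on the label keeps the class ratio the same in every split.

In [None]:
# Hold out a stratified 20% test set, then a stratified validation set from the remainder
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)

Now we can **scale** the data. The scaler is fit on the **training data only** and then applied to all three splits, so no statistics from the validation or test data leak into training.

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training data
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)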
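
To confirm that stratification really preserves the class ratio, the share of class 1 can be compared across splits. The labels below are synthetic (an assumed 70/30 split standing in for `y`), but the same check works on the notebook's `y_train`, `y_valid` and `y_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels standing in for the notebook's y (assumed 70/30 split)
rng = np.random.default_rng(0)
y_demo = np.array([1] * 700 + [0] * 300)
X_demo = rng.normal(size=(1000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

def class_ratio(labels):
    """Fraction of samples belonging to class 1."""
    return float(np.mean(labels))

print(class_ratio(y_demo), class_ratio(y_tr), class_ratio(y_te))
```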

# **Dense Neural Network**

In [None]:
# Model Architecture
model = Sequential([
    Dense(32, activation='relu', kernel_initializer='he_normal', input_shape=(13,), name="Layer1"),
    Dense(64, activation='relu', kernel_initializer='he_normal', name="Layer2"),
    Dense(128, activation='relu', kernel_initializer='he_normal', name="Layer3"),
    Dense(1, activation='sigmoid', name="Output"),
], name="Model-V1")

# Compile
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Callbacks
cbs = [
    EarlyStopping(patience=3, restore_best_weights=True),
    ModelCheckpoint("Dense-01.h5", save_best_only=True)
]
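
As a sanity check on the architecture, each `Dense` layer has `inputs * units` weights plus `units` biases. The hypothetical helper below adds these up for Model-V1's layer sizes; the total should match the parameter count reported by `model.summary()`.

```python
def dense_param_count(input_dim, units):
    """Trainable parameters of a stack of Dense layers: weights (in * out) plus biases (out)."""
    total = 0
    fan_in = input_dim
    for n in units:
        total += fan_in * n + n
        fan_in = n  # this layer's output feeds the next layer
    return total

# Same layer sizes as Model-V1: 13 inputs -> 32 -> 64 -> 128 -> 1
print(dense_param_count(13, [32, 64, 128, 1]))  # 11009
```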

In [None]:
model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    epochs=10,
    callbacks=cbs
)
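
`EarlyStopping(patience=3, restore_best_weights=True)` monitors `val_loss` by default and stops training once it has gone 3 consecutive epochs without improving, then rolls the model back to the weights from the best epoch. The function below is a simplified pure-Python sketch of that patience rule, written for illustration; it is not Keras's implementation, which also supports options such as `min_delta` and `baseline`.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based; stop_epoch is None if training runs to the end."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember this epoch and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

print(early_stopping_epoch([0.5, 0.3, 0.2, 0.25, 0.22, 0.21, 0.1]))  # (5, 2)
```

In this example training would stop at epoch 5, so the later improvement at epoch 6 is never seen, and the weights from epoch 2 would be restored.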

Superb, the model looks **perfect** on the validation data. Let's confirm this by evaluating it on the unseen **testing data**.

In [None]:
model.evaluate(X_test, y_test)

# **Evaluation**

In [None]:
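# predict() returns sigmoid probabilities of shape (n, 1); flatten and threshold at 0.5
y_true = y_test
y_pred = (model.predict(X_test).ravel() > 0.5).astype(int)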

In [None]:
# Calculate
f1 = f1_score(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

# Print
print(f"F1 Score : {f1}\n")
print(f"Accuracy : {acc}\n")
print(f"Precision : {precision}\n")
print(f"Recall : {recall}\n")
print(f"Confusion Matrix : \n{cm}\n")
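
All of the scores above can be derived from the confusion matrix, which scikit-learn lays out as `[[TN, FP], [FN, TP]]` for binary labels. The hypothetical helper below recomputes them from made-up counts, which is a handy way to see what each metric rewards. It does not guard against empty denominators (for example, a model that never predicts class 1).

```python
import numpy as np

def metrics_from_confusion(cm):
    """Binary metrics from a scikit-learn confusion matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = np.asarray(cm)
    precision = tp / (tp + fp)   # of predicted alarms, how many were real
    recall = tp / (tp + fn)      # of real alarms, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up counts for illustration
m = metrics_from_confusion([[50, 10], [5, 35]])
print(m)
```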


In [None]:
print(classification_report(y_true, y_pred))

The numbers speak for themselves: the model's performance on the test set is **superb**.

---
**DeepNets**