# **AIR QUALITY PREDICTION**

*In this notebook I attempt to predict the air quality situation in a room from gas sensor readings.*

*Learnings - Random Forest Regression Model*

# **What is Random Forest Regression?**

***Random Forest Regression is a supervised learning algorithm that uses the ensemble learning method for regression. Ensemble learning is a technique that combines predictions from multiple machine learning models to make a more accurate prediction than a single model. The code in this notebook uses the classification counterpart, RandomForestClassifier, which applies the same ensemble idea to predict a class label.***

https://miro.medium.com/max/1400/1*ZFuMI_HrI3jt2Wlay73IUQ.png
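To make the ensemble idea concrete, here is a minimal sketch on synthetic data (not the sensor dataset). It assumes scikit-learn's `RandomForestRegressor` and checks that the forest's prediction is the average of the predictions of its individual trees. The data and the variable names `per_tree` and `ensemble_mean` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only, not the sensor dataset).
X, y = make_regression(n_samples=200, n_features=6, noise=0.1, random_state=1)

forest = RandomForestRegressor(n_estimators=25, random_state=1)
forest.fit(X, y)

# Each fitted tree makes its own prediction for every sample ...
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# ... and the forest's prediction is the average over all trees.
ensemble_mean = per_tree.mean(axis=0)
forest_pred = forest.predict(X)

print(per_tree.shape)
print(np.allclose(ensemble_mean, forest_pred))
```

Averaging many trees, each trained on a different bootstrap sample, is what smooths out the errors of any single tree.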

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# scikit-learn provides the train/test split helper and the random forest model
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import shap

**MQ sensors (MQ2, MQ9, MQ135, MQ137, MQ138), which have great sensitivity, low latency and low cost; each sensor can respond to different gases.**

**Analog CO2 gas sensor (MG-811), which has excellent sensitivity to carbon dioxide and is scarcely affected by the temperature and humidity of the air.**

**The dataset contains 1845 collected samples describing 4 target situations:**

**1 - Normal situation - Activity: clean air, a person sleeping or studying or resting - Samples: 595.**

**2 - Preparing meals - Activities: cooking meat or pasta, fried vegetables. One or two people in the room, forced air circulation - Samples: 515.**

**3 - Presence of smoke - Activity: burning paper and wood for a short period of time in a room with closed windows and doors - Samples: 195.**

**4 - Cleaning - Activity: use of spray and liquid detergents with ammonia and / or alcohol. Forced air circulation can be activated or deactivated - Samples: 540.**
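As a quick sanity check on the class balance, the counts listed above can be turned into shares of the whole dataset. This is only arithmetic on the numbers from the description; the dictionary name `class_counts` is made up for this sketch.

```python
# Sample counts per situation, copied from the dataset description above.
class_counts = {
    1: 595,  # normal situation
    2: 515,  # preparing meals
    3: 195,  # presence of smoke
    4: 540,  # cleaning
}

total = sum(class_counts.values())
print(total)  # 1845

# Share of each class in the whole dataset.
shares = {label: count / total for label, count in class_counts.items()}
for label, share in shares.items():
    print(label, f"{share:.1%}")
```

The smoke class makes up only about a tenth of the samples, so overall accuracy alone can hide how well that class is predicted.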

In [3]:
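# Column names are assigned manually via `names`.
# Note: judging by the dataset description (six sensors plus a situation label),
# the last column, named 'CO2' here, appears to hold the situation class (1-4)
# that is used as the target below, not a CO2 reading.
data = pd.read_csv("../input/adl-classification/dataset.csv", names = ['MQ1', 'MQ2', 'MQ3', 'MQ4', 'MQ5', 'MQ6', 'CO2'])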
data.info()


# **Splitting the dataset into test and train**

In [4]:
def preprocessing_input(df):
    df = df.copy()
    
    # Split df into features (x) and target (y)
    y = df['CO2']
    x = df.drop('CO2', axis = 1)

    # train_size = 0.7 keeps 70% of the rows for training and the remaining 30% for testing
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = 0.7, shuffle = True, random_state = 1)
    
    return x_train, x_test, y_train, y_test 
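Since the four situations are not equally frequent, one possible variation (not what the cell above does) is a stratified split, which keeps the class shares the same in train and test. The sketch below assumes the target is a class label and uses toy data with the class counts from the description. `toy_x`, `toy_y` and the `x_tr`/`x_te` names are placeholders chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labels with the same class counts as the dataset description.
toy_y = pd.Series(np.repeat([1, 2, 3, 4], [595, 515, 195, 540]), name="label")

# Placeholder feature matrix; the values are random and purely illustrative.
rng = np.random.default_rng(1)
toy_x = pd.DataFrame(
    rng.normal(size=(len(toy_y), 6)),
    columns=[f"MQ{i}" for i in range(1, 7)],
)

# stratify=toy_y asks for the same class proportions in both splits.
x_tr, x_te, y_tr, y_te = train_test_split(
    toy_x, toy_y, train_size=0.7, shuffle=True, random_state=1, stratify=toy_y
)

train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(train_share.round(3))
print(test_share.round(3))
```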

In [5]:
x_train, x_test, y_train, y_test = preprocessing_input(data)
x_train

In [6]:
y_train

# **Training a model with the train split**


In [7]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train, y_train)

# **Accuracy check for the trained model**

In [8]:
accuracy = model.score(x_test, y_test)
print(accuracy)
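`model.score` returns a single overall accuracy. To see how each class fares, a confusion matrix and a per-class report can help. The sketch below is a hedged example on synthetic 4-class data (not the sensor dataset); `clf`, `X` and `cm` are names chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the sensor readings (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=1
)

clf = RandomForestClassifier(random_state=1).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_test, y_pred))
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy that `score` reports.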

# **Feature impact using SHAP values**

*SHAP values show which features contribute most to changes in the model's output.*

In [9]:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(x_test)
shap.summary_plot(shap_values, x_test, class_names = model.classes_)

From the plot we can infer that MQ5 has the largest impact on the predictions, mainly for classes 2 and 4.

So if we drop MQ5, the accuracy will probably go down.
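One way to cross-check a feature ranking like this is permutation importance, a different technique from SHAP: shuffle one column at a time on the test set and measure how much the accuracy drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data. The column names only mimic the notebook's, and the resulting ranking says nothing about the real sensors.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the column names only mimic the notebook's.
cols = [f"MQ{i}" for i in range(1, 7)]
X_raw, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=1,
)
X = pd.DataFrame(X_raw, columns=cols)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=1
)

clf = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Shuffle one column at a time on the test set and record the drop in accuracy.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=1)

# Mean accuracy drop per feature, largest first.
ranking = pd.Series(result.importances_mean, index=cols).sort_values(ascending=False)
print(ranking)
```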


In [25]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train.drop('MQ5', axis = 1), y_train)
acc = model.score(x_test.drop('MQ5' ,axis = 1), y_test)
print(acc)

In [26]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train.drop('MQ4', axis = 1), y_train)
acc = model.score(x_test.drop('MQ4' ,axis = 1), y_test)
print(acc)

In [27]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train.drop('MQ3', axis = 1), y_train)
acc = model.score(x_test.drop('MQ3' ,axis = 1), y_test)
print(acc)

In [28]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train.drop('MQ2', axis = 1), y_train)
acc = model.score(x_test.drop('MQ2' ,axis = 1), y_test)
print(acc)

In [29]:
model = RandomForestClassifier(random_state = 1)
model.fit(x_train.drop('MQ1', axis = 1), y_train)
acc = model.score(x_test.drop('MQ1' ,axis = 1), y_test)
print(acc)
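The drop-one-feature cells above repeat the same three lines with a different column each time. A loop makes the comparison easier to read and adds a baseline with all features. This is a sketch on synthetic data; `score_without` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the column names only mimic the notebook's.
cols = [f"MQ{i}" for i in range(1, 7)]
X_raw, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=1,
)
X = pd.DataFrame(X_raw, columns=cols)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=1
)


def score_without(feature):
    """Refit the forest without `feature` and return its test accuracy."""
    clf = RandomForestClassifier(random_state=1)
    clf.fit(X_train.drop(feature, axis=1), y_train)
    return clf.score(X_test.drop(feature, axis=1), y_test)


# Baseline accuracy with every feature included.
baseline = RandomForestClassifier(random_state=1).fit(X_train, y_train).score(X_test, y_test)

# Accuracy after dropping each feature in turn.
ablation = {col: score_without(col) for col in cols}

print(f"all features: {baseline:.3f}")
for col, acc in ablation.items():
    print(f"without {col}: {acc:.3f} (change {acc - baseline:+.3f})")
```

Printing the change against the baseline shows at a glance which dropped feature costs the most accuracy.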
