### Supervised Learning Approach for Classification

This notebook applies a supervised learning approach to classify sensor data. <br>

First, each sensor reading is labelled as healthy (1) or not healthy (0). The labelling is automated: labels are assigned by comparing each parameter against its standard range. <br>

We use two classical machine learning algorithms: <strong>Support Vector Machine (SVM)</strong> and <strong>Random Forest</strong>.

In [1]:
# All necessary imports

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

from scikitplot.metrics import plot_roc_curve

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

1. Load the dataset and assign target labels based on the soil moisture, humidity, and temperature parameters.
2. Convert the columns to arrays and assign them as features and labels.

In [2]:
iot = pd.read_csv('cleaned_iot_data2.csv')
soil_moisture = iot['moisture_1'].values
humidity = iot['humidity_1'].values
temperature = iot['temperature_1'].values
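# A reading is healthy (1) when humidity is in [60, 80) and either
# soil moisture is in [64, 72) or temperature is in [18, 23).
# Note: `x in range(a, b)` is only True for integer-valued x, so any
# non-integer reading is treated as out of range here.
labels = []
for i in range(len(humidity)):
    if humidity[i] in range(60, 80) and (soil_moisture[i] in range(64, 72) or temperature[i] in range(18, 23)):
        labels.append(1)
    else:
        labels.append(0)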
        
dataset = pd.DataFrame()
dataset['soil_moisture'] = soil_moisture
dataset['humidity'] = humidity
dataset['temperature'] = temperature
dataset['labels'] = labels

# dataset.to_csv('dataset_supervised_learning.csv')

features = np.array(dataset[['soil_moisture', 'humidity', 'temperature']])
# Select the column as a 1-D array: scikit-learn expects y with shape (n_samples,)
labels = np.array(dataset['labels'])
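The labelling rule can also be written with explicit comparisons, which behave the same for integer readings and additionally handle fractional values (a membership test against `range` only matches whole numbers). The following is a minimal sketch, assuming the same half-open ranges as the loop in this cell; `label_health` and the `sample` frame are hypothetical names and made-up values, not part of the original dataset.

```python
import pandas as pd

def label_health(df):
    """Return 1 (healthy) or 0 (not healthy) for each row of df."""
    # Half-open intervals [low, high), mirroring range(low, high).
    humidity_ok = (df['humidity'] >= 60) & (df['humidity'] < 80)
    moisture_ok = (df['soil_moisture'] >= 64) & (df['soil_moisture'] < 72)
    temperature_ok = (df['temperature'] >= 18) & (df['temperature'] < 23)
    return (humidity_ok & (moisture_ok | temperature_ok)).astype(int)

# Illustrative readings only; the third row has fractional values.
sample = pd.DataFrame({
    'soil_moisture': [66.0, 55.0, 70.5],
    'humidity': [60.0, 80.0, 65.2],
    'temperature': [30.0, 23.0, 25.0],
})
sample['labels'] = label_health(sample)
print(sample)
```

The third row is labelled healthy here, whereas a `range` membership test would reject it because 65.2 and 70.5 are not whole numbers.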

1. Shuffle the dataset so that the original ordering of the readings does not bias the train/test split.
2. Split the data into training and test sets so that the model can be evaluated on data it has not seen during training.

In [3]:
X, Y = shuffle(features, labels)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
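Because neither `shuffle` nor `train_test_split` is given a seed here, the reported scores will vary from run to run. A possible variant, sketched below on synthetic data, fixes `random_state` and uses `stratify` so both splits keep a similar share of healthy readings. `train_test_split` already shuffles by default, so a separate shuffle step is not needed in this form. The `_demo` arrays are placeholders generated for illustration, not the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (soil_moisture, humidity, temperature) readings.
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[40, 40, 10], high=[90, 100, 40], size=(200, 3))
# Simplified label for illustration: humidity inside [60, 80).
y_demo = ((X_demo[:, 1] >= 60) & (X_demo[:, 1] < 80)).astype(int)

# Seeded, stratified 90/10 split.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.1, stratify=y_demo, random_state=42
)
print(X_train_demo.shape, X_test_demo.shape)
print('healthy share (train/test):', y_train_demo.mean(), y_test_demo.mean())
```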

SVM Classifier for two classes

In [5]:
# Train SVC and plot roc curve
svc = SVC(probability=True)
svc.fit(X_train, Y_train)
score = svc.score(X_test, Y_test)
print('SVC Classifier:',score)
# preds = svc.predict_proba(X_test)
# plot_roc_curve(Y_test, preds)

SVC Classifier: 0.9865092748735245


Random Forest Classifier with 100 Estimators (Decision Trees)

In [6]:
# Train Random Forest and plot roc curve
randomforestclassifier = RandomForestClassifier(n_estimators=100)
randomforestclassifier.fit(X_train, Y_train)
score = randomforestclassifier.score(X_test, Y_test)
print('Random Forest Classifier:', score)
# preds = randomforestclassifier.predict_proba(X_test)
# plot_roc_curve(Y_test, preds)

Random Forest Classifier: 0.9966273187183811
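Accuracy alone can look high when one class dominates the labels, which is one reason the ROC curve mentioned in the cell comments is useful. As a rough sketch, the area under the ROC curve can be computed with scikit-learn's `roc_auc_score` from the positive-class probabilities. The example below trains both model types on synthetic data (an assumption for illustration, not the notebook's dataset), so its numbers say nothing about the results reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the sensor readings and their labels.
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[40, 40, 10], high=[90, 100, 40], size=(500, 3))
y_demo = ((X_demo[:, 1] >= 60) & (X_demo[:, 1] < 80)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

models = {
    'SVC': SVC(probability=True, random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Column 1 of predict_proba holds the probability of class 1 (healthy).
    proba_healthy = model.predict_proba(X_te)[:, 1]
    auc_scores[name] = roc_auc_score(y_te, proba_healthy)
    print(f'{name} ROC AUC: {auc_scores[name]:.3f}')
```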


### Results

In [7]:
result = {
    1 : 'Healthy',
    0 : 'Not-Healthy'
}

soil_moisture = 66
humidity = 60 
temperature = 30

svc_prediction = svc.predict(X=[[soil_moisture, humidity, temperature]])
print('Support Vector Classifier Prediction:', result[svc_prediction[0]])

random_forest = randomforestclassifier.predict([[soil_moisture, humidity, temperature]])
print('Random Forest Classifier Prediction:', result[random_forest[0]])

Support Vector Classifier Prediction: Healthy
Random Forest Classifier Prediction: Healthy


In [8]:
soil_moisture = 55
humidity = 80 
temperature = 23

svc_prediction = svc.predict(X=[[soil_moisture, humidity, temperature]])
print('Support Vector Classifier Prediction:', result[svc_prediction[0]])

random_forest = randomforestclassifier.predict([[soil_moisture, humidity, temperature]])
print('Random Forest Classifier Prediction:', result[random_forest[0]])

Support Vector Classifier Prediction: Not-Healthy
Random Forest Classifier Prediction: Not-Healthy
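The two cells above repeat the same prediction code for different readings. One way to avoid the repetition is a small helper that accepts several readings at once and maps each prediction to its label. This is only a sketch: `predict_health` and `RESULT` are hypothetical names, and the model is trained on synthetic data so the block runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

RESULT = {1: 'Healthy', 0: 'Not-Healthy'}

def predict_health(model, readings):
    """Map each (soil_moisture, humidity, temperature) row to a readable label."""
    readings = np.asarray(readings, dtype=float).reshape(-1, 3)
    return [RESULT[int(p)] for p in model.predict(readings)]

# Synthetic stand-in for a trained classifier (not the notebook's data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[40, 40, 10], high=[90, 100, 40], size=(500, 3))
y_demo = ((X_demo[:, 1] >= 60) & (X_demo[:, 1] < 80)).astype(int)
demo_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

predictions = predict_health(demo_model, [[66, 60, 30], [55, 80, 23]])
print(predictions)
```

In the notebook itself, the same helper could be called with the fitted `svc` or `randomforestclassifier` in place of `demo_model`.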
