**Outlier Detection**

One-class SVM is an unsupervised variant of the support vector machine that can be used for outlier detection. It is trained on a single class of data (the "normal" or "inlier" class), learns a boundary around that data, and flags points that fall outside the boundary as anomalous.

In this context, the goal is to identify data points that do not conform to the expected pattern; these are treated as outliers. Because the model needs no labels, it can detect outliers in a dataset even when they are not explicitly marked.
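As a minimal sketch of the idea, the snippet below fits a One-class SVM on synthetic two-dimensional data (an assumption made for illustration, not the sensor data used later) and scores two hand-picked points. The names `normalPoints`, `demoModel`, and `queryPoints` are hypothetical. `predict` returns `1` for points the model treats as inliers and `-1` for points it treats as outliers.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" data: 200 points clustered around the origin (illustrative only).
rng = np.random.default_rng(0)
normalPoints = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Same hyperparameters as the model created later in this notebook.
demoModel = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1)
demoModel.fit(normalPoints)

# One point far from the training cloud, one at its centre.
queryPoints = np.array([[10.0, 10.0], [0.0, 0.0]])
queryPred = demoModel.predict(queryPoints)  # array of +1 (inlier) / -1 (outlier)
print(queryPred)
```

The far-away point should come back as `-1`, because an RBF kernel assigns it almost no similarity to any training point.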

In [1]:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

**Load the CSV file into a Pandas DataFrame**

In [2]:
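def loadData(csvFile):
    """Read csvFile into a DataFrame; return None if it cannot be read."""
    try:
        data = pd.read_csv(csvFile)
        return data
    except Exception as e:
        # Report the problem instead of raising; callers must check for None.
        print(f"Error loading data: {e}")
        return None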

csvFile = 'LOG10-20171018_m.csv'
data = loadData(csvFile)

In [3]:
print(data.head())

     ID  macid           Date_time  Acceleration_x  Acceleration_y  \
0  1002    NaN  10/18/2017 0:00:01            1.50            0.89   
1  1024    NaN  10/18/2017 0:00:02            1.54            0.81   
2  1013    NaN  10/18/2017 0:00:02            1.51            0.89   
3  1009    NaN  10/18/2017 0:00:03            1.54            0.84   
4  1008    NaN  10/18/2017 0:00:04            1.47            0.93   

   Acceleration_z  Acceleration_s  Frequency  Amplitude  sound  ...  an  \
0            0.03            1.74       8.99      15.10      0  ...  59   
1            0.04            1.74       8.79      15.80      0  ...  62   
2            0.04            1.75       8.90      15.50      0  ...  59   
3            0.02            1.75       8.86      15.64      0  ...  61   
4            0.04            1.73       8.99      15.02      0  ...  57   

   device_id  node_firm gateway_firm radio_power   res  sen_type  Unnamed: 22  \
0        NaN        NaN          NaN         Na

In [4]:
# Select columns for outlier detection
outlierColumns = ['Frequency', 'Temp_t1', 'Temp_t2', 'Temp_t3']
outlierDf = data[outlierColumns]

In [5]:
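# Create a One-class SVM model for outlier detection.
# nu=0.1 is an upper bound on the fraction of training points treated as
# outliers, so roughly 10% of the rows are expected to be flagged.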
ocsvmModel = svm.OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1)

In [6]:
# Fit the model
ocsvmModel.fit(outlierDf)
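The fit above runs on the raw columns. An RBF kernel depends on Euclidean distance, so a column with a wide numeric range (in the printed outliers, `Frequency` runs from about 8.8 to 43.6 while the temperatures stay near 22 to 25) can dominate the result. One possible variation, which is not part of the original notebook, is to standardize the features first. The sketch below uses made-up data as a stand-in for the four selected columns; `demoFeatures` and `scaledOcsvm` are hypothetical names.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the four selected columns (values are made up).
rng = np.random.default_rng(0)
demoFeatures = np.column_stack([
    rng.normal(9.0, 3.0, 500),    # wide-ranging column, like Frequency
    rng.normal(23.0, 0.3, 500),   # narrow columns, like the temperatures
    rng.normal(22.5, 0.3, 500),
    rng.normal(23.0, 0.3, 500),
])

# Standardize each column, then fit the same One-class SVM configuration.
scaledOcsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1),
)
scaledOcsvm.fit(demoFeatures)
scaledPred = scaledOcsvm.predict(demoFeatures)

print("Flagged fraction:", np.mean(scaledPred == -1))
```

To try this on the real data, `outlierDf` would take the place of `demoFeatures`. After scaling, `gamma` would probably need to be tuned again, since distances between points change.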

In [7]:
# Predict outliers
outlierPred = ocsvmModel.predict(outlierDf)

In [8]:
# Identify outliers
outliers = outlierDf[outlierPred == -1]

In [9]:
print(outliers)

       Frequency  Temp_t1  Temp_t2  Temp_t3
1           8.79     23.0     22.3     23.0
6           8.95     23.0     22.2     23.0
11          8.82     22.9     22.5     23.0
16          8.91     22.9     22.2     23.0
21          8.78     23.0     22.4     23.0
...          ...      ...      ...      ...
13241      10.65     24.9     23.8     25.0
13242      22.52     24.9     24.1     25.0
13243      30.08     24.4     23.5     24.0
13244      23.68     24.0     23.7     24.0
13246      43.60     24.9     24.1     25.0

[1368 rows x 4 columns]


In [10]:
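# Calculate a cutoff for each feature as its 95th percentile.
# Note: these cutoffs come from the raw column values, not from the
# One-class SVM predictions above.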
cutoffFrequency = np.percentile(outlierDf['Frequency'], 95)
cutoffTempt1 = np.percentile(outlierDf['Temp_t1'], 95)
cutoffTempt2 = np.percentile(outlierDf['Temp_t2'], 95)
cutoffTempt3 = np.percentile(outlierDf['Temp_t3'], 95)

In [11]:
# Create a new column 'fault_status': a row is 'Faulty' if any of the four
# features exceeds its 95th-percentile cutoff, otherwise 'Normal'
exceedsCutoff = (
    (data['Frequency'] > cutoffFrequency)
    | (data['Temp_t1'] > cutoffTempt1)
    | (data['Temp_t2'] > cutoffTempt2)
    | (data['Temp_t3'] > cutoffTempt3)
)
data['fault_status'] = np.where(exceedsCutoff, 'Faulty', 'Normal')

# Map 'fault_status' to numerical values
data['fault_status'] = data['fault_status'].map({'Normal': 0, 'Faulty': 1})
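To see how the percentile cutoffs and the "any column exceeds its cutoff" rule interact, here is a small worked example on two toy columns (illustrative values only; `colA`, `colB`, and `toyStatus` are hypothetical names). Each column flags its top 5% on its own, but combining columns with `|` can flag more than 5% of the rows. That is consistent with the test split reported below, where 234 of 2772 rows (about 8.4%) fall in class 1.

```python
import numpy as np

# Two toy columns holding the values 1..100 in opposite order (illustrative only).
colA = np.arange(1, 101, dtype=float)
colB = colA[::-1]

# 95th percentile of each column (NumPy's default linear interpolation).
cutA = np.percentile(colA, 95)
cutB = np.percentile(colB, 95)

# Same rule as above: a row is faulty (1) if any column exceeds its cutoff.
toyStatus = np.where((colA > cutA) | (colB > cutB), 1, 0)

print(round(cutA, 2), round(cutB, 2))  # 95.05 95.05
print(toyStatus.sum())                 # 10: each column flags 5 rows, with no overlap
```

When the columns are strongly correlated, the flagged rows overlap and the combined fraction stays closer to 5%. When they are unrelated, it approaches the sum of the per-column fractions.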

In [12]:
# Select columns for classification
classificationColumns = ['Frequency', 'Temp_t1', 'Temp_t2', 'Temp_t3', 'fault_status']
classificationDf = data[classificationColumns]

In [13]:
# Split data into training and testing sets
xTrain, xTest, yTrain, yTest = train_test_split(
    classificationDf.drop('fault_status', axis=1),
    classificationDf['fault_status'],
    test_size=0.2,
    random_state=42,
)

# Create an SVM classifier
svmModel = svm.SVC(kernel='rbf', C=1)

# Train the model
svmModel.fit(xTrain, yTrain)

In [15]:
# Make predictions
yPred = svmModel.predict(xTest)

# Evaluate the model
accuracy = accuracy_score(yTest, yPred)
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_report(yTest, yPred))

Accuracy: 0.9195526695526696
Classification Report:
               precision    recall  f1-score   support

           0       0.92      1.00      0.96      2538
           1       1.00      0.05      0.09       234

    accuracy                           0.92      2772
   macro avg       0.96      0.52      0.52      2772
weighted avg       0.93      0.92      0.88      2772
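The report can be unpacked by hand. The counts below are a reconstruction from the printed accuracy, the supports, and the rounded precision and recall, not output from the notebook. The printed accuracy times 2772 gives 2549 correct rows. A precision of 1.00 for class 1 and a recall of 1.00 for class 0 together imply that no class 0 row was predicted as faulty.

```python
# Supports are taken from the report; the breakdown of class 1 is inferred.
support0, support1 = 2538, 234
total = support0 + support1

correct = round(0.9195526695526696 * total)  # number of correctly classified rows
trueNeg = support0                           # assumes class 0 is never misclassified
truePos = correct - trueNeg
falseNeg = support1 - truePos

recall1 = truePos / support1
recall0 = trueNeg / support0
balancedAcc = (recall0 + recall1) / 2        # mean of the per-class recalls

# Accuracy of a baseline that labels every row as class 0.
majorityAcc = support0 / total

print(truePos, falseNeg)                     # 11 223
print(round(recall1, 2), round(balancedAcc, 2), round(majorityAcc, 4))  # 0.05 0.52 0.9156
```

Under this reconstruction, the classifier finds only 11 of the 234 faulty rows, and its accuracy of about 0.92 is barely above the 0.9156 that always predicting class 0 would achieve. With `yTest` and `yPred` available, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.balanced_accuracy_score` would compute these figures directly.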

