<a href="https://colab.research.google.com/github/chychur/SVM_algoritm/blob/main/SVM_Algoritm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## SVM Algorithm

Using accelerometer data from a mobile phone, we classify the activity a person is engaged in: walking, standing, running, or climbing stairs. The dataset is available at this [link](https://drive.google.com/file/d/1nzrtQpfaHL0OgJ_eXzA7VuEj7XotrSWO/view "Accelerometer data").

We use the SVM and random forest classifiers from the `scikit-learn` library. The features are derived from the accelerometer readings: to improve the performance of the algorithms, we first prepare the dataset and compute time-domain features for each recording.

We then compare the performance of both algorithms across different features and different models.

### 1. Data verification

The data was downloaded from the [archive](https://drive.google.com/file/d/1nzrtQpfaHL0OgJ_eXzA7VuEj7XotrSWO/view "Accelerometer data") to Google Drive. Let's import the necessary dependencies and check the contents of the data.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Setup: imports and data path

import numpy as np
import pandas as pd
import pickle

from os import listdir
from os.path import join
from sklearn import model_selection, svm
from sklearn.metrics import mean_absolute_error, mean_squared_error, accuracy_score, classification_report
from sklearn.ensemble import RandomForestClassifier


URL = "/content/drive/MyDrive/Colab Notebooks/data"


In [3]:
activities = listdir(URL)
activities

['stairs', 'walking', 'running', 'idle']

In [4]:
for act in activities:
    path = join(URL, act)
    frames = listdir(path)
    print(f"{act}: {len(frames)}")

stairs: 165
walking: 1850
running: 3438
idle: 1039
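
The counts above show that the classes are far from balanced, which matters when reading accuracy figures later on. A minimal sketch that turns the printed counts into class shares (the `counts` dictionary is copied by hand from the output above, and the variable names are illustrative):

```python
# File counts per activity, copied from the listing above.
counts = {"stairs": 165, "walking": 1850, "running": 3438, "idle": 1039}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Print the classes from rarest to most frequent.
for name, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: {share:6.1%}")

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

With `stairs` at roughly 2.5% of all recordings, a classifier can reach high overall accuracy while still missing most of that class, so per-class recall is worth checking alongside accuracy.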


In [5]:
frms = listdir(join(URL, "stairs"))
frame = pd.read_csv(join(URL, "stairs", frms[2]))
frame.head(3)

Unnamed: 0,accelerometer_X,accelerometer_Y,accelerometer_Z
0,3.289633,-7.469909,-0.043096
1,-1.651999,-17.535133,-2.250549
2,-6.320692,-6.196194,1.896208


### 2. Data preparation

In [6]:
def get_stat_features(frame):
    """Return a 1-D array of time-domain features computed from one accelerometer frame."""
    # Per-column statistics, concatenated in a fixed order.
    features = np.concatenate([
        frame.skew(axis=0).values,
        frame.kurt(axis=0).values,
        frame.max(axis=0).values,
        frame.min(axis=0).values,
        frame.mean(axis=0).values,
        frame.std(axis=0).values,
        frame.var(axis=0).values,
        frame.median(axis=0).values,
        frame.idxmax(axis=0).values,  # row index of each column's maximum
        frame.idxmin(axis=0).values,  # row index of each column's minimum
    ])

    # Pairwise correlations between the three axes.
    correlations = frame.corr()
    corr = np.array([
        correlations['accelerometer_X']['accelerometer_Y'],
        correlations['accelerometer_X']['accelerometer_Z'],
        correlations['accelerometer_Y']['accelerometer_Z'],
    ])

    # MAE and RMSE of each axis against its own mean. The constant mean arrays
    # are built locally so that the caller's frame is not modified.
    axes = ['accelerometer_X', 'accelerometer_Y', 'accelerometer_Z']
    n = len(frame)
    mae = [mean_absolute_error(frame[a], np.full(n, frame[a].mean())) for a in axes]
    rmse = [np.sqrt(mean_squared_error(frame[a], np.full(n, frame[a].mean()))) for a in axes]

    return np.concatenate((features, corr, mae, rmse))

In [7]:
len(get_stat_features(frame))

39
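
The length of 39 can be accounted for by counting the feature groups. The sketch below does this on a synthetic three-axis frame. The random values and the 30-row length are assumptions for illustration, not properties of the real recordings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one recording: random values, three axes.
# The row count (30) is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(30, 3)),
    columns=["accelerometer_X", "accelerometer_Y", "accelerometer_Z"],
)

n_axes = demo.shape[1]
per_axis_stats = ["skew", "kurt", "max", "min", "mean",
                  "std", "var", "median", "idxmax", "idxmin"]

n_per_axis = len(per_axis_stats) * n_axes   # 10 statistics x 3 axes
n_corr = n_axes * (n_axes - 1) // 2         # pairwise correlations: XY, XZ, YZ
n_error = 2 * n_axes                        # MAE and RMSE against the mean, per axis

# Each listed statistic yields exactly one value per axis.
for name in per_axis_stats:
    assert len(getattr(demo, name)(axis=0)) == n_axes

n_total = n_per_axis + n_corr + n_error
print(n_per_axis, n_corr, n_error, n_total)
```

The three groups contribute 30, 3, and 6 values respectively, which matches the length returned by `get_stat_features`.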

### 3. Exploring the feature calculation

In [8]:
# Work on a copy so that adding columns does not modify the original frame.
new_frame = frame.copy()

In [9]:
new_frame['mean_X'] = frame.mean(axis=0)['accelerometer_X']
new_frame['mean_Y'] = frame.mean(axis=0)['accelerometer_Y']
new_frame['mean_Z'] = frame.mean(axis=0)['accelerometer_Z']

new_frame.head(3)

Unnamed: 0,accelerometer_X,accelerometer_Y,accelerometer_Z,mean_X,mean_Y,mean_Z
0,3.289633,-7.469909,-0.043096,2.366429,-9.539776,-0.848824
1,-1.651999,-17.535133,-2.250549,2.366429,-9.539776,-0.848824
2,-6.320692,-6.196194,1.896208,2.366429,-9.539776,-0.848824


In [10]:
mean_absolute_error(frame['accelerometer_X'], new_frame['mean_X'])

3.8835867733333336

In [11]:
np.sqrt(mean_squared_error(frame['accelerometer_X'], new_frame['mean_X']))

5.069736126099421
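
Because the "prediction" here is a constant equal to the mean, these two metrics reduce to familiar dispersion measures: the MAE is the mean absolute deviation, and the RMSE is the population standard deviation. A small check on a synthetic signal (the data and variable names are assumptions made for this sketch):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic one-axis signal; not taken from the real recordings.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=5.0, size=30)

# Constant "prediction" equal to the signal mean, as in the cells above.
baseline = np.full_like(x, x.mean())

mae = mean_absolute_error(x, baseline)
rmse = np.sqrt(mean_squared_error(x, baseline))

# Direct definitions of the two dispersion measures.
mean_abs_dev = np.mean(np.abs(x - x.mean()))
pop_std = np.std(x, ddof=0)

print(np.isclose(mae, mean_abs_dev), np.isclose(rmse, pop_std))
```

Note that pandas' `std` uses `ddof=1` by default, so the RMSE feature differs from the `std` feature only by the constant factor `sqrt((n - 1) / n)` for frames of equal length `n`.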

### 4. Prepare one class

In [15]:
def class_data_stat_prepare(class_name, class_number):
    path = join(URL, class_name)
    X = []
    for item in listdir(path):
        frame = pd.read_csv(join(path, item))
        features = get_stat_features(frame)
        X.append(features)

    y = [class_number]*len(X)

    X = np.array(X)
    y = np.array(y)

    return X, y

### 5. Dataset creation

In [16]:
def create_dataset(class_prepare):
    # Class labels: 0 = idle, 1 = walking, 2 = stairs, 3 = running
    X_idle, y_idle = class_prepare('idle', 0)
    X_walking, y_walking = class_prepare('walking', 1)
    X_stairs, y_stairs = class_prepare('stairs', 2)
    X_running, y_running = class_prepare('running', 3)

    # Stack the per-class feature matrices and label vectors in the same order.
    X = np.concatenate((X_idle, X_walking, X_stairs, X_running), axis=0)
    Y = np.concatenate((y_idle, y_walking, y_stairs, y_running), axis=0)

    return X, Y

In [17]:
X, y = create_dataset(class_data_stat_prepare)

In [18]:
def save_data(prefix, X, y):
    with open(f'{prefix}_X.pickle', 'wb') as f:
        pickle.dump(X, f)
    with open(f'{prefix}_y.pickle', 'wb') as f:
        pickle.dump(y, f)

In [19]:
save_data('data', X, y)

In [20]:
def load_data(prefix):
    with open(f'{prefix}_X.pickle', 'rb') as f:
        X = pickle.load(f)
    with open(f'{prefix}_y.pickle', 'rb') as f:
        y = pickle.load(f)
    return X, y

In [21]:
X, y = load_data('data')

In [22]:
# Note: train_size=0.3 keeps 30% of the samples for training; the remaining 70% form the test set.
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, train_size=0.3)
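
The split above is random and unstratified, so the share of the rare `stairs` class can drift between the training and test sets, and results change from run to run. As a hedged alternative (not what this notebook does), `train_test_split` accepts `stratify` and `random_state`. The sketch below uses placeholder features and synthetic labels with the same class sizes as the dataset. The seed value is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the dataset's class sizes:
# 0 = idle, 1 = walking, 2 = stairs, 3 = running.
y_demo = np.repeat([0, 1, 2, 3], [1039, 1850, 165, 3438])
X_demo = np.zeros((len(y_demo), 1))  # placeholder features, only the labels matter here

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    train_size=0.3,
    stratify=y_demo,   # keep class proportions equal in both parts
    random_state=42,   # arbitrary seed, makes the split reproducible
)

full_share = np.bincount(y_demo) / len(y_demo)
train_share = np.bincount(y_tr) / len(y_tr)
print(np.round(full_share, 3))
print(np.round(train_share, 3))
```

With these sizes the test part contains 4545 samples, the same total that appears in the `support` column of the classification reports.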

### 6. Comparison of two algorithms (SVM classifier vs. RandomForest classifier)

In [23]:
cls_ovo = svm.SVC(decision_function_shape='ovo', kernel='rbf', gamma=0.005, probability=True).fit(X_train, y_train)
cls_ovr = svm.SVC(decision_function_shape='ovr', kernel='rbf', gamma=0.005, probability=True).fit(X_train, y_train)

cls_forest = RandomForestClassifier().fit(X_train, y_train)
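
RBF-kernel SVMs depend on distances between feature vectors, so features on very different scales (for example index positions next to correlations) can dominate the kernel, while random forests are largely insensitive to scale. One possible follow-up, shown here only as a sketch on synthetic data, is to standardize the features inside a pipeline. The dataset, the inflated first column, and `gamma="scale"` are assumptions for illustration and not part of the original experiment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class problem standing in for the real feature matrix.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, random_state=0,
)
X_demo[:, 0] *= 1000.0  # inflate one feature's scale on purpose

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# SVM on raw features, with the same kernel settings as in the notebook.
raw_svm = SVC(kernel="rbf", gamma=0.005).fit(X_tr, y_tr)

# SVM on standardized features; the scaler is fitted on the training part only.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")).fit(X_tr, y_tr)

raw_acc = raw_svm.score(X_te, y_te)
scaled_acc = scaled_svm.score(X_te, y_te)
print(f"raw: {raw_acc:.3f}  scaled: {scaled_acc:.3f}")
```

Putting the scaler in the pipeline means it learns its mean and variance from the training data only, which avoids leaking test-set statistics into the model. Whether scaling helps on the accelerometer features would need to be measured.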

In [24]:
svm_ovo_pred = cls_ovo.predict(X_test)
svm_ovr_pred = cls_ovr.predict(X_test)

forest_pred = cls_forest.predict(X_test)

In [25]:
svm_ovo_accuracy = accuracy_score(y_test, svm_ovo_pred)
svm_ovr_accuracy = accuracy_score(y_test, svm_ovr_pred)

forest_accuracy = accuracy_score(y_test, forest_pred)

print("accuracy SVM ovo: ", svm_ovo_accuracy)
print("accuracy SVM ovr: ", svm_ovr_accuracy)
print("accuracy RandomForest: ", forest_accuracy)

accuracy SVM ovo:  0.9020902090209021
accuracy SVM ovr:  0.9020902090209021
accuracy RandomForest:  0.9922992299229924
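
The two SVM accuracies are identical for a reason: in scikit-learn, `SVC` always trains one-vs-one classifiers internally, and `decision_function_shape` only changes the shape of the array returned by `decision_function`. With the default `break_ties=False`, the predictions are the same. A small check on synthetic data (the dataset and variable names are assumptions for this sketch):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 4-class problem; not the accelerometer data.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, random_state=0,
)

svc_ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X_demo, y_demo)
svc_ovr = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_demo, y_demo)

# Predicted labels agree regardless of the decision function shape.
same_predictions = np.array_equal(svc_ovo.predict(X_demo), svc_ovr.predict(X_demo))

# Only the shape of the decision values differs:
# one column per class pair (ovo) versus one column per class (ovr).
ovo_shape = svc_ovo.decision_function(X_demo).shape
ovr_shape = svc_ovr.decision_function(X_demo).shape
print(same_predictions, ovo_shape, ovr_shape)
```

For four classes the one-vs-one decision values have 6 columns (one per pair of classes) and the one-vs-rest form has 4, so comparing the two settings does not compare two different models.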


In [26]:
svm_ovo_report = classification_report(y_test, svm_ovo_pred)
print("SVM ovo report: ")
print(svm_ovo_report)

svm_ovr_report = classification_report(y_test, svm_ovr_pred)
print("SVM ovr report: ")
print(svm_ovr_report)

forest_report = classification_report(y_test, forest_pred)
print("RandomForest report: ")
print(forest_report)

SVM ovo report: 
              precision    recall  f1-score   support

           0       1.00      0.86      0.93       715
           1       0.98      0.80      0.88      1269
           2       1.00      0.25      0.40       125
           3       0.85      1.00      0.92      2436

    accuracy                           0.90      4545
   macro avg       0.96      0.73      0.78      4545
weighted avg       0.91      0.90      0.90      4545

SVM ovr report: 
              precision    recall  f1-score   support

           0       1.00      0.86      0.93       715
           1       0.98      0.80      0.88      1269
           2       1.00      0.25      0.40       125
           3       0.85      1.00      0.92      2436

    accuracy                           0.90      4545
   macro avg       0.96      0.73      0.78      4545
weighted avg       0.91      0.90      0.90      4545

RandomForest report: 
              precision    recall  f1-score   support

           0       