# Classification of driving behavior using accelerometer data

This notebook uses the driving behavior dataset from Kaggle and aims to classify a driver's behavior from accelerometer data.

## Libraries

In [89]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy import integrate

## Preprocessing

In [None]:
# loading data
df = pd.read_csv('datasets/train_motion_data.csv')

df.head()

In [None]:
# showing info
df.info()

In [None]:
# basic statistics
df.describe()

In [None]:
# Acceleration vs Time
plt.plot(df['Timestamp'], df['AccX'])
plt.plot(df['Timestamp'], df['AccY'])
plt.plot(df['Timestamp'], df['AccZ'])

plt.legend(['X', 'Y', 'Z'])
plt.title('Acceleration vs Time')
plt.show()

The dataset is clean, with no null values. Because we are working with time-dependent physical measurements, we will not remove outliers: they carry important information about the driving at that moment in time.
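One way to back up both statements is to count missing values and to flag, rather than drop, unusually large readings. The sketch below runs on a small synthetic frame instead of the Kaggle file; the helper name `quality_summary` and the 3-standard-deviation threshold are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def quality_summary(frame, columns, n_std=3.0):
    """Count nulls and readings further than n_std standard deviations from the mean."""
    values = frame[columns]
    nulls = values.isna().sum()
    z_scores = (values - values.mean()) / values.std()
    extremes = (z_scores.abs() > n_std).sum()
    return pd.DataFrame({'nulls': nulls, 'extremes': extremes})


# Synthetic stand-in for the accelerometer columns (hypothetical data).
rng = np.random.default_rng(0)
columns = ['AccX', 'AccY', 'AccZ']
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=columns)
demo.loc[50, 'AccX'] = 25.0  # a single spike, e.g. a hard-braking event

summary = quality_summary(demo, columns)
print(summary)
```

The spike is reported in the `extremes` column but stays in `demo`, which matches the decision to keep outliers in the data.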

From the plot, we can identify four distinct measurement periods, separated by gaps in the timestamps. We will split the dataset into these four independent time frames and analyse them separately.

In [None]:
# count the gaps of more than 1 between consecutive timestamps
(df[['Timestamp']].diff() > 1).sum()

Three gaps in the timestamps mean the recording splits into four time slots.
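The same gap test can also drive the split directly: a running count of gaps labels every row with its segment number. This is a minimal sketch on made-up timestamps; the helper `split_on_gaps` and its `max_gap` parameter are hypothetical names, and the notebook itself slices with `iloc` instead.

```python
import pandas as pd


def split_on_gaps(frame, time_col='Timestamp', max_gap=1):
    """Split a frame into independent copies wherever the time column jumps by more than max_gap."""
    # Each gap increments the running count, so rows between gaps share one id.
    segment_id = (frame[time_col].diff() > max_gap).cumsum()
    return [part.copy() for _, part in frame.groupby(segment_id)]


# Made-up timestamps with three gaps (2 -> 10, 11 -> 30, 32 -> 50).
demo = pd.DataFrame({
    'Timestamp': [0, 1, 2, 10, 11, 30, 31, 32, 50],
    'AccX': [0.1, 0.2, 0.1, 0.4, 0.3, 0.0, 0.2, 0.1, 0.5],
})

segments = split_on_gaps(demo)
print([len(segment) for segment in segments])
```

Three gaps produce four segments here, mirroring the reasoning above, and the code does not need to change if the number of gaps does.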

In [None]:
# get the indices where a new time window starts
time_index = df[df['Timestamp'].diff() > 1].index
time_index

In [None]:
# first time window; .copy() makes an independent frame so new columns can be added safely
df_1 = df.iloc[0:time_index[0], :].copy()
df_1.tail()

In [None]:
# second time window
df_2 = df.iloc[time_index[0]:time_index[1], :].copy()
df_2.tail()

In [None]:
# third time window
df_3 = df.iloc[time_index[1]:time_index[2], :].copy()
df_3.head()

In [None]:
# fourth time window
df_4 = df.iloc[time_index[2]:, :].copy()
df_4.tail()

In [100]:
# velocity XYZ

In [101]:
# speed

## Enriching dataset

We will derive additional physical quantities from the available data, such as the acceleration norm, jerk, and velocity.
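For reference, these quantities can be written as follows, with $\mathbf{a} = (a_x, a_y, a_z)$ the accelerometer reading at time $t$. These are the standard kinematic definitions; taking the velocity to be zero at the start of each time window is an assumption.

$$
\lVert \mathbf{a} \rVert = \sqrt{a_x^2 + a_y^2 + a_z^2}, \qquad
\mathbf{j}(t) = \frac{d\mathbf{a}}{dt}, \qquad
\mathbf{v}(t) = \int_{t_0}^{t} \mathbf{a}(\tau)\, d\tau
$$

On sampled data the derivative becomes a finite difference and the integral a cumulative trapezoidal sum. If the sample spacing is left at 1, the results are expressed per sample rather than per second.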

In [None]:
# acceleration norm
dfs = [df_1, df_2, df_3, df_4]
acc_cols = ['AccX', 'AccY', 'AccZ']

for dataframe in dfs:
    # Euclidean norm of the three acceleration components
    dataframe.loc[:, 'Acc'] = np.sqrt((dataframe[acc_cols] ** 2).sum(axis=1))
    # columns are sorted for display only; the stored frame keeps its order
    display(dataframe.reindex(sorted(dataframe.columns), axis=1).head())

In [None]:
# jerk
for dataframe in dfs:
    # np.gradient without a spacing argument assumes unit spacing between samples
    for axis in 'XYZ':
        dataframe.loc[:, f'Jerk{axis}'] = np.gradient(dataframe.loc[:, f'Acc{axis}'])
    jerk_cols = ['JerkX', 'JerkY', 'JerkZ']
    dataframe.loc[:, 'Jerk'] = np.sqrt((dataframe[jerk_cols] ** 2).sum(axis=1))
    # columns are sorted for display only; the stored frame keeps its order
    display(dataframe.reindex(sorted(dataframe.columns), axis=1).head())

In [None]:
# velocity
for dataframe in dfs:
    # cumulative_trapezoid defaults to dx=1; initial=0 keeps the output as long as the input
    for axis in 'XYZ':
        dataframe.loc[:, f'Vel{axis}'] = integrate.cumulative_trapezoid(dataframe.loc[:, f'Acc{axis}'], initial=0)
    vel_cols = ['VelX', 'VelY', 'VelZ']
    dataframe.loc[:, 'Vel'] = np.sqrt((dataframe[vel_cols] ** 2).sum(axis=1))
    # columns are sorted for display only; the stored frame keeps its order
    display(dataframe.reindex(sorted(dataframe.columns), axis=1).head())


In [None]:
# without initial=0, the result has one element fewer than the input
len(integrate.cumulative_trapezoid(df_1.loc[:,'AccX']))

In [None]:
# length of the input column, for comparison
len(df_1.loc[:,'AccX'])
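The two length checks above differ by one because `cumulative_trapezoid` returns one value per interval between samples unless `initial` is given. A small sketch on a three-element array shows the effect; the `dx=0.5` call is only an illustration of passing a sample period, and the value 0.5 is an assumption rather than a property confirmed for this dataset.

```python
import numpy as np
from scipy import integrate

acc = np.array([1.0, 2.0, 3.0])

# One value per interval: the output is one element shorter than the input.
without_initial = integrate.cumulative_trapezoid(acc)

# initial=0 prepends the starting value, so the output matches the input length
# and can be stored as a new column next to the acceleration.
with_initial = integrate.cumulative_trapezoid(acc, initial=0)

# Passing a sample period scales the integral to physical time units
# (0.5 is a made-up spacing for this example).
scaled = integrate.cumulative_trapezoid(acc, dx=0.5, initial=0)

print(without_initial)  # [1.5 4. ]
print(with_initial)     # [0.  1.5 4. ]
print(scaled)           # [0.   0.75 2.  ]
```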