# Help Navigate Robots Kaggle Competition
by Michael Cascio

![](floor-robot.png)

The purpose of this competition was to predict what floor type a robot is traveling on using output from 10 different sensors on the robot. Measurements such as angular velocity, angular acceleration, and robot orientation are reported in each axis. Per the Kaggle competition data summary, the orientation measurements are provided in quaternions and thus must be converted into Euler angles.

For this competition I decided to use the [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) model, part of sklearn.ensemble, and cross validate my results using [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html).

### Importing Required Packages

In [93]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

### Loading Datasets

In [94]:
X_train = pd.read_csv('data/X_train.csv')
y_train = pd.read_csv('data/y_train.csv')
X_test = pd.read_csv('data/X_test.csv')

In [95]:
X_train.head()

Unnamed: 0,row_id,series_id,measurement_number,orientation_X,orientation_Y,orientation_Z,orientation_W,angular_velocity_X,angular_velocity_Y,angular_velocity_Z,linear_acceleration_X,linear_acceleration_Y,linear_acceleration_Z
0,0_0,0,0,-0.75853,-0.63435,-0.10488,-0.10597,0.10765,0.017561,0.000767,-0.74857,2.103,-9.7532
1,0_1,0,1,-0.75853,-0.63434,-0.1049,-0.106,0.067851,0.029939,0.003385,0.33995,1.5064,-9.4128
2,0_2,0,2,-0.75853,-0.63435,-0.10492,-0.10597,0.007275,0.028934,-0.005978,-0.26429,1.5922,-8.7267
3,0_3,0,3,-0.75852,-0.63436,-0.10495,-0.10597,-0.013053,0.019448,-0.008974,0.42684,1.0993,-10.096
4,0_4,0,4,-0.75852,-0.63435,-0.10495,-0.10596,0.005135,0.007652,0.005245,-0.50969,1.4689,-10.441


In [96]:
y_train.head()

Unnamed: 0,series_id,group_id,surface
0,0,13,fine_concrete
1,1,31,concrete
2,2,20,concrete
3,3,31,concrete
4,4,22,soft_tiles


### Feature Generation
Custom functions are created to transform the input dataframes to include a more robust set of descriptive features describing the robot's movement. The quaternion_to_euler formula was found [here](https://stackoverflow.com/questions/53033620/how-to-convert-euler-angles-to-quaternions-and-get-the-same-euler-angles-back-fr?rq=1). Total angular velocity and acceleration are calculated by the root-mean-square of each component angular velocity and acceleration. Statistical properties of readings are calculated by 'series id', denoting the measurement series each measurement is a part of. 

In [101]:
from scipy.stats import kurtosis
from scipy.stats import skew
def quaternion_to_euler(x, y, z, w):
    import math
    t0 = +2.0 * (w * x + y * z)
    t1 = +1.0 - 2.0 * (x * x + y * y)
    X = math.atan2(t0, t1)

    t2 = +2.0 * (w * y - z * x)
    t2 = +1.0 if t2 > +1.0 else t2
    t2 = -1.0 if t2 < -1.0 else t2
    Y = math.asin(t2)

    t3 = +2.0 * (w * z + x * y)
    t4 = +1.0 - 2.0 * (y * y + z * z)
    Z = math.atan2(t3, t4)

    return X, Y, Z

def generate_features(data):
    new_data = pd.DataFrame()
    data['total_angular_velocity'] = (data['angular_velocity_X'] ** 2 + data['angular_velocity_Y'] ** 2 + data['angular_velocity_Z'] ** 2) ** 0.5
    data['total_linear_acceleration'] = (data['linear_acceleration_X'] ** 2 + data['linear_acceleration_Y'] ** 2 + data['linear_acceleration_Z'] ** 2) ** 0.5
    
    data['acc_vs_vel'] = data['total_linear_acceleration'] / data['total_angular_velocity']
    
    x, y, z, w = data['orientation_X'].tolist(), data['orientation_Y'].tolist(), data['orientation_Z'].tolist(), data['orientation_W'].tolist()
    nx, ny, nz = [], [], []
    for i in range(len(x)):
        xx, yy, zz = quaternion_to_euler(x[i], y[i], z[i], w[i])
        nx.append(xx)
        ny.append(yy)
        nz.append(zz)
    
    data['euler_x'] = nx
    data['euler_y'] = ny
    data['euler_z'] = nz
    
    data['total_angle'] = (data['euler_x'] ** 2 + data['euler_y'] ** 2 + data['euler_z'] ** 2) ** 5
    data['angle_vs_acc'] = data['total_angle'] / data['total_linear_acceleration']
    data['angle_vs_vel'] = data['total_angle'] / data['total_angular_velocity']
    
    def mean_change_of_abs_change(x):
        return np.mean(np.diff(np.abs(np.diff(x))))

    def mean_abs_change(x):
        return np.mean(np.abs(np.diff(x)))
    
    for col in data.columns:
        if col in ['row_id', 'series_id', 'measurement_number']:
            continue
        new_data[col + '_mean'] = data.groupby(['series_id'])[col].mean()
        new_data[col + '_min'] = data.groupby(['series_id'])[col].min()
        new_data[col + '_max'] = data.groupby(['series_id'])[col].max()
        new_data[col + '_std'] = data.groupby(['series_id'])[col].std()
        new_data[col + '_max_to_min'] = new_data[col + '_max'] / new_data[col + '_min']
        new_data[col + '_kurtosis'] = data.groupby('series_id')[col].apply(lambda x: kurtosis(x))
        new_data[col + '_skew'] = data.groupby('series_id')[col].apply(lambda x: skew(x))
        
        # 1st order derivative
        new_data[col + '_mean_abs_change'] = data.groupby('series_id')[col].apply(mean_abs_change)
        
        # 2nd order derivative
        new_data[col + '_mean_change_of_abs_change'] = data.groupby('series_id')[col].apply(mean_change_of_abs_change)
        
        new_data[col + '_abs_max'] = data.groupby('series_id')[col].apply(lambda x: np.max(np.abs(x)))
        new_data[col + '_abs_min'] = data.groupby('series_id')[col].apply(lambda x: np.min(np.abs(x)))

    return new_data

In [102]:
X_train = generate_features(X_train)
X_test = generate_features(X_test)

In [103]:
X_train.iloc[:]

Unnamed: 0_level_0,orientation_X_mean,orientation_X_min,orientation_X_max,orientation_X_std,orientation_X_max_to_min,orientation_X_kurtosis,orientation_X_skew,orientation_X_mean_abs_change,orientation_X_mean_change_of_abs_change,orientation_X_abs_max,...,angle_vs_vel_min,angle_vs_vel_max,angle_vs_vel_std,angle_vs_vel_max_to_min,angle_vs_vel_kurtosis,angle_vs_vel_skew,angle_vs_vel_mean_abs_change,angle_vs_vel_mean_change_of_abs_change,angle_vs_vel_abs_max,angle_vs_vel_abs_min
series_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,-0.758666,-0.759530,-0.758220,0.000363,0.998275,-0.667740,-0.651333,0.000015,2.380952e-07,0.759530,...,6.255962e+05,9.597909e+06,1.409996e+06,15.342019,9.255191,2.551964,1.198969e+06,8.924001e+03,9.597909e+06,6.255962e+05
1,-0.958606,-0.958960,-0.958370,0.000151,0.999385,-0.664664,-0.392618,0.000023,-4.761905e-07,0.958960,...,1.377975e+05,1.957088e+06,2.979271e+05,14.202636,8.430262,2.545027,2.049951e+05,5.691644e+02,1.957088e+06,1.377975e+05
2,-0.512057,-0.514340,-0.509440,0.001377,0.990473,-1.058422,0.150184,0.000041,0.000000e+00,0.514340,...,1.674839e+06,2.866679e+07,3.802655e+06,17.116144,11.068951,2.782992,2.355426e+06,-9.135272e+03,2.866679e+07,1.674839e+06
3,-0.939169,-0.939680,-0.938840,0.000227,0.999106,-1.082946,-0.094976,0.000026,-6.349206e-07,0.939680,...,8.652378e+04,3.121275e+06,4.627398e+05,36.074189,14.109422,3.367755,2.981020e+05,1.741240e+04,3.121275e+06,8.652378e+04
4,-0.891301,-0.896890,-0.886730,0.002955,0.988672,-1.167403,-0.224035,0.000080,7.936508e-08,0.896890,...,2.330043e+05,6.165090e+05,9.199986e+04,2.645912,-0.316085,0.679229,2.281891e+04,-7.079317e+01,6.165090e+05,2.330043e+05
5,0.464712,0.464030,0.465450,0.000315,1.003060,-0.560020,0.163822,0.000045,4.761905e-07,0.465450,...,1.203089e+06,4.362511e+07,4.101916e+06,36.260923,65.857368,7.232989,2.192300e+06,-1.406523e+04,4.362511e+07,1.203089e+06
6,-0.402356,-0.405750,-0.398560,0.002120,0.982280,-1.203271,0.085438,0.000057,3.968254e-07,0.405750,...,4.454190e+06,1.872495e+07,2.471369e+06,4.203895,6.741335,2.278149,1.324621e+06,-2.963461e+04,1.872495e+07,4.454190e+06
7,-0.925909,-0.926190,-0.925590,0.000136,0.999352,-0.288917,0.293780,0.000026,-7.936508e-08,0.926190,...,9.303946e+04,1.090929e+06,2.168889e+05,11.725439,0.429783,0.862106,1.326502e+05,-6.212771e+02,1.090929e+06,9.303946e+04
8,0.012041,0.009159,0.018242,0.002931,1.991746,-0.949675,0.677614,0.000073,-6.333333e-07,0.018242,...,1.552970e+07,2.834830e+08,4.900011e+07,18.254243,8.263136,2.732803,1.184448e+07,2.278509e+04,2.834830e+08,1.552970e+07
9,0.789137,0.751010,0.824630,0.021568,1.098028,-1.202663,-0.078165,0.000580,7.936508e-07,0.824630,...,8.760919e+04,1.400625e+05,1.114292e+04,1.598719,-0.383982,0.242472,4.764647e+03,-2.193543e+01,1.400625e+05,8.760919e+04


In [104]:
y_train.head()

Unnamed: 0,series_id,group_id,surface
0,0,13,fine_concrete
1,1,31,concrete
2,2,20,concrete
3,3,31,concrete
4,4,22,soft_tiles


### Model Training and Cross Validation
Starting by encoding the categorical y_train 'surface' types.

In [105]:
label_encoder = LabelEncoder()
y_train['surface'] = label_encoder.fit_transform(y_train['surface'])

StratifiedKFold cross validation is specified.

In [106]:
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

Creating 'out of fold' and submission prediction placeholders and running the RandomForestClassifier for each KFold.

In [107]:
submission_predictions = np.zeros((X_test.shape[0], 9))
oof_predictions = np.zeros((X_train.shape[0]))
score = 0
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train, y_train['surface'])):
    clf =  RandomForestClassifier(n_estimators = 1000, n_jobs = -1)
    clf.fit(X_train.iloc[trn_idx], y_train['surface'][trn_idx])
    oof_predictions[val_idx] = clf.predict(X_train.iloc[val_idx])
    submission_predictions += clf.predict_proba(X_test) / folds.n_splits
    score += clf.score(X_train.iloc[val_idx], y_train['surface'][val_idx])
    print('Fold: {} score: {}'.format(fold_,clf.score(X_train.iloc[val_idx], y_train['surface'][val_idx])))
print('Avg Accuracy', score / folds.n_splits)

Fold: 0 score: 0.9324675324675324
Fold: 1 score: 0.8958333333333334
Fold: 2 score: 0.9190600522193212
Fold: 3 score: 0.9267015706806283
Fold: 4 score: 0.9081364829396326
Fold: 5 score: 0.910761154855643
Fold: 6 score: 0.905511811023622
Fold: 7 score: 0.8839050131926122
Fold: 8 score: 0.8783068783068783
Fold: 9 score: 0.901595744680851
Avg Accuracy 0.9062279573700055


### Submission Finalization
For the final deliverable, the encoded surface labels need to be transformed back to the original categorical values.

In [109]:
submission = pd.read_csv('data/sample_submission.csv')
submission['surface'] = label_encoder.inverse_transform(submission_predictions.argmax(axis=1))
submission.to_csv('submission.csv', index=False)
submission.head()

Unnamed: 0,series_id,surface
0,0,hard_tiles_large_space
1,1,concrete
2,2,tiled
3,3,carpet
4,4,soft_tiles
