# Introduction

## Data Set Information

The donation includes five datasets, each defining a different learning problem:

* LP1: failures in approach to grasp position
* LP2: failures in transfer of a part
* LP3: position of part after a transfer failure
* LP4: failures in approach to ungrasp position
* LP5: failures in motion with part

## Attribute Information

All features are numeric, and every value is an integer. Each feature is a force or torque measurement taken after failure detection; each failure instance is characterized by **15 force/torque samples** collected at regular time intervals starting immediately after failure detection. *The total observation window for each failure instance was 315 ms*.

Each example is described as follows:

**class**\
Fx1 Fy1 Fz1 Tx1 Ty1 Tz1\
Fx2 Fy2 Fz2 Tx2 Ty2 Tz2\
......\
Fx15 Fy15 Fz15 Tx15 Ty15 Tz15

where Fx1 ... Fx15 is the evolution of force Fx over the observation window, and likewise for Fy, Fz and the torques Tx, Ty, Tz; this gives a **total of 90 features** (15 samples × 6 measurements).
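As a quick check on the layout described above, the following minimal sketch builds the 90 feature names in file order. The names `CHANNELS`, `N_SAMPLES` and `feature_names` are illustrative choices, not part of the dataset. Note that the parsing code later in this notebook numbers samples from 0 rather than 1.

```python
# Build the 90 feature names in file order: the sample index varies slowest,
# the six channels (Fx, Fy, Fz, Tx, Ty, Tz) vary fastest.
CHANNELS = ["Fx", "Fy", "Fz", "Tx", "Ty", "Tz"]  # illustrative constant
N_SAMPLES = 15                                   # samples per failure instance

feature_names = [f"{ch}{t}" for t in range(1, N_SAMPLES + 1) for ch in CHANNELS]

print(len(feature_names))   # 90
print(feature_names[:6])    # ['Fx1', 'Fy1', 'Fz1', 'Tx1', 'Ty1', 'Tz1']
print(feature_names[-1])    # Tz15
```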

# Imports

In [1]:
# Linear Algebra
import numpy as np

# Data Processing
import pandas as pd
from glob import glob
import re

# Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno

# Algorithms
from pycaret.classification import (
    setup, compare_models, create_model, tune_model, plot_model, predict_model
)

# Stats
from scipy import stats

from data_processing import process_data

# Set random seed for reproducibility
np.random.seed(0)
sns.set(style="whitegrid")  # Stylises graphs

In [2]:
import sys
import matplotlib
import scipy
import sklearn

print(f"Python version: {sys.version}")
print(f"pandas version: {pd.__version__}")
print(f"matplotlib version: {matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"SciPy version: {scipy.__version__}") 
print(f"scikit-learn version: {sklearn.__version__}")
print(f"Seaborn version: {sns.__version__}")

Python version: 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
pandas version: 1.1.0
matplotlib version: 3.3.0
NumPy version: 1.18.5
SciPy version: 1.5.4
scikit-learn version: 0.23.2
Seaborn version: 0.10.1


# Reading in the Data

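Each file stores one instance per block: a class label on its own line, followed by 15 tab-indented rows of six readings (Fx, Fy, Fz, Tx, Ty, Tz). Blank lines separate instances: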
```
normal
	-1	-1	63	-3	-1	0
	0	0	62	-3	-1	0
	-1	-1	61	-3	0	0
	-1	-1	63	-2	-1	0
	-1	-1	63	-3	-1	0
	-1	-1	63	-3	-1	0
	-1	-1	63	-3	0	0
	-1	-1	63	-3	-1	0
	-1	-1	63	-3	-1	0
	-1	-1	61	-3	0	0
	-1	-1	61	-3	0	0
	-1	-1	64	-3	-1	0
	-1	-1	64	-3	-1	0
	-1	-1	60	-3	0	0
	-1	0	64	-2	-1	0


normal
	-1	-1	63	-2	-1	0
	-1	-1	63	-3	-1	0
	-1	-1	61	-3	0	0
	0	-4	63	1	0	0
	0	-1	59	-2	0	-1
	-3	3	57	-8	-3	-1
	-1	3	70	-10	-2	-1
	0	-3	61	0	0	0
	0	-2	53	-1	-2	0
	0	-3	66	1	4	0
	-3	3	58	-10	-5	0
	-1	-1	66	-4	-2	0
	-1	-2	67	-3	-1	0
	0	1	66	-6	-3	-1
	-1	-1	59	-3	-4	0
```
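The layout above can be parsed in a few lines. The sketch below is only an illustration: `parse_windows` is a hypothetical helper, not the `process_data` function imported from the local `data_processing` module, whose implementation is not shown in this notebook. The sample string is shortened to two rows per instance, and its second instance is made-up toy data.

```python
import re

def parse_windows(text):
    """Parse raw LP-file text into a list of (label, values) pairs.

    Hypothetical helper for illustration; not the notebook's `process_data`.
    """
    instances = []
    # Instances are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        label = lines[0].strip()
        # The remaining lines each hold six whitespace-separated integers.
        values = [int(tok) for ln in lines[1:] for tok in ln.split()]
        instances.append((label, values))
    return instances

# Shortened sample: two rows per instance instead of 15; second instance is toy data.
sample = (
    "normal\n\t-1\t-1\t63\t-3\t-1\t0\n\t0\t0\t62\t-3\t-1\t0\n"
    "\n\n"
    "collision\n\t-2\t0\t10\t1\t5\t0\n\t3\t1\t9\t0\t4\t-1\n"
)
parsed = parse_windows(sample)
print(parsed[0])  # ('normal', [-1, -1, 63, -3, -1, 0, 0, 0, 62, -3, -1, 0])
```

Splitting on whitespace rather than on `'\t'` avoids the empty tokens produced by each row's leading tab.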

In [29]:
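dfs = []

# glob returns files in arbitrary order; sort so that dfs[0]..dfs[4]
# line up with LP1..LP5 (assumes the files are named lp1.data ... lp5.data)
for file in sorted(glob('./Dataset/*.data')):
    dfs.append(process_data(file))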

In [17]:
df_LP1 = dfs[0]
df_LP2 = dfs[1]
df_LP3 = dfs[2]
df_LP4 = dfs[3]
df_LP5 = dfs[4]
del dfs

In [6]:
# Read the raw text of a single file (LP1) to inspect its structure
with open('./Dataset/lp1.data') as f:
    windows = f.read()

windows[:50]

'normal\n\t-1\t-1\t63\t-3\t-1\t0\n\t0\t0\t62\t-3\t-1\t0\n\t-1\t-1\t61'

In [7]:
windows = windows.split('\n\n')
print(f'Number of windows: {len(windows)}')

Number of windows: 88


In [8]:
windows[0]

'normal\n\t-1\t-1\t63\t-3\t-1\t0\n\t0\t0\t62\t-3\t-1\t0\n\t-1\t-1\t61\t-3\t0\t0\n\t-1\t-1\t63\t-2\t-1\t0\n\t-1\t-1\t63\t-3\t-1\t0\n\t-1\t-1\t63\t-3\t-1\t0\n\t-1\t-1\t63\t-3\t0\t0\n\t-1\t-1\t63\t-3\t-1\t0\n\t-1\t-1\t63\t-3\t-1\t0\n\t-1\t-1\t61\t-3\t0\t0\n\t-1\t-1\t61\t-3\t0\t0\n\t-1\t-1\t64\t-3\t-1\t0\n\t-1\t-1\t64\t-3\t-1\t0\n\t-1\t-1\t60\t-3\t0\t0\n\t-1\t0\t64\t-2\t-1\t0'

In [9]:
data = {
    'classification': [],
    'window_data': []
}

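for window in windows:
    # The label is the first non-empty line; it may contain underscores (e.g. 'fr_collision')
    classification = re.findall(r'^\n?([a-zA-Z_]+)\n', window)[0]
    # Drop the label, then split the remaining lines into tab-separated tokens
    window = window.replace(classification, '')
    window = [line.split('\t') for line in window.split('\n') if line != '']
    # Flatten to one vector, discard the empty token left by each leading tab, cast to int
    window = np.ravel(window)
    window = np.array([v for v in window if v != ''])
    window = window.astype(int)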
    
    data['classification'].append(classification)
    data['window_data'].append(window)

In [10]:
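# Sample indices are zero-based here (Fx0 ... Tz14), whereas the dataset
# description counts from 1 (Fx1 ... Tz15)
column_names = [
    [f'Fx{i}', f'Fy{i}', f'Fz{i}', f'Tx{i}', f'Ty{i}', f'Tz{i}']
    for i in range(15)
]
column_names = np.ravel(column_names)
column_names = np.append(column_names, ['label'])

# Append each label to its feature vector; np.append coerces the whole row to strings
data = [
    np.append(window_data, classification)
    for classification, window_data in zip(data['classification'], data['window_data'])
]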

In [11]:
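df = pd.DataFrame(
    data,
    columns=column_names
)

# Each row was built as a string array, so cast the 90 feature columns back to int
feature_cols = df.columns.drop('label')
df[feature_cols] = df[feature_cols].astype(int)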

In [12]:
df.head()

Unnamed: 0,Fx0,Fy0,Fz0,Tx0,Ty0,Tz0,Fx1,Fy1,Fz1,Tx1,Ty1,Tz1,Fx2,Fy2,Fz2,Tx2,Ty2,Tz2,Fx3,Fy3,Fz3,Tx3,Ty3,Tz3,Fx4,Fy4,Fz4,Tx4,Ty4,Tz4,Fx5,Fy5,Fz5,Tx5,Ty5,Tz5,Fx6,Fy6,Fz6,Tx6,Ty6,Tz6,Fx7,Fy7,Fz7,Tx7,Ty7,Tz7,Fx8,Fy8,Fz8,Tx8,Ty8,Tz8,Fx9,Fy9,Fz9,Tx9,Ty9,Tz9,Fx10,Fy10,Fz10,Tx10,Ty10,Tz10,Fx11,Fy11,Fz11,Tx11,Ty11,Tz11,Fx12,Fy12,Fz12,Tx12,Ty12,Tz12,Fx13,Fy13,Fz13,Tx13,Ty13,Tz13,Fx14,Fy14,Fz14,Tx14,Ty14,Tz14,label
0,-1,-1,63,-3,-1,0,0,0,62,-3,-1,0,-1,-1,61,-3,0,0,-1,-1,63,-2,-1,0,-1,-1,63,-3,-1,0,-1,-1,63,-3,-1,0,-1,-1,63,-3,0,0,-1,-1,63,-3,-1,0,-1,-1,63,-3,-1,0,-1,-1,61,-3,0,0,-1,-1,61,-3,0,0,-1,-1,64,-3,-1,0,-1,-1,64,-3,-1,0,-1,-1,60,-3,0,0,-1,0,64,-2,-1,0,normal
1,-1,-1,63,-2,-1,0,-1,-1,63,-3,-1,0,-1,-1,61,-3,0,0,0,-4,63,1,0,0,0,-1,59,-2,0,-1,-3,3,57,-8,-3,-1,-1,3,70,-10,-2,-1,0,-3,61,0,0,0,0,-2,53,-1,-2,0,0,-3,66,1,4,0,-3,3,58,-10,-5,0,-1,-1,66,-4,-2,0,-1,-2,67,-3,-1,0,0,1,66,-6,-3,-1,-1,-1,59,-3,-4,0,normal
2,-1,0,57,-5,-3,0,0,-3,63,-1,0,0,-1,1,51,-4,-1,-1,-1,-2,68,-2,-2,0,-1,-1,65,-6,1,0,0,0,61,-5,-2,0,-1,1,61,-6,0,-1,0,-3,57,3,-4,0,-1,-1,59,-4,-4,0,1,-3,65,-1,1,0,-1,2,64,-7,-2,0,-1,1,66,-7,-3,-1,-1,0,61,-5,-5,0,-1,0,65,-6,-2,-1,-1,0,54,-4,-3,0,normal
3,0,-1,59,-2,-1,-1,0,-3,61,-1,2,0,-2,1,56,-6,-3,0,1,-3,64,-1,4,0,-1,1,62,-7,1,-1,-1,0,60,-9,-5,-1,1,1,56,-5,0,0,1,-1,66,-4,2,1,-2,5,64,-15,-2,0,-1,2,58,-8,-4,0,0,1,70,-9,-2,-1,-1,1,64,-8,-6,-1,0,-1,67,-6,0,-1,0,-2,63,-4,0,0,-1,1,63,-8,-2,0,normal
4,0,-2,65,-4,-2,0,-1,-2,56,-5,-3,0,0,0,58,-9,-1,0,-1,-1,56,-5,-3,0,-2,3,57,-12,-4,-1,-1,-2,65,-5,-2,0,-1,2,56,-9,-5,0,2,-2,60,-2,3,1,0,1,67,-9,-2,1,-1,2,60,-10,-5,0,0,-3,63,-3,-1,0,-1,-1,73,-8,-5,0,-1,0,57,-7,-4,-1,-1,0,59,-8,-4,-1,-1,1,57,-9,-4,-1,normal
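Because each row is flattened sample by sample, it can be reshaped into a 15 × 6 array to recover one time series per channel. The sketch below uses the first instance shown above; the variable names are illustrative.

```python
import numpy as np

# First LP1 instance from the table above: 15 samples x 6 channels, flattened.
row = np.array([
    -1, -1, 63, -3, -1, 0,
     0,  0, 62, -3, -1, 0,
    -1, -1, 61, -3,  0, 0,
    -1, -1, 63, -2, -1, 0,
    -1, -1, 63, -3, -1, 0,
    -1, -1, 63, -3, -1, 0,
    -1, -1, 63, -3,  0, 0,
    -1, -1, 63, -3, -1, 0,
    -1, -1, 63, -3, -1, 0,
    -1, -1, 61, -3,  0, 0,
    -1, -1, 61, -3,  0, 0,
    -1, -1, 64, -3, -1, 0,
    -1, -1, 64, -3, -1, 0,
    -1, -1, 60, -3,  0, 0,
    -1,  0, 64, -2, -1, 0,
])

# Rows are time steps; columns are Fx, Fy, Fz, Tx, Ty, Tz.
window = row.reshape(15, 6)
fz = window[:, 2]  # evolution of Fz over the observation window

print(window.shape)  # (15, 6)
print(fz.tolist())   # [63, 62, 61, 63, 63, 63, 63, 63, 63, 61, 61, 64, 64, 60, 64]
```

This view is convenient for plotting or summarising each force and torque channel separately.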


# Data Cleaning

## Null Values

In [13]:
print(f'Number of missing values: {df.isnull().sum().sum()}')

Number of missing values: 0


## Duplicates

In [14]:
print(f"Duplicates: {df.duplicated().sum()}")

Duplicates: 0


# EDA

In [15]:
print(f"Types of errors: {list(df['label'].unique())}")

Types of errors: ['normal', 'collision', 'obstruction', 'fr_collision']
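Listing the unique labels does not show how balanced the classes are, which may matter when cross-validating on only 88 instances. A minimal sketch of a class-count check follows. It uses toy labels, so the counts are not the real LP1 distribution; on the real data the same calls would be applied to `df['label']`.

```python
import pandas as pd

# Toy labels only; these counts are NOT the real LP1 class distribution.
toy = pd.DataFrame({
    "label": ["normal", "normal", "collision", "obstruction", "normal", "fr_collision"]
})

counts = toy["label"].value_counts()               # absolute count per class
share = toy["label"].value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(share.round(2))
```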


# PyCaret

In [None]:
# session_id sets PyCaret's own random seed for reproducible results
clf = setup(data=df, target='label', fold=2, session_id=0)

In [None]:
best = compare_models()

# XGBoost

In [None]:
xgb = create_model('xgboost')

In [None]:
tuned_xgb = tune_model(xgb)

In [None]:
plot_model(tuned_xgb, plot='confusion_matrix')

In [None]:
plot_model(tuned_xgb, plot='class_report')

In [None]:
plot_model(tuned_xgb, plot='error')

## Predict on Test / Hold-out Sample

In [None]:
predict_model(tuned_xgb)