<div>
<img src='pics/banner.PNG'/>
</div>
<div>
<img width="20%" src='pics/pandas.svg'/>
</div>
<div>
<img width="15%" src='pics/tinlab.png'/>
<strong>EDA Torcs logging - Jeroen Boogaard</strong>
</div>

# Exploratory Data Analysis

## Imports

In [None]:
from pathlib import Path
from sklearn import linear_model
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline     
sns.set(color_codes=True)

### Global variables

In [None]:
csvFile = 'aalborg_2024-05-24.csv'
csvFilePath = Path.cwd().parent.joinpath('csv').joinpath(csvFile)

## Data Collection

In [None]:
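# Import data; the log is semicolon-separated with a header row
dfCollected = pd.read_csv(csvFilePath, sep=";", header=0)
# round() returns a new DataFrame, so the result must be assigned
dfCollected = dfCollected.round(2)
dfCollected.shape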
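A minimal sketch of the two pandas calls above on a hypothetical two-row excerpt (the values are invented for illustration): `read_csv` with `sep=";"` splits the semicolon-separated fields, and `DataFrame.round` returns a new frame instead of modifying the original in place.

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt in the same layout as the Torcs log:
# semicolon-separated fields, header in the first row
sample = StringIO(
    "s_speed_x;s_rpm\n"
    "12.3456;4012.789\n"
    "15.6789;4520.123\n"
)
df = pd.read_csv(sample, sep=";", header=0)

# round() leaves df untouched and returns a rounded copy
rounded = df.round(2)
print(df.loc[0, "s_speed_x"])       # unrounded value
print(rounded.loc[0, "s_speed_x"])  # rounded to 2 decimals
```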

## Data Inspection

**Display the first 5 rows**

In [None]:
dfCollected.head()

**Display the last 5 rows**

In [None]:
dfCollected.tail()

**Show data types**

In [None]:
dfCollected.info()

**Show basic statistics**

In [None]:
dfCollected.describe()
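The non-null counts from `info()` and the `std` row from `describe()` can also be checked programmatically. The sketch below uses a hypothetical toy frame standing in for `dfCollected`; it counts missing values per column and lists constant columns, which carry no information for the later analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for dfCollected
df = pd.DataFrame({
    "s_speed_x": [10.0, 12.5, np.nan, 14.0],
    "s_gear": [1, 1, 1, 1],
    "s_rpm": [3000.0, 3500.0, 4000.0, np.nan],
})

# Number of missing values per column
missing = df.isna().sum()

# Columns with at most one distinct (non-missing) value
constant = [col for col in df.columns if df[col].nunique(dropna=True) <= 1]

print(missing.tolist())  # [1, 0, 1]
print(constant)          # ['s_gear']
```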

## Data Selection

**Drop all rows logged during the first lap**

In [None]:
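# True for rows logged after the first lap has been completed
selection = dfCollected['s_distance_raced'] > dfCollected['s_distance_from_start']
display(selection.value_counts())
# Filter first, then copy, so only the selected rows are copied
dfSelected = dfCollected[selection].copy()
dfSelected.shape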
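A sketch of this selection on a hypothetical log, assuming that `s_distance_raced` keeps growing over the whole race while `s_distance_from_start` restarts at 0 each time the car crosses the start line. The track length of 100 m and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical log on a 100 m track: during the first lap both distances
# are equal (assumption); afterwards the distance raced is larger
log = pd.DataFrame({
    "s_distance_raced": [10.0, 60.0, 110.0, 160.0],
    "s_distance_from_start": [10.0, 60.0, 10.0, 60.0],
})

selection = log["s_distance_raced"] > log["s_distance_from_start"]
afterFirstLap = log[selection].copy()

print(afterFirstLap.index.tolist())  # [2, 3]
```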

## Feature Selection

<p>
    Feature names are prefixed by their category:
    <ul>
        <li><b>m</b>eta</li>
        <li><b>s</b>ensor</li>
        <li><b>a</b>ction</li>
    </ul>        
</p>
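Because of this prefix convention, the columns of one category can be selected by name. A minimal sketch with `DataFrame.filter` and a regular expression; the column names `m_timestamp`, `a_accel` and `a_steering` are hypothetical examples of the convention.

```python
import pandas as pd

# Hypothetical column names following the m_/s_/a_ prefix convention
df = pd.DataFrame(columns=["m_timestamp", "s_speed_x", "s_rpm", "a_accel", "a_steering"])

# Select columns whose name starts with the given prefix
sensorCols = df.filter(regex=r"^s_").columns.tolist()
actionCols = df.filter(regex=r"^a_").columns.tolist()

print(sensorCols)  # ['s_speed_x', 's_rpm']
print(actionCols)  # ['a_accel', 'a_steering']
```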

In [None]:
features = [
    's_speed_x',
    's_speed_y',
    's_rpm',     
    's_gear',  
    's_angle',
    's_track_position',
    's_distance_raced',
    's_plus5_degrees2caraxis',
    's_parallel2caraxis',
    's_min5_degrees2caraxis',
]

dfFeatures = dfSelected.loc[:, features].copy()
dfFeatures.shape

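**Combine features**

<p>The three sensors at +5, 0 and -5 degrees relative to the car axis are each scaled to [0, 1] with <i>MinMaxScaler</i> and summed into a single feature, <i>c_caraxis</i>.</p>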

In [None]:
caraxisFeatures = [
    's_plus5_degrees2caraxis',
    's_parallel2caraxis',
    's_min5_degrees2caraxis'
]

# Scale each sensor column to [0, 1], then sum the three scaled readings per row
caraxisMatrix = MinMaxScaler().fit_transform(dfFeatures.loc[:, caraxisFeatures].to_numpy())
c_caraxis = caraxisMatrix.sum(axis=1)
display(c_caraxis)
c_caraxis.shape
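`MinMaxScaler` maps each column to `(x - min) / (max - min)` using that column's observed minimum and maximum. The sketch below, on hypothetical readings, checks this against a manual computation and shows that the sum of three scaled columns lies in [0, 3].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical readings of three sensors (one column per sensor)
readings = np.array([
    [20.0, 50.0, 30.0],
    [40.0, 100.0, 60.0],
    [60.0, 150.0, 90.0],
])

scaled = MinMaxScaler().fit_transform(readings)

# Same scaling computed by hand, column by column
colMin = readings.min(axis=0)
colMax = readings.max(axis=0)
manual = (readings - colMin) / (colMax - colMin)
assert np.allclose(scaled, manual)

# One combined value per row
combined = scaled.sum(axis=1)
print(np.round(combined, 6).tolist())  # [0.0, 1.5, 3.0]
```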

**Likewise, combine rpm and gear, here as the product of their scaled values**

In [None]:
speedFeatures = [
    's_rpm',
    's_gear',  
]

speedMatrix = MinMaxScaler().fit_transform( dfFeatures.loc[:, speedFeatures].to_numpy() )
c_speed = speedMatrix[:, 0] * speedMatrix[:, 1]
display(c_speed)
c_speed.shape
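Unlike the sum used for <i>c_caraxis</i>, a product is high only when both scaled inputs are high. A sketch on hypothetical (rpm, gear) samples makes one consequence visible: a sample at maximum rpm in the lowest observed gear also scores 0, because the scaled gear is 0 there.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical (rpm, gear) samples
samples = np.array([
    [1000.0, 1.0],
    [5000.0, 1.0],
    [3000.0, 3.0],
    [5000.0, 5.0],
])

scaled = MinMaxScaler().fit_transform(samples)

# Zero whenever either rpm or gear is at its observed minimum
combinedSpeed = scaled[:, 0] * scaled[:, 1]
print(np.round(combinedSpeed, 6).tolist())  # [0.0, 0.0, 0.25, 1.0]
```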

**Replace related features by the combined feature**

In [None]:
dfReduced = dfFeatures.copy()

idxCarAxisFeatures = dfReduced.columns.get_loc( caraxisFeatures[0] )
dfReduced = dfReduced.drop(caraxisFeatures, axis=1)
dfReduced.insert(idxCarAxisFeatures, "c_caraxis", c_caraxis )

idxSpeedFeatures = dfReduced.columns.get_loc( speedFeatures[0] )
dfReduced = dfReduced.drop(speedFeatures, axis=1)
dfReduced.insert(idxSpeedFeatures, "c_speed", c_speed )

display(dfReduced)
dfReduced.shape
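The drop-and-insert pattern above is applied twice, so it could be wrapped in a function. `replace_with_combined` below is a hypothetical helper, not part of pandas. It takes the position of the leftmost dropped column, so the combined feature lands where the group was even if the columns are not listed left to right.

```python
import numpy as np
import pandas as pd


def replace_with_combined(df, columns, name, values):
    """Hypothetical helper: drop `columns` and insert `values` as column
    `name` at the position of the leftmost dropped column."""
    position = min(df.columns.get_loc(col) for col in columns)
    result = df.drop(columns=columns)
    result.insert(position, name, values)
    return result


df = pd.DataFrame({
    "s_angle": [0.1, 0.2],
    "s_rpm": [3000.0, 4000.0],
    "s_gear": [2, 3],
    "s_track_position": [0.0, 0.5],
})
reduced = replace_with_combined(df, ["s_rpm", "s_gear"], "c_speed", np.array([0.2, 0.8]))
print(reduced.columns.tolist())  # ['s_angle', 'c_speed', 's_track_position']
```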

In [None]:
x = dfReduced['s_speed_x']
y = dfReduced['c_speed']
plt.scatter(x, y)
plt.xlabel('s_speed_x')
plt.ylabel('c_speed')

<p>The scatter plot shows that <i>c_speed</i> closely tracks <i>s_speed_x</i>, which makes <i>s_speed_x</i> redundant; it is therefore dropped.</p>
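The visual judgement can be backed by a number. A sketch using Spearman's rank correlation, computed here as the Pearson correlation of the ranks; values close to 1 indicate a monotonic relation. The two series are hypothetical stand-ins for `dfReduced['s_speed_x']` and `dfReduced['c_speed']`.

```python
import pandas as pd

# Hypothetical values standing in for the two columns
speedX = pd.Series([5.0, 20.0, 40.0, 60.0, 80.0])
cSpeed = pd.Series([0.01, 0.10, 0.30, 0.55, 0.90])

# Spearman's rho = Pearson correlation of the (average) ranks
rho = speedX.rank().corr(cSpeed.rank())
print(round(rho, 6))  # 1.0
```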

In [None]:
dfReduced = dfReduced.drop('s_speed_x', axis=1)
display(dfReduced)

## Data Visualization

**Display correlations**

In [None]:
corFeatures = [
    's_speed_y',
    'c_speed',
    's_angle',
    's_track_position',   
    'c_caraxis',
]

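dfCorrelation = dfReduced.loc[:, corFeatures]
corrMatrix = dfCorrelation.corr()
# Fix the color scale to [-1, 1] so that equal colors mean equal correlations
sns.heatmap(corrMatrix, annot=True, vmin=-1, vmax=1, center=0, cmap='coolwarm')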
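With more features, reading pairs off the heatmap gets tedious. A sketch of a hypothetical helper `correlated_pairs` that lists every feature pair whose absolute correlation reaches a threshold (0.8 is an arbitrary choice), using only the upper triangle so that each pair appears once. The toy frame is invented; `s_speed_y` is built as exactly twice `c_speed`.

```python
import numpy as np
import pandas as pd


def correlated_pairs(corr, threshold=0.8):
    """Hypothetical helper: (feature, feature, r) for every pair in the
    upper triangle of `corr` with |r| >= threshold."""
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    stacked = corr.where(upper).stack()
    strong = stacked[stacked.abs() >= threshold]
    return [(a, b, round(float(r), 3)) for (a, b), r in strong.items()]


df = pd.DataFrame({
    "c_speed": [0.1, 0.4, 0.5, 0.9],
    "s_speed_y": [0.2, 0.8, 1.0, 1.8],  # exactly 2 * c_speed
    "s_angle": [0.3, -0.1, 0.2, 0.0],
})
print(correlated_pairs(df.corr()))  # [('c_speed', 's_speed_y', 1.0)]
```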