# PathPilot ML training
##### Author: [Joseph Selva Raj]

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import xgboost as xgb
from sklearn.metrics import accuracy_score, classification_report
from micromlgen import port

### Loading the data

Load the LIDAR measurement data from a text file. Each row holds 360 readings (columns 0 to 359) followed by a command label in column 360.

In [2]:
data = pd.read_csv('C:\\Users\\josep\\Documents\\Github Repo\\PathPilot\\PathPilot\\MASTERDATA360.TXT', header=None)
print(data.head())

   0    1    2    3    4    5    6    7    8    9    ...  351  352  353  354  \
0  204  333    0  204    0  205    0    0  206    0  ...    0  206  205  204   
1  204    0    0  204    0  205  205  207  207    0  ...    0  206    0    0   
2    0  204    0  204    0    0  205    0  206    0  ...    0    0  205    0   
3  204  204  204  204    0  205  205    0  206    0  ...    0  206  205    0   
4    0  204    0    0  204    0    0  205    0    0  ...  206  205    0    0   

   355  356  357  358  359  360  
0    0  204  204    0  204    D  
1  204    0  204    0    0    D  
2  204    0  204    0    0    D  
3  204    0  204    0    0    D  
4    0  204    0    0  204    D  

[5 rows x 361 columns]
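
A quick structural check right after loading can catch truncated or malformed rows before they reach the model. The sketch below is only an illustration: `check_scan_frame` is a hypothetical helper, not part of the original notebook, and it runs on two toy rows with 4 readings each instead of the real 360-reading file.

```python
import io

import pandas as pd

# Toy stand-in for MASTERDATA360.TXT: 4 readings plus a label per row
# (assumption: the real file has 360 readings plus a label).
sample = io.StringIO("204,0,205,0,F\n0,204,0,206,R\n")
toy = pd.read_csv(sample, header=None)


def check_scan_frame(df, n_readings):
    """Hypothetical helper: verify the column count and that readings are numeric."""
    if df.shape[1] != n_readings + 1:
        raise ValueError(
            f"expected {n_readings + 1} columns, found {df.shape[1]}"
        )
    readings = df.iloc[:, :-1]
    non_numeric = [
        col for col in readings.columns
        if not pd.api.types.is_numeric_dtype(readings[col])
    ]
    if non_numeric:
        raise ValueError(f"non-numeric reading columns: {non_numeric}")
    return df.shape


shape = check_scan_frame(toy, n_readings=4)
print(shape)
```

For the real data the call would presumably be `check_scan_frame(data, n_readings=360)`, matching the 361 columns reported above.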


### Data cleaning

Rename the last column to "Label" and clean the data by dropping every row that is not annotated with one of the three forward command labels.

The processed data should only contain the LIDAR measurements and the corresponding command labels:
- F - forward
- R - forward right
- L - forward left

In [3]:
data.rename(columns={data.columns[-1]: 'Label'}, inplace=True)
print(f"Label counts before cleaning the data: \n {data['Label'].value_counts()}")
data = data[data['Label'].isin(['F', 'L', 'R'])].reset_index(drop=True)
print(f"Label counts after cleaning the data: \n {data['Label'].value_counts()}")

Label counts before cleaning the data: 
 Label
F    2029
R    1516
L     263
s      29
D      10
b       4
r       1
Name: count, dtype: int64
Label counts after cleaning the data: 
 Label
F    2029
R    1516
L     263
Name: count, dtype: int64
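
Note that `isin` matches labels exactly, so the single lowercase `r` row in the counts above is dropped rather than merged into `R`. The toy example below (made-up rows, one reading column only) shows the same filter and lists what it discards, which can be a useful sanity check before deleting data.

```python
import pandas as pd

# Made-up rows for illustration; only one reading column is shown.
toy = pd.DataFrame({
    0: [204, 0, 205, 0, 206],
    'Label': ['F', 'r', 'R', 's', 'L'],
})

keep = ['F', 'L', 'R']
mask = toy['Label'].isin(keep)

# Rows that survive the filter, re-indexed from 0.
cleaned = toy[mask].reset_index(drop=True)

# Labels that the filter removes (case-sensitive match).
dropped = toy.loc[~mask, 'Label'].tolist()

print(cleaned['Label'].tolist())
print(dropped)
```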


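### Split data into train and test sets
Separate the inputs (X, the LIDAR readings) from the outputs (y, the command labels) and divide them into train and test sets with train_test_split, holding out 20% of the rows for testing. 
A label encoder converts the labels from characters to integers, the format the classifier expects. The printed mapping is needed later to decode predictions in the Arduino code.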

In [4]:
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
label_mapping = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
print(f"Label encoding mapping for motor control in Arduino code: {label_mapping}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Label encoding mapping for motor control in Arduino code: {'F': 0, 'L': 1, 'R': 2}
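
The classes are imbalanced: `L` makes up roughly 7% of the cleaned rows (263 of 3808). The split above is purely random, so the share of `L` rows in the test set can drift from that figure. One possible alternative, not used in this notebook, is to pass `stratify=y` to `train_test_split` so each class keeps its proportion in both sets. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 40 samples with an imbalanced label distribution (20 / 5 / 15).
X_toy = np.arange(40).reshape(-1, 1)
y_toy = np.array([0] * 20 + [1] * 5 + [2] * 15)

# stratify=y_toy keeps the class proportions equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

test_counts = np.bincount(y_te)
train_counts = np.bincount(y_tr)
print(test_counts, train_counts)
```

Switching the real split to a stratified one would change the test set, so the accuracy and report figures would no longer match the ones printed in this notebook.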


### Training the model
Training takes only a few lines with the scikit-learn style interface of XGBoost. The quality of the result depends on the dataset and on the cleaning and splitting steps above. After training, accuracy is computed on the held-out test set, together with per-class precision, recall, and F1-score.

In [5]:
clf = xgb.XGBClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

class_names = label_encoder.classes_
report = classification_report(y_test, y_pred, target_names=class_names, zero_division=0)
print('Classification Report:\n', report)

Accuracy: 81.10%
Classification Report:
               precision    recall  f1-score   support

           F       0.82      0.83      0.82       401
           L       0.85      0.71      0.78        49
           R       0.80      0.81      0.80       312

    accuracy                           0.81       762
   macro avg       0.82      0.78      0.80       762
weighted avg       0.81      0.81      0.81       762
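
To put the 81.10% in context, it helps to compare it with a majority-class baseline and to look at which classes get confused with each other. The sketch below computes the baseline from the support values in the report above; the confusion matrix part uses made-up toy labels, not the notebook's real predictions, and is only meant to show the call.

```python
from sklearn.metrics import confusion_matrix

# Support values copied from the classification report above.
support = {'F': 401, 'L': 49, 'R': 312}

# Accuracy of a model that always predicts the most frequent class.
baseline = max(support.values()) / sum(support.values())
print(f"Majority-class baseline: {baseline:.1%}")

# Toy labels for illustration only (0 = F, 1 = L, 2 = R).
y_true_toy = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred_toy = [0, 0, 2, 1, 0, 2, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1, 2])
print(cm)
```

With the real data, the equivalent call would presumably be `confusion_matrix(y_test, y_pred)`, with rows and columns ordered as in `label_encoder.classes_`.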

