# Classification Based on the Second Original Feature Selection Result

<p><b>Author</b>: Jingze Dai</p>
<p><b>McMaster University</b>, Honors Computer Science (Coop) student</p>
<p><b>Personal Email Address</b>: <a>david1147062956@gmail.com</a>, or <a>dai.jingze@icloud.com</a></p>
<a href="https://github.com/daijingz">GitHub Homepage</a>
<a href="https://www.linkedin.com/in/jingze-dai/">LinkedIn Webpage</a>
<a href="https://leetcode.com/david1147062956/">LeetCode Webpage</a>

<i>The original research's second feature selection method used Recursive Feature Elimination (RFE) to select a distinct set of features. This notebook performs misbehavior classification using these features.</i>

<i>Your feedback is important for Jingze's further development. If you would like to give feedback or suggestions, or to work and learn together, please email Jingze at dai.jingze@icloud.com. If you would like Jingze to contribute to your research or open-source project, or to help you with a programming issue, please email Jingze at david1147062956@gmail.com. Thank you for your help.</i>

## Table of Contents
* [Section 1: Selected Features](#bullet1)
* [Section 2: Extract and Load Datasets](#bullet2)
* [Section 3: Classification (Binary Classification Approach (BCA))](#bullet3)
* [Section 4: Classification (A Multi-class Classification Approach for Three Classes (MCATC))](#bullet4)
* [Section 5: Classification (A Classic Learning Approach for Multi-class classification (C-LAMC))](#bullet5)

### <a class="anchor" id="bullet1"><p><b>Section 1</b>: Selected Features</p></a>

Twelve features were selected: `posx`, `posy`, `posx_n`, `posy_n`, `spdx`, `spdy`, `spdx_n`, `aclx_n`, `hedx`, `hedy`, `hedx_n`, `hedy_n`.
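Every cell in the later sections repeats this list verbatim. As a minimal sketch (the names `SELECTED_FEATURES` and `missing_features` are hypothetical, not part of the original notebook), the list could be defined once and checked against the loaded columns:

```python
# The 12 RFE-selected features from Section 1, defined once for reuse.
# SELECTED_FEATURES and missing_features are illustrative (hypothetical) names.
SELECTED_FEATURES = [
    'posx', 'posy', 'posx_n', 'posy_n',
    'spdx', 'spdy', 'spdx_n', 'aclx_n',
    'hedx', 'hedy', 'hedx_n', 'hedy_n',
]


def missing_features(columns, features=SELECTED_FEATURES):
    """Return the selected features that do not appear in `columns`."""
    present = set(columns)
    return [name for name in features if name not in present]


print(len(SELECTED_FEATURES))              # 12
print(missing_features(['posx', 'posy']))  # the other 10 feature names
```

Once the dataset is loaded, `missing_features(df.columns)` should return an empty list.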

### <a class="anchor" id="bullet2"><p><b>Section 2</b>: Extract and Load Datasets</p></a>

The next step is to install the necessary packages and libraries.

In [1]:
%pip install gdown

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [3]:
%pip install --upgrade tensorflow --user

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install --user numpy==1.24.4

Note: you may need to restart the kernel to use updated packages.


In [5]:
%pip install --upgrade scipy

Note: you may need to restart the kernel to use updated packages.
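Because the cells above upgrade TensorFlow while pinning NumPy to 1.24.4, the versions that actually end up installed may not be mutually compatible. A quick check, sketched here with the standard library's `importlib.metadata` (the helper name `installed_versions` is hypothetical), prints what is installed. Restart the kernel first so the new versions are picked up.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Distribution names as used by pip (scikit-learn, not sklearn).
for name, ver in installed_versions(
        ['gdown', 'tensorflow', 'numpy', 'scipy', 'pandas', 'scikit-learn']).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```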


<p>There are two ways to obtain the dataset; choose one of them:</p>
<p><b>Method 1</b>: Using gdown (this may occasionally fail)</p>

<p>Here we download the VANET dataset CSV file from Google Drive and save it in the notebook's current working directory (in the run below, this was the local Downloads folder).</p>
The <b>correct</b> dataset file name is "mixalldata_clean.csv"; later cells load the file by this name.

In [6]:
import pandas as pd
import gdown

file_id = '1mbQUfSEe2EU2sh40Q1Q0KiZD-k7vRuU9'
file_url = f'https://drive.google.com/uc?id={file_id}'

output_file = 'mixalldata_clean.csv'
gdown.download(file_url, output_file, quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1mbQUfSEe2EU2sh40Q1Q0KiZD-k7vRuU9
From (redirected): https://drive.google.com/uc?id=1mbQUfSEe2EU2sh40Q1Q0KiZD-k7vRuU9&confirm=t&uuid=91a42141-951e-4ad4-b50c-54c20847c43e
To: C:\Users\david\Downloads\mixalldata_clean.csv
100%|█████████████████████████████████████████████████████████████████████████████| 1.21G/1.21G [00:23<00:00, 51.1MB/s]


'mixalldata_clean.csv'

<p><b>Method 2</b>: Direct downloading from sources</p>
First, go to the webpage <a href="https://data.mendeley.com/datasets/k62n4z9gdz/1">Dataset for Misbehaviors in VANETs</a>.
Then click the "Download All 314 MB" button and decompress the downloaded archive. Place "mixalldata_clean.csv" in the notebook's working directory.
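If you prefer to decompress the Method 2 download from Python, here is a sketch using the standard `zipfile` module. It assumes the download is a `.zip` archive; `extract_dataset` is a hypothetical helper and the archive path in the usage line is a placeholder. If the CSV sits inside a nested archive, that inner archive would need to be extracted as well.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir='.', target_name='mixalldata_clean.csv'):
    """Extract a .zip archive into dest_dir and return the path of target_name.

    dest_dir is searched recursively after extraction, so the CSV may live in a
    sub-folder of the archive. Raises FileNotFoundError if it is not found.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    matches = sorted(dest.rglob(target_name))
    if not matches:
        raise FileNotFoundError(f"{target_name} not found after extracting {archive_path}")
    return matches[0]


# Usage (placeholder archive name; use the name of your actual download):
# csv_path = extract_dataset('Downloads/vanet_dataset.zip', dest_dir='.')
```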

<b>Expected Outcome</b>
<p>After downloading the dataset, run the program below as a sanity check; it prints the first 5 records.</p>

In [2]:
import pandas as pd

# Load the dataset
output_file = 'mixalldata_clean.csv'
df = pd.read_csv(output_file)

# Display the DataFrame
print(df.head())

   type      sendTime  sender  senderPseudo  messageID  class        posx  \
0     4  72002.302942  130137     101301377  422013806      0  266.982401   
1     4  72003.302942  130137     101301377  422023410      0  266.827208   
2     4  72004.302942  130137     101301377  422032081      0  266.420297   
3     4  72005.302942  130137     101301377  422040712      0  268.912026   
4     4  72006.302942  130137     101301377  422052949      0  268.242276   

        posy  posz    posx_n  ...  aclz    aclx_n    acly_n  aclz_n      hedx  \
0  32.336955   0.0  3.480882  ...   0.0  0.000862  0.000862     0.0 -0.102790   
1  34.624145   0.0  3.546261  ...   0.0  0.000107  0.001040     0.0 -0.099856   
2  38.836461   0.0  3.544045  ...   0.0  0.000172  0.001661     0.0 -0.099856   
3  45.414229   0.0  3.340080  ...   0.0  0.000171  0.001654     0.0 -0.100172   
4  53.729986   0.0  3.328872  ...   0.0  0.000193  0.001852     0.0 -0.097105   

       hedy  hedz     hedx_n     hedy_n  hedz_n  


<b>Important: run all of the cells in this section, in order, before starting later sections; otherwise later cells may fail (for example, because <code>df</code> is not defined).</b>

### <a class="anchor" id="bullet3"><p><b>Section 3</b>: Classification (Binary Classification Approach (BCA))</p></a>

<p>First, the dataset is divided into training and testing data. Each algorithm's cell performs this split itself.</p>
<p>By default, 80% of the data is used for training and the remaining 20% for testing. However, to improve accuracy, a few models use a customized ratio (for example, <code>test_size=0.1</code> for the Decision Tree and <code>test_size=0.31</code> for Naive Bayes in this section). Note that metrics computed on different test sets are not strictly comparable.</p>
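Since every cell repeats the same split, fit, and scoring steps with only the model and `test_size` changing, they could be factored into one function. This is a minimal sketch; `evaluate_binary` is a hypothetical helper, and the demo uses synthetic data from `make_classification` rather than the VANET dataset.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_binary(model, X, y, test_size=0.2, random_state=42):
    """Split, fit a fresh copy of `model` on the training part, and score the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    fitted = clone(model).fit(X_train, y_train)
    y_pred = fitted.predict(X_test)
    return {
        'n_test': len(y_test),
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred),
        'recall': recall_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
    }


# Demo on synthetic data; in the notebook one would pass X = df[features]
# and y = (df['class'] != 0).astype(int) instead.
X_demo, y_demo = make_classification(n_samples=500, n_features=12, random_state=0)
demo_scores = evaluate_binary(DecisionTreeClassifier(random_state=42), X_demo, y_demo, test_size=0.1)
print(demo_scores)
```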

In [11]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, Y_train, Y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)

lr = LogisticRegression(random_state=42)

lr.fit(X_train, Y_train)

Y_pred = lr.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Logistic Regression Accuracy:", accuracy)
print("Logistic Regression Precision:", precision)
print("Logistic Regression Recall:", recall)
print("Logistic Regression F1-score:", f1)

Logistic Regression Accuracy: 0.7163430689148963
Logistic Regression Precision: 0.8915322944782814
Logistic Regression Recall: 0.34267433384627843
Logistic Regression F1-score: 0.4950633517946889
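In the cell above (and in the binary MLP cell), `StandardScaler` is fit on the full dataset before the split, so the test rows influence the scaling statistics. A sketch of a leak-free variant wraps the scaler and the model in an sklearn `Pipeline`, so the scaler only ever sees training rows. `fit_scaled_logreg` is a hypothetical helper, demonstrated on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_scaled_logreg(X, y, test_size=0.2, random_state=42):
    """Split first, then fit StandardScaler + LogisticRegression on the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=random_state))
    pipe.fit(X_train, y_train)  # scaler mean/std are computed from X_train alone
    return pipe, X_test, y_test


X_demo, y_demo = make_classification(n_samples=200, n_features=12, random_state=0)
pipe, X_te, y_te = fit_scaled_logreg(X_demo, y_demo)
print(pipe.named_steps['standardscaler'].n_samples_seen_)  # 160 training rows
print(pipe.score(X_te, y_te))
```

At prediction time, `pipe.predict(X_te)` applies the training-set scaling automatically.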


In [12]:
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

n_neighbor_amount = 1

print("*********")
while n_neighbor_amount < 11:
    knn = KNeighborsClassifier(n_neighbors=n_neighbor_amount)
    knn.fit(X_train, Y_train)
    Y_pred = knn.predict(X_test)
    accuracy = accuracy_score(Y_test, Y_pred)
    print("KNN Accuracy when n_neighbors =", n_neighbor_amount, ":", accuracy)
    precision = precision_score(Y_test, Y_pred)
    recall = recall_score(Y_test, Y_pred)
    f1 = f1_score(Y_test, Y_pred)

    print("KNN Precision:", precision)
    print("KNN Recall:", recall)
    print("KNN F1-score:", f1)
    print("*********")
    n_neighbor_amount += 1

*********
KNN Accuracy when n_neighbors = 1 : 0.8439609867253451
KNN Precision: 0.7930297466030114
KNN Recall: 0.8328300300439282
KNN F1-score: 0.8124427422040043
*********
KNN Accuracy when n_neighbors = 2 : 0.8684976571376701
KNN Precision: 0.9037141461886465
KNN Recall: 0.756543906944814
KNN F1-score: 0.8236062214888663
*********
KNN Accuracy when n_neighbors = 3 : 0.8565063337099859
KNN Precision: 0.8381049981440537
KNN Recall: 0.8011431348274306
KNN F1-score: 0.8192073573517319
*********
KNN Accuracy when n_neighbors = 4 : 0.8629229907255831
KNN Precision: 0.8989349343395384
KNN Recall: 0.7460805979474482
KNN F1-score: 0.8154061772237268
*********
KNN Accuracy when n_neighbors = 5 : 0.8516015036887953
KNN Precision: 0.8545473359259802
KNN Recall: 0.7644116365263203
KNN F1-score: 0.8069703292788436
*********
KNN Accuracy when n_neighbors = 6 : 0.8509958338680548
KNN Precision: 0.8996877116228764
KNN Recall: 0.7122185069054754
KNN F1-score: 0.7950515126596721
*********
KNN Accuracy 
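The loop above chooses `n_neighbors` by looking at test-set scores, which makes the reported test metrics slightly optimistic. One alternative, sketched here, is to select k by cross-validation on the training split only with `GridSearchCV`. Unlike the original cell, this sketch also standardizes the features inside the pipeline, since KNN is distance-based. `select_k` is hypothetical; on millions of rows this search is expensive, so a subsample would likely be needed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_k(X_train, y_train, k_values=range(1, 11), cv=5, scoring='f1'):
    """Choose n_neighbors by cross-validation on the training split only."""
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
    search = GridSearchCV(
        pipe,
        param_grid={'kneighborsclassifier__n_neighbors': list(k_values)},
        scoring=scoring,
        cv=cv,
    )
    search.fit(X_train, y_train)
    return search.best_params_['kneighborsclassifier__n_neighbors'], search.best_score_


# Demo on synthetic data standing in for X_train / Y_train.
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)
best_k, best_cv_f1 = select_k(X_demo, y_demo)
print(best_k, round(best_cv_f1, 3))
```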

In [13]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=20, random_state=42)

rf.fit(X_train, Y_train)

Y_pred = rf.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Random Forest Accuracy:", accuracy)
print("Random Forest Precision:", precision)
print("Random Forest Recall:", recall)
print("Random Forest F1-score:", f1)

Random Forest Accuracy: 0.8981707832390659
Random Forest Precision: 0.903764437828983
Random Forest Recall: 0.8383297272906085
Random Forest F1-score: 0.8698181876386801
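Random Forest gives the best binary results here, so it may be worth checking which of the 12 features it relies on. A sketch using the fitted forest's `feature_importances_` attribute follows (`ranked_importances` is hypothetical; the demo uses synthetic data). These are impurity-based importances, which can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(forest, feature_names):
    """Return (feature, impurity-based importance) pairs, most important first."""
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Demo on synthetic data; in the notebook: ranked_importances(rf, features)
names_demo = [f'f{i}' for i in range(6)]
X_demo, y_demo = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
rf_demo = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)
ranking = ranked_importances(rf_demo, names_demo)
for name, score in ranking:
    print(f'{name:8s} {score:.4f}')
```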


In [14]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=42)

dt = DecisionTreeClassifier(random_state=42)

dt.fit(X_train, Y_train)

Y_pred = dt.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Decision Tree Accuracy:", accuracy)
print("Decision Tree Precision:", precision)
print("Decision Tree Recall:", recall)
print("Decision Tree F1-score:", f1)

Decision Tree Accuracy: 0.8627805722406027
Decision Tree Precision: 0.819138748877226
Decision Tree Recall: 0.8499869051470474
Decision Tree F1-score: 0.8342777649669416


In [15]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.31, random_state=42)

nb = GaussianNB()

nb.fit(X_train, Y_train)

Y_pred = nb.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("Naive Bayes Accuracy:", accuracy)
print("Naive Bayes Precision:", precision)
print("Naive Bayes Recall:", recall)
print("Naive Bayes F1-score:", f1)

Naive Bayes Accuracy: 0.6771456929636881
Naive Bayes Precision: 0.7058645996287782
Naive Bayes Recall: 0.34941796043375956
Naive Bayes F1-score: 0.46744213934524526


In [16]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, Y_train, Y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)

mlp = MLPClassifier(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', random_state=42,
                    max_iter=100, tol=0.001)

mlp.fit(X_train, Y_train)

Y_pred = mlp.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("MLP Accuracy:", accuracy)
print("MLP Precision:", precision)
print("MLP Recall:", recall)
print("MLP F1-score:", f1)



MLP Accuracy: 0.8146306040108802
MLP Precision: 0.98294437548434
MLP Recall: 0.5527851377045513
MLP F1-score: 0.707620759113709


In [17]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = (df['class'] != 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

model = Sequential([
    Dense(64, input_shape=(len(features),), activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=1, batch_size=32, validation_split=0.2)

Y_pred_prob = model.predict(X_test)
Y_pred = (Y_pred_prob > 0.5).astype(int)

loss, accuracy = model.evaluate(X_test, Y_test)

Y_test = np.squeeze(Y_test)
Y_pred = np.squeeze(Y_pred)

precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("ANN Accuracy:", accuracy)
print("ANN Precision:", precision)
print("ANN Recall:", recall)
print("ANN F1-score:", f1)

63897/63897 ━━━━━━━━━━━━━━━━━━━━ 107s 2ms/step - accuracy: 0.6863 - loss: 0.6609 - val_accuracy: 0.7556 - val_loss: 0.5082
19968/19968 ━━━━━━━━━━━━━━━━━━━━ 23s 1ms/step
19968/19968 ━━━━━━━━━━━━━━━━━━━━ 25s 1ms/step - accuracy: 0.7552 - loss: 0.5082
ANN Accuracy: 0.7555269598960876
ANN Precision: 0.9728088362108508
ANN Recall: 0.4089753824912163
ANN F1-score: 0.5758567227723579


In [18]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from sklearn.metrics import precision_score, recall_score, f1_score

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features].values
Y = (df['class'] != 0).astype(int).values

sequence_length = len(features)
X = X.reshape(-1, sequence_length, 1)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(sequence_length, 1)),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=1, batch_size=32, validation_split=0.2)

Y_pred = model.predict(X_test)
Y_pred_binary = (Y_pred > 0.5).astype(int)

accuracy = accuracy_score(Y_test, Y_pred_binary)
precision = precision_score(Y_test, Y_pred_binary)
recall = recall_score(Y_test, Y_pred_binary)
f1 = f1_score(Y_test, Y_pred_binary)

print("CNN Accuracy:", accuracy)
print("CNN Precision:", precision)
print("CNN Recall:", recall)
print("CNN F1-score:", f1)

63897/63897 ━━━━━━━━━━━━━━━━━━━━ 177s 3ms/step - accuracy: 0.7427 - loss: 0.5481 - val_accuracy: 0.7939 - val_loss: 0.4566
19968/19968 ━━━━━━━━━━━━━━━━━━━━ 27s 1ms/step
CNN Accuracy: 0.7931457582767051
CNN Precision: 0.9625828802666725
CNN Recall: 0.510075707613571
CNN F1-score: 0.6668078369684686
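The CNN and LSTM cells reshape each 12-feature row into a sequence of 12 steps with one channel, which is the `(batch, steps, channels)` layout Keras' `Conv1D` and `LSTM` layers expect. Because the convolution slides over neighboring columns, the feature order becomes meaningful to the model even though the features are not a true time series. The sketch below (hypothetical helper names) shows the reshape and the sequence-length arithmetic for `padding='valid'` layers with default strides.

```python
import numpy as np


def to_sequence(X):
    """Reshape (n_samples, n_features) into (n_samples, n_features, 1) for Conv1D/LSTM."""
    X = np.asarray(X)
    return X.reshape(X.shape[0], X.shape[1], 1)


def conv1d_valid_length(length, kernel_size):
    """Output length of Conv1D with padding='valid' and stride 1."""
    return length - kernel_size + 1


def maxpool1d_length(length, pool_size):
    """Output length of MaxPooling1D with padding='valid' and stride = pool_size."""
    return (length - pool_size) // pool_size + 1


demo = np.arange(24, dtype=float).reshape(2, 12)
print(to_sequence(demo).shape)  # (2, 12, 1)

# Binary CNN above: Conv1D(kernel_size=3) -> MaxPooling1D(pool_size=2)
steps = maxpool1d_length(conv1d_valid_length(12, 3), 2)
print(steps, steps * 64)  # 5 320  (5 steps x 64 filters reach Flatten)
```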


In [19]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.metrics import precision_score, recall_score, f1_score

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features].values
Y = (df['class'] != 0).astype(int).values

sequence_length = len(features)
X = X.reshape(-1, sequence_length, 1)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

model = Sequential([
    LSTM(64, input_shape=(sequence_length, 1)),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=1, batch_size=32, validation_split=0.2)

Y_pred_prob = model.predict(X_test)
Y_pred = (Y_pred_prob > 0.5).astype(int)

loss, accuracy = model.evaluate(X_test, Y_test)

Y_test = np.squeeze(Y_test)
Y_pred = np.squeeze(Y_pred)

precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)

print("LSTM Accuracy:", accuracy)
print("LSTM Precision:", precision)
print("LSTM Recall:", recall)
print("LSTM F1-score:", f1)

63897/63897 ━━━━━━━━━━━━━━━━━━━━ 394s 6ms/step - accuracy: 0.7600 - loss: 0.5018 - val_accuracy: 0.7913 - val_loss: 0.4606
19968/19968 ━━━━━━━━━━━━━━━━━━━━ 52s 3ms/step
19968/19968 ━━━━━━━━━━━━━━━━━━━━ 49s 2ms/step - accuracy: 0.7902 - loss: 0.4617
LSTM Accuracy: 0.790992259979248
LSTM Precision: 0.9751573919418349
LSTM Recall: 0.49761846910951957
LSTM F1-score: 0.6589683350357507
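Several binary models in this section (Logistic Regression, Naive Bayes, MLP, ANN, CNN, LSTM) reach high precision but low recall at the default 0.5 decision threshold. A sketch of choosing the threshold that maximizes F1 with sklearn's `precision_recall_curve` follows; `best_f1_threshold` is hypothetical. To avoid an optimistic estimate, the threshold should be chosen on a validation split, not on the test set used for reporting.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, y_score):
    """Return (threshold, F1) maximizing F1 over the thresholds from precision_recall_curve.

    A sample is predicted positive when score >= threshold (sklearn's convention).
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision and recall have one more entry than thresholds; the final point
    # (precision=1, recall=0) has no threshold, so drop it.
    p, r = precision[:-1], recall[:-1]
    f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


threshold_demo, f1_demo = best_f1_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(threshold_demo, round(f1_demo, 3))  # 0.35 0.8
```

With a Keras model, the scores would come from something like `model.predict(X_val).ravel()`.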


### <a class="anchor" id="bullet4"><p><b>Section 4</b>: Classification (A Multi-class Classification Approach for Three Classes (MCATC))</p></a>

In [6]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']

X = df[features].values
Y = df['class'].apply(characterize_class).astype(int).values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, Y_train, Y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)

lr = LogisticRegression(random_state=42)

lr.fit(X_train, Y_train)

Y_pred = lr.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred, average='weighted')
recall = recall_score(Y_test, Y_pred, average='weighted')
f1 = f1_score(Y_test, Y_pred, average='weighted')

print("Logistic Regression Accuracy:", accuracy)
print("Logistic Regression Precision:", precision)
print("Logistic Regression Recall:", recall)
print("Logistic Regression F1-score:", f1)

Logistic Regression Accuracy: 0.6687314738591653
Logistic Regression Precision: 0.6374461945250697
Logistic Regression Recall: 0.6687314738591653
Logistic Regression F1-score: 0.5787906916350495
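`characterize_class` is redefined in every cell of this section and applied row by row with `Series.apply`. A vectorized NumPy equivalent, sketched below under the hypothetical name `characterize_class_vectorized`, applies the same mapping (0 to 0, 1 through 12 to 1, anything else to 2) and is typically much faster on millions of rows.

```python
import numpy as np


def characterize_class_vectorized(labels):
    """Vectorized equivalent of characterize_class: 0 -> 0, 1..12 -> 1, anything else -> 2."""
    labels = np.asarray(labels)
    conditions = [labels == 0, (labels >= 1) & (labels <= 12)]
    return np.select(conditions, [0, 1], default=2)


print(characterize_class_vectorized([0, 1, 7, 12, 13, 19]))  # [0 1 1 1 2 2]
```

In the notebook this could replace `df['class'].apply(characterize_class).astype(int).values` with `characterize_class_vectorized(df['class'].to_numpy())`.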


In [8]:
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = df['class'].apply(characterize_class).astype(int).values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

n_neighbor_amount = 1

print("*********")
while n_neighbor_amount < 11:
    knn = KNeighborsClassifier(n_neighbors=n_neighbor_amount)
    knn.fit(X_train, Y_train)
    Y_pred = knn.predict(X_test)
    accuracy = accuracy_score(Y_test, Y_pred)
    print("KNN Accuracy when n_neighbors =", n_neighbor_amount, ":", accuracy)
    precision = precision_score(Y_test, Y_pred, average='weighted')
    recall = recall_score(Y_test, Y_pred, average='weighted')
    f1 = f1_score(Y_test, Y_pred, average='weighted')

    print("KNN Precision:", precision)
    print("KNN Recall:", recall)
    print("KNN F1-score:", f1)
    print("*********")
    n_neighbor_amount += 1

*********
KNN Accuracy when n_neighbors = 1 : 0.8027159674597237
KNN Precision: 0.8066590359961757
KNN Recall: 0.8027159674597237
KNN F1-score: 0.804227308527448
*********
KNN Accuracy when n_neighbors = 2 : 0.8293560493425274
KNN Precision: 0.8267237731004661
KNN Recall: 0.8293560493425274
KNN F1-score: 0.8231326044770159
*********
KNN Accuracy when n_neighbors = 3 : 0.824667194606252
KNN Precision: 0.822739680871316
KNN Recall: 0.824667194606252
KNN F1-score: 0.8223984559151134
*********
KNN Accuracy when n_neighbors = 4 : 0.8278332670800455
KNN Precision: 0.8242200153592177
KNN Recall: 0.8278332670800455
KNN F1-score: 0.8218003239306294
*********
KNN Accuracy when n_neighbors = 5 : 0.8204494163972192
KNN Precision: 0.8173448159450895
KNN Recall: 0.8204494163972192
KNN F1-score: 0.8158442803840729
*********
KNN Accuracy when n_neighbors = 6 : 0.817145620553335
KNN Precision: 0.8142309332525839
KNN Recall: 0.817145620553335
KNN F1-score: 0.8089476523027588
*********
KNN Accuracy when 
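In every MCATC result, the weighted recall is exactly equal to the accuracy. This is expected rather than a coincidence: with `average='weighted'`, each class's recall is weighted by its share of the test set. With $n_c$ test samples in class $c$, $TP_c$ of them predicted correctly, and $N$ test samples in total:

$$
\text{Recall}_{\text{weighted}}
= \sum_{c} \frac{n_c}{N} \cdot \frac{TP_c}{n_c}
= \frac{\sum_{c} TP_c}{N}
= \text{Accuracy}
$$

Weighted recall therefore adds no information beyond accuracy. Using `average='macro'` gives each class equal weight and would expose weak performance on a smaller class.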

In [5]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = df['class'].apply(characterize_class).astype(int).values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=20, random_state=42)

rf.fit(X_train, Y_train)

Y_pred = rf.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred, average='weighted')
recall = recall_score(Y_test, Y_pred, average='weighted')
f1 = f1_score(Y_test, Y_pred, average='weighted')

print("Random Forest Accuracy:", accuracy)
print("Random Forest Precision:", precision)
print("Random Forest Recall:", recall)
print("Random Forest F1-score:", f1)

Random Forest Accuracy: 0.8770443312747863
Random Forest Precision: 0.8756821197468284
Random Forest Recall: 0.8770443312747863
Random Forest F1-score: 0.875217927070579
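Weighted averages summarize all three classes in one number, so a per-class view can show where the remaining errors occur. A sketch using sklearn's `confusion_matrix` and `classification_report` follows. `per_class_summary` and the class names are illustrative assumptions, and the demo uses a handful of hand-written labels.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative names for the three labels produced by characterize_class.
MCATC_NAMES = ['class 0', 'classes 1-12', 'other classes']


def per_class_summary(y_true, y_pred, labels=(0, 1, 2), names=MCATC_NAMES):
    """Return the confusion matrix (rows = true, columns = predicted) and a text report."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    report = classification_report(y_true, y_pred, labels=list(labels),
                                   target_names=names, zero_division=0)
    return cm, report


# Demo; in the notebook: cm, report = per_class_summary(Y_test, Y_pred)
cm, report = per_class_summary(np.array([0, 0, 1, 2, 2]), np.array([0, 1, 1, 2, 0]))
print(cm)
print(report)
```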


In [7]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = df['class'].apply(characterize_class).astype(int).values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

dt = DecisionTreeClassifier(random_state=42)

dt.fit(X_train, Y_train)

Y_pred = dt.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred, average='weighted')
recall = recall_score(Y_test, Y_pred, average='weighted')
f1 = f1_score(Y_test, Y_pred, average='weighted')

print("Decision Tree Accuracy:", accuracy)
print("Decision Tree Precision:", precision)
print("Decision Tree Recall:", recall)
print("Decision Tree F1-score:", f1)

Decision Tree Accuracy: 0.8322498051527321
Decision Tree Precision: 0.8343664982332558
Decision Tree Recall: 0.8322498051527321
Decision Tree F1-score: 0.8331118608663234


In [8]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = df['class'].apply(characterize_class).astype(int).values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

nb = GaussianNB()

nb.fit(X_train, Y_train)

Y_pred = nb.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred, average='weighted')
recall = recall_score(Y_test, Y_pred, average='weighted')
f1 = f1_score(Y_test, Y_pred, average='weighted')

print("Naive Bayes Accuracy:", accuracy)
print("Naive Bayes Precision:", precision)
print("Naive Bayes Recall:", recall)
print("Naive Bayes F1-score:", f1)

Naive Bayes Accuracy: 0.6558418184492975
Naive Bayes Precision: 0.6164199640285972
Naive Bayes Recall: 0.6558418184492975
Naive Bayes F1-score: 0.5848392450226766


In [9]:
from sklearn.metrics import precision_score, recall_score, f1_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features]
Y = df['class'].apply(characterize_class).astype(int).values

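# Note: StandardScaler is imported but not applied in this cell, so the MLP is
# trained on unscaled features (unlike the binary MLP cell in Section 3).
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)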

mlp = MLPClassifier(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', random_state=42,
                    max_iter=100, tol=0.001)

mlp.fit(X_train, Y_train)

Y_pred = mlp.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred, average='weighted')
recall = recall_score(Y_test, Y_pred, average='weighted')
f1 = f1_score(Y_test, Y_pred, average='weighted')

print("MLP Accuracy:", accuracy)
print("MLP Precision:", precision)
print("MLP Recall:", recall)
print("MLP F1-score:", f1)



MLP Accuracy: 0.7682882550136002
MLP Precision: 0.7805252032273552
MLP Recall: 0.7682882550136002
MLP F1-score: 0.7425781760165167


In [10]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

def characterize_class(value):
    if value == 0:
        return 0
    elif value >= 1 and value <= 12:
        return 1
    else:
        return 2

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']

X = df[features].values
Y = df['class'].apply(characterize_class).astype(int).values
Y = to_categorical(Y)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Normalizing the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = Sequential([
    Dense(256, input_shape=(len(features),), activation='relu'),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
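# min_lr must be below Adam's default learning rate (0.001); otherwise the
# callback can never lower the learning rate.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=1e-5)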

model.fit(X_train, Y_train, epochs=50, batch_size=32, validation_split=0.2, callbacks=[early_stopping, reduce_lr])

Y_pred_prob = model.predict(X_test)
Y_pred = np.argmax(Y_pred_prob, axis=1)
Y_test_classes = np.argmax(Y_test, axis=1)

accuracy = accuracy_score(Y_test_classes, Y_pred)
precision = precision_score(Y_test_classes, Y_pred, average='weighted')
recall = recall_score(Y_test_classes, Y_pred, average='weighted')
f1 = f1_score(Y_test_classes, Y_pred, average='weighted')

print("ANN Accuracy:", accuracy)
print("ANN Precision:", precision)
print("ANN Recall:", recall)
print("ANN F1-score:", f1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m149s[0m 2ms/step - accuracy: 0.7044 - loss: 0.7609 - val_accuracy: 0.7539 - val_loss: 0.6875 - learning_rate: 0.0010
Epoch 2/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m154s[0m 2ms/step - accuracy: 0.7422 - loss: 0.7030 - val_accuracy: 0.7658 - val_loss: 0.6652 - learning_rate: 0.0010
Epoch 3/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m157s[0m 2ms/step - accuracy: 0.7509 - loss: 0.6891 - val_accuracy: 0.7674 - val_loss: 0.6584 - learning_rate: 0.0010
Epoch 4/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m162s[0m 3ms/step - accuracy: 0.7545 - loss: 0.6835 - val_accuracy: 0.7737 - val_loss: 0.6515 - learning_rate: 0.0010
Epoch 5/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m149s[0m 2ms/step - accuracy: 0.7590 - loss: 0.6767 - val_accuracy: 0.7755 - val_loss: 0.6431 - learning_rate: 0.0010
Epoch 6/50
[1m63897/63897[0m [32m━━━━

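Both model cells pair Adam with `ReduceLROnPlateau(factor=0.2, patience=2, min_lr=...)`. The callback's reduction step is essentially `new_lr = max(old_lr * factor, min_lr)`, and it only acts while the current rate is above `min_lr`. Keras's Adam starts at a learning rate of 0.001, so `min_lr` has to be below that for the callback to change anything; the `learning_rate` column in the training log is where to confirm whether a reduction happened. The sketch below is a simplified, hypothetical model of that rule in plain Python (the helper `simulate_plateau_schedule` is made up, and it ignores Keras details such as `min_delta` and `cooldown`); it is not the Keras source.

```python
def simulate_plateau_schedule(val_losses, initial_lr=0.001, factor=0.2, patience=2, min_lr=0.0):
    """Return the learning rate used in each epoch under a simplified plateau rule."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)  # rate used during this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            # Reduce only after `patience` non-improving epochs and only above the floor
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
    return history

# Hypothetical validation losses that stop improving after the third epoch
losses = [0.69, 0.66, 0.65, 0.66, 0.67, 0.68, 0.69]

print(simulate_plateau_schedule(losses, min_lr=0.001))  # floor equals the start rate: never reduced
print(simulate_plateau_schedule(losses, min_lr=1e-5))   # reduced by factor 0.2 after each plateau
```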
In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

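def characterize_class(value):
    """Collapse the original class labels into three groups: 0 -> 0, 1-12 -> 1, everything else -> 2."""
    if value == 0:
        return 0
    elif 1 <= value <= 12:
        return 1
    else:
        return 2

# Map the original 'class' column (it must exist in df) to the three groups.
# Keep the result in a separate Series instead of overwriting df['class']:
# re-applying the mapping to already-mapped labels would turn class 2 into class 1.
labels = df['class'].apply(characterize_class).astype(int)

# Check the unique classes in the target
unique_classes = labels.unique()
print(f"Unique classes in target: {unique_classes}")

features = ['posx', 'posy', 'posx_n', 'posy_n', 'spdx', 'spdy', 'spdx_n', 'aclx_n', 'hedx', 'hedy', 'hedx_n', 'hedy_n']
X = df[features].values
Y = to_categorical(labels.values, num_classes=3)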

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Standardize the features, fitting the scaler on the training split only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Conv1D expects (samples, steps, channels): treat the 12 features as 12 steps with 1 channel
sequence_length = X_train.shape[1]
X_train = X_train.reshape((X_train.shape[0], sequence_length, 1))
X_test = X_test.reshape((X_test.shape[0], sequence_length, 1))

model = Sequential([
    Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(sequence_length, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(filters=128, kernel_size=2, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(3, activation='softmax')  # One output per class (0, 1, 2)
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# min_lr must be below Adam's default starting rate (0.001); with min_lr=0.001 the callback could never lower it
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=1e-5)

model.fit(X_train, Y_train, epochs=50, batch_size=32, validation_split=0.2, callbacks=[early_stopping, reduce_lr])

Y_pred_prob = model.predict(X_test)
Y_pred = np.argmax(Y_pred_prob, axis=1)
Y_test_classes = np.argmax(Y_test, axis=1)

accuracy = accuracy_score(Y_test_classes, Y_pred)
precision = precision_score(Y_test_classes, Y_pred, average='weighted')
recall = recall_score(Y_test_classes, Y_pred, average='weighted')
f1 = f1_score(Y_test_classes, Y_pred, average='weighted')

print("CNN Accuracy:", accuracy)
print("CNN Precision:", precision)
print("CNN Recall:", recall)
print("CNN F1-score:", f1)

Unique classes in target: [0 2 1]
Epoch 1/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m340s[0m 5ms/step - accuracy: 0.7293 - loss: 0.7208 - val_accuracy: 0.7632 - val_loss: 0.6572 - learning_rate: 0.0010
Epoch 2/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m340s[0m 5ms/step - accuracy: 0.7672 - loss: 0.6505 - val_accuracy: 0.7755 - val_loss: 0.6343 - learning_rate: 0.0010
Epoch 3/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m324s[0m 5ms/step - accuracy: 0.7756 - loss: 0.6334 - val_accuracy: 0.7814 - val_loss: 0.6236 - learning_rate: 0.0010
Epoch 4/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m304s[0m 5ms/step - accuracy: 0.7800 - loss: 0.6240 - val_accuracy: 0.7843 - val_loss: 0.6155 - learning_rate: 0.0010
Epoch 5/50
[1m63897/63897[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m294s[0m 5ms/step - accuracy: 0.7834 - loss: 0.6162 - val_accuracy: 0.7855 - val_loss: 0.6124 - learning_rate: 0.0010
Epoch 6/50
[1m63897/63897[0m [32m━━━━