# Wi-Fi localization. Classification models

Sîrbu Matei-Dan, _group 10LF383_

<i>Dataset source:</i> http://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization

<i>Dataset description:</i> [DOI 10.1007/978-981-10-3322-3_27 via ResearchGate](Docs/chp_10.1007_978-981-10-3322-3_27.pdf)

<i>Synopsis:</i> The _Wireless Indoor Localization_ dataset contains 2000 measurements of the signal strength (in dBm) received from the routers of an office in Pittsburgh. The office has seven routers and four rooms. Using a smartphone, a user records the strength of the signals from the seven routers once per second, and each record is labeled with the room the user was in at the time of the measurement (1, 2, 3 or 4).

The figure below shows a sample from the dataset: <br><br>
![Sample](./Images/wifi_localization_sample.png)

In what follows, the Class column (the room) is denoted by y, and the columns WS1 - WS7 (the features: the signal strength from each router) are denoted by X.

In [1]:
import numpy as np
import pandas as pd
from IPython.display import display, HTML

In [2]:
header = ['WS1', 'WS2', 'WS3', 'WS4', 'WS5', 'WS6', 'WS7', 'Class']
data_wifi = pd.read_csv("./Datasets/wifi_localization.txt", names=header, sep='\t')
display(HTML("<i>Dataset overview:</i>"))
display(data_wifi)
X = data_wifi.values[:, :7]
y = data_wifi.values[:, -1]

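|  | WS1 | WS2 | WS3 | WS4 | WS5 | WS6 | WS7 | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | -64 | -56 | -61 | -66 | -71 | -82 | -81 | 1 |
| 1 | -68 | -57 | -61 | -65 | -71 | -85 | -85 | 1 |
| 2 | -63 | -60 | -60 | -67 | -76 | -85 | -84 | 1 |
| 3 | -61 | -60 | -68 | -62 | -77 | -90 | -80 | 1 |
| 4 | -63 | -65 | -60 | -63 | -77 | -81 | -87 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | -59 | -59 | -48 | -66 | -50 | -86 | -94 | 4 |
| 1996 | -59 | -56 | -50 | -62 | -47 | -87 | -90 | 4 |
| 1997 | -62 | -59 | -46 | -65 | -45 | -87 | -88 | 4 |
| 1998 | -62 | -58 | -52 | -61 | -41 | -90 | -85 | 4 |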
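
The split into X and y can also be made by column name instead of by position, which keeps working if the column order changes. The following is a minimal sketch, not part of the original notebook: it rebuilds only the first five rows of the overview above in memory, and the names `sample`, `sample_df`, `feature_cols`, `X_sample` and `y_sample` are hypothetical, introduced for illustration.

```python
import io

import pandas as pd

header = ['WS1', 'WS2', 'WS3', 'WS4', 'WS5', 'WS6', 'WS7', 'Class']

# First five rows of the overview above, tab-separated like the original file.
sample = (
    "-64\t-56\t-61\t-66\t-71\t-82\t-81\t1\n"
    "-68\t-57\t-61\t-65\t-71\t-85\t-85\t1\n"
    "-63\t-60\t-60\t-67\t-76\t-85\t-84\t1\n"
    "-61\t-60\t-68\t-62\t-77\t-90\t-80\t1\n"
    "-63\t-65\t-60\t-63\t-77\t-81\t-87\t1\n"
)
sample_df = pd.read_csv(io.StringIO(sample), names=header, sep='\t')

# Select the features and the label by name rather than by index.
feature_cols = [col for col in header if col != 'Class']
X_sample = sample_df[feature_cols].to_numpy()
y_sample = sample_df['Class'].to_numpy()

# Quick sanity checks: array shapes and number of rows per room.
print(X_sample.shape, y_sample.shape)
print(sample_df['Class'].value_counts().sort_index())
```

On the full file, the same `value_counts()` call is a quick way to see how many measurements each room has before any model is trained.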


# Modele de clasificare
### 1. <i>k</i>-nearest neighbors classifier

In [4]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

# hyperparameters
neighbors, folds = 4, 5

# KNN implementation
model = KNeighborsClassifier(n_neighbors=neighbors)
model_acc = cross_validate(model, X, y, cv=folds, scoring='accuracy', return_train_score=True)
model_f1 = cross_validate(model, X, y, cv=folds, scoring='f1_macro', return_train_score=True)

# statistics
print(f"{folds}-fold cross validation for {neighbors}-nearest neighbors classification:\n")
print(f"Test accuracy: {model_acc['test_score']} \n=> Average test accuracy: {round(model_acc['test_score'].mean() * 100, 3)}%")
print(f"Train accuracy: {model_acc['train_score']} \n=> Average train accuracy: {round(model_acc['train_score'].mean() * 100, 3)}%")
print(f"Test F1 score: {model_f1['test_score']} \n=> Average test F1 score: {round(model_f1['test_score'].mean() * 100, 3)}%")
print(f"Train F1 score: {model_f1['train_score']} \n=> Average train F1 score: {round(model_f1['train_score'].mean() * 100, 3)}%")

5-fold cross validation for 4-nearest neighbors classification:

Test accuracy: [0.965  0.98   0.975  0.9825 0.985 ] 
=> Average test accuracy: 97.75%
Train accuracy: [0.993125 0.991875 0.993125 0.99     0.99125 ] 
=> Average train accuracy: 99.188%
Test F1 score: [0.96489838 0.97996795 0.97488398 0.98255407 0.98503657] 
=> Average test F1 score: 97.747%
Train F1 score: [0.99311709 0.99188648 0.99312342 0.98998978 0.99124912] 
=> Average train F1 score: 99.187%
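
The cell above runs the cross-validation twice, once per metric. As a possible alternative, `cross_validate` accepts a list of scorers, so both metrics can be computed from the same fitted folds in one call. The sketch below is an illustration under stated assumptions, not the notebook's own code: it uses a synthetic stand-in generated with `make_classification` (7 features, 4 classes) instead of the Wi-Fi data, so its scores say nothing about the real dataset. The names `X_demo`, `y_demo` and `scores` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the Wi-Fi data (assumption): 7 features, 4 classes.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Same hyperparameters as in the cell above.
neighbors, folds = 4, 5
model = KNeighborsClassifier(n_neighbors=neighbors)

# One call, two metrics: result keys become '<split>_<metric>'.
scores = cross_validate(
    model, X_demo, y_demo, cv=folds,
    scoring=['accuracy', 'f1_macro'],
    return_train_score=True,
)

for metric in ('accuracy', 'f1_macro'):
    for split in ('test', 'train'):
        values = scores[f'{split}_{metric}']
        print(f"{split} {metric}: {values.round(4)} => mean {values.mean() * 100:.3f}%")
```

With a list of scorers, the result dictionary no longer has `test_score` and `train_score`; the per-fold values are stored under `test_accuracy`, `train_accuracy`, `test_f1_macro` and `train_f1_macro`.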


### 2. TBA
<i> TODO: TBA </i>

### 3. TBA
<i> TODO: TBA </i>

### 4. TBA
<i> TODO: TBA </i>

### 5. TBA
<i> TODO: TBA </i>