# Computer Lab 1: k-NN classifier

## Exercise 3 – User localization from RSSI

Consider the following scenario, in which we wish to localize a user with a non-GPS system (e.g., for indoor localization). The user holds a transmission device (e.g., a smartphone or another sensor with transmission capabilities). Localization is based on measurements of the Received Signal Strength Indicator (RSSI) from $D$ sensors (base stations) placed in the area in which the localization service is provided. The area is divided into $N_C$ square cells, and localization amounts to identifying the cell in which the user is located.

In a **training stage**, the transmission device is placed in the center of each cell and broadcasts a data packet, and the RSSI is measured by each sensor. This yields one measurement, corresponding to a vector of length $D$. The process is repeated $M$ times for each cell, and for all $N_C$ cells. The training stage therefore provides a 3-dimensional array containing $N_C \times D \times M$ values.

In a **test stage**, the user is located in an unknown cell. The transmission device broadcasts a data packet, and each sensor measures the RSSI and communicates it to a fusion center. The fusion center treats the received RSSI values as a test vector of length $D$. It applies a k-NN classifier, comparing the test vector with all $M \times N_C$ training vectors available in the training set. For each test vector, the k-NN classifier outputs the probability that each cell contains the user.
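The test-stage rule described above can be sketched as follows. This is a minimal illustration rather than the lab's reference solution: it assumes Euclidean distance, 0-based integer cell labels, and a posterior estimated as the fraction of the $k$ nearest training vectors that belong to each cell. The function name `knn_posterior` and the toy values are hypothetical.

```python
import numpy as np

def knn_posterior(train_x, train_y, test_vec, k, n_classes):
    """Return the fraction of the k nearest training vectors in each class."""
    # Euclidean distance from the test vector to every training vector.
    dists = np.linalg.norm(train_x - test_vec, axis=1)
    # Indices of the k closest training vectors (stable sort for repeatable ties).
    nearest = np.argsort(dists, kind="stable")[:k]
    # Count the neighbours per class and normalise to obtain probabilities.
    counts = np.bincount(train_y[nearest], minlength=n_classes)
    return counts / k

# Toy example (synthetic values, not the lab data): 2 classes, D = 2.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.array([0, 0, 1, 1])
posterior = knn_posterior(train_x, train_y, np.array([0.2, 0.3]), k=3, n_classes=2)
print(posterior)
```

With $k=3$, two of the three nearest neighbours belong to class 0, so the estimated posterior is 2/3 for class 0 and 1/3 for class 1.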

**Available data**: you are provided with a file (`localization.mat` in the `/data/` folder) containing two variables, called `traindata` and `testdata`. These variables have the same size: each is a 3-dimensional array of size $D \times M \times N_C$, with $D=7$, $M=5$, and $N_C = 24$.

The training data can be seen as labelled data where each cell is a class, and you are given $M$ data vectors for each cell. Regarding the test data, a test vector consists of a single measurement, so each measurement has to be used individually and you can perform up to $M$ tests for each cell.

The data correspond to real acquisition experiments performed outdoors near Politecnico di Torino, using an STM32L microcontroller with a 915 MHz 802.15.4 transceiver.
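To make the "labelled data" view concrete, the sketch below turns a 3-dimensional array into a feature matrix with one row per measurement, plus a label vector. It assumes the array is laid out as $D \times M \times N_C$ and uses a random stand-in for `traindata`, since the real file is not loaded here. The 0-based labels are also an assumption.

```python
import numpy as np

D, M, N_C = 7, 5, 24
# Synthetic stand-in for data['traindata'] (assumed shape: D x M x N_C).
rng = np.random.default_rng(0)
traindata = rng.integers(-90, -20, size=(D, M, N_C))

# Reorder the axes so that cells come first, then measurements, then features:
# (D, M, N_C) -> (N_C, M, D) -> (N_C * M, D).
X = traindata.transpose(2, 1, 0).reshape(N_C * M, D)
# Row r belongs to cell r // M (0-based labels).
y = np.repeat(np.arange(N_C), M)

print(X.shape, y.shape)
```

Row `c * M + m` of `X` then holds measurement `m` of cell `c`, i.e. `traindata[:, m, c]`.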

**Task**: your task is to implement a k-NN classifier in Matlab for the classification task described above, and evaluate its performance.

**Performance evaluation**: The performance is defined in terms of accuracy in the localization task, and it has to be averaged over all cells. For a single test vector, accuracy is the posterior probability that the classifier assigns to the cell in which the user is actually located; the average accuracy is the mean of this value over all tests and all cells.
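As a rough sketch of this metric, assume the classifier's outputs are collected in a matrix with one row per test vector and one column per cell, and that the true cells are given as 0-based indices. The helper name `average_accuracy` and the toy posteriors are hypothetical.

```python
import numpy as np

def average_accuracy(posteriors, true_cells):
    """Mean posterior probability assigned to the true cell."""
    posteriors = np.asarray(posteriors, dtype=float)
    true_cells = np.asarray(true_cells)
    # For each test vector, pick the probability given to its true cell.
    p_true = posteriors[np.arange(len(true_cells)), true_cells]
    return p_true.mean()

# Toy example: 3 test vectors, 2 cells (made-up posteriors).
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
true_cells = [0, 0, 1]
acc = average_accuracy(posteriors, true_cells)
print(acc)  # 0.75
```

When every cell contributes the same number of test vectors (here, $M$ per cell), averaging over all test vectors gives the same value as averaging per cell first and then over cells.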

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import scipy.io
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from tqdm import tqdm

# Plots setting.
sns.set_context(
    'talk', rc = {
        'font.size': 12.0,
        'axes.labelsize': 10.0,
        'axes.titlesize': 10.0,
        'xtick.labelsize': 10.0,
        'ytick.labelsize': 10.0,
        'legend.fontsize': 10.0,
        'legend.title_fontsize': 12.0,
        'patch.linewidth': 2.0
        }
    )

data_sets = ['Train', 'Test']

In [3]:
# Check current folder.
os.getcwd()

'/'

In [27]:
data_path = "/Users/ernestocolacrai/Documents/GitHub/StatisticalLearning/data/"

try:
    # Attempt to load the MATLAB data file.
    data = scipy.io.loadmat(os.path.join(data_path, "localization.mat"))

    print(
        "Data ✓\n",
        f"Data Keys: {data.keys()}"
        )
except OSError:
    # Raised when the file is missing or cannot be read.
    print(f"Data not found! ({data_path})")

Data ✓
 Data Keys: dict_keys(['__header__', '__version__', '__globals__', 'cell_coordinates', 'testdata', 'traindata'])


In [106]:
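def rearrange(dataset, rows, columns, depth):
    """Flatten a (rows, columns, depth) array into a 2-D table.

    Each output row holds one measurement vector (length `rows`) followed by
    its 1-based class label (the index along `depth`, plus one).
    """
    arranged = np.zeros([columns * depth, rows + 1])
    count = 0
    for j in range(depth):
        for i in range(columns):
            arranged[i + count, :-1] = dataset[:, i, j]
            arranged[i + count, -1] = j + 1

        count = count + columns
    return arranged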

# function [ArrangedSet] = Rearrange(dataset, row, column, depth)
#     ArrangedSet = zeros(column*depth,row+1);
#     count = 0;
#     for j=1:depth
#         for i=1:column
#             ArrangedSet(i+count,1:7) = dataset(:,i,j)';
#             ArrangedSet(i+count,8) = j;
#         end
#         count = count + column;
#     end
# end

In [103]:
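# Scratch check of the layout for the first cell only (i = 0).
# Note: D, M and Nc must already be defined (they are set in the cell below).
arranged = np.zeros([M * Nc, D + 1])
i = 0
arranged[i:M, :-1] = data['traindata'][:, :, i].T  # (D, M) slice -> M rows of D features
arranged[i:M, -1] = i  # 0-based label here; rearrange() uses 1-based labels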

In [109]:
Nc = 24  # Number of classes (cells)
M = 5    # Number of measurements per cell (class)
D = 7    # Number of features (sensors)

len(rearrange(data['traindata'], D, M, Nc)), len(rearrange(data['testdata'], D, M, Nc))

(120, 120)

In [75]:
data['traindata']

array([[[-32, -31, -27, -28, -44, -49, -44, -44, -52, -52, -60, -53,
         -62, -61, -61, -56, -68, -71, -63, -62, -73, -63, -69, -79],
        [-32, -31, -27, -28, -46, -49, -44, -44, -52, -52, -60, -53,
         -63, -61, -56, -57, -67, -71, -63, -63, -74, -63, -68, -77],
        [-32, -31, -27, -27, -44, -49, -44, -44, -52, -52, -60, -53,
         -62, -61, -56, -57, -68, -68, -63, -62, -74, -63, -68, -77],
        [-32, -30, -27, -28, -45, -49, -45, -44, -51, -52, -60, -53,
         -63, -61, -56, -56, -68, -68, -63, -64, -74, -63, -68, -75],
        [-32, -30, -28, -28, -44, -48, -45, -44, -52, -52, -60, -53,
         -63, -60, -56, -57, -68, -68, -63, -64, -74, -63, -68, -74]],

       [[-49, -49, -35, -44, -34, -33, -34, -36, -39, -33, -39, -54,
         -66, -47, -50, -53, -58, -57, -55, -54, -63, -62, -68, -54],
        [-49, -49, -35, -44, -32, -33, -35, -37, -39, -33, -39, -54,
         -66, -46, -54, -53, -58, -58, -54, -56, -63, -63, -69, -54],
        [-48, -49, -35, -