# Computer Lab 1: k-NN classifier

## Exercise 3 – User localization from RSSI

Consider the following scenario, in which we wish to localize a user employing a non-GPS system  (e.g., in indoor localization). The user holds a transmission device (e.g., a smartphone or other sensor with transmission capabilities). Localization is based on measurements of the Received Signal Strength Indicator (RSSI) from D sensors (base stations) placed in the area in which the localization service is provided. The area is divided into $N_C$ square cells, and localization amounts to identifying the cell in which the user is located.

In a **training stage**, the transmission device is placed in the center of each cell and broadcasts a data packet, and RSSI is measured by each sensor. This yields one measurement, corresponding to a vector of length $D$. The process is repeated $M$ times for each cell, and for all $N_C$ cells. The training stage provides a 3-dimensional array of size $N_C \times D \times M$.

In a **test stage**, the user is located in an unknown cell. The transmission device broadcasts a data packet, and each sensor measures the RSSI and communicates it to a fusion center. The fusion center treats the received RSSI values as a test vector of length $D$. It applies a k-NN classifier, comparing the test vector with all $M \times N_C$ training vectors available in the training set. For each test vector, the k-NN classifier outputs the probability that each cell contains the user.

**Available data**: you are provided with a file (`localization.mat` in `/data/` folder) containing two variables, called traindata and testdata. These variables have the same size, and are 3-dimensional arrays of size $D=7$, $M=5$, and $N_C = 24$.

The training data can be seen as labelled data where each cell is a class, and you are given M data vectors for each cell. Regarding the test data, a test vector consists of a single measurement; so each measurement has to be used individually and you can perform up to M tests for each cell.
The data correspond to real acquisition experiments performed outdoors nearby Politecnico di Torino, using an STM32L microcontroller with 915 MHz 802.15.4 transceiver.

**Task**: your task is to implement a k-NN classifier in Matlab for the classification task described above, and evaluate its performance.

**Performance evaluation**: The performance is defined in terms of accuracy in the localization task, and it has to be averaged over all cells. Average accuracy is defined as the posterior probability associated to the cell that the user is actually located in.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
import random
import scipy.io
import seaborn as sns
from tqdm import tqdm

# Plot Seaborn settings
sns.set_context(
    'talk',
    rc = {
        'font.size': 12.0,
        'axes.labelsize': 10.0,
        'axes.titlesize': 10.0,
        'xtick.labelsize': 10.0,
        'ytick.labelsize': 10.0,
        'legend.fontsize': 10.0,
        'legend.title_fontsize': 12.0,
        'patch.linewidth': 2.0
    }
)

## 0. Load data

In [2]:
# Check current folder
os.getcwd()

'/'

In [3]:
path = "/Users/ernestocolacrai/Documents/GitHub/StatisticalLearning/data/"

try:
    # Attempt to load the MATLAB data file .mat
    data = scipy.io.loadmat(path + "localization.mat")

    print(
        f"Data ✓\n"
        f"Data Keys: {data.keys()}"
    )
except:
    print(f"Not found data! ({path})")

Data ✓
Data Keys: dict_keys(['__header__', '__version__', '__globals__', 'cell_coordinates', 'testdata', 'traindata'])


In [4]:
# Check train and test datasets shapes and types

print(
    f"Train dataset shape: \t{data['traindata'].shape}, type: {type(data['traindata'])}\n"
    f"Test dataset shape: \t{data['testdata'].shape}, type: {type(data['testdata'])}"
)

Train dataset shape: 	(7, 5, 24), type: <class 'numpy.ndarray'>
Test dataset shape: 	(7, 5, 24), type: <class 'numpy.ndarray'>


In [5]:
D = 7 # Features number (ROWS)
M = 5 # Measures number for each cell (class) (COLUMNS)
Nc = 24 # Classes number (cells number) (DEPTH)

## 1. Rearrange data

In [6]:
def rearrange(dataset:np.ndarray, rows:int, columns:int, depth:int) -> np.ndarray:
    """
    Reshapes a 3D NumPy array into a flattened 2D array.

    Parameters:
        dataset (numpy.ndarray): A 3D NumPy array representing the input dataset.
        rows (int): The number of rows in the input dataset.
        columns (int): The number of columns in the input dataset.
        depth (int): The number of depth dimensions in the input dataset.

    Returns:
        (numpy.ndarray): The rearranged dataset, represented as a 2-dimensional NumPy array with dimensions of `(columns * depth, rows + 1)`.
    """
    arranged = np.zeros([columns * depth, rows + 1]) # Initialize an empty 2D NumPy array to store the rearranged data
    label = 0 # Multi-class label (1,2,...,24 classes)
    for j in range(depth): # Iterate through the depth
        for i in range(columns): # Iterate through each column
            # Rearrange the data from the input dataset along the depth dimension into a temporary 1D array
            rearranged_data = dataset[:, i, j].T
            arranged[i + label, :-1] = rearranged_data # Append the rearranged data to the `arranged` array
            arranged[i + label, -1] = j + 1 # +1 since it starts from 0
        
        label += columns # Update the label within the `arranged` array
    
    # Return the final rearranged array
    return arranged

In [9]:
# Create the (rearranged) train and test datasets
train_data = rearrange(data['traindata'], D, M, Nc).astype(int)
test_data = rearrange(data['testdata'], D, M, Nc).astype(int)

# Random permutation
train_data = np.random.permutation(train_data)
test_data = np.random.permutation(test_data)

In [13]:
# Check the (rearranged) train and test datasets shapes and types

print(
    f"Train dataset shape: \t{train_data.shape}, type: {type(train_data)}\n"
    f"Test dataset shape: \t{test_data.shape}, type: {type(test_data)}"
)

Train dataset shape: 	(120, 8), type: <class 'numpy.ndarray'>
Test dataset shape: 	(120, 8), type: <class 'numpy.ndarray'>


## 2. kNN classification

In [119]:
k = 11 # Number of nearest-neighboors
bar = True # Show tqdm progress bar

M = len(test_data)
N = len(train_data)

D = np.zeros([M, N], dtype=float)  # Distance matrix
E = np.zeros([M, k], dtype=int)  # Array of nearest neighbors

infer_labels = np.zeros(M, dtype=int) # Inferred labels

for i in tqdm(range(M), colour='green', disable=bar): # For each test point
    for j in range(N): # For each training point
        D[i][j] = np.sqrt(np.sum((test_data[i] - train_data[j]) ** 2)) # Calculate euclidean distance between the points
    # Find indices of k nearest neighbors
    E[i] = np.argsort(D[i])[:k]

    infer_labels[i] = np.argmax(np.bincount(train_data[E[i]][:,-1].astype(int)))

In [120]:
infer_labels, E

(array([ 9,  2,  5,  2, 13,  6, 21,  3, 14,  5, 13, 11,  9,  1, 17,  9,  1,
        14,  2,  2, 14, 14, 21, 11, 17,  6, 22, 14, 20,  6,  2,  2,  2, 18,
        18,  3,  6,  6,  6, 17, 11, 17, 20,  5, 16,  2,  6,  3, 18,  9, 18,
         2,  6, 17, 13,  1, 20, 20,  6,  6, 21,  6, 22, 23,  6, 21,  6,  7,
        22, 13,  5, 18, 18, 14, 23, 18,  7,  6, 11,  6,  9,  2, 18, 10, 20,
        17,  9, 14,  6, 21, 10, 17,  9, 21,  2, 21, 13,  7, 22,  1, 14, 13,
        23, 21, 22,  2, 14,  7, 21, 18, 18, 17, 23,  9, 21, 14,  6,  3,  9,
        23]),
 array([[ 43, 108,  81, ...,  90,  30,  91],
        [ 97,  95,   6, ...,  96,  32, 114],
        [ 38,  58,  86, ..., 114,   8,  26],
        ...,
        [ 35, 113, 111, ...,   6,  62,  61],
        [  7,  90,  45, ...,  71,  49,  43],
        [112,  41,  18, ..., 105, 104,  65]]))

In [126]:
# Test to return inferred label and non-zero probability for each class
# random.seed(1)
idx = random.randint(0,119) # Select a random istance of dataset

values, frequencies = np.unique(train_data[E[idx]][:,-1], return_counts=True)
total_labels_number = len(train_data[E[idx]][:,-1])
probabilities = np.round(frequencies / total_labels_number, 4)

print(
    f"Values: \t{values}\n"
    f"Frequencies: \t{frequencies}\n"
    f"Probabilities: \t{probabilities}\n\n"
    f"Inferret label (with probability): {values[np.argmax(frequencies)], probabilities[np.argmax(frequencies)]}"
)

Values: 	[5 6 7 8]
Frequencies: 	[1 4 5 1]
Probabilities: 	[0.0909 0.3636 0.4545 0.0909]

Inferret label (with probability): (7, 0.4545)


In [127]:
train_data[E[idx]]

array([[-45, -35, -37, -49, -55, -67, -47,   7],
       [-44, -34, -36, -50, -55, -68, -49,   7],
       [-44, -35, -36, -50, -55, -68, -49,   7],
       [-44, -34, -37, -50, -55, -72, -49,   7],
       [-44, -34, -39, -53, -55, -68, -53,   5],
       [-44, -36, -32, -48, -62, -58, -33,   8],
       [-48, -34, -36, -52, -62, -56, -42,   6],
       [-49, -33, -35, -53, -62, -57, -42,   6],
       [-45, -35, -36, -49, -55, -80, -49,   7],
       [-49, -33, -35, -54, -62, -57, -42,   6],
       [-49, -34, -36, -53, -63, -57, -42,   6]])