# Evaluation on public dataset: Multi environment

This notebook uses the multi-environment dataset to evaluate a preprocessing pipeline and a CNN-based classification model. The dataset is described in:

> Baha’A, A., Almazari, M. M., Alazrai, R., & Daoud, M. I. (2020). A dataset for Wi-Fi-based human activity recognition in line-of-sight and non-line-of-sight indoor environments. Data in Brief, 33, 106534.

The dataset contains data from three environments: E1, E2 and E3. Only E1 and E2 are used here, since they are the line-of-sight environments.
The results of the evaluation on both environments are stored in `02_RESULTS/02_MULTI-ENV/01_MODEL-REPORTS/` and can be loaded in [Section 1.6](#summary-multi) of this notebook.

## Imports

In [1]:
import copy
import os

import numpy as np
import pandas as pd

from math import sqrt

from alive_progress import alive_bar
from tensorflow import keras
from tensorflow.keras import layers

from functions.filters import dbscan_filtering, wavelet_filtering
from functions.json import load_json, save_json
from functions.ml import clear_backend_and_seeds, cross_validation, one_hot_encoding
from functions.report_metrics import metrics_summary

## Constants

In [2]:
DATA = os.path.join('01_DATA', '02_MULTI-ENV')
ENV1 = os.path.join(DATA, 'ENVIRONMENT 1')
ENV2 = os.path.join(DATA, 'ENVIRONMENT 2')

RESULTS = '02_RESULTS'

MULTI_ENV_REPORTS = os.path.join(RESULTS, '02_MULTI-ENV', '01_MODEL-REPORTS', '{0}-reports.json')
MULTI_ENV_LABELS = ['No movement', 'Falling', 'Walking', 'Sitting/Standing', 'Turning', 'Pick up pen']

ACTIVITY_MAPPING = {
    'A01': 'A1',
    'A02': 'A2',
    'A03': 'A1',
    'A04': 'A1',
    'A05': 'A2',
    'A06': 'A3',
    'A07': 'A5',
    'A08': 'A3',
    'A09': 'A5',
    'A10': 'A4',
    'A11': 'A4',
    'A12': 'A6',
}

MAPPING = {'A1': 0, 'A2': 1, 'A3': 2, 'A4': 3, 'A5': 4, 'A6': 5}
NUM_CLASSES = len(MULTI_ENV_LABELS)

FOLDS = 10
BATCH_SIZE = 256
EPOCHS = 30

## Functions

### Data loading

In [None]:
def load_data(environment):
    data = {}
    subject_dirs = os.listdir(environment)
    subject_dirs = list(filter(lambda x: x.startswith('Subject'), subject_dirs))
    with alive_bar(len(subject_dirs), title='Loading data from subjects', force_tty=True) as progress_bar:
        for subject_dir in subject_dirs:
            subject = f'S{int(subject_dir.split(" ")[-1]):02d}'
            data[subject] = {}
            subject_dir_path = os.path.join(environment, subject_dir)
            for file in os.listdir(subject_dir_path):
                if not file.endswith('.csv'):
                    continue

                base_activity = file.split('_')[3]
                file_path = os.path.join(subject_dir_path, file)
                df = pd.read_csv(file_path)
                df = df.iloc[160:-160]  # drop the first and last 160 rows (0.5 s each) because they are noisy

                if base_activity not in data[subject]:
                    data[subject][base_activity] = df
                else:
                    data[subject][base_activity] = pd.concat([data[subject][base_activity], df])
            progress_bar()
    return data

In [None]:
def amplitude_from_raw_data(data):
    amplitudes = {}
    with alive_bar(len(data), title="Extracting amplitudes from subject's data", force_tty=True) as progress_bar:
        for subject in data:
            amplitudes[subject] = {}
            for activity in data[subject]:
                activity_data = data[subject][activity]
                activity_amplitudes = []
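                for _, row in activity_data.iterrows():
                    instance_amplitudes = []
                    # 3 antennas x 30 subcarriers = 90 amplitude values per sample
                    for antenna in range(1, 4):
                        for subcarrier in range(1, 31):
                            # Each CSI value is a string: real part, '+', imaginary part, one trailing unit character
                            csi_data = row[f'csi_1_{antenna}_{subcarrier}']
                            real, imaginary = csi_data.split('+')
                            real = int(real)
                            imaginary = int(imaginary[:-1])  # strip the trailing imaginary-unit character

                            instance_amplitudes.append(sqrt(imaginary ** 2 + real ** 2))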
                    activity_amplitudes.append(instance_amplitudes)
                amplitudes[subject][activity] = np.array(activity_amplitudes)
            progress_bar()
    return amplitudes
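As a rough illustration of the amplitude computation above, the sketch below converts a single CSI string to its amplitude using Python's built-in `complex` type. It assumes values formatted like `3+4i` (as the `split('+')` in the function suggests); `csi_amplitude` is a hypothetical helper, not part of the notebook's `functions` package.

```python
def csi_amplitude(csi_string):
    """Return the amplitude |a + bi| of a CSI value given as a string such as '3+4i'."""
    # Python's complex() parser uses 'j' for the imaginary unit (the input is assumed to use 'i').
    value = complex(csi_string.replace('i', 'j'))
    # abs() of a complex number is sqrt(real**2 + imag**2).
    return abs(value)


print(csi_amplitude('3+4i'))   # 5.0
print(csi_amplitude('-6-8i'))  # 10.0
```

Unlike splitting on `+`, this variant also accepts a negative imaginary part written as `a-bi`.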

### Data windowing and filtering

In [None]:
def create_windows(amplitudes, window_size=320, window_overlap=160):
    windows = {}
    windows_labels = {}
    for subject_id in amplitudes:
        subject_windows = []
        subject_windows_labels = []
        for activity_id in amplitudes[subject_id]:
            activity_amplitudes = amplitudes[subject_id][activity_id].T

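            # window_overlap is used as the hop between window starts; with the defaults, consecutive windows overlap by 50%
            n = activity_amplitudes.shape[1] // window_overlap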
            for i in range(0, (n-1) * window_overlap, window_overlap):
                if i+window_size > activity_amplitudes.shape[1]:
                    break
                subject_windows.append(activity_amplitudes[:,i:i+window_size])
                subject_windows_labels.append(ACTIVITY_MAPPING[activity_id])

        windows[subject_id] = np.array(subject_windows)
        windows_labels[subject_id] = np.array(subject_windows_labels)
    return windows, windows_labels
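The windowing step can be summarised with a small NumPy sketch. With the defaults, windows start every 160 samples and span 320 samples, so each window shares half of its samples with the next one. `sliding_windows` and the synthetic `signal` are illustrative names that do not appear in the notebook, and the sketch is only meant to mirror the behaviour of `create_windows` for a single activity.

```python
import numpy as np


def sliding_windows(signal, window_size=320, step=160):
    """Split a (channels, samples) array into overlapping windows along the sample axis."""
    n_samples = signal.shape[1]
    # Only keep windows that fit completely inside the signal.
    starts = range(0, n_samples - window_size + 1, step)
    return np.array([signal[:, s:s + window_size] for s in starts])


# Synthetic stand-in for one activity: 90 amplitude series, 1000 samples each.
signal = np.arange(90 * 1000).reshape(90, 1000)
windows = sliding_windows(signal)
print(windows.shape)  # (5, 90, 320)
```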

In [None]:
def process_windows(windows):
    proc_windows = {}
    with alive_bar(len(windows), title="Processing subject's windows", force_tty=True) as progress_bar:
        for subject_id in windows:
            windows_copy = copy.deepcopy(windows[subject_id])
            for i in range(len(windows_copy)):
                windows_copy[i] = np.apply_along_axis(lambda x: wavelet_filtering(dbscan_filtering(x)), 1, windows_copy[i])
            proc_windows[subject_id] = windows_copy
            progress_bar()
    return proc_windows
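To make the role of `np.apply_along_axis` in `process_windows` more concrete, the sketch below applies a filter to every row of one window. The implementations of `dbscan_filtering` and `wavelet_filtering` are not shown in this notebook, so a simple moving average is used here purely as a stand-in; `moving_average` is a hypothetical helper.

```python
import numpy as np


def moving_average(series, width=5):
    """Stand-in filter (NOT the notebook's DBSCAN/wavelet filters): smooth a 1-D series."""
    kernel = np.ones(width) / width
    # mode='same' keeps the output as long as the input series.
    return np.convolve(series, kernel, mode='same')


rng = np.random.default_rng(0)
window = rng.normal(size=(90, 320))  # one window: 90 series x 320 samples

# axis=1 passes each row (one 320-sample series) to the filter in turn.
filtered = np.apply_along_axis(moving_average, 1, window)
print(filtered.shape)  # (90, 320)
```

Because the function is applied row by row, each of the 90 antenna/subcarrier series is filtered independently of the others.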

### ML

In [None]:
def build_model():
    clear_backend_and_seeds()
    
    model = keras.Sequential([
        layers.Conv2D(filters=8, kernel_size=(3,10), input_shape=(90, 320, 1)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(filters=8, kernel_size=(3,10)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(),
        
        layers.Flatten(),

        layers.Dense(512, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(512, activation='relu'),
        layers.Dense(512, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax')
    ])

    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model
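The feature-map sizes inside `build_model` can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults that the model relies on (`'valid'` padding and stride 1 for `Conv2D`, 2x2 pooling with stride 2 for `MaxPooling2D`); `conv_out` and `pool_out` are illustrative helpers.

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1 shrinks the axis by kernel - 1.
    return size - kernel + 1


def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 halves the axis (rounding down).
    return size // pool


height, width, filters = 90, 320, 8
for _ in range(2):  # two Conv2D + MaxPooling2D stages
    height, width = conv_out(height, 3), conv_out(width, 10)
    height, width = pool_out(height), pool_out(width)

flattened = height * width * filters
print(height, width, flattened)  # 21 73 12264
```

Under these assumptions, the `Flatten` layer hands a 12,264-element vector to the first `Dense(512)` layer.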

In [None]:
def combine_executions(execution_windows, windows_labels):
    x = np.vstack([ window for window in execution_windows.values() ])
    y = np.concatenate([ window_labels for window_labels in windows_labels.values() ])
    return x, y

## E1

### Data loading

Follow the instructions provided in [`01_DATA/02_MULTI-ENV/README.md`](./01_DATA/02_MULTI-ENV/README.md) and then execute the following cell to load the data.

In [None]:
data = load_data(ENV1)

In [None]:
amplitudes = amplitude_from_raw_data(data)

### Data windowing and processing

In [None]:
windows, windows_labels = create_windows(amplitudes)

In [None]:
proc_windows = process_windows(windows)

In [None]:
x, y = combine_executions(proc_windows, windows_labels)
y = one_hot_encoding(y, MAPPING)
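`one_hot_encoding` is imported from `functions.ml` and its code is not shown here. A minimal sketch of what such a helper presumably does, assuming it maps each label to its index in `MAPPING` and returns one row per window, might look like this (`one_hot_sketch` is a hypothetical name, not the notebook's function):

```python
import numpy as np

MAPPING = {'A1': 0, 'A2': 1, 'A3': 2, 'A4': 3, 'A5': 4, 'A6': 5}


def one_hot_sketch(labels, mapping):
    """Hypothetical stand-in for one_hot_encoding: labels -> one-hot rows."""
    indices = np.array([mapping[label] for label in labels])
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(len(mapping))[indices]


encoded = one_hot_sketch(['A1', 'A3', 'A6'], MAPPING)
print(encoded.shape)  # (3, 6)
```

This is the target format expected by the `categorical_crossentropy` loss used in `build_model`.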

**Free resources**: keep only **x** and **y**.

In [None]:
del data
del amplitudes
del windows
del proc_windows
del windows_labels

### 10-fold cross validation

The cell below performs the cross validation using the E1 dataset. 

> **WARNING**: Running this cell can take several hours. Alternatively, you can load our precomputed results in [Section 1.6](#summary-multi).

In [None]:
e1_reports = cross_validation(build_model, x, y, folds=FOLDS, batch_size=BATCH_SIZE, epochs=EPOCHS, labels=MULTI_ENV_LABELS)
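The `cross_validation` helper comes from `functions.ml` and is not reproduced in this notebook. As a generic illustration of how a 10-fold split partitions the samples (not necessarily how the helper does it, e.g. it may stratify by class), consider the following sketch; `k_fold_indices` is a hypothetical function.

```python
import numpy as np


def k_fold_indices(n_samples, folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a generic shuffled k-fold split."""
    shuffled = np.random.default_rng(seed).permutation(n_samples)
    parts = np.array_split(shuffled, folds)
    for i, test_idx in enumerate(parts):
        # Every fold except the i-th one is used for training.
        train_idx = np.concatenate(parts[:i] + parts[i + 1:])
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=105, folds=10))
print(len(splits))                            # 10
print(len(splits[0][1]), len(splits[-1][1]))  # 11 10
```

Each sample lands in the test set of exactly one fold, so every window is used for evaluation once and for training nine times.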

In [None]:
save_json(e1_reports, MULTI_ENV_REPORTS.format('e1'))

## E2

### Data loading

Follow the instructions provided in [`01_DATA/02_MULTI-ENV/README.md`](./01_DATA/02_MULTI-ENV/README.md) and then execute the following cell to load the data.

In [None]:
data = load_data(ENV2)

In [None]:
amplitudes = amplitude_from_raw_data(data)

### Data windowing and processing

In [None]:
windows, windows_labels = create_windows(amplitudes)

In [None]:
proc_windows = process_windows(windows)

In [None]:
x, y = combine_executions(proc_windows, windows_labels)
y = one_hot_encoding(y, MAPPING)

**Free resources**: keep only **x** and **y**.

In [None]:
del data
del amplitudes
del windows
del proc_windows
del windows_labels

### 10-fold cross validation

The cell below performs the cross validation using the E2 dataset. 

> **WARNING**: Running this cell can take several hours. Alternatively, you can load our precomputed results in [Section 1.6](#summary-multi).

In [None]:
e2_reports = cross_validation(build_model, x, y, folds=FOLDS, batch_size=BATCH_SIZE, epochs=EPOCHS, labels=MULTI_ENV_LABELS)

In [None]:
save_json(e2_reports, MULTI_ENV_REPORTS.format('e2'))

<a id='summary-multi'></a>
## Summary

In [3]:
e1_reports = load_json(MULTI_ENV_REPORTS.format('e1'))
e2_reports = load_json(MULTI_ENV_REPORTS.format('e2'))

In [4]:
metrics_summary([e1_reports, e2_reports], ['Multi-env E1 CV', 'Multi-env E2 CV'])

| | accuracy | precision | recall | f1-score |
|---|---|---|---|---|
| Multi-env E1 CV | 0.878047 | 0.885003 | 0.878047 | 0.878841 |
| Multi-env E2 CV | 0.839214 | 0.84304 | 0.839214 | 0.83028 |


The table above corresponds to the results reported in **Table III** (Multi-environment (E1/E2) > This work) of the paper.