# Machine Learning training: general description and parameters

## General principles

**This research develops Machine Learning (ML) algorithms trained on experimentally measured data, using magnetic fields as predictors and the elevator's Z-position as the target. All ML models share the same general architecture: supervised deep learning with dense and convolutional layers.**

The original magnetic field predictors $[B_X(t), B_Y(t), B_Z(t)]$ are processed into **time windows** $[t-\Delta t,t]$: short time series, given at a time $t$, that carry information about the events of the preceding interval $\Delta t$. These expanded predictors give the ML algorithm more information about the elevator's recent behavior and improve its performance (prediction accuracy) compared to individual magnetic predictors. Each time window predictor, e.g. $B_X([t-\Delta t,t])$, is paired with a Z-position label $Z(t)$, which represents the position of the elevator in real time (i.e. at the last point of the time window).
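
The windowing step can be sketched as follows. This is a minimal illustration, not the project's own implementation: the function name `make_time_windows` and the parameter `n_points` (number of samples per window, which would be $\Delta t/0.1\text{s}+1$ for evenly sampled data) are assumptions introduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_time_windows(B, z, n_points):
    """Build time-window predictors and their labels (hypothetical helper).

    B        : array of shape (T, 3) with columns [B_X, B_Y, B_Z]
    z        : array of shape (T,) with the elevator Z-position
    n_points : number of samples in each window
    Returns windows of shape (T - n_points + 1, n_points, 3)
    and labels of shape (T - n_points + 1,).
    """
    B = np.asarray(B, dtype=float)
    z = np.asarray(z, dtype=float)
    # sliding_window_view appends the window axis last: (T - n + 1, 3, n)
    windows = sliding_window_view(B, window_shape=n_points, axis=0)
    # Reorder to (window index, time within window, magnetic channel)
    windows = np.transpose(windows, (0, 2, 1))
    # Each window is labeled with the position at its last sample
    labels = z[n_points - 1:]
    return windows, labels

# Toy data: 6 samples, 3 channels
B_demo = np.arange(18, dtype=float).reshape(6, 3)
z_demo = 0.5 * np.arange(6)
w_demo, y_demo = make_time_windows(B_demo, z_demo, n_points=4)
print(w_demo.shape, y_demo)
```

Because the magnetic data was recorded in separate segments with time gaps, a helper like this would have to be applied to each segment independently, so that no window spans a gap.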

The ML algorithm predicts the Z-position of the elevator from three simultaneous time window predictors (one for each magnetic field component) $[B_X(t), B_Y(t), B_Z(t)]$, which we describe as 3 one-dimensional channels. The predictions $Z_{pred}(t)$ are compared with the ground truth labels $Z_{true}(t)$, and the performance of the ML model is evaluated according to two different metrics. This double-metric criterion originates in the nature of the ground truth labels $Z_{true}(t)$: **our experimental measurements for the elevator's position guarantee with absolute certainty the time and position of the elevator when it is parked**, while the intermediate traveling positions are inferred from either a linear approximation or a theoretical physical model based on the acceleration profile of the elevator. As a consequence, we define the following metrics:

1. **Parking accuracy** $A_{park}$: fraction of correct predictions $Z_{pred}(t)$ compared to the ground truth $Z_{true}(t)$ **only for those times in which the elevator is parked**, within the total number of predictions.
2. **(Approximate) Tracking accuracy** $A_{track}$: fraction of correct predictions $Z_{pred}(t)$ compared to the estimated elevator's Z-position $Z_{track}(t)$, which includes both the parking (certain) and traveling (estimated) events, within the total number of predictions.

For either metric, a single prediction $P_i$ at a time $t_i$ is considered correct if the absolute difference between the predicted position $Z_{pred}(t_i)$ and the reference position $Z_{ref}(t_i)$, which is the ground truth $Z_{true}(t_i)$ for $A_{park}$ or the approximation $Z_{track}(t_i)$ for $A_{track}$, does not exceed a fixed Z-position threshold $Z_{thres}$:

$$
P_i = \begin{cases}
1, & \text{if } |Z_{pred}(t_i)-Z_{ref}(t_i)| \leq Z_{thres} \\
0, & \text{if } |Z_{pred}(t_i)-Z_{ref}(t_i)| > Z_{thres}
\end{cases}
$$

Finally, both accuracy criteria $A_{park}$ and $A_{track}$, represented as $A$, can be formally defined in the same way:

$$
A = \frac{1}{N} \sum_{i=1}^{N} P_i \, ,
$$

where $N$ is the total number of predictions.
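
Both metrics can be computed with the same routine. The sketch below is only illustrative: the function name `accuracy` and the boolean `mask` argument are assumptions, and it assumes that for $A_{park}$ the count $N$ runs over the parked samples only.

```python
import numpy as np

def accuracy(z_pred, z_ref, z_thres=1.0, mask=None):
    """Fraction of predictions within z_thres of the reference (hypothetical helper).

    mask : optional boolean array selecting the samples to evaluate,
           e.g. the times at which the elevator is parked.
    """
    z_pred = np.asarray(z_pred, dtype=float)
    z_ref = np.asarray(z_ref, dtype=float)
    # P_i = 1 when |Z_pred - Z_ref| <= Z_thres, else 0
    correct = np.abs(z_pred - z_ref) <= z_thres
    if mask is not None:
        correct = correct[np.asarray(mask, dtype=bool)]
    return float(np.mean(correct))

# Made-up values, in meters
z_pred = [0.2, 3.9, 5.0, 7.6]
z_track = [0.0, 3.7, 3.7, 7.4]
parked = [True, True, False, True]

A_track = accuracy(z_pred, z_track, z_thres=1.0)              # all samples
A_park = accuracy(z_pred, z_track, z_thres=1.0, mask=parked)  # parked samples only
print(A_track, A_park)
```

In this toy case only the third prediction misses the reference by more than 1 m, and that sample is a traveling one, so it lowers the tracking accuracy but not the parking accuracy.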

Since the accuracy depends on the threshold $Z_{thres}$, we need a rule for evaluating the ML models. Our choice is to set $Z_{thres}=Z_0=1 \text{ m}$ and aim for an accuracy $A[Z_0=1\text{m}] \geq 80\%$, which we justify with the following reasoning:

1. **Adjacent levels are spaced 3.7m apart** (except for the lowest levels 1-2, with a distance of 4.1m), so the maximum position threshold that we can tolerate is $Z_{max}=3.7\text{m}/2=1.85\text{m}$, equal to half the separation between adjacent levels. With this threshold, any correct prediction still determines which level is closest at a given time. However, this value sits right at the ambiguity limit, so we set an arbitrary smaller threshold $\mathbf{Z_{thres}=Z_0=1 \text{ m}}$, which is approximately one quarter of the inter-level separation and **indicates the closest level to the elevator for a correct prediction with great confidence**.

2. Our objective is to track the elevator in real time so that we know its location precisely, with a strong focus on the parking levels. Our measurement time resolution for the magnetic field predictors is 0.1s, which means that we record 10 points per second. Compared to the **typical timescale of the elevator, which takes about 3s to move between adjacent levels**, this satisfies the real-time monitoring condition. Requiring an accuracy of 100%, or a few points below, is unnecessary, since we can tolerate a few incorrect position predictions within a clear movement trend and infer the true trajectory from the majority of correct predictions. Based on our recording rate of 10 points per second, **we estimate that $\mathbf{A \geq 80\%}$ (on average, 8 out of 10 predictions per second are correct) is more than enough to monitor the elevator's activity with good confidence**.

## Original and Rotated frames

In our ML monitoring application, we rely on magnetic field predictors that are experimentally measured by our diamond magnetometer. **Each predictor is a three-dimensional vector $(B_X,B_Y,B_Z)$, which is a full-information measurement and can be used to reconstruct the magnetic projection along any direction as well as the scalar magnetic field**. However, we must bear in mind that the **predictors are described in a given rotational frame**, and we choose the Laboratory frame as the default rotational frame, with the Z-axis parallel to the elevator axis and the X-Y axes aligned with the room.

As the **magnetic measurements are convoluted with a strongly directional environmental noise, some magnetic axes can show cleaner information than others, although the information carried by the full-vector predictor is always the same**. In our case, the X-axis in the Laboratory frame is much less affected by the noise than the Y and Z axes, and a clear correlation can be observed between the $B_X$ magnetic field and the elevator's position $Z(t)$. In contrast, **the correlation between any magnetic component and $Z(t)$ is unclear in some rotational frames**, in which the noise is coupled more evenly with all magnetic axes.

This rotational perspective brings an interesting point to the analysis: from a human perspective, it is much easier to predict the elevator position from magnetic field predictors when they are described in a "preferential" rotational frame, which allows the information to be read more transparently. Conversely, **an "unfavorable" rotational frame may obscure the magnetic-position correlations and make the analysis much harder**. Human intuition might even conclude that the magnetic fields and the elevator position are two independent quantities, without any correlation at all.

In contrast with human intuition and a preference for a clear rotational frame, **Machine Learning algorithms should be able to process the full magnetic information given in any rotational frame and establish the correlation with the elevator position if there is any**. This ability would enhance the value of the ML application, as there can be scenarios in which the knowledge of the system under study is limited, and the magnetic information is collected in a random rotational frame, not necessarily the "preferential" one.

**As a result of this discussion, we aim to develop an ML algorithm that can be trained with magnetic predictors measured in any rotational frame and achieves robust monitoring performance, meaning similar accuracy for any rotational frame.**

Technically speaking, this robustness against the choice of rotational frame means that the architecture of the ML model must be carefully designed, so that the algorithm can be trained and evaluated with predictors in any single rotational frame and achieve consistent accuracy in the position predictions.

As explained before, the Laboratory frame is the one that minimizes the environmental noise in the X-axis. We label this frame as RF1, and we note that there are $90^{\circ}$ and $180^{\circ}$ rotations that lead to equivalent situations, with a clear X-axis, Y-axis or Z-axis. The interesting challenge for the ML algorithm (as it is for humans) is to work with rotational frames in which the environmental noise is coupled with all three XYZ axes in a similar way, so that correlations with the elevator's position become unclear. After studying different rotational frames, we identified two of them as "intermediate" and "hard", referring to the difficulty of establishing an intuitive correlation with the elevator position. We describe each rotational frame with a rotation transformation applied to RF1, given by a rotation axis $\vec{n}$ (described in RF1 coordinates) and a (clockwise) rotation angle $\theta$:

* **RF1**: Laboratory Frame ("easy").
* **RF2**: "Hard"; $\vec{n}=(0.41,0.75,0.52)$, $\theta=90^{\circ}$; noise evenly coupled to all Cartesian components.
* **RF3**: "Intermediate"; $\vec{n}=(0.47,0.79,0.39)$, $\theta=120^{\circ}$; noise strongly coupled to $B_X$, mild coupling with $B_Y$ and $B_Z$.
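
The change of frame can be illustrated with Rodrigues' rotation formula. This is a sketch under stated assumptions, not the project's own rotation routine: the name `rotate_vectors` is hypothetical, the field values are made up, and the rotation follows the right-hand rule, so the sign of the angle may need to be flipped to match the clockwise convention used above.

```python
import numpy as np

def rotate_vectors(B, axis, angle_deg):
    """Rotate each row of B (shape (T, 3)) about `axis` by `angle_deg` degrees.

    Rodrigues' rotation formula, right-hand rule (assumed convention).
    """
    B = np.atleast_2d(np.asarray(B, dtype=float))
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)  # the listed axes are unit vectors only to ~2 decimals
    a = np.deg2rad(angle_deg)
    return (B * np.cos(a)
            + np.cross(n, B) * np.sin(a)
            + np.outer(B @ n, n) * (1.0 - np.cos(a)))

# Synthetic field samples in RF1 (made-up numbers, for illustration only)
B_rf1 = np.array([[1.0, 0.0, 0.0],
                  [0.2, -0.5, 0.8]])
B_rf2 = rotate_vectors(B_rf1, [0.41, 0.75, 0.52], 90)
B_rf3 = rotate_vectors(B_rf1, [0.47, 0.79, 0.39], 120)

# The scalar field |B| is identical in every frame
print(np.linalg.norm(B_rf1, axis=1))
print(np.linalg.norm(B_rf2, axis=1))
print(np.linalg.norm(B_rf3, axis=1))
```

The components change from frame to frame while the scalar field stays the same, which is the sense in which the full-vector predictor carries the same information in any rotational frame.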

## General information

During the ML training and evaluation stages, we will use general parameters for processing the experimental data, as well as for the deep learning architecture. They will be stored as a dictionary in a .json file and will be accessed in the training notebooks. We describe them below:

### Input data processing

* Our measurement time resolution is 0.1s (spacing between consecutive points within a time window).
* The elevator only travels in the z-direction, from level 1 (basement) to level 8.
* The distance between adjacent floors is 3.7m, except between Levels 1-2, with 4.1m distance.
* Magnetic data was recorded in different segments, one after the other but with time gaps, so they are not continuous.
* For Machine Learning purposes, it is much better to have normalized/standardized data: centered at 0, with maximum absolute value 1 (normalized) or standard deviation equal to 1 (standardized). We divide the magnetic values by $B_{\rm norm}$, defined as $1/\sqrt{3}$ times the maximum of the scalar field within the training dataset, which makes our method closer to standardization:

$$ B_{\rm norm} = \frac{1}{\sqrt{3}} \, \max_{\text{Training}}(|\vec{B}|) $$
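
A minimal sketch of this normalization, assuming the magnetic data is stored as an array of shape (T, 3); the function name `normalization_constant` and the sample values are illustrative only.

```python
import numpy as np

def normalization_constant(B_train):
    """B_norm = max over the training set of |B|, divided by sqrt(3)."""
    B_train = np.asarray(B_train, dtype=float)
    scalar_field = np.linalg.norm(B_train, axis=1)  # |B| for each sample
    return float(np.max(scalar_field) / np.sqrt(3.0))

# Made-up training samples with |B| = 5 and |B| = 3
B_train = np.array([[3.0, 0.0, 4.0],
                    [1.0, 2.0, 2.0]])
B_norm = normalization_constant(B_train)
B_scaled = B_train / B_norm  # the same constant would be applied to validation and test data
print(B_norm, np.abs(B_scaled).max())
```

With this choice, a vector of maximum magnitude whose components are all equal maps to components of exactly 1, and no single component of the training data can exceed $\sqrt{3}$ in absolute value.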

### Machine Learning architecture

The ML training is divided into several stages, each of them studying a different aspect to be optimized and carrying the optimal configuration over to the next stage.

* Stage 1: Select magnetic components.
* Stage 2: Select time window's length.
* Stage 3: Select ML main architecture.
* Stage 4: Select ML global hyper-parameters.
* Stage 5: Select ML fine architecture.
* Stage 6: Train ML in many rotational frames.
* Stage 7: Train ML using data augmentation.

The following hyper-parameters are constant for any training process, with a slight modification for the final Stage 7:

| Parameter | Stages 1 to 6 | Stage 7 |
| :---: | :---: | :---: |
| Loss function | MAE | MAE |
| Batch size | 512 | 256 |
| Validation fraction | 0.25 | 0.25 |
| Max. Epochs | 200 | 400 |
| Early Stop acting dataset | Validation | Validation |
| Early Stop trigger | No improvement in MAE | No improvement in MAE |
| Early Stop patience | 15 epochs | 30 epochs |
| Early Stop minimum | 30 epochs | 60 epochs |
| Standard $Z_{thres}$ | 1m | 1m |
| Rotational frames | RF1, RF2, RF3 | 64 instances |

In [1]:
# Load general packages:
import json
import os
import numpy as np
import MLQDM.general as ML_general
# Set output folder path and make sure it exists:
json_path = 'ML_parameters/'
os.makedirs(json_path, exist_ok=True)

In [2]:
# Prepare dictionaries with general parameters, Stages 1 to 6:
gen_pars_S1to6 = {
    "Loss_Function": "mae",
    "Last_Activation_Function": 'linear',
    "Batch_Size": 512,
    "Epochs": 200, 
    "Training_p_val": 0.25,
    "Early_Stop_Monitor": "val_loss",
    "Early_Stop_Min_Delta": 0, # Improvement criteria for early stop, in [m]
    "Early_Stop_Patience": 15,
    "Early_Stop_Start_From_Epoch":30,
    "Early_Stop_Restore_Best_Weights": True,    
    "z_thres": 1, # in [m]
}
# Prepare dictionaries with general parameters, Stage7:
gen_pars_S7 = {
    "Loss_Function": "mae",
    "Last_Activation_Function": 'linear',
    "Batch_Size": 256,
    "Epochs": 400, 
    "Training_p_val": 0.25,
    "Early_Stop_Monitor": "val_loss",
    "Early_Stop_Min_Delta": 0, # Improvement criteria for early stop, in [m]
    "Early_Stop_Patience": 30,
    "Early_Stop_Start_From_Epoch":60,
    "Early_Stop_Restore_Best_Weights": True,    
    "z_thres": 1, # in [m]
}
# Standard rotational frames, format {name: [rot. axis XYZ coordinates, rot. angle in degree, name]}:
standard_RF = {
    'RF1': [None,None,'RF1'], # Original RF
    'RF2': [[0.41,0.75,0.52],90,'RF2'], # Hardest RF
    'RF3': [[0.47,0.79,0.39],120,'RF3'], # Intermediate RF
}

# Convert dictionaries to JSON file:

# ML parameters for Stages 1 to 6:
with open(json_path+"ML_gen_pars_S1to6.json", "w") as outfile: 
    json.dump(gen_pars_S1to6, outfile)
    
# ML parameters for Stage 7:    
with open(json_path+"ML_gen_pars_S7.json", "w") as outfile: 
    json.dump(gen_pars_S7, outfile)

# Standard Rotational frames (Stages 1 to 6):
with open(json_path+"ML_RF_S1to6.json", "w") as outfile: 
    json.dump(standard_RF, outfile)
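
In a training notebook, the stored parameters can be read back with `json.load`. The sketch below round-trips a subset of the Stage 1 to 6 parameters through a temporary file and collects the early-stop entries into keyword arguments. The helper name `early_stop_kwargs` is hypothetical, and passing its result to `keras.callbacks.EarlyStopping` is an assumption: the text above does not name the deep learning framework, although the key names mirror that callback's arguments.

```python
import json
import os
import tempfile

def early_stop_kwargs(gen_pars):
    """Map the early-stop entries of a parameter dictionary to keyword arguments."""
    return {
        "monitor": gen_pars["Early_Stop_Monitor"],
        "min_delta": gen_pars["Early_Stop_Min_Delta"],
        "patience": gen_pars["Early_Stop_Patience"],
        "start_from_epoch": gen_pars["Early_Stop_Start_From_Epoch"],
        "restore_best_weights": gen_pars["Early_Stop_Restore_Best_Weights"],
    }

# Subset of the Stage 1 to 6 parameters, copied from the cell above
pars = {
    "Early_Stop_Monitor": "val_loss",
    "Early_Stop_Min_Delta": 0,
    "Early_Stop_Patience": 15,
    "Early_Stop_Start_From_Epoch": 30,
    "Early_Stop_Restore_Best_Weights": True,
}

# Round trip through a temporary JSON file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ML_gen_pars_S1to6.json")
    with open(path, "w") as outfile:
        json.dump(pars, outfile)
    with open(path) as infile:
        loaded = json.load(infile)

kwargs = early_stop_kwargs(loaded)
print(kwargs)
# Possible use, assuming Keras: keras.callbacks.EarlyStopping(**kwargs)
```

JSON stores Python `True` as `true` and `None` as `null`, and both are restored on loading, so the `None` entries of the RF1 frame survive the round trip as well.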

## Rotational frames for Stage 7

In the final training stage, we want to test the robustness of the ML model against arbitrary choices of rotational frames. Instead of using the three standard rotational frames (RF1, RF2, RF3), we generate 64 instances of random rotational frames. The rotation operation is performed by the function "rotate_3D(vector,n,rot_angle_deg)", which rotates each original magnetic vector $[B_X(t),B_Y(t),B_Z(t)]$ (in the Laboratory frame) around a normalized **n** axis by **rot_angle_deg** degrees. 

A way of visualizing each rotation operation is to picture where the laboratory $Z$-axis is sent, namely the converted $\hat{Z}$-axis. Our interest is to explore many points for the $\hat{Z}$-axis within one hemisphere, let's say $Z>0$, as the other hemisphere is equivalent to a general $(-1)$ multiplier in the magnetic information.

Our approach to cover the $Z>0$ hemisphere is to randomly generate a grid of azimuth $\varphi$ and polar $\theta$ angles, which define a collection of rotation axes $\mathbf{\hat{n}}(\varphi,\theta)$. We fix the rotation angle at $\alpha=90^\circ$. In this way, any rotation about an axis within the XY plane takes the $Z$-axis to the XY plane ($Z=0$), while any other rotation axis takes the $Z$-axis into the $Z>0$ hemisphere.
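
This property can be checked with Rodrigues' rotation formula. The short derivation below assumes a unit rotation axis $\mathbf{\hat{n}}=(n_X,n_Y,n_Z)$ with $n_Z=\cos\theta$, and writes $\mathbf{e}_Z$ for the laboratory $Z$ unit vector (a symbol introduced here only for this check):

$$
\hat{Z} = \mathbf{e}_Z\cos\alpha + (\mathbf{\hat{n}}\times\mathbf{e}_Z)\sin\alpha + \mathbf{\hat{n}}\,(\mathbf{\hat{n}}\cdot\mathbf{e}_Z)(1-\cos\alpha)
$$

$$
\alpha = 90^\circ: \quad \hat{Z} = (n_Y,\,-n_X,\,0) + n_Z\,(n_X,\,n_Y,\,n_Z)
\quad\Rightarrow\quad \hat{Z}\cdot\mathbf{e}_Z = n_Z^2 = \cos^2\theta \geq 0
$$

The $Z$-component of the converted axis vanishes only for $\theta=90^\circ$, i.e. for a rotation axis within the XY plane. The sign convention for the rotation direction only flips the cross-product term, which has no $Z$-component, so the result holds for clockwise and counterclockwise rotations alike.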

In order to avoid similar rotations (rotation axes that are too close together), we build a grid in which every angle $\theta$ has multiple randomly assigned values of $\varphi$, separated by at least $20^\circ$. Lower $\theta$ angles have fewer associated $\varphi$ angles, since rotations about axes close to the $Z$-axis ($\theta \approx 0$) produce very similar $\hat{Z}$-axes.

In [3]:
# Define conditions for theta-phi grid:
theta_pp = 10 # Number of points, roughly equidistant, for theta angles
dphi_dtheta_pp = 10/90 # Number of phi points per theta degree
min_sep_phi = 20 # Minimum degree separation for phi points
rand_angle = 5 # Maximum random angle variation for theta and phi [deg] 

# Generate collection of rotational frames:
RF_S7 = ML_general.prepare_random_rot_frames(
    theta_pp,dphi_dtheta_pp=dphi_dtheta_pp,min_sep_phi=min_sep_phi,rand_angle=rand_angle
    )

# Convert dictionary of [Rotational frames for Stage 7] to JSON file:
with open(json_path+"ML_RF_S7.json", "w") as outfile: 
    json.dump(RF_S7, outfile)
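
For readers without access to the `MLQDM` package, one possible way to build such a grid is sketched below. It is not the implementation of `prepare_random_rot_frames`, whose internals are not shown here: the function name `sketch_rot_frames`, the jittered-ring strategy, the `seed` argument and the frame names are all assumptions, and the number of frames it returns need not match the 64 instances used in Stage 7.

```python
import json
import numpy as np

def sketch_rot_frames(theta_pp, dphi_dtheta_pp, min_sep_phi, rand_angle,
                      rot_angle_deg=90, seed=0):
    """Hypothetical theta-phi grid of rotation axes in the Z >= 0 hemisphere."""
    rng = np.random.default_rng(seed)
    # Cap the phi count per ring so jittered neighbours stay >= min_sep_phi apart
    max_phi = max(1, int(360 // (min_sep_phi + 2 * rand_angle)))
    frames = {}
    for theta0 in np.linspace(0.0, 90.0, theta_pp):
        # More phi points for larger theta, at least one per ring
        n_phi = min(max(1, int(round(float(theta0) * dphi_dtheta_pp))), max_phi)
        offset = rng.uniform(0.0, 360.0)  # random orientation of the ring
        for k in range(n_phi):
            theta = np.clip(theta0 + rng.uniform(-rand_angle, rand_angle), 0.0, 90.0)
            phi = offset + k * 360.0 / n_phi + rng.uniform(-rand_angle, rand_angle)
            t, p = np.deg2rad(theta), np.deg2rad(phi)
            axis = [np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)]
            name = f"S7_{len(frames) + 1:02d}"
            # Same layout as the standard frames: [axis, angle in degrees, name]
            frames[name] = [[round(float(c), 4) for c in axis], rot_angle_deg, name]
    return frames

frames = sketch_rot_frames(theta_pp=10, dphi_dtheta_pp=10/90,
                           min_sep_phi=20, rand_angle=5)
print(len(frames), frames["S7_01"])
```

Evenly spaced $\varphi$ values with a bounded random jitter keep the minimum separation by construction, which avoids rejection sampling. A fully random assignment of $\varphi$, as described above, would need an explicit separation check instead.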