# 3rd Group Presentation

## Reminder of the Project

The idea behind this project is to study techniques from Fourier analysis and digital signal processing in conjunction with machine learning methods, in order to provide a good classification framework for time series signals coming from accelerometers with suspected periodic behaviour.

The original classification task was to detect changes between two classes or states of the underlying subject from which the time series originated.

## Data sets involved

### http://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities

Consists of 19 different sports and daily activities. Samples are recorded at 25 Hz from accelerometers, gyroscopes and magnetometers attached to the left arm (LA), right arm (RA), torso (T), left leg (LL) and right leg (RL). The recordings are split into 5-second segments, and 8 different people took part in the study, giving a total of 9120 data points. I have worked on this dataset so far and plan to work with the others as soon as I finish some aspects of my framework that need building or optimizing.
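The numbers above can be cross-checked with a little arithmetic. The sketch below assumes 60 five-second segments per subject per activity and 9 channels per unit (3 sensors with 3 axes each); neither figure is stated in this notebook, so treat them as assumptions taken from the dataset's public description.

```python
# Quantities stated in the text above
n_activities = 19
n_subjects = 8
sampling_rate_hz = 25
segment_seconds = 5
n_units = 5  # LA, RA, T, LL, RL

# Assumptions (not stated in this notebook)
segments_per_subject_per_activity = 60  # assumed: 5 minutes cut into 5 s segments
channels_per_unit = 9                   # assumed: 3 sensors x 3 axes

n_instances = n_activities * n_subjects * segments_per_subject_per_activity
samples_per_segment = sampling_rate_hz * segment_seconds
n_channels = n_units * channels_per_unit

print(n_instances, n_channels, samples_per_segment)
```

Under these assumptions the three numbers agree with the 9120 instances quoted above and with the `(N, 45, 125)` array shapes printed by the data provider further down.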

###  http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions

This dataset is measured from the accelerometers and gyroscopes in smartphones. The class labels are 3 postures and 3 activities, and there are 10929 instances. The advantage of this dataset is that it is closer to a true time series: the samples contain the transitions from one activity to another, which better matches the original dataset and the proposed problem.

### IceRobotics data set

There was a problem obtaining the dataset with 800 instances because of ownership issues around the work carried out on it. This was a major setback. The company proposed providing a smaller dataset of ~100 samples instead, but given the limited statistical value of a dataset that size I decided to work on other similar datasets first and, if there is time, test things out on this one later.



## Choice of Technologies and Frameworks

I experimented with scikit-learn and Lasagne and quickly discovered that they have some limitations and can be quite sensitive to changes such as pre-initializing the weight matrix. So I decided to work with the framework we built in Machine Learning Practical, since I worked on it for a long period during the holidays and know it rather well.


In [2]:
from copy import deepcopy
from mlp.layers import MLP, Linear, Sigmoid, Softmax #import required layer types
from mlp.layers import * 
from mlp.optimisers import SGDOptimiser #import the optimiser

from mlp.costs import CECost, MSECost #import the cost we want to use for optimisation
from mlp.schedulers import LearningRateFixed
import numpy
import logging
from mlp.dataset import *


logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.info('Initialising data providers...')

train_dp = MACLDataProvider(dset='train', batch_size=100, max_num_batches=-10, randomize=True)
valid_dp = MACLDataProvider(dset='valid', batch_size=1824, max_num_batches=1, randomize=False)



# Seeded random number generator used to initialise the layer weights
rng = numpy.random.RandomState([2015,10,10])


INFO:root:Initialising data providers...


(7296, 45, 125) (7296,)
(7296, 125)
(1824, 45, 125) (1824,)
(1824, 125)


## Baseline Experiment

The following experiment uses the right arm y-axis accelerometer reading as its input space, fed to a neural network with two hidden layers, one linear and one with a ReLU activation function. The hidden units and topology of this baseline were chosen to assess the performance of a Fourier transform of the input space against the original input space.

In [2]:
#some hyper-parameters
nhid = 100
learning_rate = 0.01
max_epochs = 200
cost = CECost()
    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

# NETWORK TOPOLOGY:
model = MLP(cost=cost)
model.add_layer(Relu(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

# define the optimiser, here stochastic gradient descent
# with fixed learning rate and max_epochs
lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

logger.info('Training started...')
tr_stats_b, valid_stats_b = optimiser.train(model, train_dp, valid_dp)

logger.info('Testing the model on test set:')

tst_cost, tst_accuracy = optimiser.validate(model,test_dp )
logger.info('ACL test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))



INFO:root:Training started...
INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 16.038. Accuracy is 5.21%
INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 15.947. Accuracy is 5.21%
INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 16.616. Accuracy is 14.22%
INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 6.826. Accuracy is 18.42%
INFO:mlp.optimisers:Epoch 1: Took 0 seconds. Training speed 45006 pps. Validation speed 182500 pps.
INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 3.880. Accuracy is 21.50%
INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 3.121. Accuracy is 22.75%
INFO:mlp.optimisers:Epoch 2: Took 0 seconds. Training speed 45006 pps. Validation speed 91250 pps.
INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 2.618. Accuracy is 25.68%
INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 2.544. Accuracy is 25.88%
INFO:mlp.optimisers:Epoch 3: Took 0 seconds. Training speed 48007 pps. Validation speed 91250 pps.
INFO:mlp.op

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

# plot the accuracy curves returned by the baseline training run
a = plt.plot(np.array(tr_stats_b)[:,1])
b = plt.plot(np.array(valid_stats_b)[:,1])
plt.legend([a[0], b[0]], ["train", "valid"], loc=3)
plt.xlabel("Epoch");
plt.ylabel("Accuracy");

In [None]:
stats1  = [(tr_stats_b, valid_stats_b, (tst_cost, tst_accuracy))]
me=78
plot_stats(stats1,'val cost vs train cost',
           shds=["baseline"], corr=('val_acc', 'train_acc') ,max_epochs=me, figsize= (15,10))



## Fourier Layer

Instead of transforming the data into the Fourier domain via an FFT or some other DFT function from SciPy, I decided to do this by creating a neural network layer whose weight matrix is initialized to the DFT matrix.

![](https://raw.githubusercontent.com/franciscovargas/MLPHonoursExtension/master/1.png)

where:

\begin{equation}
\omega_{j+1,k+1} = e^{-\frac{2\pi i \, j k}{2}}, \quad j,k \in \{0,1\}
\end{equation}

\begin{equation}
\beta_{1} = \beta_{2} = 0
\end{equation}

\begin{equation}
x_{0} = 1
\end{equation}

This holds for the example above, where the input and the Fourier space both live in $\mathbb{R}^2$.
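As a minimal sketch of the idea, the DFT matrix can be built in NumPy and applied as a bias-free linear layer. The helper name `dft_matrix` is hypothetical and is not part of the `mlp` framework; the 125-dimensional input matches the segment length used in the experiments.

```python
import numpy as np

def dft_matrix(n):
    """Return the n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

# The two-dimensional example: entries are 1, 1, 1, -1
W2 = dft_matrix(2)

# A "layer" with DFT weights and zero bias reproduces the FFT of its input
x = np.random.RandomState(0).randn(4, 125)  # 4 made-up input segments
W = dft_matrix(125)
layer_out = x @ W
fft_out = np.fft.fft(x, axis=1)
print(np.allclose(layer_out, fft_out))
```

Because the DFT matrix is symmetric, multiplying on the right by `W` gives the same result as transforming each row with `np.fft.fft`.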

### Advantages

* Due to the parallel nature of a neural network layer, this structure allows for efficient GPU computation of a Fourier transform;
* One can extend the backpropagation algorithm to update the weights in the Fourier layer, which is a way of relaxing this transformation to fit the data.

### Modification to the Network Architecture Due to Complex Numbers

Since $f_{i} \in \mathbb{C}$, we need to represent them in a real form so that we can carry out forward and backward propagation effectively. Taking the squared magnitude makes the gradients rather ugly, so the easiest way is to map each complex number to a vector in the following form:

\begin{equation}
 x + iy \rightarrow (x,y)
\end{equation}

If we wish to represent this in a network architecture we need to double the number of neurons:

![](https://raw.githubusercontent.com/franciscovargas/MLPHonoursExtension/master/2.png)

We do the mapping shown above for each element in the weight matrix and this is why we get double the number of neurons in the hidden fourier layer.
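The doubling can be sketched with a purely real weight matrix. The layout used here (all real parts first, then all imaginary parts) is an assumption made for illustration; the actual layers in the `mlp` framework may order or interleave the outputs differently.

```python
import numpy as np

n = 125
idx = np.arange(n)
W = np.exp(-2j * np.pi * np.outer(idx, idx) / n)  # complex DFT matrix

# Map every complex weight x + iy to the pair (x, y):
# one block of columns for the real parts, one for the imaginary parts
W_real = np.concatenate([W.real, W.imag], axis=1)  # shape (125, 250)

x = np.random.RandomState(1).randn(3, n)  # 3 made-up input segments
h = x @ W_real                            # real-valued layer output, shape (3, 250)

f = np.fft.fft(x, axis=1)
print(np.allclose(h[:, :n], f.real), np.allclose(h[:, n:], f.imag))
```

This is why the layer that follows the Fourier layer takes `125*2` inputs in some of the experiments below.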


### Normalization Attempts

Neural networks are very sensitive to the values at which the weights are initialized. Normalizing the DFT weight vectors so that the DFT components have equal mean squared magnitude was therefore attempted in the following manner (normalizing factor for the $j^{th}$ Fourier weight vector):

\begin{equation}
E[(XW_{j})^{2}] = \frac{1}{d \cdot n} \sum_{k=1}^{n}\sum_{i=1}^{d}(w_{ji} \cdot X_{ki})^{2}
\end{equation}

where $n$ is the number of instances in the training set and $d$ is the dimensionality.
Dividing each Fourier weight vector by the square root of the corresponding normalization factor ensures that the mean squared value of the weight matrix applied to the inputs is equal to 1.

This however yields lower accuracies.
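A minimal sketch of this normalization on made-up data follows. It uses only the real (cosine) part of a small DFT matrix and divides by the square root of the factor, since that is what makes the mean square exactly 1; both choices are assumptions made for illustration and are not necessarily what the experiment code did.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 200, 8
X = rng.randn(n, d)  # made-up training inputs

idx = np.arange(d)
W = np.cos(2 * np.pi * np.outer(idx, idx) / d)  # row j is the weight vector w_j

# factor_j = 1/(d*n) * sum_k sum_i (w_ji * X_ki)^2
products = X[:, None, :] * W[None, :, :]  # shape (n, d, d): index (k, j, i)
factor = (products ** 2).mean(axis=(0, 2))

# Rescale each weight vector by the square root of its factor
W_norm = W / np.sqrt(factor)[:, None]

check = ((X[:, None, :] * W_norm[None, :, :]) ** 2).mean(axis=(0, 2))
print(np.allclose(check, 1.0))
```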


## Bogus Complex Gradients Experiment

In [3]:
#some hyper-parameters
nhid = 100
learning_rate = 0.01
max_epochs = 200
cost = CECost()
    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

# Network topology
model = MLP(cost=cost)
model.add_layer(ComplexRelu(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125*2, odim=19, rng=rng))

lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

logger.info('Training started...')
tr_stats_fr, valid_stats_fr = optimiser.train(model, train_dp, valid_dp, fft=False)

logger.info('Testing the model on test set:')

tst_cost_fr, tst_accuracy_fr = optimiser.validate(model,test_dp )
logger.info('ACL test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy_fr*100., cost.get_name(), tst_cost_fr))



INFO:root:Training started...
INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 7.978. Accuracy is 5.01%
INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 8.063. Accuracy is 4.61%
INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 4.206. Accuracy is 24.33%
INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 2.765. Accuracy is 34.10%
INFO:mlp.optimisers:Epoch 1: Took 0 seconds. Training speed 16002 pps. Validation speed 60833 pps.
INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 2.278. Accuracy is 39.42%
INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 2.388. Accuracy is 39.64%
INFO:mlp.optimisers:Epoch 2: Took 0 seconds. Training speed 16002 pps. Validation speed 60833 pps.
INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 1.834. Accuracy is 46.39%
INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 2.045. Accuracy is 41.28%
INFO:mlp.optimisers:Epoch 3: Took 0 seconds. Training speed 15654 pps. Validation speed 60833 pps.
INFO:mlp.optimi

In [None]:
a = plt.plot(np.array(tr_stats_fr)[:,1])
b = plt.plot(np.array(valid_stats_fr)[:,1])
plt.legend([a[0], b[0]], ["train", "valid"], loc=3)
plt.xlabel("Epoch");
plt.ylabel("Accuracy");

In [None]:
statsfr  = [(tr_stats_fr, valid_stats_fr, (tst_cost_fr, tst_accuracy_fr))]
me=78
plot_stats(statsfr,'val cost vs train cost',
           shds=["baseline"], corr=('val_acc', 'train_acc') ,max_epochs=me, figsize= (15,10))


## Simple DFT Experiment

In [3]:
#some hyper-parameters
nhid = 100
learning_rate =0.005
max_epochs = 1000

cost = CECost()    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

#define the model
model = MLP(cost=cost)
model.add_layer(DFTPLinear(idim=125, odim=125, irange=1.6, rng=rng))
# Every activation function from dft layer produces two values (x,y) for x+iy
model.add_layer(Relu(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

# define the optimiser, here stochastic gradient descent
# with fixed learning rate and max_epochs
lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

logger.info('Training started...')
tr_stats_f, valid_stats_f = optimiser.train(model, train_dp, valid_dp)

logger.info('Testing the model on test set:')

tst_costf, tst_accuracyf = optimiser.validate(model,test_dp )
logger.info('ACL test set accuracy is %.2f %%, cost (%s) is %.3f'%
            (tst_accuracyf*100., cost.get_name(), tst_costf))

INFO:root:Training started...
INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 24.349. Accuracy is 5.11%
INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 24.198. Accuracy is 5.54%
INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 7.493. Accuracy is 22.81%
INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 2.306. Accuracy is 32.62%
INFO:mlp.optimisers:Epoch 1: Took 0 seconds. Training speed 20003 pps. Validation speed 26071 pps.
INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 2.066. Accuracy is 36.88%
INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 2.006. Accuracy is 35.96%
INFO:mlp.optimisers:Epoch 2: Took 1 seconds. Training speed 15002 pps. Validation speed 22812 pps.
INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 1.827. Accuracy is 40.61%
INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 1.933. Accuracy is 36.62%
INFO:mlp.optimisers:Epoch 3: Took 0 seconds. Training speed 18464 pps. Validation speed 36500 pps.
INFO:mlp.opti

## Spectral Autoencoder

Here the network learns the spectrum via pretraining, which serves as a natural way of relaxing the fixed Fourier weights. The approach is motivated by the universal approximation theorem: a sufficiently large network should be able to approximate the transform.
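The idea can be illustrated with a deliberately simplified stand-in: a single linear layer trained by gradient descent to regress the real and imaginary parts of the FFT of its own input. This is not the `spretrain` routine used below (whose internals are not shown here), and the data, sizes and learning rate are made up for the sketch.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 256, 8
X = rng.randn(n, d)  # made-up input segments

# Regression targets: the spectrum of each input as (real, imaginary) pairs
F = np.fft.fft(X, axis=1)
T = np.concatenate([F.real, F.imag], axis=1)  # shape (n, 2d)

# Linear layer trained with full-batch gradient descent
W = np.zeros((d, 2 * d))
lr = 0.5
for _ in range(200):
    residual = X @ W - T
    grad = X.T @ residual / n  # gradient of half the per-instance squared error
    W -= lr * grad

mse = np.mean((X @ W - T) ** 2)

# The learned weights should match the real form of the DFT matrix
idx = np.arange(d)
dft = np.exp(-2j * np.pi * np.outer(idx, idx) / d)
W_dft = np.concatenate([dft.real, dft.imag], axis=1)
print(mse, np.allclose(W, W_dft, atol=1e-6))
```

In this linear case the pretrained weights converge to the DFT matrix itself, so subsequent fine-tuning starts from the transform and is then free to move away from it.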

In [5]:
#some hyper-parameters
# regular autoencoder setup
nhid = 100
learning_rate = 0.0001
max_epochs = 200

cost = CECost()    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

#define the model
model = MLP(cost=cost)
model.add_layer(Sigmoid(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Sigmoid(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

mod, tr_stats_fr, valid_stats_fr = optimiser.pretrain(model, train_dp, valid_dp)

INFO:mlp.optimisers:Epoch 0: PreTraining cost (ce) for initial model is 2828.180. Accuracy is 0.81%
INFO:mlp.optimisers:Epoch 0: PreValidation cost (ce) for initial model is 2749.135. Accuracy is 0.88%
INFO:mlp.optimisers:Epoch 1: PreTraining cost (ce) is 2220.776. Accuracy is 0.81%
INFO:mlp.optimisers:Epoch 1: PreValidation cost (ce) is 1625.573. Accuracy is 0.82%
INFO:mlp.optimisers:Epoch 1: Took 0 seconds. PreTraining speed 26670 pps. Validation speed 45625 pps.
INFO:mlp.optimisers:Epoch 2: PreTraining cost (ce) is 1443.598. Accuracy is 0.78%
INFO:mlp.optimisers:Epoch 2: PreValidation cost (ce) is 1189.469. Accuracy is 0.60%
INFO:mlp.optimisers:Epoch 2: Took 0 seconds. PreTraining speed 31309 pps. Validation speed 91250 pps.
INFO:mlp.optimisers:Epoch 3: PreTraining cost (ce) is 1185.499. Accuracy is 0.61%
INFO:mlp.optimisers:Epoch 3: PreValidation cost (ce) is 1057.614. Accuracy is 0.71%
INFO:mlp.optimisers:Epoch 3: Took 0 seconds. PreTraining speed 31309 pps. Validation speed 60833

3


KeyboardInterrupt: 

In [14]:
#some hyper-parameters
# regular autoencoder setup
nhid = 100
learning_rate = 0.00001
# model_out.cost.get_name()
max_epochs = 500000

cost = CECost()    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

#define the model
model = MLP(cost=cost)
model.add_layer(Sigmoid(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Sigmoid(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

mod, tr_stats_fr, valid_stats_fr = optimiser.spretrain(model, train_dp)
# model_out.cost.get_name()
# 123251

INFO:mlp.optimisers:Epoch 0: SpecPreTraining cost (mse) for initial model is 2879.289. Accuracy is 0.78%
INFO:mlp.optimisers:Epoch 1: PreTraining cost (mse) is 345792.200. Accuracy is 93.71%
INFO:mlp.optimisers:Epoch 2: PreTraining cost (mse) is 322435.919. Accuracy is 98.01%
INFO:mlp.optimisers:Epoch 3: PreTraining cost (mse) is 299123.692. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 4: PreTraining cost (mse) is 277564.811. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 5: PreTraining cost (mse) is 258249.406. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 6: PreTraining cost (mse) is 240936.351. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 7: PreTraining cost (mse) is 225790.035. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 8: PreTraining cost (mse) is 212559.300. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 9: PreTraining cost (mse) is 200963.493. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 10: PreTraining cost (mse) is 190773.544. Accuracy is 99.19%
INFO:mlp.optimisers:Epoch 11: PreT

KeyboardInterrupt: 

In [None]:
#some hyper-parameters
nhid = 100
learning_rate = 0.01
max_epochs = 78
cost = CECost()
    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()

#define the model
model = MLP(cost=cost)
model.add_layer(DFTLinear(idim=125, odim=125, irange=1.6, rng=rng))
#Every activation function from dft layer produces two values (x,y) for x+iy
model.add_layer(Relu(idim=125*2, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

# define the optimiser, here stochastic gradient descent
# with fixed learning rate and max_epochs
lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

logger.info('Training started...')
tr_stats_f, valid_stats_f = optimiser.train(model, train_dp, valid_dp)

logger.info('Testing the model on test set:')

tst_costf, tst_accuracyf = optimiser.validate(model,test_dp )
logger.info('ACL test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracyf*100., cost.get_name(), tst_costf))

In [None]:
#some hyper-parameters
nhid = 100
learning_rate = 0.01
max_epochs = 78
cost = CECost()
    
stats = list()

test_dp = deepcopy(valid_dp)
train_dp.reset()
valid_dp.reset()
test_dp.reset()


#define the model
model = MLP(cost=cost)
model.add_layer(DFTAugLinear(idim=125, odim=125, irange=1.6, rng=rng))
model.add_layer(Relu(idim=125*3, odim=125, irange=1.6, rng=rng))
model.add_layer(Softmax(idim=125, odim=19, rng=rng))

# define the optimiser, here stochastic gradient descent
# with fixed learning rate and max_epochs
lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)
optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)

logger.info('Training started...')
tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)

logger.info('Testing the model on test set:')

tst_cost, tst_accuracy = optimiser.validate(model,test_dp )
logger.info('ACL test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))

## Assessing Performance of Devices (Shapley Value)

Since the dataset covers 5 different body positions, each with 3 different devices that each give 3 different readings, it offers a large number of combinations, so there is room to study how the devices cooperate with each other.


The Shapley value from game theory is a useful metric for studying how players cooperate in a particular game. Treating the devices as players, this value provides a way of distributing the total gain in the classification process (the game) among the devices involved.

So far my implementation of this is slow and needs some work. I managed to test it on the right-hand device, where each player was an axis of the device. This small experiment showed that the y-axis was the one with the highest gain, which is what one would expect.
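For reference, an exact Shapley computation over a small set of players can be sketched as below. The function name `shapley_values` is hypothetical, and the accuracy table is invented purely to exercise the code; it is not a result from the experiments. In practice each entry would come from training a classifier on that subset of axes, which is what makes the exact computation slow.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

# Invented accuracies for every subset of axes (illustrative only)
toy_accuracy = {
    frozenset(): 0.0,
    frozenset("x"): 0.30,
    frozenset("y"): 0.50,
    frozenset("z"): 0.20,
    frozenset("xy"): 0.60,
    frozenset("xz"): 0.40,
    frozenset("yz"): 0.55,
    frozenset("xyz"): 0.70,
}

phi = shapley_values("xyz", toy_accuracy.__getitem__)
print(phi)
```

The values always sum to the accuracy of the full coalition, so they can be read as each axis's share of the total gain. The cost grows factorially with the number of players, so extending this to all devices would likely need a sampling approximation over orderings.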

## Next Steps

* Draw well-supported conclusions on the Fourier transform based attempts. This needs more supporting plots and tests on the other dataset;
* Use standard pretraining techniques and other methods to optimize both the baselines and the Fourier network;
* Apply certain time series models (ARMA, ARIMA, ...) for feature extraction purposes and contrast against a convnet;
* STFT layer in a convnet;
* Extend the Shapley value to look at devices as players in the classification process;
* Look into seasonal trends in time series and at the SANN network architecture.


## STFT Architecture

![](https://raw.githubusercontent.com/franciscovargas/MLPHonoursExtension/master/3.png)

### Motivation

For dynamic processes that change over time it sometimes makes sense to window the signal and take the Fourier transform of each subsegment, since the periodicity may also evolve in time. A spectrogram thus becomes a good candidate for a representation in the frequency domain.
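A minimal NumPy sketch of such a windowed transform is given below. The function name `stft_magnitude`, the 1-second frame, the 5-sample hop and the Hann window are all illustrative assumptions, not settings taken from the planned architecture.

```python
import numpy as np

def stft_magnitude(x, frame_len=25, hop=5):
    """Magnitude spectrogram: one windowed FFT per overlapping frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

fs = 25                                 # sampling rate in Hz, as in the dataset
t = np.arange(125) / fs                 # one 5-second segment
signal = np.sin(2 * np.pi * 5 * t)      # made-up 5 Hz test tone

spec = stft_magnitude(signal)
print(spec.shape)  # (21, 13): 21 frames, 13 frequency bins
```

With a 25-sample frame at 25 Hz each frequency bin is 1 Hz wide, so the test tone peaks in bin 5 of every frame. For a signal whose periodicity drifts, the peak would move from frame to frame, which is the structure a convnet over the spectrogram could pick up.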