# Practical 2: Features
Before you start this practical, follow the same steps as in Practical 1 to annotate your own camera data.

In [None]:
%pylab inline
import pandas as pd
import numpy as np
from IPython.display import set_matplotlib_formats
from scipy import stats as st
import warnings
warnings.filterwarnings("ignore")

### Loading Data

First we will load the raw accelerometer data stored in your data folder. 

In [None]:
# Define the paths to 'myAcc.csv.gz' and the annotations file;
# change them where necessary.

dataDir= '../data/'
rawPath = dataDir + 'myAcc.csv.gz'
annoPath = dataDir + 'my-annotations.csv'

In [None]:
# Read the raw acceleration file.
# pandas reads gzip-compressed CSVs directly, so 'myAcc.csv.gz' does not need to be unzipped first.
raw = pd.read_csv(rawPath)
# Drop everything after '+' (the timezone suffix) before parsing the timestamps.
raw['time'] = pd.to_datetime(raw['time'].str.split('+').str[0])

# Read the annotation file you made.
# The parser drops the last four characters of each timestamp string before parsing.
date_parser = lambda ts: pd.to_datetime([s[:-4] for s in ts])
annoData = pd.read_csv(annoPath, parse_dates=['startTime', 'endTime'], date_parser=date_parser)

## Part 1: Exploration

We first want to explore the raw data we have. A good first step is to plot the acceleration traces.

### Plot traces

In [None]:
# Plot traces for your first 5 minutes of wearing the accelerometer
timesteps = 100 * 60 * 5 # i.e. 100Hz x 60 seconds/min x 5 mins

set_matplotlib_formats('pdf', 'svg')
pylab.rcParams['figure.figsize'] = (12, 3)

acc = raw[['x', 'y', 'z']][:timesteps]
acc.plot()

### Merging annotations 


We can combine the annotations you made using the browser with the raw acceleration traces by comparing their timestamps. 

In [None]:
raw['annotation'] = 'undefined'

for i, row in annoData.iterrows():
    start, end = row['startTime'].tz_localize(None), row['endTime'].tz_localize(None)
    raw.loc[(raw['time'] > start) & (raw['time'] < end), 'annotation'] = row['annotation']
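
To see what the loop above does, here is a minimal sketch on synthetic data. The frames `demo_raw` and `demo_anno` and their values are made up for illustration; only the column names follow the ones used in this practical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `raw`: ten samples, one per second (hypothetical values)
demo_raw = pd.DataFrame({
    'time': pd.Timestamp('2020-01-01') + pd.to_timedelta(np.arange(10), unit='s'),
    'x': np.arange(10),
})

# Synthetic stand-in for `annoData`: one labelled interval (hypothetical values)
demo_anno = pd.DataFrame({
    'startTime': pd.to_datetime(['2020-01-01 00:00:02']),
    'endTime': pd.to_datetime(['2020-01-01 00:00:06']),
    'annotation': ['walking'],
})

# Every sample starts as 'undefined' and is overwritten when it falls
# strictly inside an annotated interval.
demo_raw['annotation'] = 'undefined'
for _, row in demo_anno.iterrows():
    in_interval = (demo_raw['time'] > row['startTime']) & (demo_raw['time'] < row['endTime'])
    demo_raw.loc[in_interval, 'annotation'] = row['annotation']

label_counts = demo_raw['annotation'].value_counts().to_dict()
print(label_counts)
```

Because the comparison is strict, samples that fall exactly on the start or end time of an interval keep the label 'undefined'.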

### Exercise
Can you pick an activity and plot an example acceleration trace for it? 

## Part 2: Preprocessing

In this part, we attempt to process your acceleration traces into typical features used in machine learning. 

<!-- The following exercises consists of dividing your acceleration trace into fixed time windows and creating features out of them.  -->

### 1. Sliding window

Let's start by running a sliding window function over your acceleration time series. We can produce data in windows of 30 seconds.

In [None]:
import preprocessing as pre

freq = 100 # 100Hz
SLIDING_WINDOW_SIZE = 30 * freq # 30 seconds
SLIDING_WINDOW_STEP = SLIDING_WINDOW_SIZE # no overlapping between windows.

cols  = list(raw)
sliding_datasets = {}
slide = pre.sliding_window(raw.values, (SLIDING_WINDOW_SIZE,1),  
                         ss=(SLIDING_WINDOW_STEP,1), flatten=False)

# reshape sliding output to desired shape: (# windows, # features, # window length)
slide = slide.reshape(-1, len(cols), SLIDING_WINDOW_SIZE)

print('Data shape: ', slide.shape)

We can check that the number of windows equals the length of our raw accelerometer trace divided into 30-second windows:

In [None]:
print('Our raw accelerometer trace is {} seconds long, which gives {} 30-second windows.'.format(
    int(len(raw)/freq), int(len(raw)/freq/30)))
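
If you want to see the windowing idea without the `preprocessing` module, the sketch below builds non-overlapping windows with plain NumPy on synthetic data. It is not the implementation of `pre.sliding_window`. It only illustrates the target shape, and all names in it are made up for the example.

```python
import numpy as np

freq_demo = 100                      # assumed sampling rate in Hz
window_size = 30 * freq_demo         # 30-second windows

# 95 seconds of synthetic three-axis data, shape (samples, axes)
signal = np.random.default_rng(0).normal(size=(95 * freq_demo, 3))

# Keep only whole windows; the trailing partial window is dropped.
n_windows = len(signal) // window_size
trimmed = signal[:n_windows * window_size]

# (n_windows, window_size, axes) -> (n_windows, axes, window_size)
demo_windows = trimmed.reshape(n_windows, window_size, 3).transpose(0, 2, 1)
print(demo_windows.shape)
```

With a step equal to the window size, each sample ends up in at most one window, and the incomplete window at the end is discarded.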

We now want to extract from each window the sensor time series, timestamp, and annotation.

In [None]:
# sensor readings windows
sensor_cols = [cols.index(i) for i in ['x', 'y', 'z']]
windows = slide[:, sensor_cols, :].astype(float)  # np.float was removed in NumPy 1.24

# timestamp: we use the start time of each window
windows_time = slide[:, cols.index('time'), 0]

# annotation: we use the annotation at the start of each window
windows_anno = slide[:, cols.index('annotation'), 0]

print('Shape of windowed sensor readings: {}'.format(windows.shape))
print('Shape of timestamps: {}, annotations: {}'.format(windows_time.shape, windows_anno.shape))

### Task:
Define `windows_anno` instead by selecting the _dominant_ annotation within each window, i.e. the label that occurs most often within each 30-second window.
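
One possible approach, shown here on a tiny synthetic label array rather than on your data, is to count the labels in each window and keep the most frequent one. The helper `dominant_label` and the array `demo_labels` are hypothetical names for this sketch.

```python
import numpy as np

def dominant_label(labels):
    """Return the label that occurs most often in a 1-D array of labels."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

# Two synthetic "windows" of four labels each (real windows hold 3000 labels)
demo_labels = np.array([
    ['sitting', 'sitting', 'walking', 'sitting'],
    ['walking', 'walking', 'walking', 'undefined'],
])

demo_dominant = np.array([dominant_label(w) for w in demo_labels])
print(demo_dominant)
```

In case of a tie, this sketch returns the label that comes first in sorted order, which may or may not be what you want.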

### 2. Processing into features

A common way to handle acceleration time series is to apply the Discrete Fourier Transform (DFT) to the raw data. The transformed signal is then used in the feature extraction stage.

In [None]:
windows_fft = np.fft.fft(windows)
fftr = windows_fft.real.astype(float)  # keep only the real part of the DFT
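
As a quick sanity check of what `np.fft.fft` does on windowed data, the sketch below transforms a synthetic 2 Hz sine wave arranged in the same (windows, axes, samples) layout. The signal and all variable names are assumptions made for the example.

```python
import numpy as np

fs = 100                              # assumed sampling rate in Hz
n = 30 * fs                           # samples in one 30-second window
t = np.arange(n) / fs
sine = np.sin(2 * np.pi * 2.0 * t)    # synthetic 2 Hz oscillation

# One window with two "axes", shape (1, 2, 3000)
demo_batch = np.stack([sine, 0.5 * sine])[np.newaxis]

# np.fft.fft transforms along the last axis by default, i.e. per window and axis
spectrum = np.fft.fft(demo_batch)
freqs = np.fft.fftfreq(n, d=1 / fs)

# The magnitude spectrum peaks at the oscillation frequency
magnitude = np.abs(spectrum)
peak_hz = abs(freqs[np.argmax(magnitude[0, 0])])
print(spectrum.shape, peak_hz)
```

Note that taking only the real part, as in the cell above, discards the phase information in the imaginary part. The magnitude `np.abs(...)` is a common alternative if you want features that do not depend on where in the window an oscillation starts.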

### Extracting features

With the windowed raw and DFT-transformed data prepared, we are ready to extract features from them.

Some commonly used features are listed below. 
1. The mean of the raw signal
2. The standard deviation of the raw signal
3. The median of the transformed signal
4. The first quartile (Q1) of the transformed signal
5. The third quartile (Q3) of the transformed signal
6. The skewness of the transformed signal
7. The kurtosis of the transformed signal
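
The features listed above can each be computed along the last (time) axis of a windowed array. The following is a minimal sketch on random synthetic data, not a full solution to the task below; the arrays `feat_windows` and `feat_fftr` and the dictionary keys are hypothetical names.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)
feat_windows = rng.normal(size=(4, 3, 3000))   # synthetic (windows, axes, samples)
feat_fftr = np.fft.fft(feat_windows).real      # real part of the DFT, as above

# Each entry has shape (windows, axes): one value per window and axis
features = {
    'mean': feat_windows.mean(axis=-1),
    'std': feat_windows.std(axis=-1),
    'median': np.median(feat_fftr, axis=-1),
    'q1': np.percentile(feat_fftr, 25, axis=-1),
    'q3': np.percentile(feat_fftr, 75, axis=-1),
    'skew': st.skew(feat_fftr, axis=-1),
    'kurtosis': st.kurtosis(feat_fftr, axis=-1),
}

# Stack side by side: 7 features x 3 axes = 21 columns per window
feature_matrix = np.concatenate(list(features.values()), axis=1)
print(feature_matrix.shape)
```

Turning `feature_matrix` into a dataframe with named columns, the annotation column and a time index is left to you.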

### Task
Extract the above features for your data. Present them as a dataframe with 22 columns (7 features * 3 axes + 1 annotation) and time as its index, which should look like the following. Save your dataframe as a CSV file at 'wearables/data/my_dataset.csv'.

<img src="./df.png">

