# Timeseries
## Instructions:
* Go through the notebook and complete the tasks. 
* Make sure you understand the examples given. If you need help, refer to the Essential readings or the documentation link provided, or go to the Topic 8 discussion forum. 
* Save your notebooks when you are done.

This notebook introduces you to the Pandas library, which is useful when working with time series in Python. For further details on Pandas, have a read through the documentation: https://pandas.pydata.org/pandas-docs/stable/. The '10 minutes to pandas' guide provides a good introduction to the library: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
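As a minimal illustration of why pandas suits time series, the sketch below builds a short, made-up 5 Hz signal with a `DatetimeIndex` (the same sampling rate as the data used later), selects a time range by timestamp, and averages it into 1-second bins. The values and the name `acc_y` are placeholders, not taken from the notebook's dataset.

```python
import numpy as np
import pandas as pd

# Made-up signal: 10 samples at 5 Hz (one sample every 200 ms), starting at 10:00
idx = pd.date_range(start="2020-01-01 10:00:00", periods=10, freq="200ms")
s = pd.Series(np.arange(10, dtype=float), index=idx, name="acc_y")

# Select by timestamp; label-based slices include both endpoints
first_second = s.loc[pd.Timestamp("2020-01-01 10:00:00"):pd.Timestamp("2020-01-01 10:00:00.800")]
print(len(first_second))  # 5

# Downsample into 1-second bins by averaging the samples in each bin
per_second = s.resample("1s").mean()
print(per_second.tolist())  # [2.0, 7.0]
```

Because the index carries real timestamps, operations such as slicing and resampling can be expressed in units of time rather than sample positions.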


**Task 1:**

Load in some time series data (you can use the artificial 'sit_stand_walk.csv') and plot it alongside its ground truth.


In [None]:
# You just need to run this cell

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

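data = pd.read_csv('sit_stand_walk.csv')
# The original data is sampled at 5 Hz (a period of 200 ms). Let's say it starts on 1 January 2020 at 10am.
# date_range builds evenly spaced timestamps; '200ms' means one sample every 200 milliseconds.
data.index = pd.date_range(start='2020-01-01 10:00:00', periods=len(data), freq='200ms')

series = data['acc_y']
ground = data['ground']
classes = {c: i for i, c in enumerate(np.unique(ground))}
print('class map: %s' % classes)
# Replace the character ground-truth labels with integers.
# map() returns a new Series, so the result is reassigned to ground.
ground = ground.map(classes)

# Plot the signal and its ground truth on two panels that share the time axis
fig, (ax_signal, ax_ground) = plt.subplots(2, 1, sharex=True)
series.plot(ax=ax_signal, linestyle='-', color='k', legend=True)
ground.plot(ax=ax_ground, color='r', legend=True)
plt.show()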
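The label-encoding step above can be tried out on a tiny made-up frame. The label strings 'sit', 'stand' and 'walk' are assumptions about what the CSV contains; the point is that sorting the unique labels gives alphabetical integer codes, and that `map()` produces a new Series while the DataFrame's own column keeps its strings.

```python
import pandas as pd

# Made-up stand-in for sit_stand_walk.csv; the label strings are assumptions
data = pd.DataFrame({
    "acc_y": [0.1, 0.2, 0.1, 0.9, 1.1, 0.4, 0.5],
    "ground": ["sit", "sit", "sit", "walk", "walk", "stand", "stand"],
})
data.index = pd.date_range(start="2020-01-01 10:00:00", periods=len(data), freq="200ms")

# Sorting the unique labels gives alphabetical integer codes
classes = {label: code for code, label in enumerate(sorted(data["ground"].unique()))}
print(classes)  # {'sit': 0, 'stand': 1, 'walk': 2}

# map() returns a new integer Series; the DataFrame's own column is left unchanged
ground = data["ground"].map(classes)
print(ground.tolist())  # [0, 0, 0, 2, 2, 1, 1]
```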


**Task 2:**
Create a sliding (rolling) window over the data to extract features such as the mean and standard deviation. Use the pandas ```rolling``` method and experiment with different window lengths.


In [None]:
F = pd.DataFrame()
wnd_samples = 10  # window length in samples (10 samples = 2 s at 5 Hz)
F['mean'] = series.rolling(window=wnd_samples).mean()
#F['std'] = # Your code here

# The first wnd_samples - 1 rows have no complete window and are NaN; fill them in (e.g. with .fillna(..))
# Your code here

F.plot()
plt.show()
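One possible way to approach Task 2, sketched on a made-up 20-sample signal rather than the real data: compute the rolling mean and standard deviation for a few window lengths and back-fill the leading NaN rows. `rolling_features` is a hypothetical helper name. pandas windows are trailing (right-aligned) by default, so each feature summarises the current sample and the `wnd_samples - 1` samples before it; passing `center=True` to `rolling` would centre the window instead.

```python
import numpy as np
import pandas as pd

# Made-up signal: 20 samples at 5 Hz, flat for 2 s and then alternating +1/-1
idx = pd.date_range("2020-01-01 10:00:00", periods=20, freq="200ms")
series = pd.Series(np.r_[np.zeros(10), np.tile([1.0, -1.0], 5)], index=idx, name="acc_y")

def rolling_features(s, wnd_samples):
    """Mean and standard deviation over a trailing window of wnd_samples samples."""
    roll = s.rolling(window=wnd_samples)
    F = pd.DataFrame({"mean": roll.mean(), "std": roll.std()})
    # Rows before the first complete window are NaN; bfill() copies the
    # first complete value backwards over them
    return F.bfill()

# Compare window lengths: longer windows smooth more but react more slowly
for w in (2, 4, 8):
    F = rolling_features(series, w)
    print(w, int(F.isna().sum().sum()), round(float(F["std"].iloc[-1]), 3))
```

Back-filling is only one choice; dropping the incomplete rows with `.dropna()` (and dropping the matching ground-truth rows) is another.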

**Task 3:** 
Use these features to build a simple classifier (e.g. using ```sklearn```). Split the dataset in half to simulate separate training and test sets, but do not shuffle the data, as the sequencing information is important.



In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Create timestamp indices for the test data (first half) and the training data (second half)
test_idx = F.index[np.arange(0,int(0.5*len(F)))]

# ...Your code here

# Create a simple decision tree, and fit the training data to it

# ...Your code here

# Generate predictions and plot them against the ground truth

# ...Your code here

# Also generate precision and recall values

# ... Your code here
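A minimal sketch of Task 3 on synthetic data (the notebook's CSV is not reproduced here): three made-up activities with different offsets and variability, arranged so that each half of the recording contains all three. Like the template above, it uses the first half for testing and the second half for training, without shuffling. The signal model, the label codes and the `random_state` values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Made-up recording: label codes 0=sit, 1=stand, 2=walk, 50 samples per segment
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2, 0, 1, 2], 50)
offset = np.array([0.0, 1.0, 0.0])[labels]            # standing sits at a higher level
wave = np.where(labels == 2, np.sin(np.arange(len(labels)) * 2.0), 0.0)  # walking oscillates
idx = pd.date_range("2020-01-01 10:00:00", periods=len(labels), freq="200ms")
series = pd.Series(offset + wave + rng.normal(0, 0.1, len(labels)), index=idx, name="acc_y")
ground = pd.Series(labels, index=idx)

# Rolling-window features, with the leading NaN rows back-filled
wnd_samples = 10
roll = series.rolling(window=wnd_samples)
F = pd.DataFrame({"mean": roll.mean(), "std": roll.std()}).bfill()

# Chronological split: first half for testing, second half for training
half = len(F) // 2
test_idx, train_idx = F.index[:half], F.index[half:]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(F.loc[train_idx], ground.loc[train_idx])
pred = pd.Series(clf.predict(F.loc[test_idx]), index=test_idx)

# Per-class precision and recall on the held-out first half
print(classification_report(ground.loc[test_idx], pred, target_names=["sit", "stand", "walk"]))
```

Keeping `pred` indexed by timestamp makes it easy to plot on the same axes as `ground.loc[test_idx]`.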

**Task 4:** Now go back to the original data (without extracted features) and train a similar classifier directly on it (remember to split it into training and test sets in the same way). Plot the predictions alongside the ground truth, and run a precision-recall evaluation.

In [None]:

# test_idx and train_idx are the timestamp indices created in Task 3
X_original_test = series.loc[test_idx].to_frame()
X_original_train = series.loc[train_idx].to_frame()

# ... Your code here
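A sketch of Task 4 that regenerates a synthetic signal so the block runs on its own. Each sample is classified from its single raw `acc_y` value, so the classifier has no information about neighbouring samples. The data, label codes and parameters are made up; `precision_recall_fscore_support` from scikit-learn returns per-class precision and recall as arrays.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support

# Made-up recording: label codes 0=sit, 1=stand, 2=walk, 50 samples per segment
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2, 0, 1, 2], 50)
offset = np.array([0.0, 1.0, 0.0])[labels]
wave = np.where(labels == 2, np.sin(np.arange(len(labels)) * 2.0), 0.0)
idx = pd.date_range("2020-01-01 10:00:00", periods=len(labels), freq="200ms")
series = pd.Series(offset + wave + rng.normal(0, 0.1, len(labels)), index=idx, name="acc_y")
ground = pd.Series(labels, index=idx)

# Chronological split: first half for testing, second half for training
half = len(series) // 2
test_idx, train_idx = series.index[:half], series.index[half:]

# One raw value per row: no temporal context is available to the classifier
X_original_test = series.loc[test_idx].to_frame()
X_original_train = series.loc[train_idx].to_frame()

clf_raw = DecisionTreeClassifier(random_state=0)
clf_raw.fit(X_original_train, ground.loc[train_idx])
pred_raw = pd.Series(clf_raw.predict(X_original_test), index=test_idx)

precision, recall, _, _ = precision_recall_fscore_support(
    ground.loc[test_idx], pred_raw, labels=[0, 1, 2], zero_division=0
)
for name, p, r in zip(["sit", "stand", "walk"], precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```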


**Task 5:**
Compare the results from your two classifiers: the one trained on the raw data and the one trained on the sliding-window features. What differences do you notice in precision and recall? If you plot the outputs, what do you notice about the temporal distribution of the errors? Which method seems more appropriate for a classifier that aims to robustly detect sitting, standing, and walking?

Use the discussion forum for this topic to discuss your answers.
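One way to make "temporal distribution of the errors" concrete is to count how often a prediction sequence switches label and how long the runs of consecutive errors are. The helpers `count_switches` and `error_run_lengths` are hypothetical names, and the toy sequences are made up: both predictions contain two errors, but in one they are isolated and in the other they form a single block. Applied to the test-set predictions from Tasks 3 and 4, such counts could give a rough quantitative basis for the discussion.

```python
from itertools import groupby

def count_switches(labels):
    """Number of positions where a label differs from the one before it."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def error_run_lengths(y_true, y_pred):
    """Lengths of consecutive runs of misclassified samples."""
    wrong = [t != p for t, p in zip(y_true, y_pred)]
    return [sum(1 for _ in run) for is_wrong, run in groupby(wrong) if is_wrong]

# Made-up sequences: both predictions contain exactly two errors
y_true      = [0, 0, 0, 0, 1, 1, 1, 1]
y_scattered = [0, 1, 0, 0, 1, 0, 1, 1]  # isolated errors, frequent label flicker
y_clustered = [0, 0, 1, 1, 1, 1, 1, 1]  # errors grouped next to the transition

print(count_switches(y_scattered), error_run_lengths(y_true, y_scattered))  # 5 [1, 1]
print(count_switches(y_clustered), error_run_lengths(y_true, y_clustered))  # 1 [2]
```

With predictions stored as pandas Series, passing `list(pred)` and `list(ground.loc[test_idx])` to these helpers would work the same way.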