<a href="https://colab.research.google.com/github/L4ncelot1024/Learn_Deep_Learning_Le_Wagon/blob/main/Day5/Project_A_Time_Series_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this project you will build a model for human activity recognition.

Human activity recognition is the problem of classifying sequences of accelerometer data, recorded by specialized harnesses or smartphones, into known, well-defined movements.

In [None]:
  %tensorflow_version 2.x

In [None]:
import pandas as pd
import numpy as np

from tensorflow.keras.utils import to_categorical

# Data

In [None]:
# Download the data
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip

# Unzip the data 
!unzip UCI\ HAR\ Dataset.zip

--2021-05-20 21:33:45--  https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60999314 (58M) [application/x-httpd-php]
Saving to: ‘UCI HAR Dataset.zip.2’


2021-05-20 21:33:46 (73.7 MB/s) - ‘UCI HAR Dataset.zip.2’ saved [60999314/60999314]


In [None]:
# Rename the unzipped dataset folder to HARDataset (the name the helper functions expect)
!mv UCI\ HAR\ Dataset HARDataset

In [None]:
# Helper functions to load the data


# load a single file as a numpy array
def load_file(filepath):
	dataframe = pd.read_csv(filepath, header=None, sep=r'\s+')
	return dataframe.values


# load a list of files into a 3D array of [samples, timesteps, features]
def load_group(filenames, prefix=''):
	loaded = list()
	for name in filenames:
		data = load_file(prefix + name)
		loaded.append(data)
	# stack group so that features are the 3rd dimension
	loaded = np.dstack(loaded)
	return loaded

# load a dataset group, such as train or test
def load_dataset_group(group, prefix=''):
	filepath = prefix + group + '/Inertial Signals/'
	# load all 9 files as a single array
	filenames = list()
	# total acceleration
	filenames += ['total_acc_x_'+group+'.txt', 'total_acc_y_'+group+'.txt', 'total_acc_z_'+group+'.txt']
	# body acceleration
	filenames += ['body_acc_x_'+group+'.txt', 'body_acc_y_'+group+'.txt', 'body_acc_z_'+group+'.txt']
	# body gyroscope
	filenames += ['body_gyro_x_'+group+'.txt', 'body_gyro_y_'+group+'.txt', 'body_gyro_z_'+group+'.txt']
	# load input data
	X = load_group(filenames, filepath)
	# load class output
	y = load_file(prefix + group + '/y_'+group+'.txt')
	return X, y

# load the dataset, returns train and test X and y elements
def load_dataset(prefix=''):
	# load all train
	trainX, trainy = load_dataset_group('train', prefix + 'HARDataset/')
	print(trainX.shape, trainy.shape)
	# load all test
	testX, testy = load_dataset_group('test', prefix + 'HARDataset/')
	print(testX.shape, testy.shape)
	# zero-offset class values
	trainy = trainy - 1
	testy = testy - 1
	# one hot encode y
	trainy = to_categorical(trainy)
	testy = to_categorical(testy)
	print(trainX.shape, trainy.shape, testX.shape, testy.shape)
	return trainX, trainy, testX, testy

In [None]:
trainX, trainy, testX, testy = load_dataset()

(7352, 128, 9) (7352, 1)
(2947, 128, 9) (2947, 1)
(7352, 128, 9) (7352, 6) (2947, 128, 9) (2947, 6)


There are three main signal types in the raw data: total acceleration, body acceleration, and body gyroscope. Each has 3 axes of data (x, y, z), so there are nine variables in total for each time step.
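To make the nine variables concrete, here is a small sketch of the channel order implied by `load_dataset_group` (total acceleration, then body acceleration, then body gyroscope, each as x, y, z). The names `CHANNELS` and `channel_index` are illustrative and not part of the dataset, and a zero-filled array stands in for `trainX`.

```python
import numpy as np

# Channel order as stacked by load_dataset_group (assumed from the helper above).
SIGNALS = ["total_acc", "body_acc", "body_gyro"]
AXES = ["x", "y", "z"]
CHANNELS = [f"{s}_{a}" for s in SIGNALS for a in AXES]

def channel_index(name):
    """Return the position of a channel along the last axis of X."""
    return CHANNELS.index(name)

# Stand-in for trainX with the same layout: [samples, timesteps, features].
X_demo = np.zeros((4, 128, 9))

# Select a single channel for every window: result is [samples, timesteps].
gyro_z = X_demo[:, :, channel_index("body_gyro_z")]
print(len(CHANNELS), gyro_z.shape)
```

With the real data, replacing `X_demo` by `trainX` should give one 128-step series per window for the chosen channel.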

Further, each series of data has been partitioned into overlapping windows of 2.56 seconds, or 128 time steps (which implies a 50 Hz sampling rate). Each window corresponds to one row of the engineered-feature files shipped with the dataset (`X_train.txt` and `X_test.txt`).
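The windowing has already been done for you, but the idea can be sketched as follows. This is only an illustration on a synthetic signal: the 50% overlap (`STEP`) is an assumption of this sketch, and `sliding_windows` is a hypothetical helper, not a function from the dataset or the notebook.

```python
import numpy as np

WINDOW = 128          # time steps per window (from the text)
DURATION_S = 2.56     # seconds per window (from the text)
STEP = WINDOW // 2    # assumption: 50% overlap between consecutive windows

def sliding_windows(signal, window=WINDOW, step=STEP):
    """Split a 1D signal into overlapping windows of fixed length."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

# Implied sampling rate: 128 steps over 2.56 seconds.
sampling_rate_hz = round(WINDOW / DURATION_S)

# Synthetic signal of 512 samples, used only for illustration.
signal = np.arange(512, dtype=float)
windows = sliding_windows(signal)
print(sampling_rate_hz, windows.shape)
```

Because consecutive windows share samples, neighbouring rows of `trainX` are not independent observations, which is worth keeping in mind when splitting the data.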

This means that one window of data has 128 × 9 = 1,152 elements.
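If a model expects one flat vector per sample, the arithmetic above translates directly into a reshape. The sketch below uses a small random array as a stand-in for `trainX`; the variable names are illustrative.

```python
import numpy as np

# Stand-in for trainX: a few windows of 128 time steps x 9 channels.
X_demo = np.random.default_rng(0).normal(size=(5, 128, 9))

# Flatten each window into one row, e.g. for models that expect 2D input.
X_flat = X_demo.reshape(len(X_demo), -1)
print(X_flat.shape)

# The reshape is lossless: the original layout can be restored.
X_back = X_flat.reshape(-1, 128, 9)
print(np.array_equal(X_demo, X_back))
```

Sequence models (recurrent or 1D convolutional layers) would instead consume the 3D array directly.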

We provide helper functions to read the data and build everything you need for your project:
- the train and test inputs, as `np.ndarray`s of time series;
- their corresponding labels, one-hot encoded.
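Since the labels come back one-hot encoded, it can help to turn them back into class ids and names when exploring the data. The sketch below is one possible way to do this: the `ACTIVITIES` list is assumed to match `HARDataset/activity_labels.txt` (check the file before relying on it), and `decode_labels` and `class_counts` are hypothetical helpers.

```python
import numpy as np

# Assumed to match HARDataset/activity_labels.txt after the zero-offset shift.
ACTIVITIES = ["WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
              "SITTING", "STANDING", "LAYING"]

def decode_labels(y_onehot):
    """Convert one-hot rows back to integer class ids."""
    return np.argmax(y_onehot, axis=1)

def class_counts(y_onehot):
    """Count samples per activity name."""
    ids = decode_labels(y_onehot)
    counts = np.bincount(ids, minlength=len(ACTIVITIES))
    return dict(zip(ACTIVITIES, counts.tolist()))

# Tiny hand-made one-hot matrix standing in for trainy.
y_demo = np.eye(6)[[0, 0, 3, 5, 5, 5]]
print(decode_labels(y_demo))
print(class_counts(y_demo))
```

Calling `class_counts(trainy)` on the real labels is a quick way to check whether the classes are balanced.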

# Time to play around with the data!

Here is a suggested step-by-step workflow to follow to address a modeling project:

* **Display your dataset (visually!)**: plot or print your "observations". What are the observations here? What are the labels? How can you display them?
* **Task**: Think about your task and how you want to solve it!
* **Modeling**: start with a simple baseline to get an end-to-end working model; you can think about how to improve it later.
* **Fit**: your model might have hyper-parameters you want to optimize, so hold a validation set out of your training data!
* **Assess model performance**: which metrics should you use?
* **Test your model**: you have a test data set for that!
* **Improve your model**: why is your model not yet good enough? Is it under-fitting or over-fitting? See below for a list of things you could try.
* **Challenge your results**: what else could you do? Which information are you discarding that could be important?
* **Present your results**:
* **Think beyond this project!**: find another application that uses the same kind of model and could be interesting, and share it with others!
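As one possible starting point for the modeling, fitting and assessment steps above, here is a minimal sketch of a hold-out validation split, a very simple baseline, and two metrics. It is not the expected solution: the nearest-centroid classifier is a deliberately naive stand-in for a real model, all function names are hypothetical, and the data is synthetic so that the cell runs on its own.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle and hold out a fraction of the training data for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def fit_centroids(X, y_ids, n_classes):
    """Nearest-centroid baseline: one mean vector per class on flattened windows."""
    X_flat = X.reshape(len(X), -1)
    return np.stack([X_flat[y_ids == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    """Assign each window to the class with the closest centroid."""
    X_flat = X.reshape(len(X), -1)
    dists = ((X_flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Synthetic stand-in for (trainX, trainy): 6 classes, 50 windows each,
# where class c is Gaussian noise shifted by c (an assumption for illustration).
rng = np.random.default_rng(42)
n_classes = 6
y_ids = np.repeat(np.arange(n_classes), 50)
X = rng.normal(size=(len(y_ids), 128, 9)) + y_ids[:, None, None]
y_onehot = np.eye(n_classes)[y_ids]

X_tr, y_tr, X_val, y_val = train_val_split(X, y_onehot)
centroids = fit_centroids(X_tr, y_tr.argmax(axis=1), n_classes)

val_true = y_val.argmax(axis=1)
val_pred = predict(centroids, X_val)
accuracy = (val_pred == val_true).mean()
cm = confusion_matrix(val_true, val_pred, n_classes)
print(f"validation accuracy: {accuracy:.2f}")
print(cm)
```

On the real data, `trainX` and `trainy` would replace the synthetic arrays, and the test set should only be used once the model is chosen. Because the windows overlap, a random split may place near-identical windows in both the training and validation sets; splitting by subject (the dataset ships `subject_train.txt`) would be a stricter alternative.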