# **Final Project ML for Time Series**

### **Subject**: *A Symbolic Representation of Time Series, with Implications for Streaming Algorithms*, Jessica Lin, Eamonn Keogh, Stefano Lonardi, Bill Chiu

#### **Authors**: Tom Rossa and Naïl Khelifa

## **Table of Contents**
1. [Introduction](#introduction)
2. [Importing Libraries and Data](#importing-libraries-and-data)
3. [Data Exploration](#exploration-des-données)
   - [Data Overview](#aperçu-des-données)
   - [Descriptive Statistics](#statistiques-descriptives)
   - [Data Visualization](#visualisation-des-données)
4. [Data Preprocessing](#prétraitement-des-données)
   - [Handling Missing Values](#gestion-des-valeurs-manquantes)
   - [Normalization and Transformation](#normalisation-et-transformation)
   - [Encoding Categorical Variables](#encodage-des-variables-catégoriques)
5. [Hyperparameter Optimization and Tuning](#optimisation-et-tuning-des-hyperparamètres)
6. [Clustering](#clustering)
   - [Hierarchical Clustering](#hierarchical)
   - [Partitional Clustering](#partitional)
   - [Other](#other)
   - [Conclusion clustering](#conclusion-clustering)
7. [Classification](#classification)
   - [Nearest Neighbor Classification](#neighbor)
   - [Decision Tree Classification](#tree)
   - [Other](#other)
   - [Conclusion classification](#conclusion-classification)
8. [Query by content (indexing)](#indexing)
9. [Other](#other-data-mining)
   - [Anomaly Detection](#anomaly)
   - [Motif discovery](#motif)
   - [Other](#other)
   - [Conclusion other data mining](#conclusion-other-data-mining)
10. [Results and Interpretation](#résultats-et-interprétation)
11. [Conclusion and Outlook](#conclusion-et-perspectives)


## **Introduction**

The aim of this work is to reproduce and extend the experiments carried out in the paper *A Symbolic Representation of Time Series, with Implications for Streaming Algorithms* (Lin et al.).

## **Importing Libraries and Data**

Several datasets are central to the experiments in this work and in the paper it builds on:
- [*The UCR time series data mining archive*](https://arxiv.org/abs/1810.07758) for query by content (indexing) and hyperparameter tuning
- [*Synthetic Control Chart Time Series*](https://archive.ics.uci.edu/dataset/139/synthetic+control+chart+time+series) for hierarchical clustering and nearest-neighbor classification
- [*Space Shuttle telemetry*](https://ntrs.nasa.gov/citations/19880025321) for partitional clustering 
- [*Cylinder-Bell-Funnel (CBF)*](https://www.timeseriesclassification.com/description.php?Dataset=CBF) for nearest-neighbor classification

In [1]:
import pandas as pd 
import numpy as np
import scipy.stats as stats # for the breakpoints in SAX
import utils 
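
The comment on the `scipy.stats` import refers to the SAX breakpoints: the cut points that split a standard normal distribution into equiprobable regions, one per symbol. The sketch below illustrates the idea without any third-party dependency, using the standard library's `statistics.NormalDist().inv_cdf`, which returns the same quantiles as `scipy.stats.norm.ppf`. The function names `sax_breakpoints` and `sax_word` are hypothetical and not taken from the notebook's `utils` module. The use of the population standard deviation and the requirement that the series length be divisible by the number of segments are simplifying assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into `alphabet_size` equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, n_segments, alphabet_size):
    """Illustrative SAX transform: z-normalize, reduce with PAA, then discretize."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")

    # z-normalization (a constant series is mapped to all zeros)
    mean, std = fmean(series), pstdev(series)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # PAA: mean of each of the n_segments consecutive, equal-length frames
    frame = len(z) // n_segments
    paa = [fmean(z[i * frame:(i + 1) * frame]) for i in range(n_segments)]

    # Symbol index = number of breakpoints less than or equal to the PAA value
    cuts = sax_breakpoints(alphabet_size)
    return "".join(chr(ord("a") + bisect_right(cuts, value)) for value in paa)


print(sax_word(list(range(8)), n_segments=4, alphabet_size=4))  # abcd
```

For an alphabet of size 4 the breakpoints are approximately -0.67, 0 and 0.67, so a steadily increasing series visits the four symbols in order.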

In [2]:
# Load the Control Charts dataset
CC_path = "/Users/badis/MVA_Times_Series_ML_Homeworks-1/Final_project/datasets/control_charts.data"  # (local path?)
cc_df = utils.load_controal_charg_dataset(CC_path)
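
The loader above lives in the project's local `utils` module, whose source is not shown here. As a rough sketch of what such a loader might do, the code below assumes that the raw file is whitespace-separated with one series per row and no header, and that the rows are grouped into six equally sized consecutive classes in the order given by the UCI description. The function name `load_control_charts` and the constant `CLASS_NAMES` are hypothetical, and the actual helper may differ.

```python
import pandas as pd

# Assumed class order: equal-sized, consecutive blocks of rows in the raw file
CLASS_NAMES = [
    "Normal",
    "Cyclic",
    "Increasing trend",
    "Decreasing trend",
    "Upward shift",
    "Downward shift",
]


def load_control_charts(path_or_buffer, class_names=CLASS_NAMES):
    """Hypothetical loader: read whitespace-separated series and attach a Label column."""
    df = pd.read_csv(path_or_buffer, sep=r"\s+", header=None)

    # Every class is assumed to own the same number of consecutive rows
    per_class, remainder = divmod(len(df), len(class_names))
    if remainder:
        raise ValueError("row count is not a multiple of the number of classes")

    df["Label"] = [name for name in class_names for _ in range(per_class)]
    return df
```

On the full dataset (600 rows of 60 values) this would give a 600 x 61 frame whose first rows are labelled `Normal` and whose last rows are labelled `Downward shift`, matching the output displayed below.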

In [3]:
cc_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,Label
0,28.7812,34.4632,31.3381,31.2834,28.9207,33.7596,25.3969,27.7849,35.2479,27.1159,...,24.5556,33.74310,25.0466,34.93180,34.98790,32.4721,33.3759,25.46520,25.8717,Normal
1,24.8923,25.7410,27.5532,32.8217,27.8789,31.5926,31.4861,35.5469,27.9516,31.6595,...,31.0205,26.64180,28.4331,33.65640,26.42440,28.4661,34.2484,32.10050,26.6910,Normal
2,31.3987,30.6316,26.3983,24.2905,27.8613,28.5491,24.9717,32.4358,25.2239,27.3068,...,26.5966,25.53870,32.5434,25.57720,29.98970,31.3510,33.9002,29.54460,29.3430,Normal
3,25.7740,30.5262,35.4209,25.6033,27.9700,25.2702,28.1320,29.4268,31.4549,27.3200,...,28.7261,28.29790,31.5787,34.61560,32.54920,30.9827,24.8938,27.36590,25.3069,Normal
4,27.1798,29.2498,33.6928,25.6264,24.6555,28.9446,35.7980,34.9446,24.5596,34.2366,...,27.9601,35.71980,27.5760,35.33750,29.99930,34.2149,33.1276,31.10570,31.0179,Normal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
595,29.6254,25.5034,31.5978,31.4663,33.5488,28.2935,28.9244,30.6922,25.3301,26.8728,...,18.8795,21.33210,23.6915,22.30970,19.13610,15.2851,22.5278,20.65720,24.1289,Downward shift
596,27.4144,25.3973,26.4600,31.9782,26.1251,27.4629,30.4888,34.9292,27.5580,30.6863,...,11.4546,16.88800,18.2691,11.58310,14.11760,20.2289,11.1314,9.98019,10.7201,Downward shift
597,35.8990,26.6719,34.1911,35.8270,25.1009,24.8564,25.8141,30.6301,34.2124,32.5874,...,16.0021,15.28790,16.9459,17.53380,16.84640,16.5460,15.9268,18.08430,17.4747,Downward shift
598,24.5383,24.2802,28.2814,27.1316,26.6623,32.1100,32.8100,30.4829,35.8586,25.3866,...,11.5238,15.41850,12.6699,13.11640,8.23496,12.0419,19.3096,12.99850,17.4599,Downward shift
