# DEMO

This demo gives a quick tour of the software's functionalities. It covers the following steps:
 - Downloading a dataset of 2000 multivariate time series (mvts) instances.
 - Getting some basic statistics about your data.
 - Extracting a list of statistical features from the mvts instances.
 - ...

In [1]:
import os
import yaml
from data_retriever import DataRetriever  # for downloading data
import CONSTANTS as CONST

## Download the Dataset
There is a dictionary of datasets available in `./datasets_configs.yml`. Let's download the one with id `1`.

In [2]:
dr = DataRetriever(1)

In [3]:
print('URL:\t\t{}'.format(dr.dataset_url))
print('NAME:\t\t{}'.format(dr.dataset_name))
print('TYPE:\t\t{}'.format(dr.get_compression_type()))
print('SIZE:\t\t{}'.format(dr.get_total_size()))

URL:		https://bitbucket.org/gsudmlab/mvtsdata_toolkit/downloads/petdataset_01.zip
NAME:		petdataset_01.zip
TYPE:		application/zip
SIZE:		32M


Ready to download? (This may take a few seconds, depending on your internet bandwidth.)

In [4]:
where_to = './xxx'
dr.retrieve(target_path=where_to)

Extracting: 100%|██████████| 2001/2001 [00:01<00:00, 1293.42it/s]


OK. Let's see how many files are available to us now:

In [5]:
dr.get_total_number_of_files()

2000

## MVTS Data Analysis
Using `data_analysis.mvts_data_analysis.py`, we can get an overview of the dataset we are going to work with. We start by creating an instance of `MVTSDataAnalysis`.

In [6]:
from data_analysis.mvts_data_analysis import MVTSDataAnalysis
path_to_config = './configs/feature_extraction_configs.yml'
mvda = MVTSDataAnalysis(path_to_config)
mvda.print_stat_of_directory()

----------------------------------------
Directory:					/home/azim/CODES/PyWorkspace/mvtsdata_toolkit/data/petdataset_01/
Total number of mvts files:	2000
Total size:					453M
Total average:				232K
----------------------------------------
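For readers who want to cross-check the directory statistics independently, the following is a minimal sketch using only the standard library. It is not the toolkit's implementation; the helper `directory_stats` and the temporary files it is run on are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def directory_stats(root: str, pattern: str = "*.csv") -> dict:
    """Count files matching `pattern` directly under `root` and sum their sizes in bytes."""
    sizes = [p.stat().st_size for p in Path(root).glob(pattern) if p.is_file()]
    total = sum(sizes)
    return {
        "n_files": len(sizes),
        "total_bytes": total,
        "avg_bytes": total / len(sizes) if sizes else 0.0,
    }


# Demonstration on a throwaway directory holding three small CSV files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"instance_{i}.csv").write_text("a,b\n1,2\n")
    stats = directory_stats(tmp)
    print(stats)
```

Pointing `directory_stats` at the extracted dataset directory should give numbers comparable to the report above, up to rounding and unit conventions.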


Let's now get some statistics from the content of the files. To keep the example short, we analyze only 3 parameters (namely `'TOTUSJH'`, `'TOTBSQ'`, and `'TOTPOT'`), and only the first 50 mvts files.

In [7]:
params = ['TOTUSJH', 'TOTBSQ', 'TOTPOT']
n = 50
mvda.compute_summary(parameters_list=params, first_k=n)

-->	[50/50] 		 File: lab[C]1.0@7869_id[3836]_st[2014-03-17T13:24:00]_et[2014-03-18T01:12:00].csv
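The file name printed in the progress line carries bracketed tags. As a hedged illustration, the sketch below pulls those tags out with a regular expression. Reading `lab` as a label, `id` as an identifier, and `st`/`et` as start and end times is an assumption based on the name alone, and `parse_tags` is a hypothetical helper, not a function of the toolkit.

```python
import re
from datetime import datetime

# Assumed tag layout: <tag>[<value>], with tags lab, id, st, et.
TAG_PATTERN = re.compile(r"(lab|id|st|et)\[([^\]]*)\]")


def parse_tags(file_name: str) -> dict:
    """Return a {tag: value} mapping for every bracketed tag found in the name."""
    return dict(TAG_PATTERN.findall(file_name))


name = "lab[C]1.0@7869_id[3836]_st[2014-03-17T13:24:00]_et[2014-03-18T01:12:00].csv"
tags = parse_tags(name)

# Under the start/end-time assumption, the covered time span follows directly.
span = datetime.fromisoformat(tags["et"]) - datetime.fromisoformat(tags["st"])
print(tags)
print(span)
```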

In [8]:
mvda.print_summary()

  Parameter-Name Count Null-Count           min          25th          50th          75th           max
0        TOTUSJH  3000          0  3.494185e+00  3.398446e+01  8.002865e+01  3.650795e+02  3.162777e+03
1         TOTBSQ  3000          0  1.983268e+07  2.536142e+08  6.862917e+08  7.454094e+09  3.848284e+10
2         TOTPOT  3000          0  1.205181e+20  1.853174e+21  4.816468e+21  3.869989e+22  7.108347e+23


This shows that, across the 50 mvts files, each parameter has 3000 values in total (60 per file on average), with no `NA/NAN` or missing values. In addition, `min`, `max`, and the three quartiles (25th, 50th, and 75th percentiles) are calculated for each parameter.
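To make the columns of the summary table concrete, here is a minimal sketch that computes the same kind of statistics with plain pandas on synthetic data. It is not how `compute_summary` works internally (the source does not say), and the synthetic frames, their 60-row length, and the random values are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
params = ["TOTUSJH", "TOTBSQ", "TOTPOT"]

# Synthetic stand-ins for 50 mvts files with 60 rows each (hypothetical sizes).
frames = [pd.DataFrame(rng.normal(size=(60, 3)), columns=params) for _ in range(50)]
all_rows = pd.concat(frames, ignore_index=True)

# One row per parameter, one column per statistic.
summary = pd.DataFrame({
    "Count": all_rows.count(),
    "Null-Count": all_rows.isna().sum(),
    "min": all_rows.min(),
    "25th": all_rows.quantile(0.25),
    "50th": all_rows.quantile(0.50),
    "75th": all_rows.quantile(0.75),
    "max": all_rows.max(),
})
summary.index.name = "Parameter-Name"
print(summary)
```

Concatenating all files before summarizing is the simplest approach and fits in memory for small samples. For large datasets, a streaming or approximate quantile method would be a more reasonable choice.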

## Feature Extraction
Now that we have an idea about our raw data, let's extract some features from it. A list of ~50 features is implemented in `features.feature_collection.py`. Let's take a look.

In [9]:
import features.feature_collection as fc
help(fc)

Help on module features.feature_collection in features:

NAME
    features.feature_collection

FUNCTIONS
    get_average_absolute_change(uni_ts:pandas.core.series.Series) -> float
        :return: the average absolute first difference of a univariate time series.
    
    get_average_absolute_derivative_change(uni_ts:pandas.core.series.Series) -> float
        :return: the average absolute first difference of a derivative of univariate time series.
    
    get_avg_mono_decrease_slope(uni_ts:pandas.core.series.Series) -> float
        :return: the average slope of monotonically decreasing segments.
    
    get_avg_mono_increase_slope(uni_ts:pandas.core.series.Series) -> float
        :return: the average slope of monotonically increasing segments.
    
    get_dderivative_kurtosis(uni_ts:pandas.core.series.Series, step_size:int=1) -> float
        :return: the kurtosis of the difference derivative of univariate time series within the
                 function we use step_size to find 
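As an illustration of what such a feature computes, the sketch below implements "the average absolute first difference of a univariate time series" as described in the help text. It is a minimal reading of that one-line description, not the toolkit's own code, and the function name `average_absolute_change` is chosen here only for the example.

```python
import pandas as pd


def average_absolute_change(uni_ts: pd.Series) -> float:
    """Mean of |x[t] - x[t-1]| over the series (the leading NaN from diff is skipped)."""
    return float(uni_ts.diff().abs().mean())


ts = pd.Series([1.0, 3.0, 2.0, 6.0])
# First differences are 2, -1, 4, so the mean absolute change is 7/3.
result = average_absolute_change(ts)
print(result)
```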

Time to extract a few features now.

In [10]:
from features.feature_extractor import FeatureExtractor

fe = FeatureExtractor(path_to_config)
fe.calculate_all(params_index_list=[5, 6, 7, 8], first_k=50)
fe.store_extracted_features('extracted_features.csv')

/home/azim/CODES/PyWorkspace/mvtsdata_toolkit/data/petdataset_01/


	-----------------------------------
		Total No. of Features:		43
		Total No. of time series:	50
		Output TS dimensionality (50 X 43):	2150
	-----------------------------------

	 >>> Total Processed: 50 / 50 <<<
	50 files have been processed.
	As a result, a dataframe of dimension 50 X 176 is created.

	The dataframe is stored at: /home/azim/CODES/PyWorkspace/mvtsdata_toolkit/data/extracted_features/extracted_features.csv


In [12]:
fe.df_all_features.shape

(50, 176)
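The log reports 43 features, while the dataframe has 176 columns. One reading consistent with these numbers, offered as an assumption rather than something the output states, is that the 43 features are computed for each of the 4 selected parameters (4 × 43 = 172 columns) and that 4 further columns hold per-file metadata. The sketch below builds an empty dataframe with that layout; all column names in it are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

n_files, n_params, n_features, n_meta = 50, 4, 43, 4

# Hypothetical column names: metadata first, then one column per (parameter, feature) pair.
meta_cols = [f"meta_{m}" for m in range(n_meta)]
feature_cols = [f"param{p}_feature{f}" for p in range(n_params) for f in range(n_features)]

df = pd.DataFrame(
    np.zeros((n_files, len(meta_cols) + len(feature_cols))),
    columns=meta_cols + feature_cols,
)
print(df.shape)  # (50, 176)
```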