## Exporting and labeling the series

After identifying the healthy and unhealthy intervals, we are left with two lists of time series of variable length.

Below are the approaches we use to deal with this variable length.

### The Exporter class

Initialize the exporter with the desired unit length of the time series (300 by default), and choose a stride so that we don't save time series that are too similar (10 by default).

- The `simple_export_ts` function takes only the first time series available in each interval (`ts.ix[:self.unit_ts_length]`) and ignores the stride and padding arguments.
- The `export_ts` function, by contrast, creates the full dataset and takes padding and stride into account.
    - **Note**: we rarely need padding if we already take care of it in the `split_healthy_unhealthy` function (0.1, 0.95).
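
The effect of the unit length and the stride can be illustrated with a minimal sketch. This is not the `Exporter` implementation: `sliding_windows` is a hypothetical helper, it assumes each interval is a 1-D array, and it ignores padding entirely.

```python
import numpy as np

def sliding_windows(series, unit_length=300, stride=10):
    """Cut one variable-length series into fixed-length windows.

    Hypothetical helper for illustration only; it is not part of the
    Exporter class and does not handle padding.
    """
    series = np.asarray(series)
    # A series shorter than one window yields no windows at all.
    if len(series) < unit_length:
        return np.empty((0, unit_length), dtype=series.dtype)
    # Consecutive windows start `stride` samples apart.
    starts = range(0, len(series) - unit_length + 1, stride)
    return np.stack([series[s:s + unit_length] for s in starts])

windows = sliding_windows(np.arange(1000))
print(windows.shape)  # (71, 300)
```

Keeping only `windows[0]` corresponds to what `simple_export_ts` is described as doing, while keeping every window is closer to `export_ts`. A larger stride yields fewer, less overlapping windows.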
    
### Raw Time Series

We might want to compute the features afterwards, so let's save all the raw series, one per row, with a different row width for each series.

In a separate file, we will put the labels.

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [None]:
import csv
import numpy as np
from observation import Observation

# One file per site (A to H) and tranche (1 or 2)
prefixes = [site+tranche for site in ["A","B","C","D","E","F","G","H"] for tranche in ["1","2"]]
suffix = "DEB1-1"
fnames = [prefix+"-"+suffix+".txt" for prefix in prefixes]
PATH = "../../Data/GMPP_IRSDI/"
export_PATH = "../exported-datasets/"

observations = [Observation(PATH,[fname],[tag],format="%Y-%m-%dT%H:%M:%S.000Z",ncol=2) for fname,tag in zip(fnames,prefixes)]
healthy=(0.05, 0.4)
unhealthy=(0.6, 0.95)

# Accumulators: a list of 1-D arrays of varying length, and one label per series
all_ts = []
labels = np.array([])

for observation in observations:
    healthy_ts, unhealthy_ts = observation.split_healthy_unhealthy(healthy=healthy,unhealthy=unhealthy)
    healthy_ts = [ts.values.ravel() for ts in healthy_ts]
    unhealthy_ts = [ts.values.ravel() for ts in unhealthy_ts]
    all_ts = all_ts + healthy_ts + unhealthy_ts
    # Label 0 for healthy series, 1 for unhealthy series
    labels = np.concatenate([labels,np.zeros(len(healthy_ts)),np.ones(len(unhealthy_ts))])
    assert len(all_ts)==len(labels), "Number of ts different from number of labels"

Save the time series as CSV rows and the labels as a `.npy` file.

In [None]:
with open(export_PATH+"values_"+suffix, "w", newline="") as test_f:
    test_writer = csv.writer(test_f)
    for ts in all_ts:
        test_writer.writerow(ts)
np.save(export_PATH+"labels_"+suffix+".npy",labels)

# with open(export_PATH+"values_"+suffix, "r") as test_f:
#     all_ts_bis = [np.array(series, dtype=np.float64) for series in list(csv.reader(test_f))]
# labels_bis = np.load(export_PATH+"labels_"+suffix+".npy")
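
As a sanity check on this storage format, the sketch below writes a few ragged series and their labels, reads them back, and compares them with the originals. The data is synthetic, and the `values_demo` and `labels_demo.npy` file names are made up for the example; everything is written to a temporary directory rather than to `export_PATH`.

```python
import csv
import os
import tempfile

import numpy as np

# Synthetic stand-ins for all_ts and labels (not real measurements).
all_ts = [np.array([0.1, 0.2, 0.3]), np.array([1.5, 2.5]), np.array([4.0, 5.0, 6.0, 7.0])]
labels = np.array([0.0, 0.0, 1.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    values_path = os.path.join(tmp_dir, "values_demo")
    labels_path = os.path.join(tmp_dir, "labels_demo.npy")

    # One CSV row per series; rows may have different widths.
    with open(values_path, "w", newline="") as f:
        writer = csv.writer(f)
        for ts in all_ts:
            writer.writerow(ts)
    np.save(labels_path, labels)

    # Read back: each CSV row becomes its own 1-D float array.
    with open(values_path, "r", newline="") as f:
        all_ts_bis = [np.array(row, dtype=np.float64) for row in csv.reader(f)]
    labels_bis = np.load(labels_path)

round_trip_ok = (
    len(all_ts_bis) == len(all_ts)
    and all(np.allclose(a, b) for a, b in zip(all_ts, all_ts_bis))
    and np.array_equal(labels, labels_bis)
)
print(round_trip_ok)
```

Because the rows have different widths, the values are read back as a list of arrays rather than as a single 2-D array.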