# Sequence classification with Neural Networks
## Part 1: Data generation methodology

### Input data
As described in the README, we use a single variable (feature) in our data: **speed** at each timestep.

Each data sample consists of many timesteps in which walk and train modes alternate, forming a time series.

In [None]:
import altair as alt

import os
import sys

# Add the parent directory to the import path so that the tmdprimer package can be found
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from tmdprimer.datagen import generate_sample

## Sample without outliers
The sample is generated as a mix of train and walk segments as follows:
* Each sample contains 2 train and 2 walk segments in random order, so two walk segments may follow each other.
* In a walk segment, the speed is constant and equal to 5 km/h.
* Each train segment consists of one or more subsegments, each 20 to 100 timesteps long. The train accelerates at a constant rate and reaches its maximum speed in 20 timesteps. It then needs the same number of timesteps to stop. This means that if a subsegment is shorter than 40 timesteps, the train does not reach its maximum speed before it starts to decelerate.
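
The train subsegment rule above can be illustrated with a minimal sketch. This is not the code used by `tmdprimer.datagen`: the helper name `train_subsegment_speeds` and the maximum speed of 100 km/h are assumptions made only for illustration.

```python
def train_subsegment_speeds(length, max_speed=100.0, ramp_steps=20):
    """Speed at each timestep of one train subsegment (hypothetical helper).

    max_speed is an assumed value; the notebook does not state it.
    """
    accel = max_speed / ramp_steps  # constant speed change per timestep
    # At timestep t the speed is limited by the time since departure,
    # by the time left until the stop, and by the maximum speed.
    return [
        min(min(t, length - 1 - t) * accel, max_speed)
        for t in range(length)
    ]


long_run = train_subsegment_speeds(100)   # reaches the maximum speed
short_run = train_subsegment_speeds(30)   # starts braking before reaching it
```

With these assumptions, a short subsegment produces a triangular speed profile, while a long one produces a trapezoid with a plateau at the maximum speed.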

The data generation process is randomized so that the RNN model cannot learn patterns that are specific to one particular arrangement of segments.
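
As a rough sketch of the random segment order, the snippet below shuffles two train and two walk labels. The function name `random_segment_order` is hypothetical, and the actual logic inside `generate_sample` may differ.

```python
import random


def random_segment_order(n_train=2, n_walk=2, rng=None):
    """Return segment labels in random order (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    labels = ["train"] * n_train + ["walk"] * n_walk
    rng.shuffle(labels)  # in-place shuffle; equal labels may end up adjacent
    return labels


order = random_segment_order()
```

Because the labels are shuffled without constraints, orders such as walk, walk, train, train are possible, which matches the remark that two walk segments may follow each other.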

In [6]:
sample = generate_sample()
fig = sample.get_figure()
fig.properties(title="Sample without outliers", width=800)

## Train sample with outliers

When the data contains no outliers, sequential models have no advantage over traditional ML models.
Real sensor data, however, always contains outliers of some sort. In our case, we introduce outliers in train segments at random points in time, with a given probability. The **outlier speed is the same as the walk segment speed (5 km/h)**, so an outlier is indistinguishable from walking given a single observation.
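
A minimal sketch of this kind of outlier injection is shown below. It assumes that `outlier_prob` is applied independently at each timestep of a train segment; the notebook does not confirm this, and the helper name `add_outliers` is hypothetical.

```python
import random

WALK_SPEED = 5.0  # km/h, the walk speed stated above


def add_outliers(train_speeds, outlier_prob=0.05, rng=None):
    """Replace some train speeds with the walk speed (hypothetical helper).

    Assumption: each timestep becomes an outlier independently
    with probability outlier_prob.
    """
    if rng is None:
        rng = random.Random()
    return [
        WALK_SPEED if rng.random() < outlier_prob else speed
        for speed in train_speeds
    ]


noisy = add_outliers([60.0] * 200, outlier_prob=0.05)
```

A classifier that looks at one timestep at a time sees 5 km/h and has no way to tell such an outlier from walking; only the surrounding timesteps reveal that the sample is still on a train.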

In [7]:
sample = generate_sample(outlier_prob=0.05)
fig = sample.get_figure()
fig.properties(title="Sample with outliers", width=800)