# Sequence classification with Neural Networks
## Part 1: Data generation methodology

### Input data
Recall from the README that we use a single variable (feature) in our data: **speed** at each timestep.

Each data sample consists of many timesteps, with walk and train modes alternating, thus forming a time series.

In [4]:
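import os
import sys

import altair as alt

# Add the parent directory to the import path so that `rnnprimer` can be imported from this notebook
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from rnnprimer.datagen import generate_sample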

### Sample without outliers
The speed in our sample is generated as follows:
* For a walk segment, the speed is constant and equal to 5 km/h.
* For a train segment, the speed quickly reaches 100 km/h, stays constant, and then decreases to 0 again. There can be more than one consecutive train segment, which models the stops of a train.

We randomly split the 5 train segments into two groups separated by a walk segment. This split is not relevant for a traditional model, which is fed one observation at a time.
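
The speed profile described above can be sketched in a few lines of plain Python. This is only an illustration, not the actual `rnnprimer.datagen` implementation: the helper names `walk_segment` and `train_segment`, the linear ramp, the `ramp_steps` parameter, and the segment lengths are all assumptions made for the example.

```python
WALK_SPEED = 5.0     # km/h, as stated in the text
TRAIN_SPEED = 100.0  # km/h, as stated in the text


def walk_segment(n_steps):
    """Constant walking speed for n_steps timesteps (hypothetical helper)."""
    return [WALK_SPEED] * n_steps


def train_segment(n_steps, ramp_steps=3):
    """Ramp up from 0, cruise at TRAIN_SPEED, then ramp back down to 0.

    ramp_steps and the linear ramp are assumptions; the text only says
    that the speed "quickly reaches" 100 km/h.
    """
    if n_steps < 2 * ramp_steps + 1:
        raise ValueError("segment too short for the requested ramps")
    up = [TRAIN_SPEED * i / ramp_steps for i in range(ramp_steps)]  # starts at 0
    cruise = [TRAIN_SPEED] * (n_steps - 2 * ramp_steps)
    down = up[::-1]  # mirror of the ramp-up, ends at 0
    return up + cruise + down


# One sample: walk, two consecutive train segments (a stop in between), walk.
speeds, labels = [], []
for mode, n_steps in [("walk", 10), ("train", 12), ("train", 12), ("walk", 10)]:
    segment = walk_segment(n_steps) if mode == "walk" else train_segment(n_steps)
    speeds.extend(segment)
    labels.extend([mode] * n_steps)
```

The speed drops to 0 where the two train segments meet, which is how a stop shows up in the series.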

In [2]:
sample = generate_sample()
fig = sample.get_figure()
fig.properties(title="Sample without outliers", width=800)

### Train sample with outliers

When the data contains no outliers, sequential models have no advantage over traditional ML models.
Real sensor data, however, always contains outliers of some sort. In our case, we introduce outliers in train segments at random points in time, with a given probability. The **outlier speed is the same as for a walk segment (5 km/h)**, so an outlier is indistinguishable from walking given a single observation.
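
As a rough illustration of this idea, the sketch below replaces individual train timesteps with the walk speed. It is not the library's code: the function name `add_outliers`, its parameters, and the choice to draw independently for every train timestep are assumptions; `generate_sample(outlier_prob=...)` may place outliers differently.

```python
import random


def add_outliers(speeds, labels, outlier_prob, outlier_speed=5.0, rng=None):
    """Return a copy of speeds with some train timesteps set to outlier_speed.

    Assumption: each train timestep becomes an outlier independently with
    probability outlier_prob. Labels are left untouched, so an outlier is
    still labelled "train" even though its speed looks like walking.
    """
    if rng is None:
        rng = random.Random()
    noisy = list(speeds)
    for i, label in enumerate(labels):
        if label == "train" and rng.random() < outlier_prob:
            noisy[i] = outlier_speed
    return noisy


# Tiny hand-made sample: walk, one short train segment, walk.
labels = ["walk"] * 3 + ["train"] * 4 + ["walk"] * 3
speeds = [5.0] * 3 + [0.0, 100.0, 100.0, 0.0] + [5.0] * 3

noisy_speeds = add_outliers(speeds, labels, outlier_prob=0.05, rng=random.Random(0))
```

A model that sees only one timestep cannot tell such an outlier from walking, while a model that sees the neighbouring timesteps can.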

In [3]:
sample = generate_sample(outlier_prob=0.05)
fig = sample.get_figure()
fig.properties(title="Sample with outliers", width=800)