# Keras: Time Series Classify Binary

*Binary Detection of Epileptic Seizures Using a Cohort of Electroencephalography (EEG) Reading Sequences.*

![waves](../images/waves.png)

Sequence data structures contain many observations (rows) for each sample (e.g. site, sensor, or patient). They are often used to group time-based observations into a time series. However, sequences can also represent ordered data that is not time-based, such as biological sequences of DNA and RNA.

Having *many observations per sample* changes the dimensionality of the data from 2D to 3D: (samples, observations, columns). This adds a layer of complexity to every step of data preparation. In this notebook, you'll see that once a `Dataset.Sequence` has been ingested, the AIQC API lets you work with multivariate 3D data as easily as if it were 2D. For example, you can still apply encoders by dtype and column name.
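To make the 2D-to-3D change concrete, here is a minimal NumPy sketch on hypothetical toy data (3 samples, 4 time steps, 2 columns; nothing here comes from AIQC). The same values can be viewed as a 2D table with one row per observation, or as a 3D array of shape (samples, time steps, columns) in which each sample becomes its own 2D slice:

```python
import numpy as np

# Hypothetical toy data: 3 samples (e.g. patients), 4 time steps each,
# 2 measured columns per time step.
n_samples, n_timesteps, n_columns = 3, 4, 2

# 2D view: one row per (sample, time step) observation.
flat = np.arange(n_samples * n_timesteps * n_columns).reshape(
    n_samples * n_timesteps, n_columns
)

# 3D view: (samples, time steps, columns); each sample is its own 2D slice.
seq = flat.reshape(n_samples, n_timesteps, n_columns)

print(flat.shape)  # (12, 2)
print(seq.shape)   # (3, 4, 2)
print(seq[0])      # the first sample's 4 x 2 block of observations
```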

---

In [2]:
import aiqc
from aiqc import datum

---

## Example Data

This dataset consists of:
    
- *Features* = a sequence of electroencephalogram (EEG) readings.
- *Label* = presence of an epileptic seizure.

In [4]:
df = datum.to_pandas('epilepsy.parquet')

In [5]:
df.head()

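|   | sensor_0 | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 | sensor_6 | sensor_7 | sensor_8 | sensor_9 | ... | sensor_169 | sensor_170 | sensor_171 | sensor_172 | sensor_173 | sensor_174 | sensor_175 | sensor_176 | sensor_177 | seizure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 232 | 183 | 125 | 47 | -32 | -73 | -105 | -99 | -72 | -33 | ... | -202 | -303 | -365 | -389 | -406 | -401 | -366 | -251 | -143 | 1 |
| 1 | 284 | 276 | 268 | 261 | 254 | 241 | 232 | 223 | 212 | 206 | ... | 64 | 15 | -19 | -57 | -91 | -118 | -131 | -140 | -148 | 1 |
| 2 | 373 | 555 | 580 | 548 | 502 | 433 | 348 | 276 | 216 | 182 | ... | -1032 | -1108 | -803 | -377 | -13 | 172 | 246 | 206 | 156 | 1 |
| 3 | 791 | 703 | 538 | 76 | -535 | -1065 | -1297 | -1018 | -525 | -13 | ... | -396 | 135 | 493 | 601 | 559 | 400 | 193 | 3 | -141 | 1 |
| 4 | 436 | 473 | 508 | 546 | 587 | 615 | 623 | 615 | 596 | 574 | ... | 637 | 644 | 646 | 650 | 656 | 653 | 648 | 628 | 608 | 1 |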


In [None]:
from sklearn.preprocessing import StandardScaler
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import History

---

## a) High-Level API

See the [High-Level API Docs](api_high_level.ipynb) for more information.

In [6]:
label_df = df[['seizure']]

In [7]:
seq_ndarray3D = df.drop(columns=['seizure']).to_numpy().reshape(1000,178,1)
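The cell above hardcodes `reshape(1000, 178, 1)`, which assumes exactly 1000 rows and 178 sensor columns, with the trailing 1 meaning a single EEG channel per time step. A minimal sketch on a hypothetical 5-row toy frame that derives the dimensions from the data instead:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the epilepsy frame: 5 samples x 178 readings + label.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(-500, 500, size=(5, 178)),
    columns=[f"sensor_{i}" for i in range(178)],
)
toy["seizure"] = [1, 0, 1, 0, 0]

readings = toy.drop(columns=["seizure"]).to_numpy()  # shape (5, 178)

# (samples, time steps, channels): one EEG channel per time step.
# Reading the sizes off the array avoids hardcoding 1000 and 178.
seq = readings.reshape(readings.shape[0], readings.shape[1], 1)
print(seq.shape)  # (5, 178, 1)
```

`readings[..., np.newaxis]` is an equivalent way to add the trailing axis.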

In [9]:
splitset = aiqc.Pipeline.Sequence.make(
    # --- Label preprocessing ---
    label_df_or_path = label_df
    , label_dtype = None
    , label_column = 'seizure'
    , label_interpolater = None
    , label_encoder = None
    
    # --- Feature preprocessing ---
    , feature_ndarray3D_or_npyPath = seq_ndarray3D
    , feature_dtype = None
    , feature_cols_excluded = None
    , feature_interpolaters = None
    , feature_window = None
    , feature_encoders = [
        dict(columns='0', sklearn_preprocess=StandardScaler())
    ]
    , feature_reshape_indices = None

    # --- Stratification ---
    , size_test = 0.12
    , size_validation = 0.22
    , fold_count = None
    , bin_count = None
)

⏱️ Ingesting Sequences 🧬: 100%|████████████████| 1000/1000 [00:05<00:00, 183.79it/s]



=> Info - System overriding user input to set `sklearn_preprocess.copy=False`.
   This saves memory when concatenating the output of many encoders.


___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) featurecoder filters.

['0']

=> Done. All feature column(s) have featurecoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.
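scikit-learn transformers such as `StandardScaler` only accept 2D input, so a per-column encoder has to be adapted to 3D sequences somehow. One common approach, sketched here with NumPy and scikit-learn on a hypothetical random array (this is not necessarily how AIQC implements it internally), is to stack every time step of every sample into rows, fit the scaler per column, and reshape back:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical 3D array: 4 samples x 178 time steps x 1 column.
seq = rng.normal(loc=100.0, scale=50.0, size=(4, 178, 1))
n_samples, n_timesteps, n_columns = seq.shape

# Pool samples and time steps into rows: (4 * 178, 1).
flat = seq.reshape(-1, n_columns)

# Fit one mean and standard deviation per column, then restore the 3D shape.
scaler = StandardScaler()
scaled = scaler.fit_transform(flat).reshape(n_samples, n_timesteps, n_columns)

print(scaled.shape)  # (4, 178, 1)
```

Because samples and time steps are pooled, each column is standardized with a single mean and standard deviation across the whole array, rather than separately per time step.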



In [11]:
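def fn_build(features_shape, label_shape, **hp):
    model = Sequential()
    # features_shape is the shape of one sample: (time steps, columns), i.e. (178, 1) here.
    model.add(LSTM(
        hp['neuron_count']
        , input_shape=(features_shape[0], features_shape[1])
    ))
    # A single sigmoid unit outputs the probability of a seizure.
    model.add(Dense(units=label_shape[0], activation='sigmoid'))
    return model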

In [12]:
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    model.compile(
        loss=loser
        , optimizer=optimizer
        , metrics=['accuracy']
    )
    model.fit(
        samples_train['features'], samples_train['labels']
        , validation_data = (samples_evaluate['features'], samples_evaluate['labels'])
        , verbose = 0
        , batch_size = hp['batch_size']
        , epochs = hp['epochs']
        , callbacks = [History()]
    )
    return model
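`fn_train` receives the loss as `loser`, and the `Experiment.make` call below passes `fn_lose = None` so that AIQC picks it automatically. For a single sigmoid output, binary cross-entropy is the conventional choice. Assuming that is what gets selected (confirm against the AIQC docs), this NumPy sketch with hypothetical logits shows what the loss measures and how a probability becomes a hard class at a 0.5 threshold:

```python
import numpy as np

def sigmoid(z):
    # Squash raw scores (logits) into probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample negative log-likelihood.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

# Hypothetical pre-activation outputs of the final Dense unit for 4 samples.
logits = np.array([2.0, -1.0, 0.0, 3.0])
labels = np.array([1, 0, 1, 1])  # 1 = seizure, 0 = no seizure

probs = sigmoid(logits)             # predicted probability of a seizure
loss = binary_crossentropy(labels, probs)
preds = (probs >= 0.5).astype(int)  # hard class at a 0.5 threshold

print(round(loss, 4))  # 0.2955
print(preds)           # [1 0 1 1]
```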

In [13]:
hyperparameters = {
    "neuron_count": [25]
    , "batch_size": [8]
    , "epochs": [5, 10]
}

In [14]:
queue = aiqc.Experiment.make(
    # --- Analysis type ---
    library = "keras"
    , analysis_type = "classification_binary"
    
    # --- Model functions ---
    , fn_build = fn_build
    , fn_train = fn_train
    , fn_lose = None #auto
    , fn_optimize = None #auto
    , fn_predict = None #auto
    
    # --- Training options ---
    , repeat_count = 2
    , hyperparameters = hyperparameters
    , search_percent = None
    
    # --- Data source ---
    , splitset_id = splitset.id
    , foldset_id = None
    , hide_test = False
)

In [15]:
queue.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [02:24<00:00, 36.01s/it]
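The progress bar reports 4 jobs. Assuming `search_percent = None` means every combination in the grid is trained, the count follows from the hyperparameters: 1 × 1 × 2 = 2 combinations, each trained `repeat_count = 2` times. A small sketch of that expansion with `itertools.product` (an illustration, not AIQC's own code):

```python
from itertools import product

hyperparameters = {
    "neuron_count": [25]
    , "batch_size": [8]
    , "epochs": [5, 10]
}
repeat_count = 2

# Cartesian product of the value lists: one dict per combination.
keys = list(hyperparameters)
combos = [dict(zip(keys, values)) for values in product(*hyperparameters.values())]

print(combos)
# [{'neuron_count': 25, 'batch_size': 8, 'epochs': 5},
#  {'neuron_count': 25, 'batch_size': 8, 'epochs': 10}]
print(len(combos) * repeat_count)  # 4 training jobs
```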


For more information on visualizing performance metrics, see the [Visualization & Metrics](visualization.html) documentation.

---

## b) Low-Level API

In [16]:
df = datum.to_pandas('epilepsy.parquet')

In [17]:
label_df = df[['seizure']]

In [18]:
seq_ndarray3D = df.drop(columns=['seizure']).to_numpy().reshape(1000,178,1)

In [19]:
dataset_tabular = aiqc.Dataset.Tabular.from_pandas(label_df)

In [20]:
label = dataset_tabular.make_label(columns='seizure')

In [21]:
dataset_sequence = aiqc.Dataset.Sequence.from_numpy(
    ndarray3D_or_npyPath = seq_ndarray3D
    , column_names = ['EEG']
)

⏱️ Ingesting Sequences 🧬: 100%|████████████████| 1000/1000 [00:05<00:00, 194.69it/s]
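`column_names` labels the last axis of the 3D array, and that name is what the `columns` filter of a featurecoder matches against. In the high-level example, where no names were given, the single column appears to have been named `'0'` instead. A pandas sketch on a hypothetical toy array (not AIQC's internal representation) of viewing each sample as its own table of time steps by named columns:

```python
import numpy as np
import pandas as pd

# Hypothetical 3D array shaped like the EEG data: 3 samples, 5 time steps, 1 column.
seq = np.arange(15).reshape(3, 5, 1)
column_names = ["EEG"]

# Each sample is a (time steps x columns) table; column_names labels the last axis.
samples = [pd.DataFrame(sample, columns=column_names) for sample in seq]

print(len(samples))              # 3
print(samples[0].shape)          # (5, 1)
print(list(samples[0].columns))  # ['EEG']
```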


In [22]:
feature = dataset_sequence.make_feature()

In [23]:
encoderset = feature.make_encoderset()

In [24]:
featurecoder = encoderset.make_featurecoder(
    sklearn_preprocess = StandardScaler()
    , columns = ['EEG']
)


=> Info - System overriding user input to set `sklearn_preprocess.copy=False`.
   This saves memory when concatenating the output of many encoders.


___/ featurecoder_index: 0 \_________

=> The column(s) below matched your filter(s) featurecoder filters.

['EEG']

=> Done. All feature column(s) have featurecoder(s) associated with them.
No more Featurecoders can be added to this Encoderset.



In [25]:
splitset = aiqc.Splitset.make(
    feature_ids = [feature.id]
    , label_id = label.id
    , size_test = 0.22
    , size_validation = 0.12
)
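`Splitset.make` holds out test and validation subsets. A stratified two-stage split is one common way to do this. The sketch below uses scikit-learn's `train_test_split` on hypothetical random labels and assumes both fractions are relative to the full 1000 samples; how AIQC actually interprets them, and whether it stratifies here, is for the AIQC docs to say:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in: 1000 sample indices, each with a random binary label.
indices = np.arange(1000)
labels = rng.integers(0, 2, size=1000)

size_test, size_validation = 0.22, 0.12

# Convert fractions of the whole dataset into exact sample counts.
n_test = round(size_test * len(indices))       # 220
n_val = round(size_validation * len(indices))  # 120

# Stage 1: hold out the test set, preserving the label ratio.
idx_rest, idx_test, y_rest, y_test = train_test_split(
    indices, labels, test_size=n_test, stratify=labels, random_state=0
)
# Stage 2: split validation off the remainder, again stratified.
idx_train, idx_val, y_train, y_val = train_test_split(
    idx_rest, y_rest, test_size=n_val, stratify=y_rest, random_state=0
)

print(len(idx_train), len(idx_val), len(idx_test))  # 660 120 220
```

Converting the fractions to integer counts first keeps the subset sizes exact, since scikit-learn rounds a float `test_size` up.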

In [26]:
def fn_build(features_shape, label_shape, **hp):    
    model = Sequential()
    model.add(LSTM(
        hp['neuron_count']
        , input_shape=(features_shape[0], features_shape[1])
    ))
    model.add(Dense(units=label_shape[0], activation='sigmoid'))
    return model

In [27]:
def fn_train(model, loser, optimizer, samples_train, samples_evaluate, **hp):
    model.compile(
        loss=loser
        , optimizer=optimizer
        , metrics=['accuracy']
    )
    model.fit(
        samples_train['features'], samples_train['labels']
        , validation_data = (samples_evaluate['features'], samples_evaluate['labels'])
        , verbose = 0
        , batch_size = hp['batch_size']
        , epochs = hp['epochs']
        , callbacks = [History()]
    )
    return model

In [28]:
algorithm = aiqc.Algorithm.make(
    library = "keras"
    , analysis_type = "classification_binary"
    , fn_build = fn_build
    , fn_train = fn_train
)

In [29]:
hyperparameters = {
    "neuron_count": [25]
    , "batch_size": [8]
    , "epochs": [5, 10]
}

In [30]:
hyperparamset = algorithm.make_hyperparamset(
    hyperparameters = hyperparameters
)

In [31]:
queue = algorithm.make_queue(
    splitset_id = splitset.id
    , hyperparamset_id = hyperparamset.id
    , repeat_count = 2
)

In [32]:
queue.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [01:53<00:00, 28.32s/it]


See the [Low-Level API Docs](api_low_level.ipynb) for more information, including how to work with non-tabular data and how to define optimizers.