# Stickleback

In this notebook we'll train a classifier to find *point behaviors* in bio-logging sensor data using the `stickleback` module.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from datetime import datetime
from IPython.display import display
import numpy as np
import pandas as pd
from stickleback.stickleback import Stickleback
print(datetime.now().strftime("%H:%M:%S"))

We begin by reading the example data (bio-logger sensor data and events) and visualizing them.

In [None]:
# Read example data
breath_sb = Stickleback(
    sensors=pd.read_pickle("../data/multi_prh.pkl"),
    events=pd.read_pickle("../data/multi_breaths.pkl").index,
    win_size=50,
    min_period=5,
)

breath_sb.plot_sensors_events("bw180828-49", interactive=True)

We create the training dataset from all known events plus an equal number of randomly selected non-events.

In [None]:
breath_sb.sample_nonevents()
print("+: {}\n-: {}".format(breath_sb.event_idx, breath_sb.nonevent_idx))
breath_sb.extract_training_data()
display(breath_sb.clf_data.head())
display(breath_sb.clf_data.tail())
print("labels: {} ... {}".format(breath_sb.clf_labels[0:5], breath_sb.clf_labels[-5:]))
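
To make the balanced-sampling idea concrete, here is a minimal standalone sketch that uses only NumPy and pandas. It is not the `stickleback` implementation: the function name `sample_nonevents_sketch`, the `exclusion` window around each event, and the toy 1 Hz record are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sample_nonevents_sketch(sensor_index, event_index, exclusion, seed=0):
    """Draw as many non-event times as there are events, away from any event."""
    rng = np.random.default_rng(seed)
    # Time gap between every sensor record and every event (records x events).
    gaps = np.abs(sensor_index.values[:, None] - event_index.values[None, :])
    # Keep only records farther than `exclusion` from their nearest event.
    far_enough = gaps.min(axis=1) > exclusion.to_timedelta64()
    candidates = sensor_index[far_enough]
    # Sample without replacement so both classes end up the same size.
    chosen = rng.choice(len(candidates), size=len(event_index), replace=False)
    return candidates[np.sort(chosen)]


# Hypothetical 1 Hz record with three events (not the example data above).
sensor_index = pd.date_range("2020-01-01", periods=600, freq=pd.Timedelta("1s"))
event_index = sensor_index[[100, 300, 500]]
nonevent_index = sample_nonevents_sketch(
    sensor_index, event_index, exclusion=pd.Timedelta("10s")
)
print(len(event_index), len(nonevent_index))
```

Sampling without replacement keeps the two classes the same size, and the exclusion window (an assumption here) stops a "non-event" from landing right next to a labeled event.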

Using the training data, we fit the model, make predictions, and assess in-sample accuracy.

In [None]:
breath_sb.fit()
pred_proba, pred_idx = breath_sb.predict_self(nth=5)

In [None]:
outcomes = breath_sb.assess(pred_idx, tol=pd.Timedelta("5s"))
outcomes_by_deployid = pd.pivot_table(
    outcomes.reset_index(),
    index=["deployid"],
    columns=["outcome"],
    aggfunc=len,
)
display(outcomes_by_deployid.droplevel(0, axis=1)[["TP", "FP", "FN"]])
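
The tolerance-based outcomes (TP, FP, FN) can be illustrated with a small standalone sketch. The matching rule is an assumption, not necessarily what `assess` does internally: each prediction claims its nearest actual event if that event is within `tol` and not already claimed, and any event left unclaimed counts as a false negative. The name `assess_sketch` and the toy timestamps are hypothetical.

```python
import numpy as np
import pandas as pd


def assess_sketch(predicted, actual, tol):
    """Label predictions TP or FP and unmatched actual events FN."""
    outcomes = {}
    matched = set()  # positions of actual events already claimed by a prediction
    for p in predicted:
        # Absolute time gap between this prediction and every actual event.
        gaps = np.abs((actual - p).values)
        nearest = int(gaps.argmin())
        if gaps[nearest] <= tol.to_timedelta64() and nearest not in matched:
            outcomes[p] = "TP"
            matched.add(nearest)
        else:
            outcomes[p] = "FP"
    # Any actual event that no prediction claimed is a false negative.
    for i, a in enumerate(actual):
        if i not in matched:
            outcomes[a] = "FN"
    return pd.Series(outcomes, name="outcome").sort_index()


start = pd.Timestamp("2020-01-01")
actual = pd.DatetimeIndex([start + pd.Timedelta(seconds=s) for s in (10, 30, 50)])
predicted = pd.DatetimeIndex([start + pd.Timedelta(seconds=s) for s in (11, 20, 29)])
outcomes_sketch = assess_sketch(predicted, actual, tol=pd.Timedelta("5s"))
print(outcomes_sketch.value_counts())
```

In this toy case the predictions at 11 s and 29 s fall within 5 s of an event, the prediction at 20 s does not, and the event at 50 s is never matched.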

In [None]:
breath_sb.plot_predictions("mn190228-42", pred_proba, pred_idx, outcomes, interactive=True)

The randomly sampled non-events are unlikely to contain much useful information for differentiating events from things that *almost* look like events. That's why the first round of predictions has many true positives and few false negatives, but also many false positives. These false positives are the *almost* events, so we add them to the training data as non-events and refit the model.

In [None]:
false_positive_idx = outcomes[outcomes == "FP"].index
breath_sb.refit(false_positive_idx, [Stickleback.nonevent] * len(false_positive_idx))
pred_proba, pred_idx = breath_sb.predict_self(nth=5)
outcomes = breath_sb.assess(pred_idx, tol=pd.Timedelta("5s"))
outcomes_by_deployid = pd.pivot_table(
    outcomes.reset_index(),
    index=["deployid"],
    columns=["outcome"],
    aggfunc=len,
)
display(outcomes_by_deployid.droplevel(0, axis=1)[["TP", "FP", "FN"]])
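
The refinement step amounts to growing the training set with the false positives, labeled as non-events, before fitting again. The sketch below shows only that bookkeeping, under stated assumptions: `add_hard_negatives` is a hypothetical helper, the 0/1 labels stand in for whatever `Stickleback.nonevent` and its event counterpart actually are, and the single-feature table is toy data.

```python
import numpy as np
import pandas as pd


def add_hard_negatives(X, y, X_fp, negative_label=0):
    """Append false-positive rows to the training set, labeled as non-events."""
    # Skip rows that are already part of the training set.
    X_fp = X_fp.loc[~X_fp.index.isin(X.index)]
    X_new = pd.concat([X, X_fp])
    y_new = np.concatenate([np.asarray(y), np.full(len(X_fp), negative_label)])
    return X_new, y_new


# Toy training set: two events (label 1) and two random non-events (label 0).
X = pd.DataFrame({"feature": [0.9, 0.8, 0.1, 0.2]}, index=[0, 1, 2, 3])
y = np.array([1, 1, 0, 0])
# Features at three false positives; the row at index 3 is already in X.
X_fp = pd.DataFrame({"feature": [0.2, 0.7, 0.6]}, index=[3, 7, 9])
X_new, y_new = add_hard_negatives(X, y, X_fp)
print(X_new.index.tolist(), y_new.tolist())
```

After this step the classifier is fit again on `X_new` and `y_new`, so it now sees negatives that resemble events rather than only random ones.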

In [None]:
breath_sb.plot_predictions("bw180828-49", pred_proba, pred_idx, outcomes, interactive=True)

In [None]:
breath_sb.plot_predictions("mn170810-42", pred_proba, pred_idx, outcomes, interactive=True)

In [None]:
breath_loo = breath_sb.loo(nth=5, tol=pd.Timedelta("5s"))
print(datetime.now().strftime("%H:%M:%S"))