# Cortex Case Example: Anomaly Detection
It's easy to get started analyzing data collected from mindLAMP. In this example, we'll walk through using [Cortex](https://docs.lamp.digital/cortex) with [Luminol](https://github.com/linkedin/luminol), an anomaly detection library, and [Altair](https://altair-viz.github.io), an interactive visualization library, to flag anomalous survey scores for a particular patient and visualize them.

In [1]:
import cortex
import luminol
import pandas as pd
import numpy as np
import altair as alt
from luminol.anomaly_detector import AnomalyDetector



## Preparing the data using Cortex
First, call `cortex.run(...)` with the Participant ID of interest. Then rearrange the resulting DataFrame: set the index to the `timestamp` converted to integer microseconds since the Unix epoch, and add an `anomaly` column that we'll fill in later.

In [2]:
df = cortex.run(
    'U1089294357', ['survey_scores'], 
    start=0, end=cortex.now()
)['survey_scores']
# Cast the datetime column to int64 nanoseconds, then divide down to microseconds.
df.index = df.timestamp.astype('int64') // 10**3
df['anomaly'] = 0 # default to no anomaly

[INFO:feature_types:_wrapper2] Processing primary feature "cortex.survey_scores"...
[INFO:feature_types:_wrapper2] Cortex caching directory set to: /home/_data/cortex_cache
[INFO:feature_types:_wrapper2] Processing raw feature "lamp.survey"...
[INFO:feature_types:_wrapper2] No saved raw data found, getting new...
[INFO:feature_types:_wrapper2] Saving raw data as "/home/_data/cortex_cache/survey_U1089294357_0_1621449536000.cortex"...
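To see what the index conversion does in isolation, here is a minimal sketch using one timestamp taken from the survey data in this example. It assumes the `timestamp` column is a pandas `datetime64[ns]` column (the cast below makes that explicit), so its integer view is nanoseconds since the Unix epoch and floor-dividing by `10**3` yields microseconds. The names `ts`, `epoch_us`, and `restored` are illustrative.

```python
import pandas as pd

# One timestamp copied from the survey data in this example.
ts = pd.Series(pd.to_datetime(["2018-12-29 12:39:12"])).astype("datetime64[ns]")

# The integer view of datetime64[ns] is nanoseconds since the Unix epoch;
# floor-dividing by 10**3 converts nanoseconds to microseconds.
epoch_us = ts.astype("int64") // 10**3
print(epoch_us.iloc[0])  # 1546087152000000

# Round trip back to a datetime to confirm the unit.
restored = pd.to_datetime(epoch_us, unit="us")
```

Keeping the index as plain integers means the later window comparisons are simple numeric comparisons against the detector's start and end timestamps.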


In addition to the survey `score` column, we also have a `category` column derived from custom survey grouping. The Cortex feature `survey_scores` automatically scores each question for you, whether it's a Likert scale, a list of options, True/False, and so on. It then groups the questions from a single survey, such as "Weekly Survey", into predefined categories, like "Mood" and "Anxiety", to better understand symptom domains.

In [3]:
df

| timestamp (index) | id | category | timestamp | score | anomaly |
|---|---|---|---|---|---|
| 1546087152000000 | U1089294357 | Sleep and Social | 2018-12-29 12:39:12 | 2.000000 | 0 |
| 1545836461000000 | U1089294357 | Sleep and Social | 2018-12-26 15:01:01 | 1.500000 | 0 |
| 1545696447000000 | U1089294357 | Sleep and Social | 2018-12-25 00:07:27 | 1.600000 | 0 |
| 1545449535000000 | U1089294357 | Sleep and Social | 2018-12-22 03:32:15 | 1.400000 | 0 |
| 1545262330000000 | U1089294357 | Sleep and Social | 2018-12-19 23:32:10 | 1.333333 | 0 |
| ... | ... | ... | ... | ... | ... |
| 1535656477000000 | U1089294357 | Psychosis and Social | 2018-08-30 19:14:37 | 1.000000 | 0 |
| 1535480282000000 | U1089294357 | Psychosis and Social | 2018-08-28 18:18:02 | 1.000000 | 0 |
| 1535480042000000 | U1089294357 | Psychosis and Social | 2018-08-28 18:14:02 | 1.250000 | 0 |
| 1535479779000000 | U1089294357 | Psychosis and Social | 2018-08-28 18:09:39 | 2.000000 | 0 |
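
Before running a detector, it can help to check how many scores each category has and what range they cover. The following is a minimal sketch that uses only the rows visible in the output above, so the numbers describe that subset rather than the full frame. The `sample` and `summary` names are illustrative; with real data, the same `groupby` call could be applied to `df` directly.

```python
import pandas as pd

# Rows copied from the output above (a subset of the full frame).
rows = [
    ("Sleep and Social", 2.0),
    ("Sleep and Social", 1.5),
    ("Sleep and Social", 1.6),
    ("Sleep and Social", 1.4),
    ("Sleep and Social", 1.333333),
    ("Psychosis and Social", 1.0),
    ("Psychosis and Social", 1.0),
    ("Psychosis and Social", 1.25),
    ("Psychosis and Social", 2.0),
]
sample = pd.DataFrame(rows, columns=["category", "score"])

# Count, mean, min and max of the score within each category.
summary = sample.groupby("category")["score"].agg(["count", "mean", "min", "max"])
print(summary)
```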


## Detecting anomalies using Luminol
Now we feed the Luminol detector our `score` column, keyed by the timestamps in the index. It processes the data and returns anomalous time windows, each tagged with an anomaly score. We then tag the survey scores in our DataFrame that fall within each window with that window's anomaly score. We need to iterate over each category and tag anomalies within it independently of the survey scores from other categories.

In [5]:
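for cat in df.category.unique():
    # Build a {timestamp: score} dict for this category; missing scores become 0.
    sub_df = df.loc[df.category == cat, 'score'].fillna(0).to_dict()
    detector = AnomalyDetector(sub_df, score_threshold=1.5)
    for a in detector.get_anomalies():
        # Rows (of any category) whose timestamp falls inside the anomalous window.
        ts = (df.index >= a.start_timestamp) & (df.index <= a.end_timestamp)
        # Tag only this category's rows with the window's anomaly score.
        df.loc[ts & (df.category == cat), 'anomaly'] = a.anomaly_score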
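
The tagging step in the loop above can be tried without Luminol or real data. The sketch below is only an illustration under stated assumptions: `Window` is a hypothetical stand-in that models just the three attributes the loop reads (`start_timestamp`, `end_timestamp`, `anomaly_score`), and the timestamps, categories, and the score of 2.7 are made up.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the objects returned by detector.get_anomalies();
# only the three attributes used in the loop above are modeled.
Window = namedtuple("Window", ["start_timestamp", "end_timestamp", "anomaly_score"])

toy = pd.DataFrame(
    {
        "category": ["Mood", "Mood", "Mood", "Anxiety", "Anxiety"],
        "score": [1.0, 3.0, 1.0, 2.0, 2.0],
        "anomaly": 0.0,
    },
    index=[10, 20, 30, 21, 40],  # made-up integer timestamps
)

# Made-up detector output: one anomalous window for "Mood", none for "Anxiety".
windows = {"Mood": [Window(15, 25, 2.7)], "Anxiety": []}

for cat, found in windows.items():
    for w in found:
        in_window = (toy.index >= w.start_timestamp) & (toy.index <= w.end_timestamp)
        # Restrict to the current category so the Anxiety row at t=21,
        # which also lies inside the window, is left untouched.
        toy.loc[in_window & (toy.category == cat), "anomaly"] = w.anomaly_score

# Rows that ended up tagged as anomalous.
flagged = toy[toy.anomaly > 0]
print(flagged)
```

Combining the window mask with the category mask is what keeps an anomaly found in one category from leaking onto scores from another category recorded around the same time.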

## Visualizing the anomalies using Altair
We'll use the Altair interactive plotting library to break question categories out into their own sub-charts. We'll also bring extra attention to anomalous survey score data points by increasing their size and changing their color.

In [6]:
alt.Chart(df).mark_point(filled=True).properties(width=500, height=50).encode(
    
    # The timestamp column was already converted by Cortex into a human-readable Date.
    x=alt.X('timestamp', title="Date"),
    
    # We know the score falls within 1 <= score <= 3 for this patient.
    y=alt.Y('score', title="Score", scale=alt.Scale(domain=[1, 3])),
    
    # Color anomalies non-linearly by severity (redder is worse).
    color=alt.Color('anomaly', title='Severity', scale=alt.Scale(type='sqrt', range=['#29629E', '#CA2C21'])),
    
    # Resize anomalies non-linearly by severity (larger is worse).
    size=alt.Size('anomaly', title='Severity', scale=alt.Scale(type='sqrt', range=[25, 500]))
).facet(
    
    # By 'faceting' the plot by the category column, we can split each survey category out into its own subplot.
    row='category'
)
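
To build intuition for why the chart uses a `sqrt` scale for size, the sketch below is a hand-written approximation of what a square-root scale does. It is not Altair or Vega-Lite code. It assumes the scale's domain runs from 0 to the largest anomaly score, and the `sqrt_scale` helper and the maximum of 4.0 are illustrative only.

```python
import math


def sqrt_scale(value, domain_max, range_min=25.0, range_max=500.0):
    """Map value in [0, domain_max] to [range_min, range_max] on a square-root curve."""
    t = math.sqrt(value) / math.sqrt(domain_max)  # normalized position in [0, 1]
    return range_min + t * (range_max - range_min)


# With a hypothetical maximum anomaly score of 4.0:
sizes = [sqrt_scale(v, 4.0) for v in (0.0, 1.0, 4.0)]
print(sizes)  # [25.0, 262.5, 500.0]
```

Under these assumptions, a score at one quarter of the maximum already receives half of the size range, so mild anomalies stand out from the non-anomalous points more than they would on a linear scale.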