In [1]:
%reload_ext autoreload
%autoreload 2

# Visualizing individual trajectories

When studying sequences of events (e.g. care trajectories or drug sequences), it can be useful to visualize individual sequences. To that end, the `viz_trajectories` module provides the `plot_trajectories` function, which plots individual sequences from an events dataframe.

In [2]:
from eds_scikit.plot.viz_trajectories import plot_trajectories
import pandas as pd
import numpy as np
import datetime

    To improve performances when using Spark and Koalas, please call `eds_scikit.improve_performances()`
    This function optimally configures Spark. Use it as:
    `spark, sc, sql = eds_scikit.improve_performances()`
    The functions respectively returns a SparkSession, a SparkContext and an sql method
    


## Create synthetic dataset

In [3]:
synthetic_drug_exposure_periods = {
    "person_id": {
        0: 1,
        1: 1,
        2: 1,
        3: 1,
        4: 1,
        5: 1,
        6: 1,
        7: 2,
        8: 2,
        9: 2,
        10: 2,
    },
    "event_family": {
        0: "A",
        1: "B",
        2: "C",
        3: "D",
        4: "E",
        5: "F",
        6: "G",
        7: "A",
        8: "B",
        9: "C",
        10: "D",
    },
    "event_start_datetime": {
        0: "01/01/2020",
        1: "03/01/2020",
        2: "05/01/2020",
        3: "05/01/2020",
        4: "11/01/2020",
        5: "12/01/2020",
        6: "15/01/2020",
        7: "01/01/2020",
        8: "03/01/2020",
        9: "05/01/2020",
        10: "10/01/2020",
    },
    "event_end_datetime": {
        0: "04/01/2020",
        1: "06/01/2020",
        2: "08/01/2020",
        3: "09/01/2020",
        4: "13/01/2020",
        5: "13/01/2020",
        6: "17/01/2020",
        7: "08/01/2020",
        8: "06/01/2020",
        9: "10/01/2020",
        10: "12/01/2020",
    },
}

synthetic_drug_exposure = {
    "person_id": {
        0: 1,
        1: 1,
        2: 1,
        3: 1,
        4: 1,
        5: 1,
        6: 1,
        7: 1,
        8: 1,
        9: 1,
        10: 1,
        11: 1,
        12: 2,
        13: 2,
        14: 2,
        15: 2,
        16: 2,
        17: 2,
    },
    "event_family": {
        0: "A",
        1: "A",
        2: "B",
        3: "C",
        4: "C",
        5: "D",
        6: "D",
        7: "D",
        8: "E",
        9: "F",
        10: "G",
        11: "G",
        12: "A",
        13: "B",
        14: "C",
        15: "C",
        16: "C",
        17: "D",
    },
    "event": {
        0: "a1",
        1: "a2",
        2: "b1",
        3: "c1",
        4: "c2",
        5: "d1",
        6: "d1",
        7: "d2",
        8: "e1",
        9: "f1",
        10: "g1",
        11: "g1",
        12: "a3",
        13: "b1",
        14: "c3",
        15: "c3",
        16: "c3",
        17: "d1",
    },
    "event_start_datetime": {
        0: "01/01/2020",
        1: "03/01/2020",
        2: "03/01/2020",
        3: "05/01/2020",
        4: "06/01/2020",
        5: "05/01/2020",
        6: "06/01/2020",
        7: "07/01/2020",
        8: "11/01/2020",
        9: "12/01/2020",
        10: "15/01/2020",
        11: "17/01/2020",
        12: "01/01/2020",
        13: "03/01/2020",
        14: "05/01/2020",
        15: "06/01/2020",
        16: "08/01/2020",
        17: "10/01/2020",
    },
    "event_end_datetime": {
        0: "02/01/2020",
        1: "04/01/2020",
        2: "06/01/2020",
        3: np.nan,
        4: "08/01/2020",
        5: "08/01/2020",
        6: "09/01/2020",
        7: "09/01/2020",
        8: np.nan,
        9: "13/01/2020",
        10: "17/01/2020",
        11: "17/01/2020",
        12: "08/01/2020",
        13: "06/01/2020",
        14: "07/01/2020",
        15: "09/01/2020",
        16: "10/01/2020",
        17: "12/01/2020",
    },
}

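def get_synthetic_data(mode):
    """Build one of the two synthetic event dataframes defined above."""
    if mode == "drug_exposure_periods":
        df = pd.DataFrame(synthetic_drug_exposure_periods)
    elif mode == "drug_exposure":
        df = pd.DataFrame(synthetic_drug_exposure)
    else:
        # Fail early instead of hitting an undefined `df` below.
        raise ValueError(f"Unknown mode: {mode!r}")
    # Dates are written day-first (DD/MM/YYYY); missing end dates become NaT.
    df["event_start_datetime"] = pd.to_datetime(
        df["event_start_datetime"], dayfirst=True
    )
    df["event_end_datetime"] = pd.to_datetime(
        df["event_end_datetime"], dayfirst=True
    )
    # Every patient gets the same inclusion date.
    df["index_date"] = datetime.datetime(2020, 1, 1)
    return df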

In [4]:
df_events = get_synthetic_data("drug_exposure")

The `df_events` dataset contains occurrences of 12 distinct events, grouped into 7 event families ("A", "B", "C", "D", "E", "F", "G").  
Events can be either one-time or continuous.  
An `index_date` is also provided; it is the date on which each patient was included in the cohort.

In [5]:
df_events.head()

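|   | person_id | event_family | event | event_start_datetime | event_end_datetime | index_date |
|---|-----------|--------------|-------|----------------------|--------------------|------------|
| 0 | 1 | A | a1 | 2020-01-01 | 2020-01-02 | 2020-01-01 |
| 1 | 1 | A | a2 | 2020-01-03 | 2020-01-04 | 2020-01-01 |
| 2 | 1 | B | b1 | 2020-01-03 | 2020-01-06 | 2020-01-01 |
| 3 | 1 | C | c1 | 2020-01-05 | NaT | 2020-01-01 |
| 4 | 1 | C | c2 | 2020-01-06 | 2020-01-08 | 2020-01-01 |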
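
The counts quoted above can be checked with plain pandas. The sketch below is only an illustration: it rebuilds the five rows shown in the output (the `sample` name is hypothetical) so that it runs on its own. It also lists the rows with a missing `event_end_datetime`; reading those as one-time events is an assumption, not something the library documents here.

```python
import pandas as pd

# First five rows of df_events, copied from the output above.
sample = pd.DataFrame(
    {
        "person_id": [1, 1, 1, 1, 1],
        "event_family": ["A", "A", "B", "C", "C"],
        "event": ["a1", "a2", "b1", "c1", "c2"],
        "event_start_datetime": pd.to_datetime(
            ["2020-01-01", "2020-01-03", "2020-01-03", "2020-01-05", "2020-01-06"]
        ),
        "event_end_datetime": pd.to_datetime(
            ["2020-01-02", "2020-01-04", "2020-01-06", None, "2020-01-08"]
        ),
    }
)

# Number of distinct events and distinct event families.
n_events = sample["event"].nunique()
n_families = sample["event_family"].nunique()

# Rows without an end date (assumed here to be one-time events).
no_end = sample[sample["event_end_datetime"].isna()]

print(n_events, n_families, no_end["event"].tolist())
```

Applying the same `nunique()` calls to the full `df_events` should give the 12 events and 7 families mentioned above.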


## Plot individual sequences

Sequences can be plotted as-is, by providing the events dataframe.

In [6]:
plot_trajectories(
    df_events
)

Further configuration options include:
- `dim_mapping`: dictionary setting the color and label of each event.
- `family_col`: name of the column holding the event families.
- `list_person_ids`: list of specific `person_id` values.
- `same_x_axis_scale`: boolean; when true, all individual charts share the same x-axis scale.
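
To see why a shared scale can matter, it helps to compare how long each patient's sequence lasts. The sketch below is not how `plot_trajectories` computes its axes (that is not shown in this notebook). It uses a hypothetical four-row `events` frame with dates taken from the synthetic data and derives each patient's extent, plus the range a common axis would need to cover.

```python
import pandas as pd

# Earliest and latest events of each patient in the synthetic data.
events = pd.DataFrame(
    {
        "person_id": [1, 1, 2, 2],
        "event_start_datetime": pd.to_datetime(
            ["2020-01-01", "2020-01-15", "2020-01-01", "2020-01-10"]
        ),
        "event_end_datetime": pd.to_datetime(
            ["2020-01-02", "2020-01-17", "2020-01-08", "2020-01-12"]
        ),
    }
)

# Extent of each individual sequence.
per_patient = events.groupby("person_id").agg(
    first_start=("event_start_datetime", "min"),
    last_end=("event_end_datetime", "max"),
)

# Range a common x-axis would have to cover for all patients.
shared_range = (per_patient["first_start"].min(), per_patient["last_end"].max())

print(per_patient)
print(shared_range)
```

Patient 2's sequence ends five days before patient 1's, so on independent axes the two charts would not be directly comparable.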

In [7]:
dim_mapping = {
    "a1":{"color":(255, 200, 150), "label":"eventA1"},
    "a2":{"color":(235, 200, 150), "label":"eventA2"},
    "a3":{"color":(215, 200, 150), "label":"eventA3"},
    "b1":{"color":(200, 200, 150), "label":"eventB1"},
    "c1": {"color":(50, 255, 255), "label":"eventC1"},
    "c2": {"color":(50, 200, 255), "label":"eventC2"},
    "c3": {"color":(50, 255, 200), "label":"eventC3"},
    "d1": {"color":(180, 180, 0), "label":"eventD1"},
    "d2": {"color":(180, 100, 0), "label":"eventD2"},
    "e1": {"color":(130, 60, 10), "label":"eventE1"},
    "f1": {"color":(255, 0, 0), "label":"eventF1"},
    "g1": {"color":(100, 0, 200), "label":"eventG1"},
}
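
Writing the mapping by hand becomes tedious as the number of events grows. As a minimal sketch, a dictionary with the same structure (event code mapped to an RGB `color` tuple and a `label`) could be generated automatically. The `build_dim_mapping` helper below is hypothetical and not part of `eds_scikit`; the evenly spaced hues are just one possible color choice.

```python
import colorsys


def build_dim_mapping(event_codes):
    """Assign each event code an RGB color and an "eventXX" label."""
    codes = sorted(set(event_codes))
    mapping = {}
    for i, code in enumerate(codes):
        # Spread hues evenly around the color wheel.
        r, g, b = colorsys.hsv_to_rgb(i / len(codes), 0.6, 0.9)
        mapping[code] = {
            "color": (round(r * 255), round(g * 255), round(b * 255)),
            "label": f"event{code.upper()}",
        }
    return mapping


generated_mapping = build_dim_mapping(["a1", "a2", "b1", "c1"])
print(generated_mapping["a1"]["label"])
```

With the full dataset, the codes could be taken from `df_events["event"].unique()`.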

In [8]:
plot_trajectories(
    df_events,
    family_col = 'event_family',
    dim_mapping = dim_mapping,
    same_x_axis_scale=True,
    title="Event sequences",
)