In [1]:
%reload_ext autoreload
%autoreload 2
%config Completer.use_jedi = False
%reload_ext lab_black

# Playing with synthetic data

EDS-TeVa can generate synthetic data that mimics the OMOP data structure and the behavior of real hospital data. Use it to test the library's functionalities, as well as your own custom probes and custom models.

!!!success "Contribution"
    If you have implemented your own component, or even just have an idea for a new one, do not hesitate to share it with the community by following the [contribution guidelines][contributing]. Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.

## Load the data

In [5]:
from edsteva.io import SyntheticData
from datetime import datetime

data = SyntheticData(
    mean_visit=1000,
    t_min=datetime(2012, 1, 1),
    t_max=datetime(2022, 1, 1),
)

## Administrative records modeling

### Compute the Visit Probe

The [``VisitProbe``][edsteva.probes.visit.VisitProbe] computes $c_{visit}(t)$, the availability of administrative data related to visits, for each care site and each stay type as a function of time:

$$
c_{visit}(t) = \frac{n_{visit}(t)}{n_{99}}
$$

Where $n_{visit}(t)$ is the number of visits, $n_{99}$ is the $99^{th}$ percentile of visits and $t$ is the month.

!!!info ""
    If the $99^{th}$ percentile of visits $n_{99}$ is equal to 0, the completeness predictor $c_{visit}(t)$ is also set to 0.
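
The formula and its zero-percentile rule can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name `completeness` and the monthly counts are made up for illustration, and the sketch assumes that $n_{99}$ is taken over the monthly visit counts of a single care site and stay type.

```python
import numpy as np


def completeness(n_visit):
    """Return c(t) = n_visit(t) / n_99, or all zeros when n_99 is 0.

    Hypothetical helper: assumes n_99 is the 99th percentile of the
    monthly counts passed in (one care site, one stay type).
    """
    n_visit = np.asarray(n_visit, dtype=float)
    n_99 = np.percentile(n_visit, 99)  # 99th percentile of the monthly counts
    if n_99 == 0:
        # Convention from the note above: no visits at all means c(t) = 0
        return np.zeros_like(n_visit)
    return n_visit / n_99


monthly_visits = [0, 0, 5, 18, 20, 19, 21]  # made-up counts for one care site
c = completeness(monthly_visits)
```

Because the normalisation uses the 99th percentile rather than the maximum, the busiest months can have a value slightly above 1.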

In [7]:
from edsteva.probes import VisitProbe

visit = VisitProbe()
visit.compute(
    data,
    stay_types={
        "All": ".*",
        "Urg_Hospit": "urgence|hospitalisés",
    },
    care_site_levels=["Hospital", "Pole", "UF"],
)
visit.save()
visit.predictor.sample(frac=1).head()

2022-10-06 19:25:46.740 | INFO     | edsteva.probes.base:cache_predictor:305 - Cache the predictor, you can reset the predictor to this state with the method reset_predictor
2022-10-06 19:25:46.743 | INFO     | edsteva.probes.utils:save_object:383 - It has been saved in /home/adam/.cache/edsteva/edsteva/probes/visitprobe.pickle


| | care_site_level | care_site_id | care_site_short_name | stay_type | date | n_visit | c |
|---|---|---|---|---|---|---|---|
| 1147 | Pôle/DMU | 22 | Pôle/DMU-22 | Urg_Hospit | 2013-01-01 | 0.0 | 0.0 |
| 1484 | Unité Fonctionnelle (UF) | 111 | Unité Fonctionnelle (UF)-111 | All | 2017-09-01 | 0.0 | 0.0 |
| 866 | Pôle/DMU | 12 | Pôle/DMU-12 | All | 2020-03-01 | 19.0 | 0.779327 |
| 1765 | Unité Fonctionnelle (UF) | 112 | Unité Fonctionnelle (UF)-112 | Urg_Hospit | 2019-12-01 | 5.0 | 0.474834 |
| 1024 | Pôle/DMU | 21 | Pôle/DMU-21 | All | 2017-05-01 | 16.0 | 0.503145 |


### Fit the Step Function Model

The [``StepFunction``][edsteva.models.step_function.step_function.StepFunction] fits a step function $f_{t_0, c_0}(t)$ with coefficients $\Theta = (t_0, c_0)$ on a completeness predictor $c(t)$:

$$
\begin{aligned}
f_{t_0, c_0}(t) & = c_0 \ \mathbb{1}_{t \geq t_0}(t) \\
c(t) & = f_{t_0, c_0}(t) + \epsilon(t)
\end{aligned}
$$

- the characteristic time $t_0$ estimates the time after which the data is available.
- the characteristic value $c_0$ estimates the stabilized routine completeness.
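
To make the roles of $t_0$ and $c_0$ concrete, here is a minimal sketch that fits a step function to a short series by brute force. It assumes a least-squares criterion and represents $t_0$ as an index into the series rather than a date. Both choices, and the helper name `fit_step_function`, are assumptions made for illustration and are not taken from the `StepFunction` implementation.

```python
import numpy as np


def fit_step_function(c):
    """Brute-force least-squares fit of f(t) = c_0 * 1[t >= t_0].

    Hypothetical helper: returns (index of t_0, c_0) minimising the sum
    of squared residuals over every candidate start index.
    """
    c = np.asarray(c, dtype=float)
    best_index, best_c0, best_loss = 0, 0.0, np.inf
    for i in range(len(c)):
        c_0 = c[i:].mean()  # best plateau value for this candidate t_0
        # Before t_0 the model predicts 0, after t_0 it predicts c_0
        loss = np.sum(c[:i] ** 2) + np.sum((c[i:] - c_0) ** 2)
        if loss < best_loss:
            best_index, best_c0, best_loss = i, c_0, loss
    return best_index, best_c0


c = [0.0, 0.0, 0.1, 0.8, 0.9, 0.85, 0.95]  # made-up completeness values
t_0, c_0 = fit_step_function(c)
```

On this made-up series the fit places the step at the fourth point, where the predictor jumps from near 0 to its plateau, and $c_0$ is the mean of the values from that point onward.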

In [8]:
from edsteva.models.step_function import StepFunction

visit_model = StepFunction()
visit_model.fit(
    probe=visit,
)

## Visualize the model

The EDS-TeVa library provides dashboards and plots to visualize the temporal evolution of [Probes][probe] along with fitted [Models][model]. These visualizations can be used to explore the database and to set thresholds for selection criteria.

### Show interactive dashboard

A **Dashboard** is an interactive [Altair](https://altair-viz.github.io/) chart that lets you visualize variables aggregated by any combination of columns included in the [Probe][probe]. In the library, the dashboards are divided into two parts:

- On the top, there is the plot of the aggregated variable of interest.
- On the bottom, there are interactive filters to set. Only the selected data is aggregated to produce the plot on the top.

In [None]:
from edsteva.viz.dashboards import predictor_dashboard

predictor_dashboard(
    probe=visit,
    fitted_model=visit_model,
    care_site_level="UF",
)

Click to see the [Predictor Dashboard](../assets/charts/synth_predictor_dashboard.html)

### Plot the model

This static plot can be exported as PNG or SVG. As it is less interactive, you may specify the filters as arguments of the function, as follows:

In [11]:
import altair as alt
from edsteva.viz.plots import plot_probe

chart = plot_probe(
    probe=visit,
    fitted_model=visit_model,
    care_site_level="Hospital",
    stay_type="All",
)

```vegalite
{
"schema-url": "../../assets/charts/synth_fitted_visit.json"
}
```

## Set the thresholds

The estimates dashboard provides:

- on the top, a representation of the overall deviation from the Model.
- on the bottom, interactive sliders that allow you to vary the thresholds.

One use could be to set the thresholds that keep the most care sites while having an acceptable overall deviation.
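
Once thresholds are chosen, applying them amounts to filtering a table of estimates. The sketch below uses pandas on a hypothetical table: the column names (`t_0`, `c_0`, `error`), the values, the helper `select_care_sites`, and the direction of each comparison are all assumptions made for illustration, not the library's API.

```python
import pandas as pd

# Hypothetical estimates: one row per care site (values are made up).
estimates = pd.DataFrame(
    {
        "care_site_id": [11, 12, 21, 22],
        "t_0": pd.to_datetime(
            ["2013-01-01", "2015-06-01", "2019-03-01", "2014-02-01"]
        ),
        "c_0": [0.85, 0.78, 0.90, 0.30],
        "error": [0.02, 0.05, 0.03, 0.20],
    }
)


def select_care_sites(estimates, t_0_max, c_0_min, error_max):
    """Keep care sites that start early enough, are complete enough,
    and deviate little enough from the fitted model (assumed criteria)."""
    keep = (
        (estimates["t_0"] <= pd.Timestamp(t_0_max))
        & (estimates["c_0"] >= c_0_min)
        & (estimates["error"] <= error_max)
    )
    return estimates.loc[keep, "care_site_id"].tolist()


selected = select_care_sites(
    estimates, t_0_max="2016-01-01", c_0_min=0.5, error_max=0.1
)
```

Loosening a threshold keeps more care sites at the cost of a larger overall deviation, which is the trade-off the sliders let you explore.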

In [None]:
from edsteva.viz.dashboards import estimates_dashboard

estimates_dashboard(
    probe=visit,
    fitted_model=visit_model,
    care_site_level="UF",
)

Click to see the [Estimates Dashboard](../assets/charts/synth_threshold_dashboard.html).