In [None]:
# default_exp setup

# Setup

> Set up GPU, default paths & global variables.

In [None]:
#hide
from IPython.display import display, HTML
display(HTML("<style>.container { width:85% !important; }</style>"))

In [None]:
#hide
%reload_ext autoreload
%autoreload 2

In [None]:
#export
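from fastai.imports import *
import torch, yaml  # used directly below; imported explicitly rather than relying on the star import
from addict import Dict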

In [None]:
#hide
from nbdev.showdoc import *

Every file in the library imports this module, so any global setup that is required everywhere can be added here.
1. Sets the device to GPU if one is available, otherwise to CPU.
2. Defines default paths for the different stores, so that they are out of version control by default.
3. Defines global scope variables, for convenience in other modules.

## GPU

In [None]:
#exports
def get_device():
    '''Return the CUDA device if a GPU is available, otherwise the CPU device'''
    if torch.cuda.is_available():
        assert torch.backends.cudnn.enabled, 'cuDNN must be enabled to use the GPU'
        torch.backends.cudnn.benchmark = True # Enable cuDNN auto-tuner - perf benefit for convs
        return torch.device("cuda")
    return torch.device("cpu")

## Settings File

A YAML file called `settings.yaml` is created (from a template) the first time the library is used.

In [None]:
#export
def read_settings():
    '''Read settings file at "~/.lemonade/settings.yaml"; if it doesn't exist, create it from the template'''
    settings_dir = f'{Path.home()}/.lemonade'
    settings_file = Path(f'{settings_dir}/settings.yaml')

    if not settings_file.exists():
        print('No settings file found, so creating from template ..')
        # Note: the template path is relative to the current working directory
        with open('./templates/settings_template.yaml', 'r') as t:
            template = Dict(yaml.full_load(t))
        # By default, all stores are sub-directories of ~/.lemonade
        template.STORES.DATA_STORE       = f'{settings_dir}/datasets'
        template.STORES.LOG_STORE        = f'{settings_dir}/logs'
        template.STORES.MODEL_STORE      = f'{settings_dir}/models'
        template.STORES.EXPERIMENT_STORE = f'{settings_dir}/experiments'

        settings = template
        Path(settings_dir).mkdir(parents=True, exist_ok=True)
        with open(settings_file, 'w') as s:
            yaml.dump(settings.to_dict(), s)
    else:
        with open(settings_file, 'r') as s:
            settings = Dict(yaml.full_load(s))

    return settings

## Global Scope Variables

In [None]:
#exports
DEVICE = get_device()
settings = read_settings()

DATA_STORE         = settings.STORES.DATA_STORE
LOG_STORE          = settings.STORES.LOG_STORE
MODEL_STORE        = settings.STORES.MODEL_STORE
EXPERIMENT_STORE   = settings.STORES.EXPERIMENT_STORE

PATH_1K   = f'{DATA_STORE}/synthea/1K'
PATH_10K  = f'{DATA_STORE}/synthea/10K'
PATH_20K  = f'{DATA_STORE}/synthea/20K'
PATH_100K = f'{DATA_STORE}/synthea/100K'

FILENAMES = settings.FILENAMES

SYNTHEA_DATAGEN_DATES = settings.SYNTHEA_DATAGEN_DATES

CONDITIONS = settings.CONDITIONS

LABELS = settings.LABELS

LOG_NUMERICALIZE_EXCEP = settings.LOG_NUMERICALIZE_EXCEP

- These global variables are used (in this initial release of the library) for convenience in other modules; this will be cleaned up in future releases.
- `CONDITIONS` and `LABELS` go hand-in-hand.
    - They are the labels we are trying to predict with the deep learning models.
    - `CONDITIONS` is how the labels appear in the preprocessed dataset.
    - `LABELS` is a convenient list used everywhere for display & plotting purposes (again, this will be cleaned up in the future).
    
**The following two settings in the `~/.lemonade/settings.yaml` file need to be changed based on your specific needs.**

### Change `SYNTHEA_DATAGEN_DATES`

In [None]:
SYNTHEA_DATAGEN_DATES

{'100K': '4-4-2020',
 '10K': '03-16-2021',
 '1K': '03-15-2021',
 '20K': '11-5-2020',
 '250K': '11-16-2018'}

- Sample dates are copied over from the settings template file and serve as examples.
- Please update these based on when you generate a particular dataset.
- These dates are important to calculate patient age.
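
To illustrate why the generation date matters, here is a minimal sketch of an age calculation. It assumes the month-day-year format shown in the sample dates above; `parse_datagen_date` and `age_on` are hypothetical helpers written for this example, and the library's own preprocessing may compute age differently.

```python
from datetime import date, datetime

def parse_datagen_date(text):
    """Parse a date such as '03-16-2021' or '4-4-2020' (month-day-year)."""
    return datetime.strptime(text, "%m-%d-%Y").date()

def age_on(birthdate, reference):
    """Return the age in whole years on the reference date."""
    years = reference.year - birthdate.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

datagen = parse_datagen_date("03-16-2021")   # sample date for the 10K dataset
example_age = age_on(date(1980, 6, 1), datagen)  # made-up birthdate
print(example_age)
```

With a wrong generation date, every patient's age would be shifted by the same amount, which is why the entry should match the day the dataset was actually generated.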

### Change Default STORE Paths

In [None]:
DATA_STORE, MODEL_STORE, EXPERIMENT_STORE, LOG_STORE

('/home/vinod/.lemonade/datasets',
 '/home/vinod/.lemonade/models',
 '/home/vinod/.lemonade/experiments',
 '/home/vinod/.lemonade/logs')

**Please change these default paths to suit your specific configuration.**

- All of these artifacts need to be in some form of failsafe storage, but not all of them need to be in version control.
- Some of them are likely to get big, so version control might not be the ideal location (e.g. data, logs and models).
    - Experiments, on the other hand, are designed to be small enough to store in GitHub or some other version control system (VCS).
    - Each experiment keeps track of the model it runs and saves that model separately in the model store.
    - Given the nature of the dataset in this release of the library (synthetic / Synthea), it can easily be re-generated in case of a loss.
    
So it's left to you to decide where each store should live; once you have decided, change the default paths in the settings file accordingly.<br>
**Recommendation**: store experiments in a VCS, and data & models in some type of failsafe storage; logs are used minimally and are less important (at least in this release).
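
After the store paths are changed, the directories they point to need to exist. The sketch below shows one way to create them up front. `ensure_stores` is a hypothetical helper and not part of the library, and the `storage` and `repo` locations are made-up examples of splitting stores between failsafe storage and a VCS checkout.

```python
import tempfile
from pathlib import Path

def ensure_stores(stores):
    """Create each store directory if it is missing; return name -> Path."""
    resolved = {}
    for name, location in stores.items():
        path = Path(location).expanduser()  # allows paths such as ~/.lemonade/logs
        path.mkdir(parents=True, exist_ok=True)
        resolved[name] = path
    return resolved

# Demo against a temporary directory so that no real store is touched.
with tempfile.TemporaryDirectory() as root:
    stores = {
        "DATA_STORE": f"{root}/storage/datasets",
        "MODEL_STORE": f"{root}/storage/models",
        "LOG_STORE": f"{root}/storage/logs",
        "EXPERIMENT_STORE": f"{root}/repo/experiments",  # e.g. inside a VCS checkout
    }
    created = ensure_stores(stores)
    all_present = all(path.is_dir() for path in created.values())

print(all_present)
```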

## Setup Synthea

Set up Synthea so you can generate different types of synthetic EHR data per your need.<br>
[Synthea - Wiki](https://github.com/synthetichealth/synthea/wiki) has details about the project and how to get started and generate the data.<br>

Here are condensed instructions for [basic setup of Synthea](https://github.com/synthetichealth/synthea/wiki/Basic-Setup-and-Running) for getting you up and running quickly. They also have an option for a developer setup, instructions for which are on the same webpage.

### Download Synthea
- Download the binary (from the basic setup link above) to a local directory
    - Don't run it yet
- Create a file in the same directory called `synthea.properties` and add the following lines into it and save it
```
exporter.years_of_history = 0
exporter.fhir.export = false
exporter.fhir.transaction_bundle = false
exporter.hospital.fhir.export = false
exporter.practitioner.fhir.export = false
exporter.csv.export = true
``` 
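
If you prefer to create the file from Python, the following sketch writes the same six properties. `write_properties` is a hypothetical helper for this example, and the temporary directory stands in for the directory that holds the Synthea binary.

```python
import tempfile
from pathlib import Path

# The properties listed above, in the same order.
PROPERTIES = {
    "exporter.years_of_history": "0",
    "exporter.fhir.export": "false",
    "exporter.fhir.transaction_bundle": "false",
    "exporter.hospital.fhir.export": "false",
    "exporter.practitioner.fhir.export": "false",
    "exporter.csv.export": "true",
}

def write_properties(directory, properties=PROPERTIES):
    """Write synthea.properties into the given directory and return its path."""
    path = Path(directory) / "synthea.properties"
    lines = [f"{key} = {value}" for key, value in properties.items()]
    path.write_text("\n".join(lines) + "\n")
    return path

# Demo: a temporary directory stands in for the Synthea directory.
with tempfile.TemporaryDirectory() as synthea_dir:
    written = write_properties(synthea_dir).read_text()

print(written)
```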

### Generate Data
- Once Synthea is set up, generate the data with one of the following run commands.
    - Basic setup run command is: `java -jar synthea-with-dependencies.jar`
    - Developer setup run command is: `./run_synthea`
- It's important to record the run date (the data generation date) each time you generate a new dataset with Synthea; as mentioned above, it is needed during preprocessing.
- Run with the `-p` switch to control the number of patients generated, as shown in the example below.

**For example, to generate 10,000 patients ..**

`java -jar synthea-with-dependencies.jar -c synthea.properties -s 12345 -p 10000`
- run date: 03/16/2021
- Records: total=11833, alive=10000, dead=1833
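
The run command and its run date can also be assembled in Python, which makes it harder to forget to record the date. This is a minimal sketch for the basic setup; `build_synthea_command` is a hypothetical helper, and it only uses the `-c`, `-s` and `-p` switches shown in the command above.

```python
from datetime import date

def build_synthea_command(population, seed=12345, config="synthea.properties",
                          jar="synthea-with-dependencies.jar"):
    """Return the basic-setup run command as an argument list."""
    return ["java", "-jar", jar, "-c", config, "-s", str(seed), "-p", str(population)]

command = build_synthea_command(10000)
# Record the run date in the month-day-year format used in settings.yaml.
run_date = date.today().strftime("%m-%d-%Y")

print(" ".join(command))
print("run date:", run_date)
# To actually launch the run from the directory holding the jar:
#   import subprocess; subprocess.run(command, check=True)
```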

### Copy Into DataStore
- Synthea will save the generated dataset into the `output` directory in the same location (for basic setup).
- Copy the `csv` directory to the location pointed to by the `DATA_STORE` global variable
    - for example, `~/.lemonade/datasets`
- Rename the `csv` directory to `raw_original` and make sure the directory structure looks like this ..
    - for 10K data - `~/.lemonade/datasets/synthea/10K/raw_original`
    - **Note** - Synthea outputs all csv files in a folder called `csv`; after copying into the datastore, the csv files must be in the `raw_original` folder, where this library expects them for preprocessing.
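
The copy-and-rename step can be done in one call with `shutil.copytree`. The sketch below is an illustration only: `copy_into_datastore` is a hypothetical helper, and the demo builds a stand-in `output/csv` folder in a temporary directory instead of touching a real Synthea output or datastore.

```python
import shutil
import tempfile
from pathlib import Path

def copy_into_datastore(synthea_output, data_store, size_label):
    """Copy <synthea_output>/csv to <data_store>/synthea/<size_label>/raw_original."""
    source = Path(synthea_output) / "csv"
    target = Path(data_store).expanduser() / "synthea" / size_label / "raw_original"
    target.parent.mkdir(parents=True, exist_ok=True)
    # copytree creates raw_original itself and raises FileExistsError if it already exists.
    shutil.copytree(source, target)
    return target

# Demo with stand-in directories and a single stand-in csv file.
with tempfile.TemporaryDirectory() as root:
    output = Path(root) / "output"
    (output / "csv").mkdir(parents=True)
    (output / "csv" / "patients.csv").write_text("Id\n")

    target = copy_into_datastore(output, Path(root) / "datasets", "10K")
    relative = target.relative_to(root).as_posix()
    copied = sorted(path.name for path in target.iterdir())

print(relative, copied)
```

Because `copytree` refuses to overwrite an existing `raw_original` folder, a previously copied dataset is not silently replaced.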
    

### Update `settings.yaml`
- Go to your lemonade settings file (`~/.lemonade/settings.yaml`) and add an entry (or update the existing entry) for the dataset you just generated
- For example, for the 10K data generated above
    - Under `SYNTHEA_DATAGEN_DATES` create the following
    - `'10K': '03-16-2021'`
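
The entry can be edited by hand, or updated programmatically as in the sketch below. `set_datagen_date` is a hypothetical helper and not part of the library; it assumes PyYAML is installed (the library already uses it) and runs against a temporary stand-in file rather than the real `~/.lemonade/settings.yaml`.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML

def set_datagen_date(settings_file, dataset, run_date):
    """Add or update one SYNTHEA_DATAGEN_DATES entry in a settings file."""
    path = Path(settings_file)
    settings = yaml.safe_load(path.read_text()) or {}
    settings.setdefault("SYNTHEA_DATAGEN_DATES", {})[dataset] = run_date
    # Rewriting the file with safe_dump drops any comments it contained.
    path.write_text(yaml.safe_dump(settings))
    return settings

# Demo on a stand-in settings file with one existing entry.
with tempfile.TemporaryDirectory() as root:
    demo_file = Path(root) / "settings.yaml"
    demo_file.write_text("SYNTHEA_DATAGEN_DATES:\n  '1K': '03-15-2021'\n")

    set_datagen_date(demo_file, "10K", "03-16-2021")
    reloaded = yaml.safe_load(demo_file.read_text())

print(reloaded["SYNTHEA_DATAGEN_DATES"])
```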

## Export -

In [None]:
#hide
from nbdev.export import *
notebook2script()

Converted 00_setup.ipynb.
Converted 01_preprocessing_clean.ipynb.
Converted 02_preprocessing_vocab.ipynb.
Converted 03_preprocessing_transform.ipynb.
Converted 04_data.ipynb.
Converted 05_metrics.ipynb.
Converted 06_learn.ipynb.
Converted 07_models.ipynb.
Converted 08_experiment.ipynb.
Converted 99_quick_walkthru.ipynb.
Converted 99_running_exps.ipynb.
Converted index.ipynb.
