# Data Visualization: Building web-applications with DASH and Python
## An approach to DASH with the visualization of clinical repeated measurement data
<br>

#### Author: Alexander Ullmann

## Outline
1. Introduction to DASH
2. Installation
3. Data description: repeated measurement data (Chronic Kidney Disease)
4. Getting started: Your first application
5. Exercises 

## 1. Introduction to DASH
`Dash` is an open-source framework for building reactive, web-based applications. It enables you to create analytical dashboards with ``python``. These apps can run in your local web browser, on a company server, or on the internet, making them accessible to a large audience. If you're familiar with the statistical [programming language `R`](https://cran.r-project.org/), you can think of DASH as the equivalent of the [RShiny framework](https://shiny.rstudio.com/), but for `python`. Dash is open source and released under the permissive MIT license.
<br>
<br>
***
Our goal with this tutorial is twofold: to give an introduction to DASH, and to present a way to approach repeated measurement data in clinical science with visualizations. 
***

## 2. Installation
Open your terminal (Linux/Ubuntu) or Anaconda prompt (Windows) and install Jupyter-Dash with one of the commands below. Using a separate environment (for example a conda environment or a ``virtualenv``) is strongly recommended but not mandatory; you can also use the base environment that comes with the installation of Anaconda. Read about creating and managing environments with conda here -> [docs.conda.io](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). 
<br>
<br>
Use conda or pip, but not both, to install jupyter-dash:
- ``conda install -c plotly jupyter-dash``<br>
OR:
<br>
- ``pip install jupyter-dash``
<br>
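To check that the installation worked before moving on, a small sketch like the following may help. It relies only on the standard library; the helper name `is_installed` is our own choice and not part of Dash, and the module names are taken from the import cell used later in this tutorial.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# module names as they are imported later in this tutorial
for name in ["jupyter_dash", "dash", "pandas", "plotly"]:
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

Any module reported as missing can then be installed with conda or pip into the same environment.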


## 3. Data description: repeated measurement data
The data we will use was published by the PSI organisation as part of the [Wonderful Wednesdays Workshops](https://www.psiweb.org/sigs-special-interest-groups/visualisation/welcome-to-wonderful-wednesdays). It is publicly available and free to use from their [github repository](https://github.com/VIS-SIG/Wonderful-Wednesdays/tree/master/data/2020/2020-06-10) under the CC0 1.0 Universal license. 
<br>
<br>
The background of the data set is Chronic Kidney Disease (CKD). Patients with CKD often suffer from symptoms of anaemia. Medical counter interventions aim to raise 
haemoglobin (Hgb) concentration in blood plasma. This data comes from a simulated trial of 300 CKD patients with two treatment arms (new experimental treatment (E) and current standard of care (C)). For each patient data is measured repeatedly from Baseline to Week 24 on a 4-weekly basis.
<br>
<br>
The primary objective is to demonstrate efficacy by achieving a mean Hgb within the target range of 10 – 11.5 g/dL at week 24. Additionally patients benefit from reduced Hgb variability, and in an ideal case the treatment would produce a smooth increase in Hgb to the target range and keep stable Hgb levels there. 
<br>
<br>
### Data is provided as follows:
* `USUBJID`: Subject unique identifier _(character)_
* `TRT01PN`: Treatment group _(numeric, two groups)_
* `TRT01P`: Treatment group _(character, two groups)_
* `AVISITN`: Visit identifier _(numeric)_, 7 visits from Baseline to Week 24 (4-week periods)
* `AVISIT`: Visit identifier _(character)_, 7 visits from Baseline to Week 24 (4-week periods)
* `AVAL`: Hgb concentration in g/dL _(numeric)_
<br>
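Before plotting repeated measurement data it can be worth checking that the long format is consistent: one row per subject and visit, and one treatment arm per subject. The following is a minimal sketch on a made-up toy frame (the subject ids `S1`, `S2` and all values are invented for illustration); the same checks could be applied to the real data once it is loaded.

```python
import pandas as pd

# toy data in the same long format as the trial data (values are invented)
toy = pd.DataFrame({
    "USUBJID": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "TRT01P": ["Treatment E"] * 3 + ["Treatment C"] * 3,
    "AVISITN": [10, 20, 30] * 2,
    "AVAL": [10.2, 11.3, 12.3, 9.8, 9.9, 10.1],
})

# how many distinct visits does each subject have?
visits_per_subject = toy.groupby("USUBJID")["AVISITN"].nunique()

# each subject should belong to exactly one treatment arm
arms_per_subject = toy.groupby("USUBJID")["TRT01P"].nunique()

# a subject should not have two rows for the same visit
has_duplicates = bool(toy.duplicated(subset=["USUBJID", "AVISITN"]).any())

print(visits_per_subject)
print(arms_per_subject)
print("duplicate subject/visit rows:", has_duplicates)
```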

### Aim:
We aim to provide an appropriate data visualization that can show a potential benefit of the experimental over the control treatment with regard to achieving the target range, variability and stability of Hgb. In this example we will limit the graphical representation to boxplots.
<br>


## 4. Getting started: Your first application

### 4.1 Read packages
If you get an error running the imports below, please check that the required packages are installed in your Python/Anaconda environment. If a particular package is missing, search online for its installation instructions. 

In [1]:
from jupyter_dash import JupyterDash
import dash
import dash_core_components as dcc # for layout: higher-level components like controls (sliders, buttons, graphs)
import dash_html_components as html # for layout: HTML tags
from dash.dependencies import Input, Output # for callbacks

import pandas as pd # Python Data Analysis Library
import plotly.express as px # express version of plotly, use it for creating plots

### 4.2 Read data

In [2]:
file = "https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/master/data/2020/2020-06-10/hgb_data.csv"
df = pd.read_csv(file)
df.head()

Unnamed: 0,USUBJID,TRT01PN,TRT01P,AVISITN,AVISIT,AVAL
0,ABC123456.000001,1,Treatment E,10,Baseline,10.197338
1,ABC123456.000001,1,Treatment E,20,Week 4,11.272717
2,ABC123456.000001,1,Treatment E,30,Week 8,12.288091
3,ABC123456.000001,1,Treatment E,40,Week 12,13.508947
4,ABC123456.000001,1,Treatment E,50,Week 16,12.195863


### 4.3 Structure of a DASH app
A basic DASH app consists of 2 main components. 
1. Layout
2. Callbacks
<br>

The ``layout`` describes what your app will look like. Here you can use predefined HTML tags and dash core components. Alternatively, you can create your own components with JavaScript and React.js (this will not be covered here).<br>
The second fundamental part is the ``callbacks``, which provide interactivity. These are Python functions that update the application automatically once a change is detected. This interactivity is the main purpose of a dashboard or app.


Next we show how to create a basic app in DASH. A hello world example if you like. 

### 4.4 Basic DASH app 

In [3]:
# basic app 1: make a basic layout
app = JupyterDash(__name__)

app.layout = html.Div([   
    html.Div([
        html.H1('Introduction to DASH'),
        html.H2('An approach to DASH with visualization of repeated measurement data.'),
        html.P('Haemoglobin in Anaemia'),
    ]),
    dcc.Graph(),
])

# Run the app; mode='external' serves it at a local URL to open in your browser
app.run_server(mode='external', port=9999)


### Let's break down the above:
1. First, create an instance of JupyterDash with ```app = JupyterDash(__name__)```.
2. Next, we define a new section with ``html.Div()``. Inside it we nest a subsection containing two headers, ``html.H1`` and ``html.H2``, and a paragraph, ``html.P``. 
3. Then we add the ``dcc.Graph()`` component so that we can later link a figure to it and render it in our app.  
4. Finally we start the server with ``.run_server`` and specify the port number. Specifying the port is helpful if you want to run multiple Dash applications, because only one Dash app can run per port. 

If you run the code at this point, you will see the basic HTML layout and an empty coordinate system from the ``Graph`` component. Go ahead and try it out. You can also change the output mode to ``mode='inline'`` if you prefer to display the result inside the notebook.
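Because only one app can run per port, starting a second app on a port that is still occupied fails with an `OSError` asking for a different port. One possible way around this, sketched here with the standard library only, is to ask the operating system for a free port. The helper name `find_free_port` is our own; note that another process could in principle grab the port between the check and the start of the app.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


port = find_free_port()
print(port)

# hypothetical usage with the app defined in this tutorial:
# app.run_server(mode='external', port=port)
```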

### Adding a visualization:
Next let's add some salt and pepper to the app by creating boxplots. A boxplot is a common means of visualization for numerical data and is widely used in clinical data science. It is very versatile because it shows multiple things about your data in one graphic, for example:
* symmetry/skewness (i.e. symmetry if median is roughly in the middle of the box, else left/right skewness), 
* variability (i.e. quartiles and interquartile range: $Q_{0.75} - Q_{0.25}$), 
* indication of centrality or location (median, i.e. $Q_{0.5}$ line inside the box) 
* and extreme values or outliers (i.e. extension of the whiskers/tails and values beyond).

You can read about the python implementation of the boxplot that we use here -> [plotly.express.box](https://plotly.github.io/plotly.py-docs/generated/plotly.express.box.html)
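To make the quantities in the list above concrete, here is a minimal sketch that computes them with NumPy for a handful of invented values. It assumes the common convention that whiskers reach at most 1.5 times the interquartile range beyond the box; the exact quartile values drawn by plotly may differ slightly because it can use a different quartile method.

```python
import numpy as np

# invented Hgb-like values with one extreme observation
values = np.array([9.0, 10.0, 10.5, 11.0, 11.5, 12.0, 13.0, 14.0, 20.0])

# quartiles: lower edge of the box, median line, upper edge of the box
q1, median, q3 = np.percentile(values, [25, 50, 75])

# interquartile range = height of the box
iqr = q3 - q1

# fences: points outside them are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

In this toy example the median lies closer to the lower edge of the box, which hints at right skewness, and the value 20.0 falls outside the upper fence.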

In [None]:
# basic app 2:  add a figure
app = JupyterDash(__name__)

# create a boxplot
fig = px.box(
    data_frame=df,    
    x="AVISIT",
    y="AVAL",
    color="TRT01P"
    )


app.layout = html.Div([   
    html.Div([
        html.H1('Introduction to DASH'),
        html.H2('An approach to DASH with visualization of repeated measurement data.'),
        html.P('Haemoglobin in Anaemia'),
    ]),
    dcc.Graph(figure=fig),
])

# Run the app; mode='external' serves it at a local URL to open in your browser
app.run_server(mode='external', port=9999)

### Breakdown of above:
We added a boxplot figure to the ``.Graph`` component in the layout part of the app by referencing a figure object that we created from our dataframe ``df``. Notice that we used the ``color`` option to produce parallel boxplots representing the two treatment arms. 
<br>
<br>
Interpretation of the results:
If you look at the boxplots, several things catch the eye. First, the medians for treatment ``E`` are slightly above those for control. Second, the variability of the central 50% of the data points seems to be lower in the ``E`` group than in the ``C`` group, which can be seen from the height of the boxes in the figure. Additionally, the tails for the control group seem to be much longer than for the experimental treatment. So the figure supports a potential benefit of the experimental treatment ``E`` over the control ``C`` in terms of lower variability and higher stability of Hgb values across the visits. 
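A visual impression like this can be backed up with a few numbers. The sketch below uses a small invented data set (not the trial data) and summarises, per visit and treatment arm, the median, the interquartile range and the share of values inside the target range of 10 to 11.5 g/dL. The helper name `interquartile_range` and the column name `in_range` are our own.

```python
import pandas as pd

# invented values for one visit, four patients per arm
toy = pd.DataFrame({
    "AVISIT": ["Week 24"] * 8,
    "TRT01P": ["Treatment E"] * 4 + ["Treatment C"] * 4,
    "AVAL": [10.2, 10.8, 11.0, 11.9, 9.1, 10.5, 12.4, 13.0],
})


def interquartile_range(series):
    """Height of the box in a boxplot: Q3 minus Q1."""
    return series.quantile(0.75) - series.quantile(0.25)


low, high = 10.0, 11.5  # target range in g/dL

summary = (
    toy.assign(in_range=toy["AVAL"].between(low, high))  # True/False per row
       .groupby(["AVISIT", "TRT01P"])
       .agg(
           median=("AVAL", "median"),
           iqr=("AVAL", interquartile_range),
           share_in_range=("in_range", "mean"),  # mean of booleans = proportion
       )
)
print(summary)
```

With the real data, `toy` would simply be replaced by `df`.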

We tested the output with the Firefox and Chrome browsers. There, the page is updated automatically once the code is run. In Internet Explorer we had to refresh the webpage manually. 

### Adding components and callbacks
So far we didn't do anything that would warrant the use of a DASH app over a common plotly-figure. To go one step further we need to add callbacks to our code and thus make the app click-able. First, let's implement a click-able checkbox so that the user can choose for which visits the boxplots are displayed.

In [None]:
# basic app 3:  add a checkbox
app = JupyterDash(__name__)

# create a boxplot
fig = px.box(
    data_frame=df,    
    x="AVISIT",
    y="AVAL",
    color="TRT01P"
    )

app.layout = html.Div([   
    html.Div([
        html.H1('Introduction to DASH'),
        html.H2('An approach to DASH with visualization of repeated measurement data.'),
        html.P('Haemoglobin in Anaemia'),
    ]),
    dcc.Graph(figure=fig),
    html.H3('Choose Visit:'),
    html.Div(        
            children=[
                dcc.Checklist(id='checkbox-options',
                    options=[
                       {'label': k, 'value': k} for k in df["AVISIT"].unique()
                    ], # options end
                    value=['Baseline', 'Week 24']
                ) # checklist end
            ] # children end
    ),  
]) # layout end

# Run the app; mode='external' serves it at a local URL to open in your browser
app.run_server(mode='external', port=9999)

### Breakdown of above:
If you run the code above, you will notice the checkboxes below the figure. Here you can select the visits. 
* We place the ``dcc.Checklist`` in a separate subsection as a nested child element. We give it ``id='checkbox-options'`` so it can later be referenced in the callback. Hint: you can choose any name you like, as long as it is unique within the layout. 
* Next, the ```options``` parameter of ``Checklist`` expects a list of dictionaries, each with two keys: the ``label`` shown next to the box and the ``value`` that is passed on when the box is ticked. A list comprehension over the unique visits is a convenient way to fill both. Alternatively one could hard-code the labels and values, but that is harder to maintain and therefore not recommended. 
* Finally, the ``value`` parameter holds the boxes that are ticked when the app starts. Dash updates this property whenever the user changes the checkboxes, and this is what ultimately links the layout to the callback (you don't need to understand it all here because we will cover callbacks below). 
<br>

If you run the code and try the app, you will see that it doesn't do anything yet: clicking on the boxes does not update the figure. That is why we implement the callbacks next. 
<br>
<br>
For more on checklists follow the link here [dcc.Checklist](https://dash.plotly.com/dash-core-components/checklist).
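To see what the list comprehension for ``options`` actually produces, here is a minimal sketch on an invented column of visits. It also checks that the default ``value`` entries really occur among the options, which is a cheap safeguard against typos; the variable names `default` and `missing` are our own.

```python
import pandas as pd

# invented visit column with repeated entries, as in long-format data
toy = pd.DataFrame(
    {"AVISIT": ["Baseline", "Week 4", "Baseline", "Week 8", "Week 4"]}
)

# one dictionary per unique visit, in order of first appearance
options = [{"label": v, "value": v} for v in toy["AVISIT"].unique()]
print(options)

# boxes that should be ticked when the app starts
default = ["Baseline", "Week 8"]

# any default that is not a valid option would never show up as ticked
valid_values = {option["value"] for option in options}
missing = [v for v in default if v not in valid_values]
print("defaults not found in options:", missing)
```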

In [None]:
# basic app 4:  add a callback + its function
app = JupyterDash(__name__)

app.layout = html.Div([   
    html.Div([ 
        html.H1('Introduction to DASH'),
        html.H2('An approach to DASH with visualization of repeated measurement data.'),
        html.P('Haemoglobin in Anaemia'),
    ]),
    dcc.Graph(id='graph-output', figure=fig),
    html.H3('Choose Visit:'),
    html.Div(        
            children=[
                dcc.Checklist(id='checkbox-options',
                    options=[
                       {'label': k, 'value': k} for k in df["AVISIT"].unique()                        
                    ], # options end
                    value=['Baseline', 'Week 24']
                ) # checklist end
            ] # children end
    ),  
]) # layout end

# define callback and its functions
@app.callback(
    Output(component_id='graph-output', component_property='figure'),
    [Input(component_id='checkbox-options', component_property='value')])
def update_my_boxplots(selected_visits):    
    # filter dynamically based on the boxes selected in the app by the user
    filtered_df = df[df["AVISIT"].isin(selected_visits)]
    
    # create a boxplot
    fig = px.box(
        data_frame=filtered_df,    
        x="AVISIT",
        y="AVAL",
        color="TRT01P"
    )  
    
    return fig

# Run the app; mode='external' serves it at a local URL to open in your browser
app.run_server(mode='external', port=9999)

### Breakdown of above:
Here we finally implement our callback and its associated Python function. ``@app.callback`` is referred to as the callback decorator, and here it takes two arguments, ``Output`` and ``Input``. Both have two parameters, ``component_id`` and ``component_property``.
<br>
* Let's look at the ``Input`` argument first. The callback ``Input`` needs to know where in the ``app.layout`` to take its values from and what these values are. So we set the ``component_id`` to the same ``id`` we provided to our ``dcc.Checklist``, namely ``checkbox-options``. Next, for the ``component_property`` we fill in the name of the parameter of that layout component from which the input values shall come. In this case we simply tell it to take them from the ``value`` parameter of ``dcc.Checklist``. Now the callback knows which element in the layout to look at and what its input values are.
<br>
<br>
* Now let's examine the ``Output`` argument and the function ``update_my_boxplots`` which is stated directly below the callback decorator. Please note that this user defined function is called at the start of the app with the default values provided by the Checklist and is always called when the input property changes, in our case a mouse-click on the checkboxes. <br>
Once a change is made to the checkboxes, Dash provides these new values to the ``Input`` and after this to the python function below the callback decorator, which in this case is defined to be ``update_my_boxplots``. Then Dash updates the property of the output component with whatever was returned by this function. In our case it is a figure object. <br> 
Just as we linked the ``Input`` to the layout, we also link the ``Output`` to the layout by referencing the component it should update. We set the ``component_id`` to the ``id='graph-output'`` of the layout component ``dcc.Graph`` and its ``component_property`` to ``figure``, because we want to render a graphic. Our callback function ``update_my_boxplots`` returns a figure object, namely ``fig``, which is passed through the ``Output`` to the ``figure`` parameter of ``dcc.Graph``, so the whole plot is redrawn.
***
For more details on callbacks please refer to the [basic-callbacks documentation](https://dash.plotly.com/basic-callbacks); a fuller treatment would go beyond the scope of this tutorial.
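The data handling inside the callback can be tried out without starting an app. The sketch below isolates the filtering step in a plain function (the name `filter_visits` is our own) and calls it with the lists that the checklist would pass on: the defaults at start-up, the selection after a click, and an empty selection. The data are invented.

```python
import pandas as pd

# invented long-format data: three visits for each of two arms
toy = pd.DataFrame({
    "AVISIT": ["Baseline", "Week 4", "Week 24"] * 2,
    "TRT01P": ["Treatment E"] * 3 + ["Treatment C"] * 3,
    "AVAL": [10.2, 11.3, 11.0, 9.8, 9.9, 10.1],
})


def filter_visits(data, selected_visits):
    """Keep only the rows whose visit is ticked in the checklist."""
    selected = selected_visits or []  # treat None like an empty selection
    return data[data["AVISIT"].isin(selected)]


initial = filter_visits(toy, ["Baseline", "Week 24"])  # defaults at start-up
after_click = filter_visits(toy, ["Baseline"])         # 'Week 24' unticked
nothing = filter_visits(toy, [])                       # everything unticked

print(len(initial), len(after_click), len(nothing))
```

Keeping this step in a separate function is a design choice rather than a requirement of Dash; it simply makes the logic easier to test before it is wired into the callback.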


## 5. Exercises
Congratulations! You have completed your first DASH application. Now you are ready to sail out into the open water to meet new challenges. But wait! Not before you have tried and finished some exercises that will help you on your journey:
*** 
1. Patients with anaemia need to achieve a target range of Hgb values between 10 and 11.5 g/dL. Let's add two lines marking this range to our figure so we can better judge visually the proportion of patients who achieve that goal. You can search the web to find out how to accomplish this in python, or just use the hints below:
<br>
1.1 You will need a dynamic variable for controlling the x-axis ranges. Note that we added the -0.5 to it so the right side is adjusted more evenly in the figure, it is not mandatory: 
<br>

    `length_1 = len(filtered_df["AVISIT"].unique())-0.5`
<br>    
1.2 Next you can update the figure object with ``fig.update_layout`` and use shapes for drawing the lines. The lines are passed as dictionaries and you will need 2 of them, one per line. Change ``dash="dashdot"`` to ``"solid"`` or ``"dot"``, or play with the color, if you want a different style for your second line. Notice how the ``x1`` value is updated dynamically with the variable we created in 1.1. You may want to play with the values to better understand what is happening, and feel free to find your own solution which may be more suitable (hint within a hint: use ``fig.add_shape`` instead of ``update_layout``). 

    ```
    fig.update_layout(
            shapes=[
                dict(
                  type= 'line',
                  yref= 'y1', y0= 10, y1= 10,
                  xref= 'x1', x0= -0.3, x1=length_1-0.2,     
                  line= dict(
                            color="Green",
                            width=2,
                            dash="dashdot",
                  )
                ), # 1st dict end   
            ], #shape end       
    ) # update_layout end 
     ```
     
<br>
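Since the two lines differ only in their height and dash style, one option is to build the shape dictionaries with a small helper. The following sketch uses plain Python only; the helper name `hline_shape` is our own, and the keys mirror those of the hint in 1.2.

```python
def hline_shape(y, x_start, x_end, dash="dashdot", color="Green"):
    """Build the dictionary that describes one horizontal line shape."""
    return dict(
        type="line",
        yref="y1", y0=y, y1=y,                # same y at both ends: horizontal
        xref="x1", x0=x_start, x1=x_end,
        line=dict(color=color, width=2, dash=dash),
    )


n_visits = 2                 # e.g. 'Baseline' and 'Week 24' are ticked
length_1 = n_visits - 0.5    # same idea as the variable from hint 1.1

shapes = [
    hline_shape(10, -0.3, length_1 - 0.2, dash="dashdot"),  # lower bound
    hline_shape(11.5, -0.3, length_1 - 0.2, dash="dot"),    # upper bound
]
print(shapes)

# inside the callback one could then pass the list on, for example:
# fig.update_layout(shapes=shapes)
```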
2. Add a title, change the axis labels and add a footnote to your figure in order to let the reader know more about the context of the data and what one may infer from this visualization. 
<br>

Hint: [How to use axes](https://plotly.com/python/axes/).
      [More on styling with plotly-express](https://plotly.com/python/styling-plotly-express/).
      As for the footnote, you can either put it into a html.Div container or break the line in the x-axis text with ``<br>``.

<br>

This is a bigger exercise:
3. Now, let's do some calculations and annotations. We suggest adding the mean and standard deviation above the boxplots for each visit and treatment arm. This may substantiate the visual inference made previously with hard statistical measures. Be assured, this is not an easy task if you're unfamiliar with plotly and python, but it is achievable with some enthusiasm. Here are some hints to help you out:
<br>

***
* You may need to import numpy for statistical calculations:

```import numpy as np``` 
***
* For the calculation of the statistics the data has to be grouped accordingly and the aggregate function applied i.e. median, mean, standard deviation. Here is some dummy code which requires you to fill in the correct variables. 

```dataframe_name.groupby(["variable_name", "variable_name"]).agg({"numeric_variable": "statistical_function"}) # calculate statistic ```

To calculate the standard deviation in the same manner as the mean (above piece of code with ``.agg({})``), you can define a separate function. Note that ``np.std`` computes the population standard deviation by default; pass ``ddof=1`` to get the sample standard deviation.

```
# define sample standard deviation as a function of a variable
def st_dev(x):
    return np.std(x, ddof=1)  # ddof=1 divides by n-1 (sample standard deviation)
```

This time don't use quotation marks in the ```.agg({"numeric_variable": my_own_function})``` part. Quotation marks are only needed for the names of built-in aggregations such as ``"mean"``.
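The difference between a quoted built-in name and an unquoted function of your own can be tried out on a tiny invented data set. The sketch below also shows that ``np.std`` divides by n by default (``ddof=0``), whereas the pandas aggregation ``"std"`` divides by n-1; the function names `sample_sd` and `population_sd` are our own.

```python
import numpy as np
import pandas as pd

# invented values: two measurements per visit
toy = pd.DataFrame({
    "AVISIT": ["Baseline", "Baseline", "Week 4", "Week 4"],
    "AVAL": [10.0, 12.0, 11.0, 11.0],
})


def sample_sd(x):
    return np.std(x, ddof=1)  # divide by n-1


def population_sd(x):
    return np.std(x, ddof=0)  # divide by n (the NumPy default)


grouped = toy.groupby("AVISIT")["AVAL"]

stats = pd.DataFrame({
    "mean": grouped.agg("mean"),               # built-in: quoted name
    "sd_builtin": grouped.agg("std"),          # built-in: quoted name
    "sd_sample": grouped.agg(sample_sd),       # own function: no quotes
    "sd_population": grouped.agg(population_sd),
})
print(stats)
```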
***

* You will need to store the x-axis values somewhere. This way you know where to put the text in the figure.
For example, you can get the x-axis tick values with the following:

```np.arange(0,len(dataframe_name["variable_name"].unique()), 1)```

Because we have two grouped boxplots at each x-axis tick, the above may need some adjustments, maybe two variables i.e. one for each treatment, one left and one right of the main tick values. 
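One possible way to get such positions is sketched below in plain Python. It assumes that categorical ticks sit at 0, 1, 2, ... and that the two boxes are drawn a fixed offset to the left and right of each tick; the offset of 0.18 is a trial-and-error value and the function name `grouped_positions` is our own.

```python
def grouped_positions(n_ticks, offset=0.18):
    """Return x positions for two boxes per tick: left box, then right box."""
    positions = []
    for tick in range(n_ticks):
        positions.append(tick - offset)  # left box at this tick
        positions.append(tick + offset)  # right box at this tick
    return positions


xvalues = grouped_positions(3)
print(xvalues)
```

The resulting list is already in ascending order, so it can be zipped directly with statistics that are sorted by visit and then by treatment.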
***

 * Once you have the x-values and the calculated y-values (means or standard deviations for each visit and treatment arm), you add the annotations with something like this:

```
    # loop through the xi, yi values and add them to the figure at the proper positions
    for xi, yi in zip(xvalues, y_means):
        fig.add_annotation(x=xi, y=20, text=yi, showarrow=False)
```

You may need to create two loops, one for the means and one for the standard deviations.
***
Additional hints:<br>

Note that your dataframe with the calculated statistic (mean or standard deviation) needs to be sorted according to the order of the boxplots in the figure. If you don't do this, then the values will not correspond to the proper boxplot. To achieve this task, one could merge the numerical values (i.e. AVISITN, TRT01PN) from the original dataframe to the one with the statistical calculation. 
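Why the sorting matters becomes visible on a small invented example: ``groupby`` returns the groups in alphabetical order, so "Week 12" ends up before "Week 4". The sketch below merges the numeric keys back in and sorts by them; the toy values and the coding of the arms (1 for E, 2 for C) are assumptions made for illustration.

```python
import pandas as pd

# invented data: one value per visit and arm
toy = pd.DataFrame({
    "AVISITN": [10, 10, 20, 20, 40, 40],
    "AVISIT": ["Baseline", "Baseline", "Week 4", "Week 4", "Week 12", "Week 12"],
    "TRT01PN": [1, 2, 1, 2, 1, 2],
    "TRT01P": ["Treatment E", "Treatment C"] * 3,
    "AVAL": [10.0, 9.0, 11.0, 9.5, 11.2, 9.8],
})

# statistics per group: rows come back in alphabetical order of the labels
means = toy.groupby(["AVISIT", "TRT01P"], as_index=False).agg({"AVAL": "mean"})
print(means["AVISIT"].tolist())

# lookup table that maps each label pair to its numeric keys
keys = toy[["AVISITN", "AVISIT", "TRT01PN", "TRT01P"]].drop_duplicates()

# attach the numeric keys and sort by them to restore the plotting order
ordered = (
    means.merge(keys, on=["AVISIT", "TRT01P"], how="left")
         .sort_values(["AVISITN", "TRT01PN"])
)
print(ordered[["AVISIT", "TRT01P", "AVAL"]])
```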
<br>

Use the web to search for help and python functionality, maybe you can come up with something really short and beautiful to achieve this task and let us know. 

#### Last but not least, if not mentioned before check out the official DASH documentation page for more information: [dash.plotly.com](https://dash.plotly.com/)

In [None]:
# basic app 5:  with solutions

# import numpy for statistical calculations
import numpy as np 

# define sample standard deviation as a function of a variable
def st_dev(x):
    return np.std(x, ddof=1)  # ddof=1 divides by n-1; np.std defaults to ddof=0 (population SD)

# build the app
app = JupyterDash(__name__)

app.layout = html.Div([   
    html.Div([ 
        html.H1('Introduction to DASH'),
        html.H2('An approach to DASH with visualization of repeated measurement data.'),
        html.P('Haemoglobin in Anaemia'),
    ]),
    dcc.Graph(id='graph-output', figure=fig),
    html.H3('Choose Visit:'),
    html.Div(        
            children=[
                dcc.Checklist(id='checkbox-options',
                    options=[
                       {'label': k, 'value': k} for k in df["AVISIT"].unique()                        
                    ], # options end
                    value=['Baseline', 'Week 24']
                ) # checklist end
            ] # children end
    ), 
    
]) # layout end
      
# define callback and its functions
@app.callback(
    Output(component_id='graph-output', component_property='figure'),
    [Input(component_id='checkbox-options', component_property='value')])
def update_my_boxplots(selected_visits):    
    
    # prefilter the dataframe
    filtered_df = df[df["AVISIT"].isin(selected_visits)]
    
    # get dynamic length of xaxis
    length_1 = len(filtered_df["AVISIT"].unique())-0.5
    
    # create a boxplot
    fig = px.box(
        data_frame=filtered_df,    
        x="AVISIT",
        y="AVAL",
        color="TRT01P",
        range_x = [-0.5,length_1],
        title = "Comparison of Hgb-values in patients with Anaemia (treatment E vs. treatment C).",        
    )  
    
    # add xaxis label
    fig.update_xaxes(title_text=('Visit <br><br> The figures support a potential benefit of the experimental treatment E over the control C <br>'
                                ' in terms of a lower variability and a higher stability of Hgb within the target range across the visits.'))
    
    # update layout with horizontal lines
    fig.update_layout(    
        
        shapes=[
            dict(
              type= 'line',
              yref= 'y1', y0= 10, y1= 10,
              xref= 'x1', x0= -0.3, x1=length_1-0.2,     
              line= dict(
                        color="Green",
                        width=2,
                        dash="dashdot",
              )
            ), # 1st dict end              
            dict(
              type= 'line',
              yref= 'y1', y0= 11.5, y1= 11.5,
              xref= 'x1', x0= -0.3, x1=length_1-0.2,     
              line= dict(
                        color="Green",
                        width=2,
                        dash="dot",
              )
            ) # 2nd dict end               
        ], #shape end       
    ) # update_layout end
    
    # add annotation    
    
    # get uniques for AVISITN and TRT01PN, this is important for being able to sort the dataframe
    df_uniques = df[['AVISITN', 'AVISIT', 'TRT01PN', 'TRT01P']].drop_duplicates(subset = ["AVISITN", "TRT01PN"])
    
    # For the mean:
    mean = filtered_df.groupby(["AVISIT", "TRT01P"]).agg({"AVAL": "mean"}) # calculate statistic    
    result_mean = pd.merge(mean, df_uniques, on=['TRT01P', 'AVISIT'], how='left') # Merge the numerical unique values to the statistics dataframe
    result_mean.sort_values(by=['AVISITN', 'TRT01PN'], inplace=True, ascending=True) # sort in the right order based on numerical values

    # For the sd:
    sd = filtered_df.groupby(["AVISIT", "TRT01P"]).agg({"AVAL": st_dev}) # calculate statistic    
    result_sd = pd.merge(sd, df_uniques, on=['TRT01P', 'AVISIT'], how='left') # Merge the numerical unique values to the statistics dataframe
    result_sd.sort_values(by=['AVISITN', 'TRT01PN'], inplace=True, ascending=True) # sort in the right order based on numerical values
    
    x_vals_e = np.arange(0,len(filtered_df["AVISITN"].unique()), 1)-0.18 # create xaxis sequence for trt E and adjust for the left boxplot
    x_vals_c = np.arange(0,len(filtered_df["AVISITN"].unique()), 1)+0.18 # create xaxis sequence for trt C and adjust for the right boxplot
    xvalues = [*x_vals_e, *x_vals_c] # put the values for trt E and trt C in one list
    xvalues.sort() # sort the list values in ascending order
    
    # create label vectors for mean and sd, this is what will be shown in the figure
    y_means = "Mean: " + result_mean["AVAL"].round(2).astype(str)
    y_sd = "SD: " + result_sd["AVAL"].round(2).astype(str)

    # loop through the xi, yi values and add them to the figure at the proper positions
    for xi, yi in zip(xvalues, y_means):
        fig.add_annotation(x=xi, y=20, text=yi, showarrow=False)
    for xi, yi in zip(xvalues, y_sd):
        fig.add_annotation(x=xi, y=19, text=yi, showarrow=False)
            
    return fig

# Run the app; mode='external' serves it at a local URL to open in your browser
app.run_server(mode='external', port=9999)