<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Dashboards for Big Data

Now that we've scaled up to reading data across all of our CSV files for a single day, let's scale even further by allowing users to interact with our map and select which day they wish to view.

## Objectives

* Learn how to arrange elements on a Plotly Dash dashboard
* Learn how to make a dashboard interactive

## Planning Ahead

There are many different ways to build a dashboard. It can be easy to get lost sometimes, so before we write any code, let's sketch out the features and layout that we'd like to see. In the end, our dashboard will look something like this, but with your style as defined in the previous lab.

<center><img src=images/Sample_Map.png width=400px /></center>

[Plotly Dash](https://plotly.com/dash/) is a framework that extends Plotly so we can turn our figures into a web service that serves a dashboard. A little bit of [HTML](https://www.w3schools.com/html/) and [CSS](https://www.w3schools.com/css/) knowledge will be useful here, but for the most part, this dashboard is built using Python.

Here's how the web service will work:
* When a user visits our site, they can select a date to view precipitation for
* Plotly has a mode to do the calculation on the client (user's computer), but not everyone has a supercomputer to breeze through data calculation
* The selected date will be sent to the server (our computer), so our GPU can handle filtering the data
* Our GPU filters the data for the date, sends it back to our host (CPU + RAM), so Plotly can generate a new graph and send it to the client.

Let's get the base of our server set up. We will be using a version of Dash built for Jupyter notebooks called [Jupyter Dash](https://github.com/plotly/jupyter-dash). The cell below will define our server.

In [1]:
from jupyter_dash import JupyterDash

app = JupyterDash(__name__)
server = app.server

## Preparing the Data

Next, we'll need to decide how much data we want to compute in advance and how much we'll calculate on the fly when our users query. There is often a trade-off between how much space we have available and how quickly a dashboard can respond.

First, let's load our data.

In [2]:
import dask_cudf
import numpy as np

df = dask_cudf.read_csv(
    "data/*.csv",
    usecols=["STATION", "LATITUDE", "LONGITUDE", "DlySum", "DATE"],
    dtype={
        "STATION": "object",
        "LATITUDE": np.float32,
        "LONGITUDE": np.float32,
        "DlySum": np.uint32,
        "DATE": str,
    },
    na_values=["-9999"],
)

**TODO:** Let's also convert `DlySum` from hundredths of an inch to inches in order to make the data more human-readable. Feel free to add any other columns you think might be useful, such as a column for hover text. We've added the `%%time` cell magic to keep track of how long these operations take.

In [None]:
%%time
# Define columns
df["Inches"] = FIXME
df["TEXT"] = df["STATION"]

In [3]:
%%time
df["Inches"] = df["DlySum"] / 100
df["TEXT"] = df["STATION"] + ": " + df["Inches"].astype(str) + " inches"

CPU times: user 424 ms, sys: 12 ms, total: 436 ms
Wall time: 437 ms


Certain operations are going to be faster in Dask's parallel computing environment, but some will be faster in a single-threaded environment. Knowing which is which takes a little theory and a whole lot of practice (check out Dask's [Best Practices](https://docs.dask.org/en/latest/dataframe-best-practices.html) page).

As for the theory, the goal is to keep communication between parallel threads to a minimum. For instance, creating the `Inches` and `TEXT` columns above is a row-independent operation, so Dask can be used to its full advantage.

On the other hand, computing the minimum and maximum date below requires the partitions to pull their results together in order to determine which partition has the correct minimum and maximum.
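
To make the communication cost concrete, here is a minimal plain-Python sketch of a two-step reduction. It is only an analogy for what a partitioned minimum and maximum involve, not Dask's actual implementation; the `partitions` data and both helper functions are hypothetical.

```python
# Hypothetical partitions: each inner list stands in for the DATE column of one CSV file.
partitions = [
    ["2021-01-03", "2021-01-01", "2021-01-02"],
    ["2021-01-05", "2021-01-04"],
    ["2021-01-02", "2021-01-06"],
]


def partition_min_max(dates):
    """Step 1: each partition reduces its own rows independently."""
    return min(dates), max(dates)


def combine_min_max(partials):
    """Step 2: the partial results are gathered in one place and reduced again."""
    lows, highs = zip(*partials)
    return min(lows), max(highs)


partials = [partition_min_max(p) for p in partitions]
overall_min, overall_max = combine_min_max(partials)
print(overall_min, overall_max)
```

Step 1 needs no coordination at all, which is why row-independent work parallelizes so well. Step 2 is the part where the partitions have to communicate.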

[set_index](https://docs.rapids.ai/api/cudf/stable/api.html#cudf.core.dataframe.DataFrame.set_index) is also an expensive operation in the Dask environment as it sorts the values to set the index. It may take some time to calculate now, but it's going to dramatically improve the speed at which users can filter on a date for the dashboard. Time is precious, but if you have extra time, we encourage removing the index and seeing how much of a difference it makes when filtering on the date.
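
As a rough intuition for why sorting up front can pay off later, the sketch below compares a full scan with a binary search over sorted keys. It uses only the Python standard library and is an analogy, not a description of how cuDF looks up index values; the `rows` data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (date, station) rows, deliberately out of order.
rows = [
    ("2021-01-03", "C"),
    ("2021-01-01", "A"),
    ("2021-01-02", "B"),
    ("2021-01-01", "D"),
]


def filter_by_scan(rows, date):
    """Unsorted data: every row has to be inspected on every query."""
    return [row for row in rows if row[0] == date]


# One-time cost, comparable in spirit to sorting while setting an index.
sorted_rows = sorted(rows)
sorted_dates = [row[0] for row in sorted_rows]


def filter_by_index(date):
    """Sorted data: binary search locates the matching slice directly."""
    lo = bisect_left(sorted_dates, date)
    hi = bisect_right(sorted_dates, date)
    return sorted_rows[lo:hi]


print(filter_by_scan(rows, "2021-01-01"))
print(filter_by_index("2021-01-01"))
```

The sort is paid for once, while the cheaper lookup is reused every time a user picks a new date.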

In order to pull these operations out of the parallel programming environment, we'll go ahead and pull the DataFrame into vanilla cuDF using the [compute](https://docs.rapids.ai/api/cudf/stable/dask-cudf.html) method.

In [4]:
%%time
df = df.compute()

CPU times: user 28.6 s, sys: 5.46 s, total: 34.1 s
Wall time: 34.1 s


<sub>Even though we're using `DATE` as a string, this still returns a correct result since the date is in a YYYY-MM-DD format.</sub>

In [5]:
%%time
# Runs on the cuDF DataFrame returned by compute(), so no cross-partition reduction is needed
date_min = df["DATE"].min()
date_max = df["DATE"].max()

CPU times: user 4 ms, sys: 12 ms, total: 16 ms
Wall time: 14.7 ms
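
The note above about treating `DATE` as a string can be checked with a small standard-library sketch. The sample dates are made up for illustration.

```python
from datetime import date

# Made-up sample dates in YYYY-MM-DD format.
iso_dates = ["2021-02-10", "2020-12-31", "2021-01-05"]

# Comparing the raw strings character by character...
string_min, string_max = min(iso_dates), max(iso_dates)

# ...agrees with comparing real date objects.
parsed = [date.fromisoformat(d) for d in iso_dates]
assert string_min == min(parsed).isoformat()
assert string_max == max(parsed).isoformat()

# The same dates written as MM/DD/YYYY do not sort chronologically as strings.
us_dates = ["02/10/2021", "12/31/2020", "01/05/2021"]
print(min(us_dates))  # not the earliest date, which is 12/31/2020
```

This works for YYYY-MM-DD because the most significant field comes first and every field is zero-padded to a fixed width.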


In [6]:
%%time
df = df.set_index('DATE')

CPU times: user 20 ms, sys: 36 ms, total: 56 ms
Wall time: 52.4 ms


## Inputs

Let's focus on the layout next. Dash has a number of [Inputs](https://dash.plotly.com/dash-core-components/input) that we can use as inputs to a Python function. Since our users would like to filter by the date, let's use the [DatePickerSingle](https://dash.plotly.com/dash-core-components/datepickersingle) input. (There's a [DatePickerRange](https://dash.plotly.com/dash-core-components/datepickerrange) input to allow filtering on multiple dates, but for our geo map, that would cause points to overlap each other.)

Another question we might ask ourselves is whether or not we want to include points with zero precipitation. On the one hand, it's interesting seeing the locations of the stations even without precipitation. On the other hand, it can add a bit of confusion if we're only interested in areas with precipitation. We'll let the user decide what's best for them with a switch. [Dash DAQ](https://dash.plotly.com/dash-daq) extends the Dash library by providing even more input widgets. In our case, we'll be using a [Boolean Switch](https://dash.plotly.com/dash-daq/booleanswitch).

In order to organize these inputs and make them look snazzy, Dash has replicated a number of [HTML](https://dash.plotly.com/dash-html-components) tags. One of the most common ones is [div](https://dash.plotly.com/dash-html-components/div), which acts as an invisible box around the content inside it in order to group it together. The layout in the code cell below will result in something like this:

<center><img src=images/Divs.png width=200px /></center>

In [7]:
import dash_daq as daq
import dash_html_components as html
import dash_core_components as dcc

initial_date = "2021-01-01"

# Uncomment if app needs to be redefined
#app = JupyterDash(__name__)
#server = app.server

app.layout = html.Div([
    html.Div([
        dcc.DatePickerSingle(
            id='my-date-picker-single',
            min_date_allowed=date_min,
            max_date_allowed=date_max,
            initial_visible_month=initial_date,
            date=initial_date
        ),
        daq.BooleanSwitch(
            id='show-zeros',
            on=True,
            label="Show Zeros",
            style={'display': 'inline-block'}
        )
    ]),
    dcc.Graph(id='precipitation-map')
])

## Outputs

Finally, where the magic comes in. A few of the elements above have an `id`. We are going to use these `id`s to map the user's inputs to a Python function that will generate our graph. These `id`s can also be outputs, like what we'll be doing with the [dcc.Graph](https://dash.plotly.com/dash-core-components/graph) element.

Below, we've created an `update_graph` function as a [Plotly Callback](https://dash.plotly.com/basic-callbacks). It expects the following:

* A single `dash.dependencies.Output`. We've linked our [dcc.Graph](https://dash.plotly.com/dash-core-components/graph) by adding its id (`'precipitation-map'`) and the property we want to output to (`figure`).
* A list of `dash.dependencies.Input`s. Again, we've linked the input by their ids (`my-date-picker-single` and `show-zeros`).
  * We can use almost any of the element's properties as inputs. We can view them with Python's `help` function or at the end of their [documentation page](https://dash.plotly.com/dash-core-components/datepickersingle).

In [8]:
help(dcc.DatePickerSingle)

Help on class DatePickerSingle in module dash_core_components.DatePickerSingle:

class DatePickerSingle(dash.development.base_component.Component)
 |  DatePickerSingle(id=undefined, date=undefined, min_date_allowed=undefined, max_date_allowed=undefined, initial_visible_month=undefined, day_size=undefined, calendar_orientation=undefined, is_RTL=undefined, placeholder=undefined, reopen_calendar_on_clear=undefined, number_of_months_shown=undefined, with_portal=undefined, with_full_screen_portal=undefined, first_day_of_week=undefined, stay_open_on_select=undefined, show_outside_days=undefined, month_format=undefined, display_format=undefined, disabled=undefined, clearable=undefined, style=undefined, className=undefined, loading_state=undefined, persistence=undefined, persisted_props=undefined, persistence_type=undefined, **kwargs)
 |  
 |  A DatePickerSingle component.
 |  DatePickerSingle is a tailor made component designed for selecting
 |  a single day off of a calendar.
 |  
 |  The Da

Our inputs will be fed into our [decorated function](https://www.python.org/dev/peps/pep-0318/) in the same order as the list.

* `date_value` corresponds to `my-date-picker-single`'s `date`
* `show_zeros` corresponds to `show-zeros`'s `on`
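
To see why the order of the input list matters, here is a toy sketch of a decorator that registers a function and later calls it with input values in list order. `ToyApp`, `set_property`, and `describe_inputs` are hypothetical names invented for this illustration; this is not how Dash itself is implemented.

```python
class ToyApp:
    """A toy stand-in for illustration only; not Dash's implementation."""

    def __init__(self):
        self.state = {}      # (component id, property) -> current value
        self.callbacks = []  # (output, inputs, function) triples

    def callback(self, output, inputs):
        def decorator(func):
            self.callbacks.append((output, inputs, func))
            return func
        return decorator

    def set_property(self, component_id, prop, value):
        """Simulate a user changing an input, then rerun dependent callbacks."""
        self.state[(component_id, prop)] = value
        for output, inputs, func in self.callbacks:
            if (component_id, prop) in inputs:
                # Arguments are gathered in the same order as the inputs list.
                args = [self.state.get(key) for key in inputs]
                self.state[output] = func(*args)


toy = ToyApp()


@toy.callback(
    output=("precipitation-map", "figure"),
    inputs=[("my-date-picker-single", "date"), ("show-zeros", "on")],
)
def describe_inputs(date_value, show_zeros):
    return f"figure for {date_value}, zeros shown: {show_zeros}"


toy.set_property("show-zeros", "on", True)
toy.set_property("my-date-picker-single", "date", "2021-01-01")
print(toy.state[("precipitation-map", "figure")])
```

Swapping the two entries in `inputs` without also swapping the function's parameters would hand the switch value to `date_value`, which is the kind of mix-up to watch for in a real callback.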

After this setup, we can create our function in Python as normal and do pretty much whatever we want, so long as the function ends up returning something to match what our output expects, in this case, a figure.

**TODO:** Add your style from the previous lab. We've made a copy of the cell below in the following `...` as a reference.

In [9]:
import dash
import plotly.graph_objects as go
import pandas as pd

@app.callback(
    dash.dependencies.Output('precipitation-map', 'figure'),
    [dash.dependencies.Input('my-date-picker-single', 'date'),
    dash.dependencies.Input('show-zeros', 'on')])
def update_graph(date_value, show_zeros):
    dff = df[df.index==date_value]
    dff = dff if show_zeros else dff[dff["Inches"] != 0]
    dff = dff.to_pandas()

    fig = go.Figure([go.Scattergeo(
        lon=dff['LONGITUDE'],
        lat=dff['LATITUDE'],
        mode='markers',
        marker_color=dff['Inches'],
        marker = dict(
            reversescale = True,
            autocolorscale = False,
            colorscale = 'Blues',
            cmin = 0,
            color = dff['Inches'],
            cmax = dff['Inches'].max(),
            colorbar_title="Precipitation in Inches"
        ),
        text=dff['TEXT'])])

    fig.update_layout(
        title = 'USA Precipitation for ' + str(date_value),
        geo = dict(
            scope='usa',
            projection_type='albers usa',
            landcolor = "rgb(225, 225, 225)",
            subunitcolor = "rgb(200, 200, 200)",
        ),
    )

    return fig

In [11]:
import dash
import plotly.graph_objects as go
import pandas as pd

@app.callback(
    dash.dependencies.Output('precipitation-map', 'figure'),
    [dash.dependencies.Input('my-date-picker-single', 'date'),
    dash.dependencies.Input('show-zeros', 'on')])
def update_graph(date_value, show_zeros):
    dff = df[df.index==date_value]
    dff = dff if show_zeros else dff[dff["Inches"] != 0]
    dff = dff.to_pandas()

    fig = go.Figure([go.Scattergeo(
        lon=dff['LONGITUDE'],
        lat=dff['LATITUDE'],
        mode='markers',
        marker_color=dff['Inches'],
        marker = dict(
            reversescale = True,
            autocolorscale = False,
            colorscale = 'Blues',
            cmin = 0,
            color = dff['Inches'],
            cmax = dff['Inches'].max(),
            colorbar_title="Precipitation in Inches"
        ),
        text=dff['TEXT'])])

    fig.update_layout(
        title = 'USA Precipitation for ' + str(date_value),
        geo = dict(
            scope='usa',
            projection_type='albers usa',
            landcolor = "rgb(225, 225, 225)",
            subunitcolor = "rgb(200, 200, 200)",
        ),
    )

    return fig

The cell below will start the server. The [debug](https://dash.plotly.com/devtools) parameter will add a blue button on the bottom right of the dashboard that will keep track of our errors as opposed to displaying them in our terminal.

In [12]:
if __name__ == '__main__':
    app.run_server(host='0.0.0.0', debug=True)

Dash app running on http://0.0.0.0:8050/




Ready to see your dashboard in action? Copy the URL (web address) for this notebook and assign it to `my_url` below. Run the cell, then click the generated link to check out the dashboard!

In [None]:
from IPython.display import display, HTML
my_url = "COPY_NOTEBOOK_URL"
my_url = my_url.rsplit(".com", 1)[0] + ".com/plotly"
display(HTML('<a href="' + my_url + '">To the dashboard!</a>'))
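
To show what the `rsplit` expression in the cell above does, here is a small sketch that wraps it in a function. The helper name `dashboard_url` and the example addresses are hypothetical and used only for illustration.

```python
def dashboard_url(notebook_url):
    """Keep everything before the last '.com', then append '.com/plotly'."""
    return notebook_url.rsplit(".com", 1)[0] + ".com/plotly"


# Hypothetical notebook address.
example = "https://my-lab.example.com/lab/tree/notebook.ipynb"
print(dashboard_url(example))

# Caveat: an address without ".com" is left whole, so the suffix lands at the end.
print(dashboard_url("http://localhost:8888/lab"))
```

The second call shows that the expression assumes the notebook is served from a `.com` host. For any other address the resulting link would need to be adjusted by hand.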

Happy with the result? Want to make some changes? The app will need to be rebuilt in order to properly display any updates. `app._terminate_server_for_port("localhost", 8050)` will shut down the app and `del app` will destroy the app so it can be redefined.

This way, we do not have to reload our data.

**There will be a duplicate callback error if the cell defining the callbacks was run more than once. Delete `app` and redefine it to remove the error.**

If you're feeling ready for the assessment, please run the cell below to free up resources.

In [None]:
app._terminate_server_for_port("localhost", 8050)
del app

Congrats on getting through the course! All that's left is the assessment. Good luck!

Interested in taking your dashboarding skills even further? This exercise was inspired by a partnership between Plotly and NVIDIA to make an interactive [COVID Cases Dashboard](https://medium.com/plotly/plotly-and-nvidia-partner-to-integrate-dash-and-rapids-8a8c53cd7daf). All the code for it is available freely on [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/covid-19).

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
