<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#End-to-End-Example:-Working-with-Tabular-Datasets" data-toc-modified-id="End-to-End-Example:-Working-with-Tabular-Datasets-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>End-to-End Example: Working with Tabular Datasets</a></span><ul class="toc-item"><li><span><a href="#Load-data-from-5-weather-stations-in-Illinois" data-toc-modified-id="Load-data-from-5-weather-stations-in-Illinois-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Load data from 5 weather stations in Illinois</a></span></li><li><span><a href="#Declaring-dimensions" data-toc-modified-id="Declaring-dimensions-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Declaring dimensions</a></span></li><li><span><a href="#Mapping-dimensions-to-elements" data-toc-modified-id="Mapping-dimensions-to-elements-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Mapping dimensions to elements</a></span></li><li><span><a href="#Faceting-dimensions" data-toc-modified-id="Faceting-dimensions-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Faceting dimensions</a></span><ul class="toc-item"><li><span><a href="#Overlay-of-Curves" data-toc-modified-id="Overlay-of-Curves-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>Overlay of Curves</a></span></li><li><span><a href="#Layout-of-HeatMaps" data-toc-modified-id="Layout-of-HeatMaps-1.4.2"><span class="toc-item-num">1.4.2&nbsp;&nbsp;</span>Layout of HeatMaps</a></span></li><li><span><a href="#Adjoint-Layout-of-Scatter" data-toc-modified-id="Adjoint-Layout-of-Scatter-1.4.3"><span class="toc-item-num">1.4.3&nbsp;&nbsp;</span>Adjoint Layout of Scatter</a></span></li></ul></li><li><span><a href="#HoloMap-of-Points" data-toc-modified-id="HoloMap-of-Points-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>HoloMap of Points</a></span></li><li><span><a href="#GridMatrix-of-Bars" data-toc-modified-id="GridMatrix-of-Bars-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>GridMatrix of 
Bars</a></span></li><li><span><a href="#Attribution" data-toc-modified-id="Attribution-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Attribution</a></span></li><li><span><a href="#Onwards" data-toc-modified-id="Onwards-1.8"><span class="toc-item-num">1.8&nbsp;&nbsp;</span>Onwards</a></span></li></ul></li></ul></div>

<br>
<img src="https://i.imgur.com/gvrbAjo.png" width="50%" style="margin: 0px 25%">

# End-to-End Example: Working with Tabular Datasets

This tutorial brings together the techniques from the previous sections. We will facet data and use different element types to explore and visualize a real dataset.

In [None]:
from holoext.bokeh import Mod
import numpy as np
import pandas as pd
import holoviews as hv
import warnings

warnings.filterwarnings('ignore')  # silence all warnings, e.g. bokeh deprecation warnings
hv.extension('bokeh')

## Load data from 5 weather stations in Illinois

In [None]:
df = pd.read_parquet('../datasets/weather_station_data.parquet')
df.head()

## Declaring dimensions

Mathematical variables can usually be described as **dependent** or **independent**. In HoloViews these correspond to value dimensions and key dimensions, respectively.

In this dataset we declare ``'station'``, ``'date'``, ``'year'`` and ``'month'`` as key dimensions (independent variables), while the remaining columns are automatically inferred as value dimensions:


In [None]:
ds = hv.Dataset(df, kdims=['station', 'date', 'year', 'month'])

In [None]:
ds
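
As a rough illustration of that inference, the sketch below splits a list of column names into key and value dimensions in plain Python. It is not how HoloViews implements the step internally: `split_dimensions` is a hypothetical helper, and the column list is only the subset of columns that this tutorial refers to.

```python
def split_dimensions(columns, kdims):
    """Return (kdims, vdims): every column not declared as a key becomes a value."""
    missing = [k for k in kdims if k not in columns]
    if missing:
        raise KeyError('Unknown key dimensions: {}'.format(missing))
    vdims = [c for c in columns if c not in kdims]
    return list(kdims), vdims


# Subset of the columns used in this tutorial (the real DataFrame has more).
columns = ['station', 'date', 'year', 'month',
           'max_temp_f', 'min_temp_f', 'precip_in']

kdims, vdims = split_dimensions(columns, ['station', 'date', 'year', 'month'])
print('key dimensions:  ', kdims)
print('value dimensions:', vdims)
```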

## Mapping dimensions to elements

Once we have a ``Dataset`` with multiple dimensions we can map these dimensions onto elements using the ``.to`` method. The method takes four main arguments:

1. The element you want to convert to
2. The key dimensions (or independent variables to display)
3. The dependent variables to display
4. The dimensions to group by


As a first simple example let's go through such a declaration:

1. We will use a ``Curve``
2. Our independent variable will be the 'date'
3. Our dependent variables will be 'precip_cumsum_in' and 'precip_in'
4. We will ``groupby`` the 'station'

In [None]:
curve = ds.to(hv.Curve, kdims=['date'], vdims=['precip_cumsum_in', 'precip_in'], groupby='station')
Mod(xlabel='Date', ylabel='Cumulative Precip [in]').apply(curve)
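
Conceptually, ``groupby='station'`` splits the rows by station and builds one element per group. The dependency-free sketch below mimics that split. The rows and station names are made up, and `group_rows` is a hypothetical helper rather than part of HoloViews.

```python
from collections import defaultdict

# Made-up rows standing in for the weather data (station names are hypothetical).
rows = [
    {'station': 'STN_A', 'date': '2017-01-01', 'precip_in': 0.10},
    {'station': 'STN_B', 'date': '2017-01-01', 'precip_in': 0.00},
    {'station': 'STN_A', 'date': '2017-01-02', 'precip_in': 0.25},
    {'station': 'STN_B', 'date': '2017-01-02', 'precip_in': 0.40},
]


def group_rows(rows, groupby, kdim, vdim):
    """Collect (kdim, vdim) samples into one list per value of `groupby`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append((row[kdim], row[vdim]))
    return dict(groups)


curves = group_rows(rows, groupby='station', kdim='date', vdim='precip_in')
for station, samples in curves.items():
    print(station, samples)
```

In the notebook, each group becomes one ``Curve``, and the grouping dimension is exposed as a widget for switching between stations.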

## Faceting dimensions

In the previous section we discovered how to facet our data using the ``.overlay``, ``.grid`` and ``.layout`` methods. Instead of working with abstract FM modulation signals, we now have concrete variables to facet by, such as the station.

### Overlay of Curves

In [None]:
STATIONS = df.station.unique()  # array of unique station names
STATIONS

In [None]:
curve_list = []
for station in STATIONS:
    curve = hv.Curve(
        ds.select(station=station),
        kdims=['date'],
        vdims=['precip_cumsum_in', 'precip_in'],
        label=station)
    curve = curve.opts(style=dict(line_alpha=0.75))  # style the curves
    curve_list.append(curve)

In [None]:
curves_overlay = hv.Overlay(curve_list)
Mod(xlabel='Date',
    ylabel='Cumulative Precip [in]',
    tools=['ypan', 'hover', 'ywheel_zoom', 'save', 'reset']).apply(curves_overlay)

### Layout of HeatMaps

In [None]:
max_hmap = ds.to(
    hv.HeatMap,
    kdims=['year', 'month'],
    vdims=['max_temp_f'],
    groupby='station',
    label='Max Temp [F]').redim.range(max_temp_f=(10, 110))

min_hmap = ds.to(
    hv.HeatMap,
    kdims=['year', 'month'],
    vdims=['min_temp_f'],
    groupby='station',
    label='Min Temp [F]').redim.range(min_temp_f=(0, 80))

hmaps = max_hmap + min_hmap

Mod(xlabel='Year', ylabel='Month', width=1500).apply(hmaps).cols(1)
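
Each heatmap cell corresponds to one (year, month) pair, whereas the ``'date'`` key dimension suggests the dataset holds many rows per month, so several rows can fall into the same cell. One possible way to reduce them to a single value per cell beforehand is sketched below in plain Python. The choice of the mean, the helper `monthly_mean` and the sample temperatures are all assumptions for illustration, not a description of what HoloViews does internally.

```python
from collections import defaultdict
from statistics import mean

# Made-up (year, month, max_temp_f) samples for a single station.
records = [
    (2016, 1, 30.0),
    (2016, 1, 40.0),
    (2016, 2, 45.0),
    (2017, 1, 28.0),
    (2017, 1, 32.0),
]


def monthly_mean(records):
    """Average all values that share the same (year, month) cell."""
    cells = defaultdict(list)
    for year, month, value in records:
        cells[(year, month)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}


heatmap_cells = monthly_mean(records)
for cell in sorted(heatmap_cells):
    print(cell, heatmap_cells[cell])
```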

### Adjoint Layout of Scatter

In [None]:
# define plot and style options for different elements
scatter_opts = dict(width=500, height=500)
scatter_style = dict(alpha=0.35, size=5, color='red')
hist_style = dict(alpha=0.75, line_color=None)

opts = {
    'Scatter': {
        'plot': scatter_opts,
        'style': scatter_style
    },
    'Histogram': {
        'style': hist_style
    }
}

In [None]:
scatter = ds.to(
    hv.Scatter,
    kdims=['avg_wind_speed_kts'],
    vdims=['avg_wind_drct'],
    groupby='station')

scatter_hists = scatter.hist(
    num_bins=100, dimension=['avg_wind_speed_kts',
                           'avg_wind_drct']).opts(opts).redim.label(
                               avg_wind_speed_kts='Wind Speed [kts]',
                               avg_wind_drct='Wind Dir. [deg]',
                               avg_wind_speed_kts_frequency='Freq.',
                               avg_wind_drct_frequency='Freq.')


Mod().apply(scatter_hists)
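
``num_bins=100`` requests 100 bins for each adjoined histogram. The sketch below shows equal-width binning in plain Python, assuming the bins span the minimum to the maximum of the data and that the last bin includes its right edge. `equal_width_histogram` is a hypothetical helper and the wind speeds are made up; the binning HoloViews performs may differ in its details.

```python
def equal_width_histogram(values, num_bins):
    """Count values in `num_bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for value in values:
        # When all values are identical the width is 0, so use the first bin.
        index = int((value - lo) / width) if width else 0
        # Clamp so that the maximum value lands in the last bin.
        counts[min(index, num_bins - 1)] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return counts, edges


# Made-up wind speeds in knots.
wind_speeds = [0, 1, 2, 3, 4, 10]
counts, edges = equal_width_histogram(wind_speeds, num_bins=5)
print(counts)
print(edges)
```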

## HoloMap of Points

In [None]:
points = ds.to(
    hv.Points,
    kdims=['date', 'max_temp_f'],
    vdims=['precip_in'],
    groupby='station',
    group='Temperature and Precipitation for').opts(
        style={'Points': dict(alpha=0.35)})

Mod(tools=['hover', 'save', 'ypan', 'ywheel_zoom'],
    ylabel='Max Temp [F]',
    xlabel='Date',
    width=1000,
    num_xticks=15,
    xrotation=35,
    size_index=2,
    color_index=2,
    scaling_factor=25,
    colorbar_n=5,
    colorbar_title='[in]').apply(points).redim.range(precip_in=(0, 5))
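
``size_index=2`` and ``color_index=2`` refer to a dimension by position. HoloViews counts key dimensions first and value dimensions after them, so, assuming holoext passes these indices through unchanged, index 2 selects ``'precip_in'`` here. A minimal sketch of that lookup:

```python
# Dimensions as declared in the ds.to(hv.Points, ...) call.
kdims = ['date', 'max_temp_f']
vdims = ['precip_in']

# Positional indices run over the key dimensions first, then the value dimensions.
dimensions = kdims + vdims

size_index = 2
color_index = 2
print('size is driven by: ', dimensions[size_index])
print('color is driven by:', dimensions[color_index])
```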

## GridMatrix of Bars

In [None]:
def make_box(month, year):
    """Creates and returns a box and whisker plot for given month and year"""
    sub_ds = ds.select(month=month, year=year)
    text = hv.Text('', 45, '{0:2d}/{1:s}'.format(
        month,
        str(year)[2:])).opts(
            style=dict(text_alpha=0.5, text_font_size='12px'))
    box = hv.BoxWhisker(sub_ds, 'station', 'max_rh')
    return box * text


boxes = {
    (month, year): make_box(month, year)
    # to speed up the computation, take every fourth month, starting at index 2
    for month in df['month'].unique()[2::4]
    for year in df['year'].unique()[-4:]  # slice the last 4 years
}

gridmatrix = hv.GridMatrix(boxes)
title = 'Max Relative Humidity Spread'
Mod(
    axiswise=False,
    xlabel='Station',
    ylabel='Max RH [%]',
    autosize=True,
    title_format=title,
    label_scaler=0.6,
    merge_tools=True,
    tools=['save', 'hover'],
    logo=False,
    plot_size=250  # width/height don't work in a GridMatrix
).apply(gridmatrix)
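
The slices in the comprehension determine which grid cells are built. The sketch below reproduces the selection and the text label format in plain Python. It assumes the unique months come back in calendar order (1 to 12), and the list of years is hypothetical.

```python
months = list(range(1, 13))      # assumed order of df['month'].unique()
years = list(range(2010, 2018))  # hypothetical years, for illustration only

selected_months = months[2::4]   # every fourth month, starting at index 2
selected_years = years[-4:]      # the last four years

# One grid cell per (month, year) pair, as in the `boxes` dictionary.
keys = [(month, year) for month in selected_months for year in selected_years]

# Same format string as in make_box: padded month / two-digit year.
labels = ['{0:2d}/{1:s}'.format(month, str(year)[2:]) for month, year in keys]

print(selected_months)
print(selected_years)
print(len(keys), 'cells, first label:', repr(labels[0]))
```

With three months and four years, the grid holds twelve cells, which keeps rendering time manageable.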

## Attribution

The content for this section is adapted from the [holoext gallery](https://holoext.readthedocs.io/en/latest/examples/examples.html).

## Onwards
 
* Go through the Tabular Data [getting started](http://build.holoviews.org/getting_started/Tabular_Datasets.html) and [user guide](http://build.holoviews.org/user_guide/Tabular_Datasets.html).
* Learn about slicing, indexing and sampling in the [Indexing and Selecting Data](http://holoviews.org/user_guide/Indexing_and_Selecting_Data.html) user guide.

The next section takes a similar approach to gridded data stored in multidimensional array formats.
