# HvPlot with Pandas

<img src='img/hvplot-wm.png' width=10% align='right'>

Following Pandas operations with `.hvplot()` is a powerful way to interact with your data.

In this section we'll cover just a few basic plotting techniques. More will come later in this course.

# Table of Contents
* [HvPlot with Pandas](#HvPlot-with-Pandas)
* [Using HvPlot .hvplot()](#Using-HvPlot-.hvplot%28%29)
	* [Line plots](#Line-plots)
		* [Timeseries](#Timeseries)
	* [Overlays](#Overlays)
	* [Scatter](#Scatter)
	* [Box Plots](#Box-Plots)
	* [Histograms](#Histograms)
	* [2D Histograms](#2D-Histograms)
	* [Bar Charts](#Bar-Charts)
	* [Geographic data](#Geographic-data)


In [None]:
import numpy as np
import pandas as pd

# Using HvPlot .hvplot()

Pandas plotting with `.hvplot` is rendered by Bokeh by default, and hvPlot picks defaults for most plot options. Some plot types, such as histograms and box plots, compute statistics from the data before drawing.

Several plot types are available through the `.hvplot.<type>()` methods. See the [hvPlot documentation](https://hvplot.pyviz.org/index.html).

In [None]:
import hvplot.pandas

## Line plots

By default every column is plotted at once, which can crowd the plot.

In [None]:
degrees = pd.read_csv('data/percent-bachelors-degrees-women-usa.csv', 
                      index_col='Year')
degrees.hvplot()

We can make a plot with a subset of the columns.

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
    <big>Use <tt>width=</tt> and <tt>height=</tt> to adjust the width and height of the plot in pixels</big>
<br><br>
</div>

In [None]:
stem=['Computer Science', 
      'Math and Statistics', 
      'Engineering', 
      'Physical Sciences', 
      'Biology']
degrees[stem].hvplot.line(width=900, height=400)

### Timeseries

Timeseries formatting is handled gracefully with Pandas.

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big><tt>subplots=True</tt> works for any plot over more than one column of data</big>
<br><br>
</div>

In [None]:
aapl = pd.read_csv('data/AAPL.csv', parse_dates=True, index_col='Date')

In [None]:
(aapl
     .loc['jan 2007', ['Close','Volume']]
     .hvplot.line(subplots=True, shared_axes=False)
     .cols(1)
)
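The `.loc['jan 2007', ...]` selection works because `parse_dates=True` turns the `Date` column into a `DatetimeIndex`, and Pandas then accepts a partial date string as shorthand for the whole month. Here is a minimal sketch of that behavior on synthetic data. The `prices` frame and its values are made up for illustration, and the month is written as `'2007-01'`.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for AAPL.csv; the values are invented.
dates = pd.date_range('2006-12-25', '2007-02-10', freq='D')
prices = pd.DataFrame({'Close': np.linspace(80.0, 90.0, len(dates)),
                       'Volume': np.arange(len(dates)) * 1000},
                      index=dates)

# A partial date string selects every row that falls inside that month.
january = prices.loc['2007-01', ['Close', 'Volume']]

print(january.index.min(), january.index.max(), len(january))
```

Rows from late December and early February are excluded, so only the requested month would reach `.hvplot.line()`.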

## Overlays

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big>Use the <tt>*</tt> operator to overlay two separate calls to <tt>.hvplot</tt>.</big>
<br><br>
</div>

In [None]:
goog = pd.read_csv('data/goog.csv', parse_dates=True, index_col='Date')

goog_returns = goog.loc['2010':'2010', 'Close'].pct_change()
aapl_returns = aapl.loc['2010':'2010', 'Close'].pct_change()

`legend=True` ensures that labels are printed.

In [None]:
goog_plot = goog_returns.hvplot.line(label='GOOG', legend=True)
aapl_plot = aapl_returns.hvplot.line(label='AAPL', legend=True)

goog_plot * aapl_plot
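The two series in the overlay are daily returns computed with `pct_change()`, which divides each value by the previous one and subtracts 1. The sketch below shows that calculation on three made-up closing prices, not values from `goog.csv` or `AAPL.csv`.

```python
import pandas as pd

# Three invented closing prices on consecutive trading days.
close = pd.Series([100.0, 110.0, 99.0],
                  index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06']))

# Fractional change from the previous row; the first row has no predecessor.
returns = close.pct_change()

print(returns)
```

The first entry is `NaN` because there is no earlier price to compare against, so the plotted line starts at the second date.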

## Scatter

Plot bivariate `x` and `y` data stored in columns.

In [None]:
auto = pd.read_csv('data/auto-mpg.csv')

auto.hvplot.scatter(x='hp', y='mpg')

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big>Scatter plots accept <tt>by=</tt> to color each point by the value in a column, and <tt>s=</tt> to change the size of each glyph.</big>
<br><br>
</div>

By default all plotted columns appear in the hover tool. Use `hover_cols=` to add more columns.


`padding=` helps move points away from the edges of the plot.

In [None]:
auto.hvplot.scatter(x='hp', y='mpg', by='origin',
                    hover_cols=['origin','name','cyl'],
                    s=auto['weight']**2 / 100000,
                    width=900, padding=0.05)
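The `s=` expression in the cell above squares the weight before scaling it down, so the value passed for glyph size grows faster than the weight itself. This small sketch shows the arithmetic with hypothetical weights rather than rows from `auto-mpg.csv`.

```python
import pandas as pd

# Hypothetical weights in pounds, chosen to make the arithmetic easy to follow.
weight = pd.Series([2000, 3000, 4000], name='weight')

# Same expression that is passed to s= in the scatter call.
size = weight**2 / 100000

print(size.tolist())
```

Doubling the weight quadruples the size value, which exaggerates the difference between light and heavy cars.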

## Box Plots

Box Plots provide a quick statistical overview of column data.

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big><tt>.box()</tt> supports grouping</big>
<br><br>
</div>

In [None]:
auto.hvplot.box('mpg', by='origin')

## Histograms

In [None]:
pit = pd.read_csv('data/pittsburgh2013.csv', 
                  parse_dates=['Date'], 
                  index_col='Date')

In [None]:
pit['Max TemperatureF'].hvplot.hist()

In [None]:
pit['Max TemperatureF'].hvplot.hist(bins=100)

The Kernel Density Plot is closely related to a histogram.

In [None]:
pit['Max TemperatureF'].hvplot.kde()

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
    <big>Plot options can be changed with <tt>.opts()</tt></big>
<br><br>
</div>

See the [HoloViews Customization documentation](http://holoviews.org/user_guide/Applying_Customizations.html) for more options.

To overlay the two it's best to start with two plot objects.

In [None]:
hist = pit['Max TemperatureF'].hvplot.hist(normed=True)
kde = pit['Max TemperatureF'].hvplot.kde().opts(bandwidth=0.15)

overlay = hist * kde.opts(fill_color=None, line_color='red', line_width=5, line_alpha=0.8)
overlay
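The histogram and the density curve only share a vertical scale if the histogram is normalized so that its bars have a total area of 1. The sketch below illustrates that normalization with NumPy on made-up temperatures. It assumes that `normed=True` plays the same role as NumPy's `density=True`, which this notebook does not confirm.

```python
import numpy as np

# Invented daily temperatures standing in for the Pittsburgh data.
rng = np.random.default_rng(0)
temps = rng.normal(loc=60, scale=15, size=365)

# Raw counts per bin versus a density-normalized histogram over the same bins.
counts, edges = np.histogram(temps, bins=20)
density, _ = np.histogram(temps, bins=20, density=True)

# Bar height times bar width, summed over all bins, gives the total area.
area = (density * np.diff(edges)).sum()

print(counts.sum(), round(area, 6))
```

Raw counts sum to the number of observations, while the normalized bars integrate to 1, the same scale a kernel density estimate uses.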

## 2D Histograms

2D histograms (also known as hexbins) have a reasonable default bin size, which can be changed with `gridsize=`.

In [None]:
auto.hvplot.hexbin(x='hp', y='mpg', gridsize=12)

## Bar Charts

Bar charts can be constructed from one or more columns of numeric data. The important part is that the X-axis data must be in the Index.

In [None]:
medals = pd.read_csv('data/medals.csv', index_col='name')
medals.head()

Pandas operations like pivot help prepare data for plotting.

In [None]:
won = medals['count'] > 0

to_plot = (medals
           .loc[won]
           .pivot(columns='medal', values='count')
           [['bronze','silver','gold']]
)

(to_plot
 .assign(total=to_plot.sum(axis='columns'))
 .sort_values('total')
 .drop(columns='total')
 .hvplot.bar(stacked=True, rot=45, padding=0.02, cmap=['saddlebrown','silver','goldenrod'])
)
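The reshaping above can be easier to follow on a tiny table. This sketch uses two hypothetical countries, `A` and `B`, with invented counts. It pivots the long table into one column per medal type, then orders the rows by total medals as the cell above does. The `won` filter is left out to keep the example short.

```python
import pandas as pd

# Long-format table: one row per (name, medal) pair, indexed by name.
medals = pd.DataFrame(
    {'medal': ['gold', 'silver', 'bronze', 'gold', 'silver', 'bronze'],
     'count': [3, 1, 2, 1, 4, 0]},
    index=pd.Index(['A', 'A', 'A', 'B', 'B', 'B'], name='name'))

# Wide format: the index stays on the rows, each medal type becomes a column.
wide = medals.pivot(columns='medal', values='count')[['bronze', 'silver', 'gold']]

# Sort rows by total medals, then drop the helper column before plotting.
ordered = (wide
           .assign(total=wide.sum(axis='columns'))
           .sort_values('total')
           .drop(columns='total'))

print(ordered)
```

Because the names end up in the index and each medal type is its own column, the result has the shape that `.hvplot.bar(stacked=True)` expects.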

## Geographic data

HvPlot utilizes [GeoViews](http://geoviews.org/index.html) and [GeoPandas](http://geopandas.org/) to support geographic plotting.

The file `state.json` contains a `POLYGON` entry for each state indicating its borders. The `water_percent` column, computed below from the `ALAND` and `AWATER` columns, gives the percentage of each state's area that is taken up by water.

In [None]:
import geopandas as gpd
states = gpd.read_file('data/state.json')

states['water_percent'] = states['AWATER'] / (states['ALAND'] + states['AWATER']) * 100

states.head()
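The `water_percent` calculation is the water area divided by the total area (land plus water), expressed as a percentage. Here is a quick check of the formula on two hypothetical regions with invented areas, using a plain Pandas frame instead of the GeoPandas one.

```python
import pandas as pd

# Hypothetical land and water areas in arbitrary units.
areas = pd.DataFrame({'ALAND': [900.0, 500.0],
                      'AWATER': [100.0, 500.0]},
                     index=['X', 'Y'])

# Share of the total area that is water, as a percentage.
areas['water_percent'] = areas['AWATER'] / (areas['ALAND'] + areas['AWATER']) * 100

print(areas)
```

Region `X` is one tenth water and region `Y` is half water, so the new column holds 10 and 50.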

Projections are changed with functions provided by [cartopy](https://scitools.org.uk/cartopy/docs/latest/).

In [None]:
from cartopy import crs

states.hvplot.polygons(c='water_percent', colorbar=True,
                       projection=crs.Orthographic(-100, 30))

GeoViews also supports plotting by Latitude and Longitude. The `airports.csv` file contains information on airports in the US.

In [None]:
airports = pd.read_csv('data/airports.csv')

borders = states[['geometry']].hvplot.polygons(color='white', project=True)

airport_plot = airports.hvplot.points('Longitude', 'Latitude', geo=True,
                                      hover_cols=['Name', 'IATA'],
                                      alpha=0.2, height=500,
                                      project=True)

(
  (borders * airport_plot)
   .opts(projection=crs.PlateCarree(), xlim=(-130, -65), ylim=(20, 55))
)

If you have an internet connection, GeoViews provides access to [tile sources](http://geoviews.org/user_guide/Working_with_Bokeh.html) to enable overlays with map providers.

In [None]:
# import geoviews.tile_sources as gts

# (gts.StamenTerrain * airport_plot.opts(color='red'))

<font color='grey'><i>Copyright Anaconda 2012-2019 All Rights Reserved.</i></font>