# Visualization - Examples

In this notebook you will find examples of visualizations with:
 - <a href="#matplotlib"> Matplotlib </a>
 - <a href="#holoviews"> HoloViews </a>
 - <a href="#seaborn"> Seaborn </a>

<a id="matplotlib"></a>
## Visualization with Matplotlib

Below is a summary of the most important parts of Jake VanderPlas, Ch. 4: Visualization with Matplotlib.

Other sources:

- https://matplotlib.org/users/pyplot_tutorial.html
- https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python
- http://pandas.pydata.org/pandas-docs/stable/visualization.html

In [None]:
# make sure plots are rendered inside the notebook
%matplotlib inline

# import matplotlib
import matplotlib.pyplot as plt

# other libraries
import numpy as np
import pandas as pd

In [None]:
# a first example
x = np.linspace(0,10,100)   # 100 numbers from 0 to 10 (inclusive)
y = x + 1
plt.plot(x,y)
plt.show()

In [None]:
# now with 2 lines
y2 = -x + 10
plt.plot(x,y)
plt.plot(x,y2)
plt.show()

In [None]:
# choose the line style and colors yourself
plt.plot(x,y,color='red', linestyle = '-')  # red solid line
plt.plot(x,y2,color='green', linestyle = '--')  # green dashed line
plt.show()

In [None]:
# same, using the shorthand format string
plt.plot(x,y,'-r')  # red solid line
plt.plot(x,y2,'--g')  # green dashed line
plt.show()
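The shorthand format string packs the line style and the color into a single argument. The sketch below is only an illustration (the names `line_short` and `line_long` are not part of the notebook); it reads the line style back from the returned `Line2D` objects to show that both spellings configure the same dashed style.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y2 = -x + 10

fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects; unpack the single line
(line_short,) = ax.plot(x, y2, '--g')                          # shorthand
(line_long,) = ax.plot(x, y2, color='green', linestyle='--')   # keywords

# Both lines report the same line style
print(line_short.get_linestyle(), line_long.get_linestyle())
```

The keyword form is longer but easier to read when more properties (line width, marker size) are set at once.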

In [None]:
# set the axis limits
plt.xlim(-1,11)
plt.ylim(0,15)
plt.plot(x,y,'-r')  # red solid line
plt.show()

In [None]:
# assigning labels
plt.xlim(-1,11)
plt.ylim(0,15)

plt.plot(x,y,'-r', label='x + 1')  # red solid line
plt.plot(x,y2,'--g', label='-x + 10')  # green dashed line

# title above the plot
plt.title('Lines')
# x-axis label
plt.xlabel('X value')
# y-axis label
plt.ylabel('Y value')
# draw the legend
plt.legend()

# annotation
plt.annotate('Middle', xy=(5,7),xytext=(7,12), arrowprops=dict(facecolor='black',shrink=0.05))

plt.show()

In [None]:
# enlarge the axis labels and tick labels for all following plots
params = {'axes.labelsize': 'xx-large',
          'xtick.labelsize': 'xx-large',
          'ytick.labelsize': 'xx-large'}

plt.rcParams.update(params)

In [None]:
# The OO interface

# The cells above use the standard pyplot functions.
# Matplotlib can also be driven in a more object-oriented way.

# Figure object: container for all graphical objects
fig = plt.figure(figsize=(9,6), dpi=150)

# Axes object: the base of the plot
ax = plt.axes()

ax.set(xlim=(-1,11),
       ylim=(0,15),
       xlabel='X value',
       ylabel='Y value',
       title='Lines')

ax.plot(x,y,'-r')
ax.plot(x,y2,'--g')

ax.legend(['x+1','-x+10'])
ax.annotate('Middle', xy=(5,7),xytext=(7,12), arrowprops=dict(facecolor='black',shrink=0.05))
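As a variation on the OO cell, the sketch below attaches the legend text to each line through the `label` keyword rather than passing a list to `ax.legend`. It assumes nothing beyond NumPy and Matplotlib; the label strings are the ones used earlier in this notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# plt.subplots() creates the Figure and a single Axes in one call
fig, ax = plt.subplots(figsize=(9, 6))

# Labels travel with the lines, so the legend stays correct
# even if the plotting order changes later
ax.plot(x, x + 1, '-r', label='x + 1')
ax.plot(x, -x + 10, '--g', label='-x + 10')

ax.set(xlim=(-1, 11), ylim=(0, 15), title='Lines')
legend = ax.legend()

print([text.get_text() for text in legend.get_texts()])
```

Keeping the label next to the data it describes avoids a mismatch between the plotting order and a separately maintained list of legend entries.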

In [None]:
# Variations in ticks (axis markings) (OO interface)
fig = plt.figure()
ax = plt.axes()
ax.set(xlim=(-1,11),
       ylim=(0,15),
       xlabel='X value',
       ylabel='Y value',
       title='Lines')

# limit the x-axis to at most 4 intervals between major ticks:
ax.xaxis.set_major_locator(plt.MaxNLocator(4))

ax.plot(x,y,'-r')
ax.plot(x,y2,'--g')
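To see what the locator actually does, the following sketch lists the major ticks that fall inside the axis limits. It is an illustration under the same limits as the cell above; `visible` is a helper name introduced here.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, x + 1, '-r')
ax.set_xlim(-1, 11)

# At most 4 intervals between major ticks, at "nice" positions
ax.xaxis.set_major_locator(MaxNLocator(4))

# get_xticks may also return ticks just outside the view; keep the visible ones
lo, hi = ax.get_xlim()
visible = [tick for tick in ax.get_xticks() if lo <= tick <= hi]
print(visible)
```

Four intervals correspond to at most five visible ticks; the locator may choose fewer if that gives rounder tick values.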


In [None]:
# Subplots (standard interface)

plt.subplot(1,2,1)  # 1 row, 2 columns, column 1
plt.xlim(-1,11)
plt.ylim(0,15)
plt.plot(x,y,'-r')  # red solid line

plt.subplot(1,2,2)  # 1 row, 2 columns, column 2
plt.xlim(-1,11)
plt.ylim(0,15)
plt.plot(x,y2,'--g')  # green dashed line

plt.show()

In [None]:
# subplots (OO interface)

# 2 rows, 3 columns
fig, ax = plt.subplots(2,3,sharex='col', sharey='row')

# plt.subplots returns a 0-based array of Axes

# 1st row, column 1
ax[0,0].plot(x,y,'-r')

# 2nd row, column 3
ax[1,2].plot(x,y2,'--g')
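Because `plt.subplots` returns a NumPy array of `Axes` objects, the grid can also be walked in a loop. The sketch below is a small illustration that titles every panel with its own index; the title text is made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# 2 rows, 3 columns; x is shared per column and y per row
fig, axes = plt.subplots(2, 3, sharex='col', sharey='row')
print(axes.shape)

# np.ndenumerate yields ((row, col), Axes) pairs for the whole grid
for (row, col), panel in np.ndenumerate(axes):
    panel.set_title(f'ax[{row},{col}]')
```

Looping over the array is convenient when every panel needs the same formatting.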

## Other plots

In [None]:
# Scatter plot
xs = np.linspace(0,10,11)   # 11 numbers from 0 to 10

ys = xs + 1
plt.scatter(xs,ys,marker='o')   # marker supports many different settings

In [None]:
# Errorbar plot, with seaborn styling
# ('seaborn-white' was renamed to 'seaborn-v0_8-white' in Matplotlib 3.6)
plt.style.use('seaborn-v0_8-white')
xe = np.linspace(0,10,11)   # 11 numbers
ye = xe * xe    # square
dy = ye * 0.2   # error: +/-20% of the computed value
plt.errorbar(xe, ye, yerr=dy, fmt='.k')
plt.show()

In [None]:
# Histogram, distribution of the values

yh = np.random.randn(1000)   # 1000 normally distributed numbers centered on 0
plt.hist(yh,bins=9)          # histogram with 9 bars (bins)
plt.show()
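The histogram call also returns the numbers behind the bars. The sketch below, using a seeded generator so that it is repeatable (the seed value is arbitrary), compares them with `np.histogram` on the same data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)        # seeded for repeatability
values = rng.standard_normal(1000)     # 1000 normally distributed numbers

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn bar patches
counts, edges, patches = ax.hist(values, bins=9)

# np.histogram computes the same counts without drawing anything
np_counts, np_edges = np.histogram(values, bins=9)

print(len(counts), len(edges), int(counts.sum()))
```

Nine bins are delimited by ten edges, and the counts add up to the number of samples.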

In [None]:
# Bar chart
xb = np.linspace(0,10,11)   # 11 numbers
yb = xb * xb    # square
plt.bar(xb,yb)
plt.show()

In [None]:
# Pie chart
yp = np.array([25,50,35,70])
plt.pie(yp)
plt.show()
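A pie chart is easier to read with labels and percentages. The sketch below reuses the values from the cell above; the labels `A` to `D` are placeholders invented for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

yp = np.array([25, 50, 35, 70])
labels = ['A', 'B', 'C', 'D']          # placeholder labels

# Each wedge covers value / total of the circle
shares = 100 * yp / yp.sum()

fig, ax = plt.subplots()
# autopct prints the percentage inside each wedge
wedges, texts, autotexts = ax.pie(yp, labels=labels, autopct='%1.1f%%')
ax.axis('equal')                       # keep the pie circular

print(np.round(shares, 1))
```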

<a id="holoviews"></a>
## HoloViews


### Why HoloViews?

HoloViews is an [open-source](https://github.com/ioam/holoviews/) Python 2 and 3 library for data analysis and visualization. Python already has excellent tools like numpy, pandas, and xarray for data processing, and bokeh and matplotlib for plotting, so why yet another library?

**HoloViews helps you understand your data better, by letting you work seamlessly with both the data *and* its graphical representation.**

HoloViews focuses on bundling your data together with the appropriate metadata to support both analysis and visualization, making your raw data *and* its visualization equally accessible at all times.  This process can be unfamiliar to those used to traditional data-processing and plotting tools, and this getting-started guide is meant to demonstrate how it all works at a high level.  More detailed information about each topic is then provided in the [User Guide](../user_guide/).

With HoloViews, instead of building a plot using direct calls to a plotting library, you first describe your data with a small amount of crucial semantic information required to make it visualizable, then you specify additional metadata as needed to determine more detailed aspects of your visualization. This approach provides immediate, automatic visualization that can be effortlessly requested at any time as your data evolves, rendered automatically by one of the supported plotting libraries (such as Bokeh or Matplotlib). 


### Tabulated data: subway stations

To illustrate how this process works, we will demonstrate some of the key features of HoloViews using a collection of datasets related to transportation in New York City. First let's run some imports to make [numpy](http://numpy.org) and [pandas](http://pandas.pydata.org) accessible for loading the data. Here we start with a table of subway station information loaded from a CSV file with pandas:

In [None]:
import pandas as pd
import numpy as np
import holoviews as hv
hv.extension('bokeh')

This is the standard way to make the numpy and pandas libraries available in the namespace. We recommend always importing HoloViews as ``hv`` and if you haven't already installed HoloViews, check out the install instructions on our [homepage](www.holoviews.org).

Note that after importing HoloViews as ``hv`` we run ``hv.extension('bokeh')`` to load the bokeh plotting extension, allowing us to generate visualizations with [Bokeh](http://bokeh.pydata.org/). In the next section we will see how you can use other plotting libraries such as [matplotlib](http://matplotlib.org) and even how you can mix and match between them.

Now let's load our subway data using pandas:

In [None]:
station_info = pd.read_csv('visualisatie_holoviews_stations.csv')
station_info.head()

We see that this table contains the subway station name, its latitude and longitude, the year it was opened, the number of services available from the station and their names, and finally the yearly ridership (in millions for 2015).

### ``Elements`` of visualization

We can immediately visualize some of the data in this table as a scatter plot. Let's view how ridership varies with the number of services offered at each station:

In [None]:
scatter = hv.Scatter(station_info, 'services', 'ridership')
scatter

Here we passed our dataframe to [``hv.Scatter``](../reference/elements/elements/bokeh/Scatter.ipynb) to create an *object* called `scatter`, which is independent of any plotting library. HoloViews provides a wide range of Element types, all visible in the [Reference Gallery](http://holoviews.org/reference/index.html).

In this example, `scatter` is a simple wrapper around our dataframe that knows that the 'services' column is the independent variable, normally plotted along the x-axis, and that the 'ridership' column is a dependent variable, plotted on the y-axis. These are our *dimensions* which we will describe in more detail a little later.

Given that we have the handle ``scatter`` on our ``Scatter`` object, we can show that it is indeed an object and not a plot by printing it:

In [None]:
print(scatter)

The bokeh plot above is simply the rich, visual representation of ``scatter`` which is plotted automatically by HoloViews and displayed automatically in the [Jupyter notebook](https://jupyter.org/). Although HoloViews itself is independent of notebooks, this convenience makes working with HoloViews easiest in the notebook environment.

### Compositional ``Layouts``

The class [``Scatter``](../reference/elements/elements/bokeh/Scatter.ipynb) is a subclass of ``Element``.  As shown in our [element gallery](http://holoviews.org/reference/index.html), Elements are the simplest viewable components in HoloViews. Now that we have a handle on ``scatter``, we can demonstrate the compositionality of these objects:

In [None]:
layout = scatter + hv.Histogram(np.histogram(station_info['opened'], bins=24), kdims=['opened'])
layout

In a single line using the ``+`` operator, we created a new, compositional object called a ``Layout`` built from our scatter visualization and a ``Histogram`` that shows how many subway stations opened in Manhattan since 1900. Note that once again, all the plotting is happening behind the scenes. The ``layout`` is not a plot; it is a new object that exists independently of any given plotting system:

In [None]:
print(layout)

### Array data: taxi dropoffs

So far we have visualized data in a [pandas ``DataFrame``](http://pandas.pydata.org/) but ``HoloViews`` is as agnostic to data formats as it is to plotting libraries; see [Customizing Plots](../user_guide/03-Customizing_Plots.ipynb) for more information. This means we can work with array data as easily as we can work with  tabular data. To demonstrate this, here are some [numpy arrays](http://www.numpy.org/) relating to taxi dropoff locations in New York City:

In [None]:
taxi_dropoffs = {hour:arr for hour, arr in np.load('visualisatie_holoviews_taxi.npz').items()}
#print('Hours: {hours}'.format(hours=', '.join(taxi_dropoffs.keys())))
print('Taxi data contains {num} arrays (one per hour).\nDescription of the first array:\n'.format(num=len(taxi_dropoffs)))
np.info(taxi_dropoffs['0'])

As we can see, this dataset contains 24 arrays (one for each hour of the day) of taxi dropoff locations (by latitude and longitude), aggregated over one month in 2015. The array shown above contains the accumulated dropoffs for the first hour of the day.
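The keys of an `.npz` archive are always strings, which is why the first hour is addressed as `'0'`. The sketch below builds a tiny stand-in archive in memory (the array contents are placeholders, not taxi data) and loads it the same way.

```python
import io
import numpy as np

# Build a small in-memory .npz archive with one placeholder array per "hour"
buffer = io.BytesIO()
np.savez(buffer, **{str(hour): np.full((2, 2), hour) for hour in range(3)})
buffer.seek(0)

# Loading gives a mapping from string keys to arrays
arrays = {key: arr for key, arr in np.load(buffer).items()}
print(sorted(arrays))

# Convert the keys to integers when a numeric index is more convenient
by_hour = {int(key): arr for key, arr in arrays.items()}
print(by_hour[2].shape)
```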

### Compositional ``Overlays``

Once again, we can easily visualize this data with HoloViews by passing our array to [``hv.Image``](../reference/elements/elements/bokeh/Image.ipynb) to create the ``image`` object.  This object has the spatial extent of the data declared as the ``bounds``, in terms of the corresponding range of latitudes and longitudes.

In [None]:
bounds = (-74.05, 40.70, -73.90, 40.80)
image = hv.Image(taxi_dropoffs['0'], ['lon','lat'], bounds=bounds)

HoloViews supports ``numpy``, ``xarray``, ``iris``, and ``dask`` arrays when working with array data (see [Gridded Datasets](../user_guide/Gridded_Datasets.ipynb)).  We can also compose elements containing array data with those containing tabular data. To illustrate, let's pass our tabular station data to a [``Points``](../reference/elements/elements/bokeh/Points.ipynb) element which is used to mark positions in two-dimensional space:

In [None]:
points = hv.Points(station_info, ['lon','lat'])
image + image * points


On the left, we have the visual representation of the ``image`` object we declared. Using ``+`` we put it into a ``Layout`` together with a new compositional object created with the ``*`` operator called an ``Overlay``. This particular overlay displays the station positions on top of our image which works correctly as both elements contain data that exist in the same space, namely New York City.

This overlay on the right lets us see the location of all the subway stations in relation to our midnight taxi dropoffs. Of course,  HoloViews allows you to visually express more of the available information with our points.  For instance, you could represent the ridership of each subway by point color or point size. For more information see [Customizing Plots](../user_guide/03-Customizing_Plots.ipynb).

### Effortlessly exploring data

You can keep composing data structures together until there are more dimensions than can fit simultaneously on your screen. For instance, you can visualize a dictionary of [``Images``](../reference/elements/elements/bokeh/Image.ipynb) (one for every hour of the day) by declaring a ``HoloMap``:

In [None]:
dictionary = {int(hour):hv.Image(arr, ['lon','lat'], bounds=bounds) 
              for hour, arr in taxi_dropoffs.items()}
hv.HoloMap(dictionary, kdims='Hour')

This is yet another object which is rendered by the HoloViews plotting system with Bokeh behind the scenes:

In [None]:
holomap = hv.HoloMap(dictionary, kdims='Hour')
print(holomap)

As this ``HoloMap`` is a container for our ``Image`` elements, we can use the methods it offers to return new containers. For instance, in the next cell we select three different hours of the morning from the ``HoloMap`` and display them as a ``Layout``:

In [None]:
holomap.select(Hour={3,6,9}).layout()

Here the ``select`` method picks values from the specified 'Hour' dimension. The various Elements like ``Scatter`` and ``Image`` all accept two types of dimensions: *key dimensions* (i.e., indexing dimensions or independent variables), and *value dimensions* (resulting data or dependent variables).  These attributes are named ``kdims`` and ``vdims``, respectively, and can be passed as the second and third positional argument for all Elements other than Histogram. As you can see above, the ``HoloMap`` of ``Image``s also has a ``kdims`` argument, allowing it to be indexed over those dimensions.  The ``kdims`` and ``vdims`` accept either single dimensions or lists of dimensions, and let you express the space in which your data lives.

Note how the ``Image`` elements from which the holomap is constructed are declared using key dimensions of ``['lon','lat']``, which describes the fact that New York City is being viewed in terms of longitude and latitude. This semantic information is automatically mapped to our visualization by the HoloViews plotting system, which sets the x-axis and y-axis labels accordingly. In the case of the ``HoloMap`` we used a key dimension of ``'Hour'`` to declare that the interactive slider ranges over the hours of the day.

### Data as visualization

Holomaps are able to compose with elements and other holomaps into overlays and layouts just as easily as you compose two elements together. Here is one such composition where we select a range of longitudes and latitudes from our [``Points``](../reference/elements/elements/bokeh/Points.ipynb) before we overlay them:

In [None]:
%%opts Image [xrotation=90] Points (color='deepskyblue' marker='v' size=6)
hotspot = points.select(lon=(-73.99, -73.96), lat=(40.75,40.765))
composition = holomap * hotspot
composition

The line starting with ``%%opts`` used to specify the visual style is part of the HoloViews options system described in the next 'Getting started' section which also describes how to achieve the same effect with standard Python syntax.

In the cell above we created and styled a composite object within a few short lines of code. Furthermore, this composite object relates tabular and array data and is immediately presented in a way that can be explored interactively. This way of working enables highly productive exploration, allowing new insights to be gained easily. For instance, after exploring with the slider we notice a hotspot of taxi dropoffs at 7am which we can select as follows:

In [None]:
composition.select(Hour=7)

We can now see that the slice of subway locations was chosen in relation to the hotspot in taxi dropoffs around 7am. This area of Manhattan just south of Central Park contains many popular tourist attractions, including Times Square, and we can infer that tourists often take short taxi rides from the subway stations into this area.

At this point it may appear that HoloViews is about easily generating explorative, interactive visualizations *from* your data. In fact, as we have been building these visualizations we have actually been working *with* our data, as we can show by examining the ``.data`` attribute of our sliced subway locations:

In [None]:
hotspot.data

We see that slicing the HoloViews [``Points``](../reference/elements/elements/bokeh/Points.ipynb) object in the visualization sliced the underlying data, with the structure of the table left intact. We can see that the Times Square 42nd Street station is indeed one of the subway stations surrounding our taxi dropoff hotspot. This seamless interplay and exchange between the raw data and easy-to-generate visualizations of it is crucial to how HoloViews helps you understand your data.
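For comparison, a range selection like the one above can be approximated directly in pandas with a boolean mask. This is only a sketch: the three rows are hypothetical values invented for illustration (they are not taken from the station CSV), and the handling of the range boundaries may differ from what HoloViews does.

```python
import pandas as pd

# Hypothetical rows for illustration only; not values from the station CSV
stations = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'lon': [-73.98, -73.95, -74.00],
    'lat': [40.76, 40.76, 40.71],
})

lon_range = (-73.99, -73.96)
lat_range = (40.75, 40.765)

# Keep the rows whose coordinates fall inside both ranges
inside = (
    stations['lon'].between(lon_range[0], lon_range[1])
    & stations['lat'].between(lat_range[0], lat_range[1])
)
subset = stations[inside]
print(subset)
```

As with the `Points` selection, the result keeps all columns of the table and only drops the rows outside the ranges.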

### Onwards

The next getting-started section shows how to do [Customization](2-Customization.ipynb) of the visual appearance of your data, allowing you to highlight the most important features and change the look and feel. Other related topics for deeper study:
 
* The above plots did not require any special geographic-data support, but when working with larger areas of the Earth's surface (for which curvature becomes significant) or when overlaying data with geographic features, the separate [GeoViews](http://geo.holoviews.org) library provides convenient geo-specific extensions to HoloViews.
* The taxi array data was derived from a very large tabular dataset and rasterized using [datashader](https://github.com/bokeh/datashader), an optional add-on to HoloViews and Bokeh that makes it feasible to work with very large datasets in a web browser.

### High level of abstraction and interaction

An example that shows the high level of abstraction and the interactivity of HoloViews.

In [None]:
import pandas as pd                                        # DataFrame functionality
import numpy as np                                         # converting the Bunch object

from sklearn import datasets                               # contains the Iris dataset

import holoviews as hv

hv.extension('bokeh')


In [None]:
irisBunch = datasets.load_iris()


iris = pd.DataFrame(data= np.c_[irisBunch['data'], irisBunch['target']],
                     columns= irisBunch['feature_names'] + ['species'])

iris2 = iris.drop(['sepal length (cm)', 'sepal width (cm)'], axis=1)
iris2.head()


In [None]:
vdims = ['petal length (cm)']
ds = hv.Dataset(iris2, ['petal width (cm)', 'species'], vdims)

In [None]:
%%opts Scatter [width=500 height=500]

ds.to(hv.Scatter, 'petal width (cm)', 'petal length (cm)')

Slider: the `species` key dimension is not used by the `Scatter` itself, so HoloViews exposes it as a slider.

The various Elements like Scatter and Image all accept two types of dimensions: key dimensions (i.e., indexing dimensions or independent variables), and value dimensions (resulting data or dependent variables). These attributes are named kdims and vdims, respectively, and can be passed as the second and third positional argument for all Elements other than Histogram.

https://holoviews.org/getting_started/Introduction.html

In [None]:
# TODO: bar chart with options for choosing subjects?

http://holoviews.org/user_guide/Dashboards.html



<a id="seaborn"></a>
## Seaborn

In [None]:
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
C,S = np.cos(X), np.sin(X)

In [None]:
import matplotlib.pyplot as plt

plt.plot(X,C)
plt.plot(X,S)

plt.show()

In [None]:
# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in Matplotlib 3.6
plt.style.use('seaborn-v0_8-whitegrid')

# Create a new figure of size 8x6 inches, using 80 dots per inch
plt.figure(figsize=(8,6), dpi=80)

grid = plt.GridSpec(2,3,wspace=0.4,hspace=0.3)

# Create a new subplot in the top-left cell of the 2x3 grid
plt.subplot(grid[0,0])

X = np.linspace(-np.pi, np.pi, 20, endpoint=True)
C,S = np.cos(X), np.sin(X)

# Plot cosine in pink with '>' markers and a continuous line of width 1 (points)
plt.plot(X, C, '->', color="pink", linewidth=1.0)

plt.subplot(grid[0,1])

# Plot sine in green with circle markers and a continuous line of width 1 (points)
plt.plot(X, S, 'o', color="green", linewidth=1.0, linestyle="-")

# Set x limits
plt.xlim(-4.0,4.0)
plt.axis('equal')

# Set x ticks (one label per tick position)
plt.xticks(np.linspace(-4,4,2), ('Hi','Bye'))

# Set y limits
#plt.ylim(-1.0,1.0)

# Set y ticks
plt.yticks(np.linspace(-1,1,5,endpoint=True))

# Save figure using 72 dots per inch
# savefig("../figures/exercice_2.png",dpi=72)


p = -2 * np.pi * np.random.random_sample((10)) + np.pi
q = -2 * np.random.random_sample((10)) + 1
sizes = np.linspace(100,1000,10)

plt.scatter(p,q,marker='o',s=sizes, c=sizes)
plt.colorbar()

plt.subplot(grid[0,2])

yerr = 0.5
Cerror = np.cos(X) + yerr*np.random.randn(20)

plt.errorbar(X,Cerror,yerr=yerr,fmt='.k')

plt.subplot(grid[1,:])

data1 = np.random.randn(10000)
data2 = np.random.randn(10000)*2
data3 = np.random.randn(10000)+2

# 'normed' was removed from plt.hist; 'density=True' normalizes the histogram
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=50)

plt.hist(data1, **kwargs, label='Base')
plt.hist(data2, **kwargs, label='Doubled')
plt.hist(data3, **kwargs, label='Shifted')

plt.legend()

print(np.histogram(data1, bins=5))

# Show result on screen
plt.show()
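The layout in the cell above relies on `GridSpec` slicing: a single index picks one cell, while a slice spans several. The sketch below isolates that idea with two empty panels; the variable names are chosen for this illustration.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

# A single cell: first row, first column
top_left = fig.add_subplot(grid[0, 0])

# A slice spans cells: the whole second row
bottom = fig.add_subplot(grid[1, :])

# The bottom panel is as wide as three cells plus the gaps between them
print(top_left.get_position().width, bottom.get_position().width)
```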


In [None]:
fig, ax = plt.subplots(2)
ax[0].plot(X,C)
ax[1].plot(X,S)
ax[0].text(0,1.1, "Top o' the sinus", size=10, color='gray')
plt.show()

In [None]:
import pandas as pd
import seaborn as sns

data = np.random.multivariate_normal([0,0], [[5,2],[2,2]], size=2000)
data = pd.DataFrame(data,columns=['x','y'])

for col in 'xy':
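    # distplot is deprecated since seaborn 0.11; histplot with kde=True replaces it
    sns.histplot(data[col], kde=True, stat='density')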
    
plt.ylim(.0,.35)
    
plt.show()
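The second argument of `multivariate_normal` is the covariance matrix, which explains why `x` is spread more widely than `y`. As a quick check, the sketch below (with an arbitrary seed for repeatability) estimates the covariance from a sample of the same size.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded for repeatability
cov = np.array([[5, 2],
                [2, 2]])                  # var(x)=5, var(y)=2, cov(x,y)=2

sample = rng.multivariate_normal([0, 0], cov, size=2000)

# rowvar=False: each column is a variable, each row an observation
estimated = np.cov(sample, rowvar=False)
print(np.round(estimated, 2))
```

With 2000 points the estimate should land close to the requested matrix, but not exactly on it.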

In [None]:
sns.kdeplot(data=data, x='x', y='y')
plt.show()