## Data Visualization


The most popular Python library for creating static, animated, and interactive visualizations is <a href="https://matplotlib.org/">Matplotlib</a>. As its name suggests, Matplotlib was initially developed to emulate the graphics commands of MATLAB (a proprietary programming language), but it is fully independent of MATLAB and, as you will see, it is designed to interface fully with Python objects.

The first thing to do is to import the module <code>pyplot</code> from the <code>matplotlib</code> library. As with many of our previous imports, we import the module under an alternative, shorter name for convenience. Finally, we specify the command <code>%matplotlib inline</code> so that Jupyter Notebook displays the plots in the notebook itself rather than in new windows.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

### Plotting pandas DataFrames

Let's start by reading some data and plotting it. Pandas DataFrames integrate many Matplotlib functionalities in their methods; one of these methods is ```plot```. By passing the columns for the x and y axes as the first and second arguments, and specifying the kind of plot we want, we can quickly visualise what our data looks like.

In [None]:
import pandas as pd  # only needed if pandas was not imported earlier in the notebook

surveys = pd.read_csv("../data/surveys.csv")
my_plot = surveys.plot("hindfoot_length", "weight", kind="scatter")
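
The same call pattern works for other kinds of plot. Below is a minimal sketch that uses a small made-up DataFrame, so that it runs without the data file: the column names mirror the survey data, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented, not taken from surveys.csv
toy = pd.DataFrame({
    "hindfoot_length": [32.0, 33.0, 37.0, 36.0, 35.0, 14.0, 15.0],
    "weight": [40.0, 48.0, 52.0, 51.0, 44.0, 8.0, 7.0],
})

# Same pattern as above: x column, y column, then the kind of plot
scatter_ax = toy.plot("hindfoot_length", "weight", kind="scatter")

# A histogram needs a single column, selected here with `y`
hist_ax = toy.plot(y="weight", kind="hist", bins=5)
```

Each call returns the Matplotlib Axes it has drawn on, which is why the results are stored in variables here.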

<div class="alert alert-block alert-success">
<b>TRY IT YOURSELF</b>: Time to play with plots! Look at the pandas.DataFrame.plot() documentation (<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html">here</a>) and change your data visualization by selecting different DataFrame columns, x and y axes, and kinds of plot (try at least three different kinds of plot).
</div>

### Plotting general data

In the next example we will generate our own data by drawing 1000 normally distributed data points (mean 0, standard deviation 0.1).

In [None]:
import numpy as np
sample_data = np.random.normal(0, 0.1, 1000)

This time we will use the ```pyplot``` function ```hist``` to compute and visualise a histogram of our data.

In [None]:
plt.hist(sample_data)
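
Besides drawing the bars, ```plt.hist``` also returns the numbers behind them, which can be handy for checking what is displayed. A small sketch, assuming a seeded random generator (the seed is our own arbitrary choice, used only to make the sample reproducible):

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so that the sample is reproducible (arbitrary seed)
rng = np.random.default_rng(42)
sample = rng.normal(0, 0.1, 1000)

# hist returns the bar heights, the bin edges, and the drawn bar objects
counts, bin_edges, bars = plt.hist(sample, bins=30)

print(len(counts))        # number of bins
print(len(bin_edges))     # one more edge than there are bins
print(int(counts.sum()))  # every data point falls in exactly one bin
```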

### Matplotlib plot hierarchy

In the previous examples we generated very simple plots to have a quick look at the data. However, with Matplotlib you can control every aspect of your plot: its dimensions, the x and y ticks and labels, the arrangement of multiple plots in the same area, and many other graphical features.

To get full control of the plots generated with ```matplotlib.pyplot```, it is important to be aware of the hierarchy between the main pyplot objects:
<ul>
    <li>At the highest level we have a <b>Figure</b>. A Figure is simply the total white space where you will organise your plots;</li>
    <li>One of the most confusing Matplotlib conventions is that single plots are called <b>Axes</b>. You can have a single Axes (still with the final "s") per Figure, so one plot per Figure (see Plot1 on the left), or multiple Axes per Figure, like in Plot2 (on the right), where the same Figure contains three plots distributed in two rows;</li>
    <li>Finally, each Axes (aka plot) contains two <b>Axis</b> objects, i.e. the x and y axis.</li>
</ul>

<div>
<img src="pictures/plot_hierarchy.jpeg" width="800"/>
</div>
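
The hierarchy described above can also be inspected directly from Python. This is a minimal sketch that only uses standard attributes of the Figure and Axes objects:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # one Figure containing one Axes

# The Figure keeps a list of all the Axes it contains
print(len(fig.axes))      # 1
print(fig.axes[0] is ax)  # True

# Each Axes owns an x Axis and a y Axis object
print(type(ax.xaxis).__name__)  # XAxis
print(type(ax.yaxis).__name__)  # YAxis

# Going back up the hierarchy: an Axes knows which Figure it belongs to
print(ax.figure is fig)   # True
```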

To create a Figure and an Axes at the same time, we can use the function ```plt.subplots()```. With only the default arguments (as in this first case), the function returns a Figure and a single Axes object, i.e. a blank canvas with a single plot at its center. We will assign the names *fig* and *ax* to these two Python objects.<br>
To plot data on our Axes (*ax*) we will use the same plotting methods used in the previous examples. In this case, we will use ```hist()```, grouping the data into 30 bins.

In [None]:
fig, ax = plt.subplots()  # create an empty Figure containing a single Axes
ax.hist(sample_data, 30)

Once we have defined a Figure and an Axes, we can add other Axes to our Figure to plot additional data in the same space. This can be done using ```fig.add_axes([left, bottom, width, height])```, where ```add_axes``` is a method that, as its name says, adds an additional Axes to our Figure, and its argument gives the position and size of the new plot. The default coordinate units are fractions of the Figure, so 0 corresponds to the beginning of a Figure side (left or bottom) and 1 to its end. For example, the list [0.5, 0.5, 0.5, 0.5] will place the bottom left corner of our additional Axes at the very center of the Figure and, since the Axes is half a Figure wide and half a Figure high, its top right corner at the very top right corner of the Figure.

In [None]:
# prepare a matplotlib figure
fig, ax1 = plt.subplots()
ax1.hist(sample_data, 30)
# add labels
ax1.set_ylabel('count')  # hist shows counts unless density=True is passed
ax1.set_xlabel('value')

# define and sample beta distribution
a = 5
b = 10
beta_draws = np.random.beta(a, b, size=1000)  # draw 1000 samples, not a single value

# plot beta distribution
# by adding additional axes to the figure
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
# ax2 = fig.add_axes([left, bottom, width, height])
ax2.hist(beta_draws)
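
To check where an added Axes actually ends up, we can ask it for its position. The sketch below relies on the Matplotlib convention that the list passed to ```add_axes``` is ```[left, bottom, width, height]``` in fractions of the Figure; the numbers themselves are arbitrary.

```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()

# [left, bottom, width, height], all as fractions of the Figure size:
# the inset starts at the center and spans 40% of the width and height
ax2 = fig.add_axes([0.5, 0.5, 0.4, 0.4])

# get_position() reports the rectangle occupied by the Axes
box = ax2.get_position()
print(box.x0, box.y0)         # bottom left corner
print(box.width, box.height)  # size of the inset
print(len(fig.axes))          # the Figure now holds two Axes
```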

Axes do not always need to be added on the go: we can plan the structure of our Figure according to the number of plots we want to display in it. In the following example, we give some more arguments to ```plt.subplots()```. The first two arguments indicate the number of rows and columns of plots we want to fit in our Figure. In this case, we want one plot vertically (one row) and two plots horizontally (two columns). As we want to be sure that there will be enough space for our two plots, we specify the size of the Figure with ```figsize```: 12 inches wide and 6 inches high (```figsize``` is always expressed in inches).<br>
Compared to our previous example, this time ```plt.subplots()``` returns a Figure and a NumPy array of Axes, which we unpack directly into two names. The number of Axes depends on our arguments: here we asked for one row and two columns, so two plots in total, and therefore two Axes objects are returned.

In [None]:
# prepare a matplotlib figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
ax1.hist(sample_data, 30)
# add labels
ax1.set_ylabel('count')  # hist shows counts unless density=True is passed
ax1.set_xlabel('value')

# define and sample beta distribution
a = 5
b = 10
beta_draws = np.random.beta(a, b, size=1000)  # draw 1000 samples, not a single value

# plot beta distribution
ax2.hist(beta_draws)
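
With more than one row and more than one column, it is often easier to keep the returned Axes together instead of unpacking them one by one. A minimal sketch, with an arbitrary grid of two rows and three columns:

```python
import matplotlib.pyplot as plt

# Two rows and three columns: six Axes in one Figure
fig, axes = plt.subplots(2, 3, figsize=(12, 6))

print(type(axes).__name__)  # ndarray
print(axes.shape)           # (2, 3)

# Index a single Axes by (row, column)
axes[0, 2].set_title("first row, third column")

# axes.flat walks through all the Axes, row by row
for i, ax in enumerate(axes.flat):
    ax.set_xlabel(f"plot {i}")
```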

### Integrating pandas plot with matplotlib

Let's now go back to our pandas DataFrames. We saw that we can quickly plot data from a pandas DataFrame, but what if I *already* have a Figure with Axes and I want to plot my DataFrame data in it? This example will show you how.

In [None]:
fig, ax1 = plt.subplots() # prepare a matplotlib figure

surveys.plot("hindfoot_length", "weight", kind="scatter", ax=ax1)

# Provide further adaptations with matplotlib:
ax1.set_xlabel("Hindfoot length")
ax1.tick_params(labelsize=16, pad=8)
fig.suptitle('Scatter plot of weight versus hindfoot length', fontsize=15)

As you can see, you just need to specify the argument ```ax=<my_figure_ax>``` to plot the DataFrame data in your Axes.
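
The same argument works when the Figure holds several Axes: each pandas call is simply pointed at a different one. The sketch below uses a small made-up DataFrame (invented values, column names borrowed from the survey data) so that it runs without the data file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data; the values are invented for illustration
toy = pd.DataFrame({
    "hindfoot_length": [32.0, 33.0, 37.0, 36.0, 35.0, 14.0, 15.0],
    "weight": [40.0, 48.0, 52.0, 51.0, 44.0, 8.0, 7.0],
})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 6))

# Each pandas call draws on the Axes passed through `ax`
toy.plot("hindfoot_length", "weight", kind="scatter", ax=ax_left)
returned = toy.plot(y="weight", kind="hist", ax=ax_right)

# pandas hands back the same Axes object, ready for further tweaks
print(returned is ax_right)
ax_right.set_xlabel("Weight")
```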

<div class="alert alert-block alert-success">
<b>TRY IT YOURSELF</b>: Plot DataFrame data in a single Figure:
        <ol>
            <li>Initialize a Figure with 4 Axes distributed in two rows and two columns;</li>
            <li>In each Axes plot DataFrame data from different columns (try also to use different kinds of plots);</li>
            <li>Make your Axes (plots) "pretty": label all your axes, use a clear font, and choose a nice title for your plot. You may want to consult the <code>Axes.plot()</code> documentation for that (<a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html">here</a>).</li>
        </ol>
</div>