In [None]:
#%matplotlib widget
%matplotlib inline

# <center>**Course**</center>

<img src="../images/logo2_matplotlib.svg" style="margin:0 auto;">
<div style="text-align:center">
    Bertrand Néron
    <br>
    <a href="https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/">Bioinformatics and Biostatistics HUB</a>
    <br />
    © Institut Pasteur, 2021
</div>

# Motivation

# Installation

* open a shell
* activate the virtualenv (`source <prefix>/bin/activate`)
* and type:
    
```
pip install matplotlib
```

*You may need root permission if you do not use a virtualenv.*
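
To check that the installation worked, one simple sketch is to import the package from the same environment and print its version (the variable name `version` is only illustrative):

```python
import matplotlib

# The version string shows which matplotlib the interpreter actually imports.
version = matplotlib.__version__
print("matplotlib version:", version)
```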

# Using matplotlib

In [None]:
import matplotlib
import matplotlib.pyplot as plt

# Concepts and Terminology

<img src="../images/mplt_concept.png"  width="600px">

> https://matplotlib.org/stable/tutorials/introductory/usage.html#parts-of-a-figure

### Figure

The figure is like a canvas on which all your Axes (plots) are drawn.
A figure can contain several Axes (plots), but to be useful it should have at least one.
The easiest way to create a new figure is with pyplot:

In [None]:
fig = plt.figure()  # an empty figure with no axes

In [None]:
fig, ax_lst = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes

### Axes

This is what you think of as ‘a plot’.
* The Axes contains two (or three in the case of 3D) **Axis** objects
* Each Axes has a title 
* Each Axes can contain a legend 

### Axis

These are the number-line-like objects.
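
As a small sketch of how these objects relate to each other, the snippet below builds one Figure with one Axes and then looks at the Axis objects it contains. The data and the title are made up for the illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()           # one Figure holding one Axes
ax.plot([1, 2, 3], label="demo")   # made-up data
ax.set_title("one Axes")           # each Axes has its own title
ax.legend()                        # and can contain a legend

print(fig.axes)   # the list of Axes owned by the Figure
print(ax.xaxis)   # the x Axis object of this Axes
print(ax.yaxis)   # the y Axis object of this Axes
```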

### Labels

This is the "legend" of an Axis. There are 2 labels for 2D plots: the ``xlabel`` and the ``ylabel``.

### Ticks

The ticks are the marks on the axis, and the ticklabels are the strings labeling the ticks.
There are two kinds of ticks: major and minor ticks.
By default they are automatically generated by the axis,
but they can be configured.
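
One possible way to configure ticks is sketched below, assuming you want a major tick every unit and a minor tick every quarter unit on x, plus hand-written labels on y. The data and the label strings are invented for the example.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

plt.figure()
plt.plot([0, 1, 2, 3], [0, 1, 4, 9])   # made-up data

ax = plt.gca()                                      # the current Axes
ax.xaxis.set_major_locator(MultipleLocator(1))      # a major tick every 1 unit
ax.xaxis.set_minor_locator(MultipleLocator(0.25))   # a minor tick every 0.25 unit

# explicit tick positions and their labels on the y axis
plt.yticks([0, 3, 6, 9], ["zero", "three", "six", "nine"])
# make the minor tick marks shorter than the major ones
plt.tick_params(axis="x", which="minor", length=3)
```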

# Coding styles

When viewing matplotlib code, you will find different coding styles and usage patterns:
* MATLAB style
* object-oriented style

Both styles are perfectly valid and each has its pros and cons.
The only caveat is to avoid mixing the coding styles in your own code.

MATLAB style is fine for small interactive work, as in a notebook,
whereas for even more control in an application embedding a matplotlib GUI, the pyplot level may be dropped completely, leaving a purely object-oriented approach.

## pyplot functional style (*aka matlab style*)

In [None]:
plt.plot([1, 2, 3])
plt.title('hi mom')
plt.grid(True)
plt.xlabel('time')
plt.ylabel('volts')

## Object oriented style

In [None]:
fig, ax = plt.subplots() # by default 1 row, 1 column, so a single Axes
ax.plot([1, 2, 3])
ax.set_title('hi mom')
ax.grid(True)
ax.set_xlabel('time')
ax.set_ylabel('volts')

In this notebook we will use the *pyplot functional coding style*

## Dive into matplotlib

Now we are going to learn to use some components.
For the demos below we will need the *numpy* and *pandas* packages.

In [None]:
import numpy as np
import pandas as pd

# Plot

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

## One variable

In [None]:
temp = pd.read_csv("../data/fr_sp_it_temp.tsv", sep="\t", header=0, index_col=0)
temp

In [None]:
paris = temp[temp.City == 'Paris']
paris

In [None]:
plt.plot(paris['Tmp'])

## Two variables

In [None]:
plt.plot(paris['Year'],paris['Tmp'])

In [None]:
plt.plot(paris['Year'],paris['Tmp'],
         marker='o')

In [None]:
plt.plot(paris['Year'],paris['Tmp'],
         marker='o',
        linestyle=''
        )

In [None]:
plt.plot(paris['Year'],paris['Tmp'],
         marker='s',
         color='red',
        linestyle='--'
        )

In [None]:
plt.plot(paris['Year'],paris['Tmp'],
         marker='v',
         color='brown',
        linestyle='-.'
        )

All available linestyles, markers and colors are described in https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
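
The same page also describes a compact format string that combines color, marker and linestyle in one argument. As a sketch with made-up values, the two calls below request the same style: once with the shorthand `"rs--"` and once with explicit keywords.

```python
import matplotlib.pyplot as plt

years = [2000, 2001, 2002, 2003]
temps = [52.1, 53.4, 51.8, 54.0]   # made-up values, for illustration only

plt.figure()
# shorthand: r = red, s = square marker, -- = dashed line
(line_short,) = plt.plot(years, temps, "rs--")
# the same style written with explicit keyword arguments
(line_long,) = plt.plot(years, [t + 1 for t in temps],
                        color="red", marker="s", linestyle="--")
```

The keyword form is longer but easier to read back later, which is why the cells above use it.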

## Several plots

In [None]:
bdx = temp[temp.City == 'Bordeaux']
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')

plt.legend()

In [None]:
for city, df in temp.groupby('City'):
    plt.plot(df['Year'], df['Tmp'], label=city)

plt.legend(ncol=2)
plt.xlabel("Year")
plt.ylabel("Temp in °F")
plt.title("Average Temperature")

In [None]:
paris_max = paris['Tmp'] + paris['std']
paris_min = paris['Tmp'] - paris['std']

plt.plot(paris['Year'], paris['Tmp'], label= "Paris" )

plt.fill_between(paris['Year'], 
         paris_max, 
         y2=paris_min,
         alpha=0.5        
        )

plt.legend()

## xlabel and ylabel

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html
> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html

In [None]:
paris_max = paris['Tmp'] + paris['std']
paris_min = paris['Tmp'] - paris['std']

plt.plot(paris['Year'], paris['Tmp'], label= "Paris" )

plt.fill_between(paris['Year'], 
         paris_max, 
         y2=paris_min,
         color="orange",        
         alpha=0.5        
        )
plt.xlabel('Year')
plt.ylabel('Temp in °F')
plt.legend()

## Legend

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html

In [None]:
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')
plt.legend(['T°F at Paris', 'T°F at Bordeaux'], ncol=2)

In [None]:
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')
plt.legend(['T°F at Paris', 'T°F at Bordeaux'], loc="upper right")

## Title

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html

In [None]:
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')
plt.legend(['T°F at Paris', 'T°F at Bordeaux'])
plt.title("average temp in France")

## Grid

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html

In [None]:
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')
plt.legend(['T°F at Paris', 'T°F at Bordeaux'])
plt.grid(ls=':') # ls = linestyle
plt.title("average temp in France")

## xlim, ylim, axvline, axhline

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlim.html

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvline.html

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axhline.html

In [None]:
plt.plot(paris['Year'], paris['Tmp'], label='Paris')
plt.plot(bdx['Year'], bdx['Tmp'], label='Bordeaux')

plt.axvline(2008, color="red", linestyle="--")
plt.axhline(55, linestyle="-.")

plt.xticks(np.arange(1998, 2019, 2))
plt.xlim([1998,2018])
plt.ylim(50,60)

## loglog and semilog

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.loglog.html

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.semilogx.html

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.semilogy.html

In [None]:
plt.semilogy(paris['Year'], paris['Tmp'],
            marker='*',
            markersize=15,
            markeredgecolor="k" # short option = mec
            )
plt.grid()

In [None]:
plt.loglog(paris['Year'], paris['Tmp'], marker='X')
plt.grid(True, which="both", linestyle='--')
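
A logarithmic axis is most useful when the data span several orders of magnitude. The sketch below uses synthetic exponential data (not the temperature table) to show that such a series becomes a straight line with `semilogy`.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 11)
y = 2.0 ** x   # synthetic exponential growth, for illustration only

plt.figure()
plt.semilogy(x, y, marker="o")                # log scale on y only
plt.grid(True, which="both", linestyle=":")   # grid on major and minor ticks
plt.xlabel("x")
plt.ylabel("2**x (log scale)")

yscale = plt.gca().get_yscale()
xscale = plt.gca().get_xscale()
print(xscale, yscale)
```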

# Histogram

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. 

**Histograms are based on area, not height of bars**

In a histogram, it is the area of the bar that indicates the frequency of occurrences for each bin. This means that the height of the bar does not necessarily indicate how many occurrences of scores there were within each individual bin. It is the product of the height and the width of the bin that indicates the frequency of occurrences within that bin. The height of the bars is often mistaken for the frequency because many histograms have equally spaced bins, and under these circumstances the height does reflect the frequency.

**What is the difference between a bar chart and a histogram?**

The major difference is that a histogram is only used to plot the frequency of score occurrences in a **continuous** data set that has been divided into classes, called bins. Bar charts, on the other hand, can be used for a great deal of other types of variables including ordinal and nominal data sets.

https://statistics.laerd.com/statistical-guides/understanding-histograms.php
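
The area argument can be checked numerically. In this sketch, which uses synthetic data and arbitrary bin edges of unequal width, `density=True` makes the bar areas (height times width) sum to 1, up to rounding.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.uniform(0, 8, size=1000)   # synthetic values, for illustration only
edges = [0, 1, 2, 4, 8]               # bins of unequal width

plt.figure()
heights, bin_edges, _ = plt.hist(data, bins=edges, density=True, edgecolor="k")

widths = np.diff(bin_edges)   # width of each bin
areas = heights * widths      # area of each bar
print("bar heights:", heights)
print("total area :", areas.sum())
```

With equal-width bins the heights alone would be enough to compare bins, which is the usual case in the cells below.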

In [None]:
iris = pd.read_csv('../data/Iris.csv', sep=',' , header=0, index_col='Id' )
iris.head()

In [None]:
sepal_len = iris.iloc[:, 0]
bins = 20
n , plt_bins,patches =  plt.hist(sepal_len, bins)
plt.title("Histogram sepal Length")
plt.xlabel("sepal length in cm")
plt.ylabel("frequency")

## Influence of bins

In [None]:
bins = 200
n , plt_bins,patches =  plt.hist(sepal_len, bins)
plt.title("too many bins")
plt.xlabel("sepal length in cm")
plt.ylabel("frequency")

In [None]:
bins = 4
n , plt_bins,patches =  plt.hist(sepal_len, bins)
plt.title("not enough bins")
plt.xlabel("sepal length in cm")
plt.ylabel("frequency")
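
Instead of guessing a number, `bins` can also be given as a string naming one of NumPy's bin-selection rules, for example `"auto"`. The sketch below uses a synthetic sample as a stand-in for a measured length. The chosen rule is a heuristic, so it is worth comparing the result with a few manual choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# synthetic stand-in for a measured length, for illustration only
sample = rng.normal(loc=5.8, scale=0.8, size=150)

plt.figure()
counts, edges, _ = plt.hist(sample, bins="auto", edgecolor="k")
plt.title("bins chosen automatically")
print("number of bins chosen:", len(counts))
```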

## Several histograms on the same figure

In [None]:
sepal_len = iris.iloc[:, 0]
petal_len = iris.iloc[:, 2]

bins = 20

n , plt_bins,patches =  plt.hist(sepal_len, bins, 
                                 edgecolor='k', 
                                 alpha=0.5,
                                 label='sepal length'
                                )
n , plt_bins,patches =  plt.hist(petal_len, bins, 
                                 edgecolor='k', 
                                 alpha=0.5,
                                 label='petal length'
                                )
plt.legend()
plt.title("Histogram of sepal and petal length")
plt.xlabel("length in cm")
plt.ylabel("frequency")

In [None]:
sepal_len = iris.iloc[:, 0]
petal_len = iris.iloc[:, 2]

bins = 20

n , plt_bins,patches =  plt.hist([sepal_len, petal_len],
                                 bins, 
                                 edgecolor='k',
                                 density=True
                                )
plt.legend(['sepal length', 'petal length'])
plt.title("Histogram of sepal and petal length")
plt.xlabel("length in cm")
plt.ylabel("density")

In [None]:
n , plt_bins,patches =  plt.hist(sepal_len,
                                 bins, 
                                 edgecolor='k',
                                 density=True
                                )

# Histogram 2D

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist2d.html

2D histograms are useful when you need to analyse the relationship between 2 numerical variables that have a huge number of values. It is useful for avoiding the over-plotted scatterplots. 

Given a set of ordered pairs describing data points, you can count the number of points with similar values to construct a two-dimensional histogram. This is similar to a one-dimensional histogram, but it describes the joint variation of two random variables rather than just one.
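
As a sketch of the counting idea on a larger sample, the snippet below generates 5000 synthetic, correlated points (they are not part of the course data) and bins them on a 30 x 30 grid. The returned `counts` array holds the number of points in each cell.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=5000)                        # synthetic variable
y = 0.5 * x + rng.normal(scale=0.5, size=5000)   # synthetic, correlated with x

plt.figure()
counts, xedges, yedges, image = plt.hist2d(x, y, bins=(30, 30), cmap="viridis")
plt.colorbar(image, label="points per bin")
plt.xlabel("x")
plt.ylabel("y")

print(counts.shape)   # one count per (x bin, y bin) cell
```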

In [None]:
petal_len = iris.loc[:, 'PetalLengthCm']

In [None]:
_ = plt.hist2d(sepal_len, petal_len, bins=(25, 25), cmap=plt.cm.jet)
plt.xlabel('Sepal length (cm)')
plt.ylabel('Petal length (cm)')
plt.title('sepal length vs petal length' )

# Bar plot

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html

## Vertical bar plot

In [None]:
iris_desc = iris.describe()
iris_desc

In [None]:
iris_means = iris_desc.loc['mean', :]
iris_means.shape

In [None]:
iris_means.index

In [None]:
iris_means

In [None]:
recs = plt.bar(iris_desc.columns, iris_means, yerr=iris_desc.loc['std', :])
for idx, data in enumerate(iris_means):
    plt.text(x=idx, y=data , s=f"{data:.2f}")
plt.ylabel("Average (in cm)")
    

Now we want to compare each species.

In [None]:
averages = iris.groupby("Species").mean()
averages

In [None]:
std = iris.groupby("Species").std()
std

In [None]:
width = 0.25 # the bar width      
x = np.arange(len(averages.columns)) # the xticks

for shift, specie in enumerate(averages.index, -1):
    spec_avs = averages.loc[specie, :]
    abscisses = x + (shift * width)
    plt.bar(abscisses, 
            spec_avs, 
            width,
            yerr=std.loc[specie, :],
            label=specie)
    for one_abcisse, av in zip(abscisses, spec_avs) :
        plt.text(x=one_abcisse, y=av , s=f"{av:.2f}")

plt.xticks(x, averages.columns, rotation=-45)
plt.ylabel("Average (in cm)")
plt.legend()

### xticks

In the example above we do not use the labels as the first parameter of pyplot.bar.
Instead we use positions, which allows us
* to center the tick on the middle bar
* to put one bar on the left (tick - bar width)
* to put one bar on the right (tick + bar width)

To put the names on the ticks, we use the pyplot.xticks function
with the locations of the ticks and the names to display.
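
The position arithmetic can be looked at on its own. This sketch assumes three groups and three bars per group, with the same bar width of 0.25. Since `pyplot.bar` centers each bar on its x position by default, a shift of 0 puts the bar exactly on the tick.

```python
import numpy as np

width = 0.25       # the bar width
x = np.arange(3)   # tick positions: 0, 1, 2

# shift -1 -> left bar, 0 -> middle bar (on the tick), +1 -> right bar
left, middle, right = (x + shift * width for shift in (-1, 0, 1))

print("left  :", left)
print("middle:", middle)
print("right :", right)
```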

## Horizontal bar plot

Same as *pyplot.bar* but horizontal.
Do not forget to use *xerr* instead of *yerr* to display *std*,
and to swap *idx* and *data* between *x* and *y* for the label coordinates.

In [None]:
recs = plt.barh(iris_desc.columns, iris_means, xerr=iris_desc.loc['std', :])
for idx, data in enumerate(iris_means):
    plt.text(x=data, y= idx + 0.1 , s=f"{data:.2f}")
plt.xlabel("Average (in cm)")
    


# Boxplot

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html

Box plots visually show the distribution of numerical data and skewness through displaying the data quartiles (or percentiles) and averages.

Box plots show the five-number summary of a set of data: including the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.

<div>
 <img src="images/boxplot_explanation.png" />
</div>

For more explanation visit https://www.simplypsychology.org/boxplots.html
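
The five-number summary can be computed directly with NumPy and compared with what matplotlib draws. The sketch below uses a synthetic sample. Note that, by default, matplotlib's whiskers stop at the last data point within 1.5 times the interquartile range and points beyond that are drawn as outliers, so the whisker ends are not always the minimum and maximum.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=5.0, scale=1.0, size=200)   # synthetic sample

# the five-number summary: min, Q1, median, Q3, max
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)

plt.figure()
res = plt.boxplot(values)

# the horizontal line drawn inside the box is the median
drawn_median = res["medians"][0].get_ydata()[0]
print("median drawn by boxplot:", drawn_median)
```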

In [None]:
iris.columns[:-1]

In [None]:
recs = plt.boxplot(iris.iloc[:, :-1],
                   tick_labels=iris.columns[:-1])


With a notch at the median

In [None]:
recs = plt.boxplot(iris.iloc[:, :-1],
                   tick_labels=iris.columns[:-1],
                  notch=True)

With colored boxes

In [None]:
res_boxes = plt.boxplot(iris.iloc[:, :-1],
                   tick_labels=iris.columns[:-1],
                   notch=True,
                   patch_artist=True)

In [None]:
color = ("orangered", "darkorange", "limegreen", "aqua")

res_boxes = plt.boxplot(iris.iloc[:, :-1],
                   tick_labels=iris.columns[:-1],
                   notch=True,
                   patch_artist=True)
for i, c in enumerate(color):
    res_boxes['boxes'][i].set_facecolor(c)

# Violin Plot

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.violinplot.html#matplotlib.pyplot.violinplot

Sometimes the median and mean aren't enough to understand a dataset. Are most of the values clustered around the median? Or are they clustered around the minimum and the maximum with nothing in the middle? When you have questions like these, distribution plots are your friends.

The box plot is an old standby for visualizing basic distributions. It's convenient for comparing summary statistics (such as range and quartiles), but it doesn't let you see variations in the data. For multimodal distributions (those with multiple peaks) this can be particularly limiting.

But fret not—this is where the violin plot comes in. A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data.

For more explanation visit: https://mode.com/blog/violin-plot-examples/
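
To make the point about multimodal data concrete, here is a sketch that draws the same synthetic two-peaked sample as a box plot and as a violin plot, side by side. The object-oriented style is used only because two Axes are needed. The two peaks are visible in the violin, while the box plot shows them as one wide box.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# synthetic bimodal sample: two groups centered on 0 and 4
bimodal = np.concatenate([rng.normal(0, 0.5, 300), rng.normal(4, 0.5, 300)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

ax_box.boxplot(bimodal)
ax_box.set_title("box plot")

parts = ax_violin.violinplot(bimodal, showmedians=True)
ax_violin.set_title("violin plot")
```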

In [None]:
ticks = np.arange(len(iris.columns[:-1]))
violin_res = plt.violinplot(iris.iloc[:, :-1], 
                      ticks,
                     )

With a little customization

In [None]:
ticks = np.arange(len(iris.columns[:-1]))
violin_res = plt.violinplot(iris.iloc[:, :-1], 
                      ticks,
                     showmeans=True,
                     )
_= plt.xticks(ticks, iris.columns[:-1], rotation=45)

for i, c in enumerate(color):
    violin_res['bodies'][i].set_facecolor(c)

> for more customization: https://matplotlib.org/stable/gallery/statistics/customized_violin.html#sphx-glr-gallery-statistics-customized-violin-py

# Scatter plot

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.

https://chartio.com/learn/charts/what-is-a-scatter-plot/

In [None]:
iris.columns

In [None]:
plt.scatter(iris['SepalLengthCm'], iris['SepalWidthCm'])
plt.xlabel('sepal length in (cm)')
plt.ylabel('sepal width in (cm)')
plt.title('Scatter plot Iris sepal')

It's nice, but nothing very obvious appears.

But there are several species in the dataset, so we can color each species differently.

In [None]:
plt.figure(figsize=(8,6)) #in inches

for specie, data in iris.groupby('Species'):
    plt.scatter(data['SepalLengthCm'], data['SepalWidthCm'], label=specie )
    
plt.xlabel('sepal length in (cm)')
plt.ylabel('sepal width in (cm)')
plt.legend()
plt.title('Scatter plot Iris sepal')
plt.grid()

In [None]:
plt.figure(figsize=(8,6))
for specie, data in iris.groupby('Species'):
    plt.scatter(data['SepalLengthCm'], data['SepalWidthCm'],
                s=150, # the dot size
                label=specie )
    

Let's get a new data set with more data

In [None]:
data_for_plt = pd.read_csv("../data/data_for_plt.csv", sep="\t", header=0, index_col=0)
data_for_plt.columns = ['MW', 'AlogP', 'PSA', 'HBA']
data_for_plt.head()

In [None]:
x = data_for_plt['MW']
y = data_for_plt['AlogP']
plt.scatter(x, y)

When there are a lot of points, it can be useful to use transparency for a better visualization of the data.

In [None]:
plt.scatter(x, y,
           alpha=0.5)

The size can be set for each dot.

Use an array-like with the same length as x/y.

In [None]:
fig = plt.figure(figsize=(8,6))
plt.scatter(x, y,
            s=data_for_plt['PSA'],
            alpha=0.5)
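
A smaller sketch of the same idea, with four made-up points, makes the mapping easier to see. The `s` values are marker areas in points squared, so doubling the visible diameter of a dot takes four times the `s` value.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up coordinates and sizes, for illustration only
xs = np.array([1, 2, 3, 4])
ys = np.array([1, 2, 3, 4])
sizes = np.array([20, 80, 180, 320])   # one size per point, in points**2

plt.figure()
points = plt.scatter(xs, ys, s=sizes, alpha=0.5)
print(points.get_sizes())
```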

And now we want to color each dot according to its *HBA* value.

In [None]:
fig = plt.figure(figsize=(12,7))
plt.scatter(data_for_plt['MW'],
            data_for_plt['AlogP'],
            s=data_for_plt['PSA'],
            c=data_for_plt['HBA'],
            edgecolors="k",
            alpha=0.5,
            cmap="bwr")
cb = plt.colorbar()
plt.xlabel('Molecular Weight')
plt.ylabel('Hydrophobicity')
cb.ax.set_ylabel("Hydrogen Bond Acceptor", rotation=-90, va="bottom")

# Color map

> https://matplotlib.org/stable/tutorials/colors/colormaps.html

> https://matplotlib.org/stable/gallery/color/named_colors.html
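
A colormap is an object that maps a number between 0 and 1 to an RGBA color. As a sketch, the snippet below looks up the `"bwr"` (blue, white, red) colormap and its reversed variant `"bwr_r"` and samples both ends.

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap("bwr")         # blue -> white -> red
low, high = cmap(0.0), cmap(1.0)   # RGBA tuples at both ends
print("low :", low)
print("high:", high)

# appending _r to a colormap name gives the reversed colormap
reversed_cmap = plt.get_cmap("bwr_r")
reversed_low = reversed_cmap(0.0)
print("reversed low:", reversed_low)
```

When `c=` and `cmap=` are passed to `scatter`, the values are first normalized to the 0..1 range and then looked up this way.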

In [None]:
fig = plt.figure(figsize=(12,7))
plt.scatter(data_for_plt['MW'],
            data_for_plt['AlogP'],
            s=data_for_plt['PSA'], # marker size accepts an array
            c=data_for_plt['HBA'], # marker color accepts an array
            edgecolors="k", 
            alpha=0.5,
            cmap="hot")
cb = plt.colorbar()
plt.xlabel('Molecular Weight')
plt.ylabel('Hydrophobicity')
cb.ax.set_ylabel("Hydrogen Bond Acceptor", rotation=-90, va="bottom")

# 3D Plots

## 3D scatter plot

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

for specie, data in iris.groupby('Species'):
    ax.scatter(data['SepalLengthCm'], 
               data['SepalWidthCm'], 
               data['PetalLengthCm'], 
               s = data['PetalWidthCm'] * 10,
               label=specie)

ax.set_xlabel('Sepal Length')
ax.set_ylabel('Sepal Width')
ax.set_zlabel('Petal Length')
plt.title('Iris 3d scatter plot')
plt.legend()

# Heatmap

It is often desirable to show data which depends on two independent 
variables as a color coded image plot. 
This is often referred to as a heatmap. 
If the data is categorical, this would be called a categorical heatmap.

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow

> https://matplotlib.org/stable/gallery/images_contours_and_fields/image_annotated_heatmap.html

In [None]:
# create a dataset

vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", 
           "Cornylee Corp."]

# harvest is the production in tons/year of each vegetable (rows)
# by each farmer (columns)
harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])

# create the heatmap
plt.imshow(harvest)

# We want to show all ticks...
# ... and label them with the respective list entries
xticks = plt.xticks(np.arange(len(farmers)), farmers, 
                    rotation=45, ha="right", rotation_mode="anchor")
yticks = plt.yticks(np.arange(len(vegetables)), vegetables)

# Loop over data vegetables and farmers and annotate the heatmap.
for i in range(len(vegetables)):
    for j in range(len(farmers)):
        text = plt.text(j, i, # the coordinate of the cell
                        harvest[i, j], # the intensity of the cell
                        ha="center", # horizontal alignment
                        va="center", # vertical alignment
                        color="w" if  harvest[i, j] < 5 else "k")

# add a color bar        
cbar = plt.colorbar()

# add a ylabel to the colorbar
# va = vertical alignment 
cbar.ax.set_ylabel("Harvest (tons/year)", rotation=-90, va="bottom")
    
plt.title("Harvest of local farmers (in tons/year)")


# Subplots

We can pack several plots in a figure.
There are several ways to do that; here we describe the *pyplot.subplots* function
> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

In [None]:
# in this case we use the oop coding style

fig, axs = plt.subplots(2,2, figsize=(9,7)) # 2 rows, 2 columns

_ = axs[0,0].hist(iris.iloc[:, 0], color='blue', edgecolor='k')
axs[0,0].set(xlabel='sepal length',  ylabel='Frequency')

_ = axs[0,1].hist(iris.iloc[:, 1], color='orange')
axs[0,1].set(xlabel='sepal width',  ylabel='Frequency')

_ = axs[1,0].hist(iris.iloc[:, 2], color='green', edgecolor='k')
axs[1,0].set(xlabel='petal length',  ylabel='Frequency')

_ = axs[1,1].hist(iris.iloc[:, 3], color='red')
# another way to set the labels
axs[1,1].set_xlabel('petal width')
axs[1,1].set_ylabel('Frequency')
fig.suptitle('Iris Histograms')

plt.tight_layout()
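
When every panel is built the same way, the grid returned by `subplots` can also be filled in a loop. This is a sketch with synthetic columns named `a` to `d` standing in for real data; `axs.flat` iterates over the 2 x 2 array of Axes row by row.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# synthetic stand-in data: four named columns of 100 values each
columns = {name: rng.normal(size=100) for name in ("a", "b", "c", "d")}

fig, axs = plt.subplots(2, 2, figsize=(9, 7))   # axs is a 2x2 array of Axes

for ax, (name, values) in zip(axs.flat, columns.items()):
    ax.hist(values, edgecolor="k")
    ax.set(xlabel=name, ylabel="Frequency")

fig.suptitle("Histograms drawn in a loop")
fig.tight_layout()
```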


# Save figure

> https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html

In [None]:
########## Create a figure ##########

fig, axs = plt.subplots(2,2, figsize=(9,7)) # 2 rows, 2 columns

_ = axs[0,0].hist(iris.iloc[:, 0], color='blue', edgecolor='k')
axs[0,0].set(xlabel='sepal length',  ylabel='Frequency')

_ = axs[0,1].hist(iris.iloc[:, 1], color='orange')
axs[0,1].set(xlabel='sepal width',  ylabel='Frequency')

_ = axs[1,0].hist(iris.iloc[:, 2], color='green', edgecolor='k')
axs[1,0].set(xlabel='petal length',  ylabel='Frequency')

_ = axs[1,1].hist(iris.iloc[:, 3], color='red')
axs[1,1].set(xlabel='petal width',  ylabel='Frequency')

fig.suptitle('Iris Histograms')

plt.tight_layout()

####### Save the figure in png format #########

plt.savefig('../images/iris_histograms.png')
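
The output format is deduced from the file extension, and a few options are often useful when saving. The sketch below writes a PNG and a PDF into a temporary directory (the file names are arbitrary). `dpi` sets the resolution of raster output and `bbox_inches="tight"` trims the empty margins.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])   # made-up data

out_dir = Path(tempfile.mkdtemp())   # temporary directory for the demo
png_path = out_dir / "demo.png"
pdf_path = out_dir / "demo.pdf"

fig.savefig(png_path, dpi=150, bbox_inches="tight")   # raster output
fig.savefig(pdf_path)                                 # vector output
plt.close(fig)                                        # release the figure

print(png_path, pdf_path)
```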


# Conclusion

We have seen

* figure, axes, subplots notions 
* plot
* hist
* hist2d
* bar plot
* boxplot
* violin plot
* scatter plot
* 3D scatter plot
* heatmap
* colormap
* xticks, xlabels, yticks, ylabels
* plot parameters: alpha, fontsize, marker, markersize, ...
    
But there are so many other plots and functions in matplotlib

* meshgrid
* other 3D plots
* pie chart
* ...

Check the matplotlib gallery: https://matplotlib.org/stable/gallery/index.html#pyplot

and the cheat sheet: https://github.com/matplotlib/cheatsheets

> The examples of this course are largely inspired by: https://towardsdatascience.com/matplotlib-tutorial-with-code-for-pythons-powerful-data-visualization-tool-8ec458423c5e