# Data visualization with matplotlib 

## Basics

We start by importing matplotlib, a widely used Python data visualization package.

In particular, we import the pyplot module of matplotlib under the alias plt. It provides an interface that automatically creates figures and axes for our plots.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#to embed the plots inside the notebook. 
%matplotlib inline      

## The structure of a figure

   
   
The Figure is the overall window or page that everything is drawn on. It comprises the axes, title, legend and gridlines.

The Axes is the area on which the data is plotted. The Axes carries the axis labels.
Each Axes has an x-axis and a y-axis, which contain ticks, which in turn have major and minor tick lines and tick labels.

Spines are the lines that connect the axis tick marks and mark the boundaries of the data area.

  <tr>
    <td> <img src="http://matplotlib.org/1.4.0/_images/fig_map.png" alt="Drawing" style="width: 400px;"/> </td>
    <td> <img src="http://matplotlib.org/2.0.2/_images/anatomy1.png" alt="Drawing" style="width: 600px;"/> </td>
   </tr>
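
The terms above correspond to objects that can be inspected directly. The following is a minimal sketch of how a Figure, its Axes, the two axis objects and the spines relate. The names `fig`, `ax` and `spine_names` are chosen here only for illustration.

```python
import matplotlib.pyplot as plt

# Create one Figure containing a single Axes
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 1, 3])
ax.set_title("Anatomy demo")     # the title is attached to the Axes
ax.set_xlabel("x label")         # so are the axis labels
ax.set_ylabel("y label")

# The Figure keeps a list of the Axes it contains
print(fig.axes)

# Each Axes owns an x-axis and a y-axis object that manage the ticks
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)

# Spines are stored by position (left, right, bottom, top)
spine_names = sorted(ax.spines.keys())
print(spine_names)
```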
   


## Line plots

### Single line plots
Let's generate some data to visualize.


In [None]:
x = [1,2,4,5,6,7,8,10]
y = [2.5,3.2,4.1,1.0,6,2, 0.5, -2.1]

First, we plot the data as a line plot.

You basically just need two functions for it: plot() to draw the data and show() to display the plot you generated.

In [None]:
plt.plot(x,y)
plt.show()

The pyplot module of matplotlib has defaults that are initialized so that you can quickly plot something without instantiating any Figure or Axes components yourself.
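
To make this concrete, here is a small sketch contrasting the implicit style with the explicit one. The data and the variable names (`x_demo`, `implicit_fig`, and so on) are made up for illustration.

```python
import matplotlib.pyplot as plt

x_demo = [1, 2, 4, 5]          # illustrative data
y_demo = [2.5, 3.2, 4.1, 1.0]

# Implicit style: pyplot creates a Figure and an Axes behind the scenes
plt.plot(x_demo, y_demo)
implicit_fig = plt.gcf()       # "get current figure"
implicit_ax = plt.gca()        # "get current axes"

# Explicit style: create the Figure and Axes yourself, then plot on the Axes
fig, ax = plt.subplots()
ax.plot(x_demo, y_demo)
```

Both styles draw the same line. The explicit style tends to be easier to follow once a figure contains several Axes.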

### Multiple lines

We can also plot multiple lines in one plot.

In [None]:
x2 = [1,3,4,5,6,8,9,10]
y2 = [1.3,3,4,15,6,2, 0.2, -1]
plt.plot (x,y)
plt.plot (x2,y2)

Add some description to the plot. Here we add x and y axes labels, as well as the plot title.

In [None]:
x2 = [1,3,4,5,6,8,9,10]
y2 = [1.3,3,4,15,6,2, 0.2, -1]
plt.plot (x,y)
plt.plot (x2,y2)
plt.ylabel ("my y")
plt.xlabel ("my x")
plt.title("Nice Plot!")

Let us try to format the axes labels and the title further by changing font colour and size.

In [None]:
x2 = [1,3,4,5,6,8,9,10]
y2 = [1.3,3,4,15,6,2, 0.2, -1]
plt.plot (x,y)
plt.plot (x2,y2)
plt.ylabel ("my y", color="red")
plt.xlabel ("my x", color="#00FF00")
plt.title("Nice \nPlot!", fontsize=24)

We can also use different markers, line styles and colors.

**Important resource:** https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html  

In [None]:
x2 = [1,3,4,5,6,8,9,10]
y2 = [1.3,3,4,15,6,2, 0.2, -1]
x3 = [1,2,3,4,5,6,7,8,9,10]
y3 = [1.3,5,4,3,4,8,6.1, 0.2, -1,3]
plt.plot (x,y, marker= "v", color = "red")
plt.plot (x2,y2, marker="*", color="green")
plt.plot (x3,y3, marker="o", color= "black")

In [None]:
x2 = [1,3,4,5,6,8,9,10]
y2 = [1.3,3,4,15,6,2, 0.2, -1]
x3 = [1,2,3,4,5,6,7,8,9,10]
y3 = [1.3,5,4,3,4,8,6.1, 0.2, -1,3]
plt.plot (x,y, marker= "v", linestyle=":", color = "red")  
plt.plot (x2,y2, marker= "v", linestyle="--", color="green")
plt.plot (x3,y3, marker= "v", linestyle="-.", color= "black")

**We can also do the same in a more concise (but less readable) way: fmt = '[color][marker][line]'**


In [None]:
plt.plot(x, y, 'r*:', x2, y2, 'go--', x3, y3, 'kv-.')
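
As a rough check of what a format string does, the sketch below draws one line with `'r*:'` and a second line with the equivalent keyword arguments, then compares their style properties. The variable names and data are illustrative only.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig, ax = plt.subplots()

# 'r*:' = red color, star marker, dotted line
(line_short,) = ax.plot([1, 2, 3], [1, 4, 9], "r*:")

# The same styling spelled out with keyword arguments
(line_long,) = ax.plot([1, 2, 3], [2, 5, 10],
                       color="r", marker="*", linestyle=":")

# Both lines end up with the same marker, line style and color
print(line_short.get_marker(), line_short.get_linestyle())
print(line_long.get_marker(), line_long.get_linestyle())
same_color = to_rgba(line_short.get_color()) == to_rgba(line_long.get_color())
print(same_color)
```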

In [None]:
plt.plot (x,y, "ro", color="black") # "o" without a line style draws markers only; the color keyword overrides the "r"

In [None]:
plt.plot (x,y, "ro:", color="red")
plt.plot(x2, y2, "bs")  # adding color="green" here would override the "b"

We can add a legend with the legend() function, after giving each line plot a label.

In [None]:
plt.plot (x,y, "ro", color="red", label="red ones")
plt.plot(x2, y2, "bs", color="green", label = "fancy ones")
plt.xlabel ("my_x")
plt.ylabel ("my_y")
plt.legend()

We can also customize the legend.

For instance, the loc argument sets the location of the legend. The values 'upper left', 'upper right', 'lower left' and 'lower right' place the legend at the corresponding corner of the axes/figure; 'upper center', 'lower center', 'center left' and 'center right' place it at the center of the corresponding edge; 'center' places it at the center of the axes/figure.

bbox_to_anchor is the box used to position the legend in conjunction with loc; it allows arbitrary placement of the legend. A 2-tuple (x, y) places the corner of the legend specified by loc at x, y.
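
One common use of this combination is to move the legend outside the plotting area. The following is a sketch under the assumption that the default coordinate system is used, in which (0, 0) is the lower-left and (1, 1) the upper-right corner of the axes. The data is made up.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 2, 3], "ro", label="red ones")
ax.plot([1, 2, 3], [3, 2, 1], "gs", label="fancy ones")

# loc picks which corner of the legend is anchored;
# bbox_to_anchor gives the point where that corner is placed.
# Here the legend's upper-left corner sits just right of the axes' top-right corner.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# Adjust the layout so the outside legend has room inside the figure
fig.tight_layout()

legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```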



In [None]:
plt.plot (x,y, "ro", color="red", label="red ones")
plt.plot(x2, y2, "bs", color="green", label = "fancy ones")
plt.xlabel ("my_x")
plt.ylabel ("my_y")

#manual placement of the legend
plt.legend(loc="lower left",bbox_to_anchor=(0, 0))

Let's take another example.

In [None]:
y = [10,210,54000,250, 8,3, 2400, 20,520, 230000, 25, 25000]

What if we skip x, and only use y for plotting?

In [None]:
plt.plot(y)              # plot y using x as index array 0..N-1

In [None]:
x = range(12)
plt.plot(x,y)

We can also limit the visible range of either axis.

In [None]:
x = range(12)
plt.plot(x,y)
plt.ylim(0, 250000)

We can also apply a scale type to an axis (yscale or xscale), for example linear or log.

In [None]:
plt.plot(x,y)  
plt.yscale("log")
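
To see the effect of the scale type, the sketch below draws the same values on a linear and on a logarithmic y-axis side by side. It uses the Axes methods `set_yscale` and `get_yscale`; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

values = [10, 210, 54000, 250, 8, 3, 2400, 20, 520, 230000, 25, 25000]

# Two Axes side by side: same data, different y-scale
fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_lin.plot(values)
ax_lin.set_title("linear scale")

ax_log.plot(values)
ax_log.set_yscale("log")      # Axes-level counterpart of plt.yscale("log")
ax_log.set_title("log scale")

print(ax_lin.get_yscale(), ax_log.get_yscale())
```

On the linear axis the two largest values dominate; on the log axis the small values remain distinguishable.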

### Exercise 3.1

1. Create a simple line plot with two lines comparing the following measurements.

In [None]:
weather_station_1 = [22,17,18,19,25,26,28,17,15,12]
weather_station_2 = [24,19,17,18,24,27,29,15,14,10]

2. Style this plot such that the line colors are red and blue, add a legend, and set the visible area of the y-axis from -5 to 35.

## Bar Charts and Histograms

The range() function returns a sequence of numbers, starting from 0 by default, incrementing by 1 by default, and stopping before a specified number.

Syntax: range(start, stop, step)

Let us create a list of numbers from 1 up to (but not including) 6, in increments of 2.

In [None]:
list(range(1,6,2))

The arange() function from numpy generates a numpy array with evenly spaced values.

In [None]:
list(np.arange(1,6,2))

In [None]:
list(np.arange(1,6)-0.2)
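
The difference between the two functions can be summarized in a short sketch: range() works with integers only, while np.arange() returns an array, accepts fractional steps and supports element-wise arithmetic. The variable names are illustrative.

```python
import numpy as np

# range() only accepts integers and produces a lazy sequence
ints = list(range(1, 6, 2))

# np.arange() returns a NumPy array and also accepts fractional steps
floats = np.arange(0, 1, 0.25)

# Arithmetic on the array applies to every element at once,
# which is handy for shifting bar positions
shifted = np.arange(1, 6) - 0.2

print(ints)
print(floats)
print(shifted)
```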

### Bar plot

Let us create a simple bar plot using bar() function, with some dummy data.

bar(x, y): the bars are positioned at x, a sequence of scalars, and y gives the heights of the bars, also a sequence of scalars.

In [None]:
y = [10, 30, 50, 20, 40, 80]
plt.bar(range(6), y)

Let us make a plot with multiple bars and add tick labels, and customize bar style (position, labels, width and color).

In [None]:
y = [10, 30, 50, 20, 40, 80]
y2 = [12, 32, 52, 22, 36, 56]
la = ["apples", "oranges", "melons", "peaches", "pears", "plums"]
plt.bar(np.arange(6) -0.2, y, width=0.4, color="blue")
plt.bar(np.arange(6), y, tick_label=la, width=0.0)        # zero-width (invisible) bars, drawn only to attach the tick labels
plt.bar(np.arange(6) + 0.2, y2, width=0.4, color="red")
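
An alternative to the zero-width bars is to set the tick positions and labels directly on the Axes. The following is one possible sketch using `set_xticks` and `set_xticklabels`; the names `y_a`, `y_b`, `fruit` and the legend labels are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

y_a = [10, 30, 50, 20, 40, 80]
y_b = [12, 32, 52, 22, 36, 56]
fruit = ["apples", "oranges", "melons", "peaches", "pears", "plums"]

positions = np.arange(len(fruit))   # one group per fruit
width = 0.4                         # width of a single bar

fig, ax = plt.subplots()
ax.bar(positions - width / 2, y_a, width=width, color="blue", label="series A")
ax.bar(positions + width / 2, y_b, width=width, color="red", label="series B")

# Put one tick in the middle of each pair of bars and label it
ax.set_xticks(positions)
ax.set_xticklabels(fruit)
ax.legend()

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
print(tick_texts)
```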

### Histogram


Let us generate some floating point data.

In [None]:
# np.random.randn(d0, d1, ..., dn) generates an array of that shape, filled with random floats
# sampled from a univariate "normal" (Gaussian) distribution of mean 0 and variance 1.
# We add an arbitrary float to shift the mean, so that most (but not necessarily all) generated numbers are positive.

x = np.random.randn(1000) +2.7
x

Let us now plot the above data in a histogram by computing and drawing the histogram of x.

In [None]:
plt.hist(x)

If we pass an integer as the bins argument, we can define the number of equal-width bins in the range.

The default value for bins is 10. (See the plot above!).

In [None]:
plt.hist(x, bins=50)

If we set the density argument to True, then we draw and return a probability density (note the y axis).

Each bin will display the bin's raw count divided by the total number of counts and the bin width.

In [None]:
plt.hist(x, bins=20, density=True)
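
The normalization can be checked numerically: with density=True, the bar heights multiplied by the bin widths should sum to 1. This sketch uses a seeded random generator and its own sample (both are assumptions made for repeatability, not part of the cell above).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)            # seeded generator, for repeatability
sample = rng.normal(loc=2.7, scale=1.0, size=1000)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(sample, bins=20, density=True)

# Area of each bar = height * bin width; the areas add up to 1
bin_widths = np.diff(edges)
total_area = np.sum(heights * bin_widths)
print(total_area)
```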

The rwidth argument sets the relative width of the bars as a fraction of the bin width.

In [None]:
plt.hist(x, bins=20, density = True, rwidth=0.8)

Let's add another dataset.

In [None]:
x2 = np.random.randn(2000) +1.4

Now we can plot both the datasets in the same histogram, and specify the color of the bars for both.

In [None]:
plt.hist([x,x2], bins=20, rwidth=0.8, color=["red", "blue"], density=False) 

### Exercise 3.2

Visualize the following election data with a paired bar chart

In [None]:
parties = ["CDU","SPD","AfD", "FDP", "DIE LINKE", "GRÜNE", "CSU", "Sonstige"]
result_2017 = [26.8,20.5,12.6,10.7,9.2,8.9,6.2,5.0]
result_2013 = [34.1,25.7,4.7,4.8,8.6,8.4,7.4,6.2]

## Scatter Plot

Let us generate some sample data to plot.
We create two data sequences of length 200 using random numbers.

In [None]:
x = np.concatenate([np.random.randn(100)/5 + 1, np.random.randn(100)/5 + 1.4])
y = np.concatenate([np.random.randn(100)/5 - 1, np.random.randn(100)/5 + 1])

In [None]:
plt.plot(x,range(200), 'o')

In [None]:
plt.plot(y,range(200), 'o') 

In [None]:
plt.plot(x,y, "o")   # here the first in x is plotted against first in y, second in x with second in y and so on

In [None]:
# We can do a very similar thing using the scatter function 
plt.scatter(x,y)
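
One practical difference between the two is that scatter() can vary marker size and color per point. A minimal sketch, with made-up data and illustrative names (`sizes`, `colors`):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
px = rng.normal(size=50)
py = rng.normal(size=50)

# Illustrative per-point attributes (made up for this sketch)
sizes = rng.uniform(10, 200, size=50)     # marker area in points^2
colors = px + py                          # any numeric value per point

fig, ax = plt.subplots()
points = ax.scatter(px, py, s=sizes, c=colors, cmap="viridis", alpha=0.7)

# The colorbar shows how the numeric values map to colors
fig.colorbar(points, ax=ax, label="x + y")
```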

### Exercise 3.3

Create a scatter plot for the following data.

In [None]:
age = [32,34,19,33,57,15,62,65,32,17]
income = [52000,27000,7000,23000,18500,50000,72000,22000,42000,0]

## Pie charts


Make a pie chart of an array (say x). 
The fractional area of each wedge is given by x/sum(x).

The wedges are plotted counterclockwise, by default starting from the x-axis.

In [None]:
proportions = [4,5,11,3]
labels = ["Apple", "Plums", "Bananas", "Peaches"]
plt.pie(proportions, labels=labels)

Details about various options: https://matplotlib.org/3.3.1/api/_as_gen/matplotlib.pyplot.pie.html

In [None]:
proportions = [4,5,11,3]
labels = ["Apple", "Plums", "Bananas", "Peaches"]
plt.pie(proportions, labels=labels, colors=["#999900", "r", "b", "g"], labeldistance=1.1, explode=[0.1,0,0,0], startangle=90, counterclock=False, autopct='%.0f%%')

In the above example we set the following options:

explode is a len(x) array which specifies the fraction of the radius with which to offset each wedge.

startangle is the angle by which the start of the pie is rotated, counterclockwise from the x-axis.

counterclock sets the direction in which the wedges are drawn, clockwise or counterclockwise.

autopct labels each wedge with its percentage share, formatted with the given format string. The label is placed inside the wedge.
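
To connect autopct with the x/sum(x) rule, the sketch below computes the fractions by hand and compares them with the labels that pie() generates. The variable names `fractions` and `percent_labels` are illustrative.

```python
import matplotlib.pyplot as plt

proportions = [4, 5, 11, 3]
labels = ["Apple", "Plums", "Bananas", "Peaches"]

# Share of each wedge, computed by hand: x / sum(x)
fractions = [p / sum(proportions) for p in proportions]

fig, ax = plt.subplots()
# With autopct set, pie() returns a third list holding the percentage labels
wedges, texts, autotexts = ax.pie(proportions, labels=labels, autopct="%.0f%%")

percent_labels = [t.get_text() for t in autotexts]
print([round(100 * f) for f in fractions])
print(percent_labels)
```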


### Exercise 3.4

Create two pie charts for the following election data

In [None]:
parties = ["CDU","SPD","AfD", "FDP", "DIE LINKE", "GRÜNE", "CSU", "Sonstige"]
result_2017 = [26.8,20.5,12.6,10.7,9.2,8.9,6.2,5.0]
result_2013 = [34.1,25.7,4.7,4.8,8.6,8.4,7.4,6.2]

## Annotations


In [None]:
#Return evenly spaced floating numbers over a specified interval (start, stop, number of samples)
x = np.linspace(0,10,100) 
y = np.sin(x)

In [None]:
plt.plot(x,y)

In [None]:
plt.plot(x,y)
plt.ylim(-1.5, 1.5)
plt.annotate('local max', xy=(0.6, 1.2), color="k", fontsize=15) #xytext defaults to xy

In [None]:
plt.plot(x,y)
plt.ylim(-1.5, 1.5)
plt.annotate('local max', xytext=(3, 1.2), xy=(1.65, 1), color="k", fontsize=15,arrowprops={'facecolor':'black'}) # xytext is the label position; the arrow points to xy
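
Instead of typing the coordinates by hand, the point to annotate can be computed from the data. The following sketch, with illustrative names, marks the largest sampled value found with np.argmax:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 10, 100)
ys = np.sin(xs)

# Index of the largest sampled value
i_max = np.argmax(ys)
x_max, y_max = xs[i_max], ys[i_max]

fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_ylim(-1.5, 1.5)

# xy is the point the arrow points at; xytext is where the label is written
note = ax.annotate("max", xy=(x_max, y_max), xytext=(x_max - 3, 1.3),
                   fontsize=15, arrowprops={"facecolor": "black"})
```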

In [None]:
plt.plot(x,y)
ax = plt.gca()
ax.set_ylim((-1.5, 1.5))


In [None]:
plt.plot(x,y)
plt.ylim(-1.5, 1.5)
plt.annotate('local max', xytext=(3, 1.2), xy=(1.65, 1), color="k", fontsize=15,arrowprops={'facecolor':'black'}) # xytext is the label position; the arrow points to xy
plt.savefig("myfig.jpg")

More savefig options available at: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.savefig.html
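
Two options that are often useful are dpi and bbox_inches. The sketch below saves a PNG into a temporary directory; the path and the variable names are made up for illustration.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# Write into a temporary directory rather than the working directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "myfig.png")   # the file extension selects the format

# dpi sets the resolution; bbox_inches="tight" trims surplus whitespace
fig.savefig(out_path, dpi=150, bbox_inches="tight")
print(os.path.getsize(out_path) > 0)
```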

# Plotting with Pandas

Pandas also has plotting functions directly integrated. These are basically shortcuts to matplotlib.

In [None]:
import pandas as pd

df = pd.read_csv ("adult.csv", na_values="?")
h = df.head(100)
h

Let us get the number of males and females

In [None]:
df.sex.value_counts()

Now let's try the Pandas plotting functionality. It is called plot().
We can specify different values of "kind" in plot(): 'line': line plot (default); 'bar': vertical bar plot; 'barh': horizontal bar plot;
'hist': histogram; 'pie': pie plot; 'scatter': scatter plot.
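
As a small illustration of the kind argument, the sketch below uses a made-up table (not the adult dataset) and shows that plot() returns an ordinary matplotlib Axes that can be customized further.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up table, only for illustration
toy = pd.DataFrame({"fruit": ["apples", "pears", "plums"],
                    "count": [10, 40, 80]})

# kind selects the plot type; the call returns a matplotlib Axes
ax = toy.plot(x="fruit", y="count", kind="barh", legend=False)

# Because it is an ordinary Axes, the usual matplotlib methods apply
ax.set_xlabel("count")
ax.set_title("Horizontal bar plot via pandas")
```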


Let's start by creating a bar plot for the number of males and females.

## Bar

In [None]:
df.sex.value_counts().plot(kind="bar")

What if we want to plot the proportions (relative frequencies) instead of the counts?

In [None]:
(df.sex.value_counts() / len(df)).plot(kind="bar")
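
The same result can usually be obtained with the normalize option of value_counts(). The sketch below uses a made-up Series without missing values; when a column contains missing values the two variants can differ, because value_counts() leaves them out by default while len() counts every row.

```python
import pandas as pd

# Made-up values standing in for a categorical column
s = pd.Series(["Male", "Female", "Male", "Male"])

by_division = s.value_counts() / len(s)          # manual division
by_normalize = s.value_counts(normalize=True)    # built-in equivalent

print(by_normalize)
ax = by_normalize.plot(kind="bar")
```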

## Histogram

Now let's try to plot the histogram of the ages.

In [None]:
df["age"].hist(bins=[20,25,30,35,40,50])

In [None]:
# since it's a shortcut to matplotlib, we can just add matplotlib parameters
df["age"].hist(bins=[20, 25,30,35,40,50], rwidth=0.5, color="red")

## Scatter

In [None]:
h.plot(x="age", y="hours-per-week", kind="scatter")

## Exercise 3.5

1. Plot a bar graph of the average age of males and females.

# Seaborn

Seaborn is another visualization library, built on top of matplotlib and closely integrated with pandas. It offers new styles, simple creation of more standardized plots and easy access from pandas.


In [None]:
import seaborn as sns

In [None]:
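# factorplot was renamed to catplot in newer seaborn versions; kind="point" keeps the old default
sns.catplot(x="sex", y="age", data=df, kind="point")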

In [None]:
sns.catplot(x="race", y="age", hue="sex", data=df, kind="bar")

In [None]:
sns.catplot(x="race", y="age", hue="sex", data=df, kind="bar",  aspect=3)

In [None]:
sns.regplot(x="age", y="hours-per-week", data=df.head(1000))  # x and y passed as keywords

In [None]:
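# distplot is deprecated; histplot with kde=True and stat="density" gives a similar picture
sns.histplot(df.age, bins=20, kde=True, stat="density")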

In [None]:
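c = df.corr(numeric_only=True)  # restrict to numeric columns (assumes a pandas version that supports numeric_only)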
c

In [None]:
sns.heatmap(c)

In [None]:
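# lmplot creates its own figure, so its size is set with height and aspect
# (the size parameter was renamed to height in newer seaborn versions)
g = sns.lmplot(x="age", y="hours-per-week", hue="sex",
               truncate=True, height=5, aspect=2, data=h)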

In [None]:
sns.jointplot(x=df.age, y=df["education-num"], kind="kde", color="#4CB391")

In [None]:
sns.pairplot(data=df.sample(1000), hue="sex")

In [None]:
f = plt.figure()
plt.gca().plot(x,y)

In [None]:
f.suptitle("Hallo")
f

# Optional

## Scatter plots

In [None]:
x = np.concatenate([np.random.randn(10000)/5 + 1, np.random.randn(10000)/5 + 1.4])
y = np.concatenate([np.random.randn(10000)/5 - 1, np.random.randn(10000)/5 + 1])

In [None]:
plt.scatter(x,y, color="blue", alpha=0.005)

In [None]:
x = np.concatenate([np.random.randint(5, size=1000), np.random.randint(5, size=20) + 3])
y = np.concatenate([np.random.randint(5, size=1000), np.random.randint(5, size=20) -1])

In [None]:
x[:40]

In [None]:
x[-40:]

In [None]:
len(x)

In [None]:
plt.scatter (x,y)

In [None]:
# add "jitter"
x_jit = x + np.random.randn(len(x)) * 0.1
y_jit = y + np.random.randn(len(y)) * 0.1
plt.scatter(x_jit,y_jit, alpha= 0.2)

## Style Sheets

The style package adds support for easy-to-switch plotting "styles". There are a number of pre-defined styles provided by Matplotlib.

Reference: https://matplotlib.org/3.1.0/gallery/style_sheets/style_sheets_reference.html

Let's display all available styles.

In [None]:
print(plt.style.available)

For example, there's a pre-defined style called "ggplot", which emulates the aesthetics of ggplot (a popular plotting package for R).

In [None]:
plt.style.use('ggplot')
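
Note that plt.style.use() changes the style for all following plots. If a style should only apply to a single plot, one option is the context manager plt.style.context(). A minimal sketch, with illustrative variable names:

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.facecolor"]

# Apply a style only inside the with-block instead of globally
with plt.style.context("dark_background"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.bar(range(3), [10, 30, 50])

after = plt.rcParams["axes.facecolor"]   # restored once the block ends
print(before, inside, after)
```

To go back to matplotlib's standard look after a global change, plt.style.use('default') can be used.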

In [None]:
y = [10, 30, 50, 20, 40, 80]
y2 = [12, 32, 52, 22, 36, 56]
l = ["apples", "oranges", "melons", "peaches", "pears", "plums"]
plt.bar(np.arange(6), y, tick_label=l, width=0.4)

In [None]:
plt.style.use('dark_background')
y = [10, 30, 50, 20, 40, 80]
y2 = [12, 32, 52, 22, 36, 56]
l = ["apples", "oranges", "melons", "peaches", "pears", "plums"]
plt.bar(np.arange(6), y, tick_label=l, width=0.4)

In [None]:
plt.style.use('bmh')
y = [10, 30, 50, 20, 40, 80]
y2 = [12, 32, 52, 22, 36, 56]
l = ["apples", "oranges", "melons", "peaches", "pears", "plums"]
plt.bar(np.arange(6), y, tick_label=l, width=0.4)

In [None]:
plt.style.use('classic')
y = [10, 30, 50, 20, 40, 80]
y2 = [12, 32, 52, 22, 36, 56]
l = ["apples", "oranges", "melons", "peaches", "pears", "plums"]
plt.bar(np.arange(6), y, tick_label=l, width=0.4)

## Subplots

In [None]:
#plt.style.use('classic')
x= [1,2,3,4]
y= [1,4,9,16]

plt.figure(figsize=(10,4))  # an empty figure with no axes

In [None]:
plt.plot (x,y)

In [None]:
plt.subplots(3)

In [None]:
_, ax_lst = plt.subplots(3) 

In [None]:
type(_)

In [None]:
ax_lst

In [None]:
type(ax_lst)

In [None]:
for i,ax in enumerate(ax_lst):
    print(i,ax)

In [None]:
x = [1,2,3,4]
y =  [[1,2,3,4],[1,4,9,16],[1,10,20,25]]
_, ax_lst = plt.subplots(3, figsize=(5,10))
for i,ax in enumerate(ax_lst):
    ax.plot(x,y[i])

In [None]:
x = [1,2,3,4]
y =  [[1,2,3,4],[1,4,9,16],[1,10,20,25]]
_, ax_lst = plt.subplots(3, figsize=(5,10))
for i,ax in enumerate(ax_lst):
    ax.plot(x,y[i])
    ax.set_ylim((0,25))
    ax.set_ylabel ("my y")
    if (i==1):
        ax.grid(True)

In [None]:
fig = plt.figure(figsize=(4,6))
ax1 = fig.add_subplot(211) # short for ...add_subplot(2,1,1), which means: 2x1 grid, subplot number 1
ax1.plot([1,2,3,4],[1,2,10,3])
ax2 = fig.add_subplot(212, sharex=ax1, sharey=ax1) # short for ...add_subplot(2,1,2), which means: 2x1 grid, subplot number 2
ax2.plot([1,2,3,3.5],[1,2,5,3])

Non-object-oriented:

In [None]:
plt.figure(figsize=(4,6))
plt.subplot(211) 
plt.plot([1,2,3,4],[1,2,10,3])
plt.subplot(212,  sharex=plt.gca(), sharey=plt.gca()) 
plt.plot([1,2,3,3.5],[1,2,5,3])
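
Both variants can also be condensed with plt.subplots(), which creates the whole grid in one call. The following sketch assumes the same 2x1 layout and uses illustrative names (`ax_top`, `ax_bottom`):

```python
import matplotlib.pyplot as plt

# 2 rows x 1 column, with both axis ranges shared between the subplots
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(4, 6),
                                        sharex=True, sharey=True)
ax_top.plot([1, 2, 3, 4], [1, 2, 10, 3])
ax_bottom.plot([1, 2, 3, 3.5], [1, 2, 5, 3])

# Shared axes report the same limits
same_x = ax_top.get_xlim() == ax_bottom.get_xlim()
same_y = ax_top.get_ylim() == ax_bottom.get_ylim()
print(same_x, same_y)
```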

In [None]:
fig = plt.figure(figsize=(5,3))
ax = fig.add_subplot(111) # short for ...add_subplot(1,1,1), which means: 1x1 grid, subplot number 1
axis = ax.xaxis
axis.set_ticks ([0.5,1.5,3,4])
axis.set_ticklabels (["Zero", "one.5", "three", "four"])

In [None]:
fig = plt.figure(figsize=(5,3))
ax = fig.add_subplot(111) # short for ...add_subplot(1,1,1), which means: 1x1 grid, subplot number 1
axis = ax.xaxis
axis.set_ticks ([1,1.5,3,4])
axis.set_ticklabels (["1", "one.5", "three", "four"])
for label in axis.get_ticklabels():
    label.set_color('red')
    label.set_rotation(45)
    label.set_fontsize(16)
rect = ax.patch
rect.set_facecolor('#991111')
#ax.set_xlim((-1.5, 4))
ax.plot([1,2,3,4],[0,2,3,4], color="k")

In [None]:
plt.figure(figsize=(5,3))
plt.subplot(111, facecolor='#991111') # short for plt.subplot(1,1,1), which means: 1x1 grid, subplot number 1
plt.xticks ([1,1.5,3,4], ["1", "one.5", "three", "four"])
for label in plt.gca().xaxis.get_ticklabels():
    label.set_color('red')
    label.set_rotation(45)
    label.set_fontsize(16)
plt.plot([1,2,3,4],[0,2,3,4], color="k")

# Further reading

1. https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python
