<font color = green >

# Matplotlib
</font>

In [1]:
import matplotlib.pyplot as plt
import numpy as np 
# Note:  Using pylab is now discouraged

***
<font color = green >

## Setup backend
</font>


A backend is an abstraction layer that knows how to interact with the operating environment (an operating system, or an environment like the browser) and how to render matplotlib commands.
<br>

*Note: some backends don't support all features, particularly interactive ones.*

In [2]:
import matplotlib as mpl
mpl.get_backend() 
# Output with older ipykernel versions: 'module://ipykernel.pylab.backend_inline'

'module://matplotlib_inline.backend_inline'

Set up the matplotlib backend with the magic command
`%matplotlib notebook` (interactive figures in the classic notebook):


In [6]:
%matplotlib notebook
# %matplotlib inline
mpl.get_backend() 
# Output : 'nbAgg'

'nbAgg'

<font color = green >

### Working on macOS
</font>

If you get this error on macOS:
<br>`RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.`
<br>switch to the Tk backend before importing `matplotlib.pyplot`:
<br>`import matplotlib`
<br>`matplotlib.use('TkAgg')`
<br>
`print (matplotlib.get_backend())`
<br>
`# out: TkAgg`

Also call `plt.show()` after configuring the plot; outside a notebook this is what opens the window.
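On a machine without any display (a server, a CI job), the framework error can be avoided entirely by choosing the non-interactive `Agg` backend and writing images instead of opening windows. The sketch below assumes that image output is all you need; `render_png` is a hypothetical helper name.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive raster backend; select it before importing pyplot
import matplotlib.pyplot as plt


def render_png(xs, ys):
    """Hypothetical helper: draw a line plot and return it as PNG bytes."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys, "-o")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # no window is opened, so plt.show() is not needed
    plt.close(fig)                  # release the figure; pyplot keeps a reference otherwise
    return buf.getvalue()


png = render_png([1, 2, 3], [3, 1, 2])
print(matplotlib.get_backend())  # name of the active backend
print(png[:8])                   # PNG files start with b'\x89PNG\r\n\x1a\n'
```

In a notebook you would not call `matplotlib.use('Agg')`, since it replaces the interactive backend selected with `%matplotlib`.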

***
<font color = green >

## Matplotlib layers 
</font>

* `backend layer` deals with the actual drawing
* `artist layer` sits on top of the backend and describes how data is arranged (aka the `matplotlib API`)
* `scripting layer` (`pyplot`) creates artists and choreographs them all together
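One way to see the three layers is to start from a single `pyplot` call and walk down from the objects it returns. This is a minimal sketch; the printed class names can differ between Matplotlib versions (for example `AxesSubplot` in older releases versus `Axes` in newer ones).

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.backend_bases import FigureCanvasBase

# Scripting layer: one pyplot call creates the figure, the axes and the line for us
fig = plt.figure()
(line,) = plt.plot([0, 1, 2], [0, 1, 4])

# Artist layer: everything pyplot created is an Artist we can inspect and modify
ax = line.axes
for artist in (line, ax, fig):
    print(type(artist).__name__, isinstance(artist, Artist))

# Backend layer: the canvas that actually renders the figure
print(type(fig.canvas).__name__, isinstance(fig.canvas, FigureCanvasBase))
plt.close(fig)
```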

***
<font color = green >

## plot 
</font>

In [7]:
plt.plot(1,3) # a single point with no marker: nothing is visible
plt.plot(1.5,0, '.') # small point
plt.plot(1.5,4,'x') # x mark
plt.plot(2,2,'o') # circle
plt.plot([1,3,5],[3,5,-1],'-o', c= 'red') # points joined by a line
# Note:
# each call is treated as a separate series, so it gets the next color in the cycle unless a color is given
# this is the scripting layer


<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x21b35c9c590>]
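The third positional argument in the calls above (`'.'`, `'x'`, `'-o'`) is a format string combining color, marker and line style. The sketch below checks that the short form `'r--o'` sets the same properties as the equivalent keyword arguments.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Format string: color 'r', line style '--', marker 'o'
(compact,) = ax.plot([1, 2, 3], [3, 1, 2], "r--o")
# The same line written with explicit keyword arguments
(explicit,) = ax.plot([1, 2, 3], [3, 1, 2], color="r", linestyle="--", marker="o")

for line in (compact, explicit):
    print(line.get_color(), line.get_linestyle(), line.get_marker())
plt.close(fig)
```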

In [8]:
import numpy as np 
plt.figure()
y= np.arange(0,10,2)
plt.plot(y,'o-')
# Note: only y is given, so x defaults to 0, 1, ..., len(y) - 1

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x21b35cf3090>]
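A quick way to confirm which x-values were filled in is to ask the returned `Line2D` for its data; a minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
y = np.arange(0, 10, 2)       # [0, 2, 4, 6, 8]
(line,) = ax.plot(y, "o-")    # only y is passed
print(line.get_xdata())       # x defaults to 0, 1, ..., len(y) - 1
plt.close(fig)
```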

<font color = green >

### Artist layer sample
</font>

In [9]:
import os
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure
# create new figure
fig = Figure()
# associate fig with the Agg backend canvas
canvas = FigureCanvasAgg(fig)
# add subplot to figure
ax = fig.add_subplot(111)
ax.plot(3,2,'.')
# create the png file (the img/ folder has to exist first)
os.makedirs('img', exist_ok=True)
canvas.print_png('img/test.png') # the notebook backend can't display this canvas directly, so the file is shown via HTML below

In [6]:
%%html
<img src = "img/test.png">

<font color = green >

### gca (get current axis)
</font>

In [7]:
# scripting layer 
plt.figure()
plt.title('simple_plot')
plt.plot(1,2,'x')

# go down to artist layer 
ax= plt.gca() # get current axis 
ax.axis([0,3,-2,5])

<IPython.core.display.Javascript object>

[0, 3, -2, 5]

In [54]:
ax.get_children()
# Line2D is the data points
# spines are the borders
# XAxis and YAxis hold the ticks and axis labels
# the three Text objects are the center, left and right titles
# Rectangle is the background patch of the axes

[<matplotlib.lines.Line2D at 0x1146fcd30>,
 <matplotlib.spines.Spine at 0x1146fce48>,
 <matplotlib.spines.Spine at 0x1146fc780>,
 <matplotlib.spines.Spine at 0x1146fcc88>,
 <matplotlib.spines.Spine at 0x1146fcc50>,
 <matplotlib.axis.XAxis at 0x1146f3c50>,
 <matplotlib.axis.YAxis at 0x1146f35c0>,
 Text(0.5,1,'simple_plot'),
 Text(0,1,''),
 Text(1,1,''),
 <matplotlib.patches.Rectangle at 0x1133d69e8>]
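Rather than reading the whole `get_children()` list, you can filter it by type, or use the named attributes the Axes already exposes (`ax.spines`, `ax.title`). A small sketch:

```python
import matplotlib.pyplot as plt
from matplotlib.spines import Spine
from matplotlib.text import Text

fig = plt.figure()
plt.title("simple_plot")
plt.plot(1, 2, "x")
ax = plt.gca()

children = ax.get_children()
spines = [c for c in children if isinstance(c, Spine)]   # the four borders
texts = [c for c in children if isinstance(c, Text)]     # center, left and right titles
print(len(spines), [t.get_text() for t in texts])

# Named access is usually more convenient than filtering by type
print(list(ax.spines.keys()), ax.title.get_text())
plt.close(fig)
```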

<font color = green >

### Multi axes
</font>

In [8]:
plt.figure()
ax1 = plt.subplot(1, 2, 1)
linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exponential_data = linear_data ** 2
ax1.plot(linear_data, '-o')
ax2 = plt.subplot(1, 2, 2, sharey=ax1) # Note shared y axis
ax2.plot(exponential_data, '-o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x10888dd68>]

In [84]:
plt.gcf().get_axes() 

[<matplotlib.axes._subplots.AxesSubplot at 0x117324860>,
 <matplotlib.axes._subplots.AxesSubplot at 0x11764ce48>]
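With `sharey=ax1`, both panels use one y-range, so changing the limits on one of them changes the other. The sketch below, using the same data as the cell above, checks this explicitly.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2, sharey=ax1)   # share the y-axis with ax1
linear_data = np.arange(1, 9)
ax1.plot(linear_data, "-o")
ax2.plot(linear_data ** 2, "-o")

ax1.set_ylim(0, 20)       # change the limits on one axes...
print(ax2.get_ylim())     # ...and the shared axes follows
print(len(fig.get_axes()))
plt.close(fig)
```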

<font color = green >

### sca (set current axis)
</font>

In [9]:
print (plt.gca())
plt.sca(ax1) # set current axis as active
# print (plt.gca())

AxesSubplot(0.547727,0.11;0.352273x0.77)


<font color = green >

### gcf (get current figure)
</font>

In [10]:
plt.gcf().canvas

<matplotlib.backends.backend_nbagg.FigureCanvasNbAgg at 0x10849dd68>

<font color = green >

### Structure summary 
</font>

<b>scripting layer - for most usage. Note: it keeps track of the current figure and current axes</b>
<br>&emsp;&emsp;`plt`
<br>&emsp;&emsp;&emsp;&emsp;how to get: import matplotlib.pyplot as plt 
<br>&emsp;&emsp;&emsp;&emsp;samples: `plt.plot, plt.title, plt.show, plt.text, plt.tick_params`, etc.
<br><b>artist layer - use for more complicated plots / for fine-tuning </b>
<br>&emsp;&emsp;`axes` 
<br>&emsp;&emsp;&emsp;&emsp;how to get: fig.add_subplot() / plt.gca() / plt.gcf().get_axes()
<br>&emsp;&emsp;&emsp;&emsp;samples: ax.spines, ax.axis, ax.get_xbound, ax.arrow, ax.set_title, ax.set_ylabel, etc.
<br>&emsp;&emsp;`figure` 
<br>&emsp;&emsp;&emsp;&emsp;how to get: from matplotlib.figure import Figure / plt.figure / plt.gcf()
<br>&emsp;&emsp;&emsp;&emsp;samples: FigureCanvasAgg(fig), fig.add_subplot, fig.savefig, fig.canvas, etc.
<br><b>backend layer - use for configuration / for fine-tuning </b>
<br>&emsp;&emsp;`canvas` 
<br>&emsp;&emsp;&emsp;&emsp;how to get: import matplotlib as mpl / fig.canvas
<br>&emsp;&emsp;&emsp;&emsp;samples: mpl.use, mpl.get_backend, fig.canvas.draw, fig.canvas.mpl_connect, etc. (helper modules such as mpl.cm, mpl.colors and mpl.colorbar are also reached through mpl)

*** 
<font color = green >

## Major drawing types  
</font>


<font color = green >

### Scatter
</font>

In [11]:
# Use scatter when the order of the points doesn't matter

# create some series for x-vals 
x= np.arange (20)

# create some series for y-vals 
y=x.copy()
np.random.shuffle(y)

# create new figure
plt.figure()

# draw 2 scatters  
plt.scatter(x[:5],y[:5],c = 'g',s = 100,label= 'low')  
plt.scatter(x[5:],y[5:],c = 'r',s = 50,label= 'high')

plt.title('Scatterplot')

plt.legend(loc = 4, title= 'legend title') # loc=4 is 'lower right'
plt.xlabel('x-labels')
plt.ylabel('y-labels')



<IPython.core.display.Javascript object>

Text(0,0.5,'y-labels')
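When `c=` receives numbers instead of a color name, `scatter` maps them through a colormap, and the returned `PathCollection` can be passed to `colorbar`. A sketch, assuming a seeded NumPy generator in place of `np.random.shuffle`:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(20)
y = rng.permutation(20)    # shuffled copy of 0..19

fig, ax = plt.subplots()
# c= takes numbers here, so colors come from the colormap instead of a fixed color
sc = ax.scatter(x, y, c=y, cmap="viridis", s=50)
fig.colorbar(sc, ax=ax, label="y value")
print(sc.get_offsets().shape)   # one (x, y) row per point
plt.close(fig)
```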


<font color = green >

### Plot 
</font>

In [12]:
# Use plot when the order of the points matters (e.g. a series over time)
plt.figure() 
x = np.linspace(0,7,100)
sin_x = np.sin(x)
plt.plot(x, sin_x, '.')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x108e3df28>]


<font color = green >

#### fill_between
</font>

In [13]:
plt.figure()
linear_data = np.arange(10)
exponential_data = linear_data ** 2
plt.plot(linear_data, '-og')
plt.plot(exponential_data, '--ob')
plt.fill_between(range(10),linear_data, exponential_data, facecolors='yellow',alpha=0.15)  

<IPython.core.display.Javascript object>

<matplotlib.collections.PolyCollection at 0x109106438>
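`fill_between` can also shade only part of the range through its `where=` argument. In this sketch the series `3x` is a hypothetical stand-in chosen so that it starts above `x**2` and then falls below it; `interpolate=True` extends the shading to the exact crossing point.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)
linear_data = x * 3          # hypothetical series: above x**2 for x <= 3, below afterwards
quadratic_data = x ** 2

fig, ax = plt.subplots()
ax.plot(x, linear_data, "-og", label="3x")
ax.plot(x, quadratic_data, "--ob", label="x^2")
# Shade only where the first curve is at or above the second
poly = ax.fill_between(x, linear_data, quadratic_data,
                       where=linear_data >= quadratic_data,
                       interpolate=True, color="yellow", alpha=0.3)
ax.legend()
plt.close(fig)
```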


<font color = green >

### Barchart 
</font>

In [17]:
plt.figure()
linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
x_vals = np.arange(len(linear_data))
plt.bar(x_vals, linear_data)

<IPython.core.display.Javascript object>

<BarContainer object of 8 artists>
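To compare two series side by side, each series can be shifted by half a bar width around the tick position. A minimal sketch of grouped bars, reusing `linear_data` and `quadratic_data` from above; the width of `0.4` is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
x_vals = np.arange(len(linear_data))
width = 0.4  # two bars per group leave a gap of 1 - 2 * width between groups

fig, ax = plt.subplots()
# Shift each series half a bar width to the left / right of the tick
bars_lin = ax.bar(x_vals - width / 2, linear_data, width, label="linear")
bars_quad = ax.bar(x_vals + width / 2, quadratic_data, width, label="quadratic")
ax.set_xticks(x_vals)
ax.legend()
plt.close(fig)
```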


<font color = green >

#### Horizontal barchart 
</font>

In [18]:
plt.figure()
plt.barh(x_vals, linear_data, height=0.3)

<IPython.core.display.Javascript object>

<BarContainer object of 8 artists>


<font color = green >

#### Difference on barchart 
</font>

In [19]:
plt.figure()
plt.barh(x_vals, linear_data, height=0.7)
n = len(x_vals)
added_const = np.array([2]* n)
# plt.barh(x_vals, added_const, height=0.7, color='orange', left=linear_data, alpha = .6)
plt.barh(x_vals, quadratic_data, height=0.7, color='orange', left=linear_data, alpha = .6)

<IPython.core.display.Javascript object>

<BarContainer object of 8 artists>
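The `left=` trick above also works for vertical bars, where the corresponding argument of `bar` is `bottom=`. A sketch of the same stacked layout drawn vertically:

```python
import numpy as np
import matplotlib.pyplot as plt

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
quadratic_data = linear_data ** 2
x_vals = np.arange(len(linear_data))

fig, ax = plt.subplots()
ax.bar(x_vals, linear_data, width=0.7, label="linear")
# bottom= is the vertical counterpart of left= in barh:
# each orange bar starts where the blue bar ends
top = ax.bar(x_vals, quadratic_data, width=0.7, bottom=linear_data,
             color="orange", alpha=0.6, label="quadratic")
ax.legend()
plt.close(fig)
```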

<font color = green >

### Histogram
</font>

In [20]:
fig = plt.figure()
ax1 = fig.add_axes([0,0,0.95,.95])    # [left, bottom, width, height] as fractions of the figure
ax2 = fig.add_axes([0.5,0.1,0.4,0.4]) # smaller inset axes
sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
ax1.hist(sample) # hist uses 10 bins by default
ax1.set_title('bins=10')

ax2.hist(sample, bins=100)
ax2.set_title('bins=100')

<IPython.core.display.Javascript object>

Text(0.5,1,'bins=100')
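`hist` also returns what it computed, which is handy when you need the numbers and not only the picture. A sketch using a seeded generator (an assumption, so the run is reproducible):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=10000)

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn bar patches
counts, edges, patches = ax.hist(sample, bins=100)
print(len(counts), len(edges))   # 100 counts need 101 edges
print(counts.sum())              # every sample falls into some bin

# The same numbers without drawing anything
np_counts, np_edges = np.histogram(sample, bins=100)
print(np.array_equal(counts, np_counts))
plt.close(fig)
```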

<font color = green >

### Heatmap
</font>

In [21]:
plt.figure()

x = np.random.random(size=10000)
y = np.random.normal(loc=0.0, scale=1.0, size=10000)

plt.hist2d(x, y, bins=50)

plt.colorbar()

<IPython.core.display.Javascript object>

<matplotlib.colorbar.Colorbar at 0x10a8c4d30>
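Like `hist`, `hist2d` returns its counts together with the artist it drew; that artist (a `QuadMesh`) is what `colorbar` needs when you are not relying on the current image. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.random(10000)
y = rng.normal(loc=0.0, scale=1.0, size=10000)

fig, ax = plt.subplots()
# hist2d returns the 2-D count array, both sets of edges and the mesh it drew
counts, xedges, yedges, image = ax.hist2d(x, y, bins=50)
fig.colorbar(image, ax=ax)   # the fourth return value is what colorbar needs
print(counts.shape, counts.sum())
plt.close(fig)
```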

***
<font color = green >

## Subplots
</font>

<font color = green >

### subplot
</font>

In [22]:
plt.figure()
vals = np.arange(0.0, 3.0, 0.01)
ax2 = plt.subplot(2,3,3) # use the 3rd cell (top right) of a grid with 2 rows and 3 columns
ax2.plot(vals, c='g')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x10ad97b38>]

In [79]:
plt.figure()
ax1 = plt.subplot(212)  # 2 rows x 1 column, use the 2nd (bottom) cell
ax1.plot(vals, c='r')

ax2 = plt.subplot(221) # 2 x 2 grid, use the 1st cell
ax2.plot(vals, c='g')

ax3 = plt.subplot(222) # 2 x 2 grid, use the 2nd cell
ax3.plot(vals, c='b')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x1190b45f8>]
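The three-number form `subplot(nrows, ncols, index)` counts cells row by row, starting at 1 in the top-left corner. The hypothetical helper `subplot_cell` below spells out that rule, and the sketch cross-checks it against the `SubplotSpec` Matplotlib stores on the axes (the `rowspan`/`colspan` properties assume Matplotlib 3.2 or newer).

```python
import matplotlib.pyplot as plt


def subplot_cell(nrows, ncols, index):
    """Hypothetical helper: 0-based (row, col) of a 1-based subplot index, counted row by row."""
    return (index - 1) // ncols, (index - 1) % ncols


print(subplot_cell(2, 3, 3))   # third cell: first row, last column -> (0, 2)
print(subplot_cell(2, 3, 5))   # fifth cell: second row, middle column -> (1, 1)

# Cross-check against matplotlib's own bookkeeping
fig = plt.figure()
ax = plt.subplot(2, 3, 5)
spec = ax.get_subplotspec()
print(spec.rowspan.start, spec.colspan.start)
plt.close(fig)
```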

<font color = green >

### subplots
</font>

In [23]:
# Note: plt.subplots returns the figure as the first element of the tuple, followed by the axes
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3, sharex=True, sharey=True) # Note: this creates a new figure
vals = np.arange(0.0, 3.0, 0.2)
plt.plot(np.random.permutation(vals), '-x') # Note: the last axis is active 
ax2.plot(vals, '.')
ax5.plot(np.random.permutation(vals), '-')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x10b3712b0>]
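Unpacking every axes by name gets verbose for larger grids. Without unpacking, `plt.subplots` returns a NumPy array of axes that can be looped over; a sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

# Without unpacking, subplots returns the figure and a 2-D NumPy array of Axes
fig, axs = plt.subplots(2, 3, sharex=True, sharey=True)
print(axs.shape)

vals = np.arange(0.0, 3.0, 0.2)
for i, ax in enumerate(axs.flat):   # row by row: axs[0, 0], axs[0, 1], ...
    ax.plot(vals * (i + 1), ".")
    ax.set_title(f"cell {i + 1}")

# A single row or column gives a 1-D array; squeeze=False always keeps it 2-D
fig2, row = plt.subplots(1, 3)
fig3, grid = plt.subplots(1, 3, squeeze=False)
print(row.shape, grid.shape)
plt.close("all")
```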

<font color = green >

### gridspec
</font>

In [24]:
import matplotlib.gridspec as gridspec  
x = np.random.random(size=1000)
y = np.random.randn(1000)

plt.figure()  # new figure
gspec = gridspec.GridSpec(3, 3)

# get necessary axes
ax_top_histogram = plt.subplot(gspec[0, 1:])  # this returns axis
ax_side_histogram = plt.subplot(gspec[1:, 0])
ax_lower_right = plt.subplot(gspec[1:, 1:])

# draw on each axes
ax_lower_right.scatter(x, y, s= 5)  # regular scatter
ax_top_histogram.hist(x, bins=100)  # Note: density=True (normed=True in old versions) rescales the bars so their total area is 1
ax_side_histogram.hist(y, bins=100, orientation='horizontal')
ax_side_histogram.invert_xaxis()  # flip the histogram for more natural view 


<IPython.core.display.Javascript object>
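The same marginal-histogram layout can be built with unequal column and row sizes through `width_ratios`/`height_ratios`, with the histograms sharing limits with the scatter plot. A sketch under those assumptions, with arbitrary 1:3 ratios:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.standard_normal(1000)

fig = plt.figure()
# 2 x 2 grid: narrow left column and short top row for the marginal histograms
gspec = gridspec.GridSpec(2, 2, figure=fig, width_ratios=[1, 3], height_ratios=[1, 3])
ax_top = fig.add_subplot(gspec[0, 1])
ax_side = fig.add_subplot(gspec[1, 0])
ax_main = fig.add_subplot(gspec[1, 1], sharex=ax_top, sharey=ax_side)

ax_main.scatter(x, y, s=5)
ax_top.hist(x, bins=50, density=True)                 # density=True replaces the removed normed=True
ax_side.hist(y, bins=50, orientation="horizontal", density=True)
ax_side.invert_xaxis()                                # bars grow towards the scatter plot
plt.close(fig)
```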

***
<font color = green >

## Animation
</font>

In [6]:
from matplotlib import animation
fig, ax = plt.subplots()

def update(curr):
    if curr > 10:
        a.event_source.stop()  # stop the timer once the frame counter passes 10
    ax.clear()
    ax.plot(np.random.rand(10))
    ax.set_title('Animation: {}'.format(curr))  # set the title on every frame, since ax.clear() removes it
#     ax.set_ylim(0, 1)


a = animation.FuncAnimation(fig, update, interval=500)  # Note: keep this reference; if the animation object is garbage-collected, the animation stops
# Note: update() also uses this name to stop the animation

<IPython.core.display.Javascript object>
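Because `FuncAnimation` is driven by a timer, it needs an interactive backend to play live. A finite animation can instead be rendered ahead of time into a self-contained HTML/JavaScript player with `to_jshtml()`; the sketch below assumes five frames are enough.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

fig, ax = plt.subplots()
rng = np.random.default_rng(0)


def update(frame):
    ax.clear()
    ax.plot(rng.random(10))
    ax.set_title(f"Animation: {frame}")   # set again because clear() removes it


# frames=5 makes the animation finite, so it can be rendered ahead of time
anim = animation.FuncAnimation(fig, update, frames=5, interval=500)
html = anim.to_jshtml()   # HTML/JavaScript player with the frames embedded
plt.close(fig)
```

In Jupyter, the resulting string can be shown with `IPython.display.HTML(html)`.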

***
<font color = green >

## Interaction
</font>

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
import matplotlib as mpl

def calc_conf_interval(data):
    return st.t.interval(0.95, len(data) - 1, loc=np.mean(data), scale=st.sem(data))

def onclick(event): # create event handler
    print (event.xdata)
    plt.gca().clear()
    update_plt(event.ydata)
    plt.gcf().canvas.draw() # Note: this is required to refresh the current window

def calc_prob(threshold, data_column):
    conf_interval = calc_conf_interval(data_column)
    data_of_interval = data_column[(conf_interval[0] <= data_column) & (data_column < conf_interval[1])]
    data_larger = data_of_interval[threshold < data_of_interval]
    return len(data_larger)/len(data_of_interval) # share of the values inside the confidence interval that are larger than the threshold


def update_plt(y_line=None):
    df_mean = df.mean()  # get mean of every column
    x_vals = np.arange(len(df.columns))

    if y_line is None: # initial draw, before any mouse click
        y_line= df_mean.mean()

    array_probs = np.array([calc_prob(y_line, df[year_index]) for year_index in df.columns])
    colors = np.array([cmap(array_probs[i]) for i in range(len(df.columns))])  # range 0-1
    bars = plt.bar(x_vals, df_mean, width=1, color = colors, alpha=0.9, edgecolor='black') # the error bars are drawn manually in the loop below
    for i in range(len(df.columns)):
        conf_interval = calc_conf_interval(df[df.columns[i]])
        plt.plot([i, i], conf_interval, '-', color='black')  # draw the vertical line
        for boundary in conf_interval:
            plt.plot([i- 1/4, i + 1/ 4], [boundary, boundary], '-', color = 'black')


    plt.xticks(x_vals, df.columns, color='black', alpha=0.7)
    plt.tick_params(bottom=False)
    plt.yticks(color='black', alpha=0.7)

    max_bar = np.max(df_mean) # tallest bar, used to add headroom above and below the bars
    ax.axis([-0.6, 4.4, -max_bar*0.2,max_bar*1.3]) # correct plot position
    ax.spines['bottom'].set_position(('data', 0)) # move the x-axis to 0 point

    xmin, xmax = ax.get_xbound() # x bounds, used to place the threshold label
    ax.axhline(y=y_line, linewidth=2, color='grey') # threshold line across the full width (axhline's xmin/xmax are axes fractions, not data values)
    y_template = df_mean[df.columns[0]]/20
    ax.text(xmax, y_line +y_template/5,str(int(y_line)), ha= 'right', va= 'bottom',color= 'black',fontsize= 10,bbox=dict(facecolor='none', edgecolor='gray', pad=2.0))
    ax.arrow(3.70, y_line+ y_template, 0.0, y_template*3, fc="k", ec="k", head_width=0.1, head_length=y_template*2)
    ax.arrow(3.70, y_line - y_template, 0.0, -y_template*3, fc="k", ec="k",
             head_width=0.1, head_length=y_template*2)

    ax.spines['top'].set_visible(False)  # remove the border top
    ax.spines['right'].set_visible(False)  # remove the border right


    bounds = np.linspace(0, 1, 12) # set the bound for color bar
    norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
    cb = mpl.colorbar.ColorbarBase(ax_cb, cmap=cmap, norm=norm,drawedges=True,  ticks=bounds, boundaries=bounds,format='%.2f',orientation='horizontal')

np.random.seed(12345) # makes the random sequence reproducible between runs
df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])

df = df.T

fig = plt.figure()
ax_main= plt.gca()
ax_main.axis('off') # remove border and axis
ax = fig.add_axes([0.1, 0.1, 0.9, 0.9]) # new axes at [left, bottom, width, height] in figure fractions

fig.canvas.mpl_connect('button_press_event', onclick) # subscribe the event handler to the event
cmap = plt.cm.jet # set the color scheme
# cmap = mpl.cm.viridis # alternative color scheme # https://matplotlib.org/examples/color/colormaps_reference.html

ax_cb = fig.add_axes([0.2, 0.07, 0.75, 0.02]) # new axes for the color bar: [left, bottom, width, height]
plt.sca(ax) # set the current axis as active
update_plt() # draw the plot before any click


<IPython.core.display.Javascript object>
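The handler pattern above can be exercised without clicking: build a `MouseEvent` at a known data point and pass it to the canvas callbacks. This is a testing sketch, not how the notebook delivers real clicks; the point `(2, 3)` and the `clicks` list are arbitrary.

```python
from matplotlib.backend_bases import MouseEvent
from matplotlib.figure import Figure

fig = Figure()
ax = fig.add_subplot()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

clicks = []


def onclick(event):
    # xdata/ydata are data coordinates, or None when the click is outside any axes
    clicks.append((event.xdata, event.ydata))


cid = fig.canvas.mpl_connect("button_press_event", onclick)

# Simulate a click at data point (2, 3): convert it to display (pixel) coordinates first
px, py = ax.transData.transform((2, 3))
event = MouseEvent("button_press_event", fig.canvas, px, py, button=1)
fig.canvas.callbacks.process("button_press_event", event)

fig.canvas.mpl_disconnect(cid)   # later events are no longer delivered to onclick
fig.canvas.callbacks.process("button_press_event", event)
print(clicks)
```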

***
<font color = green >

## Setting colors and styles
</font>

In [8]:
cmap(0) # RGBA color of the lowest value of the current colormap (jet, set in the interaction example)


(0.0, 0.0, 0.5, 1.0)
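A colormap is callable: floats in `[0, 1]` are scaled onto its lookup table, while integers index the table directly, so `cmap(0)` and `cmap(0.0)` agree but `cmap(1)` and `cmap(1.0)` do not. A short sketch with `viridis`:

```python
import matplotlib as mpl

cmap = mpl.cm.viridis   # any Colormap object behaves the same way
print(cmap.N)           # size of the lookup table

# Floats in [0, 1] are scaled to the table; integers index it directly
print(cmap(0.0) == cmap(0))
print(cmap(1.0) == cmap(cmap.N - 1))
print(cmap(1) == cmap(1.0))   # int 1 is the second entry, not the top of the map
# Sequences are mapped element-wise to an (n, 4) RGBA array
print(cmap([0.0, 0.5, 1.0]).shape)
```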

<font color = green >

### color map
</font>

In [9]:
fig, ax = plt.subplots()
bounds = np.linspace(0, 1, 12) # set the bound for color bar
cmap = mpl.cm.magma 
# cmap = mpl.cm.viridis # alternative color scheme # https://matplotlib.org/examples/color/colormaps_reference.html
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
cb = mpl.colorbar.ColorbarBase(ax, cmap=cmap, norm=norm,drawedges=True,  ticks=bounds, boundaries=bounds,format='%.2f',orientation='horizontal')
# print ('color for 100 of {}: {}'.format(cmap.N, cmap(100)[:3]))

<IPython.core.display.Javascript object>
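`BoundaryNorm` is what turns the colormap into the discrete bins seen in the color bar: every value between two consecutive boundaries gets the same color-table index. A sketch with the same `bounds` as above:

```python
import numpy as np
import matplotlib as mpl

bounds = np.linspace(0, 1, 12)   # 12 edges -> 11 color bins
cmap = mpl.cm.magma
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)

# Values in the same bin get the same color-table index
print(norm(0.01), norm(0.05))   # both fall in the first bin
print(norm(0.99))               # last bin -> last entry of the table
print(cmap(norm(0.99)))         # RGBA color actually used for that value
```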

<font color = green >

### style
</font>

In [10]:
# Review the style : https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html
plt.style.available # see the pre-defined styles provided

['Solarize_Light2',
 '_classic_test_patch',
 'bmh',
 'classic',
 'dark_background',
 'fast',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn',
 'seaborn-bright',
 'seaborn-colorblind',
 'seaborn-dark',
 'seaborn-dark-palette',
 'seaborn-darkgrid',
 'seaborn-deep',
 'seaborn-muted',
 'seaborn-notebook',
 'seaborn-paper',
 'seaborn-pastel',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-ticks',
 'seaborn-white',
 'seaborn-whitegrid',
 'tableau-colorblind10']
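`plt.style.use` changes the settings for the rest of the session. To apply a style to a single figure only, `plt.style.context` can be used as a context manager; a sketch with the `ggplot` style from the list above:

```python
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)
before = plt.rcParams["axes.facecolor"]

# Apply a style only inside the block; the global settings are restored afterwards
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 1, 3])
    plt.close(fig)

after = plt.rcParams["axes.facecolor"]
print(before, inside, after)
```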

***
<font color = green >

## Visualization with pandas 
</font>

In [30]:
plt.style.use('seaborn-v0_8-colorblind')  # called 'seaborn-colorblind' before Matplotlib 3.6
np.random.seed(123) 

# Cumulative sum (running total): each column is a random walk,
# i.e. the running total of normally distributed random steps
df = pd.DataFrame({'A': np.random.randn(365).cumsum(0),
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20},
                  index=pd.date_range('1/1/2017', periods=365))
# print (df.head(20))
# print (df.describe())


<font color = green >

### pandas plot
</font>

In [31]:
df.plot()


# Note: pandas adds the legend and formats the date axis automatically

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x10f8ac4a8>
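By default `df.plot()` draws every column on one axes. With `subplots=True` pandas draws one panel per column and returns an array of axes; a sketch that rebuilds the same random-walk frame:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(123)
df = pd.DataFrame({"A": np.random.randn(365).cumsum(0),
                   "B": np.random.randn(365).cumsum(0) + 20,
                   "C": np.random.randn(365).cumsum(0) - 20},
                  index=pd.date_range("1/1/2017", periods=365))

# One panel per column instead of one shared axes; returns a NumPy array of Axes
axes = df.plot(subplots=True, figsize=(6, 6))
print(len(axes))
plt.close("all")
```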

<font color = green >

### pandas scatter
</font>

In [32]:
df.plot.scatter('A', 'C', c=df['B'], s=df['B'], colormap='viridis') 
# Note: a colorbar is added automatically

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x10fca1668>

In [598]:
df.plot('A','B', kind = 'scatter')

<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x14c6612e8>

In [33]:
# alternative syntax: df.plot.scatter instead of kind = 'scatter'
ax = df.plot.scatter('B', 'C', c=np.arange(len(df)), s=10, colormap='magma') # set color corresponding to index
# Note: it returns an Axes object
ax.set_aspect('equal') # to compare absolute ranges 

<IPython.core.display.Javascript object>

<font color = green >

### pandas histogram
</font>

In [34]:
df.plot.hist(alpha=0.7, bins=100);



<IPython.core.display.Javascript object>

<font color = green >

### kernel density estimate
</font>

In [35]:
df.plot.kde() # kernel density estimate




<IPython.core.display.Javascript object>

<matplotlib.axes._subplots.AxesSubplot at 0x1107f0e80>



<font color = green >

## Remove junk 
</font>

In [1]:
import matplotlib.pyplot as plt
import numpy as np

plt.figure()

# use subplots to demonstrate the difference
ax1 = plt.subplot(2, 1, 1)
languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
popularity = np.array([56, 39, 34, 34, 29])
x_vals = range(len(popularity))
plt.bar(x_vals, popularity, align='center', alpha=0.9)
plt.xticks(x_vals, languages, alpha=0.9)

plt.ylabel('% popularity')
plt.title('Top 5 languages for math and data')

ax_2 = plt.subplot(2, 1, 2)

bar_2 = plt.bar(x_vals, popularity, align='center', alpha=0.9, color='grey')
bar_2[0].set_color('#1F77B4')  # highlight only the first bar

plt.xticks(x_vals, languages, alpha=0.9)

# move the y-label into the title
plt.title('Top 5 languages for math and data by % popularity')

# hide the y-axis ticks and labels, keep the bottom ticks and labels
plt.tick_params(
    top=False,
    bottom=True,
    left=False,
    labelleft=False,
    labelbottom=True
    )

# write each value inside its bar instead of reading it off the y-axis
for bar in bar_2:
    ax_2.text(
        bar.get_x() + bar.get_width()/2,  # x position: center of the bar
        bar.get_height() - 6,             # y position: just below the top of the bar
        '{}%'.format(bar.get_height()),   # label text
        ha='center',
        color='w',
        fontsize=10
    )

for spine in ax_2.spines.values():
    spine.set_visible(False)

plt.subplots_adjust(hspace=.6)
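In Matplotlib 3.4 and newer, the manual `ax.text` loop can be replaced by `Axes.bar_label`. A sketch of the decluttered chart under that version assumption, passing the category names straight to `bar`:

```python
import numpy as np
import matplotlib.pyplot as plt

languages = ["Python", "SQL", "Java", "C++", "JavaScript"]
popularity = np.array([56, 39, 34, 34, 29])

fig, ax = plt.subplots()
bars = ax.bar(languages, popularity, color="grey")   # category names can be used as x directly
bars[0].set_color("#1F77B4")                         # highlight only the bar that matters
# Matplotlib >= 3.4: one label per bar, formatted with old-style % formatting
labels = ax.bar_label(bars, fmt="%d%%", label_type="center", color="w")
ax.set_title("Top 5 languages for math and data by % popularity")
ax.tick_params(left=False, labelleft=False)          # the labels make the y-axis redundant
for spine in ax.spines.values():
    spine.set_visible(False)
plt.close(fig)
```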

<font color = green >

### Learn more
</font>

Documentation for Matplotlib
<br>https://matplotlib.org/index.html
<br>Applied Plotting, Charting & Data Representation in Python
<br>https://www.coursera.org/courses?query=plotting%20charting
<br>Ten Simple Rules for Better Figures
<br>https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833#s8
<br>Hunter, J., & Droettboom, M. (2012). matplotlib in A. Brown (Ed.), The Architecture of Open Source Applications, Volume II: Structure, Scale, and a Few More Fearless Hacks (Vol. 2)
<br>http://www.aosabook.org/en/matplotlib.html
<br>seaborn: statistical data visualization
<br>https://seaborn.pydata.org/index.html



