<hr>
<p style="font-size:30px; color: goldenrod; text-align: center; line-height: 80px;">Plotting and Visualization</p>
<hr>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# for interactive plotting in the Jupyter Notebook
%matplotlib notebook

In [7]:
data = np.arange(10)
plt.plot(data)

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x13a7818ddc0>]

## Figures and Subplots

Plots in matplotlib reside within a <code>Figure</code> object. You can create a new figure with
<code>plt.figure</code>:

In [34]:
fig = plt.figure()  # empty plot window

<IPython.core.display.Javascript object>

In [37]:
ax1 = fig.add_subplot(2, 2, 1)  # add plot to the blank figure above
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)

In [39]:
# a plt.plot call draws on the most recently created subplot (here ax4)
plt.plot(np.random.randint(-10, 10, size=20))

[<matplotlib.lines.Line2D at 0x13a78a4cd30>]

In [40]:
# to draw on a different subplot, call plot on its axes object (ax1, ax2, ax3)
ax3.plot(np.random.randn(50).cumsum(), 'k--')
# 'k--' is a style option instructing matplotlib to plot a black dashed line.

[<matplotlib.lines.Line2D at 0x13a78a11190>]

In [42]:
np.random.randn(50).cumsum()

array([-1.26675177,  0.05644553, -1.97910615, -3.40141376, -5.85714084,
       -8.78285065, -8.92709004, -9.13219127, -8.32989457, -8.22048846,
       -8.50318772, -7.92651297, -7.9547813 , -8.32246479, -8.01368685,
       -7.41579877, -7.43870735, -9.25152816, -8.70947544, -6.98102098,
       -6.87683651, -7.26969208, -6.61152186, -6.08020057, -7.04171761,
       -6.99663519, -6.95685153, -5.06453973, -4.77961496, -7.19805395,
       -6.57563461, -6.43829399, -6.98343729, -7.54241386, -6.1557402 ,
       -5.11231895, -3.80802924, -4.53913239, -4.49721931, -3.67600303,
       -2.66355047, -1.26166373, -0.31689969, -0.37925707,  1.13495107,
       -0.62470763,  0.46126611,  1.33768214,  1.57396903,  1.993159  ])

The objects returned by <code>fig.add_subplot</code> here are <code>AxesSubplot</code> objects. You
can plot directly on the other empty subplots by calling each one's instance methods:

In [43]:
_ = ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)

In [44]:
ax2.scatter(np.arange(30), np.arange(30) + 3*np.random.randn(30))

<matplotlib.collections.PathCollection at 0x13a78a5cc10>

In [53]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

<IPython.core.display.Javascript object>

In [54]:
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x13a79c8e8e0>

Catalog of plot types: <a href="http://matplotlib.sourceforge.net/">matplotlib docs</a>

Creating a figure with a grid of subplots is a very common task, so matplotlib
includes a convenience function, <code>plt.subplots</code>, that creates a new figure and returns
it together with a NumPy array containing the created subplot objects:

In [3]:
fig, axes = plt.subplots(2, 3)

<IPython.core.display.Javascript object>

In [14]:
print(axes,
      axes[0, 1], sep='\n\n')

[[<AxesSubplot:> <AxesSubplot:> <AxesSubplot:>]
 [<AxesSubplot:> <AxesSubplot:> <AxesSubplot:>]]

AxesSubplot(0.398529,0.53;0.227941x0.35)


This is very useful, as the <code>axes</code> array can be easily indexed like a two-dimensional
array; for example, <code>axes[0, 1]</code>. You can also indicate that subplots should have the
same x- or y-axis using <code>sharex</code> and <code>sharey</code>, respectively. This is especially useful
when you're comparing data on the same scale; otherwise, matplotlib autoscales plot
limits independently.
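A minimal sketch of the effect of sharing, assuming it is run as a standalone script (hence the non-interactive `Agg` backend, which you would drop inside the notebook). The names `shared_axes` and `free_axes` are made up for this illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
short_walk = rng.standard_normal(50).cumsum()        # 50 points
long_walk = 10 * rng.standard_normal(200).cumsum()   # 200 points, larger swings

# Shared axes: both subplots end up with the same x and y limits
fig1, shared_axes = plt.subplots(2, 1, sharex=True, sharey=True)
shared_axes[0].plot(short_walk)
shared_axes[1].plot(long_walk)
same_x = shared_axes[0].get_xlim() == shared_axes[1].get_xlim()
same_y = shared_axes[0].get_ylim() == shared_axes[1].get_ylim()

# No sharing: each subplot autoscales to its own data
fig2, free_axes = plt.subplots(2, 1)
free_axes[0].plot(short_walk)
free_axes[1].plot(long_walk)
different_x = free_axes[0].get_xlim() != free_axes[1].get_xlim()
```

With sharing turned on, zooming or panning one subplot in an interactive backend should move the other one as well.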

Individual axes objects also expose <code>sharex</code> and <code>sharey</code> as methods:

In [17]:
axes[0, 1].sharex

<bound method _AxesBase.sharex of <AxesSubplot:>>

In [18]:
axes[0, 1].sharey

<bound method _AxesBase.sharey of <AxesSubplot:>>

Adjusting the spacing around subplots:
<pre>
<code>subplots_adjust(left=None, bottom=None, right=None, top=None,
                      wspace=None, hspace=None)</code></pre>

<code>wspace</code> and <code>hspace</code> control the percent of the figure width and figure height,
respectively, to use as spacing between subplots. Here is a small example where I shrink the
spacing all the way to zero:

In [23]:
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=0, hspace=0)

<IPython.core.display.Javascript object>

## Colors, Markers, and Line Styles

In [24]:
# plot x versus y with green dashes:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), 'g--')  # 'g--': green dashed line; 'g' alone: solid green

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c197f1c0>]

In [34]:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(50).cumsum(), linestyle='--', color='g')  # 'g' - green

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c1c2dd00>]

In [41]:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(*[np.random.randn(50).cumsum() for _ in range(2)], linestyle='--', color='r')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c1d83eb0>]

In [39]:
plt.plot?

In [48]:
# plot with markers
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(30).cumsum(), 'ko--')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c2ec6cd0>]

In [49]:
# plot with markers (explicitly, with keyword arguments)
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(30).cumsum(), linestyle='dashed', color='k', marker='o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c2f3ae50>]

For line plots, subsequent points are linearly interpolated by default. This can be altered with the
<code>drawstyle</code> option.

In [50]:
fig = plt.figure()
ax = fig.add_subplot()
data = np.random.randn(30).cumsum()
ax.plot(data, 'k--', label='Default')
ax.plot(data, 'k-', drawstyle='steps-post', label='steps-post')
ax.legend(loc='best')

<IPython.core.display.Javascript object>

<matplotlib.legend.Legend at 0x261c2fabaf0>

## Setting the title, axis labels, ticks and ticklabels

Call <code>plt.legend</code> (acts on the current axes) or <code>ax.legend</code> (acts on a specific axes) to add a legend.
To set the ticks and their labels, use: <pre>
<code>ax.set_xticks; ax.set_xticklabels</code></pre>

In [51]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x261c40296a0>]

In [52]:
# To change the x-axis ticks use set_xticks and set_xticklabels
ticks = ax.set_xticks([0, 250, 500, 750, 1000])

In [53]:
# set_xticklabels lets you use any other values as the labels
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'], rotation=30, fontsize='small')

In [58]:
# Give a name to the x-axis
ax.set_xlabel('Stages')

Text(0.5, 16.994938032606715, 'Stages')

In [55]:
# Give the subplot a title
ax.set_title('My matplotlib.pyplot plot')

Text(0.5, 1.0, 'My matplotlib.pyplot plot')

In [57]:
ax.set_ylabel('Degree')

Text(53.666666666666664, 0.5, 'Degree')

In [59]:
axis_names = {
    'title': 'New plot',
    'xlabel': 'Stages',
    'ylabel': 'Degree'
}
ax.set(**axis_names)

[Text(0.5, 1.0, 'New plot'),
 Text(0.5, 16.994938032606715, 'Stages'),
 Text(53.666666666666664, 0.5, 'Degree')]

### Adding legends

In [3]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x2b3bca0d2b0>]

In [4]:
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='three')

[<matplotlib.lines.Line2D at 0x2b3beac20a0>]

In [5]:
# To automatically create a legend call ax.legend()
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x2b3beab67c0>

In [87]:
df = pd.DataFrame({
    'name':['Ivan','Maria','Aleksander','Maksim','Artem','Anna','Mark'],
    'age':[23,78,22,19,45,33,20],
    'gender':['M','F','M','M','M','F','M'],
    'state':['Moscow','Tula','Astrakhan','Tula','Moscow','Rostov','Rostov'],
    'num_children':[2,0,0,3,2,1,4],
    'num_pets':[5,1,0,5,2,2,3]
})

In [88]:
df

         name  age gender      state  num_children  num_pets
0        Ivan   23      M     Moscow             2         5
1       Maria   78      F       Tula             0         1
2  Aleksander   22      M  Astrakhan             0         0
3      Maksim   19      M       Tula             3         5
4       Artem   45      M     Moscow             2         2
5        Anna   33      F     Rostov             1         2
6        Mark   20      M     Rostov             4         3


In [89]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [90]:
df.groupby('state')['name'].nunique().plot(ax=ax, kind='bar')

<AxesSubplot:xlabel='state'>

### Annotation and Drawing on a Subplot
You can add annotations and text using the <code>text</code>, <code>arrow</code>, and <code>annotate</code> functions.

In [6]:
# 'text' draws text at given coordinates (x, y) on the plot with optional custom styling:
ax.text(0, -30, 'Start point', family='monospace', fontsize=10)

Text(0, -30, 'Start point')

Annotations can draw both text and arrows arranged appropriately. As an example, let's plot the closing MOEX index price since December 2006 and annotate it with some of the important dates from the 2008-2009 financial crisis.

In [13]:
pd.to_datetime?

In [3]:
df = pd.read_csv("IMOEX.txt")
df.columns = ['ticker', 'period', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']

In [4]:
df['date'] = df['date'].apply(str)
df['date']

0       20061213
1       20061214
2       20061215
3       20061218
4       20061219
          ...   
3683    20210823
3684    20210824
3685    20210825
3686    20210826
3687    20210827
Name: date, Length: 3688, dtype: object

In [5]:
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

In [6]:
df['date']

0      2006-12-13
1      2006-12-14
2      2006-12-15
3      2006-12-18
4      2006-12-19
          ...    
3683   2021-08-23
3684   2021-08-24
3685   2021-08-25
3686   2021-08-26
3687   2021-08-27
Name: date, Length: 3688, dtype: datetime64[ns]

In [73]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [74]:
df.plot(ax=ax, x='date', y='close', style='k-', label='MOEX index')

<AxesSubplot:xlabel='date'>

In [75]:
import datetime as dt

In [76]:
crisis_periods = [
    (dt.datetime(2007, 12, 10), 'Peak of bull market'),
    (dt.datetime(2008, 3, 12), 'Bear Stearns Fails'),
    (dt.datetime(2008, 9, 15), 'Lehman Brothers Bankruptcy')
]

In [77]:
close = df.set_index('date')['close']  # closing prices indexed by date
for date, label in crisis_periods:
    price = close.asof(date)  # scalar: last available close on or before `date`
    ax.annotate(label, xy=(date, price + 75),
                xytext=(date, price + 310),
                arrowprops=dict(facecolor='red', headwidth=8, width=2, headlength=6),
                horizontalalignment='left', verticalalignment='top')
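Since `IMOEX.txt` is not bundled here, the same annotation pattern can be tried on synthetic data. This is only a sketch: the prices are random, the event labels are placeholders, and the `Agg` backend line assumes a plain script rather than the notebook.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the index: random closes on business days only
dates = pd.bdate_range("2008-01-01", periods=250)
rng = np.random.default_rng(1)
close = pd.Series(1500 + 20 * rng.standard_normal(250).cumsum(), index=dates)

fig, ax = plt.subplots()
ax.plot(close.index, close.values, "k-", label="synthetic index")

events = [
    (dt.datetime(2008, 3, 12), "Event A"),   # a Wednesday
    (dt.datetime(2008, 9, 14), "Event B"),   # a Sunday, so no row in `close`
]

for date, label in events:
    price = close.asof(date)  # falls back to the last close on or before `date`
    ax.annotate(label, xy=(date, price + 5), xytext=(date, price + 60),
                arrowprops=dict(facecolor="red", headwidth=8, width=2, headlength=6),
                horizontalalignment="left", verticalalignment="top")

sunday_price = close.asof(dt.datetime(2008, 9, 14))
friday_price = close[pd.Timestamp("2008-09-12")]
```

Looking the price up with `asof` keeps the loop working when an event date falls on a weekend or holiday, where an exact match on the date would find no row.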

In [78]:
# Zoom in on 2007-2010
ax.set_xlim(['1/1/2007', '1/1/2011'])
ax.set_ylim([500, 2500])

(500.0, 2500.0)

In [79]:
ax.set_title('Important dates in the 2008-2009 financial crisis')

Text(0.5, 1.0, 'Important dates in the 2008-2009 financial crisis')

Some common shapes, such as <code>Rectangle</code> and <code>Circle</code>, are available in <code>matplotlib.pyplot</code>, but the full
set is located in <code>matplotlib.patches</code>.

To add a shape to a plot, create the patch object <code>shp</code> and add it to a subplot by calling

<code>ax.add_patch(shp)</code>

In [80]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [83]:
rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color='k', alpha=0.3)
circ = plt.Circle((0.7, 0.2), 0.15, color='b', alpha=0.3)
pgon = plt.Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]], color='g', alpha=0.5)

In [84]:
ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)

<matplotlib.patches.Polygon at 0x1fed6cf0670>
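Shapes can also be imported from `matplotlib.patches` directly. A small sketch with an `Ellipse`, under the assumption that it runs as a plain script (hence the `Agg` backend); the coordinates are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse, Rectangle

fig, ax = plt.subplots()

# Ellipse(center, width, height), rotated by 30 degrees
ell = Ellipse((0.5, 0.5), 0.6, 0.3, angle=30, color="r", alpha=0.3)
# Rectangle(lower-left corner, width, height); plt.Rectangle is this same class
rect = Rectangle((0.1, 0.1), 0.2, 0.2, color="k", alpha=0.3)

ax.add_patch(ell)
ax.add_patch(rect)

same_class = plt.Rectangle is Rectangle
```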

## Saving Plots to File

In [86]:
# To save an SVG version of the current figure:
plt.savefig('svgfile.svg')

In [88]:
# To get a plot as a PNG with minimal whitespace around the plot and at 400 DPI:
plt.savefig('png_file.png', dpi=400, bbox_inches='tight')

In [89]:
# savefig can also write to any file-like object, such as BytesIO:
from io import BytesIO
buffer = BytesIO()

In [90]:
plt.savefig(buffer)

In [91]:
plot_data = buffer.getvalue()
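To check what ended up in the buffer, one option is to look at the first bytes. The sketch below passes `format="png"` explicitly (a buffer has no file extension to infer the format from) and compares against the standard PNG signature; the `Agg` backend line assumes a plain script.

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

buffer = BytesIO()
fig.savefig(buffer, format="png")  # write the image into memory instead of a file
plot_data = buffer.getvalue()      # raw bytes of the PNG image

# Every PNG file starts with this 8-byte signature
is_png = plot_data.startswith(b"\x89PNG\r\n\x1a\n")
plt.close(fig)
```

The bytes in `plot_data` can then be sent over a network or embedded in a web page without touching the disk.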

## matplotlib Configuration
Nearly all of the default behavior can be customized via an extensive set of global parameters governing figure size, subplot spacing, colors, font sizes, grid styles, and so on.

In [107]:
# One way to modify the configuration from Python is to use the rc function.
# To set the global default figure size to 9 x 9:
plt.rc('figure', figsize=(9, 9))
# The first argument to rc is the component you wish to customize: 'figure', 'axes', 'xtick', 'ytick',
# 'grid', 'legend', etc.

In [None]:
# Options as a dict:
font_options = {'family': 'monospace',
                'weight': 'bold',
                'size': 8}  # font.size expects a number (points); recent matplotlib rejects names like 'small' here
plt.rc('font', **font_options)

# For more extensive customization, use a .matplotlibrc file in your home directory
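`plt.rc` changes the defaults for the rest of the session. When a setting is only needed for one figure, `plt.rc_context` may be more convenient, since it restores the previous values on exit. A sketch, assuming a plain script (hence the `Agg` backend):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt

default_size = list(plt.rcParams["figure.figsize"])

# Settings apply only inside the with-block
with plt.rc_context({"figure.figsize": (9, 9), "font.family": "monospace"}):
    fig = plt.figure()
    size_inside = tuple(fig.get_size_inches())

# Back to whatever was configured before the block
size_after = list(plt.rcParams["figure.figsize"])
plt.close(fig)
```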

## Plotting

Series and DataFrame each have a <code>plot</code> attribute.

In [7]:
import seaborn as sns
df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")

<IPython.core.display.Javascript object>

<seaborn.axisgrid.PairGrid at 0x22d0c244f10>

In [8]:
fig = plt.figure()
fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

<AxesSubplot:>

In [9]:
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")

<AxesSubplot:xlabel='flipper_length_mm', ylabel='Count'>

In [7]:
plt.rc('figure', figsize=(5, 5))

In [8]:
# Line Plots
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
s.plot()

<IPython.core.display.Javascript object>

<AxesSubplot:>

The Series object's index is passed to matplotlib for plotting on the x-axis, though you can disable this by passing <code>use_index=False</code>. The x-axis ticks and limits can be adjusted with the <code>xticks</code> and <code>xlim</code> options, and the y-axis ones with <code>yticks</code> and <code>ylim</code>.
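A short sketch of these options side by side, assuming a plain script (hence the `Agg` backend); the tick positions and limits are arbitrary choices for the illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))

fig, (ax_top, ax_bottom) = plt.subplots(2, 1)

# Default: the index (0, 10, ..., 90) supplies the x values
s.plot(ax=ax_top, xticks=[0, 30, 60, 90], xlim=(0, 90))

# use_index=False: the x values are the positions 0..9 instead
s.plot(ax=ax_bottom, use_index=False, xticks=[0, 3, 6, 9], xlim=(0, 9), ylim=(-5, 5))

top_x = list(ax_top.lines[0].get_xdata())
bottom_x = list(ax_bottom.lines[0].get_xdata())
```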

In [10]:
#fig = plt.figure()
#ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=list('ABCD'),
                  index=np.arange(0, 100, 10))

In [11]:
df.plot()

<IPython.core.display.Javascript object>

<AxesSubplot:>

### Bar Plots

In [12]:
fig, axes = plt.subplots(2, 1)

<IPython.core.display.Javascript object>

In [17]:
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))

In [18]:
data.plot.bar(ax=axes[0], color='k', alpha=0.7)

<AxesSubplot:>

In [20]:
data.plot.barh(ax=axes[1], color='k', alpha=0.7)  # alpha=0.7 for partial transparency on the filling

<AxesSubplot:>

In [21]:
df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df

Genus         A         B         C         D
one    0.822681  0.087178  0.984791  0.868414
two    0.888804  0.883182  0.362786  0.714053
three  0.970160  0.829574  0.442145  0.770179
four   0.932024  0.895802  0.331157  0.203519
five   0.242905  0.919195  0.642802  0.960023
six    0.154463  0.791935  0.870080  0.638043


In [22]:
df.plot.bar()  # the columns' name, 'Genus', is used as the legend title

<IPython.core.display.Javascript object>

<AxesSubplot:>

In [23]:
# Stacked bar plots from a df by passing 'stacked=True'
df.plot.barh(stacked=True, alpha=0.5)

<IPython.core.display.Javascript object>

<AxesSubplot:>

<p style="color:gold; font-size:20px;">A useful recipe for bar plots is to visualize a Series's value frequency using value_counts: <code>s.value_counts().plot.bar()</code></p>
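A small sketch of this recipe on a made-up Series of city names, assuming a plain script (hence the `Agg` backend):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: six observations of a categorical value
s = pd.Series(["Tula", "Moscow", "Tula", "Rostov", "Moscow", "Tula"])

counts = s.value_counts()  # frequencies, sorted from most to least common

fig, ax = plt.subplots()
counts.plot.bar(ax=ax)     # one bar per distinct value

heights = [patch.get_height() for patch in ax.patches]
```

Because `value_counts` sorts by frequency, the bars come out in descending order without any extra sorting step.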

In [3]:
tips = pd.read_csv('tips.csv')

In [5]:
party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts

size  1   2   3   4  5  6
day
Fri   1  16   1   1  0  0
Sat   2  53  18  13  1  0
Sun   0  39  15  18  3  1
Thur  1  48   4   5  1  3


In [6]:
party_counts = party_counts.loc[:, 2:5]
party_counts

size   2   3   4  5
day
Fri   16   1   1  0
Sat   53  18  13  1
Sun   39  15  18  3
Thur  48   4   5  1


In [7]:
# Then, normalize so that each row sums to 1 and make the plot
party_pcts = party_counts.div(party_counts.sum(1), axis=0)
party_pcts

size         2         3         4         5
day
Fri   0.888889  0.055556  0.055556  0.000000
Sat   0.623529  0.211765  0.152941  0.011765
Sun   0.520000  0.200000  0.240000  0.040000
Thur  0.827586  0.068966  0.086207  0.017241


In [27]:
party_pcts.plot.bar()

<IPython.core.display.Javascript object>

<AxesSubplot:xlabel='day'>

Party sizes appear to increase on the weekend.
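The row normalization used above can be checked on a tiny frame. The data below is a hypothetical miniature of the tips table, not the real file; the sketch also shows that `pd.crosstab` can do the same normalization itself through `normalize="index"`.

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "day":  ["Fri", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "size": [2, 3, 2, 2, 4, 2, 3, 4],
})

counts = pd.crosstab(tips_small["day"], tips_small["size"])

# Divide each row by its own total so that every row sums to 1
pcts = counts.div(counts.sum(axis=1), axis=0)

# Same result in one step: let crosstab normalize over each row
pcts_direct = pd.crosstab(tips_small["day"], tips_small["size"], normalize="index")

row_sums = pcts.sum(axis=1)
```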

With data that requires aggregation or summarization before making a plot, using the seaborn package can make things much simpler. Let's look now at the tipping percentage by day with seaborn:

In [23]:
import seaborn as sns

In [24]:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

In [25]:
tips.head()

   total_bill   tip smoker  day    time  size   tip_pct
0       16.99  1.01     No  Sun  Dinner     2  0.063204
1       10.34  1.66     No  Sun  Dinner     3  0.191244
2       21.01  3.50     No  Sun  Dinner     3  0.199886
3       23.68  3.31     No  Sun  Dinner     2  0.162494
4       24.59  3.61     No  Sun  Dinner     4  0.172069


In [32]:
fig = plt.figure()
ax = fig.add_subplot()

<IPython.core.display.Javascript object>

In [33]:
sns.barplot(ax=ax, data=tips, x='tip_pct', y='day', orient='h')

<AxesSubplot:xlabel='tip_pct', ylabel='day'>

The black lines drawn on the bars represent the 95% confidence interval (this can be configured through optional arguments).

<code>seaborn.barplot</code> has a <code>hue</code> option that enables us to split by an additional categorical value.
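A sketch of `hue` in use, splitting each day's bar by the `time` column. The frame below is made up for the illustration (the real `tips.csv` is not loaded), and the `Agg` backend line assumes a plain script:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: plain script; omit this line inside the notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: two observations for each day/time combination
tips_small = pd.DataFrame({
    "day":     ["Fri"] * 4 + ["Sat"] * 4 + ["Sun"] * 4,
    "time":    ["Lunch", "Lunch", "Dinner", "Dinner"] * 3,
    "tip_pct": [0.15, 0.17, 0.18, 0.20,
                0.12, 0.14, 0.19, 0.21,
                0.16, 0.18, 0.17, 0.19],
})

fig, ax = plt.subplots()
# hue="time" draws one bar per (day, time) pair and adds a legend
sns.barplot(ax=ax, data=tips_small, x="tip_pct", y="day", hue="time", orient="h")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```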