<hr>
<p style="font-size:30px; color: goldenrod; text-align: center; line-height: 80px;">Plotting and Visualization</p>
<hr>

In [1]:
# for interactive plotting in the Jupyter Notebook
%matplotlib notebook

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
data = np.arange(10)
plt.plot(data)

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x1789b76dbb0>]

## Figures and Subplots

Plots in matplotlib reside within a <code>Figure</code> object. You can create a new figure with <code>plt.figure</code>.

In [4]:
fig = plt.figure()  # empty plot window

<IPython.core.display.Javascript object>

In [5]:
ax1 = fig.add_subplot(2, 2, 1)  # add plot to the blank figure above
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)

In [6]:
ax1.xaxis.set_label_position?

In [7]:
# a pyplot-level plotting command draws on the most recently created subplot (here ax4)
plt.plot(np.random.randint(-10, 10, size=20))

[<matplotlib.lines.Line2D at 0x1789e978760>]

In [8]:
# to plot on another subplot, call the method on its variable (ax1, ax2, ax3)
ax3.plot(np.random.randn(50).cumsum(), 'k--')
# 'k--' is a style option instructing matplotlib to plot a black dashed line.

[<matplotlib.lines.Line2D at 0x1789e933a00>]

In [9]:
np.random.randn(50).cumsum()

array([ 1.1112925 ,  0.8474188 , -0.24386239,  0.74780157, -0.4711882 ,
       -2.37542822, -1.91705313, -3.92795429, -3.79058512, -3.17702762,
       -3.63444721, -3.60255609, -4.64719989, -5.61828056, -4.11684861,
       -2.96217302, -3.03725586, -2.86294038, -3.28220038, -3.0154239 ,
       -4.95709639, -2.3164352 , -1.85623455, -2.43167182, -3.08882637,
       -2.79311007,  0.37764701,  0.67958542, -1.25663087, -1.27134223,
       -0.34213101, -0.2353187 , -0.03914524, -1.15823226, -0.85943636,
       -1.61442783, -1.56608604, -2.68906718, -1.58836269, -1.08833984,
       -0.30287414,  1.29448578,  0.69697764,  1.34809568,  2.53811262,
        2.94588984,  3.23282018,  4.67448334,  4.59338895,  5.18922447])

The objects returned by <code>fig.add_subplot</code> here are <code>AxesSubplot</code> objects; you can plot directly on the other empty subplots by calling each one's instance methods.

In [10]:
_ = ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)

In [21]:
plt.scatter?

In [11]:
ax2.scatter(np.arange(30), np.arange(30) + 3*np.random.randn(30))

<matplotlib.collections.PathCollection at 0x1789e917970>

In [12]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

<IPython.core.display.Javascript object>

In [13]:
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))
ax.scatter(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10))

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x1789ea65280>

Catalog of plot types: <a href="http://matplotlib.sourceforge.net/">matplotlib docs</a>

Creating a figure with a grid of subplots is a very common task, so matplotlib
includes a convenience method, <code>plt.subplots</code>, that creates a new figure and returns
a NumPy array containing the created subplot objects:

In [14]:
fig, axes = plt.subplots(2, 3)

<IPython.core.display.Javascript object>

In [15]:
print(axes,
      axes[0, 1], sep='\n\n')

[[<AxesSubplot:> <AxesSubplot:> <AxesSubplot:>]
 [<AxesSubplot:> <AxesSubplot:> <AxesSubplot:>]]

AxesSubplot(0.398529,0.53;0.227941x0.35)


This is very useful, as the <code>axes</code> array can be easily indexed like a two-dimensional
array; for example, <code>axes[0, 1]</code>. You can also indicate that subplots should have the
same x- or y-axis using <code>sharex</code> and <code>sharey</code>, respectively. This is especially useful
when you're comparing data on the same scale; otherwise, matplotlib autoscales plot
limits independently.

In [16]:
axes[0, 1].sharex

<bound method _AxesBase.sharex of <AxesSubplot:>>

In [17]:
axes[0, 1].sharey

<bound method _AxesBase.sharey of <AxesSubplot:>>
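
Note that `axes[0, 1].sharex` and `axes[0, 1].sharey` above are bound methods for linking one subplot to another after creation; evaluating them without calling them shares nothing. The more common route is the `sharex`/`sharey` keywords of `plt.subplots`. A minimal sketch of the effect (the Agg backend is selected only as an assumption so that the code runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:
    ax.plot(np.random.randn(50).cumsum())

# Changing the limits on one subplot propagates to every subplot sharing that axis.
axes[1, 1].set_xlim(0, 25)
shared_xlim = axes[0, 0].get_xlim()
print(shared_xlim)
```

Here the limits set on the bottom-right subplot are read back from the top-left one.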

Adjusting the spacing around subplots:
<pre>
<code>subplots_adjust(left=None, bottom=None, right=None, top=None,
                      wspace=None, hspace=None)</code></pre>

<code>wspace</code> and <code>hspace</code> control the percent of the figure width and figure height, respectively,
to use as spacing between subplots. Here is a small example where I shrink the
spacing all the way to zero:

In [18]:
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=0, hspace=0)

<IPython.core.display.Javascript object>

## Colors, Markers, and Line Styles

In [19]:
# plot x versus y with green dashes:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randint(-5, 5, size=10), np.random.randint(-5, 5, size=10), 'g--')  # 'g--': green dashed line; 'g' alone: solid green line

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x178a01de760>]

In [20]:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(50).cumsum(), linestyle='--', color='g')  # 'g' - green

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x1789ffbeb80>]

In [21]:
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(*[np.random.randn(50).cumsum() for _ in range(2)], linestyle='--', color='r')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x1789ffe8dc0>]

In [6]:
plt.plot?

In [22]:
# plot with markers
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(30).cumsum(), 'ko--')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x178a004e3d0>]

In [23]:
# plot with markers (explicitly, with keyword arguments)
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(np.random.randn(30).cumsum(), linestyle='dashed', color='k', marker='o')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x178a00aa640>]

For line plots subsequent points are linearly interpolated by default. This can be altered with the
<code>drawstyle</code> option.

In [24]:
fig = plt.figure()
ax = fig.add_subplot()
data = np.random.randn(30).cumsum()
ax.plot(data, 'k--', label='Default')
ax.plot(data, 'k-', drawstyle='steps-post', label='steps-post')
ax.legend(loc='best')

<IPython.core.display.Javascript object>

<matplotlib.legend.Legend at 0x178a01166d0>

## Setting the title, axis labels, ticks and ticklabels

Call <code>plt.legend</code> (or <code>ax.legend</code>, if you have a reference to the axes) to add a legend.
To set the ticks and their labels use: <pre>
<code>ax.set_xticks; ax.set_xticklabels</code></pre>

In [25]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x178a019c9a0>]

In [26]:
# To change the x-axis ticks use set_xticks and set_xticklabels
ticks = ax.set_xticks([0, 250, 500, 750, 1000])

In [27]:
# set_xticklabels lets you use any other values as the labels
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'], rotation=30, fontsize='small')

In [28]:
# Give a name to the x-axis
ax.set_xlabel('Stages')

Text(0.5, 22.76993803260672, 'Stages')

In [29]:
# Give the subplot a title
ax.set_title('My matplotlib.pyplot plot')

Text(0.5, 1.0, 'My matplotlib.pyplot plot')

In [30]:
ax.set_ylabel('Degree')

Text(42.04166666666665, 0.5, 'Degree')

In [31]:
axis_names = {
    'title': 'New plot',
    'xlabel': 'Stages',
    'ylabel': 'Degree'
}
ax.set(**axis_names)

[Text(0.5, 1.0, 'New plot'),
 Text(0.5, 22.76993803260672, 'Stages'),
 Text(42.04166666666665, 0.5, 'Degree')]

### Adding legends

In [32]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x178a04a9160>]

In [33]:
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='three')

[<matplotlib.lines.Line2D at 0x178a042b580>]

In [34]:
# To automatically create a legend call ax.legend()
ax.legend(loc='best')

<matplotlib.legend.Legend at 0x178a06455b0>

In [35]:
df = pd.DataFrame({
    'name':['Ivan','Maria','Aleksander','Maksim','Artem','Anna','Mark'],
    'age':[23,78,22,19,45,33,20],
    'gender':['M','F','M','M','M','F','M'],
    'state':['Moscow','Tula','Astrakhan','Tula','Moscow','Rostov','Rostov'],
    'num_children':[2,0,0,3,2,1,4],
    'num_pets':[5,1,0,5,2,2,3]
})

In [36]:
df

Unnamed: 0,name,age,gender,state,num_children,num_pets
0,Ivan,23,M,Moscow,2,5
1,Maria,78,F,Tula,0,1
2,Aleksander,22,M,Astrakhan,0,0
3,Maksim,19,M,Tula,3,5
4,Artem,45,M,Moscow,2,2
5,Anna,33,F,Rostov,1,2
6,Mark,20,M,Rostov,4,3


In [37]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [38]:
df.groupby('state')['name'].nunique().plot(ax=ax, kind='bar')

<AxesSubplot:xlabel='state'>

### Annotation and Drawing on a Subplot
Add annotations and text using the <code>text</code>, <code>arrow</code>, and <code>annotate</code> functions

In [39]:
# 'text' draws text at given coordinates (x, y) on the plot with optional custom styling:
ax.text(0, -30, 'Start point', family='monospace', fontsize=10)

Text(0, -30, 'Start point')

Annotations can draw both text and arrows arranged appropriately. As an example, let's plot the closing MOEX index price since December 2006 and annotate it with some of the important dates from the 2008-2009 financial crisis.

In [40]:
pd.to_datetime?

In [41]:
df = pd.read_csv("IMOEX.txt")
df.columns = ['ticker', 'period', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']

In [42]:
df['date'] = df['date'].apply(str)
df['date']

0       20061213
1       20061214
2       20061215
3       20061218
4       20061219
          ...   
3683    20210823
3684    20210824
3685    20210825
3686    20210826
3687    20210827
Name: date, Length: 3688, dtype: object

In [43]:
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

In [44]:
df['date']

0      2006-12-13
1      2006-12-14
2      2006-12-15
3      2006-12-18
4      2006-12-19
          ...    
3683   2021-08-23
3684   2021-08-24
3685   2021-08-25
3686   2021-08-26
3687   2021-08-27
Name: date, Length: 3688, dtype: datetime64[ns]

In [45]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [46]:
df.plot(ax=ax, x='date', y='close', style='k-', label='MOEX index')

<AxesSubplot:xlabel='date'>

In [47]:
import datetime as dt

In [48]:
crisis_periods = [
    (dt.datetime(2007, 12, 10), 'Peak of bull market'),
    (dt.datetime(2008, 3, 12), 'Bear Stearns Fails'),
    (dt.datetime(2008, 9, 15), 'Lehman Brothers Bankruptcy')
]

In [49]:
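for date, label in crisis_periods:
    # closing price on that date as a scalar (assumes the date is present as a trading day)
    price = df.loc[df['date'] == date, 'close'].iloc[0]
    # xy is the arrow tip, xytext is where the label is placed
    ax.annotate(label, xy=(date, price + 75),
                xytext=(date, price + 310),
                arrowprops=dict(facecolor='red', headwidth=8, width=2, headlength=6),
                horizontalalignment='left', verticalalignment='top')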

In [50]:
# Zoom in on 2007-2010
ax.set_xlim(['1/1/2007', '1/1/2011'])
ax.set_ylim([500, 2500])

(500.0, 2500.0)

In [51]:
ax.set_title('Important dates in the 2008-2009 financial crisis')

Text(0.5, 1.0, 'Important dates in the 2008-2009 financial crisis')
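
The example above depends on the local `IMOEX.txt` file. As a self-contained sketch of the same `annotate` call, the snippet below marks the maximum of a random walk; the data, the label text, and the offsets are hypothetical, and the Agg backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)               # hypothetical data, not the MOEX series
values = rng.standard_normal(200).cumsum()

fig, ax = plt.subplots()
ax.plot(values, 'k-', label='random walk')

# Annotate the maximum: xy is the arrow tip, xytext is where the label sits.
peak_x = int(values.argmax())
peak_y = float(values[peak_x])
ann = ax.annotate('Peak', xy=(peak_x, peak_y),
                  xytext=(peak_x, peak_y + 3),
                  arrowprops=dict(facecolor='red', headwidth=8, width=2, headlength=6),
                  horizontalalignment='left', verticalalignment='top')

# Leave room above the curve so the label stays inside the axes.
ax.set_ylim(values.min() - 1, peak_y + 5)
```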

Matplotlib has objects like 'Rectangle' and 'Circle' in <code>matplotlib.pyplot</code>, but the full
set is located in <code>matplotlib.patches</code>

To add a shape to a plot create the patch object <code>shp</code> and add it to a subplot by calling

<code>ax.add_patch(shp)</code>

In [52]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [53]:
rect = plt.Rectangle((0.2, 0.75), 0.4, 0.15, color='k', alpha=0.3)
circ = plt.Circle((0.7, 0.2), 0.15, color='b', alpha=0.3)
pgon = plt.Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]], color='g', alpha=0.5)

In [54]:
ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(pgon)

<matplotlib.patches.Polygon at 0x178a1a38fa0>
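
Shapes that have no `plt.` shortcut can be built the same way from `matplotlib.patches`. A small sketch, assuming `Ellipse` and `Wedge` as the illustrative shapes (the coordinates and colors are arbitrary, and the Agg backend is used only so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse, Wedge

fig, ax = plt.subplots()

# Ellipse(center, width, height); Wedge(center, radius, start angle, end angle in degrees)
ell = Ellipse((0.3, 0.7), 0.4, 0.2, color='r', alpha=0.3)
wedge = Wedge((0.7, 0.3), 0.2, 0, 270, color='b', alpha=0.3)

for shp in (ell, wedge):
    ax.add_patch(shp)

ax.set_aspect('equal')  # keep the shapes undistorted
n_patches = len(ax.patches)
```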

## Saving Plots to File

In [86]:
# To save SVG version of a figure:
plt.savefig('svgfile.svg')

In [88]:
# To get a plot as a PNG with minimal whitespace around the plot and at 400 DPI:
plt.savefig('png_file.png', dpi=400, bbox_inches='tight')

In [89]:
# savefig can also write to any file-like object, such as BytesIO:
from io import BytesIO
buffer = BytesIO()

In [90]:
plt.savefig(buffer)

In [91]:
plot_data = buffer.getvalue()
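
To check what ends up in the buffer, the sketch below repeats the steps on a throwaway figure and inspects the first bytes. The figure contents are hypothetical; the format is passed explicitly because there is no file extension to infer it from.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
from io import BytesIO

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

buffer = BytesIO()
fig.savefig(buffer, format='png', dpi=100)
plot_data = buffer.getvalue()

# Every PNG file begins with the same 8-byte signature.
is_png = plot_data.startswith(b'\x89PNG\r\n\x1a\n')
print(len(plot_data), is_png)
```

The resulting bytes can be written to disk or handed to any consumer that expects PNG data.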

## matplotlib Configuration
All of the default behavior can be customized via an extensive set of global parameters governing figure size, subplot spacing, colors, font sizes, grid styles.

In [57]:
# One way to modify the configuration from Python is to use the rc function.
# To set the global default figure size to 9 x 9:
plt.rc('figure', figsize=(9, 9))
# The first argument to rc is the component you wish to customize: 'figure', 'axes', 'xtick', 'ytick',
# 'grid', 'legend', etc.

In [None]:
# Options as a dict:
font_options = {'family': 'monospace',
                'weight': 'bold',
                'size': 'small'}
plt.rc('font', **font_options)

# For more extensive customization use .matplotlibrc file in your home directory
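
The settings written by `plt.rc` end up in the `plt.rcParams` mapping, so they can be read back from there. The sketch below also shows `plt.rc_context` for temporary overrides and `plt.rcdefaults` for undoing everything; the sizes are arbitrary, and the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt

plt.rc('figure', figsize=(9, 9))
# plt.rc('figure', figsize=...) writes the 'figure.figsize' key of plt.rcParams
global_size = list(plt.rcParams['figure.figsize'])

# rc_context applies overrides only inside the with-block
with plt.rc_context({'figure.figsize': (4, 3)}):
    temporary_size = list(plt.rcParams['figure.figsize'])
restored_size = list(plt.rcParams['figure.figsize'])

plt.rcdefaults()  # restore matplotlib's built-in defaults
```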

## Plotting

Series and DataFrame each have a <code>plot</code> attribute.

In [58]:
import seaborn as sns
df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")

<IPython.core.display.Javascript object>

<seaborn.axisgrid.PairGrid at 0x178a1a2f700>

In [59]:
fig = plt.figure()
fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

<AxesSubplot:>

In [60]:
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")

<AxesSubplot:xlabel='flipper_length_mm', ylabel='Count'>

In [61]:
plt.rc('figure', figsize=(5, 5))

In [62]:
# Line Plots
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
s.plot()

<IPython.core.display.Javascript object>

<AxesSubplot:>

The Series object's index is passed to matplotlib for plotting on the x-axis, though you can disable this by passing <code>use_index=False</code>. The x-axis ticks and limits can be adjusted with the <code>xticks</code> and <code>xlim</code> options, and the y-axis ones with <code>yticks</code> and <code>ylim</code>, respectively.
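
A minimal sketch of these options, drawing the same Series twice: once against its index and once against positions. The tick and limit values are arbitrary, and the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))

fig, axes = plt.subplots(1, 2)
# Default: the index (0, 10, ..., 90) supplies the x values.
s.plot(ax=axes[0], xticks=[0, 30, 60, 90], xlim=(0, 90))
# use_index=False: the positions 0..9 are used instead of the index.
s.plot(ax=axes[1], use_index=False, xticks=[0, 3, 6, 9], xlim=(0, 9))

left_xlim = axes[0].get_xlim()
right_xlim = axes[1].get_xlim()
```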

In [63]:
#fig = plt.figure()
#ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=list('ABCD'),
                 index=np.arange(0, 100, 10))

In [64]:
df.plot()

<IPython.core.display.Javascript object>

<AxesSubplot:>

Bar Plots

In [65]:
fig, axes = plt.subplots(2, 1)

<IPython.core.display.Javascript object>

In [68]:
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))

In [69]:
data.plot.bar(ax=axes[0], color='k', alpha=0.7)

<AxesSubplot:>

In [70]:
data.plot.barh(ax=axes[1], color='k', alpha=0.7)  # alpha=0.7 for partial transparency on the filling

<AxesSubplot:>

In [71]:
df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df

Genus,A,B,C,D
one,0.773354,0.341358,0.611189,0.274812
two,0.814891,0.251982,0.086915,0.922788
three,0.56739,0.406743,0.341635,0.026676
four,0.134702,0.252149,0.733083,0.303629
five,0.177356,0.207656,0.00324,0.994029
six,0.857078,0.405889,0.565463,0.861451


In [72]:
df.plot.bar()  # the name of the columns index, 'Genus', is used as the legend title

<IPython.core.display.Javascript object>

<AxesSubplot:>

In [73]:
# Stacked bar plots from a df by passing 'stacked=True'
df.plot.barh(stacked=True, alpha=0.5)

<IPython.core.display.Javascript object>

<AxesSubplot:>

<p style="color:gold; font-size:20px;">A useful recipe for bar plots is to visualize a Series's value frequency using value_counts: <code>s.value_counts().plot.bar()</code></p>
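
A short sketch of that recipe on a hypothetical Series of city names (the Agg backend is selected only so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical categorical data
s = pd.Series(['Tula', 'Moscow', 'Tula', 'Rostov', 'Moscow', 'Tula'])

counts = s.value_counts()  # sorted by frequency, most common value first
fig, ax = plt.subplots()
counts.plot.bar(ax=ax, color='k', alpha=0.7)
ax.set_ylabel('count')

# One bar per distinct value, with height equal to its frequency.
bar_heights = [p.get_height() for p in ax.patches]
```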

In [76]:
tips = pd.read_csv('tips.csv')

In [5]:
party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts

size,1,2,3,4,5,6
day,,,,,,
Fri,1,16,1,1,0,0
Sat,2,53,18,13,1,0
Sun,0,39,15,18,3,1
Thur,1,48,4,5,1,3


In [6]:
party_counts = party_counts.loc[:, 2:5]
party_counts

size,2,3,4,5
day,,,,
Fri,16,1,1,0
Sat,53,18,13,1
Sun,39,15,18,3
Thur,48,4,5,1


In [45]:
party_counts.div?

In [7]:
# Then, normalize so that each row sums to 1 and make the plot
party_pcts = party_counts.div(party_counts.sum(1), axis=0)
party_pcts

size,2,3,4,5
day,,,,
Fri,0.888889,0.055556,0.055556,0.0
Sat,0.623529,0.211765,0.152941,0.011765
Sun,0.52,0.2,0.24,0.04
Thur,0.827586,0.068966,0.086207,0.017241


In [8]:
party_pcts.plot.bar()

<IPython.core.display.Javascript object>

<AxesSubplot:xlabel='day'>

Party sizes appear to increase on the weekend.

With data that requires aggregation or summarization before making a plot, using the seaborn package can make things much simpler. Let's now look at the tipping percentage by day with seaborn.

In [11]:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

In [12]:
tips.head()

Unnamed: 0,total_bill,tip,smoker,day,time,size,tip_pct
0,16.99,1.01,No,Sun,Dinner,2,0.063204
1,10.34,1.66,No,Sun,Dinner,3,0.191244
2,21.01,3.5,No,Sun,Dinner,3,0.199886
3,23.68,3.31,No,Sun,Dinner,2,0.162494
4,24.59,3.61,No,Sun,Dinner,4,0.172069


In [16]:
fig = plt.figure()
ax = fig.add_subplot()

<IPython.core.display.Javascript object>

In [17]:
sns.barplot(ax=ax, data=tips, x='tip_pct', y='day', orient='h')

<AxesSubplot:xlabel='tip_pct', ylabel='day'>

The black lines drawn on the bars represent the 95% confidence interval (this can be configured through optional arguments).
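
Because several observations share each `day`, `barplot` aggregates before drawing; by default the bar length is the mean of the values in each category. A rough pandas equivalent of that aggregation, on a small hypothetical stand-in for the tips data:

```python
import pandas as pd

# hypothetical rows standing in for tips.csv
tips = pd.DataFrame({
    'day': ['Sun', 'Sun', 'Sat', 'Sat', 'Fri'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
})
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

# One number per day: the quantity barplot draws as the bar length by default.
mean_by_day = tips.groupby('day')['tip_pct'].mean()
print(mean_by_day)
```

The confidence interval is not reproduced here; this sketch covers only the point estimate.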

<code>seaborn.barplot</code> has a <code>hue</code> option that lets you split by an additional categorical value.

In [30]:
fig = plt.figure()
ax = fig.add_subplot()

<IPython.core.display.Javascript object>

In [31]:
sns.barplot(x='tip_pct', y='day', hue='time', data=tips, orient='h')

<AxesSubplot:xlabel='tip_pct', ylabel='day'>

In [32]:
# Switch between different plot appearances using seaborn.set
sns.set(style='whitegrid')

## Histograms and Density Plots

A histogram is a kind of bar plot that gives a discretized display of value frequency. The data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted. Using the tipping data from before, we can make a histogram of tip percentages of the total bill using the <code>plot.hist</code> method on the Series.
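
The binning step can be looked at on its own with `np.histogram`, which returns the counts and the bin edges without drawing anything. A sketch on hypothetical data (the distribution parameters are arbitrary stand-ins for `tip_pct`):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.15, 0.05, size=244)  # hypothetical stand-in for tip_pct

# 50 evenly spaced bins between data.min() and data.max()
counts, edges = np.histogram(data, bins=50)
widths = np.diff(edges)

print(counts.sum(), len(edges))
```

Every observation falls in exactly one bin, so the counts add up to the number of data points, and 50 bins are delimited by 51 edges.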

In [33]:
fig = plt.figure()
ax = fig.add_subplot()

<IPython.core.display.Javascript object>

In [34]:
tips['tip_pct'].plot.hist(bins=50)

<AxesSubplot:ylabel='Frequency'>

A related plot type is a density plot, which is formed by computing an estimate of a continuous probability distribution that might have generated the observed data. The usual procedure is to approximate this distribution as a mixture of 'kernels' - that is, simpler distributions like the normal distribution. Thus, density plots are also known as kernel density estimate (KDE) plots. Using <code>plot.kde</code> makes a density plot using the conventional mixture-of-normals estimate.
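
To make the "mixture of kernels" idea concrete, here is a hand-rolled sketch in NumPy: one normal kernel is centred on each observation and the kernels are averaged. The sample and the fixed bandwidth are assumptions chosen for illustration; `plot.kde` relies on SciPy, which picks the bandwidth automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=200)  # hypothetical data
bandwidth = 0.4                           # assumption: fixed by hand for illustration

grid = np.linspace(-6, 6, 601)
# One normal kernel centred on each observation, evaluated on the grid.
z = (grid[:, None] - sample[None, :]) / bandwidth
kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = kernels.mean(axis=1)            # equal-weight mixture of the kernels

# A density should integrate to roughly 1 (simple Riemann sum over the grid).
dx = grid[1] - grid[0]
area = float((density * dx).sum())
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve.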

In [47]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [48]:
tips['tip_pct'].plot.density()

<AxesSubplot:ylabel='Density'>

Seaborn makes histograms and density plots even easier through its <code>distplot</code> function, which can plot both a histogram and a continuous density estimate simultaneously (<code>distplot</code> has been deprecated since seaborn 0.11 in favor of <code>histplot</code> and <code>displot</code>). As an example, consider a bimodal distribution consisting of draws from two different normal distributions:

In [61]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [52]:
comp1 = np.random.normal(0, 1, size=200)
comp1[:10]

array([ 1.3653687 , -1.10570737, -0.85912825,  0.03266656, -0.28478459,
       -0.89554182,  0.59468262, -0.42367237,  0.96349653, -0.17532246])

In [53]:
comp2 = np.random.normal(10, 2, size=200)
comp2[:10]

array([ 7.37712576, 10.24455401, 12.15034133,  9.52559779, 11.36794135,
        8.93759235,  9.72495671, 13.22811238,  8.49889536, 11.98963929])

In [54]:
values = pd.Series(np.concatenate([comp1, comp2]))

In [63]:
sns.distplot(values, ax=ax, bins=100, color='k')



<AxesSubplot:ylabel='Density'>

In [60]:
sns.displot(data=values, bins=100, color='k', kde=True)

<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x2d4e8d32be0>

## Scatter or Point Plots

Point plots or scatter plots can be a useful way of examining the relationship between two
one-dimensional data series. For example, here we load the macrodata dataset from the statsmodels project.

In [64]:
macro = pd.read_csv("macrodata.csv")

In [65]:
macro

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,1959.0,1.0,2710.349,1707.4,286.898,470.045,1886.9,28.980,139.7,2.82,5.8,177.146,0.00,0.00
1,1959.0,2.0,2778.801,1733.7,310.859,481.301,1919.7,29.150,141.7,3.08,5.1,177.830,2.34,0.74
2,1959.0,3.0,2775.488,1751.8,289.226,491.260,1916.4,29.350,140.5,3.82,5.3,178.657,2.74,1.09
3,1959.0,4.0,2785.204,1753.7,299.356,484.052,1931.3,29.370,140.0,4.33,5.6,179.386,0.27,4.06
4,1960.0,1.0,2847.699,1770.5,331.722,462.199,1955.5,29.540,139.6,3.50,5.2,180.007,2.31,1.19
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
198,2008.0,3.0,13324.600,9267.7,1990.693,991.551,9838.3,216.889,1474.7,1.17,6.0,305.270,-3.16,4.33
199,2008.0,4.0,13141.920,9195.3,1857.661,1007.273,9920.4,212.174,1576.5,0.12,6.9,305.952,-8.79,8.91
200,2009.0,1.0,12925.410,9209.2,1558.494,996.287,9926.4,212.671,1592.8,0.22,8.1,306.547,0.94,-0.71
201,2009.0,2.0,12901.504,9189.0,1456.678,1023.528,10077.5,214.469,1653.6,0.18,9.2,307.226,3.37,-3.19


In [66]:
macro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 203 entries, 0 to 202
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   year      203 non-null    float64
 1   quarter   203 non-null    float64
 2   realgdp   203 non-null    float64
 3   realcons  203 non-null    float64
 4   realinv   203 non-null    float64
 5   realgovt  203 non-null    float64
 6   realdpi   203 non-null    float64
 7   cpi       203 non-null    float64
 8   m1        203 non-null    float64
 9   tbilrate  203 non-null    float64
 10  unemp     203 non-null    float64
 11  pop       203 non-null    float64
 12  infl      203 non-null    float64
 13  realint   203 non-null    float64
dtypes: float64(14)
memory usage: 22.3 KB


In [68]:
data = macro[['cpi', 'm1', 'tbilrate', 'unemp']]
data

Unnamed: 0,cpi,m1,tbilrate,unemp
0,28.980,139.7,2.82,5.8
1,29.150,141.7,3.08,5.1
2,29.350,140.5,3.82,5.3
3,29.370,140.0,4.33,5.6
4,29.540,139.6,3.50,5.2
...,...,...,...,...
198,216.889,1474.7,1.17,6.0
199,212.174,1576.5,0.12,6.9
200,212.671,1592.8,0.22,8.1
201,214.469,1653.6,0.18,9.2


In [69]:
trans_data = np.log(data).diff().dropna()

In [70]:
trans_data[-5:]

Unnamed: 0,cpi,m1,tbilrate,unemp
198,-0.007904,0.045361,-0.396881,0.105361
199,-0.021979,0.066753,-2.277267,0.139762
200,0.00234,0.010286,0.606136,0.160343
201,0.008419,0.037461,-0.200671,0.127339
202,0.008894,0.012202,-0.405465,0.04256


In [75]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

<IPython.core.display.Javascript object>

In [77]:
# Seaborn's 'regplot' method makes a scatter plot and fits a linear regression line
sns.regplot(x='m1', y='unemp', data=trans_data)

<AxesSubplot:xlabel='m1', ylabel='unemp'>

In exploratory data analysis it's helpful to be able to look at all the scatter plots among a group of variables; this is known as a <em>pairs</em> plot or <em>scatter plot matrix</em>. Making such a plot from scratch is a bit of work, so seaborn has a convenient <em>pairplot</em> function, which supports placing histograms or density estimates of each variable along the diagonal.

In [79]:
# the plot_kws argument lets you pass configuration options down to the individual plotting calls
# on the off-diagonal elements
sns.pairplot(trans_data, diag_kind='kde', plot_kws={'alpha': 0.2})

<IPython.core.display.Javascript object>

<seaborn.axisgrid.PairGrid at 0x2d4ea0b5670>

## Facet Grids and Categorical Data

One way to visualize data with many categorical variables is to use a <em>facet grid</em>. Seaborn has a useful built-in function <code>factorplot</code> that simplifies making many kinds of faceted plots. In recent versions of seaborn, <code>factorplot</code> has been renamed to <code>catplot</code>.

In [82]:
sns.catplot(x='day', y='tip_pct', hue='time', col='smoker', kind='bar', data=tips[tips.tip_pct < 1])

<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x2d4eb5fbeb0>

Instead of distinguishing the 'time' values by bar color within a facet, we can also expand the facet grid by adding one row per time value:

In [83]:
sns.catplot(x='day', y='tip_pct', row='time', col='smoker', kind='bar', data=tips[tips.tip_pct < 1])

<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x2d4eb72e940>

<code>catplot</code> supports other plot types that may be useful depending on what you are trying to display, for example box plots (which show the median, quartiles, and outliers):

In [84]:
sns.catplot(x='tip_pct', y='day', kind='box', data=tips[tips.tip_pct < 0.5])

<IPython.core.display.Javascript object>

<seaborn.axisgrid.FacetGrid at 0x2d4ebf453a0>

It is possible to create customized facet grid plots using the more general <code>seaborn.FacetGrid</code> class.
<p><a href="https://seaborn.pydata.org/">seaborn docs</a></p>

With tools like <a href="http://bokeh.pydata.org/">Bokeh</a> and <a href="https://github.com/plotly/plotly.py"> Plotly</a>, it's now possible to specify dynamic, interactive graphics in Python that are destined for a web browser.