# Intro to Matplotlib


#### Outline

We now turn to the part of every data report that readers notice first: **data visualization**. As the saying goes, "a picture is worth a thousand words". Can you imagine a presentation or report on data analysis without graphs? Probably not.

We start with the basic concepts and technical components of a graph. Then we cover the basic graph options, e.g., title, labels, and legends: **the indispensable components of a chart**.


In [48]:
import matplotlib.pyplot as plt # import the library for plotting
import pandas as pd # import the library of data management
import seaborn as sns # import the library of beautiful plots

# set the figures display in the talk context
sns.set_context("talk") 
sns.set_style("darkgrid")

In [49]:
# set figure parameters
params = {
   'figure.figsize': [12, 8],
   'xtick.labelsize': 16,
   'ytick.labelsize': 16
   }
plt.rcParams.update(params)
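
`plt.rcParams.update` changes the defaults for every figure created afterwards. If you only want the settings for one figure, a temporary variant is possible. The sketch below is an illustration only: it assumes Matplotlib's `plt.rc_context`, and the name `temp_params` is made up for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical parameter dictionary, using the same kind of keys as above
temp_params = {"figure.figsize": [12, 8], "xtick.labelsize": 16}

# The settings apply only to figures created inside the "with" block
with plt.rc_context(temp_params):
    fig, ax = plt.subplots()

# Read the figure size back to confirm the setting was picked up
width, height = fig.get_size_inches()
print(width, height)
```

Figures created after the `with` block go back to whatever defaults were active before it.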

## Matplotlib

Python's leading graphics package is Matplotlib, which is designed for creating publication-quality plots.

Matplotlib can be used in a number of different ways:

* Approach #1: Create figure objects and apply them to the data in the dataframe.
* Approach #2: Create figure objects and apply plot methods of the dataframes.

They call on similar functionality, but use different syntax to get it.

We are going to plot two factors, `SMB` and `HML`, using the approach we introduced before: generate an object (two objects, in fact) and apply methods to them to produce the various elements of a graph: the data, the axes, the labels, and so on.

We do this, as usual, one step at a time.

**Create objects.** **`fig`** represents the canvas, while **`ax`** is the object in charge of the details of each graph on the canvas, e.g., the type of graph, the colors, and the axes.


We'll see this ONE line over and over:


In [50]:
fig, ax = plt.subplots()         # create fig and ax objects

Note that we're using the pyplot function `subplots()`, which creates the objects `fig` and `ax` on the left. The `subplots()` function produces a blank figure, which is displayed in the Jupyter notebook. The names `fig` and `ax` can be anything, but these choices are standard.

We say `fig` is a **figure object** and `ax` is an **axes object**. (Try `type(fig)` and `type(ax)` to see why.) Once more, the words don't mean what we might think they mean:

* `fig` is a blank canvas for creating a figure.
* `ax` is everything in it: axes, labels, lines or bars, legend, and so on.

Once we have the objects, we apply methods to them to create graphs.
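
To make the relationship between the two objects concrete, here is a minimal sketch that inspects them. It uses only the objects returned by `plt.subplots()`; the imports of `Figure` and `Axes` are there just for the type checks.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

fig, ax = plt.subplots()          # one canvas holding one set of axes

print(type(fig))                  # the canvas is a Figure
print(type(ax))                   # the plotting area is an Axes (or a subclass)

# The figure keeps a list of the axes drawn on it,
# and each axes object knows which figure it belongs to
print(fig.axes == [ax])
print(ax.figure is fig)
```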

#### Approach #1: Create figure objects and apply them to the data in the dataframe

**Create graphs.** We create graphs by applying plot-like methods to `ax`. In this first approach we call the `ax` methods directly and pass them the data from the dataframe.

We can use `ax.plot` to draw the lines. Since we want to plot each factor against the date stored in the index, we need to be careful about the type of the index.

In [51]:
ff_factors_ann = pd.read_csv('factor2024.csv')
ff_factors_ann['Date'] = ff_factors_ann['Date'].astype(str)
ff_factors_ann['period'] = pd.PeriodIndex(ff_factors_ann['Date'],freq='A')
ff_factors_ann['Date'] = ff_factors_ann['period']
ff_factors_ann.set_index(['Date'],inplace=True)
ff_factors_ann.drop(columns=['period'],inplace=True)

In [53]:
ff_factors_ann.tail()

In [54]:
type(ff_factors_ann.index)

In [55]:
ff_factors_ann.index.year
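
The cells above turn the `Date` column into a `PeriodIndex` and then read the year back as plain integers. The sketch below repeats that idea on a tiny stand-in dataframe, since the CSV file is not available here. The name `toy` and its numbers are made up for illustration and are not real factor data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the factor file (NOT real Fama-French data)
toy = pd.DataFrame(
    {"SMB": [1.0, -2.0, 3.0, 0.5]},
    index=pd.PeriodIndex(["2019", "2020", "2021", "2022"], freq="Y"),
)

print(type(toy.index))            # a PeriodIndex, one period per year
years = toy.index.year            # plain integer years
print(list(years))

# Integer years give ax.plot ordinary numbers for the x-axis
fig, ax = plt.subplots()
lines = ax.plot(years, toy["SMB"])
```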

In [57]:
fig, ax = plt.subplots()        # create the fig and ax objects
ax.plot(ff_factors_ann.index.year,ff_factors_ann['SMB'])   # x = year, y = SMB
ax.plot(ff_factors_ann.index.year,ff_factors_ann['HML'])   # x = year, y = HML

#### Approach #2: Create figure objects and apply plot methods to dataframes.

In this case, it is actually more convenient to use the dataframe's `plot()` method. Select the columns to plot as the y-values and pass the axes object with the keyword argument `ax=`. The x-values default to the index of the dataframe.

In [59]:
ff_factors_ann[['SMB','HML']]

In [60]:
fig, ax = plt.subplots()        # create the fig and ax objects
ff_factors_ann[['SMB','HML']].plot(ax=ax)
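
The following sketch shows what the dataframe's `plot()` method does with the `ax` we hand it. The dataframe `toy` and its values are made up for illustration; only the column names `SMB` and `HML` come from the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, indexed by year (NOT real factor data)
toy = pd.DataFrame(
    {"SMB": [1.0, -2.0, 3.0], "HML": [0.5, 1.5, -1.0]},
    index=[2020, 2021, 2022],
)

fig, ax = plt.subplots()
returned = toy[["SMB", "HML"]].plot(ax=ax)   # draw on the axes we created

print(returned is ax)                        # plot() hands back the same axes object
labels = [line.get_label() for line in ax.get_lines()]
print(labels)                                # one line per column, labeled by column name
```

Because the method draws on our own `ax`, everything we later do with `ax` (titles, labels, limits) applies to this plot.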

In [61]:
fig, axe = plt.subplots()        # create axis object axe
ff_factors_ann[['SMB','HML']].plot(ax=axe,
                                   kind='line',
                                   color=[sns.xkcd_rgb["bluey green"],sns.xkcd_rgb["pale red"]])

In [62]:
fig, axe = plt.subplots()        # create axis object axe
ff_factors_ann[['SMB','HML']].plot(ax=axe,
                                   kind='line',
                                   color=[sns.xkcd_rgb["bluey green"],sns.xkcd_rgb["pale red"]],
                                   title='SMB factor vs. HML factor')


---
## Graph Basic Options

### Basic Functionalities

A couple of basic elements that we **ALWAYS NEED**: (i) a title; (ii) well-labeled x and y axes; (iii) sometimes a legend. A well-labeled graph tells the reader two things: what each quantity is and what units it is in. The code below does this using the methods associated with our axes object `ax`.


Here we use the line plot as an example. The options we learn here apply *in general* to other graphs as well.

So let's figure out the following questions now:

* How to set axis labels?
* How to set titles?
* How to set the legends?

In [64]:
fig, ax = plt.subplots()
ff_factors_ann[['SMB','HML']].plot(ax=ax,
                                   color=[sns.xkcd_rgb["bluey green"],sns.xkcd_rgb["pale red"]])


##################################################################################

ax.set_title('SMB factor vs. HML factor', loc='center', fontsize=14, fontweight = "bold") 
# The title is familiar; the new option loc= sets its horizontal position

ax.set_xlabel("Year")
ax.set_ylabel("Factor") 
ax.legend(['SMB factor','HML factor'])
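
A variant worth knowing, sketched here with plain `ax.plot` calls and made-up numbers: attach a `label=` to each line when it is drawn, then call `ax.legend()` without arguments. The values and the y-axis wording are assumptions for illustration only.

```python
import matplotlib.pyplot as plt

years = [2020, 2021, 2022]

fig, ax = plt.subplots()
ax.plot(years, [1.0, -2.0, 3.0], label="SMB factor")   # made-up values
ax.plot(years, [0.5, 1.5, -1.0], label="HML factor")   # made-up values

ax.set_title("SMB factor vs. HML factor", loc="center", fontsize=14, fontweight="bold")
ax.set_xlabel("Year")
ax.set_ylabel("Factor return (made-up values)")

legend = ax.legend()              # picks up the label= given to each line
legend_entries = [text.get_text() for text in legend.get_texts()]
print(ax.get_title(), ax.get_xlabel(), legend_entries)
```

With this pattern each label travels with its own line, so the legend cannot get out of order when lines are added or removed.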

### More graph options

Now let's use the same dataset to learn more graph options.

**Getting ready for the data...**

In [66]:
fig, ax = plt.subplots() # Same deal here...

ax.scatter(ff_factors_ann['SMB'], ff_factors_ann["HML"],     # x,y variables 
            alpha= 0.5) # alpha sets the transparency of the markers (0 = invisible, 1 = opaque)

##### Making it look more informative... Please pay attention to the additional code


In [68]:
fig, ax = plt.subplots() # Same deal here...

ax.scatter(ff_factors_ann['SMB'], ff_factors_ann["HML"],     # x,y variables 
            alpha= 0.5) # alpha sets the transparency of the markers

##################################################################################

ax.set_title('SMB vs. HML', loc='center', fontsize=14, fontweight = "bold") 
# The title is familiar; loc= sets its horizontal position

ax.set_xlabel("SMB")
ax.set_ylabel("HML")

# The legend matters little with a single series, but think about what happens 
# when you plot two groups (e.g., two countries) on the same axes
ax.legend(["Fama French Data"]) # The legend



#### Setting limits and saving graph

Now we are ready for setting xlim, ylim, and other handy graph details...

In [70]:
fig, ax = plt.subplots() # Same deal here...

ax.scatter(ff_factors_ann['SMB'], ff_factors_ann["HML"],     # x,y variables 
           alpha= 0.50, 
           s=50) # s sets the marker size (in points squared);
                 # here we make the markers smaller

ax.set_title('SMB vs. HML', loc='center', fontsize=14, fontweight = "bold") 
# The title is familiar; loc= sets its horizontal position

ax.set_xlabel("SMB")
ax.set_ylabel("HML")

# The legend matters little with a single series, but think about what happens 
# when you plot two groups (e.g., two countries) on the same axes
ax.legend(["Fama French Data"],loc='upper left',frameon=True) # The legend, with a box...

##################################################################################
# This is the new stuff...

ax.set_ylim(-60,60) # This sets the y-limits
ax.set_xlim(-60,60) # This sets the x-limits

###### Save the figure to the folder where the notebook is.
fig.savefig("ff_smb_hml.png")
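
Here is a small self-contained sketch of the two new steps, setting limits and saving. The data points are made up, the file is written to a temporary folder so nothing in your working directory is touched, and the file name `toy_scatter.png` is hypothetical. The `dpi` and `bbox_inches` arguments are optional extras of `savefig` that the cell above does not use.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([-10, 0, 25], [5, -20, 15], alpha=0.5, s=50)   # made-up points

ax.set_xlim(-60, 60)              # fix the x-range
ax.set_ylim(-60, 60)              # fix the y-range
print(ax.get_xlim(), ax.get_ylim())

# Save to a temporary folder; dpi controls resolution,
# bbox_inches="tight" trims surplus white space around the plot
out_path = os.path.join(tempfile.mkdtemp(), "toy_scatter.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
print(os.path.exists(out_path))
```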

---
### Time to practice

1. Can you create a new code cell and do the following...

    1) Plot three factors: market, size, and value in a line plot and add x and y labels, legends, figure title.
    2) Do a scatter plot between market factor and size factor, adding x and y labels, legends, figure title etc.

### Advanced Functionalities

So far, we have learned the basic options that make a graph look nicer and closer to publication quality. Here we add a few more tools:

* Text
* Annotation
* Subplots/Multiple plots

Let's start with Text!


#### Text

It is often useful to annotate certain features of the plot to draw the reader's attention. This can be done manually with the `ax.text` command, which will place text at a particular x/y value.

*US GDP*. Let's first look at several years of US GDP and Consumption. 

In [73]:
gdp  = [13271.1, 13773.5, 14234.2, 14613.8, 14873.7, 14830.4, 14418.7,
        14783.8, 15020.6, 15369.2, 15710.3]
cons  = [8867.6, 9208.2, 9531.8, 9821.7, 10041.6, 10007.2, 9847.0, 10036.3,
        10263.5, 10449.7, 10699.7]
year = list(range(2003,2014))        # use range for years 2003-2013

# Note that we set the index
us_df = pd.DataFrame({'gdp': gdp, 'cons': cons}, index=year)
print(us_df)

In [74]:
fig, ax = plt.subplots()        # create the fig and ax objects
us_df.plot(ax=ax,legend=False,color=[sns.xkcd_rgb["bluey green"],sns.xkcd_rgb["pale red"]])            # ax=ax draws on our axes object

ax.text(2009,15000,'GDP',color=sns.xkcd_rgb["bluey green"],fontsize=16)
ax.text(2008,10500,'Consumption',color=sns.xkcd_rgb["pale red"],fontsize=16)

The `ax.text` method takes an x position, a y position, a string, and then optional keywords specifying the color, size, style, alignment, and other properties of the text.
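
As a minimal sketch of those optional keywords, the example below places a label directly above one data point and controls its alignment. It reuses three of the GDP values listed earlier (2003, 2008, 2013); the choice of which point to label is only for illustration.

```python
import matplotlib.pyplot as plt

years = [2003, 2008, 2013]
gdp = [13271.1, 14830.4, 15710.3]     # three values from the GDP list above

fig, ax = plt.subplots()
ax.plot(years, gdp)

# ha/va set how the text is anchored relative to the (x, y) position:
# centered horizontally, with the bottom of the text sitting on the point
label = ax.text(2008, 14830.4, "2008", ha="center", va="bottom",
                fontsize=12, color="gray")

print(label.get_position(), label.get_text())
```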

#### Annotation

Sometimes, in addition to adding text, we also want to point at a particular data point with a simple arrow. This can be done with the `ax.annotate` method.

In [76]:
ind_port_df = pd.read_csv("industry_portfolio.csv")

In [78]:
ind_port_df

In [35]:
ind_port_df['Date'] = ind_port_df['Date'].astype(str)
ind_port_df['period'] = pd.PeriodIndex(ind_port_df['Date'],freq='M')
ind_port_df['Date'] = ind_port_df['period']
ind_port_df.set_index(['Date'],inplace=True)
ind_port_df.drop(columns=['period'],inplace=True)

In [79]:
ind_port_df

In [81]:
fig, ax = plt.subplots() # Same deal here...

ax.scatter(ind_port_df["Mkt"], ind_port_df["Cnsmr"],     # x,y variables 
           alpha= 0.50,
           s=50) # s sets the marker size (in points squared);
                 # here we make the markers smaller

ax.set_title('Market vs. Consumer Industry', loc='center', fontsize=14, fontweight = "bold") 
# The title is familiar; loc= sets its horizontal position

ax.set_xlabel("Monthly market return")
ax.set_ylabel("Monthly consumer industry return")

# The legend matters little with a single series, but think about what happens 
# when you plot two groups (e.g., two countries) on the same axes
ax.legend(["Fama French Data"],loc='upper left',frameon=True) # The legend, with a box...

##################################################################################
# This is the new stuff...

ax.set_ylim(-15,15) # This sets the y-limits
ax.set_xlim(-15,15) # This sets the x-limits

############################################################################
# This is something new

ax.annotate(
    "What Year Month is This?", 
    xy=(-10, -9), # This is where we point at...
    xycoords="data", 
    xytext=(0, -12), # This is about where the text is
    horizontalalignment="left", # How the text is aligned
    arrowprops={
        "arrowstyle": "-|>", # This is stuff about the arrow
        "connectionstyle": "angle3,angleA=5,angleB=110",
        "color": "black"
    },
    fontsize=12,
);
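
The annotation above asks which month the marked point belongs to. One way to answer that in code, sketched here on made-up data, is to look up the extreme observation first and then annotate it with its own index label. The dataframe `toy` and its values are hypothetical; only the column names `Mkt` and `Cnsmr` come from the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly returns (NOT the real industry portfolio data)
toy = pd.DataFrame(
    {"Mkt": [1.2, -9.5, 3.1, 0.4], "Cnsmr": [0.8, -8.7, 2.5, -0.3]},
    index=pd.period_range("2020-01", periods=4, freq="M"),
)

worst = toy["Mkt"].idxmin()                       # month with the lowest market return
x, y = toy.loc[worst, "Mkt"], toy.loc[worst, "Cnsmr"]

fig, ax = plt.subplots()
ax.scatter(toy["Mkt"], toy["Cnsmr"], alpha=0.5, s=50)

note = ax.annotate(
    str(worst),                                   # label the point with its month
    xy=(x, y), xycoords="data",                   # where the arrow points
    xytext=(x + 3, y - 1),                        # where the text sits
    arrowprops={"arrowstyle": "-|>", "color": "black"},
    fontsize=12,
)
print(note.get_text(), note.xy)
```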


#### Subplots

This is the first time we pass arguments to the `plt.subplots` function. The parameters are fairly self-explanatory, and the comments in the code explain them in case you are not familiar with them.


In [82]:
ff_factors_ann.head()

In [83]:
fig, ax = plt.subplots(nrows = 2, ncols = 1) 

# Same deal as before, but here we tell subplots how many plots we want, given by
# the number of rows and columns of the grid.
# (The optional sharex=True argument, not used here, would make the plots share the same x-axis.)

# IMPORTANT... now ax is an array holding two axes objects, so ax[0] refers to 
# the first plot and ax[1] to the second...

ax[0].scatter(ff_factors_ann['SMB'], ff_factors_ann["HML"],     # x,y variables 
           alpha= 0.50,
           s=50) # s sets the marker size (in points squared);
                 # here we make the markers smaller
    
ax[1].scatter(ff_factors_ann['RF'], ff_factors_ann["Mkt-RF"],     # x,y variables 
           alpha= 0.50,
           s=50,
           color='green') 

#######################################################################################

# Now let's make it nice looking...add a Title for everything...

fig.suptitle("Fama French 3 Factors", fontsize = 14, fontweight = "bold",x=0.525,y=0.98)
# # This is new: Add a centered title to the figure. x: the x location of the title in figure coordinates (default=0.5)
# # y: the y location of the title in figure coordinates (default=0.98).

ax[0].set_title('SMB vs. HML', loc='center',fontsize=12) 
ax[1].set_title('Market Premium vs. Risk Free', loc='center',fontsize=12) 

ax[0].set_xlabel("SMB",fontsize=12)
ax[0].set_ylabel("HML",fontsize=12)

ax[1].set_xlabel("Risk Free",fontsize=12)
ax[1].set_ylabel("Market Premium",fontsize=12)
plt.tight_layout() #Automatically adjust subplot parameters so that the whole subplots area (including labels) will fit.
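
To see what `plt.subplots` returns when more than one plot is requested, here is a short sketch with made-up numbers. It also shows the optional `sharex=True` argument, which the example above does not use.

```python
import matplotlib.pyplot as plt

# Two stacked plots that share one x-axis
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
print(type(axes).__name__, axes.shape)    # a 1-D array holding two axes objects

axes[0].plot([0, 5, 10], [1, 3, 2])       # made-up values
axes[1].plot([0, 5, 10], [2, 1, 4])       # made-up values
axes[0].set_xlim(0, 10)                   # with sharex=True this also moves axes[1]
print(axes[1].get_xlim())

# With several rows AND several columns the array becomes 2-D,
# so each plot is addressed as grid[row, column]
fig2, grid = plt.subplots(nrows=2, ncols=2)
grid[0, 1].set_title("row 0, column 1")
print(grid.shape)
```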

---
### Time to practice


2. Can you create a new code cell and do the following...

    - Can you make the same type of plot for Market and High Tech, and for Market and Health? What do you see?
    - Can you place the two subplots horizontally instead of vertically?

---
## Graph Types

### Line Charts

For line charts, this is mostly a review of the earlier session and covers the most fundamental case.

A line chart is most suitable for analyzing time trends in our dataset but is usually not a great tool for cross-sectional data.

In [86]:
fig, ax = plt.subplots()        # create axis object ax
us_df.plot(ax=ax)                  # ax=ax draws on our axes object

### Bar Charts

A bar chart is most suitable for comparing feature values across the observations in cross-sectional data. It is usually not a great tool for time series analysis.

*World Bank*. Now let's look at 2013 data for GDP per capita (basically income per person) for several countries:

In [87]:
code    = ['USA', 'FRA', 'JPN', 'CHN', 'IND', 'BRA', 'MEX']
country = ['United States', 'France', 'Japan', 'China', 'India',
             'Brazil', 'Mexico']
gdppc   = [53.1, 36.9, 36.3, 11.9, 5.4, 15.0, 16.5]

wb_df = pd.DataFrame({'gdppc': gdppc, 'country': country}, index=code)

print(wb_df)

In [88]:
fig, axe = plt.subplots()        # create axis object axe
wb_df.plot.bar(ax=axe)           # ax= looks for axis object, axe is it

We can also easily make the bars horizontal with the `barh` method, where `h` stands for *horizontal*. Pretty intuitive, right?

In [89]:
fig, axe = plt.subplots()        # create axis object axe
wb_df.plot.barh(ax=axe)           # ax= looks for axis object, axe is it
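
A possible refinement, offered as a sketch rather than the only way to do it: sort the dataframe before plotting so the bars are ranked, and use the `country` column for the bar labels instead of the three-letter codes. The data are the same as in `wb_df` above; the name `ranked` is made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same numbers as the wb_df example above
code = ['USA', 'FRA', 'JPN', 'CHN', 'IND', 'BRA', 'MEX']
country = ['United States', 'France', 'Japan', 'China', 'India',
           'Brazil', 'Mexico']
gdppc = [53.1, 36.9, 36.3, 11.9, 5.4, 15.0, 16.5]
wb_df = pd.DataFrame({'gdppc': gdppc, 'country': country}, index=code)

ranked = wb_df.sort_values('gdppc')               # smallest value first

fig, ax = plt.subplots()
ranked.plot.barh(x='country', y='gdppc', ax=ax)   # country names label the bars
ax.set_xlabel('GDP per capita, 2013')

# Each bar is a rectangle whose width equals the plotted value
bar_lengths = [bar.get_width() for bar in ax.patches]
print(bar_lengths)
```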

### Pie

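This type of chart is often used to show the composition of something. Here we again use the `wb_df` dataset to illustrate the syntax. Keep in mind that `gdppc` is GDP per capita, so each slice shows a country's share of the sum of these per-capita figures, not its share of world GDP. Even so, the chart lets us compare the countries' relative sizes directly and intuitively.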



In [90]:
fig, axe = plt.subplots()        # create the fig and axe objects
axe.pie(x=wb_df['gdppc'],labels=wb_df['country']);           # x= gives the slice sizes, labels= the slice names
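
If you want the percentages printed on the slices, `pie` accepts an `autopct` format string. The sketch below uses the same numbers as `wb_df`; as before, the shares are shares of the sum of the per-capita figures and serve only to illustrate the syntax.

```python
import matplotlib.pyplot as plt

# Same numbers as the wb_df example above
country = ['United States', 'France', 'Japan', 'China', 'India',
           'Brazil', 'Mexico']
gdppc = [53.1, 36.9, 36.3, 11.9, 5.4, 15.0, 16.5]

fig, ax = plt.subplots()

# With autopct set, pie returns the wedges, the name labels,
# and the percentage labels drawn inside the wedges
wedges, texts, autotexts = ax.pie(x=gdppc, labels=country, autopct='%1.1f%%')

first_share = autotexts[0].get_text()     # percentage text of the first slice
print(len(wedges), first_share)
```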

---
### Time to practice


3. Consider the data in the following:

In [50]:
import pandas as pd
data = {'Food': ['French Fries', 'Potato Chips', 'Bacon', 'Pizza', 'Chili Dog'],
        'Calories per 100g':  [607, 542, 533, 296, 260]}
cals = pd.DataFrame(data)

The dataframe `cals` contains the calories in 100 grams of several different foods. We'll create and modify visualizations of this data:

* Set `'Food'` as the index of `cals`.
* Create a bar chart with `cals` using figure and axis objects.
* Add a title.
* Change the color of the bars.  What color do you prefer?
* Add the argument `alpha=0.5`.  What does it do?
* Change your chart to a horizontal bar chart.  Which do you prefer?
* Hide the legend.
* Hide the x/y-axis labels.

---
## Summary


* **`Matplotlib`**: we have covered how to plot using the `fig` and `ax` objects.

* **Graph Basic Options**: 
    * Set axis labels and titles (mandatory), sometimes legends.
    * Other useful graph options
    
* **Advanced Functionalities