# Data Visualization
- Matplotlib
- Basic plots
- Figures and subplots
- More plot settings
    - Plotting variables
    - Plotting time-series data

## 1. Matplotlib
- Matplotlib is a Python 2D plotting library that produces publication-quality figures
- Documentation: https://matplotlib.org/contents.html

<img src="http://wiki.openhatch.org/images/d/d8/Matplotlib_gallery.png" style="width: 10px"/>
<br>
- ```matplotlib.pyplot``` is a collection of command-style functions that make Matplotlib work like MATLAB
    - Each ```pyplot``` function makes some change to a figure: e.g., it creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels
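
As a minimal sketch of this stateful behavior (the data values and the title text are made up for illustration), each ```pyplot``` call below acts on the "current" figure and plotting area, which can be retrieved with ```plt.gcf()``` and ```plt.gca()```:

```python
import matplotlib.pyplot as plt

plt.figure()                      # creates a figure and makes it current
plt.plot([1, 2, 3], [2, 4, 6])    # creates a plotting area (axes) and draws a line in it
plt.title('Illustrative line')    # decorates the current axes with a title

fig = plt.gcf()                   # "get current figure"
ax = plt.gca()                    # "get current axes"
print(len(fig.axes))              # number of plotting areas in the figure
print(ax.get_title())             # title set by plt.title() above
```

Outside a notebook, adding ```plt.show()``` at the end displays the figure.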

In [None]:
# importing pyplot
import matplotlib.pyplot as plt   # usually, pyplot is imported under alias plt
import numpy as np
import pandas as pd

## 2. Basic plots
- Plots can be drawn using the ```plot()``` function

In [None]:
# creating simple plot
plt.plot(np.arange(1, 6))     # Integers 1 to 5
plt.ylabel('Five integers')   # setting y-label axis
plt.show()

In [None]:
# plot() with two parameters, x and y
# x = np.arange(1,6)
# y = np.arange(6, 11)
plt.plot(np.arange(1,6), np.arange(6, 11))
plt.show()

In [None]:
# creating a scatterplot
plt.plot(np.arange(1,6), np.arange(1,6), 'bo')    # set style as 'blue circle'
plt.show()

In [None]:
# it is possible to create different plots and visualize them together
plt.plot(np.arange(1,6), np.arange(6, 11), 'ro')        # set style as 'red circle'
plt.plot(np.arange(1,6), np.arange(1,6), 'bs')          # set style as 'blue square'
plt.plot(np.arange(1,6), np.linspace(1, 3, 5), 'g^')    # set style as 'green triangle'
plt.show()

In [None]:
# several (x, y, style) groups can also be passed in a single plot() call
plt.plot(np.arange(1,6), np.arange(6, 11), 'ro', np.arange(1,6), np.arange(1,6), 'bs')
plt.show()
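
The style strings used in the cells above combine a color letter with a marker symbol, optionally followed by a line style. The following sketch (with illustrative data and hypothetical variable names) suggests how a string such as ```'ro'``` corresponds to explicit keyword arguments:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)

plt.figure()
# format string: 'r' = red, 'o' = circle marker; no line style given, so markers only
(short_line,) = plt.plot(x, x + 5, 'ro')
# the same style written out with keyword arguments
(long_line,) = plt.plot(x, x, color='red', marker='o', linestyle='None')
# a format string may also carry a line style: green triangles joined by a dashed line
(dashed_line,) = plt.plot(x, np.linspace(1, 3, 5), 'g^--')

print(short_line.get_marker(), long_line.get_marker())
print(dashed_line.get_linestyle())
```

Keyword arguments are more verbose, but they tend to be easier to read once a plot needs more than a color and a marker.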

In [None]:
# setting axis limits
plt.plot(np.arange(1,6), np.arange(1,6), 'bo')
plt.axis([0, 10, 0, 10])    # [x_min, x_max, y_min, y_max]
plt.show()
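
As an alternative sketch, the same limits can be set one axis at a time with ```plt.xlim()``` and ```plt.ylim()```; calling ```plt.axis()``` without arguments then reports the current limits:

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
plt.plot(np.arange(1, 6), np.arange(1, 6), 'bo')
plt.xlim(0, 10)          # x-axis from 0 to 10
plt.ylim(0, 10)          # y-axis from 0 to 10

# called without arguments, axis() returns the current limits
x_min, x_max, y_min, y_max = plt.axis()
print(x_min, x_max, y_min, y_max)
```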

## 3. Figures and subplots
- A ```Figure``` is like a sheet of paper on which plots are drawn
- A ```Subplot``` is one of the plots in a ```Figure```
    - ```Subplots``` can be generated using the ```add_subplot()``` function
- When ```Figure``` and ```Subplot``` are not set explicitly, they are generated automatically
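
A short sketch of both points, using hypothetical variable names: the three-digit argument to ```add_subplot()``` is shorthand for (rows, columns, position), and a bare ```plt.plot()``` call creates a figure and a subplot on demand.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
top = fig.add_subplot(2, 1, 1)    # 2 rows, 1 column, position 1 (same as 211)
bottom = fig.add_subplot(212)     # 2 rows, 1 column, position 2
n_explicit = len(fig.axes)        # subplots created explicitly
print(n_explicit)

# with no figure or subplot set up beforehand, pyplot creates both automatically
plt.close('all')                  # start from a clean state
plt.plot([1, 2, 3])
auto_fig = plt.gcf()
print(len(auto_fig.axes))         # one subplot was generated for the plot
```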

In [None]:
# generating figure and subplot
fig_one = plt.figure(1)
fig_one.add_subplot(111)     # generate subplot of size 1 X 1 (first and only position)
plt.show()

In [None]:
# if one wants to create several plots ...
fig_one = plt.figure(1)
plot_one = fig_one.add_subplot(211)     # generate subplot of size 2 X 1 (first position)
plot_two = fig_one.add_subplot(212)     # generate subplot of size 2 X 1 (second position)
plt.show()

In [None]:
# if one wants to place several plots side by side ...
fig_one = plt.figure(1)
plot_one = fig_one.add_subplot(121)     # generate subplot of size 1 X 2 (first position)
plot_two = fig_one.add_subplot(122)     # generate subplot of size 1 X 2 (second position)
plt.show()

In [None]:
# generating figure with different sizes
fig_one = plt.figure(figsize = (8, 4))
plot_one = fig_one.add_subplot(211)     # generate subplot of size 2 X 1 (first position)
fig_two = plt.figure(figsize = (4, 8))
plot_two = fig_two.add_subplot(121)     # generate subplot of size 1 X 2 (first position)
plt.show()

In [None]:
# drawing plots in multiple axes
x = np.arange(5)
y = np.arange(5)
z = y ** 2
fig_one = plt.figure(figsize = (8, 4))
fig_one.add_subplot(211)     # generate subplot of size 2 X 1 (first position)
plt.plot(x, y, 'ro')
fig_one.add_subplot(212)     # generate subplot of size 2 X 1 (second position)
plt.plot(y, z, 'g^')
plt.show()
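
The same layout can also be sketched with ```plt.subplots()```, which returns the figure and its subplots together so that each plot is drawn by calling methods on a specific subplot rather than on the "current" one. The axis labels below are illustrative additions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
y = x
z = x ** 2

# plt.subplots() returns the figure and an array of axes in one call
fig, axes = plt.subplots(2, 1, figsize=(8, 4))
axes[0].plot(x, y, 'ro')      # draw on the top subplot
axes[0].set_ylabel('y')
axes[1].plot(x, z, 'g^')      # draw on the bottom subplot
axes[1].set_ylabel('z')
fig.tight_layout()            # reduce overlap between the two subplots
```

Addressing each subplot by name avoids depending on the order in which subplots were created.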

In [None]:
# tuning figure settings
plt.rcParams["figure.figsize"] = (10,4)
plt.rcParams['lines.linewidth'] = 1
plt.rcParams['figure.facecolor'] = 'y'
plt.rcParams['axes.grid'] = True

x = np.arange(5)
y = np.arange(5)
fig_one = plt.figure()     
fig_one.add_subplot(111)      
plt.plot(x, y)

plt.rcParams['lines.linewidth'] = 5
plt.rcParams['figure.facecolor'] = 'w'
plt.rcParams['axes.grid'] = False

fig_two = plt.figure()
fig_two.add_subplot(111)
plt.plot(x,y)

plt.show()
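
Because ```plt.rcParams``` changes stay in effect for every later plot, one possible alternative is ```plt.rc_context()```, which applies settings only inside a ```with``` block. A minimal sketch with illustrative values:

```python
import matplotlib.pyplot as plt

width_before = plt.rcParams['lines.linewidth']

# settings passed to rc_context apply only until the with-block ends
with plt.rc_context({'lines.linewidth': 5, 'axes.grid': True}):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    (thick_line,) = ax.plot([0, 1, 2], [0, 1, 2])

width_after = plt.rcParams['lines.linewidth']
print(thick_line.get_linewidth())       # line drawn with the temporary width
print(width_before == width_after)      # global setting is unchanged afterwards
```

To undo earlier global changes altogether, ```plt.rcdefaults()``` restores Matplotlib's built-in defaults.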

## 4. More plot settings
- Plotting variables
- Plotting time-series data

### Plotting variables

In [None]:
# import glass dataset
col_names = ['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type']
df = pd.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data', \
                   sep = ',', header = None, index_col = 0, names = col_names)
print(df.head())

In [None]:
# Creating scatterplot with one variable
# Note that this plot does not convey much information - index is meaningless
plt.plot(df['RI'], 'ro')
plt.xlabel('Index')
plt.ylabel('RI')
plt.show()

In [None]:
# Creating scatterplot with two variables
# Now we can see the relationship between the two variables
plt.plot(df['Na'], df['Mg'], 'bo')
plt.xlabel('Na')
plt.ylabel('Mg')
plt.show()
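
A third variable can be brought into such a plot through color. The sketch below uses ```scatter()``` on a small made-up table (the values are hypothetical and are not taken from the glass dataset) and colors each point by its ```Type```:

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical values, NOT taken from the glass dataset
toy = pd.DataFrame({
    'Na':   [13.6, 13.9, 13.2, 14.8, 14.4, 14.1],
    'Mg':   [4.5, 3.6, 3.7, 0.0, 0.0, 0.3],
    'Type': [1, 1, 2, 7, 7, 7],
})

fig, ax = plt.subplots()
points = ax.scatter(toy['Na'], toy['Mg'], c=toy['Type'], marker='o')  # color encodes Type
ax.set_xlabel('Na')
ax.set_ylabel('Mg')
fig.colorbar(points, ax=ax, label='Type')   # maps colors back to Type values
```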

In [None]:
# Plotting two variables against the index
# Labeling each variable so the legend can distinguish them
# Now the index serves as a shared x-axis for comparing the two
plt.plot(df['Na'], 'bo', label = 'Na')
plt.plot(df['Mg'], 'r^', label = 'Mg')
plt.xlabel('Index')
plt.legend()                               # adding legend to plot
plt.title('Plotting Na and Mg variables')  # adding title to plot
plt.show()

In [None]:
# changing settings
plt.plot(df['Na'], 'bo', label = 'Na')
plt.plot(df['Mg'], 'r^', label = 'Mg')
plt.xlabel('Index')
plt.legend(loc = 'lower left')             # moving legend 
plt.title('Plotting Na and Mg variables', fontdict = {'fontsize': 20})  # making title bigger
plt.show()

### Plotting time-series data

In [None]:
# import NationalNames Dataset
# Dataset containing yearly counts of each name from 1880 onward
df = pd.read_csv('NationalNames.csv', index_col = 'Name')
print(df.head())
del df['Id']       # delete Id column
print(df.head())

In [None]:
# extracting only partial information
df_partial = df.loc[['Jane', 'Alice', 'Elizabeth', 'Stella', 'Mathilda']]
df_partial.to_csv('NationalNames_partial.csv')

In [None]:
# importing partial dataset
df_partial = pd.read_csv('NationalNames_partial.csv', index_col = 'Name' )
print(df_partial.head())

In [None]:
# select only name 'Jane'
df_jane = df_partial.loc[['Jane']]
df_jane.set_index('Gender', inplace=True)
print(df_jane.head())

In [None]:
# separate statistics of female/male 
jane_female = df_jane.loc[['F']]
jane_male = df_jane.loc[['M']]
print(jane_female.head())
print(jane_male.head())
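
The same selection can be sketched with boolean masks instead of re-indexing. The table below is made up for illustration and only assumes the columns ```Name```, ```Year```, ```Gender```, and ```Count``` used in this section:

```python
import pandas as pd

# hypothetical rows that mimic the assumed columns Name, Year, Gender, Count
toy = pd.DataFrame({
    'Name':   ['Jane', 'Jane', 'Jane', 'Jane', 'Alice'],
    'Year':   [1880, 1880, 1881, 1881, 1880],
    'Gender': ['F', 'M', 'F', 'M', 'F'],
    'Count':  [200, 5, 210, 6, 1400],
})

is_jane = toy['Name'] == 'Jane'                        # True for rows named Jane
jane_female = toy[is_jane & (toy['Gender'] == 'F')]    # Jane, female rows only
jane_male = toy[is_jane & (toy['Gender'] == 'M')]      # Jane, male rows only
print(jane_female)
print(jane_male)
```

Masks leave the original index untouched, which can be convenient when several filters are combined.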

In [None]:
plt.rcParams["figure.figsize"] = (10,5)
plt.rcParams['lines.linewidth'] = 1

plt.figure(1)
plt.subplot(111)
plt.plot(jane_female['Year'], jane_female['Count'], 'r--', label = 'Female')
plt.plot(jane_male['Year'], jane_male['Count'], 'b:', label = 'Male')
plt.legend()
plt.show()

In [None]:
# calibrating settings for visibility
plt.rcParams["figure.figsize"] = (15,5)
plt.rcParams['lines.linewidth'] = 3

plt.figure(1)
plt.subplot(111)
plt.plot(jane_female['Year'], jane_female['Count'], 'r-', label = 'Female')
plt.plot(jane_male['Year'], jane_male['Count'], 'b-', label = 'Male')
plt.legend()
plt.show()

Next, we compare two names, Jane and Alice, using only the female records.

In [None]:
df_jane = df_partial.loc[['Jane']]
df_jane.set_index('Gender', inplace=True)
print(df_jane.head())
df_alice = df_partial.loc[['Alice']]
df_alice.set_index('Gender', inplace=True)
print(df_alice.head())

In [None]:
df_jane = df_jane.loc[['F']]
df_alice = df_alice.loc[['F']]

In [None]:
# by comparing two (female) names, you can see the overall trend
plt.rcParams["figure.figsize"] = (15,5)
plt.rcParams['lines.linewidth'] = 3

plt.figure(1)
plt.subplot(111)
plt.plot(df_jane['Year'], df_jane['Count'], 'r-', label = 'Jane')
plt.plot(df_alice['Year'], df_alice['Count'], 'b-', label = 'Alice')
plt.legend()
plt.show()
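
When more than two names are compared, repeating one ```plot()``` call per name becomes tedious. One possible pattern, sketched here on made-up counts shaped like the female rows of ```df_partial```, is to loop over ```groupby('Name')``` and draw one labeled line per group:

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical counts, shaped like the female rows of df_partial
toy = pd.DataFrame({
    'Name':  ['Jane', 'Jane', 'Jane', 'Alice', 'Alice', 'Alice'],
    'Year':  [1880, 1881, 1882, 1880, 1881, 1882],
    'Count': [200, 210, 250, 1400, 1300, 1500],
})

fig, ax = plt.subplots(figsize=(15, 5))
# groupby yields one sub-frame per name, so each name becomes one labeled line
for name, group in toy.groupby('Name'):
    ax.plot(group['Year'], group['Count'], label=name)
ax.set_xlabel('Year')
ax.set_ylabel('Count')
ax.legend()
```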

### Exercise 1-1.
- Using the data in ```df_partial```, compare the trends of the other names (Elizabeth, Stella, and Mathilda)
- Try different settings!

In [None]:
## Your answer

### Exercise 1-2.
- Using ```NationalNames.csv```, visualize the naming trends of some **male** names that interest you
- Try different settings!

In [None]:
## Your answer