# Reference
- Pandas documentation https://pandas.pydata.org/
- Matplotlib documentation https://matplotlib.org/
- Seaborn documentation https://seaborn.pydata.org
- Python Fundamentals for Machine Learning by Dr. Thyagaraju G S, Context Innovations Lab
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media

## 4. Data Visualization
## 4.1. Matplotlib
Matplotlib is a plotting library for Python and its numerical mathematics extension NumPy. It is a great package for viewing or presenting data in a pictorial or graphical format. It enables analysts and decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

Some of the most commonly used plotting functions:
- plt.scatter – makes a scatter plot 
- plt.bar – creates a bar chart
- plt.boxplot – makes a box and whisker plot
- plt.hist – makes a histogram
- plt.plot – creates a line plot
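
A minimal sketch of two of the functions listed above, the histogram and the box-and-whisker plot. It uses the `Axes` method equivalents (`ax.hist`, `ax.boxplot`) of `plt.hist` and `plt.boxplot`. The data are synthetic (seeded normal samples chosen only for illustration), and the variable names and titles are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only: 200 samples from a normal distribution
rng = np.random.default_rng(0)
scores = rng.normal(loc=60, scale=10, size=200)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# hist returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = ax_hist.hist(scores, bins=10)
ax_hist.set_title("Histogram")

# boxplot returns a dict of the artists that make up the box and whiskers
box = ax_box.boxplot(scores)
ax_box.set_title("Box and whisker plot")

plt.show()
```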

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [4]:
# Creating plot from random data
# simple bar and scatter plot

x = np.arange(5) # assume there are 5 students
y = np.random.randint(40,80, size=5) # randomize test scores

# Plots in matplotlib reside within a Figure object. 
# matplotlib will draw on the last figure object or create one if necessary

# The figure needs to be closed with show() or close(); if it is not closed,
# any subsequent plot commands will use the same figure.
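
The plotting calls for this cell are not shown, so the following is only a sketch of what a simple bar chart and scatter plot of the random scores might look like. The titles and the names `bars` and `points` are assumptions; each plot gets its own figure so that the second plot does not draw on top of the first.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)                       # assume there are 5 students
y = np.random.randint(40, 80, size=5)  # random test scores in [40, 80)

# First figure: bar chart
plt.figure()
bars = plt.bar(x, y)
plt.title("Test scores (bar)")
plt.show()  # finish this figure before starting the next one

# Second figure: scatter plot of the same data
plt.figure()
points = plt.scatter(x, y)
plt.title("Test scores (scatter)")
plt.show()
```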


In [5]:
# subplot

# matplotlib will draw on the last figure object
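# 'k--' means a black dashed line, 'go-' means a green solid line with circle markers

# alpha is transparency (0 = fully transparent, 1 = opaque)

# adjusting the subplot spacing. left/right/bottom/top are fractions of the figure
# width/height; wspace/hspace are fractions of the average axes width/height
# plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)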
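
A minimal sketch of the subplot ideas described in the comments above, assuming a 2x2 grid with three of the four slots filled. The random data, the grid shape, and the spacing values are illustrative choices, not taken from the original cell.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# pyplot commands draw on the most recently created axes (ax3 here)
plt.plot(np.random.randn(50).cumsum(), "k--")  # black dashed line

# Axes methods draw on a specific subplot
ax1.hist(np.random.randn(100), bins=20, color="k", alpha=0.3)  # alpha = transparency
ax2.plot(np.arange(10), np.arange(10) ** 2, "go-")  # green line with circle markers

# Widen the gaps between subplots
plt.subplots_adjust(wspace=0.4, hspace=0.4)
plt.show()
```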


In [7]:
# you can use figsize to guarantee the figure has a certain size (in inches) and aspect ratio if saved to disk

from numpy.random import randn


# saving the figure
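
A sketch of setting `figsize` and saving a figure, assuming a random-walk line as the content. The file name `random_walk.png`, the temporary-directory location, and the `dpi` value are hypothetical choices made for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt
from numpy.random import randn

# figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(randn(100).cumsum(), "k--")

# Hypothetical output path in the temporary directory; dpi sets the resolution
out_path = os.path.join(tempfile.gettempdir(), "random_walk.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```

The file format is inferred from the extension, so changing the name to end in `.pdf` or `.svg` should produce vector output instead.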


## 4.2. Plotting from Pandas

In [8]:
# Note: you can also create subplots directly from Pandas

# load the iris data available in sklearn
from sklearn.datasets import load_iris
import pandas as pd
iris=load_iris()

# Read sample data
df = pd.DataFrame(iris.data)
df.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']

# plot the dataframe
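
The plotting call itself is missing from this cell. One possible version, using `DataFrame.plot`, is sketched below; the 2x2 layout, figure size, and the extra histogram are assumptions rather than the original author's choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(
    iris.data,
    columns=["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"],
)

# One line plot per column, arranged in a 2x2 grid of subplots
axes = df.plot(subplots=True, layout=(2, 2), figsize=(8, 6))

# Other chart types are selected with `kind`, e.g. overlaid histograms
ax_hist = df.plot(kind="hist", alpha=0.5, bins=20)

plt.show()
```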

In [9]:
index = ['one', 'two', 'three', 'four', 'five']
columns = ['A', 'B', 'C', 'D']
df = pd.DataFrame(abs(np.random.randn(5, 4)), index=index, columns=columns)
df.style.background_gradient(cmap='Blues', axis=None)

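|       | A        | B        | C        | D        |
|-------|----------|----------|----------|----------|
| one   | 0.878139 | 0.851796 | 0.905133 | 0.46523  |
| two   | 0.347596 | 0.321698 | 1.857411 | 0.777968 |
| three | 0.457187 | 1.779314 | 1.084819 | 0.761772 |
| four  | 0.374439 | 0.436741 | 0.304706 | 0.517029 |
| five  | 1.711879 | 1.424668 | 1.273459 | 0.015895 |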


## 4.3. Seaborn
Seaborn provides a high-level interface to the matplotlib API and builds on Pandas DataFrames. It is useful for:
- Using default themes that are aesthetically pleasing.
- Setting custom color palettes.
- Making attractive statistical plots.
- Easily and flexibly displaying distributions.
- Visualizing information from matrices and DataFrames.

In [10]:
# Import seaborn
import seaborn as sns

# Load an example dataset
tips = sns.load_dataset("tips")

# relplot shows the relationship between variables
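
The `relplot` call is not shown in this cell. The sketch below illustrates one way it could be used. To stay runnable offline it builds a small synthetic stand-in for the `tips` dataset (the values are made up and only the column names mirror the real data); with the real dataset, `data=tips` would be passed instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" dataset (values are made up for illustration)
rng = np.random.default_rng(0)
n = 60
total_bill = rng.uniform(5, 50, size=n)
tips_like = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=n),
    "smoker": rng.choice(["Yes", "No"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
})

# Scatter plot of tip against total bill, one panel per meal time,
# with point colour encoding the smoker flag
grid = sns.relplot(data=tips_like, x="total_bill", y="tip",
                   col="time", hue="smoker")
```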


Seaborn provides many more plot types that are difficult to implement directly with matplotlib.

In [11]:
# distribution plot

In [12]:
# distribution plot