# Introduction to Data Visualization in Python

## Import Datasets

We are going to use two datasets in this tutorial: the [Iris](https://archive.ics.uci.edu/ml/datasets/iris) dataset and the [Wine Reviews](https://www.kaggle.com/zynicide/wine-reviews) dataset.
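
As a minimal sketch of the loading step, the helper below reads the UCI `iris.data` file, which ships without a header row. The helper name `load_iris` and the column names are assumptions, chosen to match the names used in the plots in this tutorial. A two-row in-memory sample stands in for the real file so the block runs without a download.

```python
import io

import pandas as pd

# Assumed column names, matching the ones used in the plots in this tutorial
IRIS_COLUMNS = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']


def load_iris(source):
    """Read a header-less iris.data file from a path or file-like object."""
    return pd.read_csv(source, names=IRIS_COLUMNS)


# Two-row stand-in for the real file
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
iris_sample = load_iris(sample)
print(iris_sample)
```

With the files downloaded, the same helper could be called as `iris = load_iris('iris.data')`. The wine reviews could probably be read with `wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col=0)`. Both file names are assumptions about what was downloaded and where it was saved.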

## Matplotlib

Matplotlib is a Python plotting library.

### Scatter plot

In [None]:
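import matplotlib.pyplot as plt

# create a figure and axis
fig, ax = plt.subplots()

# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])

# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')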

We can color each point by its class:

In [None]:
# map each class to a Matplotlib color code
colors = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}
fig, ax = plt.subplots()
# plot each data point individually, colored by its class
for i in range(len(iris['sepal_length'])):
    ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i], color=colors[iris['class'][i]])
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')

### Line Chart

In [None]:
iris.shape[0]

In [None]:
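# plot every feature column except the class label
columns = iris.columns.drop(['class'])
x_data = range(0, iris.shape[0])
fig, ax = plt.subplots()
for column in columns:
    # label each line so that ax.legend() has entries to show
    ax.plot(x_data, iris[column], label=column)
ax.set_title('Iris Dataset')
ax.legend()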

### Histogram

In [None]:
fig, ax = plt.subplots()
ax.hist(wine_reviews['points'])
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')

### Bar Chart

In [None]:
fig, ax = plt.subplots()
data = wine_reviews['points'].value_counts()
points = data.index
frequency = data.values
ax.bar(points, frequency)
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')

## Pandas Visualization

Pandas builds on Matplotlib and makes it easy to plot data directly from a DataFrame or Series.

### Scatter Plot
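
A minimal sketch of a scatter plot through the pandas plotting API. The small `iris_sample` frame is a hypothetical stand-in for the `iris` DataFrame, included only so the block runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# x and y name the columns to plot; the call returns a Matplotlib axis
ax = iris_sample.plot.scatter(x='sepal_length', y='sepal_width', title='Iris Dataset')
```

Because the return value is a regular Matplotlib axis, it can be customized further with calls such as `ax.set_xlabel`.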

### Line Chart
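
A minimal sketch of a line chart with pandas, again using a hypothetical `iris_sample` frame in place of `iris`. Unlike the Matplotlib version, no loop over the columns is needed, and the legend entries come from the column names.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'class': ['Iris-setosa'] * 2 + ['Iris-versicolor'] * 2 + ['Iris-virginica'] * 2,
})

# Drop the class label, then draw one line per remaining column
ax = iris_sample.drop(columns=['class']).plot.line(title='Iris Dataset')
```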

### Histogram
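
A minimal sketch of a histogram with pandas. The `wine_sample` frame and its scores are made up for illustration and stand in for the `points` column of `wine_reviews`.

```python
import pandas as pd

# Made-up scores standing in for wine_reviews['points']
wine_sample = pd.DataFrame({'points': [87, 87, 88, 90, 90, 90, 92, 95]})

# bins sets the number of intervals the scores are grouped into
ax = wine_sample['points'].plot.hist(bins=5, title='Wine Review Scores')
ax.set_xlabel('Points')
```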

### Bar Chart
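
A minimal sketch of a bar chart with pandas: `value_counts` counts how often each score occurs, and `plot.bar` draws one bar per score. The `wine_sample` data is made up for illustration.

```python
import pandas as pd

# Made-up scores standing in for wine_reviews['points']
wine_sample = pd.DataFrame({'points': [87, 87, 88, 90, 90, 90, 92, 95]})

# Count occurrences per score and sort by score so the bars appear in order
counts = wine_sample['points'].value_counts().sort_index()
ax = counts.plot.bar(title='Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
```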

## Seaborn

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
import seaborn as sns
sns.__version__

### Scatter plot

Highlighting the classes is significantly easier than with Matplotlib. We only need to specify the `hue` parameter.
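
A minimal sketch of the `hue` parameter with `sns.scatterplot`. The `iris_sample` frame is a hypothetical stand-in for `iris` so the block runs on its own.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'class': ['Iris-setosa'] * 2 + ['Iris-versicolor'] * 2 + ['Iris-virginica'] * 2,
})

# hue picks the column that determines the color of each point
ax = sns.scatterplot(x='sepal_length', y='sepal_width', hue='class', data=iris_sample)
```

Seaborn assigns one color per class and adds the legend itself, which replaces the color dictionary and the loop from the Matplotlib version.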

### Line chart

### Histogram

### Bar chart
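
A minimal sketch of a bar chart with `sns.countplot`, which counts the occurrences of each value itself, so no separate `value_counts` step is needed. The `wine_sample` scores are made up for illustration.

```python
import pandas as pd
import seaborn as sns

# Made-up scores standing in for wine_reviews['points']
wine_sample = pd.DataFrame({'points': [87, 87, 88, 90, 90, 90, 92, 95]})

# One bar per distinct score, with the bar height equal to its frequency
ax = sns.countplot(x='points', data=wine_sample)
```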

## More advanced graphs 

Now that you have a basic understanding of the syntax of Matplotlib, Pandas Visualization and Seaborn, I want to show you a few other graph types that are useful for data science and machine learning. For most of them, Seaborn is the go-to library because it can produce complicated graphs in very few lines of code.

### Box Plots

Box Plots are useful to show distributions with respect to categories.
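
A minimal sketch of a box plot with `sns.boxplot`, showing the distribution of one feature for each class. The `iris_sample` frame is a hypothetical stand-in for `iris`.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    'class': ['Iris-setosa'] * 3 + ['Iris-versicolor'] * 3 + ['Iris-virginica'] * 3,
})

# x is the categorical column, y the numeric column whose distribution is shown
ax = sns.boxplot(x='class', y='sepal_length', data=iris_sample)
```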

### Pie Plots
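
A minimal sketch of a pie plot with pandas, showing the share of reviews per country. The `wine_sample` frame is made up for illustration; it assumes a `country` column like the one in the Wine Reviews dataset.

```python
import pandas as pd

# Made-up rows standing in for wine_reviews['country']
wine_sample = pd.DataFrame({
    'country': ['US', 'US', 'US', 'France', 'France', 'Italy'],
})

# Count reviews per country, then draw one wedge per country
country_counts = wine_sample['country'].value_counts()
ax = country_counts.plot.pie(autopct='%1.0f%%')
```

The `autopct` format string prints the percentage inside each wedge.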

### Heatmap

Heatmaps are well suited to exploring the correlations between features.

In [None]:
import numpy as np

# get correlation matrix (drop the non-numeric class column first)
corr = iris.drop(columns=['class']).corr()
fig, ax = plt.subplots()
# create heatmap
im = ax.imshow(corr.values)

# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

In [None]:
# get correlation matrix (drop the non-numeric class column first)
corr = iris.drop(columns=['class']).corr()
fig, ax = plt.subplots()
# create heatmap
im = ax.imshow(corr.values)

# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        text = ax.text(j, i, np.around(corr.iloc[i, j], decimals=2),
                       ha="center", va="center", color="black")

We can make plots bigger with Matplotlib's `figsize` parameter. To use it with Seaborn, we create the figure and axis with Matplotlib and then pass the axis to the Seaborn function as an argument.
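
A minimal sketch of this pattern with `sns.heatmap`, which accepts an `ax` argument. The `iris_sample` frame is a hypothetical stand-in for `iris`, and the figure size of 10 by 6 inches is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the numeric columns of the iris DataFrame
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Create the figure with the desired size, then hand its axis to Seaborn
fig, ax = plt.subplots(figsize=(10, 6))
sns.heatmap(iris_sample.corr(), annot=True, ax=ax)
```

Setting `annot=True` writes the correlation value into each cell, which replaces the manual annotation loop from the Matplotlib version.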

## Faceting

Faceting is the act of breaking data variables up across multiple subplots, and combining those subplots into a single figure.

### FacetGrid

### Pairplot

In [None]:
from pandas.plotting import scatter_matrix

fig, ax = plt.subplots(figsize=(12,12))
scatter_matrix(iris, alpha=1, ax=ax);