# Fisher's Iris dataset

This data is taken from:
* Fisher, R.A. "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).

<img src='data-sci-images/fisher-table-all.png' style='height:500px'>

# Grab the data

In [None]:
import pandas as pd

In [None]:
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
irisdf = pd.read_csv(path, header=None)
irisdf.columns = ['sepalLength','sepalWidth','petalLength','petalWidth','species']

In [None]:
irisdf
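You can also pass the column names to `read_csv` through its `names` parameter, which labels the columns during the load instead of assigning `irisdf.columns` afterwards. The sketch below reads from an in-memory buffer so it runs offline. The buffer holds a three-line stand-in for the downloaded file, with illustrative values in the same headerless, five-column layout. With the real data you would pass `path` instead of the buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV: same headerless,
# five-column layout, only three rows so the snippet runs offline.
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

cols = ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth', 'species']

# header=None: the first line is data, not column names.
# names=cols: label the columns in the same call.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)

print(df.dtypes)
```

The four measurement columns are parsed as floats and `species` stays a string (object) column.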

# Data Viz

## Matplotlib

<img src='data-sci-images/matplotlib-front.png'>

"Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.  Matplotlib makes easy things easy and hard things possible." -- [matplotlib.org](https://matplotlib.org/)

* Matplotlib was built on the NumPy and SciPy stack and was initially made to enable interactive MATLAB-like plotting via gnuplot from IPython

* Gained early traction with support from the Space Telescope Science Institute (STScI) and NASA's Jet Propulsion Laboratory (JPL)

* Easily one of the go-to libraries for academic publishing needs
  * Create publication-ready graphics in a range of formats
  * Powerful options to customize all aspects of a figure
  
* Matplotlib underlies the plotting capabilities of Pandas, Seaborn, and plotnine

<img src='data-sci-images/matplotlib-anatomy.png'>
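The anatomy above maps onto Matplotlib's object-oriented interface. `plt.subplots` returns a `Figure` (the whole canvas) and an `Axes` (one plotting area), and the title, labels, and legend are set on the `Axes`. To illustrate the "range of formats" point, the sketch below saves the same figure as PNG, PDF, and SVG to in-memory buffers. The `matplotlib.use("Agg")` line is an assumption for running the snippet outside a notebook. Leave it out in Jupyter, where it would stop figures from displaying inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: lets the sketch run without a display
import matplotlib.pyplot as plt
import numpy as np

# Figure = the whole canvas; Axes = one plotting area with its own
# x/y axes, title, legend, and plotted artists
fig, ax = plt.subplots(figsize=(6, 4))
x = np.linspace(0, 2 * np.pi, 100)
ax.plot(x, np.sin(x), label="sin(x)")
ax.set(xlabel="x", ylabel="y", title="Figure anatomy in code")
ax.legend()

# Write the same figure in several formats; `format` picks the output type
buffers = {}
for fmt in ("png", "pdf", "svg"):
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, dpi=300, bbox_inches="tight")
    buffers[fmt] = buf.getvalue()

plt.close(fig)  # free the figure once it has been saved
```

With a file name such as `fig.savefig("iris.pdf")`, Matplotlib infers the format from the extension instead.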

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# a numerical array at evenly sampled intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()

In [None]:
# boolean masks that select the rows for each species
setosa = irisdf['species'] == 'Iris-setosa'
versicolor = irisdf['species'] == 'Iris-versicolor'
virginica = irisdf['species'] == 'Iris-virginica'

# .loc[mask, column] returns one column for the selected rows as a Series
x1, y1 = irisdf.loc[setosa, 'petalWidth'], irisdf.loc[setosa, 'petalLength']
x2, y2 = irisdf.loc[versicolor, 'petalWidth'], irisdf.loc[versicolor, 'petalLength']
x3, y3 = irisdf.loc[virginica, 'petalWidth'], irisdf.loc[virginica, 'petalLength']

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7.5, 5))

# one scatter call per species, so each gets its own color and legend entry
for s in irisdf.species.unique():
    tmp = irisdf[irisdf.species == s]
    ax.scatter(tmp.petalLength, tmp.petalWidth,
               label=s)

ax.set(xlabel='Petal Length',
       ylabel='Petal Width',
       title='Petal Width v. Length -- by Species')

ax.legend(loc='upper left')
plt.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7.5, 5))

# groupby yields (species name, sub-DataFrame) pairs; the 'o' format
# string draws markers only, so ax.plot behaves like a scatter plot
for name, group in irisdf.groupby('species'):
    ax.plot(group['petalLength'],
            group['petalWidth'],
            'o', label=name)

ax.set(xlabel='Petal Length',
       ylabel='Petal Width',
       title='Petal Width v. Length -- by Species')

ax.legend(loc='upper left')
plt.show()

# Pandas

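Pandas objects have a `.plot` accessor that wraps Matplotlib. Common charts (histograms, scatter plots, box plots, and more) take a single call and return a Matplotlib `Axes` that you can keep customizing.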

In [None]:
# map each species to a color, giving one color per row (same order as irisdf)
iriscolors = irisdf['species'].map({'Iris-setosa': 'blue',
                                    'Iris-versicolor': 'orange',
                                    'Iris-virginica': 'green'}).tolist()
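A per-row color list such as `iriscolors` can be passed to the `c` argument of `DataFrame.plot.scatter`, which accepts a sequence of color names with one entry per point. So that the sketch runs on its own, it uses a six-row stand-in for `irisdf` with illustrative values and the same column names. In the notebook you would use `irisdf` and `iriscolors` directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical six-row stand-in for irisdf (illustrative values)
df = pd.DataFrame({
    'sepalLength': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepalWidth': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petalLength': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petalWidth': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['Iris-setosa'] * 2 + ['Iris-versicolor'] * 2 + ['Iris-virginica'] * 2,
})

# Species -> color lookup, expanded to one color per row
palette = {'Iris-setosa': 'blue', 'Iris-versicolor': 'orange', 'Iris-virginica': 'green'}
colors = df['species'].map(palette).tolist()

fig, ax = plt.subplots(figsize=(8, 5))
# c= takes one color per point, so each point is colored by its species
df.plot.scatter(x='petalLength', y='petalWidth', c=colors, ax=ax)
ax.set_title('Petal width v. length, colored by species')
```

Unlike the per-species loops in the Matplotlib section, this draws all points in one call, so it produces no legend automatically.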

In [None]:
fig, ax = plt.subplots(figsize=(8, 5))
# ax= draws the histogram on the axes created above
irisdf['petalWidth'].plot.hist(bins=20, color='blue', ax=ax)
ax.set_xlabel('petal width (cm)', fontsize=14)
ax.set_ylabel('frequency', fontsize=14)
ax.set_title('Histogram of Iris petal widths', fontsize=16)
plt.show()
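Grouped summaries are another one-call option: `DataFrame.boxplot` with `by=` draws one box per group for the chosen column. The sketch below runs on a six-row stand-in for `irisdf` with illustrative values. With so few rows the boxes are only indicative, and in the notebook you would call it on `irisdf`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical six-row stand-in for irisdf (illustrative values)
df = pd.DataFrame({
    'sepalLength': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepalWidth': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petalLength': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petalWidth': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['Iris-setosa'] * 2 + ['Iris-versicolor'] * 2 + ['Iris-virginica'] * 2,
})

fig, ax = plt.subplots(figsize=(8, 5))
# pandas groups the rows by species and draws one box per group
df.boxplot(column='petalWidth', by='species', ax=ax, grid=False)
ax.set_ylabel('petal width (cm)')
```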

# Seaborn

If Matplotlib 'tries to make easy things easy and hard things possible,' Seaborn tries to make a well-defined set of hard things easy too.

https://seaborn.pydata.org

<img src='data-sci-images/seaborn.png' width=700>
          
* Built on top of matplotlib and closely integrated with pandas data structures.
* Designed for statistical graphics, using visualization to explore and understand data quickly and easily.
* The style settings can also affect matplotlib plots, even if you don't make them with seaborn.
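Here is a minimal Seaborn version of the petal scatter plot. It assumes seaborn 0.11 or newer (for `set_theme`) and uses a six-row stand-in for `irisdf` with illustrative values. `hue='species'` replaces the manual per-species loop used in the Matplotlib examples. Because `set_theme` changes Matplotlib's global `rcParams`, plain-Matplotlib plots made afterwards pick up the same style.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical six-row stand-in for irisdf (illustrative values)
df = pd.DataFrame({
    'sepalLength': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepalWidth': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petalLength': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petalWidth': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['Iris-setosa'] * 2 + ['Iris-versicolor'] * 2 + ['Iris-virginica'] * 2,
})

# Global style change: also affects later plain-matplotlib figures
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots(figsize=(7.5, 5))
# hue= colors points by species and builds the legend automatically
sns.scatterplot(data=df, x='petalLength', y='petalWidth', hue='species', ax=ax)
ax.set(xlabel='Petal Length', ylabel='Petal Width')
```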

# plotly

The plotly Python library (plotly.py) is an interactive, open-source, and browser-based graphing library.

https://plot.ly/python/

<img src='data-sci-images/plotly-1.png' width=700>

* An open-source product of Plotly, Inc., built on top of the JavaScript library plotly.js
* Enables Python users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash
* Also has a version for R, as well as other web visualization products

# Altair

Altair is a declarative statistical visualization library for Python

https://altair-viz.github.io

<img src='data-sci-images/altair.png' width=700>

* Based on Vega and Vega-Lite (a high-level grammar of interactive graphics)
  * Vega-Lite provides a concise JSON syntax for rapidly generating visualizations to support analysis
  * Its specifications describe visualizations as mappings from data to properties of graphical marks
* Aims for elegant simplicity so the focus can be on understanding data