# Introduction

## Data Description

**In this tutorial, we will work on data visualization of the Iris dataset.**

The Iris flower data set consists of 50 samples from each of three species of iris flowers (Iris setosa, Iris virginica, and Iris versicolor), 150 samples in total. It was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems”.

[Image Source](https://medium.com/analytics-vidhya/exploratory-data-analysis-uni-variate-analysis-of-iris-data-set-690c87a5cd40)

![iris](https://miro.medium.com/max/1400/0*SHhnoaaIm36pc1bd)
![classes](https://miro.medium.com/max/1050/0*QHogxF9l4hy0Xxub.png)

## Data Preparation 

In [1]:
#-- import libs:  matplotlib.pyplot, pandas, and seaborn
import matplotlib 
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
%matplotlib inline

#-- ignore warnings
import warnings
warnings.filterwarnings('ignore')

#-- color-blind-friendly style
# (matplotlib 3.6+ renamed this style to 'seaborn-v0_8-colorblind')
plt.style.use('seaborn-colorblind')

In [2]:
# Check version
print(' pandas_v = {} \n seaborn_v = {} \n matplotlib_v =  {}'.format(pd.__version__,
                                                                      sns.__version__, 
                                                                      matplotlib.__version__))

 pandas_v = 1.0.5 
 seaborn_v = 0.10.1 
 matplotlib_v =  3.2.2


In [3]:
#-- data preparation 
# load dataset

# # load from online repository of seaborn
# iris = sns.load_dataset("iris")
# iris.head()

#-- the seaborn copy above names things differently from the CSV below:
# its columns are sepal_length, sepal_width, petal_length, petal_width, species,
# it has no Id column, and its species values lack the "Iris-" prefix
# r marks a raw string, so backslashes in the path are kept literally
# instead of being read as escape sequences
file_path = r'./Iris.csv'
iris = pd.read_csv(file_path)
iris.head()

|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [4]:
#-- create a preferable data frame for seaborn
iris_melt = pd.melt(iris, id_vars=['Species'],  # column(s) to use as identifier variables
                    value_vars=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'],  # the measured variables
                    var_name='features')  # name of the column that holds the measured-variable names
iris_melt.head()

|   | Species | features | value |
|---|---|---|---|
| 0 | Iris-setosa | SepalLengthCm | 5.1 |
| 1 | Iris-setosa | SepalLengthCm | 4.9 |
| 2 | Iris-setosa | SepalLengthCm | 4.7 |
| 3 | Iris-setosa | SepalLengthCm | 4.6 |
| 4 | Iris-setosa | SepalLengthCm | 5.0 |


In [5]:
""" Count the number of missing values in each column of the Iris DataFrame """
print("Number of missing values per column:")
print(iris.isna().sum(), '\n')
print("Any missing values?:")
print(iris.isna().values.any())

Number of missing values per column:
Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64 

Any missing values?:
False


# Bar Charts

Use **pandas** to

1) display the mean and std (in separate bars) of each feature for each label in a bar chart  
2) display the mean ± std of each feature for each label in a bar chart, with error bars of ± std  
3) display the mean + std of each feature for each label in a stacked bar chart  

Use **seaborn and matplotlib, respectively,** to

display the mean ± std of each feature for each label in a bar chart, with error bars of ± std  


## Pandas 

1) display the mean and std (in separate bars) of each feature for each label in a bar chart

2) display the mean ± std of each feature for each label in a bar chart, with error bars of ± std

3) display the mean + std of each feature for each label in a stacked bar chart
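One possible way to approach the three pandas charts is sketched below. It is a minimal sketch, not a reference solution: the function name `pandas_bar_charts` and the `FEATURES` list are illustrative, and the frame is assumed to have the column names shown in `iris.head()` above. For chart 3 it assumes that "stacked" means one bar per feature, with the std segment placed on top of the mean segment.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Column names as shown in iris.head() above
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def pandas_bar_charts(iris):
    """Draw the three pandas bar charts; return (ax1, ax2, stacked_axes)."""
    grouped = iris.groupby("Species")[FEATURES]
    mean, std = grouped.mean(), grouped.std()  # rows: species, columns: features

    # 1) mean and std as separate bars; the prefixes keep the legend readable
    both = pd.concat([mean.add_prefix("mean_"), std.add_prefix("std_")], axis=1)
    ax1 = both.plot.bar(rot=0, figsize=(10, 4), title="Mean and std per species")

    # 2) mean bars with error bars of +/- one std (matched by column name)
    ax2 = mean.plot.bar(yerr=std, capsize=3, rot=0, title="Mean ± std per species")

    # 3) one panel per species; each feature bar stacks std on top of mean,
    #    so the full bar height is mean + std
    fig, axes = plt.subplots(1, len(mean.index), sharey=True,
                             figsize=(12, 4), squeeze=False)
    for ax, species in zip(axes[0], mean.index):
        stacked = pd.DataFrame({"mean": mean.loc[species], "std": std.loc[species]})
        stacked.plot.bar(stacked=True, ax=ax, title=species)
    return ax1, ax2, axes[0]
```

Calling `pandas_bar_charts(iris)` on the frame loaded in the Data Preparation section should produce three figures. Computing the grouped mean and std once and reusing them keeps the three charts consistent with each other.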

## Seaborn

display the mean ± std of each feature for each label in a bar chart, with error bars of ± std

## Matplotlib

display the mean ± std of each feature for each label in a bar chart, with error bars of ± std
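In plain matplotlib the grouping has to be laid out by hand: one group of bars per species, shifted by a fraction of the group width for each feature. The following is a rough sketch under the same column-name assumptions as above; `matplotlib_mean_std_bars` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def matplotlib_mean_std_bars(iris):
    """Grouped bar chart of mean +/- std, drawn with matplotlib only."""
    grouped = iris.groupby("Species")[FEATURES]
    mean, std = grouped.mean(), grouped.std()

    x = np.arange(len(mean.index))   # one group position per species
    width = 0.8 / len(FEATURES)      # bar width inside a group
    fig, ax = plt.subplots(figsize=(8, 4))
    for i, feature in enumerate(FEATURES):
        ax.bar(x + i * width, mean[feature], width,
               yerr=std[feature], capsize=3, label=feature)

    # Center the tick under each group of bars
    ax.set_xticks(x + width * (len(FEATURES) - 1) / 2)
    ax.set_xticklabels(mean.index)
    ax.set_ylabel("cm")
    ax.legend()
    return ax
```

`matplotlib_mean_std_bars(iris)` should give a figure comparable to the pandas and seaborn versions, which makes it easy to check the three against each other.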

# Boxplots

Use **pandas, seaborn, matplotlib, respectively**, to create boxplots of each feature for each label. 

A boxplot shows the **lower quartile** (25th percentile, Q1), the **median** (50th percentile, Q2), and the **upper quartile** (75th percentile, Q3); the box itself spans the **interquartile range** (IQR = Q3 - Q1). The whiskers mark the **minimum** and the **maximum**: by the usual convention they extend to the most extreme data points that still lie within Q1 - 1.5\*IQR and Q3 + 1.5\*IQR. **Outliers** are the points that fall beyond the whiskers.

![boxplots](https://miro.medium.com/max/1050/1*2c21SkzJMf3frPXPAR_gZA.png)
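To make the quantities in the figure concrete, the sketch below computes them with NumPy for a single column. The helper name `box_stats` is hypothetical, and the quartiles use NumPy's default linear interpolation, so they can differ slightly from other quartile definitions.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return the numbers a boxplot draws for one sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers,
    }
```

For example, `box_stats(iris["SepalWidthCm"])` returns the quartiles, whisker ends, and outliers for the sepal width column.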

## Pandas

Use pandas to create boxplots of each feature for each label.
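A short sketch of the pandas route: `DataFrame.boxplot` accepts a list of columns and a `by` argument, and draws one panel per feature with one box per species. The function name is illustrative, and the column names are assumed to match `iris.head()` above.

```python
import pandas as pd

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def pandas_boxplots(iris):
    """One panel per feature, one box per species; returns the array of Axes."""
    return iris.boxplot(column=FEATURES, by="Species", figsize=(10, 8))
```

`pandas_boxplots(iris)` lays the four panels out on a grid and adds a "grouped by" title automatically.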

## Seaborn

Use seaborn to create boxplots of each feature for each label.
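With the melted frame from the Data Preparation section, a single `sns.boxplot` call can cover every feature and label. This is a sketch only; the helper name is made up for illustration, and it expects the `features`, `value`, and `Species` columns of `iris_melt`.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def seaborn_boxplots(iris_melt):
    """Boxplots of every feature, split by species, on one Axes."""
    fig, ax = plt.subplots(figsize=(10, 4))
    sns.boxplot(data=iris_melt, x="features", y="value", hue="Species", ax=ax)
    return ax
```

Call it as `seaborn_boxplots(iris_melt)`. Swapping `x` and `hue` groups the boxes by species instead of by feature.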

## Matplotlib

Use matplotlib to create boxplots of each feature for each label.
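In matplotlib, `Axes.boxplot` takes a list of arrays, one per box, so the per-species split is done explicitly. The sketch below sets the tick labels by hand rather than through a keyword argument, since the name of that keyword has changed between matplotlib releases. `matplotlib_boxplots` is an illustrative name.

```python
import matplotlib.pyplot as plt

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def matplotlib_boxplots(iris):
    """One panel per feature, one box per species, drawn with matplotlib only."""
    species = sorted(iris["Species"].unique())
    fig, axes = plt.subplots(1, len(FEATURES), figsize=(14, 4))
    for ax, feature in zip(axes, FEATURES):
        # One array of measurements per species
        data = [iris.loc[iris["Species"] == s, feature].to_numpy() for s in species]
        ax.boxplot(data)  # boxes are placed at positions 1..len(species)
        ax.set_xticks(range(1, len(species) + 1))
        ax.set_xticklabels(species, rotation=20)
        ax.set_title(feature)
    fig.tight_layout()
    return axes
```

`matplotlib_boxplots(iris)` returns the four Axes so that individual panels can be restyled.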

# Histogram / Scatter Plot / Pair Plot

Use pandas and seaborn, respectively, to  
1) create histograms for all features  
2) create scatter plots for `SepalLengthCm` and `SepalWidthCm`  
3) create pair plots based on the `iris_id_indexed` dataframe

## Histogram via Pandas

create histograms for all features  
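A minimal sketch with `DataFrame.hist`, which draws one histogram per selected column. The bin count of 15 is an arbitrary choice for illustration, and the function name is hypothetical.

```python
import pandas as pd

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def pandas_histograms(iris, bins=15):
    """One histogram per feature; returns the grid of Axes."""
    return iris[FEATURES].hist(bins=bins, figsize=(8, 6))
```

`pandas_histograms(iris)` pools all species together. Filtering the frame by `Species` first gives per-label histograms.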

## Histogram via Seaborn

1) create histograms for all features

## Scatter Plot via Pandas

2) create a scatter plot for SepalLengthCm and SepalWidthCm
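A possible pandas version is sketched here. Coloring the points by species goes slightly beyond what the task asks for and is an assumption about what is useful to see. The function name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def pandas_scatter(iris):
    """Sepal length against sepal width, one color per species."""
    fig, ax = plt.subplots()
    for i, (species, group) in enumerate(iris.groupby("Species")):
        # "C0", "C1", ... pick successive colors from the active color cycle
        group.plot.scatter(x="SepalLengthCm", y="SepalWidthCm",
                           color=f"C{i}", label=species, ax=ax)
    return ax
```

If a single color is enough, `iris.plot.scatter(x="SepalLengthCm", y="SepalWidthCm")` does the job in one line.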

## Scatter Plot via Seaborn

2) create scatter plots for SepalLengthCm and SepalWidthCm
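The seaborn counterpart can be sketched with `sns.scatterplot`, where the `hue` argument takes care of the per-species coloring. The helper name is illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def seaborn_scatter(iris):
    """Sepal length against sepal width, colored by species."""
    fig, ax = plt.subplots()
    sns.scatterplot(data=iris, x="SepalLengthCm", y="SepalWidthCm",
                    hue="Species", ax=ax)
    return ax
```

`seaborn_scatter(iris)` builds the legend automatically from the `Species` column.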

## Pair Plot via Pandas

3) create a pair plot based on iris_id_indexed dataframe
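The notebook does not show how `iris_id_indexed` is built. The sketch below assumes it is simply `iris.set_index("Id")`, which keeps the `Id` column out of the plotted variables. On that assumption, `pd.plotting.scatter_matrix` gives a pandas-only pair plot. The function name is hypothetical.

```python
import pandas as pd


def pandas_pair_plot(iris):
    """Scatter matrix of the numeric features; returns the grid of Axes."""
    # Assumption: iris_id_indexed is the iris frame indexed by its Id column
    iris_id_indexed = iris.set_index("Id")
    numeric = iris_id_indexed.select_dtypes("number")
    return pd.plotting.scatter_matrix(numeric, figsize=(8, 8), diagonal="hist")
```

`pandas_pair_plot(iris)` draws a 4 x 4 grid: histograms on the diagonal and pairwise scatter plots elsewhere.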

## Pair Plot via Seaborn

3) create pair plots based on iris_id_indexed dataframe

# More plots via Seaborn (Additional)

Use Seaborn to 

1) display the histogram and the density plots on the same figure for `SepalLengthCm`  
2) display the joint distribution and the marginal distribution on the same figure for `SepalLengthCm`  and `SepalWidthCm`  
3) create violin plots for `SepalLengthCm` of each label  
4) create violin plots for all the features and labels. 

## 1) 
display the histogram and the density plots on the same figure for SepalLengthCm

## 2) 
display the joint distribution and the marginal distribution on the same figure for `SepalLengthCm`  and `SepalWidthCm` 

## 3) 
create violin plots for SepalLengthCm of each label
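A small sketch using `sns.violinplot` on the wide `iris` frame, with one violin per species. The function name is illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def violin_sepal_length(iris):
    """One violin of SepalLengthCm per species."""
    fig, ax = plt.subplots()
    sns.violinplot(data=iris, x="Species", y="SepalLengthCm", ax=ax)
    return ax
```

`violin_sepal_length(iris)` combines a boxplot-style summary with a density estimate for each label.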

## 4) 
create violin plots for all the features and labels.
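For all features at once, the melted `iris_melt` frame is the convenient input, with `hue` separating the species within each feature. The sketch below shows one possible layout; the helper name is hypothetical.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def violin_all_features(iris_melt):
    """Violins for every feature, split by species, on one Axes."""
    fig, ax = plt.subplots(figsize=(12, 4))
    sns.violinplot(data=iris_melt, x="features", y="value", hue="Species", ax=ax)
    return ax
```

`violin_all_features(iris_melt)` puts twelve violins on a shared y-axis, which makes the scale difference between the sepal and petal measurements easy to see.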