## Case Study with the Iris dataset

![Iris Flowers](../../assets/images/iris.png)

### Iris Flower
*Iris* is a genus of flowering plants with many species, including *Iris setosa*, *Iris versicolor*, and *Iris virginica*, the three species measured in this dataset.

## Imports

```python
import pandas as pd
import matplotlib.pyplot as plt
```

## Load Data

```python
df = pd.read_csv('../../../data-science-fall2019/data/Iris.csv')

# Set Id as index
df.set_index('Id', inplace=True)
```
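
As an aside, `pd.read_csv` can build the index while loading through its `index_col` parameter, which should give the same result as the `set_index('Id', inplace=True)` call above. The sketch below is only illustrative: it reads a few rows from an in-memory string that stands in for `Iris.csv` (the column names are the ones this notebook uses; the values are for illustration, not the full file).

```python
import io

import pandas as pd

# A few illustrative rows in the Iris.csv layout (stand-in for the real file)
CSV_TEXT = """Id,PetalLengthCm,PetalWidthCm,Species
1,1.4,0.2,Iris-setosa
2,1.3,0.2,Iris-setosa
51,4.7,1.4,Iris-versicolor
52,4.5,1.5,Iris-versicolor
101,6.0,2.5,Iris-virginica
102,5.1,1.9,Iris-virginica
"""

# Option 1: load first, then move the Id column into the index
df_a = pd.read_csv(io.StringIO(CSV_TEXT))
df_a.set_index('Id', inplace=True)

# Option 2: let read_csv use the Id column as the index directly
df_b = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Id')

# Both approaches produce the same DataFrame
print(df_a.equals(df_b))
```

Using `index_col` avoids the extra in-place step, which can be convenient when the loading code is reused.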

## Data Exploration

```python
# View the first 4 rows
df.head(4)
```

```python
# View the last 4 rows
df.tail(4)
```

```python
# View column types and non-null counts
df.info()
```

```python
# View summary statistics of the numeric columns
df.describe()
```
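
`df.describe()` pools all three species together. To see how the species differ, the same statistics can be computed per group with `groupby`. A minimal sketch on a few illustrative rows (made-up stand-in values that reuse the notebook's column names, not the full dataset):

```python
import pandas as pd

# Illustrative stand-in rows using the same column names as Iris.csv
df = pd.DataFrame({
    'PetalLengthCm': [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    'PetalWidthCm': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
                'Iris-versicolor', 'Iris-virginica', 'Iris-virginica'],
})

# describe() for petal length, computed separately within each species
summary = df.groupby('Species')['PetalLengthCm'].describe()
print(summary[['count', 'mean', 'min', 'max']])
```

Each row of `summary` is one species, so differences in petal length between species are visible at a glance.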

## Visualizing Data

#### Scatter plot of Petal Length vs. Petal Width for each species

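```python
# Isolate the petal measurements and Species into a new dataframe
df_scatter = df[['PetalLengthCm', 'PetalWidthCm', 'Species']]

colors = ['red', 'orange', 'blue']
species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']  # values in the Species column
labels = ['Setosa', 'Versicolor', 'Virginica']                    # legend text

# Plot each species separately so it gets its own color and legend entry
for color, name, label in zip(colors, species, labels):
    subset = df_scatter[df_scatter['Species'] == name]
    plt.scatter(
        subset['PetalWidthCm'],
        subset['PetalLengthCm'],
        color=color,
        alpha=0.2,
        label=label
    )

plt.xlabel('petal width (cm)')
plt.ylabel('petal length (cm)')
plt.title('Iris dataset: petal length vs petal width')
plt.legend(loc='lower right')

plt.show()
```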
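
Instead of hard-coding the species names, pandas can split the rows with `groupby`, which guarantees that each color is paired with the rows of exactly one species. A minimal sketch on a few illustrative rows (made-up stand-in values); the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in rows with the same column names as the Iris data
df = pd.DataFrame({
    'PetalLengthCm': [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    'PetalWidthCm': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
                'Iris-versicolor', 'Iris-virginica', 'Iris-virginica'],
})

colors = {'Iris-setosa': 'red', 'Iris-versicolor': 'orange', 'Iris-virginica': 'blue'}

fig, ax = plt.subplots()
# groupby yields (species name, rows of that species) pairs
for name, group in df.groupby('Species'):
    ax.scatter(group['PetalWidthCm'], group['PetalLengthCm'],
               color=colors[name], alpha=0.5, label=name)

ax.set_xlabel('petal width (cm)')
ax.set_ylabel('petal length (cm)')
ax.legend(loc='lower right')

# One scatter collection was drawn per species
print(len(ax.collections))
```

Because `groupby` sorts its keys, the legend entries appear in alphabetical order of species name.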

#### Pie Chart

```python
# Count flowers per species with .value_counts(); build the labels from the
# same Series so each label stays paired with its own count
counts = df['Species'].value_counts()
labels = [name.replace('Iris-', '').title() for name in counts.index]
sections = counts.tolist()

colors = ['c', 'g', 'y']

plt.pie(sections, labels=labels, colors=colors,
        startangle=90,
        explode=(0.02, 0.02, 0.02),  # fraction of the radius by which to offset each wedge
        autopct='%1.1f%%')  # label each wedge with its percentage of the total

title_obj = plt.title('IRIS Count by Species')
plt.setp(title_obj, color='black')
plt.show()
```
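
The percentages that `autopct` prints on each wedge are each species' share of the total count. They can also be computed directly with `value_counts(normalize=True)`. A sketch on an illustrative species column, deliberately unbalanced (made-up counts) so that the shares differ:

```python
import pandas as pd

# Illustrative species column (deliberately unbalanced so the shares differ)
species = pd.Series(['Iris-setosa'] * 2 + ['Iris-versicolor'] * 3 + ['Iris-virginica'] * 5)

counts = species.value_counts()                      # absolute counts, largest first
shares = species.value_counts(normalize=True) * 100  # percentage of the total

# Look up each share by label so it lines up with its count
for label, count, pct in zip(counts.index, counts, shares[counts.index]):
    print(f'{label}: {count} ({pct:.1f}%)')
```

Since `value_counts` orders species by count, reading labels from its index (rather than a separate hand-written list) keeps labels and wedges aligned.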



#### Boxplots for Petal Length for each species

```python
# Isolate PetalLength & Species into a new dataframe
petal_length = df[['PetalLengthCm', 'Species']]

# Review its shape, info and head
print('Shape = {}\n'.format(petal_length.shape))

# .info() prints its summary directly and returns None, so call it on its own
petal_length.info()

petal_length.head()
```
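
A box plot summarizes a distribution with its quartiles: the box spans the first quartile (Q1) to the third quartile (Q3), the line inside marks the median, and by default matplotlib (which pandas uses for `boxplot`) extends the whiskers to the furthest points within 1.5 × IQR of the box, where IQR = Q3 − Q1. A sketch computing those numbers per species with `groupby` and `quantile`, on made-up illustrative values:

```python
import pandas as pd

# Illustrative petal lengths for two species (stand-in values)
petal_length = pd.DataFrame({
    'PetalLengthCm': [1.0, 1.2, 1.4, 1.5, 1.9, 3.0, 4.0, 4.4, 4.5, 4.7],
    'Species': ['Iris-setosa'] * 5 + ['Iris-versicolor'] * 5,
})

# Q1, median and Q3 for each species (linear interpolation, the pandas default)
quartiles = (petal_length.groupby('Species')['PetalLengthCm']
             .quantile([0.25, 0.5, 0.75])
             .unstack())
quartiles.columns = ['Q1', 'median', 'Q3']

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
iqr = quartiles['Q3'] - quartiles['Q1']
quartiles['lower_fence'] = quartiles['Q1'] - 1.5 * iqr
quartiles['upper_fence'] = quartiles['Q3'] + 1.5 * iqr
print(quartiles)
```

Here the versicolor value 3.0 falls below its lower fence (3.25), so a box plot would show it as a separate outlier point.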

```python
# Save the new dataframe you created
petal_length.to_csv('boxplot_example.csv', index=False)

# Select the setosa rows with a boolean mask
setosa = petal_length['Species'].isin(['Iris-setosa'])
df_setosa = petal_length[setosa]

# Plot setosa boxplot
plt.figure(figsize=(10, 7))
plt.title('Iris-setosa')
df_setosa.boxplot()
plt.show()

# Select the versicolor rows with a boolean mask
versicolor = petal_length['Species'].isin(['Iris-versicolor'])
df_versicolor = petal_length[versicolor]

# Plot versicolor boxplot
plt.figure(figsize=(10, 7))
plt.title('Iris-versicolor')
df_versicolor.boxplot()
plt.show()

# Select the virginica rows with a boolean mask
virginica = petal_length['Species'].isin(['Iris-virginica'])
df_virginica = petal_length[virginica]

# Plot virginica boxplot
plt.figure(figsize=(10, 7))
plt.title('Iris-virginica')
df_virginica.boxplot()
plt.show()
```
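
The cells above draw one figure per species. `DataFrame.boxplot` can also place all species on one shared axis through its `column` and `by` parameters, which makes them easier to compare. A sketch on made-up illustrative values; the `Agg` backend is selected only so it runs without a display.

```python
import matplotlib

matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in values in the same layout as petal_length
petal_length = pd.DataFrame({
    'PetalLengthCm': [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    'Species': ['Iris-setosa'] * 3 + ['Iris-versicolor'] * 3 + ['Iris-virginica'] * 3,
})

fig, ax = plt.subplots(figsize=(10, 7))
# One box per species, all drawn on the same axis
petal_length.boxplot(column='PetalLengthCm', by='Species', ax=ax)
ax.set_ylabel('petal length (cm)')
fig.suptitle('')  # clear the "grouped by" super-title pandas adds by default
```

Sharing one y-axis makes it easy to see that the three species' petal-length ranges sit at different heights.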

#### Histograms for Petal Length with 13 bins, 3 bins and 25 bins

```python
# Bin counts to compare
bins = [13, 3, 25]
x = df['PetalLengthCm']

# Draw one histogram per bin count (n_bins avoids shadowing the built-in bin())
for n_bins in bins:
    plt.figure(figsize=(10, 7))
    plt.hist(x, bins=n_bins, color='green')
    plt.title('Petal Length in cm ({} bins)'.format(n_bins))
    plt.xlabel('Petal length (cm)')
    plt.ylabel('Count')
    plt.show()
```
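
With an integer `bins`, `plt.hist` divides the range from the smallest to the largest value into that many equal-width bins, so each bin is (max − min) / bins wide. That is why the three histograms can look so different. A sketch of the arithmetic using NumPy's `np.histogram`, on a handful of made-up illustrative values:

```python
import numpy as np

# Illustrative petal lengths (stand-in values, not the full dataset)
petal_lengths = np.array([1.0, 1.4, 1.9, 3.0, 4.5, 5.1, 6.9])

for n_bins in [13, 3, 25]:
    # Equal-width edges from min to max, as with an integer bins argument in plt.hist
    counts, edges = np.histogram(petal_lengths, bins=n_bins)
    width = edges[1] - edges[0]  # equals (max - min) / n_bins
    print(f'{n_bins} bins: width {width:.3f} cm, {len(edges)} edges, counts {counts.tolist()}')
```

With only 3 bins each bin is almost 2 cm wide, so nearby but distinct groups of values can end up merged into the same bar.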