## The scenario:

You are a business intelligence manager at a fast-moving startup that deals in flowers. Iris Mania is sweeping the world, and certain species fetch upwards of AU$50,000 for a single flower!

A new iris has just been delivered. Its species is not known and the resident florist is
on holidays.

The business has a sample data set with typical measurements for
three species of iris.

Our mystery flower has the following characteristics: <br><br>
Sepal length = 4.2 cm <br>
Sepal width =  4.1 cm <br>
Petal length = 1.3 cm <br>
Petal width =  0.25 cm<br>


Which species is it likely to be?

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [None]:
#Read in data, check for missing values
df = pd.read_csv('./iris.csv')
df.info()

In [None]:
#Check species names and number of species samples collected
print(df['species'].unique())
print(df['species'].value_counts())

In [None]:
#Explore average values of sepal + petal characteristics
groups = df.groupby(by = ['species'])
groups.mean()

For comparison with the averages above, our mystery flower measures: <br><br>
Sepal length = 4.2 cm <br>
Sepal width =  4.1 cm <br>
Petal length = 1.3 cm <br>
Petal width =  0.25 cm<br>
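
One way to make the comparison with the group averages concrete is to measure how far the mystery flower sits from each species' mean, with every feature scaled by its standard deviation so that no single measurement dominates. The block below is a minimal sketch under stated assumptions: `distance_to_species_means`, `mystery` and `sample` are hypothetical names, the column names are assumed to match those used elsewhere in this notebook, and `sample` is a nine-row stand-in in the style of the classic iris data so the block runs without `./iris.csv`.

```python
import numpy as np
import pandas as pd

FEATURES = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Measurements of the mystery flower from the scenario (cm).
mystery = pd.Series({'sepal_length': 4.2, 'sepal_width': 4.1,
                     'petal_length': 1.3, 'petal_width': 0.25})


def distance_to_species_means(df, flower, features=FEATURES):
    """Standardised Euclidean distance from the flower to each species mean.

    Each feature difference is divided by that feature's standard deviation
    over the whole data set. The result is sorted, nearest species first.
    """
    means = df.groupby('species')[features].mean()
    scale = df[features].std()
    z = (means - flower[features]) / scale
    return np.sqrt((z ** 2).sum(axis=1)).sort_values()


# Stand-in data (an assumption): in the notebook, pass the real `df` instead.
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    'sepal_width':  [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    'petal_length': [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    'petal_width':  [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    'species': ['setosa'] * 3 + ['versicolor'] * 3 + ['virginica'] * 3,
})

print(distance_to_species_means(sample, mystery))
```

In the notebook itself, `distance_to_species_means(df, mystery)` would use the full data set. The smallest distance only suggests a candidate species; it does not replace looking at the distributions.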

In [None]:
#Frequency plot of characteristics - first, sepal length:
fig, ax = plt.subplots(figsize = [7, 5])
for species in df['species'].unique():
    #kdeplot replaces the deprecated distplot(hist = False)
    sns.kdeplot(df[df['species'] == species]['sepal_length'], label = species, ax = ax)
ax.legend();

ymax = ax.get_ylim()[1]

#Mark the unknown flower's sepal length with a dashed vertical line
ax.vlines(x = 4.2, ymin = 0, ymax = ymax, linestyles = 'dashed', colors = 'k');

print('Unknown species sepal length = 4.2 cm')

In [None]:
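#Frequency plot of sepal width:
fig, ax = plt.subplots(figsize = [7, 5])
for species in df['species'].unique():
    #kdeplot replaces the deprecated distplot(hist = False)
    sns.kdeplot(df[df['species'] == species]['sepal_width'], label = species, ax = ax)
ax.legend();

ymax = ax.get_ylim()[1]

#Dashed line at the unknown flower's sepal width; it stops 0.4 short of the top of the axes
ax.vlines(x = 4.1, ymin = 0, ymax = ymax - 0.4, linestyles = 'dashed', colors = 'k');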

print('Unknown species sepal width = 4.1 cm')

In [None]:
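#Frequency plot of petal length:
fig, ax = plt.subplots(figsize = [7, 5])
for species in df['species'].unique():
    #kdeplot replaces the deprecated distplot(hist = False)
    sns.kdeplot(df[df['species'] == species]['petal_length'], label = species, ax = ax)
ax.legend();

ymax = ax.get_ylim()[1]

#Mark the unknown flower's petal length with a dashed vertical line
ax.vlines(x = 1.3, ymin = 0, ymax = ymax, linestyles = 'dashed', colors = 'k');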

print('Unknown species petal length = 1.3 cm')

In [None]:
#Frequency plot of petal width:
fig, ax = plt.subplots(figsize = [7, 5])
for species in df['species'].unique():
    #kdeplot replaces the deprecated distplot(hist = False)
    sns.kdeplot(df[df['species'] == species]['petal_width'], label = species, ax = ax)
ax.legend();

ymax = ax.get_ylim()[1]

#Mark the unknown flower's petal width with a dashed vertical line
ax.vlines(x = 0.25, ymin = 0, ymax = ymax, linestyles = 'dashed', colors = 'k');

print('Unknown species petal width = 0.25 cm')
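
Each density plot asks the same question for one feature: does the dashed line fall inside a species' curve? A rough numeric counterpart is to check whether each measurement lies between the smallest and largest value recorded for each species. The block below is a sketch only: `within_observed_range`, `mystery` and `sample` are hypothetical names, the column names are assumed to match those used in the plots, and `sample` is a nine-row stand-in in the style of the classic iris data so the block runs without `./iris.csv`.

```python
import pandas as pd

FEATURES = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Measurements of the mystery flower from the scenario (cm).
mystery = pd.Series({'sepal_length': 4.2, 'sepal_width': 4.1,
                     'petal_length': 1.3, 'petal_width': 0.25})


def within_observed_range(df, flower, features=FEATURES):
    """Return a species-by-feature table of booleans.

    A cell is True where the flower's value lies between the minimum and
    maximum observed for that species and feature (inclusive).
    """
    groups = df.groupby('species')[features]
    values = flower[features]
    above_min = groups.min().le(values, axis='columns')
    below_max = groups.max().ge(values, axis='columns')
    return above_min & below_max


# Stand-in data (an assumption): in the notebook, pass the real `df` instead.
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    'sepal_width':  [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    'petal_length': [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    'petal_width':  [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    'species': ['setosa'] * 3 + ['versicolor'] * 3 + ['virginica'] * 3,
})

print(within_observed_range(sample, mystery))
```

A `False` cell means only that no flower in the data had such a value, which is weak evidence when the sample is small. The density plots remain the better guide to how unusual a measurement is.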

## Given our data, which species is the unknown iris likely to be?