# Applying Bayes' theorem to iris classification

Let's see if **Bayes' theorem** might be able to help us solve a **classification task**, namely predicting the species of an iris!

## Preparing the data

We'll load the iris data into a DataFrame and **round up** every measurement to a whole number:

```python
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
```

```python
# load the iris data
iris = load_iris()

# round up the measurements
X = np.ceil(iris.data)

# clean up column names: drop the trailing ' (cm)' and replace spaces with underscores
col_names = [name[:-5].replace(' ', '_') for name in iris.feature_names]

# read into pandas
df = pd.DataFrame(X, columns=col_names)

# create a list of species using iris.target and iris.target_names
species = [iris.target_names[num] for num in iris.target]

# add the species list as a new DataFrame column
df['species'] = species
```

```python
# print the head
df.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 6.0 | 4.0 | 2.0 | 1.0 | setosa |
| 1 | 5.0 | 3.0 | 2.0 | 1.0 | setosa |
| 2 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |
| 3 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |
| 4 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |


```python
df.columns
```

```
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')
```
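A quick check of what `np.ceil` does: it rounds each value up to the smallest integer that is greater than or equal to it, so a measurement that is already a whole number is left unchanged. The array below is the first row of the raw iris measurements (5.1, 3.5, 1.4, 0.2), which becomes the 6, 4, 2, 1 seen in `df.head()`; `first_row` is just an illustrative name.

```python
import numpy as np

# First row of the raw iris measurements, before rounding
first_row = np.array([5.1, 3.5, 1.4, 0.2])
print(np.ceil(first_row))  # [6. 4. 2. 1.]

# Values that are already whole numbers stay the same
print(np.ceil(np.array([5.0, 3.0])))  # [5. 3.]
```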

## Deciding how to make a prediction

Let's say that we have an **out-of-sample observation** with the (rounded) measurements **7, 3, 5, 2**, and we want to predict the species of this iris.

How might we do that?

We'll first examine all observations in the **training data** with those measurements:

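```python
# show all observations with features: 7, 3, 5, 2
df[(df.sepal_length==7) & (df.sepal_width==3) &
   (df.petal_length==5) & (df.petal_width==2)]
```

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 54 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 58 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 63 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 68 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 72 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 73 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 74 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 75 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 76 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 77 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| … | … | … | … | … | … |

(Only the first 10 of the 17 matching rows are shown; the species counts below include all of them.)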


```python
# count the species for these observations
our_measurement_counts = df[(df.sepal_length==7) & (df.sepal_width==3) &
                             (df.petal_length==5) & (df.petal_width==2)
                            ].species.value_counts()
our_measurement_counts
```

```
versicolor    13
virginica      4
Name: species, dtype: int64
```

```python
# count the species for all observations
df.species.value_counts()
```

```
versicolor    50
virginica     50
setosa        50
Name: species, dtype: int64
```

Okay, so how might **Bayes' theorem** help us here?

Let's frame this as a **conditional probability**: what is the probability of some particular class, given the measurements 7352 (shorthand for 7, 3, 5, 2)?

$$P(class | 7352)$$

We could calculate this conditional probability for **each of the three classes**, and then predict the class with the **highest probability**:

$$P(setosa | 7352)$$
$$P(versicolor | 7352)$$
$$P(virginica | 7352)$$

## Calculating the probability of each class

Let's start with **versicolor**:

$$P(versicolor | 7352) = \frac {P(7352 | versicolor) \times P(versicolor)} {P(7352)}$$

We'll calculate each of the terms on the right side of the equation:

$$P(7352 | versicolor) = \frac {13} {50} = 0.26$$

$$P(versicolor) = \frac {50} {150} = 0.33$$

$$P(7352) = \frac {17} {150} = 0.11$$
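The denominator is the same for every class. As a sketch, $P(7352)$ can also be assembled from the per-class terms with the law of total probability, $P(7352) = \sum_c P(7352 | c) \times P(c)$, which gives the same 17/150. The counts come from the outputs above; the dictionary names are only for illustration, and `fractions.Fraction` is used to keep the arithmetic exact.

```python
from fractions import Fraction

# Counts from the outputs above: irises measured 7352, and irises in total, per species
matches = {"setosa": 0, "versicolor": 13, "virginica": 4}
totals = {"setosa": 50, "versicolor": 50, "virginica": 50}
n = sum(totals.values())  # 150

# Law of total probability: P(7352) = sum over classes c of P(7352 | c) * P(c)
p_x = sum(Fraction(matches[c], totals[c]) * Fraction(totals[c], n) for c in totals)
print(p_x, round(float(p_x), 2))  # 17/150 0.11
```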

Therefore, Bayes' theorem says the **probability of versicolor given these measurements** is:

$$P(versicolor | 7352) = \frac {\frac {13} {50} \times \frac {50} {150}} {\frac {17} {150}} = \frac {13} {17} \approx 0.76$$
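Plugging the two-decimal values 0.26, 0.33 and 0.11 straight into the formula gives about 0.78, not 0.76; the 0.76 comes from the exact fractions, which simplify to 13/17. A small check, using Python's `fractions.Fraction` as an illustrative choice (the notebook itself works with rounded decimals):

```python
from fractions import Fraction

# Exact terms for versicolor
likelihood = Fraction(13, 50)  # P(7352 | versicolor)
prior = Fraction(50, 150)      # P(versicolor)
evidence = Fraction(17, 150)   # P(7352)

exact = likelihood * prior / evidence
print(exact, round(float(exact), 2))  # 13/17 0.76

# Using the two-decimal values instead drifts slightly
rounded = 0.26 * 0.33 / 0.11
print(round(rounded, 2))  # 0.78
```

Carrying fractions (or full-precision floats) until the last step avoids this kind of rounding drift.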

Let's repeat this process for the other two classes, though we already know that versicolor will have the highest probability: the three conditional probabilities must sum to 1, and versicolor alone already accounts for 0.76.

$$P(virginica | 7352) = \frac {\frac {4} {50} \times \frac {50} {150}} {\frac {17} {150}} = \frac {4} {17} \approx 0.24$$

$$P(setosa | 7352) = \frac {0 \times 0.33} {0.11} = 0$$
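Because every term is estimated by counting the same 150 rows, the counts cancel. Writing $n_c$ for the irises of class $c$, $n_x$ for the irises measured 7352 and $n_{c,x}$ for those in both groups, $\frac{(n_{c,x}/n_c) \times (n_c/150)}{n_x/150} = n_{c,x}/n_x$. So $P(class | 7352)$ is simply the share of each species among the 17 matching irises. The sketch below computes those shares directly from the counts above:

```python
from fractions import Fraction

# Species of the 17 training irises measured 7352 (counts from the outputs above)
matches = {"setosa": 0, "versicolor": 13, "virginica": 4}
n_matches = sum(matches.values())  # 17

# Direct conditional frequency: share of each species among the matching irises
direct = {c: Fraction(k, n_matches) for c, k in matches.items()}
for c, p in direct.items():
    print(c, p)
print("total:", sum(direct.values()))  # total: 1
```

The shares are 0, 13/17 and 4/17, matching the Bayes' theorem results, and they sum to 1.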

In summary, we framed a **classification problem** as three conditional probability equations, we used **Bayes' theorem** to solve those equations, and then we made a **prediction** by choosing the class with the highest conditional probability.
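The whole procedure fits in a small function. This is a minimal sketch in plain Python rather than pandas; `predict_species` and the toy rows are hypothetical, made up for illustration. Like the notebook, it only counts exact matches, so it has no answer for a measurement combination that never appears in the training data.

```python
from collections import Counter
from fractions import Fraction


def predict_species(rows, x):
    """Return (best_class, posteriors) for measurements x, estimating every term by counting.

    rows: list of (measurements, species) pairs, e.g. ((7, 3, 5, 2), "versicolor").
    x: the rounded measurements of the new iris, as a tuple.
    """
    n = len(rows)
    class_counts = Counter(species for _, species in rows)
    match_counts = Counter(species for feats, species in rows if feats == x)
    n_matches = sum(match_counts.values())
    if n_matches == 0:
        # x never appears in the training rows, so P(x) = 0 and the formula is undefined
        return None, {}
    p_x = Fraction(n_matches, n)  # P(x)
    posteriors = {}
    for cls, count in class_counts.items():
        likelihood = Fraction(match_counts[cls], count)  # P(x | class)
        prior = Fraction(count, n)                       # P(class)
        posteriors[cls] = likelihood * prior / p_x       # Bayes' theorem
    best = max(posteriors, key=posteriors.get)          # highest conditional probability
    return best, posteriors


# Hypothetical toy training rows, made up for illustration
toy = (
    [((7, 3, 5, 2), "versicolor")] * 3
    + [((7, 3, 5, 2), "virginica")]
    + [((7, 3, 6, 2), "virginica")] * 2
    + [((6, 4, 2, 1), "setosa")] * 2
)
best, post = predict_species(toy, (7, 3, 5, 2))
print(best)  # versicolor
print({cls: str(p) for cls, p in post.items()})  # {'versicolor': '3/4', 'virginica': '1/4', 'setosa': '0'}
```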

## Adjusting the data

Let's make some hypothetical adjustments to the data, to demonstrate how Bayes' theorem actually makes intuitive sense:

Pretend that **more of the existing versicolors were 7352:**

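- $P(7352|versicolor)$ would increase, thus increasing the numerator. The denominator $P(7352)$ would grow too, since those irises now also count toward 7352, but proportionally less than the numerator.
- It would make sense that given an iris with 7352, the probability of it being a versicolor would also increase.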
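To put numbers on the first scenario, assume (hypothetically) that $k$ versicolors which currently have other measurements are re-measured as 7352. The class sizes stay at 50, so the prior is unchanged, while the numerator and $P(7352)$ both gain the same $k$ irises. The helper name `versicolor_posterior` is illustrative only.

```python
from fractions import Fraction


def versicolor_posterior(k):
    """P(versicolor | 7352) if k more versicolors (hypothetical) were measured 7352.

    Class sizes stay at 50 per species, so P(versicolor) is unchanged; the k irises
    raise both the versicolor match count and the overall 7352 count.
    """
    n, n_versicolor = 150, 50
    likelihood = Fraction(13 + k, n_versicolor)  # P(7352 | versicolor)
    prior = Fraction(n_versicolor, n)            # P(versicolor)
    evidence = Fraction(17 + k, n)               # P(7352)
    return likelihood * prior / evidence


for k in (0, 5, 10, 20):
    p = versicolor_posterior(k)
    print(k, p, round(float(p), 2))
```

The result simplifies to $(13 + k)/(17 + k)$, which grows with $k$.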

Pretend that **most of the existing irises were versicolor:**

- $P(versicolor)$ would increase, thus increasing the numerator.
- It would make sense that the probability of any iris being a versicolor (regardless of measurements) would also increase.
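The second scenario needs an extra assumption: each species keeps its current rate of 7352 irises ($P(7352 | class)$ stays at 13/50, 4/50 and 0) and only the class proportions change. With hypothetical priors of 2/3 for versicolor and 1/6 for each of the other two, and $P(7352)$ recomputed by the law of total probability, the versicolor posterior rises from 13/17 to 13/14. The names below are illustrative.

```python
from fractions import Fraction

# Likelihoods P(7352 | class) from the real data, held fixed (assumption)
likelihood = {"setosa": Fraction(0), "versicolor": Fraction(13, 50), "virginica": Fraction(4, 50)}


def posteriors(priors):
    # P(7352) by the law of total probability, then Bayes' theorem for each class
    evidence = sum(likelihood[c] * priors[c] for c in priors)
    return {c: likelihood[c] * priors[c] / evidence for c in priors}


equal = {"setosa": Fraction(1, 3), "versicolor": Fraction(1, 3), "virginica": Fraction(1, 3)}
mostly_versicolor = {"setosa": Fraction(1, 6), "versicolor": Fraction(2, 3), "virginica": Fraction(1, 6)}

print(posteriors(equal)["versicolor"])              # 13/17
print(posteriors(mostly_versicolor)["versicolor"])  # 13/14
```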

Pretend that **17 of the setosas were 7352:**

- $P(7352)$ would double, thus doubling the denominator.
- It would make sense that given an iris with 7352, the probability of it being a versicolor would be cut in half.
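A quick check of the third scenario, keeping the versicolor numerator fixed and doubling $P(7352)$ from 17/150 to 34/150 (the variable names are illustrative):

```python
from fractions import Fraction

# P(7352 | versicolor) * P(versicolor), unchanged by the hypothetical setosas
numerator = Fraction(13, 50) * Fraction(50, 150)

before = numerator / Fraction(17, 150)  # original P(7352)
after = numerator / Fraction(34, 150)   # 17 setosas also measured 7352 (hypothetical)

print(before, after)        # 13/17 13/34
print(after == before / 2)  # True
```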