# Applying Bayes' theorem to iris classification

Can **Bayes' theorem** help us to solve a **classification problem**, namely predicting the species of an iris?

## Preparing the data

We'll read the iris data into a DataFrame and **round up** each measurement to the nearest integer (its ceiling), so that irises with similar measurements end up with identical values that we can count:

```python
import pandas as pd
import numpy as np
```

```python
# read the iris data into a DataFrame
#url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
url = 'https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv'
#col_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
#iris = pd.read_csv(url, header=None, names=col_names)
iris1 = pd.read_csv(url, header=0)
iris1.head()
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


```python
# apply the ceiling function to the numeric columns
# The ceiling of a scalar 'x' is the smallest integer 'i', such that i >= x.
# copy() so the rounding works on an independent DataFrame, not a slice of iris1
iris = iris1.copy()
iris.loc[:, 'sepal_length':'petal_width'] = iris1.loc[:, 'sepal_length':'petal_width'].apply(np.ceil)
iris.head()
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 6.0 | 4.0 | 2.0 | 1.0 | setosa |
| 1 | 5.0 | 3.0 | 2.0 | 1.0 | setosa |
| 2 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |
| 3 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |
| 4 | 5.0 | 4.0 | 2.0 | 1.0 | setosa |
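A quick check of what `np.ceil` does to a few raw values taken from the first rows above. Values that are already whole numbers, such as the 3.0 sepal width, are left as they are, and the result stays a float. Rounding this coarsely is what lets many irises share exactly the same four values, so they can be counted.

```python
import numpy as np

# A few raw measurements from the first rows of the unrounded data.
raw = np.array([5.1, 4.9, 3.5, 3.0, 1.4, 0.2])

# np.ceil maps each value to the smallest integer >= the value, returned as a float.
rounded = np.ceil(raw)
print(rounded)  # [6. 5. 4. 3. 2. 1.]
```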


## Deciding how to make a prediction

Let's say that I have an **out-of-sample iris** with the following measurements: **7, 3, 5, 2**. How might I predict the species?

```python
# show all observations with features: 7, 3, 5, 2
iris[(iris.sepal_length==7) & (iris.sepal_width==3) & (iris.petal_length==5) & (iris.petal_width==2)]
```

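| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 54 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 58 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 63 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 68 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 72 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 73 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 74 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 75 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 76 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |
| 77 | 7.0 | 3.0 | 5.0 | 2.0 | versicolor |

(Only the first 10 of the 17 matching rows are shown; the counts below give the full breakdown.)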


Simple counting is enough to estimate the conditional probability of each species for a test iris measuring 7, 3, 5, 2: count how often each species appears among the matching rows.

```python
# count the species among the observations with features 7, 3, 5, 2;
# this already hints at the (conditional) probability of each species
iris[(iris.sepal_length==7) & (iris.sepal_width==3) & (iris.petal_length==5) & (iris.petal_width==2)].species.value_counts()
```

```
versicolor    13
virginica      4
Name: species, dtype: int64
```

```python
# normalize the counts to get the (conditional) probability of each species given 7, 3, 5, 2
iris[(iris.sepal_length==7) & (iris.sepal_width==3) & (iris.petal_length==5) & (iris.petal_width==2)].species.value_counts(normalize=True)
```

```
versicolor    0.764706
virginica     0.235294
Name: species, dtype: float64
```
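Repeating the four-part filter in every cell is error-prone. Below is a minimal sketch of a reusable helper, assuming the measurements to match are passed as a dict of column name to value. `species_given`, `query` and the `toy` DataFrame are hypothetical names, and the toy rows are made up for illustration rather than taken from the iris data:

```python
import pandas as pd


def species_given(df, values, target="species"):
    """Share of each class among rows whose features equal `values` exactly."""
    # Compare the selected columns to the requested values, aligned by column name.
    mask = df[list(values)].eq(pd.Series(values)).all(axis=1)
    return df.loc[mask, target].value_counts(normalize=True)


# Hypothetical toy data standing in for the rounded iris DataFrame.
toy = pd.DataFrame({
    "sepal_length": [7, 7, 7, 7, 5],
    "sepal_width":  [3, 3, 3, 3, 4],
    "petal_length": [5, 5, 5, 5, 2],
    "petal_width":  [2, 2, 2, 2, 1],
    "species": ["versicolor", "versicolor", "versicolor", "virginica", "setosa"],
})

query = {"sepal_length": 7, "sepal_width": 3, "petal_length": 5, "petal_width": 2}
print(species_given(toy, query))
```

Called as `species_given(iris, query)` on the rounded iris DataFrame, it applies the same filter as the cell above, so it should reproduce the 0.764706 / 0.235294 split.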

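We can reach the same result the longer way, with Bayes' theorem. That needs the prior probability of each species, which comes from the species counts over the whole data set: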

```python
# now count the species for all observations
iris.species.value_counts()
```

```
setosa        50
versicolor    50
virginica     50
Name: species, dtype: int64
```

```python
# normalize to get the (prior) probability of each species in the data set
iris.species.value_counts(normalize=True)
```

```
setosa        0.333333
versicolor    0.333333
virginica     0.333333
Name: species, dtype: float64
```

Let's frame this as a **conditional probability problem**: What is the probability of some particular species, given the measurements 7, 3, 5, and 2?

$$P(species \ | \ 7352)$$

We could calculate the conditional probability for **each of the three species**, and then predict the species with the **highest probability**:

$$P(setosa \ | \ 7352)$$
$$P(versicolor \ | \ 7352)$$
$$P(virginica \ | \ 7352)$$

## Calculating the probability of each species

**Bayes' theorem** gives us a way to calculate these conditional probabilities.

Let's start with **versicolor**:

$$P(versicolor \ | \ 7352) = \frac {P(7352 \ | \ versicolor) \times P(versicolor)} {P(7352)}$$

We can calculate each of the terms on the right side of the equation:

$$P(7352 \ | \ versicolor) = \frac {13} {50} = 0.26$$

$$P(versicolor) = \frac {50} {150} = 0.3333333$$

$$P(7352) = \frac {17} {150} = 0.11333333$$

Therefore, Bayes' theorem says the **probability of versicolor given these measurements** is:

$$P(versicolor \ | \ 7352) = \frac {0.26 \times 0.33333} {0.113333} = 0.7647$$

Let's repeat this process for **virginica** and **setosa**:

$$P(virginica \ | \ 7352) = \frac {0.08 \times 0.3333333} {0.1133333} = 0.2353$$

$$P(setosa \ | \ 7352) = \frac {0 \times 0.333333} {0.1133333} = 0$$

We predict that the iris is a versicolor, since that species had the **highest conditional probability**.
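The three calculations can be checked in a few lines of Python. This sketch only restates the counts from the tables above (150 irises, 50 per species, and 0, 13 and 4 matches for setosa, versicolor and virginica), and it uses `fractions.Fraction` so the results come out exact rather than rounded:

```python
from fractions import Fraction

n_total = 150
n_per_species = {"setosa": 50, "versicolor": 50, "virginica": 50}
# Number of irises of each species whose rounded measurements are 7, 3, 5, 2.
n_match = {"setosa": 0, "versicolor": 13, "virginica": 4}

evidence = Fraction(sum(n_match.values()), n_total)  # P(7352) = 17/150

posterior = {}
for species, n in n_per_species.items():
    likelihood = Fraction(n_match[species], n)  # P(7352 | species)
    prior = Fraction(n, n_total)                 # P(species)
    posterior[species] = likelihood * prior / evidence

# Predict the species with the highest conditional probability.
prediction = max(posterior, key=posterior.get)
for species, p in posterior.items():
    print(f"{species:<11} {float(p):.4f}")
print("prediction:", prediction)
```

Exact fractions make the link with simple counting visible: the posterior for versicolor is exactly 13/17, the same value `value_counts(normalize=True)` gave.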

## Summary

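1. We framed a **classification problem** as three conditional probability problems.
1. We used **Bayes' theorem** to calculate those conditional probabilities.
1. We made a **prediction** by choosing the species with the highest conditional probability.
1. We can cross-check the prediction with scikit-learn's Gaussian Naive Bayes classifier (see below). It is not the same calculation, since it assumes the features are independent and normally distributed within each species instead of counting exact matches, but it predicts the same species here.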

```python
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
```

```python
target = 'species'
predictors = [col for col in iris.columns if col != target]
# predict from a one-row DataFrame so the feature names match those seen in fit()
new_iris = pd.DataFrame([[7, 3, 5, 2]], columns=predictors)
gnb.fit(iris[predictors], iris[target]).predict(new_iris)
```

```
array(['versicolor'], dtype='|S10')
```
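`GaussianNB` does not repeat the counting calculation above. It makes the *naive* assumption that the four measurements are independent within each species, and models each one with a normal distribution whose mean and variance are estimated per species. Here is a minimal NumPy sketch of that idea; `fit_gaussian_nb`, `predict_gaussian_nb` and the two-feature toy data are hypothetical, and the sketch leaves out the small variance-smoothing term (`var_smoothing`) that scikit-learn adds for numerical stability:

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors plus per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def predict_gaussian_nb(model, x):
    """Return the class with the highest (unnormalized) log posterior for one sample x."""
    classes, priors, means, variances = model
    # Naive assumption: log P(x | c) is a sum of independent 1-D Gaussian log densities.
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)
    log_post = np.log(priors) + log_lik
    return classes[np.argmax(log_post)]


# Hypothetical toy data (two features, two classes), not the iris measurements.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 5.3], [3.8, 4.7]])
y = np.array(["a", "a", "a", "b", "b", "b"])

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([1.1, 2.1])))  # a
print(predict_gaussian_nb(model, np.array([3.9, 5.1])))  # b
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.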

## Bonus: The intuition behind Bayes' theorem

Let's make some hypothetical adjustments to the data, to demonstrate how Bayes' theorem makes intuitive sense:

Pretend that **more of the existing versicolors had measurements of 7352:**

- $P(7352 \ | \ versicolor)$ would increase, thus increasing the numerator.
- It would make sense that given an iris with measurements of 7352, the probability of it being a versicolor would also increase.

Pretend that **most of the existing irises were versicolor:**

- $P(versicolor)$ would increase, thus increasing the numerator.
- It would make sense that the probability of any iris being a versicolor (regardless of measurements) would also increase.

Pretend that **17 of the setosas had measurements of 7352:**

- $P(7352)$ would double, thus doubling the denominator.
- It would make sense that given an iris with measurements of 7352, the probability of it being a versicolor would be cut in half.
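The three thought experiments can be checked numerically. In this sketch, `posterior` is a hypothetical helper that estimates every term of Bayes' theorem from counts, and the altered counts are made-up scenarios chosen to match the bullets above, not real data:

```python
from fractions import Fraction


def posterior(n_match_c, n_c, n_match_all, n_total):
    """P(c | x) = P(x | c) * P(c) / P(x), with each term estimated from counts."""
    likelihood = Fraction(n_match_c, n_c)      # P(x | c)
    prior = Fraction(n_c, n_total)             # P(c)
    evidence = Fraction(n_match_all, n_total)  # P(x)
    return likelihood * prior / evidence


# Baseline: 13 of 50 versicolors and 17 of 150 irises measure 7352.
base = posterior(13, 50, 17, 150)                     # 13/17

# Hypothetical 1: 7 more versicolors measure 7352 (20 of 50; 24 of 150 overall).
more_versicolor_matches = posterior(20, 50, 24, 150)  # 5/6

# Hypothetical 2: 100 of the 150 irises are versicolor, with the same per-species
# match rates as before (26 of 100 versicolors, 2 of 25 virginicas, 0 setosas).
more_versicolors = posterior(26, 100, 28, 150)        # 13/14

# Hypothetical 3: 17 setosas also measure 7352, so P(7352) doubles to 34/150.
more_setosa_matches = posterior(13, 50, 34, 150)      # 13/34

for name, p in [("baseline", base),
                ("more versicolor matches", more_versicolor_matches),
                ("mostly versicolor", more_versicolors),
                ("17 setosa matches", more_setosa_matches)]:
    print(f"{name:<24} {float(p):.4f}")
```

In the first scenario the extra matches also raise $P(7352)$, but the numerator grows faster, so the probability of versicolor still goes up (from 13/17 to 5/6).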

### Q: What if the test data has a very different class distribution from the training data?

A: Naive Bayes may then predict poorly on the test data, because the class priors it estimated from the training data no longer match. scikit-learn's `GaussianNB` lets you replace those estimates through its `priors` parameter, which is handy with imbalanced classes or when the test distribution is expected to differ from the training one. The list needs one probability per class, in the order of the sorted class labels, and must sum to 1, so for iris it has three entries:

```python
# use custom priors to make class 1 (versicolor) more likely;
# order follows the sorted labels: setosa, versicolor, virginica
gnb = GaussianNB(priors=[0.1, 0.8, 0.1])
```
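Why changing the priors can change a prediction: the posterior is proportional to likelihood times prior, so with the likelihoods held fixed, a large enough prior can outweigh them. This sketch reuses the counting likelihoods from earlier rather than `GaussianNB`'s Gaussian ones, so it illustrates the mechanism, not scikit-learn's exact output; `predict`, `equal` and `skewed` are hypothetical names:

```python
from fractions import Fraction

# Likelihoods P(7352 | species), estimated by counting in the walkthrough above.
likelihood = {
    "setosa": Fraction(0, 50),
    "versicolor": Fraction(13, 50),
    "virginica": Fraction(4, 50),
}


def predict(priors):
    """Return the predicted species and the normalized posteriors for the given priors."""
    unnormalized = {c: likelihood[c] * priors[c] for c in likelihood}
    total = sum(unnormalized.values())
    posteriors = {c: v / total for c, v in unnormalized.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Priors as estimated from the training data: one third each.
equal = {c: Fraction(1, 3) for c in likelihood}
# Hypothetical custom priors that strongly favor virginica.
skewed = {"setosa": Fraction(1, 10), "versicolor": Fraction(1, 10), "virginica": Fraction(8, 10)}

print(predict(equal)[0])   # versicolor
print(predict(skewed)[0])  # virginica
```

With equal priors versicolor wins (13/17), while priors of 0.1, 0.1 and 0.8 push virginica ahead (32/45) even though its likelihood is lower.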