# Naive Bayes: A Good Starting Point

<p>The Naive Bayes learner is a simple yet effective machine learning model, and a great starting point for learning about classification. The task is to predict the most likely class for a given instance. In other words: for a given instance, what is the probability that it belongs to each class? Our prediction is then the class with the highest probability.</p>

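<p>Formally, we want the class $\widehat{C}$ that maximises the probability of the class given the instance $T$:</p>

$\widehat{C} = \operatorname{argmax}_{c_j\in C} P(c_j|T)$

<p>By Bayes' rule, $P(c_j|T) = \frac{P(T|c_j)P(c_j)}{P(T)}$. The denominator $P(T)$ is the same for every class, so it does not change which class attains the maximum and can be dropped:</p>

$\widehat{C} = \operatorname{argmax}_{c_j\in C} P(T|c_j)P(c_j)$

<p>Since T is an instance with attribute values $T = \langle x_1, x_2, x_3, \dots , x_n \rangle$:</p>

$\widehat{C} = \operatorname{argmax}_{c_j\in C} P(x_1, x_2, x_3, \dots , x_n|c_j)P(c_j)$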

<p>The 'naive' part: Conditional Independence Assumption</p>

<p>We can't reliably estimate $P(x_1, x_2, x_3, ... , x_n|c_j)$ directly, because the training data rarely contains enough examples of each full combination of attribute values. So we do something 'dumb': we assume that all of an instance's attributes are <b>conditionally independent</b> given the class. This rarely holds in real datasets, but in practice the assumption often does little harm to the model's ability to predict classes.</p>

<p>This conditional independence assumption translates into the following:</p>

$P(x_1, x_2, x_3, ... , x_n|c_j) ≈ P(x_1|c_j)P(x_2|c_j)P(x_3|c_j)...P(x_n|c_j)$

$≈\prod_{i} P(x_i|c_j)$

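In order to classify an instance we simply do the following:
1. Find the prior probability of the class, $P(c_j)$
2. Multiply together the conditional probabilities $P(x_i|c_j)$ of the instance's attribute values (this write-up calls them posteriors; strictly speaking they are likelihoods)
3. Multiply the result of step 1 by the result of step 2
4. Repeat steps 1 to 3 for every class and predict the class with the highest result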
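
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used later in this post: the function name `classify`, the dictionary layout `likelihoods[class][attribute][value]`, and the numbers in the usage example are all assumptions made up for the example.

```python
from math import prod


def classify(instance, priors, likelihoods):
    """Return (best_class, scores) for one instance.

    instance:    {attribute: value}
    priors:      {class: P(class)}
    likelihoods: {class: {attribute: {value: P(value | class)}}}
    """
    scores = {}
    for c, prior in priors.items():
        # multiply the conditional probabilities of the attribute values together
        conditional = prod(likelihoods[c][a][v] for a, v in instance.items())
        # multiply that product by the prior of the class
        scores[c] = prior * conditional
    # the prediction is the class with the highest score
    return max(scores, key=scores.get), scores


# Made-up numbers, purely to exercise the function
priors = {"a": 0.5, "b": 0.5}
likelihoods = {
    "a": {"colour": {"red": 0.8, "blue": 0.2}, "size": {"big": 0.5, "small": 0.5}},
    "b": {"colour": {"red": 0.25, "blue": 0.75}, "size": {"big": 0.5, "small": 0.5}},
}
label, scores = classify({"colour": "red", "size": "big"}, priors, likelihoods)
print(label, scores)
```

The scores are not normalised, so they do not sum to 1; only their relative order matters for the prediction.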

#### Example with a toy dataset:

|Headache|Sore  |Temperature|Cough|Diagnosis|
|--------|------|-----------|-----|---------|
|severe  |mild  |high       |yes  |flu      |
|no      |severe|normal     |yes  |cold     |
|mild    |mild  |normal     |yes  |flu      |
|mild    |no    |normal     |no   |cold     |
|severe  |severe|normal     |yes  |flu      |


#### Calculate our priors and posteriors:

|Flu|Cold|
|:---|:---|
|$\text{P}(\text{Flu})=\frac{3}{5}$|$\text{P}(\text{Cold})=\frac{2}{5}$|
|$\text{P}(\text{Headache} = \textit{severe} \mid \text{Flu})=\frac{2}{3}$|$\text{P}(\text{Headache} = \textit{severe} \mid \text{Cold})=\frac{0}{2}$|
|$\text{P}(\text{Headache} = \textit{mild} \mid \text{Flu})=\frac{1}{3}$|$\text{P}(\text{Headache} = \textit{mild} \mid \text{Cold})=\frac{1}{2}$|
|$\text{P}(\text{Headache} = \textit{no} \mid \text{Flu})=\frac{0}{3}$|$\text{P}(\text{Headache} = \textit{no} \mid \text{Cold})=\frac{1}{2}$|
|$\text{P}(\text{Sore} = \textit{severe} \mid \text{Flu})=\frac{1}{3}$|$\text{P}(\text{Sore} = \textit{severe} \mid \text{Cold})=\frac{1}{2}$|
|$\text{P}(\text{Sore} = \textit{mild} \mid \text{Flu})=\frac{2}{3}$|$\text{P}(\text{Sore} = \textit{mild} \mid \text{Cold})=\frac{0}{2}$|
|$\text{P}(\text{Sore} = \textit{no} \mid \text{Flu})=\frac{0}{3}$|$\text{P}(\text{Sore} = \textit{no} \mid \text{Cold})=\frac{1}{2}$|
|$\text{P}(\text{Temperature} = \textit{high} \mid \text{Flu})=\frac{1}{3}$|$\text{P}(\text{Temperature} = \textit{high} \mid \text{Cold})=\frac{0}{2}$|
|$\text{P}(\text{Temperature} = \textit{normal} \mid \text{Flu})=\frac{2}{3}$|$\text{P}(\text{Temperature} = \textit{normal} \mid \text{Cold})=\frac{2}{2}$|
|$\text{P}(\text{Cough} = \textit{yes} \mid \text{Flu})=\frac{3}{3}$|$\text{P}(\text{Cough} = \textit{yes} \mid \text{Cold})=\frac{1}{2}$|
|$\text{P}(\text{Cough} = \textit{no} \mid \text{Flu})=\frac{0}{3}$|$\text{P}(\text{Cough} = \textit{no} \mid \text{Cold})=\frac{1}{2}$|
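
The table can be reproduced by plain counting over the five rows of the toy dataset. The sketch below is one way to do it using only the standard library; the names `ROWS`, `COLUMNS`, `prior` and `conditional` are chosen for this example and do not appear in the implementation later on. Exact fractions are used so the results can be compared with the table directly.

```python
from collections import Counter
from fractions import Fraction

COLUMNS = ["Headache", "Sore", "Temperature", "Cough"]
ROWS = [
    ("severe", "mild", "high", "yes", "flu"),
    ("no", "severe", "normal", "yes", "cold"),
    ("mild", "mild", "normal", "yes", "flu"),
    ("mild", "no", "normal", "no", "cold"),
    ("severe", "severe", "normal", "yes", "flu"),
]

# Number of instances per class (the last entry of each row is the diagnosis)
class_counts = Counter(row[-1] for row in ROWS)

# Number of times each (class, attribute, value) combination occurs
value_counts = Counter(
    (row[-1], column, value)
    for row in ROWS
    for column, value in zip(COLUMNS, row)
)


def prior(c):
    return Fraction(class_counts[c], len(ROWS))


def conditional(column, value, c):
    # Counter returns 0 for combinations that never occur in the data
    return Fraction(value_counts[(c, column, value)], class_counts[c])


print(prior("flu"))                              # 3/5
print(conditional("Headache", "severe", "flu"))  # 2/3
print(conditional("Cough", "no", "flu"))         # 0
```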

### Classification Example

1. Someone comes to the clinic with a mild headache, severe soreness, normal temperature and no cough. Are they more likely to have a cold, or the flu?

    For each possible class, we calculate a score proportional to the probability of the instance belonging to that class, and then take the class with the highest score as our prediction.

    Score for the patient having a cold:

$$\text{P(Cold)} \times \text{P(Headache = mild | Cold)} \times \text{P(Sore = severe | Cold)} \times \text{P(Temperature = normal | Cold)} \times \text{P(Cough = no | Cold)}$$
$$= (\frac{2}{5})(\frac{1}{2})(\frac{1}{2})(\frac{2}{2})(\frac{1}{2})$$
$$= 0.05$$

    Score for the patient having the flu:

$$\text{P(Flu)} \times \text{P(Headache = mild | Flu)} \times \text{P(Sore = severe | Flu)} \times \text{P(Temperature = normal | Flu)} \times \text{P(Cough = no | Flu)}$$
$$= (\frac{3}{5})(\frac{1}{3})(\frac{1}{3})(\frac{2}{3})(\frac{0}{3})$$
$$= 0$$

The NB learner would choose Cold as the classification of this instance, since its score of 0.05 is higher than the score of 0 for Flu.
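
As a quick check of the arithmetic, the two products can be evaluated with exact fractions. This is only a verification sketch; the variable names are chosen for the example, and the values are read straight off the table of priors and posteriors above.

```python
from fractions import Fraction as F
from math import prod

# Prior followed by the conditionals for: mild headache, severe soreness,
# normal temperature, no cough
cold_terms = [F(2, 5), F(1, 2), F(1, 2), F(2, 2), F(1, 2)]
flu_terms = [F(3, 5), F(1, 3), F(1, 3), F(2, 3), F(0, 3)]

scores = {"cold": prod(cold_terms), "flu": prod(flu_terms)}
prediction = max(scores, key=scores.get)

print(scores["cold"], float(scores["cold"]))  # 1/20 0.05
print(scores["flu"])                          # 0
print(prediction)                             # cold
```

Note that the flu score is exactly zero only because no flu patient in the toy dataset was recorded without a cough: a single zero count wipes out the whole product, whatever the other terms are.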

## Implementation: The Fun Part

### Priors
Getting the prior for each class in the dataset is easy: we only need to count how often each class label occurs and divide by the total number of instances. We can do this as follows:

```python
from collections import defaultdict as dd

# assuming we have a pandas dataframe containing our entire dataset with labels
def get_priors(df):
    # count how many instances carry each class label
    class_freq = dd(int)
    for item in df['class']:
        class_freq[item] += 1
    # divide each count by the total number of instances
    total = sum(class_freq.values())
    for item in class_freq:
        class_freq[item] = class_freq[item] / total
    return class_freq
```
What our priors look like for a four-class dataset:


```python

car_dataset = preprocess_supervised("car")
nb = NaiveBayes(car_dataset)

# nb.priors evaluates to:
defaultdict(int,
            {'acc': 0.2222222222222222,
             'good': 0.03993055555555555,
             'unacc': 0.7002314814814815,
             'vgood': 0.03761574074074074})
```
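
A quick sanity check on any set of priors is that they sum to 1. The sketch below copies the four values printed above into a plain dictionary (the variable names are chosen for this example) and also picks out the most frequent class.

```python
import math

# Values copied from the printed priors for the car dataset
priors = {
    "acc": 0.2222222222222222,
    "good": 0.03993055555555555,
    "unacc": 0.7002314814814815,
    "vgood": 0.03761574074074074,
}

total = sum(priors.values())
most_common = max(priors, key=priors.get)

print(math.isclose(total, 1.0))
print(most_common)
```

Because `unacc` makes up about 70% of the instances, a classifier that always answers `unacc` would already be right about 70% of the time on this data, which is a useful baseline to keep in mind when reading accuracy figures.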







### Posteriors

Getting our posteriors is a bit trickier, because our implementation stores them in a dictionary nested three levels deep: attribute, then class, then attribute value.

```python
from collections import defaultdict as dd

def get_posteriors(self):
    df = self.df
    attributes = list(df)[0:-1]  # every column except the final 'class' column

    # number of training instances in each class (the denominators)
    class_freq = dd(int)
    for item in df['class']:
        class_freq[item] += 1

    # this allows ease of access for the different classes in the dataset
    distinct_class_values = df["class"].unique()

    # build posterior_dict[attribute][class][value], with every count starting at 0
    posterior_dict = {}
    for attribute in attributes:
        posterior_dict[attribute] = {}
        distinct_cell_values = df[attribute].unique()
        for class_value in distinct_class_values:
            posterior_dict[attribute][class_value] = {value: 0 for value in distinct_cell_values}

    # count how often each attribute value occurs within each class
    for row in range(len(df.index)):
        for attribute in attributes:
            # do not contribute missing data to probability counts
            # (the original test compared the running count, not the cell, to "?")
            if df[attribute][row] != "?":
                posterior_dict[attribute][df["class"][row]][df[attribute][row]] += 1

    # turn the counts into P(value | class) by dividing by the class frequency
    for attribute, per_class in posterior_dict.items():
        for class_value, counts in per_class.items():
            for value, count in counts.items():
                counts[value] = count / class_freq[class_value]
    return posterior_dict
```
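What our posterior dict looks like for the same 'car' dataset. The outer keys 0 to 5 are the column indices of the six attributes, since the CSV is read without a header row: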

```python
# nb.posteriors evaluates to:
{0: {'acc': {'high': 0.28125,
   'low': 0.23177083333333334,
   'med': 0.2994791666666667,
   'vhigh': 0.1875},
  'good': {'high': 0.0,
   'low': 0.6666666666666666,
   'med': 0.3333333333333333,
   'vhigh': 0.0},
  'unacc': {'high': 0.26776859504132233,
   'low': 0.21322314049586777,
   'med': 0.22148760330578512,
   'vhigh': 0.2975206611570248},
  'vgood': {'high': 0.0, 'low': 0.6, 'med': 0.4, 'vhigh': 0.0}},
 1: {'acc': {'high': 0.2734375,
   'low': 0.23958333333333334,
   'med': 0.2994791666666667,
   'vhigh': 0.1875},
  'good': {'high': 0.0,
   'low': 0.6666666666666666,
   'med': 0.3333333333333333,
   'vhigh': 0.0},
  'unacc': {'high': 0.25950413223140495,
   'low': 0.22148760330578512,
   'med': 0.22148760330578512,
   'vhigh': 0.2975206611570248},
  'vgood': {'high': 0.2, 'low': 0.4, 'med': 0.4, 'vhigh': 0.0}},
 2: {'acc': {'2': 0.2109375, '3': 0.2578125, '4': 0.265625, '5more': 0.265625},
  'good': {'2': 0.21739130434782608,
   '3': 0.2608695652173913,
   '4': 0.2608695652173913,
   '5more': 0.2608695652173913},
  'unacc': {'2': 0.2694214876033058,
   '3': 0.24793388429752067,
   '4': 0.2413223140495868,
   '5more': 0.2413223140495868},
  'vgood': {'2': 0.15384615384615385,
   '3': 0.23076923076923078,
   '4': 0.3076923076923077,
   '5more': 0.3076923076923077}},
 3: {'acc': {'2': 0.0, '4': 0.515625, 'more': 0.484375},
  'good': {'2': 0.0, '4': 0.5217391304347826, 'more': 0.4782608695652174},
  'unacc': {'2': 0.47603305785123967,
   '4': 0.2578512396694215,
   'more': 0.26611570247933886},
  'vgood': {'2': 0.0, '4': 0.46153846153846156, 'more': 0.5384615384615384}},
 4: {'acc': {'big': 0.375, 'med': 0.3515625, 'small': 0.2734375},
  'good': {'big': 0.34782608695652173,
   'med': 0.34782608695652173,
   'small': 0.30434782608695654},
  'unacc': {'big': 0.30413223140495865,
   'med': 0.3239669421487603,
   'small': 0.371900826446281},
  'vgood': {'big': 0.6153846153846154,
   'med': 0.38461538461538464,
   'small': 0.0}},
 5: {'acc': {'high': 0.53125, 'low': 0.0, 'med': 0.46875},
  'good': {'high': 0.43478260869565216, 'low': 0.0, 'med': 0.5652173913043478},
  'unacc': {'high': 0.22892561983471074,
   'low': 0.47603305785123967,
   'med': 0.2950413223140496},
  'vgood': {'high': 1.0, 'low': 0.0, 'med': 0.0}}}
```
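
To show how the nested dictionary is meant to be read, here is a minimal sketch of scoring a single instance with it. The `predict` method of the `NaiveBayes` class is not shown in this post, so the function `predict_one` below is a hypothetical stand-in and not that method. For brevity it uses the priors and only two of the six attributes (columns 3 and 5), copied from the printed output above; a full prediction would multiply over all six.

```python
def predict_one(instance, priors, posteriors):
    """Score one instance against every class and return the best class.

    posteriors is indexed as posteriors[attribute][class][value],
    matching the nested dictionary printed above.
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            score *= posteriors[attribute][c][value]
        scores[c] = score
    return max(scores, key=scores.get), scores


# Values copied from the printed output for the car dataset
priors = {"acc": 0.2222222222222222, "good": 0.03993055555555555,
          "unacc": 0.7002314814814815, "vgood": 0.03761574074074074}
posteriors = {
    3: {"acc": {"2": 0.0, "4": 0.515625, "more": 0.484375},
        "good": {"2": 0.0, "4": 0.5217391304347826, "more": 0.4782608695652174},
        "unacc": {"2": 0.47603305785123967, "4": 0.2578512396694215,
                  "more": 0.26611570247933886},
        "vgood": {"2": 0.0, "4": 0.46153846153846156, "more": 0.5384615384615384}},
    5: {"acc": {"high": 0.53125, "low": 0.0, "med": 0.46875},
        "good": {"high": 0.43478260869565216, "low": 0.0, "med": 0.5652173913043478},
        "unacc": {"high": 0.22892561983471074, "low": 0.47603305785123967,
                  "med": 0.2950413223140496},
        "vgood": {"high": 1.0, "low": 0.0, "med": 0.0}},
}

# An instance described only by column 3 and column 5
label, scores = predict_one({3: "4", 5: "high"}, priors, posteriors)
print(label)
print(scores)
```

With these two attributes alone, the value `"2"` in column 3 has probability 0 under every class except `unacc`, so any instance with that value is scored as `unacc` regardless of column 5.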

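```python
import pandas as pd

def preprocess_supervised(file, normal=True):
    # the CSV files have no header row, so columns are numbered 0..n-1
    df = pd.read_csv("./2018S1-proj1_data/" + file + ".csv", header=None)
    if normal:
        # the last column holds the label, so rename it to 'class'
        unnamed = df.columns[-1]
        df.rename(columns={unnamed: 'class'}, inplace=True)
    return df
```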

```python
from NaiveBayes import *
car_dataset = preprocess_supervised("car")
```

```python
def get_train_test_split(df, split):
    # split is a (train, test) pair of fractions; the test set is whatever
    # remains after the training rows are taken
    train_percent = split[0]

    # randomise our dataset and reset the indexes
    df_shuffled = df.sample(frac=1).reset_index(drop=True)

    # find the index at which to slice
    train_portion_index = int(len(df_shuffled.index) * train_percent)

    # slice the instances according to the split ratio
    train_portion = df_shuffled[:train_portion_index].reset_index(drop=True)
    test_portion = df_shuffled[train_portion_index:].reset_index(drop=True)

    return train_portion, test_portion
```

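```python
import numpy as np

train, test = get_train_test_split(car_dataset, (0.1, 0.9))

# note: the model is constructed from the full dataset, which includes the
# test rows, so the accuracy below is not a held-out estimate; pass `train`
# instead to measure performance on unseen data
nb = NaiveBayes(car_dataset)

# predict once, then compare the true and predicted labels
predictions = nb.predict(test)
np.mean(predictions["class"] == predictions["predicted"])
# 0.87274041937816338

len(test)
# 1383
```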