**Class Exercise**: Show of hands for 99%? 50-99%? 10-50%? less than 10%?

Bayes' rule is notoriously counter-intuitive when applied to problems like this.

Let's work through the formula:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

In this case, $P(A|B)$ denotes the probability of being a user (A), given a positive test (B).

So we can re-write the formula in the language of our problem:

$$ P(user|positive) = \frac{P(positive|user)P(user)}{P(positive)} $$

$P(positive|user)$ is the probability of a positive test, given that the subject is a user, and $P(user)$ is the probability of being a user.

$P(positive)$ is the overall probability of a positive test. In our case, it is the probability of a user testing positive, multiplied by the probability of being a user, plus the probability of a non-user testing positive, multiplied by the probability of not being a user.
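Written as a formula, this description of $P(positive)$ is the law of total probability over the two groups (the label $nonuser$ is introduced here only for illustration):

$$ P(positive) = P(positive|user)P(user) + P(positive|nonuser)P(nonuser) $$

The first term counts true positives and the second counts false positives, so the posterior $P(user|positive)$ is the share of all positives that are true positives.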

Let's do the math:

In [1]:
p_user = 0.005
p_nonuser = 1 - p_user
p_user_pos = 0.99
p_nonuser_negative = 0.99
p_nonuser_pos = 1 - p_nonuser_negative

prob = (p_user_pos * p_user)/(p_user_pos * p_user + p_nonuser_pos * p_nonuser)

print(prob)

0.33221476510067094


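So a single positive test implies only about a 33% chance that the officer is a user. If we run this test again on the same police officer, we would use that result as our new prior, updating our belief that they have used the drug to 0.33.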

In [2]:
# Ok, so if we run this test again on the same police officer, we
# would update our prior belief that they have used the drug to be 0.33

p_user = 0.33
p_nonuser = 1 - p_user
p_user_pos = 0.99
p_nonuser_negative = 0.99
p_nonuser_pos = 1 - p_nonuser_negative

prob = (p_user_pos * p_user)/(p_user_pos * p_user + p_nonuser_pos * p_nonuser)

print(prob)

0.9799040191961608
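The two cells above repeat the same calculation with a different prior, so the update can be sketched as a small helper applied in a loop. The function name `posterior` and its parameter names are hypothetical, chosen for this illustration. The sketch also assumes that repeated tests on the same person are conditionally independent given their true status, which may not hold for a real test. Because it carries the unrounded posterior forward rather than 0.33, the second value comes out near 0.980 rather than exactly 0.9799.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(user | positive) after one positive test result."""
    true_pos = sensitivity * prior                 # P(positive | user) * P(user)
    false_pos = (1 - specificity) * (1 - prior)    # P(positive | non-user) * P(non-user)
    return true_pos / (true_pos + false_pos)


belief = 0.005   # starting prior: share of users in the population
history = []     # posterior after each successive positive test

for test_number in range(1, 4):
    # Yesterday's posterior becomes today's prior
    belief = posterior(belief, sensitivity=0.99, specificity=0.99)
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.4f}")
```

Each pass through the loop treats the previous posterior as the new prior, which is the same step the second cell performs by hand.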


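This is a problem! If we had simply run the test once and told people who tested positive that they were 99% likely to have taken the drug, we would be completely wrong: after one positive test the probability is only about 33%, and it takes a second positive test to reach about 98%.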

What we have done is use a prior (in this case $P(user)$) to make sure that we interpret our test sensibly. We know that drug use is rare in our population, so the majority of positive tests are false positives.

Sensitivity and specificity are the terms commonly used in medical diagnosis. In the above example, the sensitivity is the probability that we detect a true drug user (0.99), and the specificity is the probability of correctly identifying a non-user (also 0.99 in our example, but the two are not necessarily equal). When a disease is rare, false positives can outnumber true positives: even if a test is 99% for both sensitivity and specificity, the majority of positives will be errors if the disease is rare enough.
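Counting people instead of multiplying probabilities can make this easier to see. The sketch below applies the drug-test numbers to a hypothetical group of 100,000 people; the group size and all variable names are assumptions chosen for round numbers, not part of the original problem.

```python
population = 100_000   # hypothetical group size, chosen for round numbers
prevalence = 0.005     # P(user)
sensitivity = 0.99     # P(positive | user)
specificity = 0.99     # P(negative | non-user)

users = population * prevalence
non_users = population - users

# Split each group by test outcome
true_positives = sensitivity * users
false_negatives = users - true_positives
false_positives = (1 - specificity) * non_users
true_negatives = non_users - false_positives

# Share of all positive tests that come from real users
share_true = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"P(user | positive) = {share_true:.4f}")
```

Roughly 495 true positives sit alongside roughly 995 false positives, so only about a third of the positive results belong to actual users, matching the Bayes' rule calculation.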

---

#### Exercise 1


The world population in 2020 was ~7,800,000,000 people. Out of those, 5,732,780 people were diagnosed with disease X. We have a new test which correctly identifies people with disease X 99% of the time, and correctly identifies people who do not have it 99.8% of the time.

1. What is the probability that someone identified as having disease X truly has disease X?
2. Is this a useful model? How many false positives do we expect?
3. What recommendations do you have for the modeler - should they focus on false positives, false negatives, or give up?

---

In [3]:
p_sick = 5_732_780 / 7_800_000_000
p_not_sick = 1 - p_sick

p_positive_if_sick = 0.99
p_negative_if_sick = 1 - p_positive_if_sick

p_negative_if_not_sick = 0.998
p_positive_if_not_sick = 1 - p_negative_if_not_sick

# Remind ourselves of Bayes' rule
# p_sick_if_positive = (p_positive_if_sick * p_sick) / p_positive

# Need to calculate p_positive
p_positive = p_not_sick*p_positive_if_not_sick + p_sick*p_positive_if_sick


# Apply Bayes' rule
p_sick_if_positive = (p_positive_if_sick * p_sick) / p_positive
p_sick_if_positive

0.26690442841625095

In [33]:
# 2 - How many false positives do we expect?

# Let's say we tested every person in the world who isn't sick.
# Each of them has a p_positive_if_not_sick chance of a false positive,
# so we multiply that rate by the number of people who are not sick.

round(p_positive_if_not_sick * (7_800_000_000 - 5_732_780))

15588534
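One possible way to approach question 3 is to compare how much the posterior moves when each error rate is reduced. The sketch below is an illustration, not a definitive answer: the function name and the two "improved" test settings (0.999 sensitivity, 0.9998 specificity) are hypothetical values chosen so that each error rate shrinks by a factor of ten.

```python
def p_sick_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule for P(sick | positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


prevalence = 5_732_780 / 7_800_000_000

# The test as described in the exercise
baseline = p_sick_given_positive(prevalence, 0.99, 0.998)

# Hypothetical: ten times fewer false negatives (1% -> 0.1%)
better_sensitivity = p_sick_given_positive(prevalence, 0.999, 0.998)

# Hypothetical: ten times fewer false positives (0.2% -> 0.02%)
better_specificity = p_sick_given_positive(prevalence, 0.99, 0.9998)

print(f"baseline:           {baseline:.3f}")
print(f"better sensitivity: {better_sensitivity:.3f}")
print(f"better specificity: {better_specificity:.3f}")
```

Under these assumptions, cutting false negatives barely changes the posterior, while cutting false positives raises it substantially, because non-sick people vastly outnumber sick people.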

### Naive Bayes Classification

Naive Bayes is a set of algorithms for classifying data based on *features*. In the simplest case, given a set of data, we can decide which class it belongs to based on its attributes. You can imagine the model looking at mammals vs birds, and finding that $P(bird|wings)$ is high, with some $P(mammal|wings)$ from bats. It estimates these probabilities for each feature we give it, treating the features as independent, and then picks the most likely class.

The Naive Bayes classifier determines which *class* an instance from a new set of data belongs to, given previous data it has observed and learned from. Suppose we have many classes $C_1,C_2,\ldots,C_n$, and we represent the instance to be classified as $\textbf{x} =  [x_1, x_2, \cdots , x_k]$.  The probability that the given data $\textbf{x}$ belongs to class $C_i$ is given by

$$ P(C_i\,|\,\textbf{x}) = \frac{P(C_i)P(\textbf{x}\,|\,C_i)}{P(\textbf{x})}$$
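The "naive" part of the name refers to the standard simplifying assumption that the features are conditionally independent given the class, so the likelihood factorises into one term per feature:

$$ P(\textbf{x}\,|\,C_i) = \prod_{j=1}^{k} P(x_j\,|\,C_i) $$

Since $P(\textbf{x})$ is the same for every class, the predicted class $\hat{C}$ (a symbol introduced here for illustration) is the one that maximises the numerator:

$$ \hat{C} = \arg\max_{C_i} \; P(C_i)\prod_{j=1}^{k} P(x_j\,|\,C_i) $$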

We will carry out a few simple examples. Naive Bayes classifiers are often called a machine learning method, a topic we will be talking about in a couple of weeks' time.

You can read the documentation here:
http://scikit-learn.org/stable/modules/naive_bayes.html

Let's use an example dataset, to classify fruits versus vegetables:

In [5]:
import pandas as pd
import numpy as np

# data:
columns = ['Used in salads', 'Grows Underground', 'Served cooked', 'Needs Peeling', 'Fruit']

apple = [0,0,0,0,1]
tomato = [1,0,1,0,1]
potato = [0,1,1,1,0]
carrot = [1,1,0,1,0]
banana = [0,0,0,1,1]
turnip = [0,1,1,1,0]

data = pd.DataFrame([apple, tomato, potato, carrot, banana, turnip], columns = columns)

data

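| | Used in salads | Grows Underground | Served cooked | Needs Peeling | Fruit |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 | 1 | 0 |
| 3 | 1 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 | 1 |
| 5 | 0 | 1 | 1 | 1 | 0 |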


From here, we will use the classifier to figure out the probabilities. Under the hood, here is effectively what is happening:

In [6]:
x = data.iloc[:,:-1].values
y = data.iloc[:,-1].values

counts = {}
for label in np.unique(y):
    counts[label] = x[y == label].sum(axis = 0)
print(counts)

{0: array([1, 3, 2, 3]), 1: array([1, 0, 1, 1])}


The classifier then relates the frequency of each feature in `x` to each label in `y`, and estimates the probability that a new item with that attribute belongs to each class.

We use BernoulliNB here, as all of our data is 1/0. GaussianNB stores the mean and spread of each feature for continuous variables, and MultinomialNB handles discrete count features (e.g. word counts). When we see new data, we use Bayes' rule to classify it into the most likely class:

In [7]:
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Instantiate our model
nbmodel = BernoulliNB()
# Fit our model
nbmodel.fit(x,y)

BernoulliNB()

Let's say we have a new edible object and we want to predict whether it's a fruit or a vegetable.

In [8]:
# new data to be classified will have values for each column, but not
# our target column. (in_salads, underground, cooked, peeling)
new_data = [1, 0, 1, 1] # e.g. a pineapple

# reformat our data as a 1xn array
new_data = np.array(new_data).reshape(1, -1)

print(nbmodel.predict_proba(new_data))  # get probabilistic prediction
print(nbmodel.predict(new_data))  # get hard prediction

# nbmodel.feature_count_ 
#print("score",nbmodel.score(x,y))

[[0.42857143 0.57142857]]
[1]
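These probabilities can be reproduced by hand from the per-class feature counts. The sketch below is a plain-Python illustration rather than scikit-learn's actual implementation: it assumes Laplace smoothing with `alpha = 1.0` (the default for `BernoulliNB`) and class priors estimated from the training data. The variable names are hypothetical.

```python
# Number of 1s per feature within each class, from the counting cell above
feature_counts = {0: [1, 3, 2, 3], 1: [1, 0, 1, 1]}
class_counts = {0: 3, 1: 3}   # three vegetables (0), three fruits (1)
alpha = 1.0                   # assumed Laplace smoothing strength
new_item = [1, 0, 1, 1]       # in_salads, underground, cooked, peeling

total = sum(class_counts.values())
joint = {}
for label, counts in feature_counts.items():
    score = class_counts[label] / total   # class prior P(C)
    for count, value in zip(counts, new_item):
        # Smoothed estimate of P(feature = 1 | class); two outcomes per feature
        p_one = (count + alpha) / (class_counts[label] + 2 * alpha)
        # Bernoulli likelihood: use p_one if the feature is present, else 1 - p_one
        score *= p_one if value == 1 else 1 - p_one
    joint[label] = score

# Normalise so the class probabilities sum to 1
evidence = sum(joint.values())
manual_proba = {label: score / evidence for label, score in joint.items()}
print(manual_proba)
```

The smoothing matters here: without it, the zero count for "Grows Underground" among fruits would force the fruit probability to 0 for any underground item, regardless of the other features.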


In [9]:
nbmodel.feature_count_

array([[1., 3., 2., 3.],
       [1., 0., 1., 1.]])