# Naive Bayes

Naive Bayes is a supervised classification algorithm.

Before we jump into Naive Bayes, let's review a few probability concepts.

### <font color='blue'>Definition of Probability</font>

P(A) = number of ways of getting A/total number of possible outcomes

#### Important Rules

If A and B are two events of an experiment then

1) $ 0 \leq P(A) \leq 1 $ likewise $ 0 \leq P(B) \leq 1.$

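2) The probabilities of all possible outcomes add up to 1. In particular, $ P(A) + P(B) = 1 $ when A and B are mutually exclusive and together cover every possible outcome.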

3) $ P(A') = 1 - P(A) $ here $A'$ is the complement of A. Some use $A^{c}$ to represent complement of A.

4) $ P(A \cup B) = P(A) + P(B) $ when $P(A \cap B) = 0.$

5) $ P(A \cup B) = P(A) + P(B) - P(A \cap B). $
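The rules above can be checked by counting outcomes. The sketch below uses one roll of a fair six-sided die; the events `A`, `B`, `C` and the helper `prob` are illustrative names, not part of the original notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) = number of favourable outcomes / total number of outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # even number
B = {3, 6}     # number divisible by 3
C = {1}        # shares no outcome with A

# Rule 3: complement.
assert prob(outcomes - A) == 1 - prob(A)
# Rule 4: A and C cannot happen together, so the probabilities simply add.
assert prob(A | C) == prob(A) + prob(C)
# Rule 5: A and B overlap (both contain 6), so the overlap is subtracted once.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(prob(A | B))  # 2/3
```

Using `Fraction` keeps the results exact, so they can be compared with the hand calculations.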

### Coin Examples

#### Flipping one coin

If we flip a coin what are the outcomes?

Flipping a fair coin is the experiment. 

* Possible outcomes: Heads or Tails, the events are getting heads or getting tails

What is the probability of getting heads?

* P(heads) = number of ways of getting heads/total number of possible outcomes = 1/2 

* P(no heads) = P(tails) = 1 - P(heads) = 1 - 1/2 = 1/2

#### Flipping two coins

If we flip two coins, the possible outcomes are HH, HT, TH, TT, so the number of possible outcomes is 4.

What is the probability of:

1) Getting two heads:

* P(HH) = number of ways of getting two heads/total possibilities = 1/4 = 0.25 = 25%

2) One head and one tail:

* P(one head and one tail) = 2/4 = 1/2 = 0.5 = 50%

3) At least one tail means one or more tails:

* P(at least one tail) = 1 - P(no tails) = 1 - P(all heads) = 1 - 1/4 = 3/4 

Another way:

* P(at least one tail) = P(one tail and one head) + P(both tails) = 2/4 + 1/4 = 3/4
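The two-coin results can be reproduced by listing all four outcomes and counting. This is a minimal sketch; the helper `prob` and the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All outcomes of flipping two coins: HH, HT, TH, TT.
flips = list(product("HT", repeat=2))

def prob(condition):
    """Fraction of outcomes that satisfy the condition."""
    favourable = [f for f in flips if condition(f)]
    return Fraction(len(favourable), len(flips))

p_two_heads = prob(lambda f: f.count("H") == 2)
p_one_head_one_tail = prob(lambda f: f.count("H") == 1)
p_at_least_one_tail = prob(lambda f: "T" in f)

print(p_two_heads)          # 1/4
print(p_one_head_one_tail)  # 1/2
print(p_at_least_one_tail)  # 3/4

# Complement rule: at least one tail = 1 - P(all heads).
assert p_at_least_one_tail == 1 - p_two_heads
```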

#### Six-sided Die

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse2.mm.bing.net%2Fth%3Fid%3DOIP.girWU8iHl7UymkU44cnUtQHaFj%26pid%3DApi&f=1&ipt=131e24021adb62704f7d475ca78aee44ee67d6b9db04dbefc776d7905b3905a9&ipo=images">


Outcomes are {1, 2, 3, 4, 5, 6} 

* P(getting 4) = 1/6

* P(3 or 6) = P(3) + P(6) = 1/6 + 1/6 = 2/6 = 1/3

##### In-class activity: 

If you roll a six-sided die, what is the probability of:

Outcomes are {1, 2, 3, 4, 5, 6}, number of outcomes is 6

1) Getting a 3: P(3) 

2) Getting 2 or 4: P(2 or 4) =  

3) Getting an odd number: P(odd number) = 

#### Computing total possible outcomes: 

If we flip one coin - possible outcomes are 2.

If we flip two coins - possible outcomes are $2^2 = 4.$

If we flip 3 coins - possible outcomes are $2^3 = 8.$

If we roll a six-sided die - possible outcomes are 6.

If we roll two six-sided dice - possible outcomes are $6^2 = 36.$
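These counts follow the pattern (outcomes per trial) raised to the number of trials. A small sketch that confirms this by listing every outcome; `count_outcomes` is a hypothetical helper name.

```python
from itertools import product

def count_outcomes(faces, repeats):
    """Count outcomes by listing every combination of `repeats` trials."""
    return len(list(product(faces, repeat=repeats)))

coin = "HT"
die = range(1, 7)

print(count_outcomes(coin, 1))  # 2
print(count_outcomes(coin, 2))  # 4
print(count_outcomes(coin, 3))  # 8
print(count_outcomes(die, 1))   # 6
print(count_outcomes(die, 2))   # 36
```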

#### Examples Rolling two six-sided dice


<img src="dice.png" width="300" height="100">

Least sum is 2, highest sum is 12.


1) Sums that are divisible by 3: sum 3, sum 6, sum 9 and sum 12

* outcomes - (1,2)(2,1) (1,5)(5,1)(2,4)(4,2),(3,3) (3,6)(6,3)(4,5)(5,4) (6,6)

* total number of outcomes whose sum is divisible by 3 = 12

* P(sum that is divisible by 3) = 12/36 = 1/3


2) Sums that are divisible by 4: sum 4, sum 8, sum 12

* outcomes - (1,3)(2,2)(3,1) (2,6)(3,5)(4,4)(5,3)(6,2) (6,6)

* total number of outcomes whose sum is divisible by 4 = 9

* P(sum that is divisible by 4)  = 9/36 = 1/4


* Only sum 12 is divisible by both 3 and 4, this can be obtained by (6,6) 

* P(sum that is divisible by 3 and 4)  = 1/36


* P(sum that is divisible by 3 or 4) = P(sum divisible by 3) + P(sum divisible by 4) - P(sum divisible by both 3 and 4) = 12/36 + 9/36 - 1/36 = 20/36.


3) Sums that are divisible by 5: sum 5, sum 10

* outcomes -(1,4)(4,1)(2,3)(3,2) (4,6)(6,4)(5,5)

* sums that are divisible by 6: sum 6, sum 12

* outcomes -(1,5)(5,1)(2,4)(4,2)(3,3), (6,6)


4) P(sum divisible by both 5 and  6) = 0/36 = 0

5) P(sum divisible by 5 or 6) = P(sum div by 5) + P(sum div by 6) = 7/36 + 6/36 = 13/36, since no sum is divisible by both 5 and 6.
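The two-dice calculations above can be verified by listing all 36 rolls. This is a sketch under the assumption of two fair dice; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 outcomes of rolling two six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Probability that the sum of the two dice satisfies the condition."""
    favourable = sum(1 for r in rolls if condition(sum(r)))
    return Fraction(favourable, len(rolls))

p3 = prob(lambda s: s % 3 == 0)
p4 = prob(lambda s: s % 4 == 0)
p3_and_4 = prob(lambda s: s % 3 == 0 and s % 4 == 0)
p3_or_4 = prob(lambda s: s % 3 == 0 or s % 4 == 0)

# General addition rule: subtract the overlap (sum 12) once.
assert p3_or_4 == p3 + p4 - p3_and_4
print(p3_or_4)  # 5/9, the same as 20/36

p5 = prob(lambda s: s % 5 == 0)
p6 = prob(lambda s: s % 6 == 0)
p5_or_6 = prob(lambda s: s % 5 == 0 or s % 6 == 0)

# No sum from 2 to 12 is divisible by both 5 and 6, so nothing is subtracted.
assert p5_or_6 == p5 + p6
print(p5_or_6)  # 13/36
```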

In-class activity: If you roll two six-sided dice, what is the probability of:

1) Getting two even numbers.

   * possible outcomes = 
   * P(two even numbers) =  

2) Getting a sum of 7

   * sum 7 possible ways = 
   * P(sum 7) = 

3) Getting a sum divisible by 8

* sums that are divisible by 8: sum 8
* sum 8 can be obtained in 
* P(sum divisible by 8) = 
 

## <font color='blue'>Independence of Events</font>
Two events A and B are independent when the occurrence of one does not change the probability of the other, that is, when

$ P(A \cap B) = P(A)*P(B).$
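The product rule can be used as a test for independence. The sketch below reuses the two-coin experiment; the event functions and the helper `independent` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All outcomes of flipping two coins.
flips = list(product("HT", repeat=2))

def prob(event):
    """Fraction of outcomes for which the event is true."""
    return Fraction(sum(1 for f in flips if event(f)), len(flips))

def first_heads(f):
    return f[0] == "H"

def second_heads(f):
    return f[1] == "H"

def any_heads(f):
    return "H" in f

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(lambda f: a(f) and b(f)) == prob(a) * prob(b)

# The two coins do not influence each other.
print(independent(first_heads, second_heads))  # True
# Knowing the first coin is heads guarantees "at least one head".
print(independent(first_heads, any_heads))     # False
```

In the second case P(first heads and at least one head) = 1/2, while P(first heads) * P(at least one head) = 1/2 * 3/4 = 3/8, so the events are not independent.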

#### Deck of Cards 

Deck of 52 cards - 

* 4 suits - hearts (red), diamonds (red), spades (black), and clubs (black). 

* How many cards in each suit? 13 cards in each suit.

##### What kind of cards?

* Numbered cards: 2, 3, ..., 10, 

* Face cards: Jack, Queen, King,

* and Ace 

* In each suit: there are 9 number cards, 3 face cards, and 1 ace. 

#### Examples of drawing cards from a deck of 52 cards

We are drawing a card from a deck of 52 cards. Answer the following:

P(a red card) = 26/52 = 1/2 = 50%

P(a face card) = 12/52 

Let's consider a few examples:

1) Two cards are drawn from a deck of 52 cards with replacement. 

1a) What is the probability of choosing a king and then a nine?
 
P(a king and a nine) = 

1b) What is the probability of choosing a numbered card and then a face card?

P(a numbered and a face) =  

2) A bowl contains 3 red, 4 green and 8 blue marbles. Three marbles are drawn from the bowl with replacement. What is the probability of choosing a blue, a red and a green?

r = 3, g = 4, b = 8, total marbles are 15

P(b and r and g) = 
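Because each draw is made with replacement, the draws are independent and the probabilities multiply. The sketch below sets up the three examples above, assuming the draws happen in the order stated (king then nine, numbered then face, blue then red then green); the variable names are illustrative.

```python
from fractions import Fraction

DECK_SIZE = 52
KINGS, NINES = 4, 4            # one per suit
NUMBERED, FACES = 9 * 4, 3 * 4  # 9 numbered and 3 face cards per suit

# With replacement, each draw is from the full deck of 52 cards.
p_king_then_nine = Fraction(KINGS, DECK_SIZE) * Fraction(NINES, DECK_SIZE)
p_numbered_then_face = Fraction(NUMBERED, DECK_SIZE) * Fraction(FACES, DECK_SIZE)

# Bowl of marbles: 3 red, 4 green, 8 blue.
red, green, blue = 3, 4, 8
total = red + green + blue
p_blue_red_green = (
    Fraction(blue, total) * Fraction(red, total) * Fraction(green, total)
)

print(p_king_then_nine)
print(p_numbered_then_face)
print(p_blue_red_green)
```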

In-class activity: If you roll a die twice, what is the probability of:

1) Getting a 4 on the first roll and a 3 on the second roll.

P(4 on first and 3 on second) = 
   
2) Getting an even number on the first roll and a number divisible by 3 on second.

    P(even number) = 
    P(number div by 3) =  

   P(even number on the first roll and number divisible by 3 on second) = 

3) Getting a sum divisible by 5 or a sum divisible by 4.

   P(sum div by 5 or sum div by 4) = P(sum div by 5) + P(sum div by 4) = 

## <font color='blue'>Conditional probability</font>

The conditional probability of A given B is the probability of A when we know that B has happened. For $P(B) > 0$:

$$ P(A \text{ given } B) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$

#### Examples of using conditional probability on a six-sided die

We roll a six-sided die. What is the probability of getting a 4 given that an even number occurred?

P(4|even) = P(4 and even)/P(even) = 


P(3|even) = P(3 and even)/P(even) = 

Conditioning on an event shrinks the set of possible outcomes to only those in which that event happened.

P(7|odd) = 

P(6|odd) = 

#### A few examples of using Conditional Probability on a deck of 52 cards

1a) If you pick a card from a deck of 52 cards, then what is the probability of getting an ace given it is a diamond?

straight-forward way:

P(ace|diamond) = 1/13

conditional probability 

P(Ace|Diamond) = P(Ace and Diamond)/P(Diamond) = (1/52)/(13/52) = 1/52 * 52/13 = 1/13 

1b) If you pick a card from a deck of 52 cards, what is the probability of getting a face card given it is a red card?

straight-forward way: 

P(face|red) = 6/26 

conditional probability

P(face|red) = P(face and red)/P(red) = (6/52)/(26/52) = 6/26 = 3/13
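Both card examples can be checked by building the deck and counting. This is a minimal sketch; the rank and suit labels and the helper `cond_prob` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

suits = ["hearts", "diamonds", "spades", "clubs"]
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); the deck size cancels, so counts suffice."""
    n_b = sum(1 for card in deck if b(card))
    n_a_and_b = sum(1 for card in deck if a(card) and b(card))
    return Fraction(n_a_and_b, n_b)

def is_ace(card):
    return card[0] == "A"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_diamond(card):
    return card[1] == "diamonds"

def is_red(card):
    return card[1] in {"hearts", "diamonds"}

print(cond_prob(is_ace, is_diamond))  # 1/13
print(cond_prob(is_face, is_red))     # 3/13, the same as 6/26
```

Counting only the cards that satisfy the condition is the "shrunken" sample space: 13 diamonds in the first case and 26 red cards in the second.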

#### Conditional Probability Example with a Contingency Table
Consider the table below. What is the probability that a person chosen at random from the below group is a teacher given that they are a female?

<img src="conditional1.png" width="300" height="200">

P(teacher|female) = P(teacher and female)/P(female) = (8/100)/(40/100) = 8/100 * 100/40 = 8/40 = 1/5 = 20%

In-class activity

1. P(male|teacher) = 

2. P(student|male) = 


In-class activity: If you roll a die once, what is the probability of:

1) Getting a 5 given that the outcome is odd.
   P(5|odd) = 

2) Getting 4 given that the outcome is odd.
   P(4|odd) = 

3) Getting 6 given that the outcome is divisible by 3.
   P(6|div by 3) =  

## <font color='blue'>Naive Bayes</font>
Naive Bayes is a probabilistic classification technique.

It is fast and scalable, and it can be used for both binary and multi-class classification.

It assumes that every feature is independent of the other features, given the class.
This is the main disadvantage of the model, since in real life features are often related to each other.
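To see how the independence assumption is used, here is a minimal sketch of a Naive Bayes classifier for categorical features. The toy weather data, the feature meanings, and the function names `score` and `predict` are all hypothetical and chosen only for illustration; this is not a production implementation (for example, it applies no smoothing for unseen feature values).

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical toy data: ((outlook, windy), play?)
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "yes"), "no"),
]

# How often each class appears.
class_counts = Counter(label for _, label in data)

# feature_counts[label][i][value]: how often feature i equals value in that class.
feature_counts = defaultdict(lambda: defaultdict(Counter))
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[label][i][value] += 1

def score(features, label):
    """P(label) times the product of P(feature_i | label) for every feature."""
    result = Fraction(class_counts[label], len(data))  # prior
    for i, value in enumerate(features):
        # Naive assumption: each feature contributes its own independent factor.
        result *= Fraction(feature_counts[label][i][value], class_counts[label])
    return result

def predict(features):
    """Return the class with the highest score."""
    return max(class_counts, key=lambda label: score(features, label))

print(score(("sunny", "no"), "yes"))  # 1/3
print(score(("sunny", "no"), "no"))   # 0
print(predict(("sunny", "no")))       # yes
```

The score is proportional to the probability of the class given the features. Dividing by the probability of the features is skipped because it is the same for every class and does not change which class wins.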


Where is Naive Bayes used:

1) Text classification

2) Recommendation systems

3) Weather forecasting and more.

## <font color='blue'>Naive Bayes Formula</font>
<img src="bayes1.png" width="300" height="200">


References:
https://www.machinelearningplus.com/predictive-modeling/how-naive-bayes-algorithm-works-with-example-and-full-code/

### Derivation of Bayes Formula


a) P(Y|X) = P(Y and X)/P(X) by using conditional probability

multiply both sides by P(X)

* P(Y|X) P(X) = P(Y and X)  - eq (1)


b) P(X|Y) = P(X and Y)/P(Y)

if we multiply both sides by P(Y)

* P(X|Y) P(Y) = P(X and Y)  - eq (2)


since P(X and Y) = P(Y and X), we can equate the left hand sides in equations (1) and (2)

* P(Y|X) P(X) = P(X|Y) P(Y)

Dividing both sides by P(X) gives us Bayes' rule

* P(Y|X) = P(X|Y) * P(Y) /P(X)

posterior = (likelihood * prior)/evidence


P(X|Y) = likelihood

P(Y) = prior

P(X) = evidence

P(Y|X) = posterior = ?
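As a quick numerical check of Bayes' rule, the sketch below reuses the card example: Y is "the card is a diamond" and X is "the card is an ace". The function name `bayes_posterior` is an assumption made for illustration.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, evidence):
    """posterior = (likelihood * prior) / evidence"""
    return likelihood * prior / evidence

likelihood = Fraction(1, 13)  # P(X|Y) = P(ace | diamond)
prior = Fraction(13, 52)      # P(Y)   = P(diamond)
evidence = Fraction(4, 52)    # P(X)   = P(ace)

posterior = bayes_posterior(likelihood, prior, evidence)
print(posterior)  # 1/4

# Direct count: exactly one of the four aces is a diamond.
assert posterior == Fraction(1, 4)
```

The rule gives the same answer as counting directly, which is what the derivation above predicts.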