# Naive Bayes Model

- The naive Bayes model is a purely probabilistic model.
- The main component of the naive Bayes model is Bayes’ theorem.

It’s called naive Bayes because to simplify the calculations, we make a slightly naive assumption that is not necessarily true. However, this assumption helps us come up with a good estimate of the probability.

📊 Problem data

Accuracy (sensitivity and specificity):

99% of sick people test positive → sensitivity = 99%

99% of healthy people test negative → specificity = 99%

Disease prevalence:

Only 1 in 10,000 people has the disease → 0.01%

📐 Bayesian calculation (approximated with 1 million people)

In a group of 1,000,000 people:

Sick = 1 in 10,000 → 100 people

Healthy = 999,900 people

✔️ Test results among the truly sick:

99% of 100 sick people → 99 test positive (true positives)

❌ Test results among the healthy:

1% of 999,900 healthy people → 9,999 test positive (false positives)

📦 Total positives:

True positives: 99

False positives: 9,999

Total positives: 99 + 9,999 = 10,098

P(sick | positive) = 99 / (99 + 9,999) = 99 / 10,098 ≈ 0.0098 (about 0.98%)



```python
p_have_disease = 0.0001          # prevalence: 1 in 10,000
p_test_effectiveness = 99 / 100  # used as both sensitivity and specificity
sample = 1_000_000

global_sick = sample * p_have_disease
global_healthy = sample * (1 - p_have_disease)

# Sick people who test positive, and healthy people who wrongly test positive.
true_positive = global_sick * p_test_effectiveness
false_positive = global_healthy * (1 - p_test_effectiveness)

print(f"Global Sick: {global_sick} - Global Healthy: {global_healthy}")
print(f"True Positives: {true_positive} - False Positives: {false_positive}")

# P(sick | positive): share of all positive tests that are true positives.
p_sick_given_positive = true_positive / (true_positive + false_positive)

print(f"Probability of being sick given a positive test: {p_sick_given_positive}")
```

Output:

```
Global Sick: 100.0 - Global Healthy: 999900.0
True Positives: 99.0 - False Positives: 9999.00000000001
Probability of being sick given a positive test: 0.009803921568627442
```


## Prelude to Bayes' theorem: the prior, the event, and the posterior

**Prior:** The initial probability.

**Event:** Something that occurs, which gives us information.

**Posterior:** The final (and more accurate) probability that we calculate using the prior probability and the event.

An example follows. Imagine that we want to find out the probability that it will rain today. If we don’t know anything, we can come up with only a rough estimate for the probability, which is the prior. If we look around and find out that we are in the Amazon rain forest (the event), then we can come up with a much more exact estimate. In fact, if we are in the Amazon rain forest, it will probably rain today. This new estimate is the posterior.



### Rule of complementary probabilities

P(Ec) = 1 − P(E), where Ec is the complement of E (the event that E does not occur).



First, we compute the conditional probabilities from a dataset of 100 emails (20 spam and 80 ham), in which 15 spam emails and 5 ham emails contain the word lottery:

- P(lottery | spam) = 15 / 20 = 0.75 => the probability that a spam email contains the word lottery.
- P(no lottery | spam) = 1 - 0.75 = 0.25 => the probability that a spam email does not contain the word lottery.
- P(lottery | ham) = 5 / 80 = 0.0625 => the probability that a ham email contains the word lottery.
- P(no lottery | ham) = 1 - 0.0625 = 0.9375 => the probability that a ham email does not contain the word lottery.

![Confusion matrix](../../../../images/Emal_spam_ham.png)

![Confusion matrix](../../../../images/product_rule_probabilities2.png)

P(lottery | spam) = 3 / 4 = 0.75

P(no lottery | spam) = 1 - P(lottery | spam) = 1 - 0.75 = 0.25

P(lottery | ham) = 1 / 16  = 0.0625

P(no lottery | ham) = 1 - P(lottery | ham) = 1 - 0.0625 = 0.9375
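The four conditional probabilities above can be reproduced from raw counts. The sketch below assumes the dataset implied by the fractions (20 spam emails, 15 of them containing lottery; 80 ham emails, 5 of them containing lottery). The variable names are illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Counts implied by the fractions above (assumed dataset of 100 emails).
spam_total, spam_with_lottery = 20, 15
ham_total, ham_with_lottery = 80, 5

# Conditional probabilities: share of each class that contains the word.
p_lottery_given_spam = Fraction(spam_with_lottery, spam_total)
p_lottery_given_ham = Fraction(ham_with_lottery, ham_total)

# The rule of complementary probabilities gives the "no lottery" cases.
p_no_lottery_given_spam = 1 - p_lottery_given_spam
p_no_lottery_given_ham = 1 - p_lottery_given_ham

print(f"P(lottery | spam)    = {p_lottery_given_spam} = {float(p_lottery_given_spam)}")
print(f"P(no lottery | spam) = {p_no_lottery_given_spam} = {float(p_no_lottery_given_spam)}")
print(f"P(lottery | ham)     = {p_lottery_given_ham} = {float(p_lottery_given_ham)}")
print(f"P(no lottery | ham)  = {p_no_lottery_given_ham} = {float(p_no_lottery_given_ham)}")
```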

The next thing we do is find the probabilities of two events happening at the same time. More specifically, we want the following four probabilities:

- The probability that an email is spam and contains the word lottery
- The probability that an email is spam and does not contain the word lottery
- The probability that an email is ham and contains the word lottery
- The probability that an email is ham and does not contain the word lottery

These events are called intersections of events and are denoted with the symbol ∩. Thus, we need to find the following probabilities:

P('lottery' ∩ spam) = 3 / 4 * 1 / 5 = 3 / 20 = 0.15

P(no 'lottery' ∩ spam) = 1 / 4 * 1 / 5 = 1 / 20 = 0.05

P('lottery' ∩ ham) = 1 / 16 * 4 / 5 = 1 / 20 = 0.05

P(no 'lottery' ∩ ham) = 15 / 16 * 4 / 5 = 15 / 20 = 0.75
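As a quick check of the product rule used above, P(A ∩ B) = P(A | B) * P(B), the following sketch recomputes the four intersections and confirms that they add up to 1. The dictionary layout and names are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Priors and conditionals taken from the text above.
p_spam, p_ham = Fraction(1, 5), Fraction(4, 5)
p_lottery_given_spam = Fraction(3, 4)
p_lottery_given_ham = Fraction(1, 16)

# Product rule: P(word ∩ label) = P(word | label) * P(label).
joint = {
    ("lottery", "spam"): p_lottery_given_spam * p_spam,
    ("no lottery", "spam"): (1 - p_lottery_given_spam) * p_spam,
    ("lottery", "ham"): p_lottery_given_ham * p_ham,
    ("no lottery", "ham"): (1 - p_lottery_given_ham) * p_ham,
}

for (word, label), p in joint.items():
    print(f"P({word} ∩ {label}) = {p} = {float(p)}")

# Every email falls into exactly one of the four cases, so they sum to 1.
total = sum(joint.values())
print(f"Sum of all intersections: {total}")
```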

![Confusion matrix](../../../../images/product_rule_probabilities.png)


 
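The two probabilities for emails that contain lottery (3 / 20 and 1 / 20) do not add up to 1. To convert them into probabilities that add up to 1, divide each by their sum:

P(spam | lottery) = (3 / 20) / (3 / 20 + 1 / 20) = 3 / 4

P(ham | lottery) = (1 / 20) / (3 / 20 + 1 / 20) = 1 / 4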
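The normalization step can be sketched in a few lines. The idea is that the sum of the two intersections is P(lottery), so dividing by it restricts attention to emails that contain the word. Variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities for emails that contain "lottery" (from the text).
p_lottery_and_spam = Fraction(3, 20)
p_lottery_and_ham = Fraction(1, 20)

# Their sum is P(lottery); dividing by it rescales the pair to add up to 1.
p_lottery = p_lottery_and_spam + p_lottery_and_ham
p_spam_given_lottery = p_lottery_and_spam / p_lottery
p_ham_given_lottery = p_lottery_and_ham / p_lottery

print(f"P(lottery)        = {p_lottery}")
print(f"P(spam | lottery) = {p_spam_given_lottery}")
print(f"P(ham | lottery)  = {p_ham_given_lottery}")
```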

![Confusion matrix](../../../../images/email-spam_lottery.png)

## Formulas 1

P(spam | lottery) = P(lottery ∩ spam)  / ( P(lottery ∩ spam) + P(lottery ∩ ham) )

P(ham | lottery) = P(lottery ∩ ham)  / ( P(lottery ∩ ham) + P(lottery ∩ spam) )

## Formulas 2

P(spam | lottery) = P(lottery | spam) * P(spam) / ( P(lottery | spam) * P(spam) + P(lottery | ham) * P(ham) )

P(ham | lottery) = P(lottery | ham) * P(ham) / ( P(lottery | ham) * P(ham) + P(lottery | spam) * P(spam) )


## Bayes' theorem

P(E|F) = P(F|E) * P(E) / P(F)

Because the event F can be broken down into the two disjoint events F ∩ E and F ∩ Ec, we have P(F) = P(F|E) * P(E) + P(F|Ec) * P(Ec), and therefore

P(E|F) = P(F|E) * P(E) / ( P(F|E) * P(E) + P(F|Ec) * P(Ec) )


P(spam | lottery) = (1 / 5 * 3 / 4) / (1 / 5 * 3 / 4 + 4 / 5 * 1 / 16) = 3 / 4 = 0.75

P(ham | lottery) = (4 / 5 * 1 / 16) / (4 / 5 * 1 / 16 + 1 / 5 * 3 / 4) = 1 / 4 = 0.25
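The expanded form of Bayes' theorem translates directly into a small function. The sketch below is one possible way to write it; the function name `bayes_posterior` and its parameter names are hypothetical. It is applied to the spam example and, as a cross-check, to the medical test example from the start of these notes.

```python
def bayes_posterior(prior: float, likelihood: float, likelihood_complement: float) -> float:
    """Return P(E | F) given P(E), P(F | E) and P(F | Ec)."""
    numerator = likelihood * prior
    # P(F) = P(F | E) * P(E) + P(F | Ec) * P(Ec), with P(Ec) = 1 - P(E).
    evidence = numerator + likelihood_complement * (1 - prior)
    return numerator / evidence


# Spam example: E = spam, F = the email contains "lottery".
p_spam_given_lottery = bayes_posterior(
    prior=1 / 5, likelihood=3 / 4, likelihood_complement=1 / 16
)

# Medical test example: E = sick, F = positive test.
p_sick_given_positive = bayes_posterior(
    prior=0.0001, likelihood=0.99, likelihood_complement=0.01
)

print(f"P(spam | lottery)  = {p_spam_given_lottery:.4f}")
print(f"P(sick | positive) = {p_sick_given_positive:.4f}")
```

Even with a 99% accurate test, the low prior (1 in 10,000) keeps the posterior below 1%, whereas the spam prior of 1 / 5 combined with a strong signal word pushes the posterior to 0.75.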




## Bayes' theorem with multiple features



P(lottery | spam) = 0.75

P(lottery | ham) = 5 / 80 = 0.0625

P(sales | spam) = 6 / 20 = 0.3

P(sales | ham) = 4 / 80 = 0.05

P(lottery, sales | spam) = 0.75 * 0.3 = 0.225

P(lottery, sales | ham) = 0.0625 * 0.05 = 0.003125


**Naive assumption:** The words appearing in an email are completely independent of each other. In other words, the appearance of a particular word in an email in no way affects the appearance of another one.

Most likely, the naive assumption is not true. The appearance of one word can sometimes heavily influence the appearance of another. For example, if an email contains the word salt, then the word pepper is more likely to appear in that email, because the two often go together. This is why our assumption is naive.
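To see how the naive assumption is used in practice, the following sketch combines the per-word conditionals into a posterior for an email containing both lottery and sales. It is a minimal illustration, not a full classifier: the counts are the ones given as fractions in these notes (15 / 20 and 5 / 80 for lottery, 6 / 20 and 4 / 80 for sales), and the function name `naive_bayes_spam_probability` is hypothetical.

```python
from math import prod

# Counts taken from the fractions in these notes (20 spam, 80 ham emails).
n_spam, n_ham = 20, 80
word_counts = {
    "lottery": {"spam": 15, "ham": 5},
    "sales": {"spam": 6, "ham": 4},
}


def naive_bayes_spam_probability(words):
    """P(spam | all given words appear), assuming the words are independent."""
    total = n_spam + n_ham
    # Prior times the product of per-word conditionals (the naive assumption).
    spam_score = (n_spam / total) * prod(word_counts[w]["spam"] / n_spam for w in words)
    ham_score = (n_ham / total) * prod(word_counts[w]["ham"] / n_ham for w in words)
    # Normalize so the spam and ham posteriors add up to 1.
    return spam_score / (spam_score + ham_score)


p = naive_bayes_spam_probability(["lottery", "sales"])
print(f"P(spam | lottery, sales) = {p:.4f}")
```

Multiplying the conditionals is only valid under the independence assumption; with correlated words such as salt and pepper, the product would overstate the evidence.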