
# **Practical No: 05**
# **Aim: Bayes Theorem**

# **Test Scenario**

Consider a human population that may or may not have cancer (Cancer is True or False) and a medical test that returns positive or negative for detecting cancer (Test is Positive or Negative).
If a randomly selected patient has the test and it comes back positive, what is the probability that the patient has cancer?



# **Manual Calculation**

Medical diagnostic tests are not perfect; they make errors.
Sometimes a patient will have cancer, but the test will not detect it. This capability of the test to detect cancer is referred to as the sensitivity, or the true positive rate.
In this case, we will contrive a sensitivity value for the test. The test is good, but not great, with a true positive rate or sensitivity of 85%. That is, of all the people who have cancer and are tested, 85% of them will get a positive result from the test.
P(Test=Positive | Cancer=True) = 0.85
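
Sensitivity is a ratio of counts taken only among people who truly have cancer. The sketch below assumes hypothetical counts of 85 true positives and 15 false negatives. These numbers are made up to match the contrived 85% rate. The helper name `sensitivity` is also introduced only for illustration.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """True positive rate: fraction of people with cancer who test positive."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 100 people who have cancer are tested.
tp, fn = 85, 15
print(sensitivity(tp, fn))  # 0.85
```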


Given this information, our intuition would suggest that there is an 85% probability that the patient has cancer.
This intuition is wrong.
This type of error in interpreting probabilities is so common that it has its own name; it is referred to as the base rate fallacy.
It has this name because the error in estimating the probability of an event is caused by ignoring the base rate. That is, it ignores the probability of a randomly selected person having cancer, regardless of the results of a diagnostic test.
In this case, we can assume the probability of cancer is low and use a contrived base rate value of one person in 5,000, or 0.0002 (0.02%).
P(Cancer=True) = 0.0002 (0.02%)


We can correctly calculate the probability of a patient having cancer given a positive test result using Bayes Theorem.
Let’s map our scenario onto the equation:
P(A|B) = P(B|A) * P(A) / P(B)
P(Cancer=True | Test=Positive) = P(Test=Positive|Cancer=True) * P(Cancer=True) / P(Test=Positive)


We know the probability of the test being positive given that the patient has cancer is 85%, and we know the base rate or the prior probability of a given patient having cancer is 0.02%; we can plug these values in:
P(Cancer=True | Test=Positive) = 0.85 * 0.0002 / P(Test=Positive)


We don’t know P(Test=Positive); it’s not given directly.
Instead, we can calculate it with the law of total probability:
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
P(Test=Positive) = P(Test=Positive|Cancer=True) * P(Cancer=True) + P(Test=Positive|Cancer=False) * P(Cancer=False)
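
At this point P(Test=Positive|Cancer=False) is still unknown. The sketch below therefore only encodes the general formula. The helper name `total_probability` is hypothetical. A quick sanity check helps here: if the test is positive with the same probability whether or not the person has cancer, P(B) should equal that probability.

```python
def total_probability(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)."""
    p_not_a = 1 - p_a  # complement rule
    return p_b_given_a * p_a + p_b_given_not_a * p_not_a


# Sanity check: a "test" that ignores cancer status is positive 30% of the time overall.
print(round(total_probability(0.3, 0.0002, 0.3), 10))  # 0.3
```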


Firstly, we can calculate P(Cancer=False) as the complement of P(Cancer=True), which we already know:
P(Cancer=False) = 1 – P(Cancer=True)
= 1 – 0.0002
= 0.9998
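
The base rate and its complement can be checked exactly with Python's standard `fractions` module. Using exact fractions is an illustrative choice, not part of the original practical. The sketch assumes the contrived "one person in 5,000" base rate.

```python
from fractions import Fraction

# Base rate: one person in 5,000 (the contrived value used above).
p_cancer = Fraction(1, 5000)
p_no_cancer = 1 - p_cancer  # complement rule: P(Cancer=False) = 1 - P(Cancer=True)

print(float(p_cancer))     # 0.0002
print(float(p_no_cancer))  # 0.9998
```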


We can plug in our known values as follows:
P(Test=Positive) = 0.85 * 0.0002 + P(Test=Positive|Cancer=False) * 0.9998


We still do not know the probability of a positive test result given no cancer.
This requires additional information.
Specifically, we need to know how good the test is at correctly identifying people who do not have cancer; that is, how often it returns a negative result (Test=Negative) when the patient does not have cancer (Cancer=False). This is called the true negative rate, or the specificity.
We will use a contrived specificity value of 95%.
P(Test=Negative | Cancer=False) = 0.95


With this final piece of information, we can calculate the false positive or false alarm rate as the complement of the true negative rate.
P(Test=Positive|Cancer=False) = 1 – P(Test=Negative | Cancer=False)
= 1 – 0.95
= 0.05
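
The same complement step can be written in Python. This sketch assumes the contrived 95% specificity. The result is rounded because `1 - 0.95` is not exactly `0.05` in binary floating point.

```python
specificity = 0.95  # P(Test=Negative | Cancer=False), contrived value
false_positive_rate = 1 - specificity  # P(Test=Positive | Cancer=False)

# Round for display: the raw float carries a tiny representation error.
print(round(false_positive_rate, 2))  # 0.05
```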


We can plug this false alarm rate into our calculation of P(Test=Positive) as follows:
P(Test=Positive) = 0.85 * 0.0002 + 0.05 * 0.9998
P(Test=Positive) = 0.00017 + 0.04999
P(Test=Positive) = 0.05016
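
The arithmetic above can be reproduced directly, assuming the contrived values from this example. Splitting the sum into its two terms shows that false positives from the large cancer-free group supply almost all of P(Test=Positive).

```python
sensitivity = 0.85          # P(Test=Positive | Cancer=True)
base_rate = 0.0002          # P(Cancer=True)
false_positive_rate = 0.05  # P(Test=Positive | Cancer=False)

true_positive_mass = sensitivity * base_rate                 # about 0.00017
false_positive_mass = false_positive_rate * (1 - base_rate)  # about 0.04999
p_positive = true_positive_mass + false_positive_mass

print(f"{p_positive:.5f}")  # 0.05016
```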


Excellent, so the probability of the test returning a positive result, regardless of whether the person has cancer or not, is about 5%.
We now have enough information to apply Bayes Theorem and estimate the probability of a randomly selected person having cancer if they get a positive test result.
P(Cancer=True | Test=Positive) = P(Test=Positive|Cancer=True) * P(Cancer=True) / P(Test=Positive)
P(Cancer=True | Test=Positive) = 0.85 * 0.0002 / 0.05016
P(Cancer=True | Test=Positive) = 0.00017 / 0.05016
P(Cancer=True | Test=Positive) = 0.003389154704944
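
Floating-point output can hide whether the final figure is exact. The sketch below repeats the whole calculation with exact rationals from Python's standard `fractions` module, using the same contrived inputs. This is an illustrative swap, not the method used elsewhere in the practical. It shows the posterior is exactly 17/5016.

```python
from fractions import Fraction

sensitivity = Fraction(85, 100)         # P(Test=Positive | Cancer=True)
base_rate = Fraction(1, 5000)           # P(Cancer=True)
false_positive_rate = Fraction(5, 100)  # P(Test=Positive | Cancer=False)

# Law of total probability, then Bayes Theorem, all in exact arithmetic.
p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
posterior = sensitivity * base_rate / p_positive

print(posterior)                  # 17/5016
print(f"{float(posterior):.6f}")  # 0.003389
```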


The calculation suggests that if a patient's test comes back positive, there is only about a 0.34% chance that they actually have cancer.
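
The same result is easier to see with natural frequencies. This minimal sketch assumes a hypothetical cohort of 1,000,000 people, chosen only because it makes every count a whole number under the contrived rates.

```python
population = 1_000_000  # hypothetical cohort size, chosen for round numbers

# Integer arithmetic; with these rates every division happens to be exact.
with_cancer = population * 1 // 5000         # 200 people (base rate 1 in 5,000)
without_cancer = population - with_cancer     # 999,800 people

true_positives = with_cancer * 85 // 100      # 170 people (sensitivity 85%)
false_positives = without_cancer * 5 // 100   # 49,990 people (false positive rate 5%)

share = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, f"{share:.3%}")  # 170 49990 0.339%
```

Almost all positive results come from the much larger group without cancer, so the 170 true positives are swamped by false alarms.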



In our scenario we were given three pieces of information: the base rate, the sensitivity (or true positive rate), and the specificity (or true negative rate).
- Sensitivity: 85% of people with cancer will get a positive test result.
- Base rate: 0.02% of people have cancer.
- Specificity: 95% of people without cancer will get a negative test result.


We did not have P(Test=Positive), but we calculated it from what we already had available.
Bayes Theorem also lets us refine the estimate as more information becomes available. For example, if we had more information about the patient (e.g. their age) and about the domain (e.g. cancer rates for age ranges), we could offer an even more accurate probability estimate.
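
The sketch below recomputes the posterior for a few base rates to show how extra domain information changes the answer. The age bands and their rates are hypothetical placeholders, not real cancer statistics. `posterior_given_positive` is a helper name introduced only for this illustration. The sensitivity and specificity defaults are the contrived values from this example.

```python
def posterior_given_positive(base_rate, sensitivity=0.85, specificity=0.95):
    """P(Cancer=True | Test=Positive) via Bayes Theorem."""
    # Law of total probability for P(Test=Positive).
    p_positive = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
    return sensitivity * base_rate / p_positive


# Hypothetical age-band base rates, purely illustrative (not real epidemiology).
age_band_base_rates = {
    "under 40": 0.0001,
    "40-59": 0.001,
    "60 and over": 0.01,
}

for band, rate in age_band_base_rates.items():
    print(f"{band:>12}: P(cancer | positive) = {posterior_given_positive(rate):.2%}")
```

Under these assumed rates, the posterior rises with the base rate even though the test itself is unchanged.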


# **Code for executing the example**

```python
# calculate P(A|B) given P(A), P(B|A), P(B|not A)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B) with the law of total probability
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b


# P(A): base rate, P(Cancer=True)
p_a = 0.0002
# P(B|A): sensitivity, P(Test=Positive | Cancer=True)
p_b_given_a = 0.85
# P(B|not A): false positive rate, P(Test=Positive | Cancer=False)
p_b_given_not_a = 0.05
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))
```


Output:

```text
P(A|B) = 0.339%
```


This example calculates the probability that a patient has cancer given that the test returns a positive result, matching our manual calculation.
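
As an independent check, a Monte Carlo simulation can estimate the same probability by generating a synthetic population. This technique is not used in the original practical. The sketch assumes each person's cancer status and test result are drawn independently at the contrived rates above. `simulate_posterior` is a hypothetical helper name.

```python
import random


def simulate_posterior(n_people, base_rate=0.0002, sensitivity=0.85,
                       specificity=0.95, seed=0):
    """Estimate P(Cancer=True | Test=Positive) by simulating a population."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    positives = 0
    true_positives = 0
    for _ in range(n_people):
        has_cancer = rng.random() < base_rate
        if has_cancer:
            tests_positive = rng.random() < sensitivity      # true positive rate
        else:
            tests_positive = rng.random() < 1 - specificity  # false positive rate
        if tests_positive:
            positives += 1
            if has_cancer:
                true_positives += 1
    return true_positives / positives


estimate = simulate_posterior(1_000_000)
print(f"P(A|B) is approximately {estimate:.3%}")
```

Expect the estimate to land near 0.339%, with some sampling noise, because only about 200 of the simulated people have cancer.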