# Bayes' Theorem 

It is said that the Reverend Thomas Bayes developed his rule on inverse probability around 1740, while he was trying to prove the existence of God. He came up with a method for calculating the probability of an event occurring given that another event has occurred. Starting from the prior probability (or belief) $P(A)$, and given a likelihood $P(B\ |\ A)$ and the evidence $P(B)$, we arrive at the posterior probability $P(A\ |\ B)$. Bayes' Rule is a powerful tool and is widely used in areas as diverse as economics, artificial intelligence, medicine, journalism, and the military, to name just a few. Many spam filters, for example, use Bayes' Rule in one form or another. In words, the posterior equals the likelihood times the prior, divided by the evidence:

$$
P(A\ |\ B)=\frac{P(B\ |\ A)\cdot P(A)}{P(B)}
$$
The practical power of Bayes' Rule is that we often cannot measure the posterior $P(A\ |\ B)$ directly, yet we do know the likelihood $P(B\ |\ A)$, the prior $P(A)$, and the evidence $P(B)$.
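
When $P(B)$ is not known directly, it can be expanded with the law of total probability, $P(B)=P(B\ |\ A)\cdot P(A)+P(B\ |\ \neg A)\cdot P(\neg A)$. The following is a minimal sketch of that calculation. The function name `posterior_from_rates` and the example numbers are illustrative assumptions, not figures from any study.

```python
def posterior_from_rates(prior, likelihood, likelihood_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) (hypothetical helper)."""
    inputs = (("prior", prior),
              ("likelihood", likelihood),
              ("likelihood_given_not_a", likelihood_given_not_a))
    for name, p in inputs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    # Evidence P(B) via the law of total probability
    evidence = likelihood * prior + likelihood_given_not_a * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    # Bayes' rule: posterior = likelihood * prior / evidence
    return likelihood * prior / evidence


# Illustrative numbers only: P(A)=0.01, P(B|A)=0.90, P(B|not A)=0.05
example_posterior = posterior_from_rates(0.01, 0.90, 0.05)
print(round(example_posterior, 4))
```

With these made-up inputs the posterior comes out at roughly 0.15, which shows how a small prior keeps the posterior low even when the likelihood is high.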


Q1- What is the chance of someone having COPD (a life-threatening lung disease) given that he or she is a smoker, $P(A|B)$? This statistic is hard to measure directly, but medical studies do tell us the probability of someone being a smoker given that he or she has COPD, $P(B|A)$. We also know $P(B)$, the probability that a person is a smoker, and $P(A)$, the chance that someone has COPD. The figures below are rough estimates:

$$
\begin{aligned}
P(A)&=0.07 &&\text{has COPD}\\
P(B)&=0.18 &&\text{is or was a smoker}\\
P(B\ |\ A)&=0.85 &&\text{is or was a smoker, given a COPD diagnosis}
\end{aligned}
$$


In [41]:
#Q1- What is the probability of someone having COPD given the person is or was a smoker?
pA = 0.07
pB = 0.18
pBA = 0.85
pAB = (pBA * pA)/pB
print("Chance of someone having COPD (a life-threatening lung disease) given he or she is a smoker is", pAB)

Chance of someone having COPD (a life-threatening lung disease) given he or she is a smoker is 0.3305555555555556
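
As a quick sanity check on these rough figures, the law of total probability gives the smoking rate they imply among people without COPD, and the complement rule gives the chance that a smoker does not have COPD. This is only a sketch; the names `pB_notA` and `p_notA_B` are introduced here for illustration.

```python
pA = 0.07    # P(A): has COPD
pB = 0.18    # P(B): is or was a smoker
pBA = 0.85   # P(B|A): is or was a smoker, given COPD

# Bayes' rule, as in Q1
pAB = (pBA * pA) / pB

# P(B) = P(B|A)P(A) + P(B|not A)P(not A), solved for P(B|not A)
pB_notA = (pB - pBA * pA) / (1 - pA)

# Complement rule: P(not A|B) = 1 - P(A|B)
p_notA_B = 1 - pAB

# The inputs are coherent only if every derived value is a valid probability
assert 0 <= pB_notA <= 1 and 0 <= pAB <= 1

print("Implied P(smoker | no COPD):", round(pB_notA, 3))
print("P(no COPD | smoker):", round(p_notA_B, 3))
```

The implied smoking rate among people without COPD is about 0.13, a valid probability, so the three input estimates are at least consistent with one another.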


In [42]:
#import packages
import numpy as np
import pandas as pd

In [43]:
# load dataset
df = pd.read_csv('Data/cancer_test_data.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2914 entries, 0 to 2913
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   patient_id   2914 non-null   int64 
 1   test_result  2914 non-null   object
 2   has_cancer   2914 non-null   bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 48.5+ KB


In [46]:
# Count outcomes by label; positional value_counts()[0] depends on sort order
negative = (df["test_result"] == "Negative").sum()
positive = (df["test_result"] == "Positive").sum()
cancerNo = (~df["has_cancer"]).sum()
cancerYes = df["has_cancer"].sum()
total = df.shape[0]  # total number of patients
total

2914

In [61]:
#Q2- What proportion of all patients tested positive and have cancer?
positiveAndHasCancer = (df[(df["test_result"] == "Positive") & df["has_cancer"]].shape[0]*100)/total
print("Proportion of all patients who tested positive and have cancer is", str(round(positiveAndHasCancer, 3))+"%.")

Proportion of all patients who tested positive and have cancer is 9.506%.


In [63]:
#Q3- What proportion of all patients tested positive and don't have cancer?
# Multiply by 100 so the value is a percentage, consistent with the other questions
positiveAndNoCancer = (df[(df["test_result"] == "Positive") & ~df["has_cancer"]].shape[0]*100)/total
print("Proportion of all patients who tested positive and don't have cancer is", str(round(positiveAndNoCancer, 3))+"%.")

Proportion of all patients who tested positive and don't have cancer is 18.222%.


In [62]:
#Q4- What proportion of all patients tested negative and have cancer?
negativeAndHasCancer = (df[(df["test_result"] == "Negative") & df["has_cancer"]].shape[0]*100)/total
print("Proportion of all patients who tested negative and have cancer is", str(round(negativeAndHasCancer, 3))+"%.")

Proportion of all patients who tested negative and have cancer is 0.995%.


In [64]:
#Q5- What proportion of all patients tested negative and don't have cancer?
negativeAndNoCancer = (df[(df["test_result"] == "Negative") & ~df["has_cancer"]].shape[0]*100)/total
print("Proportion of all patients who tested negative and don't have cancer is", str(round(negativeAndNoCancer, 3))+"%.")

Proportion of all patients who tested negative and don't have cancer is 71.277%.
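
The results for Q2 to Q5 are joint proportions over all patients, so as fractions they sum to roughly 1. The conditional question, what share of the patients who tested positive actually have cancer, follows by dividing by the positive-test total, and Bayes' rule gives the same answer. The sketch below assumes the printed results rewritten as fractions of all patients (with the positive, no-cancer share taken as 0.182); the variable names are illustrative.

```python
# Joint proportions taken from the printed results, as fractions of all patients
p_pos_cancer = 0.09506      # tested positive and has cancer
p_pos_no_cancer = 0.182     # tested positive and has no cancer
p_neg_cancer = 0.00995      # tested negative and has cancer
p_neg_no_cancer = 0.71277   # tested negative and has no cancer

# Marginal probabilities
p_cancer = p_pos_cancer + p_neg_cancer        # prior P(cancer)
p_positive = p_pos_cancer + p_pos_no_cancer   # evidence P(positive)

# Likelihood P(positive | cancer)
p_pos_given_cancer = p_pos_cancer / p_cancer

# Posterior P(cancer | positive) via Bayes' rule
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_positive

# The same quantity computed directly from the joint proportion
direct = p_pos_cancer / p_positive

print("P(cancer | positive) via Bayes:", round(p_cancer_given_pos, 3))
print("P(cancer | positive) directly: ", round(direct, 3))
```

Both routes give roughly 0.34: under these figures, about a third of the patients with a positive test have cancer, even though only about 10% of all patients do.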
