# Cancer Test Results

We will learn to apply **Bayes' Rule** in this project.

In this project, we will first estimate the following probabilities from the given data (the priors and the test's conditional probabilities):
- Patients with cancer
- Patients without cancer
- Patients with cancer who tested positive
- Patients with cancer who tested negative
- Patients without cancer who tested positive
- Patients without cancer who tested negative

Then we will use Bayes' Rule to find the following posterior probabilities:
- Patients who tested positive have cancer
- Patients who tested positive do not have cancer
- Patients who tested negative have cancer
- Patients who tested negative do not have cancer
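For reference, Bayes' Rule for the first of these posteriors combines the prior with the test's conditional probabilities; the denominator is the overall probability of a positive test (law of total probability). Here $\neg\text{cancer}$ corresponds to the `~cancer` notation used later in the notebook.

$$
P(\text{cancer} \mid \text{positive}) = \frac{P(\text{cancer})\,P(\text{positive} \mid \text{cancer})}{P(\text{cancer})\,P(\text{positive} \mid \text{cancer}) + P(\neg\text{cancer})\,P(\text{positive} \mid \neg\text{cancer})}
$$

The other three posteriors follow the same pattern: swap $\text{positive}$ for $\text{negative}$ throughout, and/or put the $\neg\text{cancer}$ term in the numerator instead of the $\text{cancer}$ term.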

In [1]:
# load dataset
import pandas as pd

df = pd.read_csv("cancer_test_data.csv")

df.head()

| Unnamed: 0 | patient_id | test_result | has_cancer |
|---|---|---|---|
| 0 | 79452 | Negative | False |
| 1 | 81667 | Positive | True |
| 2 | 76297 | Negative | False |
| 3 | 36593 | Negative | False |
| 4 | 53717 | Negative | False |


In [4]:
# check for duplicate rows, then count unique patients
print(df.duplicated().sum())
df.patient_id.nunique()

0


2914

In [11]:
# number of patients with cancer (True) and without cancer (False)
df.has_cancer.value_counts()

False    2608
True      306
Name: has_cancer, dtype: int64
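A single cross-tabulation gives all four patient counts at once. Since `cancer_test_data.csv` is not included here, this sketch rebuilds a hypothetical stand-in DataFrame from the counts implied by the notebook's proportions (306 patients with cancer, 2608 without; 277 and 531 of them, respectively, tested positive). It uses `pd.crosstab`; the stand-in data is an assumption, not the original file.

```python
import pandas as pd

# Hypothetical stand-in for cancer_test_data.csv: one row per patient,
# rebuilt from the counts implied by the notebook's proportions.
counts = {
    (True, "Positive"): 277,
    (True, "Negative"): 29,
    (False, "Positive"): 531,
    (False, "Negative"): 2077,
}
df = pd.DataFrame(
    [
        {"has_cancer": has_cancer, "test_result": result}
        for (has_cancer, result), n in counts.items()
        for _ in range(n)
    ]
)

# Rows: has_cancer (False, True); columns: test_result (Negative, Positive)
table = pd.crosstab(df["has_cancer"], df["test_result"])
print(table)

# Row totals: patients without cancer (2608) and with cancer (306)
print(table.sum(axis=1))
```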

In [18]:
# proportion of patients with cancer
prop_with_c = len(df.query("has_cancer == True"))/len(df)
prop_with_c

0.10501029512697323

In [19]:
# proportion of patients without cancer
prop_without_c = len(df.query("has_cancer == False"))/len(df)
prop_without_c

0.8949897048730268
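Both priors can also be read off directly. A minimal sketch, again on a hypothetical stand-in DataFrame built from the notebook's counts: because `has_cancer` is boolean, its mean is the share of `True` values, and `value_counts(normalize=True)` returns the share of each class.

```python
import pandas as pd

# Hypothetical stand-in for the real data: 306 patients with cancer, 2608 without
df = pd.DataFrame({"has_cancer": [True] * 306 + [False] * 2608})

# Mean of a boolean column = fraction of True values = P(cancer)
p_cancer = df["has_cancer"].mean()

# Share of each class; the index holds the labels False / True
priors = df["has_cancer"].value_counts(normalize=True)

print(round(p_cancer, 3))           # → 0.105
print(round(priors.loc[False], 3))  # → 0.895
```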

In [22]:
# proportion of patients with cancer who test positive
prop_pos = len(df.query("has_cancer == True & test_result == 'Positive'"))/ len(df.query('has_cancer'))
prop_pos

0.9052287581699346

In [23]:
# proportion of patients with cancer who test negative
prop_neg = len(df.query("has_cancer == True & test_result == 'Negative'"))/ len(df.query('has_cancer'))
prop_neg

0.09477124183006536

In [27]:
# proportion of patients without cancer who test positive
prop_pos_without = len(df.query("has_cancer == False & test_result == 'Positive'"))/ len(df.query("has_cancer == False"))
prop_pos_without

0.2036042944785276

In [28]:
# proportion of patients without cancer who test negative
prop_neg_without = len(df.query("has_cancer == False & test_result == 'Negative'"))/ len(df.query("has_cancer == False"))
prop_neg_without

0.7963957055214724
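The four conditional proportions above can be computed together by normalizing each row of a cross-tabulation so that every row (one per `has_cancer` value) sums to 1. A sketch using `pd.crosstab(..., normalize="index")` on a hypothetical stand-in DataFrame rebuilt from the notebook's counts, not the original CSV:

```python
import pandas as pd

# Hypothetical stand-in rebuilt from the notebook's counts (not the original CSV):
# first 306 rows have cancer (277 positive, 29 negative),
# remaining 2608 do not (531 positive, 2077 negative).
df = pd.DataFrame(
    {
        "has_cancer": [True] * 306 + [False] * 2608,
        "test_result": ["Positive"] * 277 + ["Negative"] * 29
        + ["Positive"] * 531 + ["Negative"] * 2077,
    }
)

# normalize="index" divides each row by its total: P(test_result | has_cancer)
likelihoods = pd.crosstab(df["has_cancer"], df["test_result"], normalize="index")
print(likelihoods.round(3))

p_pos_given_cancer = likelihoods.loc[True, "Positive"]
p_pos_given_no_cancer = likelihoods.loc[False, "Positive"]
```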

Based on the proportions observed in the data, rounded to three decimal places, we use the following probabilities.

### Probabilities
- `P(cancer) = 0.105`: probability a patient has cancer
- `P(~cancer) = 0.895`: probability a patient does not have cancer
- `P(positive|cancer) = 0.905`: probability a patient with cancer tests positive
- `P(negative|cancer) = 0.095`: probability a patient with cancer tests negative
- `P(positive|~cancer) = 0.204`: probability a patient without cancer tests positive
- `P(negative|~cancer) = 0.796`: probability a patient without cancer tests negative
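The denominator in the Bayes' Rule calculations is the overall probability of each test result, given by the law of total probability. A short worked sketch with the rounded values listed above; as a sanity check, it compares the result with the share of positive tests implied by the notebook's counts, (277 + 531) / 2914.

```python
# Rounded estimates listed above
p_cancer = 0.105
p_no_cancer = 0.895
p_pos_given_cancer = 0.905
p_pos_given_no_cancer = 0.204

# Law of total probability: P(pos) = P(c)P(pos|c) + P(~c)P(pos|~c)
p_pos = p_cancer * p_pos_given_cancer + p_no_cancer * p_pos_given_no_cancer
# The two test results are complementary, so P(neg) = 1 - P(pos)
p_neg = 1 - p_pos

# Share of positive tests implied by the counts (277 + 531 out of 2914 patients)
observed_p_pos = (277 + 531) / 2914

print(f"P(positive) = {p_pos:.4f}, observed = {observed_p_pos:.4f}")
# → P(positive) = 0.2776, observed = 0.2773
```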

## Bayes' Rule

In [4]:
# What proportion of patients who tested positive have cancer?

# P(c|pos) = P(c)P(pos|c) / (P(c)P(pos|c) + P(~c)P(pos|~c))

(0.105*0.905) / (0.105*0.905 + 0.895*0.204) 

0.34230291241151994

In [5]:
# What proportion of patients who tested positive don't have cancer?

# P(~c|pos) = P(~c)P(pos|~c) / (P(c)P(pos|c) + P(~c)P(pos|~c))

(0.895*0.204) / (0.105*0.905 + 0.895*0.204) 

0.65769708758848

In [6]:
# What proportion of patients who tested negative have cancer?

# P(c|neg) = P(c)P(neg|c) / (P(c)P(neg|c) + P(~c)P(neg|~c))

(0.105*0.095) / (0.105*0.095 + 0.895*0.796) 

0.013808235106832134

In [7]:
# What proportion of patients who tested negative don't have cancer?

# P(~c|neg) = P(~c)P(neg|~c) / (P(c)P(neg|c) + P(~c)P(neg|~c))

(0.895*0.796) / (0.105*0.095 + 0.895*0.796) 

0.986191764893168
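The four cells above apply one formula with different inputs. A small helper makes that explicit. `posterior` is a hypothetical name introduced here, and it assumes a binary hypothesis, so P(~H) = 1 - P(H). With the rounded estimates it reproduces the four results above up to floating-point rounding.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' Rule for a binary hypothesis H: returns P(H | evidence)."""
    numerator = prior * p_evidence_given_h
    # Denominator = P(evidence), by the law of total probability
    evidence = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / evidence


# Rounded estimates listed earlier in the notebook
p_c, p_pos_c, p_pos_not_c = 0.105, 0.905, 0.204
p_neg_c, p_neg_not_c = 0.095, 0.796

c_given_pos = posterior(p_c, p_pos_c, p_pos_not_c)          # cancer | positive
not_c_given_pos = posterior(1 - p_c, p_pos_not_c, p_pos_c)  # no cancer | positive
c_given_neg = posterior(p_c, p_neg_c, p_neg_not_c)          # cancer | negative
not_c_given_neg = posterior(1 - p_c, p_neg_not_c, p_neg_c)  # no cancer | negative

for name, value in [
    ("P(cancer | positive)", c_given_pos),
    ("P(~cancer | positive)", not_c_given_pos),
    ("P(cancer | negative)", c_given_neg),
    ("P(~cancer | negative)", not_c_given_neg),
]:
    print(f"{name} = {value:.4f}")
```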

# Conclusion

Posterior probabilities from Bayes' Rule (rounded to two decimal places):
- Patients who tested positive have cancer: `0.34`
- Patients who tested positive do not have cancer: `0.66`
- Patients who tested negative have cancer: `0.01`
- Patients who tested negative do not have cancer: `0.99`
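As a cross-check, Bayes' Rule applied to the unrounded proportions from the data reduces to a ratio of counts: P(cancer | positive) is the number of patients with cancer who tested positive divided by all patients who tested positive. This sketch uses the counts implied by the notebook's proportions (277 and 531 positive tests, 29 and 2077 negative tests, for patients with and without cancer respectively); the small gap from the values above comes from rounding the inputs to three decimal places.

```python
# Counts implied by the notebook's proportions (cancer / no cancer)
pos_cancer, pos_no_cancer = 277, 531
neg_cancer, neg_no_cancer = 29, 2077

# P(cancer | positive) = (cancer and positive) / (all positives)
p_cancer_given_pos = pos_cancer / (pos_cancer + pos_no_cancer)
# P(cancer | negative) = (cancer and negative) / (all negatives)
p_cancer_given_neg = neg_cancer / (neg_cancer + neg_no_cancer)

print(f"{p_cancer_given_pos:.4f}")  # → 0.3428
print(f"{p_cancer_given_neg:.4f}")  # → 0.0138
```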