## Exercise 1
In Orange County, 51% of the adults are males. (It doesn't take too much advanced
mathematics to deduce that the other 49% are females.) One adult is randomly selected
for a survey involving credit card usage.

- **(a)** Find the probability that the selected person is a male.

- **(b)** It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the cigar−smoking respondent is a male

Use following notation: <br>
M = male <br>
F = female <br>
C = cigar smoker<br>
NC = not a cigar smoker<br>


In [1]:
# import NumPy
import numpy as np

In [85]:
# run times
n_runs = 100000

M = 0
M_C = 0
M_NC = 0
F = 0
F_C = 0
F_NC = 0
C = 0
NC = 0

for _ in range(n_runs):
    
    # Male is selected
    if np.random.random() < 0.51:
        M += 1
        
        # Selected male is smoker
        if np.random.random() < 0.095:
            M_C += 1
            C += 1
            
        # Selected male is NOT smoker
        else:
            M_NC += 1
            NC += 1
            
    # Female is selected
    if np.random.random() < 0.49:
        F += 1
        
        # Selected female is smoker
        if np.random.random() < 0.017:
            F_C += 1
            C += 1
            
        # Selected female is NOT smoker
        else:
            F_NC += 1
            NC += 1

In [86]:
P_M = M / n_runs * 100
P_M_C = M_C / n_runs * 100
P_M_NC = M_NC / n_runs * 100
P_F = F / n_runs * 100
P_F_C = F_C / n_runs * 100
P_F_NC = F_NC / n_runs * 100
P_C = C / n_runs * 100
P_NC = NC / n_runs * 100

print(f'Male is selected: {P_M:.2f}%')
print(f'Male is selected and a smoker: {P_M_C:.2f}%')
print(f'Male is selected and not a smoker: {P_M_NC:.2f}%')
print(f'Female is selected: {P_F:.2f}%')
print(f'Female is selected and a smoker: {P_F_C:.2f}%')
print(f'Female is selected and not a smoker: {P_F_NC:.2f}%')
print(f'Smoker: {P_C:.2f}%')
print(f'Not smoker: {P_NC:.2f}%')

Male is selected: 51.17%
Male is selected and a smoker: 4.93%
Male is selected and not a smoker: 46.25%
Female is selected: 49.02%
Female is selected and a smoker: 0.83%
Female is selected and not a smoker: 48.19%
Smoker: 5.76%
Not smoker: 94.43%


# Exercise 2

A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering
from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is
estimated that 0.5 % of the population are sufferers. Suppose that the test is now administered to a person about
whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this
population). 

Calculate the following probabilities:
- **(a)** that the test result will be positive;
- **(b)** that, given a positive result, the person is a sufferer;
- **(c)** that, given a negative result, the person is a non-sufferer;
- **(d)** that the person will be misclassified.

Use following notation:

T = test positive <br>
NT = test negative<br>
S = sufferer<br>
NS = non-sufferer<br>
M = misclassified<br>

Solve it by two approaches:
1. Arithmetically
2. By simulation

In [140]:
# run times
n_runs = 100000

S = 0
S_T = 0
S_NT = 0
NS = 0
NS_T = 0
NS_NT = 0
T = 0
NT = 0

for _ in range(n_runs):
    
    # Sufferer is selected
    if np.random.random() < 0.005:
        S += 1
        
        # Selected sufferer is tested positive
        if np.random.random() < 0.95:
            S_T += 1
            T += 1
            
        # Selected sufferer is tested negative
        else:
            S_NT += 1
            NT += 1
            
    # Non-sufferer is selected
    if np.random.random() < 0.995:
        NS += 1
        
        # Selected non-sufferer is tested positive
        if np.random.random() < 0.1:
            NS_T += 1
            T += 1
            
        # Selected non-sufferer is tested negative
        else:
            NS_NT += 1
            NT += 1
            
M = S_NT + NS_T

In [141]:
P_S = S / n_runs * 100
P_S_T = S_T / n_runs * 100
P_S_NT = S_NT / n_runs * 100
P_NS = NS / n_runs * 100
P_NS_T = NS_T / n_runs * 100
P_NS_NT = NS_NT / n_runs * 100
P_T = T / n_runs * 100
P_NT = NT / n_runs * 100
P_M = M / n_runs * 100

print(f'Sufferer is selected: {P_S:.2f}%')
print(f'Sufferer is selected and tested positive: {P_S_T:.2f}%')
print(f'Sufferer is selected and tested negative: {P_S_NT:.2f}%')
print(f'Non-sufferer is selected: {P_NS:.2f}%')
print(f'Non-sufferer is selected and tested positive: {P_NS_T:.2f}%')
print(f'Non-sufferer is selected and teste negative: {P_NS_NT:.2f}%')
print(f'Test positive: {P_T:.2f}%')
print(f'Test negative: {P_NT:.2f}%')
print(f'Misclassified: {P_M:.2f}%')

Sufferer is selected: 0.50%
Sufferer is selected and tested positive: 0.48%
Sufferer is selected and tested negative: 0.02%
Non-sufferer is selected: 99.49%
Non-sufferer is selected and tested positive: 9.75%
Non-sufferer is selected and teste negative: 89.74%
Test positive: 10.23%
Test negative: 89.76%
Misclassified: 9.77%


In [142]:
# Simulation

# (a) that the test result will be positive;
P_Test_Positive = (S_T + NS_T) / n_runs * 100
print(f'Test positive: {P_Test_Positive:.2f}%')

# (b) that, given a positive result, the person is a sufferer;
# P(Sufferer|Positive)
P_Sufferer_Positive = S_T / T * 100
print(f'Given a positive result, the person is a sufferer: {P_Sufferer_Positive:.2f}%')

# (c) that, given a negative result, the person is a non-sufferer;
# P(Non-sufferer|Negative)
P_Non_Sufferer_Negative = NS_NT / NT *100
print(f'Given a negative result, the person is a non-sufferer: {P_Non_Sufferer_Negative:.2f}%')

# (d) that the person will be misclassified.
# False positive + False negative
print(f'The person will be misclassified: {P_M:.2f}%')


Test positive: 10.23%
Given a positive result, the person is a sufferer: 4.67%
Given a negative result, the person is a non-sufferer: 99.98%
The person will be misclassified: 9.77%


In [143]:
# Arithmetically

# (a) that the test result will be positive;
# Sufferer * test positive + non-sufferer * test positive
answer_a = (0.005 * 0.95 + 0.995 * 0.1) * 100
print(f'Test positive: {answer_a:.2f}%')

# (b) that, given a positive result, the person is a sufferer;
# P(Sufferer|Positive) = P(Sufferer) * P(Positive|Sufferer) / P(Positive)
answer_b = ((0.005 * 0.005 * 0.95) / (0.005 * 0.95 + 0.995 * 0.1)) * 100
print(f'Given a positive result, the person is a sufferer: {answer_b:.2f}%')

# (c) that, given a negative result, the person is a non-sufferer;
# P(Non-sufferer|Negative) = P(Non-sufferer) * P(Negative|Non-sufferer) / P(Negative)
answer_c = ((0.995 * 0.995 * 0.9) / (0.005 * 0.05 + 0.995 * 0.9)) * 100
print(f'Given a negative result, the person is a non-sufferer: {answer_c:.2f}%')

# (d) that the person will be misclassified.
# False positive + False negative
answer_d = (0.005 * 0.05 + 0.995 * 0.1) * 100
print(f'The person will be misclassified: {answer_d:.2f}%')

Test positive: 10.43%
Given a positive result, the person is a sufferer: 0.02%
Given a negative result, the person is a non-sufferer: 99.47%
The person will be misclassified: 9.98%
