<a href="https://colab.research.google.com/github/annakasper1/QNC/blob/main/Frequentist_Versus_Bayesian_Approaches_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# v2

# EXERCISE 1: If someone gets a positive test, is it "statistically significant"
# at the p<0.05 level? Why or why not?

# As I understand it, whether a positive test is statistically significant
# under the frequentist approach depends on the sample size. For the positive
# test to count as statistically significant, the probability of observing
# it under the null hypothesis of no infection must be less than 0.05.
# In stats, this would look like:
# p(data | null hypothesis) < 0.05
# A positive test either provides evidence against the null hypothesis or
# is a false positive, which would be uninformative.
# In this scenario, because the sample size given (n=1000) is relatively large
# in comparison to the number of events, this result is less likely to be a
# false positive. The larger the sample size, the more informative this
# approach can be about the statistical significance of the positive test.

# ChatGPT code prompt for this exercise (with a couple of alterations):
# Python code for directly computing P(event | H0) given that there is 1 event
# out of 1000 samples

from scipy.stats import binom

# Parameters
n = 1000   # number of samples
k = 1      # observed number of events
p0 = 0.05  # probability of a positive test under H0 (no infection), which is
# the false positive rate of 0.05.

# Probability of exactly k events under H0
p_exact = binom.pmf(k, n, p0)

# Probability of k or fewer events (cumulative)
p_cumulative = binom.cdf(k, n, p0)

print(f"P(X = {k} | H0) = {p_exact:.6f}")

# Result: The probability that 1 event occurs given the null hypothesis is
# essentially 0.



# EXERCISE 2: What is the probability that if someone gets a positive test,
# that person is infected?

# Using a Bayesian approach, I need to see how changing the probability of a
# hypothesis changes the proportion infected given a single positive test.


# ChatGPT code prompt for this exercise:
# python code for a bayesian approach to finding the proportion infected given
# a positive test. event number = 1, sample number = 1000. The calculation
# should be the probability of the hypothesis given the data equals the
# (probability of the data given the hypothesis times the probability of the
# hypothesis) divided by the probability of the data. Give me answers for
# the proportion infected given a positive test when the probabilities of the
# hypothesis range from 0 to 1 in steps of 0.1.

import numpy as np
from scipy.stats import binom

# Data
n = 1000  # number of samples
k = 1     # observed positives

# Hypotheses (possible infection proportions)
p_hypotheses = np.arange(0, 1.01, 0.1)

# Prior (assume uniform = 1 for simplicity)
prior = np.ones_like(p_hypotheses)

# Likelihood: P(data | hypothesis)
likelihood = binom.pmf(k, n, p_hypotheses)

# Unnormalized posterior
unnormalized_posterior = likelihood * prior

# Normalize to get posterior probabilities
posterior = unnormalized_posterior / unnormalized_posterior.sum()

# Print results
for p, post in zip(p_hypotheses, posterior):
    print(f"Hypothesis p={p:.1f} -> Posterior probability={post:.6f}")

# results output:
# Hypothesis p=0.0 -> Posterior probability=0.000000
# Hypothesis p=0.1 -> Posterior probability=1.000000
# Hypothesis p=0.2 -> Posterior probability=0.000000
# Hypothesis p=0.3 -> Posterior probability=0.000000
# Hypothesis p=0.4 -> Posterior probability=0.000000
# Hypothesis p=0.5 -> Posterior probability=0.000000
# Hypothesis p=0.6 -> Posterior probability=0.000000
# Hypothesis p=0.7 -> Posterior probability=0.000000
# Hypothesis p=0.8 -> Posterior probability=0.000000
# Hypothesis p=0.9 -> Posterior probability=0.000000
# Hypothesis p=1.0 -> Posterior probability=0.000000

# Note: I didn't have time to double-check this code. At first glance, it
# seems ChatGPT may have left the probability of the data out of the
# denominator (as seen in the exercise), although dividing by the sum of
# likelihood * prior over the listed hypotheses may be playing that role.



P(X = 1 | H0) = 0.000000
Hypothesis p=0.0 -> Posterior probability=0.000000
Hypothesis p=0.1 -> Posterior probability=1.000000
Hypothesis p=0.2 -> Posterior probability=0.000000
Hypothesis p=0.3 -> Posterior probability=0.000000
Hypothesis p=0.4 -> Posterior probability=0.000000
Hypothesis p=0.5 -> Posterior probability=0.000000
Hypothesis p=0.6 -> Posterior probability=0.000000
Hypothesis p=0.7 -> Posterior probability=0.000000
Hypothesis p=0.8 -> Posterior probability=0.000000
Hypothesis p=0.9 -> Posterior probability=0.000000
Hypothesis p=1.0 -> Posterior probability=0.000000
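
Exercise 2 asks for the probability that a person is infected given one positive test, which can also be sketched by applying Bayes' rule to a single test rather than to 1 event in 1000 samples. The sketch below uses the false positive rate of 0.05 from Exercise 1 and sweeps the prior probability of infection from 0 to 1 in steps of 0.1, as in the prompt. The sensitivity of 1.0 (no false negatives) is an assumption that is not stated in this notebook, and the names `posterior_infected`, `false_positive_rate`, and `sensitivity` are illustrative only.

```python
import numpy as np


def posterior_infected(prior, false_positive_rate=0.05, sensitivity=1.0):
    """Return P(infected | positive test) using Bayes' rule.

    prior: P(infected) before seeing the test result (scalar or array).
    false_positive_rate: P(positive | not infected), 0.05 as in Exercise 1.
    sensitivity: P(positive | infected); 1.0 is an ASSUMED value.
    """
    prior = np.asarray(prior, dtype=float)
    # Denominator, P(positive), summed over both hypotheses:
    # P(pos | infected) * P(infected) + P(pos | not infected) * P(not infected)
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Priors from 0 to 1 in steps of 0.1
priors = np.linspace(0.0, 1.0, 11)
posteriors = posterior_infected(priors)

for prior, post in zip(priors, posteriors):
    print(f"P(infected)={prior:.1f} -> P(infected | positive)={post:.3f}")
```

Under these assumptions, the answer depends strongly on the prior: a prior of 0.1 gives a posterior of about 0.69, while a prior of 0 gives 0 no matter what the test says.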
