Q1
A hypothesis that can be tested statistically is typically one that makes a numeric (quantitative) claim and that describes the distribution of the data under that claim. Hypotheses that aren't testable are often subjective, or cannot be reduced to a statement about a data distribution.
The main criterion for a "good" null hypothesis is that it makes a claim about a numeric value (a parameter) pertaining to a data distribution.
Null vs. Alternative: A null hypothesis is stated as the claim we are trying to disprove (reject), and the alternative is constructed to be the "anything else" case, which we conclude by throwing out the null.

Q2
Hypothesis testing is making a claim about a parameter of the population, for example mu, the population average. The claim is of the form mu = mu_0, where mu_0 is what the null hypothesis assumes the population average is. We then simulate taking many samples from a data distribution (such that mu = mu_0), and we compare the resulting sampling distribution (the distribution of simulated sample means) to our original sample mean (x_bar). In this way we judge how plausible our original sample statistic would be if the null hypothesis were true. If it is implausible, we can "throw out" the null hypothesis and make a conclusion (inference) about the population, namely, that mu =/= mu_0.
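
The procedure described above can be sketched in a few lines. This is only an illustration under assumptions that are not part of the original answer: the population is taken to be normal with a known standard deviation, and the values of `mu_0`, `sigma`, `n`, and `x_bar` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Hypothetical values, chosen only for illustration
mu_0 = 100.0    # population mean claimed by H_0
sigma = 15.0    # assumed population standard deviation
n = 25          # sample size
x_bar = 107.0   # hypothetical observed sample mean

n_simulations = 10_000

# Draw many samples of size n from a population where mu = mu_0 and
# record each sample's mean: this is the sampling distribution under H_0
simulated_means = rng.normal(mu_0, sigma, size=(n_simulations, n)).mean(axis=1)

# Two-sided p-value: proportion of simulated means at least as far
# from mu_0 as the observed sample mean
p_value = np.mean(np.abs(simulated_means - mu_0) >= abs(x_bar - mu_0))
print(f"Simulated two-sided p-value: {p_value:.4f}")
```

A small p-value here would mean that sample means as far from mu_0 as x_bar are rare in the "H_0 world".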

Q3 "Imagining a world where the null hypothesis is true" is a layman's way of saying "Assume the population follows the null hypothesis". We can then simulate how likely a sample like our original one would be if H_0 were true. If we face a "ridiculousness", i.e. a strong improbability, we reject H_0, since the data we actually observed are hard to reconcile with it.

Q4 When the p-value ends up being very small, what this means is that in the world of H_0, drawing a sample as extreme as ours (or more extreme) is extremely unlikely. Since we choose to believe we live in a more probable universe, we can throw out the notion that we live in "H_0 world".

In [60]:
#Q5
import numpy as np

# Given data
n_couples = 124   # Total number of couples
observed_right_tilt = 80  # Number of couples who tilted to the right
prob_right = 0.5   # Under the null hypothesis, probability of tilting right is 50%

# Simulation parameters
n_simulations = 100000  # Number of simulations to estimate the p-value

# Simulate head tilts under the null hypothesis using binomial distribution
simulated_tilts = np.random.binomial(n_couples, prob_right, size=n_simulations)

# Calculate p-value: proportion of simulations where the result is as extreme as or more extreme than observed
p_value = np.sum(simulated_tilts >= observed_right_tilt) / n_simulations

# Output the result
print(f"Simulated p-value: {p_value:.5f}")


Simulated p-value: 0.00068


Q5 Since the simulated p-value is less than 0.001, we have very strong evidence against the null hypothesis, and we can throw out the idea that there is no head-tilting bias in the population of kissing couples.
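
As a cross-check on the simulation, the same one-sided tail probability can be computed exactly from the binomial distribution. This is a sketch that reuses the numbers from the cell above; the use of `scipy.stats.binom` is an addition and not part of the original answer.

```python
from scipy.stats import binom

n_couples = 124            # total number of couples
observed_right_tilt = 80   # couples who tilted to the right
prob_right = 0.5           # probability of a right tilt under H_0

# P(X >= 80) equals P(X > 79); binom.sf(k, n, p) is the survival function P(X > k)
exact_p_value = binom.sf(observed_right_tilt - 1, n_couples, prob_right)
print(f"Exact one-sided p-value: {exact_p_value:.5f}")
```

The simulated value should land close to this exact one, differing only by simulation noise.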

Q6 If the p-value equals 0, that is, if in the world where H_0 is true there is mathematically no chance of obtaining our original sample, we can say that H_0 is false, since assuming it is true leads to an absolute logical contradiction. This is rare for a good choice of H_0, however, and indicates that our intuition in choosing H_0 may have been misplaced.

Without video evidence, there is no way to prove Fido's guilt, since there are infinitely many explanations for how an innocent Fido could (possibly) have ended up like he did.

To "definitively prove" that H_0 is false, the p-value would have to equal exactly 0. Otherwise, based on the assumption of the null hypothesis, both ways are technically possible. No p-value, not even 1, can definitively prove that H_0 is true, since a p-value only measures how compatible the data are with H_0. Instead of focussing on proof, p-values help us in terms of making "most likely" statements, and in describing what "probably" happened.
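
One caveat worth illustrating: a simulated p-value of 0 is not the same as a mathematical p-value of 0. The sketch below reuses the kissing setup from Q5 but with a hypothetical, far more extreme observed count (110 is invented for illustration), and compares the simulated estimate with the exact binomial tail.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)  # seeded for reproducibility

n_couples = 124
prob_right = 0.5
hypothetical_observed = 110  # hypothetical count, much more extreme than the 80 in Q5
n_simulations = 10_000

# Simulated p-value: no simulated count is expected to reach 110
simulated = rng.binomial(n_couples, prob_right, size=n_simulations)
simulated_p = np.mean(simulated >= hypothetical_observed)

# Exact tail probability P(X >= 110): tiny, but strictly positive
exact_p = binom.sf(hypothetical_observed - 1, n_couples, prob_right)

print(f"Simulated p-value: {simulated_p}")
print(f"Exact p-value:     {exact_p:.3e}")
```

A simulated 0 only says the event is rarer than roughly 1 in `n_simulations`; the outcome is still possible under H_0.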

In [83]:
#Q7
import numpy as np
import pandas as pd

# Example patient data
patient_data = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Age": [45, 34, 29, 52, 37, 41, 33, 48, 26, 39],
    "Gender": ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "InitialHealthScore": [84, 78, 83, 81, 81, 80, 79, 85, 76, 83],
    "FinalHealthScore": [86, 86, 80, 86, 84, 86, 86, 82, 83, 84]
})
patient_data['HealthScoreChange'] = patient_data.FinalHealthScore - patient_data.InitialHealthScore

# Simulate under H0
np.random.seed(5)  # For reproducibility
number_of_simulations = 10000
n_size = len(patient_data)

IncreaseProportionSimulations_underH0random = np.zeros(number_of_simulations)

# Generate random improvements under H0 (null hypothesis: no treatment effect)
for i in range(number_of_simulations):
    random_improvement = np.random.choice([0,1], size=len(patient_data), replace=True)
    IncreaseProportionSimulations_underH0random[i] = random_improvement.mean()

# Observed statistic (proportion of improvements)
observed_statistic = (patient_data.HealthScoreChange > 0).mean()

# Right-tailed test: Are the simulated statistics greater than or equal to the observed statistic?
SimStats_as_or_more_extreme_than_ObsStat = IncreaseProportionSimulations_underH0random >= observed_statistic

# Calculate p-value for right-tailed test
p_value = SimStats_as_or_more_extreme_than_ObsStat.sum() / number_of_simulations
print(f"Right-tailed test p-value: {p_value:.5f}")


Right-tailed test p-value: 0.05290


Q7 The code above, generated with ChatGPT by modifying the code from Demo II of the last tutorial, is a one-tailed (right-sided) hypothesis test. Its p-value estimates the probability, under H_0, that the simulated proportion of patients who improved is greater than or equal to the observed proportion, rather than "as or more extreme" in either direction.
This code generates a lower p-value than the original code, since it measures less of the area under the sampling distribution curve. This holds in general when comparing one-tailed to two-tailed tests, provided the observed statistic lies in the direction of the tail being tested.
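
To make the one-tailed versus two-tailed comparison concrete, the sketch below computes both p-values from the same simulated distribution. It works with counts rather than proportions (8 of the 10 patients in the data above have a positive score change) so that the "as extreme" comparison is not affected by floating-point rounding; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)  # seeded for reproducibility

n_patients = 10
observed_improved = 8                  # patients with HealthScoreChange > 0
expected_under_h0 = n_patients * 0.5   # 5 improvements expected under H_0
n_simulations = 10_000

# Number of "improvements" per simulated study when each patient
# improves with probability 0.5 (no treatment effect)
simulated_improved = rng.binomial(n_patients, 0.5, size=n_simulations)

# One-tailed (right): simulated counts at least as large as observed
one_tailed = np.mean(simulated_improved >= observed_improved)

# Two-tailed: simulated counts at least as far from 5 as observed, in either direction
two_tailed = np.mean(
    np.abs(simulated_improved - expected_under_h0)
    >= abs(observed_improved - expected_under_h0)
)

print(f"One-tailed p-value: {one_tailed:.4f}")
print(f"Two-tailed p-value: {two_tailed:.4f}")
```

Because every simulation counted in the right tail is also counted in the two-tailed region, the one-tailed p-value cannot exceed the two-tailed one here.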

Q8 As the skeptical representative of Ronald Fisher, I do not believe that the average person can tell the difference between tea'd milk and milked tea. I want to see if this belief is consistent with my sample of 80 STA130 students, 49 of whom correctly guessed the type of beverage they drank. To do this, I will state my formal null hypothesis: r, the proportion of the population (people who drink a cup of tea and guess the order in which the milk was added) who guess correctly, is 0.5. So my null hypothesis (r = 0.5) says that, on average, people cannot beat the odds on milk-guessing. The alternative hypothesis is that r =/= 0.5, meaning that for the average person, the order of adding milk has an effect on their guess.

To test the null hypothesis, we will create a sampling distribution under the assumption that it's true. Luckily, our null hypothesis induces a binomial distribution, so we can just work with the pure mathematical object. The samples will be the same size as the original (80), and for each sample we calculate the proportion of right guesses out of all guesses. Afterwards, we will compare the observed statistic (the proportion from the sample of STA130 students) to the distribution, to check whether or not we can reasonably throw out the null hypothesis.

In [94]:
#Q8: Here is the code:
import numpy as np
import scipy.stats as stats

# Set the seed for reproducibility
np.random.seed(1)

# Given data
n = 80  # Total number of students
x = 49  # Number of correct guesses
p_null = 0.5  # Null hypothesis: proportion of correct guesses is 0.5

# Calculate the observed proportion of correct guesses
observed_proportion = x / n

# Run a two-tailed exact binomial test using scipy.stats.binomtest
# alternative='two-sided' counts outcomes at least as extreme as x in both tails
test_result = stats.binomtest(x, n, p_null, alternative='two-sided')
result = test_result.pvalue  # extract the p-value from the result object
# Output the results
print(f"Observed proportion: {observed_proportion:.3f}")
print(f"P-value (two-tailed test): {result:.5f}")


Observed proportion: 0.613
P-value (two-tailed test): 0.05666
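
The exact test above can be cross-checked with a simulation in the style of Q5. This is a sketch using the same `n`, `x`, and `p_null`; the number of simulations and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded for reproducibility

n = 80         # total number of students
x = 49         # number of correct guesses
p_null = 0.5   # proportion of correct guesses under H_0
n_simulations = 100_000

# Number of correct guesses in each simulated class of 80 random guessers
simulated_correct = rng.binomial(n, p_null, size=n_simulations)

# Two-tailed: simulated counts at least as far from the expected 40 as the observed 49
expected_correct = n * p_null
simulated_p_value = np.mean(
    np.abs(simulated_correct - expected_correct) >= abs(x - expected_correct)
)
print(f"Simulated two-tailed p-value: {simulated_p_value:.5f}")
```

The simulated value should be close to the exact p-value reported above, up to simulation noise.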


Q9:
No

ChatBot Summary:
Link to conversation: https://chatgpt.com/share/670f3190-a944-8010-aaa9-72b93fc027df
### Hypothesis Testing and Binomial Test

1. **Context**: 
   You were tasked with analyzing an experiment in which 80 students were asked to determine if milk or tea was poured first in a cup. The null hypothesis was that students were randomly guessing, so the proportion of correct guesses would be 0.5.

2. **Null Hypothesis**: 
   The null hypothesis was that the proportion of correct guesses is 0.5 (i.e., students are guessing randomly). The alternative hypothesis was that the proportion is not 0.5 (i.e., students are not guessing randomly).

3. **Hypothesis Test**:
   We used a **binomial test** to evaluate whether the observed proportion of correct guesses (49 out of 80) is significantly different from the expected 0.5 under the null hypothesis.
   
4. **Key Points**:
   - **Observed Proportion**: Calculated as \( \frac{49}{80} = 0.6125 \).
   - **Binomial Test**: We used `stats.binomtest` from **SciPy** to test whether the proportion of correct guesses was significantly different from 0.5. The test is based on the binomial distribution, and it provides the exact p-value.
   
5. **Code**:
   The code used to perform the test is as follows:
   ```python
   import numpy as np
   import scipy.stats as stats
   
   # Set the seed for reproducibility
   np.random.seed(1)
   
   # Given data
   n = 80  # Total number of students
   x = 49  # Number of correct guesses
   p_null = 0.5  # Null hypothesis: proportion of correct guesses is 0.5
   
   # Calculate the observed proportion of correct guesses
   observed_proportion = x / n
   
   # Run a two-tailed exact binomial test using scipy.stats.binomtest
   # (the older stats.binom_test has been removed from recent SciPy releases)
   result = stats.binomtest(x, n, p_null, alternative='two-sided').pvalue
   
   # Output the results
   print(f"Observed proportion: {observed_proportion:.3f}")
   print(f"P-value (two-tailed test): {result:.5f}")
   ```

6. **P-value Interpretation**:
   `binomtest` directly calculates the p-value based on the binomial distribution without needing simulations or iterations. If the p-value is less than 0.05, we reject the null hypothesis and conclude that the students' guesses are significantly different from random guessing.

7. **Key Concepts**:
   - **Null Hypothesis**: The hypothesis that the proportion of correct guesses is 0.5 (random guessing).
   - **Alternative Hypothesis**: The hypothesis that the proportion is different from 0.5.
   - **p-value**: The probability of observing the data (or something more extreme) under the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis.

8. **Conclusion**:
   The test provides a p-value, which helps us determine whether there is enough evidence to reject the null hypothesis. If the p-value is smaller than the significance level (typically 0.05), we would conclude that the students are not just guessing randomly. Here the p-value was 0.05666, just above 0.05, so at that level the null hypothesis of random guessing is not rejected.
