
# STA130 HW5

### 2. The "Pre-lecture" video (above) suggested that the "standard error of the mean" could be used to create a confidence interval, but didn't describe exactly how to do this. How can we use the "standard error of the mean" to create a 95% confidence interval which "covers 95% of the bootstrapped sample means"? Explain this concisely in your own words.

The "standard error of the mean" is the standard deviation of the sample means across many samples, so it is calculated from the bootstrapped means in the same way as the standard deviation of any data.

By the central limit theorem, the distribution of the sample means is approximately normal (for a reasonably large sample), so we can use the standard error of the mean in the same way we would use the standard deviation of a normal distribution.

Thus, the 95% confidence interval that "covers 95% of the bootstrapped sample means" is simply the mean of the bootstrapped means ± 2 standard errors of the mean (more precisely, ± 1.96).
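
A minimal sketch of this calculation, assuming a made-up sample (the names `sample`, `n_boot`, and `sem_ci` are illustrative and not from the homework). It builds the interval from the standard error and then checks what fraction of the bootstrapped means it actually covers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample, made up purely for illustration
sample = rng.normal(loc=50, scale=10, size=200)

# Bootstrap: resample with replacement and record each resample's mean
n_boot = 2000
bootstrapped_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

# The standard error of the mean is the standard deviation of the bootstrapped means
sem = bootstrapped_means.std(ddof=1)
center = bootstrapped_means.mean()

# 95% interval: center +/- 1.96 standard errors (roughly 2)
sem_ci = (center - 1.96 * sem, center + 1.96 * sem)

# Fraction of bootstrapped means that fall inside the interval
coverage = np.mean((bootstrapped_means >= sem_ci[0]) & (bootstrapped_means <= sem_ci[1]))
print(sem_ci, coverage)
```

If the bootstrapped means are close to normally distributed, `coverage` should come out near 0.95.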

### 4. The "Pre-lecture" video (above) mentioned that bootstrap confidence intervals could apply to other statistics of the sample, such as the "median". Work with a ChatBot to create code to produce a 95% bootstrap confidence interval for a population mean based on a sample that you have and comment the code to demonstrate how the code can be changed to produce a 95% bootstrap confidence interval for different population parameter (other than the population mean, such as the population median).


In [None]:
# Functions From Previous Homework

# Create a single bootstrapped sample
def bootstrap_sample(data):
    # Sample with replacement the same size as the original data
    return data.sample(n=len(data), replace=True)

# Create multiple bootstrapped samples
def bootstrap_samples(data, n_samples=1000):
    samples = []
    for _ in range(n_samples):
        sample = bootstrap_sample(data)
        samples.append(sample)
    return samples

In [None]:
n_samples = 1000
bootstrapped_samples = bootstrap_samples(data, n_samples=n_samples)
# This code won't run, because we haven't defined a sample. However, to make it run, we'd simply need to set data to a sample

In [None]:
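import numpy as np  # needed here for np.percentile; the main import cell comes later in the notebook

# Change the following line for different statistics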
bootstrapped_means = [sample.mean() for sample in bootstrapped_samples]

x = bootstrapped_means

# Calculate the 2.5th and 97.5th percentiles, which bound the middle 95%
percentiles = [2.5, 97.5]  # 95% confidence interval
lower_bound = np.percentile(x, percentiles[0])  # 2.5th percentile
upper_bound = np.percentile(x, percentiles[1])  # 97.5th percentile


The above code finds the 95% confidence interval for bootstrapped means (lower_bound and upper_bound).

To use the code for a different statistic, simply change `.mean()` to a different statistic, such as `.median()`.

No ChatGPT log is included, as the code is simply lightly edited from the previous homework.
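
Since the cells above cannot run without a sample, here is a self-contained sketch of the same percentile method on made-up data. The helper `bootstrap_ci` and the exponential `sample` are assumptions for illustration only; it uses NumPy resampling rather than the pandas `.sample` call above, and the statistic is passed in as a function, so switching from the mean to the median is a one-argument change:

```python
import numpy as np

def bootstrap_ci(data, statistic=np.mean, n_samples=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for `statistic` of `data`."""
    values = np.asarray(data)
    rng = np.random.default_rng(seed)
    # Resample with replacement (same size as the data) and compute the statistic each time
    stats = [
        statistic(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_samples)
    ]
    # For level=95 this takes the 2.5th and 97.5th percentiles
    tail = (100 - level) / 2
    lower, upper = np.percentile(stats, [tail, 100 - tail])
    return lower, upper

# Hypothetical skewed sample, made up purely for illustration
sample = np.random.default_rng(1).exponential(scale=2.0, size=150)

mean_ci = bootstrap_ci(sample, statistic=np.mean)      # interval for the population mean
median_ci = bootstrap_ci(sample, statistic=np.median)  # interval for the population median
print("mean CI:", mean_ci)
print("median CI:", median_ci)
```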

# 8. Assignment

In [None]:
# Initialization
import pandas as pd
import numpy as np
import bokeh as bk
import bokeh.plotting as bkp
import bokeh.models as bkm
np.random.seed(0)

In [None]:
totalstudents = 80
correct = 49

In Fisher's original experiment, his colleague guessed correctly 8 out of 8 times when presented with cups of tea in which the milk had been poured either before or after the tea. This inspired a statistical test to figure out the likelihood that his colleague had simply gotten lucky, rather than having any ability to tell whether or not the milk went in first.

In our experiment, we posed a similar challenge to 80 students in a STA130 class. Each student was presented with a cup of tea, which either had the milk or the tea poured in first (with equal probability).

The intent and population are different: we aim to capture the general ability of STA130 students, rather than that of a single person. The experimental procedure is also a little different, as each student tastes only a single cup of tea. This keeps the guesses independent of one another; in Fisher's experiment, the colleague could deduce the last cups from the earlier answers (i.e. once 4 cups had been identified as milk first, it follows that the remaining cups must be tea first).
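
To make the dependence in Fisher's design concrete, here is a small worked calculation. It assumes the classic setup, which is only implied above: 8 cups, exactly 4 with milk first, and a taster who knows this split. Under those assumptions a pure guesser is choosing which 4 cups to label "milk first", and only one such choice is fully correct:

```python
from math import comb

cups = 8
milk_first = 4  # assumed: the taster knows exactly 4 cups had milk poured first

# Number of ways to choose which 4 of the 8 cups get labelled "milk first"
possible_labelings = comb(cups, milk_first)

# Exactly one labelling is entirely correct, so a pure guesser gets 8 out of 8 with probability:
p_all_correct = 1 / possible_labelings
print(possible_labelings, round(p_all_correct, 4))  # 70 0.0143
```

In our design each student's guess is instead an independent 50/50 event under the null hypothesis, which is why a binomial model is used below.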

In our test, 49 out of 80 students guessed correctly.

To see whether this suggests a better than random ability to guess whether milk came first, we will test our results against the null hypothesis.

In [None]:
# Creating 10000 random samples according to the null hypothesis
np.random.seed(1)
null_samples = []
test_count = 10000
for i in range(test_count):
  null_samples.append( np.random.binomial(n=1, p=0.5, size=totalstudents) ) # flipping 80 coins

## Hypothesis Testing
Formally, the null hypothesis is the following:

&nbsp; &nbsp;
H<sub>0</sub> : p = 0.5

This means that there is a 50% chance that a given student will guess correctly.

The alternative hypothesis (H<sub>1</sub> : p > 0.5) in this case is that there is a greater than 50% chance that a given student will guess correctly, meaning the student is more likely than not to guess correctly.
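
Because the alternative is one-sided, evidence against H<sub>0</sub> is measured by how often a sample generated under H<sub>0</sub> has at least as many correct guesses as we observed. A compact sketch of that idea, assuming illustrative variable names that are not the ones used in the notebook's cells, and drawing the number of correct guesses per sample directly from a Binomial(80, 0.5) distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

n_students = 80
observed_correct = 49
n_simulations = 10_000

# Under H0 each guess is a fair coin flip, so the number of correct
# guesses in one sample of 80 students is Binomial(80, 0.5)
simulated_counts = rng.binomial(n=n_students, p=0.5, size=n_simulations)

# One-sided p-value: share of simulated samples at least as extreme as the observed result
simulated_p_value = np.mean(simulated_counts >= observed_correct)
print("Simulated p-value:", simulated_p_value)
```

The exact value depends on the random seed, so it will differ slightly from run to run and from the figure reported below.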

In [None]:
bk.io.output_notebook()

from bokeh.plotting import figure, show

x = [x.sum() for x in null_samples] # the number of correct guesses in each sample


p = figure(width=870, height=550, toolbar_location=None,
        title="Testing the Null Hypothesis Against the Observed Accuracy \n10000 Simulated Samples")

# Histogram
bins = np.linspace(24.5,59.5,36)
hist, edges = np.histogram(x, density=True, bins=bins)
colors = ["lightsalmon" if (edge > 48) else "skyblue" for edge in edges[:-1]]
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
         fill_color=colors, line_color="white",
         legend_label="Simulated Samples (10000)")


# Add vertical lines for the null expectation (40) and the observed result (49)
p.line(x=[40,40], y=[0, max(hist)], line_color='black',
       line_width=2, legend_label='Null Hypothesis (p = 0.5)', line_dash='dashed')
p.line(x=[49,49], y=[0, max(hist)], line_color='red',
       line_width=2, legend_label='Observed Result (49 correct guesses)', line_dash='dashed')


# Add Label to red portion
label = bkm.Label(x=50.5, y=0.009,
                  text='>= 49 correct guesses')
p.add_layout(label)


p.y_range.start = 0

p.xaxis.axis_label = "Correct Guesses (out of 80)"
p.yaxis.axis_label = "Relative Frequency"

show(p)

In [None]:
tally = 0
for n in null_samples:
  if (n.sum() >= 49):
    tally += 1
p_value = tally / test_count
print("p-Value:", round(p_value,5))

p-Value: 0.0296


### Conclusion:

You can see above that the simulated p-value for our data is 0.0296.

This means that, when we simulated the null hypothesis 10000 times (see histogram), only 2.96% of those simulated samples contained 49 or more correct guesses.

If we consult the table for p-values, we'll see that 0.0296 is in the 0.01 - 0.05 range, meaning it provides moderate evidence against the null hypothesis.

In this case, it means there's *some* evidence to suggest students are able to guess correctly more than half the time, but we can't draw any strong conclusions from our data.
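
As a cross-check on the simulation, the same one-sided p-value can be computed exactly, since under the null hypothesis the number of correct guesses follows a Binomial(80, 0.5) distribution. This is a sketch with illustrative variable names, not part of the original analysis:

```python
from math import comb

n_students = 80
observed_correct = 49

# P(X >= 49) for X ~ Binomial(80, 0.5): each outcome sequence has probability 1 / 2**80,
# so count the sequences with 49 to 80 correct guesses and divide
favourable = sum(comb(n_students, k) for k in range(observed_correct, n_students + 1))
exact_p_value = favourable / 2**n_students
print("Exact p-value:", round(exact_p_value, 4))
```

The exact value should land close to the simulated one, with any difference attributable to simulation noise.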

### ChatGPT Logs

Link to Chatlog: https://chatgpt.com/share/670f3307-5994-800e-a0fa-70d431278416

Summary of chat by ChatGPT:

> Sure! Here’s a summary of our conversation:
>
> 1. **Hypothesis Testing with Coin Flips**:
>    - You asked how to declare a null hypothesis for flipping a coin, which led to discussing the null hypothesis as \( H_0: p = 0.5 \) (the coin is fair).
>    - We talked about generating random samples of 50/50 coin flips to test this hypothesis.
>
> 2. **Extending to Three Outcomes**:
>    - You inquired about a null hypothesis with three possible outcomes. We discussed how to formulate it as \( H_0: p_1 = p_2 = p_3 = \frac{1}{3} \).
>
> 3. **Analyzing Coin Flip Data**:
>    - We analyzed a scenario where you flipped a coin 80 times, obtaining 49 heads. We discussed the implications of this result and how to use a binomial test to determine if it provides evidence against the null hypothesis of a fair coin.
>
> 4. **P-Value Interpretation**:
>    - I explained that a p-value of around 0.045 would suggest some evidence against the null hypothesis, particularly in a one-tailed test, but not strong enough to confidently assert the coin is biased in a two-tailed test.
>
> 5. **Bar Graph from Histogram**:
>    - You asked how to turn a histogram into a bar graph. I provided steps for calculating histogram data, converting it to bar graph data (centers and counts), and plotting it using both Matplotlib and Bokeh.
>
> 6. **Shifting a Histogram**:
>    - We discussed how to shift a histogram visually or through data manipulation, demonstrating methods to shift the underlying data or adjust the bin edges when plotting in both Matplotlib and Bokeh.
>
> If you have any more questions or need further clarification on any topic, feel free to ask!

