# One-Sample z-test - Lab

## Introduction
In this lab, you'll perform a few quick tests to help you better understand how hypothesis testing works.

## Objectives
You will be able to:

* Explain use cases for a 1-sample z-test
* Set up null and alternative hypotheses
* Use the z-table and scipy methods to acquire the p-value for a given z-score
* Calculate and interpret the p-value to assess the significance of results
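
As a quick illustration of the third objective, the sketch below converts a z-score into a p-value with `scipy.stats.norm`. The z-score of 1.96 is an arbitrary example value chosen for illustration; it does not come from the exercises.

```python
from scipy.stats import norm

z = 1.96  # hypothetical z-score, chosen only for illustration

# Area to the left of z, which is what a standard z-table typically lists
left_area = norm.cdf(z)

# Right-tailed p-value P(Z >= z); sf is the survival function, i.e. 1 - cdf
p_right = norm.sf(z)

# Two-tailed p-value P(|Z| >= |z|)
p_two = 2 * norm.sf(abs(z))

print(round(left_area, 4), round(p_right, 4), round(p_two, 4))
```

Which of these values counts as the p-value depends on the alternative hypothesis: a right-tailed test uses the upper-tail area, while a two-tailed test doubles it.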

## Exercise 1
A fast-food chain claims that the mean time to order food at their restaurants is 60 seconds, with a standard deviation of 30 seconds. You decide to put this claim to the test and go to one of the restaurants to observe actual waiting times. You take a sample of 36 customers and find that the mean order time was 75 seconds. Does this finding provide enough evidence to contradict the fast food chain's claim of fast service?

Follow the 5 steps shown in the previous lesson and use $\alpha$ = 0.05.
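
For reference, the test statistic for a one-sample z-test with a known population standard deviation is the standard formula below, where $\bar{x}$ is the sample mean, $\mu$ the claimed population mean, $\sigma$ the population standard deviation, and $n$ the sample size:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Plugging in the values given for this exercise:

$$z = \frac{75 - 60}{30 / \sqrt{36}} = \frac{15}{5} = 3$$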

In [6]:
# State your null and alternative hypotheses
# H0: the mean time to order food is less than or equal to 60 seconds
# Ha: the mean time to order food is greater than 60 seconds

In [1]:
# Your solution here
from scipy.stats import norm

# Given values
x_bar = 75  # sample mean
mu = 60  # population mean
sigma = 30  # population standard deviation
n = 36  # sample size

# Calculate the z-score
z = (x_bar - mu) / (sigma / (n ** 0.5))

# Determine the p-value (one-tailed test, matching Ha: mean > 60)
p_value = 1 - norm.cdf(z)

z, p_value

# (z = 3.0, p = 0.0013498980316301035)

(3.0, 0.0013498980316301035)
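
An equivalent way to reach the decision is to compare the z-score with a critical value instead of comparing the p-value with $\alpha$. The sketch below assumes a right-tailed test, in line with the hypotheses stated for this exercise; the names `z_crit` and `reject_null` are illustrative and not part of the original lab.

```python
from scipy.stats import norm

alpha = 0.05
z = 3.0  # z-score computed above for Exercise 1

# Critical value for a right-tailed test:
# the z-value that leaves an area of alpha in the upper tail
z_crit = norm.ppf(1 - alpha)

# Reject H0 when the observed z-score falls beyond the critical value
reject_null = z > z_crit

print(f"critical z = {z_crit:.3f}, reject H0: {reject_null}")
```

Both approaches always agree: the z-score exceeds the critical value exactly when the one-tailed p-value is below $\alpha$.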

In [2]:
# Interpret the results in terms of the p-value

"""
- A p-value of 0.00135 means there is a 0.135% chance of observing a sample mean of 75 seconds or more, given that the true population mean is 60 seconds.
- Since the p-value (0.00135) is well below 0.05, we reject the null hypothesis. There is strong evidence that the true mean order time is greater than the 60 seconds claimed by the fast-food chain.

The very low p-value indicates that the observed difference in mean order time (75 seconds versus the claimed 60 seconds) is statistically significant and unlikely to be due to random chance alone. We therefore have sufficient evidence to conclude that the fast-food chain's claim of a 60-second average order time is likely not accurate, based on the sample data collected."""

## Exercise 2

Twenty-five students complete an SAT preparation program. Here are the SAT scores of the 25 students who completed the program:

```
434 694 457 534 720 400 484 478 610 641 425 636 454
514 563 370 499 640 501 625 612 471 598 509 531
```

We know that the population average for SAT scores is 500 with a standard deviation of 100.

Are our 25 students’ SAT scores significantly higher than the population's mean score? 

*Note that the SAT preparation program claims that it will increase (and not decrease) SAT scores, so you can conduct a one-tailed test with $\alpha$ = 0.05.*

In [None]:
# State your hypotheses

"""
- **Null Hypothesis (H0):** The mean SAT score of students who completed the program is 500 or less.
  This represents no improvement or a decrease in scores due to the program.

- **Alternative Hypothesis (H1):** The mean SAT score of students who completed the program is greater than 500.
  This represents an improvement in scores due to the program.
"""




In [3]:
# Give your solution here 
from scipy.stats import norm
import numpy as np

# Sample SAT scores
scores = np.array([434, 694, 457, 534, 720, 400, 484, 478, 610, 641, 425, 636, 454, 514, 563, 370, 499, 640, 501, 625, 612, 471, 598, 509, 531])

# Given values
mu = 500  # Population mean
sigma = 100  # Population standard deviation
n = 25  # Sample size
alpha = 0.05  # Significance level

# Calculate the sample mean
x_bar = np.mean(scores)

# Calculate the z-score
z = (x_bar - mu) / (sigma / np.sqrt(n))

# Determine the p-value (one-tailed test)
p_value = 1 - norm.cdf(z)

z, p_value


# p = 0.03593031911292577, z = 1.8

(1.8, 0.03593031911292577)

In [20]:
# Interpret the results in terms of the p-value

"""
- **P-value:** The p-value of 0.03593 is the probability of observing a sample mean of 536 or higher if the null hypothesis were true.

Since the p-value (0.03593) is less than the significance level (alpha = 0.05), we reject the null hypothesis. There is sufficient evidence that the mean SAT score of students who completed the program is greater than 500.

**Conclusion:**

The results, with a p-value of 0.03593, indicate that the mean SAT score of the students who completed the preparation program is significantly higher than the population mean of 500 at the 5% significance level. This is consistent with the program's claim that it improves scores, although the test alone does not show that the program caused the difference."""
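
Both exercises follow the same steps, so they can be wrapped in a small helper. The function below is a minimal sketch: the name `one_sample_z_test` and its `alternative` parameter are hypothetical and are not part of scipy or the original lab. It assumes the population standard deviation is known. The value 536 is the mean of the 25 scores listed above.

```python
import math
from scipy.stats import norm

def one_sample_z_test(x_bar, mu, sigma, n, alternative="greater"):
    """Return (z, p_value) for a one-sample z-test with known sigma.

    Hypothetical helper for illustration only.
    """
    # Standard error of the mean, then the z-score
    z = (x_bar - mu) / (sigma / math.sqrt(n))

    if alternative == "greater":      # Ha: mean > mu
        p_value = norm.sf(z)
    elif alternative == "less":       # Ha: mean < mu
        p_value = norm.cdf(z)
    elif alternative == "two-sided":  # Ha: mean != mu
        p_value = 2 * norm.sf(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Exercise 1: order times (right-tailed)
z1, p1 = one_sample_z_test(x_bar=75, mu=60, sigma=30, n=36)

# Exercise 2: SAT scores (right-tailed)
z2, p2 = one_sample_z_test(x_bar=536, mu=500, sigma=100, n=25)

print(z1, round(p1, 5))
print(z2, round(p2, 5))
```

Making the direction of the test an explicit argument keeps the p-value calculation tied to the stated alternative hypothesis, which helps avoid mixing one-tailed hypotheses with a two-tailed p-value.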


## Summary

In this lab, you conducted two one-sample z-tests, each comparing a sample mean with a known population mean, and used the resulting p-values to decide whether to reject the null hypothesis. This gives you a foundation for the more advanced tests and approaches covered later on.