# Statistics Advance 4 Assignment

## Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

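- **t-test:** used when the population standard deviation is unknown and must be estimated from the sample. This matters most for small samples (a common rule of thumb is n < 30). Example: comparing the mean test scores of two small classes.
- **z-test:** used when the population standard deviation is known, or when the sample is large enough (n ≥ 30 by the same rule of thumb) that the sample standard deviation is a reliable stand-in. Example: testing whether the average height in a large sample differs from a known population value, with the population standard deviation known.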
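One way to see why the t-test is preferred for small samples is to compare critical values. The sketch below prints two-sided 95% critical values from SciPy; the sample sizes in the loop are arbitrary choices for illustration.

```python
import scipy.stats as stats

# Two-sided 95% critical value from the standard normal (about 1.96)
z_crit = stats.norm.ppf(0.975)

# The t critical value depends on the degrees of freedom (n - 1):
# it is noticeably larger for small n and approaches z_crit as n grows.
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>4}: t = {t_crit:.3f}  vs  z = {z_crit:.3f}")
```

The wider t critical values for small n reflect the extra uncertainty from estimating the standard deviation.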

## Q2: Differentiate between one-tailed and two-tailed tests.

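- **One-tailed test:** the alternative hypothesis specifies a direction (H1: μ > μ0 or H1: μ < μ0), so the whole significance level α sits in one tail.
- **Two-tailed test:** the alternative hypothesis is non-directional (H1: μ ≠ μ0), so α is split between both tails (α/2 each).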
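The practical difference shows up in the p-value. In this sketch the test statistic z = 1.8 is an arbitrary illustrative value: the same statistic is significant at α = 0.05 one-tailed but not two-tailed.

```python
import scipy.stats as stats

z = 1.8  # illustrative test statistic (arbitrary value)

p_one = stats.norm.sf(z)            # one-tailed (right), H1: mu > mu0
p_two = 2 * stats.norm.sf(abs(z))   # two-tailed, H1: mu != mu0

print(f"one-tailed p = {p_one:.4f}")
print(f"two-tailed p = {p_two:.4f}")
```

For a symmetric distribution the two-tailed p-value is simply twice the one-tailed one.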

## Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

- **Type 1 Error (False Positive):** rejecting a null hypothesis that is actually true; its probability is the significance level α. Example: concluding a drug works when it actually doesn't.
- **Type 2 Error (False Negative):** failing to reject a null hypothesis that is actually false; its probability is denoted β. Example: concluding a drug doesn't work when it actually does.
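A small simulation makes the two error types concrete. This is a sketch with arbitrary choices (seed 0, samples of 20, α = 0.05, and a true mean of 0.3 for the "drug works" case): when H0 is true, the share of rejections should land near α; when H0 is false, the share of non-rejections estimates the Type 2 error rate.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
alpha = 0.05
n_trials, n = 10_000, 20

# True mean is 0, so H0: mu = 0 is TRUE; every rejection is a Type 1 error.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))
res = stats.ttest_1samp(samples, popmean=0.0, axis=1)
type1_rate = np.mean(res.pvalue < alpha)
print(f"Observed Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")

# True mean is 0.3 (assumed effect), so H0 is FALSE; not rejecting is a Type 2 error.
samples_alt = rng.normal(loc=0.3, scale=1.0, size=(n_trials, n))
res_alt = stats.ttest_1samp(samples_alt, popmean=0.0, axis=1)
type2_rate = np.mean(res_alt.pvalue >= alpha)
print(f"Observed Type 2 error rate: {type2_rate:.3f}")
```

With such a small effect and sample, β is large; increasing n or the effect size would reduce it.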

## Q4: Explain Bayes's theorem with an example.

Bayes' theorem gives the probability of an event A after observing evidence B, by combining the prior probability P(A) with how likely the evidence is when A is true, P(B|A):

P(A|B) = [P(B|A) * P(A)] / P(B)

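**Example:** Let A be the event that a person has a disease, with P(A) = 0.01, and B the event that they test positive. If the test is "99% accurate" in the sense that P(B|A) = 0.99 and P(B|not A) = 0.01, then P(B) = 0.99 × 0.01 + 0.01 × 0.99 = 0.0198 and P(A|B) = 0.0099 / 0.0198 = 0.5. Despite the high accuracy, a positive result means only a 50% chance of having the disease, because the disease is rare.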

## Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

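A **confidence interval** is a range of values, computed from sample data, constructed so that a stated proportion of such intervals (the confidence level, e.g. 95%) would contain the true population parameter over repeated sampling. For a mean with known standard deviation σ it is calculated as x̄ ± z × σ/√n, where z is the critical value for the confidence level (1.96 for 95%). Example: for a sample mean of 100, standard deviation 15 and n = 30, the 95% CI is 100 ± 1.96 × 15/√30 ≈ 100 ± 5.37, i.e. about (94.63, 105.37).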
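The calculation can be sketched in code. The t-based variant at the end is an addition for comparison, assuming the 15 is a sample standard deviation (population SD unknown), in which case a t critical value with n - 1 degrees of freedom is the more appropriate choice.

```python
import numpy as np
import scipy.stats as stats

mean, std, n = 100, 15, 30
confidence = 0.95

# z-based interval: treats 15 as a known population SD
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96
margin = z * std / np.sqrt(n)
print(f"z-based 95% CI: ({mean - margin:.2f}, {mean + margin:.2f})")

# t-based interval: treats 15 as a sample SD (assumption), df = n - 1
t = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
margin_t = t * std / np.sqrt(n)
print(f"t-based 95% CI: ({mean - margin_t:.2f}, {mean + margin_t:.2f})")
```

The t-based interval is slightly wider, reflecting the uncertainty in estimating the standard deviation.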

## Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

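```python
# Example: disease test
# Prior: P(Disease) = 0.01, P(No Disease) = 0.99
# Likelihoods: P(Positive | Disease) = 0.99, P(Positive | No Disease) = 0.05
P_D = 0.01
P_ND = 0.99
P_Pos_D = 0.99
P_Pos_ND = 0.05

# Law of total probability: overall chance of a positive test
P_Pos = P_Pos_D * P_D + P_Pos_ND * P_ND
# Bayes' theorem: posterior probability of disease given a positive test
P_D_Pos = (P_Pos_D * P_D) / P_Pos
print(f"Probability of having disease given positive test: {P_D_Pos:.4f}")
```

Here P(Disease | Positive) = 0.0099 / 0.0594 ≈ 0.1667: because the disease is rare, most positive results come from the much larger healthy group, so a positive test raises the probability of disease from 1% to only about 17%.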
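The posterior depends heavily on the prior. This sketch keeps the test characteristics from the example (P(Positive | Disease) = 0.99, P(Positive | No Disease) = 0.05) and varies only the prevalence; the `posterior` helper and the list of priors are illustrative choices.

```python
sensitivity = 0.99          # P(Positive | Disease)
false_positive_rate = 0.05  # P(Positive | No Disease)

def posterior(prior):
    """P(Disease | Positive) via Bayes' theorem for a given prior P(Disease)."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

for prior in (0.001, 0.01, 0.1, 0.5):
    print(f"prior = {prior:<5}  ->  P(Disease | Positive) = {posterior(prior):.4f}")
```

The same positive result is far more convincing when the condition is common than when it is rare.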

## Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

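```python
import scipy.stats as stats
import numpy as np

mean = 50
std = 5   # sample standard deviation
n = 30    # assumed: the question does not state the sample size
confidence = 0.95

# Population SD is unknown, so use the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
margin_error = t_crit * (std / np.sqrt(n))
ci_lower = mean - margin_error
ci_upper = mean + margin_error
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
```

With the assumed n = 30 this gives about (48.13, 51.87). Interpretation: the interval comes from a procedure that captures the true population mean in about 95% of repeated samples, so we are 95% confident the true mean lies between roughly 48.1 and 51.9. A different sample size would change the width of the interval.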
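The "95% of repeated samples" interpretation can be checked by simulation. This sketch assumes a normal population with mean 50 and SD 5, samples of 30 and a fixed seed (all illustrative choices), builds a t-based 95% interval from each sample, and counts how often it contains the true mean.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)  # fixed seed for reproducibility
true_mean, true_sd, n, n_reps = 50, 5, 30, 10_000
t_crit = stats.t.ppf(0.975, df=n - 1)

# Draw many samples and compute each sample's mean and standard error
samples = rng.normal(true_mean, true_sd, size=(n_reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Does each interval contain the true mean?
covered = (means - t_crit * ses <= true_mean) & (true_mean <= means + t_crit * ses)
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should come out close to 0.95; any single interval either contains the true mean or it does not.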

## Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The **margin of error** is the half-width of a confidence interval: the amount added to and subtracted from the sample statistic. For a mean with known σ it equals z × σ/√n, so it shrinks in proportion to 1/√n: a larger sample gives a narrower, more precise interval, and quadrupling n halves the margin of error. Example: surveying 1,000 people gives a smaller margin of error than surveying 100 people.
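The effect of sample size can be checked numerically. This sketch computes the 95% margin of error for a survey proportion, assuming p = 0.5 (the value that gives the largest margin); the sample sizes are illustrative.

```python
import numpy as np
import scipy.stats as stats

z = stats.norm.ppf(0.975)  # critical value for 95% confidence
p = 0.5                    # assumed proportion (worst case)

# Margin of error for a proportion: z * sqrt(p(1 - p) / n)
margins = {n: z * np.sqrt(p * (1 - p) / n) for n in (100, 400, 1000)}
for n, me in margins.items():
    print(f"n = {n:>4}: margin of error = ±{me * 100:.1f} percentage points")
```

Going from 100 to 400 respondents halves the margin of error, consistent with the 1/√n scaling.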

## Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

```python
value = 75
mean = 70
std = 5

# z-score: how many standard deviations the value lies from the mean
z = (value - mean) / std
print(f"z-score: {z}")
print(f"Interpretation: the data point is {z} standard deviation(s) above the mean.")
```
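If the population is assumed to be normally distributed (the question does not say so), the z-score can be translated into a percentile with the standard normal CDF:

```python
import scipy.stats as stats

z = (75 - 70) / 5  # z-score from above
# Share of values below 75, assuming a normal population
percentile = stats.norm.cdf(z)
print(f"About {percentile:.1%} of values lie below 75")
```

So a z-score of 1 places the value at roughly the 84th percentile under that assumption.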

## Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

```python
import numpy as np
import scipy.stats as stats

sample_mean = 6
population_mean = 0  # H0: mean weight loss = 0; H1: mean weight loss > 0
std_dev = 2.5
n = 50
alpha = 0.05  # 95% confidence level

t_stat = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
# Right-tailed p-value; sf (= 1 - cdf) stays accurate for very large t
p_value = stats.t.sf(t_stat, df=n - 1)
print(f"t-statistic: {t_stat}, p-value: {p_value}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
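The same decision can be reached with the critical-value approach instead of a p-value: for a right-tailed test, reject H0 if the t-statistic exceeds the (1 - α) quantile of the t distribution with n - 1 degrees of freedom. A minimal sketch using the figures from this question:

```python
import numpy as np
import scipy.stats as stats

sample_mean, mu0, s, n, alpha = 6, 0, 2.5, 50, 0.05
t_stat = (sample_mean - mu0) / (s / np.sqrt(n))

# One-tailed (right) critical value
t_crit = stats.t.ppf(1 - alpha, df=n - 1)
print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}")
print("Reject H0" if t_stat > t_crit else "Fail to reject H0")
```

Both approaches always agree; the p-value additionally shows how strong the evidence is.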

## Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

```python
import scipy.stats as stats
import numpy as np

p_hat = 0.65  # sample proportion satisfied
n = 500
confidence = 0.95
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96
# Normal-approximation standard error of a proportion
margin_error = z * np.sqrt((p_hat * (1 - p_hat)) / n)
ci_lower = p_hat - margin_error
ci_upper = p_hat + margin_error
print(f"95% Confidence Interval for Proportion: ({ci_lower:.3f}, {ci_upper:.3f})")
```
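The interval above relies on the normal approximation to the binomial. A common rule of thumb (some texts use 5 instead of 10) is that both n·p̂ and n·(1 - p̂) should be at least 10; this sketch checks it for the survey figures.

```python
p_hat, n = 0.65, 500

# Expected counts of "satisfied" and "not satisfied" responses
successes = n * p_hat
failures = n * (1 - p_hat)
ok = successes >= 10 and failures >= 10
print(f"n*p_hat = {successes:.0f}, n*(1-p_hat) = {failures:.0f}, approximation OK: {ok}")
```

With counts in the hundreds, the normal approximation is comfortably justified here.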

## Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

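```python
import numpy as np
import scipy.stats as stats

# Sample sizes are not given in the question; n = 30 per group is an assumption.
mean1, std1, n1 = 85, 6, 30
mean2, std2, n2 = 82, 5, 30
alpha = 0.01

# Welch's t-test: standard error without assuming equal variances
v1, v2 = std1**2 / n1, std2**2 / n2
se = np.sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se
# Welch-Satterthwaite degrees of freedom
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p_value = 2 * stats.t.sf(abs(t_stat), df=df)  # two-tailed test
print(f"t-statistic: {t_stat:.4f}, df: {df:.1f}, p-value: {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```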
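SciPy can run the same two-sample test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`. The sketch keeps the assumed sample sizes of 30 per group and sets `equal_var=False` to request Welch's test.

```python
import scipy.stats as stats

# n = 30 per group is an assumption (sample sizes are not given)
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=False,  # Welch's t-test: does not assume equal variances
)
print(f"t-statistic: {result.statistic:.4f}, p-value: {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < 0.01 else "Fail to reject H0")
```

With a p-value of roughly 0.04, the difference would be significant at α = 0.05 but not at the stricter α = 0.01 used here.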

## Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

```python
import scipy.stats as stats
import numpy as np

sample_mean = 65
population_std = 8  # population SD is known, so a z-interval is appropriate
n = 50
confidence = 0.90
# The population mean of 60 is not needed to build the interval.

z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
margin_error = z * (population_std / np.sqrt(n))
ci_lower = sample_mean - margin_error
ci_upper = sample_mean + margin_error
print(f"90% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
```

## Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

```python
import numpy as np
import scipy.stats as stats

sample_mean = 0.25
population_mean = 0.3  # Assumed population mean reaction time (not given in the question)
std_dev = 0.05
n = 30
alpha = 0.10  # 90% confidence level

# H0: mu = 0.3 vs H1: mu != 0.3 (the question asks about any effect, so two-tailed)
t_stat = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"t-statistic: {t_stat}, p-value: {p_value}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```