In [1]:
# Question 1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.
# Answer:
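# - The choice depends mainly on whether the population standard deviation (σ) is known.
# - A t-test is used when σ is unknown and has to be estimated from the sample, which matters most for small samples (typically n < 30). The test statistic follows a t-distribution with n - 1 degrees of freedom.
# - A z-test is used when σ is known, or when the sample is large (n >= 30) so that the normal approximation is reasonable. The test statistic follows the standard normal distribution.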
# - Example for t-test: Comparing the means of two small groups, such as testing the effectiveness of a new drug with a small sample size.
# - Example for z-test: Comparing the means of two large groups, such as comparing the average heights of men and women in a large population.
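
# The contrast above can be sketched with one set of summary statistics run through both tests.
# The numbers below are hypothetical, chosen only for illustration. The only difference is
# whether the standard deviation is treated as known (z) or estimated from the sample (t).

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics (not from the questions above)
sample_mean = 52.0
hypothesized_mean = 50.0
std_dev = 5.0
n = 12

standard_error = std_dev / np.sqrt(n)
test_statistic = (sample_mean - hypothesized_mean) / standard_error

# z-test: std_dev is treated as the known population standard deviation
p_value_z = 2 * stats.norm.sf(abs(test_statistic))

# t-test: std_dev is treated as a sample estimate, so use n - 1 degrees of freedom
p_value_t = 2 * stats.t.sf(abs(test_statistic), df=n - 1)

print(test_statistic, p_value_z, p_value_t)
```

# With a small sample the t-distribution has heavier tails, so the t-test p-value is the larger of the two.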


In [2]:
# Question 2: Differentiate between one-tailed and two-tailed tests.
# Answer:
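# - A one-tailed test is used when the alternative hypothesis specifies a direction chosen in advance: the sample mean is significantly greater than the population mean, or significantly less than it, but not both. The entire significance level (alpha) sits in that one tail.
# - A two-tailed test is used when we are interested in testing if a sample mean is significantly different (either higher or lower) from a population mean. Alpha is split equally between the two tails.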
# - Example of one-tailed test: Testing if a new teaching method leads to higher scores than the traditional method.
# - Example of two-tailed test: Testing if a new drug has a different effect (either positive or negative) than a placebo.
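
# A minimal sketch of how the same test statistic yields different p-values depending on the tail choice.
# The t-statistic and degrees of freedom are hypothetical values picked for illustration.

```python
from scipy import stats

t_statistic = 2.0  # hypothetical test statistic
df = 29            # hypothetical degrees of freedom (n = 30)

# Two-tailed: H1 says the mean differs in either direction
p_two_tailed = 2 * stats.t.sf(abs(t_statistic), df=df)

# One-tailed (upper): H1 says the mean is greater
p_greater = stats.t.sf(t_statistic, df=df)

# One-tailed (lower): H1 says the mean is smaller
p_less = stats.t.cdf(t_statistic, df=df)

print(p_two_tailed, p_greater, p_less)
```

# At alpha = 0.05 this statistic is significant for the upper one-tailed test but not for the two-tailed test,
# which is why the direction has to be fixed before looking at the data.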


In [3]:
# Question 3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.
# Answer:
# - Type 1 Error (False Positive): Rejecting the null hypothesis when it is actually true. The probability of making this error is the significance level (alpha).
#   - Example: Concluding that a new drug is effective when it actually is not.
# - Type 2 Error (False Negative): Failing to reject the null hypothesis when it is actually false. The probability of making this error is beta; the power of the test is 1 - beta.
#   - Example: Concluding that a new drug is not effective when it actually is.
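
# Both error rates can be estimated by simulation. The sketch below assumes normally distributed data,
# a sample size of 30, and a true effect of 0.5 standard deviations for the "H0 is false" case.
# All of these settings are illustrative assumptions, not values from the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n = 30
n_simulations = 2000

# Case 1: H0 is true (the population mean really is 0).
# Every rejection here is a Type 1 error.
null_data = rng.normal(loc=0.0, scale=1.0, size=(n_simulations, n))
null_p_values = stats.ttest_1samp(null_data, popmean=0.0, axis=1).pvalue
type_1_rate = np.mean(null_p_values < alpha)

# Case 2: H0 is false (the population mean is 0.5, an assumed effect size).
# Every failure to reject here is a Type 2 error.
alt_data = rng.normal(loc=0.5, scale=1.0, size=(n_simulations, n))
alt_p_values = stats.ttest_1samp(alt_data, popmean=0.0, axis=1).pvalue
type_2_rate = np.mean(alt_p_values >= alpha)

print(type_1_rate, type_2_rate, 1 - type_2_rate)
```

# The Type 1 rate should land close to alpha, while the Type 2 rate depends on the effect size and the sample size.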


In [4]:
# Question 4: Explain Bayes's theorem with an example.
# Answer:
# - Bayes's Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.
# - Formula: P(A|B) = [P(B|A) * P(A)] / P(B)
# - Example: Suppose you want to calculate the probability that a person has a disease given that they tested positive. You would use Bayes's Theorem to update the probability based on the test's accuracy and the prior probability of having the disease.


In [5]:
# Question 5: What is a confidence interval? How to calculate the confidence interval, explain with an example.
# Answer:
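# - A confidence interval is a range of values, computed from sample data, that is expected to contain the true population parameter with a stated level of confidence (e.g., 95%). Under repeated sampling, about that share of such intervals would contain the true value.
# - Formula for confidence interval for a mean (population standard deviation σ known): CI = x̄ ± Z * (σ/√n)
# - Example: If the sample mean is 50, the standard deviation is 5 (treated here as the known σ), and the sample size is 30, the 95% confidence interval would be calculated as follows.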

import scipy.stats as stats
import numpy as np

mean = 50
std_dev = 5
n = 30
confidence_level = 0.95

# Calculate the z-score for the desired confidence level
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_score * (std_dev / np.sqrt(n))

# Calculate the confidence interval
confidence_interval = (mean - margin_of_error, mean + margin_of_error)
confidence_interval


(48.210805856282846, 51.789194143717154)
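
# If the standard deviation of 5 were instead an estimate from the sample, a t-based interval would be the usual choice.
# The sketch below reuses the same numbers under that assumption and compares the two margins of error.

```python
import numpy as np
from scipy import stats

mean = 50
std_dev = 5  # assumed here to be the sample standard deviation
n = 30
confidence_level = 0.95

standard_error = std_dev / np.sqrt(n)

# Critical values for the same confidence level
z_critical = stats.norm.ppf((1 + confidence_level) / 2)
t_critical = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)

z_margin = z_critical * standard_error
t_margin = t_critical * standard_error

t_interval = (mean - t_margin, mean + t_margin)
print(t_critical, t_interval)
```

# The t critical value is slightly larger than the z critical value, so the t-based interval is slightly wider.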

In [6]:
# Question 6: Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.
# Answer:
# Suppose there is a 1% chance that a person has a certain disease. A test for the disease is 99% accurate (both sensitivity and specificity). 
# If a person tests positive, what is the probability that they actually have the disease?

# Given:
# P(Disease) = 0.01 (Prior probability of having the disease)
# P(No Disease) = 0.99
# P(Test Positive | Disease) = 0.99 (Sensitivity)
# P(Test Positive | No Disease) = 0.01 (1 - Specificity)

# Applying Bayes' Theorem:
P_Disease = 0.01
P_No_Disease = 0.99
P_Positive_given_Disease = 0.99
P_Positive_given_No_Disease = 0.01

# P(Disease | Test Positive) = [P(Test Positive | Disease) * P(Disease)] / [P(Test Positive | Disease) * P(Disease) + P(Test Positive | No Disease) * P(No Disease)]
P_Disease_given_Positive = (P_Positive_given_Disease * P_Disease) / (
    P_Positive_given_Disease * P_Disease + P_Positive_given_No_Disease * P_No_Disease
)
P_Disease_given_Positive



0.5
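
# The same calculation can be wrapped in a small helper to see how the result depends on the prior.
# The function name and the second prior (10%) are illustrative assumptions.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(Disease | Test Positive) using Bayes' Theorem."""
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)

# Same inputs as the worked problem: 1% prevalence, 99% sensitivity and specificity
rare_disease = posterior_given_positive(prior=0.01, sensitivity=0.99, specificity=0.99)

# Hypothetical higher prevalence of 10% with the same test
common_disease = posterior_given_positive(prior=0.10, sensitivity=0.99, specificity=0.99)

print(rare_disease, common_disease)
```

# With the same test, a higher prior probability gives a much higher posterior, which is why prevalence matters when reading a positive result.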

In [7]:
# Question 7: Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.
# Answer:
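# This is the same calculation as in Question 5. The question gives no sample size, so n = 30 is assumed as before, which gives an interval of roughly (48.21, 51.79).
# Interpretation: "We are 95% confident that the true population mean lies within this interval."
# More precisely, about 95% of intervals constructed this way from repeated samples would contain the true mean.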


In [8]:
# Question 8: What is the margin of error in a confidence interval? How does sample size affect the margin of error?
# Answer:
# - The margin of error is the half-width of the confidence interval: the amount added to and subtracted from the sample estimate, e.g. Z * (σ/√n) for a mean.
# - It decreases as the sample size increases because the standard error (σ/√n) shrinks; quadrupling n halves the margin of error.
# - Example: A larger sample size in a survey will reduce the margin of error, leading to a narrower confidence interval.
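
# A short sketch of the relationship between sample size and margin of error.
# The standard deviation of 5 and the sample sizes are illustrative assumptions.

```python
import numpy as np
from scipy import stats

sigma = 5.0                       # assumed known population standard deviation
z_critical = stats.norm.ppf(0.975)  # 95% confidence
sample_sizes = [30, 120, 480]     # each step quadruples n

margins = [z_critical * sigma / np.sqrt(n) for n in sample_sizes]

for n, margin in zip(sample_sizes, margins):
    print(f"n = {n:>3}: margin of error = {margin:.3f}")
```

# Each fourfold increase in n cuts the margin of error in half, because the margin scales with 1/√n.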


In [9]:
# Question 9: Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.
# Answer:
# Formula: Z = (X - μ) / σ
x = 75
population_mean = 70
population_std_dev = 5

z_score = (x - population_mean) / population_std_dev
z_score

# Interpretation: A z-score of 1 indicates that the data point is 1 standard deviation above the mean.



1.0
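
# If the population is assumed to be normally distributed (an assumption the question does not state),
# the z-score can also be converted into a percentile with the standard normal CDF.

```python
from scipy import stats

x = 75
population_mean = 70
population_std_dev = 5

z_score = (x - population_mean) / population_std_dev

# Share of a normal population that falls below this value
percentile = stats.norm.cdf(z_score)

print(z_score, percentile)
```

# Under that assumption, a value one standard deviation above the mean is higher than roughly 84% of the population.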

In [10]:
# Question 10: In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.
# Answer:
# Null Hypothesis (H0): The true mean weight loss is 0 (μ = 0).
# Alternative Hypothesis (H1): The true mean weight loss differs from 0 (μ ≠ 0).
# A 95% confidence level corresponds to a significance level of alpha = 0.05.

# Use a one-sample t-test:
sample_mean = 6
sample_std_dev = 2.5
n = 50
null_hypothesis_mean = 0

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis_mean) / (sample_std_dev / np.sqrt(n))

# Calculate the p-value
p_value = stats.t.sf(np.abs(t_statistic), df=n-1) * 2  # Two-tailed test

t_statistic, p_value

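# If p-value < 0.05, reject the null hypothesis and conclude that the drug is significantly effective.
# Here the p-value (about 3.7e-22) is far below 0.05, so H0 is rejected: the mean weight loss is statistically significant.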



(16.970562748477143, 3.7168840835270203e-22)
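
# Because "effective" implies a direction (mean weight loss greater than 0), a one-tailed version of the test
# is also reasonable. The sketch below is that variant, shown with both the p-value and the critical-value approach.
# Choosing H1: μ > 0 is an assumption about the intent of the question.

```python
import numpy as np
from scipy import stats

sample_mean = 6
sample_std_dev = 2.5
n = 50
null_hypothesis_mean = 0
alpha = 0.05

t_statistic = (sample_mean - null_hypothesis_mean) / (sample_std_dev / np.sqrt(n))

# One-tailed p-value for H1: mean weight loss > 0
p_one_tailed = stats.t.sf(t_statistic, df=n - 1)

# Equivalent decision rule: compare against the upper-tail critical value
t_critical = stats.t.ppf(1 - alpha, df=n - 1)
reject_null = t_statistic > t_critical

print(t_statistic, t_critical, p_one_tailed, reject_null)
```

# The conclusion matches the two-tailed test: the statistic is far beyond the critical value, so H0 is rejected.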

In [11]:
# Question 11: In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.
# Answer:
n = 500
p_hat = 0.65
z_score = stats.norm.ppf(0.975)  # 95% confidence

# Calculate standard error
std_error = np.sqrt((p_hat * (1 - p_hat)) / n)

# Calculate confidence interval
confidence_interval = (p_hat - z_score * std_error, p_hat + z_score * std_error)
confidence_interval



(0.6081925393809212, 0.6918074606190788)
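
# The same interval can be wrapped in a reusable helper that also checks the usual rule of thumb for the
# normal approximation (at least 10 expected successes and 10 expected failures). The function name and the
# threshold of 10 are assumptions based on common convention, not part of the question.

```python
import numpy as np
from scipy import stats

def proportion_confidence_interval(p_hat, n, confidence_level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    # Rule-of-thumb check for the normal approximation
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("Sample too small for the normal approximation")
    z_critical = stats.norm.ppf((1 + confidence_level) / 2)
    std_error = np.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_critical * std_error
    return (p_hat - margin, p_hat + margin)

# Survey from the question: 500 people, 65% satisfied
interval_95 = proportion_confidence_interval(0.65, 500, confidence_level=0.95)
interval_99 = proportion_confidence_interval(0.65, 500, confidence_level=0.99)

print(interval_95, interval_99)
```

# Raising the confidence level from 95% to 99% widens the interval, since a larger critical value is needed.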

In [12]:
# Question 12: A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.
# Answer:
# Null Hypothesis (H0): The two teaching methods have the same mean score (μ_A = μ_B).
# Alternative Hypothesis (H1): The two teaching methods have different mean scores (μ_A ≠ μ_B).
# Note: The question does not state the sample sizes, so n = 30 per group is assumed below.

mean_A = 85
std_dev_A = 6
n_A = 30

mean_B = 82
std_dev_B = 5
n_B = 30

# Calculate the t-statistic for independent samples
# (ttest_ind_from_stats assumes equal variances by default, i.e. a pooled-variance t-test)
t_statistic, p_value = stats.ttest_ind_from_stats(
    mean1=mean_A, std1=std_dev_A, nobs1=n_A,
    mean2=mean_B, std2=std_dev_B, nobs2=n_B,
)

t_statistic, p_value

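# If p-value < 0.01, reject the null hypothesis and conclude that there is a significant difference between the two teaching methods.
# Here the p-value (about 0.0397) is greater than 0.01, so we fail to reject H0: the difference is not significant at the 0.01 level.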



(2.1038606199548298, 0.03973697161571063)
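
# If the two groups cannot be assumed to share the same variance, Welch's t-test is a common alternative.
# The sketch below reruns the comparison with equal_var=False, keeping the assumed sample size of 30 per group.

```python
from scipy import stats

mean_A, std_dev_A, n_A = 85, 6, 30  # n_A is an assumption, as in the cell above
mean_B, std_dev_B, n_B = 82, 5, 30  # n_B is an assumption, as in the cell above
alpha = 0.01

# Welch's t-test: does not pool the two variances
welch_t, welch_p = stats.ttest_ind_from_stats(
    mean1=mean_A, std1=std_dev_A, nobs1=n_A,
    mean2=mean_B, std2=std_dev_B, nobs2=n_B,
    equal_var=False,
)

reject_null = welch_p < alpha
print(welch_t, welch_p, reject_null)
```

# With equal group sizes the Welch statistic equals the pooled one; only the degrees of freedom change,
# and the decision at the 0.01 level stays the same.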

In [13]:
# Question 13: A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.
# Answer:
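population_mean = 60  # stated in the question, but not needed to build the interval
population_std_dev = 8  # known population standard deviation, so a z-interval applies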
sample_size = 50
sample_mean = 65

# Calculate the z-score for 90% confidence level
z_score = stats.norm.ppf(0.95)

# Calculate the margin of error
margin_of_error = z_score * (population_std_dev / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
confidence_interval


(63.13906055411732, 66.86093944588268)

In [14]:
# Question 14: In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.
# Answer:
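# The question gives no baseline reaction time, so a reference mean of 0.3 seconds is assumed here.
# Null Hypothesis (H0): The true mean reaction time equals the assumed baseline (μ = 0.3 seconds).
# Alternative Hypothesis (H1): The true mean reaction time differs from 0.3 seconds (μ ≠ 0.3).
# A 90% confidence level corresponds to a significance level of alpha = 0.10.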

# Use a one-sample t-test:
sample_mean = 0.25
sample_std_dev = 0.05
n = 30
null_hypothesis_mean = 0.3

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis_mean) / (sample_std_dev / np.sqrt(n))

# Calculate the p-value
p_value = stats.t.sf(np.abs(t_statistic), df=n-1) * 2  # Two-tailed test

t_statistic, p_value

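# If p-value < 0.1, reject the null hypothesis and conclude that caffeine has a significant effect on reaction time.
# Here the p-value (about 6.7e-06) is far below 0.1, so H0 is rejected. The negative t-statistic means the sample mean is below the assumed 0.3-second baseline.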

(-5.47722557505166, 6.739145346941606e-06)
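
# The same decision can be read from a 90% confidence interval for the mean reaction time.
# This sketch uses the 0.3-second baseline assumed in the cell above; if that value falls outside
# the interval, the two-tailed test at alpha = 0.10 rejects H0.

```python
import numpy as np
from scipy import stats

sample_mean = 0.25
sample_std_dev = 0.05
n = 30
assumed_baseline = 0.3  # assumption, not given in the question
confidence_level = 0.90

# t critical value for a two-sided 90% interval
t_critical = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
margin_of_error = t_critical * sample_std_dev / np.sqrt(n)

confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
baseline_inside = confidence_interval[0] <= assumed_baseline <= confidence_interval[1]

print(confidence_interval, baseline_inside)
```

# The baseline lies above the upper limit of the interval, which agrees with rejecting H0 in the t-test.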