Q1: Estimation in statistics means inferring an unknown population parameter from sample statistics. A point estimate is a single value that serves as the best guess for the parameter (for example, the sample mean as an estimate of the population mean), while an interval estimate gives a range of values, with an associated confidence level, within which the parameter is likely to lie.
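
As a rough illustration of the two kinds of estimate, the sketch below computes a point estimate and an approximate 95% interval estimate of a mean. The sample values are made up for illustration, and using the normal critical value is an assumption that suits larger samples; for a sample this small a t critical value would give a slightly wider interval.

```python
import math
import statistics

# Hypothetical sample of eight measurements (illustrative values only).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
n = len(sample)

# Point estimate: a single best guess for the population mean.
point_estimate = statistics.mean(sample)

# Estimated standard error of the mean, from the sample standard deviation.
std_error = statistics.stdev(sample) / math.sqrt(n)

# Interval estimate: point estimate plus or minus a margin of error.
# Assumption: normal critical value for 95% confidence (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
lower = point_estimate - z * std_error
upper = point_estimate + z * std_error

print(f"Point estimate: {point_estimate:.3f}")
print(f"Approximate 95% interval estimate: ({lower:.3f}, {upper:.3f})")
```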

Q2: Hypothesis testing is a statistical method used to make inferences about population parameters based on sample data. It involves formulating null and alternative hypotheses, collecting sample data, and making a decision about the null hypothesis using statistical tests.

Q3: A Type I error (false positive) occurs when the null hypothesis is rejected even though it is actually true. A Type II error (false negative) occurs when the null hypothesis is not rejected even though it is actually false. In a criminal trial where the null hypothesis is innocence, a Type I error is convicting an innocent person (rejecting the null hypothesis of innocence) on evidence that was misleading. A Type II error is failing to convict a guilty person (failing to reject the null hypothesis of innocence) because the evidence was not strong enough.
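
The two error rates can be estimated by simulation. The sketch below is a minimal illustration under assumed settings that do not come from the text above: a two-sided z-test of H0: mean = 0 with known standard deviation 1, samples of size 20, a significance level of 0.05, and a true mean of 0.5 for the Type II case. The helper name `rejects_null` is hypothetical.

```python
import math
import random
import statistics

def rejects_null(true_mean, n=20, sigma=1.0, alpha=0.05, rng=random):
    """Simulate one sample and run a two-sided z-test of H0: mean = 0."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = statistics.mean(sample) / (sigma / math.sqrt(n))
    critical = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000

# Type I error: H0 is true (mean really is 0) but the test rejects it.
type1_rate = sum(rejects_null(0.0, rng=rng) for _ in range(trials)) / trials

# Type II error: H0 is false (mean is 0.5) but the test fails to reject it.
type2_rate = sum(not rejects_null(0.5, rng=rng) for _ in range(trials)) / trials

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The estimated Type I rate should land close to the chosen significance level, while the Type II rate depends on the sample size and on how far the true mean is from the null value.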

Q4: Bayes' theorem is a fundamental theorem in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is used to update the probability of an event when new evidence becomes available. An example is computing the probability that a patient has a disease given a positive diagnostic test result, which depends on both the accuracy of the test and how common the disease is.

Q5: A confidence interval is a range of values, calculated from sample statistics, that is constructed to contain the true population parameter with a stated level of confidence. Its width measures the precision of the estimate: a narrower interval means a more precise estimate. For example, a 95% confidence interval for the population mean salary of employees in a company might be $40,000 to $50,000, indicating that we are 95% confident that the true mean salary falls within this range, in the sense that intervals built this way capture the true mean in about 95% of repeated samples.
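
The repeated-sampling meaning of the confidence level can be checked with a small simulation. This is only a sketch under assumed values: the population mean, standard deviation, and sample size below are hypothetical, and the standard deviation is treated as known so that a z-interval applies.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population values (not taken from any real data).
true_mean, sigma, n = 45000.0, 8000.0, 40

# Margin of error of a 95% z-interval with known sigma.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * sigma / math.sqrt(n)

trials = 2000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    sample_mean = statistics.mean(sample)
    # Count the intervals that contain the true mean.
    if sample_mean - margin <= true_mean <= sample_mean + margin:
        hits += 1

coverage = hits / trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true mean or it does not; the 95% figure describes how often the procedure succeeds across many samples.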

Q6: Bayes' theorem updates the probability of an event A in light of new evidence B. The formula for Bayes' theorem is: P(A|B) = (P(B|A) * P(A)) / P(B), where P(A) is the prior probability of A, P(B|A) is the likelihood of the evidence given A, P(B) is the overall probability of the evidence, and P(A|B) is the posterior probability of A given the evidence.
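
A minimal sketch of the formula applied to a diagnostic-test scenario follows. The prevalence, sensitivity, and false positive rate are hypothetical numbers chosen for illustration, and the function name `posterior` is likewise an assumption. P(B) is expanded with the law of total probability, P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) using Bayes' theorem.

    prior               -- P(A)
    likelihood          -- P(B|A)
    false_positive_rate -- P(B|not A)
    """
    # Law of total probability gives the denominator P(B).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical values: 1% of people have the disease, the test detects
# 95% of true cases, and 10% of healthy people test positive anyway.
p_disease = 0.01
sensitivity = 0.95
false_positive = 0.10

p_disease_given_positive = posterior(p_disease, sensitivity, false_positive)
print(f"P(disease | positive test) = {p_disease_given_positive:.4f}")
```

With these assumed numbers the posterior stays below 10% even though the test is fairly accurate, because the disease is rare and false positives outnumber true positives.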

Q7: The z-score measures how many standard deviations a data point is from the mean of a distribution: z = (x - mean) / standard deviation. A positive z-score means the data point is above the mean, a negative one means it is below, and the magnitude says by how much. The z-score is important because it puts values from different distributions on a common scale, so their relative positions can be compared directly.
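
As a small illustration of comparing values from different distributions, the sketch below standardizes two exam scores. The scores, means, and standard deviations are hypothetical, as is the helper name `z_score`.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical exam results on two differently scaled tests.
math_z = z_score(85, mean=70, std_dev=10)
science_z = z_score(60, mean=50, std_dev=5)

print(f"Math z-score:    {math_z:.2f}")
print(f"Science z-score: {science_z:.2f}")
```

Although the raw math score is higher, the science score sits further above its own mean in standard-deviation units, so it is the stronger result relative to its distribution.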

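Q8: The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent draws from a population with finite variance. The mean of that sampling distribution equals the population mean, and its standard deviation equals the population standard deviation divided by the square root of the sample size. This theorem is important because it allows us to make inferences about population parameters based on sample statistics, even when the population distribution is unknown or non-normal.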
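
The theorem can be illustrated by simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1); the sample size and number of replications are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

n = 50               # size of each sample
replications = 2000  # number of samples drawn

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

center = statistics.mean(sample_means)   # should be near the population mean, 1
spread = statistics.stdev(sample_means)  # should be near 1 / sqrt(n)
expected_spread = 1.0 / n ** 0.5

print(f"Mean of sample means:    {center:.3f}")
print(f"Std dev of sample means: {spread:.3f} (theory: {expected_spread:.3f})")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is far from normal.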

Q9: A confidence interval is a range of values, calculated from sample statistics, that is constructed to contain the true population parameter with a stated level of confidence. Its purpose is to express the uncertainty in a sample estimate: instead of reporting a single number, it shows how precise and reliable that estimate is, and it is used to make inferences about the population parameter.

In [1]:
import numpy as np
import scipy.stats as stats

# Question 10
# Conduct a hypothesis test to determine if the drug is significantly effective at the 0.05 significance level (95% confidence) using a one-sample t-test.

# Given data
sample_mean = 6
sample_std_dev = 2.5
n = 50
pop_mean = 0  # Null hypothesis: The drug has no effect (mean difference is zero)

# Calculate t-score
t_score = (sample_mean - pop_mean) / (sample_std_dev / np.sqrt(n))

# Calculate p-value
p_value = stats.t.sf(np.abs(t_score), n - 1) * 2  # two-tailed test

# Compare p-value with significance level (0.05)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")


Reject the null hypothesis. The drug is significantly effective.


In [2]:
# Question 11
# Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

# Given data
n = 500
p_hat = 0.65  # Sample proportion

# Calculate standard error
standard_error = np.sqrt((p_hat * (1 - p_hat)) / n)

# Calculate margin of error
margin_of_error = 1.96 * standard_error  # Z-score for 95% confidence level

# Calculate confidence interval
lower_bound = p_hat - margin_of_error
upper_bound = p_hat + margin_of_error

print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")

95% Confidence Interval: (0.6082, 0.6918)


In [3]:
# Question 12
# Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

# Given data
mean_A = 85
std_dev_A = 6
n_A = 30
mean_B = 82
std_dev_B = 5
n_B = 30

# Calculate pooled standard deviation
pooled_std_dev = np.sqrt(((n_A - 1) * std_dev_A ** 2 + (n_B - 1) * std_dev_B ** 2) / (n_A + n_B - 2))

# Calculate t-score
t_score = (mean_A - mean_B) / (pooled_std_dev * np.sqrt(1 / n_A + 1 / n_B))

# Compare t-score with critical value
alpha = 0.01
critical_value = stats.t.ppf(1 - alpha / 2, n_A + n_B - 2)  # two-tailed test
if np.abs(t_score) > critical_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.")

Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.


In [4]:
# Question 13
# Calculate the 90% confidence interval for the true population mean.

# Given data
sample_mean = 60  # the interval is centered on the sample mean
pop_std_dev = 8   # treated as the known population standard deviation, so a z-interval is used
n = 50
alpha = 0.10  # 90% confidence level

# Calculate standard error
standard_error = pop_std_dev / np.sqrt(n)

# Calculate margin of error
margin_of_error = stats.norm.ppf(1 - alpha / 2) * standard_error

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"90% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")


90% Confidence Interval: (58.1391, 61.8609)


In [5]:
# Question 14
# Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at the 0.10 significance level (90% confidence) using a one-sample t-test.

# Given data
sample_mean = 0.25
sample_std_dev = 0.05
n = 30
pop_mean = 0  # Null hypothesis: Caffeine has no effect on reaction time

# Calculate t-score
t_score = (sample_mean - pop_mean) / (sample_std_dev / np.sqrt(n))

# Calculate p-value
p_value = stats.t.sf(np.abs(t_score), n - 1) * 2  # two-tailed test, since "has an effect" is non-directional

# Compare p-value with significance level (0.10)
alpha = 0.10
if p_value < alpha:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. Caffeine does not have a significant effect on reaction time.")

Reject the null hypothesis. Caffeine has a significant effect on reaction time.
