Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

A t-test and a z-test are both statistical tests used to make inferences about population parameters based on sample data, but they are used under different circumstances.

A z-test is used when the population standard deviation is known, or when the sample is large enough (typically more than 30) that the sample standard deviation is a reliable stand-in for it. For example, a researcher wants to test whether the average height of all men in the United States is 70 inches. They take a random sample of 100 men and find an average height of 71 inches with a sample standard deviation of 2 inches. Because the sample is large, a z-test can be used to determine whether the difference between the sample mean and the hypothesized population mean is statistically significant.

A t-test is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small, typically less than 30. For example, a researcher wants to test whether a new medication is effective in reducing cholesterol levels. They randomly select 20 individuals and give them the medication for a month. They measure each person's cholesterol level before and after the treatment and want to determine whether there is a significant difference. Since the sample size is small and the population standard deviation is unknown, a paired t-test on the before-and-after differences can be used to make this determination.
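The contrast between the two tests can be sketched in a few lines. This is a minimal illustration, assuming SciPy is available; the sample values, the hypothesized mean `mu_0`, and the "known" population standard deviation `known_sigma` are all hypothetical and not taken from the examples above.

```python
import math
from scipy import stats

# Hypothetical sample of 10 measurements (illustrative only)
sample = [68.5, 71.2, 69.8, 72.4, 70.1, 73.0, 69.5, 71.8, 70.6, 72.1]
mu_0 = 70.0          # hypothesized population mean (assumed)
known_sigma = 2.0    # population SD assumed known, as the z-test requires

n = len(sample)
sample_mean = sum(sample) / n

# z-test: uses the known population SD and the standard normal distribution
z_stat = (sample_mean - mu_0) / (known_sigma / math.sqrt(n))
z_p_value = 2 * stats.norm.sf(abs(z_stat))

# t-test: estimates the SD from the sample and uses Student's t with n - 1 df
t_result = stats.ttest_1samp(sample, popmean=mu_0)
t_stat, t_p_value = t_result.statistic, t_result.pvalue

print(f"z = {z_stat:.3f}, p = {z_p_value:.4f}")
print(f"t = {t_stat:.3f}, p = {t_p_value:.4f}")
```

The only differences are where the standard deviation comes from and which reference distribution supplies the p-value.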

Q2: Differentiate between one-tailed and two-tailed tests.

In hypothesis testing, a one-tailed test is a statistical test in which the alternative hypothesis specifies the direction of the difference between the true population parameter and its hypothesized value. In other words, the alternative hypothesis only considers one direction of the difference. For example, a one-tailed test could be used to test whether a new drug is better than an existing drug, with the alternative hypothesis being that the new drug has a higher efficacy than the existing drug. The null hypothesis, in this case, would be that there is no difference or that the existing drug is better.

On the other hand, a two-tailed test is a statistical test in which the alternative hypothesis considers both directions of the difference between the true population parameter and its hypothesized value. For example, a two-tailed test could be used to test whether a coin is fair, with the alternative hypothesis being that the coin is not fair, that is, biased towards either heads or tails. The null hypothesis, in this case, would be that the coin is fair.
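The practical consequence shows up in the p-value. The sketch below, assuming SciPy, computes both p-values for the same test statistic; the values of `t_stat` and `df` are hypothetical and chosen only for illustration.

```python
from scipy import stats

# Hypothetical test statistic and degrees of freedom (illustrative only)
t_stat = 1.8
df = 24

# One-tailed (right-tailed) p-value: H1 says the parameter is greater
p_one_tailed = stats.t.sf(t_stat, df)

# Two-tailed p-value: H1 says the parameter differs in either direction
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)

alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

With these illustrative numbers the same statistic is significant at the 0.05 level in the one-tailed test but not in the two-tailed test, which is why the direction of the alternative hypothesis should be fixed before looking at the data.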



Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

In hypothesis testing, a Type 1 error occurs when the null hypothesis is rejected even though it is actually true. This is also known as a false positive. The probability of making a Type 1 error is denoted by alpha (α), the significance level, and is typically set at 0.05, which means that when the null hypothesis is true there is a 5% chance of rejecting it. The Type 1 error rate is controlled by the choice of α; it can exceed the nominal level when an inappropriate statistical test is used, when the test's assumptions are violated, or when many tests are run without correction. For example, a clinical trial concludes that a new drug lowers blood pressure when in fact it has no effect.

A Type 2 error, on the other hand, occurs when the null hypothesis is not rejected even though it is actually false. This is also known as a false negative. The probability of making a Type 2 error is denoted by beta (β), and 1 - β is the power of the test. Type 2 errors become more likely with a small sample size, a small true effect, high variability in the data, or a very strict significance level. For example, a clinical trial fails to detect that a new drug lowers blood pressure when it actually does.
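The trade-off between the two error rates can be made concrete for a simple case. The following is a rough sketch, assuming SciPy and a right-tailed one-sample z-test with known standard deviation; the helper `type2_error_rate` and the effect size, standard deviation, and sample sizes are hypothetical.

```python
import math
from scipy import stats

def type2_error_rate(effect, sigma, n, alpha=0.05):
    """Beta for a right-tailed one-sample z-test when the true mean
    exceeds the null mean by `effect` (hypothetical helper)."""
    z_crit = stats.norm.ppf(1 - alpha)       # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma    # how far H1 moves the statistic
    return stats.norm.cdf(z_crit - shift)    # P(fail to reject | H1 is true)

# Hypothetical true effect of 1 unit with a standard deviation of 5
for n in (10, 50, 200):
    beta = type2_error_rate(effect=1.0, sigma=5.0, n=n)
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Holding α fixed at 0.05, β falls as the sample size grows, so larger samples reduce Type 2 errors without changing the Type 1 error rate.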

Q4: Explain Bayes's theorem with an example.

The theorem states that the probability of a hypothesis (H) given the evidence (E) is proportional to the probability of the evidence given the hypothesis multiplied by the prior probability of the hypothesis. This can be written as:

P(H|E) = P(E|H) * P(H) / P(E)

where P(H|E) is the posterior probability of the hypothesis given the evidence, P(E|H) is the likelihood of the evidence given the hypothesis, P(H) is the prior probability of the hypothesis, and P(E) is the marginal probability of the evidence.

To illustrate Bayes's theorem, let's consider an example of medical diagnosis. Suppose a patient is tested for a rare disease that affects 0.1% of the population, and the test has a false-positive rate of 2% and a false-negative rate of 1%. This means that if the patient has the disease, there is a 99% chance that the test will be positive, and if the patient does not have the disease, there is a 2% chance that the test will be positive. Let H be the event that the patient has the disease and E the event that the test is positive.

P(E|H) = 0.99 (the likelihood of the test being positive if the patient has the disease)
P(H) = 0.001 (the prior probability of the patient having the disease)
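P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
= 0.99 * 0.001 + 0.02 * 0.999
= 0.00099 + 0.01998
= 0.02097 (the marginal probability of the evidence)

Therefore,

P(H|E) = P(E|H) * P(H) / P(E)
= 0.00099 / 0.02097
= 0.0472 (the posterior probability of the patient having the disease given the test is positive)

Even after a positive result, the probability of disease is only about 4.7%, because the disease is so rare that false positives far outnumber true positives.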
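The same calculation can be wrapped in a small function to see how the result depends on the prior. This is a minimal sketch; the function name `posterior_positive` and its parameter names are illustrative choices, and the extra prior values in the loop are hypothetical.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem (illustrative helper)."""
    # Marginal probability of a positive test (law of total probability)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Values from the example: prior 0.001, P(E|H) = 0.99, P(E|not H) = 0.02
p = posterior_positive(prior=0.001, sensitivity=0.99, false_positive_rate=0.02)
print(f"P(disease | positive) = {p:.4f}")

# Hypothetical priors, to show how strongly the posterior depends on prevalence
for prior in (0.001, 0.01, 0.1):
    print(prior, round(posterior_positive(prior, 0.99, 0.02), 4))
```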

Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values around a sample estimate that is likely to contain the true population parameter with a certain level of confidence. It provides a measure of the precision and uncertainty associated with the estimate.

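Confidence intervals are commonly used in statistical inference and hypothesis testing. The confidence level describes the long-run behavior of the procedure: if the sampling were repeated many times, about that percentage of the resulting intervals would contain the true parameter. The most common levels of confidence are 90%, 95%, and 99%. For a mean, the interval is calculated as sample mean ± critical value × standard error, where the standard error is the sample standard deviation divided by the square root of the sample size.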
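As a worked example, the sketch below computes a 95% confidence interval for a mean from a small sample, assuming SciPy is available. The data values are hypothetical; because the population standard deviation is unknown and the sample is small, the critical value is taken from the t distribution.

```python
import math
import statistics
from scipy import stats

# Hypothetical sample (illustrative only)
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
confidence = 0.95

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)          # sample SD (n - 1 in the denominator)
std_error = sd / math.sqrt(n)

# Population SD is unknown and n is small, so use the t distribution
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
margin = t_crit * std_error

print(f"mean = {mean:.3f}")
print(f"{confidence:.0%} CI = ({mean - margin:.3f}, {mean + margin:.3f})")
```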

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

Suppose that a factory produces two types of products, A and B, with a production ratio of 3:2. Product A has a defect rate of 5%, while product B has a defect rate of 10%. A randomly selected product is found to be defective. What is the probability that it is product A?

To solve this problem using Bayes' theorem, we can define the following events:

H: the selected product is product A
not H: the selected product is product B
E: the selected product is defective

We are given the prior probability of H, which is P(H) = 0.6, and the prior probability of not H, which is P(not H) = 0.4, based on the production ratio. We also know the conditional probabilities of E given H and not H, which are P(E|H) = 0.05 and P(E|not H) = 0.1, respectively.

Using Bayes' theorem, we can calculate the posterior probability of H given E, which is the probability that the selected product is product A given that it is defective:

P(H|E) = P(E|H) * P(H) / [P(E|H) * P(H) + P(E|not H) * P(not H)]

Plugging in the values, we get:

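P(H|E) = 0.05 * 0.6 / [0.05 * 0.6 + 0.1 * 0.4]
= 0.03 / 0.07
= 0.4286

So there is about a 42.9% chance that a defective product is product A, lower than its 60% share of production because product A has the lower defect rate.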
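The calculation can be checked numerically with a short script. This is a sketch using only the figures stated in the problem; the dictionary layout is an illustrative choice that also extends to more than two product types.

```python
# Priors from the 3:2 production ratio and defect rates from the problem
priors = {"A": 0.6, "B": 0.4}
defect_rates = {"A": 0.05, "B": 0.10}

# Joint probability of (product type, defective) for each product type
joint = {k: priors[k] * defect_rates[k] for k in priors}

# Marginal probability that a randomly selected product is defective
p_defective = sum(joint.values())

# Posterior probability of each product type given a defect
posteriors = {k: joint[k] / p_defective for k in joint}

for product, prob in posteriors.items():
    print(f"P({product} | defective) = {prob:.4f}")
```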

Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [1]:
import scipy.stats as stats

sample_mean = 50
sample_std = 5
sample_size = 50  # assumed: the question does not state a sample size

# Calculate the standard error of the mean
std_error = sample_std / (sample_size ** 0.5)

# Calculate the critical value for a 95% confidence interval
crit_value = stats.norm.ppf(0.975)

# Calculate the confidence interval
lower_bound = sample_mean - crit_value * std_error
upper_bound = sample_mean + crit_value * std_error

# Print the results
print("The 95% confidence interval is: ({:.2f}, {:.2f})".format(lower_bound, upper_bound))


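The 95% confidence interval is: (48.61, 51.39)

Interpretation: we can be 95% confident that the true population mean lies between 48.61 and 51.39, in the sense that about 95% of intervals constructed this way would contain it.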


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

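The margin of error in a confidence interval is the half-width of the interval: the largest amount by which the sample estimate is expected to differ from the true population value at the chosen confidence level. It is calculated by multiplying the critical value (which depends on the level of confidence) by the standard error of the mean. Because the standard error is the standard deviation divided by the square root of the sample size, a larger sample gives a smaller margin of error. For example, in an election poll, surveying 2,000 voters instead of 500 halves the margin of error for a candidate's estimated level of support.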
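The effect of sample size can be illustrated with a short sketch, assuming SciPy and a normal critical value. The helper `margin_of_error`, the standard deviation of 10, and the sample sizes are hypothetical.

```python
import math
from scipy import stats

def margin_of_error(sd, n, confidence=0.95):
    """Margin of error for a mean, using the normal critical value
    (illustrative helper)."""
    z_crit = stats.norm.ppf((1 + confidence) / 2)
    return z_crit * sd / math.sqrt(n)

# Hypothetical survey with a standard deviation of 10
for n in (25, 100, 400):
    print(f"n = {n:3d}: margin of error = {margin_of_error(10, n):.2f}")
```

Each fourfold increase in the sample size halves the margin of error, since it scales with one over the square root of n.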

Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [2]:
data_point = 75
pop_mean = 70
pop_std = 5

z_score = (data_point - pop_mean) / pop_std

print("The z-score for the data point is:", z_score)


The z-score for the data point is: 1.0

Interpretation: the data point lies one standard deviation above the population mean.
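If the population is additionally assumed to be normally distributed (the question does not say so), the z-score can be converted into a percentile. A minimal sketch, assuming SciPy:

```python
from scipy import stats

z_score = (75 - 70) / 5   # values from the question

# Under the normality assumption: proportion of the population
# below and above this data point
below = stats.norm.cdf(z_score)
above = stats.norm.sf(z_score)

print(f"z = {z_score:.1f}: {below:.1%} of values lie below, {above:.1%} above")
```

Under that assumption, a value of 75 is higher than roughly 84% of the population.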


Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [3]:
import scipy.stats as stats

sample_mean = 6
sample_std = 2.5
sample_size = 50
pop_mean = 0  # null hypothesis: the drug is not effective
alpha = 0.05  # significance level

# Calculate the t-score
t_score = (sample_mean - pop_mean) / (sample_std / (sample_size ** 0.5))

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the critical t-value
crit_value = stats.t.ppf(1 - alpha/2, df)

# Calculate the p-value
p_value = (1 - stats.t.cdf(abs(t_score), df)) * 2

# Print the results
print("t-score:", t_score)
print("critical t-value:", crit_value)
print("p-value:", p_value)

if abs(t_score) > crit_value or p_value < alpha:
    print("Reject null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject null hypothesis. The drug is not significantly effective.")


t-score: 16.970562748477143
critical t-value: 2.009575234489209
p-value: 0.0
Reject null hypothesis. The drug is significantly effective.


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [4]:
import scipy.stats as stats
import math

sample_size = 500
sample_proportion = 0.65
alpha = 0.05  # significance level

# Calculate the standard error
se = math.sqrt(sample_proportion*(1-sample_proportion)/sample_size)

# Calculate the critical z-value
crit_value = stats.norm.ppf(1-alpha/2)

# Calculate the margin of error
margin_of_error = crit_value * se

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

# Print the results
print("Sample proportion:", sample_proportion)
print("Margin of error:", margin_of_error)
print("95% Confidence interval:", (lower_bound, upper_bound))


Sample proportion: 0.65
Margin of error: 0.04180746061907883
95% Confidence interval: (0.6081925393809212, 0.6918074606190788)


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [5]:
import math
import scipy.stats as stats

sample_A_mean = 85
sample_A_sd = 6
sample_B_mean = 82
sample_B_sd = 5
# ASSUMPTION: the question does not give sample sizes; 30 per group is assumed here
sample_A_n = 30
sample_B_n = 30
alpha = 0.01  # significance level

# Calculate the pooled standard deviation (this simple average is valid for equal sample sizes)
pooled_sd = math.sqrt(((sample_A_sd ** 2) + (sample_B_sd ** 2)) / 2)

# Calculate the t-statistic and degrees of freedom
t_stat = (sample_A_mean - sample_B_mean) / (pooled_sd * math.sqrt(1 / sample_A_n + 1 / sample_B_n))
df = sample_A_n + sample_B_n - 2

# Calculate the critical t-value (two-tailed)
crit_value = stats.t.ppf(1 - alpha/2, df)

# Calculate the p-value (two-tailed)
p_value = stats.t.sf(abs(t_stat), df) * 2

# Print the results
print("Sample A mean:", sample_A_mean)
print("Sample B mean:", sample_B_mean)
print("Pooled standard deviation:", pooled_sd)
print("t-statistic: {:.3f}".format(t_stat))
print("Degrees of freedom:", df)
print("Critical t-value: {:.3f}".format(crit_value))
print("p-value: {:.2f}".format(p_value))

if p_value < alpha:
    print("We reject the null hypothesis and conclude that the two teaching methods have a significant difference in student performance.")
else:
    print("We fail to reject the null hypothesis and conclude that there is insufficient evidence to suggest a significant difference in student performance between the two teaching methods.")


Sample A mean: 85
Sample B mean: 82
Pooled standard deviation: 5.522680508593631
t-statistic: 2.104
Degrees of freedom: 58
Critical t-value: 2.663
p-value: 0.04
We fail to reject the null hypothesis and conclude that there is insufficient evidence to suggest a significant difference in student performance between the two teaching methods.


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [6]:
import scipy.stats as stats
import math

population_mean = 60
population_sd = 8
sample_size = 50
sample_mean = 65
confidence_level = 0.90

# Calculate the standard error
std_error = population_sd / math.sqrt(sample_size)

# Calculate the margin of error
margin_error = stats.norm.ppf((1 + confidence_level) / 2) * std_error

# Calculate the confidence interval
lower_bound = sample_mean - margin_error
upper_bound = sample_mean + margin_error

# Print the results
print("Sample size:", sample_size)
print("Sample mean:", sample_mean)
print("Population mean:", population_mean)
print("Population standard deviation:", population_sd)
print("Confidence level:", confidence_level)
print("Margin of error:", margin_error)
print("Confidence interval: [{:.2f}, {:.2f}]".format(lower_bound, upper_bound))


Sample size: 50
Sample mean: 65
Population mean: 60
Population standard deviation: 8
Confidence level: 0.9
Margin of error: 1.860939445882678
Confidence interval: [63.14, 66.86]


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [7]:
import scipy.stats as stats

sample_size = 30
sample_mean = 0.25
sample_sd = 0.05
confidence_level = 0.90

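# Set up the null and alternative hypotheses
# Null hypothesis: caffeine has no effect on reaction time (mu = 0.25)
# Alternative hypothesis: caffeine reduces reaction time (mu < 0.25), a one-tailed test
# NOTE: the question gives no baseline reaction time, so the null value is assumed
# equal to the sample mean; the t-statistic is therefore 0 and the p-value is 0.5
null_hypothesis_mu = 0.25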
alpha = 1 - confidence_level

# Calculate the t-statistic and p-value
t_statistic = (sample_mean - null_hypothesis_mu) / (sample_sd / (sample_size ** 0.5))
p_value = stats.t.cdf(t_statistic, df=sample_size-1)

# Compare p-value with alpha
if p_value < alpha:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. Caffeine has no significant effect on reaction time.")


Fail to reject the null hypothesis. Caffeine has no significant effect on reaction time.
