# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

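The main difference is which distribution the test statistic follows. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample; the statistic then follows a t-distribution with n - 1 degrees of freedom, which matters most for small samples. A z-test is used when the population standard deviation is known, or when the sample is large enough (a common rule of thumb is n > 30) that the sample standard deviation is a good estimate and the t-distribution is close to the normal distribution.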

Example scenario for a t-test: Testing if a new drug's effectiveness differs from a placebo in a clinical trial with a small sample size.

Example scenario for a z-test: Comparing the mean exam scores of two large classes to see if one class performs significantly better than the other.
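
A minimal sketch of how the two statistics differ on the same data. Both have the form (mean - mu0) / standard error, but the t-test uses the sample standard deviation and the t-distribution, while the z-test uses a known population σ and the normal distribution. The data, the hypothesized mean `mu0`, and the "known" `sigma` are hypothetical values chosen for illustration.

```python
import math
import statistics

from scipy import stats

# Hypothetical small sample and hypothesized population mean
data = [52.1, 48.3, 55.0, 50.7, 53.2, 49.8, 51.5, 54.4]
mu0 = 50.0
n = len(data)
mean = statistics.mean(data)

# t-test: population sigma unknown, so use the sample standard deviation (n - 1 denominator)
s = statistics.stdev(data)
t_stat = (mean - mu0) / (s / math.sqrt(n))
p_t = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# z-test: pretend sigma is known (hypothetical value) and use the normal distribution
sigma = 2.5
z_stat = (mean - mu0) / (sigma / math.sqrt(n))
p_z = 2 * stats.norm.sf(abs(z_stat))

# scipy's one-sample t-test should reproduce the manual t computation
result = stats.ttest_1samp(data, popmean=mu0)

print(f"t = {t_stat:.3f}, p = {p_t:.4f}")
print(f"z = {z_stat:.3f}, p = {p_z:.4f}")
print(f"scipy ttest_1samp: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```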

# Q2: Differentiate between one-tailed and two-tailed tests.

In hypothesis testing:

A one-tailed test (also called a one-sided test) is used when you have a specific directional hypothesis. It assesses if a sample statistic is significantly greater than or less than a population parameter, but not both. The critical region is located entirely on one side of the distribution.

A two-tailed test is used when you have a non-directional hypothesis, and you want to assess if a sample statistic significantly differs from a population parameter in either direction (greater or less than). The critical region is split into two parts, one on each side of the distribution.

For example:

- One-tailed test: Testing if a new drug increases the average test scores of students.
- Two-tailed test: Testing if a coin is biased, leading to either more heads or more tails in coin flips compared to a fair coin.
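
A short sketch of how the choice of tails changes the critical value and the p-value for a z-test at α = 0.05. The observed statistic `z_observed = 1.8` is a hypothetical value picked to land between the two critical values.

```python
from scipy import stats

alpha = 0.05

# Critical values at the same alpha
z_crit_one_tailed = stats.norm.ppf(1 - alpha)      # all of alpha in one tail
z_crit_two_tailed = stats.norm.ppf(1 - alpha / 2)  # alpha split across both tails

# p-values for the same hypothetical observed statistic
z_observed = 1.8
p_one_tailed = stats.norm.sf(z_observed)            # H1: parameter > null value
p_two_tailed = 2 * stats.norm.sf(abs(z_observed))   # H1: parameter != null value

print(f"critical z: one-tailed {z_crit_one_tailed:.3f}, two-tailed {z_crit_two_tailed:.3f}")
print(f"p-value:    one-tailed {p_one_tailed:.4f}, two-tailed {p_two_tailed:.4f}")
```

With z = 1.8 the one-tailed test rejects the null hypothesis at α = 0.05 while the two-tailed test does not, which is why the direction of the hypothesis must be fixed before looking at the data.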

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Type I Error (False Positive): This occurs when you mistakenly reject a null hypothesis that is actually true. In other words, you conclude that there is an effect or difference when there isn't one.

Example Scenario: A medical test falsely indicates that a healthy person has a disease.

Type II Error (False Negative): This occurs when you fail to reject a null hypothesis that is actually false. In other words, you conclude that there is no effect or difference when there is one.

Example Scenario: A medical test fails to detect a disease in a person who actually has it.

These errors highlight the trade-off in hypothesis testing: for a fixed sample size, lowering the significance level α (the Type I error rate) raises β (the Type II error rate), and vice versa. Increasing the sample size is the way to reduce β without raising α. It's crucial to choose an appropriate significance level and to conduct a power analysis to balance the risks based on the specific context and the consequences of each error.
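
A minimal sketch of the α/β trade-off for a one-tailed z-test with a fixed sample size. If the true mean under the alternative is `effect`, the Type II error rate is β = Φ(z₁₋α - effect·√n/σ), where Φ is the standard normal CDF. The values of `sigma`, `n`, and `effect` are hypothetical.

```python
import math

from scipy import stats

# Hypothetical setup: one-tailed z-test, H0: mu = 0 vs H1: mu > 0
sigma = 10.0   # known population standard deviation
n = 25         # fixed sample size
effect = 4.0   # true mean under the alternative

def type_ii_error(alpha):
    """beta = P(fail to reject H0 | true mean = effect)."""
    z_crit = stats.norm.ppf(1 - alpha)
    shift = effect * math.sqrt(n) / sigma  # distance of the alternative from H0, in standard errors
    return stats.norm.cdf(z_crit - shift)

alphas = [0.10, 0.05, 0.01, 0.001]
betas = [type_ii_error(a) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:<6} beta = {b:.3f}  power = {1 - b:.3f}")
```

As α shrinks, β grows; only a larger `n` (a bigger `shift`) lowers both at once.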

# Q4: Explain Bayes's theorem with an example.

Bayes’ Theorem is named after Reverend Thomas Bayes. It is a fundamental result in probability theory, used to find the probability of an event based on prior knowledge of conditions that might be related to that event, and it follows directly from the definition of conditional probability. For example, suppose there are 3 bags, each containing some white marbles and some black marbles. A bag is chosen at random and a white marble is drawn from it. What is the probability that this white marble came from the first bag? Bayes’ Theorem answers questions like this, where the probability of one event is updated based on the occurrence of another, related event.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0$$

where:

- P(A) is the prior probability of event A, and P(B) is the probability of event B
- P(A|B) is the probability of event A given that event B has occurred (the posterior)
- P(B|A) is the probability of event B given that event A has occurred (the likelihood)
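
The three-bag question above does not say what each bag contains, so this sketch assumes hypothetical contents (2 white / 3 black, 4 white / 1 black, 3 white / 2 black) and an equal chance of picking each bag. It uses exact fractions to apply the formula term by term.

```python
from fractions import Fraction

# Hypothetical contents: (white, black) marbles in each bag
bags = {"bag 1": (2, 3), "bag 2": (4, 1), "bag 3": (3, 2)}

# Prior: each bag is equally likely to be chosen
prior = {name: Fraction(1, len(bags)) for name in bags}

# Likelihood: P(white | bag)
likelihood = {name: Fraction(w, w + b) for name, (w, b) in bags.items()}

# Evidence: P(white), by the law of total probability
p_white = sum(prior[name] * likelihood[name] for name in bags)

# Posterior: P(bag | white) = P(white | bag) * P(bag) / P(white)
posterior = {name: prior[name] * likelihood[name] / p_white for name in bags}

for name, p in posterior.items():
    print(f"P({name} | white) = {p} ≈ {float(p):.3f}")
```

Under these assumed contents, the probability that the white marble came from the first bag is 2/9.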

Examples of Bayes’ Theorem
Bayesian inference, which is derived directly from Bayes’ theorem, is widely used in medicine, science, philosophy, engineering, sports, law, and other fields. Example: in medical testing, Bayes’ theorem gives the probability that a person who tests positive actually has the disease, by combining how common the disease is (the prior) with the test's sensitivity and false-positive rate.
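
A small worked version of the medical-test example. The prevalence, sensitivity, and false-positive rate below are hypothetical numbers, not properties of any real test.

```python
# Hypothetical test characteristics
prevalence = 0.01           # P(disease)
sensitivity = 0.99          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# P(positive), by the law of total probability
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Under these assumptions only about 1 in 6 positive results is a true case, because the disease is rare and false positives from the large healthy group outnumber the true positives.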

# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values constructed around a sample statistic (such as a mean or proportion) to estimate where the true population parameter lies. The confidence level describes the procedure rather than a single interval: if the sampling were repeated many times, about 95% of the 95% intervals built this way would contain the true parameter. It quantifies the uncertainty associated with estimating a population parameter from a sample.

To calculate a confidence interval, you typically need the following information:

- The sample statistic (e.g., sample mean or proportion).
- The standard error of the sample statistic (a measure of how much it tends to vary from sample to sample).
- The desired level of confidence (often expressed as a percentage, like 95% or 99%).

The formula for calculating a confidence interval for a sample mean, for example, is:

$$\text{Confidence Interval} = \text{Sample Mean} \pm \text{Margin of Error}$$

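where the Margin of Error is the critical value times the standard error of the mean: $t^{*} \cdot s/\sqrt{n}$ when the population standard deviation is unknown (t-distribution with n - 1 degrees of freedom), or $z^{*} \cdot \sigma/\sqrt{n}$ when σ is known. The critical value depends on the desired level of confidence; for example, $z^{*} \approx 1.96$ for 95%.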
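
A worked example of the recipe above on a small, hypothetical sample of ten measurements, using the t critical value because σ is unknown. The manual bounds are checked against `scipy.stats.t.interval`, with the confidence level passed positionally.

```python
import math
import statistics

from scipy import stats

# Hypothetical sample of 10 measurements
data = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2, 11.7, 12.5]
n = len(data)
mean = statistics.mean(data)
standard_error = statistics.stdev(data) / math.sqrt(n)

confidence = 0.95
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # two-sided critical value
margin_of_error = t_crit * standard_error
lower, upper = mean - margin_of_error, mean + margin_of_error

# scipy's t.interval should give the same bounds
lower_scipy, upper_scipy = stats.t.interval(confidence, n - 1, loc=mean, scale=standard_error)

print(f"mean = {mean:.3f}, margin of error = {margin_of_error:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```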

# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

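Sample problem: a card is drawn at random from a standard 52-card deck and turns out to be a face card. What is the probability that it is red?

- Prior probability: initially, P(red) = 26/52 = 0.5, because half of the deck is red (hearts and diamonds).
- New evidence: the card is a face card. There are 12 face cards (J, Q, K in each suit), 6 of which are red, so P(face) = 12/52 and P(face | red) = 6/26.
- Bayes' Theorem: P(red | face) = P(face | red) × P(red) / P(face) = (6/26 × 1/2) / (12/52) = 0.5.

So the probability of drawing a red card, given that it is a face card, is 50%. Here the evidence leaves the prior unchanged, because color and being a face card are independent in a standard deck.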
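
The card calculation can be checked with exact fractions, assuming a standard 52-card deck:

```python
from fractions import Fraction

# Standard 52-card deck: 26 red cards, 12 face cards (J, Q, K per suit), 6 of them red
p_red = Fraction(26, 52)            # prior
p_face = Fraction(12, 52)           # evidence
p_face_given_red = Fraction(6, 26)  # likelihood

# Bayes' theorem: P(red | face) = P(face | red) * P(red) / P(face)
p_red_given_face = p_face_given_red * p_red / p_face
print(p_red_given_face)  # 1/2
```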

# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

```python
import scipy.stats as stats

# Sample mean and standard deviation
sample_mean = 50
std_dev = 5

# Sample size: the question does not give one, so n = 30 is an assumption
sample_size = 30  # Replace with your actual sample size

# Calculate the standard error
standard_error = std_dev / (sample_size ** 0.5)

# Set the confidence level and degrees of freedom (for the t-distribution)
confidence_level = 0.95
degrees_of_freedom = sample_size - 1

# Calculate the margin of error using the t-distribution
margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom) * standard_error

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")
```

```
95% Confidence Interval: (48.13, 51.87)
```

Interpretation: assuming n = 30, we can be 95% confident that the true population mean lies between 48.13 and 51.87; if the sampling were repeated many times, about 95% of intervals built this way would contain the true mean. A different sample size would change the width of the interval.
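
Because the sample size is an assumption here, this sketch recomputes the interval for a few hypothetical values of n to show how much the answer depends on it:

```python
from scipy import stats

sample_mean = 50
std_dev = 5
confidence_level = 0.95

# The question does not give n, so compare a few hypothetical sample sizes
margins = {}
for n in (10, 30, 100):
    standard_error = std_dev / n ** 0.5
    t_crit = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
    margins[n] = t_crit * standard_error
    print(f"n = {n:>3}: ({sample_mean - margins[n]:.2f}, {sample_mean + margins[n]:.2f})")
```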


# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error in a confidence interval is the distance from the sample statistic to either end of the interval, i.e. half the interval's width. It equals the critical value times the standard error, and it quantifies the uncertainty associated with estimating the population parameter from a sample.

The margin of error is influenced by several factors, including:

Sample Size: Larger sample sizes result in smaller margins of error. The standard error shrinks in proportion to 1/√n, so as the sample size increases, the sample statistic (e.g., sample mean) becomes a more precise estimate of the population parameter.

Standard Deviation: A larger standard deviation in the population tends to result in a larger margin of error because it indicates greater variability within the population.

Confidence Level: Higher confidence levels (e.g., 99% instead of 95%) result in larger margins of error because the range must be wider to be more certain that it contains the true parameter.

Here's an example to illustrate how a larger sample size leads to a smaller margin of error:

Suppose you want to estimate the average test score of students in a school. You have two scenarios:

Scenario 1 (Smaller Sample Size):

- Sample Size (n1) = 30 students
- Standard Deviation (σ) = 10 points

Scenario 2 (Larger Sample Size):

- Sample Size (n2) = 300 students
- Standard Deviation (σ) = 10 points (same as in Scenario 1)

In both scenarios, you use the same standard deviation (σ) because you assume the population variability is the same. However, in Scenario 2, you have a much larger sample size (10 times larger) compared to Scenario 1.

For a 95% confidence interval using the z critical value 1.96, the margin of error 1.96 × σ/√n is about 3.58 points in Scenario 1 and about 1.13 points in Scenario 2. Because the sample is 10 times larger, the margin of error shrinks by a factor of √10 ≈ 3.16, giving a more precise estimate of the population mean and a narrower confidence interval.
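
The two scenarios can be computed directly. This sketch uses the z critical value because σ = 10 is treated as known in both scenarios:

```python
import math

from scipy import stats

sigma = 10        # population standard deviation (points), same in both scenarios
confidence = 0.95
z_crit = stats.norm.ppf((1 + confidence) / 2)  # about 1.96

def margin_of_error(n):
    """Half-width of a z-based confidence interval for a mean."""
    return z_crit * sigma / math.sqrt(n)

moe_small = margin_of_error(30)
moe_large = margin_of_error(300)
ratio = moe_small / moe_large
print(f"n = 30:  margin of error = {moe_small:.2f} points")
print(f"n = 300: margin of error = {moe_large:.2f} points")
print(f"ratio = {ratio:.3f} (sqrt(10) = {math.sqrt(10):.3f})")
```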

# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

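```python
# Given data
data_point = 75
population_mean = 70
population_std_dev = 5

# Calculate the z-score: how many standard deviations the point lies from the mean
z_score = (data_point - population_mean) / population_std_dev

print(f"Z-score: {z_score:.2f}")
```

```
Z-score: 1.00
```

Interpretation: a z-score of 1.00 means the value 75 lies exactly one population standard deviation above the population mean of 70.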
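
If the population is additionally assumed to be normally distributed (the question does not say so), the z-score can be turned into a percentile with the standard normal CDF:

```python
from scipy import stats

z_score = (75 - 70) / 5

# Assumes a normal population: proportion of values below 75
share_below = stats.norm.cdf(z_score)
print(f"{share_below:.2%} of values fall below 75")
```

Under that assumption, roughly 84% of the population scores below 75.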


# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

```python
import scipy.stats as stats

# Given data
sample_mean = 6        # average weight loss (pounds)
sample_std_dev = 2.5   # sample standard deviation (pounds)
sample_size = 50
confidence_level = 0.95

# Define hypotheses
# H0: mean weight loss = 0 (no effect); H1: mean weight loss > 0 (the drug works)
null_hypothesis_mean = 0

# Calculate the standard error of the mean
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis_mean) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# One-tailed p-value, since "effective" means a loss greater than zero.
# t.sf (survival function) avoids the round-off of 1 - t.cdf for large t.
p_value = stats.t.sf(t_statistic, df=degrees_of_freedom)

# Compare the p-value to alpha (0.05 for a 95% confidence level)
alpha = 1 - confidence_level
if p_value < alpha:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")

print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value: {p_value:.4f}")
```

```
Reject the null hypothesis. The drug is significantly effective.
t-statistic: 16.97
p-value: 0.0000
```
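
The same decision can be reached with the critical-value approach: reject H0 when the observed t exceeds the critical t. This sketch assumes a one-tailed test (H1: mean weight loss > 0) at α = 0.05, using the summary statistics from the question.

```python
from scipy import stats

# Summary statistics from the weight-loss question
t_statistic = 6 / (2.5 / 50 ** 0.5)
degrees_of_freedom = 50 - 1
alpha = 0.05

# One-tailed critical value (H1: mean weight loss > 0)
t_critical = stats.t.ppf(1 - alpha, df=degrees_of_freedom)
print(f"critical t = {t_critical:.3f}, observed t = {t_statistic:.2f}")
print("Reject H0" if t_statistic > t_critical else "Fail to reject H0")
```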


# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

```python
import scipy.stats as stats

# Given data
sample_proportion = 0.65  # 65% satisfaction rate
sample_size = 500
confidence_level = 0.95

# Calculate the critical value (Z) for a 95% confidence level
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * ((sample_proportion * (1 - sample_proportion)) / sample_size) ** 0.5

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")
```

```
95% Confidence Interval: (0.6082, 0.6918)
```
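
The interval above relies on the normal approximation to the binomial. A common rule of thumb (an assumption, since thresholds of 5 or 10 are both used) is that both expected counts should be at least 10; this sketch checks it for the survey:

```python
sample_proportion = 0.65
sample_size = 500

# Expected numbers of satisfied and unsatisfied respondents
successes = sample_size * sample_proportion
failures = sample_size * (1 - sample_proportion)
print(f"successes = {successes:.0f}, failures = {failures:.0f}")

# Rule of thumb: both counts >= 10 for the normal (Wald) interval to be reasonable
normal_ok = min(successes, failures) >= 10
print("normal approximation OK" if normal_ok else "consider an exact or Wilson interval")
```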


# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

```python
import scipy.stats as stats

# Given data for Sample A
mean_A = 85
std_dev_A = 6
sample_size_A = 30  # Assumed: the question does not state the sample sizes

# Given data for Sample B
mean_B = 82
std_dev_B = 5
sample_size_B = 30  # Assumed: the question does not state the sample sizes

# Set the significance level
alpha = 0.01

# Standard error of the difference in means. With equal group sizes this equals
# the pooled-variance standard error, so the pooled df = nA + nB - 2 below applies.
std_error_diff = ((std_dev_A ** 2 / sample_size_A) + (std_dev_B ** 2 / sample_size_B)) ** 0.5

# Calculate the t-statistic
t_statistic = (mean_A - mean_B) / std_error_diff

# Calculate the degrees of freedom
degrees_of_freedom = sample_size_A + sample_size_B - 2

# Two-tailed p-value (H1: the means differ in either direction)
p_value = 2 * stats.t.sf(abs(t_statistic), df=degrees_of_freedom)

# Compare the p-value to the significance level
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in student performance.")
else:
    print("Fail to reject the null hypothesis. The difference is not significant at the 0.01 level.")

print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value: {p_value:.4f}")
```

```
Fail to reject the null hypothesis. The difference is not significant at the 0.01 level.
t-statistic: 2.10
p-value: 0.0397
```

With the assumed 30 students per group, p ≈ 0.04: the difference would be significant at α = 0.05 but not at the stricter α = 0.01 used here.
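
SciPy can run the same test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`, which takes sample standard deviations. The group sizes of 30 are still an assumption. Setting `equal_var=True` gives the pooled (Student's) test; `equal_var=False` gives Welch's test, which drops the equal-variance assumption.

```python
from scipy import stats

# Same summary statistics; n = 30 per group is an assumption
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=True,  # pooled-variance t-test, df = 58
)
print(f"t-statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.4f}")

# Welch's version does not assume equal population variances
welch = stats.ttest_ind_from_stats(85, 6, 30, 82, 5, 30, equal_var=False)
print(f"Welch p-value: {welch.pvalue:.4f}")
```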


# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

```python
import scipy.stats as stats

# Given data
sample_mean = 65
population_mean = 60  # Given in the question, but not needed to build the interval
population_std_dev = 8
sample_size = 50
confidence_level = 0.90

# The population standard deviation is known, so use the z critical value for 90%
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * (population_std_dev / (sample_size ** 0.5))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"90% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")
```

```
90% Confidence Interval: (63.14, 66.86)
```
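
A cross-check with `scipy.stats.norm.interval`, which returns the central interval for a given confidence level. It also shows that the stated population mean of 60 falls outside the 90% interval, which would suggest the sample is unlikely to come from a population with that mean.

```python
from scipy import stats

sample_mean = 65
population_std_dev = 8
sample_size = 50

standard_error = population_std_dev / sample_size ** 0.5
lower, upper = stats.norm.interval(0.90, loc=sample_mean, scale=standard_error)
contains_60 = lower <= 60 <= upper
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
print("60 inside interval:", contains_60)
```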


# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

```python
import scipy.stats as stats

# Given data (caffeine group)
sample_mean = 0.25     # seconds
sample_std_dev = 0.05  # seconds
sample_size = 30

# The question gives no no-caffeine baseline. Testing against 0 s is meaningless,
# so the null-hypothesis mean below is a HYPOTHETICAL value chosen for illustration.
baseline_mean = 0.27   # seconds, hypothetical

# Set the significance level
alpha = 0.10  # 90% confidence level, so alpha = 1 - 0.90

# Calculate the standard error of the mean
standard_error = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-statistic
t_statistic = (sample_mean - baseline_mean) / standard_error

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Two-tailed p-value: "a significant effect" allows faster or slower reaction times
p_value = 2 * stats.t.sf(abs(t_statistic), df=degrees_of_freedom)

# Compare the p-value to the significance level
if p_value < alpha:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. No significant effect of caffeine on reaction time was detected.")

print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value: {p_value:.3f}")
```

```
Reject the null hypothesis. Caffeine has a significant effect on reaction time.
t-statistic: -2.19
p-value: 0.037
```

With the hypothetical 0.27 s baseline, p ≈ 0.037 < 0.10, so the difference is significant at the 90% confidence level; with the real no-caffeine baseline, the conclusion could differ.
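
Because the question gives no no-caffeine baseline, any one-sample test needs an assumed reference mean. This sketch assumes a hypothetical baseline of 0.27 s and applies the critical-value version of a two-tailed t-test at α = 0.10.

```python
from scipy import stats

sample_mean, sample_std_dev, sample_size = 0.25, 0.05, 30
baseline_mean = 0.27  # HYPOTHETICAL no-caffeine reaction time (s); not given in the question
alpha = 0.10

standard_error = sample_std_dev / sample_size ** 0.5
t_statistic = (sample_mean - baseline_mean) / standard_error

# Two-tailed critical value: alpha / 2 in each tail
t_critical = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)
print(f"|t| = {abs(t_statistic):.2f}, critical t = {t_critical:.3f}")
print("Reject H0" if abs(t_statistic) > t_critical else "Fail to reject H0")
```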
