Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

Ans: The main difference between a t-test and a z-test lies in the information available about the population variance. Here's a brief overview:

1. **T-test:**
   - **When to use:** Use a t-test when the population standard deviation is unknown and has to be estimated from the sample. This matters most when the sample size is small (a common rule of thumb is fewer than 30 observations).
   - **Example Scenario:** Suppose you want to compare the average scores of two groups of students (Group A and Group B) on a test. You collect samples from each group, and since you don't know the population standard deviation, you use a t-test to compare the means of the two groups.

2. **Z-test:**
   - **When to use:** Use a z-test when the population standard deviation is known. It is also used as an approximation when the sample is large enough (a common rule of thumb is 30 or more observations) for the sample standard deviation to be a reliable estimate.
   - **Example Scenario:** Consider a situation where you have a large dataset of exam scores for a population, and you know the population standard deviation. If you want to test whether the average exam score is significantly different from a certain value, you might use a z-test.

In summary, choose a t-test when dealing with smaller sample sizes and unknown population variances, and opt for a z-test when dealing with larger sample sizes and known population variances.
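As a rough sketch of the distinction, the one-sample z statistic can be computed with only the Python standard library. The function name `one_sample_z_test` and the numbers in the usage lines are made up for illustration. A t-test would use the same formula with the sample standard deviation in place of `sigma`, and a t-distribution with n - 1 degrees of freedom in place of the normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 100 exam scores averaging 52, tested against a mean of 50
# with a known population standard deviation of 5.
z, p = one_sample_z_test(sample_mean=52, mu0=50, sigma=5, n=100)
print(f"z = {z:.2f}, two-sided p-value = {p:.6f}")
```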

Q2: Differentiate between one-tailed and two-tailed tests.

Ans: The distinction between one-tailed and two-tailed tests is related to the directionality of the hypothesis and the area considered in the tail(s) of the distribution when conducting hypothesis testing. Here's a breakdown of the differences:

1. **One-Tailed Test:**
   - **Hypothesis:** In a one-tailed test, the alternative hypothesis (H1) specifies a single direction for the effect (greater than or less than, but not both), and the null hypothesis (H0) covers the remaining cases.
   - **Critical Region:** The critical region is located entirely in one tail of the distribution (either the left or right).
   - **Decision Rule:** If the test statistic falls into the critical region, you reject the null hypothesis. The decision is based on evidence favoring the direction specified in the alternative hypothesis.
   - **Example:** Testing whether a new drug increases the average performance on a cognitive task compared to a placebo.

2. **Two-Tailed Test:**
   - **Hypothesis:** In a two-tailed test, the null hypothesis (H0) usually states that there is no effect, and the alternative hypothesis (H1) typically asserts that there is an effect, without specifying the direction (either greater than or less than).
   - **Critical Region:** The critical region is divided between both tails of the distribution.
   - **Decision Rule:** If the test statistic falls into either tail, you reject the null hypothesis. The decision is based on evidence that the sample result is significantly different from what is expected under the null hypothesis, without specifying the direction of the difference.
   - **Example:** Testing whether a coin is fair (null: probability of heads = 0.5) or biased (alternative: probability of heads is not equal to 0.5).

In summary, one-tailed tests are used when you have a specific hypothesis about the direction of the effect, while two-tailed tests are more conservative and used when you are interested in detecting any significant difference, regardless of the direction.
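The difference in critical regions can be illustrated with a small sketch that converts one z statistic into both kinds of p-value. The helper `z_test_p_values` and the statistic `z = 1.8` are hypothetical, chosen so that the two tests disagree at a significance level of 0.05.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (right-tailed p-value, two-tailed p-value) for a z statistic."""
    std_normal = NormalDist()
    p_right = 1 - std_normal.cdf(z)
    p_two = 2 * (1 - std_normal.cdf(abs(z)))
    return p_right, p_two

alpha = 0.05
z = 1.8  # hypothetical test statistic
p_right, p_two = z_test_p_values(z)
print(f"one-tailed p = {p_right:.4f}, reject H0: {p_right < alpha}")
print(f"two-tailed p = {p_two:.4f}, reject H0: {p_two < alpha}")
```

For a positive statistic the two-tailed p-value is twice the right-tailed one, which is why the two-tailed test is the more conservative choice here.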

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

Ans: **Type I and Type II errors** are potential mistakes made in hypothesis testing:

1. **Type I Error (False Positive):**
   - **Definition:** Rejecting a true null hypothesis. It occurs when you conclude that there is an effect or difference when there isn't one.
   - **Example Scenario:** Imagine a pharmaceutical company testing a new drug. The null hypothesis is that the drug has no side effects, but due to random variability or an experimental error, the test incorrectly concludes that there are significant side effects, leading to the drug being rejected even though it is safe.

2. **Type II Error (False Negative):**
   - **Definition:** Failing to reject a false null hypothesis. It occurs when you conclude that there is no effect or difference when there actually is one.
   - **Example Scenario:** Continuing with the drug example, the null hypothesis again states that the drug has no side effects, but in reality it does have side effects. If the test fails to detect these side effects, that is a Type II error. Patients might use the drug believing it is safe, and then experience side effects that the test did not identify.

In summary, Type I errors involve incorrectly concluding that an effect exists when it doesn't, while Type II errors involve failing to detect a real effect. The balance between Type I and Type II errors is often controlled by choosing an appropriate significance level (alpha) and considering the power of the test. Researchers aim to minimize both types of errors, but there is typically a trade-off between them.
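The trade-off can be sketched numerically for a two-tailed z-test. The function `type_ii_error_rate` and the setting used below (a true shift of 0.5, a standard deviation of 1, and 25 observations) are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_rate(alpha, true_shift, sigma, n):
    """Probability of failing to reject H0 in a two-tailed z-test
    when the true mean differs from the null value by true_shift."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    delta = true_shift / (sigma / sqrt(n))  # shift in standard-error units
    return std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)

# Hypothetical setting: true shift of 0.5, sigma = 1, n = 25
betas = {}
for alpha in (0.10, 0.05, 0.01):
    betas[alpha] = type_ii_error_rate(alpha, true_shift=0.5, sigma=1.0, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {betas[alpha]:.3f}, power = {1 - betas[alpha]:.3f}")
```

Lowering alpha (fewer Type I errors) raises beta (more Type II errors) when the sample size and effect size stay fixed.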

Q4: Explain Bayes's theorem with an example.

Ans: **Bayes' Theorem** is a mathematical formula used in probability theory to update the probability of a hypothesis based on new evidence. It is named after the Reverend Thomas Bayes. The formula is as follows:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

Where:
- \( P(A|B) \) is the probability of hypothesis A given evidence B.
- \( P(B|A) \) is the probability of evidence B given hypothesis A.
- \( P(A) \) is the prior probability of hypothesis A.
- \( P(B) \) is the probability of evidence B.

Let's go through an example to illustrate Bayes' Theorem:

**Example: Medical Diagnosis**
Suppose there's a rare disease that affects 1 in 1000 people (\( P(A) = 0.001 \)). There's also a medical test for this disease that returns a positive result for 99% of people who have it (\( P(B|A) = 0.99 \)) but has a 1% false positive rate (\( P(B|\neg A) = 0.01 \), where \( \neg A \) represents not having the disease).

Now, let's say a person takes the test and it comes back positive (\( B \)). We want to find the probability that the person actually has the disease (\( A \)).

1. **Prior Probability (\( P(A) \)):** The initial probability of having the disease is 0.001.

2. **Likelihood (\( P(B|A) \)):** The probability of getting a positive test result given that the person has the disease is 0.99.

3. **Evidence Probability (\( P(B) \)):** A positive test result can occur either because the person has the disease (a true positive) or because of a false positive, so \( P(B) \) sums both cases:

\[ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) \]
\[ P(B) = (0.99 \cdot 0.001) + (0.01 \cdot 0.999) = 0.00099 + 0.00999 = 0.01098 \]

4. **Posterior Probability (\( P(A|B) \)):** Using Bayes' Theorem to find the updated probability of having the disease given the positive test result.

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

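Plugging in the values:

\[ P(A|B) = \frac{0.99 \cdot 0.001}{(0.99 \cdot 0.001) + (0.01 \cdot 0.999)} = \frac{0.00099}{0.01098} \approx 0.0902 \]

Even after a positive result, the probability of actually having the disease is only about 9%, because the disease is rare and false positives outnumber true positives.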

This example demonstrates how Bayes' Theorem can be used to update the probability of a hypothesis (having the disease) based on new evidence (positive test result).
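The same calculation can be written as a short function. This is a minimal sketch: the name `posterior` and its parameter names are chosen here for illustration, and the inputs are the three probabilities stated in the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

p_disease_given_positive = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(disease | positive test) = {p_disease_given_positive:.4f}")
```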

Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

Ans: A **confidence interval** is a statistical tool used to estimate a range of values within which the true population parameter is likely to fall with a certain level of confidence. It provides a range rather than a single point estimate, acknowledging the uncertainty inherent in statistical sampling.

The general form of a confidence interval is:

\[ \text{Point Estimate} \pm \text{Margin of Error} \]

where the margin of error is influenced by the desired level of confidence and the variability in the data.

To calculate a confidence interval, you typically follow these steps:

1. **Determine the Point Estimate:** Calculate the sample mean or proportion, depending on the parameter of interest.

2. **Select the Confidence Level:** Choose a confidence level, often expressed as a percentage (e.g., 95%, 99%).

3. **Determine the Critical Value:** Find the z-score or t-score corresponding to the chosen confidence level. This value is based on the normal distribution for large sample sizes or the t-distribution for smaller sample sizes.

4. **Calculate the Margin of Error:** Multiply the standard error of the sample mean or proportion by the critical value to determine the margin of error.

5. **Construct the Confidence Interval:** Combine the point estimate and the margin of error to establish the interval.

Here's an example of calculating a confidence interval for the mean:

**Example: Confidence Interval for Mean Height**

Suppose you want to estimate the average height of a population. You collect a random sample of 30 individuals and find that the sample mean height is 65 inches, with a standard deviation of 3 inches. You want to calculate a 95% confidence interval for the true average height.

1. **Point Estimate:** Sample Mean (\( \bar{X} \)) = 65 inches.

2. **Confidence Level:** 95%.

3. **Critical Value:** Since the population standard deviation is unknown and the sample is small (n = 30), you consult the t-distribution. For a 95% confidence level with 29 degrees of freedom, the critical t-value is approximately 2.045.

4. **Margin of Error:** Standard Error (\( SE \)) = \( \frac{s}{\sqrt{n}} \) where \( s \) is the sample standard deviation and \( n \) is the sample size. Margin of Error = \( t \times SE \).

\[ SE = \frac{3}{\sqrt{30}} \approx 0.5477 \]

\[ \text{Margin of Error} = 2.045 \times 0.5477 \approx 1.120 \]

5. **Confidence Interval:** \( 65 \pm 1.120 \), which results in a confidence interval of \( (63.880, 66.120) \).

Interpretation: With 95% confidence, we estimate that the true average height of the population lies between 63.880 inches and 66.120 inches.
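The five steps can be sketched as a small helper. The function name `mean_confidence_interval` is made up for this illustration, and the critical value is passed in by hand (here the t-value of 2.045 quoted in step 3) rather than looked up from a distribution.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, sample_std, n, critical_value):
    """Return (lower, upper) for sample_mean +/- critical_value * s / sqrt(n)."""
    margin = critical_value * sample_std / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values from the height example; 2.045 is the t critical value for 29 degrees of freedom
lower, upper = mean_confidence_interval(65, 3, 30, critical_value=2.045)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```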

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

Ans: Problem:

Suppose a factory produces widgets, and 2% of the widgets are defective. A quality control test correctly identifies 95% of defective widgets but also incorrectly identifies 3% of non-defective widgets as defective. If a randomly selected widget fails the quality control test, what is the probability that it is actually defective?

In [1]:
# Given probabilities
p_defective = 0.02  # Prior probability of a widget being defective
p_non_defective = 1 - p_defective  # Prior probability of a widget not being defective

p_positive_given_defective = 0.95  # Probability of a positive test given the widget is defective
p_positive_given_non_defective = 0.03  # Probability of a positive test given the widget is not defective

# Bayes' Theorem
p_defective_given_positive = (p_positive_given_defective * p_defective) / (
    (p_positive_given_defective * p_defective) + (p_positive_given_non_defective * p_non_defective)
)

# Display the result
print(f"The probability of a widget being defective given a positive test result is: {p_defective_given_positive:.4f}")


The probability of a widget being defective given a positive test result is: 0.3926


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [2]:
import scipy.stats as stats

# Given data
mean = 50
std_dev = 5
confidence_level = 0.95
sample_size = 30  # Assumed: the question does not give a sample size, so replace this with the actual value

# Calculate the critical value for a 95% confidence interval
critical_value = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

# Calculate the margin of error
margin_of_error = critical_value * (std_dev / (sample_size ** 0.5))

# Calculate the confidence interval
confidence_interval = (mean - margin_of_error, mean + margin_of_error)

# Display the results
print(f"95% Confidence Interval: {confidence_interval}")


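95% Confidence Interval: (48.1329693162095, 51.8670306837905)

Interpretation: Assuming a sample size of 30, we are 95% confident that the true population mean lies between about 48.13 and 51.87.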


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

Margin of Error in a Confidence Interval:

The margin of error (MOE) in a confidence interval is the largest expected distance between the point estimate and the true population parameter at the chosen confidence level. It is calculated as half of the width of the confidence interval.

\[ \text{Margin of Error (MOE)} = \frac{\text{Width of Confidence Interval}}{2} \]

A larger margin of error indicates greater uncertainty and a wider range of possible values for the population parameter.

Effect of Sample Size on Margin of Error:

The margin of error is inversely proportional to the square root of the sample size. As the sample size increases, the margin of error decreases. Mathematically, it can be expressed as:

\[ \text{MOE} \propto \frac{1}{\sqrt{\text{Sample Size}}} \]

Example Scenario:

Suppose you are conducting a political poll to estimate the percentage of voters who support a particular candidate. You want to calculate a 95% confidence interval for the true proportion of supporters. In one scenario, you survey 1000 voters, and in another scenario, you survey 5000 voters.



In [3]:
import scipy.stats as stats

def calculate_moe(sample_size, sample_proportion, confidence_level=0.95):
    critical_value = stats.norm.ppf((1 + confidence_level) / 2)
    margin_of_error = critical_value * ((sample_proportion * (1 - sample_proportion)) / sample_size) ** 0.5
    return margin_of_error

# Scenario 1: Sample size = 1000
sample_size_1 = 1000
sample_proportion_1 = 0.6  # Assuming 60% support
moe_1 = calculate_moe(sample_size_1, sample_proportion_1)

# Scenario 2: Sample size = 5000
sample_size_2 = 5000
sample_proportion_2 = 0.6  # Assuming 60% support
moe_2 = calculate_moe(sample_size_2, sample_proportion_2)

print(f"Scenario 1 - Margin of Error: {moe_1}")
print(f"Scenario 2 - Margin of Error: {moe_2}")


Scenario 1 - Margin of Error: 0.03036363148515984
Scenario 2 - Margin of Error: 0.01357902880891406
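As a quick check of the square-root relationship, the ratio of the two margins should equal the square root of the ratio of the sample sizes when the proportion and confidence level are held fixed. The helper name `moe_ratio` is introduced here only for illustration.

```python
from math import sqrt

def moe_ratio(n_small, n_large):
    """Ratio MOE(n_small) / MOE(n_large) when everything but n is held fixed."""
    return sqrt(n_large / n_small)

predicted = moe_ratio(1000, 5000)
# Ratio of the two margins of error printed for the 1000- and 5000-voter scenarios
observed = 0.03036363148515984 / 0.01357902880891406
print(f"Predicted MOE ratio: {predicted:.3f}")
print(f"Observed MOE ratio:  {observed:.3f}")
```

Surveying five times as many voters shrinks the margin of error by a factor of about 2.24, not 5.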


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [4]:
# Given data
data_point = 75
population_mean = 70
population_std_dev = 5

# Calculate the z-score
z_score = (data_point - population_mean) / population_std_dev

# Display the result
print(f"The z-score for the data point {data_point} is: {z_score}")


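The z-score for the data point 75 is: 1.0

Interpretation: The data point 75 lies exactly one standard deviation (5 units) above the population mean of 70.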
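If the population can be assumed to be normally distributed (an assumption the question does not state), the z-score can also be turned into a percentile. A minimal sketch using the standard library:

```python
from statistics import NormalDist

z_score = (75 - 70) / 5
# Share of a normal population lying below the data point
share_below = NormalDist().cdf(z_score)
print(f"z = {z_score}, share of population below: {share_below:.4f}")
```

Under that assumption, a value of 75 is higher than roughly 84% of the population.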


Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [5]:
import scipy.stats as stats

# Given data
sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
hypothesized_mean = 0  # Null hypothesis: the mean weight loss is 0 (the drug has no effect)

# Calculate the t-statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / (sample_size ** 0.5))

# Set the significance level
alpha = 0.05

# Find the critical value for a one-tailed (right-tailed) test,
# since "effective" means a mean weight loss greater than 0
critical_value = stats.t.ppf(1 - alpha, df=sample_size - 1)

# Make a decision
if t_statistic > critical_value:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that the drug is significantly effective.")


Reject the null hypothesis. The drug is significantly effective.


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [6]:
import scipy.stats as stats

# Given data
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Calculate the critical value for a 95% confidence interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_score * ((sample_proportion * (1 - sample_proportion)) / sample_size) ** 0.5

# Calculate the confidence interval
confidence_interval = (sample_proportion - margin_of_error, sample_proportion + margin_of_error)

# Display the results
print(f"95% Confidence Interval for job satisfaction: {confidence_interval}")


95% Confidence Interval for job satisfaction: (0.6081925393809212, 0.6918074606190788)


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [7]:
import scipy.stats as stats

# Given data for Sample A
mean_A = 85
std_dev_A = 6
sample_size_A = 30  # Assumed: the question does not give a sample size, so replace this with the actual value

# Given data for Sample B
mean_B = 82
std_dev_B = 5
sample_size_B = 30  # Assumed: the question does not give a sample size, so replace this with the actual value

# Significance level
alpha = 0.01

# Calculate the t-statistic
numerator = mean_A - mean_B
denominator = ((std_dev_A**2) / sample_size_A + (std_dev_B**2) / sample_size_B) ** 0.5
t_statistic = numerator / denominator

# Conservative degrees of freedom for an unpooled two-sample t-test (the smaller of n_A - 1 and n_B - 1)
degrees_of_freedom = min(sample_size_A - 1, sample_size_B - 1)

# Calculate the critical value
critical_value = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Compare the t-statistic with the critical value
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to conclude a significant difference.")


Fail to reject the null hypothesis. There is not enough evidence to conclude a significant difference.
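The cell above takes the smaller of the two per-sample degrees of freedom. A commonly used alternative for the unpooled statistic is the Welch-Satterthwaite approximation, sketched below. It is offered as an illustration rather than as the method the exercise intends, and it reuses the same assumed sample sizes of 30.

```python
def welch_degrees_of_freedom(std_a, n_a, std_b, n_b):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    var_a = std_a**2 / n_a
    var_b = std_b**2 / n_b
    return (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))

# Same inputs as the cell above, including the assumed sample sizes of 30
df_welch = welch_degrees_of_freedom(6, 30, 5, 30)
print(f"Welch-Satterthwaite df: {df_welch:.2f} (conservative choice: {min(30 - 1, 30 - 1)})")
```

The approximation gives more degrees of freedom than the conservative choice, so its critical value is somewhat smaller.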


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [8]:
import scipy.stats as stats

# Given data
sample_mean = 65
population_std_dev = 8
sample_size = 50
confidence_level = 0.90

# Calculate the critical value for a 90% confidence interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_score * (population_std_dev / (sample_size ** 0.5))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Display the results
print(f"90% Confidence Interval for the true population mean: {confidence_interval}")


90% Confidence Interval for the true population mean: (63.13906055411732, 66.86093944588268)


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [9]:
import scipy.stats as stats

# Given data
sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
hypothesized_mean = 0.20  # Null hypothesis: no effect. Assumed baseline reaction time without caffeine (not given in the question)

# Calculate the t-statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / (sample_size ** 0.5))

# Degrees of freedom for a one-sample t-test
degrees_of_freedom = sample_size - 1

# Set the significance level
confidence_level = 0.90
alpha = 1 - confidence_level

# Find critical values for a two-tailed test
critical_value_lower = stats.t.ppf(alpha / 2, df=degrees_of_freedom)
critical_value_upper = -critical_value_lower

# Make a decision
if t_statistic < critical_value_lower or t_statistic > critical_value_upper:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that caffeine has a significant effect on reaction time.")


Reject the null hypothesis. Caffeine has a significant effect on reaction time.
