# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

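Ans=A t-test and a z-test are both statistical hypothesis tests used to determine whether there is a significant difference between the means of two groups, or between a sample mean and a population mean. The main difference is whether the population standard deviation (σ) is known: a z-test uses the known σ and the standard normal distribution, while a t-test estimates σ from the sample and uses the t distribution, whose heavier tails account for the extra uncertainty in that estimate.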

T-Test:

Use Case: T-tests are used when the population standard deviation is unknown and must be estimated from the sample. This matters most for small samples (a common rule of thumb is fewer than 30 observations), where the estimate of σ is least reliable. For two samples, Welch's version of the t-test also handles groups with unequal variances.
Example Scenario: Suppose you want to test whether there is a significant difference in the average test scores of two groups of students, Group A and Group B, each with fewer than 30 students, and the population standard deviation of scores is unknown. In this case, you would use a two-sample t-test to compare the means of the two groups.

Z-Test:

Use Case: Z-tests are used when the population standard deviation is known or, as an approximation, when the sample is large (typically more than 30 observations) so that the sample standard deviation is a reliable estimate. A common use is comparing a sample mean with a known population mean.
Example Scenario: Let's say you work for a cereal company, and you want to test whether a new production process has led to a significant change in the weight of cereal boxes. You have a large sample of cereal boxes (more than 30) and historical data that gives the population standard deviation of box weights. In this case, you would use a z-test to compare the sample mean with the known population mean.
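A minimal sketch of the difference in code, using a small hypothetical sample of box weights (the ten weights, the 500 g hypothesized mean, and the "known" σ = 4 g are invented for illustration). Both statistics share the same numerator; the z-statistic divides by the known population σ, while the t-statistic divides by the sample standard deviation and is compared against a t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical data: weights (g) of 10 cereal boxes
weights = [502.1, 498.7, 503.4, 501.9, 499.8, 504.2, 500.6, 502.8, 497.9, 503.1]
mu0 = 500.0          # hypothesized population mean (assumed)
sigma_known = 4.0    # assumed known population standard deviation (z-test case)

n = len(weights)
x_bar = mean(weights)

# z-test: uses the known population standard deviation
z = (x_bar - mu0) / (sigma_known / math.sqrt(n))
p_z = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value from the standard normal

# t-test: replaces sigma with the sample standard deviation s
s = stdev(weights)
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1  # t is compared against a t distribution with n - 1 degrees of freedom

print(f"mean={x_bar:.2f}, z={z:.3f}, p(z)={p_z:.4f}")
print(f"s={s:.3f}, t={t:.3f}, df={df}")
```

With only ten boxes, the t-test is the usual choice unless σ is genuinely known; its p-value would come from the t distribution, for example via `scipy.stats.t`.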

# Q2: Differentiate between one-tailed and two-tailed tests.

Ans=One-Tailed Test vs. Two-Tailed Test:

Hypothesis testing compares sample data against a null hypothesis to decide whether there is enough evidence to reject it in favor of an alternative hypothesis. Whether to use a one-tailed or a two-tailed test depends on the research question, specifically on whether the expected effect has a known direction. Here's how they differ:

One-Tailed Test (Directional Test):

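Purpose: A one-tailed test is used when you have a specific directional hypothesis, that is, when you test whether a parameter is greater than (or, alternatively, less than) a certain value, but not both. The entire significance level α is placed in one tail of the sampling distribution, so the test is more sensitive to an effect in the hypothesized direction but cannot detect an effect in the opposite direction.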


Two-Tailed Test (Non-Directional Test):

Purpose: A two-tailed test is used when you have a non-directional hypothesis, that is, when you test whether a parameter differs from a specific value in either direction. The significance level is split equally between the two tails (α/2 in each).
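A small sketch comparing one- and two-tailed p-values for the same test statistic, using Python's standard-library `NormalDist` (the value z = 1.8 is a hypothetical example). Because the two-tailed test splits α between both tails, its p-value is twice the one-tailed p-value when the effect lies in the hypothesized direction, so the same result can be significant at α = 0.05 one-tailed but not two-tailed.

```python
from statistics import NormalDist

z = 1.8        # hypothetical observed z-statistic
alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (H1: parameter > value): area in the upper tail only
p_one_tailed = 1 - std_normal.cdf(z)
# Two-tailed (H1: parameter != value): area in both tails
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

# Critical values: all of alpha in one tail vs alpha/2 in each tail
z_crit_one = std_normal.inv_cdf(1 - alpha)       # about 1.645
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

print(f"one-tailed p = {p_one_tailed:.4f}, reject: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject: {p_two_tailed < alpha}")
```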

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Ans=In hypothesis testing, there are two types of errors that can occur when making a decision about a null hypothesis. These errors are known as Type I and Type II errors:

Type I Error (False Positive):

Definition: A Type I error occurs when you incorrectly reject a true null hypothesis. In other words, you conclude that there is a significant effect or difference when there is none in reality. It represents a false positive result, and its probability is the significance level α.

Example Scenario:
Suppose a medical test is designed to detect a specific disease. The null hypothesis (H0) is that a person does not have the disease. A Type I error in this case would occur if the test falsely indicates that a healthy person has the disease, leading to unnecessary stress, treatment, or costs.

Type II Error (False Negative):

Definition: A Type II error occurs when you fail to reject a false null hypothesis. In this case, you mistakenly conclude that there is no significant effect or difference when there actually is one. It represents a false negative result. Its probability is denoted β, and 1 − β is the power of the test.

Example Scenario:
Let's say a pharmaceutical company is testing a new drug, and the null hypothesis (H0) is that the drug has no effect on a specific medical condition. A Type II error in this case would occur if the study fails to detect the drug's effectiveness when the drug is in fact beneficial, resulting in missed opportunities for treatment and improved health.
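Both error rates can be seen in a quick simulation. This sketch uses hypothetical parameters (normal data with known σ = 1, n = 20, α = 0.05, and a true mean of 0.5 under the alternative) and runs many two-tailed z-tests. When H0 is true, the rejection rate estimates the Type I error rate, which should be close to α; when H0 is false, the non-rejection rate estimates the Type II error rate β.

```python
import math
import random
from statistics import NormalDist

random.seed(42)  # fixed seed so the simulation is reproducible

n, sigma, alpha = 20, 1.0, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

def rejects_h0(true_mean, mu0=0.0):
    """Draw one sample and run a two-tailed z-test of H0: mean == mu0."""
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

trials = 10_000
# H0 true (mean really is 0): every rejection is a Type I error
type1_rate = sum(rejects_h0(0.0) for _ in range(trials)) / trials
# H0 false (mean really is 0.5): every non-rejection is a Type II error
type2_rate = sum(not rejects_h0(0.5) for _ in range(trials)) / trials

# Theoretical beta for comparison: P(|Z + shift| <= z_crit)
shift = 0.5 / (sigma / math.sqrt(n))
beta = NormalDist().cdf(z_crit - shift) - NormalDist().cdf(-z_crit - shift)

print(f"Type I error rate  ~ {type1_rate:.3f} (alpha = {alpha})")
print(f"Type II error rate ~ {type2_rate:.3f} (theory: {beta:.3f})")
```

Lowering α reduces Type I errors but, for a fixed sample size, raises β; increasing n is the usual way to reduce both.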

# Q4: Explain Bayes's theorem with an example.

Ans=Bayes' theorem describes how to update the probability of an event A in light of evidence B. It can be written in several equivalent ways; the most common version is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

P(A ∣ B) is the conditional probability of event A occurring, given that B is true.

P(B ∣ A) is the conditional probability of event B occurring, given that A is true.

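P(A) and P(B) are the marginal (unconditional) probabilities of A and B, each considered on its own without conditioning on the other; P(B) must be nonzero. P(A) is often called the prior and P(A ∣ B) the posterior.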

Bayes' theorem is also known as the formula for the probability of "causes": it reverses the direction of conditioning, going from P(effect ∣ cause) to P(cause ∣ effect). For example, suppose there are three bags, each containing red, blue, and black balls in different proportions. If a bag is chosen at random and a blue ball is drawn from it, Bayes' theorem gives the probability that the ball came from the second bag.
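A worked version of the three-bag example. The bag contents are not given, so the counts below are hypothetical, and each bag is assumed equally likely to be chosen. Python's `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical contents of each bag: (red, blue, black)
bags = {1: (2, 3, 5), 2: (4, 4, 2), 3: (1, 6, 3)}

prior = {bag: Fraction(1, 3) for bag in bags}  # P(bag): each bag equally likely

# Likelihood P(blue | bag) = blue balls / total balls in that bag
likelihood = {bag: Fraction(blue, red + blue + black)
              for bag, (red, blue, black) in bags.items()}

# Law of total probability: P(blue) = sum over bags of P(blue | bag) * P(bag)
p_blue = sum(likelihood[b] * prior[b] for b in bags)

# Bayes' theorem: P(bag | blue) = P(blue | bag) * P(bag) / P(blue)
posterior = {b: likelihood[b] * prior[b] / p_blue for b in bags}

print(f"P(blue) = {p_blue}")
print(f"P(bag 2 | blue) = {posterior[2]}")
```

With these counts the posterior for bag 2 is 4/13, and the three posteriors sum to 1, as they must.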

# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

Ans=A confidence interval is a range of values, constructed around a sample statistic such as a mean or proportion, that gives plausible values for the corresponding population parameter at a stated confidence level (for example, 95%). It is a common tool for quantifying the uncertainty that comes from estimating a population parameter from a sample. The confidence level describes the method rather than any single interval: if sampling were repeated many times, about 95% of the 95% intervals constructed this way would contain the true parameter.

A typical confidence interval is expressed as:

Point Estimate ± Margin of Error

Where:

Point Estimate: This is the sample statistic (e.g., sample mean or sample proportion) that you use to estimate the population parameter.

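Margin of Error: This is the amount added to and subtracted from the point estimate to form the interval. It equals a critical value (for example, z* ≈ 1.96 for 95% confidence) multiplied by the standard error of the estimate, so it grows with the confidence level and with the variability of the data, and shrinks as the sample size increases.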

Here's how to calculate a confidence interval with an example:

Suppose you want to estimate the average height of a certain population of adults. You take a random sample of 100 individuals from this population and measure their heights. The sample mean height is 170 cm, and the sample standard deviation is 5 cm.
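Completing the height example: with n = 100 and a sample standard deviation of 5 cm, the standard error is 5/√100 = 0.5 cm. This sketch uses the normal critical value z* ≈ 1.96 for 95% confidence, a reasonable approximation at this sample size (a t critical value with 99 degrees of freedom would give a slightly wider interval).

```python
import math
from statistics import NormalDist

sample_mean = 170.0   # cm
sample_sd = 5.0       # cm
n = 100
confidence = 0.95

standard_error = sample_sd / math.sqrt(n)                 # 0.5 cm
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin_of_error = z_star * standard_error                 # about 0.98 cm

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"95% CI: ({lower:.2f} cm, {upper:.2f} cm)")
```

So, at 95% confidence, the average height in this population is estimated to lie between roughly 169.02 cm and 170.98 cm.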

# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

In [2]:
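# Sample problem: a monitoring system raises an alert. Defects occur in 10% of cases; the system
# alerts on 95% of defects and raises a false alert in 5% of defect-free cases.
# Given that an alert was raised, what is the probability that a defect is present?

# Define the prior probabilities and conditional probabilities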
prior_probability_defect = 0.10  # P(D) - Prior probability of a defect occurring
probability_true_positive = 0.95  # P(A|D) - Probability of a true positive alert
probability_false_positive = 0.05  # P(A|¬D) - Probability of a false positive alert

# Calculate the probability of no defect occurring
prior_probability_no_defect = 1 - prior_probability_defect  # P(¬D)

# Calculate the probability of an alert (P(A)) using the law of total probability
probability_alert = (probability_true_positive * prior_probability_defect) + (probability_false_positive * prior_probability_no_defect)

# Calculate the probability of a defect given the alert (P(D|A)) using Bayes' Theorem
probability_defect_given_alert = (probability_true_positive * prior_probability_defect) / probability_alert

# Print the result
print(f"The probability of a defect given the alert is approximately: {probability_defect_given_alert:.2f}")

The probability of a defect given the alert is approximately: 0.68
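Bayes' theorem can also be applied repeatedly: the posterior after one piece of evidence becomes the prior for the next. This sketch reuses the probabilities above and assumes, hypothetically, that a second alert is raised and that the two alerts are conditionally independent given the defect status.

```python
def update(prior, p_alert_given_defect=0.95, p_alert_given_no_defect=0.05):
    """Return P(defect | alert) for a given prior P(defect)."""
    p_alert = p_alert_given_defect * prior + p_alert_given_no_defect * (1 - prior)
    return p_alert_given_defect * prior / p_alert

p_after_one = update(0.10)          # same calculation as above, about 0.68
p_after_two = update(p_after_one)   # posterior after a second independent alert

print(f"After one alert:  {p_after_one:.4f}")
print(f"After two alerts: {p_after_two:.4f}")
```

Under these assumptions, two alerts raise the probability of a defect from the 10% prior to roughly 98%.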


# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

In [3]:
import scipy.stats as stats

# Sample statistics
sample_mean = 50
sample_std_dev = 5
confidence_level = 0.95  # 95% confidence level
sample_size = 100  # The problem does not state n, so a sample size of 100 is assumed

# Standard error of the mean
standard_error = sample_std_dev / (sample_size**0.5)

# Critical value (z-score) for the given confidence level; the normal approximation
# is reasonable here because n is large, even though the standard deviation is a sample estimate
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the confidence interval
margin_of_error = z_score * standard_error
lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

# Print the results
print(f"Sample Mean: {sample_mean}")
print(f"Standard Deviation: {sample_std_dev}")
print(f"Sample Size: {sample_size}")
print(f"Confidence Level: {confidence_level * 100}%")
print(f"Margin of Error: {margin_of_error}")
print(f"95% Confidence Interval: ({lower_limit}, {upper_limit})")

Sample Mean: 50
Standard Deviation: 5
Sample Size: 100
Confidence Level: 95.0%
Margin of Error: 0.979981992270027
95% Confidence Interval: (49.02001800772997, 50.97998199227003)
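Interpretation: based on this sample (and the assumed n = 100), we are 95% confident that the population mean lies between about 49.02 and 50.98. The "95%" describes the procedure, not this particular interval: if sampling were repeated many times, about 95% of the intervals built this way would contain the true mean. The simulation below is a minimal sketch of that idea, assuming (hypothetically) normally distributed data with true mean 50 and standard deviation 5.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # reproducible simulation

true_mean, true_sd, n, trials = 50.0, 5.0, 100, 2_000
z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    x_bar = sum(sample) / n
    margin = z_star * true_sd / math.sqrt(n)
    # Count the intervals that contain the true mean
    if x_bar - margin <= true_mean <= x_bar + margin:
        covered += 1

coverage = covered / trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```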


# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

In [5]:
import scipy.stats as stats

# Sample statistics
sample_mean = 50
sample_std_dev = 5
confidence_level = 0.95 
# Sample sizes to compare
sample_sizes = [50, 100, 200]

for sample_size in sample_sizes:
    # Standard error of the mean for this sample size
    standard_error = sample_std_dev / (sample_size**0.5)

    # Calculate the critical value (z-score) for the given confidence level
    z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)

    # Calculate the margin of error
    margin_of_error = z_score * standard_error


    print(f"Sample Size: {sample_size}")
    print(f"Margin of Error: {margin_of_error}\n")

Sample Size: 50
Margin of Error: 1.3859038243496777

Sample Size: 100
Margin of Error: 0.979981992270027

Sample Size: 200
Margin of Error: 0.6929519121748389
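The margin of error is the critical value times the standard error, E = z*·σ/√n, so it shrinks in proportion to 1/√n: quadrupling the sample size halves the margin of error, as the output above shows (about 1.386 at n = 50 versus 0.693 at n = 200). Rearranging gives the sample size needed for a target margin, n = (z*·σ/E)². The sketch below assumes the same σ = 5 and a hypothetical target margin of 0.5.

```python
import math
from statistics import NormalDist

sigma = 5
z_star = NormalDist().inv_cdf(0.975)  # 95% confidence

def margin_of_error(n):
    """Margin of error for a mean with known standard deviation sigma."""
    return z_star * sigma / math.sqrt(n)

# Quadrupling n halves the margin of error
ratio = margin_of_error(400) / margin_of_error(100)

# Smallest n that achieves the target margin of error (rounded up)
target = 0.5
required_n = math.ceil((z_star * sigma / target) ** 2)

print(f"E(400) / E(100) = {ratio:.2f}")
print(f"Required sample size for E <= {target}: {required_n}")
```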



# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

Ans=The z-score is

$$z = \frac{X - \mu}{\sigma}$$

where:

- X is the value of the data point,
- μ is the population mean,
- σ is the population standard deviation.
In this case, you want to calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Let's calculate the z-score and interpret the results using a Python program:

In [6]:
data_point = 75
population_mean = 70
population_std_dev = 5

# Calculate the z-score
z_score = (data_point - population_mean) / population_std_dev


print(f"Z-Score: {z_score:.2f}")

Z-Score: 1.00
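Interpretation: a z-score of 1.00 means the value 75 lies exactly one population standard deviation above the mean of 70. If the population is approximately normal (an assumption the question does not state), about 84% of values fall below 75, which the standard library's `NormalDist` can confirm:

```python
from statistics import NormalDist

population = NormalDist(mu=70, sigma=5)  # assumes a normal population
z_score = population.zscore(75)          # (75 - 70) / 5
percentile = population.cdf(75)          # proportion of values below 75

print(f"z = {z_score:.2f}, proportion below 75 = {percentile:.4f}")
```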


# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

In [7]:
import scipy.stats as stats


sample_size = 50
sample_mean = 6  # Average weight loss
sample_std_dev = 2.5  # Standard deviation of weight loss


null_hypothesis_mean = 0  # H0: the mean weight loss is 0 (the drug has no effect)


confidence_level = 0.95  

# Calculate the t-statistic
t_statistic = (sample_mean - null_hypothesis_mean) / (sample_std_dev / (sample_size**0.5))

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical t-value for a two-tailed test
alpha = 1 - confidence_level
t_critical = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

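# Two-tailed p-value; stats.t.sf (the survival function, 1 - cdf) avoids the rounding
# to zero that 1 - cdf suffers for very large t-statistics
p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)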


print(f"Sample Size: {sample_size}")
print(f"Sample Mean: {sample_mean}")
print(f"Sample Standard Deviation: {sample_std_dev}")
print(f"Null Hypothesis Mean: {null_hypothesis_mean}")
print(f"Confidence Level: {confidence_level * 100}%")
print(f"Calculated t-statistic: {t_statistic:.2f}")
print(f"Degrees of Freedom: {degrees_of_freedom}")
print(f"Critical t-value: {t_critical:.2f}")
print(f"P-Value: {p_value:.4f}")

# Compare the p-value to the significance level (alpha, computed above) to make a decision
if p_value < alpha:
    print("Reject the null hypothesis: The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis: The drug is not significantly effective.")

Sample Size: 50
Sample Mean: 6
Sample Standard Deviation: 2.5
Null Hypothesis Mean: 0
Confidence Level: 95.0%
Calculated t-statistic: 16.97
Degrees of Freedom: 49
Critical t-value: 2.01
P-Value: 0.0000
Reject the null hypothesis: The drug is significantly effective.


# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

In [9]:
import math


sample_proportion = 0.65  # 65% satisfaction rate
confidence_level = 0.95  # 95% confidence level
sample_size = 500

# Calculate the critical value (z-score)
z = 1.96  # Approximate value for a 95% confidence interval

# Calculate the margin of error
margin_of_error = z * math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate the lower and upper limits of the confidence interval
lower_limit = sample_proportion - margin_of_error
upper_limit = sample_proportion + margin_of_error


print(f"Sample Proportion: {sample_proportion}")
print(f"Confidence Level: {confidence_level * 100}%")
print(f"Sample Size: {sample_size}")
print(f"Margin of Error: {margin_of_error:.4f}")
print(f"95% Confidence Interval: ({lower_limit:.4f}, {upper_limit:.4f})")

Sample Proportion: 0.65
Confidence Level: 95.0%
Sample Size: 500
Margin of Error: 0.0418
95% Confidence Interval: (0.6082, 0.6918)
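This normal-approximation interval relies on the sample being large enough; a common rule of thumb is that both n·p̂ and n·(1 − p̂) are at least 10. The sketch below checks that condition and derives the critical value from `NormalDist` instead of hard-coding 1.96; the interval is unchanged at four decimal places.

```python
import math
from statistics import NormalDist

p_hat, n, confidence = 0.65, 500, 0.95

# Success/failure condition for the normal approximation
successes, failures = n * p_hat, n * (1 - p_hat)   # about 325 and 175
assert successes >= 10 and failures >= 10, "normal approximation may be unreliable"

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"95% CI: ({p_hat - margin:.4f}, {p_hat + margin:.4f})")
```

Interpretation: we are 95% confident that between about 60.8% and 69.2% of the population is satisfied with their job.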


# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

In [10]:
import scipy.stats as stats


# Sample statistics for method A
mean_A = 85
std_dev_A = 6
sample_size_A = 30  # Assumed: the problem does not state the sample sizes

# Sample statistics for method B
mean_B = 82
std_dev_B = 5
sample_size_B = 30  # Assumed, as for sample A

# Set the significance level (alpha)
alpha = 0.01

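# Two-sample t-test from summary statistics (equal_var defaults to True, i.e. pooled variance);
# returns the t-statistic and the two-tailed p-value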
t_statistic, p_value = stats.ttest_ind_from_stats(mean_A, std_dev_A, sample_size_A, mean_B, std_dev_B, sample_size_B)


print(f"Sample A Mean: {mean_A}")
print(f"Sample A Standard Deviation: {std_dev_A}")
print(f"Sample A Size: {sample_size_A}")
print(f"Sample B Mean: {mean_B}")
print(f"Sample B Standard Deviation: {std_dev_B}")
print(f"Sample B Size: {sample_size_B}")
print(f"Significance Level (alpha): {alpha}")
print(f"T-Statistic: {t_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")


if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in student performance between the two teaching methods.")

Sample A Mean: 85
Sample A Standard Deviation: 6
Sample A Size: 30
Sample B Mean: 82
Sample B Standard Deviation: 5
Sample B Size: 30
Significance Level (alpha): 0.01
T-Statistic: 2.1039
P-Value: 0.0397
Fail to reject the null hypothesis: There is no significant difference in student performance between the two teaching methods.
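By default, `scipy.stats.ttest_ind_from_stats` assumes equal population variances and uses the pooled standard deviation. This sketch reproduces the t-statistic above by hand with the same assumed group sizes of 30. The 3-point difference in means is about 2.1 standard errors, and the resulting p-value (0.0397) would be significant at α = 0.05 but not at the chosen α = 0.01.

```python
import math

mean_a, sd_a, n_a = 85, 6, 30
mean_b, sd_b, n_b = 82, 5, 30

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)

# Standard error of the difference in means under equal variances
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_statistic = (mean_a - mean_b) / se_diff
df = n_a + n_b - 2

print(f"pooled variance = {pooled_var:.2f}, t = {t_statistic:.4f}, df = {df}")
```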


# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

In [11]:
import math

sample_mean = 65
population_mean = 60
population_std_dev = 8
sample_size = 50

# Calculate the critical value (z-score) for a 90% confidence interval
z = 1.645  # Approximate value for a 90% confidence interval

# Calculate the margin of error
margin_of_error = z * (population_std_dev / math.sqrt(sample_size))

# Calculate the lower and upper limits of the confidence interval
lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

print(f"Sample Mean: {sample_mean}")
print(f"Population Mean: {population_mean}")
print(f"Population Standard Deviation: {population_std_dev}")
print(f"Sample Size: {sample_size}")
print(f"Critical Value (z): {z}")
print(f"Margin of Error: {margin_of_error:.4f}")
print(f"90% Confidence Interval: ({lower_limit:.4f}, {upper_limit:.4f})")

Sample Mean: 65
Population Mean: 60
Population Standard Deviation: 8
Sample Size: 50
Critical Value (z): 1.645
Margin of Error: 1.8611
90% Confidence Interval: (63.1389, 66.8611)
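The critical value 1.645 is a rounded version of the 95th percentile of the standard normal distribution (a 90% two-sided interval leaves 5% in each tail). This sketch computes it exactly with `NormalDist`, which shifts the limits only in the fourth decimal place, and checks whether the stated population mean of 60 falls inside the interval. It does not, so a two-sided z-test of H0: μ = 60 at α = 0.10 would reject it, assuming the sample is random.

```python
import math
from statistics import NormalDist

sample_mean, sigma, n = 65, 8, 50
confidence = 0.90

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.6449
margin = z_star * sigma / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z* = {z_star:.4f}, 90% CI: ({lower:.4f}, {upper:.4f})")
print(f"Population mean 60 inside interval: {lower <= 60 <= upper}")
```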


# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [12]:
import scipy.stats as stats


sample_mean = 0.25  # Average reaction time
sample_std_dev = 0.05  # Standard deviation of reaction time
sample_size = 30  # Number of participants

# Set the null hypothesis (assumed baseline: the problem does not state the reaction time without caffeine)
null_hypothesis_mean = 0.24  # H0: mean reaction time equals the 0.24 s baseline (no caffeine effect)

# Set the confidence level
confidence_level = 0.90  # 90% confidence level

# ttest_1samp expects the raw observations; passing a single summary value returns nan.
# With only summary statistics, compute the one-sample t-statistic directly.
standard_error = sample_std_dev / (sample_size**0.5)
t_statistic = (sample_mean - null_hypothesis_mean) / standard_error
degrees_of_freedom = sample_size - 1

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)


print(f"Sample Mean: {sample_mean}")
print(f"Sample Standard Deviation: {sample_std_dev}")
print(f"Sample Size: {sample_size}")
print(f"Null Hypothesis Mean: {null_hypothesis_mean}")
print(f"Confidence Level: {confidence_level * 100}%")
print(f"Degrees of Freedom: {degrees_of_freedom}")
print(f"T-Statistic: {t_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

# Compare the p-value to the significance level (alpha) to make a decision
alpha = 1 - confidence_level
if p_value < alpha:
    print("Reject the null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis: No significant effect of caffeine on reaction time was detected.")

Sample Mean: 0.25
Sample Standard Deviation: 0.05
Sample Size: 30
Null Hypothesis Mean: 0.24
Confidence Level: 90.0%
Degrees of Freedom: 29
T-Statistic: 1.0954
P-Value: 0.2823
Fail to reject the null hypothesis: No significant effect of caffeine on reaction time was detected.
