Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

#### Differences:
#### 1) Population Standard Deviation:
#### Z-Test: Assumes that the population standard deviation (σ) is known.
#### T-Test: Used when the population standard deviation (σ) is unknown, and the sample standard deviation (s) is used as an estimate.
#### 2) Sample Size:
#### Z-Test: Typically used for larger sample sizes (usually n≥30).
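#### T-Test: Appropriate for any sample size when the population standard deviation is unknown. The distinction matters most for small samples, where the heavier tails of the t-distribution account for the extra uncertainty from estimating σ with s.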
#### 3) Distribution:
#### Z-Test: Assumes a normal distribution of the population or a sufficiently large sample size due to the Central Limit Theorem.
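#### T-Test: Also assumes an approximately normal population, which matters most for small samples. It is reasonably robust to moderate departures from normality, and as the sample size grows the t-distribution approaches the standard normal distribution.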
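The sample-size point can be checked numerically. The sketch below (assuming SciPy is installed) compares two-sided 95% critical values from the t-distribution with the z critical value: the t values are larger for small samples and approach 1.96 as the degrees of freedom grow.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)  # two-sided z critical value, about 1.96

# t critical values for several sample sizes (degrees of freedom = n - 1)
t_crits = {n: stats.t.ppf(1 - alpha / 2, df=n - 1) for n in (5, 10, 30, 100, 1000)}

for n, t_crit in t_crits.items():
    print(f"n = {n:>4}: t critical = {t_crit:.3f}  (z critical = {z_crit:.3f})")
```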
#### Example for a Z-Test:
#### Imagine you are analyzing the scores of a standardized test that claims to have a known population standard deviation of 10 points. You want to test whether the average score of a sample of 100 students is significantly different from the claimed population mean of 500 points. In this case, you could use a z-test because the population standard deviation is known.

```python
# Example in Python using a one-sample z-test
import scipy.stats as stats

sample_mean = 505  # sample mean score
population_mean = 500  # hypothesized population mean
population_std_dev = 10  # known population standard deviation
sample_size = 100

# Calculate the z-statistic: (x_bar - mu0) / (sigma / sqrt(n))
z_statistic = (sample_mean - population_mean) / (population_std_dev / (sample_size ** 0.5))

# Two-tailed p-value; norm.sf(x) equals 1 - norm.cdf(x) but stays accurate far in the tail
p_value = 2 * stats.norm.sf(abs(z_statistic))

if p_value < 0.05:
    print("Reject the null hypothesis. There is evidence of a significant difference.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence of a significant difference.")
```


Reject the null hypothesis. There is evidence of a significant difference.


#### Example Scenario for a T-Test:
#### Now consider a study of the average height of a sample of 20 individuals where the population standard deviation is unknown. To test whether the population mean height is less than 175 cm (H0: μ=175, H1: μ<175), you would use a one-sample t-test.

```python
# Example in Python using a one-sample, left-tailed t-test
import scipy.stats as stats

sample_mean_height = 170  # sample mean height in centimeters
hypothesized_population_mean = 175  # hypothesized population mean height
sample_std_dev_height = 8  # sample standard deviation of height
sample_size_height = 20

# Calculate the t-statistic
t_statistic_height = (sample_mean_height - hypothesized_population_mean) / (sample_std_dev_height / (sample_size_height ** 0.5))

# Degrees of freedom for a one-sample t-test
degrees_of_freedom_height = sample_size_height - 1

# Significance level
alpha_height = 0.05

# Critical value for a left-tailed test (H1: mu < 175): the lower alpha quantile, about -1.729
critical_value_height = stats.t.ppf(alpha_height, degrees_of_freedom_height)

# Reject H0 if the t-statistic falls in the left-tail critical region
if t_statistic_height < critical_value_height:
    print("Reject the null hypothesis. There is evidence that the average height is less than 175 cm.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that the average height is less than 175 cm.")
```


Reject the null hypothesis. There is evidence that the average height is less than 175 cm.


Q2: Differentiate between one-tailed and two-tailed tests.

### One-Tailed Test: 
#### 1) Also known as a "directional" or "one-sided" test. It tests whether a parameter is greater than a certain value, or whether it is less than that value, but not both. The alternative hypothesis (H1) specifies the direction, and the null hypothesis (H0) covers the remaining case (often written as an equality).
#### 2) The critical region is on one side of the distribution (either the right or left side).
#### 3) Used when you are specifically interested in determining if the parameter is greater than or less than a certain value, not both.
#### Example:
#### H0:μ=10 (population mean is equal to 10)
#### H1:μ>10 (population mean is greater than 10)
### Two-Tailed Test:
#### 1) Also known as a "non-directional" or "two-sided" test. It tests whether a parameter differs from a certain value in either direction. The null hypothesis (H0) usually states that there is no difference, and the alternative hypothesis (H1) states that the parameter is not equal to that value.
#### 2) The critical region is split between both tails of the distribution, with α/2 in each tail.
#### 3) Used when you want to detect a difference from a certain value regardless of whether the parameter is greater or smaller.
#### Example:
#### H0:μ=10 (population mean is equal to 10)
#### H1:μ≠10 (population mean is not equal to 10)
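The practical difference shows up in the p-value. In this sketch the test statistic z = 1.8 is a hypothetical value chosen only for illustration: the same statistic is significant at α = 0.05 under a one-tailed test but not under a two-tailed test.

```python
from scipy import stats

alpha = 0.05
z = 1.8  # hypothetical test statistic for H0: mu = 10, chosen only for illustration

# One-tailed (H1: mu > 10): all of alpha sits in the right tail
p_one_tailed = stats.norm.sf(z)
crit_one_tailed = stats.norm.ppf(1 - alpha)      # about 1.645

# Two-tailed (H1: mu != 10): alpha is split, alpha/2 in each tail
p_two_tailed = 2 * stats.norm.sf(abs(z))
crit_two_tailed = stats.norm.ppf(1 - alpha / 2)  # about 1.960

print(f"one-tailed: p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed: p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

This is why the direction of a one-tailed test should be fixed before looking at the data.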

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

### Type I Error (False Positive): 
#### Occurs when the null hypothesis (H0) is rejected even though it is true, leading to the incorrect conclusion that there is a significant effect or difference. The probability of a Type I error equals the significance level (α), which the researcher chooses before the test (e.g. α = 0.05).
#### Example Scenario:
#### Suppose a medical researcher is testing a new drug's effectiveness on a certain condition, and the null hypothesis is that the drug has no effect. A Type I error occurs if the researcher incorrectly concludes that the drug is effective (rejects the null hypothesis) when, in reality, it has no effect.
### Type II Error (False Negative):
#### Occurs when the null hypothesis (H0) is not rejected even though it is false. The probability of a Type II error is denoted by β. It is tied to the power of the test, the probability of correctly rejecting a false null hypothesis: power = 1 − β.
#### Example Scenario:
#### Continuing with the drug example, a Type II error occurs if the researcher fails to reject the null hypothesis (concludes the drug has no effect) when, in reality, the drug does have a positive effect.
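Both error rates can be estimated by simulation. This sketch assumes normally distributed data with standard deviation 1, samples of size 25, and a true effect of 0.5 when H0 is false; all of these values are arbitrary choices for illustration, not part of the drug example. When H0 is true the rejection rate should land near α, and when H0 is false the non-rejection rate estimates β.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
alpha = 0.05
n, trials = 25, 10_000

def rejection_rate(true_mean):
    """Fraction of simulated samples in which a one-sample t-test rejects H0: mu = 0."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(trials, n))
    p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(p_values < alpha)

type_1_rate = rejection_rate(0.0)  # H0 is true, so every rejection is a Type I error
power = rejection_rate(0.5)        # H0 is false, so every rejection is correct
type_2_rate = 1 - power            # H0 is false, so every non-rejection is a Type II error

print(f"Estimated Type I error rate: {type_1_rate:.3f} (alpha = {alpha})")
print(f"Estimated power: {power:.3f}, Type II error rate (beta): {type_2_rate:.3f}")
```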

Q4: Explain Bayes's theorem with an example.

#### Bayes's Theorem is a mathematical formula that describes the probability of an event based on prior knowledge of conditions related to the event. It is named after the Reverend Thomas Bayes, who first formulated the theorem. Bayes's Theorem is widely used in statistics and probability theory and is particularly useful for updating probabilities based on new evidence.
#### The formula for Bayes's Theorem is given by: P(A|B)=(P(B|A)*P(A))/P(B)
#### where : 
##### P(A|B) is the probability of event A occurring given that B has occurred (the posterior probability).
##### P(B|A) is the probability of event B occurring given that A has occurred (the likelihood).
##### P(A) is the prior probability of event A.
##### P(B) is the marginal probability of event B (the evidence), taken over all the ways B can occur.
#### Example :
#### Suppose a rare disease affects 1 in 1,000 people in a population. Let A be the event of having the disease and B the event of testing positive. The test is 99% accurate for both people with and without the disease: it is positive for 99% of people who have the disease and negative for 99% of people who do not. Let's use Bayes's Theorem to calculate the probability of having the disease given a positive test result.
#### P(A) : Prior probability of having the disease = 0.001 (1 in 1,000).
#### P(B|A) : Probability of testing positive given that the person has the disease = 0.99 (the test's sensitivity).
#### P(B|-A) : Probability of testing positive given that the person does not have the disease = 0.01 (1% false positive rate).
#### Now, we can use Bayes's Theorem to calculate the posterior probability :
#### P(A|B)=(P(B|A) * P(A))/P(B) = (0.99 * 0.001)/P(B) 
#### To find P(B), we can use the Law of Total Probability:
#### P(B)=P(B|A) * P(A)+P(B|-A) * P(-A) = (0.99 * 0.001)+(0.01 * 0.999) 
#### Now, we can substitute this value back into Bayes's Theorem: 
#### P(A|B)=(0.99 * 0.001)/((0.99 * 0.001)+(0.01 * 0.999))
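#### Evaluating this expression gives P(A|B) = 0.00099/0.01098 ≈ 0.0902. Even after a positive result, the probability of having the disease is only about 9%, because healthy people vastly outnumber sick ones and produce most of the positive results.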
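The calculation above can be written as a small function. The name `bayes_posterior` and its parameter names are hypothetical, chosen for this sketch; it applies the Law of Total Probability for P(B) and then Bayes's Theorem.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(A|B) for a binary event A and a positive result B.

    prior:               P(A)
    sensitivity:         P(B|A)
    false_positive_rate: P(B|-A)
    """
    # Law of Total Probability: P(B) = P(B|A)P(A) + P(B|-A)P(-A)
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes's Theorem: P(A|B) = P(B|A)P(A) / P(B)
    return sensitivity * prior / p_b

posterior = bayes_posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(disease | positive test) = {posterior:.4f}")  # 0.0902
```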

Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

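#### A confidence interval is a range of values, computed from a sample, that is constructed to contain the true population parameter at a stated confidence level. A 95% confidence level means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter. It expresses the uncertainty involved in estimating a population parameter from a sample.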
#### The general form of a confidence interval is:
#### Confidence Interval = Point Estimate +- Margin of error 
#### where:
##### Point Estimate: The sample statistic (e.g. the sample mean or proportion) used to estimate the population parameter.
##### Margin of Error: A measure of the uncertainty or variability in the estimate.
#### The formula for a confidence interval for the population mean (μ) is given by:
#### Confidence interval = x_bar +- Z * (sigma/sqrt(n))
#### where : 
##### x_bar is the sample mean.
##### Z is the critical value from the standard normal distribution corresponding to the desired confidence level.
##### sigma is the population standard deviation.
##### n is the sample size.
#### Example: Calculating a 95% Confidence Interval for the Mean:
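#### Suppose you want to estimate the average height of a population. You take a random sample of 40 individuals and find that the sample mean height (x_bar) is 175 cm and the sample standard deviation (s) is 8 cm. Because σ is unknown, s is used in place of sigma; with n = 40 the z critical value is a common approximation (a t critical value with 39 degrees of freedom, about 2.023, would give a slightly wider interval).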
#### 1) Determine the Critical Value (Z):
#### For a 95% confidence interval, the critical value is approximately 1.96 (obtained from a standard normal distribution table).
#### 2) Calculate the Margin of Error:
#### Margin of Error = 1.96 * (8/sqrt(40))
#### 3) Calculate the Confidence Interval:
#### Confidence interval = 175 +- (1.96 * (8/sqrt(40)))
#### This gives a margin of error of about 2.48 cm and an interval of approximately (172.52, 177.48) cm, the range within which we are 95% confident the true average height of the population lies.
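The three steps above can be sketched as a helper. The function name `z_confidence_interval` is hypothetical; it looks up the critical value with SciPy instead of a table and, as in the text, treats the sample standard deviation as if it were σ.

```python
import math
from scipy import stats

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Point estimate ± Z * sd / sqrt(n), with Z taken from the standard normal."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

low, high = z_confidence_interval(mean=175, sd=8, n=40)
print(f"95% CI: ({low:.2f}, {high:.2f}) cm")  # (172.52, 177.48)
```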

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

### Example Scenario: Medical Test for a Rare Disease
#### Suppose there is a rare disease, and a diagnostic test has been developed to detect it. Let's denote the following:
#### A: The event of having the disease.
#### B: The event of testing positive for the disease.
#### The goal is to calculate P(A∣B), the probability of actually having the disease given a positive test result.
#### Given probabilities:
#### P(A) : Prior probability of having the disease = 0.001 (1 in 1,000).
#### P(B|A) : Probability of testing positive given that the person has the disease = 0.99 (the test's sensitivity).
#### P(B|-A) : Probability of testing positive given that the person does not have the disease = 0.01 (1% false positive rate).
#### Now, we can use Bayes's Theorem to calculate the posterior probability :
#### P(A|B)=(P(B|A) * P(A))/P(B) = (0.99 * 0.001)/P(B) 
#### To find P(B), we can use the Law of Total Probability:
#### P(B)=P(B|A) * P(A)+P(B|-A) * P(-A) = (0.99 * 0.001)+(0.01 * 0.999) = 0.01098, where P(-A) = 1 − P(A) = 0.999 is the probability of not having the disease.
#### Now, we can substitute this value back into Bayes's Theorem: 
#### P(A|B)=(0.99 * 0.001)/((0.99 * 0.001)+(0.01 * 0.999)) = 0.0902 
#### So, given a positive test result, the probability of actually having the disease is approximately 0.0902 or 9.02%. 
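The result can be cross-checked with a Monte Carlo simulation. This sketch draws a hypothetical population of one million people using the given probabilities and estimates P(A|B) as the fraction of positive testers who actually have the disease; the estimate varies slightly with the random seed but should land close to 0.09.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
n_people = 1_000_000

# A: each person has the disease with probability P(A) = 0.001
has_disease = rng.random(n_people) < 0.001

# B: test positive with P(B|A) = 0.99 if sick, P(B|-A) = 0.01 if healthy
p_positive = np.where(has_disease, 0.99, 0.01)
tests_positive = rng.random(n_people) < p_positive

# Estimate P(A|B): the fraction of positive testers who actually have the disease
estimate = has_disease[tests_positive].mean()
print(f"Simulated P(disease | positive) ≈ {estimate:.3f}")
```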

Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

#### The formula for a confidence interval for the population mean (μ) is given by:
#### Confidence interval = x_bar +- Z * (sigma/sqrt(n))
#### where : 
##### x_bar is the sample mean.
##### Z is the critical value from the standard normal distribution corresponding to the desired confidence level.
##### sigma is the population standard deviation.
##### n is the sample size
#### Given :
##### For a 95% confidence interval, the critical value is approximately 1.96 (obtained from a standard normal distribution table).
##### x_bar = 50
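##### sigma = 5 (the given standard deviation, used as the estimate of σ)
##### n = 30: the question does not state the sample size, so we assume a sample of 30, a common threshold for treating a sample as large.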
#### Margin of Error = Z * (sigma/sqrt(n)) = 1.96 * (5/sqrt(30))
#### Confidence interval = 50 +- (1.96 * (5/sqrt(30))) = (48.21 , 51.79)
#### Interpretation : "We are 95% confident that the true population mean lies between 48.21 and 51.79." The interval is centered on the sample mean of 50 and extends one margin of error (about 1.79) on each side; a different sample size would change its width.
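Because the given standard deviation comes from the sample, a t critical value is arguably more appropriate than Z. This sketch, under the same assumed n = 30, computes both intervals; the t interval is slightly wider.

```python
import math
from scipy import stats

mean, sd = 50, 5
n = 30  # assumed, as in the text; the question does not give a sample size
se = sd / math.sqrt(n)

z_crit = stats.norm.ppf(0.975)         # about 1.960
t_crit = stats.t.ppf(0.975, df=n - 1)  # about 2.045 with 29 degrees of freedom

z_ci = (mean - z_crit * se, mean + z_crit * se)
t_ci = (mean - t_crit * se, mean + t_crit * se)
print(f"z interval: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t interval: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
```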

Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

#### The margin of error (ME) in a confidence interval measures the precision of an estimate of a population parameter. It is the half-width of the interval: the interval extends one margin of error on either side of the point estimate. For the population mean (μ) with a known standard deviation (σ), the margin of error is:
#### Margin of error = Z * (σ/sqrt(n))
#### where:
##### Z is the critical value from the standard normal distribution corresponding to the desired confidence level.
##### σ is the population standard deviation.
##### n is the sample size.
#### How Sample Size Affects the Margin of Error:
#### 1) Inverse Relationship: The margin of error is inversely proportional to the square root of the sample size. As the sample size increases, the margin of error decreases, leading to a more precise estimate.
#### 2) Larger Sample Size, Smaller Margin of Error: Increasing the sample size results in a smaller margin of error because the standard error (σ/sqrt(n)) becomes smaller. A larger sample provides more information about the population, reducing the uncertainty associated with the estimate.
#### Example Scenario : Let's consider a scenario where we want to estimate the average height of a certain population. We conduct two separate studies, one with a sample size of 50 (n1=50) and another with a sample size of 200 (n2=200). For both studies, we use a 95% confidence level.
#### Assuming the population standard deviation (σ) is known to be 5 cm, we can calculate the margin of error for each study:
#### ME1 = 1.96 * 5/sqrt(50) ≈ 1.39 cm
#### ME2 = 1.96 * 5/sqrt(200) ≈ 0.69 cm
#### The margin of error for Study 2 (ME2) is half that of Study 1 (ME1): quadrupling the sample size halves the margin of error, because ME is proportional to 1/sqrt(n). The larger sample in Study 2 therefore gives a more precise estimate of the population mean.
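A short sketch of the comparison; the helper name `margin_of_error` is hypothetical. It reproduces ME1 and ME2 and confirms the factor-of-two ratio that comes from sqrt(200/50) = 2.

```python
import math
from scipy import stats

def margin_of_error(sigma, n, confidence=0.95):
    """Z * sigma / sqrt(n) for a mean with known population standard deviation."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

me1 = margin_of_error(sigma=5, n=50)
me2 = margin_of_error(sigma=5, n=200)
print(f"ME1 = {me1:.2f} cm, ME2 = {me2:.2f} cm, ratio = {me1 / me2:.1f}")
```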

Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

#### The z-score is a measure that describes how many standard deviations a particular data point is from the mean of a population. The formula for calculating the z-score is:
#### Z= (X−μ)/σ
#### Given :
#### X=75
#### μ=70
#### σ=5 
#### z-score = Z = (75-70)/5 = 1
#### Interpretation:
#### The calculated z-score of 1 indicates that the data point with a value of 75 is 1 standard deviation above the population mean. If the population is normally distributed, a z-score of 1 corresponds to about the 84th percentile, meaning the value is higher than approximately 84% of the data points in the distribution.
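The z-score and its percentile can be computed directly. This sketch uses the standard normal CDF from SciPy, which is only a valid percentile if the population is normal.

```python
from scipy import stats

x, mu, sigma = 75, 70, 5
z = (x - mu) / sigma            # standard deviations above the mean
percentile = stats.norm.cdf(z)  # P(Z <= z), valid if the population is normal

print(f"z = {z:.2f}, percentile ≈ {percentile:.1%}")  # z = 1.00, percentile ≈ 84.1%
```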

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

#### H0: The drug produces no mean weight loss (μ=0).
#### H1: The mean weight loss with the drug differs from zero (μ≠0).
#### Given information:
#### Sample mean (x_bar): 6 pounds
#### Sample standard deviation (s): 2.5 pounds
#### Sample size (n): 50
#### Significance level (α): 0.05 (for a 95% confidence level)
#### The formula for the t-statistic is: 
#### t = x_bar/(s/sqrt(n)) = 6/(2.5/sqrt(50)) = 16.97 
#### Now, we need to compare the calculated t-statistic with the critical t-value for a two-tailed test at a 95% confidence level with n−1 degrees of freedom. In this case, degrees of freedom is 49.
#### We can use a t-distribution table or statistical software to find the critical t-value. For a two-tailed test with α=0.05 and 49 degrees of freedom, the critical t-value is approximately ±2.0096.
#### Since |t| = 16.97 is greater than the critical value 2.0096, we reject the null hypothesis.
#### Interpretation:
#### At a 95% confidence level, we have enough evidence to conclude that the weight loss drug is significantly effective, as the mean weight loss is different from zero.
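The test can be reproduced from the summary statistics alone. This sketch computes the t-statistic, the two-tailed critical value, and a p-value, which the text does not report but which gives the same decision.

```python
import math
from scipy import stats

x_bar, s, n = 6.0, 2.5, 50  # sample mean, sample SD (pounds), sample size
mu0, alpha = 0.0, 0.05      # hypothesized mean under H0, significance level

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed p-value
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, critical = ±{t_crit:.4f}, p = {p_value:.2e}, reject H0: {reject_h0}")
```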

Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

#### Confidence interval = p_hat +- ( Z * sqrt((p_hat * (1-p_hat))/n )) 
#### where:
##### p_hat is the sample proportion (percentage converted to a decimal),
##### Z is the critical value from the standard normal distribution corresponding to the desired confidence level,
##### n is the sample size.
#### Given information:
##### Sample proportion (p_hat): 65% or 0.65 (converted to a decimal)
##### Sample size (n): 500
##### Confidence level: 95%
##### For a 95% confidence level, the critical value is approximately 1.96, i.e. Z = 1.96.
#### Confidence interval = 0.65 +- ( 1.96 * sqrt((0.65 * (1-0.65))/500 ) ) = (0.608 , 0.692)
#### Interpretation:
#### "We are 95% confident that the true proportion of people satisfied with their job is between 0.608 and 0.692 (60.8% to 69.2%)."
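The same normal-approximation (Wald) interval for a proportion, sketched with SciPy supplying the critical value:

```python
import math
from scipy import stats

p_hat, n, confidence = 0.65, 500, 0.95
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error of a sample proportion
low, high = p_hat - z * se, p_hat + z * se

print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")  # (0.608, 0.692)
```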

Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

#### The null hypothesis (H0) and alternative hypothesis (H1) are defined as follows:
#### H0:μ1−μ2=0
#### H1:μ1−μ2≠0
#### Sample A : x_bar1 = 85 , s1 = 6
#### Sample B : x_bar2 = 82 , s2 = 5 
#### Significance level (α): 0.01
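#### Assumptions : The problem does not give the sample sizes n1 and n2, so the test cannot be completed numerically; we assume both are reasonably large (often considered n≥30). The t-statistic below uses the two sample variances separately (the unpooled, or Welch, form), so it does not require equal variances.
#### Find the Critical Value : For a two-tailed test with α=0.01, the critical t-value depends on the degrees of freedom, and hence on the unknown sample sizes. It is approximately ±2.626 at 100 degrees of freedom and approaches ±2.576 for very large samples.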
#### Calculate the t-Statistic:
#### t = (x_bar1-x_bar2)/sqrt((s1^2/n1)+(s2^2/n2)) = (85-82)/sqrt((36/n1)+(25/n2))
#### Compare with the Critical Value : If |t| > critical value, reject the null hypothesis and conclude that the two teaching methods produce a significant difference in student performance; otherwise, fail to reject it.
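To show how the test runs once sample sizes are known, this sketch assumes hypothetical sizes n1 = n2 = 30, which the problem does not give. It uses SciPy's `ttest_ind_from_stats` with `equal_var=False` (Welch's test, the unpooled form of the t-statistic in the text).

```python
from scipy import stats

n1 = n2 = 30  # hypothetical sample sizes; the problem does not state them

result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=n1,
    mean2=82, std2=5, nobs2=n2,
    equal_var=False,  # Welch's t-test: unpooled variances
)
alpha = 0.01
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {result.pvalue < alpha}")
```

With these assumed sizes, t ≈ 2.10 and p ≈ 0.04, which is not below 0.01, so H0 would not be rejected; other sample sizes could change the conclusion.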

Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

#### The formula for a confidence interval for the population mean (μ) is given by:
#### Confidence interval = x_bar +- Z * (sigma/sqrt(n))
#### where : 
##### x_bar is the sample mean.
##### Z is the critical value from the standard normal distribution corresponding to the desired confidence level.
##### sigma is the population standard deviation.
##### n is the sample size
#### Given :
##### For a 90% confidence interval, the critical value is approximately 1.645 (obtained from a standard normal distribution table).
##### x_bar = 65
##### sigma = 8
##### n=50.
#### Margin of Error = Z * (sigma/sqrt(n)) = 1.645 * (8/sqrt(50))
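#### Confidence interval = 65 +- (1.645 * (8/sqrt(50))) ≈ 65 ± 1.86 = (63.14, 66.86)
#### Interpretation:
#### "We are 90% confident that the true population mean is between 63.14 and 66.86." The stated population mean of 60 lies outside this interval, so the sample is not consistent with a mean of 60 at this confidence level.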
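A sketch of the 90% interval; note that the critical value is the 95th percentile of the standard normal, since 5% is left in each tail.

```python
import math
from scipy import stats

x_bar, sigma, n, confidence = 65, 8, 50, 0.90
z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.645 (5% in each tail)
margin = z * sigma / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"90% CI: ({low:.2f}, {high:.2f})")  # (63.14, 66.86)
```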

Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

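#### H0: Caffeine has no effect on reaction time, modeled here as μ=0.
#### H1: Caffeine has a significant effect on reaction time (μ≠0).
#### Note: the problem gives no caffeine-free baseline, so μ=0 stands in for it; a meaningful test would replace 0 with the known baseline mean reaction time.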
#### Given information:
#### Sample mean (x_bar): 0.25 seconds
#### Sample standard deviation (s): 0.05 seconds
#### Sample size (n): 30
#### Significance level (α): 0.10 (for a 90% confidence level)
#### The formula for the t-statistic is: 
#### t = x_bar/(s/sqrt(n)) = 0.25/(0.05/sqrt(30)) = 27.39 
#### Now, we need to compare the calculated t-statistic with the critical t-value for a two-tailed test at a 90% confidence level with n−1 degrees of freedom. In this case, degrees of freedom is 29.
#### We can use a t-distribution table or statistical software to find the critical t-value. For a two-tailed test with α=0.10 and 29 degrees of freedom, the critical t-value is approximately ±1.699.
#### Since 27.39 is greater than 1.699, we reject the null hypothesis.
#### Interpretation:
#### At a 90% confidence level, we reject H0: the mean reaction time is significantly different from zero. Concluding that caffeine itself affects reaction time would require comparing against a caffeine-free baseline or a control group.
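This sketch wraps the one-sample t-test in a hypothetical helper, `one_sample_t_test`, so the hypothesized mean can be changed. It first reproduces the test against μ0 = 0 and then uses a hypothetical caffeine-free baseline of 0.27 s, a value invented purely for illustration, to show what a comparison against a real baseline would look like.

```python
import math
from scipy import stats

def one_sample_t_test(x_bar, s, n, mu0, alpha):
    """Two-tailed one-sample t-test from summary statistics."""
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, t_crit, p_value

# As in the text: test against mu0 = 0
t_zero, t_crit, p_zero = one_sample_t_test(0.25, 0.05, 30, mu0=0.0, alpha=0.10)

# Hypothetical caffeine-free baseline of 0.27 s (not given in the problem)
t_base, _, p_base = one_sample_t_test(0.25, 0.05, 30, mu0=0.27, alpha=0.10)

print(f"vs 0:    t = {t_zero:.2f}, critical = ±{t_crit:.3f}, p = {p_zero:.2e}")
print(f"vs 0.27: t = {t_base:.2f}, p = {p_base:.4f}")
```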