### Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

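The t-test and z-test are both statistical tests used to make inferences about population parameters based on sample statistics. The key difference between them is whether the population standard deviation is known, which determines the reference distribution of the test statistic.

A z-test is used when the population standard deviation is known, or when the sample is large enough (a common rule of thumb is n > 30) that the sample standard deviation approximates it closely. A t-test is used when the population standard deviation is unknown and must be estimated from the sample. This matters most for small samples (typically n < 30), where the heavier tails of the t-distribution account for the extra uncertainty.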

An example scenario where a z-test would be appropriate is when we want to test the hypothesis that the population mean of a certain variable is equal to a specific value, and the population standard deviation is known. For instance, a researcher may want to test whether the average height of all adults in a population is 68 inches, given that the population standard deviation is known to be 2.5 inches. In this case, a z-test would be used to determine whether the sample mean height is significantly different from the hypothesized value of 68 inches.

On the other hand, an example scenario where a t-test would be appropriate is when we want to test the hypothesis that the population mean of a certain variable is equal to a specific value, but the population standard deviation is unknown. For example, a researcher may want to test whether a new drug treatment changes the memory performance of patients with Alzheimer's disease relative to a known reference score, and the sample size is small. In this case, the standard deviation is estimated from the sample, and a t-test would be used to determine whether the sample mean memory score is significantly different from the hypothesized value.

In summary, the choice between a t-test and a z-test depends on the nature of the data and the characteristics of the sample, particularly the sample size and whether the population standard deviation is known or unknown.
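The z-test scenario above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the hypothesized mean (68 inches) and population standard deviation (2.5 inches) come from the example, while the sample mean of 68.6 and the sample size of 100 are hypothetical values chosen only to make the code runnable.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme under H0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# mu0 and sigma are from the text; sample_mean and n are hypothetical
z, p = one_sample_z_test(sample_mean=68.6, mu0=68, sigma=2.5, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For a t-test the statistic has the same form, with the sample standard deviation in place of sigma and the p-value taken from a t-distribution with n - 1 degrees of freedom.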

### Q2: Differentiate between one-tailed and two-tailed tests.

In hypothesis testing, a one-tailed test is a statistical test where the alternative hypothesis is stated as being either greater than or less than the null hypothesis value, but not both. This means that the critical region of the test is located in one tail of the distribution.

For example, suppose we are interested in testing the hypothesis that a new drug reduces the average blood pressure of patients. A one-tailed test would involve testing the null hypothesis that the drug has no effect or increases the average blood pressure against the alternative hypothesis that the drug reduces the average blood pressure.

On the other hand, a two-tailed test is a statistical test where the alternative hypothesis states only that the parameter is different from the null hypothesis value, without specifying a direction. This means that the critical region of the test is split between both tails of the distribution.

For example, suppose we are interested in testing the hypothesis that the average height of students in a college is different from 68 inches. A two-tailed test would involve testing the null hypothesis that the average height is 68 inches against the alternative hypothesis that the average height is either greater or less than 68 inches.
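To make the distinction concrete, the sketch below computes left-tailed, right-tailed, and two-tailed p-values for the same test statistic. The statistic z = -1.8 is a hypothetical value, and a standard normal reference distribution is assumed.

```python
from statistics import NormalDist


def p_values(z):
    """Return left-tailed, right-tailed and two-tailed p-values for a z statistic."""
    cdf = NormalDist().cdf(z)
    return {
        "left": cdf,                          # H1: parameter is less than the null value
        "right": 1 - cdf,                     # H1: parameter is greater than the null value
        "two_sided": 2 * min(cdf, 1 - cdf),   # H1: parameter differs from the null value
    }


result = p_values(-1.8)  # hypothetical test statistic
for tail, p in result.items():
    print(f"{tail}: {p:.4f}")
```

With this statistic and a significance level of 0.05, the left-tailed test rejects the null hypothesis while the two-tailed test does not, because the two-tailed p-value is twice as large.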

### Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

In hypothesis testing, Type 1 and Type 2 errors refer to the two types of errors that can occur when making a decision based on a statistical test.

Type 1 error, also known as a false positive, occurs when the null hypothesis is rejected when it is actually true. This means that a significant result is found when there is no real effect in the population. The probability of making a Type 1 error is denoted by the symbol alpha (α), and is usually set at 0.05 or 0.01.

Example scenario: A new drug is being tested to see if it reduces the risk of heart disease. A Type 1 error would occur if the study concludes that the drug is effective in reducing the risk of heart disease, when in reality it has no effect.

Type 2 error, also known as a false negative, occurs when the null hypothesis is not rejected when it is actually false. This means that a significant effect exists in the population, but it is not detected in the sample. The probability of making a Type 2 error is denoted by the symbol beta (β).

Example scenario: A new medical test is being developed to detect a certain disease. A Type 2 error would occur if the test fails to detect the disease in patients who actually have it.

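It is important to balance the risks of Type 1 and Type 2 errors when designing a study and interpreting the results. For a fixed sample size, lowering α raises β and vice versa. The Type 1 error rate is set by the chosen significance level, so increasing the sample size does not change it; what a larger sample does is reduce β (increase the power of the test) at that significance level.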
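The effect of sample size on the Type 2 error rate can be illustrated with a short calculation. The sketch below assumes a right-tailed one-sample z-test with known standard deviation; the effect size of 1.0 and standard deviation of 5.0 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_rate(effect, sigma, n, alpha=0.05):
    """Beta for a right-tailed one-sample z-test when the true mean exceeds
    the null value by `effect` (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect / (sigma / sqrt(n))       # mean of the z statistic under H1
    # Beta is the probability that the statistic falls below the threshold under H1
    return std_normal.cdf(z_crit - shift)


betas = {n: type2_error_rate(effect=1.0, sigma=5.0, n=n) for n in (10, 30, 100)}
for n, beta in betas.items():
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

The significance level stays at 0.05 in every row, while beta falls as n grows.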

### Q4: Explain Bayes' theorem with an example.

Bayes' theorem is a probability formula that helps in determining the probability of an event based on prior knowledge of related conditions. It provides a way to update our beliefs as new evidence becomes available. It is named after an English mathematician and Presbyterian minister, Thomas Bayes. Bayes' theorem is often used in machine learning, artificial intelligence, and statistics to make predictions.

The formula for Bayes' theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

Where:

- P(A|B) is the posterior probability of A given B
- P(B|A) is the probability of B given A (the likelihood)
- P(A) is the prior probability of A
- P(B) is the marginal probability of B (the evidence)

Here is an example scenario to understand Bayes' theorem:

Suppose we have a rare disease that affects 1% of the population. There is a test to detect the disease, but it is not perfect. The test has a false positive rate of 5%, which means that 5% of healthy people will test positive for the disease. The test also has a false negative rate of 10%, which means that 10% of people with the disease will test negative.

Now, let's say a person tests positive for the disease. What is the probability that they actually have the disease?

Using Bayes' theorem, we can calculate the probability as follows:

- P(A) = 0.01 (the prior probability of having the disease)
- P(B|A) = 0.9 (the probability of testing positive given that the person has the disease)
- P(B|not A) = 0.05 (the probability of testing positive given that the person does not have the disease)
- P(not A) = 0.99 (the prior probability of not having the disease)

Plugging these values into the formula, we get:

P(A|B) = P(B|A) * P(A) / [P(B|A) * P(A) + P(B|not A) * P(not A)]
= 0.9 * 0.01 / [0.9 * 0.01 + 0.05 * 0.99]
= 0.009 / 0.0585
= 0.154

This means that the probability of the person actually having the disease, given that they tested positive, is only 15.4%. In other words, there is an 84.6% chance that the positive test result was a false positive.

This example demonstrates how Bayes' theorem can be used to update our beliefs about the probability of an event based on new information (in this case, the results of a diagnostic test).
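The calculation can be written as a small function. This is a minimal sketch; the function and parameter names are illustrative choices, and the inputs are the rates from the example (1% prevalence, 10% false negative rate, 5% false positive rate).

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Law of total probability: overall chance of a positive test
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Sensitivity is 1 minus the false negative rate: 1 - 0.10 = 0.90
p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.4f}")
```

Calling the function with a larger prior shows how strongly the result depends on how common the disease is.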





### Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values that is likely to contain an unknown population parameter with a certain level of confidence. It is a statistical measure of the uncertainty associated with a sample estimate of a population parameter.

To calculate a confidence interval, we need to have a sample of data and an estimate of the population parameter based on that sample. The formula for calculating a confidence interval for a population mean is:

CI = X̄ ± z*(σ/√n)

Where:

- X̄ is the sample mean
- σ is the population standard deviation
- n is the sample size
- z is the z-score based on the desired level of confidence

For example, suppose we want to estimate the average height of all students in a university with a 95% confidence interval. We take a random sample of 50 students and find that the sample mean height is 170 cm with a standard deviation of 5 cm, which we treat as σ. We can use the formula above to calculate the confidence interval:

CI = 170 ± 1.96*(5/√50) = 170 ± 1.39 = (168.6, 171.4)

This means that we are 95% confident that the true average height of all students in the university lies between 168.6 cm and 171.4 cm.
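The formula translates directly into code. The sketch below is one possible implementation, assuming the standard deviation is treated as known; the function name is an illustrative choice, and the critical value is derived from the confidence level with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based confidence interval for a population mean (sigma treated as known)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the height example: mean 170 cm, standard deviation 5 cm, n = 50
low, high = mean_confidence_interval(170, 5, 50)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing `confidence=0.90` or `confidence=0.99` gives the narrower or wider intervals for other confidence levels.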

### Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

A medical test has a true positive rate of 95% and a false positive rate of 5%. If a patient tests positive for a certain disease, what is the probability that they actually have the disease if the incidence rate of the disease in the population is 1%?

Solution:

Let A be the event that a patient has the disease and B be the event that the patient tests positive.

We are given:

- P(A) = 0.01 (incidence rate of the disease in the population)
- P(B|A) = 0.95 (true positive rate)
- P(B|not A) = 0.05 (false positive rate)

We want to find P(A|B), the probability that a patient actually has the disease given that they test positive.

Using Bayes' theorem:

P(A|B) = P(B|A) * P(A) / P(B)

To calculate P(B), we can use the law of total probability:

P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)

where P(not A) = 1 - P(A) = 0.99

Therefore:

P(B) = 0.95 * 0.01 + 0.05 * 0.99 = 0.059

Now we can calculate P(A|B):

P(A|B) = 0.95 * 0.01 / 0.059 = 0.161

So the probability that a patient actually has the disease given that they test positive is 16.1%.
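The same answer can be reached by counting people instead of multiplying probabilities, which often makes the result easier to believe. The sketch below assumes a hypothetical population of 100,000 and applies the rates from the problem; the function name is illustrative.

```python
def natural_frequencies(population, prevalence, true_positive_rate, false_positive_rate):
    """Count true and false positives in a population and return the share
    of positive tests that are true positives."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * true_positive_rate
    false_positives = healthy * false_positive_rate
    share = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share


# Population size is a hypothetical choice; the rates come from the problem
tp, fp, ppv = natural_frequencies(100_000, 0.01, 0.95, 0.05)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, P(A|B) = {ppv:.3f}")
```

Because healthy people vastly outnumber sick people, even a 5% false positive rate produces far more false positives than there are true positives.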

### Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

To calculate the 95% confidence interval, we need to use the following formula:

Confidence interval = sample mean ± (critical value x standard error)

where the critical value is obtained from a t-table or z-table depending on the sample size and level of significance, and the standard error is calculated as the standard deviation divided by the square root of the sample size.

Since the sample size and population standard deviation are not given in this problem, we will assume a sample size of n = 30, treat the standard deviation of 5 as a good approximation of the population standard deviation, and use the z-distribution for our calculations.

The critical value for a 95% confidence level and a two-tailed test is 1.96 (obtained from a z-table).

Standard error = 5 / √30 = 0.9129

Confidence interval = 50 ± (1.96 x 0.9129) = 50 ± 1.79 = [48.21, 51.79]

Interpretation: We are 95% confident that the true population mean falls between 48.21 and 51.79. In other words, if we were to repeat the sampling process many times and calculate a 95% confidence interval each time, we would expect about 95% of those intervals to contain the true population mean.
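The repeated-sampling interpretation can be checked with a small simulation. This sketch assumes a normal population whose true mean is 50 and standard deviation is 5 (in practice the true mean is unknown; it is fixed here only so coverage can be counted), draws many samples of size 30, and reports the fraction of 95% intervals that contain the true mean. The seed and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(true_mean=50.0, sigma=5.0, n=30, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample and build its confidence interval
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"Fraction of intervals containing the true mean: {rate:.3f}")
```

The fraction should land close to 0.95. The confidence level describes the long-run behaviour of the procedure, not the probability that any single computed interval is correct.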

### Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error is the half-width of a confidence interval: the distance above and below a sample statistic within which the true population parameter is expected to fall at a given level of confidence. It is affected by several factors including the sample size, the level of confidence, and the variability in the data.

As the sample size increases, the margin of error decreases. A larger sample does not change the variability of the population, but the standard error of the estimate shrinks in proportion to 1/√n, so the sample statistic becomes a more precise estimate of the parameter. A smaller sample, on the other hand, gives a less precise estimate and a wider margin of error.

For example, suppose we want to estimate the proportion of people in a city who support a new policy proposal. We take a sample of 1000 people and find that 60% of them support the policy. The 95% margin of error is 1.96 × √(0.6 × 0.4 / 1000) ≈ 3%, so the confidence interval ranges from about 57% to 63%. If we had taken a larger sample of 5000 people, the margin of error would have been about 1.4%, resulting in a more precise estimate of the population proportion.
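The relationship between sample size and margin of error can be computed directly. The sketch below uses the normal-approximation formula for a proportion with the 60% support figure from the example; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


margins = {n: proportion_margin(0.60, n) for n in (1000, 4000, 5000)}
for n, m in margins.items():
    print(f"n = {n}: margin of error = {m:.2%}")
```

Because the margin scales with 1/√n, quadrupling the sample size (1000 to 4000) halves the margin of error; going from 1000 to 5000 shrinks it by a factor of √5.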

### Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

To calculate the z-score, we can use the formula:

z = (x - mu) / sigma

where x is the data point, mu is the population mean, and sigma is the population standard deviation.

Plugging in the values, we get:

z = (75 - 70) / 5
z = 1

Interpreting the result, we can say that the data point of 75 is one standard deviation above the population mean of 70.
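A short sketch of the same calculation follows. The percentile step goes beyond the formula above and assumes the population is normally distributed, which the problem does not state.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z = z_score(75, 70, 5)
# Assumption: normal population, so the z-score maps to a percentile
percentile = NormalDist().cdf(z)
print(f"z = {z:.1f}, share of population below this value = {percentile:.1%}")
```

Under the normality assumption, a value one standard deviation above the mean is higher than roughly 84% of the population.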

### Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

Given data:

- Sample size (n) = 50
- Sample mean (x̄) = 6
- Sample standard deviation (s) = 2.5
- Confidence level = 95% (α = 0.05)

We need to test the hypothesis:

- Null hypothesis: The new drug is not significantly effective in weight loss, i.e., population mean weight loss (µ) <= 0
- Alternative hypothesis: The new drug is significantly effective in weight loss, i.e., population mean weight loss (µ) > 0

Since the population standard deviation is unknown and is estimated by the sample standard deviation, we will use a one-tailed t-test.

The formula to calculate the t-score is:

t = (x̄ - µ) / (s / √n)

Where x̄ is the sample mean, µ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.

The t-score can be calculated as:

t = (6 - 0) / (2.5 / √50) = 6 / 0.3536 = 16.97

The degrees of freedom (df) for the t-test is (n - 1) = 49. Using a t-table or statistical software, we can find the one-tailed critical t-value at α = 0.05 and 49 degrees of freedom to be 1.677.

Since our calculated t-score (16.97) is greater than the critical t-value (1.677), we can reject the null hypothesis and conclude that the new weight loss drug is significantly effective at a 95% confidence level.

Therefore, we can infer that the new drug is likely to cause weight loss in the population.
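The test statistic can be recomputed with a few lines of code. This is a minimal sketch using only the standard library: the critical value 1.677 is the table value quoted above and is hard-coded, not derived, since the standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a one-sample t-test."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Values from the problem: mean loss 6 lb, s = 2.5 lb, n = 50, H0: mu <= 0
t_stat = one_sample_t(sample_mean=6, mu0=0, s=2.5, n=50)

T_CRIT = 1.677  # one-tailed critical value for alpha = 0.05, df = 49 (from a t-table)
reject = t_stat > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```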

### Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

We are given the sample size n = 500 and the sample proportion p = 0.65. We can use the following formula to calculate the confidence interval for the true proportion at a 95% confidence level:

CI = p ± z*sqrt((p*(1-p))/n)

where z is the critical value from the standard normal distribution at the desired confidence level, which is 1.96 for a 95% confidence level.

Substituting the values, we get:

CI = 0.65 ± 1.96*sqrt((0.65*(1-0.65))/500)

Simplifying:

CI = 0.65 ± 1.96*0.0213 = 0.65 ± 0.042

Therefore, the 95% confidence interval for the true proportion of people who are satisfied with their job is (0.608, 0.692). We can interpret this result as follows: we can be 95% confident that the true proportion of people who are satisfied with their job is between 0.608 and 0.692.

### Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

To determine if the two teaching methods have a significant difference in student performance, we will use a two-sample t-test with a significance level of 0.01.

Null Hypothesis: The mean difference in scores between the two teaching methods is zero.
Alternative Hypothesis: The mean difference in scores between the two teaching methods is not zero.

Let's calculate the t-statistic and the p-value.

    from scipy import stats

    # Summary statistics from the problem statement
    sample_A_mean = 85
    sample_A_sd = 6
    sample_B_mean = 82
    sample_B_sd = 5

    # The problem does not give sample sizes; these values are assumed
    n_A = 30
    n_B = 40

    # Pooled variance (assumes equal population variances)
    degrees_of_freedom = n_A + n_B - 2
    pooled_variance = (((n_A - 1) * sample_A_sd ** 2) + ((n_B - 1) * sample_B_sd ** 2)) / degrees_of_freedom
    standard_error = (pooled_variance * ((1 / n_A) + (1 / n_B))) ** 0.5
    t_statistic = (sample_A_mean - sample_B_mean) / standard_error

    # Two-sided p-value from the t-distribution
    p_value = stats.t.sf(abs(t_statistic), degrees_of_freedom) * 2
    print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")

With the assumed sample sizes, the calculated t-statistic is about 2.28 and the two-sided p-value is about 0.026. Since the p-value is greater than the significance level of 0.01, we fail to reject the null hypothesis: the data do not show a significant difference in student performance between the two teaching methods at the 0.01 level.

Note: The t-test assumes that the populations have equal variances. If the variances are not equal, a Welch's t-test should be used instead.
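As a sketch of the Welch alternative mentioned in the note, the code below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from summary statistics. The sample sizes of 30 and 40 are the same assumed values used in the calculation above; they are not given in the problem.

```python
from math import sqrt


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Per-sample variance of the mean; no pooling, so unequal variances are allowed
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Means and standard deviations from the problem; sample sizes are assumed
t_stat, df = welch_t(85, 6, 30, 82, 5, 40)
print(f"Welch t = {t_stat:.3f}, df = {df:.1f}")
```

The p-value is then read from a t-distribution with the (generally non-integer) degrees of freedom returned here. Welch's test gives up a little power when the variances really are equal, in exchange for remaining valid when they are not.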

### Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

We can use the formula for a confidence interval with known population standard deviation:

CI = x̄ ± z*(σ/√n)

Where:

- x̄ = sample mean = 65
- σ = population standard deviation = 8
- n = sample size = 50
- z = z-score for the desired confidence level = 1.645 (from standard normal distribution table for 90% confidence level)

Plugging in the values, we get:

CI = 65 ± 1.645*(8/√50)
= 65 ± 1.645*1.131
= 65 ± 1.86

Therefore, the 90% confidence interval for the true population mean is (63.14, 66.86). We are 90% confident that the true population mean falls within this range.

### Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

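The problem does not state a baseline mean reaction time without caffeine (µ0), which the test needs as its reference value. The hypotheses have the form:

Null Hypothesis H0: µ = µ0 (There is no significant effect of caffeine on reaction time)
Alternate Hypothesis H1: µ ≠ µ0 (There is a significant effect of caffeine on reaction time)

Level of Significance α = 0.1

Degree of freedom = n - 1 = 30 - 1 = 29

Critical value of t for a two-tailed test with 29 degrees of freedom and α = 0.1 is ±1.699.

t = (x̄ - µ0) / (s / √n)
Standard error = s / √n = 0.05 / √30 = 0.00913

Taking µ0 = 0 as the reference value:

t = (0.25 - 0) / 0.00913 = 27.39

Since the calculated t-value (27.39) is greater than the critical t-value (1.699), we reject the null hypothesis.

Note that with µ0 = 0 this only shows that the mean reaction time is not zero. With a real baseline, the null hypothesis is rejected whenever the sample mean of 0.25 seconds differs from µ0 by more than 1.699 × 0.00913 ≈ 0.0155 seconds.

Conclusion: Against a reference value of 0, the sample provides enough evidence to reject the null hypothesis at a 90% confidence level. Whether caffeine itself has a significant effect on reaction time depends on the baseline mean, which the problem does not provide.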