## Q1. What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

The t-test and the z-test are both used to test hypotheses about means, either a sample mean against a reference value or two means against each other. They differ in their assumptions about the population variance and the sample size.

A z-test is used when the population variance is known and the sample size is large (usually greater than 30).

Example:

    Suppose a company wants to determine if the average weight of their product is equal to the weight specified on the packaging, which is 50 grams. They take a random sample of 100 products and measure their weights. The population variance of the weights is known to be 4 grams² (a standard deviation of 2 grams). In this case, a z-test would be appropriate to test the hypothesis that the mean weight of the products is 50 grams.
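As a rough sketch of how the z-test in this example could be computed: the example gives no sample mean, so the value `sample_mean = 50.3` below is hypothetical and only there to make the arithmetic concrete.

```python
import math
from statistics import NormalDist

mu0 = 50.0               # weight printed on the packaging, in grams
sigma = math.sqrt(4.0)   # known population variance of 4 -> standard deviation of 2 g
n = 100                  # sample size from the example
sample_mean = 50.3       # hypothetical value, not given in the example

# z statistic: distance of the sample mean from mu0 in standard-error units
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With this assumed sample mean the p-value is above 0.05, so the null hypothesis of a 50 gram mean would not be rejected at the 5% level.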

A t-test is used when the population variance is unknown and has to be estimated from the sample. This matters most when the sample size is small (usually less than 30).

Example:

    Suppose a researcher wants to determine if a new treatment is effective in reducing blood pressure. They randomly assign 20 participants to receive the new treatment or a placebo and measure their blood pressure before and after the treatment. The researcher does not know the population variance of the blood pressure measurements. In this case, a t-test would be appropriate to test the hypothesis that the mean reduction in blood pressure is greater in the treatment group compared to the placebo group.

## Q2. Differentiate between one-tailed and two-tailed tests.

A one-tailed test, also known as a directional test, tests the hypothesis in only one direction. It is used when there is a priori information or a specific prediction about the direction of the effect. The alternative hypothesis states that the parameter (for example, the mean) is greater than a reference value, or that it is less than that value, but not both.

A two-tailed test, also known as a non-directional test, tests the hypothesis in both directions. It is used when there is no a priori information or specific prediction about the direction of the effect. The alternative hypothesis states that the parameter differs from the reference value in either direction.
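The practical difference shows up in the p-value. The sketch below uses a hypothetical z statistic of 1.8 (not taken from any example in this notebook) and compares the one-tailed and two-tailed p-values at a significance level of 0.05.

```python
from statistics import NormalDist

z = 1.8        # hypothetical test statistic, chosen for illustration
alpha = 0.05   # significance level

std_normal = NormalDist()

# One-tailed (H1: parameter is greater): area in the upper tail only
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed (H1: parameter is different): area in both tails
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.4f}, reject: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject: {p_two_tailed < alpha}")
```

For this statistic the one-tailed test rejects the null hypothesis while the two-tailed test does not, which is why the direction of the hypothesis has to be fixed before looking at the data.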

## Q3. Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

## Type 1 Error:
A Type 1 error occurs when the null hypothesis is rejected when it should not be. It means that a significant difference is found when there is actually no difference. The probability of making a Type 1 error is denoted by the alpha level, which is typically set at 0.05 or 0.01.

Example:

    Suppose a new drug is tested to see if it is effective in reducing pain. The null hypothesis is that the drug has no effect on pain, and in reality it has none. However, the study finds a significant effect and rejects the null hypothesis, which is a Type 1 error (a false positive). This can result in an ineffective drug being approved, leading to patients receiving a treatment that does not work.

## Type 2 Error:
A Type 2 error occurs when the null hypothesis is not rejected when it should be. It means that no significant difference is found when there is actually a difference. The probability of making a Type 2 error is denoted by beta. Studies are commonly designed so that beta is about 0.10 or 0.20, which corresponds to a statistical power (1 - beta) of 90% or 80%.

Example:

    Suppose a new drug is tested to see if it is effective in reducing pain. The null hypothesis is that the drug has no effect on pain, but in reality it does reduce pain. However, the study finds no significant effect and fails to reject the null hypothesis, which is a Type 2 error (a false negative). This can result in an effective drug being rejected, leading to patients not receiving a treatment they need.

## Q4. Explain Bayes's theorem with an example.


Bayes's theorem is a mathematical formula used in probability theory to calculate the conditional probability of an event based on prior knowledge of related events. It provides a way to update the probability of a hypothesis based on new evidence.

The formula for Bayes's theorem is:

    P(A|B) = (P(B|A) * P(A)) / P(B)

    Where:
    P(A|B) is the probability of event A given that event B has occurred.
    P(B|A) is the probability of event B given that event A has occurred.
    P(A) is the prior probability of event A.
    P(B) is the marginal probability of event B (the overall probability of the evidence).

Example:

    Suppose a doctor knows that the probability of a patient having a certain disease is 0.05. The doctor also knows that if a person has the disease, the probability of a certain test detecting it is 0.95. However, if a person does not have the disease, there is still a 5% chance that the test will produce a false positive result.

If a patient tests positive for the disease, what is the probability that the patient actually has the disease?

Using the formula, we can calculate the probability of the patient having the disease given that they have tested positive as follows:

    P(A|B) = (P(B|A) * P(A)) / (P(B|A) * P(A) + P(B|~A) * P(~A))
    = (0.95 * 0.05) / (0.95 * 0.05 + 0.05 * 0.95)
    = 0.5 or 50%
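The same calculation can be written as a small function. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative choices, not part of the original problem.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # P(B): total probability of a positive test, over diseased and healthy patients
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Values from the example: P(A) = 0.05, P(B|A) = 0.95, P(B|~A) = 0.05
p_disease_given_positive = posterior(prior=0.05,
                                     sensitivity=0.95,
                                     false_positive_rate=0.05)
print(p_disease_given_positive)
```

Even with a test that is right 95% of the time, a positive result only raises the probability of disease to 50%, because the disease is rare to begin with.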



## Q5. What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values that is likely to contain the true value of a population parameter with a certain level of confidence. It is used in statistics to estimate an unknown population parameter, such as a mean or proportion, based on a sample of data.

The formula for calculating a confidence interval for a population mean is:

    CI = X̄ ± z * (s / sqrt(n))

    Where:
    CI is the confidence interval
    X̄ is the sample mean
    z is the z-score associated with the desired level of confidence (e.g., 1.96 for a 95% confidence interval)
    s is the sample standard deviation
    n is the sample size

Example:

    Suppose we want to estimate the mean height of all adult males in a certain population. We take a random sample of 50 adult males and measure their heights. The sample mean height is 175 cm, and the sample standard deviation is 8 cm.

We want to calculate a 95% confidence interval for the population mean height.

First, we need to find the z-score associated with a 95% confidence level. From a standard normal distribution table, we can see that the z-score for a 95% confidence level is 1.96.

Next, we can use the formula to calculate the confidence interval:

    CI = 175 ± 1.96 * (8 / sqrt(50))
    = 175 ± 2.22
    = (172.78, 177.22)

Therefore, we can be 95% confident that the true population mean height falls between 172.78 and 177.22 cm.
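The formula can be checked numerically. The sketch below recomputes the interval from the stated sample size, mean, and standard deviation; the helper name `mean_confidence_interval` is an illustrative choice, and the z-score comes from Python's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """z-based confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Height example: n = 50, mean = 175 cm, sd = 8 cm
low, high = mean_confidence_interval(175, 8, 50)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For the height example this gives a margin of about 2.22 cm, that is, an interval of roughly 172.78 to 177.22 cm.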

## Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

Suppose a company sells two types of products: product A and product B. Historically, the company has found that 40% of its customers buy product A and 60% buy product B. The company is now introducing a new product, product C, and wants to estimate the probability that a customer will buy it. They conduct a survey of 100 randomly selected customers and find that 30 of them bought product A, 40 bought product B, and 10 bought product C.

Using Bayes' Theorem, we have:

    P(C | data) = P(data | C) * P(C) / P(data)

We can calculate P(data | C), P(C) and P(data) as follows:

    P(data | C) = 10/100 = 0.1, since 10 out of 100 surveyed customers bought product C.
    P(C) = 10/100 = 0.1, taking the survey share as the prior for product C.
    P(data) = P(data | A) * P(A) + P(data | B) * P(B) + P(data | C) * P(C) = (30/100 * 0.4) + (40/100 * 0.6) + (10/100 * 0.1) = 0.37.

Finally, plug the values into the formula:

    P(C | data) = P(data | C) * P(C) / P(data)
    = 0.1 * 0.1 / 0.37
    = 0.027 or about 2.7%

So, based on the survey data, we estimate that the probability that a customer will buy product C is about 2.7%. Note that the priors used here (0.4, 0.6, and 0.1) sum to 1.1 rather than 1, so the figure illustrates the mechanics of the formula rather than a properly normalized posterior.

## Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.




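The final conclusion is:

    The 95% confidence interval is 48.04 to 51.96. This assumes a sample size of n = 25 (the question does not state one), so the margin of error is 1.96 * 5 / sqrt(25) = 1.96.
    Interpretation: we can be 95% confident that the true population mean lies between 48.04 and 51.96.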
    
 ![7.jpeg](attachment:2658c082-c2eb-4cd9-ab32-d90dbfc8c62c.jpeg)
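The question does not give a sample size, and the interval depends on it. The sketch below assumes n = 25, which is the value that reproduces the interval 48.04 to 51.96, and also shows a second assumed size (n = 100) to make the dependence visible. The helper name `ci_95` is illustrative.

```python
import math
from statistics import NormalDist

def ci_95(mean, sd, n):
    """95% z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Assumed sample sizes; the question itself gives none
for n in (25, 100):
    low, high = ci_95(50, 5, n)
    print(f"n = {n}: ({low:.2f}, {high:.2f})")

ci_n25 = ci_95(50, 5, 25)
```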


## Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error is a measure of the uncertainty in an estimated population parameter based on a sample. It is the half-width of the confidence interval: the largest distance between the point estimate and the true population parameter that is expected at the chosen level of confidence.

Sample size affects the margin of error in that a larger sample size will generally result in a smaller margin of error. This is because a larger sample size provides more information about the population and reduces the impact of random sampling variability on the point estimate. Specifically, the margin of error is inversely proportional to the square root of the sample size, so quadrupling the sample size halves the margin of error.

Example:

    Suppose we want to estimate the proportion of people in a city who support a certain policy. We take a random sample of 500 people and find that 60% of them support the policy. The 95% confidence interval has a margin of error of about 4.3%, from 1.96 * sqrt(0.6 * 0.4 / 500). This means that we are 95% confident that the true proportion of people who support the policy in the city falls between 55.7% and 64.3%.

    Now, suppose we increase the sample size to 1000 and the point estimate is still 60%. The margin of error drops to about 3.0%. This means that we are 95% confident that the true proportion of people who support the policy in the city falls between 57.0% and 63.0%. As we can see, a larger sample size has resulted in a smaller margin of error and a narrower confidence interval.
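The effect of sample size can be checked directly. This sketch uses the standard normal-approximation margin of error for a proportion, z * sqrt(p(1 - p) / n); the function name `margin_of_error` and the extra sample size of 2000 are illustrative additions.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

moe_500 = margin_of_error(0.60, 500)
moe_1000 = margin_of_error(0.60, 1000)
moe_2000 = margin_of_error(0.60, 2000)   # 4x the original sample size

for n, moe in ((500, moe_500), (1000, moe_1000), (2000, moe_2000)):
    print(f"n = {n}: margin of error = {moe:.3f}")
```

Going from 500 to 2000 respondents cuts the margin of error in half, which is the square-root relationship in action.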


## Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

Final answer is:

    Z-score = (75 - 70) / 5 = 1
    This means that the data point lies 1 standard deviation above the population mean.
    
![9.jpeg](attachment:7826052f-d34d-48e0-b8a9-e6a2b91731d1.jpeg)
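A short sketch of the calculation follows. The percentile at the end is an extra interpretation that assumes the population is normally distributed, which the question does not state.

```python
from statistics import NormalDist

value = 75
population_mean = 70
population_sd = 5

# z-score: distance from the mean in units of standard deviation
z = (value - population_mean) / population_sd

# Assumption: normal population. Fraction of values below this data point.
percentile = NormalDist().cdf(z)

print(f"z = {z}, percentile = {percentile:.4f}")
```

Under the normality assumption, a z-score of 1 places the data point above roughly 84% of the population.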
    

## Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

Final answer is:

    Reject the null hypothesis
    The drug is significantly effective
    
![10.jpeg](attachment:9b047db4-0282-4466-977b-9e4f9ef5fe22.jpeg)
    
    
    
![10-2.jpeg](attachment:3e4f2dc0-2430-4b26-af07-a5caf9ec5752.jpeg)
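The test statistic can be reproduced in a few lines. This sketch assumes a null hypothesis of zero mean weight loss and a one-sided alternative (the drug produces weight loss); the critical value 1.677 is read from a t-table for 49 degrees of freedom at a one-sided alpha of 0.05 and is hard-coded rather than computed.

```python
import math

n = 50
sample_mean = 6.0     # average weight loss in pounds
sample_sd = 2.5
mu0 = 0.0             # assumed null hypothesis: no weight loss on average
t_critical = 1.677    # t-table value, df = 49, one-sided alpha = 0.05

# One-sample t statistic
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

reject_null = t_stat > t_critical
print(f"t = {t_stat:.2f}, reject null: {reject_null}")
```

The statistic of about 16.97 is far beyond the critical value, which agrees with the conclusion to reject the null hypothesis.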



## Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

Final answer is:
 
    Confidence Interval: 0.608 to 0.692
    
   ![11.jpeg](attachment:ae223d5f-d75e-43f8-914f-28547fd70a22.jpeg) 
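As a check, the interval can be computed with the normal approximation for a proportion, p ± z * sqrt(p(1 - p) / n). This is a minimal sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_hat = 0.65   # sample proportion of satisfied respondents
n = 500        # number of people surveyed

z = NormalDist().inv_cdf(0.975)                    # about 1.96 for 95% confidence
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

low, high = p_hat - margin, p_hat + margin
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

This gives a margin of about 0.042, that is, an interval of roughly 0.608 to 0.692.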
    

## Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

Final answer is:

    Fail to reject the null hypothesis
    No significant difference in student performance between the two teaching methods

![12.jpeg](attachment:12.jpeg)







![12-2.jpeg](attachment:12-2.jpeg)

## Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

Final answer is:

    Confidence Interval: 63.139 to 66.861
    
![13.jpeg](attachment:13.jpeg)
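Because the population standard deviation is given, a z-based interval around the sample mean applies. The sketch below computes it with the 90% critical value (about 1.645) taken from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

population_sd = 8
n = 50
sample_mean = 65
confidence = 0.90

# Two-sided critical value for 90% confidence (5% in each tail)
z = NormalDist().inv_cdf(0.5 + confidence / 2)
margin = z * population_sd / math.sqrt(n)

low, high = sample_mean - margin, sample_mean + margin
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

The margin comes out at about 1.86, so the interval is roughly 63.14 to 66.86.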

## Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

Final answer is:

    Fail to reject the null hypothesis
    Caffeine has no significant effect on reaction time
    
![14.jpeg](attachment:14.jpeg)



![14-2.jpeg](attachment:14-2.jpeg)