# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

The main difference between a t-test and a z-test is the way they handle the standard deviation. A t-test is used when the population standard deviation is unknown and must be estimated from the sample data, while a z-test is used when the population standard deviation is known.

A t-test is typically used for smaller sample sizes (less than 30) and when the population standard deviation is unknown. For example, suppose a researcher wants to test whether the average weight of a sample of 20 male rats is different from the average weight of a sample of 20 female rats. Since the standard deviation of rat weights in the population is unknown, the researcher would use a two-sample t-test to compare the means of the two groups.

On the other hand, a z-test is used for larger sample sizes (greater than 30) and when the population standard deviation is known. For example, suppose a researcher wants to test whether a new drug is more effective than an existing drug for treating a particular disease. The researcher has a sample of 100 patients who have been randomly assigned to one of the two groups. Since the standard deviation of patient response in the population is known, the researcher would use a two-sample z-test to compare the means of the two groups.

# Q2: Differentiate between one-tailed and two-tailed tests.

In hypothesis testing, a one-tailed test is a statistical test in which the null hypothesis is rejected only if the sample statistic falls significantly far from the hypothesized population parameter in one pre-specified direction (greater than it, or less than it, but not both). The alternative hypothesis is defined in terms of this direction only. In other words, a one-tailed test tests for a directional effect.

On the other hand, a two-tailed test is a statistical test in which the null hypothesis is rejected if the sample statistic is significantly different from the hypothesized population parameter in either direction. The alternative hypothesis is defined in terms of a difference, not a specific direction. In other words, a two-tailed test tests for a non-directional effect.

For example, suppose a researcher is interested in testing whether the mean height of students differs from a reference population mean height. In both kinds of test, the null hypothesis is that the students' mean height is equal to the reference mean. If the researcher uses a two-tailed test, the alternative hypothesis is that the mean is not equal to the reference mean, in either direction. If the researcher uses a one-tailed test, the alternative hypothesis names a single direction chosen before looking at the data: either that the mean is greater than the reference mean, or that it is less than the reference mean, but not both.
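
The practical consequence shows up in the p-value. The sketch below is a minimal illustration, assuming the test statistic follows a standard normal distribution under the null hypothesis; the function name `p_values` and the statistic 1.8 are made up for the example.

```python
from statistics import NormalDist

def p_values(z):
    """Return (two-tailed, upper-tailed, lower-tailed) p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    upper = 1.0 - std_normal.cdf(z)      # H1: parameter is greater than the null value
    lower = std_normal.cdf(z)            # H1: parameter is less than the null value
    two_sided = 2.0 * min(upper, lower)  # H1: parameter differs in either direction
    return two_sided, upper, lower

# 1.8 is an illustrative test statistic, not taken from any example above
two_sided, upper, lower = p_values(1.8)
print(f"two-tailed p = {two_sided:.4f}, upper-tailed p = {upper:.4f}")
```

With this statistic the upper-tailed test rejects at the 0.05 level while the two-tailed test does not, which is why the direction has to be chosen before the data are examined.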

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Type 1 and Type 2 errors are two types of errors that can occur in hypothesis testing.

Type 1 error occurs when a null hypothesis is rejected even though it is actually true. In other words, it is the error of concluding that there is a statistically significant effect or difference when one does not truly exist. The probability of making a Type 1 error is denoted by the symbol alpha (α) and is also called the level of significance. The level of significance is typically set at 0.05 or 0.01, which means that, when the null hypothesis is true, there is a 5% or 1% chance of making a Type 1 error, respectively.

Example scenario for Type 1 error:
Suppose a researcher is testing a new drug and wants to see if it is effective in reducing pain. They set the level of significance at 0.05 and find that the drug has a significant effect on pain reduction. However, it turns out that the drug does not actually work, and the significant result is due to random chance. This is a Type 1 error.

Type 2 error occurs when a null hypothesis is not rejected even though it is actually false. In other words, it is the error of failing to detect a real effect or difference. The probability of making a Type 2 error is denoted by the symbol beta (β) and depends on factors such as sample size, effect size, and level of significance.

Example scenario for Type 2 error:
Suppose a researcher is testing a new drug and wants to see if it is effective in reducing pain. They set the level of significance at 0.05 and find that the drug does not have a significant effect on pain reduction. However, it turns out that the drug does actually work, but the sample size was too small to detect the effect. This is a Type 2 error.
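
The dependence of β on sample size can be made concrete. The following is a rough sketch, assuming a two-tailed one-sample z-test with known standard deviation; the function name `type2_error` and the effect size and standard deviation used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(effect, sigma, n, alpha=0.05):
    """Beta for a two-tailed one-sample z-test when the true mean is shifted by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * sqrt(n) / sigma  # mean of the z statistic under the true effect
    # Probability that the statistic lands inside the non-rejection region
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Hypothetical effect of 1 unit with a standard deviation of 4 units
for n in (10, 40, 160):
    beta = type2_error(effect=1.0, sigma=4.0, n=n)
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

As n grows, β shrinks and power (1 - β) rises, which matches the scenario in which a real effect is missed because the sample is too small.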

# Q4: Explain Bayes's theorem with an example.

Bayes's theorem is a fundamental concept in probability theory that describes the relationship between conditional probabilities. It provides a way to update our beliefs about the likelihood of an event occurring based on new evidence or information.

Bayes's theorem can be stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)

where:

P(A|B) is the probability of event A occurring given that event B has occurred
P(B|A) is the probability of event B occurring given that event A has occurred
P(A) is the prior probability of event A occurring
P(B) is the marginal probability of event B occurring (the overall probability of B, whether or not A occurs)
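
As an example, the formula can be applied to a small made-up case. The sketch below is only an illustration: the function name `bayes_posterior`, the spam scenario, and all three probabilities are hypothetical.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0.0 < marginal_b <= 1.0:
        raise ValueError("P(B) must be in (0, 1]")
    return likelihood_b_given_a * prior_a / marginal_b

# Hypothetical numbers: A = "email is spam", B = "email contains the word 'offer'"
p_spam = 0.30              # P(A)
p_offer_given_spam = 0.80  # P(B|A)
p_offer = 0.50             # P(B)

posterior = bayes_posterior(p_spam, p_offer_given_spam, p_offer)
print(f"P(spam | 'offer') = {posterior:.2f}")
```

Seeing the word raises the probability of spam from the prior of 0.30 to a posterior of 0.48, because the word is more common in spam than in email overall.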

# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values that is likely to contain the true value of a population parameter with a certain level of confidence. It is a measure of the precision or uncertainty of an estimate, such as a sample mean or proportion. The confidence level is typically expressed as a percentage, such as 95%, and represents the proportion of intervals that would contain the true population parameter if the sampling process were repeated many times.

To calculate a confidence interval, we need to know the sample statistics, such as the sample mean, sample size, and sample standard deviation. The formula for calculating the confidence interval varies depending on the type of parameter being estimated and the assumptions about the population distribution.

Here's an example of calculating a confidence interval for a population mean using the t-distribution:

Suppose we want to estimate the average height of all male students in a certain university using a random sample of 50 male students. We measure their heights and obtain a sample mean of 175 cm and a sample standard deviation of 5 cm. We want to calculate a 95% confidence interval for the population mean height.

First, we need to determine the t-score for a 95% confidence interval with 49 degrees of freedom (50 - 1). We can look up this value in a t-distribution table or use a calculator to find that the t-score is 2.009.

Next, we can use the formula for a confidence interval for the population mean:

CI = x̄ ± t*(s/√n)

where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and t is the t-score.

Plugging in the values, we get:

CI = 175 ± 2.009*(5/√50)

CI = 175 ± 1.42

CI = (173.58, 176.42)

This means that we can be 95% confident that the true population mean height of male students in this university falls between 173.58 cm and 176.42 cm. If we repeated the sampling process many times, approximately 95% of the calculated confidence intervals would contain the true population mean height.
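
The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `mean_confidence_interval` is made up, and the critical value 2.009 is passed in from the t-table rather than computed.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, sample_sd, n, critical_value):
    """Return (lower, upper) for sample_mean ± critical_value * sample_sd / sqrt(n)."""
    margin = critical_value * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values from the height example; 2.009 is the tabulated t-score for 49 degrees of freedom
lower, upper = mean_confidence_interval(175.0, 5.0, 50, critical_value=2.009)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```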

# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

A clinic performs a test for a certain disease. The test is 95% accurate, meaning that it correctly identifies positive cases 95% of the time and correctly identifies negative cases 95% of the time. The disease occurs in 1% of the population. If a patient tests positive for the disease, what is the probability that they actually have the disease?

Solution:

Let's use the following notation for the problem:

D: the event that the patient has the disease
D': the event that the patient does not have the disease
+: the event that the patient tests positive for the disease
-: the event that the patient tests negative for the disease

We are asked to find P(D | +), the probability that the patient has the disease given that they test positive.

We can start by using Bayes' Theorem:

P(D | +) = P(+ | D) P(D) / P(+)

where P(+ | D) is the probability of testing positive given that the patient has the disease (i.e., the test sensitivity), P(D) is the prior probability of the patient having the disease (i.e., 1% in this case), and P(+) is the probability of testing positive, which can be calculated using the law of total probability:

P(+) = P(+ | D) P(D) + P(+ | D') P(D')

where P(+ | D') is the probability of testing positive given that the patient does not have the disease (i.e., the false positive rate, which equals 1 minus the test specificity), and P(D') is the complement of P(D), which is 99% in this case.

We are given that the test sensitivity and specificity are both 95%, so we have:

P(+ | D) = 0.95
P(+ | D') = 1 - 0.95 = 0.05

Plugging in these values, we get:

P(+) = 0.95 * 0.01 + 0.05 * 0.99 = 0.0095 + 0.0495 = 0.059

Now we can calculate the desired probability:

P(D | +) = P(+ | D) P(D) / P(+) = 0.95 * 0.01 / 0.059 = 0.161

So the probability that a patient actually has the disease given a positive test result is about 16%.
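
The same calculation can be written as a short function, which makes it easy to see how the answer moves with prevalence. This is a sketch only; the function and parameter names are chosen for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(D | +) using Bayes' theorem and the law of total probability."""
    false_positive_rate = 1.0 - specificity  # P(+ | D')
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1.0 - prevalence))  # P(+)
    return sensitivity * prevalence / p_positive

p = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.95)
print(f"P(D | +) = {p:.3f}")
```

The posterior stays low despite the accurate test because the disease is rare: false positives from the 99% of healthy patients outnumber true positives from the 1% who are ill.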

# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

To calculate the 95% confidence interval, we need to use the formula:

CI = x̄ ± z*(σ/√n)

where:

CI = confidence interval
x̄ = sample mean (50)
z = z-score for the desired confidence level (1.96 for 95% confidence)
σ = population standard deviation (5; the standard deviation given in the problem is treated here as the population value)
n = sample size

Plugging in the values, we get:

CI = 50 ± 1.96*(5/√n)

Since we don't have information on the sample size (n), we can't calculate the exact confidence interval. However, we can interpret the results if we assume a sample size of 100:

CI = 50 ± 1.96*(5/√100) = 50 ± 0.98

This means that we can be 95% confident that the true population mean falls within the range of 49.02 to 50.98. In other words, if we were to take multiple samples from the same population and calculate their 95% confidence intervals, we would expect 95% of those intervals to contain the true population mean.
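
Because the interval depends on the assumed sample size, it helps to compute it for more than one value of n. The sketch below assumes the z-based formula above; the function name `z_interval` and the extra sample sizes 25 and 400 are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for mean ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

# 100 is the assumed sample size; 25 and 400 are added only for comparison
for n in (25, 100, 400):
    lower, upper = z_interval(50.0, 5.0, n)
    print(f"n = {n:3d}: ({lower:.2f}, {upper:.2f})")
```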






# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error is the half-width of a confidence interval: the amount added to and subtracted from the point estimate to form the interval. It describes how far the sample estimate is likely to be from the true population parameter at a given level of confidence.

Sample size affects the margin of error in a confidence interval in that larger sample sizes generally result in smaller margins of error. This is because larger sample sizes provide more information about the population, and therefore the estimate of the population parameter is more precise.

For example, suppose a pollster wants to estimate the percentage of voters in a city who support a particular candidate. If the pollster takes a sample of 100 voters and finds that 60% support the candidate, the margin of error for a 95% confidence interval would be around +/- 9.6%. However, if the pollster takes a sample of 1000 voters and finds that 60% support the candidate, the margin of error for a 95% confidence interval would be around +/- 3.0%. This means that the larger sample size has resulted in a smaller margin of error.

In general, a larger sample size results in a smaller margin of error because it reduces the effect of random sampling variation on the estimate of the population parameter.
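
The relationship can be checked numerically. This is a minimal sketch using the normal-approximation formula z√(p(1-p)/n) for a sample proportion; the function name `proportion_margin` and the intermediate sample size 400 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_margin(p_hat, n, confidence=0.95):
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n) for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(p_hat * (1.0 - p_hat) / n)

for n in (100, 400, 1000):
    margin = proportion_margin(0.60, n)
    print(f"n = {n:4d}: margin = +/- {100 * margin:.1f} percentage points")
```

Since the margin scales with 1/√n, quadrupling the sample size halves the margin of error.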





# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

The z-score can be calculated using the formula:

z = (x - μ) / σ

where x is the data point, μ is the population mean, and σ is the population standard deviation.

Plugging in the values given in the problem, we get:

z = (75 - 70) / 5

z = 1

The z-score for this data point is 1.

Interpreting the results, we can say that the data point of 75 is one standard deviation above the population mean of 70. This information can be used to compare the data point to other data points in the distribution that have been standardized using the same population mean and standard deviation.
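
A short sketch of the same calculation follows. The percentile step relies on an assumption that is not stated in the problem, namely that the population is normally distributed; the function name `z_score` is chosen for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

z = z_score(75.0, 70.0, 5.0)

# Assumption (not stated in the problem): the population is normally distributed
share_below = NormalDist().cdf(z)
print(f"z = {z:.1f}; about {100 * share_below:.1f}% of values fall below 75")
```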







# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

To conduct a hypothesis test using a t-test, we need to set up the null and alternative hypotheses:

Null hypothesis (H0): μ = 0, i.e. the mean weight loss in the population is zero and the drug has no effect.
Alternative hypothesis (H1): μ ≠ 0, i.e. the mean weight loss in the population differs from zero and the drug has an effect.

We will use a two-tailed test since we are interested in determining if the drug is significantly effective, whether it leads to weight loss or weight gain.

The significance level, alpha, is 0.05 (95% confidence level).

We have a sample size of 50, sample mean (x̄) of 6 pounds, and sample standard deviation (s) of 2.5 pounds.

First, we need to calculate the t-statistic:

t = (x̄ - μ) / (s / sqrt(n))

where x̄ is the sample mean, μ is the population mean (null hypothesis value), s is the sample standard deviation, and n is the sample size.

t = (6 - 0) / (2.5 / sqrt(50))
t = 16.97

Next, we need to find the critical t-value for a two-tailed test with 49 degrees of freedom (n-1) and alpha = 0.05. Using a t-table or calculator, the critical t-value is approximately ±2.009.

Since our calculated t-value (16.97) is greater than the critical t-value (±2.009), we reject the null hypothesis and conclude that the weight loss drug is significantly effective at the 95% confidence level.
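
The test statistic and decision can be reproduced in a few lines. This is a sketch under the assumptions of the worked solution: the function name `one_sample_t` is made up, and the critical value 2.009 is taken from the t-table rather than computed.

```python
from math import sqrt

def one_sample_t(sample_mean, sample_sd, n, null_mean=0.0):
    """t statistic for H0: population mean == null_mean."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - null_mean) / standard_error

t_stat = one_sample_t(6.0, 2.5, 50)
t_crit = 2.009  # tabulated two-tailed critical value, alpha = 0.05, 49 degrees of freedom
print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > t_crit}")
```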

# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

To calculate the 95% confidence interval for the true proportion of people who are satisfied with their job, we can use the following formula:

CI = p ± z√(p(1-p)/n)

where:
p = sample proportion (0.65)
z = z-score for 95% confidence level (1.96)
n = sample size (500)

Plugging in the values, we get:

CI = 0.65 ± 1.96√(0.65(1-0.65)/500)
CI = 0.65 ± 0.042
CI = (0.608, 0.692)

Therefore, we can be 95% confident that the true proportion of people who are satisfied with their job is between 60.8% and 69.2%.

# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

To determine if there is a significant difference in student performance between the two teaching methods, we can conduct a two-sample t-test. The null hypothesis is that there is no significant difference in means between the two samples, while the alternative hypothesis is that there is a significant difference in means between the two samples.

The t-test formula is given by:

t = (x1 - x2) / [s^2pooled * (1/n1 + 1/n2)]^0.5

where:
x1 = sample mean of sample A
x2 = sample mean of sample B
s1 = standard deviation of sample A
s2 = standard deviation of sample B
n1 = sample size of sample A
n2 = sample size of sample B
s^2pooled = pooled variance (the square of the pooled standard deviation), calculated as:
s^2pooled = [(n1-1)*s1^2 + (n2-1)*s2^2] / (n1+n2-2)

We will use a significance level of 0.01, which corresponds to a 99% confidence level. The problem does not state the sample sizes, so this solution assumes n1 = 15 and n2 = 12.

First, we need to calculate the pooled variance:

s^2pooled = [(15-1)*6^2 + (12-1)*5^2] / (15+12-2) = 779 / 25 = 31.16

The pooled standard deviation is therefore √31.16 ≈ 5.582.

Then, we can calculate the t-statistic:

t = (85 - 82) / [31.16 * (1/15 + 1/12)]^0.5 = 3 / 2.162 = 1.388

Using a t-table or calculator with 25 degrees of freedom (n1 + n2 - 2), we can find the critical t-value for a two-tailed test at a significance level of 0.01 to be 2.787.

Since our calculated t-statistic of 1.388 is less than the critical t-value of 2.787, we fail to reject the null hypothesis. We can conclude that there is not enough evidence to suggest that there is a significant difference in student performance between the two teaching methods.

Therefore, we cannot conclude that one teaching method is better than the other based on this study.
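
The statistic can be computed directly from the summary statistics. The following is a sketch of an equal-variance (pooled) two-sample t-test; the function name `pooled_t` is made up, and the sample sizes 15 and 12 are the ones used in the worked solution, not values given in the question.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, degrees of freedom) for an equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df

# Sample sizes 15 and 12 are assumed, as in the worked solution
t_stat, df = pooled_t(85.0, 6.0, 15, 82.0, 5.0, 12)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The pooled form assumes the two groups share a common variance; when that is doubtful, Welch's unequal-variance version of the test is the usual alternative.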

# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

To calculate the 90% confidence interval for the true population mean, we can use the following formula:

Confidence Interval = sample mean ± (Z-value x Standard Error)

Where Z-value is the critical value from the standard normal distribution for the desired confidence level, and Standard Error is the standard deviation of the sample mean.

First, we need to find the Z-value for a 90% confidence level, which can be found using a Z-table or calculator. For a 90% confidence level, the Z-value is 1.645.

Next, we can calculate the Standard Error using the following formula:

Standard Error = population standard deviation / square root of sample size

Standard Error = 8 / √50
Standard Error = 1.131

Now, we can plug in the values into the formula for the confidence interval:

Confidence Interval = 65 ± (1.645 x 1.131)
Confidence Interval = 65 ± 1.860
Confidence Interval = (63.14, 66.86)

Therefore, we can say with 90% confidence that the true population mean falls between 63.14 and 66.86.






# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

To conduct a hypothesis test to determine if caffeine has a significant effect on reaction time, we can use a one-sample t-test with the following null and alternative hypotheses:

Null hypothesis: The mean reaction time for the population is equal to the hypothesized value μ0.
Alternative hypothesis: The mean reaction time for the population is not equal to μ0.

The problem does not give a baseline (no-caffeine) mean reaction time, so the calculation below uses μ0 = 0. A meaningful test of the caffeine effect needs the baseline mean substituted for μ0.

We will use a significance level of 0.10 for the test.

First, we calculate the t-value:

t = (sample mean - hypothesized population mean) / (sample standard deviation / sqrt(sample size))
t = (0.25 - 0) / (0.05 / sqrt(30))
t = 27.386

Next, we find the degrees of freedom:

df = sample size - 1
df = 30 - 1
df = 29

Using a t-table or calculator, we find the critical t-value for a two-tailed test with 29 degrees of freedom and a significance level of 0.10 to be approximately ±1.699.

Since our calculated t-value of 27.386 is greater than the critical t-value of 1.699, we reject the null hypothesis at the 10% significance level. With μ0 = 0, this only establishes that the mean reaction time is not zero; concluding that caffeine changes reaction time requires repeating the calculation with the baseline mean as μ0.




