# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

A t-test and a z-test are both statistical hypothesis tests about means: each can compare a sample mean against a reference value or compare the means of two samples. They differ in their assumptions, chiefly whether the population standard deviation is known.

1. T-test:
A t-test is used when the population standard deviation is unknown and has to be estimated from the sample. This matters most when the sample is small (a common rule of thumb is fewer than 30 observations). The test is based on the t-distribution, whose heavier tails account for the added uncertainty of estimating the standard deviation from the sample.

Example scenario: Suppose you want to compare the average exam scores of two groups of students, Group A and Group B. You have sample sizes of 20 students in each group. Since the sample size is relatively small, you would use a t-test to determine if there is a statistically significant difference in the average exam scores between the two groups.

2. Z-test:
A z-test is used when the population standard deviation is known, or when the sample is large enough (a common rule of thumb is more than 30 observations) that the sample standard deviation is a reliable stand-in for it. It is based on the standard normal distribution (z-distribution), which is the limit the t-distribution approaches as the degrees of freedom grow.

Example scenario: Let's say you have access to the heights of 500 randomly selected adult males and you want to test whether their average height is significantly different from the known average height of the male population (e.g., based on previous research). In this case, you have a large sample size (500) and know the population standard deviation, so you would use a z-test to perform the hypothesis test.
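
As a rough sketch of the z-test scenario, the snippet below computes a one-sample z statistic and a two-sided p-value using only Python's standard library. The sample mean (176.0 cm), reference mean (175.0 cm) and population standard deviation (7.0 cm) are made-up values for illustration; only the sample size of 500 comes from the scenario.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers for the height scenario (only n = 500 is from the text).
z, p = one_sample_z_test(sample_mean=176.0, pop_mean=175.0, pop_sd=7.0, n=500)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With a sample this large, even a 1 cm difference produces a large z statistic, because the standard error shrinks with √n.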



# Q2: Differentiate between one-tailed and two-tailed tests.

One-tailed and two-tailed tests are two types of hypothesis tests used in statistical analysis to assess whether there is a significant difference between groups or variables. The difference lies in the directionality of the hypothesis being tested.

1. One-tailed test:
In a one-tailed test, the alternative hypothesis specifies a particular direction of the effect, either an increase or a decrease, but not both. The null hypothesis states that there is no effect (or an effect in the opposite direction), and it is rejected only if the data show a significant effect in the specified direction.

For example, let's say you want to test whether a new drug increases the average response time of participants in a cognitive task. The hypotheses for a one-tailed test would be:
- Null hypothesis (H0): The new drug has no effect on response time (µ = µ0).
- Alternative hypothesis (Ha): The new drug increases response time (µ > µ0).

In this case, the statistical test will only consider values in the tail of the distribution that are greater than the critical value for the chosen significance level (e.g., α = 0.05), as the test focuses solely on the possibility of an increase in response time.

2. Two-tailed test:
In a two-tailed test, the null hypothesis does not specify any particular direction of the effect; it merely states that there is no significant difference between the groups or variables being compared. The alternative hypothesis, on the other hand, asserts that there is a significant difference between the groups or variables but does not specify the direction of the effect.

Using the same example as before, the hypotheses for a two-tailed test would be:
- Null hypothesis (H0): The new drug has no effect on response time (µ = µ0).
- Alternative hypothesis (Ha): The new drug has a significant effect on response time (µ ≠ µ0).

In this case, the statistical test will consider values in both tails of the distribution that are greater or smaller than the critical values for the chosen significance level (e.g., α = 0.05). It takes into account the possibility of both an increase and a decrease in response time due to the new drug.
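
The difference between the two kinds of test can be sketched numerically. The snippet below assumes the test statistic follows a standard normal distribution under H0 (an assumption, since the drug example does not name a specific test), and the observed value z = 1.8 is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (upper) test: all of alpha sits in the right tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha is split evenly, alpha/2 in each tail.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

def p_values(z):
    """Return (upper one-tailed p, two-tailed p) for an observed z statistic."""
    upper = 1 - std_normal.cdf(z)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return upper, two_sided

print(f"one-tailed critical z: {one_tailed_critical:.3f}")
print(f"two-tailed critical z: {two_tailed_critical:.3f}")

# Hypothetical observed statistic: significant one-tailed, not two-tailed.
upper_p, two_sided_p = p_values(1.8)
print(f"one-tailed p = {upper_p:.4f}, two-tailed p = {two_sided_p:.4f}")
```

Because the two-tailed test splits α across both tails, its critical value is larger, so the same observation can be significant in a one-tailed test yet not in a two-tailed one. This is why the direction of the test should be chosen before looking at the data.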



# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Type 1 and Type 2 errors are two types of mistakes that can occur in hypothesis testing when making decisions about the null hypothesis (H0) and the alternative hypothesis (Ha). These errors are related to the concept of statistical significance and the level of confidence we place on our conclusions.

1. Type 1 error (False Positive):
A Type 1 error occurs when we reject the null hypothesis (H0) when it is actually true. In other words, we incorrectly conclude that there is a significant effect or difference when, in reality, there is no effect or difference in the population.

Example scenario: Let's say a pharmaceutical company is testing a new drug to reduce blood pressure. The null hypothesis (H0) is that the drug has no effect on blood pressure, while the alternative hypothesis (Ha) is that the drug does reduce blood pressure. After conducting the study, the statistical analysis shows that there is a significant reduction in blood pressure in the drug group. However, in reality, the drug has no effect, and the observed difference is due to random chance or other factors. The conclusion that the drug is effective would be a Type 1 error.

2. Type 2 error (False Negative):
A Type 2 error occurs when we fail to reject the null hypothesis (H0) when it is actually false. In other words, we fail to detect a significant effect or difference when there is, in fact, a real effect or difference in the population.

Example scenario: Continuing with the previous example, suppose the new drug actually does reduce blood pressure in the population. However, due to a small sample size or other factors, the statistical analysis fails to show a significant difference in blood pressure between the drug and placebo groups. As a result, the study concludes that there is no evidence of an effect, and the drug is not approved for reducing blood pressure, leading to a Type 2 error.

In summary:
- Type 1 error (False Positive) occurs when we mistakenly reject a true null hypothesis.
- Type 2 error (False Negative) occurs when we mistakenly fail to reject a false null hypothesis.

Researchers try to strike a balance between these two types of errors by choosing an appropriate significance level (alpha, typically set to 0.05) and conducting power analysis to ensure an adequate sample size. The goal is to minimize the risk of both types of errors, but it's important to acknowledge that reducing one type of error may increase the risk of the other, and there is often a trade-off between them in hypothesis testing.
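
Both error rates can be estimated by simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1, a sample size of 25, a two-sided z-test of H0: μ = 0, and a hypothetical true effect of 0.5 when H0 is false. None of these values come from the blood pressure example.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated two-sided z-tests of H0: mu = 0 that reject H0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / math.sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type 1 error, so the rate should be near alpha.
type1_rate = rejection_rate(true_mean=0.0)

# H0 is false (hypothetical effect of 0.5): rejections are correct decisions.
power = rejection_rate(true_mean=0.5)
type2_rate = 1 - power  # failing to reject a false H0

print(f"Type 1 rate: {type1_rate:.3f}, Type 2 rate: {type2_rate:.3f}")
```

Rerunning with a smaller `alpha` lowers the Type 1 rate but raises the Type 2 rate, while a larger `n` lowers the Type 2 rate without changing the Type 1 rate.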

# Q4: Explain Bayes's theorem with an example.

Bayes's theorem, named after the Reverend Thomas Bayes, is a fundamental concept in probability theory and statistics. It allows us to update the probability of a hypothesis (an event) based on new evidence or information. The theorem mathematically describes how prior beliefs (prior probability) are combined with observed evidence (likelihood) to calculate the updated belief (posterior probability).

The formula for Bayes's theorem is as follows:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:
- P(A|B) is the posterior probability of event A given event B has occurred.
- P(B|A) is the likelihood or conditional probability of observing event B given that event A has occurred.
- P(A) is the prior probability of event A (the probability of event A occurring before considering any new evidence).
- P(B) is the marginal probability of event B (the overall probability of observing B, whether or not A has occurred); it normalizes the result.

Now, let's illustrate Bayes's theorem with a classic example known as the "diagnostic test" scenario:

Example scenario:
Suppose you are a doctor and you have a patient who is showing some symptoms of a particular disease. The disease is relatively rare in the general population, occurring in about 1% of people (P(A) = 0.01). You have access to a diagnostic test for this disease, and the test has been found to be accurate in clinical trials:

- If a person has the disease (A), the test correctly identifies it as positive 95% of the time (P(B|A) = 0.95).
- If a person does not have the disease (not A), the test incorrectly indicates a positive result 3% of the time (P(B|not A) = 0.03).

Now, the patient takes the test, and the result comes back positive (B). You want to calculate the probability that the patient actually has the disease (P(A|B)) based on the test result.

Using Bayes's theorem:

P(A|B) = (P(B|A) * P(A)) / P(B)

P(A|B) = (0.95 * 0.01) / (P(B|A) * P(A) + P(B|not A) * P(not A))
P(A|B) = (0.95 * 0.01) / (0.95 * 0.01 + 0.03 * 0.99)
P(A|B) = 0.0095 / (0.0095 + 0.0297)
P(A|B) = 0.0095 / 0.0392
P(A|B) ≈ 0.242

The result, P(A|B) ≈ 0.242, indicates that the probability that the patient actually has the disease given a positive test result is approximately 24.2%. Even though the test detects the disease 95% of the time when it is present, the disease is so rare that false positives (0.0297 of all people tested) outnumber true positives (0.0095) by roughly three to one. Therefore, a positive test result doesn't necessarily mean the patient has the disease, and further confirmatory tests or considerations are needed.
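
The calculation can be wrapped in a small function. This is a minimal sketch; the function and parameter names are chosen here for illustration, and the inputs are the three probabilities given in the scenario.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes's theorem."""
    # Law of total probability gives the denominator P(B).
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_b

p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.03)
print(round(p, 3))

# Hypothetical higher prevalence, to show how strongly the prior matters.
p_common = posterior(prior=0.10, sensitivity=0.95, false_positive_rate=0.03)
print(round(p_common, 3))
```

Keeping the test characteristics fixed and changing only the prior shows that the same positive result is far more convincing when the disease is more common.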

# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval (CI) is a range of values that provides an estimated range of plausible values for a population parameter (such as the population mean or population proportion) based on a sample from the population. It is a measure of the uncertainty associated with the sample estimate and provides a sense of how precise the estimate is.

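The confidence interval is typically expressed as a range with an associated confidence level. The confidence level describes the procedure rather than any single interval: if the sampling were repeated many times, about that percentage of the resulting intervals would contain the true population parameter. Commonly used confidence levels are 95% and 99%.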

To calculate the confidence interval, you need three pieces of information:
1. Sample statistic: The value calculated from the sample, such as the sample mean or sample proportion.
2. Standard error (SE): The standard deviation of the sampling distribution of the sample statistic. It measures the variability of the sample statistic across different random samples.
3. Confidence level: The desired level of confidence, often denoted by (1 - α), where α is the significance level (e.g., 0.05 for a 95% confidence level).

The formula for calculating the confidence interval for a population mean (assuming a large sample or known population standard deviation) is:

Confidence Interval = Sample Mean ± (Critical Value * SE)

The critical value is obtained from the standard normal distribution (z-distribution) or t-distribution, depending on the sample size and whether the population standard deviation is known.

Now, let's illustrate how to calculate a confidence interval with an example:

Example:
Suppose a random sample of 100 students was taken, and their heights were measured. The sample mean height was found to be 165 cm, and the sample standard deviation was 5 cm. We want to calculate a 95% confidence interval for the population mean height.

1. Sample mean (x̄): 165 cm
2. Sample standard deviation (s): 5 cm
3. Sample size (n): 100
4. Confidence level (1 - α): 95%, which means α = 0.05 (for a two-tailed test)

Since the sample size is relatively large (n > 30), we can use the z-distribution for a 95% confidence level. The critical value for a 95% confidence level and a two-tailed test is approximately 1.96.

Standard error (SE) = s / √n = 5 / √100 = 5 / 10 = 0.5

Confidence Interval = 165 ± (1.96 * 0.5)
Confidence Interval = 165 ± 0.98

The 95% confidence interval for the population mean height is (164.02 cm, 165.98 cm). This means that we can be 95% confident that the true population mean height lies between 164.02 cm and 165.98 cm based on the sample data.
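
The steps above can be sketched as a short function. It uses the z-distribution, as in the worked example, and obtains the critical value from the standard library rather than a table; the function name is chosen here for illustration.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """z-based confidence interval for a population mean."""
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = critical * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(165, 5, 100)
print(f"({low:.2f}, {high:.2f})")
```

Passing `confidence=0.99` widens the interval, since a higher confidence level requires a larger critical value.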

# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.


Sample problem:
Suppose a factory produces light bulbs, and it is known from historical data that 5% of the bulbs are defective (P(Defective) = 0.05). A quality control inspector uses a special testing machine that correctly identifies a defective bulb with a probability of 90% (P(Test positive | Defective) = 0.90). However, the machine also produces false positives for non-defective bulbs, and it incorrectly identifies a non-defective bulb as defective with a probability of 10% (P(Test positive | Not Defective) = 0.10).

Now, a randomly selected light bulb is tested using the machine, and the result comes back positive (Test positive).

We want to calculate the probability that the bulb is actually defective (P(Defective | Test positive)).

Solution:
To solve the problem, we will use Bayes's Theorem:

P(Defective | Test positive) = (P(Test positive | Defective) * P(Defective)) / P(Test positive)

To calculate P(Test positive), we can use the law of total probability:

P(Test positive) = P(Test positive | Defective) * P(Defective) + P(Test positive | Not Defective) * P(Not Defective)

P(Test positive) = 0.90 * 0.05 + 0.10 * (1 - 0.05)
P(Test positive) = 0.045 + 0.095
P(Test positive) = 0.14

Now, we can calculate P(Defective | Test positive) using Bayes's Theorem:

P(Defective | Test positive) = (0.90 * 0.05) / 0.14
P(Defective | Test positive) = 0.045 / 0.14
P(Defective | Test positive) ≈ 0.3214

The result, P(Defective | Test positive) ≈ 0.3214 or approximately 32.14%, indicates that given the positive test result, there is about a 32.14% probability that the bulb is actually defective. The majority of the positive test results (67.86%) are due to false positives from non-defective bulbs.
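
One way to sanity-check the result is to count expected outcomes in a batch of bulbs. The batch size of 10,000 below is a hypothetical choice; the three rates are the ones given in the problem.

```python
def natural_frequencies(total, base_rate, true_positive_rate, false_positive_rate):
    """Expected counts of true and false positives in a batch of `total` items."""
    defective = total * base_rate
    good = total - defective
    true_positives = defective * true_positive_rate
    false_positives = good * false_positive_rate
    return true_positives, false_positives

# Hypothetical batch size; rates are from the sample problem.
tp, fp = natural_frequencies(10_000, 0.05, 0.90, 0.10)
share_defective = tp / (tp + fp)
print(f"{tp:.0f} true positives, {fp:.0f} false positives")
print(f"P(Defective | Test positive) = {share_defective:.4f}")
```

Counting this way gives the same answer as the formula, and makes it visible that the false positives come from the much larger pool of non-defective bulbs.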

# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.


Confidence Interval = Sample Mean ± (Critical Value * (Standard Deviation / √Sample Size))

Given the following information:
Sample Mean (x̄) = 50
Standard Deviation (σ) = 5
Sample Size (n) = not given in the question, so let's assume a large sample, such as n = 100

Step 1: Find the critical value for a 95% confidence level. For a 95% confidence interval, the critical value is approximately 1.96 (for a two-tailed test) based on the standard normal distribution (z-distribution).

Step 2: Calculate the standard error (SE) of the sample mean.
SE = σ / √n = 5 / √100 = 5 / 10 = 0.5

Step 3: Calculate the confidence interval.
Confidence Interval = 50 ± (1.96 * 0.5)
Confidence Interval = 50 ± 0.98

The 95% confidence interval for the population mean is (49.02, 50.98).

Interpretation:
The interpretation of the 95% confidence interval is as follows: If we were to take many random samples of the same size from the population and build an interval from each one in the same way, approximately 95% of those intervals would contain the true population mean. In that sense, we can be 95% confident that the true population mean lies between 49.02 and 50.98, given the sample data and the assumed sample size of 100.

The interval is centered on the sample mean of 50, which is our single best estimate of the population mean, and its width reflects the uncertainty in that estimate. The wider the confidence interval, the more uncertainty there is; the narrower the interval, the more precise the estimate. Note that the width depends on the assumed sample size: a smaller n would give a wider interval.
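
The repeated-sampling meaning of a confidence level can be checked by simulation. In the sketch below, the true population mean of 50 and the normal shape of the population are assumptions made purely so that coverage can be measured; in practice the true mean is unknown.

```python
import math
import random
from statistics import NormalDist

def coverage(true_mean=50.0, sigma=5.0, n=100, confidence=0.95, trials=4000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build an interval around its mean.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The share should land close to 0.95. Each simulated interval is different, and it is the procedure as a whole that captures the true mean about 95% of the time.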

# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error (MOE) is a measure of the uncertainty or precision associated with a confidence interval estimate. It quantifies the amount by which the estimated value (e.g., sample mean or proportion) may vary from the true population value within the confidence interval. A smaller margin of error indicates a more precise estimate, while a larger margin of error suggests greater uncertainty in the estimate.

The formula for calculating the margin of error for a confidence interval (assuming a large sample size or known population standard deviation) is:

Margin of Error = Critical Value * (Standard Deviation / √Sample Size)

The margin of error is directly affected by the critical value (which depends on the chosen confidence level) and the sample size. As the sample size increases, the margin of error decreases. In other words, a larger sample size results in a more precise estimate, as there is more information available to estimate the population parameter.

Example scenario:
Suppose a market researcher wants to estimate the average age of customers visiting a shopping mall. The researcher takes two random samples of customers—one with a sample size of 50 and another with a sample size of 500.

Given information:
- The standard deviation of ages (σ) is known to be 10 years (this is just for illustration purposes).
- The critical value for a 95% confidence level is approximately 1.96 (from the standard normal distribution).

For the sample with a sample size of 50:
MOE = 1.96 * (10 / √50) ≈ 2.77 years

For the sample with a sample size of 500:
MOE = 1.96 * (10 / √500) ≈ 0.88 years

In this example, the margin of error for the larger sample (0.88 years) is less than a third of the margin of error for the smaller sample (2.77 years). This means that the estimate of the average age based on the larger sample is more precise and has less uncertainty than the estimate based on the smaller sample.

Having a smaller margin of error is desirable because it means the estimate is more precise at the same confidence level. To achieve a smaller margin of error, researchers often aim for larger sample sizes when conducting surveys or studies, as this gives more reliable estimates of population parameters.
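
The effect of sample size can be tabulated with a few lines of code. This sketch reuses σ = 10 and the critical value 1.96 from the example; the third sample size, 5000, is an extra hypothetical value added to extend the pattern.

```python
import math

def margin_of_error(sd, n, critical=1.96):
    """Margin of error for a mean with known standard deviation."""
    return critical * sd / math.sqrt(n)

for n in (50, 500, 5000):  # 5000 is a hypothetical extra sample size
    print(n, round(margin_of_error(10, n), 2))
```

Because n sits under a square root, a tenfold increase in sample size shrinks the margin of error by a factor of √10 ≈ 3.16, not by a factor of 10. Halving the margin of error requires roughly four times as many observations.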


# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.



z = (x - μ) / σ

z = (75 - 70) / 5
z = 5 / 5
z = 1

The z-score is 1.

Interpretation:
The z-score measures how many standard deviations the data point is away from the population mean. In this case, with a z-score of 1, the data point (75) is one standard deviation above the population mean (70). A positive z-score indicates that the data point is above the mean, while a negative z-score would indicate that the data point is below the mean.

The z-score helps in understanding the relative position of the data point within the distribution of the population. A z-score of 1 places the data point one standard deviation above the mean; if the population is approximately normally distributed, about 84% of values fall below it. The z-score thus provides a standardized measure of how unusual or extreme the data point is compared to the rest of the data. Additionally, z-scores allow for comparisons across different datasets with different units and scales, as they standardize the data based on the population mean and standard deviation.
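
A minimal sketch of the calculation follows. Converting the z-score to a percentile assumes the population is normally distributed, which the question does not state.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

z = z_score(75, 70, 5)

# Assumption: normally distributed population (not stated in the question).
percentile = NormalDist().cdf(z)
print(f"z = {z}, share of values below: {percentile:.4f}")
```

Without the normality assumption the z-score is still valid as a standardized distance, but the percentile conversion would not be.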

# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.


Null hypothesis (H0): The new weight loss drug has no significant effect on weight loss (μ = 0).
Alternative hypothesis (Ha): The new weight loss drug is significantly effective (μ ≠ 0).

Here, μ represents the population mean weight loss with the drug.

Step 1: Calculate the test statistic (t-value):
The formula for the t-value in a one-sample t-test is:

t = (x̄ - μ) / (s / √n)

Where:
x̄ is the sample mean (6 pounds),
μ is the hypothesized population mean under the null hypothesis (0 pounds),
s is the sample standard deviation (2.5 pounds),
n is the sample size (50 participants).

t = (6 - 0) / (2.5 / √50)
t = 6 / (2.5 / 7.0711)
t = 6 / 0.3536
t ≈ 16.97

Step 2: Determine the degrees of freedom (df):
In a one-sample t-test, the degrees of freedom are equal to the sample size minus 1.

df = 50 - 1 = 49

Step 3: Find the critical t-value:
At a 95% confidence level and 49 degrees of freedom, the critical t-value is approximately ±2.009 (for a two-tailed test). 

Step 4: Compare the t-value with the critical t-value:
Since the calculated t-value (16.97) is far larger than the critical t-value (2.009), we reject the null hypothesis (H0). The result is statistically significant at the 0.05 significance level (95% confidence). This means that there is strong evidence to suggest that the weight loss drug is effective in reducing weight, as the sample mean weight loss of 6 pounds is significantly different from zero.

In conclusion, based on the t-test, the study provides evidence that the new weight loss drug is significantly effective at a 95% confidence level.
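
The test statistic can be reproduced from the summary statistics alone. In this sketch the critical value is hard-coded from the t-table lookup above, because Python's standard library has no t-distribution; if SciPy is available, `scipy.stats.t.ppf(0.975, 49)` should give the same value.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error, n - 1

t_stat, df = one_sample_t(6, 0, 2.5, 50)

# Two-tailed critical value for alpha = 0.05, df = 49, copied from the t-table.
critical_t = 2.009
reject_h0 = abs(t_stat) > critical_t
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```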

# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.


Confidence Interval = Sample Proportion ± (Critical Value * Standard Error)

Where:
- Sample Proportion (p̂) = 65% = 0.65 (proportion of people satisfied with their job in the sample)
- Critical Value: The critical value for a 95% confidence level and a two-tailed test is approximately 1.96 (from the standard normal distribution).
- Standard Error (SE) = √[(p̂ * (1 - p̂)) / n], where n is the sample size.

Given:
- Sample Size (n) = 500



Sample Proportion (p̂) = 0.65
Standard Error (SE) = √[(0.65 * (1 - 0.65)) / 500] ≈ √[(0.65 * 0.35) / 500] ≈ √[0.2275 / 500] ≈ √0.000455 ≈ 0.0213

Confidence Interval = 0.65 ± (1.96 * 0.0213)
Confidence Interval = 0.65 ± 0.0418

The 95% confidence interval for the true proportion of people who are satisfied with their job is approximately (0.6082, 0.6918).

Interpretation:
This means that we can be 95% confident that the true proportion of people who are satisfied with their job lies within the range of 60.82% to 69.18%, based on the sample data from the survey. The interval provides a measure of uncertainty around the sample proportion, indicating that the true proportion in the entire population is likely to fall within this range. The wider the confidence interval, the more uncertainty there is in our estimate, while a narrower interval indicates a more precise estimate.
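
The same calculation as a short function, using the normal-approximation (Wald) interval from the formula above. The function name is chosen here for illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = critical * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(0.65, 500)
print(f"({low:.4f}, {high:.4f})")
```

This approximation is generally considered reasonable when both n·p̂ and n·(1 − p̂) are large, as they are here (325 and 175).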


# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

Null hypothesis (H0): There is no significant difference in student performance between the two teaching methods (μA = μB).
Alternative hypothesis (Ha): There is a significant difference in student performance between the two teaching methods (μA ≠ μB).

Where:
μA is the population mean score for Sample A.
μB is the population mean score for Sample B.

For a significance level (α) of 0.01, a two-tailed test places α/2 = 0.005 in each tail of the t-distribution.



t = (x̄A - x̄B) / (sp * √(1/nA + 1/nB)), where sp is the pooled standard deviation of the two samples

Since the sample sizes (nA and nB) are not given in the question, let's assume they are both equal to 30 for illustration purposes.

Step 1: Calculate the pooled standard deviation (sp) and degrees of freedom (df):

Pooled standard deviation (sp) = √[((sA^2 * (nA - 1)) + (sB^2 * (nB - 1))) / (nA + nB - 2)]
sp = √[((6^2 * (30 - 1)) + (5^2 * (30 - 1))) / (30 + 30 - 2)]
sp = √[(1044 + 725) / 58]
sp = √(1769 / 58)
sp = √30.5
sp ≈ 5.523

Degrees of freedom (df) = nA + nB - 2
df = 30 + 30 - 2 = 58

Step 2: Calculate the t-value:

t = (85 - 82) / (5.523 * √(1/30 + 1/30))
t = 3 / (5.523 * √(1/15))
t ≈ 3 / (5.523 * 0.2582)
t ≈ 3 / 1.426
t ≈ 2.104

Step 3: Find the critical t-value:

At a significance level of 0.01 (two-tailed) and 58 degrees of freedom, the critical t-value is approximately ±2.66 (from the t-table).

Step 4: Compare the t-value with the critical t-value:

Since the calculated t-value (2.104) is smaller in magnitude than the critical t-value (2.66), we fail to reject the null hypothesis (H0). The result is not statistically significant at the 0.01 significance level. This means that based on the sample data, we do not have sufficient evidence to conclude that there is a significant difference in student performance between the two teaching methods. The difference observed in the sample means (85 for Sample A and 82 for Sample B) may be due to random chance or other factors, and we cannot confidently attribute it to the teaching methods themselves. This conclusion depends on the assumed sample sizes of 30 and on the strict 0.01 threshold: the same t-value would exceed the two-tailed critical value of about 2.00 at α = 0.05. Further research with a larger sample size or additional experiments may be needed to draw more conclusive results.
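
The pooled calculation is easy to get wrong by hand, so a short script is a useful check. This sketch works from the summary statistics in the question and keeps the same illustrative assumption of 30 students per group; it computes only the statistic, not the critical value.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, pooled sd, df) for an equal-variance two-sample t-test."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    pooled_sd = math.sqrt(pooled_var)
    t = (mean_a - mean_b) / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    return t, pooled_sd, df

# n = 30 per group is an assumption; the question does not give sample sizes.
t_stat, sp, df = pooled_t(85, 6, 30, 82, 5, 30)
print(f"sp = {sp:.3f}, t = {t_stat:.3f}, df = {df}")
```

Trying other sample sizes in place of 30 shows how sensitive the conclusion is to that assumption, since the t-value grows with √n for fixed means and standard deviations.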



# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.


Confidence Interval = Sample Mean ± (Critical Value * (Standard Deviation / √Sample Size))

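The population standard deviation (σ = 8) is known, so a z-interval applies. The critical value for a 90% confidence level is approximately 1.645 (from the standard normal distribution).

SE = σ / √n = 8 / √50 ≈ 8 / 7.0711 ≈ 1.1314

Confidence Interval = 65 ± (1.645 * 1.1314)
Confidence Interval ≈ 65 ± 1.86

The 90% confidence interval for the true population mean is approximately (63.14, 66.86). Note that the stated population mean of 60 lies outside this interval, so the sample is not consistent with that value at the 90% level.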




# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.


Null hypothesis (H0): Caffeine has no significant effect on reaction time (μ = μ0).
Alternative hypothesis (Ha): Caffeine has a significant effect on reaction time (μ ≠ μ0).



t = (x̄ - μ0) / (s / √n)


Step 1: Calculate the degrees of freedom (df):

Degrees of freedom (df) = n - 1 = 30 - 1 = 29

Step 2: Calculate the t-value:

t = (0.25 - μ0) / (0.05 / √30)

Since μ0 is not provided in the question, let's assume it to be the average reaction time without caffeine, which might be, for example, 0.30 seconds.

t = (0.25 - 0.30) / (0.05 / √30)
t = -0.05 / (0.05 / 5.4772)
t ≈ -0.05 / 0.00913
t ≈ -5.477

Step 3: Find the critical t-value:

At a significance level of 0.10 (two-tailed) and 29 degrees of freedom, the critical t-value is approximately ±1.699 (from the t-table or a statistical software tool).

Step 4: Compare the t-value with the critical t-value:

Since the calculated t-value (-5.477) is much larger in magnitude than the critical t-value (±1.699), we reject the null hypothesis (H0). The result is statistically significant at the 0.10 significance level (90% confidence). This means that based on the sample data, we have sufficient evidence to conclude that caffeine has a significant effect on reaction time. Under the assumed μ0 = 0.30 seconds, the sample mean reaction time of 0.25 seconds is significantly lower, which suggests that caffeine shortens reaction time. This conclusion depends entirely on the assumed value of μ0, which the question does not provide.