Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

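The primary difference between a t-test and a z-test is whether the population standard deviation is known, which determines the reference distribution: the standard normal distribution for a z-test and Student's t-distribution for a t-test.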

- Z-test:
  - Scenario: Used when the population standard deviation is known and the sample size is relatively large (typically n > 30).
  - Assumption: Assumes a known population standard deviation.
  - Example: Suppose you're testing the mean height of adult males in a city where the population standard deviation for heights is known to be 4 inches. You collect a large sample (n > 30) and want to determine if the mean height of your sample significantly differs from the known population mean.

- T-test:
  - Scenario: Appropriate when the population standard deviation is unknown and must be estimated from the sample; this matters most when the sample size is small (typically n < 30).
  - Assumption: Assumes an unknown population standard deviation and uses the sample standard deviation as an estimate.
  - Example: You're studying the effect of a new teaching method on student performance. You gather a small sample of 20 students and want to test if the mean exam scores of students taught using the new method differ significantly from the population mean score.

In summary, z-tests are used with a known population standard deviation and larger samples (n > 30), while t-tests are used when the population standard deviation is unknown, which is especially important for smaller samples (n < 30).
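
As a rough illustration of the distinction, the sketch below computes both statistics for a one-sample test. The helper names and all sample values in the usage lines are hypothetical; only σ = 4 inches and n = 20 come from the examples above. The arithmetic is the same in both cases; what differs is the spread estimate used and the distribution the result is compared against.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)); sigma is the KNOWN population SD."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)); s is the SAMPLE SD, compared against t with n - 1 df."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Hypothetical height data: sigma = 4 (known), n = 36, sample mean 70.5, hypothesized mean 69
z = z_statistic(70.5, 69.0, 4.0, 36)

# Hypothetical exam data: n = 20, sample mean 78, sample SD 10, hypothesized mean 75
t = t_statistic(78.0, 75.0, 10.0, 20)

print(round(z, 2), round(t, 2))  # 2.25 1.34
```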

Q2: Differentiate between one-tailed and two-tailed tests.

One-tailed and two-tailed tests refer to the directional nature of the hypothesis being tested and the area considered in the distribution for rejecting the null hypothesis.

- One-tailed test:
  - Directionality: Assesses whether the sample statistic is significantly greater than or less than a certain value, but not both.
  - Hypothesis: Has either a > or < symbol in the alternative hypothesis.
  - Rejection Region: The critical region for rejecting the null hypothesis is located entirely in one tail of the distribution.
  - Example: Testing if a new drug improves performance (one-tailed >) or decreases performance (one-tailed <) compared to a standard drug.

- Two-tailed test:
  - Directionality: Assesses whether the sample statistic is significantly different from a certain value in either direction.
  - Hypothesis: Uses a ≠ symbol in the alternative hypothesis.
  - Rejection Region: The critical region for rejecting the null hypothesis is split between both tails of the distribution.
  - Example: Testing whether a coin is biased (two-tailed ≠), where the alternative hypothesis is that the probability of heads differs from 50% in either direction (toward heads or toward tails).

In essence, one-tailed tests focus on a specific direction of change, while two-tailed tests consider any significant deviation from the null hypothesis in both directions. The choice between one-tailed and two-tailed tests depends on the research question and the direction of the effect being investigated.
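
To make the difference in rejection regions concrete, here is a minimal sketch that converts a z-statistic into one-tailed and two-tailed p-values using the standard library. The function name and the test statistic z = 1.8 are hypothetical choices for illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return p-values for the three possible alternatives, given a z-statistic."""
    std_normal = NormalDist()
    upper = 1.0 - std_normal.cdf(z)      # H1: parameter > null value
    lower = std_normal.cdf(z)            # H1: parameter < null value
    two_sided = 2.0 * min(upper, lower)  # H1: parameter != null value
    return {"greater": upper, "less": lower, "two_sided": two_sided}

p = p_values(1.8)  # hypothetical test statistic
print({name: round(value, 4) for name, value in p.items()})
```

With this hypothetical statistic, the upper-tail p-value falls below 0.05 while the two-tailed p-value does not, so the same data can lead to different decisions depending on which alternative was specified in advance.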

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

Type I and Type II errors are associated with hypothesis testing and represent the two potential mistakes that can occur when making decisions based on statistical testing.

- **Type I error (False Positive):** Occurs when the null hypothesis is incorrectly rejected when it is actually true. It signifies concluding that there is an effect or difference when, in reality, there isn't.

- **Type II error (False Negative):** Occurs when the null hypothesis is not rejected when it is actually false. It implies failing to detect an effect or difference that genuinely exists.

**Example Scenarios:**

- **Type I Error:** Imagine a pharmaceutical company testing a new drug for a disease. The null hypothesis is that the drug has no effect. A Type I error would occur if the company erroneously concludes that the drug is effective (rejecting the null) when, in fact, it's not, leading to the drug being marketed but having no actual impact on the disease.

- **Type II Error:** Using the same drug scenario, a Type II error would occur if the company fails to recognize the drug's efficacy (fails to reject the null) when it actually does have a beneficial effect on the disease. Consequently, the company decides not to proceed with the drug, missing a potential treatment opportunity.

Both errors are important in hypothesis testing, as minimizing one type of error can increase the likelihood of the other occurring. The balance between Type I and Type II errors is often managed by choosing an appropriate significance level (α) and considering the power of the test to detect effects.
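
The trade-off between the two error types can be sketched numerically. The example below assumes a one-sided (upper-tail) z-test with a known standard deviation; the function name and the effect size, standard deviation, and sample size are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Beta for an upper-tail z-test of H0: mu = mu0 when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # reject H0 when z > z_crit
    shift = effect / (sigma / math.sqrt(n))    # mean of the z-statistic when H1 is true
    return std_normal.cdf(z_crit - shift)      # P(fail to reject H0 | H1 is true)

# Hypothetical setting: true effect 2, sigma 10, n = 50
beta_05 = type_ii_error(alpha=0.05, effect=2.0, sigma=10.0, n=50)
beta_01 = type_ii_error(alpha=0.01, effect=2.0, sigma=10.0, n=50)
print(round(beta_05, 2), round(beta_01, 2))
```

Under these assumptions, tightening α from 0.05 to 0.01 raises β from roughly 0.59 to roughly 0.82: fewer false positives at the cost of more missed effects.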

Q4: Explain Bayes's theorem with an example.

Bayes' theorem is a fundamental concept in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It's a way to update or revise beliefs about the probability of an event as new evidence or information becomes available.

The theorem can be expressed as:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]

Where:
- \( P(A|B) \) is the probability of event A occurring given that event B has occurred.
- \( P(B|A) \) is the probability of event B occurring given that event A has occurred.
- \( P(A) \) and \( P(B) \) are the marginal (unconditional) probabilities of events A and B, each considered without reference to the other.

**Example:**

Consider a scenario involving a medical test for a rare disease, where the disease affects 1 in 1000 people (0.1% prevalence). The test correctly identifies the disease in 99% of cases and produces a false positive result in 5% of healthy individuals.

Let:
- Event A: Having the disease
- Event B: Testing positive

Using Bayes' theorem, we can calculate the probability of having the disease given a positive test result:

- \( P(A) = 0.001 \) (prevalence of the disease)
- \( P(B|A) = 0.99 \) (probability of testing positive given having the disease)
- \( P(\neg A) = 1 - P(A) = 0.999 \) (probability of not having the disease)
- \( P(B|\neg A) = 0.05 \) (probability of testing positive given not having the disease)

Now, applying Bayes' theorem:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]

\[ P(A|B) = \frac{0.99 \times 0.001}{(0.001 \times 0.99) + (0.999 \times 0.05)} \]

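Carrying out the calculation gives \( P(A|B) = \frac{0.00099}{0.00099 + 0.04995} = \frac{0.00099}{0.05094} \approx 0.0194 \). So even after a positive result, the probability of actually having the disease is only about 1.9%. The test detects 99% of true cases, but because the disease is rare, false positives from the much larger healthy group far outnumber true positives.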
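
The calculation can be checked with a few lines of code. This is a minimal sketch; the function and parameter names are illustrative, and the inputs are the figures from the example (0.1% prevalence, 99% detection rate, 5% false positive rate).

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) via Bayes' theorem, expanding P(B) with the law of total probability."""
    true_positive = sensitivity * prior                      # P(B|A) * P(A)
    false_positive = false_positive_rate * (1.0 - prior)     # P(B|not A) * P(not A)
    return true_positive / (true_positive + false_positive)

p_disease_given_positive = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.05)
print(round(p_disease_given_positive, 4))  # 0.0194
```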

Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values derived from sample data that is used to estimate an unknown population parameter, such as the population mean or proportion, with a specified level of confidence. It provides a range of plausible values for the parameter instead of a single point estimate.

The formula for calculating a confidence interval for the population mean (\( \mu \)) using sample data when the population standard deviation (\( \sigma \)) is known is:

\[ \text{Confidence Interval} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \( \bar{x} \) is the sample mean
- \( Z \) is the Z-score corresponding to the desired confidence level (e.g., for 95% confidence, \( Z \approx 1.96 \))
- \( \sigma \) is the population standard deviation
- \( n \) is the sample size

**Example:**

Suppose you want to estimate the average weight of apples in a farm. You collect a random sample of 50 apples and find the sample mean weight to be 150 grams. You know from past data that the population standard deviation of apple weights is 20 grams. You want to calculate a 95% confidence interval for the average weight of all apples on the farm.

Given:
- Sample Mean (\( \bar{x} \)): 150 grams
- Population Standard Deviation (\( \sigma \)): 20 grams
- Sample Size (\( n \)): 50
- Confidence Level: 95% (corresponds to a Z-score of approximately 1.96)

Substitute the values into the formula:

\[ \text{Confidence Interval} = 150 \pm 1.96 \times \frac{20}{\sqrt{50}} \approx 150 \pm 1.96 \times 2.828 \approx 150 \pm 5.54 \]

Therefore, the 95% confidence interval for the average weight of apples on the farm is approximately \( (144.46, 155.54) \) grams. This means we are 95% confident that the true average weight of all apples on the farm lies within this range based on the sample data collected.
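
The same computation as a short sketch, assuming a known population standard deviation and the usual 1.96 critical value; the function name is illustrative.

```python
import math

def z_confidence_interval(sample_mean, sigma, n, z=1.96):
    """Interval x_bar +/- z * sigma / sqrt(n) for a known population SD."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=150.0, sigma=20.0, n=50)
print(f"({low:.2f}, {high:.2f})")  # (144.46, 155.54)
```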

Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

Consider a scenario involving a diagnostic test for a rare disease.

Suppose a disease affects 1 in 5000 people (0.02% prevalence) in a population. A test for the disease is known to correctly identify the disease in 98% of cases and produce a false positive result in 3% of healthy individuals.

Let:
- Event A: Having the disease
- Event B: Testing positive

We want to calculate the probability of an individual having the disease given a positive test result using Bayes' theorem:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]

Where:
- \( P(A) \) is the probability of having the disease (prevalence) = 0.0002
- \( P(B|A) \) is the probability of testing positive given having the disease = 0.98
- \( P(\neg A) \) is the probability of not having the disease = 1 - \( P(A) \) = 0.9998
- \( P(B|\neg A) \) is the probability of testing positive given not having the disease = 0.03

Now, let's calculate \( P(A|B) \) using Bayes' theorem:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]
\[ P(A|B) = \frac{0.98 \times 0.0002}{(0.98 \times 0.0002) + (0.03 \times 0.9998)} \]
\[ P(A|B) = \frac{0.000196}{0.000196 + 0.029994} \]
\[ P(A|B) = \frac{0.000196}{0.03019} \]
\[ P(A|B) \approx 0.00649 \]

Therefore, even with a positive test result, the probability of an individual actually having the disease is approximately 0.65% or 0.00649, demonstrating the impact of the test's false positive rate on the final probability estimation.
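
As a cross-check on the rounding, the sketch below repeats the calculation with exact fractions. The variable names are illustrative; the inputs are the figures stated in the problem.

```python
from fractions import Fraction

prior = Fraction(1, 5000)               # P(A): prevalence
sensitivity = Fraction(98, 100)         # P(B|A)
false_positive_rate = Fraction(3, 100)  # P(B|not A)

numerator = sensitivity * prior
evidence = numerator + false_positive_rate * (1 - prior)  # P(B)
posterior = numerator / evidence                          # P(A|B)

print(posterior, round(float(posterior), 5))  # 98/15095 0.00649
```

The exact result, 98/15095, has a natural reading: in a population of 500,000 people, about 100 have the disease and 98 of them test positive, while 14,997 of the 499,900 healthy people also test positive, so only 98 of the 15,095 positive results are true positives.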

Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

To calculate the 95% confidence interval for a sample with a mean of 50 and a standard deviation of 5, we'll treat the given standard deviation as the population standard deviation and use the formula for a confidence interval for the population mean when the population standard deviation is known:

\[ \text{Confidence Interval} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \( \bar{x} \) = sample mean = 50
- \( \sigma \) = standard deviation = 5
- \( Z \) for a 95% confidence interval ≈ 1.96 (from the standard normal distribution table)
- \( n \) is the sample size, which the problem does not state; it is required to compute a numeric interval

Substituting the values into the formula:

\[ \text{Confidence Interval} = 50 \pm 1.96 \times \frac{5}{\sqrt{n}} \]

Because \( n \) is not given, the interval can only be expressed in terms of \( n \); a numeric interval requires assuming a sample size. For instance:

- If \( n = 100 \), then \( \text{Confidence Interval} = 50 \pm 1.96 \times \frac{5}{\sqrt{100}} = 50 \pm 0.98 \)

Interpretation:
Assuming \( n = 100 \), the 95% confidence interval for the population mean is \( (49.02, 50.98) \), given a sample mean of 50 and a standard deviation of 5. This implies that we are 95% confident that the true population mean falls within this range based on the sample data collected. A smaller sample would give a wider interval, and a larger sample a narrower one.
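
Since the result depends on the sample size, a short sketch can show the interval for several values of \( n \). The sample sizes below are assumptions, not part of the problem statement, and the function name is illustrative.

```python
import math

def ci_for_n(n, sample_mean=50.0, sd=5.0, z=1.96):
    """95% interval for the mean, treating sd as the population SD."""
    margin = z * sd / math.sqrt(n)
    return round(sample_mean - margin, 2), round(sample_mean + margin, 2)

for n in (25, 100, 400):  # assumed sample sizes
    print(n, ci_for_n(n))
# 25 (48.04, 51.96)
# 100 (49.02, 50.98)
# 400 (49.51, 50.49)
```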

Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error in a confidence interval represents the amount of uncertainty or variability around an estimate from a sample that could be attributed to chance. It indicates the potential difference between the sample statistic (like the sample mean) and the true population parameter (like the population mean).

The formula for the margin of error in a confidence interval for a population mean (\( \mu \)) when the population standard deviation (\( \sigma \)) is known is:

\[ \text{Margin of Error} = Z \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \( Z \) is the critical value from the standard normal distribution corresponding to the desired confidence level.
- \( \sigma \) is the population standard deviation.
- \( n \) is the sample size.

Regarding the impact of sample size on the margin of error:
- **Larger Sample Size:** As the sample size (\( n \)) increases, the margin of error decreases. A larger sample provides more precise estimates of the population parameter, resulting in a narrower confidence interval and reduced uncertainty.

**Example:**
Suppose you want to estimate the average height of students in a school. With a smaller sample size, say 25 students, the margin of error might be around 3 inches using a 95% confidence interval. If you increase the sample size to 100 students, the margin of error drops to about 1.5 inches, because the margin of error is proportional to \( 1/\sqrt{n} \): quadrupling the sample size halves it. The result is a more precise estimate of the true average height of all students in the school.
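
The scaling with sample size can be sketched as follows. The standard deviation of 7.65 inches is a hypothetical value, chosen only so that the margins roughly match the 3-inch and 1.5-inch figures in the example.

```python
import math

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) for a mean with known population SD."""
    return z * sigma / math.sqrt(n)

sigma = 7.65  # hypothetical population SD in inches
for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma, n), 2))
# 25 3.0
# 100 1.5
# 400 0.75
```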

Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

The formula to calculate the z-score, which measures how many standard deviations a data point is from the mean in a normal distribution, is:

\[ \text{Z-score} = \frac{\text{data point} - \text{mean}}{\text{standard deviation}} \]

Given:
- Data point = 75
- Population mean = 70
- Population standard deviation = 5

Using the formula:

\[ \text{Z-score} = \frac{75 - 70}{5} = \frac{5}{5} = 1 \]

Interpretation:
A z-score of 1 indicates that the data point of 75 is exactly 1 standard deviation above the population mean of 70. If the population is normally distributed, about 84% of values fall below this point.
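
A minimal sketch of the same calculation, with the share of values below the data point added under the assumption that the population is normally distributed (the problem itself does not say so). The function name is illustrative.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(75, 70, 5)
share_below = NormalDist().cdf(z)  # valid only if the population is normal
print(z, round(share_below, 4))    # 1.0 0.8413
```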

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

To conduct a hypothesis test for the effectiveness of the weight loss drug using a t-test, we'll assess whether the sample mean weight loss is significantly different from zero (indicating no effect) at a 95% confidence level.

Given:
- Sample size (\( n \)) = 50
- Sample mean (\( \bar{x} \)) = 6 pounds
- Sample standard deviation (\( s \)) = 2.5 pounds

The null hypothesis (\( H_0 \)) is that the drug has no effect, which can be expressed as \( H_0: \mu = 0 \).
The alternative hypothesis (\( H_1 \)) is that the drug has an effect on weight, tested here as a two-tailed alternative: \( H_1: \mu \neq 0 \).

The formula for the t-statistic for testing the population mean when the population standard deviation is unknown is:

\[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]

Where:
- \( \bar{x} \) is the sample mean
- \( \mu_0 \) is the hypothesized population mean under the null hypothesis (0 in this case)
- \( s \) is the sample standard deviation
- \( n \) is the sample size

Substitute the values into the formula:

\[ t = \frac{6 - 0}{\frac{2.5}{\sqrt{50}}} = \frac{6}{\frac{2.5}{\sqrt{50}}} \approx \frac{6}{0.3536} \approx 16.97 \]

At a 95% confidence level (\( \alpha = 0.05 \)) for a two-tailed test with 49 degrees of freedom (sample size - 1), the critical t-value is approximately ±2.01.

As the calculated t-statistic (16.97) is far larger than the critical t-value (±2.01), we reject the null hypothesis. This suggests strong evidence that the weight loss drug has a significant effect on weight reduction at a 95% confidence level based on the sample data.
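
The test can be reproduced from the summary statistics alone. In this sketch the critical value is entered by hand from a t-table (two-tailed, α = 0.05, 49 degrees of freedom) rather than computed; the variable names are illustrative.

```python
import math

n, sample_mean, s, mu0 = 50, 6.0, 2.5, 0.0

standard_error = s / math.sqrt(n)             # s / sqrt(n)
t_stat = (sample_mean - mu0) / standard_error
t_crit = 2.01                                 # table value: two-tailed, alpha = 0.05, df = 49

reject_null = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_null)  # 16.97 True
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 49)` should give the same critical value without a table lookup.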

Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

To calculate the confidence interval for a proportion, we can use the formula for the confidence interval of a population proportion:

\[ \text{Confidence Interval} = \hat{p} \pm Z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Where:
- \( \hat{p} \) is the sample proportion
- \( Z \) for a 95% confidence interval is approximately 1.96 (corresponding to the standard normal distribution for a 95% confidence level)
- \( n \) is the sample size

Given:
- Sample proportion (\( \hat{p} \)) = 0.65
- Sample size (\( n \)) = 500

Substitute the values into the formula:

\[ \text{Confidence Interval} = 0.65 \pm 1.96 \times \sqrt{\frac{0.65 \times (1-0.65)}{500}} \]

\[ \text{Confidence Interval} = 0.65 \pm 1.96 \times \sqrt{\frac{0.65 \times 0.35}{500}} \]

\[ \text{Confidence Interval} = 0.65 \pm 1.96 \times \sqrt{\frac{0.2275}{500}} \]

\[ \text{Confidence Interval} = 0.65 \pm 1.96 \times \sqrt{0.000455} \]

\[ \text{Confidence Interval} = 0.65 \pm 1.96 \times 0.02133 \]

\[ \text{Confidence Interval} = 0.65 \pm 0.0418 \]

The 95% confidence interval for the true proportion of people satisfied with their job is approximately \( (0.608, 0.692) \). This means that we can be 95% confident that the true proportion of people satisfied with their job falls within this range based on the survey data.
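
The steps above can be sketched in a few lines; the function name is illustrative, and the normal approximation to the sampling distribution of the proportion is assumed, as in the formula.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(p_hat=0.65, n=500)
print(f"({low:.3f}, {high:.3f})")  # (0.608, 0.692)
```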

Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

To test if there's a significant difference in student performance between the two teaching methods, we'll conduct an independent samples t-test. Given the two samples (Sample A and Sample B), we'll compare their means.

The hypotheses for the t-test are:
- Null Hypothesis (\(H_0\)): The means of the two samples are equal. (\(\mu_1 = \mu_2\))
- Alternative Hypothesis (\(H_1\)): The means of the two samples are different. (\(\mu_1 \neq \mu_2\))

Given:
- Sample A: Mean (\(\bar{x}_1\)) = 85, Standard Deviation (\(s_1\)) = 6
- Sample B: Mean (\(\bar{x}_2\)) = 82, Standard Deviation (\(s_2\)) = 5
- Significance level = 0.01

We'll use a two-sample independent t-test assuming unequal variances (Welch's t-test) due to different standard deviations.

The formula for the t-statistic for independent samples t-test is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Where:
- \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means
- \(s_1\) and \(s_2\) are the sample standard deviations
- \(n_1\) and \(n_2\) are the sample sizes

Substitute the given values into the formula:

\[ t = \frac{85 - 82}{\sqrt{\frac{6^2}{n_1} + \frac{5^2}{n_2}}} \]

The sample sizes \( n_1 \) and \( n_2 \) are not provided, and both the t-statistic and the degrees of freedom depend on them. Without the sample sizes, we cannot compute the t-statistic or determine the critical t-value for the significance level of 0.01.

Once the sample sizes are known, plug the values into the formula and compare the absolute value of the calculated t-statistic to the two-tailed critical t-value at the given significance level. If it exceeds the critical value, we reject the null hypothesis and conclude that there is a significant difference in the performance of the two teaching methods; otherwise, we fail to reject the null hypothesis.
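
To show how the test would proceed once sample sizes are available, the sketch below assumes \( n_1 = n_2 = 30 \). That sample size is purely an assumption for illustration and is not part of the problem. The degrees of freedom use the Welch-Satterthwaite approximation, and the function name is illustrative.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# n1 = n2 = 30 is an ASSUMED sample size, not given in the problem
t_stat, df = welch_t(85, 6, 30, 82, 5, 30)
print(round(t_stat, 2), round(df, 1))  # 2.1 56.2
```

Under this assumption, \( |t| \approx 2.10 \) is below the two-tailed critical value for \( \alpha = 0.01 \) at about 56 degrees of freedom (roughly 2.67 in a t-table), so the null hypothesis would not be rejected. Different sample sizes could change that conclusion.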

Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

To calculate the confidence interval for the population mean when the population standard deviation is known, we'll use the formula:

\[ \text{Confidence Interval} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \(\bar{x}\) is the sample mean
- \(Z\) for a 90% confidence interval is approximately 1.645 (corresponding to the standard normal distribution for a 90% confidence level)
- \(\sigma\) is the population standard deviation
- \(n\) is the sample size

Given:
- Population mean (\(\mu\)) = 60
- Population standard deviation (\(\sigma\)) = 8
- Sample size (\(n\)) = 50
- Sample mean (\(\bar{x}\)) = 65

Substitute the values into the formula:

\[ \text{Confidence Interval} = 65 \pm 1.645 \times \frac{8}{\sqrt{50}} \]

\[ \text{Confidence Interval} = 65 \pm 1.645 \times \frac{8}{7.071} \]

\[ \text{Confidence Interval} = 65 \pm 1.645 \times 1.131 \]

\[ \text{Confidence Interval} = 65 \pm 1.861 \]

The 90% confidence interval for the true population mean is approximately \( (63.139, 66.861) \). This means that we are 90% confident that the true population mean falls within this range based on the sample data collected.
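
Rather than looking up the critical value, it can be derived from the confidence level. A minimal sketch using the standard library follows; the function name is illustrative.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Interval for the mean with known sigma; z is derived from the confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # 0.90 -> 0.95 quantile, about 1.645
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(sample_mean=65.0, sigma=8.0, n=50, confidence=0.90)
print(f"({low:.2f}, {high:.2f})")  # (63.14, 66.86)
```

Note that the population mean of 60 stated in the problem lies outside this interval.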

Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

To conduct a hypothesis test to determine if caffeine has a significant effect on reaction time, we'll perform a one-sample t-test with the given sample data.

Given:
- Sample size (\(n\)) = 30
- Sample mean (\(\bar{x}\)) = 0.25 seconds
- Sample standard deviation (\(s\)) = 0.05 seconds
- Confidence level = 90%

The hypotheses for the one-sample t-test are:
- Null Hypothesis (\(H_0\)): Caffeine has no significant effect on reaction time (\(\mu = \mu_0\))
- Alternative Hypothesis (\(H_1\)): Caffeine has a significant effect on reaction time (\(\mu \neq \mu_0\))

Where \(\mu_0\) is the hypothesized population mean.

The formula for the t-statistic for testing the population mean when the population standard deviation is unknown is:

\[ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]

Where:
- \(\bar{x}\) is the sample mean
- \(\mu_0\) is the hypothesized population mean under the null hypothesis
- \(s\) is the sample standard deviation
- \(n\) is the sample size

Substitute the given values into the formula:

\[ t = \frac{0.25 - \mu_0}{\frac{0.05}{\sqrt{30}}} \]

Since the hypothesized mean \(\mu_0\) (for example, the average reaction time without caffeine) is not given, we can't calculate the exact t-statistic. The t-statistic will be compared to the critical t-value for a two-tailed test at a 90% confidence level (\(\alpha = 0.10\)) with 29 degrees of freedom, which is approximately ±1.699.

Once \(\mu_0\) is provided or assumed, we can calculate the t-statistic and compare it to the critical t-value. If the absolute value of the calculated t-statistic exceeds 1.699, we reject the null hypothesis, indicating a significant effect of caffeine on reaction time; otherwise, we fail to reject it.
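
The remaining step can be sketched as a function of \(\mu_0\). The baseline of 0.27 seconds used below is a hypothetical value, not part of the problem, and the critical value 1.699 is taken from a t-table (two-tailed, α = 0.10, 29 degrees of freedom). The function name is illustrative.

```python
import math

def caffeine_t_test(mu0, sample_mean=0.25, s=0.05, n=30, t_crit=1.699):
    """One-sample t-statistic and two-tailed decision for a given hypothesized mean mu0."""
    t_stat = (sample_mean - mu0) / (s / math.sqrt(n))
    return t_stat, abs(t_stat) > t_crit

# mu0 = 0.27 s is a HYPOTHETICAL baseline reaction time
t_stat, reject = caffeine_t_test(mu0=0.27)
print(round(t_stat, 2), reject)  # -2.19 True
```

With this assumed baseline the null hypothesis would be rejected; a baseline closer to 0.25 seconds would lead to the opposite decision.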