In [None]:
Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

In [None]:
The t-test and z-test are both statistical tests for hypotheses about means: either a sample mean against a reference 
value, or the means of two groups against each other. They are applied in different contexts depending on sample size 
and on whether the population variance is known.

Key Differences
1. Population Variance:
   - t-test: Used when the population standard deviation is unknown and must be estimated from the sample. This 
matters most when the sample size is small (typically \(n < 30\)).
   - z-test: Used when the population standard deviation is known, or as an approximation when the sample size is 
large (typically \(n \geq 30\)), since the Central Limit Theorem makes the sample mean approximately normally distributed.

2. Distribution:
   - t-test: Uses the t-distribution, which has heavier tails than the normal distribution, accounting for 
additional uncertainty due to smaller sample sizes.
   - z-test: Uses the standard normal distribution (z-distribution).

3. Sample Size:
   - t-test: Suitable for smaller samples (less than 30).
   - z-test: More appropriate for larger samples (30 or more).

Example Scenarios

1. t-test Scenario:
   - A researcher wants to compare the effectiveness of a new teaching method on student performance. They randomly 
select 20 students from a small class, measure their test scores after using the method, and calculate the mean. 
Since the sample size is small and the population variance is unknown, the researcher would use a **t-test** to 
determine if the mean score is significantly different from a known historical mean score.

2. z-test Scenario:
   - A quality control manager in a manufacturing plant wants to test whether a new machine produces parts that are 
within the specified weight limit. They know the population standard deviation from previous data and collect a sample 
of 100 parts. Because the sample size is large and the population standard deviation is known, the manager would use 
a z-test to determine if the mean weight of the produced parts significantly deviates from the target weight.
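The two scenarios above can be sketched numerically. This is a minimal illustration, not a full testing routine: all 
figures (means, standard deviations, the target weight) are hypothetical, the helper names are made up for this 
example, and the t critical value is a standard table value hard-coded because the Python standard library has no 
t-distribution.

```python
import math
from statistics import NormalDist


def standardized_statistic(sample_mean, hypothesized_mean, sd, n):
    """(sample mean - hypothesized mean) / standard error; same form for z and t."""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))


def two_sided_p_from_z(z):
    """Two-sided p-value under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# z-test (hypothetical numbers): 100 parts, KNOWN population sd of 2.0 g,
# target weight 50.0 g, observed sample mean 50.5 g.
z = standardized_statistic(50.5, 50.0, 2.0, 100)
p = two_sided_p_from_z(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")

# t-test (hypothetical numbers): 20 students, SAMPLE sd of 8 points,
# historical mean 70, observed sample mean 74.
t = standardized_statistic(74.0, 70.0, 8.0, 20)
T_CRIT_DF19 = 2.093  # two-sided 5% critical value for 19 df, from a t table
print(f"t = {t:.3f}, reject H0 at 5%: {abs(t) > T_CRIT_DF19}")
```

The statistic has the same form in both cases. What changes is the source of the standard deviation and the reference 
distribution used to judge the result.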

In [None]:
Q2: Differentiate between one-tailed and two-tailed tests.

In [None]:
One-Tailed Test
- Definition: A one-tailed test examines the effect in one specific direction. It tests whether the population mean 
is greater than the value stated in the null hypothesis, or whether it is less than that value, but not both.
- Example Scenario: A company claims that their new product has an average lifespan greater than 100 hours. 
The null hypothesis would be that the average lifespan is 100 hours or less, while the alternative hypothesis would 
be that the average lifespan is greater than 100 hours. In this case, a one-tailed test is appropriate.

Two-Tailed Test

- Definition: A two-tailed test examines the effect in both directions. It tests whether the sample mean is 
significantly different from a specified value, either greater than or less than that value.
- Example Scenario: A researcher wants to determine if a new teaching method affects student performance compared to a
traditional method. The null hypothesis states that there is no difference in average scores, while the alternative 
hypothesis states that there is a difference (the new method could either improve or worsen performance). A two-tailed 
test is appropriate here.
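As a rough illustration of how the choice of tails changes the conclusion, the sketch below converts one z-statistic 
into one-tailed and two-tailed p-values. The value z = 1.8 and the function name are hypothetical, chosen only to show 
a case where the two tests disagree at the 5% level.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return p-values for the 'greater', 'less' and two-sided alternatives."""
    std_normal = NormalDist()
    upper_tail = 1 - std_normal.cdf(z)   # H1: mean is greater than the null value
    lower_tail = std_normal.cdf(z)       # H1: mean is less than the null value
    two_sided = 2 * min(upper_tail, lower_tail)  # H1: mean differs in either direction
    return {"greater": upper_tail, "less": lower_tail, "two-sided": two_sided}


p = p_values_from_z(1.8)  # hypothetical test statistic
for alternative, value in p.items():
    print(f"{alternative:>9}: p = {value:.4f}")
```

With this statistic the one-tailed "greater" test rejects at the 5% level while the two-tailed test does not, because 
the two-tailed test splits the 5% between both tails.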

In [None]:
Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

In [None]:
In hypothesis testing, Type I and Type II errors are potential errors that can occur when making decisions based on 
statistical tests. Understanding these errors is crucial for interpreting the results of hypothesis tests.

Type I Error (False Positive)
- Definition: A Type I error occurs when the null hypothesis is rejected when it is actually true. In other 
words, it leads to the conclusion that there is an effect or difference when there is none.
- Significance Level: The probability of making a Type I error is denoted by \(\alpha\), which is typically set at 
0.05 (5%). This means there is a 5% chance of rejecting a true null hypothesis.
- Example Scenario: Imagine a medical test for a disease that indicates a patient is sick when they are actually 
healthy. If a patient receives a positive test result indicating they have the disease when they do not, this is a 
Type I error. The consequence might lead to unnecessary treatments or anxiety for the patient.

Type II Error (False Negative)
- Definition: A Type II error occurs when the null hypothesis is not rejected when it is actually false. 
This means that the test fails to detect an effect or difference that truly exists.
- Power of the Test: The probability of making a Type II error is denoted by \(\beta\). The power of a test is 
calculated as \(1 - \beta\), indicating the likelihood of correctly rejecting a false null hypothesis.
- Example Scenario: Consider a quality control test for a manufacturing process. If a batch of products is produced 
with defects, but the test indicates that the batch is acceptable (thus failing to reject the null hypothesis that 
the batch is good), this is a Type II error. This could lead to defective products reaching customers, resulting in 
dissatisfaction and potential harm.
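The link between the two error rates can be sketched for a one-sided z-test with known standard deviation. This is a 
minimal sketch under stated assumptions: the function name and all numbers (a true shift of 2 units, a standard 
deviation of 5, a sample of 25) are hypothetical.

```python
import math
from statistics import NormalDist


def power_one_sided_z(true_shift, sigma, n, alpha=0.05):
    """Probability of rejecting H0 (mean = mu0) in favor of mean > mu0
    when the true mean is mu0 + true_shift and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)            # rejection threshold
    shift_in_se = true_shift / (sigma / math.sqrt(n))  # shift measured in standard errors
    return 1 - std_normal.cdf(z_crit - shift_in_se)


# When H0 is true (no shift), the rejection probability is the Type I error rate.
type_1_rate = power_one_sided_z(true_shift=0.0, sigma=5.0, n=25)

# When H0 is false, power = 1 - beta, so beta is the Type II error rate.
power = power_one_sided_z(true_shift=2.0, sigma=5.0, n=25)
beta = 1 - power

print(f"Type I error rate: {type_1_rate:.3f}")
print(f"Power: {power:.3f}, Type II error rate: {beta:.3f}")
```

Lowering `alpha` in this sketch raises the rejection threshold, which reduces Type I errors but increases `beta` for 
the same sample size.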

In [None]:
Q4: Explain Bayes's theorem with an example.

In [None]:
Bayes's theorem is a fundamental concept in probability and statistics that describes how to update the probability 
of a hypothesis based on new evidence. It provides a way to calculate conditional probabilities and is especially 
useful in situations where we want to revise our beliefs in light of new data.

Bayes's Theorem Formula
The theorem can be expressed mathematically as:

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Where:
- \(P(A|B)\) is the posterior probability: the probability of event \(A\) occurring given that \(B\) is true.
- \(P(B|A)\) is the likelihood: the probability of event \(B\) occurring given that \(A\) is true.
- \(P(A)\) is the prior probability: the initial probability of event \(A\).
- \(P(B)\) is the marginal probability of event \(B\).

Example Scenario

Medical Testing Scenario:
Suppose we have a certain disease that affects 1% of a population. A medical test is available that can accurately
identify the disease, but it is not perfect. The test has:
- A true positive rate (sensitivity) of 90%: If a person has the disease, the test will correctly identify it 90% of 
the time.
- A false positive rate of 5%: If a person does not have the disease, the test will incorrectly indicate that they do
have it 5% of the time.

We want to find out the probability that a person has the disease given that they tested positive for it.

Step 1: Define the Events
- Let \(A\) be the event that a person has the disease.
- Let \(B\) be the event that a person tests positive for the disease.

Step 2: Identify the Probabilities
- \(P(A) = 0.01\) (the prior probability that a person has the disease).
- \(P(B|A) = 0.90\) (the probability of testing positive if the person has the disease).
- \(P(B|\neg A) = 0.05\) (the probability of testing positive if the person does not have the disease).

Step 3: Calculate \(P(B)\)
To find \(P(B)\), we can use the law of total probability:

\[
P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)
\]

\[
P(B) = (0.90 \cdot 0.01) + (0.05 \cdot 0.99) = 0.009 + 0.0495 = 0.0585
\]

Step 4: Apply Bayes's Theorem
Now we can use Bayes's theorem to find \(P(A|B)\):

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{0.90 \cdot 0.01}{0.0585} = \frac{0.009}{0.0585} \approx 0.1538
\]

So a person who tests positive has only about a 15.4% chance of having the disease, because the disease is rare and 
false positives outnumber true positives.
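The four steps above can be collected into a small function. This is a minimal sketch; the function and parameter 
names are chosen for this example and are not part of the original exercise.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes's theorem for a binary test: P(disease | positive test)."""
    # Law of total probability: overall chance of a positive test.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes's theorem: likelihood times prior, divided by the evidence.
    return sensitivity * prior / p_positive


result = posterior_given_positive(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(result, 4))  # 0.1538
```

Changing `prior` shows how strongly the answer depends on how common the disease is.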


In [None]:
Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

In [None]:
A confidence interval (CI) is a range of values that is used to estimate the true parameter of a population based 
on sample data. It provides an interval within which we expect the true population parameter to fall, along with a 
specified level of confidence (usually expressed as a percentage, like 95% or 99%).

Key Components of Confidence Intervals

1. Point Estimate: The sample statistic (e.g., sample mean) that serves as a best estimate of the population parameter.
2. Margin of Error: The range above and below the point estimate, determined by the desired confidence level and the 
variability of the data.
3. Confidence Level: The long-run proportion of intervals, built this way from repeated samples, that contain the true 
population parameter. Common confidence levels are 90%, 95%, and 99%.

Formula for Confidence Interval
For a population mean, the confidence interval can be calculated using the formula:

\[
\text{CI} = \bar{x} \pm z \cdot \left(\frac{s}{\sqrt{n}}\right)
\]

Where:
- \(\bar{x}\) = sample mean
- \(z\) = z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence)
- \(s\) = sample standard deviation
- \(n\) = sample size

Example Calculation

Suppose we want to estimate the average height of a certain species of plant. We take a random sample of 30 plants 
and find the following:

- Sample mean height (\(\bar{x}\)): 50 cm
- Sample standard deviation (\(s\)): 5 cm
- Desired confidence level: 95%

Step 1: Find the z-score

For a 95% confidence level, the z-score is approximately 1.96.

Step 2: Calculate the Margin of Error

\[
\text{Margin of Error} = z \cdot \left(\frac{s}{\sqrt{n}}\right) = 1.96 \cdot \left(\frac{5}{\sqrt{30}}\right)
\]

Calculating \(\frac{5}{\sqrt{30}}\):

\[
\frac{5}{\sqrt{30}} \approx \frac{5}{5.477} \approx 0.913
\]

Now calculate the Margin of Error:

\[
\text{Margin of Error} = 1.96 \cdot 0.913 \approx 1.79
\]

Step 3: Calculate the Confidence Interval

Now we can construct the confidence interval:

\[
\text{CI} = \bar{x} \pm \text{Margin of Error} = 50 \pm 1.79
\]

Thus, the confidence interval is:

\[
\text{CI} = [50 - 1.79, 50 + 1.79] = [48.21, 51.79]
\]
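The same calculation can be sketched in code. This sketch follows the z-based formula used above and takes the z-score 
from the standard library's normal distribution instead of a table; the function name is made up for this example.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """z-based confidence interval for a mean: sample_mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = mean_confidence_interval(sample_mean=50, sd=5, n=30)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Because the standard deviation here comes from a sample of 30, a t-based interval would be slightly wider. The z-based 
version is kept to match the worked example.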

In [None]:
Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.

In [None]:
Bayes' Theorem Overview
Bayes' Theorem provides a way to update the probability estimate for an event based on new evidence. The formula is:

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Where:
- \(P(A|B)\) is the probability of event \(A\) occurring given that \(B\) is true (posterior probability).
- \(P(B|A)\) is the probability of event \(B\) occurring given that \(A\) is true (likelihood).
- \(P(A)\) is the probability of event \(A\) occurring (prior probability).
- \(P(B)\) is the probability of event \(B\) occurring (marginal likelihood).

Sample Problem
Scenario: Medical Testing

Suppose a particular disease affects 1% of a population. There is a test for this disease that is 90% accurate, 
meaning:
- If a person has the disease, the test will correctly identify it 90% of the time (True Positive Rate).
- If a person does not have the disease, the test will correctly identify this 90% of the time (True Negative Rate).

Question: If a person tests positive for the disease, what is the probability that they actually have the disease?

Given Data
- \(P(\text{Disease}) = P(A) = 0.01\) (1% prevalence)
- \(P(\text{No Disease}) = 1 - P(A) = 0.99\)
- \(P(\text{Positive Test} | \text{Disease}) = P(B|A) = 0.90\) (sensitivity)
- \(P(\text{Positive Test} | \text{No Disease}) = P(B|\text{No } A) = 0.10\) (False Positive Rate)

Step 1: Calculate \(P(B)\)

Using the law of total probability:

\[
P(B) = P(B|A) \cdot P(A) + P(B|\text{No } A) \cdot P(\text{No } A)
\]

Substituting in the values:

\[
P(B) = (0.90 \cdot 0.01) + (0.10 \cdot 0.99) = 0.009 + 0.099 = 0.108
\]

Step 2: Apply Bayes' Theorem
Now we can use Bayes' Theorem to find \(P(A|B)\):

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Substituting the known values:

\[
P(A|B) = \frac{0.90 \cdot 0.01}{0.108}
\]

Calculating this gives:

\[
P(A|B) = \frac{0.009}{0.108} \approx 0.0833
\]

So a person who tests positive has only about an 8.3% chance of actually having the disease.
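One way to check this result is to count people instead of multiplying probabilities. The sketch below applies the 
rates from the problem to a hypothetical population of 10,000; the population size and the function name are 
assumptions made for illustration.

```python
def positive_test_breakdown(population, prevalence, sensitivity, specificity):
    """Split the people who test positive into true and false positives."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity        # sick and correctly flagged
    false_positives = healthy * (1 - specificity)  # healthy but flagged anyway
    share_truly_sick = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_truly_sick


tp, fp, share = positive_test_breakdown(
    population=10_000, prevalence=0.01, sensitivity=0.90, specificity=0.90
)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}")
print(f"share of positives who are sick: {share:.4f}")
```

Roughly 90 true positives sit among about 990 false positives, which is why the posterior probability stays below 10% 
even though the test is 90% accurate.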


In [None]:
Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [None]:
To calculate the 95% confidence interval for a sample mean, you can use the formula:

\[
CI = \bar{x} \pm z \left(\frac{\sigma}{\sqrt{n}}\right)
\]

Where:
- \(\bar{x}\) is the sample mean.
- \(z\) is the z-score corresponding to the desired confidence level.
- \(\sigma\) is the population standard deviation.
- \(n\) is the sample size.

Given Data
- Sample mean \(\bar{x} = 50\)
- Standard deviation \(\sigma = 5\)
- Confidence level = 95%

For a 95% confidence level, the z-score is approximately 1.96 (from standard normal distribution tables).

Step 1: Calculate the Standard Error (SE)

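\[
SE = \frac{\sigma}{\sqrt{n}}
\]

The problem does not state a sample size, so assume \(n = 30\):

\[
SE = \frac{5}{\sqrt{30}} \approx \frac{5}{5.477} \approx 0.913
\]

Step 2: Calculate the Margin of Error (ME)

\[
ME = z \cdot SE = 1.96 \cdot 0.913 \approx 1.789
\]

Step 3: Calculate the Confidence Interval (CI)

\[
CI = \bar{x} \pm ME = 50 \pm 1.789
\]

Thus, the confidence interval is:

\[
CI = (50 - 1.789, 50 + 1.789) = (48.211, 51.789)
\]

Interpretation: under the assumed sample size of 30, we can be 95% confident that the true population mean lies 
between about 48.2 and 51.8. A different sample size would change the width of the interval.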


In [None]:
Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

In [None]:
### Margin of Error in a Confidence Interval

The **margin of error** (ME) in a confidence interval quantifies the amount of uncertainty or potential error in the estimate of a population parameter. It is essentially the range above and below the sample statistic (like the sample mean) that defines the confidence interval. The formula for margin of error is:

\[
ME = z \cdot SE
\]

Where:
- \(z\) is the z-score corresponding to the desired confidence level.
- \(SE\) (Standard Error) is calculated as:

\[
SE = \frac{\sigma}{\sqrt{n}}
\]

Here, \(\sigma\) is the population standard deviation and \(n\) is the sample size.

### Effect of Sample Size on Margin of Error

The margin of error is inversely related to the square root of the sample size (\(n\)). This means that as the sample size increases, the standard error decreases, resulting in a smaller margin of error. 

Mathematically, you can see this relationship in the formula for the standard error:

\[
SE = \frac{\sigma}{\sqrt{n}}
\]

### Example Scenario

**Scenario: Polling for Election Results**

Imagine a polling organization is trying to estimate the proportion of voters who favor a certain candidate. They conduct two separate polls:

1. **Poll A**: Sample size of 100 voters.
2. **Poll B**: Sample size of 1000 voters.

Assume the proportion of voters favoring the candidate is estimated to be 60% (0.60). For a sample proportion, the standard error is:

\[
SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \cdot 0.4}{n}}
\]

#### For Poll A:

- \(n = 100\)

\[
SE_A = \sqrt{\frac{0.6 \cdot 0.4}{100}} = \sqrt{0.0024} \approx 0.049
\]

- If we use a 95% confidence level, \(z \approx 1.96\):

\[
ME_A = 1.96 \cdot 0.049 \approx 0.096
\]

#### For Poll B:

- \(n = 1000\)

\[
SE_B = \sqrt{\frac{0.6 \cdot 0.4}{1000}} = \sqrt{0.00024} \approx 0.0155
\]

- Again, using \(z \approx 1.96\):

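\[
ME_B = 1.96 \cdot 0.0155 \approx 0.0304
\]

### Conclusion

- Poll A (n=100): Margin of Error ≈ 9.6%
- Poll B (n=1000): Margin of Error ≈ 3.04%

Increasing the sample size tenfold shrinks the margin of error by a factor of about \(\sqrt{10} \approx 3.16\).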
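The effect of sample size can be sketched by evaluating the margin of error for several values of \(n\). The function 
name is made up for this example, and the third sample size (10,000) is an added hypothetical case that is not part of 
the two polls above.

```python
import math
from statistics import NormalDist


def proportion_margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion: z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


margins = {n: proportion_margin_of_error(0.60, n) for n in (100, 1_000, 10_000)}
for n, margin in margins.items():
    print(f"n = {n:>6}: margin of error = {margin:.4f}")
```

Each tenfold increase in \(n\) divides the margin of error by \(\sqrt{10}\), so halving the margin of error requires 
roughly four times as many respondents.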
    

In [None]:
Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [None]:
The z-score is calculated as:

\[
z = \frac{x - \mu}{\sigma} = \frac{75 - 70}{5} = 1
\]

A z-score of 1 means that the data point of 75 is 1 standard deviation above the mean of the population.
This indicates that the value is higher than average, and in a normal distribution, about 84% of the population 
would fall below this score.
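A short sketch of the same calculation, including the percentile under a normal distribution. The function name is 
chosen for this example.

```python
from statistics import NormalDist


def z_score(value, population_mean, population_sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - population_mean) / population_sd


z = z_score(75, 70, 5)
share_below = NormalDist().cdf(z)  # fraction of a normal population below this point

print(z)                      # 1.0
print(round(share_below, 4))  # 0.8413
```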

In [None]:
Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [None]:
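Test \(H_0: \mu = 0\) against \(H_1: \mu > 0\), where \(\mu\) is the mean weight loss. The test statistic is:

\[
t = \frac{\bar{x} - 0}{s / \sqrt{n}} = \frac{6}{2.5 / \sqrt{50}} \approx \frac{6}{0.354} \approx 16.97
\]

With 49 degrees of freedom, the one-tailed critical value at \(\alpha = 0.05\) is about 1.677, far below 16.97. 
There is sufficient evidence to conclude that the weight loss drug is significantly effective at the 95% confidence
level, as the average weight loss of participants is significantly greater than 0 pounds.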
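The test can be sketched from the summary statistics alone. Two assumptions are made here: the null hypothesis is a 
mean loss of 0 pounds, and the critical value is a standard t-table figure (one-tailed, 5%, 49 degrees of freedom) 
hard-coded because the Python standard library has no t-distribution. The function name is made up for this example.

```python
import math


def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """Return the one-sample t-statistic and its degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error, n - 1


t_stat, df = one_sample_t(sample_mean=6.0, null_mean=0.0, sample_sd=2.5, n=50)

T_CRIT = 1.677  # one-tailed 5% critical value for 49 df, from a t table (assumed)
reject_null = t_stat > T_CRIT

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
```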

In [None]:
Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [None]:
The 95% confidence interval for the true proportion of people who are satisfied with their job is approximately 
(0.608, 0.692). This means we can be 95% confident that the true proportion of satisfied individuals in the population
is between 60.8% and 69.2%.
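The interval can be reproduced with the normal-approximation formula for a proportion. This is a minimal sketch; the 
function name is chosen for this example.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_confidence_interval(p_hat=0.65, n=500)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```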

In [None]:
Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [None]:
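The problem does not state the sample sizes, and the result depends on them. Assuming 30 students in each sample, the 
test statistic is:

\[
t = \frac{85 - 82}{\sqrt{\frac{6^2}{30} + \frac{5^2}{30}}} = \frac{3}{\sqrt{2.033}} \approx 2.10
\]

With about 56 degrees of freedom, the two-tailed critical value at \(\alpha = 0.01\) is approximately 2.67, which is 
larger than 2.10. Under this assumption, at the 0.01 significance level, there is not enough evidence to conclude that 
there is a significant difference in student performance between the two teaching methods. Much larger samples could 
change this conclusion.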
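A two-sample comparison like this can be sketched with Welch's t-statistic, which does not assume equal variances. 
The problem gives no sample sizes, so the sketch assumes 30 students per group. That figure, the function name, and 
the hard-coded table critical value are all assumptions made for illustration.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df


# ASSUMPTION: 30 students per group (not given in the problem).
t_stat, df = welch_t(85, 6, 30, 82, 5, 30)

T_CRIT = 2.67  # approximate two-tailed 1% critical value near 56 df, from a t table
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, df = {df:.1f}, reject H0: {reject_null}")
```

Rerunning the function with larger group sizes shows how the same 3-point difference can become significant as the 
standard error shrinks.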

In [None]:
Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [None]:
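With a known population standard deviation, the standard error is \(8 / \sqrt{50} \approx 1.131\). For 90% confidence, 
\(z \approx 1.645\), so the margin of error is \(1.645 \cdot 1.131 \approx 1.86\).
The 90% confidence interval for the true population mean is approximately (63.14, 66.86). 
This means we can be 90% confident that the true population mean lies within this interval.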

In [None]:
Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [None]:
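The problem gives no baseline reaction time to test against, so the statistic 
\(t = (0.25 - \mu_0) / (0.05 / \sqrt{30})\) cannot be evaluated without choosing a hypothesized mean \(\mu_0\). 
If \(\mu_0\) is taken to be 0.25 seconds, the same as the sample mean, then \(t = 0\), which is well inside the 
two-tailed critical values of about \(\pm 1.699\) for 29 degrees of freedom. In that case, at the 90% confidence level, 
there is not enough evidence to conclude that caffeine has a significant effect on reaction time.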