In [1]:
# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
# use each type of test.

# The main difference between a t-test and a z-test lies in their application and assumptions regarding the population standard deviation:

# 1. **Z-test:**
#    - **Assumption:** Requires knowledge of the population standard deviation.
#    - **Usage:** Appropriate when the population standard deviation is known and either the data are normally distributed or the sample is large (usually \( n \geq 30 \)), so that the sample mean is approximately normal.
#    - **Example Scenario:** Suppose you want to test whether the mean weight of newborn babies in a hospital is significantly different from a known population mean weight of newborns (e.g., 3 kg). If you have a large sample size (e.g., 100 newborns) and you know the population standard deviation (e.g., 0.5 kg), you can use a z-test to determine if the sample mean weight differs significantly from the population mean weight.

# 2. **T-test:**
#    - **Assumption:** Estimates the standard deviation from the sample and assumes the data are approximately normally distributed (or that the sample is large enough for the Central Limit Theorem to apply).
#    - **Usage:** Used when the population standard deviation is unknown, which matters most when the sample is small (typically fewer than 30 observations). For large samples the t-test and z-test give nearly identical results.
#    - **Example Scenario:** Suppose you are testing whether a new teaching method improves students' test scores compared to the traditional method. You randomly select 20 students and measure their test scores before and after implementing the new method. Since the sample is small (20 students) and you don't know the population standard deviation of the test scores, you would use a paired t-test to compare the mean scores before and after the intervention.

# In summary, the choice between a z-test and a t-test depends on the sample size, whether the population standard deviation is known, and the assumptions about the data distribution.
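
# As a rough illustration of the two scenarios above, the sketch below computes a two-sided z-test for the newborn example and a paired t statistic for the teaching example, using only the Python standard library. The sample mean of 3.1 kg and the five before/after scores are made-up values (the scenarios give no data), and the helper names `z_test` and `paired_t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test with known sigma; returns (z, p_value)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for paired before/after scores."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Newborn example: mu0 = 3 kg, sigma = 0.5 kg, n = 100; 3.1 kg is assumed.
z, p_value = z_test(sample_mean=3.1, mu0=3.0, sigma=0.5, n=100)

# Teaching example, shortened to five hypothetical students.
before = [70, 65, 80, 75, 60]
after = [74, 70, 83, 79, 66]
t, df = paired_t_statistic(before, after)

print(f"z = {z:.2f}, p = {p_value:.4f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
```

# The t statistic would then be compared with a critical value from the t distribution with `df` degrees of freedom. The standard library has no t distribution, so that step is left out of this sketch.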

In [None]:
# Q2: Differentiate between one-tailed and two-tailed tests.
    
# One-tailed and two-tailed tests differ in the direction of the alternative hypothesis and, as a result, in where the critical (rejection) region of the test is placed.

# 1. **One-Tailed Test:**
#    - Also known as a directional hypothesis test.
#    - In a one-tailed test, the entire significance level \( \alpha \) (the probability of a Type I error, i.e., rejecting a null hypothesis that is actually true) is placed in one tail (either the left tail or the right tail) of the probability distribution.
#    - It is used when the researcher is interested in testing for a change in a specific direction, either an increase or a decrease, but not both.
#    - The alternative hypothesis \( H_a \) carries the direction as a strict inequality (e.g., \( \mu > \mu_0 \) or \( \mu < \mu_0 \)), while the null hypothesis \( H_0 \) covers the complementary case (e.g., \( \mu \leq \mu_0 \) or \( \mu \geq \mu_0 \)).

# 2. **Two-Tailed Test:**
#    - Also known as a non-directional hypothesis test.
#    - In a two-tailed test, the critical region is divided between both tails of the probability distribution.
#    - It is used when the researcher wants to test if there is a difference in either direction from the null hypothesis value (i.e., whether a parameter is not equal to a specified value).
#    - The null hypothesis \( H_0 \) typically states that there is no difference (e.g., \( \mu = \mu_0 \)), and the alternative hypothesis \( H_a \) would then be that there is a difference (e.g., \( \mu \neq \mu_0 \)).

# **Key Differences:**

# - **Critical Region:** In a one-tailed test, all of the significance level (α) is in one tail of the distribution. In a two-tailed test, the α level is split between two tails.
  
# - **Direction of the Hypothesis:** One-tailed tests focus on detecting an effect in one specific direction (either less than or greater than), while two-tailed tests test for differences in either direction from the null hypothesis value.

# - **Use Case:** One-tailed tests are used when there is a strong directional hypothesis or prior expectation about the direction of the effect. Two-tailed tests are used when there is no specific expectation or when the researcher wants to be open to detecting differences in either direction.

# In practice, the choice between a one-tailed and a two-tailed test depends on the specific research question, the nature of the hypothesis being tested, and prior knowledge or expectations about the direction of any potential effects.
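
# To make the difference concrete, the sketch below turns a single z statistic into left-tailed, right-tailed, and two-tailed p-values. The value z = 1.8, the significance level 0.05, and the helper name `tail_p_values` are assumptions chosen for illustration, not values from the text above.

```python
from statistics import NormalDist


def tail_p_values(z):
    """p-values of a z statistic under each choice of alternative hypothesis."""
    std_normal = NormalDist()
    return {
        "right": 1 - std_normal.cdf(z),           # H_a: mu > mu_0
        "left": std_normal.cdf(z),                # H_a: mu < mu_0
        "two": 2 * (1 - std_normal.cdf(abs(z))),  # H_a: mu != mu_0
    }


alpha = 0.05
p = tail_p_values(1.8)

# The same data can lead to different decisions depending on the test.
reject_right = p["right"] < alpha
reject_two = p["two"] < alpha

print(f"right-tailed p = {p['right']:.4f}, reject H0: {reject_right}")
print(f"two-tailed   p = {p['two']:.4f}, reject H0: {reject_two}")
```

# With these assumed inputs the right-tailed test rejects the null hypothesis while the two-tailed test does not, which is why the direction should be fixed before looking at the data.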

In [None]:
# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
# each type of error.

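# A Type 1 error occurs when a null hypothesis that is actually true is rejected; its probability is the significance level \( \alpha \). For example, in drug testing, where the null hypothesis is that the new drug has no effect, concluding the drug is effective when it actually is not (false positive).

# A Type 2 error occurs when a null hypothesis that is false is not rejected; its probability is usually written \( \beta \), and \( 1 - \beta \) is the power of the test. For instance, failing to conclude that a drug is effective when it truly is (false negative).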
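
# The two error rates can be computed directly for a simple case. The sketch below does so for a right-tailed one-sample z-test. All inputs (a null mean of 0, a true mean of 0.5, a standard deviation of 1, and n = 25) and the helper name `error_rates` are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def error_rates(mu0, mu_true, sigma, n, alpha=0.05):
    """Return (alpha, beta) for a right-tailed one-sample z-test."""
    std_normal = NormalDist()
    # Reject H0 when the z statistic exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu0, in standard-error units.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Type 2 error: the statistic stays below z_crit although H0 is false.
    beta = std_normal.cdf(z_crit - shift)
    return alpha, beta


alpha, beta = error_rates(mu0=0.0, mu_true=0.5, sigma=1.0, n=25)
power = 1 - beta

print(f"Type 1 error rate (alpha): {alpha:.3f}")
print(f"Type 2 error rate (beta):  {beta:.3f}")
print(f"Power (1 - beta):          {power:.3f}")
```

# Lowering `alpha` in this sketch raises `beta`, which shows the usual trade-off between the two error types for a fixed sample size.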

In [None]:
# Q4: Explain Bayes's theorem with an example.
    
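# Bayes's theorem describes how to update the probability of an event using knowledge of conditions related to the event. It is formulated as \( P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} \), where \( P(A | B) \) is the probability of event A given B (the posterior), \( P(A) \) is the prior probability of A, \( P(B | A) \) is the likelihood of B given A, and \( P(B) > 0 \) is the overall probability of B.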

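# **Example:** Consider a medical test for a rare disease that 1% of the population has. The test correctly identifies the disease 95% of the time (true positive rate) and incorrectly flags 2% of healthy individuals (false positive rate). Writing \( D \) for having the disease and \( + \) for a positive test, Bayes's theorem gives \( P(D | +) = \frac{0.95 \cdot 0.01}{0.95 \cdot 0.01 + 0.02 \cdot 0.99} = \frac{0.0095}{0.0293} \approx 0.324 \). So even after a positive result, the probability of actually having the disease is only about 32%, because the disease is rare.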
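
# The medical-test example can be worked through numerically. The sketch below plugs the three stated rates into Bayes's theorem; the function name `posterior` and its parameter names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # Total probability of a positive test (law of total probability).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


p_disease_given_positive = posterior(
    prior=0.01,                # 1% of the population has the disease
    sensitivity=0.95,          # true positive rate
    false_positive_rate=0.02,  # positive result in healthy individuals
)

print(f"P(disease | positive) = {p_disease_given_positive:.4f}")
```

# Raising `prior` in this sketch shows how strongly the answer depends on how common the disease is.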

In [2]:
# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.
    
# A confidence interval is a range of values that likely contains the true unknown parameter of interest, based on a sample from a population. It quantifies the uncertainty around an estimate, providing a plausible range within which the population parameter is expected to lie.

# **Calculation:** To compute a confidence interval for a population mean from a reasonably large sample, a common formula is \( \bar{x} \pm z \cdot \frac{s}{\sqrt{n}} \), where \( \bar{x} \) is the sample mean, \( s \) is the sample standard deviation, \( n \) is the sample size, and \( z \) is the critical value from the standard normal distribution corresponding to the desired confidence level (e.g., 1.96 for 95% confidence). With a small sample and an unknown population standard deviation, the critical value is taken from the t distribution with \( n - 1 \) degrees of freedom instead.

# **Example:** Suppose you have a sample of 100 individuals and you want to estimate the average height of a population. If the sample mean height is 170 cm and the sample standard deviation is 5 cm, the 95% confidence interval would be \( 170 \pm 1.96 \cdot \frac{5}{\sqrt{100}} = 170 \pm 0.98 \), resulting in a confidence interval of approximately 169.02 cm to 170.98 cm.
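
# The formula can be checked with a few lines of standard-library Python. This is a minimal sketch assuming the normal-approximation interval; the function name `mean_confidence_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than a rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Height example: mean 170 cm, standard deviation 5 cm, n = 100.
low, high = mean_confidence_interval(170, 5, 100)
print(f"95% CI: ({low:.2f}, {high:.2f}) cm")
```

# Passing a different `confidence` (for example 0.99) widens the interval, since a larger critical value is needed to cover the mean more often.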

In [3]:
# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
# event's probability and new evidence. Provide a sample problem and solution.


# The following sample problem applies Bayes' Theorem:

# **Problem:**

# In a certain city, 20% of all vehicles are trucks. A police officer reports that they saw a vehicle involved in a hit-and-run accident, and the vehicle appeared to be a truck. However, it's known that the officer correctly identifies trucks 90% of the time and incorrectly identifies non-trucks as trucks 5% of the time. What is the probability that the vehicle involved in the hit-and-run was actually a truck?

# **Solution:**

# Let's define the events:
# - \( T \): The vehicle is a truck.
# - \( H \): The officer reports that the vehicle is a truck.

# Given:
# - Prior probability \( P(T) = 0.20 \) (20% of all vehicles are trucks).
# - Probability of correctly identifying a truck \( P(H | T) = 0.90 \).
# - Probability of incorrectly identifying a non-truck as a truck \( P(H | \neg T) = 0.05 \).

# We need to find \( P(T | H) \), the probability that the vehicle is a truck given the officer reported it as a truck.

# Using Bayes' Theorem:

# \[ P(T | H) = \frac{P(H | T) \cdot P(T)}{P(H)} \]

# First, calculate \( P(H) \), the total probability of the officer reporting the vehicle as a truck:

# \[ P(H) = P(H | T) \cdot P(T) + P(H | \neg T) \cdot P(\neg T) \]
# \[ P(H) = 0.90 \cdot 0.20 + 0.05 \cdot 0.80 \]
# \[ P(H) = 0.18 + 0.04 \]
# \[ P(H) = 0.22 \]

# Now, apply Bayes' Theorem:

# \[ P(T | H) = \frac{0.90 \cdot 0.20}{0.22} \]
# \[ P(T | H) = \frac{0.18}{0.22} \]
# \[ P(T | H) \approx 0.8182 \]

# Therefore, the probability that the vehicle involved in the hit-and-run was actually a truck, given that the officer reported it as a truck, is approximately \( 0.8182 \) or \( 81.82\% \).
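
# As a cross-check on the arithmetic, the same calculation can be done with exact fractions so that no rounding is involved. This is only a sketch; the variable names are hypothetical, and the three input probabilities are the ones given in the problem.

```python
from fractions import Fraction

# Inputs from the problem statement, written as exact fractions.
p_truck = Fraction(1, 5)                    # P(T) = 0.20
p_report_given_truck = Fraction(9, 10)      # P(H | T) = 0.90
p_report_given_not_truck = Fraction(1, 20)  # P(H | not T) = 0.05

# Law of total probability for P(H).
p_report = (
    p_report_given_truck * p_truck
    + p_report_given_not_truck * (1 - p_truck)
)

# Bayes' Theorem for P(T | H).
p_truck_given_report = p_report_given_truck * p_truck / p_report

print(f"P(H) = {p_report}")
print(f"P(T | H) = {p_truck_given_report} ~ {float(p_truck_given_report):.4f}")
```

# The exact posterior is 9/11, which rounds to the 0.8182 obtained by hand.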

In [None]:
# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
# of 5. Interpret the results.

# The problem does not state the sample size, so the interval can only be written in terms of \( n \). Treating the given standard deviation as the population value \( \sigma \) (or assuming \( n \) is large enough that the sample standard deviation is a good stand-in for it), the confidence interval for the population mean is:

# \[ \text{Confidence Interval} = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}} \]

# where:
# - \( \bar{x} \) is the sample mean,
# - \( \sigma \) is the population standard deviation,
# - \( n \) is the sample size,
# - \( z \) is the critical value from the standard normal distribution corresponding to the desired confidence level. For a 95% confidence level, \( z = 1.96 \).

# Given:
# - \( \bar{x} = 50 \)
# - \( \sigma = 5 \)
# - \( n \) is not given; we assume it is large enough for the normal approximation to apply.

# Now, calculate the confidence interval:

# \[ \text{Confidence Interval} = 50 \pm 1.96 \cdot \frac{5}{\sqrt{n}} \]

# The margin of error \( 1.96 \cdot \frac{5}{\sqrt{n}} \) is the amount added to and subtracted from the sample mean to form the interval, and it shrinks as \( n \) grows.

# Interpreting the results:
# - A 95% confidence interval comes from a procedure that, over repeated samples, captures the true population mean about 95% of the time.
# - To provide a specific interval, the exact value of \( n \) is needed. For illustrative purposes, if \( n = 100 \), then \( \sqrt{n} = 10 \), and the confidence interval would be \( 50 \pm 1.96 \cdot \frac{5}{10} \), which simplifies to \( 50 \pm 0.98 \), resulting in the interval \( (49.02, 50.98) \).

# Thus, under the assumed \( n = 100 \), we can state that we are 95% confident that the true population mean lies between 49.02 and 50.98.
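
# Because the answer depends on the unstated sample size, it can help to see the interval for several values of n. The sketch below does this for a mean of 50 and a standard deviation of 5. The sample sizes 25, 100, and 400 are assumptions picked for illustration, and the helper name `margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)


sample_mean, sd = 50, 5

# Map each assumed sample size to its (lower, upper) interval.
intervals = {}
for n in (25, 100, 400):
    margin = margin_of_error(sd, n)
    intervals[n] = (sample_mean - margin, sample_mean + margin)

for n, (low, high) in intervals.items():
    print(f"n = {n:3d}: ({low:.2f}, {high:.2f})")
```

# Quadrupling the sample size halves the margin of error, since the margin is proportional to \( 1 / \sqrt{n} \).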