In [None]:
# 1. What is hypothesis testing in statistics?

Hypothesis testing is a statistical method used to make decisions or inferences about a population based on sample data.
It involves two competing hypotheses:

1. Null Hypothesis (H0): Assumes no effect or no difference. It is the statement to be tested.
2. Alternative Hypothesis (H1 or Ha): Represents the claim or effect we want to provide evidence for.

The testing process includes:
- Collecting sample data.
- Calculating a test statistic.
- Comparing the test statistic to a critical value or using a p-value.
- Making a decision to either reject or fail to reject the null hypothesis.

Hypothesis testing helps determine whether observed data provides enough evidence to support a specific claim about the population.


In [None]:
# 2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

Null Hypothesis (H0):
- It is a statement of no effect, no difference, or no change.
- It represents the default or status quo assumption.
- The goal of hypothesis testing is to assess whether there is enough evidence to reject the null hypothesis.

Alternative Hypothesis (H1 or Ha):
- It is a statement that contradicts the null hypothesis.
- Represents the claim or effect we want to find evidence for.
- It suggests that there is an effect, difference, or change.

Difference:
- The null hypothesis assumes no relationship or effect, while the alternative hypothesis suggests that a relationship or effect exists.
- Hypothesis testing evaluates whether data provides enough evidence to reject H0 in favor of H1.

In [None]:
# 3. What is the significance level in hypothesis testing, and why is it important?

Significance Level (α) in Hypothesis Testing:

- The significance level, denoted by alpha (α), is the threshold probability used to decide whether to reject the null hypothesis.
- It represents the maximum acceptable risk of making a Type 1 error, that is, rejecting the null hypothesis when it is actually true.
- Common significance levels are 0.05, 0.01, and 0.10, meaning a 5%, 1%, or 10% risk of false rejection.
- If the p-value of the test is less than or equal to α, we reject the null hypothesis; otherwise, we fail to reject it.
- The significance level controls the trade-off between false positives and sensitivity: a smaller α lowers the risk of a Type 1 error but makes real effects harder to detect.

Importance:
- It sets a clear criterion for making statistical decisions.
- Helps maintain the reliability and validity of conclusions drawn from data.
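
To make the role of α concrete, here is a minimal simulation sketch. It assumes a two-tailed one-sample Z-test with known σ; the helper name `type_1_error_rate` and all parameter values are hypothetical and chosen only for illustration. Because every sample is drawn with H0 true, the fraction of rejections should land close to α.

```python
import random
from statistics import NormalDist, fmean


def type_1_error_rate(alpha=0.05, n=30, trials=5000, seed=0):
    """Estimate how often a two-tailed Z-test rejects a TRUE null hypothesis.

    Assumed setup (illustrative only): H0: mu = 0 with known sigma = 1,
    and every sample really is drawn from N(0, 1), so H0 holds.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / n ** 0.5)  # (x_bar - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_1_error_rate()
print(f"Observed false-rejection rate at alpha = 0.05: {rate:.3f}")
```

Rerunning with `alpha=0.01` or `alpha=0.10` should move the observed rate accordingly, which is the sense in which α sets the false-rejection risk.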

In [None]:
# 4. What does a P-value represent in hypothesis testing?

P-value in Hypothesis Testing:

- The P-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
- It quantifies the evidence against the null hypothesis.
- A smaller P-value indicates stronger evidence against the null hypothesis.
- If the P-value is less than or equal to the chosen significance level (α), we reject the null hypothesis.
- If the P-value is greater than α, we fail to reject the null hypothesis.
- The P-value helps determine the statistical significance of the observed effect.
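
As a sketch of how a P-value can be obtained in practice, the snippet below converts a Z statistic into a P-value using the standard normal distribution. The function name `p_value_from_z`, the `tail` labels, and the example statistic 2.5 are assumptions made for illustration; other tests would use a different null distribution.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """P-value for a Z statistic under the standard normal null distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "greater":
        return 1 - std_normal.cdf(z)
    if tail == "less":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'greater', or 'less'")


alpha = 0.05
p = p_value_from_z(2.5)  # hypothetical observed test statistic
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"p-value = {p:.4f} -> {decision}")
```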

In [None]:
# 5. How do you interpret the P-value in hypothesis testing?

Interpreting the P-value in Hypothesis Testing:

- If the P-value ≤ significance level (α):
    - Reject the null hypothesis (H0).
    - There is sufficient evidence to support the alternative hypothesis (H1).
    - The observed result is statistically significant.

- If the P-value > significance level (α):
    - Fail to reject the null hypothesis.
    - There is insufficient evidence to support the alternative hypothesis.
    - The observed result is not statistically significant.

- A small P-value indicates strong evidence against H0.
- A large P-value suggests that the data is consistent with H0; it does not prove that H0 is true.

In [None]:
# 6. What are Type 1 and Type 2 errors in hypothesis testing?

- Type 1 error = rejecting a true null hypothesis.
- Type 2 error = failing to reject a false null hypothesis.

- Type 1 Error (False Positive):
  - Occurs when the null hypothesis (H0) is rejected even though it is true.
  - The probability of making a Type 1 error, given that H0 is true, is the significance level (α).
  - Example: Concluding a new drug works when it actually does not.

- Type 2 Error (False Negative):
  - Occurs when the null hypothesis is not rejected even though the alternative hypothesis (H1) is true.
  - The probability of making a Type 2 error is denoted by β.
  - Example: Failing to detect that a new drug works when it actually does.
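
The two error probabilities can be related numerically. The sketch below computes β for an upper-tailed Z-test under assumed conditions: known σ, a normally distributed sample mean, and a specific true mean `mu1` greater than `mu0`. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_2_error_probability(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for an upper-tailed Z-test of H0: mu = mu0 when the true mean is mu1.

    Assumes known sigma, a normally distributed sample mean, and mu1 > mu0.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # reject H0 when Z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # mean of Z when the true mean is mu1
    return std_normal.cdf(z_crit - shift)     # P(fail to reject H0 | H1 true)


beta = type_2_error_probability(mu0=100, mu1=105, sigma=15, n=36)
print(f"alpha = 0.05, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Increasing `n` or `alpha` in this sketch lowers β, which illustrates the trade-off between the two error types.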

In [None]:
# 7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?

- One-tailed: Directional hypothesis, more powerful if direction is correct.
- Two-tailed: Non-directional hypothesis, more conservative.

- One-tailed Test:
  - Tests for an effect in only one direction (either greater than or less than).
  - The entire significance level (α) is placed in one tail of the distribution.
  - Used when the research hypothesis predicts the direction of the effect.
  - Example: Testing if a new drug is **better** than the current drug.

- Two-tailed Test:
  - Tests for an effect in both directions (greater than or less than).
  - The significance level (α) is split equally between the two tails.
  - Used when the research hypothesis does not predict the direction.
  - Example: Testing if a new drug has **any different effect** (better or worse) than the current drug.
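
A short sketch of how the placement of α changes the critical value, assuming a Z statistic and α = 0.05. The observed statistic of 1.80 is a made-up value picked so that the two tests disagree.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed (upper): all of alpha sits in the right tail.
z_crit_one_tailed = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha / 2 sits in each tail.
z_crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

z_observed = 1.80  # hypothetical test statistic

reject_one_tailed = z_observed > z_crit_one_tailed
reject_two_tailed = abs(z_observed) > z_crit_two_tailed

print(f"one-tailed critical value: {z_crit_one_tailed:.3f}, reject H0: {reject_one_tailed}")
print(f"two-tailed critical value: {z_crit_two_tailed:.3f}, reject H0: {reject_two_tailed}")
```

The same statistic is significant in the one-tailed test but not in the two-tailed test, which is why the direction must be chosen before looking at the data.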

In [None]:
# 8. What is the Z-test, and when is it used in hypothesis testing?

- A Z-test is a statistical test used to determine if there is a significant difference between sample and population means or between two sample means when the population variance is known.
- It assumes that the sample size is large (typically n > 30), or the population is normally distributed.
- The test statistic follows a standard normal distribution (Z-distribution).
- Common uses:
  - Testing the mean of a single sample against a known population mean.
  - Comparing means of two samples when population variances are known.
- Formula for the Z-test statistic for a single sample mean:
  Z = (sample_mean - population_mean) / (population_std_dev / sqrt(sample_size))

When to use:
- When population variance is known.
- Large sample sizes or normal population distribution.
- Useful for testing hypotheses about means.
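
The single-sample formula above can be sketched as a small function. The name `one_sample_z_test` and the example inputs are assumptions for illustration, and the P-value shown is two-sided.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_std_dev, sample_size):
    """Return (z, two-sided p-value) for a one-sample Z-test with known sigma."""
    standard_error = population_std_dev / sqrt(sample_size)
    z = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs: sample of 100 with mean 52, tested against mu = 50, sigma = 10.
z, p = one_sample_z_test(sample_mean=52, population_mean=50,
                         population_std_dev=10, sample_size=100)
print(f"Z = {z:.2f}, p-value = {p:.4f}")
```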

In [None]:
# 9. How do you calculate the Z-score, and what does it represent in hypothesis testing?

- The Z-score measures how many standard deviations a data point is from the population mean or, for a sample mean, how many standard errors (σ / √n) it is from the population mean.
- Formula to calculate Z-score for a sample mean:

  Z = (X̄ - μ) / (σ / √n)

  Where:
    X̄ = sample mean
    μ = population mean
    σ = population standard deviation
    n = sample size

- Interpretation:
  - A large absolute Z-score (positive or negative) indicates the sample mean is far from the population mean.
  - In hypothesis testing, the Z-score helps determine how likely the observed sample mean is under the null hypothesis.
  - The Z-score is compared to critical values from the standard normal distribution to decide whether to reject the null hypothesis.
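
As a worked sketch of the critical-value approach, the snippet below computes the Z-score of a sample mean from raw data and compares it with the two-tailed critical value for α = 0.05. The data, μ = 100, and σ = 8 are hypothetical, and with only 16 observations the population is assumed to be normal.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Hypothetical data: 16 measurements; H0 assumes mu = 100 with known sigma = 8.
sample = [104, 98, 110, 101, 95, 107, 103, 99, 112, 100, 97, 106, 105, 102, 108, 101]
mu, sigma = 100, 8

x_bar = fmean(sample)
n = len(sample)
z = (x_bar - mu) / (sigma / sqrt(n))

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed critical value for alpha = 0.05
reject_h0 = abs(z) > z_crit

print(f"sample mean = {x_bar:.1f}, Z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```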

In [None]:
# 10. What is the relationship between Z-test and T-test in hypothesis testing?

- Use Z-test when population variance is known and sample size is large.
- Use T-test when population variance is unknown and is estimated from the sample; this matters most when the sample size is small.

- Both tests are used to determine if there is a significant difference between means.
- The key difference lies in knowledge about population variance and sample size:

1. Z-test:
   - Used when the population variance (or standard deviation) is known.
   - Suitable for large sample sizes (usually n > 30).
   - Uses the standard normal distribution (Z-distribution).

2. T-test:
   - Used when the population variance is unknown.
   - Valid for any sample size, and especially important for small samples (usually n ≤ 30), where the normal approximation is poor.
   - Uses the Student’s t-distribution, which accounts for extra uncertainty from estimating the population variance.

- As sample size increases, the t-distribution approaches the normal distribution, so T-test results become similar to Z-test results.
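
The convergence of the t-distribution to the normal distribution can be checked numerically. This sketch implements the Student's t density from its standard formula (using `lgamma` to avoid overflow at large degrees of freedom) and compares it with the standard normal density in the tail at x = 3. The helper name `t_pdf` and the chosen degrees of freedom are illustrative assumptions.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


x = 3.0  # a point in the tail, where the t-distribution is heavier
normal_density = NormalDist().pdf(x)
for df in (1, 5, 30, 1000):
    gap = t_pdf(x, df) - normal_density
    print(f"df = {df:>4}: t density = {t_pdf(x, df):.5f}, excess over normal = {gap:.5f}")
```

The excess tail density shrinks as the degrees of freedom grow, which is why the two tests give nearly the same answer for large samples.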

In [None]:
# 11. What is a confidence interval, and how is it used to interpret statistical results?

- A confidence interval is a range of values, derived from sample data, that is likely to contain the true population parameter (e.g., mean) with a specified level of confidence.
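- It is usually expressed with a confidence level, such as 95% or 99%. The confidence level is the long-run proportion of such intervals, computed from repeated samples, that would contain the true parameter; it is not the probability that one particular computed interval contains it.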
- Formula for a confidence interval for a population mean (when population standard deviation is known):

  CI = sample_mean ± Z*(σ / √n)

  Where:
    - Z* is the critical value from the standard normal distribution corresponding to the confidence level
    - σ is the population standard deviation
    - n is the sample size

- Interpretation:
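  - If we say a 95% confidence interval for a mean is (50, 60), it means we are 95% confident that the true population mean lies between 50 and 60, in the sense that the method used to build the interval captures the true mean in about 95% of repeated samples.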
  - Confidence intervals provide more information than hypothesis tests because they estimate the range of plausible values for the parameter.
- Use:
  - To estimate population parameters with an indication of uncertainty.
  - To assess the precision of sample estimates.
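
A minimal sketch of the known-σ interval formula above. The function name `z_confidence_interval` and the example values (sample mean 55, σ = 12, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value Z*
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(sample_mean=55, sigma=12, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```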

In [None]:
# 12. What is the margin of error, and how does it affect the confidence interval?

- The margin of error (MOE) is the largest difference we expect between the sample estimate and the true population parameter at a given confidence level.
- It quantifies the amount of uncertainty or potential error in the estimate.
- Formula for margin of error when estimating a population mean (with known population standard deviation):

  MOE = Z* × (σ / √n)

  Where:
    - Z* is the critical value corresponding to the desired confidence level (e.g., 1.96 for 95%)
    - σ is the population standard deviation
    - n is the sample size

- How MOE affects Confidence Interval (CI):
  - The confidence interval is constructed as:

    CI = sample_mean ± MOE

  - A larger margin of error results in a wider confidence interval, indicating less precision.
  - A smaller margin of error results in a narrower confidence interval, indicating more precise estimates.

- Factors affecting MOE:
  - Increasing sample size decreases MOE, leading to more precise estimates.
  - Higher confidence levels increase MOE, resulting in wider intervals.
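
The two factors above can be seen in a small table. This sketch assumes a known σ of 10 (a made-up value) and uses a hypothetical helper `margin_of_error` that applies the MOE formula directly.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """MOE = Z* x (sigma / sqrt(n)) for a mean with known population sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)


sigma = 10  # hypothetical population standard deviation
for n in (25, 100, 400):
    row = ", ".join(
        f"{int(conf * 100)}%: {margin_of_error(sigma, n, conf):.2f}"
        for conf in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:>3} -> {row}")
```

Quadrupling the sample size halves the margin of error, because MOE shrinks with the square root of n.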

In [None]:
# 13. How is Bayes' Theorem used in statistics, and what is its significance?

- Bayes' Theorem is a mathematical formula used to update the probability of a hypothesis based on new evidence or data.
- It expresses how to revise prior beliefs (prior probability) into updated beliefs (posterior probability) after observing new data.
- The formula is:

  P(A|B) = [P(B|A) × P(A)] / P(B)

  Where:
    - P(A|B) is the posterior probability: probability of event A given event B occurred.
    - P(B|A) is the likelihood: probability of event B given event A.
    - P(A) is the prior probability of event A.
    - P(B) is the total probability of event B.

- Significance:
  - Allows incorporation of new information to improve decision-making.
  - Widely used in statistics, machine learning, medical diagnosis, and many real-world applications.
  - Helps in updating probabilities dynamically as more data becomes available.
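
A sketch of the formula applied to a diagnostic-test scenario. All numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are invented for illustration, and P(B) is expanded with the law of total probability. The function name `posterior_probability` is hypothetical.

```python
def posterior_probability(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' Theorem.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# A = "has the condition", B = "test is positive" (hypothetical rates).
after_one_test = posterior_probability(prior=0.01, likelihood=0.95,
                                       false_positive_rate=0.05)
# Updating again: the first posterior becomes the prior for a second positive test,
# assuming the two test results are independent given the true condition.
after_two_tests = posterior_probability(prior=after_one_test, likelihood=0.95,
                                        false_positive_rate=0.05)

print(f"P(condition | one positive test)  = {after_one_test:.3f}")
print(f"P(condition | two positive tests) = {after_two_tests:.3f}")
```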

In [None]:
# 14. What is the Chi-square distribution, and when is it used?

- The Chi-square (χ²) distribution is a continuous probability distribution that arises in statistics mainly when dealing with variance estimates of a normally distributed population.
- It is the distribution of a sum of the squares of independent standard normal random variables.
- The shape of the Chi-square distribution depends on the degrees of freedom (df).

Common uses of Chi-square distribution:
- Chi-square test for independence: To test if two categorical variables are independent.
- Chi-square goodness-of-fit test: To see how well observed data fits an expected distribution.
- Test for population variance: To test hypotheses about the variance of a normally distributed population.

Characteristics:
- Only takes non-negative values (≥ 0).
- Skewed to the right, but becomes more symmetric as degrees of freedom increase.
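
The definition as a sum of squared standard normal variables can be illustrated by simulation. This is a sketch with a hypothetical helper `simulate_chi_square`; the simulated mean should be close to the degrees of freedom, since the mean of a Chi-square distribution equals df.

```python
import random
from statistics import fmean


def simulate_chi_square(df, draws=20000, seed=0):
    """Draw values as sums of df squared independent standard normal variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]


for df in (1, 3, 10):
    values = simulate_chi_square(df)
    print(f"df = {df:>2}: min = {min(values):.4f}, "
          f"simulated mean = {fmean(values):.2f} (theoretical mean = {df})")
```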

In [None]:
# 15. What is the Chi-square goodness of fit test, and how is it applied?

- The Chi-square goodness of fit test is used to determine whether an observed frequency distribution matches an expected distribution.
- It tests if the observed categorical data fits a specific theoretical distribution.
- The test compares observed counts with expected counts under the null hypothesis.

Steps to apply the test:
1. Define the null hypothesis (H0): The observed data follows the expected distribution.
2. Calculate the expected frequencies based on the theoretical distribution.
3. Compute the test statistic:

   χ² = Σ [(Observed - Expected)² / Expected]

4. Determine the degrees of freedom: df = (number of categories - 1), reduced by one more for each parameter estimated from the data.
5. Compare the test statistic to the critical value from the Chi-square distribution or use the p-value.
6. Make a decision:
   - If χ² > critical value or p-value ≤ significance level (α), reject H0.
   - Otherwise, fail to reject H0.

Application examples:
- Testing if a die is fair by comparing observed roll frequencies to expected equal probabilities.
- Checking if genetic traits follow expected Mendelian ratios.
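
The steps above can be sketched for the fair-die example. The observed counts are invented, the function name `chi_square_statistic` is hypothetical, and the critical value 11.070 is the standard table value for df = 5 at α = 0.05 rather than something computed here.

```python
def chi_square_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 9, 12, 11, 10, 10]                    # hypothetical counts from 60 rolls
expected = [sum(observed) / len(observed)] * len(observed)  # fair die: equal counts

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical_value = 11.070  # table value for df = 5, alpha = 0.05

decision = "reject H0" if chi2 > critical_value else "fail to reject H0"
print(f"chi-square = {chi2:.2f}, df = {df}, critical value = {critical_value} -> {decision}")
```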

In [None]:
# 16. What is an ANOVA test, and what are its assumptions?

- ANOVA is a statistical method used to compare the means of three or more groups to see if at least one group mean is significantly different from the others.
- It analyzes the variance within groups and between groups to determine if the observed differences are statistically significant.
- Commonly used when comparing more than two groups, because running multiple pairwise t-tests would inflate the overall Type 1 error rate.

Assumptions of ANOVA:
1. Independence: Observations are independent of each other.
2. Normality: The data in each group should be approximately normally distributed.
3. Homogeneity of variances (Homoscedasticity): The variances across groups are equal.
4. The dependent variable is measured at the interval or ratio level.
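
To show how ANOVA splits variability into between-group and within-group parts, here is a sketch that computes the one-way ANOVA F statistic by hand. The function name `one_way_anova_f` and the three small groups are hypothetical; turning F into a p-value would additionally require the F-distribution, which is not part of the Python standard library.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)

    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical measurements
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```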

In [None]:
# 17. What are the different types of ANOVA tests?

1. One-Way ANOVA:
   - Compares the means of three or more independent groups based on one factor.
   - Example: Testing if different diets affect weight loss.

2. Two-Way ANOVA:
   - Examines the effect of two independent factors on a dependent variable.
   - Can also test for interaction effects between factors.
   - Example: Studying how diet type and exercise routine together affect weight loss.

3. Repeated Measures ANOVA:
   - Used when the same subjects are measured multiple times under different conditions.
   - Accounts for the correlation between repeated measures.
   - Example: Measuring blood pressure of patients before, during, and after treatment.

4. MANOVA (Multivariate ANOVA):
   - Extends ANOVA when there are two or more dependent variables.
   - Tests differences in multiple dependent variables simultaneously.
   - Example: Testing effects of a drug on both blood pressure and cholesterol levels.

In [None]:
# 18. What is the F-test, and how does it relate to hypothesis testing?

- The F-test is used to compare two variances to determine if they come from populations with equal variances.
- It is the basis for tests like ANOVA, which compares the variance between group means to the variance within groups.
- The test statistic follows an F-distribution, which depends on two sets of degrees of freedom (numerator and denominator).

How it works:
- Calculate the ratio of two variances (variance between groups / variance within groups).
- If the ratio is significantly greater than 1, it suggests that at least one group mean differs from the others.

Use cases:
- Testing equality of variances (e.g., the variance-ratio F-test, or Levene’s test, whose statistic is also compared against an F-distribution).
- Testing differences among multiple group means (ANOVA).
- Model comparison in regression analysis.

Relation to hypothesis testing:
- Null hypothesis (H0): Variances (or group means) are equal.
- Alternative hypothesis (H1): At least one variance (or mean) is different.
- Based on the F-statistic and critical value or p-value, decide to reject or fail to reject H0.
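
For the two-variance case, the F statistic is simply the ratio of the sample variances. The sketch below uses a hypothetical helper `variance_ratio_f` and invented samples, and places the larger variance in the numerator so that F ≥ 1. It stops at the statistic and degrees of freedom; the decision step needs an F-distribution table or a statistics library.

```python
from statistics import variance


def variance_ratio_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


sample_a = [10, 12, 14, 16, 18]  # hypothetical, widely spread
sample_b = [13, 14, 14, 14, 15]  # hypothetical, tightly clustered
f_ratio, df_num, df_den = variance_ratio_f(sample_a, sample_b)
print(f"F = {f_ratio:.1f} with ({df_num}, {df_den}) degrees of freedom")
```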