Chi-Squared Goodness of Fit Test

- This test determines if an observed categorical variable follows an expected distribution.
- The null hypothesis states that the observed distribution matches the expected distribution, while the alternative hypothesis suggests a significant difference.

Chi-Squared Test for Independence

- This test assesses whether two categorical variables are associated with each other.
- The null hypothesis posits that the variables are independent, while the alternative hypothesis indicates a relationship between them.

Calculating Chi-Squared Statistic

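- The chi-squared statistic sums, over all categories, the squared difference between the observed and expected counts divided by the expected count: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$.
- The statistic is compared against a chi-squared distribution with the appropriate degrees of freedom to obtain a p-value, which is then compared with the chosen significance level to decide whether to reject the null hypothesis.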
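
The calculation described above can be sketched by hand as follows. This is a minimal illustration, not a replacement for a library routine: the counts are illustrative, and the expected counts assume the null hypothesis of equal counts in every category.

```python
import numpy as np
from scipy import stats

# Illustrative observed counts for seven categories.
observed = np.array([650, 570, 420, 480, 510, 380, 490])

# Assumption: under the null hypothesis every category is equally likely,
# so each expected count is the total divided by the number of categories.
expected = np.full(observed.shape, observed.sum() / observed.size)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# A goodness of fit test has (number of categories - 1) degrees of freedom.
dof = observed.size - 1

# p-value: upper-tail probability of the chi-squared distribution.
p_value = stats.chi2.sf(chi2_stat, dof)

print(f"statistic={chi2_stat:.1f}, dof={dof}, p-value={p_value:.3e}")
```

A small p-value (below the chosen significance level) leads to rejecting the null hypothesis that the observed counts follow the expected distribution.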


---

Two real-life scenarios where chi-squared tests can be applied:

1. Market Research:

- Scenario: A company wants to know if customer preferences for different product colors are equally distributed.
- Application: The company collects data on the number of customers who purchased each color. Using a chi-squared goodness of fit test, they can determine if the observed sales match their expected distribution (e.g., equal sales for each color). If the test shows a significant difference, the company may decide to adjust their inventory based on customer preferences.

2. Healthcare Studies:

- Scenario: Researchers want to investigate if there is an association between smoking status (smoker vs. non-smoker) and the occurrence of a specific health condition (e.g., lung disease).
- Application: They collect data from a sample population and create a contingency table showing the counts of smokers and non-smokers with and without the health condition. A chi-squared test for independence can be performed to see if smoking status is associated with the health condition. A significant result would suggest a relationship that could inform public health strategies.

---

![image-2.png](attachment:image-2.png)

In [1]:
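import scipy.stats as stats
# Observed counts per category and the counts expected under the null hypothesis
observations = [650, 570, 420, 480, 510, 380, 490]
expectations = [500, 500, 500, 500, 500, 500, 500]
# Chi-squared goodness of fit test; the result holds the statistic and the p-value
result = stats.chisquare(f_obs=observations, f_exp=expectations)
result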

Power_divergenceResult(statistic=np.float64(97.6), pvalue=np.float64(7.943886923343835e-19))

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [7]:
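import numpy as np
import scipy.stats as stats
# 2x2 contingency table of observed counts (rows and columns are the two categorical variables)
observations = np.array([[850, 450],[1300, 900]])
# Chi-squared test for independence; correction=False disables Yates' continuity correction
result = stats.contingency.chi2_contingency(observations, correction=False)
print('\t',result)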

	 Chi2ContingencyResult(statistic=np.float64(13.660757846804358), pvalue=np.float64(0.00021898310129108426), dof=1, expected_freq=array([[ 798.57142857,  501.42857143],
       [1351.42857143,  848.57142857]]))
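
To show where the expected frequencies in this output come from, the sketch below recomputes them by hand from the row and column totals of the same 2x2 table. It is an illustration of the standard calculation under the independence assumption, without a continuity correction.

```python
import numpy as np
from scipy import stats

# Same 2x2 table of observed counts as in the cell above.
observed = np.array([[850, 450], [1300, 900]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected = row_totals @ col_totals / grand_total

# Pearson chi-squared statistic without continuity correction.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper-tail probability of the chi-squared distribution.
p_value = stats.chi2.sf(chi2_stat, dof)

print(expected)
print(chi2_stat, dof, p_value)
```

The resulting values should agree with the `expected_freq`, `statistic`, `dof`, and `pvalue` fields reported by `chi2_contingency`.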


----

# ANOVA

Understanding ANOVA

- ANOVA (analysis of variance) is a statistical technique that tests for differences in means among three or more groups, extending the t-test, which compares two groups.
- It relates a categorical variable to a continuous variable, for example comparing the lifespans (continuous) of different butterfly species (categorical).

Types of ANOVA

- One-way ANOVA compares the means of one continuous dependent variable across three or more groups, testing a null hypothesis that states the means are equal.
- Two-way ANOVA compares the means of one continuous dependent variable across groups defined by two categorical variables, and can also test for an interaction between those variables.

Hypothesis Testing

- In both one-way and two-way ANOVA, null and alternative hypotheses are formulated to test the relationships between variables.
- The null hypothesis typically states that there is no difference in means, while the alternative hypothesis suggests that at least one mean is different.
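
A one-way ANOVA of this kind can be sketched with `scipy.stats.f_oneway`. The lifespans below are hypothetical values invented for illustration, and the 0.05 significance level is an assumed choice.

```python
from scipy import stats

# Hypothetical lifespans (in days) for three butterfly species.
species_a = [20, 22, 19, 24, 25]
species_b = [28, 30, 27, 26, 29]
species_c = [18, 20, 22, 19, 21]

# One-way ANOVA: the null hypothesis is that all three group means are equal.
result = stats.f_oneway(species_a, species_b, species_c)
f_stat, p_value = result.statistic, result.pvalue

# Assumed significance level for the decision.
alpha = 0.05
reject_null = p_value < alpha

print(f"F={f_stat:.4f}, p-value={p_value:.3g}, reject null: {reject_null}")
```

Rejecting the null hypothesis only indicates that at least one mean differs; it does not say which one, so a post hoc comparison is usually the next step.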

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

---

Understanding `ANOVA` and `ANCOVA`

- ANOVA helps identify differences in a continuous Y variable across different groupings.
- ANCOVA extends ANOVA by controlling for covariates, allowing for a clearer understanding of the relationship between categorical and continuous variables.

Comparison with Regression Analysis

- Simple linear regression considers one independent variable, while multiple regression incorporates multiple factors.
- ANCOVA resembles multiple regression: it tests the effect of a categorical variable on a continuous outcome while controlling for one or more continuous covariates.

Practical Application of ANCOVA

- ANCOVA can analyze relationships, such as book sales across genres while controlling for publication year.
- Hypothesis testing is essential: the null hypothesis states that, after controlling for the covariate, the group means are equal, and the alternative hypothesis states that at least one group mean differs.
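
One way to see what ANCOVA tests is to write it as a comparison of two nested linear regressions: one with only the covariate, and one that adds the categorical factor. The sketch below does this with NumPy least squares. All data are simulated and hypothetical (three genres, publication year as covariate), and the helper `residual_sum_of_squares` is a name introduced here for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical simulated data: 3 genres with 20 books each.
genres = np.repeat([0, 1, 2], 20)               # categorical factor
pub_year = rng.integers(2000, 2024, size=60)    # covariate
genre_effect = np.array([0.0, 5.0, 10.0])[genres]
sales = 50 + genre_effect + 0.3 * (pub_year - 2000) + rng.normal(0, 2, size=60)


def residual_sum_of_squares(design, response):
    """Fit ordinary least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    residuals = response - design @ coef
    return float(residuals @ residuals)


intercept = np.ones_like(sales)
year_centered = pub_year - pub_year.mean()
# Dummy-code the genres, using genre 0 as the reference level.
genre_dummies = (genres[:, None] == np.array([1, 2])).astype(float)

# Reduced model: covariate only. Full model: covariate plus genre.
reduced = np.column_stack([intercept, year_centered])
full = np.column_stack([intercept, year_centered, genre_dummies])

rss_reduced = residual_sum_of_squares(reduced, sales)
rss_full = residual_sum_of_squares(full, sales)

# F-test for the genre effect after adjusting for publication year.
df_num = genre_dummies.shape[1]          # number of groups - 1
df_den = len(sales) - full.shape[1]      # n - parameters in the full model
f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
p_value = stats.f.sf(f_stat, df_num, df_den)

print(f"F={f_stat:.2f}, df=({df_num}, {df_den}), p-value={p_value:.3g}")
```

This sketch assumes the covariate has the same slope in every group; a dedicated statistics package would normally be used in practice and would also report adjusted group means.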

Overall, these statistical techniques are vital tools for data analytics professionals to draw accurate conclusions from data.

![image.png](attachment:image.png)

---

`MANOVA` and `MANCOVA` Overview

- MANOVA (Multivariate Analysis of Variance) extends ANOVA to compare multiple continuous outcome variables based on categorical independent variables.
- MANCOVA (Multivariate Analysis of Covariance) is similar but controls for additional covariates while analyzing the relationship between outcome variables and categorical variables.

Hypothesis Testing with MANOVA

- In a one-way MANOVA example, the null hypothesis states that the means of continuous variables (e.g., books sold and profits) are equal across different book genres.
- The alternative hypothesis suggests that at least one genre differs in terms of sales or profits.
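
To make the one-way MANOVA hypothesis more concrete, the sketch below computes Wilks' lambda, one common MANOVA test statistic, directly with NumPy. The data are simulated and hypothetical (three genres, two outcomes: books sold and profit). Only the statistic is computed here; turning it into a p-value relies on an F approximation that statistics packages normally handle.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical simulated data: 3 genres, 15 books each,
# two outcome variables per book (books sold, profit).
genre_means = np.array([[100.0, 20.0], [110.0, 22.0], [120.0, 27.0]])
groups = [mean + rng.normal(0, 5, size=(15, 2)) for mean in genre_means]

all_data = np.vstack(groups)
grand_mean = all_data.mean(axis=0)

# Within-group sums of squares and cross-products (SSCP) matrix.
within = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# Between-group SSCP matrix, weighted by group size.
between = sum(
    len(g) * np.outer(g.mean(axis=0) - grand_mean, g.mean(axis=0) - grand_mean)
    for g in groups
)

# Wilks' lambda: det(W) / det(W + B). Values near 0 suggest the group
# mean vectors differ; values near 1 suggest they are similar.
wilks_lambda = np.linalg.det(within) / np.linalg.det(within + between)

print(f"Wilks' lambda = {wilks_lambda:.4f}")
```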

Using MANCOVA for Control

- MANCOVA allows for the examination of the relationship between categorical variables (e.g., book genre) and continuous variables (e.g., sales and profits) while controlling for a covariate (e.g., author popularity).
- The null hypothesis in this case posits that sales and profits are equal across genres, regardless of the author's social media following.

----

MANOVA (Multivariate Analysis of Variance) and MANCOVA (Multivariate Analysis of Covariance) are both statistical methods used to analyze the relationship between multiple dependent variables and one or more independent variables. 

Here are the key differences:

MANOVA

- Purpose: Compares the means of multiple dependent variables across different groups defined by categorical independent variables.
- Control for Covariates: Does not control for any additional variables; it only examines the effect of the independent variables on the dependent variables.
- Example Use Case: Analyzing how different teaching methods (independent variable) affect students' scores in math and science (dependent variables).

MANCOVA

- Purpose: Similar to MANOVA, but it also controls for one or more covariates (continuous variables) that may influence the dependent variables.
- Control for Covariates: Adjusts the analysis for the effects of covariates, allowing for a clearer understanding of the relationship between the independent and dependent variables.
- Example Use Case: Analyzing how different teaching methods affect students' scores in math and science while controlling for students' prior knowledge (covariate).

In summary, the main difference lies in MANCOVA's ability to control for additional covariates, providing a more nuanced analysis of the data.