# ABC of Statistics for Data Science and Machine Learning | (Day-13)

Choosing the appropriate statistical test for hypothesis testing in Python depends on several factors, including the type of data, the distribution of the data, the sample size, and the research question. Below is a guide to help you select the right test for different scenarios, followed by Python examples for some of the most common tests.

### 1. **Determine the Type of Data**

- **Numerical (Continuous):** Data that can take any value within a range (e.g., height, weight).
- **Categorical:** Data that can be divided into distinct categories (e.g., gender, race).

### 2. **Identify the Number of Groups or Variables**

- **One Group:** Testing against a known value or distribution.
- **Two Groups:** Comparing two independent or related samples.
- **More than Two Groups:** Comparing multiple independent or related samples.

### 3. **Consider the Data Distribution**

- **Parametric Tests:** Assume that the data follows a specific distribution, usually normal.
- **Non-Parametric Tests:** Do not assume a specific distribution.
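
One common way to decide between the two families is to test the sample for normality first. The snippet below is a minimal sketch of that check using `stats.shapiro`; the helper name `looks_normal`, the simulated samples, and the significance level of 0.05 are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data (illustrative only): one roughly normal sample, one skewed sample
rng = np.random.default_rng(0)  # fixed seed for reproducibility
normal_sample = rng.normal(loc=50, scale=5, size=200)
skewed_sample = rng.exponential(scale=5, size=200)


def looks_normal(sample, alpha=0.05):
    """Hypothetical helper: True when Shapiro-Wilk does not reject normality."""
    statistic, p_value = stats.shapiro(sample)
    # H0 of Shapiro-Wilk: the sample was drawn from a normal distribution
    return bool(p_value > alpha)


print("normal sample looks normal:", looks_normal(normal_sample))
print("skewed sample looks normal:", looks_normal(skewed_sample))
```

A non-significant result does not prove normality; it only means the test found no strong evidence against it, so a histogram or Q-Q plot is a useful complement.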

### 4. **Check Sample Size**

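- **Small Sample Size:** Normality is hard to verify with few observations, so non-parametric tests are often the safer choice.
- **Large Sample Size:** Parametric tests can often be used, because sample means tend toward a normal distribution as the sample grows, even when the raw data is not normal.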

### 5. **Common Hypothesis Tests and When to Use Them**

| **Scenario**                                    | **Test**                                  | **Parametric/Non-Parametric** | **Python Implementation**                                      |
|-------------------------------------------------|-------------------------------------------|-------------------------------|----------------------------------------------------------------|
| **1. Testing if a sample mean is equal to a known value** | One-sample t-test                         | Parametric                    | `stats.ttest_1samp(data, popmean)`                             |
| **2. Testing if a sample median is equal to a known value** | One-sample Wilcoxon signed-rank test      | Non-Parametric                | `stats.wilcoxon(np.asarray(data) - popmedian)`                 |
| **3. Comparing means of two independent groups** | Two-sample t-test (independent)           | Parametric                    | `stats.ttest_ind(data1, data2)`                                |
| **4. Comparing medians of two independent groups** | Mann-Whitney U test                       | Non-Parametric                | `stats.mannwhitneyu(data1, data2)`                             |
| **5. Comparing means of two related groups**    | Paired t-test                             | Parametric                    | `stats.ttest_rel(data1, data2)`                                |
| **6. Comparing medians of two related groups**  | Wilcoxon signed-rank test                 | Non-Parametric                | `stats.wilcoxon(data1, data2)`                                 |
| **7. Comparing means of more than two independent groups** | ANOVA (Analysis of Variance)              | Parametric                    | `stats.f_oneway(data1, data2, ...)`                            |
| **8. Comparing medians of more than two independent groups** | Kruskal-Wallis test                       | Non-Parametric                | `stats.kruskal(data1, data2, ...)`                             |
| **9. Testing correlation between two variables** | Pearson correlation                       | Parametric                    | `stats.pearsonr(x, y)`                                         |
| **10. Testing rank correlation between two variables** | Spearman correlation                      | Non-Parametric                | `stats.spearmanr(x, y)`                                        |
| **11. Testing if a sample follows a normal distribution** | Shapiro-Wilk test (Normality)             | N/A (assumption check)        | `stats.shapiro(data)`                                          |
| **12. Testing if two samples follow the same distribution** | Two-sample Kolmogorov-Smirnov test        | Non-Parametric                | `stats.ks_2samp(data1, data2)`                                 |
| **13. Testing association between categorical variables** | Chi-square test                           | Non-Parametric                | `stats.chi2_contingency(table)`                                |
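
To make a few of the rows above concrete, here is a short sketch that runs the paired t-test (row 5), one-way ANOVA (row 7), and Spearman correlation (row 10). All of the numbers are made-up illustrative values, not real measurements.

```python
from scipy import stats

# Row 5: paired t-test on the same subjects measured twice (illustrative values)
before = [72, 75, 78, 71, 74, 77, 73, 76]
after = [70, 72, 75, 70, 71, 74, 72, 73]
t_stat, p_paired = stats.ttest_rel(before, after)

# Row 7: one-way ANOVA across three independent groups (illustrative values)
group_a = [20, 21, 19, 22, 20]
group_b = [25, 26, 24, 27, 25]
group_c = [30, 31, 29, 32, 30]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Row 10: Spearman rank correlation for a monotonic but non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho, p_spearman = stats.spearmanr(x, y)

print(f"paired t-test: t = {t_stat:.2f}, p = {p_paired:.4f}")
print(f"ANOVA:         F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"Spearman:      rho = {rho:.2f}, p = {p_spearman:.4f}")
```

Every function returns a test statistic together with a p-value, so the calling pattern stays the same across the rest of the table.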

### 6. **Step-by-Step Guide for Choosing the Test**

1. **Identify Your Research Question:** What are you trying to prove or compare? Are you comparing means, medians, or testing for an association?
   
2. **Understand the Type of Data:** 
   - Is your data continuous or categorical?
   - How many groups or variables are you comparing?

3. **Check Data Distribution:**
   - If the data is approximately normally distributed (for example, a Shapiro-Wilk test does not reject normality), use parametric tests.
   - If the data is clearly not normally distributed, use non-parametric tests.

4. **Sample Size Considerations:**
   - For small samples, prefer non-parametric tests unless normality is well supported.
   - For larger samples, parametric tests are generally more powerful, provided their assumptions hold.

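5. **Run the Appropriate Test in Python:**
   - Import `scipy.stats`, which provides every test listed in the table in section 5, or `statsmodels` for more specialized procedures.
   - Call the matching function and compare the returned p-value with your chosen significance level (commonly 0.05).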
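
The steps above can be sketched as a small decision helper for the two-independent-groups case. This is a simplified illustration rather than a definitive procedure: the function name `compare_two_independent_groups` is hypothetical, the 0.05 threshold is an assumption, and using Welch's variant of the t-test (`equal_var=False`) is a choice made here, not something the guide prescribes.

```python
from scipy import stats


def compare_two_independent_groups(group1, group2, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from a normality check."""
    # Step 3: check each group for normality with Shapiro-Wilk
    _, p_norm1 = stats.shapiro(group1)
    _, p_norm2 = stats.shapiro(group2)

    if p_norm1 > alpha and p_norm2 > alpha:
        # Normality not rejected: parametric test (Welch's t-test, an assumption)
        result = stats.ttest_ind(group1, group2, equal_var=False)
        test_name = "Welch t-test"
    else:
        # Normality rejected for at least one group: non-parametric test
        result = stats.mannwhitneyu(group1, group2, alternative="two-sided")
        test_name = "Mann-Whitney U"

    return test_name, result.statistic, result.pvalue


data1 = [2.3, 2.5, 2.8, 2.1, 2.7]
data2 = [3.1, 3.3, 3.2, 3.0, 3.4]
test_name, statistic, p_value = compare_two_independent_groups(data1, data2)
print(f"{test_name}: statistic = {statistic:.3f}, p = {p_value:.4f}")
```

Keep in mind that normality tests have little power on very small samples, so an automated rule like this should support, not replace, a look at the data itself.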

### Example Implementations in Python

1. **One-Sample t-test:**

   ```python
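   import scipy.stats as stats

   # Sample measurements and the hypothesized population mean
   data = [2.3, 2.5, 2.8, 2.1, 2.7]
   popmean = 2.5

   # H0: the sample mean equals popmean
   t_statistic, p_value = stats.ttest_1samp(data, popmean)
   print(f"t = {t_statistic:.3f}, p = {p_value:.3f}")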
   ```

2. **Mann-Whitney U Test:**

   ```python
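   import scipy.stats as stats

   # Two independent samples
   data1 = [2.3, 2.5, 2.8, 2.1, 2.7]
   data2 = [3.1, 3.3, 3.2, 3.0, 3.4]

   # H0: both samples come from the same distribution (two-sided alternative)
   u_statistic, p_value = stats.mannwhitneyu(data1, data2, alternative="two-sided")
   print(f"U = {u_statistic:.1f}, p = {p_value:.4f}")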
   ```

3. **Chi-Square Test:**

   ```python
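   import scipy.stats as stats

   # Contingency table of observed counts: rows are groups, columns are categories
   table = [[10, 20, 30], [6, 9, 17]]

   # H0: the row and column variables are independent
   chi2, p, dof, expected = stats.chi2_contingency(table)
   print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")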
   ```
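
Each example returns a p-value, and the last step is to turn it into a decision. The sketch below does this for the chi-square table used above, assuming a significance level of 0.05; the variable names `alpha` and `decision` are introduced here for illustration.

```python
from scipy import stats

alpha = 0.05  # assumed significance level

# Same contingency table as in the chi-square example
table = [[10, 20, 30], [6, 9, 17]]
chi2, p, dof, expected = stats.chi2_contingency(table)

# Reject H0 (independence) only when the p-value falls below alpha
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f} -> {decision}")
```

For this table the p-value is well above 0.05, so the data gives no evidence of an association between the two categorical variables.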

By following these steps and using the appropriate tests in Python, you can conduct hypothesis testing accurately and draw meaningful conclusions from your data.