# Lecture - 06

- ANOVA Test
- Covariance
- Pearson Correlation Coefficient
- Spearman Rank Correlation Coefficient

## ANOVA (Analysis of Variance) - F Test
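ANOVA (Analysis of Variance) is a statistical technique used to determine whether there are significant differences between the means of three or more groups. It tests the null hypothesis that all group means are equal by comparing the variation between the group means with the variation within the groups. The ratio of the two is the F-statistic, which gives the test its name. There are different types of ANOVA depending on the experimental design and the number of factors involved.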
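The F in the heading refers to the test statistic. In its one-way form, using notation introduced here for illustration (k groups, n_i observations in group i, N observations in total, group means \bar{x}_i and grand mean \bar{x}), the statistic is a ratio of two mean squares:

$$
SS_B = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2,
\qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2
$$

$$
F = \frac{MS_B}{MS_W} = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
$$

A large F means the group means are spread out relative to the noise within groups. Under the null hypothesis and the usual ANOVA assumptions, F follows an F distribution with k - 1 and N - k degrees of freedom, which is where the p-value comes from.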

### One-Way ANOVA

**One-Way ANOVA** is used when you have one independent variable (factor) with multiple levels (groups) and you want to test if there are significant differences in the means of these groups.

#### Real-World Scenario for One-Way ANOVA

**Scenario:**
A company wants to evaluate the effectiveness of three different training programs on employee performance. Each employee attends one of the programs, and their performance score is measured after they complete it.

**Groups:**
- **Training Program A**
- **Training Program B**
- **Training Program C**

**Objective:**
Determine if there is a significant difference in performance scores among the three training programs.

```python
import numpy as np
import scipy.stats as stats

# Data: Performance scores for each training program
scores_A = np.array([85, 88, 90, 92, 87])
scores_B = np.array([78, 80, 82, 81, 79])
scores_C = np.array([91, 93, 95, 94, 96])

# Perform One-Way ANOVA
f_statistic, p_value = stats.f_oneway(scores_A, scores_B, scores_C)

print("F-Statistic:", f_statistic)
print("P-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in performance scores among the training programs.")
else:
    print("There is no significant difference in performance scores among the training programs.")
```

```text
F-Statistic: 53.73333333333322
P-Value: 1.0270864474090559e-06
There is a significant difference in performance scores among the training programs.
```
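The F-statistic reported above can be reproduced from the between-group and within-group sums of squares. This is a minimal sketch using NumPy, with `scipy.stats.f.sf` (the right tail of the F distribution) for the p-value; names such as `ss_between` and `ms_within` are illustrative, not part of any library.

```python
import numpy as np
from scipy import stats

# Same performance scores as the one-way example
groups = [
    np.array([85, 88, 90, 92, 87]),  # Training Program A
    np.array([78, 80, 82, 81, 79]),  # Training Program B
    np.array([91, 93, 95, 94, 96]),  # Training Program C
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # mean square between, df = k - 1
ms_within = ss_within / (n_total - k)    # mean square within, df = N - k
f_manual = ms_between / ms_within

# p-value = probability of an F at least this large under the null hypothesis
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print("Manual F:", f_manual, "p:", p_manual)
print("SciPy  F:", f_scipy, "p:", p_scipy)
```

Here the group means (88.4, 80.0, 93.8) sit far apart compared with the small spread inside each group, which is why F is large.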


### Two-Way ANOVA

**Two-Way ANOVA** is used when you have two independent variables (factors) and you want to understand how each factor and their interaction affect the dependent variable.

#### Real-World Scenario for Two-Way ANOVA

**Scenario:**
A researcher wants to study the impact of two factors on the yield of a crop. The two factors are:
- **Fertilizer Type:** Organic vs. Chemical
- **Watering Frequency:** Daily vs. Weekly

**Objective:**
Determine if there is a significant effect of fertilizer type, watering frequency, and their interaction on the crop yield.


```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Data: Crop yield for each combination of fertilizer type and watering frequency
data = pd.DataFrame({
    'Yield': [20, 22, 21, 23, 19, 21, 20, 22, 21, 23],
    'Fertilizer': ['Organic', 'Organic', 'Organic', 'Organic', 'Organic', 
                   'Chemical', 'Chemical', 'Chemical', 'Chemical', 'Chemical'],
    'Watering': ['Daily', 'Weekly', 'Daily', 'Weekly', 'Daily', 
                 'Daily', 'Weekly', 'Daily', 'Weekly', 'Weekly']
})

# Fit the model
model = ols('Yield ~ C(Fertilizer) * C(Watering)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

# Interpretation
if anova_table.loc['C(Fertilizer)', 'PR(>F)'] < 0.05:
    print("There is a significant effect of fertilizer type on crop yield.")
else:
    print("There is no significant effect of fertilizer type on crop yield.")

if anova_table.loc['C(Watering)', 'PR(>F)'] < 0.05:
    print("There is a significant effect of watering frequency on crop yield.")
else:
    print("There is no significant effect of watering frequency on crop yield.")

if anova_table.loc['C(Fertilizer):C(Watering)', 'PR(>F)'] < 0.05:
    print("There is a significant interaction effect between fertilizer type and watering frequency on crop yield.")
else:
    print("There is no significant interaction effect between fertilizer type and watering frequency on crop yield.")
```

```text
                             sum_sq   df         F    PR(>F)
C(Fertilizer)              0.066667  1.0  0.052174  0.826909
C(Watering)                3.266667  1.0  2.556522  0.160956
C(Fertilizer):C(Watering)  4.266667  1.0  3.339130  0.117422
Residual                   7.666667  6.0       NaN       NaN
There is no significant effect of fertilizer type on crop yield.
There is no significant effect of watering frequency on crop yield.
There is no significant interaction effect between fertilizer type and watering frequency on crop yield.
```
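Looking at the cell means alongside the ANOVA table helps with reading the interaction term: if the effect of watering differs between fertilizer types, the rows of the table below change in different directions. This is a sketch using pandas `pivot_table` and `crosstab` on the same crop-yield data.

```python
import pandas as pd

# Same crop-yield data as the two-way example
data = pd.DataFrame({
    'Yield': [20, 22, 21, 23, 19, 21, 20, 22, 21, 23],
    'Fertilizer': ['Organic'] * 5 + ['Chemical'] * 5,
    'Watering': ['Daily', 'Weekly', 'Daily', 'Weekly', 'Daily',
                 'Daily', 'Weekly', 'Daily', 'Weekly', 'Weekly'],
})

# Mean yield for each Fertilizer x Watering combination
cell_means = data.pivot_table(index='Fertilizer', columns='Watering',
                              values='Yield', aggfunc='mean')

# Number of plots in each combination
cell_counts = pd.crosstab(data['Fertilizer'], data['Watering'])

print(cell_means)
print(cell_counts)
```

In this data, weekly watering raises the organic mean from 20.0 to 22.5 but leaves the chemical mean roughly flat (21.5 vs. about 21.33). That pattern hints at an interaction, yet with only ten observations it does not reach significance (p ≈ 0.117). The counts also show the design is unbalanced (3 and 2 plots per cell), which is a case where the sums-of-squares type chosen via `typ` in `anova_lm` can change the results.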


### Explanation:

1. **One-Way ANOVA:**
   - **Define Data:** Performance scores from each training program.
   - **Perform ANOVA:** Use `stats.f_oneway()` to test for differences between the means of the three groups.
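   - **Interpret Results:** Check the P-value to determine if there are significant differences between the groups. A significant result means that at least one group mean differs from the others; it does not say which groups differ.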

2. **Two-Way ANOVA:**
   - **Define Data:** Create a DataFrame with crop yield data, fertilizer types, and watering frequencies.
   - **Fit Model:** Use `statsmodels` to fit the ANOVA model with interaction.
   - **Perform ANOVA:** Generate an ANOVA table to see the effects of each factor and their interaction.
   - **Interpret Results:** Check the P-values for the main effects and interaction to determine if they are significant.

### Summary

- **One-Way ANOVA:** Tests differences between the means of three or more independent groups based on one factor.
- **Two-Way ANOVA:** Tests the effects of two categorical factors, and of their interaction, on a single dependent variable.

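Both ANOVA tests are widely used for analyzing experimental data and understanding how different factors affect outcomes. Their p-values rely on independent observations, approximately normally distributed residuals, and similar variances across groups.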

## Covariance
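Covariance measures how two variables vary together. A positive covariance means that the variables tend to increase together, while a negative covariance means that one variable tends to increase as the other decreases. Its magnitude depends on the units of the variables, so it shows the direction of the relationship but not its strength.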

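```python
import numpy as np

# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Calculate covariance matrix; ddof=0 divides by n (population covariance),
# whereas NumPy's default divides by n - 1 (sample covariance)
cov_matrix = np.cov(x, y, ddof=0)
# Off-diagonal entry is cov(x, y); the diagonal holds the variances of x and y
cov_xy = cov_matrix[0, 1]

print("Covariance:", cov_xy)
```

```text
Covariance: -2.0
```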
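Covariance is the average product of paired deviations from the two means. Computing it by hand on the same data shows where -2.0 comes from, why the `ddof` argument matters, and why the value depends on units. This is an illustrative sketch; the rescaling factor of 10 is arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Products of paired deviations from each mean: [-4, -1, 0, -1, -4]
dev_products = (x - x.mean()) * (y - y.mean())

cov_population = dev_products.sum() / len(x)        # divide by n, same as ddof=0
cov_sample = dev_products.sum() / (len(x) - 1)      # divide by n - 1, NumPy's default

print("Population covariance:", cov_population)     # -2.0
print("Sample covariance:", cov_sample)             # -2.5
print("np.cov default:", np.cov(x, y)[0, 1])        # -2.5

# Covariance depends on units: multiplying x by 10 multiplies the covariance by 10
cov_rescaled = np.cov(10 * x, y, ddof=0)[0, 1]
print("Covariance with x rescaled:", cov_rescaled)  # -20.0
```

Because rescaling a variable changes the covariance, its size alone cannot say how strong the relationship is, which motivates normalizing it into a correlation coefficient.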


## Pearson Correlation Coefficient
The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. It is the covariance divided by the product of the two standard deviations, so it has no units. It ranges from -1 to 1, where 1 means a perfect positive linear relationship, -1 means a perfect negative linear relationship, and 0 means no linear relationship.

```python
import numpy as np

# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Calculate Pearson correlation coefficient
pearson_corr = np.corrcoef(x, y)[0, 1]

print("Pearson Correlation Coefficient:", pearson_corr)
```

```text
Pearson Correlation Coefficient: -0.9999999999999999
```
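The printed -0.9999999999999999 is -1 up to floating-point rounding, as expected for points on a straight descending line. A minimal sketch of the definition, r = cov(x, y) / (σx · σy), on the same data, plus a check that rescaling a variable by a positive constant leaves r unchanged (the factor 10 is arbitrary):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# The n vs n - 1 choice cancels as long as covariance and
# standard deviations use the same ddof
cov_xy = np.cov(x, y, ddof=0)[0, 1]
r_manual = cov_xy / (x.std(ddof=0) * y.std(ddof=0))

# Unlike covariance, r is unchanged when x is multiplied by a positive constant
r_rescaled = np.corrcoef(10 * x, y)[0, 1]

print("Manual r:", r_manual)
print("np.corrcoef r:", np.corrcoef(x, y)[0, 1])
print("r with x rescaled by 10:", r_rescaled)
```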


## Spearman Rank Correlation Coefficient
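The Spearman rank correlation coefficient measures the strength and direction of the monotonic association between two variables by correlating their ranks instead of their raw values. It is a non-parametric measure, meaning it does not assume a linear relationship or normally distributed data; it equals 1 or -1 whenever one variable always increases (or always decreases) as the other increases, whether or not the relationship is linear.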

```python
import numpy as np
from scipy.stats import spearmanr

# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Calculate Spearman rank correlation coefficient (the p-value is discarded)
spearman_corr, _ = spearmanr(x, y)

print("Spearman Rank Correlation Coefficient:", spearman_corr)
```

```text
Spearman Rank Correlation Coefficient: -0.9999999999999999
```
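With perfectly linear data, Pearson and Spearman agree, so the example above does not show how they differ. The sketch below uses a made-up monotonic but non-linear dataset (y = x³, an illustrative assumption) and computes Spearman as the Pearson correlation of the ranks, using `scipy.stats.rankdata` (which assigns average ranks to ties).

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Monotonic but non-linear: y always increases with x, but not along a straight line
x = np.array([1, 2, 3, 4, 5])
y = x ** 3  # [1, 8, 27, 64, 125]

# Spearman's coefficient is Pearson's r computed on the ranks
rho_manual = np.corrcoef(rankdata(x), rankdata(y))[0, 1]
rho_scipy, _ = spearmanr(x, y)

# Pearson on the raw values stays below 1 because the points do not lie on a line
r_raw, _ = pearsonr(x, y)

print("Spearman (manual, from ranks):", rho_manual)
print("Spearman (scipy):", rho_scipy)
print("Pearson (raw values):", r_raw)
```

Spearman reports a perfect monotonic association (about 1.0), while Pearson comes out around 0.94, reflecting the curvature that a linear measure penalizes.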


#### Prepared By,
Ahamed Basith