## Main Task

> We want to investigate if there is a significant association between the type of outcome (Win, Draw, Loss) and whether a match is played at home or away for Arsenal FC.

We collect data from 38 matches and categorize the outcomes as follows:

|          | Win | Draw | Loss | Total |
|----------|-----|------|------|-------|
| Home     |  15 |    2 |    2 |    19 |
| Away     |  13 |    3 |    3 |    19 |
| **Total**|  28 |    5 |    5 |    38 |

## Why Chi-Square Test For Independence?

* **Data Type:** We have categorical data (match outcomes and match location).
* **Objective:** We want to determine if there is a significant association between two categorical variables (outcome type and match location).
* **Frequency Counts:** The data consists of frequency counts for different categories.

## Step By Step Solution
#### State the Hypotheses:
* **Null Hypothesis ($H_0$)**: The type of outcome (Win, Draw, Loss) is independent of the match location (Home, Away).
* **Alternative Hypothesis ($H_1$)**: The type of outcome (Win, Draw, Loss) is dependent on the match location (Home, Away).

#### Observed Frequencies:
|          | Win | Draw | Loss | Total |
|----------|-----|------|------|-------|
| Home     |  15 |    2 |    2 |    19 |
| Away     |  13 |    3 |    3 |    19 |
| **Total**|  28 |    5 |    5 |    38 |

#### Expected Frequencies:
To calculate the expected frequencies, we use the formula: $$ E_{ij} = \frac{(\text{Row Total}_{i}) × (\text{Column Total}_{j})}{\text{Grand Total}} $$
#### Calculating each expected frequency:
$$E_{home,win} = \frac{19 × 28}{38} = 14$$
$$E_{home,draw} = \frac{19 × 5}{38} = 2.5$$
$$E_{home,loss} = \frac{19 × 5}{38} = 2.5$$
$$E_{away,win} = \frac{19 × 28}{38} = 14$$
$$E_{away,draw} = \frac{19 × 5}{38} = 2.5$$
$$E_{away,loss} = \frac{19 × 5}{38} = 2.5$$

The expected frequency table is:
|          | Win | Draw | Loss | Total |
|----------|-----|------|------|-------|
| Home     |  14 |  2.5 |  2.5 |    19 |
| Away     |  14 |  2.5 |  2.5 |    19 |
| **Total**|  28 |    5 |    5 |    38 |
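
The expected-frequency formula can be applied to every cell at once with an outer product of the row and column totals. The following is a minimal sketch using NumPy; the variable names (`row_totals`, `col_totals`, `grand_total`) are illustrative choices, not part of the original solution.

```python
import numpy as np

# Observed counts (rows: Home, Away; columns: Win, Draw, Loss)
observed = np.array([[15, 2, 2],
                     [13, 3, 3]])

row_totals = observed.sum(axis=1)   # one total per row (Home, Away)
col_totals = observed.sum(axis=0)   # one total per column (Win, Draw, Loss)
grand_total = observed.sum()        # total number of matches

# E_ij = (row total i * column total j) / grand total, for all cells at once
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```

The resulting array should agree with the expected frequency table above.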

#### Calculating the Chi-Square Statistic
The Chi-Square statistic is calculated using the formula: $$ X^2 = \sum\frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$
##### Plugging in the observed and expected frequencies: 
$$ X^2 = \frac{(15 - 14)^2}{14} + \frac{(2 - 2.5)^2}{2.5} + \frac{(2 - 2.5)^2}{2.5} + \frac{(13 - 14)^2}{14} + \frac{(3 - 2.5)^2}{2.5} + \frac{(3 - 2.5)^2}{2.5} = \frac{1}{14} + 0.1 + 0.1 + \frac{1}{14} + 0.1 + 0.1 \approx 0.5428$$
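
As a check on the arithmetic, the same sum can be computed cell by cell. This is a small sketch assuming the observed and expected tables shown earlier; `contributions` is an illustrative name for the per-cell terms $(O_{ij} - E_{ij})^2 / E_{ij}$.

```python
import numpy as np

observed = np.array([[15, 2, 2],
                     [13, 3, 3]], dtype=float)
expected = np.array([[14, 2.5, 2.5],
                     [14, 2.5, 2.5]])

# Per-cell contribution to the statistic: (O - E)^2 / E
contributions = (observed - expected) ** 2 / expected

# The Chi-Square statistic is the sum over all six cells
chi_square = contributions.sum()
print(contributions)
print(chi_square)
```

Printing the per-cell contributions makes it easy to see which cells drive the statistic; here every cell contributes very little.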

#### Degrees of Freedom and Critical Value
The degrees of freedom for the Chi-Square Test for Independence are given by: $$ df = (r - 1) × (c - 1) $$
where: 
* $r$ is the number of rows (match locations, excluding the totals row).
* $c$ is the number of columns (outcome types, excluding the totals column).

In our case: $$ df = (2 - 1) × (3 - 1) = 1 × 2 = 2 $$

Using a significance level ($\alpha$) of $0.05$, the critical value from the Chi-Square distribution table for 2 degrees of freedom is approximately $5.991$.
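
Instead of reading the critical value from a printed table, it can be obtained from the inverse CDF of the Chi-Square distribution. The sketch below assumes SciPy is available and uses `scipy.stats.chi2.ppf`; the variable names are illustrative.

```python
from scipy.stats import chi2

alpha = 0.05          # significance level
rows, cols = 2, 3     # Home/Away by Win/Draw/Loss

# Degrees of freedom: (r - 1) * (c - 1)
df = (rows - 1) * (cols - 1)

# Critical value: the point with probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, df)
print(df, critical_value)
```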

#### Decision
We compare the calculated Chi-Square statistic $X^2 = 0.5428$ with the critical value $5.991$.

Since $0.5428 < 5.991$, we **fail to reject** the null hypothesis.  
There is not enough evidence to suggest that the type of outcome is dependent on the match location.
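
The same decision can be reached by comparing a p-value with $\alpha$ rather than comparing the statistic with the critical value. The sketch below assumes SciPy and uses the survival function `scipy.stats.chi2.sf`; for 2 degrees of freedom the upper-tail probability also has the closed form $e^{-X^2/2}$, which is included only as a cross-check.

```python
import math
from scipy.stats import chi2

chi_square = 0.5428571428571428  # statistic from the observed and expected tables
df = 2
alpha = 0.05

# p-value: probability of a statistic at least this large if H0 is true
p_value = chi2.sf(chi_square, df)

# Cross-check that is valid only for df = 2
closed_form = math.exp(-chi_square / 2)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(p_value, closed_form, decision)
```

Both approaches are equivalent: the p-value is below $\alpha$ exactly when the statistic exceeds the critical value.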

## Python Implementation


In [1]:
import numpy as np
import scipy.stats as stats

# Observed frequencies
observed = np.array([[15, 2, 2],
                     [13, 3, 3]])

# Perform the Chi-Square test
chi2, p, dof, expected = stats.chi2_contingency(observed)

# Output the results
print("Chi-Square Statistic:", chi2)
print("p-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

# Interpretation
alpha = 0.05
if p < alpha:
    print("Reject the null hypothesis (H₀). There is a significant association between match location and match outcome.")
else:
    print("Fail to reject the null hypothesis (H₀). There is no significant association between match location and match outcome.")


Chi-Square Statistic: 0.5428571428571428
p-value: 0.7622897307899537
Degrees of Freedom: 2
Expected Frequencies:
 [[14.   2.5  2.5]
 [14.   2.5  2.5]]
Fail to reject the null hypothesis (H₀). There is no significant association between match location and match outcome.
