## Main Task

>  We are interested in testing whether the average number of goals scored by Arsenal FC in home matches is significantly different from the average number of goals scored in away matches.  

We have match data for Arsenal FC from the 2023-24 season. The goals scored in 23 home matches and 23 away matches are as follows:  

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]  
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

* **Sample mean (home_goals) $\bar X_1$**: 2.78 goals per home game.
* **Sample mean (away_goals) $\bar X_2$**: 1.87 goals per away game.
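
The two sample means can be reproduced directly from the lists above. The following is a minimal sketch using only Python's standard `statistics` module; the variable names simply mirror the data lists.

```python
from statistics import mean

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

# Sample mean = total goals / number of matches in the sample
home_mean = mean(home_goals)  # 64 goals over 23 matches
away_mean = mean(away_goals)  # 43 goals over 23 matches

print(f"n_home = {len(home_goals)}, mean = {home_mean:.4f}")
print(f"n_away = {len(away_goals)}, mean = {away_mean:.4f}")
```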

## Why is a T-Test acceptable here?

* We are comparing the means of two independent groups: goals scored in home matches vs. goals scored in away matches.  
* The sample sizes are relatively small (23 per group, fewer than 30), and we do not know the population standard deviations.  
* The T-test determines whether there is a significant difference between the means of two groups, assuming the data in each group are approximately normally distributed.

## Step By Step Solution

#### State the Hypotheses:  
* **Null Hypothesis (H<sub>0</sub>)**: μ<sub>Home</sub> = μ<sub>Away</sub>  
(The average number of goals scored in home matches is equal to the average number of goals scored in away matches)

* **Alternative Hypothesis (H<sub>1</sub>)**: μ<sub>Home</sub> $\neq$ μ<sub>Away</sub>  
(The average number of goals scored in home matches is different from the average number of goals scored in away matches)  

> This is a two-tailed test because we are checking for any difference, not just an increase or decrease.

#### Choose the significance level ($\alpha$)
Common significance levels are 0.05, 0.01, and 0.10. We'll use α = 0.05, the most common convention, which caps the probability of a Type I error (rejecting a true null hypothesis) at 5%.

#### Calculate the Test statistic:
The test statistic for a two-sample T-test (in Welch's form, which does not pool the two sample variances) is calculated as:

$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

where:
- $\bar{X}_1$ and $\bar{X}_2$ are the sample means of the two groups.
- $s_1$ and $s_2$ are the sample standard deviations of the two groups.
- $n_1$ and $n_2$ are the sample sizes of the two groups. 

The sample standard deviation of home goals $(s_1)$ is calculated as:

$$
s_1 = \sqrt{\frac{\sum (x_i - \bar{X}_1)^2}{n_1 - 1}}
$$

where:
- $x_i$ are the individual sample points.
- $\bar{X}_1$ is the sample mean of home goals.
- $n_1$ is the sample size of home goals.

##### Calculation of $s_1$ (Sample Standard Deviation for Home Goals)
$$
s_1 = \sqrt{\frac{(2-2.7826)^2 + (2-2.7826)^2 + \cdots + (2-2.7826)^2}{23 - 1}} \approx 1.4128
$$

##### Calculation of $s_2$ (Sample Standard Deviation for Away Goals)
The sample standard deviation of away goals $(s_2)$ is calculated as:

$$
s_2 = \sqrt{\frac{\sum (x_i - \bar{X}_2)^2}{n_2 - 1}}
$$

where:
- $x_i$ are the individual sample points.
- $\bar{X}_2$ is the sample mean of away goals.
- $n_2$ is the sample size of away goals.

$$
s_2 = \sqrt{\frac{(1-1.8696)^2 + (1-1.8696)^2 + \cdots + (1-1.8696)^2}{23 - 1}} \approx 1.8167
$$
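
The elided sums in the two standard deviation formulas are tedious to evaluate by hand, so a short check is useful. The sketch below implements the $n - 1$ formula directly (the helper name `sample_std` is only illustrative) and cross-checks it against `statistics.stdev`; the results should agree with the Std Dev values printed by the Python implementation at the end of this notebook.

```python
import math
from statistics import stdev

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

def sample_std(xs):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    x_bar = sum(xs) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

s1 = sample_std(home_goals)
s2 = sample_std(away_goals)

# Cross-check the hand-rolled formula against the standard library
assert math.isclose(s1, stdev(home_goals))
assert math.isclose(s2, stdev(away_goals))

print(f"s1 (home) = {s1:.4f}")
print(f"s2 (away) = {s2:.4f}")
```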

##### Calculation of the Test statistic t
The test statistic $t$ is calculated as:

$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

Using the given sample means, sample standard deviations, and sample sizes:

$$
t = \frac{2.7826 - 1.8696}{\sqrt{\frac{1.4128^2}{23} + \frac{1.8167^2}{23}}} = \frac{0.9130}{\sqrt{\frac{1.9960}{23} + \frac{3.3004}{23}}} = \frac{0.9130}{\sqrt{0.08678 + 0.14350}} = \frac{0.9130}{\sqrt{0.23028}} \approx \frac{0.9130}{0.47987} \approx 1.903
$$
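
To avoid rounding drift in the intermediate terms, the same statistic can be evaluated at full precision. This is a minimal sketch using only the standard library; `welch_t` is an illustrative helper name, and `statistics.variance` returns the sample variance with the $n - 1$ denominator.

```python
import math
from statistics import mean, variance

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

def welch_t(sample_1, sample_2):
    """t = (mean_1 - mean_2) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    standard_error = math.sqrt(
        variance(sample_1) / len(sample_1) + variance(sample_2) / len(sample_2)
    )
    return (mean(sample_1) - mean(sample_2)) / standard_error

t_stat = welch_t(home_goals, away_goals)
print(f"t = {t_stat:.4f}")
```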

#### Calculate the degrees of freedom
The degrees of freedom for the two-sample T-test are calculated with the Welch-Satterthwaite approximation:

$$
df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( \frac{s_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{s_2^2}{n_2} \right)^2}{n_2 - 1}}
$$

Using the sample standard deviations and sample sizes:

$$
s_1 = 1.4128, \quad n_1 = 23
$$

$$
s_2 = 1.8167, \quad n_2 = 23
$$

The degrees of freedom $df$ can be calculated as:

$$
df = \frac{\left( \frac{1.4128^2}{23} + \frac{1.8167^2}{23} \right)^2}{\frac{\left( \frac{1.4128^2}{23} \right)^2}{23 - 1} + \frac{\left( \frac{1.8167^2}{23} \right)^2}{23 - 1}}
$$

The ratio is sensitive to rounding, so the intermediate terms below carry extra decimal places.

First, calculate the numerator:

$$
\text{Numerator} = \left( \frac{1.4128^2}{23} + \frac{1.8167^2}{23} \right)^2
$$

$$
\text{Numerator} = \left( \frac{1.9960}{23} + \frac{3.3004}{23} \right)^2
$$

$$
\text{Numerator} = \left( 0.08678 + 0.14350 \right)^2
$$

$$
\text{Numerator} = \left( 0.23028 \right)^2
$$

$$
\text{Numerator} = 0.053029
$$

Next, calculate the denominator:

$$
\text{Denominator} = \frac{\left( \frac{1.4128^2}{23} \right)^2}{23 - 1} + \frac{\left( \frac{1.8167^2}{23} \right)^2}{23 - 1}
$$

$$
\text{Denominator} = \frac{\left( 0.08678 \right)^2}{22} + \frac{\left( 0.14350 \right)^2}{22}
$$

$$
\text{Denominator} = \frac{0.0075308}{22} + \frac{0.0205923}{22}
$$

$$
\text{Denominator} = 0.00034231 + 0.00093601
$$

$$
\text{Denominator} = 0.00127832
$$

Finally, calculate the degrees of freedom:

$$
df = \frac{0.053029}{0.00127832} \approx 41.48
$$

Since $df$ is not an integer, we can round it down to the nearest whole number for a table lookup or use it as it is in software tools. For practical purposes, we'll use $df \approx 41$.
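
Hand evaluation of this formula is sensitive to rounding in the small intermediate terms, so it can help to evaluate it at full precision. The sketch below is one way to do that with the standard library; `welch_satterthwaite_df` is an illustrative helper name, not a library function.

```python
from statistics import variance

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

def welch_satterthwaite_df(sample_1, sample_2):
    """Approximate degrees of freedom for the unpooled two-sample t statistic."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = variance(sample_1) / n1  # s1^2 / n1
    v2 = variance(sample_2) / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df = welch_satterthwaite_df(home_goals, away_goals)
print(f"df = {df:.2f}")
```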

#### Determine the Critical Value
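For α = 0.05 and a two-tailed test, the critical value from the T-distribution table for the degrees of freedom computed above is approximately 2.02.

#### Make the decision
Since $|t| \approx 1.90$ is less than 2.02, we **fail to reject the null hypothesis**: at the 5% significance level, the data do not show a significant difference between Arsenal's average home and away goals.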
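
The table lookup and the decision rule can also be checked numerically. This is a hedged sketch using `scipy.stats.t`: the helper names are illustrative, `t_stat` is taken from the T-Statistic printed by the Python implementation below, and the loop tries a few nearby degrees of freedom (an assumption made here to show that the decision does not hinge on how $df$ is rounded).

```python
from scipy.stats import t as t_dist

alpha = 0.05
t_stat = 1.902668981649552  # T-Statistic reported in the notebook output

def two_tailed_critical_value(alpha, df):
    """Value t* such that P(|T| > t*) = alpha for a t-distribution with df."""
    return t_dist.ppf(1 - alpha / 2, df)

def two_tailed_p_value(t_stat, df):
    """Probability of a statistic at least as extreme as t_stat in either tail."""
    return 2 * t_dist.sf(abs(t_stat), df)

decisions = {}
for df in (41, 42, 44):
    critical = two_tailed_critical_value(alpha, df)
    p_value = two_tailed_p_value(t_stat, df)
    decisions[df] = "reject H0" if abs(t_stat) > critical else "fail to reject H0"
    print(f"df={df}: critical={critical:.4f}, p={p_value:.4f}, {decisions[df]}")
```

Comparing $|t|$ with the critical value and comparing the p-value with α are equivalent ways of stating the same decision.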


## Python Implementation

```python
import numpy as np
from scipy.stats import ttest_ind

# Data: goals scored in home and away matches
home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

# Calculate sample statistics
home_mean = np.mean(home_goals)
away_mean = np.mean(away_goals)
home_std = np.std(home_goals, ddof=1)  # ddof=1 gives the sample standard deviation
away_std = np.std(away_goals, ddof=1)

# Run the two-sample T-test.
# Note: ttest_ind defaults to equal_var=True (pooled variance, df = n1 + n2 - 2 = 44).
# Because both samples have 23 values, the t-statistic equals the unpooled
# statistic from the formula above; pass equal_var=False for Welch's test.
t_stat, p_value = ttest_ind(home_goals, away_goals)

# Output the results
print(f"Home Mean: {home_mean}")
print(f"Away Mean: {away_mean}")
print(f"Home Std Dev: {home_std}")
print(f"Away Std Dev: {away_std}")
print(f"T-Statistic: {t_stat}")
print(f"P-Value: {p_value}")
```

```
Home Mean: 2.782608695652174
Away Mean: 1.8695652173913044
Home Std Dev: 1.4128154270215338
Away Std Dev: 1.8166990000869694
T-Statistic: 1.902668981649552
P-Value: 0.06364057049758211
```
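
The hand calculation uses the unpooled (Welch) formula, whereas `ttest_ind` pools the variances by default. As a sketch of how the two relate, the snippet below runs both variants through the `equal_var` parameter of `scipy.stats.ttest_ind`; the variable names `pooled` and `welch` are illustrative.

```python
from scipy.stats import ttest_ind

home_goals = [2, 2, 3, 2, 1, 5, 2, 1, 3, 2, 4, 5, 4, 2, 5, 4, 2, 0, 3, 2, 3, 5, 2]
away_goals = [1, 1, 1, 1, 0, 4, 1, 1, 1, 3, 0, 0, 1, 6, 5, 6, 2, 0, 2, 1, 2, 3, 1]

# Default: Student's t-test with pooled variance (df = n1 + n2 - 2 = 44)
pooled = ttest_ind(home_goals, away_goals)

# Welch's t-test: unpooled variances, Welch-Satterthwaite degrees of freedom
welch = ttest_ind(home_goals, away_goals, equal_var=False)

print(f"Pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

With equal sample sizes the two t-statistics coincide, and only the degrees of freedom (and therefore the p-value) differ slightly. Both p-values exceed α = 0.05, which matches the decision to fail to reject the null hypothesis.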
