## Main Task

> We want to test whether Arsenal's average goals per match (sample mean = 2.4) differs from a hypothesized league average (population mean, 𝜇) of 2.0 goals per match.

We have data from 38 Arsenal FC matches in the 2023-24 season. The goals scored in each match are as follows:  

Goals: [5, 5, 5, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 0, 0, 6, 6, 5, 4, 3, 0, 0, 2, 1, 1, 1, 1, 0]

* **Sample mean goals $\bar X$**: approximately 2.4 goals per match (91 goals in 38 matches, i.e. 2.39 before rounding).
* **Sample size (n)**: 38 matches.  
* **Population standard deviation (𝜎)**: approximately 1.2 (*assumed known here*).
* **Hypothesized population mean $\mu$**: 2.0 goals per match (the league average).
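
The summary figures above can be reproduced directly from the list of goals. The following is a minimal sketch using only Python's standard library; the variable names are illustrative. The sample standard deviation is printed purely for comparison, since the test in this document treats σ = 1.2 as given rather than estimating it from the data.

```python
import statistics

goals = [5, 5, 5, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 2, 2, 2,
         2, 2, 1, 1, 0, 0, 6, 6, 5, 4, 3, 0, 0, 2, 1, 1, 1, 1, 0]

n = len(goals)                          # number of matches
sample_mean = statistics.mean(goals)    # total goals / matches
sample_std = statistics.stdev(goals)    # sample std dev (n - 1 denominator)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.4f}")
print(f"sample std dev = {sample_std:.4f}  (for comparison only; the test assumes sigma = 1.2)")
```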

## Why is a Z-Test acceptable here?  

* The sample size is large (𝑛 > 30).  
* The population standard deviation (σ) is treated as known (σ = 1.2 is assumed rather than estimated from the sample).

## Step By Step Solution  

#### State the Hypotheses:  
* **Null Hypothesis (H<sub>0</sub>)**: Population mean (μ) = 2.0 (Arsenal's average goals per match is equal to the league average)
* **Alternative Hypothesis (H<sub>1</sub>)**: Population mean (μ) $\neq$ 2.0 (Arsenal's average goals per match is different from the league average)  

> This is a two-tailed test because we are checking for any difference, not just an increase or decrease.


#### Determine the Significance level:  
* **Significance level (α)**: 0.05. This conventional choice means accepting a 5% risk of rejecting the null hypothesis when it is in fact true (a Type I error, or false positive). Choosing a smaller α lowers that risk but makes a Type II error (false negative) more likely.  
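
To make the "5% risk" concrete, here is a rough simulation sketch. It assumes, purely for illustration, that goals per match are normally distributed with mean 2.0 and standard deviation 1.2 (real goal counts are non-negative integers, so this is an idealization). It repeatedly draws seasons of 38 matches under the null hypothesis and counts how often a two-tailed Z-test at α = 0.05 rejects anyway.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable

mu0, sigma, n = 2.0, 1.2, 38            # null mean, assumed sigma, matches per season
z_crit = 1.96                           # two-tailed critical value for alpha = 0.05
n_sim = 20_000                          # number of simulated seasons

# Each row is one simulated season generated with the null hypothesis true.
samples = rng.normal(mu0, sigma, size=(n_sim, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))

# Fraction of seasons in which we would (wrongly) reject H0.
type1_rate = np.mean(np.abs(z) > z_crit)
print(f"Estimated Type I error rate: {type1_rate:.3f}")
```

The estimated rate should land close to 0.05, which is what the significance level promises when the null hypothesis holds.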


#### Calculate the Z-Score:  
The Z-score is calculated using the formula:  
$$ Z = \frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}}$$  
* Plugging in the values: $$ Z = \frac{2.4 - 2.0}{\frac{1.2}{\sqrt{38}}} = \frac{0.4}{0.1947} ≈ 2.05 $$  
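
The same arithmetic can be wrapped in a small helper. This is only a sketch: `z_statistic` is an illustrative name, not a library function. It also shows how much rounding the sample mean to 2.4 shifts the result compared with the unrounded mean of 91/38 (91 being the total of the goals listed earlier).

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample Z statistic with a known population standard deviation."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

z_rounded = z_statistic(2.4, mu0=2.0, sigma=1.2, n=38)      # mean rounded to 2.4
z_exact = z_statistic(91 / 38, mu0=2.0, sigma=1.2, n=38)    # unrounded mean

print(f"Z with rounded mean:   {z_rounded:.2f}")
print(f"Z with unrounded mean: {z_exact:.2f}")
```

Both values exceed 1.96, so the rounding does not change the outcome of the test.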


#### Determine the critical value(s):
For a two-tailed test with $\alpha = 0.05$, the critical values are approximately $± 1.96$ (from Z-tables or standard normal distribution tables).
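
Instead of reading a Z-table, the critical values can be obtained from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library and, for context, lists the one-tailed value next to the two-tailed one for a few common significance levels (the extra levels are shown only for comparison).

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

for alpha in (0.10, 0.05, 0.01):
    # Two-tailed: alpha is split evenly between both tails.
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)
    # One-tailed: all of alpha sits in a single tail.
    one_tailed = std_normal.inv_cdf(1 - alpha)
    print(f"alpha = {alpha:.2f}: two-tailed ±{two_tailed:.3f}, one-tailed {one_tailed:.3f}")

# The value used in this document (two-tailed, alpha = 0.05).
critical_05 = std_normal.inv_cdf(0.975)
```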

#### Make the decision
* If $|Z| > 1.96$, we **reject** the null hypothesis.
* If $|Z| ≤ 1.96$, we **fail to reject** the null hypothesis.

In our case, $|Z| ≈ 2.05$ (about $2.03$ with the unrounded sample mean), which is greater than $1.96$. 

#### Conclusion
Since $|Z| > 1.96$, we reject the null hypothesis at the 5% significance level and conclude that there is a statistically significant difference between Arsenal's average goals per match and the hypothesized league average of 2.0 goals per match.
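
The critical-value decision can be cross-checked in two equivalent ways: with a two-tailed p-value, and with a 95% confidence interval for the mean. The sketch below is supplementary to the steps above and relies on the same assumptions (σ = 1.2 treated as known, the unrounded sample mean 91/38).

```python
import math
from statistics import NormalDist

sample_mean = 91 / 38                   # unrounded sample mean
mu0, sigma, n, alpha = 2.0, 1.2, 38, 0.05

standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# Two-tailed p-value: probability of a |Z| at least this large under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low = sample_mean - z_crit * standard_error
ci_high = sample_mean + z_crit * standard_error

print(f"p-value: {p_value:.4f}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

A p-value below 0.05 and an interval that excludes 2.0 both correspond to rejecting the null hypothesis, matching the decision reached with the critical value.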


## Python Implementation

```python
import numpy as np
from scipy import stats

# Data: goals scored in each of the 38 matches
goals = [5, 5, 5, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 0, 0, 6, 6, 5, 4, 3, 0, 0, 2, 1, 1, 1, 1, 0]

# Sample statistics and assumed population parameters
sample_mean = np.mean(goals)
sample_size = len(goals)
population_mean = 2.0   # hypothesized league average
population_std = 1.2    # assumed known

# Z-test: difference in means divided by the standard error
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Two-tailed test, so we get the critical z-value for 0.025 in each tail
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha/2)

# Print the results
print(f"Sample Mean: {sample_mean}")
print(f"Z-score: {z_score}")
print(f"Critical Z-value: {critical_value}")

if abs(z_score) > critical_value:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

```text
Sample Mean: 2.3947368421052633
Z-score: 2.0277677641345324
Critical Z-value: 1.959963984540054
Reject the null hypothesis
```
