## Question 1
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between patterns of physical activity and the consumption of sugary drinks among the children of this school. The results are in the following table:

![](table4.png)

In [1]:
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table 
data = np.array([[32, 12],  # Low Physical Activity
                 [14, 22],  # Medium Physical Activity
                 [6, 9]])   # High Physical Activity

# Perform Chi-Square test of independence
chi2, p, dof, expected = chi2_contingency(data)

# Output the results
print(f"Chi-Square Statistic: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies Table:")
print(expected)

# Decision based on p-value
alpha = 0.05  # 5% significance level
if p < alpha:
    print("Reject the null hypothesis: There is a significant association between physical activity and sugary drink consumption.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between physical activity and sugary drink consumption.")

Chi-Square Statistic: 10.712198008709638
P-value: 0.004719280137040844
Degrees of Freedom: 2
Expected Frequencies Table:
[[24.08421053 19.91578947]
 [19.70526316 16.29473684]
 [ 8.21052632  6.78947368]]
Reject the null hypothesis: There is a significant association between physical activity and sugary drink consumption.
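
As a cross-check on the result above, the statistic can be rebuilt by hand. The sketch below takes each expected count as (row total × column total) / grand total, sums (observed − expected)² / expected over all cells, and compares the result with the critical value for (rows − 1) × (columns − 1) degrees of freedom. The names `observed`, `expected_manual`, `chi2_manual`, `dof_manual`, `critical_value` and `p_manual` are illustrative and not part of the original cell.

```python
import numpy as np
from scipy.stats import chi2

# Same contingency table as in the cell above (rows: Low, Medium, High activity)
observed = np.array([[32, 12],
                     [14, 22],
                     [6, 9]])

# Marginal totals, kept 2-D so they broadcast into a full table
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected counts under independence: row total * column total / grand total
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Critical value at the 5% level and the corresponding p-value
critical_value = chi2.ppf(1 - 0.05, dof_manual)
p_manual = chi2.sf(chi2_manual, dof_manual)

print(f"Manual Chi-Square Statistic: {chi2_manual}")
print(f"Degrees of Freedom: {dof_manual}")
print(f"Critical value (alpha = 0.05): {critical_value}")
print(f"P-value: {p_manual}")
```

The statistic exceeds the critical value, which is the same decision as comparing the p-value with alpha. The manual value should agree with `chi2_contingency` here because, as far as I know, its continuity correction is only applied when there is a single degree of freedom (a 2x2 table).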


## [OPTIONAL] Question 2
The following table gives the distribution of the number of 6-point scores per American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter λ is set to the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?
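
To see where the value 2.435 may come from, here is a minimal sketch that recomputes the sample mean from the observed counts used in this question's code cell. It assumes the open-ended last category ("7 or more") is treated as exactly 7, which is an assumption of this sketch and is not stated in the question. The names `score_values`, `counts` and `sample_mean` are illustrative.

```python
# Score values 0..7, where 7 stands in for the "7 or more" category (assumption)
score_values = list(range(8))

# Observed counts for each score value, taken from this question's code cell
counts = [35, 99, 104, 110, 62, 25, 10, 3]

# Weighted mean: sum(value * count) / sum(count)
total = sum(counts)
sample_mean = sum(k * c for k, c in zip(score_values, counts)) / total

print(f"Total observations: {total}")
print(f"Sample mean: {sample_mean:.3f}")
```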

Check [here](https://www.geeksforgeeks.org/how-to-create-a-poisson-probability-mass-function-plot-in-python/) how to create a Poisson distribution and how to calculate the expected counts using the probability mass function (pmf).
A Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space. The Poisson distribution has only one parameter, λ (lambda), which is the mean number of events.
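
To make the pmf concrete, the sketch below evaluates the Poisson formula P(X = k) = λ^k · e^(−λ) / k! directly and compares it with `scipy.stats.poisson.pmf`. The helper `poisson_pmf_by_hand` and the variable `lam` are illustrative names, and the sample size of 448 is taken from this question's data.

```python
import math
from scipy.stats import poisson

lam = 2.435   # Poisson mean given in the question
n_obs = 448   # total number of observations in the table


def poisson_pmf_by_hand(k, lam):
    """Poisson pmf from its definition: lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


for k in range(4):
    by_hand = poisson_pmf_by_hand(k, lam)
    from_scipy = poisson.pmf(k, lam)
    # Expected count = probability of k scores * number of observations
    print(f"k={k}: by hand {by_hand:.6f}, scipy {from_scipy:.6f}, "
          f"expected count {n_obs * from_scipy:.2f}")
```

Multiplying each probability by the number of observations gives the expected frequency that the chi-square goodness-of-fit test compares against the observed count.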

In [2]:
from scipy.stats import poisson, chisquare

# Observed data: how many times 0, 1, ..., 6 scores occurred; the last value is for 7 or more scores
observed = [35, 99, 104, 110, 62, 25, 10, 3]
total_observations = sum(observed)  # 448 observations in total
lambda_value = 2.435  # Given Poisson mean (lambda)

# Calculate the expected frequencies using Poisson distribution
expected = []
for i in range(7):  # For scores 0 to 6
    expected.append(poisson.pmf(i, lambda_value) * total_observations)

# Combine the probabilities for "7 or more" into one category
expected_7_or_more = (1 - poisson.cdf(6, lambda_value)) * total_observations
expected.append(expected_7_or_more)

# Perform the Chi-Square test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# Output the results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05  # 5% significance level
if p_value < alpha:
    print("Reject the null hypothesis: The data does not follow a Poisson distribution.")
else:
    print("Fail to reject the null hypothesis: The data are consistent with a Poisson distribution.")

Chi-Square Statistic: 6.491310681109786
P-value: 0.4836889068537311
Fail to reject the null hypothesis: The data are consistent with a Poisson distribution.
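
One refinement worth considering: `chisquare` uses k − 1 degrees of freedom by default (7 for these 8 categories). Because the question says λ was set to the sample mean, one parameter was estimated from the data, and the usual convention is to subtract one more degree of freedom. The sketch below does this through the `ddof` argument of `chisquare`. It assumes λ was indeed estimated from this same table, and the names `stat_adj`, `p_adj` and `critical_adj` are illustrative.

```python
from scipy.stats import chi2, chisquare, poisson

observed = [35, 99, 104, 110, 62, 25, 10, 3]  # last value is "7 or more"
n = sum(observed)
lam = 2.435  # Poisson mean estimated from the sample (per the question)

# Expected counts for 0..6 scores, plus the pooled "7 or more" tail
expected = [poisson.pmf(k, lam) * n for k in range(7)]
expected.append((1 - poisson.cdf(6, lam)) * n)

# ddof=1 removes one extra degree of freedom for the estimated parameter,
# so the p-value is based on k - 1 - 1 = 6 degrees of freedom
stat_adj, p_adj = chisquare(f_obs=observed, f_exp=expected, ddof=1)

# Critical value at the 5% level with the adjusted degrees of freedom
critical_adj = chi2.ppf(0.95, df=len(observed) - 1 - 1)

print(f"Chi-Square Statistic: {stat_adj}")
print(f"Adjusted P-value (6 degrees of freedom): {p_adj}")
print(f"Critical value (alpha = 0.05): {critical_adj}")
```

The statistic itself is unchanged. Only the reference distribution differs, which gives a smaller p-value that should still be above 0.05, so the conclusion for this data set stays the same.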
