## Question 1
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and the consumption of sugary drinks for the children of this school. The results are in the following table:

![](table4.png)

In [15]:
# 1. State the hypotheses
# Null hypothesis (H0): There is no association between physical activity patterns and sugary drink consumption.
# Alternative hypothesis (Ha): There is an association between physical activity patterns and sugary drink consumption.

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: one row per activity level, one column per sugary-drink category
data = np.array([
    [32, 12],  # Low
    [14, 22],  # Medium
    [6, 9],    # High
])

# Perform Chi-Square Test of Independence
chi2_stat, p_value, dof, expected = chi2_contingency(data)

# Display the results
print("Chi-Square Statistic:", chi2_stat)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:")
print(expected)
print("P-Value (from chi2_contingency):", p_value)

# Conclusion
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between patterns of physical activity and the consumption of sugary drinks for the children of this school")
else:
    print("Fail to reject the null hypothesis: No significant association between patterns of physical activity and the consumption of sugary drinks for the children of this school.")

Chi-Square Statistic: 10.712198008709638
Degrees of Freedom: 2
Expected Frequencies:
[[24.08421053 19.91578947]
 [19.70526316 16.29473684]
 [ 8.21052632  6.78947368]]
P-Value (from chi2_contingency): 0.004719280137040844
Reject the null hypothesis: There is a significant association between patterns of physical activity and the consumption of sugary drinks for the children of this school
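
As a cross-check, the same statistic can be rebuilt by hand: each expected count is the row total times the column total divided by the grand total, and the statistic is the sum of (observed - expected)^2 / expected over all cells. The sketch below is an illustration, not part of the original solution; the `_manual` variable names are made up here, and the comparison against a critical value from `chi2.ppf` is an alternative decision rule to the p-value used above. For a table larger than 2x2, `chi2_contingency` applies no continuity correction, so the two results should agree.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts copied from the cell above (rows: Low, Medium, High)
observed = np.array([
    [32, 12],
    [14, 22],
    [6, 9],
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = row_totals @ col_totals / grand_total

# Chi-square statistic and degrees of freedom: (rows - 1) * (columns - 1)
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Right-tailed critical value at alpha = 0.05, and the matching p-value
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof_manual)
p_manual = chi2.sf(chi2_manual, dof_manual)

print("Chi-square statistic:", chi2_manual)
print("Degrees of freedom:", dof_manual)
print("Critical value:", critical_value)
print("P-value:", p_manual)
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value is below alpha, so both rules lead to the same conclusion.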


## [OPTIONAL] Question 2
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we create a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

See [here](https://www.geeksforgeeks.org/how-to-create-a-poisson-probability-mass-function-plot-in-python/) for how to create a Poisson distribution and how to calculate the expected observations using the probability mass function (pmf).
A Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space. The Poisson distribution has only one parameter, λ (lambda), which is the mean number of events.
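
As a rough sketch of that idea, the expected counts can be obtained from `scipy.stats.poisson`: `pmf` gives P(X = k) for each k, and `sf(6, λ)` gives the tail probability P(X ≥ 7). The total of 448 observations is the value used in the solution cell; the variable names here are illustrative only.

```python
import numpy as np
from scipy.stats import poisson

lam = 2.435  # sample mean given in the question
n = 448      # total number of observations (value used in the solution cell)

ks = np.arange(7)              # k = 0, 1, ..., 6
probs = poisson.pmf(ks, lam)   # P(X = k) for each k
tail = poisson.sf(6, lam)      # P(X >= 7) = 1 - P(X <= 6)

# Expected count per category = probability * number of observations
expected_counts = np.append(probs, tail) * n

labels = [str(k) for k in ks] + ["7 or more"]
for label, count in zip(labels, expected_counts):
    print(f"{label:>9}: {count:.3f}")
```

Grouping everything from 7 upward into one category keeps the probabilities summing to 1, so the expected counts add up to the number of observations.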

In [25]:
# 1. State the hypotheses
# Null hypothesis (H0): The observed number of scores follows a Poisson distribution with λ = 2.435
# Alternative hypothesis (Ha): The observed number of scores does not follow a Poisson distribution with λ = 2.435
import math
import pandas as pd
from scipy.stats import chi2

# Given data
observed_data = [35, 99, 104, 110, 62, 25, 10, 3]  # Last value includes "7 or more"
total_observations = 448
mean_lambda = 2.435

expected_data = []
for k in range(len(observed_data) - 1):  # For 0 to 6
    poisson_prob = (mean_lambda**k * math.exp(-mean_lambda)) / math.factorial(k)
    expected_data.append(poisson_prob * total_observations)

# Group "7 or more"
poisson_prob_tail = 1 - sum((mean_lambda**j * math.exp(-mean_lambda)) / math.factorial(j) for j in range(7))
expected_data.append(poisson_prob_tail * total_observations)

# Combine data into a DataFrame for better visualization
df = pd.DataFrame({
    "Number of Scores": ["0", "1", "2", "3", "4", "5", "6" , "7 or more"],
    "Observed": observed_data,
    "Expected": expected_data
})

# Calculate chi-square statistic
df["Chi-Square"] = (df["Observed"] - df["Expected"])**2 / df["Expected"]
chi_square_stat = df["Chi-Square"].sum()

# Degrees of freedom
degrees_of_freedom = len(df) - 1 - 1  # Categories - 1 - Estimated Parameter (mean)

# p-value
p_value = chi2.sf(chi_square_stat, degrees_of_freedom)

# Display results
print(df)
print(f"Chi-Square Statistic: {chi_square_stat}")
print(f"Degrees of Freedom: {degrees_of_freedom}")
print(f"P-Value: {p_value}")

# Conclusion
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The data does not follow a Poisson distribution.")
else:
    print("Fail to reject the null hypothesis: No significant evidence that the data deviates from a Poisson distribution.")


  Number of Scores  Observed    Expected  Chi-Square
0                0        35   39.243791    0.458920
1                1        99   95.558630    0.123935
2                2       104  116.342632    1.309413
3                3       110   94.431437    2.566732
4                4        62   57.485137    0.354596
5                5        25   27.995262    0.320468
6                6        10   11.361410    0.163135
7        7 or more         3    5.581701    1.194113
Chi-Square Statistic: 6.491310681109779
Degrees of Freedom: 6
P-Value: 0.37045709484106654
Fail to reject the null hypothesis: No significant evidence that the data deviates from a Poisson distribution.
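
The manual calculation can be checked against `scipy.stats.chisquare`, which runs the goodness-of-fit test directly. This is a sketch rather than the original method: passing `ddof=1` removes one extra degree of freedom for the estimated mean, giving 8 - 1 - 1 = 6 as above, and the expected counts are rebuilt here with `scipy.stats.poisson` instead of the explicit formula.

```python
import numpy as np
from scipy.stats import chisquare, poisson

# Observed counts for 0..6 scores, with the last entry meaning "7 or more"
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
lam = 2.435
n = observed.sum()

# Expected counts: pmf for k = 0..6 plus the tail probability P(X >= 7)
expected = np.append(poisson.pmf(np.arange(7), lam), poisson.sf(6, lam)) * n

# ddof=1 accounts for the one parameter (the mean) estimated from the data
check_stat, check_p = chisquare(f_obs=observed, f_exp=expected, ddof=1)

print("Chi-square statistic:", check_stat)
print("P-value:", check_p)
```

Both values should match the ones printed above up to floating-point rounding.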
