## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we create a Poisson distribution with the sample mean as its parameter, λ = 2.435. Is there any reason to believe, at the 0.05 significance level, that the number of scores is a Poisson variable?

In [3]:
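# Chi-square goodness-of-fit test against a Poisson distribution
from scipy.stats import poisson
from scipy.stats import chisquare
import numpy as np

# Observed frequencies, indexed from 0 scores upward
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
total_observed = np.sum(observed)
lambda_param = 2.435  # sample mean given in the question

# Expected frequencies under Poisson(lambda_param) for each observed category
expected = np.array([poisson.pmf(i, lambda_param) * total_observed for i in range(len(observed))])
# Rescale so the observed and expected totals agree; this spreads the
# probability mass beyond the last category across all categories
expected *= total_observed / np.sum(expected)
chi2_stat, p_value = chisquare(observed, f_exp=expected)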
alpha = 0.05

if p_value < alpha:
    print(f"At the {alpha} significance level, reject the null hypothesis.")
    print("The number of scores does not follow a Poisson distribution.")
else:
    print(f"At the {alpha} significance level, fail to reject the null hypothesis.")
    print("There is no significant evidence to suggest that the number of scores does not follow a Poisson distribution.")

print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p_value}")



At the 0.05 significance level, fail to reject the null hypothesis.
There is no significant evidence to suggest that the number of scores does not follow a Poisson distribution.
Chi-square statistic: 5.5005589331731475
P-value: 0.5991164535654854
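
Two refinements are common for this kind of test and can be sketched as follows. Both rest on assumptions: the last category is read as "7 or more" (the table is only available as an image), and one degree of freedom is removed with `ddof=1` because λ was estimated from the same sample. The names `counts`, `lam_hat` and `probs` are illustrative and do not appear in the cell above.

```python
import numpy as np
from scipy.stats import poisson, chisquare

# Observed frequencies, indexed from 0 scores upward
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
counts = np.arange(len(observed))
total = observed.sum()

# Sample mean recovered from the frequency table (about 2.435)
lam_hat = (counts * observed).sum() / total

# P(X = k) for k = 0..6; the last bin is taken as P(X >= 7) (assumption)
probs = poisson.pmf(counts, lam_hat)
probs[-1] = poisson.sf(counts[-1] - 1, lam_hat)
expected = probs * total

# ddof=1 accounts for the one parameter estimated from the data
chi2_stat, p_value = chisquare(observed, f_exp=expected, ddof=1)
print(f"lambda estimate: {lam_hat:.3f}")
print(f"Chi-square statistic: {chi2_stat:.3f}")
print(f"P-value: {p_value:.3f}")
```

With `ddof=1` the reference chi-square distribution has 8 - 1 - 1 = 6 degrees of freedom instead of 7.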


## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To study the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and recorded the number of defectives, shown in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?
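
For reference, and assuming the model shown in the image is the standard Poisson probability mass function, the quantities involved in the test can be written as below. The symbols $f_x$ (observed frequency of the value $x$), $O_i$, $E_i$ (observed and expected frequencies of category $i$) and $k$ (number of categories) are notation introduced here.

```latex
P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots

\hat{\lambda} = \bar{x} = \frac{1}{n}\sum_{x} x\, f_x

\chi^{2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i}, \qquad E_i = n\, P(X = x_i)
```

When $\lambda$ is estimated from the sample, the statistic is usually compared with a chi-square distribution with $k - 1 - 1$ degrees of freedom.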

In [11]:
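# Chi-square goodness-of-fit test with lambda estimated from the sample
from scipy.stats import poisson, chisquare

observed_freq = [27, 21, 7, 4, 1]  # frequencies for 0, 1, 2, 3, 4 defectives
defective_items = [0, 1, 2, 3, 4]
n = 60
# Sample mean of the number of defectives, used as the Poisson parameter
lambda_param = sum(x * f for x, f in zip(defective_items, observed_freq)) / n
# Expected frequencies for 0..3 defectives
expected_freq = [poisson.pmf(x, lambda_param) * n for x in range(len(observed_freq) - 1)]
expected_freq.append(n - sum(expected_freq))  # Last category for 4 or more defective items
chi2_stat, p_value = chisquare(observed_freq, f_exp=expected_freq)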
alpha = 0.05

if p_value < alpha:
    print(f"At the {alpha} significance level, reject the null hypothesis.")
    print("The distribution of defective items does not follow a Poisson distribution with the proposed model.")
else:
    print(f"At the {alpha} significance level, fail to reject the null hypothesis.")
    print("There is no significant evidence to suggest that the distribution of defective items does not follow a Poisson distribution with the proposed model.")

print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p_value}")


At the 0.05 significance level, fail to reject the null hypothesis.
There is no significant evidence to suggest that the distribution of defective items does not follow a Poisson distribution with the proposed model.
Chi-square statistic: 1.5398143337634154
P-value: 0.8195662273210649
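
With only 60 articles, the expected counts in the upper categories are small, and a common rule of thumb is to pool categories until each expected count is at least 5. A minimal sketch of that variant follows, assuming it is acceptable to merge the categories 2, 3 and "4 or more" into a single "2 or more" category. It also uses `ddof=1` because λ is estimated from the sample. The names `observed_pooled` and `expected_pooled` are illustrative.

```python
import numpy as np
from scipy.stats import poisson, chisquare

observed = np.array([27, 21, 7, 4, 1])
values = np.arange(len(observed))
n = observed.sum()

# Sample mean, used as the Poisson parameter
lam_hat = (values * observed).sum() / n

# Expected counts, with the last bin taken as "4 or more"
probs = poisson.pmf(values, lam_hat)
probs[-1] = poisson.sf(values[-1] - 1, lam_hat)
expected = probs * n

# Pool 2, 3 and "4 or more" into a single "2 or more" category
observed_pooled = np.array([observed[0], observed[1], observed[2:].sum()])
expected_pooled = np.array([expected[0], expected[1], expected[2:].sum()])

# Three categories, one estimated parameter: 3 - 1 - 1 = 1 degree of freedom
chi2_stat, p_value = chisquare(observed_pooled, f_exp=expected_pooled, ddof=1)
print(f"Expected (pooled): {np.round(expected_pooled, 2)}")
print(f"Chi-square statistic: {chi2_stat:.3f}")
print(f"P-value: {p_value:.3f}")
```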


## Question 3
A quality control engineer takes a sample of 10 tires coming off an assembly line each day and would like to verify, on the basis of the number of defective tires observed over 200 days (shown below), whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [9]:
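# Chi-square goodness-of-fit test against Binomial(n=10, p=0.05)
from scipy.stats import binom, chisquare

# Observed number of days in each category, indexed from 0 defective tires
observed_freq = [8, 35, 64, 62, 25, 6]
n = 10
p = 0.05
total_days = 200
# Expected number of days for every category except the last
expected_freq = [binom.pmf(i, n, p) * total_days for i in range(len(observed_freq)-1)]
# The last category takes the remaining probability mass
expected_freq_last = total_days - sum(expected_freq)
expected_freq.append(expected_freq_last)
chi2_stat, p_value = chisquare(observed_freq, f_exp=expected_freq)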
alpha = 0.05

if p_value < alpha:
    print(f"At the {alpha} significance level, reject the null hypothesis.")
    print("The observed data does not fit a binomial distribution with n=10 and p=0.05.")
else:
    print(f"At the {alpha} significance level, fail to reject the null hypothesis.")
    print("There is no significant evidence that the observed data departs from a binomial distribution with n=10 and p=0.05.")

print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p_value}")



At the 0.05 significance level, reject the null hypothesis.
The observed data does not fit a binomial distribution with n=10 and p=0.05.
Chi-square statistic: 7994.391382369575
P-value: 0.0
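
To see why the statistic is so large, it may help to put the observed and expected counts side by side and to estimate p directly from the data. This is a minimal sketch, assuming the six categories correspond to 0, 1, 2, 3, 4 and 5 defective tires per day, with the last one read as "5 or more" for the expected count and as exactly 5 for the estimate of p. The names `defects`, `p0` and `p_hat` are illustrative.

```python
import numpy as np
from scipy.stats import binom

observed = np.array([8, 35, 64, 62, 25, 6])
defects = np.arange(len(observed))
n, p0 = 10, 0.05
days = observed.sum()

# Expected days per category under Binomial(10, 0.05); last bin is "5 or more"
probs = binom.pmf(defects, n, p0)
probs[-1] = binom.sf(defects[-1] - 1, n, p0)
expected = probs * days

# Proportion of defective tires implied by the data (last bin treated as 5)
p_hat = (defects * observed).sum() / (days * n)

for k, o, e in zip(defects, observed, expected):
    print(f"{k} defective: observed {o:3d}, expected {e:8.3f}")
print(f"Estimated p from the sample: {p_hat:.4f}")
```

Under these assumptions the sample proportion is far above the hypothesized 0.05, which is consistent with the rejection above.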


## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and the consumption of sugary drinks among the children of this school. The results are in the following table:

![](table4.png)

In [8]:
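# Chi-square test of independence on the 3x2 contingency table
# (rows: physical activity categories; columns: sugary drink consumption)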
from scipy.stats import chi2_contingency

observed = [[30, 20],
            [40, 35],
            [25, 45]]

chi2, p_value, dof, expected = chi2_contingency(observed)
alpha = 0.05

if p_value < alpha:
    print(f"At the {alpha} significance level, reject the null hypothesis.")
    print("There is evidence of an association between patterns of physical activity and sugary drink consumption.")
else:
    print(f"At the {alpha} significance level, fail to reject the null hypothesis.")
    print("There is no significant evidence of an association between patterns of physical activity and sugary drink consumption.")

print(f"Chi-square statistic: {chi2}")
print(f"P-value: {p_value}")


At the 0.05 significance level, reject the null hypothesis.
There is evidence of an association between patterns of physical activity and sugary drink consumption.
Chi-square statistic: 7.924624060150377
P-value: 0.019019090709576122
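
The same result can be reproduced by hand, which makes the mechanics of the test explicit: under independence, each expected count is the row total times the column total divided by the grand total. The sketch below assumes only the 3x2 layout of the table. Which row or column carries which label does not affect the statistic. `chi2_dist` is an alias chosen here to avoid clashing with the `chi2` variable used in the cell above.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

observed = np.array([[30, 20],
                     [40, 35],
                     [25, 45]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2_dist.sf(stat, dof)

# Cross-check against SciPy's implementation
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed)
print(f"Manual: chi2 = {stat:.4f}, dof = {dof}, p = {p_value:.4f}")
print(f"SciPy:  chi2 = {ref_stat:.4f}, dof = {ref_dof}, p = {ref_p:.4f}")
```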
