## Question 1
The following table shows the number of 6-point scores per match in American rugby during the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to doubt that the number of scores is a Poisson variable?

In [19]:
import pandas as pd
import numpy as np
import scipy.stats as st

In [24]:
Y = np.array([35, 99, 104, 110, 62, 25, 10, 3])  # observed
mu = 2.435
poisson_dist = st.poisson(mu)

# P(X = k) for k = 0, ..., 6
loop = np.array([poisson_dist.pmf(item) for item in range(0, 7)])
# remaining upper-tail probability, P(X >= 7)
reverse_loop = 1 - sum(loop)

with_tail = np.append(loop, reverse_loop)

f_exp = with_tail * sum(Y)  # expected
f_exp

final = st.chisquare(f_obs=Y, f_exp=f_exp)
final

Power_divergenceResult(statistic=6.491310681109821, pvalue=0.4836889068537269)
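
Because λ = 2.435 is described as the sample mean, it was estimated from the same data, and a common convention is to subtract one extra degree of freedom for it. The sketch below is one possible way to do that with the `ddof` argument of `scipy.stats.chisquare`. The names `observed`, `lam`, `probs` and `decision` are illustrative and do not come from the original cell.

```python
import numpy as np
import scipy.stats as st

observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
lam = 2.435  # sample mean, as stated in the question

# P(X = 0), ..., P(X = 6), plus the upper tail P(X >= 7) as the last category
probs = st.poisson.pmf(np.arange(7), lam)
probs = np.append(probs, st.poisson.sf(6, lam))

expected = probs * observed.sum()

# ddof=1 accounts for the one estimated parameter: df = 8 - 1 - 1 = 6
stat, p_value = st.chisquare(f_obs=observed, f_exp=expected, ddof=1)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"statistic = {stat:.4f}, p-value = {p_value:.4f}, decision: {decision}")
```

The test statistic is the same as above; only the reference distribution changes. Under this assumption the p-value is still well above 0.05, so the data give no reason to doubt the Poisson model.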

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To study the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and recorded the number of defectives, shown in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?
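
As a starting point, a Poisson goodness-of-fit test can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `poisson_gof` and its arguments are hypothetical, the categories are assumed to be the consecutive values 0, 1, ..., K with the last one read as "K or more", and λ is estimated by the sample mean. The counts from the table are not reproduced here and have to be filled in by the reader.

```python
import numpy as np
import scipy.stats as st

def poisson_gof(values, counts, alpha=0.05):
    """Chi-square goodness-of-fit test against a Poisson distribution.

    values: consecutive categories 0, 1, ..., K (last one treated as 'K or more')
    counts: observed frequency of each category
    """
    values = np.asarray(values)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()

    # Estimate lambda with the sample mean
    lam = (values * counts).sum() / n

    # Category probabilities; fold the upper tail into the last category
    probs = st.poisson.pmf(values, lam)
    probs[-1] = st.poisson.sf(values[-1] - 1, lam)

    expected = probs * n

    # ddof=1 because one parameter (lambda) was estimated from the data
    stat, p_value = st.chisquare(f_obs=counts, f_exp=expected, ddof=1)
    return lam, expected, stat, p_value, p_value < alpha
```

With a sample as small as n = 60, it may be worth checking that no expected frequency is very small before trusting the chi-square approximation; merging sparse categories is a common remedy.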

In [None]:
# your code here

## Question 3
A quality control engineer takes a sample of 10 tires coming off an assembly line on each of 200 days and records the number of tires with defects in each sample. Based on the data that follows, the engineer would like to verify whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [31]:
n_defective_items= [0,1,2]
observed_freq = [138,53,9]

data = pd.DataFrame({'Number of defective items' : n_defective_items, 'Observed Frequency': observed_freq})
data

   Number of defective items  Observed Frequency
0                          0                 138
1                          1                  53
2                          2                   9


In [None]:

# THE 6 STEPS OF HYPOTHESIS TESTING

# 1. Set the hypothesis
# 2. Choose significance / confidence level
# 3. Sample
# 4. Compute statistic
# 5. Get p-value
# 6. Decide
# The aim is to find enough evidence to reject H0 in favor of H1

In [46]:
import pandas as pd
from scipy.stats import binom, chisquare

# 1. Set the hypotheses
# H0: the observed frequencies are consistent with a binomial distribution with n = 10 and p = 0.05
# H1: the observed frequencies are not consistent with that distribution

# data
n_defective_items = [0, 1, 2]
observed_freq = [138, 53, 9]

# significance
alpha = 0.05

# total of 200 days
total_days = 200

# binomial parameters under the null hypothesis
n, p = 10, 0.05

# expected frequencies under the null hypothesis
# Assumption: the last category is read as "2 or more" defective tires
# (table3.png is not reproduced here), so that the three probabilities sum to 1.
probs = [binom.pmf(0, n, p), binom.pmf(1, n, p), binom.sf(1, n, p)]
expected_freq = [total_days * prob for prob in probs]

# Create a DataFrame
data = pd.DataFrame({'Number of defective items': n_defective_items, 'Observed Frequency': observed_freq, 'Expected Frequency': expected_freq})

# Display the data
print(data)

# Perform the chi-square goodness-of-fit test
# (no parameter is estimated from the data, so df = 3 - 1 = 2)
chi2_stat, p_value = chisquare(f_obs=observed_freq, f_exp=expected_freq)

# Output the test results
print(f"Chi-square statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# 6. Decide
if p_value < alpha:
    print("Reject the null hypothesis: The observed frequencies are significantly different from the expected frequencies.")
else:
    print("Fail to reject the null hypothesis: The observed frequencies are consistent with the expected frequencies.")


   Number of defective items  Observed Frequency  Expected Frequency
0                          0                 138          119.747388
1                          1                  53           63.024941
2                          2                   9           17.227671
Chi-square statistic: 8.3062
P-value: 0.0157
Reject the null hypothesis: The observed frequencies are significantly different from the expected frequencies.


## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and the consumption of sugary drinks for the children of this school. The results are in the following table:

![](table4.png)

In [2]:
import numpy as np
import scipy.stats as st

observed_low_consumed= 32
observed_low_not_consumed=12

observed_medium_consumed = 14
observed_medium_not_consumed = 22

observed_high_consumed = 52
observed_high_not_consumed = 43

# Observed frequencies
O = np.array([[observed_low_consumed, observed_low_not_consumed],
              [observed_medium_consumed, observed_medium_not_consumed],
              [observed_high_consumed, observed_high_not_consumed]])

# Expected frequencies (assuming independence)
total_rows = O.sum(axis=1)
total_cols = O.sum(axis=0)
expected = np.outer(total_rows, total_cols) / total_rows.sum()

# Chi-squared test of independence
# df = (rows - 1) * (cols - 1) = 2; chisquare uses df = k - 1 - ddof with k = 6 cells, so ddof = 3
stats, p_value = st.chisquare(f_obs=O.flatten(), f_exp=expected.flatten(), ddof=3, axis=None)
alpha = 0.05

print(f"Chi-squared statistic: {stats}")
print(f"P-value: {p_value:.4f}")

# Check for significance
if p_value < alpha:
    print("There is a significant association between patterns of physical activity and sugary drink consumption.")
else:
    print("There is no significant association between patterns of physical activity and sugary drink consumption.")


Chi-squared statistic: 9.335753295083439
P-value: 0.0094
There is a significant association between patterns of physical activity and sugary drink consumption.
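
As a cross-check, the same test of independence can be run with `scipy.stats.chi2_contingency`, which builds the expected frequencies and the degrees of freedom from the contingency table itself. This is a minimal sketch that re-enters the observed counts used above; for a 3 x 2 table the degrees of freedom are (3 - 1)(2 - 1) = 2, and the p-value should be read against that value.

```python
import numpy as np
import scipy.stats as st

# Observed counts: rows = Low, Medium, High activity; columns = consumed, not consumed
O = np.array([[32, 12],
              [14, 22],
              [52, 43]])

# Returns the statistic, the p-value, the degrees of freedom and the expected table
chi2, p, dof, expected = st.chi2_contingency(O)

print(f"Chi-squared statistic: {chi2:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p:.4f}")
```

The continuity correction that `chi2_contingency` applies by default only affects tables with one degree of freedom, so it plays no role here.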
