## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

In [10]:
import numpy as np
import scipy.stats as st

In [11]:
# H0: Distribution Observed ~ Poisson(2.435) 

# H1: Distribution Observed !~ Poisson(2.435) 

alpha = 0.05


OV = np.array([35,99,104,110,62,25,10,3]) # observed values

from scipy.stats import poisson

lamb = 2.435

poisson_dist = poisson(lamb)

poisson_pmfs = np.array([poisson_dist.pmf(i) for i in range(0,7)]) # calculating the PMFs up until 6

print(poisson_pmfs) ## we only have 7 values in the array, but we have 8 values in OV; this is because we still need to compute the tail (the >= 7 part)
                ## we'll need to calculate it separately and append it to our array of pmfs


with_tail = np.append(poisson_pmfs,poisson_dist.sf(6)) ## the tail comes from the survival function: sf(6) = P(X > 6), i.e. 7 and above

print(with_tail) ## now we have all the values of the pmfs with the tail

np.sum(with_tail) # this has to sum to 1 


[0.08759775 0.21330051 0.25969338 0.21078446 0.12831504 0.06248942
 0.02536029]
[0.08759775 0.21330051 0.25969338 0.21078446 0.12831504 0.06248942
 0.02536029 0.01245915]


1.0

In [13]:
EV = with_tail * 448
EV


stat, p_value = st.chisquare(f_obs=OV,f_exp=EV)

print(st.chisquare(f_obs=OV,f_exp=EV))

p_value < alpha 

# In this case the p_value is higher than alpha, so we cannot reject the null hypothesis. The observations are consistent with
# a Poisson distribution with parameter 2.435 (failing to reject H0 does not prove that the data are Poisson)

Power_divergenceResult(statistic=6.491310681109792, pvalue=0.48368890685373034)


False
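
The question describes λ = 2.435 as the sample mean, so one parameter was estimated from the same data that are being tested. A common adjustment in that situation is to remove one extra degree of freedom through the `ddof` argument of `scipy.stats.chisquare`. The sketch below assumes, as the cell above does, that the eight observed counts correspond to 0 through 6 scores plus a "7 or more" bin; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts, assumed to be for 0, 1, ..., 6 scores and a final "7 or more" bin.
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
lam = 2.435  # sample mean quoted in the question

# Bin probabilities under Poisson(lam): P(X = 0..6) plus the tail P(X >= 7).
probs = np.append(stats.poisson.pmf(np.arange(7), lam), stats.poisson.sf(6, lam))
expected = probs * observed.sum()

# ddof=1 accounts for the estimated lambda, so the reference distribution
# is chi-square with 8 - 1 - 1 = 6 degrees of freedom instead of 7.
stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected, ddof=1)
print(stat, p_value, p_value < 0.05)
```

The test statistic is the same as in the cell above; only the reference distribution changes, which makes the p-value somewhat smaller than with the default `ddof=0`.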

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?

In [2]:
# your code here

## Question 3
A quality control engineer takes a sample of 10 tires that come out of an assembly line on each of 200 days and would like to verify, on the basis of the data that follows (the number of tires with defects observed over the 200 days), whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05). 

![](table3.png)


In [21]:
alpha = 0.05


OV = np.array([138,53,9]) # observed values

from scipy.stats import binom

n = 10
p = 0.05

binom_dist = binom(n,p)

binom_pmfs = [binom_dist.pmf(0), binom_dist.pmf(1), binom_dist.sf(1)]

print(binom_pmfs)

np.sum(binom_pmfs)


[0.5987369392383787, 0.3151247048623047, 0.08613835589931637]


0.9999999999999998

In [25]:
EV = np.array(binom_pmfs) * 200


stat, p_value = st.chisquare(f_obs=OV,f_exp=EV)

print(st.chisquare(f_obs=OV,f_exp=EV))

p_value < alpha 

# The p_value is smaller than our significance level, so we reject the null hypothesis: the observed values are not consistent
# with a binomial distribution with n = 10 and p = 0.05

Power_divergenceResult(statistic=8.30617951954277, pvalue=0.015715783395951168)


True
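
To see where the statistic reported by `st.chisquare` comes from, it can be recomputed by hand as the sum of (observed - expected)² / expected over the bins. This is a minimal sketch; it assumes, following the cell above, that the three observed counts are the numbers of days with 0, 1, and 2 or more defective tires. The names `manual_stat` and `manual_p` are illustrative.

```python
import numpy as np
from scipy import stats

# Observed days, assumed to be for 0, 1, and "2 or more" defective tires.
observed = np.array([138, 53, 9])
n, p = 10, 0.05

# Bin probabilities under Binomial(n, p): P(X = 0), P(X = 1), P(X >= 2).
probs = np.array([stats.binom.pmf(0, n, p),
                  stats.binom.pmf(1, n, p),
                  stats.binom.sf(1, n, p)])
expected = probs * observed.sum()

# Pearson chi-square statistic, computed term by term.
manual_stat = np.sum((observed - expected) ** 2 / expected)

# No parameter was estimated from the data, so df = number of bins - 1.
manual_p = stats.chi2.sf(manual_stat, df=len(observed) - 1)
print(expected)
print(manual_stat, manual_p)
```

The values should agree with the `Power_divergenceResult` printed above, up to floating-point rounding.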

## Question 4
A researcher gathers information about the patterns of Physical Activity of children in the fifth grade of primary school of a public school. He defines three categories of physical activity (Low, Medium, High). He also inquires about the regular consumption of sugary drinks at school, and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate if there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at a level of 5% significance. The results are in the following table: 

![](table4.png)

In [32]:
## Association Test

# H0: consumption of sugary drinks is independent of the level of physical activity
# H1: consumption of sugary drinks depends on the level of physical activity

alpha = 0.05

## we just need the observed values now:


table = np.array([[32,12],
              [14,22],
              [6,9],
              ])

stat, p_value, dof, ev = st.chi2_contingency(table) # returns the statistic, the p-value, the degrees of freedom and the expected counts

print(st.chi2_contingency(table))

(10.712198008709638, 0.004719280137040844, 2, array([[24.08421053, 19.91578947],
       [19.70526316, 16.29473684],
       [ 8.21052632,  6.78947368]]))


In [34]:
p_value < alpha

# therefore we can reject the null hypothesis and say that consumption of sugary drinks and the level of physical activity are associated

True
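
The expected counts returned by `st.chi2_contingency` are the counts implied by independence: each cell is (row total × column total) / grand total. The sketch below rebuilds them from the marginals and recomputes the statistic by hand; the variable names are illustrative, and the table is the same array used above.

```python
import numpy as np
from scipy import stats

table = np.array([[32, 12],
                  [14, 22],
                  [6, 9]])

# Marginal totals, kept 2-D so that they broadcast into a 3x2 grid.
row_totals = table.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = table.sum(axis=0, keepdims=True)  # shape (1, 2)

# Expected count per cell under independence.
expected = row_totals * col_totals / table.sum()

# Pearson chi-square statistic and its degrees of freedom.
manual_stat = np.sum((table - expected) ** 2 / expected)
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
manual_p = stats.chi2.sf(manual_stat, dof)

print(expected)
print(manual_stat, manual_p, dof)
```

With 2 degrees of freedom no continuity correction is applied, so these numbers should match the output of `st.chi2_contingency` shown above.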