# Statistical Data Management Session 9: Inferences Based on a Single Sample: Tests of Hypothesis (chapter 8 in McClave & Sincich)

## 1. The European ℮ Standard

Prepackaged items in the EU may bear the ℮-mark to show that they conform to EU weight standards (see https://europa.eu/youreurope/business/product-requirements/labels-markings/emark/index_en.htm).

1. To test the claim of your favourite crisp brand that their packages contain 120g, you weigh the contents of 20 packages and find $\bar{x} = 119.5$ and $s=0.8$. Is this brand complying with EU regulations? The Council Directive of 20 January 1976 "on the approximation of the laws of the Member States relating to the making-up by weight or by volume of certain prepackaged products" OJ L 046 21.2.1976, p. 1 stipulates a one-sided t-test at significance level $\alpha = 0.005$. You may assume the weights follow a normal distribution.

    $H_0: \mu = 120$. We test whether the packages contain less than stipulated on average, so we perform a one-sided test: $H_a: \mu < 120$.

In [None]:
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as sts
import time
%matplotlib inline

n = 20
x_bar = 119.5
s = 0.8
mu_0 = 120

alpha = 0.005
t_distribution = sts.t(n - 1)
t_alpha = t_distribution.ppf(alpha)

t = np.sqrt(n) * (x_bar - mu_0) / s

print("Critical t value:", t_alpha)
print("Obtained t value:", t)
# The t-value is not smaller than t_alpha, so don't reject H_0.
# You cannot claim that the packages are not filled enough according to EU regulations.

2. Calculate and interpret the $p$-value of the test.

In [None]:
p_value = t_distribution.cdf(t)
print(p_value)

This is the probability, if $H_0$ is true, of obtaining a test statistic at least as small as the one observed, i.e. of a sample of size 20 showing a deficiency at least as large as the one behind $\bar{x} = 119.5$. Note that $p>\alpha$, as expected, since the test was not significant. In other words, a deficiency this large (relative to the required 120) is not unusual enough under $H_0$ to rule out chance at the chosen threshold.

3. Now assume you have more information about the production process, implying that we know that the machines that fill the packages do this with a standard deviation $\sigma = 0.8$. Perform the test again!

    $\sigma$ is known and the weights are assumed to be normal, so you may use the $z$-statistic of the large-sample test, even though $n = 20$ is small!

In [None]:
n = 20
x_bar = 119.5
sigma = 0.8
mu_0 = 120

alpha = 0.005
standard_normal = sts.norm(0,1)
z_alpha = standard_normal.ppf(alpha)

z = np.sqrt(n) * (x_bar - mu_0) / sigma

print("Critical z value:", z_alpha)
print("Obtained z value:", z)
# Now the z-value is smaller than z_alpha, so reject H_0 in favour of H_a.
# Note that knowledge of this population parameter, rather than having to estimate it as well from the sample,
# leads to (potentially) stronger conclusions!

# Same with p-value:
print(standard_normal.cdf(z))
# p-value < alpha => significant
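
As a cross-check of the two tests above, the sketch below wraps both computations in one helper that works from summary statistics only. The function name `one_sided_lower_test` and its parameters are illustrative assumptions, not part of the course material.

```python
import numpy as np
from scipy import stats as sts

def one_sided_lower_test(x_bar, mu_0, sd, n, alpha, sigma_known=False):
    """Lower-tailed test of H0: mu = mu_0 against Ha: mu < mu_0 (hypothetical helper)."""
    statistic = np.sqrt(n) * (x_bar - mu_0) / sd
    # Known sigma -> standard normal reference; estimated s -> t with n - 1 degrees of freedom
    reference = sts.norm(0, 1) if sigma_known else sts.t(n - 1)
    critical = reference.ppf(alpha)
    p_value = reference.cdf(statistic)
    return statistic, critical, p_value, statistic < critical

t_result = one_sided_lower_test(119.5, 120, 0.8, 20, 0.005)
z_result = one_sided_lower_test(119.5, 120, 0.8, 20, 0.005, sigma_known=True)
print("t-test (s estimated):", t_result)
print("z-test (sigma known):", z_result)
```

Both calls use the same statistic; only the reference distribution changes. The $t$ distribution has heavier tails, so its critical value lies further from zero, which is why the same data reject $H_0$ only when $\sigma$ is known.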

## 2. Comparing Exam UML Tasks

Say that for an *Object Oriented Software Development* exam, it is known that an expected proportion of 75% of participants passes the UML modelling task. In the interest of fairness, the teachers of this course want to check whether different exams are comparable. In the file ``uml.csv`` in the ``shared`` folder, you find scores, out of 64, of an exam UML task. Run the cell below to compute the proportion of students who passed this exam.

In [None]:
df_uml = pd.read_csv("../../shared/uml.csv")
n = len(df_uml)
p_hat = len(df_uml[df_uml["MarksUML"] >= 32]) / n
print("Number of participants:            ", n)
print("Proportion of students who passed: ", p_hat)

Perform a test at significance level $\alpha = 0.05$ to check whether this pass-rate is significantly different from $75\%$.

$H_0: p = 0.75$, we test whether results are different so perform a two-sided test: $H_a: p \neq 0.75$.

In [None]:
p_0 = 0.75
q_0 = 1 - p_0
# For a hypothesis test, check the large-sample condition with the hypothesised proportion p_0
print("Large sample test OK? =>", n * p_0 >= 15 and n * q_0 >= 15)

alpha = 0.05
standard_normal = sts.norm(0,1)
z_alpha = standard_normal.ppf(1 - alpha/2) # two-sided: "divide uncertainty over two tails"

z = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_0 * q_0)
print("Critical z values:", -z_alpha, z_alpha)
print("Obtained z value:", z)

As $-z_{\alpha/2} < z < z_{\alpha/2}$, we do not reject $H_0$ and conclude that the pass-rate is not significantly different from 0.75 at level of significance $\alpha = 0.05$.
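
The same decision can be read from a two-sided $p$-value. The sketch below uses hypothetical counts (150 passes out of 210 participants), since the actual numbers depend on `uml.csv`; the helper name `proportion_z_test` is likewise an assumption for illustration.

```python
import numpy as np
from scipy import stats as sts

def proportion_z_test(successes, n, p_0):
    """Two-sided large-sample z-test of H0: p = p_0 (illustrative helper)."""
    p_hat = successes / n
    z = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_0 * (1 - p_0))
    # Two-sided: probability of a statistic at least as extreme in either tail
    p_value = 2 * sts.norm(0, 1).sf(abs(z))
    return z, p_value

# Hypothetical counts, NOT the values from uml.csv
z_demo, p_demo = proportion_z_test(successes=150, n=210, p_0=0.75)
print("z:", z_demo)
print("two-sided p-value:", p_demo)
print("Reject H_0 at alpha = 0.05?", p_demo < 0.05)
```

Comparing the $p$-value with $\alpha$ always gives the same decision as comparing $z$ with $\pm z_{\alpha/2}$.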

## 3. List Performance

How long does it take Python to do an operation on a huge list? Repeated simulations lead me to hazard the opinion that the function ``fill_with_ones()`` defined below takes 0.025 seconds to run in my Notebook interpreter, when applied to a list of a million items. We will test whether your hub performs worse (i.e. has a longer mean execution time).

1. To check this, we need a data set. One execution of a function is not representative as the execution time depends on other processes as well. To overcome this problem, the code below executes the function call to ``fill_with_ones()`` 100 times. Run the code to generate your data set.

In [None]:
def fill_with_ones(array): #      silly function that simply overwrites all entries in an array with ones
    for i in range(len(array)):
        array[i] = 1

dummy_array = [0]*1000000 #       define an array with a million zeroes
times = np.empty(100) #           array to catch the time it takes for 100 simulations

for i in range(100): #                  do this 100 times
    start = time.perf_counter() #       log a high-resolution timestamp before the function call
    fill_with_ones(dummy_array) #       call the function
    end = time.perf_counter() #         log the timestamp again, after the function call
    print(end - start) #                print the time difference
    times[i] = end - start #            save the time difference
print("Mean:", times.mean())

2. Formulate $H_0$ and $H_a$.

    $H_0: \mu = 0.025$, we test whether your machine performs worse, so perform a one-sided test: $H_a: \mu > 0.025$.

3. Perform the test at significance level $\alpha = 0.01$.

    $z_\alpha = 2.3263$ (calculated below).
    $z = \sqrt{n}\,(\bar{x} - 0.025)/s = 10(\bar{x} - 0.025)/s$, with your sample statistics $\bar{x}$ and $s$. If $z > z_\alpha$, reject $H_0$ in favour of $H_a$, else don't reject $H_0$, at 0.01 significance.

    In Python:

In [None]:
standard_normal = sts.norm(0,1)
z_alpha = standard_normal.ppf(1 - 0.01)
n = 100
x_bar = times.mean()

s = times.std(ddof=1) # sample standard deviation (n - 1 in the denominator); NumPy defaults to ddof=0
mu_0 = 0.025

z = np.sqrt(n) * (x_bar - mu_0) / s

print("Critical z value:", z_alpha)
print("Obtained z value:", z)
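
With $n = 100$ the normal approximation is reasonable, but the same hypothesis can also be checked with SciPy's one-sample $t$-test, which uses the sample standard deviation with $n - 1$ in the denominator. A minimal sketch, assuming SciPy 1.6 or later (for the `alternative` argument) and using simulated timings as a stand-in for your own `times` array:

```python
import numpy as np
from scipy import stats as sts

rng = np.random.default_rng(0)
# Simulated stand-in for `times`: 100 hypothetical run times in seconds
sim_times = rng.normal(loc=0.027, scale=0.004, size=100)

mu_0 = 0.025
result = sts.ttest_1samp(sim_times, popmean=mu_0, alternative="greater")

# Manual statistic for comparison, using the sample standard deviation (ddof=1)
t_manual = np.sqrt(len(sim_times)) * (sim_times.mean() - mu_0) / sim_times.std(ddof=1)

print("t statistic (SciPy): ", result.statistic)
print("t statistic (manual):", t_manual)
print("one-sided p-value:   ", result.pvalue)
```

To apply this to your own measurements, pass `times` instead of `sim_times`; reject $H_0$ at the 0.01 level if the reported $p$-value is below 0.01.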

## 4. Birth Weight

In last week's exercises, we calculated a $90\%$ confidence interval, based on $n=42$ and $\bar{x}=3.31$, for babies' birth weight: $[3.16, 3.47]$. Assume that these data were obtained from a sample in one hospital. We want to test, at $\alpha=0.05$, whether the weight of babies born in this hospital is significantly less than the national average, which is $\mu = 3.4$ kg.

1. Comment on the following reasoning: "$3.4$ lies more to the right ($3.4>3.31$) in this interval, so the birth weight in this hospital is indeed significantly less than in the national population."

    This is a wrong conclusion: while the average in this sample is less than the population mean, we don't know whether it is **significantly** less, i.e. if we are (at a certain level of certainty) no longer willing to attribute the observed difference to chance.
    
2. Draw the correct conclusion based on the given confidence interval.

    $H_0: \mu=3.4$, we test whether the observed weight is significantly less, so perform a one-sided test: $H_a: \mu<3.4$. 
    
    As $\alpha=0.05$ for a one-sided test, a (two-sided) $90\%$ confidence interval, which leaves $5\%$ in each tail, solves the question!
    
    Conclusion: as $3.4 \in [3.16, 3.47]$, we cannot reject $H_0$, the observed difference is not significant at $\alpha=0.05$.
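
The interval also lets us recover an approximate test statistic. The sketch below assumes the reported interval is a $t$-interval $\bar{x} \pm t_{0.05,\,41}\, s/\sqrt{n}$ and works from its rounded endpoints, so the recovered standard error is only approximate.

```python
import numpy as np
from scipy import stats as sts

n = 42
x_bar = 3.31
ci_low, ci_high = 3.16, 3.47   # reported 90% confidence interval (rounded)
mu_0 = 3.4
alpha = 0.05

t_dist = sts.t(n - 1)
t_crit = t_dist.ppf(1 - alpha)               # same multiplier as in the 90% interval
# Assumption: half-width of the interval = t_crit * s / sqrt(n)
std_error = (ci_high - ci_low) / 2 / t_crit

t = (x_bar - mu_0) / std_error
print("Approximate t value:", t)
print("Critical t value:   ", -t_crit)
print("Reject H_0?", t < -t_crit)
```

The approximate $t$ value lies above the lower critical value, which agrees with the conclusion drawn from the interval.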

## 5. SQL Recap

The file ``uml.sql`` provided on Toledo contains the information used in exercise 2: student q-numbers and scores. Note that certain students occur twice, e.g. q-number 114 with scores 14 and 15. In that case, their answer was spread over multiple pages and their score is the sum of these individual numbers. Import the file using MySQL Workbench and write the appropriate queries to retrieve the relevant information. Re-run your analysis (without running the cell which defined the dataframe!) to check whether you have the correct information.

In [None]:
conn = sqlite3.connect("../../shared/uml.db")

# should generate a single result, namely the number of students
query_total = """
SELECT COUNT(DISTINCT q_number) AS total FROM marks
"""

# should generate a single result, namely the number of students (q-numbers) who passed
query_passed = """
SELECT COUNT(DISTINCT q_number) AS passed
FROM marks
WHERE q_number IN (
    SELECT q_number FROM marks GROUP BY q_number HAVING SUM(MarksUML) >= 32
)
"""

df_total = pd.read_sql_query(query_total, conn)
df_passed = pd.read_sql_query(query_passed, conn)
print(df_total)
print(df_passed)

# Note that our analysis still works!
# Each query returns a one-row dataframe, so extract the scalar values
n = df_total["total"].iloc[0]
p_hat = df_passed["passed"].iloc[0] / n

q_hat = 1 - p_hat

p_0 = 0.75
q_0 = 1 - p_0

alpha = 0.05
standard_normal = sts.norm(0,1)
z_alpha = standard_normal.ppf(1 - alpha/2) # two-sided: "divide uncertainty over two tails"

z = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_0 * q_0)
print("Critical z values:", -z_alpha, z_alpha)
print("Obtained z value:", z)
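
To see why the scores are summed per q-number before the pass mark is applied, here is a small self-contained sketch on an in-memory SQLite database. The four rows are made up for illustration; only the table and column names (`marks`, `q_number`, `MarksUML`) are taken from the queries above.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE marks (q_number INTEGER, MarksUML INTEGER)")
demo_conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [(101, 40), (102, 20), (103, 18), (103, 17)],  # 103's answer spans two pages
)

# One row per student, with the page scores added up
query_scores = """
SELECT q_number, SUM(MarksUML) AS total_score
FROM marks
GROUP BY q_number
"""
df_scores = pd.read_sql_query(query_scores, demo_conn)
demo_conn.close()

n_demo = len(df_scores)
passed_demo = int((df_scores["total_score"] >= 32).sum())
print(df_scores)
print("Students:", n_demo, "Passed:", passed_demo)
```

Without the `GROUP BY`, student 103 would be counted as two failing rows (18 and 17) instead of one passing student with a total of 35.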