## **7.1** Passing proportions

In [15]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.power as smp
import statsmodels.stats.proportion as smprop

![image.png](attachment:image.png)

### **a)** Compute a 95% confidence interval for the difference between the two passing proportions

---

In [12]:


# Proportions for each group
p1 = 82 / 108  # Proportion passed for Course 1
p2 = 104 / 143  # Proportion passed for Course 2

# Sample sizes
n1 = 108
n2 = 143


![image.png](attachment:image.png)

In [13]:

# Standard error of the difference
se_diff = np.sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))

# 95% Confidence interval
z = 1.96  # z* value for 95% confidence

lower = (p1 - p2) - z * se_diff
upper = (p1 - p2) + z * se_diff

print(f"95% Confidence Interval: [{lower:.4f}, {upper:.4f}]")

95% Confidence Interval: [-0.0768, 0.1408]
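
The cell above hard-codes $z^* = 1.96$. As a minimal sketch, the same Wald interval can be wrapped in a small helper that looks up the normal quantile instead. `diff_prop_ci` is a hypothetical name introduced here for illustration, not a library function, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Wald confidence interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference, as in the cell above
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided standard normal quantile (about 1.96 for conf=0.95)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


lower, upper = diff_prop_ci(82, 108, 104, 143)
print(f"95% Confidence Interval: [{lower:.4f}, {upper:.4f}]")
```

Since the interval covers 0, the two passing proportions do not differ significantly at the 5% level.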


### **b)** What is the critical value for the $χ^2$-test of the hypothesis
### H0 : p1 = p2 with significance level α = 0.01?

---

### Steps:

1. **Hypothesis test setup**
   
   ![image.png](attachment:image.png)
   

2. **Degrees of freedom**
   
   ![image-2.png](attachment:image-2.png)

3. **Critical value for $χ^2$**
   
   ![image-3.png](attachment:image-3.png)

In [17]:
critical_value = stats.chi2.ppf(0.99, df=1)
print(critical_value)

6.6348966010212145
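
With one degree of freedom, the $χ^2$ critical value is the square of the two-sided standard normal critical value, which ties this test to the two-sample z-test for proportions. A quick check of that relationship, as a sketch using `scipy.stats`:

```python
from scipy import stats

alpha = 0.01
chi2_crit = stats.chi2.ppf(1 - alpha, df=1)  # chi-square critical value, df = 1
z_crit = stats.norm.ppf(1 - alpha / 2)       # two-sided z critical value

print(chi2_crit)
print(z_crit ** 2)
```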


### **c)** If the passing proportion for a course given repeatedly is assumed to be 0.80 on average, and there are 250 students taking the exam each time, what are the expected value, µ, and the standard deviation, σ, of the number of students who do not pass the exam for a randomly selected course?

---

### We have:

* Passing proportion: $p=0.80$

* Failing proportion: $q=1-p=0.20$

* Total number of students: $n=250$

### The expected value for the number of students who fail the exam is calculated as:

$\mu = n \cdot q$

### Standard deviation ($\sigma$)

$\sigma = \sqrt{n \cdot p \cdot q}$

In [22]:
print(250*0.20)

print(np.sqrt(250*0.20*0.8))

50.0
6.324555320336759
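
The two formulas are the mean and standard deviation of a binomial distribution. Assuming the students pass or fail independently of each other, the number who fail is Binomial(250, 0.20), and the same numbers can be read off `scipy.stats.binom` as a cross-check:

```python
from scipy import stats

n, q = 250, 0.20  # q is the probability that a student does not pass

mu = stats.binom.mean(n, q)    # n * q
sigma = stats.binom.std(n, q)  # sqrt(n * q * (1 - q))
print(mu, sigma)
```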


---

## **7.2** Outdoor lighting

![image.png](attachment:image.png)

### **a)** Is there a significant difference between the proportion exported and the proportion sold in Denmark (with α = 0.05)?

---


![image.png](attachment:image.png)

### We have:

* Total lamps sold: $n = 250$
* Number sold in Denmark: $110$
* Observed proportion: $\hat{p} = \frac{110}{250} = 0.44$
* Hypothesized proportion: $p_0 = 0.5$
* Significance level: $\alpha = 0.05$

### The test statistic for a one-sample proportion test is calculated as:

$$
z_{\text{obs}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}
$$



In [None]:
# Share of all lamps sold in Denmark and exported, in percent
denmark_percent = 7.2+28.0+8.8

export_percent = 6.4+34.8+14.8

print(denmark_percent)
print(export_percent)



44.0
56.0


In [None]:
# Now that we have the percentages, convert them to counts:

denmark = (250/100)*44

export = (250/100)*56
print(denmark)
print(export)

110.0
140.0


In [26]:
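# Based on the 140 exported lamps out of 250; the 110 sold in Denmark give the same value with opposite sign
z_obs = (140-250*0.5)/np.sqrt(250*0.5*0.5)
print(z_obs)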

1.8973665961010275


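### Substituting the values:

$$
z_{\text{obs}} = \frac{0.44 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{250}}} \approx -1.90
$$

The code cell above uses the exported count ($140$ of $250$) instead, which gives $z_{\text{obs}} \approx 1.90$. Only the sign differs, so $|z_{\text{obs}}| = 1.90$ either way.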

### Critical Value

At a significance level of $\alpha = 0.05$ for a two-tailed test:

$$
z_{\alpha/2} = \pm 1.96
$$

### Decision

- If $|z_{\text{obs}}| > 1.96$, reject the null hypothesis.
- If $|z_{\text{obs}}| \leq 1.96$, fail to reject the null hypothesis.

Here, $|z_{\text{obs}}| = 1.90$, which is less than $1.96$.

### Conclusion

There is **no significant difference** between the proportion of lamps sold in Denmark and the proportion exported at the $5\%$ significance level: the Danish share ($44\%$) does not differ significantly from the hypothesized $50\%$.
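
The same decision can be reached through a p-value rather than a critical value. A minimal sketch with the standard library, using the counts stated above:

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 250, 110, 0.5  # total lamps, lamps sold in Denmark, H0 proportion

z_obs = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
# Two-sided p-value: probability of a result at least as extreme in either tail
p_value = 2 * NormalDist().cdf(-abs(z_obs))
print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
```

A p-value above 0.05 corresponds to $|z_{\text{obs}}| < 1.96$, so both approaches lead to the same conclusion.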

### **b)** The relevant critical value to use for testing whether there is a significant difference in how the sold variants are distributed in Denmark and for export is (with α = 0.05)?

---

### We have:

For this part, we are performing a **chi-squared test for independence** to determine whether there is a significant difference in the distribution of sold variants between Denmark and export.

### Step 1: Hypotheses

* **Null Hypothesis ($H_0$)**: The distribution of sold variants is the same in Denmark and for export.
* **Alternative Hypothesis ($H_A$)**: The distribution of sold variants differs between Denmark and export.

### Step 2: Degrees of Freedom (df)

The degrees of freedom for a chi-squared test are calculated as:

$$
\text{df} = (r - 1)(c - 1)
$$

Where:
- $r = 3$ (number of rows: copper, painted, stainless steel)
- $c = 2$ (number of columns: Denmark, export)

Thus:

$$
\text{df} = (3 - 1)(2 - 1) = 2
$$
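
As a sketch, the table dimensions can also be read off the data. The percentages are the ones summed in part a). How they pair up into rows, and the row order (copper, painted, stainless steel), are assumptions made here for illustration.

```python
# Percentages of all 250 lamps; columns are (Denmark, export).
# Row pairing and row order are assumptions.
percent = [
    [7.2, 6.4],
    [28.0, 34.8],
    [8.8, 14.8],
]
n_total = 250

# Convert percentages to whole-number counts
counts = [[round(p / 100 * n_total) for p in row] for row in percent]

r, c = len(counts), len(counts[0])
df = (r - 1) * (c - 1)

print(counts)
print(df)
```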

### Step 3: Critical Value

For $\alpha = 0.05$ and $\text{df} = 2$, the critical value can be found using a chi-squared distribution table or Python:

```python
from scipy.stats import chi2

critical_value = chi2.ppf(0.95, df=2)
print(critical_value)
```

In [27]:


critical_value = stats.chi2.ppf(0.95, df=2)
print(critical_value)

5.991464547107979
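
For two degrees of freedom the critical value also has a closed form: the $χ^2$ distribution with df = 2 has survival function $e^{-x/2}$, so the critical value is $-2\ln\alpha$. A short check using only the standard library:

```python
from math import log

alpha = 0.05
# Solve exp(-x / 2) = alpha for x
critical_value = -2 * log(alpha)
print(critical_value)
```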


---

## **7.3** Local elections

![image.png](attachment:image.png)


### **a)** At the time of the exit poll, p was of course not known. If the following hypothesis was tested based on the exit poll

### H0 : p = 0.295
### H1 : p ≠ 0.295,

### what test statistic and conclusion would then be obtained with α = 0.001?

---

In [None]:
from scipy.stats import norm
import numpy as np


n = 740  # Total sample size
p0 = 0.295  # Hypothesized proportion
x = 168  # Observed number of successes

# Calculate the sample proportion
p_hat = x / n

# Calculate the test statistic (z_obs)
z_obs = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
print(f"Test Statistic (z_obs): {z_obs:.2f}")

# Critical value for two-tailed test at significance level 0.001 (alpha = 0.001)
z_critical = norm.ppf(1 - 0.001 / 2)  # two-tailed, so alpha is split between the two tails
print(f"Critical value (z_critical): ±{z_critical:.3f}")

# Conclusion
if abs(z_obs) > z_critical:
    print("Conclusion: Reject the null hypothesis.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")


Test Statistic (z_obs): -4.05
Critical value (z_critical): ±3.291
Conclusion: Reject the null hypothesis.


### **b)** Calculate a 95%-confidence interval for p based on the exit poll.

---


In [None]:

p_hat = 0.227  # observed proportion, 168/740 rounded to three decimals
n = 740  # sample size of the exit poll
z_critical = 1.96  # z* value for 95% confidence


margin_of_error = z_critical * np.sqrt(p_hat * (1 - p_hat) / n)

CI = p_hat + np.array([-margin_of_error, margin_of_error])
print(CI)


[0.19681836 0.25718164]
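
The cell above starts from the rounded value 0.227. A sketch of the same interval computed from the raw counts in part a) (168 of 740), with the normal quantile looked up instead of typed in:

```python
from math import sqrt
from statistics import NormalDist

x, n = 168, 740
p_hat = x / n                    # unrounded sample proportion
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"[{lower:.4f}, {upper:.4f}]")
```

The result agrees with the interval above to about three decimals.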


### **c)** Based on a scenario where the proportion voting for a particular party is around 30%, how large an exit poll should be taken to achieve a 99% confidence interval with an average width of 0.01 for this proportion?

---

![image.png](attachment:image.png)

In [34]:
import numpy as np

# Given values
p = 0.30  # Estimated proportion
width = 0.01  # Desired width of confidence interval
z_alpha2 = 2.576  # z-value for 99% confidence

# Calculate required sample size
n = (2 * z_alpha2 * np.sqrt(p * (1 - p)) / width) ** 2
print(n)
print(np.ceil(n))  # round up to the next whole number of respondents


55740.51840000001
55741.0
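
The sample-size formula used in the cell above follows from setting the expected interval width equal to the target. A short derivation, where $w$ denotes the width (a symbol introduced here for convenience):

$$
w = 2\, z_{1-\alpha/2} \sqrt{\frac{p(1 - p)}{n}}
\quad\Longrightarrow\quad
n = p(1 - p) \left( \frac{2\, z_{1-\alpha/2}}{w} \right)^2
$$

With $p = 0.30$, $z_{1-\alpha/2} = 2.576$ and $w = 0.01$:

$$
n = 0.21 \cdot \left( \frac{2 \cdot 2.576}{0.01} \right)^2 = 0.21 \cdot 515.2^2 \approx 55740.5
$$

which is rounded up to 55741.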


---

## **7.4** Sugar quality

![image.png](attachment:image.png)


### **a)** If the following hypothesis
### H0 : pA = pB,
### H1 : pA ≠ pB
### is tested at a significance level of 5%, what is the p-value and conclusion?

---

In [None]:


# Data for Supplier A and Supplier B
count = [6, 12]  # Number of defective bags for each supplier
nobs = [50, 50]  # Total bags inspected for each supplier

# Perform two-proportion z-test
z_obs, p_value = smprop.proportions_ztest(count, nobs, alternative='two-sided')

# Display results
print(f"Test Statistic (z): {z_obs:.4f}")
print(f"P-value: {p_value:.4f}")

# Conclusion
alpha = 0.05

if p_value < alpha:
    print("Conclusion: Reject the null hypothesis. The defective rates differ significantly.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant difference in defective rates.")


Test Statistic (z): -1.5617
P-value: 0.1183
Conclusion: Fail to reject the null hypothesis. No significant difference in defective rates.
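
The printed statistic matches a z-test that pools the two samples when estimating the standard error under H0. As a sketch, the same numbers can be reproduced by hand with the standard library:

```python
from math import sqrt
from statistics import NormalDist

x_a, n_a = 6, 50   # defective bags and bags inspected, supplier A
x_b, n_b = 12, 50  # defective bags and bags inspected, supplier B

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)  # common proportion assumed under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z_obs = (p_a - p_b) / se
p_value = 2 * NormalDist().cdf(-abs(z_obs))  # two-sided p-value
print(f"z = {z_obs:.4f}, p-value = {p_value:.4f}")
```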


### **b)** A supplier has delivered 200 bags, of which 36 were defective. A 99% confidence interval for p, the proportion of defective bags for this supplier, is:

---

In [36]:
print(smprop.proportion_confint(36, 200, alpha=0.01))


(0.11002462081879329, 0.2499753791812067)


### **c)** Based on the scenario, that the proportion of defective bags for a new supplier is about 20%, a new study was planned with the aim of obtaining an average width, B, of a 95% confidence interval. The Analysis Department achieved the result that one should examine 1537 bags, but had forgotten to specify which value for the width B, they had used. What was the value used for B?

---

![image.png](attachment:image.png)

In [37]:


# Given values
n = 1537  # Sample size
p = 0.20  # Proportion of defective bags
z_alpha2 = 1.96  # z-value for 95% confidence

# Calculate the width B
B = 2 * z_alpha2 * np.sqrt((p * (1 - p)) / n)
print(f"The width of the confidence interval, B, is: {B:.4f}")


The width of the confidence interval, B, is: 0.0400
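
As a sanity check, the width found above can be fed back into the sample-size formula. If $B = 0.04$ is the value the Analysis Department used, the formula should return their 1537 bags. A minimal sketch:

```python
from math import ceil, sqrt

p, z, B = 0.20, 1.96, 0.04  # scenario proportion, z for 95%, candidate width

n_required = (2 * z * sqrt(p * (1 - p)) / B) ** 2
print(n_required, ceil(n_required))
```

Rounding the result up to a whole number of bags recovers the reported sample size.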


---

## **7.5** Physical training

![image.png](attachment:image.png)


### **a)** What is the expected number of individuals with above average training condition and good job success under H0 (i.e. if H0 is assumed to be true)?

---

![image.png](attachment:image.png)

In [None]:
# Given values
row_total = 63  # Total for Good job success
col_total = 80  # Total for Above average training condition
grand_total = 200  # Grand Total

# Calculate expected frequency
E = (row_total * col_total) / grand_total
print(f"Expected frequency: {E:.2f}")


Expected frequency: 25.20


### **b)** For the calculation of the relevant $χ^2$-test statistic, identify the following two numbers:

### – A: the number of contributions to the test statistic
### – B: the contribution to the statistic from table cell (1,1)

---

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

### **c)** The total $χ^2$-test statistic is 10.985, so the p-value and the conclusion will be (both must be valid):

---

In [39]:


# Given values
chi2_stat = 10.985  # Test statistic
df = 4  # Degrees of freedom

# Calculate p-value
p_value = 1 - stats.chi2.cdf(chi2_stat, df)
print(f"P-value: {p_value:.4f}")

# Conclusion
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject the null hypothesis. There is a significant relationship.")
else:
    print("Conclusion: Fail to reject the null hypothesis. No significant relationship.")


P-value: 0.0267
Conclusion: Reject the null hypothesis. There is a significant relationship.
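
For an even number of degrees of freedom the $χ^2$ tail probability has a closed form. With df = 4, as used above, it is $e^{-x/2}\,(1 + x/2)$, which allows a quick check of the p-value without SciPy:

```python
from math import exp

chi2_stat = 10.985
half = chi2_stat / 2

# Upper tail probability of a chi-square distribution with df = 4
p_value = exp(-half) * (1 + half)
print(f"P-value: {p_value:.4f}")
```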
