In [24]:
import numpy as np
from scipy import stats
from scipy.stats import chi2
from scipy.stats import wilcoxon

# **Exercise 1**

It is known that the average myocardial transit time in healthy individuals is 4.5 sec. Myocardial transit time was measured in 10 patients with occluded right coronary arteries.

1) Test at the 5% significance level whether the mean myocardial transit time of the population of patients with occluded right coronary arteries is equal to 4.5 sec. Present all steps of the test. Comment on whether the conclusions of the test can be extended to all patients with occluded coronary arteries.
2) Test at the 5% significance level whether the standard deviation of the myocardial transit time of the population of patients with occluded right coronary arteries is equal to 0.6 sec.

In [3]:
data = np.array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])
data

array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])

## Question 1

**Steps when the population variance is unknown:** 
- Step 1 : We set the null hypothesis $H_0: \mu=4.5$.
- Step 2 : We set the alternative hypothesis $H_1: \mu \neq 4.5$, since the test is two-tailed.
- Step 3 : We set $t=\sqrt{n}\frac{\bar{x}-4.5}{s}$, which under $H_0$ follows the Student's t distribution with $n-1$ degrees of freedom, where $s$ is the sample standard deviation.
- Step 4 : We select the significance level $\alpha$ and find the corresponding critical value $t_{\alpha/2,n-1}$.
- Step 5 : We calculate the value of $t$ for the sample.
- Step 6 : We reject the null hypothesis in favor of $H_1 : \mu \neq 4.5$ if $|t|>t_{\alpha/2,n-1}$ (two-tailed test).
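
The decision rule in Step 6 can equivalently be expressed with a p-value. The sketch below, which assumes SciPy is available, recomputes the statistic from the Step 3 formula and derives the two-tailed p-value $2P(T_{n-1} > |t|)$. It then cross-checks both numbers against `scipy.stats.ttest_1samp`. The names `mu_0` and `alpha` are illustrative choices, not taken from the exercise.

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])
mu_0 = 4.5     # hypothesised mean under H0
alpha = 0.05   # significance level

n = len(data)
# Step 3 statistic: sqrt(n) * (x_bar - mu_0) / s
t_manual = np.sqrt(n) * (data.mean() - mu_0) / data.std(ddof=1)
# Two-tailed p-value: probability that |T| exceeds |t| under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's one-sample t-test (two-sided by default) as a cross-check
res = stats.ttest_1samp(data, popmean=mu_0)

print(f"t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy: t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
print("Reject H0" if p_manual < alpha else "Do not reject H0")
```

Rejecting when $p < \alpha$ gives the same decision as rejecting when $|t| > t_{\alpha/2,n-1}$.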

In [12]:
a = 0.05
n = len(data)
x_bar = np.mean(data)
s = np.std(data, ddof=1)

In [13]:
t_val = (x_bar - 4.5) / (s / np.sqrt(n))
critical_t = stats.t.ppf(1 - a/2, df=n-1)

print(f"Calculated t-value: {t_val}")
print(f"Critical t-value: {critical_t}")


if abs(t_val) > critical_t:
    print("Reject the null hypothesis.")
    print("There is enough evidence at the 5% significance level to conclude that the mean myocardial transit time of patients with occluded right coronary arteries is different from 4.5 sec.")
else:
    print("Do not reject the null hypothesis.")
    print("""There is not enough evidence at the 5% significance
level to conclude that the mean myocardial transit time
of patients with occluded right coronary arteries is
different from 4.5 sec.""")

Calculated t-value: -0.6816356439122179
Critical t-value: 2.2621571627409915
Do not reject the null hypothesis.
There is not enough evidence at the 5% significance
level to conclude that the mean myocardial transit time
of patients with occluded right coronary arteries is
different from 4.5 sec.


In [14]:
margin_of_error = critical_t * (s / np.sqrt(n))
confidence_interval = (x_bar - margin_of_error, x_bar + margin_of_error)

print(f"95% confidence interval for the mean: {confidence_interval}")

95% confidence interval for the mean: (3.7226305799439445, 4.9173694200560565)
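
The same interval can be obtained directly from SciPy's t distribution. The sketch below uses the sample from this exercise. It also shows the duality between the two-tailed test and the interval: $H_0: \mu = 4.5$ is rejected at the 5% level exactly when 4.5 falls outside the 95% interval. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])
n = len(data)
x_bar = data.mean()
se = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# 95% t-interval: x_bar +/- t_{0.025, n-1} * se
lower, upper = stats.t.interval(0.95, n - 1, loc=x_bar, scale=se)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")

# Test/interval duality for the two-tailed test at the 5% level
mu_0 = 4.5
if lower <= mu_0 <= upper:
    print("4.5 inside CI -> do not reject H0")
else:
    print("4.5 outside CI -> reject H0")
```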


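Not rejecting $H_0$ does not prove that $\mu = 4.5$. It only means the data are compatible with that value, and the 95% confidence interval (3.72, 4.92), which contains 4.5, shows the same thing. The conclusion applies only to the population that was sampled, namely patients with occluded right coronary arteries, and only if the 10 patients can be regarded as a random sample from that population. It cannot be extended to all patients with occluded coronary arteries (for example, those with left coronary occlusions). In addition, the small sample size (n = 10) limits the power of the test, so a real difference from 4.5 sec could go undetected.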

## Question 2

**Steps for the hypothesis test for the variance**

- Step 1: We set the null hypothesis $H_0: \sigma = 0.6$.
- Step 2: We set the alternative hypothesis $H_1: \sigma \neq 0.6$ for the two-tailed test.
- Step 3 : We set $\chi^2=\frac{(n-1)S^2}{\sigma_0^2}$, which under $H_0$ follows the chi-square distribution with $n-1$ degrees of freedom, where $S^2$ is the sample variance and $\sigma_0 = 0.6$.
- Step 4 : We select the significance level $\alpha$ and find the critical points $\chi_{(n-1);\alpha/2}^2$ and $\chi_{(n-1);1-\alpha/2}^2$, where $\chi_{(n-1);p}^2$ denotes the point with upper-tail probability $p$.
- Step 5 : We calculate the value of $\chi^2$ for the sample.
- Step 6 : We reject the null hypothesis in favor of $H_1: \sigma \neq 0.6$ if $\chi^2>\chi_{(n-1);\alpha/2}^2$ or $\chi^2<\chi_{(n-1);1-\alpha/2}^2$ (two-tailed test).
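
As with the t-test, the decision can also be made with a p-value. For a two-tailed variance test, the p-value is twice the smaller tail probability of the observed statistic. The sketch below assumes, as in Steps 1 to 6, that the data are normal and that $\sigma_0 = 0.6$. The names `sigma_0` and `alpha` are illustrative.

```python
import numpy as np
from scipy.stats import chi2

data = np.array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])
sigma_0 = 0.6   # hypothesised standard deviation under H0
alpha = 0.05    # significance level

n = len(data)
# Step 3 statistic: (n - 1) * S^2 / sigma_0^2
stat = (n - 1) * data.var(ddof=1) / sigma_0**2

# Two-tailed p-value: double the smaller of the lower and upper tail areas
p_value = 2 * min(chi2.cdf(stat, df=n - 1), chi2.sf(stat, df=n - 1))

print(f"chi2 = {stat:.4f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```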

In [20]:
S_2 = np.var(data, ddof=1)
chi_squared_stat = (n-1) * S_2 / (0.6**2)

In [21]:
chi2_lower = chi2.ppf(a/2, df=n-1)
chi2_upper = chi2.ppf(1-a/2, df=n-1)


In [22]:
# Two-tailed rejection region: below the lower or above the upper critical value
if chi_squared_stat < chi2_lower or chi_squared_stat > chi2_upper:
    print(f"Reject the null hypothesis at the\n{a*100}% significance level.")
    print(f"The observed chi-squared stat of\n{chi_squared_stat:.4f} is outside the interval\n({chi2_lower:.4f}, {chi2_upper:.4f}).")
else:
    print(f"Do not reject the null hypothesis at the\n{a*100}% significance level.")
    print(f"The observed chi-squared stat of\n{chi_squared_stat:.4f} is inside the interval\n({chi2_lower:.4f}, {chi2_upper:.4f}).")


Do not reject the null hypothesis at the
5.0% significance level.
The observed chi-squared stat of 
17.4333 is inside the interval
(2.7004, 19.0228).
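
Inverting the chi-square statistic gives a confidence interval for $\sigma$ itself:
$\left(\sqrt{(n-1)S^2/\chi^2_{(n-1);\alpha/2}},\ \sqrt{(n-1)S^2/\chi^2_{(n-1);1-\alpha/2}}\right)$. This uses the upper-tail notation from the steps above. The sketch below assumes normal data, as the test does. The hypothesised value 0.6 should fall inside the 95% interval exactly when $H_0$ is not rejected at the 5% level.

```python
import numpy as np
from scipy.stats import chi2

data = np.array([5.1, 5.6, 4.6, 3.8, 4.2, 5.1, 3.1, 3.7, 4.7, 3.3])
alpha = 0.05
n = len(data)
ss = (n - 1) * data.var(ddof=1)  # (n - 1) * S^2

# The larger chi-square quantile gives the lower bound, and vice versa
sigma_low = np.sqrt(ss / chi2.ppf(1 - alpha / 2, df=n - 1))
sigma_high = np.sqrt(ss / chi2.ppf(alpha / 2, df=n - 1))
print(f"95% CI for sigma: ({sigma_low:.4f}, {sigma_high:.4f})")
```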


# **Exercise 2**

A dentist's patients listen to the FM1 radio station during dental treatment because she believes that it calms them down. The dentist conducted a large survey of her patients' stress levels, asking them to rate the stress they felt during dental treatment on a scale of 1 to 10 ("1" meaning no stress and "10" meaning excessive stress). The responses followed a non-normal but symmetric distribution with a median equal to 4. To help her patients relax during dental treatment, the dentist switched from the FM1 radio station to the FM2 radio station. She asked 18 randomly selected patients to rate their stress level during dental treatment on a scale of 1 to 10 while listening to the FM2 radio station. Test at the 10% level of significance whether the FM2 radio station changed the stress level of patients.

**Solution**

The stress scores follow a non-normal but symmetric distribution, and only 18 observations are available for the FM2 radio station. Since the data:
- do not follow a normal distribution,
- are symmetric,
- are discrete (integer scores from 1 to 10),
- have a sample size below 30,

a non-parametric method is appropriate. Because the distribution is symmetric, I will use the Wilcoxon signed-rank test for the median. The hypotheses are $H_0: M = 4$ against $H_1: M \neq 4$, where $M$ is the population median stress level with FM2. The question asks whether the stress level *changed* in either direction, so this is a two-tailed test at the 10% significance level.


In [23]:
data = np.array([8, 2, 6, 3, 4, 3, 4, 3, 10, 1, 6, 3, 9, 3, 6, 6, 4, 6])
data

array([ 8,  2,  6,  3,  4,  3,  4,  3, 10,  1,  6,  3,  9,  3,  6,  6,  4,
        6])

In [29]:
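a = 0.1
# Differences from the hypothesised median of 4
differences = data - 4

# Wilcoxon signed-rank test; by default (zero_method='wilcox') zero differences are discarded
_, pvalue = wilcoxon(differences, alternative='two-sided')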

if pvalue < a:
    print(f"Reject the null hypothesis. The p-value is {pvalue:.4f}")
else:
    print(f"Do not reject the null hypothesis. The p-value is {pvalue:.4f}")

Do not reject the null hypothesis. The p-value is 0.1594
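
To see what `wilcoxon` computes here, the signed-rank statistic can be reproduced by hand. This sketch assumes the behavior of the SciPy version used in this notebook. Zero differences (scores equal to 4) are dropped, and tied absolute differences receive average ranks. Because ties are present, the p-value comes from the normal approximation with a tie-corrected variance and no continuity correction. Under these assumptions, the sketch should reproduce the p-value printed above. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm, rankdata

scores = np.array([8, 2, 6, 3, 4, 3, 4, 3, 10, 1, 6, 3, 9, 3, 6, 6, 4, 6])
median_0 = 4  # hypothesised median under H0

d = scores - median_0
d = d[d != 0]                 # drop zero differences
n = len(d)
ranks = rankdata(np.abs(d))   # average ranks for tied |d|

w_plus = ranks[d > 0].sum()   # sum of ranks of positive differences
w_minus = ranks[d < 0].sum()  # sum of ranks of negative differences

# Normal approximation: mean n(n+1)/4, variance reduced for ties
mean_w = n * (n + 1) / 4
_, tie_counts = np.unique(np.abs(d), return_counts=True)
var_w = n * (n + 1) * (2 * n + 1) / 24 - (tie_counts**3 - tie_counts).sum() / 48
z = (min(w_plus, w_minus) - mean_w) / np.sqrt(var_w)
p_value = 2 * norm.sf(abs(z))

print(f"n = {n}, W+ = {w_plus}, W- = {w_minus}, z = {z:.4f}, p = {p_value:.4f}")
```

Since $p \approx 0.159 > 0.10$, the null hypothesis is not rejected. At the 10% level, there is not enough evidence that the FM2 radio station changed patients' median stress level.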
