# Chapter 7: Inference for numerical data

### Exercise 7.1 - Identify the critical $t$.

* (a) $df = 6 - 1 = 5,\ t_{5} = 2.015$
* (b) $df = 21 - 1 = 20,\ t_{20} = 2.528$
* (c) $df = 29 - 1 = 28,\ t_{28} = 2.048$
* (d) $df = 12 - 1 = 11,\ t_{11} = 3.106$
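The critical values above can be reproduced with SciPy's `t.ppf`. This is a minimal sketch. The confidence levels (90%, 98%, 95%, 99%) are not restated in these notes. They are assumed here because they are the levels that reproduce the listed values.

```python
from scipy.stats import t

# (sample size, confidence level); the levels are assumed, chosen because
# they reproduce the critical values listed above
cases = {"a": (6, 0.90), "b": (21, 0.98), "c": (29, 0.95), "d": (12, 0.99)}

critical = {}
for part, (n, cl) in cases.items():
    df = n - 1
    # two-sided interval: (1 - cl) / 2 of the probability in each tail
    critical[part] = (df, t.ppf(1 - (1 - cl) / 2, df))
    print(f"({part}) df = {df}, t* = {critical[part][1]:.3f}")
```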

### Exercise 7.2 - $t$-distribution.

The more degrees of freedom a $t$-distribution has, the closer it is to the normal distribution. With fewer degrees of freedom, its tails are heavier. So the dotted curve, which is the furthest from normal, is the $t$-distribution with 1 degree of freedom. The dashed curve is the $t$-distribution with 5 degrees of freedom, and the solid curve is the standard normal distribution.
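One way to see this ordering numerically is to compare how much probability each curve leaves beyond $\pm 2$. The threshold of 2 is an arbitrary choice made for illustration.

```python
from scipy.stats import norm, t

# P(|X| > 2) under each curve: heavier tails leave more probability out there
tails = {
    "t, df = 1": 2 * t.sf(2, 1),
    "t, df = 5": 2 * t.sf(2, 5),
    "normal": 2 * norm.sf(2),
}
for name, p in tails.items():
    print(f"{name}: P(|X| > 2) = {p:.4f}")
```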

### Exercise 7.3 - Find the p-value, Part I.

* (a) Null hypothesis not rejected.
* (b) Null hypothesis rejected.
* (c) Null hypothesis not rejected.
* (d) Null hypothesis rejected.
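Each decision above compares a two-sided p-value, computed from the $t$-distribution with $n - 1$ degrees of freedom, with $\alpha = 0.05$. The small helper below is a sketch of that procedure. The function name and the inputs in the example call are hypothetical and do not come from the exercise.

```python
from scipy.stats import t

def two_sided_t_test(n, T, alpha=0.05):
    """Return the two-sided p-value for statistic T with df = n - 1,
    and whether H0 is rejected at level alpha."""
    df = n - 1
    p_value = 2 * t.sf(abs(T), df)
    return p_value, p_value < alpha

# Hypothetical inputs, for illustration only
p, reject = two_sided_t_test(n=15, T=2.5)
print(f"p-value = {p:.4f}, reject H0: {reject}")
```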


### Exercise 7.4 - Find the p-value, Part II.

* (a) Null hypothesis not rejected.
* (b) Null hypothesis not rejected.

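```python
from scipy.stats import t

a = 0.01   # significance level
n = 18     # sample size
T = 0.5    # test statistic

df = n - 1

# Two-sided p-value, and whether it falls below the significance level
p_value = (1 - t.cdf(abs(T), df)) * 2
p_value, p_value < a
```

```
(0.6234852065657115, False)
```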

### Exercise 7.5 - Working backwards, Part I.

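The sample mean is the midpoint of the interval, $\bar{x} = 20$. Since $MOE = t^\star \times SE$ and $SE = s / \sqrt{n}$, the sample standard deviation is $s = \sqrt{n} \times MOE / t^\star \approx 3.00$.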
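A minimal sketch of working backwards from a $t$-based interval. The inputs are assumed to be the textbook exercise's 95% interval $(18.985, 21.015)$ from $n = 36$ observations, which are not restated in these notes. The function name `work_backwards` is hypothetical.

```python
from math import sqrt
from scipy.stats import t

def work_backwards(lower, upper, n, cl):
    """Recover the sample mean and standard deviation from a t-based CI."""
    mean = (lower + upper) / 2          # the interval is centred on x-bar
    moe = (upper - lower) / 2           # margin of error = t* * SE
    t_star = t.ppf(1 - (1 - cl) / 2, n - 1)
    se = moe / t_star
    s = se * sqrt(n)                    # SE = s / sqrt(n)
    return mean, s

# Assumed inputs: the exercise's 95% interval and sample size
mean, s = work_backwards(18.985, 21.015, 36, 0.95)
print(f"mean = {mean:.2f}, s = {s:.2f}")
```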

### Exercise 7.6 - Working backwards, Part II.

The sample mean is $\bar{x} = 61$ and the margin of error is $MOE = t^\star \times SE = 1.71 \times SE = 6$. So $SE = 6 / 1.71 \approx 3.51$, and the sample standard deviation is $s = SE \times \sqrt{n} \approx 17.53$.
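The standard deviation can be recovered from the margin of error in a few lines. This sketch assumes $n = 25$ and a 90% confidence level. Both are inferred, not stated above. They are the values consistent with $t^\star = 1.71$ and $s = 17.53$.

```python
from math import sqrt
from scipy.stats import t

moe = 6        # margin of error, from the text above
n = 25         # assumed: sample size consistent with t* = 1.71 and s = 17.53
cl = 0.90      # assumed: confidence level that gives t*_24 = 1.71

t_star = t.ppf(1 - (1 - cl) / 2, n - 1)
se = moe / t_star          # MOE = t* * SE
s = se * sqrt(n)           # SE = s / sqrt(n)
print(f"t* = {t_star:.2f}, SE = {se:.2f}, s = {s:.2f}")
```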

### Exercise 7.7 - Sleep habits of New Yorkers.

* (a) $H_0: \mu_{NY} = 8$, i.e. New Yorkers sleep 8 hours per night on average. $H_A: \mu_{NY} \neq 8$, i.e. New Yorkers sleep a different amount of time on average.
* (b) Normality seems to be met. $SE = s / \sqrt{n} = 0.77 / 5 = 0.154$, $df = 25 - 1 = 24$, and the test statistic is $T = -1.75$, which leads to a two-sided p-value of $0.092$.
* (c) If New Yorkers really slept 8 hours per night on average, the probability of observing a sample mean at least as far from 8 hours as the one we observed would be $0.092$. To reject the null hypothesis, the p-value has to be smaller than our significance level.
* (d) Since $0.092 > 0.05$, we fail to reject the null hypothesis: the data do not provide convincing evidence that New Yorkers' average amount of sleep differs from 8 hours.
* (e) Since the p-value ($0.092$) is below $0.10$, I would not expect 8 to be in a 90% confidence interval.
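The test in (b) and the interval in (e) can be checked together. This sketch assumes a sample mean of 7.73 hours, the exercise's summary statistic. That value is not restated above but is consistent with $T = -1.75$ and $SE = 0.154$.

```python
from math import sqrt
from scipy.stats import t

mu0 = 8          # null value (hours)
x_bar = 7.73     # assumed sample mean, consistent with T and SE above
s, n = 0.77, 25

se = s / sqrt(n)
T = (x_bar - mu0) / se
df = n - 1
p_value = 2 * t.sf(abs(T), df)          # two-sided, since H_A: mu != 8

# 90% confidence interval for part (e)
t_star = t.ppf(0.95, df)
ci = (x_bar - t_star * se, x_bar + t_star * se)
print(f"SE = {se:.3f}, T = {T:.2f}, p = {p_value:.3f}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The upper bound of the 90% interval lands just below 8, which matches the answer to (e).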

### Exercise 7.8 - Heights of adults.

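* (a) The point estimates for the mean and the median are 171.1 cm and 170.3 cm.
* (b) The point estimates for the standard deviation and the IQR are 9.4 cm and $177.8 - 163.8 = 14$ cm.
* (c) A person 180 cm tall has a Z-score of $(180 - 171.1) / 9.4 \approx 0.95$, so he/she is not unusually tall. A person 155 cm tall has a Z-score of $(155 - 171.1) / 9.4 \approx -1.71$: further from the mean, but still within 2 standard deviations, so not unusually short either.
* (d) No. Because of sampling variability, a new random sample would produce sample statistics that differ from the ones above.
* (e) We would use the standard error, $SE = s / \sqrt{n} = 9.4 / \sqrt{507} \approx 0.42$ cm.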
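The Z-scores in (c) and the standard error in (e) can be computed as follows. This sketch assumes the exercise's sample size of $n = 507$ individuals, which is not restated in the answers above.

```python
from math import sqrt

mean, sd = 171.1, 9.4   # point estimates from parts (a) and (b)
n = 507                 # assumed: the exercise's sample size

def z_score(x):
    """Standardize an individual height (a Z-score, not a t statistic)."""
    return (x - mean) / sd

se = sd / sqrt(n)       # variability of the sample mean, not of individuals
print(f"Z(180) = {z_score(180):.2f}, Z(155) = {z_score(155):.2f}, SE = {se:.2f}")
```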

### Exercise 7.9 - Find the mean.

The requested sample mean would be $56.26$ or $63.74$.
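The two sample means above are the values for which $|T|$ equals the two-sided 5% critical value. This is a sketch that assumes the exercise's setup, which is not restated in these notes: $H_0: \mu = 60$ vs. $H_A: \mu \neq 60$, $s = 8$ and $n = 20$.

```python
from math import sqrt
from scipy.stats import t

# Assumed inputs from the exercise: H0: mu = 60, s = 8, n = 20
mu0, s, n = 60, 8, 20

# The two-sided p-value equals 0.05 exactly when |T| = t*_{n-1} at 97.5%
t_star = t.ppf(0.975, n - 1)
se = s / sqrt(n)
lower, upper = mu0 - t_star * se, mu0 + t_star * se
print(f"x-bar = {lower:.2f} or {upper:.2f}")
```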

### Exercise 7.10 - $t^\star$ vs. $z^\star$.

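The confidence interval will be wider. For the same standard error, the margin of error $t^\star \times SE$ is larger than $z^\star \times SE$, which accounts for the extra uncertainty of estimating $\sigma$ with $s$.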
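For a fixed SE, the interval width is proportional to the critical value, so $t^\star / z^\star$ is the factor by which the $t$-based interval is wider. Here is a sketch for a few arbitrarily chosen degrees of freedom at the 95% level.

```python
from scipy.stats import norm, t

z_star = norm.ppf(0.975)   # 95% critical value for the normal model

# Ratio of t* to z*: how much wider the t-based 95% interval is
ratios = {df: t.ppf(0.975, df) / z_star for df in (5, 10, 30, 100)}
for df, ratio in ratios.items():
    print(f"df = {df:>3}: t*/z* = {ratio:.3f}")
```

The ratio shrinks towards 1 as the degrees of freedom grow.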

### Exercise 7.11 - Play the piano.

* (a) $H_0: \mu = 5$: the average child in this city takes 5 years of piano lessons. $H_A: \mu \ne 5$: the average differs from 5 years. Georgianna's claim falls under the alternative hypothesis.
* (b) The 95% confidence interval is $[3.57,\ 5.63]$. We are 95% confident that the average number of years of piano lessons taken by children in this city is between 3.57 and 5.63. The test gives $pval = 0.43$, so we cannot reject the null hypothesis.
* (c) Yes, they agree. The null value of 5 years lies inside the 95% confidence interval, and correspondingly the test fails to reject $H_0$ at $\alpha = 0.05$.
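The test and the interval can be reproduced from the summary statistics. This sketch assumes the exercise's values, which are not restated above: $\bar{x} = 4.6$ years, $s = 2.2$ years and $n = 20$ children.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics from the exercise
x_bar, s, n, mu0 = 4.6, 2.2, 20, 5

se = s / sqrt(n)
df = n - 1
T = (x_bar - mu0) / se
p_value = 2 * t.sf(abs(T), df)         # two-sided test of H0: mu = 5

t_star = t.ppf(0.975, df)
ci = (x_bar - t_star * se, x_bar + t_star * se)
print(f"T = {T:.2f}, p = {p_value:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```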

### Exercise 7.12 - Auto exhaust and lead exposure.

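* (a) $H_0: \mu = 35$: the police officers' average blood lead concentration is the same as in the previous study, and any observed difference is due to chance. $H_A: \mu \neq 35$: the officers' average concentration is different.
* (b) Assuming the officers are a random sample, the observations are independent. Since $n > 30$, the normality condition is satisfied as long as there are no extreme outliers.
* (c) The p-value is almost zero, so we reject the null hypothesis: the data provide strong evidence that the officers' average blood lead concentration differs from 35.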
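The p-value in (c) can be checked directly. This sketch assumes the exercise's summary statistics, which are not restated above: a mean of 124.32 and a standard deviation of 37.74 from $n = 52$ officers.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics from the exercise
x_bar, s, n, mu0 = 124.32, 37.74, 52, 35

se = s / sqrt(n)
T = (x_bar - mu0) / se
p_value = 2 * t.sf(abs(T), n - 1)     # two-sided, matching H_A: mu != 35
print(f"SE = {se:.2f}, T = {T:.2f}, p = {p_value:.2e}")
```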

### Exercise 7.13 - Car insurance savings.

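He would need to interview at least 385 people. The sample-size formula $n = (1.96 \times \sigma / MOE)^2$ gives $n \approx 384.16$, and sample sizes are always rounded up so that the margin of error does not exceed its target.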
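The sample-size calculation can be written as a few lines of code. This sketch assumes the exercise's inputs, which are not restated above: $\sigma = 100$ dollars, a target margin of error of 10 dollars and 95% confidence.

```python
from math import ceil
from scipy.stats import norm

# Assumed inputs: sigma = 100 dollars, target MOE = 10 dollars, 95% confidence
sigma, moe, cl = 100, 10, 0.95

z_star = norm.ppf(1 - (1 - cl) / 2)
n_exact = (z_star * sigma / moe) ** 2
n_needed = ceil(n_exact)      # always round up so the MOE target is met
print(f"n = {n_exact:.2f}, so interview {n_needed} people")
```

With the unrounded $z^\star$ the exact value is about 384.15 rather than 384.16, and both round up to 385.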

### Exercise 7.14 - SAT scores.

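* (a) She would need a sample of at least 271 (using $z^\star = 1.645$ and rounding up).
* (b) He would need a bigger sample. A 99% confidence level has a larger critical value $z^\star$ than a 90% level, so to keep the same margin of error, the standard error, and hence $1 / \sqrt{n}$, must be smaller.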
* (c) The minimum sample size would be 664.
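Both sample sizes can be computed with the same formula. This sketch assumes the exercise's inputs, which are not restated above: $\sigma = 250$ points and a target margin of error of 25 points. The helper name `min_sample_size` is hypothetical.

```python
from math import ceil
from scipy.stats import norm

# Assumed inputs: sigma = 250 points, target margin of error = 25 points
sigma, moe = 250, 25

def min_sample_size(cl):
    """Smallest n with z* * sigma / sqrt(n) <= moe at confidence level cl."""
    z_star = norm.ppf(1 - (1 - cl) / 2)
    return ceil((z_star * sigma / moe) ** 2)

raina, luke = min_sample_size(0.90), min_sample_size(0.99)
print(f"90%: n = {raina}, 99%: n = {luke}")
```

Rounding $z^\star$ to 1.65 would give $(1.65 \times 10)^2 = 272.25$, i.e. 273, so the result is sensitive to how the critical value is rounded.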

### Exercise 7.15 - Air quality.

A paired test is appropriate. Each capital was measured in both 2013 and 2014, so every 2013 observation has a natural counterpart in 2014, and inference should be performed on the per-city differences.

### Exercise 7.16 - True / False: paired.

* (a) True.
* (b) True.
* (c) True.
* (d) False, since each observation of the first dataset is subtracted from the corresponding observation in the second dataset, or vice versa.
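Statement (a) can be checked numerically: a paired $t$-test is the same as a one-sample $t$-test on the per-pair differences. This sketch uses SciPy's `ttest_rel` and `ttest_1samp`. The measurements are made up for illustration.

```python
from scipy.stats import ttest_1samp, ttest_rel

# Hypothetical paired measurements (e.g. before/after on the same units)
before = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5]
after = [11.0, 9.9, 10.1, 9.6, 12.2, 9.1]

# Paired analysis = one-sample t-test of the differences against 0
diffs = [b - a for b, a in zip(before, after)]
paired = ttest_rel(before, after)
one_sample = ttest_1samp(diffs, 0.0)
print(f"paired p = {paired.pvalue:.4f}, one-sample p = {one_sample.pvalue:.4f}")
```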

### Exercise 7.17 - Paired or not? Part I.

* (a) Paired, since for each student we have a pre-semester grade and a post-semester grade.
* (b) Unpaired, since the two groups are independent samples from two different populations.
* (c) Paired, since the same quantity is measured twice over time for the same group of patients.
* (d) Paired, since we would measure the weight of the same individuals before and after.

### Exercise 7.18 - Paired or not? Part II.

* (a) Paired, since both stocks' returns are recorded on the same sampled days, so each observation pairs naturally by day.
* (b) Paired, since the prices of the same items are measured in different stores.
* (c) Unpaired, since the populations are different.

### Exercise 7.19 - Global warming, Part I.

* (a) They are paired since for each observation we are matching the temperatures as measured in 1948 and in 2018.
* (b) The null hypothesis is that the average number of days with temperature exceeding 90 degrees did not change from 1948 to 2018 ($H_0: \mu_{1948} = \mu_{2018}$). The alternative hypothesis is that it did change ($H_A: \mu_{1948} \ne \mu_{2018}$).
* (c) Normality can be assumed as the sample is big and there are no extreme outliers.
* (d) First we calculate $SE = \frac{s_{diff}}{\sqrt{n}} = \frac{17.2}{\sqrt{197}} \approx 1.23$. The t-statistic is then $T = \frac{2.9}{1.23} \approx 2.36$, with $df = 196$ and $p\_value \approx 0.02$.
* (e) Since the p-value is below $\alpha = 0.05$, we reject the null hypothesis in favor of the alternative hypothesis: the data provide convincing evidence that the average number of days exceeding 90 degrees changed between 1948 and 2018.
* (f) We might have made a _Type I Error_, that is, rejecting a null hypothesis that is actually true.
* (g) No. Since we rejected $H_0$ at $\alpha = 0.05$, I would not expect the value $0$ to be included in the corresponding confidence interval, and indeed the interval computed in Part II does not contain it.
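The numbers in (d) can be reproduced as follows. The mean (2.9) and standard deviation (17.2) of the differences come from the text. The sample size of $n = 197$ locations is assumed from the exercise. Without intermediate rounding of the SE, $T$ comes out as about 2.37.

```python
from math import sqrt
from scipy.stats import t

# Mean and SD of the differences from the text; n = 197 assumed from the exercise
d_bar, s_d, n = 2.9, 17.2, 197

se = s_d / sqrt(n)                 # SE of the mean difference
T = d_bar / se                     # null value of the mean difference is 0
p_value = 2 * t.sf(abs(T), n - 1)
print(f"SE = {se:.2f}, T = {T:.2f}, p = {p_value:.3f}")
```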

### Exercise 7.20 - High School and Beyond, Part I.

* (a) There seems to be a slight difference in the average reading and writing scores.
* (b) No. Each student has both a reading and a writing score, so the two scores are paired rather than independent. The left plot also suggests they are generally related.
* (c) $H_0: \mu_{read} - \mu_{write} = 0$ and $H_A: \mu_{read} - \mu_{write} \ne 0$.
* (d) The sample is random and there are no extreme outliers given the sample size, so normality can be assumed.
* (e) The _Standard Error_ is $SE = 0.628$ and the _t-statistic_ is $T = -0.867$, which leads to $p\_value = 0.387$. Since the p-value is well above 0.05, we fail to reject $H_0$: the data do not provide convincing evidence of a difference between the average reading and writing scores.
* (f) We might have made a _Type II Error_, that is, failing to reject a null hypothesis that is actually false. There might be a real difference between the average reading and writing scores that these data failed to detect.
* (g) Yes. Since we failed to reject $H_0: \mu_{read} - \mu_{write} = 0$, I would expect the value $0$ to be included in the confidence interval.
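The test in (e) can be reproduced from the summary of the differences. This sketch assumes the exercise's values, which are not restated above: a mean difference (read minus write) of $-0.545$, a standard deviation of the differences of 8.887 and $n = 200$ students.

```python
from math import sqrt
from scipy.stats import t

# Assumed from the exercise: mean and SD of read - write differences, n
d_bar, s_d, n = -0.545, 8.887, 200

se = s_d / sqrt(n)
T = d_bar / se
p_value = 2 * t.sf(abs(T), n - 1)
print(f"SE = {se:.3f}, T = {T:.3f}, p = {p_value:.3f}")
```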

### Exercise 7.21 - Global warming, Part II.

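* (a) With $t^\star_{196} \approx 1.65$, the 90% confidence interval is $2.9 \pm 1.65 \times 1.23$, i.e. $[0.87,\ 4.93]$.
* (b) We are 90% confident that the true average difference in the number of days exceeding 90 degrees (2018 minus 1948) lies within the above interval.
* (c) Yes. Both bounds of the interval are positive, so $0$ is not a plausible value, and the data suggest more days exceeding 90 degrees in 2018 than in 1948.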
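The 90% interval can be computed directly. This sketch uses the mean and standard deviation of the differences from Part I and assumes $n = 197$ locations from the exercise.

```python
from math import sqrt
from scipy.stats import t

# Mean and SD of differences from Part I; n = 197 assumed from the exercise
d_bar, s_d, n = 2.9, 17.2, 197

se = s_d / sqrt(n)
t_star = t.ppf(0.95, n - 1)        # 90% two-sided: 5% in each tail
ci = (d_bar - t_star * se, d_bar + t_star * se)
print(f"t* = {t_star:.2f}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```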

### Exercise 7.22 - High school and beyond, Part II.

* (a) With $t^\star_{199} \approx 1.97$, the 95% confidence interval is $-0.545 \pm 1.97 \times 0.628$, i.e. $[-1.78,\ 0.69]$.
* (b) We are 95% confident that the true population difference between the average read and write scores is within the above interval.
* (c) No, it does not, since it includes the $0$ value.
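The 95% interval can be computed directly. The mean difference comes from Part I. The standard deviation of the differences (8.887) and $n = 200$ are assumed from the exercise.

```python
from math import sqrt
from scipy.stats import t

# Mean difference from Part I; s_d = 8.887 and n = 200 assumed from the exercise
d_bar, s_d, n = -0.545, 8.887, 200

se = s_d / sqrt(n)
t_star = t.ppf(0.975, n - 1)       # 95% two-sided: 2.5% in each tail
ci95 = (d_bar - t_star * se, d_bar + t_star * se)
print(f"t* = {t_star:.3f}, 95% CI = ({ci95[0]:.2f}, {ci95[1]:.2f})")
```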

### Exercise 7.23 - Friday the 13<sup>th</sup>, Part I.

* (a) Data is paired: they refer to the number of cars passing by a specific intersection on both days.
* (b) $H_0: \mu_{6^{th}} - \mu_{13^{th}} = 0$ while $H_A: \mu_{6^{th}} - \mu_{13^{th}} \ne 0$. The null hypothesis states that, on average, there is no difference between the number of cars on the two days, while the alternative hypothesis states that there is a difference.
* (c) Normality of the differences is a strong assumption with such a small number of date pairs, and any outliers should be reported together with the results.
* (d) We have $T = 4.93$ and $p\_value \approx 0.001$.
* (e) We can reject the null hypothesis. We have strong evidence that the average number of cars at the intersection is greater on Friday the 6<sup>th</sup>.
* (f) If the average number of cars passing were the same on both days, the probability of observing a test statistic at least as extreme as $T = 4.93$ would be only about $0.001$.
* (g) We might have made a _Type I Error_, that is, rejecting a null hypothesis that is actually true.
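The test statistic in (d) comes from a one-sample test on the differences. This sketch assumes the exercise's summary of the differences (6th minus 13th), which is not restated above: a mean of 1835 cars, a standard deviation of 1176 cars and $n = 10$ date pairs.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary of the differences (6th minus 13th) from the exercise
d_bar, s_d, n = 1835, 1176, 10

se = s_d / sqrt(n)
T = d_bar / se
p_value = 2 * t.sf(abs(T), n - 1)
print(f"SE = {se:.1f}, T = {T:.2f}, p = {p_value:.4f}")
```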

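```python
from scipy.stats import t

def se_diff(s1, s2, n1, n2):
    """Standard error of the difference between two independent sample means."""
    return ((s1 ** 2) / n1 + (s2 ** 2) / n2) ** 0.5

def df_diff(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)
```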


### Exercise 7.24 - Diamonds, Part I.

Before proceeding with the test, we need to assess normality. With a sample size of 23 per group, there should be no clear outliers, and any that are present ought to be reported together with the results. The box plots do show some outliers, so this has to be reported.

We are comparing two means, so we use the framework for inference on the difference of two means, with $H_0: \mu_{1.00} - \mu_{0.99} = 0$ and $H_A: \mu_{1.00} - \mu_{0.99} \ne 0$. The observed difference between the average standardized prices of 1-carat and 0.99-carat diamonds is $\bar{x}_{1.00} - \bar{x}_{0.99} = 12.3$. The _Standard Error_ is $SE = 4.36$ and the degrees of freedom are $df = 23 - 1 = 22$.

This gives $T = 12.3 / 4.36 = 2.82$ and $p\_value = 0.01$, which allows us to reject the null hypothesis: there is a significant difference in the average standardized prices, even though the tiny difference in weight alone would not justify one.
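The test can be reproduced from the two groups' summary statistics. This sketch assumes the exercise's standardized-price summaries, which are not restated above: mean 44.51 and SD 13.32 for 0.99 carats, mean 56.81 and SD 16.13 for 1 carat, with 23 diamonds each. It uses the same conservative degrees of freedom as the text.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics (standardized prices) from the exercise
m_099, s_099, n_099 = 44.51, 13.32, 23
m_100, s_100, n_100 = 56.81, 16.13, 23

diff = m_100 - m_099
se = sqrt(s_100**2 / n_100 + s_099**2 / n_099)
df = min(n_100, n_099) - 1          # conservative df, as in the text
T = diff / se
p_value = 2 * t.sf(abs(T), df)
print(f"diff = {diff:.2f}, SE = {se:.2f}, T = {T:.2f}, p = {p_value:.3f}")
```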

### Exercise 7.25 - Friday the 13th, Part II.

* (a) Normality is a very strong assumption here given the small number of samples, so we have to state this when we conclude our analysis. With that in mind, the two hypotheses are $H_0: \mu_{13^{th}} - \mu_{6^{th}} = 0,\ H_A: \mu_{13^{th}} - \mu_{6^{th}} \ne 0$. The _Standard Error_ is $SE = 1.23$, which gives a _T Statistic_ of $T = 2.71$ and $p\_value = 0.04$, so we reject the null hypothesis at $\alpha = 0.05$. We might have made a _Type I Error_ here.

* (b) The critical value is $t^\star_{5} = 2.57$, and the _95% Confidence Interval_ is $[0.17,\ 6.49]$. We are 95% confident that, on average, there are between 0.17 and 6.49 more traffic-accident admissions to the ER on the 13<sup>th</sup> compared to the 6<sup>th</sup>.

* (c) The dataset is too small to draw firm conclusions, and normality is a strong assumption with so few pairs. The 95% confidence interval does not contain 0, but its lower bound (0.17) is very close to it, so the true increase could be negligible. I would refrain from firm conclusions based on the data at hand.
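Both the test in (a) and the interval in (b) can be reproduced from the summary of the differences (13th minus 6th). This sketch assumes the exercise's values: a mean of 3.33, a standard deviation of 3.01 and $n = 6$ date pairs. These values are not restated above but are consistent with $SE = 1.23$, $T = 2.71$ and $t^\star_5 = 2.57$.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary of the differences (13th minus 6th) from the exercise
d_bar, s_d, n = 3.33, 3.01, 6

se = s_d / sqrt(n)
df = n - 1
T = d_bar / se
p_value = 2 * t.sf(abs(T), df)
t_star = t.ppf(0.975, df)
ci = (d_bar - t_star * se, d_bar + t_star * se)
print(f"SE = {se:.2f}, T = {T:.2f}, p = {p_value:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```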

### Exercise 7.26 - Diamonds, Part II.

The critical value is $t^\star_{22} = 2.07$, so the _95% Confidence Interval_ is $12.3 \pm t^\star_{22} \times SE$, i.e. $[3.25,\ 21.35]$. We are 95% confident that, on average, 1.00-carat diamonds cost more per carat than 0.99-carat diamonds by an amount between 3.25 and 21.35 (in standardized price units).
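The interval can be reproduced with the same assumed summary statistics as in Part I. These are the exercise's standardized-price means and SDs for 23 diamonds per group. The degrees of freedom are again the conservative value.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics (standardized prices) from the exercise
m_099, s_099, n_099 = 44.51, 13.32, 23
m_100, s_100, n_100 = 56.81, 16.13, 23

diff = m_100 - m_099
se = sqrt(s_100**2 / n_100 + s_099**2 / n_099)
t_star = t.ppf(0.975, min(n_100, n_099) - 1)   # conservative df = 22
ci = (diff - t_star * se, diff + t_star * se)
print(f"t* = {t_star:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```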

### Exercise 7.27 - Chicken diet and weight, Part I.

* (a) There is a clear difference between the two groups, with linseed-fed chickens being heavier.
* (b) Let's calculate the _Standard Error_ as $SE = 19.41$, the _degrees of freedom_ as $df = 9$, $T = 3.02$ and finally we can compute $p\_value = 0.014$. So data suggest that there is a significant weight difference.
* (c) We might have committed a _Type I Error_ that is rejecting a true Null Hypothesis.
* (d) With $\alpha = 0.01$ we would not reject the Null Hypothesis.
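The test in (b) and the comparison with $\alpha = 0.01$ in (d) can be reproduced in one go. This sketch assumes the exercise's summary statistics, which are not restated above: linseed has mean 218.75, SD 52.24 and $n = 12$, and horsebean has mean 160.20, SD 38.63 and $n = 10$.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics from the exercise (chick weights)
m_lin, s_lin, n_lin = 218.75, 52.24, 12   # linseed
m_hb, s_hb, n_hb = 160.20, 38.63, 10      # horsebean

se = sqrt(s_lin**2 / n_lin + s_hb**2 / n_hb)
df = min(n_lin, n_hb) - 1                 # conservative df
T = (m_lin - m_hb) / se
p_value = 2 * t.sf(abs(T), df)
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: p = {p_value:.3f}, reject H0: {p_value < alpha}")
```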

### Exercise 7.28 - Fuel efficiency of manual and automatic cars, Part I.

The hypotheses are $H_0: \mu_{manual} - \mu_{automatic} = 0$ and $H_A: \mu_{manual} - \mu_{automatic} \ne 0$. We get $SE = 1.13$ and $df = 25$, and the _T Statistic_ is $T = 3.30$, which leads to $p\_value = 0.003$. Since the p-value is well below 0.05, we reject the null hypothesis: the data provide strong evidence of an actual difference in average fuel efficiency between manual and automatic cars.
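The computation can be reproduced from the summary statistics. This sketch assumes the exercise's city-mileage summaries, which are not restated above: manual has mean 19.85 and SD 4.51, automatic has mean 16.12 and SD 3.58, with 26 cars each.

```python
from math import sqrt
from scipy.stats import t

# Assumed city-mileage summary statistics (mpg) from the exercise
m_man, s_man, n_man = 19.85, 4.51, 26      # manual
m_auto, s_auto, n_auto = 16.12, 3.58, 26   # automatic

se = sqrt(s_man**2 / n_man + s_auto**2 / n_auto)
df = min(n_man, n_auto) - 1                # conservative df
T = (m_man - m_auto) / se
p_value = 2 * t.sf(abs(T), df)
print(f"SE = {se:.2f}, df = {df}, T = {T:.2f}, p = {p_value:.3f}")
```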

### Exercise 7.29 - Chicken diet and weight, Part II.

We get $SE = 23.56$ and $df = 11$, which gives $T = 3.27$ and $p\_value = 0.007$. This lets us reject the null hypothesis that there is no difference in average weight between the two groups: the data provide strong evidence that chickens on the casein diet have a higher average weight.
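The test can be reproduced from the summary statistics. This sketch assumes that the comparison group is soybean and uses the exercise's summaries, which are not restated above: casein has mean 323.58, SD 64.43 and $n = 12$, and soybean has mean 246.43, SD 54.13 and $n = 14$.

```python
from math import sqrt
from scipy.stats import t

# Assumed summary statistics from the exercise (chick weights)
m_cas, s_cas, n_cas = 323.58, 64.43, 12   # casein
m_soy, s_soy, n_soy = 246.43, 54.13, 14   # soybean (assumed comparison group)

se = sqrt(s_cas**2 / n_cas + s_soy**2 / n_soy)
df = min(n_cas, n_soy) - 1                # conservative df
T = (m_cas - m_soy) / se
p_value = 2 * t.sf(abs(T), df)
print(f"SE = {se:.2f}, df = {df}, T = {T:.2f}, p = {p_value:.3f}")
```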

### Exercise 7.30 - Fuel efficiency of manual and automatic cars, Part II.

The _98% Confidence Interval_ for the difference $\mu_{manual} - \mu_{automatic}$ is $[1.41,\ 8.51]$. We are 98% confident that the average highway fuel efficiency of manual cars is higher than that of automatic cars by an amount between 1.41 and 8.51 miles per gallon.
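The interval can be reproduced from the summary statistics. This sketch assumes the exercise's highway-mileage summaries, which are not restated above: manual has mean 27.88 and SD 5.01, automatic has mean 22.92 and SD 5.29, with 26 cars each. It uses the conservative $df = 25$.

```python
from math import sqrt
from scipy.stats import t

# Assumed highway-mileage summary statistics (mpg) from the exercise
m_man, s_man, n_man = 27.88, 5.01, 26      # manual
m_auto, s_auto, n_auto = 22.92, 5.29, 26   # automatic

diff = m_man - m_auto
se = sqrt(s_man**2 / n_man + s_auto**2 / n_auto)
t_star = t.ppf(0.99, min(n_man, n_auto) - 1)   # 98% two-sided: 1% per tail
ci = (diff - t_star * se, diff + t_star * se)
print(f"t* = {t_star:.3f}, 98% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```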

### Exercise 7.31 - Prison isolation experiment, Part I.

We first need to check normality. _Tr 2_ shows a fairly normal distribution, while the other two treatments do not clearly show a normal shape, so normality is a strong assumption to make for them. The overall null hypothesis is that there is no difference among the average scores of the treatments. We compare the treatments two at a time: for each pair, the null hypothesis is that the two average scores are the same, while the alternative hypothesis is that there is an actual difference between the two averages. We use $\alpha = 0.05$.

* First, let's compare $\mu_{Tr_1}$ with $\mu_{Tr_2}$. As usual, we calculate $SE = 3.91$, $df = 14 - 1 = 13$, $T = 0.86$ and $p\_value = 0.41$. So this dataset does not provide sufficient evidence to reject the null hypothesis.
* Next, let's compare $\mu_{Tr_1}$ with $\mu_{Tr_3}$. We calculate $SE = 4.01$, $df = 14 - 1 = 13$, $T = 2.35$ and $p\_value = 0.04$. Since the p-value is below 0.05, there is evidence of an actual difference in average score between _Tr 1_ and _Tr 3_.
* Finally, let's compare $\mu_{Tr_2}$ with $\mu_{Tr_3}$. We calculate $SE = 3.12$, $df = 14 - 1 = 13$, $T = 1.94$ and $p\_value = 0.07$. So there is not sufficient evidence that the average scores of treatments _Tr 2_ and _Tr 3_ differ significantly, i.e. we cannot reject the null hypothesis.
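All three pairwise comparisons can be run in one loop. This is a sketch with the same conservative degrees of freedom as above. The Tr 2 and Tr 3 summaries (mean, SD, n) match the scratch cell at the end of this section. The Tr 1 summary (mean 6.21, SD 12.3, n = 14) is assumed from the exercise.

```python
from itertools import combinations
from math import sqrt
from scipy.stats import t

# (mean, SD, n) of the pre - post score differences for each treatment
groups = {"Tr 1": (6.21, 12.3, 14), "Tr 2": (2.86, 7.94, 14), "Tr 3": (-3.21, 8.57, 14)}

results = {}
for g1, g2 in combinations(groups, 2):
    (m1, s1, n1), (m2, s2, n2) = groups[g1], groups[g2]
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    df = min(n1, n2) - 1                  # conservative df, as in the text
    T = (m1 - m2) / se
    p_value = 2 * t.sf(abs(T), df)
    results[(g1, g2)] = (se, T, p_value)
    print(f"{g1} vs {g2}: SE = {se:.2f}, T = {T:.2f}, p = {p_value:.2f}")
```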

### Exercise 7.32 - True / False: comparing means.

* (a) False, we need to check the appropriate conditions (i.e. for $n < 30$ that no clear outliers are present while for $n \ge 30$ that no extreme outliers are present).
* (b) True: as the degrees of freedom increase, the tails of the $t$-distribution get thinner and it approaches the standard normal distribution.
* (c) False: we use a pooled standard error when the standard deviations of the groups are similar, not when the sample sizes are equal.

```python
# Scratch cell: Tr 2 (m1, s1, n1) vs Tr 3 (m2, s2, n2) from Exercise 7.31.
# Relies on t, se_diff and df_diff from the helper cell above.

m1, m2 = 2.86, -3.21
s1, s2 = 7.94, 8.57
n1, n2 = 14, 14

diff_mean = m1 - m2

alpha = 0.02  # 98% confidence interval

se = se_diff(s1, s2, n1, n2)
df = df_diff(n1, n2)

cv = t.ppf(1 - alpha / 2, df)  # critical value

t_stat = diff_mean / se
p_val = t.sf(abs(t_stat), df) * 2  # two-sided p-value

print(f"SE = {se}, df = {df}, cv (alpha = {alpha}) = {cv}, t statistic = {t_stat}, p-value = {p_val} Confidence interval = {diff_mean - cv * se, diff_mean + cv * se}")
```

```
SE = 3.1223674625880555, df = 13, cv (alpha = 0.02) = 2.6503088378527013, t statistic = 1.9440376806158244, p-value = 0.07385522761200089 Confidence interval = (-2.2052380811208376, 14.345238081120838)
```
