# Chapter 8. Principles of statistical analysis

## Imports 

In [None]:
import DataFrames as pd # pandas in python
import Distributions as dst

## Warning regarding solutions

NO GUARANTEE THAT THE SOLUTIONS WILL WORK OR WORK CORRECTLY! USE THEM AT YOUR OWN RISK!

THE ANSWERS PROVIDED BELOW MAY BE WRONG. USE THEM AT YOUR OWN RISK!

# Exercises

## Exercise 8.2

Eight diabetic patients had plasma glucose levels (mmol/l) measured before and one hour after oral administration of 100 g glucose (Feingold et al., 1989), with the following results:

before = [4.67, 4.97, 5.11, 5.17, 5.33, 6.22, 6.50, 7.00]

after = [5.44, 10.11, 8.49, 6.61, 10.67, 5.67, 5.78, 9.89]

(a) Calculate the standard error of the mean change in plasma glucose

(b) On the basis of these data, how many diabetic patients would need to be studied so that the width of the 95% confidence interval for the mean change in plasma glucose level was 0.5 mmol/l? (Assume that the Normal distribution is the appropriate sampling distribution of the change in plasma glucose.)

In [None]:
ex82 = pd.DataFrame((
    ;glucose_before = [4.67, 4.97, 5.11, 5.17, 5.33, 6.22, 6.50, 7.00],
    glucose_after = [5.44, 10.11, 8.49, 6.61, 10.67, 5.67, 5.78, 9.89]
    ))
ex82[:, "change"] = ex82[!, "glucose_after"] - ex82[!, "glucose_before"]
ex82

### Ex.8.2a Solution

In [None]:
function get_mean(xs::Vector{<:Number})::Float64
    return sum(xs) / length(xs)
end

function get_sd(xs::Vector{<:Number})::Float64
    mean::Float64 = get_mean(xs)
    return sqrt(sum([(x-mean)^2 for x in xs]) / (length(xs) - 1))
end

In [None]:
function get_sem(xs::Vector{<:Number})::Float64
    return get_sd(xs) / sqrt(length(xs))
end

In [None]:
# here we don't use the formula for two independent samples
# since before and after are dependent, so sem for single sample is OK
get_sem(ex82[!, "change"])

### Ex.8.2a Answer

So, the SEM of the change in plasma glucose is approximately 0.8354 mmol/l.
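As a quick cross-check, the same value can be reproduced with `std` from Julia's `Statistics` standard library instead of the hand-written `get_sd`. This is only a sketch: the alias `st` and the names `change` and `sem_check` are my own, and the data are retyped from the exercise.

```julia
import Statistics as st # standard library, no extra package needed

glucose_before = [4.67, 4.97, 5.11, 5.17, 5.33, 6.22, 6.50, 7.00]
glucose_after = [5.44, 10.11, 8.49, 6.61, 10.67, 5.67, 5.78, 9.89]
change = glucose_after .- glucose_before

# st.std uses the n - 1 denominator by default, like get_sd above
sem_check = st.std(change) / sqrt(length(change))
println(round(sem_check, digits=4))
```

If the two implementations agree, the printed value should match the answer above to four decimal places.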

### Ex.8.2b Solution

So in order to answer this question I need to solve this equation for `sem`:

$(mean + 1.96 \cdot sem) - (mean - 1.96 \cdot sem) = 0.5$

The `mean` terms cancel, leaving $2 \cdot 1.96 \cdot sem = 0.5$, so:

$sem = 0.25/1.96$

So in order for the width of the 95% CI to equal 0.5 mmol/l, `sem` needs to be equal to 0.25/1.96 = 0.1275...

Since `sem` is `sd/sqrt(n)`, I then need to solve this equation for n:

$sem = sd / \sqrt{n}$

After transformations I got:

$n = (sd / sem)^2$
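The two steps can also be merged into a single formula and evaluated by hand. In the sketch below $w$ is my own shorthand for the target width of the 95% CI (0.5 mmol/l), and the value of `sd` for the change is rounded to four decimal places, so the last digit of the result is approximate.

$$
\begin{aligned}
w &= 2 \cdot 1.96 \cdot \frac{sd}{\sqrt{n}} \\
n &= \left(\frac{2 \cdot 1.96 \cdot sd}{w}\right)^2 \approx \left(\frac{3.92 \cdot 2.3629}{0.5}\right)^2 \approx 343.2
\end{aligned}
$$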

In [None]:
(get_sd(ex82[!, "change"]) / (0.25/1.96))^2

The result lies between 343 and 344, so, rounding up, it seems to take 344 patients drawn from a normally distributed population with the same mean and sd as `ex82[!, "change"]` to reduce the width of the 95% CI to 0.5 mmol/l.
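The rounding can be made explicit in code. The sketch below assumes the same Normal-based interval (multiplier 1.96). The helpers `ci_width` and `required_n` are hypothetical names of my own, not part of the exercise, and `std` comes from the `Statistics` standard library.

```julia
import Statistics as st # standard library

change = [5.44, 10.11, 8.49, 6.61, 10.67, 5.67, 5.78, 9.89] .-
         [4.67, 4.97, 5.11, 5.17, 5.33, 6.22, 6.50, 7.00]
sd = st.std(change)

# width of the 95% CI for a given sd and sample size n (hypothetical helper)
ci_width(sd, n; z=1.96) = 2 * z * sd / sqrt(n)

# smallest integer n whose CI width does not exceed `width` (hypothetical helper)
required_n(sd, width; z=1.96) = ceil(Int, (2 * z * sd / width)^2)

n_needed = required_n(sd, 0.5)
println(n_needed)            # 344
println(ci_width(sd, 343))   # slightly above 0.5
println(ci_width(sd, 344))   # slightly below 0.5
```

With 343 patients the expected width is still marginally wider than 0.5 mmol/l, which is why the fractional result is rounded up rather than to the nearest integer.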

Let's test this with a computer simulation.

In [None]:
# 95% ci range for sample
function get_95perc_ci_range(xs::Vector{<:Number})::Float64
    mean::Float64 = get_mean(xs)
    sem::Float64 = get_sem(xs)
    upper_95perc_ci::Float64 = mean + 1.96 * sem
    lower_95perc_ci::Float64 = mean - 1.96 * sem
    return upper_95perc_ci - lower_95perc_ci
end

In [None]:
function get_95perc_ci_range_from_simulation(
    population_mean::Float64, population_sd::Float64,
    n_in_sample::Int)::Float64
    # draw a random sample of n_in_sample values from the normal population
    sample::Vector{Float64} = rand(
        dst.Normal(population_mean, population_sd), n_in_sample)
    # reuse the helper defined above instead of repeating the CI arithmetic
    return get_95perc_ci_range(sample)
end

In [None]:
function estimate_95perc_ci_range(
    population_mean::Float64, population_sd::Float64, n_in_sample::Int,
    n_simulations::Int)::Float64
    ranges::Vector{Float64} = [
        get_95perc_ci_range_from_simulation(population_mean, population_sd,
        n_in_sample) for _ in 1:n_simulations]
    return sum(ranges)/n_simulations
end

In [None]:
estimate_95perc_ci_range(
    get_mean(ex82[!, "change"]), get_sd(ex82[!, "change"]),
    343, 100_000)
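Because the simulation above is unseeded, its result changes slightly from run to run. A reproducible variant might look like the sketch below. It assumes only the standard library (`Random` and `Statistics`) and draws normal samples with `randn` instead of `Distributions`. The function `mean_ci_width`, the seed, and the number of simulations are arbitrary choices of mine.

```julia
import Random            # standard library
import Statistics as st  # standard library

# Average width of the 95% CI over n_sim samples of size n
# drawn from Normal(mu, sd); hypothetical helper, not from the exercise.
function mean_ci_width(rng, mu, sd, n, n_sim; z=1.96)
    total = 0.0
    for _ in 1:n_sim
        xs = mu .+ sd .* randn(rng, n)  # sample from Normal(mu, sd)
        total += 2 * z * st.std(xs) / sqrt(n)
    end
    return total / n_sim
end

change = [5.44, 10.11, 8.49, 6.61, 10.67, 5.67, 5.78, 9.89] .-
         [4.67, 4.97, 5.11, 5.17, 5.33, 6.22, 6.50, 7.00]

rng = Random.MersenneTwister(8)  # arbitrary seed, for reproducibility only
avg_width = mean_ci_width(rng, st.mean(change), st.std(change), 344, 2_000)
println(avg_width)
```

The average width should come out close to 0.5 mmol/l. The exact digits depend on the seed and the Julia version.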

### Ex.8.2b Answer

Both the mathematical calculation and the computer simulation indicate that it takes a sample of between 343 and 344 patients (so 344 after rounding up), drawn from a normal population with `mean = mean(change)` and `sd = sd(change)`, for the width of the 95% CI to be about 0.5 mmol/l.