# Sampling

In the lessons, you learned the purpose of sampling. A sequence of independent and identically distributed (i.i.d.) random variables $(X_1, X_2, \cdots, X_n)$ drawn from a population with distribution $F$ is called a sample from $F$. In this exercise, we demonstrate empirically that the sample mean and the sample variance are unbiased estimators.

In the code below, you are given the scores for the Advanced Programming course (the same scores used in the previous exercise). We check the unbiasedness of these estimators by repeatedly drawing samples from this population, computing the sample mean and sample variance of each sample, and comparing their averages with the population values.
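
For reference, unbiasedness can be written out as below. The symbols $\mu$ and $\sigma^2$ (the mean and variance of $F$) are notation introduced here for illustration and assume an i.i.d. sample; the first line also shows why the claim holds for the sample mean.

$$E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu$$

$$E[S^2] = \sigma^2$$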

In [1]:
final_grade <- c(
    19.3, 18.2, 14.1, 15.1, 17.1, 0.0,
    16.2, 17.4, 16.1, 19.2, 14.6, 13.4,
    18.0, 15.7, 14.8, 17.0, 19.6, 18.2,
    0.0, 17.2, 0.0, 18.8, 16.7, 11.0,
    17.5, 19.3, 17.2, 13.5, 0.0, 18.5,
    16.8, 16.4, 12.8, 17.9, 19.4, 12.9,
    12.4, 19.1, 12.3, 18.0, 18.4, 18.5,
    15.2, 0.0, 12.9, 14.1, 17.5, 17.2,
    15.6, 14.9, 16.9, 12.0, 15.9, 18.4,
    17.8, 19.3, 12.0, 15.6, 13.7, 10.0,
    17.6, 19.1, 19.9, 18.2, 18.3, 17.7,
    19.0, 15.5, 15.1, 18.4
)

In [2]:
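# Draw a random sample of size n from final_grade and return its
# sample mean and sample variance (in that order).
# Note: sample() draws without replacement by default (replace = FALSE).
sample_mean_and_variance <- function(n) {
  sampled_grade <- sample(final_grade, n)
  list(
    mean = mean(sampled_grade),
    variance = var(sampled_grade)
  )
}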
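
The function above calls `sample()` with its default `replace = FALSE`, so the draws are not strictly independent. The following is a minimal sketch, using a hypothetical four-element population (`toy_population`, not the course data), that enumerates every possible sample of size 2 and averages the sample variances exactly. It suggests that with replacement the average equals the population variance (divisor $N$), while without replacement it equals `var()` of the whole population (divisor $N - 1$).

```r
# Hypothetical toy population, small enough to enumerate all samples.
toy_population <- c(1, 2, 3, 4)

# Population variance (divisor N) and var() of the population (divisor N - 1).
pop_var_n  <- mean((toy_population - mean(toy_population))^2)
pop_var_n1 <- var(toy_population)

# All ordered pairs drawn WITH replacement (16 equally likely samples).
pairs_with <- expand.grid(a = toy_population, b = toy_population)
mean_s2_with <- mean(apply(pairs_with, 1, var))

# All unordered pairs drawn WITHOUT replacement (6 equally likely samples).
pairs_without <- combn(toy_population, 2)
mean_s2_without <- mean(apply(pairs_without, 2, var))

cat(mean_s2_with, pop_var_n, "\n")      # these two agree
cat(mean_s2_without, pop_var_n1, "\n")  # these two agree
```

For the course data, the population size (70) is large relative to the difference between the two divisors, so the two reference values are close; passing `replace = TRUE` to `sample()` would match the i.i.d. definition exactly.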

In [3]:
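total_samples <- 1000  # number of repeated samples
N <- 20                # size of each sample

# Preallocate the result vectors instead of growing them inside the loop.
means <- numeric(total_samples)
variances <- numeric(total_samples)

for (i in seq_len(total_samples)) {
  result <- sample_mean_and_variance(N)
  means[i] <- result[[1]]      # sample mean
  variances[i] <- result[[2]]  # sample variance
}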

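The `var` function in R computes the sample variance (divisor $n - 1$). To examine the unbiasedness of the estimators, we need the true population variance as a reference, so we write a separate function for it. Recall that the sample variance is computed as follows (where $n$ is the sample size):

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The population variance uses the divisor $n$ instead of $n - 1$, so it can be obtained by multiplying the result of `var` by $(n - 1) / n$.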

In [4]:
# Population variance (divisor n), obtained by rescaling var(), which uses n - 1.
population_variance <- function(x) {
  n <- length(x)
  return (
    var(x) * (n - 1) / n
  )
}
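
As a quick sanity check, the rescaling can be compared with the direct definition of the population variance. This is a small sketch on a hypothetical vector (`toy_values`, not the course data); the function is repeated so the snippet runs on its own.

```r
# Same definition as above, repeated so this snippet is self-contained.
population_variance <- function(x) {
  n <- length(x)
  var(x) * (n - 1) / n
}

# Hypothetical values with mean 5 and squared deviations summing to 32.
toy_values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Direct definition: average squared deviation from the mean.
direct <- mean((toy_values - mean(toy_values))^2)

cat(population_variance(toy_values), direct, "\n")  # both are 4
```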

In [5]:
cat(mean(means), mean(variances))

15.24015 23.28337

In [6]:
cat(mean(final_grade), population_variance(final_grade))

15.23429 23.21225
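
The averages of the estimates are close to the population values but not identical, because only finitely many samples were drawn. One way to judge whether such a gap is plausible is the Monte Carlo standard error of the average. The sketch below illustrates the idea on a hypothetical population (`toy_population`) with an arbitrary seed and with replacement, so it is an assumption-laden illustration rather than a rerun of the experiment above.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

# Hypothetical population and experiment settings.
toy_population <- c(12, 15, 9, 18, 20, 14, 16, 11)
sample_size <- 4
repetitions <- 2000

# Mean of each i.i.d. sample (drawn with replacement).
toy_means <- replicate(
  repetitions,
  mean(sample(toy_population, sample_size, replace = TRUE))
)

# Average of the sample means and its Monte Carlo standard error.
mc_estimate <- mean(toy_means)
mc_std_error <- sd(toy_means) / sqrt(repetitions)

# The gap to the true mean should be within a few standard errors.
gap <- abs(mc_estimate - mean(toy_population))
cat(mc_estimate, mean(toy_population), gap, mc_std_error, "\n")
```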