## Anscombe’s quartet data with named tuples
Our goal is to perform the following operations on each of these data sets:
* Define a named tuple storing Anscombe’s quartet data,
* Calculate the mean and standard deviation of x and y variables,
* Calculate Pearson’s correlation coefficient of x and y variables,
* Fit a linear regression explaining y by x and compute its coefficient of determination R²,
* Investigate the data visually by using plots.

In [1]:
# Define a matrix storing Anscombe’s quartet data
# (columns alternate x and y for sets 1 through 4: x1 y1 x2 y2 x3 y3 x4 y4)
aq = [
    10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
    8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
    13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
    9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
    11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
    14.0 9.96 14.0 8.1 14.0 8.84 8.0 7.04
    6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
    4.0 4.26 4.0 3.1 4.0 5.39 19.0 12.50
    12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
    7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
    5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
    ]

11×8 Matrix{Float64}:
 10.0   8.04  10.0  9.14  10.0   7.46   8.0   6.58
  8.0   6.95   8.0  8.14   8.0   6.77   8.0   5.76
 13.0   7.58  13.0  8.74  13.0  12.74   8.0   7.71
  9.0   8.81   9.0  8.77   9.0   7.11   8.0   8.84
 11.0   8.33  11.0  9.26  11.0   7.81   8.0   8.47
 14.0   9.96  14.0  8.1   14.0   8.84   8.0   7.04
  6.0   7.24   6.0  6.13   6.0   6.08   8.0   5.25
  4.0   4.26   4.0  3.1    4.0   5.39  19.0  12.5
 12.0  10.84  12.0  9.13  12.0   8.15   8.0   5.56
  7.0   4.82   7.0  7.26   7.0   6.42   8.0   7.91
  5.0   5.68   5.0  4.74   5.0   5.73   8.0   6.89

In [8]:
# Define a named tuple storing Anscombe’s quartet data: one nested (x, y) named tuple per set
data = (
    set1 = (x=aq[:, 1], y=aq[:, 2]),
    set2 = (x=aq[:, 3], y=aq[:, 4]),
    set3 = (x=aq[:, 5], y=aq[:, 6]),
    set4 = (x=aq[:, 7], y=aq[:, 8])
);
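
The nested named tuple can be read by field name, by position, or by symbol. The following is a minimal sketch of those access patterns on a hypothetical `toy` tuple with the same shape as `data`; the values in `toy` are made up for illustration and are not part of Anscombe’s quartet.

```julia
# Toy nested named tuple with the same shape as `data` (values are made up)
toy = (
    set1 = (x = [1.0, 2.0, 3.0], y = [2.0, 4.0, 6.0]),
    set2 = (x = [1.0, 2.0, 3.0], y = [3.0, 2.0, 1.0]),
)

keys(toy)        # names of the stored sets: (:set1, :set2)
toy.set1.x       # access by field name
toy[1].y         # access by position
toy[:set2].x     # access by symbol

# Iterate over the set names and their contents together
for (name, s) in pairs(toy)
    println(name, ": ", length(s.x), " observations")
end
```

The same expressions should work unchanged on `data`, for example `data.set3.y` or `keys(data)`.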

### Analyzing Anscombe’s quartet data stored in a named tuple

In [12]:
# Load the Statistics standard library module (provides mean, std, and cor)
using Statistics

### Mean of Anscombe’s quartet data

In [14]:
# calculate the means of x variables in each set
map(s -> mean(s.x), data)

(set1 = 9.0, set2 = 9.0, set3 = 9.0, set4 = 9.0)

In [13]:
# calculate the means of y variables in each set
map(s -> mean(s.y), data)

(set1 = 7.500909090909093, set2 = 7.500909090909091, set3 = 7.500000000000001, set4 = 7.50090909090909)

### Standard deviation of Anscombe’s quartet data

In [21]:
# calculate the standard deviation of x variables in each set
map(s -> std(s.x), data)

(set1 = 3.3166247903554, set2 = 3.3166247903554, set3 = 3.3166247903554, set4 = 3.3166247903554)

In [22]:
# calculate the standard deviation of y variables in each set
map(s -> std(s.y), data)

(set1 = 2.031568135925815, set2 = 2.0316567355016177, set3 = 2.030423601123667, set4 = 2.0305785113876023)
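
Because `map` over a named tuple returns a named tuple with the same keys, the four summary statistics can also be collected in one pass. The sketch below is one possible way to do this; the name `stats` and the field names `mean_x`, `std_x`, `mean_y`, `std_y` are chosen here for illustration. The data is rewritten as vectors (same values as the `aq` matrix) only so that the block runs on its own. Note that `std` uses the sample (n - 1) correction by default.

```julia
using Statistics

# Same values as the `aq` matrix, written as vectors so this sketch is self-contained
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
data = (
    set1 = (x = x, y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    set2 = (x = x, y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    set3 = (x = x, y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    set4 = (x = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
            y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
)

# For each set, build a small named tuple of summary statistics
stats = map(data) do s
    (mean_x = mean(s.x), std_x = std(s.x), mean_y = mean(s.y), std_y = std(s.y))
end

stats.set4.std_y   # statistics are then reachable by set name and statistic name
```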

### Pearson’s correlation coefficient of x and y variables

In [17]:
# Calculate Pearson’s correlation of x and y variables in each set
map(s -> cor(s.x, s.y), data)

(set1 = 0.8164205163448398, set2 = 0.8162365060002429, set3 = 0.8162867394895983, set4 = 0.8165214368885028)
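
To see what `cor` computes, Pearson’s coefficient can be written out from its definition, r = cov(x, y) / (std(x) · std(y)). The sketch below does this for the first data set only; the variable names `r_manual` and `r_builtin` are introduced here for illustration.

```julia
using Statistics

# First data set of Anscombe's quartet (same values as columns 1 and 2 of `aq`)
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Pearson's r from its definition: covariance scaled by both standard deviations
r_manual = cov(x, y) / (std(x) * std(y))

# The built-in function should agree up to floating-point rounding
r_builtin = cor(x, y)
r_manual ≈ r_builtin
```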

### Fitting a linear regression explaining y by x

In [24]:
# Load the GLM.jl package (provides lm, @formula, and r2)
using GLM

In [44]:
# using the GLM.jl package to fit a linear model for the first data set
model = lm(@formula(y ~ x), data.set1)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  3.00009     1.12475   2.67    0.0257   0.455737   5.54444
x            0.500091    0.117906  4.24    0.0022   0.23337    0.766812
───────────────────────────────────────────────────────────────────────

In [45]:
# using the r2 function from GLM.jl package to calculate the coefficient of determination of our model
r2(model)

0.6665424595087749
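
As a cross-check that does not rely on GLM.jl, the same fit can be reproduced with ordinary least squares from the standard library. This is a sketch under the assumption that the model is y = β₀ + β₁x with an intercept, as in the formula `y ~ 1 + x`; the names `X`, `β`, `ŷ`, and `r2_manual` are introduced here for illustration. For a simple linear regression with an intercept, R² equals the squared Pearson correlation, which the last line verifies.

```julia
using LinearAlgebra, Statistics

# First data set of Anscombe's quartet (same values as columns 1 and 2 of `aq`)
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Design matrix: a column of ones for the intercept, then x
X = [ones(length(x)) x]

# Least-squares solution; β[1] is the intercept and β[2] the slope
β = X \ y

# Fitted values and coefficient of determination: 1 - SS_residual / SS_total
ŷ = X * β
r2_manual = 1 - sum(abs2, y .- ŷ) / sum(abs2, y .- mean(y))

# With a single predictor and an intercept, R² is the squared correlation
r2_manual ≈ cor(x, y)^2
```

The coefficients in `β` and the value of `r2_manual` should match the `Coef.` column and the `r2(model)` result reported by GLM.jl for the first set. With GLM.jl loaded, an expression such as `map(s -> r2(lm(@formula(y ~ x), s)), data)` would be one way to repeat the fit for all four sets.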