## Anscombe’s quartet data without a dictionary

Our goal is to perform the following operations on each of these data sets:
* Calculate the mean and standard deviation of the x and y variables,
* Calculate Pearson’s correlation coefficient between the x and y variables,
* Fit a linear regression explaining y by x and compute its coefficient of determination R²,
* Investigate the data visually using plots.

In [None]:
# Defining an 11×8 matrix storing Anscombe’s quartet data:
# columns 1-2 hold x and y of the first data set, columns 3-4 the second, and so on
aq = [
    10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
    8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
    13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
    9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
    11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
    14.0 9.96 14.0 8.1 14.0 8.84 8.0 7.04
    6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
    4.0 4.26 4.0 3.1 4.0 5.39 19.0 12.50
    12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
    7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
    5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
    ]

In [None]:
# Loading the Statistics standard library (mean, std, cor)
using Statistics

### Mean of the Anscombe's Quartet Data

In [None]:
# Column-wise means of the aq matrix (returns a 1×8 matrix)
mean(aq; dims=1);

#### Alternative ways of computing the column means

In [None]:
# using a comprehension
[mean(col) for col in eachcol(aq)];


In [None]:
# using map function
map(mean, eachcol(aq));

In [None]:
# using do-end notation
map(eachcol(aq)) do col
    mean(col)
end;

In [None]:
# using views, which avoids copying each column
[mean(@view aq[:, j]) for j in axes(aq, 2)]

### Standard Deviation of the Anscombe's Quartet Data

In [None]:
# Column-wise sample standard deviations of the aq matrix (n - 1 denominator by default)
std(aq; dims=1)

#### Alternative ways of computing the column standard deviations

In [None]:
# using a comprehension
[std(col) for col in eachcol(aq)];

In [None]:
# using map function
map(std, eachcol(aq));

In [None]:
# using do-end notation
map(eachcol(aq)) do col
    std(col)
end;

In [None]:
# using views, which avoids copying each column
[std(@view aq[:, j]) for j in axes(aq, 2)]

### Pearson’s correlation coefficient of x and y variables

In [None]:
# Calculating the correlation between x and y for each data set (column pairs 1-2, 3-4, 5-6, 7-8)
[cor(aq[:, i], aq[:, i+1]) for i in 1:2:7]
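
To see what `cor` computes, the following minimal sketch reproduces Pearson’s coefficient for the first data set from its definition: the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The helper name `pearson` and the vectors `x1` and `y1` are introduced here for illustration only; their values are copied from the first two columns of `aq`.

```julia
using Statistics

# First data set of the quartet (columns 1 and 2 of aq)
x1 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Hypothetical helper: Pearson's r computed from its definition
function pearson(x, y)
    dx = x .- mean(x)  # deviations of x from its mean
    dy = y .- mean(y)  # deviations of y from its mean
    return sum(dx .* dy) / sqrt(sum(abs2, dx) * sum(abs2, dy))
end

# The hand-written version should agree with cor up to rounding
pearson(x1, y1) ≈ cor(x1, y1)
```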

### Fitting a linear regression explaining y by x

In [None]:
# Response vector: the y values of the first data set (2nd column)
y = aq[:, 2]

In [None]:
# Design matrix: a column of ones (for the intercept) next to the x values (1st column)
X = [ones(11) aq[:, 1]]

In [None]:
# Least-squares estimates, returned as [intercept, slope]
X \ y

In [None]:
# The same fit for all four data sets
[[ones(11) aq[:, i]] \ aq[:, i+1] for i in 1:2:7]
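
For a regression with a single explanatory variable, the backslash solution can be cross-checked against the textbook closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. The sketch below does this for the first data set; the names `x1`, `y1`, `slope`, `intercept`, and `coef` are illustrative and not part of the original notebook.

```julia
using Statistics

# First data set of the quartet (columns 1 and 2 of aq)
x1 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Closed-form estimates for simple linear regression
slope = cov(x1, y1) / var(x1)
intercept = mean(y1) - slope * mean(x1)

# Least-squares solution via the backslash operator, as in the cells above
coef = [ones(length(x1)) x1] \ y1

# Both routes should give the same [intercept, slope] up to rounding
[intercept, slope] ≈ coef
```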

### R² coefficient of determination
General formula: R² = 1 - SSᵣₑₛ/SSₜₒₜ
```
1. SSᵣₑₛ stands for the residual sum of squares
Formula: SSᵣₑₛ = ⁿ∑ᵢ₌₁(yᵢ - f(xᵢ))²
where:
    yᵢ is the iᵗʰ value of the variable to be predicted.
    xᵢ is the iᵗʰ value of the explanatory variable.
    f(xᵢ) is the predicted value of yᵢ.
    n is the total number of observations.

2. SSₜₒₜ stands for the total sum of squares
Formula: SSₜₒₜ = ⁿ∑ᵢ₌₁(yᵢ - ȳ)²
where:
    yᵢ is the iᵗʰ value of the variable to be predicted.
    ȳ is the mean of the observed yᵢ values.
    n is the total number of observations.
```
    

In [None]:
function R²(x, y)
    n = length(x) # The total number of observations (11 for each data set here)
    X = [ones(n) x] # Design matrix with an intercept column
    model = X \ y # Least-squares estimates: [intercept, slope]

    prediction = X * model # f(xᵢ)
    residual = y - prediction # yᵢ - f(xᵢ)
    mean_y = mean(y) # ȳ, the mean of the observed y values

    # Residual and total sums of squares
    SS_res = sum(v -> v^2, residual) # ⁿ∑ᵢ₌₁(yᵢ - f(xᵢ))²
    SS_tot = sum(v -> (v - mean_y)^2, y) # ⁿ∑ᵢ₌₁(yᵢ - ȳ)²

    R_square = 1 - SS_res/SS_tot
    return R_square
end

In [None]:
[R²(aq[:, i], aq[:, i+1]) for i in 1:2:7]
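
For a linear regression with an intercept and a single explanatory variable, R² equals the square of Pearson’s correlation coefficient, which gives a quick sanity check on the function above. The sketch below is self-contained and uses the first data set only; `r_squared`, `x1`, and `y1` are illustrative names, and `r_squared` is a compact restatement of the same formula rather than the notebook’s own function.

```julia
using Statistics

# First data set of the quartet (columns 1 and 2 of aq)
x1 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Hypothetical helper: R² = 1 - SS_res / SS_tot for a fit with intercept
function r_squared(x, y)
    X = [ones(length(x)) x]            # design matrix with intercept column
    fitted = X * (X \ y)               # predicted values f(xᵢ)
    ss_res = sum(abs2, y .- fitted)    # residual sum of squares
    ss_tot = sum(abs2, y .- mean(y))   # total sum of squares
    return 1 - ss_res / ss_tot
end

# R² should match the squared correlation up to rounding
r_squared(x1, y1) ≈ cor(x1, y1)^2
```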

### Plotting the Anscombe’s quartet data

In [None]:
# Loading the Plots package
using Plots

In [None]:
scatter(aq[:, 1], aq[:, 2]; legend=false);

In [None]:
# Building one scatter plot per data set with a comprehension and
# combining them into a single figure with plot
plot(
    [scatter(aq[:, i], aq[:, i+1]; legend=false)
    for i in 1:2:7]...
)