# Structuring your data by using named tuples

The NamedTuple type is a simple way to add structure to your code. You can think of a NamedTuple as a way to attach names to the consecutive elements of a tuple.
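
As a minimal illustration (the names point, x, and y are made up for this sketch), a NamedTuple can be read both by position and by name, and it still behaves like an ordinary tuple underneath:

```julia
# A NamedTuple attaches a name to each position of a tuple.
point = (x = 1.5, y = -2.0)

point[1]        # access by position: 1.5
point.x         # access by name: 1.5
keys(point)     # the field names: (:x, :y)
values(point)   # the plain tuple of values: (1.5, -2.0)

# Named tuples are immutable, so `point.x = 3.0` would throw an error.
```
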

In [3]:
aq = rand(11, 8)*14

11×8 Matrix{Float64}:
  7.78536    4.83653   9.6885    1.35995   …   9.26827   4.86986     2.3597
 11.6281     8.30996   8.80445   3.80326      13.7997    1.37371    11.0791
  3.0478     9.11315   2.2198    7.05243       8.01321  10.8671      1.19453
  4.12105    1.82298   4.86586   0.683873     12.3057    0.546671   11.9969
  9.53409    8.84699   2.09452   9.9412        7.32993  12.1817      4.79019
  3.44051    4.39119   8.94      7.89347   …  11.191     3.96647    13.9121
 11.736     10.4064    5.74373   7.07984       1.07578   0.0305362   6.80115
  0.909522   6.33205   3.92323  10.8936        9.1766   10.9841     10.6966
  1.88126    6.49563   8.26233   6.03877      10.6326    8.87685     9.43032
  3.8849    12.7331   13.9156   13.9722       11.4215   11.8548      3.29229
  4.08944    5.76094  10.4867    5.64438   …   6.17229   5.07871     8.91156

### Defining named tuples and accessing their contents

In [4]:
data_set_1 = (x=aq[:, 1], y=aq[:, 2])

(x = [7.785361844152274, 11.628113143874893, 3.047795624523852, 4.121045996997349, 9.534086452568783, 3.440508144849459, 11.73595838992245, 0.9095219008828741, 1.8812600071542145, 3.884895122172503, 4.089437638671869], y = [4.8365315979180545, 8.309958508940783, 9.113154820539648, 1.8229757405863345, 8.84699047882784, 4.391193037691046, 10.406398939127568, 6.332054717370709, 6.495630441337451, 12.733138447949035, 5.760939422369723])
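
Note that indexing a column as aq[:, 1] allocates a new vector, so data_set_1.x is a copy of the column rather than a window into aq. The sketch below, which regenerates stand-in random data of the same shape, contrasts this with view, which shares memory with the source matrix:

```julia
# Stand-in data with the same shape as in the notebook.
aq = rand(11, 8) .* 14

copied = (x = aq[:, 1],)          # aq[:, 1] allocates a new vector
viewed = (x = view(aq, :, 1),)    # view shares memory with aq

aq[1, 1] = -1.0                   # modify the source matrix

copied.x[1]   # still the old (non-negative) random value
viewed.x[1]   # -1.0, because the view reads directly from aq
```

The trailing comma in (x = aq[:, 1],) matters: without it, Julia would parse the expression as an assignment to x instead of a one-field NamedTuple.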

In [10]:
data_set_1[1]

11-element Vector{Float64}:
  7.785361844152274
 11.628113143874893
  3.047795624523852
  4.121045996997349
  9.534086452568783
  3.440508144849459
 11.73595838992245
  0.9095219008828741
  1.8812600071542145
  3.884895122172503
  4.089437638671869

In [11]:
data_set_1.x

11-element Vector{Float64}:
  7.785361844152274
 11.628113143874893
  3.047795624523852
  4.121045996997349
  9.534086452568783
  3.440508144849459
 11.73595838992245
  0.9095219008828741
  1.8812600071542145
  3.884895122172503
  4.089437638671869
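
Besides positional and dot access, a NamedTuple can also be indexed with a Symbol, which is handy when the field name is stored in a variable. A small sketch on hypothetical data:

```julia
nt = (x = [1.0, 2.0, 3.0], y = [4.0, 5.0, 6.0])

field = :y
nt[field]                # indexing with a Symbol, same as nt.y
getproperty(nt, field)   # the function that the dot syntax nt.y calls
haskey(nt, :z)           # false: there is no field named z
propertynames(nt)        # (:x, :y)
```
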

Let’s now create a nested named tuple holding our four data sets:

In [17]:
data = (
    set1 = (x = aq[:, 1], y = aq[:, 2]), 
    set2 = (x = aq[:, 3], y = aq[:, 4]), 
    set3 = (x = aq[:, 5], y = aq[:, 6]), 
    set4 = (x = aq[:, 7], y = aq[:, 8]), 
    )


(set1 = (x = [7.785361844152274, 11.628113143874893, 3.047795624523852, 4.121045996997349, 9.534086452568783, 3.440508144849459, 11.73595838992245, 0.9095219008828741, 1.8812600071542145, 3.884895122172503, 4.089437638671869], y = [4.8365315979180545, 8.309958508940783, 9.113154820539648, 1.8229757405863345, 8.84699047882784, 4.391193037691046, 10.406398939127568, 6.332054717370709, 6.495630441337451, 12.733138447949035, 5.760939422369723]), set2 = (x = [9.688497377320946, 8.80444698441519, 2.219797630578099, 4.8658602764395615, 2.09452440859174, 8.939995582012816, 5.743728166421002, 3.9232341647061713, 8.262333887432044, 13.91555920231956, 10.486691971327392], y = [1.359946042534186, 3.8032635657387104, 7.052425482641526, 0.6838732409071597, 9.941201439212062, 7.8934736238470204, 7.079838951036613, 10.893565693522998, 6.038769004793378, 13.972168615771118, 5.644379651651209]), set3 = (x = [13.71271352167529, 3.404346142688194, 5.562832442757952, 8.474402295770965, 9.157025256756754, 4

In [18]:
data.set1.x

11-element Vector{Float64}:
  7.785361844152274
 11.628113143874893
  3.047795624523852
  4.121045996997349
  9.534086452568783
  3.440508144849459
 11.73595838992245
  0.9095219008828741
  1.8812600071542145
  3.884895122172503
  4.089437638671869
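
Writing out four entries by hand works well here. As a hedged alternative, the same nested structure can be built programmatically by passing the names as a type parameter to NamedTuple; the variable names set_names and set_values below are illustrative, and the data is regenerated as random stand-in values:

```julia
aq = rand(11, 8) .* 14   # stand-in data with the same shape as in the notebook

# Field names :set1 ... :set4 and one (x, y) pair per data set.
set_names = Tuple(Symbol("set", i) for i in 1:4)
set_values = Tuple((x = aq[:, 2i - 1], y = aq[:, 2i]) for i in 1:4)

data = NamedTuple{set_names}(set_values)

data.set3.x == aq[:, 5]   # true: set3 uses columns 5 and 6
```
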

### Analyzing four data sets (structured like Anscombe’s quartet) stored in a named tuple

In [19]:
using Statistics

In [23]:
map(s -> mean(s.x), data)

(set1 = 5.641634933251866, set2 = 7.176788150142229, set3 = 6.794929088693493, set4 = 6.420956336005284)

In this code, we create an anonymous function s -> mean(s.x) that extracts the x field from a passed NamedTuple and computes its mean. Note that map returns a NamedTuple that keeps the field names of the source NamedTuple. Calculating Pearson’s correlation works similarly:

In [25]:
map(s -> cor(s.x, s.y), data)

(set1 = 0.30587556329036153, set2 = 0.0696799427252521, set3 = -0.5790658671390951, set4 = -0.48435264229121855)
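
To see the name-preserving behavior of map in isolation, here is a small sketch on two hand-made data sets (a and b are hypothetical names) whose results are easy to check by eye:

```julia
using Statistics

# Two tiny hypothetical data sets stored in a nested NamedTuple.
small = (a = (x = [1.0, 2.0, 3.0], y = [2.0, 4.0, 6.0]),
         b = (x = [1.0, 2.0, 3.0], y = [3.0, 2.0, 1.0]))

means = map(s -> mean(s.x), small)      # (a = 2.0, b = 2.0): names a and b are kept
cors = map(s -> cor(s.x, s.y), small)   # approximately (a = 1.0, b = -1.0)
```
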

Next, let’s fit a linear model for the first data set by using the GLM.jl package. In the model, the target variable is y, and we have one feature x:

In [29]:
using GLM

In [32]:
model = lm(@formula(y ~ x), data.set1)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y ~ 1 + x

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  5.81246     1.70125   3.42    0.0077   1.96397    9.66096
x            0.243512    0.252653  0.96    0.3603  -0.328028   0.815052
───────────────────────────────────────────────────────────────────────

As the last example, let’s use the r2 function from GLM.jl to calculate the coefficient of determination of our model:

In [34]:
r2(model)

0.09355986021819607
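
To make the numbers above less of a black box, the sketch below computes an ordinary least-squares fit and R² with plain Julia. It is not how GLM.jl is implemented (the model type printed above shows that GLM.jl uses a Cholesky-based predictor, while the backslash operator solves the rectangular least-squares problem with a QR factorization), but both approaches solve the same problem. The helper name ols_fit is hypothetical. For a model with a single feature, R² equals the squared Pearson correlation, which agrees with the outputs above: 0.30588² ≈ 0.09356.

```julia
using Statistics

# Hypothetical helper: ordinary least squares for y = β₀ + β₁x.
function ols_fit(x::AbstractVector, y::AbstractVector)
    X = [ones(length(x)) x]        # design matrix: intercept column, then x
    β = X \ y                      # least-squares solution
    residuals = y .- X * β
    r2 = 1 - sum(abs2, residuals) / sum(abs2, y .- mean(y))
    return (intercept = β[1], slope = β[2], r2 = r2)
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = ols_fit(x, y)
fit.r2 ≈ cor(x, y)^2   # true for a model with a single feature
```
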

## Understanding composite types and mutability of values in Julia

### Composite types

The model variable that we worked with earlier in this notebook has the TableRegressionModel type, which is a composite type, namely a struct. For basic operations you will rarely need to define such types yourself, but you will often encounter them as values returned by functions from packages.

The TableRegressionModel type is defined in the StatsModels.jl package as follows:


``` julia
struct TableRegressionModel{M,T} <: RegressionModel
    model::M
    mf::ModelFrame
    mm::ModelMatrix{T}
end
```

At a basic level, you do not need to understand every detail of this definition. What matters for now is that the struct defines three fields: model, mf, and mm. When you get a value of such a type, you can access its fields by using a dot (.), just as for a NamedTuple (in part 2, you will learn that the way the dot operator works can be customized). Let’s inspect the three fields of our model:

In [35]:
model.mf

ModelFrame{NamedTuple{(:y, :x), Tuple{Vector{Float64}, Vector{Float64}}}, LinearModel}(y ~ 1 + x, StatsModels.Schema with 2 entries:
  x => x
  y => y
, (y = [4.8365315979180545, 8.309958508940783, 9.113154820539648, 1.8229757405863345, 8.84699047882784, 4.391193037691046, 10.406398939127568, 6.332054717370709, 6.495630441337451, 12.733138447949035, 5.760939422369723], x = [7.785361844152274, 11.628113143874893, 3.047795624523852, 4.121045996997349, 9.534086452568783, 3.440508144849459, 11.73595838992245, 0.9095219008828741, 1.8812600071542145, 3.884895122172503, 4.089437638671869]), LinearModel)

In [36]:
model.mm

ModelMatrix{Matrix{Float64}}([1.0 7.785361844152274; 1.0 11.628113143874893; … ; 1.0 3.884895122172503; 1.0 4.089437638671869], [1, 2])

In [37]:
model.model

LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
──────────────────────────────────────────────────────────────
       Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────
x1  5.81246     1.70125   3.42    0.0077   1.96397    9.66096
x2  0.243512    0.252653  0.96    0.3603  -0.328028   0.815052
──────────────────────────────────────────────────────────────
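
To see the same mechanics on a type you define yourself, here is a minimal sketch of a parametric struct. FittedLine and predict_line are hypothetical names chosen for illustration; they only mimic the shape of TableRegressionModel (a struct with a type parameter and dot-accessible fields):

```julia
# Hypothetical composite type; like TableRegressionModel, it is a parametric struct.
struct FittedLine{T<:Real}
    intercept::T
    slope::T
end

# Hypothetical helper that reads the fields with dot syntax.
predict_line(m::FittedLine, x) = m.intercept + m.slope * x

fl = FittedLine(1.0, 2.0)   # T is inferred as Float64
fl.slope                    # 2.0, dot access as for a NamedTuple
fieldnames(FittedLine)      # (:intercept, :slope)
predict_line(fl, 3.0)       # 7.0
```

Applied to the notebook's model, fieldnames(typeof(model)) should likewise list (:model, :mf, :mm), matching the StatsModels.jl definition shown above.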


### Mutability of values

Julia distinguishes between mutable and immutable types. Here is a classification of selected types encountered so far:

- Immutable: Int, Float64, String, Tuple, NamedTuple, and types defined with the struct keyword
- Mutable: Array (and therefore Vector and Matrix), Dict, and types defined with the mutable struct keyword
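
The classification can be checked at run time with the ismutable function from Base (available since Julia 1.5). In this sketch, Counter is a hypothetical type introduced only to show a mutable struct:

```julia
t = (1, 2, 3)                    # Tuple: immutable
nt = (a = 1, b = [1, 2])         # NamedTuple: immutable, even though b holds a vector
v = [1, 2, 3]                    # Vector: mutable
d = Dict("k" => 1)               # Dict: mutable

mutable struct Counter           # hypothetical mutable struct
    n::Int
end
c = Counter(0)

ismutable(t), ismutable(nt), ismutable(v), ismutable(d)   # (false, false, true, true)

v[1] = 10        # fine: vectors can be modified in place
c.n += 1         # fine: Counter was declared with mutable struct
# t[1] = 10      # would throw a MethodError: tuples have no setindex! method
```
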

You might ask how immutable and mutable types differ. The answer is that mutable values can be changed after they are created.

This might sound obvious, but the crucial consequence is that they can also be changed by the functions to which they are passed. Such side effects can be surprising. Therefore, as discussed in chapter 2, functions that mutate their arguments should, by convention, have names ending with an exclamation mark (!).
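
A minimal sketch of this naming convention, using a hypothetical pair of functions double and double! (these are not part of Base):

```julia
# Non-mutating version: returns a new vector, v is untouched.
double(v::AbstractVector) = 2 .* v

# Mutating version: the trailing ! warns callers that v is modified.
function double!(v::AbstractVector)
    v .*= 2          # broadcasts in place, writing the results into v
    return v
end

a = [1, 2, 3]
b = double(a)    # b is [2, 4, 6], a is still [1, 2, 3]
double!(a)       # a is now [2, 4, 6]
```

The ! is purely a naming convention; Julia does not enforce it, so it is up to the function author to apply it consistently.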

In the first example, we see the difference between calling the unique and unique!
functions on a vector. They both remove duplicates from a collection. The difference
between them is that unique returns a new vector, while unique! works in place:

In [56]:
x = [1, 2, 3, 4, 5, 1, 2, 5]

8-element Vector{Int64}:
 1
 2
 3
 4
 5
 1
 2
 5

In [57]:
unique(x)'

1×5 adjoint(::Vector{Int64}) with eltype Int64:
 1  2  3  4  5

In [62]:
x'

1×8 adjoint(::Vector{Int64}) with eltype Int64:
 1  2  3  4  5  1  2  5

In [59]:
unique!(x)'

1×5 adjoint(::Vector{Int64}) with eltype Int64:
 1  2  3  4  5

In [61]:
x'

1×5 adjoint(::Vector{Int64}) with eltype Int64:
 1  2  3  4  5
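
The difference can also be checked with the identity operator ===, which tests whether two names refer to the same object. A short sketch:

```julia
x = [1, 2, 3, 4, 5, 1, 2, 5]

y = unique(x)
y === x         # false: unique allocated a new vector
length(x)       # 8: x itself is unchanged

z = unique!(x)
z === x         # true: unique! returns the same vector it modified
length(x)       # 5
```
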

The second example shows that even if your data structure is immutable, it might contain mutable elements that a function can change. In this example, we use the empty! function, which takes a mutable collection as its argument and removes all elements stored in it in place.
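
The following sketch illustrates this with an immutable NamedTuple that holds a mutable Dict. The helper empty_field! is a hypothetical name introduced for this example:

```julia
# An immutable NamedTuple whose first field is a mutable Dict.
nt = (dict = Dict("a" => 1, "b" => 2), int = 1)

# Hypothetical helper: empties the i-th element of a collection in place.
empty_field!(container, i) = empty!(container[i])

empty_field!(nt, 1)   # nt itself is not modified, but the Dict it holds is
nt.dict               # now an empty Dict
nt.int                # 1: the immutable field is untouched
# nt.int = 2          # would throw an error: NamedTuple fields cannot be reassigned
```
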

In summary, Julia does not copy data when passing arguments to a function. If an object passed to a function contains a mutable component (even a nested one), the function can mutate that component, and the caller will see the change. If you want to pass a fully independent object, so that the original value is guaranteed not to be mutated, create it with the deepcopy function.
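
A short sketch contrasting copy and deepcopy, plus a hypothetical mutating function reset_vals! applied to a deep copy so that the original stays intact:

```julia
# copy duplicates only the outer container; deepcopy duplicates everything.
nested = [[1, 2], [3, 4]]
shallow = copy(nested)      # new outer vector, same inner vectors
deep = deepcopy(nested)     # inner vectors are copied too

push!(nested[1], 99)
shallow[1]                  # [1, 2, 99]: shares its inner vectors with nested
deep[1]                     # [1, 2]: fully independent

# Hypothetical function that mutates the vector stored inside its argument.
original = (vals = [1, 2, 3], label = "sample")
reset_vals!(nt) = empty!(nt.vals)

safe = deepcopy(original)
reset_vals!(safe)           # empties only the copy
original.vals                # still [1, 2, 3]
```

A plain copy would not be enough when the data is nested: the outer container would be new, but the inner mutable objects would still be shared with the original.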