# Data Containers

This notebook shows how to combine data into different types of "containers" (arrays, dictionaries, tuples, ...) inside your program.

## Load Packages and Extra Functions

```julia
using Dates

include("printmat.jl")     #printing functions used below, for instance printmat()
```

```
printwhere (generic function with 1 method)
```

# Arrays

Arrays need no introduction. They are used everywhere in finance and statistics/econometrics.

## Vectors, Matrices and High-dimensional Arrays

Vectors, matrices and higher-dimensional arrays can be created in many ways: the code below demonstrates just a few of them. See the chapter on Arrays for (many) more details.

To access an array element, just do `A[2]` or similarly. You can also change an array element, as in `B[2,1] = -999`.

Notice that `D = [A B]` creates an independent copy, so later changing `B` does not affect `D`. However, if we define `E = B`, then `E` and `B` are two names for the same array, so changing an element of `B` also changes `E`.

```julia
A = [100,101]           #a vector
display(A)

B = [1 2;               #a matrix
     0 10]
display(B)

C = rand(4,3,2)         #a 4x3x2 array
display(C)

D = [A B]               #a 2x3 matrix
```

```
2-element Array{Int64,1}:
 100
 101

2×2 Array{Int64,2}:
 1   2
 0  10

4×3×2 Array{Float64,3}:
[:, :, 1] =
 0.900772  0.82797   0.564876
 0.30227   0.225969  0.981435
 0.426516  0.023429  0.0455218
 0.630349  0.543657  0.92544

[:, :, 2] =
 0.343813  0.592027  0.338774
 0.609085  0.411707  0.525685
 0.524346  0.443655  0.329084
 0.987935  0.671677  0.326064

2×3 Array{Int64,2}:
 100  1   2
 101  0  10
```

```julia
println("A[2] is ",A[2])          #access an element

B[2,1] = -999                     #change an element
println("\nB is now")
printmat(B)
```

```
A[2] is 101

B is now
         1         2
      -999        10
```
```julia
println("\nD is not affected")
display(D)               #D is not changed when B is
```

```

D is not affected
2×3 Array{Int64,2}:
 100  1   2
 101  0  10
```
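A minimal sketch of the `E = B` case mentioned above (the names `E` and `F` are only for illustration): `E = B` just gives the array a second name, while `copy(B)` creates an independent array. The `===` operator checks whether two names refer to the very same object.

```julia
B = [1 2; 0 10]
E = B              #E is a second name for the same array
F = copy(B)        #F is an independent copy

B[1,1] = -999      #change an element of B

println("E[1,1] is ", E[1,1])    #changed, since E and B are the same array
println("F[1,1] is ", F[1,1])    #unchanged
println(E === B, " ", F === B)   #true false
```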


## Arrays of Arrays (or other types)

You can store very different things (a mixture of numbers, matrices, strings) in an array. For instance, if `a` is a vector, `str` is a string and `C` is a matrix, then `x = [a,str,C]` puts them into a vector.

If you later change an element of `C`, then this will also affect `x` (discussed at the end of the notebook).

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]
x   = [a,str,C]       #x[1] is a, x[2] is str and x[3] is C

foreach(display,x)    #loops over the elements of x
```

```
1:10

"Hazel"

2×2 Array{Int64,2}:
 11  12
 21  22
```
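Since `x` mixes a range, a string and a matrix, Julia stores it as a vector with element type `Any`. The sketch below (using only `eltype`, `enumerate`, `typeof` and `push!` from Base) inspects what is inside and, as an alternative, builds a similar container one element at a time; the name `y` is just for illustration.

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]
x   = [a,str,C]

println(eltype(x))                  #Any, since the elements have different types
for (i,el) in enumerate(x)          #loop over index and element
    println("x[$i] is a ", typeof(el))
end

y = Any[]                           #an empty vector that can hold anything
push!(y,str)                        #add elements one at a time
push!(y,C)
```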

# Tuples and Named Tuples

Tuples and named tuples are very useful for collecting very different types of data (a number, a string and a couple of vectors, say).

Once created, you cannot change tuples (they are immutable), although *changing elements of an array* that belongs to the tuple will affect the tuple too.

Tuples are often used as inputs or outputs of functions.
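As a sketch of a function that returns several results at once, the hypothetical helper `meanstd` below (not part of `printmat.jl` or any package) returns a named tuple, which the caller can either access by name or destructure by position.

```julia
#hypothetical helper: returns the mean and standard deviation as a named tuple
function meanstd(v)
    m = sum(v)/length(v)
    s = sqrt(sum((v .- m).^2)/(length(v)-1))
    return (mean=m,std=s)
end

res = meanstd([1.0,2.0,3.0])
println(res.mean," ",res.std)      #access by name: 2.0 1.0
(m,s) = res                        #or destructure by position
```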

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

t = (a,str,C)              #a tuple
display(t)

nt = (a=a,str=str,C=C)     #a named tuple, (a2=a,str2=str,C2=C) would also work
display(nt)
display(nt.C)

nt_ = (;a,str,C)           #works in Julia >= 1.5, the names are taken from the variables
display(nt_)
```

```
(1:10, "Hazel", [11 12; 21 22])

(a = 1:10, str = "Hazel", C = [11 12; 21 22])

2×2 Array{Int64,2}:
 11  12
 21  22

(a = 1:10, str = "Hazel", C = [11 12; 21 22])
```

```julia
println("t[1] is ",t[1])
#t[1] = -999                        #cannot change the tuple, uncomment to get an error

(a2,str2,C2) = nt                   #extract the tuple into variables ("destructuring")
println("a2 and str2 are: $a2 $str2")
```

```
t[1] is 1:10
a2 and str2 are: 1:10 Hazel
```
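A small sketch of what "immutable" means here: you can change an element *of an array* stored in a tuple (which changes the array itself), but you cannot replace an element of the tuple. The `try`/`catch` captures the error instead of stopping the program.

```julia
C = [11 12;21 22]
t = (1:10,"Hazel",C)

t[3][1,1] = -999              #allowed: this changes the array C, not the tuple
println("C[1,1] is ",C[1,1])  #-999, so C (and t) changed

result = try                  #try to replace a whole element of the tuple
    t[1] = 0
    "no error"
catch err
    err
end
println(typeof(result))       #MethodError: a tuple cannot be changed
```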


# Dictionaries

Dictionaries are a very flexible way to collect different types of data. In contrast to tuples, dictionaries can be changed. Also, changing elements of an array that belongs to the dictionary will affect the dictionary too.

A dictionary is organised as (key,value) pairs, where the key is the name of the element. You can loop over the pairs (see below) and also use a loop to change or add elements, although you should avoid adding new keys to a dictionary while looping over that same dictionary.

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

d = Dict(:a=>a,:str=>str,:C=>C)    #dictionary, "a" instead of :a works too

println("d[:C] is ",d[:C])

d[:b] = -999                       #adds an element, since :b is a new key
                                   #(d[:a] = -999 would instead change an existing element)

d[:verse2] = "Stardust"            #adds another element
```

```
d[:C] is [11 12; 21 22]


"Stardust"
```

```julia
display(d)
```

```
Dict{Symbol,Any} with 5 entries:
  :a      => 1:10
  :b      => -999
  :verse2 => "Stardust"
  :str    => "Hazel"
  :C      => [11 12; 21 22]
```

```julia
for (key,value) in d                #loop over a dictionary
    println("$key: $value")
end
```

```
a: 1:10
b: -999
verse2: Stardust
str: Hazel
C: [11 12; 21 22]
```
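A sketch of adding elements in a loop, as mentioned above. The loop runs over a separate list of (illustrative) new keys rather than over `d` itself. It also uses `delete!`, `haskey` and `get` from Base to remove an element, check for a key and supply a default for a missing key.

```julia
d = Dict{Symbol,Any}(:a=>1:10,:str=>"Hazel")   #Any: values can be of different types

for (i,name) in enumerate([:x1,:x2,:x3])        #loop over a separate list of keys
    d[name] = i^2                               #add a new element
end

d[:a] = -999                                   #change an existing element
delete!(d,:str)                                #remove an element

println(haskey(d,:x2)," ",d[:x3]," ",length(d))   #true 9 4
println(get(d,:missing,"default"))             #a default value for a missing key
```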


# Your Own Tailor-Made Data Type

It is sometimes convenient to define your own `struct` as a container. The `struct` command creates an immutable type (you cannot change it, except for elements of arrays that belong to it). If you need to change the fields later, use `mutable struct` instead.

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

struct MyType            #change to mutable struct to be able to change it later
   x                     #can be anything
   s::String             #has to be a String
   z::Array{Float64}     #an Array of Float64 numbers (integers are converted)
end

x1 = MyType(a,str,C)     #all fields must be specified

println("x1: ",x1)
println("x1.s: ",x1.s)

#x1 = MyType(1:10,10,[1.0;2])    #error since 10 is not a string
#x1.x = 3                        #cannot change a MyType, uncomment to get an error
```

```
x1: MyType(1:10, "Hazel", [11.0 12.0; 21.0 22.0])
x1.s: Hazel
```
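A sketch of the `mutable struct` alternative. The type name `MyMutableType` is hypothetical; it has the same fields as `MyType`, but its fields can be reassigned after creation (still subject to the declared field types).

```julia
mutable struct MyMutableType   #hypothetical name, same fields as MyType
   x
   s::String
   z::Array{Float64}
end

m = MyMutableType(1:10,"Hazel",[11 12;21 22])
m.x = 3                        #allowed, since the type is mutable
m.s = "Stardust"               #still has to be a String
println("m: ",m)
```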


# DataFrames and Other Things

See [DataFrames.jl](https://juliadata.github.io/DataFrames.jl/stable/) for how to work with DataFrames.

# A Potential Pitfall when Using an Array in another Data Container

Suppose you create an array of arrays (or a tuple or a dictionary) called `y`, and that the array `C` is one of its elements.

If you later change *elements* of `C`, then this will affect `y` as well (and vice versa). The reason is that `y` stores a reference to the array `C` rather than a copy of it, which saves memory. If you want an independent copy, use `copy(C)` instead of just `C` when you create `y`.
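A minimal sketch of the `copy(C)` advice, using two illustrative tuples: `t1` holds a reference to `C`, while `t2` holds its own copy.

```julia
C  = [11 12;21 22]
t1 = (1:10,C)           #stores a reference to C
t2 = (1:10,copy(C))     #stores an independent copy of C

C[1,1] = -999

println(t1[2][1,1])     #-999: t1 shares the array with C
println(t2[2][1,1])     #11: t2 has its own copy
```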

In contrast, if you rebind `C` to something else (for instance, `C = 0`), then `y` is *not* affected, since `y` still refers to the original array.

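In the example below, the struct `e` (a `MyType`) does not change when `C` does. The reason is that the `MyType` constructor converts the integer matrix `C` into a new `Array{Float64}`, so `e.z` is a separate array. Had `C` already been an array of `Float64`, then `e.z` would refer to the same array as `C` and would change too.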
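A sketch of the last point, using a hypothetical one-field type `Holder` whose field is declared as `Array{Float64}` (like `MyType.z`). An integer matrix is converted into a new array, while a `Float64` matrix is stored as-is, so it is shared.

```julia
struct Holder                  #hypothetical type with a Float64 array field, like MyType.z
    z::Array{Float64}
end

Ci = [1 2;3 4]                 #Int matrix: converted to a new Float64 array
Cf = [1.0 2.0;3.0 4.0]         #already Float64: no conversion, so no copy
hi = Holder(Ci)
hf = Holder(Cf)

Ci[1,1] = -999
Cf[1,1] = -999.0

println(hi.z[1,1])             #1.0: hi.z is a separate array
println(hf.z[1,1])             #-999.0: hf.z is the same array as Cf
```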

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

x = [a,str,C]
t = (a,str,C)
d = Dict(:a=>a,:str=>str,:C=>C)
e = MyType(a,str,C)             #MyType is defined in the section above

C[1,1] = -999                   #changing an element of C affects x,t,d, but not e

display(x)
display(t)
display(d)
display(e)
```


```julia
C = 0               #rebinding C to a new value does not affect x,t,d
display(t)
```

```
(1:10, "Hazel", [-999 12; 21 22])
```