## Collections - vectors and matrices

Working efficiently with structures that hold a collection of related data is an essential skill for performing meaningful analysis with Julia. We have already seen vectors. In what follows we will enrich our knowledge of vectors and familiarize ourselves with matrices.

Without going into too much detail, vectors have a single dimension while matrices have two. Collections of data with 3 or more dimensions are represented with general arrays (in Julia, vectors and matrices are themselves 1- and 2-dimensional arrays). In this course we will only see vectors and matrices, as those are by far the most common.
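As a small illustration of the dimensionality idea, the sketch below checks the number of dimensions with `ndims`. The variable names (`v`, `m`, `a`) and their values are made up for this example.

```julia
# Hypothetical example values, not part of the course data
v = [1.0, 2.0, 3.0]          # vector: 1 dimension
m = [1.0 2.0; 3.0 4.0]       # matrix: 2 dimensions
a = zeros(2, 3, 4)           # array with 3 dimensions

ndims(v), ndims(m), ndims(a) # (1, 2, 3)

# Vector and Matrix are aliases for 1- and 2-dimensional arrays
Vector{Float64} === Array{Float64, 1}   # true
Matrix{Float64} === Array{Float64, 2}   # true
```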

Let us revisit vectors and introduce some additional functionality. Along the way I will stress differences between Julia and Python.

In [None]:
fish_weights = [100, 85, 96.5, 102]

4-element Vector{Float64}:
 100.0
  85.0
  96.5
 102.0

Equivalently you could do

In [None]:
Float64[100, 85, 96.5, 102]

4-element Vector{Float64}:
 100.0
  85.0
  96.5
 102.0

As you might have noticed, even though we wrote the vector horizontally, Julia displays it as a column. A `Vector` in Julia is one-dimensional, and printing it one element per line is simply the display convention.
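To make the distinction concrete, here is a minimal sketch contrasting commas and spaces inside the brackets. The names `col` and `row` are chosen only for this illustration.

```julia
col = [100, 85, 96.5, 102]   # commas: a 4-element Vector (1 dimension)
row = [100 85 96.5 102]      # spaces: a 1×4 Matrix (2 dimensions)

size(col)    # (4,)
size(row)    # (1, 4)

# A Vector and a 1×4 Matrix are different objects, even with the same values
col == row        # false
col == vec(row)   # true: vec flattens the matrix into a vector
```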

Now let's look at some built-in Julia functions that are useful with vectors and matrices.

In [None]:
size(fish_weights)

(4,)

In [None]:
size(fish_weights,1)

4

In [None]:
size(fish_weights,2)

1

In [None]:
length(fish_weights)

4

Unlike Python, indexing in Julia starts from 1. In R indexing also starts from 1.

In [None]:
fish_weights[1]

100.0

Equivalently

In [None]:
fish_weights[begin]

100.0

The following is convenient when we are dealing with large vectors.

In [None]:
fish_weights[end]

102.0

In [None]:
fish_weights[end-1]

96.5

Here it's good to stress a difference from Python, where `-` and a number can be used to index elements from the end. In Julia you can do this only in combination with `end`, as the code above shows.
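The sketch below shows what happens with a bare negative index. It uses a copy of the values named `weights`, a name introduced only for this example.

```julia
weights = [100, 85, 96.5, 102]   # same values as fish_weights above

weights[end]         # last element, 102.0
weights[end-2]       # third from the end, 85.0
weights[end-1:end]   # last two elements, [96.5, 102.0]

# A negative index is not "counted from the end": it is simply out of bounds
err = try
    weights[-1]
catch e
    e
end
err isa BoundsError   # true
```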

Sometimes it's good to know what type of data our vector holds. We can check that as follows.

In [None]:
eltype(fish_weights)

Float64

An important difference between Python and Julia when working with vectors and matrices is that you don't need any specialized packages to use these structures efficiently. For example, in Python one would have to rely on `NumPy` to do linear algebra efficiently. In Julia, on the other hand, both vectors and matrices are part of the core language. And if you haven't guessed, computations are really fast with Julia!

Slicing vectors in Julia is a bit different compared to Python. In Julia both the beginning and the end of a range are inclusive.

In [None]:
fish_weights[1:3]

3-element Vector{Float64}:
 100.0
  85.0
  96.5
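A few more slicing patterns, as a minimal sketch. The vector `weights` is a stand-in with the same values as `fish_weights`, and `first_two` is a name made up for the example.

```julia
weights = [100, 85, 96.5, 102]

weights[2:end]     # from the 2nd element to the last: [85.0, 96.5, 102.0]
weights[1:2:end]   # every 2nd element: [100.0, 96.5]
weights[end:-1:1]  # reversed: [102.0, 96.5, 85.0, 100.0]

# A slice is a copy, so changing it leaves the original untouched
first_two = weights[1:2]
first_two[1] = 0.0
weights[1]         # still 100.0
```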

Julia has built-in functions that create vectors populated with specified values.

In [None]:
zeros(Int64,6)

6-element Vector{Int64}:
 0
 0
 0
 0
 0
 0

In [None]:
ones(Int64,5)

5-element Vector{Int64}:
 1
 1
 1
 1
 1
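Some related constructors, shown as a short sketch. These are all standard Julia functions; the particular sizes and values are arbitrary.

```julia
zeros(3)            # element type defaults to Float64: [0.0, 0.0, 0.0]
zeros(Int64, 2, 3)  # extra sizes give a 2×3 Matrix{Int64} of zeros
fill(2.5, 4)        # any value: [2.5, 2.5, 2.5, 2.5]
collect(1:5)        # turn a range into a vector: [1, 2, 3, 4, 5]
```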

We can define matrices in a similar way. The key differences here are:
* We use spaces to separate elements within a row
* We use `;` to signify the end of a row

In [None]:
mat_weights = [10.2 11 12.5; 23 9.3 19; 8.5 13 10.3]

3×3 Matrix{Float64}:
 10.2  11.0  12.5
 23.0   9.3  19.0
  8.5  13.0  10.3
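Indexing a matrix follows the same rules as for vectors, with one index per dimension. A minimal sketch, using a copy of the values above under the made-up name `mat`:

```julia
mat = [10.2 11 12.5; 23 9.3 19; 8.5 13 10.3]   # same values as mat_weights

size(mat)     # (3, 3): rows, then columns
mat[2, 3]     # row 2, column 3: 19.0
mat[:, 1]     # first column as a vector: [10.2, 23.0, 8.5]
mat[2, :]     # second row as a vector: [23.0, 9.3, 19.0]
mat[1:2, end] # rows 1 to 2 of the last column: [12.5, 19.0]
```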

We can compute summary statistics from vectors and matrices as follows.

In [None]:
using Statistics

In [None]:
mean(mat_weights)

12.977777777777778

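We can compute the mean of each column by reducing along the first dimension (the rows)

In [None]:
mean(mat_weights; dims=1)

1×3 Matrix{Float64}:
 13.9  11.1  13.9333

Or the mean of each row by reducing along the second dimension (the columns)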

In [None]:
mean(mat_weights; dims=2)

3×1 Matrix{Float64}:
 11.233333333333334
 17.099999999999998
 10.6

Equivalently, using `eachcol` and `eachrow`

In [None]:
map(mean, eachcol(mat_weights))

3-element Vector{Float64}:
 13.9
 11.1
 13.933333333333332

In [None]:
map(mean, eachrow(mat_weights))

3-element Vector{Float64}:
 11.233333333333334
 17.099999999999998
 10.6
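The `dims` keyword names the dimension that gets collapsed, which is easy to mix up. The sketch below uses a small made-up matrix `m` where the results can be checked by hand.

```julia
using Statistics

m = [1 2 3; 4 5 6]          # 2 rows, 3 columns

mean(m; dims=1)             # collapses the rows: 1×3 Matrix [2.5 3.5 4.5]
mean(m; dims=2)             # collapses the columns: 2×1 Matrix [2.0; 5.0]

# vec turns the 1×3 or 2×1 result into a plain vector
vec(mean(m; dims=1)) ≈ map(mean, eachcol(m))   # true
vec(mean(m; dims=2)) ≈ map(mean, eachrow(m))   # true
```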

At this point we can introduce an additional way of looping in Julia: comprehensions, which are similar to those in Python.

In [None]:
[mean(col) for col in eachcol(mat_weights)]

3-element Vector{Float64}:
 13.9
 11.1
 13.933333333333332

Keep in mind that comprehensions are not restricted to matrices; they can be used with any collection. We will see more examples throughout the course.
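A few comprehensions over ranges, as a minimal sketch. The names `squares`, `evens` and `table` are made up for this example.

```julia
squares = [i^2 for i in 1:5]                 # [1, 4, 9, 16, 25]
evens   = [i for i in 1:10 if iseven(i)]     # [2, 4, 6, 8, 10]

# Two loop variables give a matrix: entry (i, j) is i * j
table = [i * j for i in 1:3, j in 1:2]       # 3×2 Matrix{Int64}
```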

Let's make a second matrix and perform some calculations.

In [None]:
more_weights = [21 12 32; 51 29 11; 18 39 42]

3×3 Matrix{Int64}:
 21  12  32
 51  29  11
 18  39  42

Let's see some simple linear algebra below

In [None]:
mat_weights + more_weights

3×3 Matrix{Float64}:
 31.2  23.0  44.5
 74.0  38.3  30.0
 26.5  52.0  52.3

In [None]:
mat_weights - more_weights

3×3 Matrix{Float64}:
 -10.8   -1.0  -19.5
 -28.0  -19.7    8.0
  -9.5  -26.0  -31.7

In [None]:
mat_weights * more_weights

3×3 Matrix{Float64}:
 1000.2   928.9   972.4
 1299.3  1286.7  1636.3
 1026.9   880.7   847.6

In case you haven't noticed, the previous cell performed matrix multiplication and not the elementwise product, as someone with a background in R or Python might have expected.

If we wanted the elementwise product we would have to use the following syntax. More about this cryptic `.` comes later!

In [None]:
mat_weights .* more_weights

3×3 Matrix{Float64}:
  214.2  132.0  400.0
 1173.0  269.7  209.0
  153.0  507.0  432.6
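The difference between the two products is easier to verify by hand on small integer matrices. A sketch with made-up matrices `A`, `B` and `C`:

```julia
A = [1 2; 3 4]
B = [5 6; 7 8]

A * B    # matrix product: [19 22; 43 50]
A .* B   # elementwise product: [5 12; 21 32]

# The matrix product needs compatible shapes (columns of A == rows of C)
C = [1 2 3; 4 5 6]     # 2×3
size(A * C)            # (2, 3)
```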

If we want to perform more advanced matrix algebra we need to load the `LinearAlgebra` standard library, as below.

In [None]:
using LinearAlgebra

Trace of a matrix

In [None]:
tr(mat_weights)

29.8

Inverse of a matrix

In [None]:
mat_weights_inv = inv(mat_weights)

3×3 Matrix{Float64}:
 -0.400415   0.130285     0.245609
 -0.199665  -0.00315121   0.248125
  0.582444  -0.10354     -0.418766

In [None]:
mat_weights * mat_weights_inv

3×3 Matrix{Float64}:
 1.0          4.44089e-16   0.0
 1.77636e-15  1.0          -8.88178e-16
 0.0          4.44089e-16   1.0
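The product above is the identity matrix up to floating-point rounding error. One way to check this, sketched here with a copy of the values named `mat` (a name used only in this example), is to compare with `≈` instead of `==`.

```julia
using LinearAlgebra

mat = [10.2 11 12.5; 23 9.3 19; 8.5 13 10.3]   # same values as mat_weights

identity3 = Matrix{Float64}(I, 3, 3)   # explicit 3×3 identity matrix

# The tiny off-diagonal values are rounding error,
# so compare with ≈ (isapprox) rather than ==
mat * inv(mat) ≈ identity3    # true
```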

If we wanted to estimate the correlations between corresponding columns of the two matrices above:

In [None]:
[cor(mat_weights[:,i], more_weights[:,i]) for i in 1:size(mat_weights,2)]

3-element Vector{Float64}:
  0.9996837815346159
  0.40940016483835484
 -0.9971210441584166

Simple linear regression `y = X * B`

In [None]:
X = [ones(2) [12, 8]]

2×2 Matrix{Float64}:
 1.0  12.0
 1.0   8.0

In [None]:
y = [0.2, 0.25]

2-element Vector{Float64}:
 0.2
 0.25

In [None]:
B = X \ y

2-element Vector{Float64}:
  0.35
 -0.012499999999999997

In [None]:
X * B

2-element Vector{Float64}:
 0.2
 0.25
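With two observations and two coefficients the fit above is exact. When there are more observations than coefficients, `\` returns the least-squares solution instead. The sketch below uses invented data (`x_obs`, `y_obs`) purely for illustration.

```julia
# Hypothetical data: 4 observations of one predictor
x_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.1, 3.9, 6.2, 7.8]

X_obs = [ones(length(x_obs)) x_obs]   # intercept column plus predictor
B_obs = X_obs \ y_obs                 # least-squares [intercept, slope]

# For a full-rank X this matches the normal equations (X'X) B = X'y
B_obs ≈ (X_obs' * X_obs) \ (X_obs' * y_obs)   # true

# Residuals are no longer all zero: one line cannot pass through every point
residuals = y_obs - X_obs * B_obs
```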

### Vectorised operations

In [None]:
fish_weights

4-element Vector{Float64}:
 100.0
  85.0
  96.5
 102.0

Let's say we want to add `2` to each element of the vector above. In `R`, for example, we could accomplish it as follows. However...

In [None]:
fish_weights + 2

MethodError: MethodError: no method matching +(::Vector{Float64}, ::Int64)
For element-wise addition, use broadcasting with dot syntax: array .+ scalar
The function `+` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...)
   @ Base operators.jl:596
  +(!Matched::Base.CoreLogging.LogLevel, ::Integer)
   @ Base logging/logging.jl:132
  +(!Matched::Complex{Bool}, ::Real)
   @ Base complex.jl:323
  ...


Julia is a bit special in this respect. The solution is simple but a bit cryptic at first sight.

In [None]:
f = fish_weights .+ 2

4-element Vector{Float64}:
 102.0
  87.0
  98.5
 104.0

In [None]:
f

4-element Vector{Float64}:
 102.0
  87.0
  98.5
 104.0

There are a few exceptions to the rule above, as we see below: multiplying a vector by a scalar works without the dot. In general, however, when we want to perform elementwise calculations of this sort we need to use the `.` syntax.

In [None]:
fish_weights * 2

4-element Vector{Float64}:
 200.0
 170.0
 193.0
 204.0
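The sketch below collects which scalar operations work without the dot, and shows the `@.` macro as a shorthand. It uses a stand-in vector `weights` with the same values as `fish_weights`.

```julia
weights = [100, 85, 96.5, 102]

# Scalar multiplication and division are defined for vectors without the dot
weights * 2      # [200.0, 170.0, 193.0, 204.0]
weights / 2      # [50.0, 42.5, 48.25, 51.0]

# The dotted forms give the same result and also work for +, -, ^
weights .* 2 == weights * 2      # true
weights .- 2                     # [98.0, 83.0, 94.5, 100.0]

# @. adds a dot to every operation in the expression
scaled = @. (weights - 2) / 10   # same as (weights .- 2) ./ 10
```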

Similarly, when we want to apply a function to each element of a collection we follow the same syntax. That holds regardless of whether it's a built-in function or a function we defined ourselves.

In [None]:
function expontiate(x, power=2)
    return x^power
end

expontiate (generic function with 2 methods)

Let's try to use this function on a vector.

In [None]:
fish_weights

4-element Vector{Float64}:
 100.0
  85.0
  96.5
 102.0

In [None]:
expontiate(fish_weights)

MethodError: MethodError: no method matching ^(::Vector{Float64}, ::Int64)
The function `^` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  ^(!Matched::Regex, ::Integer)
   @ Base regex.jl:895
  ^(!Matched::BigFloat, ::Union{Int16, Int32, Int64, Int8})
   @ Base mpfr.jl:717
  ^(!Matched::BigFloat, ::Integer)
   @ Base mpfr.jl:729
  ...


In [None]:
expontiate.(fish_weights)

4-element Vector{Float64}:
 10000.0
  7225.0
  9312.25
 10404.0
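The dot syntax also works when a function takes several arguments: vectors are traversed element by element while scalars are reused for every element. A minimal sketch, where `rescale` and `weights` are hypothetical names introduced only for this illustration:

```julia
# Hypothetical helper, defined only for this illustration
rescale(x, lo, hi) = (x - lo) / (hi - lo)

weights = [100, 85, 96.5, 102]

# The scalars 85 and 102 are reused for every element of the vector
scaled = rescale.(weights, 85, 102)   # smallest value maps to 0.0, largest to 1.0

# Built-in functions and anonymous functions broadcast the same way
floor.(weights)                # [100.0, 85.0, 96.0, 102.0]
(w -> w / 1000).(weights)      # divide each element by 1000
```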

## Exercises

### Exercise 1

Create a vector of 10 integers. Raise each element to the power of 3.

### Exercise 2

Create a matrix `X` that has 4 rows and 3 columns populated by floats. Create a second matrix `Y` that contains the square roots of `X`.

### Exercise 3

Using matrix `X` from Exercise 2, compute the geometric mean of each column and each row separately.

### Exercise 4

Write a function that takes a matrix and a vector as arguments and computes R_square, also known as the coefficient of determination. R_square is a very common metric when fitting a linear model. It indicates how well the recorded variable(s), or features, explain the variance of a variable we want to gain insights about, e.g. for prediction purposes. R_square is computed by subtracting from 1 the ratio of the residual sum of squares to the total sum of squares (https://en.wikipedia.org/wiki/Coefficient_of_determination). In the function, treat the vector as the response variable and the matrix as containing the features. Keep in mind that the coefficient of determination has many limitations and generally limited value in practical applications, but that is more of a topic for a statistics course.