In [None]:
# Setting up a custom stylesheet in IJulia
# New in 0.6
file = open("style.css") # A .css file in the same folder as this notebook file
styl = readstring(file) # Read the file using readstring instead of readall
HTML("$styl") # Output as HTML

# Collections

<h2>In this lesson</h2>

- [Introduction](#Introduction)
- [Importing the packages for this lesson](#Importing-the-packages-for-this-lesson)
- [Outcome](#Outcome)
- [Arrays](#Arrays)
    - [Creation](#Creation-of-arrays)
    - [Slicing](#Slicing)
    - [Modification](#Modification)
    - [Comprehension](#Comprehension)
    - [Simple calculations using arrays](#Simple-calculations-using-arrays)
    - [NA](#NA)
- [Tuples](#Tuples)
- [Dictionaries](#Dictionaries)

<hr>
<h2>Introduction</h2>

Collections are simply groups of elements.  These elements can be of different Julia types.  Storing elements as collections is one of the most common tasks in scientific computing.  It allows us to subject these objects to a great variety of manipulations and analyses.  We have already seen how the storage of data point elements from an Ebola epidemic can allow us to create models and plot figures.

The most common collection is an **array**.  Arrays are mutable, which means that they can be altered after creation, making them far more useful as objects that we can manipulate.  **Tuples** are also collections of elements, but differ from arrays in fundamental ways.  We will have a look at the differences that make them useful.  The final collection we will discuss is the **dictionary**.  Dictionaries are once again groups of elements, but in this type of collection each element has not only a value (as usual) but also a key.

[Back to the top](#In-this-lesson)

<hr>
<h2>Importing the packages for this lesson</h2>

In [None]:
using DataFrames

[Back to the top](#In-this-lesson)

<hr>
<h2>Outcome</h2>

After successfully completing this lecture, you will be able to:

- Create arrays
- Select certain parts of an array
- Modify the elements in an array, i.e. add and delete certain elements
- Use built-in functions on arrays
- Deal with NaN values
- Create and understand the place of tuples
- Create dictionaries

[Back to the top](#In-this-lesson)

<hr>
<h2>Arrays</h2>

### Creation of arrays

To recap what we have learned so far, we can create an array by simply separating values with commas and placing them inside square brackets.  Let's create an array called `array1`.

In [None]:
# Array with three elements
array1 = [1, 2, 3]

This would be akin to what we would consider a column vector in mathematics.  We could also create a row vector by omitting the commas.

In [None]:
array2 = [1 2 3]

We could transpose `array1`, which would make it similar to array `array2`.  Remember that a transpose action interchanges rows and columns.

In [None]:
# Syntax one
transpose(array1)

In [None]:
# Syntax 2
array1'

In [None]:
# Is the transpose equal to the row array
array1' == array2

We used integers as element values.  Arrays will take on the type of the element that is *more encompassing*.  In the example below, two values are of `Int64` type, but the last is of `Float64` type.  The resulting array will have the `Float64` element type.

In [None]:
array3 = [1, 2, 3.0]
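
To make the promotion rule concrete, here is a minimal sketch that inspects the element type with `eltype()`.  The variable names are illustrative only and are not used elsewhere in this lesson; the code assumes nothing beyond base Julia.

```julia
# Element types chosen by Julia for a few literal arrays
ints   = [1, 2, 3]        # all integers
mixed  = [1, 2, 3.0]      # one float promotes every element to Float64
anyarr = [1, "two", 3.0]  # no common numeric type, so the element type is Any

println(eltype(ints))
println(eltype(mixed))
println(eltype(anyarr))
println(mixed[1])         # the integer 1 is now stored as the float 1.0
```

Note that the promotion changes the stored values themselves, not just the label on the array.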

We can create arrays with more than just a single row or column of elements.  This would be akin to a mathematical matrix.  We can even create multidimensional arrays.  We achieve this by nesting our square brackets and playing with commas and semicolons.

In [None]:
# Note the omission of commas between the inner nested square brackets
array4 = [[1, 2, 3] [4, 5, 6] [7, 8, 9]]

In [None]:
# To fill the array row by row, we separate the rows with semicolons
array5 = [[1 2 3]; [4 5 6]; [7 8 9]]

In [None]:
# Checking how many elements we have
length(array5)

In [None]:
# The order of these elements
for i in 1:length(array5)
    println("Element $(i) is ", array5[i])
end

In [None]:
# Checking the transpose again
array4' == array5
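
The element order printed by the loop above follows from Julia storing arrays column by column.  A small sketch on a hypothetical 2 x 3 matrix `m` (not part of the lesson's numbered arrays) shows the same behaviour:

```julia
# A small 2 x 3 matrix written row by row
m = [1 2 3; 4 5 6]

# A single (linear) index walks down each column in turn
println(m[2])            # second element in column-major order, i.e. row 2, column 1
println(m[2] == m[2, 1])
println(vec(m))          # all elements flattened in storage order
```

This is why `array4` and `array5` hold the same numbers but in transposed positions.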

Repeating values is easily accomplished with the `repmat([arrayvalues], number of repetitions)` function.

In [None]:
# Repeating the values 1 and 2 along a column
repmat([1, 2], 3)

In [None]:
# ...or each value as a row
repmat([1 2], 3)

We can use built-in functions to create arrays.  These can be used in other ways as we will see later.

The `linspace(start value, end value, number of values)` function returns a range of evenly spaced values that can be used like an array.

In [None]:
# The values 0 to 10
linspace(0, 10, 11)

To access the actual values, we use the `collect()` function.

In [None]:
array6 = collect(linspace(0, 10, 11))

The `logspace(i, f)` function returns a collection of logarithmically spaced values from $ {10}^{i} $ to $ {10}^{f} $, with $ 50 $ values by default.

In [None]:
# i = 2 and f = 3
array7 = collect(logspace(2, 3))

We can also use the `range(start value, size of steps, number of values to return)` function.

In [None]:
array8 = collect(range(0, 2, 10))

We also have the UnitRange / StepRange shortcut with start, step, and stop values.

In [None]:
# Omitting the middle value (the step) leaves the default step size of 1
array9 = collect(0:1:5)

In [None]:
# Just checking the type
typeof(0:1:5)

In [None]:
typeof(0:5)
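
Ranges written with the colon syntax store only their start, step, and stop values until we ask for the elements.  The sketch below, using a hypothetical range `r`, shows that a range can still be measured and indexed like an array:

```julia
r = 0:2:10                 # start at 0, step by 2, stop at 10

println(length(r))         # number of values the range describes
println(r[3])              # ranges can be indexed like arrays
println(collect(r))        # materialise the values as an array

# The two colon forms produce different range types
println(isa(0:5, UnitRange))
println(isa(r, StepRange))
```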

Finally, we recap other ways of creating arrays that we have encountered in this course.

In [None]:
# Creating an uninitialized array with three rows and three columns
# With an abstract element type the entries are left undefined (#undef)
array10 = Array{Integer}(3, 3)

In [None]:
# Specifying a concrete (numerical) type leaves arbitrary values from uninitialized memory
array11 = Array{Int64}(3, 3)

Lastly in this section we will take a look at reshaping an array.  Given an array of elements, we can simply change its dimensions.

In [None]:
# Remember array8
array8

In [None]:
# Changing it into an array with 2 rows and 5 columns
reshape(array8, 2, 5)
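
Two details of `reshape()` are easy to miss: the new shape is filled column by column, and for an ordinary array the result shares its memory with the original.  A short sketch with hypothetical names `v` and `m`:

```julia
v = collect(1:6)         # the values 1 to 6 as a plain array
m = reshape(v, 2, 3)     # two rows, three columns, filled column by column

println(m)
println(m == [1 3 5; 2 4 6])

# Changing the reshaped array also changes the original
m[1, 1] = 100
println(v[1])
```

If an independent copy is needed, `copy(v)` can be reshaped instead.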

[Back to the top](#In-this-lesson)

### Slicing

When arrays grow larger and larger, we may want to select only sections of them, based on some rule or rules.  In this section we recap some of the ways in which this can be achieved and add some new ones.

The `rand()` function returns a random value between $ 0 $ and $ 1 $.  We can also specify values to select from.  In the example below we choose integers in the range $ \left[ 10, 20 \right] $ and create an array with ten rows and five columns.

In [None]:
array12 = rand(10:20, 10, 5)

In [None]:
# Selecting all row values in column 2
array12[:, 2]

In [None]:
# All row values in columns 2 and 5
array12[:, [2, 5]]

In [None]:
# All row values in columns 2, 3, and 4
array12[:, 2:4]

In [None]:
# Values in rows 2, 4, 6, and in columns 1 and 5
array12[[2, 4, 6], [1, 5]]

In [None]:
# Values in row 1 from column 3 to the last column
array12[1, 3:end]

Now we take a look at applying some rules.  Note how we use the element-wise logic operator in the form of the dot.  By using the `find()` function we can return the index values.

In [None]:
# Boolean logic (returning only true and false)
array12[:, 1] .> 12

In [None]:
# Returning the index of true values using find()
find(array12[:, 1] .> 12)
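
Besides returning index values, a Boolean array can be used directly as an index to pull out the matching elements.  This is a small sketch on a hypothetical vector `v` with fixed values, so that the result does not depend on `rand()`:

```julia
v = [11, 15, 12, 20, 10]

mask = v .> 12           # element-wise comparison gives true/false per element
selected = v[mask]       # keep only the elements where the mask is true

println(mask)
println(selected)
println(sum(mask))       # counting the true values
```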

[Back to the top](#In-this-lesson)

### Modification

Collections of elements in the form of arrays become much more useful if we can alter the actual values.  We'll start off by adding elements to the end of an existing array.  This can be done with the `push!()` function.

In [None]:
array13 = [1, 2, 3, 4]

In [None]:
# Adding the value 5 at the end of the array
push!(array13, 5)

We can add a value at the start of an array using the `unshift!()` function.

In [None]:
# Adding the value 0 at the start of the array
unshift!(array13, 0)

We can remove the last element with the `pop!()` function.

In [None]:
# Only the removed value will be displayed
pop!(array13)

In [None]:
# Checking to see if it is still there
array13

We can also remove the first element with the `shift!()` function.

In [None]:
shift!(array13)

In [None]:
array13

We can change any of the values in the array by referencing its index.

In [None]:
# Changing the second value from 2 to 1000
array13[2] = 1000

In [None]:
array13
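
The functions above work at the two ends of an array.  As a hedged sketch of what else is available, base Julia also provides `append!()`, `insert!()`, and `deleteat!()` for working with several values or with positions in the middle.  The vector `v` is hypothetical and separate from `array13`:

```julia
v = [10, 20, 30]

append!(v, [40, 50])     # add several values at the end
println(v)

insert!(v, 2, 15)        # place the value 15 at index 2, shifting the rest along
println(v)

deleteat!(v, 1)          # remove the element at index 1
println(v)
```

As with `push!()`, the exclamation mark signals that the function changes its first argument in place.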

[Back to the top](#In-this-lesson)

### Comprehension

Comprehension allows us to create arrays in a much more compact fashion than using `for` loops.  Let's take a look at an example using a UnitRange.

In [None]:
# Creating an empty array
array14 = []

In [None]:
# Filling the array with a for loop
for i in 1:5 # Start at 1, step up by 1, end at 5
    push!(array14, 3 * i) # Take each element and multiply by 3
end

array14

Now for the more compact syntax using list comprehension.

In [None]:
array15 = [3 * i for i in 1:5]

In [None]:
# They are the same
array14 == array15

Here are some more examples.

In [None]:
[n^2 for n in 1:3]

In [None]:
[a * b for a in 1:3, b in 1:3]
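
A comprehension can also carry a condition, so that only some values are kept.  The following sketch assumes a Julia version that supports the trailing `if` clause (0.5 or later, as far as we are aware); the names are illustrative:

```julia
# Keep only the even numbers from 1 to 10
evens = [n for n in 1:10 if n % 2 == 0]
println(evens)

# Square only the odd numbers
odd_squares = [n^2 for n in 1:10 if isodd(n)]
println(odd_squares)
```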

[Back to the top](#In-this-lesson)

### Simple calculations using arrays

Julia has a large set of functions that act on arrays, allowing for simple calculations.  Let's have a look at some of them.

In [None]:
# Remember array15
array15

In [None]:
# Summing the elements of the array
sum(array15)

In [None]:
# The average value of the elements in the array
mean(array15)

In [None]:
# The standard deviation of the values in the array
std(array15)

We can perform actions on elements of an array too.  First off, let's take a look at element-wise operations.

In [None]:
array16 = [1, 2, 3]

In [None]:
# Adding 5 to each element (the dot makes the addition element-wise)
5 .+ array16

In [None]:
# Multiplying each element by 3
3 * array16

In [None]:
# Creating a new array
array17 = [4, 5, 6]

In [None]:
array16 + array17

Multiplying arrays is done according to the rules of matrix multiplication.  This differs from simply multiplying each element in one array by the similarly indexed value in another array.  Multiplying two arrays using `array16 * array17` will lead to this error:
```
LoadError: MethodError: `*` has no method matching *(::Array{Int64,1}, ::Array{Int64,1})
Closest candidates are:
  *(::Any, ::Any, !Matched::Any, !Matched::Any...)
  *{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},S}(!Matched::Union{DenseArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2},SubArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, ::Union{DenseArray{S,1},SubArray{S,1,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}})
  *{TA,TB}(!Matched::Base.LinAlg.AbstractTriangular{TA,S<:AbstractArray{T,2}}, ::Union{DenseArray{TB,1},DenseArray{TB,2},SubArray{TB,1,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD},SubArray{TB,2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}})
  ...
while loading In[68], in expression starting on line 1
```
We need to specify that we need an element-wise operation.

In [None]:
# Using the dot
array16 .* array17

In [None]:
# Matrix multiplication using the transpose of array17
array16 * array17'
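
To keep the three kinds of product apart, here is a small sketch with hypothetical vectors `a` and `b` holding the same values as `array16` and `array17`.  It uses only base operators:

```julia
a = [1, 2, 3]
b = [4, 5, 6]

elementwise = a .* b         # multiply matching positions
inner = sum(a .* b)          # add those products up: the dot (inner) product
outer = a * b'               # column times row: a 3 x 3 matrix (outer product)

println(elementwise)
println(inner)
println(size(outer))
println(outer[2, 3] == a[2] * b[3])
```

Each entry of the outer product is the product of one element of `a` with one element of `b`, which is why a three-element column times a three-element row gives nine values.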

[Back to the top](#In-this-lesson)

### NA

When dealing with data we often come across empty elements or missing data.  Julia's floating-point types include the special `NaN` (not a number) value, which often stands in for such elements.

In [None]:
# The NaN type
typeof(NaN)

In [None]:
# It is a curious beast
NaN == NaN

In [None]:
# It poisons arrays
array18 = [1, 2, NaN, 4, 5]

In [None]:
# The sum of the elements
sum(array18)

We can simply delete the element or change its value.  In very large arrays they may be difficult to spot, though.

In [None]:
# Finding NaN values
isnan.(array18)
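
Once the `NaN` values can be located, they can also be filtered out before a calculation.  This is a minimal sketch using base Julia's `filter()` on a hypothetical vector `readings`:

```julia
readings = [1.0, 2.0, NaN, 4.0, 5.0]

# The NaN spreads into any sum that includes it
println(isnan(sum(readings)))

# Keep only the elements that are not NaN
clean = filter(x -> !isnan(x), readings)
println(clean)
println(sum(clean))
```

Note that testing with `x == NaN` would not work here, since `NaN` is not equal to itself.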

Using the DataArrays or DataFrames packages allows us to deal with these empty values.

In [None]:
array19 =  @data([1, 2, 3, 4, NA, 7, NA, 2])

In [None]:
array19_clean = dropna(array19)

[Back to the top](#In-this-lesson)

<hr>
<h2>Tuples</h2>

Tuples are lists of values. They are enclosed in parentheses, `()`. Tuples can be heterogeneous and, unlike arrays, are immutable (they cannot be changed).

In [None]:
# Creating a tuple of heterogeneous types
tup = (1, 2, 3, "hello")

In [None]:
# Type of each element
typeof(tup)

In [None]:
# For loop to look at value and type of each element
for i in 1:length(tup)
    println(" The value of the tuple at index number $(i) is $(tup[i]) and the type is $(typeof(tup[i])).")
end

The elements of a tuple can be assigned to individual variable names in a single statement.

In [None]:
a, b, c, seven = (1, 3, 5, 7)

In [None]:
a

In [None]:
seven

Tuples can also be sliced.

In [None]:
tup2 = (1, 3, 5, 7, 9, 11, 13, 15);

In [None]:
# Last element
tup2[end]

In [None]:
# The type of a slice (elements 2, 3, and 4), which is again a tuple
typeof(tup2[2:4])

In [None]:
# Reversing the order
tup2[end:-1:1]

Tuples really are immutable. The following line of code, `tup2[1] = 5`, will result in the error:
```
LoadError:
MethodError: `setindex!` has no method matching setindex!(::Tuple{Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64}, ::Int64, ::Int64)
while loading In[173], in expression starting on line 1.
```
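
To see this behaviour without stopping the notebook, the assignment can be wrapped in a `try`/`catch` block.  The helper `try_assign` below is a hypothetical function written for this illustration only; it is a sketch, not a recommended pattern.

```julia
# Hypothetical helper: try to overwrite the first element and report what happened
function try_assign(collection)
    try
        collection[1] = 5
        return "assignment succeeded"
    catch err
        # Tuples have no setindex! method, so Julia raises a MethodError
        return isa(err, MethodError) ? "MethodError" : "other error"
    end
end

t = (1, 3, 5)
println(try_assign(t))           # the tuple rejects the assignment
println(try_assign([1, 3, 5]))   # an array accepts it

# Instead of changing a tuple, we build a new one from its parts
t_new = (5, t[2:end]...)
println(t_new)
```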

We can have tuples of tuples. Indexing into these is a bit different.

In [None]:
tup3 = ((1, 2, 3), 1, 2, (3, 100, 1))

In [None]:
# Finding values at index 1
tup3[1]

In [None]:
# Finding value at index 2 of index 4, which if you count out would be the odd-ball 100
tup3[4][2]

[Back to the top](#In-this-lesson)

<hr>
<h2>Dictionaries</h2>

Dictionaries are collections of key-value pairs, each of the form `key => value`.

In [None]:
dict1 = Dict(1 => 77, 2 => 66, 3 => 1)

In [None]:
# The => is shorthand for the Pair() function
dict1 = Dict(Pair(1,100), Pair(2,200), Pair(3,300))

The following is an example of a dictionary that accepts keys and values of any type.

In [None]:
dict2 = Dict{Any, Any}(1 => 77, 2 => 66, 3 => "three")

In [None]:
# We can get a bit crazy
dict3 = Dict{Any, Any}("a" => 1, (2, 3) => "hello")

Where keys would otherwise be characters or strings, we can also use symbols, which are written with a leading colon, `:`.

In [None]:
dict4 = Dict(:A => 300, :B => 305, :C => 309)

We can access an element of a dictionary by its key.

In [None]:
dict4[:B]

When the keys are not known in advance, it is customary to include a default message to avoid a Julia error. For this, we use the `get()` function.

In [None]:
# Using get() for a key that does NOT exist
# The string is used when a key is not found
get(dict4, :H, "That key does not exist!")

In [None]:
# Using get() for a key that exists
# The string will not be used, seeing that the key exists
# The value will be returned instead
get(dict4, :B, "That key does not exist!")
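
The default value passed to `get()` need not be a message.  A common use, sketched here with hypothetical data, is counting occurrences: a key that has not been seen yet is treated as having a count of zero.

```julia
performed = ["Colectomy", "Appendectomy", "Colectomy", "Colectomy"]
counts = Dict{String,Int}()

for p in performed
    # Look up the current count, using 0 if the key is not there yet
    counts[p] = get(counts, p, 0) + 1
end

println(counts["Colectomy"])
println(counts["Appendectomy"])
println(get(counts, "Cholecystectomy", 0))   # never seen, so the default is returned
```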

We can check whether a key-value pair or a key exists.

In [None]:
# Using in() to check if a (key, value) pair exists
in((:A => 300), dict4)

In [None]:
# Using haskey() to check if a key exists
haskey(dict4, :D)

It is easy to add to a dictionary.

In [None]:
# Adding a key and value to the dictionary
dict4[:D] = 301

In [None]:
dict4

Dictionaries are mutable, i.e. their values can be changed.

In [None]:
# Changing an existing value
dict4[:C] = 1000
dict4

In [None]:
# Using the delete!() function
delete!(dict4, :A)

Finding more information about a dictionary.

In [None]:
# The keys of a dictionary
keys(dict4)

In [None]:
# The values of a dictionary
values(dict4)

In [None]:
# The length of a dictionary
length(dict4)

A dictionary has no `size()` and no `ndims()`, but we can loop over its key-value pairs.

In [None]:
for (k, v) in dict4
    println("The key $(k) has a value $(v)")
end

A for loop can be used to populate a dictionary.

In [None]:
procedure_vals = ["Appendectomy", "Colectomy", "Cholecystectomy"]
procedure_dict = Dict{AbstractString,AbstractString}()

for (s, n) in enumerate(procedure_vals)
    procedure_dict["x_$(s)"] = n
end

In [None]:
procedure_dict

In [None]:
# Iterating through a dictionary by key
for k in keys(procedure_dict)
    println(k, " is ", procedure_dict[k])
end

In [None]:
# Iterating through a dictionary by (key, value) pair
for (k,v) in procedure_dict
    println(k, " is ",v)
end
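
The same kind of dictionary can be built in one line by passing a generator of pairs to `Dict()`.  This sketch assumes a Julia version that accepts generators in the `Dict` constructor (0.5 or later, to our knowledge); the names `procedures` and `lookup` are illustrative.

```julia
procedures = ["Appendectomy", "Colectomy", "Cholecystectomy"]

# Each (index, name) from enumerate becomes one key => value pair
lookup = Dict("x_$(i)" => name for (i, name) in enumerate(procedures))

println(lookup["x_2"])
println(length(lookup))
```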

A dictionary itself has no guaranteed order, but its keys can be sorted.

In [None]:
dict5 = Dict{AbstractString,Int16}("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)

In [None]:
# Sorting using a for loop
for k in sort(collect(keys(dict5)))
    println("$(k) is $(dict5[k])")
end

In [None]:
# Listing the keys
keys(dict5)

In [None]:
# Collecting the keys into an array
# Note that the output is now an array, which can be sorted
collect(keys(dict5))

In [None]:
# Sorting the array
sort(collect(keys(dict5)))
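
Sorting by value instead of by key follows the same idea: collect the pairs into an array, then tell `sort()` which part of each pair to compare.  A sketch with a hypothetical dictionary `scores`:

```julia
scores = Dict("a" => 3, "b" => 1, "c" => 2)

# collect() turns the dictionary into an array of key => value pairs
# The by keyword picks the value (the second field) of each pair for comparison
by_value = sort(collect(scores), by = p -> p.second)

# Reading the keys back in value order
ordered_keys = [p.first for p in by_value]
println(ordered_keys)
```

The dictionary `scores` itself is left unchanged; only the collected array is ordered.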

[Back to the top](#In-this-lesson)