In [1]:
# Setting up a custom stylesheet in IJulia
styl = open(readall, "style.css") # Read a .css file in the same folder as this notebook; the file is closed afterwards
HTML(styl) # Output as HTML

# Collections

<h2>In this lesson</h2>

- [Introduction](#Introduction)
- [Importing the packages for this lesson](#Importing-the-packages-for-this-lesson)
- [Outcome](#Outcome)
- [Arrays](#Arrays)
    - [Creation](#Creation-of-arrays)
    - [Slicing](#Slicing)
    - [Modification](#Modification)
    - [Comprehension](#Comprehension)
    - [Simple calculations using arrays](#Simple-calculations-using-arrays)
    - [NA](#NA)
- [Tuples](#Tuples)
- [Dictionaries](#Dictionaries)

<hr>
<h2>Introduction</h2>

Collections are simply groups of elements.  These elements can be of different Julia types.  Storing elements as collections is one of the most common tasks in scientific computing.  It allows us to subject these objects to a great variety of manipulations and analyses.  We have already seen how storing data points from an Ebola epidemic allowed us to create models and plot figures.

The most common collection is the **array**.  Arrays are mutable, which means that their elements can be changed after the array is created.  This makes them very flexible objects to work with.  **Tuples** are also collections of elements, but they differ from arrays in fundamental ways, most importantly in being immutable.  We will have a look at the differences that make them useful.  The final collection we will discuss is the **dictionary**.  Dictionaries are also groups of elements, but in this type of collection each element consists of a value together with a key that is used to look it up.

[Back to the top](#In-this-lesson)

<hr>
<h2>Importing the packages for this lesson</h2>

In [2]:
using DataFrames

[Back to the top](#In-this-lesson)

<hr>
<h2>Outcome</h2>

After successfully completing this lecture, you will be able to:

- Create arrays
- Select certain parts of an array
- Modify the elements in an array, i.e. add and delete elements
- Use built-in functions on arrays
- Deal with `NaN` and `NA` values
- Create tuples and understand where they are useful
- Create dictionaries

[Back to the top](#In-this-lesson)

<hr>
<h2>Arrays</h2>

### Creation of arrays

To recap what we have learned so far, we can create an array by simply separating values with commas and placing them inside square brackets.  Let's create an array called `array1`.

In [3]:
# Array with three elements
array1 = [1, 2, 3]

3-element Array{Int64,1}:
 1
 2
 3

This is akin to what we would consider a column vector in mathematics.  We could also create a row vector by separating the values with spaces instead of commas.

In [4]:
array2 = [1 2 3]

1x3 Array{Int64,2}:
 1  2  3

We could transpose `array1`, which would make it equal to `array2`.  Remember that transposition interchanges rows and columns.

In [5]:
# Syntax 1
transpose(array1)

1x3 Array{Int64,2}:
 1  2  3

In [6]:
# Syntax 2
array1'

1x3 Array{Int64,2}:
 1  2  3

In [7]:
# Is the transpose equal to the row array
array1' == array2

true

So far we have used integers as element values.  When element types are mixed, the array takes on the type that is *more encompassing*.  In the example below, the first two values are of `Int64` type, but the last is of `Float64` type, so all the elements are converted to `Float64` and the array has element type `Float64`.

In [8]:
array3 = [1, 2, 3.0]

3-element Array{Float64,1}:
 1.0
 2.0
 3.0
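To confirm which type an array has settled on, we can ask for its element type with `eltype()`.  The minimal sketch below assumes a current Julia 1.x release and also shows `promote_type()`, which reports the common type Julia picks for two types, and a typed array literal that forces the element type.

```julia
# Assumes Julia 1.x
# Mixing Int64 and Float64 literals: the integers are promoted to Float64
mixed = [1, 2, 3.0]
println(eltype(mixed))                   # Float64

# promote_type() reports the common type Julia chooses for two types
println(promote_type(Int64, Float64))    # Float64

# A typed array literal fixes the element type up front
forced = Float64[1, 2, 3]
println(eltype(forced))                  # Float64
```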

We can create arrays with more than just a single row or column of elements.  This would be akin to a mathematical matrix.  We can even create multidimensional arrays.  We achieve this by nesting square brackets: separating the inner arrays with spaces places them side by side as columns, while separating them with semicolons stacks them as rows.

In [9]:
# Note the omission of commas between the inner nested square brackets
array4 = [[1, 2, 3] [4, 5, 6] [7, 8, 9]]

3x3 Array{Int64,2}:
 1  4  7
 2  5  8
 3  6  9

In [10]:
# To fill the matrix row by row, we stack row vectors with semicolons
array5 = [[1 2 3]; [4 5 6]; [7 8 9]]

3x3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9
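The space- and semicolon-separated bracket syntax corresponds to horizontal and vertical concatenation.  A small sketch, assuming Julia 1.x, that builds the same matrices with the `hcat()` and `vcat()` functions and with a flat matrix literal:

```julia
# Assumes Julia 1.x
# hcat() places column vectors side by side, like [[1, 2, 3] [4, 5, 6] [7, 8, 9]]
by_columns = hcat([1, 2, 3], [4, 5, 6], [7, 8, 9])

# vcat() stacks row vectors on top of each other, like [[1 2 3]; [4 5 6]; [7 8 9]]
by_rows = vcat([1 2 3], [4 5 6], [7 8 9])

# The same matrix written directly: spaces separate columns, semicolons separate rows
literal = [1 2 3; 4 5 6; 7 8 9]

println(by_rows == literal)                    # true
println(by_columns == permutedims(literal))    # true
```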

In [11]:
# Checking how many elements we have
length(array5)

9

In [12]:
# Linear indexing runs in column-major order: down each column in turn
for i in 1:length(array5)
    println("Element $(i) is ", array5[i])
end

Element 1 is 1
Element 2 is 4
Element 3 is 7
Element 4 is 2
Element 5 is 5
Element 6 is 8
Element 7 is 3
Element 8 is 6
Element 9 is 9


In [13]:
# Checking the transpose again
array4' == array5

true
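Julia stores arrays in column-major order, which is why linear indexing walks down each column before moving to the next.  A sketch, assuming Julia 1.x, that makes this visible with `vec()` (which flattens an array in memory order) and `CartesianIndices` (which converts a linear index into a row and column):

```julia
# Assumes Julia 1.x
array5 = [1 2 3; 4 5 6; 7 8 9]

# vec() flattens in column-major (memory) order
println(vec(array5))                 # [1, 4, 7, 2, 5, 8, 3, 6, 9]

# Linear index 4 and Cartesian index (1, 2) refer to the same element
println(array5[4] == array5[1, 2])   # true

# CartesianIndices converts a linear index into a (row, column) pair
ci = CartesianIndices(array5)[4]
println(Tuple(ci))                   # (1, 2)
```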

Repeating values is easily accomplished with the `repmat(array, number of repetitions)` function.

In [14]:
# Repeating the values 1 and 2 along a column
repmat([1, 2], 3)

6-element Array{Int64,1}:
 1
 2
 1
 2
 1
 2

In [15]:
# ...or repeating a row vector down three rows
repmat([1 2], 3)

3x2 Array{Int64,2}:
 1  2
 1  2
 1  2
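`repmat()` was removed in Julia 1.0 in favour of `repeat()`.  A minimal sketch of the same two examples, assuming Julia 1.x, which also shows the `inner` keyword for repeating each element in place:

```julia
# Assumes Julia 1.x, where repeat() replaces repmat()
# Three copies of the column vector [1, 2], stacked vertically
col = repeat([1, 2], 3)            # [1, 2, 1, 2, 1, 2]

# Three copies of the 1x2 row, stacked vertically into a 3x2 matrix
rows = repeat([1 2], 3)

# inner= repeats each element before moving on to the next one
each = repeat([1, 2], inner = 3)   # [1, 1, 1, 2, 2, 2]
```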

We can also use built-in functions to create arrays.  These functions have other uses too, as we will see later.

The `linspace(start value, end value, number of values)` function returns evenly spaced values as a range object, which can be used like an array.

In [16]:
# 11 evenly spaced values from 0 to 10
linspace(0, 10, 11)

linspace(0.0,10.0,11)

To access the actual values as an array, we use the `collect()` function.

In [17]:
array6 = collect(linspace(0, 10, 11))

11-element Array{Float64,1}:
  0.0
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
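`linspace()` was removed in Julia 1.0; its job is now done by `range()` with a `length` keyword.  A minimal sketch, assuming Julia 1.1 or later (where `range(start, stop; length)` is accepted):

```julia
# Assumes Julia 1.1 or later
# 11 evenly spaced values from 0 to 10, stored lazily as a range object
r = range(0.0, 10.0, length = 11)

# collect() materializes the range into a Vector{Float64}
array6 = collect(r)
```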

The `logspace(i, f)` function returns $ 50 $ logarithmically spaced values from $ {10}^{i} $ to $ {10}^{f} $ ($ 50 $ is the default number of values).

In [18]:
# i = 2 and f = 3
array7 = collect(logspace(2, 3))

50-element Array{Float64,1}:
  100.0  
  104.811
  109.854
  115.14 
  120.679
  126.486
  132.571
  138.95 
  145.635
  152.642
  159.986
  167.683
  175.751
    ⋮    
  596.362
  625.055
  655.129
  686.649
  719.686
  754.312
  790.604
  828.643
  868.511
  910.298
  954.095
 1000.0  
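`logspace()` was removed in Julia 1.0.  A sketch of the same 50 values, assuming Julia 1.1 or later, raises 10 to an evenly spaced range of exponents using the broadcasting dot:

```julia
# Assumes Julia 1.1 or later
# 50 exponents evenly spaced from 2 to 3
exponents = range(2.0, 3.0, length = 50)

# .^ raises 10 to each exponent, giving values from 10^2 to 10^3
array7 = 10 .^ exponents
```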

We can also use the `range(start value, step size, number of values)` function.

In [19]:
array8 = collect(range(0, 2, 10))

10-element Array{Int64,1}:
  0
  2
  4
  6
  8
 10
 12
 14
 16
 18
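In Julia 1.x, `range()` takes the step and the length as keyword arguments.  A sketch assuming Julia 1.x that rebuilds `array8` and compares it with the equivalent colon syntax:

```julia
# Assumes Julia 1.x
# Start at 0, step by 2, return 10 values
array8 = collect(range(0, step = 2, length = 10))

# The same values with the colon syntax start:step:stop
same = collect(0:2:18)
```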

There is also the colon shortcut, `start:step:stop`, which creates a `StepRange`.  When the step is omitted (`start:stop`), it creates a `UnitRange` with a step of 1.

In [20]:
# 0:1:5 is equivalent to 0:5, since the default step size is 1
array9 = collect(0:1:5)

6-element Array{Int64,1}:
 0
 1
 2
 3
 4
 5

In [21]:
# Just checking the type
typeof(0:1:5)

StepRange{Int64,Int64}

In [22]:
typeof(0:5)

UnitRange{Int64}

Finally, we recap other ways of creating arrays that we have encountered in this course.

In [23]:
# Creating an uninitialized array of three rows and three columns
# With an abstract element type, the entries are #undef (unassigned) references
array10 = Array(Integer, 3, 3)

3x3 Array{Integer,2}:
 #undef  #undef  #undef
 #undef  #undef  #undef
 #undef  #undef  #undef

In [24]:
# With a concrete (numerical) type, the memory is not initialized, so the values are arbitrary
array11 = Array(Int64, 3, 3)

3x3 Array{Int64,2}:
 4513792816  4513120368  4512989360
 4512989312  4513120368  4599372960
 4513120368  4513120352  4513095808
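The `Array(T, dims...)` constructor was replaced in Julia 1.0 by `Array{T}(undef, dims...)`, which makes the uninitialized contents explicit.  When predictable contents are needed, `zeros()` or `fill()` is the safer choice.  A sketch assuming Julia 1.x:

```julia
# Assumes Julia 1.x
# Uninitialized 3x3 matrix: the contents are whatever happened to be in memory
scratch = Array{Int64}(undef, 3, 3)

# Abstract element type: entries are #undef references until assigned
refs = Array{Integer}(undef, 3, 3)
println(isassigned(refs, 1))   # false

# Initialized alternatives
z = zeros(Int64, 3, 3)   # all zeros
f = fill(7, 2, 4)        # 2x4 matrix filled with 7
```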

Lastly in this section, we take a look at reshaping an array.  Given an array of elements, we can change its dimensions, as long as the total number of elements stays the same.

In [25]:
# Remember array8
array8

10-element Array{Int64,1}:
  0
  2
  4
  6
  8
 10
 12
 14
 16
 18

In [26]:
# Changing it into an array with 2 rows and 5 columns (filled column by column)
reshape(array8, 2, 5)

2x5 Array{Int64,2}:
 0  4   8  12  16
 2  6  10  14  18
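`reshape()` fills the new shape in column-major order, and for an ordinary `Array` the result shares its memory with the original, so the two are not independent copies.  A sketch assuming Julia 1.x:

```julia
# Assumes Julia 1.x
array8 = collect(0:2:18)

# 2 rows x 5 columns, filled column by column
m = reshape(array8, 2, 5)
println(m[:, 2])      # [4, 6]

# Changing the reshaped matrix also changes array8, since the data is shared
m[1, 1] = 99
println(array8[1])    # 99

# copy() gives an independent array when one is needed
independent = copy(reshape(array8, 5, 2))
```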

[Back to the top](#In-this-lesson)

### Slicing

As arrays grow larger, we may want to select only sections of them, based on some rule or rules.  In this section we recap some of the ways in which this can be achieved and add some new ones.

The `rand()` function returns a random value in the interval $ \left[ 0, 1 \right) $.  We can also specify values to select from.  In the example below we choose integers in the range $ \left[ 10, 20 \right] $ and create an array with ten rows and five columns.

In [27]:
array12 = rand(10:20, 10, 5)

10x5 Array{Int64,2}:
 17  16  11  17  17
 17  14  10  16  10
 18  13  10  15  20
 12  20  14  10  18
 18  17  17  17  16
 18  18  10  19  14
 15  19  19  14  11
 17  11  19  16  19
 14  13  16  19  11
 16  17  15  11  19

In [28]:
# Selecting all row values in column 2
array12[:, 2]

10-element Array{Int64,1}:
 16
 14
 13
 20
 17
 18
 19
 11
 13
 17

In [29]:
# All row values in columns 2 and 5
array12[:, [2, 5]]

10x2 Array{Int64,2}:
 16  17
 14  10
 13  20
 20  18
 17  16
 18  14
 19  11
 11  19
 13  11
 17  19

In [30]:
# All row values in columns 2, 3, and 4
array12[:, 2:4]

10x3 Array{Int64,2}:
 16  11  17
 14  10  16
 13  10  15
 20  14  10
 17  17  17
 18  10  19
 19  19  14
 11  19  16
 13  16  19
 17  15  11

In [31]:
# Values in rows 2, 4, 6, and in columns 1 and 5
array12[[2, 4, 6], [1, 5]]

3x2 Array{Int64,2}:
 17  10
 12  18
 18  14

In [32]:
# Values in row 1 from column 3 to the last column
array12[1, 3:end]

1x3 Array{Int64,2}:
 11  17  17
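Since Julia 0.5, indexing with a scalar drops that dimension, so in Julia 1.x `array12[1, 3:end]` returns a plain vector rather than the 1x3 matrix shown above.  A sketch assuming Julia 1.x, using a small deterministic matrix in place of the random `array12`:

```julia
# Assumes Julia 1.x
# 3x5 matrix holding 1 to 15 in column-major order
A = collect(reshape(1:15, 3, 5))

# Scalar row index: the result is a 3-element Vector
v = A[1, 3:end]

# A range row index (1:1) keeps the row dimension: the result is a 1x3 Matrix
m = A[1:1, 3:end]
```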

Now we take a look at applying some rules.  Note the dot in the element-wise comparison operator `.>`, which compares every element with the given value.  The `find()` function then returns the indices of the `true` values.

In [33]:
# Boolean logic (returning only true and false)
array12[:, 1] .> 12

10-element BitArray{1}:
  true
  true
  true
 false
  true
  true
  true
  true
  true
  true

In [34]:
# Returning the index of true values using find()
find(array12[:, 1] .> 12)

9-element Array{Int64,1}:
  1
  2
  3
  5
  6
  7
  8
  9
 10
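`find()` was renamed `findall()` in Julia 1.0.  A sketch assuming Julia 1.x, using the first column of `array12` shown above as fixed data; it also shows logical indexing, which returns the matching values rather than their indices:

```julia
# Assumes Julia 1.x
col = [17, 17, 18, 12, 18, 18, 15, 17, 14, 16]

mask = col .> 12          # BitVector of true/false values
idx = findall(mask)       # indices where the mask is true

# Logical indexing returns the matching values themselves
vals = col[mask]
```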

[Back to the top](#In-this-lesson)

### Modification

Collections of elements in the form of arrays become much more useful when we can alter the actual values.  We'll start off by adding elements to the end of an existing array.  This can be done with the `push!()` function.

In [3]:
array13 = [1, 2, 3, 4]

4-element Array{Int64,1}:
 1
 2
 3
 4

In [4]:
# Adding the value 5 at the end of the array
push!(array13, 5)

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

We can add a value at the start of an array using the `unshift!()` function.

In [5]:
# Adding the value 0 at the start of the array
unshift!(array13, 0)

6-element Array{Int64,1}:
 0
 1
 2
 3
 4
 5

We can remove the last element with the `pop!()` function.

In [6]:
# Only the removed value will be displayed
pop!(array13)

5

In [7]:
# Checking to see if it is still there
array13

5-element Array{Int64,1}:
 0
 1
 2
 3
 4

We can also remove the first element with the `shift!()` function.

In [8]:
shift!(array13)

0

In [9]:
array13

4-element Array{Int64,1}:
 1
 2
 3
 4

We can change any of the values in the array by referencing its index.

In [10]:
# Changing the second value from 2 to 1000
array13[2] = 1000

1000

In [11]:
array13

4-element Array{Int64,1}:
    1
 1000
    3
    4
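Two of these functions were renamed in Julia 1.0: `unshift!()` became `pushfirst!()` and `shift!()` became `popfirst!()`.  A sketch assuming Julia 1.x that repeats the steps above, plus `insert!()` and `deleteat!()` for positions in the middle of an array:

```julia
# Assumes Julia 1.x
array13 = [1, 2, 3, 4]
push!(array13, 5)                # [1, 2, 3, 4, 5]
pushfirst!(array13, 0)           # [0, 1, 2, 3, 4, 5]
last_val = pop!(array13)         # removes and returns 5
first_val = popfirst!(array13)   # removes and returns 0
array13[2] = 1000                # [1, 1000, 3, 4]

# insert!(a, index, value) and deleteat!(a, index) work at any position
insert!(array13, 3, 500)         # [1, 1000, 500, 3, 4]
deleteat!(array13, 1)            # [1000, 500, 3, 4]
```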

[Back to the top](#In-this-lesson)

### Comprehension

Comprehensions allow us to create arrays in a much more compact fashion than using `for` loops.  Let's take a look at an example using a UnitRange.

In [12]:
# Creating an empty array
array14 = []

0-element Array{Any,1}

In [13]:
# Filling the array with a for loop
for i in 1:5 # Start at 1, step up by 1, end at 5
    push!(array14, 3 * i) # Take each element and multiply by 3
end

array14

5-element Array{Any,1}:
  3
  6
  9
 12
 15

Now for the more compact syntax using an array comprehension.

In [14]:
array15 = [3 * i for i in 1:5]

5-element Array{Int64,1}:
  3
  6
  9
 12
 15

In [15]:
# They are the same
array14 == array15

true

Here are some more examples.

In [16]:
[n^2 for n in 1:3]

3-element Array{Int64,1}:
 1
 4
 9

In [17]:
[a * b for a in 1:3, b in 1:3]

3x3 Array{Int64,2}:
 1  2  3
 2  4  6
 3  6  9
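Comprehensions also accept an `if` filter, and the same syntax without the brackets gives a generator, which feeds values to a function without building an intermediate array.  The sketch below, assuming Julia 1.x, also shows a typed empty array, `Int[]`, which keeps the element type `Int` instead of the `Array{Any,1}` that `[]` produces.

```julia
# Assumes Julia 1.x
# Keep only the even values
evens = [n for n in 1:10 if iseven(n)]

# Generator: sum of squares without allocating an array
total = sum(n^2 for n in 1:3)

# Typed empty array: push! keeps the element type Int
typed = Int[]
for i in 1:5
    push!(typed, 3 * i)
end
```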

[Back to the top](#In-this-lesson)

### Simple calculations using arrays

Julia has a large set of functions that act on arrays, allowing for simple calculations.  Let's have a look at some of them.

In [18]:
# Remember array15
array15

5-element Array{Int64,1}:
  3
  6
  9
 12
 15

In [19]:
# Summing the elements of the array
sum(array15)

45

In [20]:
# The average value of the elements in the array
mean(array15)

9.0

In [21]:
# The standard deviation of the values in the array
std(array15)

4.743416490252569
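In Julia 1.0 and later, `mean()` and `std()` live in the `Statistics` standard library, which must be loaded first.  A sketch assuming Julia 1.x; note that `std()` divides by $ n - 1 $ by default (the sample standard deviation), and `corrected = false` divides by $ n $ instead.

```julia
# Assumes Julia 1.x
using Statistics

array15 = [3, 6, 9, 12, 15]

s = sum(array15)     # 45
m = mean(array15)    # 9.0
# Sample standard deviation: sqrt(90 / 4)
sd = std(array15)
# Population standard deviation: sqrt(90 / 5)
sd_pop = std(array15, corrected = false)
```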

We can perform actions on elements of an array too.  First off, let's take a look at element-wise operations.

In [22]:
array16 = [1, 2, 3]

3-element Array{Int64,1}:
 1
 2
 3

In [23]:
# Simply adding 5 to each element
5 + array16

3-element Array{Int64,1}:
 6
 7
 8

In [24]:
# Multiplying each element by 3
3 * array16

3-element Array{Int64,1}:
 3
 6
 9

In [25]:
# Creating a new array
array17 = [4, 5, 6]

3-element Array{Int64,1}:
 4
 5
 6

In [26]:
array16 + array17

3-element Array{Int64,1}:
 5
 7
 9

Multiplying arrays with `*` follows the rules of matrix multiplication.  This is different from simply multiplying each element in one array with the similarly indexed element in another array.  Multiplying two column vectors using `array16 * array17` therefore leads to this error:
```
LoadError: MethodError: `*` has no method matching *(::Array{Int64,1}, ::Array{Int64,1})
Closest candidates are:
  *(::Any, ::Any, !Matched::Any, !Matched::Any...)
  *{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},S}(!Matched::Union{DenseArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2},SubArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}}, ::Union{DenseArray{S,1},SubArray{S,1,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}})
  *{TA,TB}(!Matched::Base.LinAlg.AbstractTriangular{TA,S<:AbstractArray{T,2}}, ::Union{DenseArray{TB,1},DenseArray{TB,2},SubArray{TB,1,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD},SubArray{TB,2,A<:DenseArray{T,N},I<:Tuple{Vararg{Union{Colon,Int64,Range{Int64}}}},LD}})
  ...
while loading In[68], in expression starting on line 1
```
We need to specify that we want an element-wise operation.

In [27]:
# Using the dot
array16 .* array17

3-element Array{Int64,1}:
  4
 10
 18

In [28]:
# Matrix multiplication: a 3x1 column times the 1x3 transpose of array17 gives a 3x3 matrix
array16 * array17'

3x3 Array{Int64,2}:
  4   5   6
  8  10  12
 12  15  18
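In Julia 1.x, adding a scalar to an array also requires the dot (`5 .+ array16`); the form `5 + array16` used above now raises a `MethodError`.  A sketch assuming Julia 1.x, using `dot()` from the `LinearAlgebra` standard library for the inner product:

```julia
# Assumes Julia 1.x
using LinearAlgebra

array16 = [1, 2, 3]
array17 = [4, 5, 6]

shifted = 5 .+ array16             # element-wise: [6, 7, 8]
scaled = 3 * array16               # scalar times array needs no dot: [3, 6, 9]
elementwise = array16 .* array17   # [4, 10, 18]
inner = dot(array16, array17)      # 1*4 + 2*5 + 3*6 = 32
outer = array16 * array17'         # 3x1 times 1x3 gives a 3x3 matrix
```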

[Back to the top](#In-this-lesson)

### NA

When dealing with data, we often come across empty elements or missing data.  A simple way to mark a missing floating-point value is `NaN` (not a number), a special floating-point value defined by the IEEE 754 standard.

In [29]:
# The type of NaN
typeof(NaN)

Float64

In [30]:
# It is a curious beast
NaN == NaN

false

In [31]:
# It poisons arrays: the integers are promoted to Float64, and calculations return NaN
array18 = [1, 2, NaN, 4, 5]

5-element Array{Float64,1}:
   1.0
   2.0
 NaN  
   4.0
   5.0

In [32]:
# The sum of the elements
sum(array18)

NaN

We could simply delete the element or change its value.  In very large arrays, however, `NaN` values may be difficult to spot, so we use the `isnan()` function to locate them.

In [33]:
# Finding NaN values
isnan(array18)

5-element Array{Bool,1}:
 false
 false
  true
 false
 false
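Once located, `NaN` values can be filtered out before calculating.  A sketch assuming Julia 1.x, where `isnan` must be broadcast with a dot to apply element-wise:

```julia
# Assumes Julia 1.x
array18 = [1, 2, NaN, 4, 5]

mask = isnan.(array18)              # [false, false, true, false, false]
nan_positions = findall(mask)       # [3]
clean = filter(!isnan, array18)     # drops the NaN entries
total = sum(clean)                  # 12.0
```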

The DataArrays package (loaded along with DataFrames in this lesson) provides a dedicated `NA` value for missing data, together with functions to deal with it.

In [34]:
array19 =  @data([1, 2, 3, 4, NA, 7, NA, 2])

8-element DataArrays.DataArray{Int64,1}:
 1  
 2  
 3  
 4  
  NA
 7  
  NA
 2  

In [35]:
array19_clean = dropna(array19)

6-element Array{Int64,1}:
 1
 2
 3
 4
 7
 2
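The DataArrays package and its `NA` value have since been deprecated; from Julia 0.7 onwards, missing data is represented by the built-in `missing` value.  A sketch of the same cleanup, assuming Julia 1.x:

```julia
# Assumes Julia 1.x
array19 = [1, 2, 3, 4, missing, 7, missing, 2]   # element type Union{Missing, Int64}

# skipmissing() iterates over the non-missing values only
array19_clean = collect(skipmissing(array19))    # [1, 2, 3, 4, 7, 2]
total = sum(skipmissing(array19))                # 19

# Like NaN, missing propagates through arithmetic
println(sum(array19))                            # missing
```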

[Back to the top](#In-this-lesson)

<hr>
<h2>Tuples</h2>

Tuples are ordered lists of values.  They are enclosed in parentheses, `()`.  Tuples can be heterogeneous and, unlike arrays, are immutable (they cannot be changed after creation).

In [36]:
# Creating a tuple of heterogeneous types
tup = (1, 2, 3, "hello")

(1,2,3,"hello")

In [37]:
# The tuple's type lists the type of each element
typeof(tup)

Tuple{Int64,Int64,Int64,ASCIIString}

In [38]:
# For loop to look at value and type of each element
for i in 1:length(tup)
    println(" The value of the tuple at index number $(i) is $(tup[i]) and the type is $(typeof(tup[i])).")
end

 The value of the tuple at index number 1 is 1 and the type is Int64.
 The value of the tuple at index number 2 is 2 and the type is Int64.
 The value of the tuple at index number 3 is 3 and the type is Int64.
 The value of the tuple at index number 4 is hello and the type is ASCIIString.


A tuple can be unpacked (destructured), assigning each element to its own variable name.

In [39]:
a, b, c, seven = (1, 3, 5, 7)

(1,3,5,7)

In [40]:
a

1

In [41]:
seven

7

Tuples can also be sliced.

In [42]:
tup2 = (1, 3, 5, 7, 9, 11, 13, 15);

In [43]:
# Last element
tup2[end]

15

In [44]:
# Elements 2, 3, and 4
tup2[2:4]

(3,5,7)

In [45]:
# Reversing the order
tup2[end:-1:1]

(15,13,11,9,7,5,3,1)

Tuples really are immutable.  The line of code `tup2[1] = 5` results in this error:
```
LoadError:
MethodError: `setindex!` has no method matching setindex!(::Tuple{Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64}, ::Int64, ::Int64)
while loading In[173], in expression starting on line 1.
```
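Because a tuple cannot be changed in place, the usual approach is to build a new tuple.  A sketch assuming Julia 1.x that uses splatting (`...`) to combine a new first value with the remaining elements, and checks that attempting the assignment raises a `MethodError`:

```julia
# Assumes Julia 1.x
tup2 = (1, 3, 5, 7, 9, 11, 13, 15)

# A new tuple with 5 in position 1; tup2 itself is unchanged
tup2_new = (5, tup2[2:end]...)

# Trying to assign into the tuple raises a MethodError (no setindex! for tuples)
raised_method_error = try
    tup2[1] = 5
    false
catch err
    err isa MethodError
end
```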

We can have tuples of tuples.  To index into an inner tuple, we chain the indices.

In [46]:
tup3 = ((1, 2, 3), 1, 2, (3, 100, 1))

((1,2,3),1,2,(3,100,1))

In [47]:
# Finding values at index 1
tup3[1]

(1,2,3)

In [48]:
# Finding value at index 2 of index 4, which if you count out would be the odd-ball 100
tup3[4][2]

100

[Back to the top](#In-this-lesson)

<hr>
<h2>Dictionaries</h2>

Dictionaries are collections of key-value pairs, written `key => value`.  Each key is unique and is used to look up its value.

In [50]:
dict1 = Dict(1 => 77, 2 => 66, 3 => 1)

Dict{Int64,Int64} with 3 entries:
  2 => 66
  3 => 1
  1 => 77

In [51]:
# The => is shorthand for the Pair() function
dict1 = Dict(Pair(1,100), Pair(2,200), Pair(3,300))

Dict{Int64,Int64} with 3 entries:
  2 => 200
  3 => 300
  1 => 100

The following is an example of a dictionary whose keys and values may be of any type, specified with `Dict{Any, Any}`.

In [52]:
dict2 = Dict{Any, Any}(1 => 77, 2 => 66, 3 => "three")

Dict{Any,Any} with 3 entries:
  2 => 66
  3 => "three"
  1 => 77

In [53]:
# We can get a bit crazy
dict3 = Dict{Any, Any}("a" => 1, (2, 3) => "hello")

Dict{Any,Any} with 2 entries:
  (2,3) => "hello"
  "a"   => 1

Instead of strings or characters as keys, we can also use symbols, which are written with a leading colon, `:`.

In [54]:
dict4 = Dict(:A => 300, :B => 305, :C => 309)

Dict{Symbol,Int64} with 3 entries:
  :C => 309
  :B => 305
  :A => 300

We can access an element of a dictionary by its key.

In [55]:
dict4[:B]

305

If we are not sure whether a key exists, it is customary to supply a fallback value, because indexing with a missing key causes a Julia error (a `KeyError`).  For this, we use the `get()` function.

In [56]:
# Using get() for a key that does NOT exist
# The string is returned when the key is not found
get(dict4, :H, "That key does not exist!")

"That key does not exist!"

In [57]:
# Using get() for a key that exists
# The string will not be used, seeing that the key exists
# The value will be returned instead
get(dict4, :B, "That key does not exist!")

305
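Besides `get()`, Julia provides `get!()`, which also stores the default under the key when the key is absent.  A sketch assuming Julia 1.x that contrasts the two and catches the error raised by direct indexing with an unknown key:

```julia
# Assumes Julia 1.x
dict4 = Dict(:A => 300, :B => 305, :C => 309)

# get() returns the default without modifying the dictionary
println(get(dict4, :H, 0))    # 0
println(haskey(dict4, :H))    # false

# get!() returns the default AND inserts it
get!(dict4, :H, 0)
println(haskey(dict4, :H))    # true

# Direct indexing with an unknown key raises a KeyError
is_keyerror = try
    dict4[:Z]
    false
catch err
    err isa KeyError
end
```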

We can check whether a key-value pair or a key exists.

In [58]:
# Using in() to check if a (key, value) pair exists
in((:A => 300), dict4)

true

In [59]:
# Using haskey() to check if a key exists
haskey(dict4, :D)

false

It is easy to add to a dictionary.

In [60]:
# Adding a key and value to the dictionary
dict4[:D] = 301

301

In [61]:
dict4

Dict{Symbol,Int64} with 4 entries:
  :C => 309
  :B => 305
  :A => 300
  :D => 301

Dictionaries are mutable, i.e. their values can be changed.

In [62]:
# Changing an existing value
dict4[:C] = 1000
dict4

Dict{Symbol,Int64} with 4 entries:
  :C => 1000
  :B => 305
  :A => 300
  :D => 301

In [63]:
# Using the delete!() function
delete!(dict4, :A)

Dict{Symbol,Int64} with 3 entries:
  :C => 1000
  :B => 305
  :D => 301

We can also find more information about a dictionary.

In [64]:
# The keys of a dictionary
keys(dict4)

Base.KeyIterator for a Dict{Symbol,Int64} with 3 entries. Keys:
  :C
  :B
  :D

In [65]:
# The values of a dictionary
values(dict4)

Base.ValueIterator for a Dict{Symbol,Int64} with 3 entries. Values:
  1000
  305
  301

In [66]:
# The length of a dictionary
length(dict4)

3

A dictionary has no `size()` and no `ndims()`, but we can iterate over its key-value pairs with a `for` loop.

In [67]:
for (k, v) in dict4
    println("The key $(k) has a value $(v)")
end

The key C has a value 1000
The key B has a value 305
The key D has a value 301


A `for` loop can also be used to populate a dictionary.

In [68]:
procedure_vals = ["Appendectomy", "Colectomy", "Cholecystectomy"]
procedure_dict = Dict{AbstractString,AbstractString}()

for (s, n) in enumerate(procedure_vals)
    procedure_dict["x_$(s)"] = n
end

In [69]:
procedure_dict

Dict{AbstractString,AbstractString} with 3 entries:
  "x_1" => "Appendectomy"
  "x_2" => "Colectomy"
  "x_3" => "Cholecystectomy"
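A common pattern when populating a dictionary in a loop is counting occurrences, using `get()` with a default of 0 for keys not yet seen.  A sketch assuming Julia 1.x; the list of procedures below is hypothetical, made up for illustration:

```julia
# Assumes Julia 1.x
# Hypothetical list of procedures, with repeats
procedures = ["Appendectomy", "Colectomy", "Appendectomy",
              "Cholecystectomy", "Appendectomy"]

counts = Dict{String, Int}()
for p in procedures
    # Start from 0 if the key is new, then add one
    counts[p] = get(counts, p, 0) + 1
end
```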

In [70]:
# Iterating through the keys and looking up each value
for k in keys(procedure_dict)
    println(k, " is ", procedure_dict[k])
end

x_1 is Appendectomy
x_2 is Colectomy
x_3 is Cholecystectomy


In [71]:
# Iterating through the (key, value) pairs directly
for (k,v) in procedure_dict
    println(k, " is ",v)
end

x_1 is Appendectomy
x_2 is Colectomy
x_3 is Cholecystectomy


Dictionaries are unordered, but we can sort their keys to iterate over them in order.

In [72]:
dict5 = Dict{AbstractString,Int16}("a" => 1,"b" =>2 ,"c" =>3 ,"d" =>4 ,"e" =>5 ,"f" =>6)

Dict{AbstractString,Int16} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

In [73]:
# Looping over the keys in sorted order
for k in sort(collect(keys(dict5)))
    println("$(k) is $(dict5[k])")
end

a is 1
b is 2
c is 3
d is 4
e is 5
f is 6


In [74]:
# Listing the keys
keys(dict5)

Base.KeyIterator for a Dict{AbstractString,Int16} with 6 entries. Keys:
  "f"
  "c"
  "e"
  "b"
  "a"
  "d"

In [75]:
# Collecting the keys into an array
# Note that the output is now an array
collect(keys(dict5))

6-element Array{AbstractString,1}:
 "f"
 "c"
 "e"
 "b"
 "a"
 "d"

In [76]:
# Sorting the array
sort(collect(keys(dict5)))

6-element Array{AbstractString,1}:
 "a"
 "b"
 "c"
 "d"
 "e"
 "f"
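We can also sort the collected key-value pairs themselves, since pairs compare by key first.  A sketch assuming Julia 1.x; the `by = last` keyword sorts by value instead:

```julia
# Assumes Julia 1.x
dict5 = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)

# Vector of key => value pairs, sorted by key
by_key = sort(collect(dict5))

# Sorted by value, largest first
by_value = sort(collect(dict5), by = last, rev = true)

for (k, v) in by_key
    println("$(k) is $(v)")
end
```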

[Back to the top](#In-this-lesson)