# __Chapter 2: Data Types and Structures__

<br>
Tyler J. Brough <br>
Last Update: March 2, 2021 <br>
<br>
<br>

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_{i}
$$

## 2.1 Simple Types (Scalar)

<br>

The built-in basic data types are the following:

* Single quotes produce a `Char`

<br>

In [1]:
x = 'a'

'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

In [2]:
typeof(x)

Char

<br>

* Double quotes produce a `String`

<br>

In [3]:
x = "a"
typeof(x)

String

<br>

* The Boolean type takes values `true` and `false`

<br>

In [4]:
x = true
typeof(x)

Bool

<br>

* The "default" integer type is `Int64`

<br>

In [5]:
x = 123
typeof(x)

Int64

<br>

* The "default" float type is `Float64`

<br>

In [6]:
x = 123.
typeof(x)

Float64

<br>

* There are other types, such as `Complex{T}`

<br>

In [7]:
a = 1 + 2im
typeof(a)

Complex{Int64}

<br>

* And `Rational{T}` as well

<br>

In [8]:
a = 2 // 3
typeof(a)

Rational{Int64}

In [9]:
a.num

2

In [10]:
a.den

3

In [11]:
print("What gets printed for a Rational type: $(a)")

What gets printed for a Rational type: 2//3

<br>

* We can also use type annotations:

<br>

In [12]:
## Wrapped in a function because this Julia version does not allow type annotations on global variables (Julia 1.8 and later do)
function thetypes()
    a::Int64 = 123
    println("a is of type: $(typeof(a))")
    
    b::Float64 = 123.
    println("b is of type: $(typeof(b))")
    
    c::Char = 'a'
    println("c is of type: $(typeof(c))")
    
    d::String = "a"
    println("d is of type: $(typeof(d))")
end

thetypes (generic function with 1 method)

In [13]:
thetypes()

a is of type: Int64
b is of type: Float64
c is of type: Char
d is of type: String


<br>
<br>

#### 2.1.1 Basic Mathematical Operations

The standard arithmetic operations are written in the usual infix notation:

In [16]:
1 + 2 # addition; Julia also accepts the function-call (prefix) form `+(1, 2)`, while Lisp would write `(+ 1 2)`

3

<br>

* The `+` sign is called the _operator_

* `1` and `2` are called the _operands_ (the values that an operator acts on)

<br>

In [17]:
# this should raise an exception

19 + "c"

LoadError: MethodError: no method matching +(::Int64, ::String)
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
  +(::T, !Matched::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:86
  +(::Union{Int16, Int32, Int64, Int8}, !Matched::BigInt) at gmp.jl:531
  ...

In [18]:
4 - 2 # subtraction

2

In [19]:
3 * 2 # multiplication

6

In [22]:
12 / 4 # division

3.0

In [23]:
a = 12
typeof(a)

Int64

In [24]:
b = 4
typeof(b)

Int64

In [25]:
c = a / b
typeof(c)

Float64

In [26]:
typeof(a)

Int64

In [27]:
2 ^ 5 # exponentiation

32

In [28]:
exp(1) # the natural exponential function

2.718281828459045

In [29]:
ℯ 

ℯ = 2.7182818284590...

In [30]:
ℯ # \euler + TAB

ℯ = 2.7182818284590...

In [31]:
MathConstants.e # Euler's number is also available under this name

ℯ = 2.7182818284590...

In [32]:
12 ÷ 2 # integer (truncating) division: type \div + TAB

6

In [34]:
12 ÷ 5

2

In [36]:
div(12, 5)

2

In [37]:
3 % 2 # remainder, equivalent to rem(3, 2)

1
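The cells above use `/`, `÷`, and `%` with positive operands only. The following sketch, with values chosen purely for illustration, shows how the same operations behave once a negative number is involved, and how `rem` differs from `mod`.

```julia
# `/` always returns a floating-point result for integer operands
println(7 / 2)        # 3.5

# `÷` (or div) truncates toward zero
println(7 ÷ 2)        # 3
println(div(-7, 2))   # -3

# rem (the % operator) takes the sign of the dividend;
# mod takes the sign of the divisor
println(rem(-7, 3))   # -1
println(mod(-7, 3))   # 2

# divrem returns the quotient and the remainder together
q, r = divrem(7, 2)
println((q, r))       # (3, 1)
```

When the operands can be negative, `mod` is usually the safer choice for wrapping an index or an angle into a fixed interval.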

In [38]:
π # \pi + TAB

π = 3.1415926535897...

In [39]:
MathConstants.pi

π = 3.1415926535897...

In [40]:
σ = 1.0 # \sigma + TAB

1.0

In [41]:
θ = 0.75 # \theta + TAB

0.75

In [42]:
ρ = -1 # \rho + TAB

-1

In [43]:
λ = 7.0 # \lambda + TAB

7.0

#### __2.1.2 Strings__

The `String` type in Julia can be seen in some ways as a specialized array of individual chars. Unlike arrays, strings are immutable (`a="abc"; a[2] = 'B'` would raise an error). 
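A minimal sketch of what immutability means in practice (the variable names are illustrative only). Since a character cannot be overwritten in place, one option is to collect the characters into an array, modify the array, and join the result into a new string.

```julia
a = "abc"

# Indexing a string returns a Char
first_char = a[1]

# Assigning to an index fails because String has no setindex! method
err = try
    a[2] = 'B'
    nothing
catch e
    e
end
println(typeof(err))   # MethodError

# Build a modified copy instead: characters -> array -> new string
chars = collect(a)     # a Vector{Char}, which is mutable
chars[2] = 'B'
b = join(chars)
println(b)             # aBc
println(a)             # abc (the original is unchanged)
```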

<br>

A single-line string is written between a pair of double quotes, while a multi-line string can use triple double quotes:

In [44]:
a = "a string"

"a string"

In [45]:
typeof(a)

String

In [46]:
b = "a string\non multiple rows\n"

"a string\non multiple rows\n"

In [47]:
print(b)

a string
on multiple rows


In [48]:
c = """
a string
on multiple rows
"""

"a string\non multiple rows\n"

In [52]:
a[3]

's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)

Julia supports most typical string operations. For example:

* `split(s, " ")` splits on the given delimiter; `split(s)` with no delimiter splits on whitespace

* `join([s1,s2], "")` 

* `replace(s, "toSearch" => "toReplace")`

* `strip(s)` removes leading and trailing whitespace

<br>

In [53]:
s = "Darth Vader"
split(s, " ")

2-element Array{SubString{String},1}:
 "Darth"
 "Vader"

In [54]:
s1 = "Qui-Gon"
s2 = "Jinn"
join([s1, s2], " ")

"Qui-Gon Jinn"

In [55]:
s = "obj.toSearch"
replace(s, "toSearch" => "toReplace")

"obj.toReplace"

In [56]:
s = " Ramana Maharshi "
strip(s)

"Ramana Maharshi"

To convert strings representing numbers to integers or floats, use `parse`:

In [57]:
myint = parse(Int64,"2017")

2017

In [58]:
myint + 1

2018

In [59]:
typeof(myint)

Int64
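The cells above parse an integer. As a hedged sketch, the same function handles floats, and `tryparse` may be preferable when the input might be malformed, because it returns `nothing` instead of throwing. The input strings here are made up for illustration.

```julia
# parse works for floating-point types as well
myfloat = parse(Float64, "3.14")
println(myfloat)                     # 3.14

# parse throws an ArgumentError on malformed input;
# tryparse returns `nothing` instead
bad = tryparse(Int64, "20x17")
good = tryparse(Int64, "2017")
println(bad === nothing)             # true
println(good)                        # 2017

# One possible pattern: fall back to a default when parsing fails
year = something(tryparse(Int64, "20x17"), 0)
println(year)                        # 0
```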

To convert integers and floats to strings, use `string`:

In [62]:
mystring = string(123)

"123"

In [61]:
typeof(mystring)

String

<br>

##### __Concatenation__

There are several ways to concatenate strings:

* Using the concatenation operator: `*`

In [63]:
firstname = "Robert "
lastname = "Zimmerman"
fullname = firstname * lastname
println(fullname)

Robert Zimmerman


* Using the `string` function: `string(str1, str2, str3)`

In [64]:
fullname = string("Obi-Wan", " ", "Kenobi")
println(fullname)

Obi-Wan Kenobi


* Using interpolation, that is combining string variables using the dollar sign: 

In [67]:
println("Who is $firstname $lastname?")

Who is Robert  Zimmerman?
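The doubled space in the output above comes from the trailing space stored in `firstname`. The sketch below repeats the interpolation without that trailing space and also shows `$( ... )`, which interpolates the value of an arbitrary expression. The variable `n` is introduced here only for illustration.

```julia
firstname = "Robert"      # no trailing space this time
lastname = "Zimmerman"

greeting = "Who is $firstname $lastname?"
println(greeting)         # Who is Robert Zimmerman?

# Wrap an expression in $( ... ) to interpolate its value
n = length(firstname) + length(lastname)
message = "The name has $(n) letters; twice that is $(2n)."
println(message)          # The name has 15 letters; twice that is 30.

# Escape the dollar sign to print it literally
price = "Price: \$5"
println(price)            # Price: $5
```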


## 2.2 Arrays (Lists)

Arrays are N-dimensional mutable containers. We will look at one-dimensional arrays in this section.

<br>

There are several ways to create arrays:

In [68]:
a = [1, 2, 3]

3-element Array{Int64,1}:
 1
 2
 3

In [70]:
typeof(a)

Array{Int64,1}

In [71]:
size(a)

(3,)

In [72]:
b = [1 2 3]

1×3 Array{Int64,2}:
 1  2  3

In [73]:
size(b)

(1, 3)

In [74]:
## Empty arrays:
a = [] 

Any[]

In [75]:
a = Int64[]

Int64[]

In [76]:
typeof(a)

Array{Int64,1}

In [77]:
b = Float64[]

Float64[]

In [78]:
typeof(b)

Array{Float64,1}

In [79]:
## n-element zero array
a = zeros(10)

10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [80]:
## Or like this
a = zeros(Int64, 10)

10-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

In [81]:
a = ones(10) # Or ones(Int64, 10)

10-element Array{Float64,1}:
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0

In [82]:
a = ones(Int64, 10)

10-element Array{Int64,1}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1

In [83]:
## Using the Vector{T}() alias
a = Vector{Float64}(undef, 10)

10-element Array{Float64,1}:
 2.295783366e-314
 2.2957839985e-314
 2.280358921e-314
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 2.5711908612e-314

In [84]:
## Directly using the constructor
a = Array{Int64, 1}(undef, 3)

3-element Array{Int64,1}:
 4781293712
          0
 4781293712

In [89]:
## Using the fill function
a = fill(42, 3)

3-element Array{Int64,1}:
 42
 42
 42
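Two further ways to build a one-dimensional array, sketched here as a complement to the constructors above: an array comprehension (optionally with a filter) and `range` combined with `collect`. The names `squares`, `evens`, and `grid` are illustrative.

```julia
# Comprehension: one element per value of x
squares = [x^2 for x in 1:5]
println(squares)     # [1, 4, 9, 16, 25]

# Comprehension with a filter
evens = [x for x in 1:10 if iseven(x)]
println(evens)       # [2, 4, 6, 8, 10]

# Evenly spaced floating-point values between two endpoints
grid = collect(range(0, 1, length=5))
println(grid)        # [0.0, 0.25, 0.5, 0.75, 1.0]
```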

In [90]:
## Using random number generators

a = rand(3)  ## Standard Uniform
b = randn(3) ## Standard Normal

3-element Array{Float64,1}:
 -1.8287694976153672
 -0.29100665070956516
  1.073532534956282

In [92]:
z = randn(100_000_000)

100000000-element Array{Float64,1}:
  0.3085750270797109
 -0.6448149425512378
  1.0997890228613378
  1.151908680716017
  0.458899536167544
  2.2628113701946266
 -0.7802423045161225
 -0.9584250005875952
 -2.6783218711074785
 -1.132539161957162
 -0.52481570024004
  1.7664393688700835
  1.6159185217940668
  ⋮
 -0.3342088644932791
 -0.581746764715784
  1.9538863666131705
 -0.16356501084678932
 -1.4231741510848557
 -0.06378251535524561
 -0.4465424008357694
 -0.19719433264697672
 -1.9529305671488992
  0.9665006523024083
 -0.5453080018353399
 -1.2586812761765827

In [97]:
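using Statistics # `mean` and `var` are provided by the Statistics standard library
mean(z) # close to 0 for standard normal draws
var(z)  # close to 1 for standard normal draws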

<br>

Square brackets `[]` are used to access the elements of an array.

<br>

In [99]:
b[1]

-1.8287694976153672

In [100]:
b[2] = -99.

-99.0

In [101]:
b

3-element Array{Float64,1}:
  -1.8287694976153672
 -99.0
   1.073532534956282

<br>

The range syntax `from:step:to` is widely supported for slicing and iteration. It produces a lazy range object rather than an array, so it is cheap to create.

<br>

In [102]:
a = 1:2:10

1:2:9

In [103]:
typeof(a)

StepRange{Int64,Int64}

In [107]:
## Use the `collect` function to transform the iterator into an array
a = collect(2:2:10)

5-element Array{Int64,1}:
  2
  4
  6
  8
 10

In [None]:
## a few examples

In [108]:
collect(4:2:8)

3-element Array{Int64,1}:
 4
 6
 8

In [109]:
collect(8:-2:4)

3-element Array{Int64,1}:
 8
 6
 4

In [110]:
reverse(a)

5-element Array{Int64,1}:
 10
  8
  6
  4
  2

In [111]:
collect(a[end:-1:1])

5-element Array{Int64,1}:
 10
  8
  6
  4
  2

In [112]:
## the keyword `end` gets the last element in an array
a[end]

10

In [114]:
## You can use it to slice an array
a[3:end]

3-element Array{Int64,1}:
  6
  8
 10
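Slicing an array with a range, as in the cell above, returns a new array that holds a copy of the selected elements. When a copy is not wanted, the `@view` macro gives a window onto the original memory instead. A minimal sketch of the difference:

```julia
a = collect(2:2:10)      # [2, 4, 6, 8, 10]

s = a[3:end]             # a new array: [6, 8, 10]
s[1] = -1
println(a[3])            # 6 (the original is untouched)

v = @view a[3:end]       # a window onto the same memory
v[1] = -1
println(a[3])            # -1 (the original changed)
```

Views avoid allocation, which can matter inside tight loops, at the cost of aliasing the original array.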

In [115]:
## You can use the `vcat` command
y = vcat(2015, 2025:2030, 2100)

8-element Array{Int64,1}:
 2015
 2025
 2026
 2027
 2028
 2029
 2030
 2100

<br>

There are many functions that operate on arrays. I will demonstrate some of them below. Use the help command to look up additional details.

<br>

In [116]:
a = [1, 2, 3]
b = 4
push!(a, b) # by convention, a trailing `!` ("bang") marks a function that modifies its argument

4-element Array{Int64,1}:
 1
 2
 3
 4
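The bang convention is easiest to see with a pair of functions that differ only in the `!`. In this sketch `sort` returns a sorted copy and leaves its argument alone, while `sort!` reorders the array in place.

```julia
a = [3, 1, 2]

b = sort(a)      # returns a sorted copy
println(a)       # [3, 1, 2]
println(b)       # [1, 2, 3]

sort!(a)         # sorts in place
println(a)       # [1, 2, 3]
```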

In [117]:
b = [4, 5, 6]
append!(a, b)

7-element Array{Int64,1}:
 1
 2
 3
 4
 4
 5
 6

In [118]:
c = vcat(1, [2, 3], [4, 5])

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

In [119]:
pop!(a) # remove an element from the end of the array

6

In [120]:
popfirst!(a)

1

In [121]:
a = collect(1:10)
deleteat!(a, 4)

9-element Array{Int64,1}:
  1
  2
  3
  5
  6
  7
  8
  9
 10

In [122]:
pushfirst!(a, 4)

10-element Array{Int64,1}:
  4
  1
  2
  3
  5
  6
  7
  8
  9
 10

In [123]:
sort!(a)

10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

In [124]:
a

10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

In [125]:
a = [1, 1, 2, 3, 4, 4, 5]
unique!(a)

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

In [126]:
a = collect(1:10)
reverse!(a)

10-element Array{Int64,1}:
 10
  9
  8
  7
  6
  5
  4
  3
  2
  1

In [127]:
in(4, a)

true

In [128]:
length(a)

10

In [129]:
maximum(a)

10

In [131]:
## Note that `maximum` (largest element of a collection) is different from `max` (largest of its arguments)
max(1, 2, 3, 99)

99

In [132]:
max(a)

LoadError: MethodError: no method matching max(::Array{Int64,1})
Closest candidates are:
  max(::Any, !Matched::Missing) at missing.jl:129
  max(::Any, !Matched::Any) at operators.jl:417
  max(::Any, !Matched::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
  ...

In [133]:
## Or use the splat operator
max(a...)

10
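To summarize the distinction shown above: `maximum` reduces one collection to a single value, while `max` compares its arguments. With the dot (broadcast) syntax, `max` can also compare two arrays element by element. The arrays below are illustrative values only.

```julia
a = [3, 9, 1]
b = [4, 2, 8]

println(maximum(a))    # 9  (largest element of one collection)
println(max(a...))     # 9  (splat turns the elements into arguments)
println(max.(a, b))    # [4, 9, 8]  (elementwise maximum of two arrays)
println(max.(a, 5))    # [5, 9, 5]  (a scalar is broadcast against the array)
```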

In [134]:
minimum(a)

1

In [135]:
sum(a)

55

In [136]:
cumsum(a)

10-element Array{Int64,1}:
 10
 19
 27
 34
 40
 45
 49
 52
 54
 55

In [137]:
empty!(a)

Int64[]

In [138]:
a = rand(10)
b = vec(a)

10-element Array{Float64,1}:
 0.42034849864126556
 0.40542642051423616
 0.776039520966308
 0.002573042577780349
 0.9222360661055049
 0.23059162603967231
 0.23707084174092863
 0.27663348047471925
 0.6227346486909355
 0.27445722397851724

In [139]:
using Random
a = collect(1:10)
shuffle!(a)

10-element Array{Int64,1}:
  7
  2
  8
  1
  9
  5
 10
  3
  4
  6

In [None]:
isempty(a)

In [None]:
[1, 10]

In [None]:
collect(1:10)

In [None]:
a = [1, 4, 2, 4, 3, 4, 4, 4, 5, 4, 6, 4, 7, 4, 8, 4, 9, 4, 10, 4]
findall(x -> x == 4, a) # this uses an anonymous (lambda) function (more next chapter!)

In [None]:
## Can nest these functions
deleteat!(a, findall(x -> x == 4, a))

In [None]:
a

In [None]:
enumerate(a)

In [None]:
names = ["Marc", "Anne"]
sex = ["M", "F"]
age = [18, 16]
collect(zip(names, sex, age))
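`enumerate` and `zip` return lazy iterators, so they are most often used directly in a `for` loop rather than collected. A short sketch, using hypothetical `people` and `ages` arrays that mirror the data in the cell above:

```julia
people = ["Marc", "Anne"]   # hypothetical data mirroring the cell above
ages = [18, 16]

# enumerate yields (index, value) pairs
for (i, person) in enumerate(people)
    println("$i: $person")
end
# 1: Marc
# 2: Anne

# zip yields one tuple per position across several collections
for (person, age) in zip(people, ages)
    println("$person is $age")
end
# Marc is 18
# Anne is 16

# zip also pairs up keys and values for a dictionary
lookup = Dict(zip(people, ages))
println(lookup["Anne"])     # 16
```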

#### __2.2.1 Multidimensional and Nested Arrays__

In this section we deal with multi-dimensional arrays:

* `Array{T, 2}` or `Matrix{T}`

In [140]:
a = Array{Float64, 2}

Array{Float64,2}

In [141]:
b = Matrix{Float64} # just an alias for the above

Array{Float64,2}

In [142]:
# A nested array of arrays
a = [[1, 2, 3], [4, 5, 6]]

2-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]

In [143]:
a = [1 4; 2 5; 3 6]

3×2 Array{Int64,2}:
 1  4
 2  5
 3  6

In [144]:
a = zeros(2, 3, 4)

2×3×4 Array{Float64,3}:
[:, :, 1] =
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 2] =
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 4] =
 0.0  0.0  0.0
 0.0  0.0  0.0

In [150]:
a[:,:,1] .= -99.

2×3 view(::Array{Float64,3}, :, :, 1) with eltype Float64:
 -99.0  -99.0  -99.0
 -99.0  -99.0  -99.0

In [151]:
a

2×3×4 Array{Float64,3}:
[:, :, 1] =
 -99.0  -99.0  -99.0
 -99.0  -99.0  -99.0

[:, :, 2] =
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 4] =
 0.0  0.0  0.0
 0.0  0.0  0.0

In [152]:
b = ones(2, 3, 2)

2×3×2 Array{Float64,3}:
[:, :, 1] =
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 2] =
 1.0  1.0  1.0
 1.0  1.0  1.0

In [153]:
a = fill(42, 2, 2, 2)

2×2×2 Array{Int64,3}:
[:, :, 1] =
 42  42
 42  42

[:, :, 2] =
 42  42
 42  42

In [154]:
a = rand(2, 3, 3)

2×3×3 Array{Float64,3}:
[:, :, 1] =
 0.896347  0.53754   0.394013
 0.787442  0.384579  0.0580263

[:, :, 2] =
 0.742093  0.0235708  0.591369
 0.33361   0.247144   0.643625

[:, :, 3] =
 0.56105   0.0286588  0.807244
 0.886624  0.551685   0.426114

In [155]:
a = [3x + 2y + z for x in 1:2, y in 2:3, z in 1:2]

2×2×2 Array{Int64,3}:
[:, :, 1] =
  8  10
 11  13

[:, :, 2] =
  9  11
 12  14

In [157]:
a = [[1, 2, 3], [4, 5, 6]]
mask = [[true, true, false], [false, true, false]]
a[mask]

LoadError: ArgumentError: invalid index: Array{Bool,1}[[1, 1, 0], [0, 1, 0]] of type Array{Array{Bool,1},1}
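The error above arises because a vector of vectors is a one-dimensional array whose elements happen to be arrays, so it cannot be indexed with a nested mask. Boolean masking does work on a true two-dimensional array. A minimal sketch, using the hypothetical names `m` and `mmask` to avoid clashing with the cells around it:

```julia
m = [1 2 3; 4 5 6]                          # a 2×3 Matrix{Int64}
mmask = [true true false; false true false] # a Bool matrix of the same shape

# Selected elements come back as a vector, in column-major order
println(m[mmask])        # [1, 2, 5]

# A mask is often produced by a broadcast comparison
println(m[m .> 2])       # [4, 5, 3, 6]
```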

In [158]:
size(a)

(2,)

In [159]:
ndims(a)

1

In [161]:
a = rand(2, 2, 3)
reshape(a, 3, 2, 2)

3×2×2 Array{Float64,3}:
[:, :, 1] =
 0.759743  0.0843751
 0.586863  0.884224
 0.550453  0.462026

[:, :, 2] =
 0.402692  0.613024
 0.914201  0.0292218
 0.702997  0.599188

In [162]:
a = rand(2, 1, 3)
dropdims(a, dims=2)

2×3 Array{Float64,2}:
 0.289473  0.923282  0.535287
 0.278215  0.337319  0.283696

In [163]:
a = rand(3, 2)
transpose(a)

2×3 LinearAlgebra.Transpose{Float64,Array{Float64,2}}:
 0.787701  0.652878  0.592623
 0.675178  0.633263  0.766532
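The output type above, `LinearAlgebra.Transpose`, indicates that `transpose` returns a lazy wrapper around the original matrix rather than a new array. The sketch below contrasts it with `permutedims`, which allocates an independent copy. The small integer matrix is chosen only to keep the output readable.

```julia
a = [1 2; 3 4; 5 6]

t = transpose(a)       # lazy wrapper, shares memory with a
a[2, 1] = 30
println(t[1, 2])       # 30 (the wrapper sees the change)

p = permutedims(a)     # an independent 2×3 matrix
a[2, 1] = 3
println(p[1, 2])       # 30 (the copy kept the old value)
println(t[1, 2])       # 3
```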

## __2.3 Tuples__

A tuple, of type `Tuple{T1, T2, T3}`, is an immutable, ordered collection of elements:

In [164]:
t = (1, 2.5, "a")

(1, 2.5, "a")

In [165]:
typeof(t)

Tuple{Int64,Float64,String}

In [166]:
## also without parentheses:
t = 1, 2.5, "a"

(1, 2.5, "a")

_Immutable_ means that once a tuple is created, its elements cannot be added, removed, or changed.

In [167]:
t[1] # index to the first element

1

In [168]:
t[1] = -99 ## this will raise an exception

LoadError: MethodError: no method matching setindex!(::Tuple{Int64,Float64,String}, ::Int64, ::Int64)
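Because a tuple cannot be modified, the usual approach is to unpack it or to build a new tuple from its parts. A small sketch (the names `x`, `y`, `z`, and `t2` are illustrative):

```julia
t = (1, 2.5, "a")

# Destructuring assigns each element to its own variable
x, y, z = t
println(y)      # 2.5

# "Changing" the first element means building a new tuple
t2 = (-99, t[2:end]...)
println(t2)     # (-99, 2.5, "a")
println(t)      # (1, 2.5, "a")
```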

## __2.4 Named Tuples__

In [169]:
nt = (a=1, b=2.5)

(a = 1, b = 2.5)

In [170]:
typeof(nt)

NamedTuple{(:a, :b),Tuple{Int64,Float64}}

In [171]:
nt.a

1

In [172]:
nt.b

2.5

In [173]:
keys(nt)

(:a, :b)

In [174]:
values(nt)

(1, 2.5)

In [175]:
collect(nt)

2-element Array{Real,1}:
 1
 2.5

In [176]:
pairs(nt)

pairs(::NamedTuple) with 2 entries:
  :a => 1
  :b => 2.5

In [177]:
person = (firstname="Robert", lastname="Zimmerman", age=79)

(firstname = "Robert", lastname = "Zimmerman", age = 79)

In [184]:
person.firstname

"Robert"

In [185]:
person.lastname

"Zimmerman"

In [186]:
person.age

79

In [188]:
function bob(firstname, lastname, age)
    println("$firstname $lastname is $age years old.")
end

bob (generic function with 1 method)

In [189]:
bob(person...) ## you can splat the tuple as arguments to a function

Robert Zimmerman is 79 years old.
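Named tuples are immutable like ordinary tuples, so updating a field means creating a new named tuple. One way to do that, sketched here under the assumption that only the `age` field should change, is `merge`:

```julia
person = (firstname="Robert", lastname="Zimmerman", age=79)

# merge returns a new named tuple; fields on the right take precedence
older = merge(person, (age=80,))
println(older.age)            # 80
println(person.age)           # 79 (the original is unchanged)

# Fields can be tested for by name
println(haskey(person, :age)) # true
```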


## __2.5 Dictionaries__

Dictionaries store mappings from keys to values. Their iteration order is not guaranteed and can appear arbitrary. Julia dictionaries are very similar to dictionaries in Python.

In [190]:
d = Dict('a'=>1, 'b'=>2, 'c'=>3)

Dict{Char,Int64} with 3 entries:
  'a' => 1
  'c' => 3
  'b' => 2

In [191]:
d['a']

1

In [192]:
## add a key-value pair on the fly
d['d'] = 4

4

In [193]:
d

Dict{Char,Int64} with 4 entries:
  'a' => 1
  'c' => 3
  'd' => 4
  'b' => 2

In [194]:
delete!(d, 'b')

Dict{Char,Int64} with 3 entries:
  'a' => 1
  'c' => 3
  'd' => 4

In [198]:
mydict = Dict{Char,Int64}() ## start from an empty dictionary (an empty array `[]` cannot be indexed by a Char)

Dict{Char,Int64}()

In [199]:
map((i, j) -> mydict[i] = j, ['a', 'b', 'c'], [1, 2, 3])

3-element Array{Int64,1}:
 1
 2
 3

In [None]:
mydict['a']

In [None]:
get(mydict, 'a', 0) ## 0 is the default value if missing

In [None]:
keys(mydict)

In [None]:
values(mydict)

In [None]:
haskey(mydict, 'a')

In [None]:
in(('a' => 1), mydict)

In [None]:
## iterate through k, v pairs
for (k, v) in mydict
    println("$k is $v")
end
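Since the iteration order of a `Dict` is not guaranteed, a loop that needs a predictable order can sort the keys first. The sketch below also shows `get` with a default value used for counting. The dictionary `d` and the word `"banana"` are illustrative only.

```julia
d = Dict('b' => 2, 'a' => 1, 'c' => 3)

# Iterate in sorted key order
for k in sort(collect(keys(d)))
    println("$k is $(d[k])")
end
# a is 1
# b is 2
# c is 3

# Counting with get: a missing key starts from the default 0
counts = Dict{Char,Int}()
for ch in "banana"
    counts[ch] = get(counts, ch, 0) + 1
end
println(counts['a'])   # 3
```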

## __2.6 Sets__

Use `Set{T}` to represent collections of unordered, unique values.

In [None]:
s = Set() ## create an empty set

In [None]:
s = Set([1, 2, 2, 3, 4]) ## initialize with an array of values

In [None]:
push!(s, 5)

In [None]:
delete!(s, 1)

Set operations are allowed:

In [None]:
set1 = Set([1, 2, 3, 4])
set2 = Set([3, 4, 5, 6])
intersect(set1, set2)

In [None]:
union(set1, set2)

In [None]:
setdiff(set1, set2)
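Beyond the three operations above, a few other set functions come up often. This sketch reuses the same `set1` and `set2` values to show membership tests, subset checks, and the symmetric difference.

```julia
set1 = Set([1, 2, 3, 4])
set2 = Set([3, 4, 5, 6])

println(3 in set1)                     # true
println(issubset(Set([3, 4]), set1))   # true

# Elements that belong to exactly one of the two sets
sd = symdiff(set1, set2)
println(sd == Set([1, 2, 5, 6]))       # true

# Duplicates are dropped when a set is built from an array
println(length(Set([1, 2, 2, 3, 4])))  # 4
```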

## 2.7 Memory and Copy Issues

Please see the chapter for details.

## 2.8 Various Notes on Data Types

Please see the chapter for details.

#### 2.8.1 Random Numbers

In [200]:
## random float in [0, 1)
rand()

0.541849082791352

In [203]:
## random integer in [a, b]
rand(1:22)

22

In [204]:
## random float in a:b with precision to the second decimal place
rand(1.0:0.01:10.0)

9.59
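The draws above differ on every run. When results need to be reproducible, the generator can be seeded. A minimal sketch, where the seed values `1234` and `42` are arbitrary choices:

```julia
using Random

# Seeding the global generator makes subsequent draws repeatable
Random.seed!(1234)
x = rand(3)

Random.seed!(1234)
y = rand(3)

println(x == y)    # true

# A local generator leaves the global one untouched
rng = MersenneTwister(42)
k = rand(rng, 1:22)   # an integer between 1 and 22, reproducible for this seed
```

The exact numbers produced for a given seed can differ between Julia versions, so a seed guarantees repeatability only within one version.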

In [205]:
## draws from a particular distribution (Uniform, Normal, Poisson, ...); this cell fails because `Uniform` is defined in Distributions, not in Random
using Random
rand(Uniform(0, 1), 2, 3)

LoadError: UndefVarError: Uniform not defined

In [206]:
using StatsKit.Distributions

In [208]:
rand(Uniform(0, 100), 2, 3)

2×3 Array{Float64,2}:
 45.5287  43.198   72.9254
 39.4916  19.0423  22.393

In [209]:
rand(Normal(10, 2), 5, 5)

5×5 Array{Float64,2}:
  8.28721  11.6094    9.16803   7.44627  10.9825
 12.896     7.27723   9.4642    7.51122  11.4746
  7.81413   8.45189   9.62879  10.6665   13.8341
 10.5784    8.4801   11.223     5.98121   7.42854
  9.89967   8.63155  12.6673    9.87431   7.52598

In [None]:
x = rand(Normal(0, 1), 10_000_000) # generate 10 million floats very quickly

In [210]:
y = rand(Beta(16, 16), 1000)

1000-element Array{Float64,1}:
 0.48050152935635215
 0.4408425315493626
 0.5718725736586685
 0.6023176605872133
 0.391718476628025
 0.5935920411386039
 0.5452784938649745
 0.528325461121034
 0.5143280300577848
 0.5127183908396002
 0.528283048047703
 0.6634904305136027
 0.47128326190660014
 ⋮
 0.5540164128327062
 0.38068654815465897
 0.5414864644270995
 0.490803067141414
 0.43415496413239424
 0.37833665593798516
 0.4384359296812707
 0.46815195095828777
 0.5546906464268281
 0.4548145797435033
 0.6511672900601242
 0.4790160164231799

#### 2.8.2 Missing, Nothing, and NaN

See chapter for details.