# Collection Types

In this chapter, we will look more deeply
into multidimensional arrays (or matrices), and into the tuple type as well. A dictionary
type, where you can look up a value through a key, is indispensable in a modern language,
and Julia has this too. Finally, we will explore the set type. Like arrays, all these types are
parameterized; the type of their elements can be specified at the time of object construction.

So, the following are the topics for this chapter:
- Matrices
- Tuples
- Dictionaries
- Sets
- An example project—word frequency

## Matrices

We know that the notation [1, 2, 3] is used to create an array. In fact, this notation
denotes a special type of array, called a (column) vector in Julia.

To create this as a row vector (1 2 3), use the notation [1 2 3] with spaces instead of
commas.

In [2]:
[1 2 3]

1×3 Matrix{Int64}:
 1  2  3

In [3]:
typeof(ans)

Matrix{Int64} (alias for Array{Int64, 2})

This array is of type Matrix{Int64} (that is, Array{Int64,2}) with size 1 x 3, so it has
two dimensions. (The spaces used in [1, 2, 3] are for readability only; we could have
written this as [1,2,3].)

A matrix is a two- or multidimensional array; in Julia, Matrix is an alias for the
two-dimensional case, just as Vector is an alias for the one-dimensional case. We can
check this as follows:

In [4]:
Array{Int64,1} == Vector{Int64}

true

In [6]:
Array{Int64,2} == Matrix{Int64}

true

To create a matrix, use space-separated values for the columns and semicolon-separated for
the rows:

In [7]:
matrix = [1 2 3; 4 5 6]

2×3 Matrix{Int64}:
 1  2  3
 4  5  6

So, the column vector from the beginning can also be written as [1; 2; 3]. However, you
cannot use commas and semicolons together.

In [8]:
[2,3;2,3]

LoadError: syntax: unexpected semicolon in array expression around In[8]:1

In [46]:
[[1 2],[ 1 2]]

2-element Vector{Matrix{Int64}}:
 [1 2]
 [1 2]

To get the value from a specific element in the matrix, you need to index it by row and then
by column, for example, matrix[2, 1] returns the value 4 (second row, first column).

In [47]:
size(ans)

(2,)

In [21]:
m1 = Array{Int64,2}(undef,(2,2))

2×2 Matrix{Int64}:
 140529354331504  140529354331568
 140529553906512  140529354333360

In [22]:
m1[1,1]

140529354331504

In [23]:
m1[2,2]

140529354333360

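Using the same notation, one can calculate products of matrices such as [1 2] * [3 ;
4]. This multiplies a 1 x 2 row matrix by a two-element column vector and returns the
one-element vector [11] (which is equal to 1*3 + 2*4). Both [1; 2] and [1, 2] denote the
same column vector, as the following cells show: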

In [49]:
[1 2] * [1 ; 2]

1-element Vector{Int64}:
 5

In [52]:
[1 2] * [1 , 2]

1-element Vector{Int64}:
 5

In [53]:
[1,2]

2-element Vector{Int64}:
 1
 2

In [54]:
size(ans)

(2,)

In [55]:
[1; 2]

2-element Vector{Int64}:
 1
 2

In [56]:
size(ans)

(2,)

In contrast to this, the dotted operator .* multiplies element by element and broadcasts
any dimension of size 1, whereas * always performs conventional matrix multiplication:

In [60]:
[1 2 3] .* [1, 2]

2×3 Matrix{Int64}:
 1  2  3
 2  4  6

In [63]:
[1, 2, 3] * [1 2]

3×2 Matrix{Int64}:
 1  2
 2  4
 3  6
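The difference between the two operators is easiest to see on two square matrices of the same size. The following is a minimal sketch; the matrices A and B are made up for illustration and do not appear elsewhere in this chapter.

```julia
A = [1 2; 3 4]
B = [5 6; 7 8]

# Conventional matrix product: rows of A combined with columns of B
matrix_product = A * B            # [19 22; 43 50]

# Element-wise product: each entry times the entry at the same position
elementwise_product = A .* B      # [5 12; 21 32]

# Broadcasting: a 1x3 row and a 2-element column vector expand to 2x3
broadcast_product = [1 2 3] .* [1, 2]
```

For non-square operands, * requires the inner dimensions to agree, while .* requires each dimension to match or be of size 1.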

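To create a matrix from random numbers between 0 and 1, with three rows and
five columns, use ma1 = rand(3, 5). The first example below builds a 2 x 2 matrix in the
same way. If you pass a collection as the first argument, as in rand([1,2,3], 2, 2), the
elements are drawn at random from that collection: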

In [64]:
m2 = rand(2,2)

2×2 Matrix{Float64}:
 0.574467  0.447271
 0.369652  0.140397

In [68]:
m2 = rand([1,2,3],2,2)

2×2 Matrix{Int64}:
 3  3
 1  1

The ndims function can be used to obtain the number of dimensions of a matrix. Consider
the following example:

In [69]:
ndims(m2)

2

In [70]:
size(m2)

(2, 2)

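To get the number of rows (2 for m2), pass the dimension as a second argument to size.
Likewise, size(m2, 2) gives the number of columns, and length gives the total number of
elements: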

In [71]:
size(m2,1)

2

In [72]:
size(m2,2)

2

In [73]:
size(m2)[1]

2

In [74]:
length(m2)

4

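If you need an identity matrix, where all the elements are zero, except for the elements on
the diagonal that are 1.0, use the I object (a UniformScaling from the LinearAlgebra
standard library) together with the Matrix constructor, for example Matrix(1.0*I, 3, 3)
for a 3 x 3 matrix. The two dimensions do not have to be equal: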

In [77]:
using LinearAlgebra

In [79]:
idm = Matrix(1.0*I,4,5)

4×5 Matrix{Float64}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0

You can easily work with parts of a matrix, known as slices; the notation is similar to
that used in Python and NumPy:

In [80]:
idm[:,2]

4-element Vector{Float64}:
 0.0
 1.0
 0.0
 0.0

In [81]:
idm[2,:]

5-element Vector{Float64}:
 0.0
 1.0
 0.0
 0.0
 0.0

In [83]:
idm[2:end, 2:end]

3×4 Matrix{Float64}:
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0

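Slicing on the right-hand side of an expression, as in idm[:, 2], returns a copy of the
data, so a change in the slice does not change the original array or matrix. To work on
the original data without copying, create a view with view(idm, :, 2) or the @view macro;
a change made through a view does change the original array.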
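The following minimal sketch contrasts a copying slice with a view in current Julia (1.x). The matrix m and the variable names are made up for illustration.

```julia
m = [1 2 3; 4 5 6]

col_copy = m[:, 2]          # indexing with a colon or range copies the data
col_copy[1] = 100           # m is left unchanged

col_view = view(m, :, 2)    # a view shares memory with m
col_view[1] = 200           # m[1, 2] is now 200

row_view = @view m[2, :]    # macro form of the same idea
row_view[3] = 600           # m[2, 3] is now 600
```

Views avoid allocation, which matters in tight loops over large arrays; copies are the safer default when the original must stay untouched.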

Any multidimensional matrix can also be seen as a one-dimensional vector in column
order, as follows:

In [90]:
show(idm[:])

[1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

To make an array of arrays (a jagged array), use an Array initialization, and then push!
each array in its place, for example:

In [91]:
jarr = Array{Int64}[]

Array{Int64}[]

In [93]:
push!(jarr,[1, 2, 3])

1-element Vector{Array{Int64}}:
 [1, 2, 3]

In [94]:
push!(jarr,[1,2,3,4])

2-element Vector{Array{Int64}}:
 [1, 2, 3]
 [1, 2, 3, 4]

In [95]:
push!(jarr,collect(1:10))

3-element Vector{Array{Int64}}:
 [1, 2, 3]
 [1, 2, 3, 4]
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

If ma is a matrix, say [1 2; 3 4], then ma' is the transpose matrix, that is [1 3; 2 4]:

In [99]:
m1 = rand([0, 1], 2, 3)

2×3 Matrix{Int64}:
 1  1  1
 0  1  1

In [100]:
m1'

3×2 adjoint(::Matrix{Int64}) with eltype Int64:
 1  0
 1  1
 1  1

ma' is operator notation for adjoint(ma), the conjugate transpose. For matrices with real
elements, it gives the same values as transpose(ma).

Multiplication is defined between matrices, as in mathematics:

In [103]:
m1 * m1'

2×2 Matrix{Int64}:
 3  2
 2  2

If you need element-wise multiplication, use the dotted operator, as in ma .* ma:

In [106]:
m2 = rand([0,2,3,4,5],2,2)

2×2 Matrix{Int64}:
 5  3
 4  3

In [107]:
m2 .* m2

2×2 Matrix{Int64}:
 25  9
 16  9

The inverse of a matrix ma (if it exists) is given by the inv(ma) function. For an integer
matrix, the result is a Matrix{Float64}:

In [108]:
inv(m2 .* m2)

2×2 Matrix{Float64}:
  0.111111  -0.111111
 -0.197531   0.308642

In [109]:
m2 * inv(m2)

2×2 Matrix{Float64}:
 1.0  -8.88178e-16
 0.0   1.0

Trying to take the inverse of a singular matrix (a matrix that does not have
a well-defined inverse) will result in LAPACKException or
SingularException, depending on the matrix type.

Suppose you want to solve the ma1 * X = ma2 equation, where ma1, X, and ma2 are
matrices. The obvious solution is X = inv(ma1) * ma2. However, this is
actually not that good. It is better to use the built-in solver, where X =
ma1 \ ma2. If you have to solve the X * ma1 = ma2 equation, use the
solution X = ma2 / ma1. Solutions that use / and \ are much more
numerically stable, and also much faster.
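As a minimal sketch of both division operators, consider the small system below. The values of ma1, ma2, and rhs are made up for illustration; the explicit inverse is computed only to compare results.

```julia
using LinearAlgebra

ma1 = [4.0 3.0; 6.0 3.0]
ma2 = [10.0, 12.0]

X_solver  = ma1 \ ma2         # built-in solver for ma1 * X = ma2
X_inverse = inv(ma1) * ma2    # explicit inverse, for comparison only

# X * ma1 = rhs is solved with the right-division operator
rhs = [1.0 2.0]
X_right = rhs / ma1
```

Because floating-point results can differ in the last digits, compare them with ≈ (isapprox) rather than ==.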

If v = [1.,2.,3.] and w = [2.,4.,6.], and you want to form a 3 x 2 matrix with these
two column vectors, then use hcat(v, w) (for horizontal concatenation). The following
example does the same with two 10-element vectors, which produces a 10 x 2 matrix:

In [112]:
v = collect(1:10)
w = collect(11:20)
hcat(v,w)

10×2 Matrix{Int64}:
  1  11
  2  12
  3  13
  4  14
  5  15
  6  16
  7  17
  8  18
  9  19
 10  20

In [113]:
vcat(v, w)

20-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20

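vcat(v,w) (for vertical concatenation) results in a one-dimensional array with all the
elements of v followed by those of w (20 elements here). append!(v, w) produces the same
values, but it modifies v in place instead of creating a new array.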

Thus, hcat concatenates vectors or matrices along the second dimension (columns), while
vcat concatenates along the first dimension (rows). The more general cat can be used to
concatenate multidimensional arrays along arbitrary dimensions.
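A minimal sketch of cat along different dimensions follows; the matrices a and b are made up for illustration.

```julia
a = [1 2; 3 4]
b = [5 6; 7 8]

rows_stacked = cat(a, b; dims=1)   # same result as vcat(a, b), size (4, 2)
cols_joined  = cat(a, b; dims=2)   # same result as hcat(a, b), size (2, 4)
layers       = cat(a, b; dims=3)   # stacks along a new third dimension, size (2, 2, 2)
```

The dims keyword selects the dimension along which the arrays are joined; all other dimensions must match.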

There is an even simpler literal notation: to concatenate two matrices a and b with the same
number of rows to a matrix c, just execute c = [a b]. Now b is appended to the right of a.
To put b beneath a, use c = [a; b]. With a comma, as in [a, b], you get a vector that
contains the two matrices instead.

In [114]:
a = rand([1,2,3,4], 2, 2)
b = rand([1,6,5,8], 2, 2)


2×2 Matrix{Int64}:
 1  6
 1  8

In [115]:
a

2×2 Matrix{Int64}:
 1  2
 3  3

In [116]:
b

2×2 Matrix{Int64}:
 1  6
 1  8

In [117]:
[a b]

2×4 Matrix{Int64}:
 1  2  1  6
 3  3  1  8

In [119]:
[a; b]

4×2 Matrix{Int64}:
 1  2
 3  3
 1  6
 1  8

In [120]:
[a, b]

2-element Vector{Matrix{Int64}}:
 [1 2; 3 3]
 [1 6; 1 8]

The reshape function changes the dimensions of a matrix to new values if this is possible,
for example:

In [121]:
reshape(1:20,(4,5))

4×5 reshape(::UnitRange{Int64}, 4, 5) with eltype Int64:
 1  5   9  13  17
 2  6  10  14  18
 3  7  11  15  19
 4  8  12  16  20

In [124]:
reshape(rand([2,3,4],(3,4)),(4,3))

4×3 Matrix{Int64}:
 4  3  2
 3  4  3
 3  2  2
 3  4  3

When working with arrays that contain arrays, it is important to realize that such an array
contains references to the contained arrays, not their values.

If you want to make a copy of
an array, you can use the copy() function, but this produces only a shallow copy with
references to the contained arrays. In order to make a complete copy of the values, we need
to use the deepcopy() function.

In [126]:
x = Array{Any}(undef, 2)

2-element Vector{Any}:
 #undef
 #undef

In [127]:
x[1] = ones(2,3)

2×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0

In [128]:
x[2] = zeros(3,4)

3×4 Matrix{Float64}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

In [129]:
y = copy(x)

2-element Vector{Any}:
 [1.0 1.0 1.0; 1.0 1.0 1.0]
 [0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

In [130]:
y

2-element Vector{Any}:
 [1.0 1.0 1.0; 1.0 1.0 1.0]
 [0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

In [131]:
y[1][1,1] = 1000 

1000

In [132]:
y

2-element Vector{Any}:
 [1000.0 1.0 1.0; 1.0 1.0 1.0]
 [0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

In [133]:
x

2-element Vector{Any}:
 [1000.0 1.0 1.0; 1.0 1.0 1.0]
 [0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0]

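Changing y, the shallow copy, also changed x. To get a fully independent copy, you have to
use deepcopy. The following example compares plain assignment, copy, and deepcopy: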

In [138]:
x = Array{Any}(undef, 2) #> 2-element Array{Any,1}: #undef #undef
x[1] = ones(2) #> 2-element Array{Float64} 1.0 1.0
x[2] = trues(3) #> 3-element BitArray{1}: true true true
x #> 2-element Array{Any,1}: [1.0,1.0] Bool[true,true,true]

2-element Vector{Any}:
 [1.0, 1.0]
 Bool[1, 1, 1]

In [139]:
a = x
b = copy(x)
c = deepcopy(x)

2-element Vector{Any}:
 [1.0, 1.0]
 Bool[1, 1, 1]

In [140]:
# Now if we change x:
x[1] = "Julia"
x[2][1] = false

false

In [141]:
x

2-element Vector{Any}:
 "Julia"
 Bool[0, 1, 1]

In [142]:
a

2-element Vector{Any}:
 "Julia"
 Bool[0, 1, 1]

In [143]:
isequal(a, x)

true

In [144]:
b

2-element Vector{Any}:
 [1.0, 1.0]
 Bool[0, 1, 1]

In [145]:
isequal(b, x)

false

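b is a shallow copy: replacing x[1] did not affect b[1], but x[2] and b[2] still refer to
the same array, so the change to x[2][1] is visible in b.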

In [148]:
isequal(c, x)

false

In [149]:
c

2-element Vector{Any}:
 [1.0, 1.0]
 Bool[1, 1, 1]

c is a deep copy, so nothing in it has changed.

To further increase performance, consider using statically-sized and immutable vectors
and matrices from a package such as StaticArrays (which has largely replaced the older
ImmutableArrays package). These can be a lot faster, certainly for small matrices, and
particularly for vectors.

## Tuples

A tuple is a fixed-sized group of values, separated by commas and optionally surrounded
by parentheses ( ).

The type of these values can be the same, but it doesn't have to be; a
tuple can contain values of different types, unlike arrays. A tuple is a heterogeneous
container, whereas an array is a homogeneous container. The type of a tuple is just a tuple
of the types of the values it contains. So, in this sense, a tuple is very much the counterpart
of an array in Julia. Also, changing a value in a tuple is not allowed; tuples are immutable.

In [150]:
x = 1, 22.0, "World", 'x'
a, b, c, d = x

(1, 22.0, "World", 'x')

In [151]:
a

1

In [152]:
typeof(x)

Tuple{Int64, Float64, String, Char}

The argument list of a function (refer to the Defining functions section in Chapter 3,
Functions) is, in fact, also a tuple.

Similarly, Julia simulates the possibility of returning
multiple values by packaging them into a single tuple, and a tuple also appears when using
functions with variable argument lists.
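Both uses of tuples can be seen in a minimal sketch. The function names minmax_sum and count_args are hypothetical, chosen only for this illustration.

```julia
# Returning several values packs them into a single tuple
function minmax_sum(values)
    return minimum(values), maximum(values), sum(values)
end

result = minmax_sum([3, 1, 4, 1, 5])   # (1, 5, 14)
lo, hi, total = result                 # unpack the tuple into three variables

# A variable argument list arrives inside the function as a tuple
count_args(args...) = (length(args), typeof(args))
n, argtype = count_args(1, "two", 3.0)
```

Here argtype is a Tuple type whose parameters are the types of the three arguments.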

( ) represents the empty tuple, and (1,) is a one-
element tuple. The type of a tuple can be specified explicitly through a type annotation
(refer to the Types section in Chapter 2, Variables, Types, and Operations), such as ('z',
3.14)::Tuple{Char, Float64}.

In [161]:
function funt()
    a::Tuple{String, Number} = ("what the heck", 23)
    print(a)
end


funt (generic function with 1 method)

In [162]:
funt()

("what the heck", 23)

We can index tuples in the same way as arrays by using
brackets, with indexing starting from 1, slicing, and index control.
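A minimal sketch of tuple indexing follows; the tuple t is made up for illustration. The last part shows that assigning to an element fails because tuples are immutable.

```julia
t = (5, 6, 7, 8)

first_item = t[1]          # 5
last_item  = t[end]        # 8
middle     = t[2:3]        # (6, 7): slicing a tuple gives a tuple
every_2nd  = t[1:2:end]    # (5, 7)

# Tuples are immutable: assigning to an element throws a MethodError
assignment_failed = try
    t[1] = 10
    false
catch err
    err isa MethodError
end
```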

To iterate over the elements of a tuple, use a for loop:

In [164]:
a = Tuple(1:10)

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [165]:
for i in a
    print(i)
end

12345678910

A tuple can be unpacked or destructured like this: x, y = a; now x is 1 and y is 2. Notice
that we don't get an error despite the left-hand side not being able to take all the values of
a. To capture every value, we would have to write one variable per element.

In [166]:
x, y = a

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [167]:
x

1

In [168]:
y

2

In [180]:
bariable = Tuple{String,Number}(("wTF",32))

("wTF", 32)

In [178]:
bariable

("wTF", 32)

In [181]:
variable = Tuple{String,Number}[("wTF",32), ("NOR", 23)]

2-element Vector{Tuple{String, Number}}:
 ("wTF", 32)
 ("NOR", 23)

In [182]:
variable

2-element Vector{Tuple{String, Number}}:
 ("wTF", 32)
 ("NOR", 23)

## Dictionaries 

When you want to store and look up values based on a unique key, then the dictionary type
Dict (also called hash, associative collection, or map in other languages) is what you need.
It is basically a collection of two-element tuples of the form (key, value). To define a
dictionary d1 as a literal value, the following syntax is used:

In [169]:
d1 = Dict(1 => "what the heck", :b => "WTF")

Dict{Any, String} with 2 entries:
  :b => "WTF"
  1  => "what the heck"

The key appears before the => symbol and the value appears after it, and the pairs are
separated by commas.

To explicitly specify the types, use:

In [179]:
d2 = Dict{Number, String}(1 => "one", 2 => "Two")

Dict{Number, String} with 2 entries:
  2 => "Two"
  1 => "one"

If you use the former [] notation to try to define a dictionary, you now
get a Vector of Pair objects instead:

In [196]:
d3 =[1 => "one", 2=> "Two"]

2-element Vector{Pair{Int64, String}}:
 1 => "one"
 2 => "Two"

In [197]:
typeof(d3)

Vector{Pair{Int64, String}} (alias for Array{Pair{Int64, String}, 1})

In [198]:
d2 = Dict{Any,Any}("a"=>1, (2,3)=>true)
d3 = Dict(:A => 100, :B => 200)

Dict{Symbol, Int64} with 2 entries:
  :A => 100
  :B => 200

The Any type is inferred when a common type among the keys or values cannot be
detected.

So a Dict can have keys of different types, and the same goes for the values: their type is
then indicated as Any.

In general, dictionaries of type Dict{Any, Any} tend to lead to
lower performance since the JIT compiler does not know the exact type of the elements.

Dictionaries used in performance-critical parts of the code should therefore be explicitly
typed.

Notice that the (key, value) pairs are not returned (or stored) in the key order.

If the keys are of the Char or String type, you can also use Symbol as the key type, which
could be more appropriate since Symbols are immutable, for example:

In [199]:
d3 = Dict{Symbol,Int64}(:one => 1, :two => 2)

Dict{Symbol, Int64} with 2 entries:
  :two => 2
  :one => 1

Use the bracket notation, with a key as an index, to get the corresponding value:

In [200]:
d3[:one]

1

In [201]:
d3[:z]

LoadError: KeyError: key :z not found

In [206]:
get(d3,:one,23)

1

In [207]:
get(d3,:x,23)

23
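Indexing with a missing key throws a KeyError, whereas get returns the supplied default. A minimal sketch of get and its mutating counterpart get! follows; the dictionary d is made up for illustration.

```julia
d = Dict(:one => 1, :two => 2)

v1 = get(d, :one, 0)       # 1: the key exists, so the default is ignored
v2 = get(d, :three, 0)     # 0: the default is returned and d is not modified
still_missing = !haskey(d, :three)

v3 = get!(d, :three, 3)    # 3: the default is also stored under the missing key
now_present = haskey(d, :three)
```

get is the better fit for read-only lookups; get! is handy when a missing entry should be initialized on first access.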

To test if a key is present in Dict, you can use the function haskey as follows:

In [208]:
haskey(d3,:what)

false

In [210]:
haskey(d3,:one)

true

Dictionaries are mutable. If we tell Julia to execute d3[:one] = 2433, then the value for key
:one in d3 changes to 2433:

In [211]:
d3

Dict{Symbol, Int64} with 2 entries:
  :two => 2
  :one => 1

In [213]:
d3[:one] = 2433

2433

In [215]:
d3

Dict{Symbol, Int64} with 2 entries:
  :two => 2
  :one => 2433

If we do this with a new key, then that pair is added to the
dictionary:

In [216]:
d3[:three] = 3

3

In [217]:
d3

Dict{Symbol, Int64} with 3 entries:
  :three => 3
  :two   => 2
  :one   => 2433

d4 = Dict() is an empty dictionary of type Dict{Any, Any}, and you can start populating it
in the same way as in the example with d3.

In [218]:
d4 = Dict()

Dict{Any, Any}()

In [220]:
d4[:WTF] = "WTF"

"WTF"

In [221]:
d4

Dict{Any, Any} with 1 entry:
  :WTF => "WTF"

d5 = Dict{Float64, Int64}() is an empty dictionary with key type Float64 and
value type Int64:

In [222]:
d5 = Dict{Float64,Int64}()

Dict{Float64, Int64}()

In [224]:
d5[.3] = 3

3

In [225]:
d5

Dict{Float64, Int64} with 1 entry:
  0.3 => 3

Deleting a key mapping from a collection is also straightforward. delete!(d5, 0.3)
removes the pair 0.3 => 3 from the dictionary, and returns the modified collection, which
is now empty.

In [226]:
delete!(d5,.3)

Dict{Float64, Int64}()

In [227]:
d5

Dict{Float64, Int64}()

## Keys and values – looping

To isolate the keys of a dictionary, use the keys function ki = keys(d3), with ki being an
iterable KeySet object (called KeyIterator in older Julia versions), which we can use in a
for loop as follows:

In [3]:
d3 = Dict(1 => "one", 2 => "Two", 3 => "Three")

Dict{Int64, String} with 3 entries:
  2 => "Two"
  3 => "Three"
  1 => "one"

In [4]:
d4 = Dict{Number,String}(1 => "one", 2 => "Two", 3 => "Three")

Dict{Number, String} with 3 entries:
  2 => "Two"
  3 => "Three"
  1 => "one"

In [6]:
for k in keys(d3)
    print(k,", ")
end

2, 3, 1, 

This
also gives us an alternative way to test if a key exists with in. For example, 2 in
keys(d3) returns true and 4 in keys(d3) returns false.

In [10]:
2 in keys(d3)

true

If you want to work with an array of keys, use collect(keys(d3)):

In [11]:
collect(keys(d3))

3-element Vector{Int64}:
 2
 3
 1

To obtain the values, use the values
function: vi = values(d3), with vi being a ValueIterator object, which we can also
loop through with for:

In [12]:
for v in values(d3)
    print(v,", ")
end


Two, Three, one, 

The order in which the values or keys are returned is
undefined.
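When a predictable order is needed, one option is to sort the keys first and then look up each value. A minimal sketch, using a small dictionary d made up for illustration:

```julia
d = Dict(2 => "Two", 3 => "Three", 1 => "one")

# Collect the keys into an array, sort it, and look up each value in turn
sorted_keys = sort(collect(keys(d)))
ordered_values = [d[k] for k in sorted_keys]
```

This costs a sort on every traversal, so it suits occasional reporting rather than hot loops.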

Creating a dictionary from arrays with keys and values is trivial because we have a Dict
constructor that can use these; as in the following example:

In [14]:
a = collect(1:10)
b = collect(11:20)

10-element Vector{Int64}:
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20

In [15]:
d5 = Dict(zip(a, b))

Dict{Int64, Int64} with 10 entries:
  5  => 15
  4  => 14
  6  => 16
  7  => 17
  2  => 12
  10 => 20
  9  => 19
  8  => 18
  3  => 13
  1  => 11

Working with both the key and value pairs in a loop is also easy. For instance, the for
loop over d5 is as follows:

In [18]:
for (m, n) in d5
    print(m, n, ", ")
end


515, 414, 616, 717, 212, 1020, 919, 818, 313, 111, 

Alternatively, we can index into each key-value pair:

In [20]:
for v in d5
    print(v[1], v[2], ", ")
end
    

515, 414, 616, 717, 212, 1020, 919, 818, 313, 111, 

Here are some more neat tricks, using the dictionary d5 from above:

In [21]:
arrkey = [key for (key, _) in d5]

10-element Vector{Int64}:
  5
  4
  6
  7
  2
 10
  9
  8
  3
  1

In [22]:
arrval = [value for (_, value) in d5]

10-element Vector{Int64}:
 15
 14
 16
 17
 12
 20
 19
 18
 13
 11

## Sets

Array elements are ordered, but can contain duplicates, that is, the same value can occur at
different indices. In a dictionary, keys have to be unique, but the values do not, and the
keys are not ordered. If you want a collection where order does not matter, but where the
elements have to be unique, then use a Set.

In [23]:
s = Set(collect(1:10))

Set{Int64} with 10 elements:
  5
  4
  6
  7
  2
  10
  9
  8
  3
  1

The Set() function creates an empty set of type Set{Any}. When a set is built from a
collection that contains duplicates, such as Set([11, 14, 13, 7, 14, 11]), the duplicates
are eliminated, which leaves a set with the four elements 7, 14, 13, and 11.

Operations from set theory are also defined. The following example uses two randomly
generated sets, s1 and s2:

In [28]:
s1 = Set(rand(collect(1:10), 10));
s2 = Set(rand(collect(5:15), 10));

In [30]:
show(s1)

Set([4, 7, 2, 10, 8, 3])

In [31]:
show(s2)

Set([5, 13, 7, 10, 8])

In [32]:
union(s1,s2)

Set{Int64} with 8 elements:
  5
  7
  8
  4
  13
  2
  10
  3

In [33]:
intersect(s1,s2)

Set{Int64} with 3 elements:
  7
  10
  8

In [34]:
setdiff(s1,s2)

Set{Int64} with 3 elements:
  4
  2
  3

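setdiff(s1, s2) returns the elements that exist in the first set but not in the second one.
issubset(s1, s2) tests whether every element of s1 is also an element of s2: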

In [35]:
issubset(s1,s2)

false
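Because s1 and s2 above are random, their contents differ from run to run. The following minimal sketch repeats the same operations on two fixed sets, named a_set and b_set here purely for illustration.

```julia
a_set = Set([1, 2, 3, 4])
b_set = Set([3, 4, 5])

in_either   = union(a_set, b_set)        # elements in at least one of the sets
in_both     = intersect(a_set, b_set)    # elements in both sets
only_in_a   = setdiff(a_set, b_set)      # elements in a_set but not in b_set
is_contained = issubset(Set([3, 4]), a_set)   # true: 3 and 4 are both in a_set
```

Sets compare equal when they hold the same elements, regardless of the order in which they are printed.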

To add an element to a set is easy: push!(s1, 289) adds 289 to set s1. Adding an existing
element will not change the set. To test whether a set contains an element, use in.

In [36]:
push!(s1,289)

Set{Int64} with 7 elements:
  289
  4
  7
  2
  10
  8
  3

In [37]:
s1

Set{Int64} with 7 elements:
  289
  4
  7
  2
  10
  8
  3

In [38]:
10 in s1

true

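Set([1,2,3]) produces a set of integers of the Set{Int64} type. To get a
set of arrays, wrap the array in another array, as in Set([[1,2,3]]), which returns a
Set{Vector{Int64}} with the single element [1, 2, 3]. Passing a single array directly, as
in the last cell below, yields a set of its elements instead.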

In [55]:
a = Set(Array{Int64}[[1,2,3,4]])

Set{Array{Int64}} with 1 element:
  [1, 2, 3, 4]

In [56]:
a

Set{Array{Int64}} with 1 element:
  [1, 2, 3, 4]

In [59]:
a = Set(Array{Int64,1}([1,2,3,4]))

Set{Int64} with 4 elements:
  4
  2
  3
  1

Sets are commonly used when we need to keep track of objects in no particular order. For
instance, we might be searching through a graph. We can then use a set to remember which
nodes of the graph we have already visited in order to avoid visiting them again.

Checking
whether an element is present in a set takes, on average, constant time, independent of the
size of the set. This is extremely useful for very large sets of data, for example:

In [62]:
x = Set(collect(1:100)); 

In [79]:
y = Set(collect(1:100000));

In [77]:
@time 99 in x

  0.000003 seconds


true

In [78]:
@time 39940 in y

  0.000003 seconds


true

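## An example project: word frequency

As a small project, we count how often each word occurs in a text. The file words1.txt
holds one line of text; the first cell opens it in the Sublime Text editor through shell
mode (the leading ;). We then read the file into a string, replace punctuation and digits
with spaces, split the string into words, and count the words in a dictionary.

In [80]:
;subl words1.txt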

In [82]:
str = read("words1.txt",String)

"to be, or not to be, that is the question"

In [83]:
nonalpha = r"(\W\s?)"

r"(\W\s?)"

In [85]:
str = replace(str, nonalpha => ' ')

"to be or not to be that is the question"

In [87]:
digits = r"(\d+)"

r"(\d+)"

In [88]:
str = replace(str, digits => ' ')

"to be or not to be that is the question"

In [119]:
word_list = split(str, ' ')
word_list = strip.(word_list)

10-element Vector{SubString{String}}:
 "to"
 "be"
 "or"
 "not"
 "to"
 "be"
 "that"
 "is"
 "the"
 "question"

In [120]:
length_word = length.(word_list) 

10-element Vector{Int64}:
 2
 2
 2
 3
 2
 2
 4
 2
 3
 8

In [121]:
Dictionary = Dict{String, Int64}()

Dict{String, Int64}()

In [122]:
for word in word_list
    if word in keys(Dictionary)
        Dictionary[word] += 1
    else
        Dictionary[word] = 1
    end
end
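The counting loop above can be written more compactly with get, which supplies 0 for words that have not been seen yet. The following is a minimal sketch that works on a literal string instead of a file, so it can be run on its own; the names text, counts, and ranked are made up for illustration.

```julia
text = "to be or not to be that is the question"

counts = Dict{String, Int}()
for word in split(text)                      # split on whitespace
    counts[word] = get(counts, word, 0) + 1  # 0 is the default for unseen words
end

# Sort by descending count; ties are broken alphabetically for a stable listing
ranked = sort(collect(counts); by = p -> (-p.second, p.first))
```

Sorting the collected pairs is one way to report the most frequent words first, since the dictionary itself has no defined order.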


In [123]:
Dictionary

Dict{String, Int64} with 8 entries:
  "question" => 1
  "or"       => 1
  "that"     => 1
  "not"      => 1
  "is"       => 1
  "the"      => 1
  "to"       => 2
  "be"       => 2