# Generating data

Some simple exercises to generate and manipulate data.

The `rand()` function samples from the uniform distribution on [0, 1).

In [1]:
rand()

0.6532140985342512

In [2]:
rand()

0.7572761850876664

All Julia functions are generic: the method that runs is chosen by *dispatch* on the types of all the arguments, so `rand` behaves differently depending on what it is called with.

In [3]:
rand(3)  # returns a vector of uniform [0, 1) random variates

3-element Array{Float64,1}:
 0.484345
 0.817544
 0.584096

In [4]:
rand(2,4) # returns a matrix

2×4 Array{Float64,2}:
 0.248173  0.476979  0.975319  0.28921
 0.684545  0.294892  0.897453  0.88405
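The way `rand()`, `rand(3)`, and `rand(2, 4)` select different behavior can be imitated with a small user-defined function. The sketch below uses a hypothetical function `describe` (not part of Julia or of this notebook) with one method per argument signature.

```julia
# `describe` is a hypothetical function used only for illustration.
# Each definition adds a method; Julia picks one from the types of all arguments.
describe() = "no arguments"
describe(n::Integer) = "one integer: $n"
describe(x::AbstractFloat) = "one float: $x"
describe(m::Integer, n::Integer) = "two integers: $m and $n"

println(describe())
println(describe(3))
println(describe(3.0))
println(describe(2, 4))

# `methods` lists every method attached to a generic function.
println(length(methods(describe)))
```

Calling `methods(rand)` in the same way shows the many signatures that `rand` itself accepts.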

To obtain a reproducible sequence, set the random number seed with `srand`.

In [5]:
srand(1234321)
rand(1, 5)

1×5 Array{Float64,2}:
 0.0944218  0.936611  0.258327  0.930924  0.555283

In [6]:
srand(1234321)
rand(1, 3)

1×3 Array{Float64,2}:
 0.0944218  0.936611  0.258327
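The same check can be written as a comparison rather than done by eye. This sketch assumes Julia 1.x, where the seeding function used above as `srand` is spelled `Random.seed!`; the generated values will differ from those displayed in this notebook.

```julia
using Random  # standard library; provides Random.seed! in Julia 1.x (assumed version)

Random.seed!(1234321)
first_draw = rand(1, 5)

Random.seed!(1234321)      # reset to the same seed
second_draw = rand(1, 5)   # same call after the same seed gives the same values

println(first_draw == second_draw)

third_draw = rand(1, 5)    # no reseeding: the stream has moved on
println(first_draw == third_draw)
```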

Unlike in `R`, Julia functions can modify, or *mutate*, their arguments. By convention, the names of mutating functions end in `!`. This is only a convention, not a requirement; it warns the user that care is needed when calling such a function.

In [13]:
rv = rand(1, 5)

1×5 Array{Float64,2}:
 0.653566  0.458101  0.735234  0.212451  0.845895

In [14]:
rand!(rv)  # overwrites the contents of the array

1×5 Array{Float64,2}:
 0.0171136  0.903764  0.843184  0.143325  0.914686

In [15]:
rv         # display the current contents

1×5 Array{Float64,2}:
 0.0171136  0.903764  0.843184  0.143325  0.914686
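User-defined functions can follow the same convention. A minimal sketch, using the hypothetical helpers `double!` and `double` (neither is a Julia built-in), contrasts a mutating function with its non-mutating counterpart.

```julia
# `double!` and `double` are hypothetical helpers that follow the `!` convention.
function double!(v)
    for i in eachindex(v)
        v[i] *= 2        # writes into the caller's array
    end
    return v
end

double(v) = 2 .* v       # non-mutating counterpart: allocates a new array

x = [1.0, 2.0, 3.0]
y = double(x)            # x is unchanged by this call
double!(x)               # x is overwritten in place
println(x)
println(y)
```

Nothing enforces the naming: a function without `!` could still mutate its argument, which is why the convention is worth following.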

## Using the Distributions package

The `Distributions` package provides a large collection of probability distributions and associated functions.

A distribution object is passed as an argument to functions such as `rand`, `cdf`, `quantile`, and `pdf`.

In [7]:
using Distributions
expdist = Exponential(2.)

Distributions.Exponential{Float64}(θ=2.0)

In [8]:
mean(expdist)

2.0

In [9]:
var(expdist)

4.0

In [10]:
kurtosis(expdist)

6.0

In [12]:
rand(expdist, 5)

5-element Array{Float64,1}:
 7.49991  
 1.38978  
 4.88123  
 0.0333433
 1.65335  

In [18]:
srand(4321234)
rand!(expdist, rv)

1×5 Array{Float64,2}:
 0.99159  7.68577  1.80092  0.0250716  1.16494

In [19]:
cdf(expdist, rv)  # cumulative distribution function

1×5 Array{Float64,2}:
 0.390914  0.978568  0.593618  0.0124575  0.441484

In [20]:
ccdf(expdist, rv)  # complementary cdf (i.e., 1 - cdf, evaluated with less round-off)

1×5 Array{Float64,2}:
 0.609086  0.0214317  0.406382  0.987542  0.558516

In [25]:
quantile(expdist, cdf(expdist, rv))  # check the round-trip identity

1×5 Array{Float64,2}:
 0.99159  7.68577  1.80092  0.0250716  1.16494
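For the exponential distribution these functions have closed forms, so the values above can be checked by hand. With scale θ, the cdf is `1 - exp(-x/θ)`, the ccdf is `exp(-x/θ)`, and the quantile function is `-θ * log(1 - p)`. The sketch below implements them directly without the package; the names `expcdf`, `expccdf`, and `expquantile` are hypothetical stand-ins, and this is not necessarily how `Distributions` implements these functions.

```julia
# Hypothetical closed-form helpers for an exponential distribution with scale θ.
expcdf(x, θ) = -expm1(-x / θ)          # 1 - exp(-x/θ), accurate for small x
expccdf(x, θ) = exp(-x / θ)            # upper tail, computed directly
expquantile(p, θ) = -θ * log1p(-p)     # inverse of expcdf

θ = 2.0
x = 0.99159                            # first value of rv as displayed above
p = expcdf(x, θ)
println(p)                             # compare with the first cdf value above
println(expquantile(p, θ))             # round trip back to x

# Far in the tail, 1 - cdf rounds to zero but the ccdf does not.
println(1 - expcdf(100.0, θ))
println(expccdf(100.0, θ))
```

The last two lines illustrate why a separate `ccdf` is useful: subtracting a cdf that has already rounded to 1 loses the tail probability entirely.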

## Patterned data

Constant vectors, matrices, etc. can be generated with `zeros()`, `ones()`, or `fill()`.  The `fill!()` function overwrites the contents of an array.

In [29]:
v = ones(3)

3-element Array{Float64,1}:
 1.0
 1.0
 1.0

In [30]:
fill!(v, 0.5)

3-element Array{Float64,1}:
 0.5
 0.5
 0.5

In [31]:
v'  # ' is the transpose operator (conjugate transpose for complex arrays)

1×3 Array{Float64,2}:
 0.5  0.5  0.5
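A short sketch of the remaining constructors, showing that `zeros`, `ones`, and `fill` also accept several dimensions and that `zeros` and `ones` take an optional element type. The variable names are arbitrary.

```julia
z = zeros(2, 3)          # 2×3 matrix of 0.0
o = ones(Int, 4)         # element type can be given as the first argument
f = fill(0.5, 2, 2)      # fill(value, dims...) allocates a new array

fill!(z, 7.0)            # overwrites z in place and returns it
println(z)
println(eltype(o))
println(f)
```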

Sequences are generated with the `:` operator, as in `R`. Note that `1:5` is an integer sequence but `1.:5.` is a unit-step floating-point sequence.

In [32]:
1:5

1:5

In [33]:
typeof(1:5)

UnitRange{Int64}

In [34]:
typeof(1.:5.)

FloatRange{Float64}
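Ranges store their endpoints and step rather than their elements, and `collect` turns one into an ordinary vector. A sketch of a few common forms follows; `range(...; stop=..., length=...)` is written as in Julia 1.x, which is an assumption about the version in use (older releases such as the one shown here used `linspace`).

```julia
r = 1:5                  # UnitRange: start and stop only, step is 1
s = 1:2:9                # start:step:stop
t = range(0.0, stop=1.0, length=5)   # Julia 1.x spelling (assumed version)

println(collect(r))      # materialize a range as a Vector
println(collect(s))
println(collect(t))
println(eltype(1.0:5.0)) # floating-point endpoints give Float64 elements
```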

Vectors are constructed using square brackets.

In [35]:
a = [sqrt(2.), pi, exp(1.)]

3-element Array{Float64,1}:
 1.41421
 3.14159
 2.71828

Unlike in `R`, it is efficient to use a vector as a stack, `push!`ing elements onto the end and `pop!`ing them off the end. You can also `append!` the contents of another vector.

`copy(a)` creates a copy, `copy!(a, b)` overwrites the contents of `a` with those of `b`, and `similar(a)` creates an uninitialized object of the same type and dimensions as `a`.
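These operations can be sketched on a small integer vector. The variable names `items`, `backup`, and `scratch` are arbitrary and chosen only for illustration.

```julia
items = [1, 2, 3]
push!(items, 4)              # add to the end
last_item = pop!(items)      # remove and return the last element
append!(items, [10, 20])     # add every element of another vector

backup = copy(items)         # independent copy
items[1] = 99                # does not affect backup

scratch = similar(items)     # same type and size, contents are arbitrary
println(items)
println(backup)
println(size(scratch))
```

Because `similar` does not initialize its result, the contents of `scratch` should be written before they are read.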

In [38]:
push!(a, inv(√2π))

4-element Array{Float64,1}:
 1.41421 
 3.14159 
 2.71828 
 0.398942

Unicode characters like √ and π are entered in Jupyter code cells (and in the REPL and in Atom) by typing the corresponding LaTeX name (`\sqrt` or `\pi`) followed by a <TAB> character.

`repeat` allows for inner and outer repetitions through named arguments.

In [46]:
fchar = repeat('A':'F', inner = 5)

30-element Array{Char,1}:
 'A'
 'A'
 'A'
 'A'
 'A'
 'B'
 'B'
 'B'
 'B'
 'B'
 'C'
 'C'
 'C'
 ⋮  
 'D'
 'D'
 'E'
 'E'
 'E'
 'E'
 'E'
 'F'
 'F'
 'F'
 'F'
 'F'
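The difference between the two named arguments is easiest to see on a two-element vector. In this sketch `inner` plays the role of `each` in R's `rep`, and `outer` the role of `times`.

```julia
v = [1, 2]
println(repeat(v, inner = 2))             # each element repeated in place
println(repeat(v, outer = 2))             # the whole vector repeated
println(repeat(v, inner = 2, outer = 2))  # inner is applied first, then outer
```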

In [43]:
string(fchar[1])

"A"

As can be seen, characters, delimited with single quotes, are different from strings, delimited with double quotes.
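A brief sketch of the distinction, using only built-in operations; the variable names are arbitrary.

```julia
c = 'A'                  # a single character
s = "A"                  # a string of length one

println(typeof(c))
println(typeof(s))
println(c == s[1])       # indexing a string yields a Char
println(c + 1)           # adding an integer to a Char gives another Char
println(string(c, 'B'))  # `string` joins its arguments into a String
```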

Julia can apply a function to each element of an array using the "dot-notation", as in

In [44]:
fstring = string.(fchar)

30-element Array{String,1}:
 "A"
 "A"
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"
 "B"
 "B"
 "C"
 "C"
 "C"
 ⋮  
 "D"
 "D"
 "E"
 "E"
 "E"
 "E"
 "E"
 "F"
 "F"
 "F"
 "F"
 "F"

Alternatives are to use `map()` or a *comprehension*.

In [47]:
map(string, fchar)

30-element Array{String,1}:
 "A"
 "A"
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"
 "B"
 "B"
 "C"
 "C"
 "C"
 ⋮  
 "D"
 "D"
 "E"
 "E"
 "E"
 "E"
 "E"
 "F"
 "F"
 "F"
 "F"
 "F"

In [48]:
[string(x) for x in fchar]

30-element Array{String,1}:
 "A"
 "A"
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"
 "B"
 "B"
 "C"
 "C"
 "C"
 ⋮  
 "D"
 "D"
 "E"
 "E"
 "E"
 "E"
 "E"
 "F"
 "F"
 "F"
 "F"
 "F"
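The three forms are interchangeable for a one-argument function such as `string`, but they differ slightly once extra arguments are involved. The sketch below uses a hypothetical two-argument function `shift` (not a Julia built-in) to show each form, assuming a Julia version with dot-call syntax (0.6 or later).

```julia
# `shift` is a hypothetical two-argument function used only for illustration.
shift(x, k) = x + k

v = [1, 2, 3]
by_dot  = shift.(v, 10)                 # dot call: the scalar 10 is broadcast
by_map  = map(x -> shift(x, 10), v)     # map needs an anonymous function here
by_comp = [shift(x, 10) for x in v]     # comprehension

println(by_dot == by_map == by_comp)
println(sqrt.(abs.(v .- 2)))            # nested dot calls are fused into one loop
```

The dot form is usually the most compact when scalars and arrays are mixed, while a comprehension reads more naturally when the expression is not a single function call.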