# Julia for Data Analysis

## Bogumił Kamiński

# Lecture 2. Getting started with Julia

I recommend you refer to the [Julia Manual](https://docs.julialang.org/en/v1/) for
a complete introduction to Julia programming.

## Representing values

In [None]:
using Pkg
Pkg.activate(Base.current_project())

In [1]:
1

1

In [2]:
true

true

In [3]:
"Hello world!"

"Hello world!"

In [4]:
0.1

0.1

In [5]:
[1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3

## Checking the type of a value

In [6]:
typeof(1)

Int64

In [7]:
typeof(true)

Bool

In [8]:
typeof("Hello world!")

String

In [9]:
typeof(0.1)

Float64

In [10]:
typeof([1, 2, 3])

Vector{Int64} (alias for Array{Int64, 1})

## Checking binary representation of a value

In [11]:
bitstring(1)

"0000000000000000000000000000000000000000000000000000000000000001"

In [12]:
bitstring(1.0)

"0011111111110000000000000000000000000000000000000000000000000000"

In [13]:
bitstring(Int8(1))

"00000001"

### Testing type of a value using `isa`

In [14]:
[1, 2, 3] isa Vector{Int}

true

In [15]:
[1, 2, 3] isa Array{Int64, 1}

true
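
`isa` is not limited to concrete types. The sketch below goes slightly beyond the cells above and checks values against abstract types from Julia's standard type hierarchy (`AbstractVector`, `Integer`, `Real`, `AbstractFloat`). The variable names are chosen here for illustration only.

```julia
v = [1, 2, 3]

is_abstract_vector = v isa AbstractVector   # true: Vector{Int} is a subtype of AbstractVector
is_float_vector = v isa Vector{Float64}     # false: the element type is Int, not Float64
is_integer = 1 isa Integer                  # true
is_real = 1.0 isa Real                      # true: Float64 <: AbstractFloat <: Real
is_float = 1 isa AbstractFloat              # false: an Int is not a floating point number

println((is_abstract_vector, is_float_vector, is_integer, is_real, is_float))
```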

## Binding of variable names to values (no copying of data is performed)

In [16]:
x = 1

1

In [17]:
y = [1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3
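
A minimal sketch of what "no copying" means in practice: assigning an existing array to a second name makes both names refer to the same object, so a change made through one name is visible through the other. The names `z` and `w` are introduced here only for illustration; `copy` is the standard function for making an independent copy.

```julia
y = [1, 2, 3]
z = y              # binds a second name to the same array; no data is copied
z[1] = 100         # mutating the array through z ...
println(y)         # ... is visible through y
println(z === y)   # both names refer to the identical object

w = copy(y)        # an explicit, independent copy
w[1] = -1          # changing the copy leaves y untouched
println(y)
```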

### Using Unicode in variable names

In [18]:
Kamiński = 1

1

In [19]:
x₁ = 0.5

0.5

In [20]:
ε = 0.0001

0.0001

In [21]:
? ₁

"₁" can be typed by \_1<tab>

search: x₁

Couldn't find ₁
Perhaps you meant x₁, x, y, ε, !, %, &, ', *, +, -, /, :, <, >, \, ^, |, ~ or ÷


No documentation found.

Binding `₁` does not exist.


In [22]:
? ε

"ε" can be typed by \varepsilon<tab>

search: ε



No documentation found.

`ε` is of type `Float64`.

# Summary

```
primitive type Float64
```

# Supertype Hierarchy

```
Float64 <: AbstractFloat <: Real <: Number <: Any
```


## Most important control flow constructs

### Performing computations depending on a Boolean condition

#### Conditional evaluation

In [23]:
x = -7

-7

In [24]:
if x > 0
    println("positive")
elseif x < 0
    println("negative")
elseif x == 0
    println("zero")
else
    println("unexpected condition")
end

negative


In [25]:
x = -7

-7

In [26]:
if x
    println("condition was true")
end

LoadError: TypeError: non-boolean (Int64) used in boolean context

#### Comparisons of floating point values

In [27]:
NaN > 0

false

In [28]:
NaN >= 0

false

In [29]:
NaN < 0

false

In [30]:
NaN <= 0

false

In [31]:
NaN == 0

false

In [32]:
NaN != 0

true

In [33]:
NaN != NaN

true
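
Because every `==` comparison with `NaN` returns `false`, `x == NaN` cannot be used to detect a `NaN` value. A small sketch of the usual alternatives, using the standard functions `isnan` and `isequal` (the variable names are illustrative):

```julia
x = NaN

eq_result = x == NaN               # false: NaN is not == to anything, including itself
isnan_result = isnan(x)            # true: the dedicated test for NaN
isequal_result = isequal(x, NaN)   # true: isequal treats NaN values as equal

println((eq_result, isnan_result, isequal_result))
```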

In [34]:
0.1 + 0.2 == 0.3

false

In [35]:
0.1 + 0.2

0.30000000000000004

In [36]:
isapprox(0.1 + 0.2, 0.3)

true

In [37]:
0.1 + 0.2 ≈ 0.3

true

In [38]:
? ≈

"≈" can be typed by \approx<tab>

search: ≈



```
isapprox(x, y; atol::Real=0, rtol::Real=atol>0 ? 0 : √eps, nans::Bool=false[, norm::Function])
```

Inexact equality comparison. Two numbers compare equal if their relative distance *or* their absolute distance is within tolerance bounds: `isapprox` returns `true` if `norm(x-y) <= max(atol, rtol*max(norm(x), norm(y)))`. The default `atol` is zero and the default `rtol` depends on the types of `x` and `y`. The keyword argument `nans` determines whether or not NaN values are considered equal (defaults to false).

For real or complex floating-point values, if an `atol > 0` is not specified, `rtol` defaults to the square root of `eps` of the type of `x` or `y`, whichever is bigger (least precise). This corresponds to requiring equality of about half of the significand digits. Otherwise, e.g. for integer arguments or if an `atol > 0` is supplied, `rtol` defaults to zero.

The `norm` keyword defaults to `abs` for numeric `(x,y)` and to `LinearAlgebra.norm` for arrays (where an alternative `norm` choice is sometimes useful). When `x` and `y` are arrays, if `norm(x-y)` is not finite (i.e. `±Inf` or `NaN`), the comparison falls back to checking whether all elements of `x` and `y` are approximately equal component-wise.

The binary operator `≈` is equivalent to `isapprox` with the default arguments, and `x ≉ y` is equivalent to `!isapprox(x,y)`.

Note that `x ≈ 0` (i.e., comparing to zero with the default tolerances) is equivalent to `x == 0` since the default `atol` is `0`.  In such cases, you should either supply an appropriate `atol` (or use `norm(x) ≤ atol`) or rearrange your code (e.g. use `x ≈ y` rather than `x - y ≈ 0`).   It is not possible to pick a nonzero `atol` automatically because it depends on the overall scaling (the "units") of your problem: for example, in `x - y ≈ 0`, `atol=1e-9` is an absurdly small tolerance if `x` is the [radius of the Earth](https://en.wikipedia.org/wiki/Earth_radius) in meters, but an absurdly large tolerance if `x` is the [radius of a Hydrogen atom](https://en.wikipedia.org/wiki/Bohr_radius) in meters.

!!! compat "Julia 1.6"
    Passing the `norm` keyword argument when comparing numeric (non-array) arguments requires Julia 1.6 or later.


# Examples

```jldoctest
julia> isapprox(0.1, 0.15; atol=0.05)
true

julia> isapprox(0.1, 0.15; rtol=0.34)
true

julia> isapprox(0.1, 0.15; rtol=0.33)
false

julia> 0.1 + 1e-10 ≈ 0.1
true

julia> 1e-10 ≈ 0
false

julia> isapprox(1e-10, 0, atol=1e-8)
true

julia> isapprox([10.0^9, 1.0], [10.0^9, 2.0]) # using `norm`
true
```

---

```
isapprox(x; kwargs...) / ≈(x; kwargs...)
```

Create a function that compares its argument to `x` using `≈`, i.e. a function equivalent to `y -> y ≈ x`.

The keyword arguments supported here are the same as those in the 2-argument `isapprox`.

!!! compat "Julia 1.5"
    This method requires Julia 1.5 or later.



#### Combining several logical conditions

In [39]:
x = -7

-7

In [40]:
x > 0 && x < 10

false

In [41]:
0 < x < 10

false

In [42]:
x < 0 || log(x) > 10

true

#### Short-circuit evaluation

In [43]:
x = -7

-7

In [44]:
log(x)

LoadError: DomainError with -7.0:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

In [45]:
x > 0 && println(x)

false

In [46]:
if x > 0
    println(x)
end

In [47]:
x > 0 || println(x)

-7


In [48]:
if !(x > 0)
    println(x)
end

-7


In [49]:
x = -7

-7

In [50]:
x < 0 && println(x^2)

49


In [51]:
if x < 0
    println(x^2)
end

49


In [52]:
iseven(x) || println("x is odd")

x is odd


In [53]:
if !iseven(x)
    println("x is odd")
end

x is odd


In [54]:
x = -7

-7

In [55]:
if x < 0 && x^2
    println("inside if")
end

LoadError: TypeError: non-boolean (Int64) used in boolean context
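
The error above comes from the `if`, which requires a `Bool`, not from `&&` itself. Outside of a condition, the last operand of `&&` may be any value: the whole expression evaluates either to `false` or to the value of that last operand. A short sketch (the names `r1` and `r2` are illustrative):

```julia
x = -7

r1 = x < 0 && x^2   # the condition is true, so r1 is the value of x^2
r2 = x > 0 && x^2   # the condition is false, so x^2 is never evaluated and r2 is false

println(r1)
println(r2)
```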

#### Ternary operator

In [56]:
x > 0 ? sqrt(x) : sqrt(-x)

2.6457513110645907

In [57]:
if x > 0
    sqrt(x)
else
    sqrt(-x)
end

2.6457513110645907

In [58]:
x = -7

-7

In [59]:
x > 0 ? println("x is positive") : println("x is not positive")

x is not positive


#### Conditional expressions have value

In [60]:
x = -4.0

-4.0

In [61]:
y = if x > 0
        sqrt(x)
    else
        sqrt(-x)
    end

2.0

In [62]:
y

2.0

In [63]:
x = 9.0

9.0

In [64]:
y = x > 0 ? sqrt(x) : sqrt(-x)

3.0

In [65]:
y

3.0

## Loops

### `for` loop

In [66]:
for i in [1, 2, 3]
    println(i, " is ", isodd(i) ? "odd" : "even")
end

1 is odd
2 is even
3 is odd


### `while` loop

In [67]:
i = 1

1

In [68]:
while i < 4
    println(i, " is ", isodd(i) ? "odd" : "even")
    global i += 1
end

1 is odd
2 is even
3 is odd


### `continue` and `break` keywords

In [69]:
i = 0

0

In [70]:
while true
    global i += 1
    i > 6 && break
    isodd(i) && continue
    println(i, " is even")
end

2 is even
4 is even
6 is even


## Compound expressions

In [71]:
x = -7

-7

In [72]:
x < 0 && begin
    println(x)
    x += 1
    println(x)
    2 * x
end

-7
-6


-12

In [73]:
x > 0 ? (println(x); x) : (x += 1; println(x); x)

-5


-5

## First approach to calculating the winsorized mean

In [74]:
x = [8, 3, 1, 5, 7]

5-element Vector{Int64}:
 8
 3
 1
 5
 7

In [75]:
k = 1

1

In [76]:
y = sort(x)

5-element Vector{Int64}:
 1
 3
 5
 7
 8

In [77]:
for i in 1:k
    y[i] = y[k + 1]
    y[end - i + 1] = y[end - k]
end

In [78]:
y

5-element Vector{Int64}:
 3
 3
 5
 7
 7

In [79]:
s = 0

0

In [80]:
for v in y
    global s += v
end

In [81]:
s

25

In [82]:
s / length(y)

5.0

## Defining functions

### `function` keyword

In [83]:
function times_two(x)
    return 2 * x
end

times_two (generic function with 1 method)

In [84]:
times_two(10)

20

### Positional and keyword arguments

In [85]:
function compose(x, y=10; a, b=10)
    return x, y, a, b
end

compose (generic function with 2 methods)

In [86]:
compose(1, 2; a=3, b=4)

(1, 2, 3, 4)

In [87]:
compose(1, 2; a=3)

(1, 2, 3, 10)

In [88]:
compose(1; a=3)

(1, 10, 3, 10)

In [89]:
compose(1) # missing keyword argument that is required

LoadError: UndefKeywordError: keyword argument a not assigned

In [90]:
compose(; a=3)  # missing positional argument that is required

LoadError: MethodError: no method matching compose(; a=3)
Closest candidates are:
  compose(::Any) at In[85]:1 got unsupported keyword argument "a"
  compose(::Any, ::Any; a, b) at In[85]:1

### Passing arguments to functions does not copy them

In [91]:
function f!(x)
    x[1] = 10
    return x
end

f! (generic function with 1 method)

In [92]:
x = [1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3

In [93]:
f!(x)

3-element Vector{Int64}:
 10
  2
  3

In [94]:
x

3-element Vector{Int64}:
 10
  2
  3

### Short syntax for defining functions using `=`

In [95]:
times_two(x) = 2 * x

times_two (generic function with 1 method)

In [96]:
compose(x, y=10; a, b=10) = x, y, a, b

compose (generic function with 2 methods)

### Functions are values and can be passed as arguments

In [97]:
map(times_two, [1, 2, 3])

3-element Vector{Int64}:
 2
 4
 6

### Anonymous functions

In [98]:
map(x -> 2 * x, [1, 2, 3])

3-element Vector{Int64}:
 2
 4
 6

In [99]:
sum(x -> x ^ 2, [1, 2, 3])

14

### `do` blocks

In [100]:
sum([1, 2, 3]) do x
    println("processing ", x)
    return x ^ 2
end

processing 1
processing 2
processing 3


14
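
A `do` block is another way of writing an anonymous function that is passed as the first argument of the call. The sketch below compares the two forms on the same input (the names `r_do` and `r_anon` are illustrative):

```julia
# do-block form: the block becomes the first argument of sum
r_do = sum([1, 2, 3]) do x
    x ^ 2
end

# equivalent call with an explicit anonymous function
r_anon = sum(x -> x ^ 2, [1, 2, 3])

println(r_do == r_anon)
```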

### Functions modifying their argument have names ending with `!` by convention

In [101]:
x = [5, 1, 3, 2]

4-element Vector{Int64}:
 5
 1
 3
 2

In [102]:
sort(x)

4-element Vector{Int64}:
 1
 2
 3
 5

In [103]:
x

4-element Vector{Int64}:
 5
 1
 3
 2

In [104]:
sort!(x)

4-element Vector{Int64}:
 1
 2
 3
 5

In [105]:
x

4-element Vector{Int64}:
 1
 2
 3
 5

### A simplified definition of a function computing the winsorized mean

In [106]:
function winsorized_mean(x, k)
    y = sort(x)
    for i in 1:k
        y[i] = y[k + 1]
        y[end - i + 1] = y[end - k]
    end
    s = 0
    for v in y
        s += v
    end
    return s / length(y)
end

winsorized_mean (generic function with 1 method)

In [107]:
winsorized_mean([8, 3, 1, 5, 7], 1)

5.0
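
The simplified definition does not check its arguments, so a value of `k` that is too large for `x` either raises a `BoundsError` or silently produces a meaningless result. One possible guarded variant is sketched below. The function name `checked_winsorized_mean`, the specific checks, and the error messages are assumptions made for illustration; the loop is the same as above and the accumulation is replaced by the built-in `sum`.

```julia
# Hypothetical guarded variant of the simplified function above
function checked_winsorized_mean(x, k)
    # short-circuit || is used to throw when a precondition fails
    k >= 0 || throw(ArgumentError("k must be non-negative"))
    2 * k < length(x) || throw(ArgumentError("k is too large for the length of x"))
    y = sort(x)                       # sort returns a new vector, x is not modified
    for i in 1:k
        y[i] = y[k + 1]               # replace the k smallest values
        y[end - i + 1] = y[end - k]   # replace the k largest values
    end
    return sum(y) / length(y)
end

println(checked_winsorized_mean([8, 3, 1, 5, 7], 1))
```

With `k = 0` the loop body never runs and the function returns the ordinary mean.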

## The `nothing` value

In [108]:
nothing1() = return

nothing1 (generic function with 1 method)

In [109]:
nothing1()

In [110]:
nothing2() = for i in 1:5
    print(i)
end

nothing2 (generic function with 1 method)

In [111]:
nothing2()

12345
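
Neither call above shows a result line because both functions return `nothing`, which the notebook does not display. A short sketch confirming this, using the standard `isnothing` function (the names `r1` and `r2` are illustrative):

```julia
nothing1() = return

nothing2() = for i in 1:5
    print(i)
end

r1 = nothing1()
r2 = nothing2()          # prints 12345 as a side effect; a for loop evaluates to nothing
println()

println(r1 === nothing)
println(isnothing(r2))
println(typeof(r1))
```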