# MCMC0.5: Introduction to Functional Programming

## Why functional?

One may claim that C/Fortran (or sometimes C++ without functional programming extensions) is enough to do physics because these languages are fast, simple, and have enough functionality. I agree with all three points, but I think this is acceptable only when you work entirely by yourself. Code written in a procedural style is usually not easy to read and is not human-oriented. In procedural programming, you always have to use your brain to mimic how the computer works (is a human being a slave of computers?). Most computational physicists have somehow gotten used to this way of thinking (already brainwashed) and write code that is as readable to the machine as possible. Of course, this is OK if you are working by yourself and your collaborators do not care about your code, but today it is more important to share code (via GitHub) and discuss physics in a Jupyter Notebook (maybe in the old days it was enough to have one expert for computation, and the others did not even have to see the code). In that case, it is important to write readable, understandable, and well-organized code for everybody, even without any comments!

I want to define functional programming as human-oriented programming. If you want a mean value, why not apply a `mean` function to the collection? Some (brainwashed) people may wonder whether it is OK not to accumulate the sum at each iteration to save memory, and so on. Such concerns will be resolved later by the concept of lazy evaluation (actually in MCMC6.0). Using this black box, you no longer have to care about at which step the functions are actually called. Then you can simply apply statistical functions like `mean`, `std`, etc. to the collection of data (called an iterator) afterwards. This is exactly how we think when we process big data, and you can now write code as you think.
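
As a minimal sketch of this idea, assuming only the standard library `Statistics` module, a statistical function can be applied directly to a lazily produced collection. The variable names `samples` and `m` are chosen here for illustration only.

```julia
using Statistics  # standard library; provides mean

# A lazy collection: the squares are produced on demand, not stored.
samples = (x ^ 2 for x in 1 : 4)

# Apply the statistical function directly to the collection.
m = mean(samples)  # (1 + 4 + 9 + 16) / 4
```

No explicit loop or running sum appears in the code; how and when the elements are visited is left to `mean`.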

> In purely functional programming you don't tell the computer what to do as such but rather you tell it what stuff *is*.
> -- <cite> Miran Lipovača [Learn You a Haskell for Great Good!](http://learnyouahaskell.com/introduction)</cite>

More strictly, functional programming may be defined as writing code by composing pure functions. In other words, it is a deterministic finite-state machine (FSM). This notion is closely related to Markov-chain Monte Carlo (MCMC) because MCMC is nothing but a nondeterministic FSM. From a physicist's view, it is a "quantum" version of functional programming (rigorously, the nondeterminism here is not quantum), and every technique in functional programming is applicable to MCMC, although completely pure functional programming is impossible in MCMC (software cannot produce any true random numbers by itself!). From now on, I will be pragmatic: I use functional programming as much as possible, but additionally include a (pseudo)nondeterministic function to make it work as MCMC. In this sense, I do not stick to completely functional programming, but use it as a tool to make code readable. As long as it does not diminish the readability of the code, I sometimes use a destructive function, but only when necessary.

Julia completely enables us to write code as we think. You can use Unicode for variables, most linear algebra functions already exist, and Julia even discards class-based object-oriented programming, which is unnecessary for MCMC! Look at the HMC function in MCMC1.0 and compare it to the algorithm written above the function. If you are good at elementary mathematics, you can write code exactly as if you were doing math in your notebook. As in [this pdf](https://github.com/MCSMC/MCMC_sample_codes/blob/master/review-MCMC.pdf), people unfortunately still believe that such "how we think" programming in Mathematica is useless. However, today in 2018, Julia and Jupyter Notebook work as a super-fast version of Mathematica/MATLAB, and Julia now runs at a speed comparable to C/Fortran. Of course, if you wish to use low-level operations like SSE and AVX, you still need such procedural languages, but for most purposes, just calling BLAS/LAPACK functions is enough. Moreover, Julia now supports GPGPU, so even for GPGPU programming Julia can be the first choice, instead of writing CUDA/OpenCL directly.

In this series of notebooks, I will introduce not only MCMC but also functional programming, Julia-like programming, and in addition a Bayesian way of thinking. Think Bayesian, think functional, and last but not least [think Julia](https://benlauwens.github.io/ThinkJulia.jl/latest/book.html)!

## Think functional

In functional programming, a program is regarded as one big function, which is built just by composing smaller functions. In order to keep those functions pure, it is important to follow these rules.

1. Do not use/write a destructive function.
1. Do not include a side effect in functions. In other words, functions must not do anything other than return a unique value determined by the input.
1. (optional) static typing

By convention, destructive (mutating) functions in Julia have `!` at the end of their names:

In [1]:
a = [1, 2, 3]
push!(a, 4)
@show a;

a = [1, 2, 3, 4]


Therefore, the first condition is obeyed if you do not use any existing functions with `!`. In some cases, you need such destructive functions for speed, and then you should be pragmatic. It is recommended to add `!` when you define a new destructive function. In some sense assignment itself can be regarded as a destructive operation (or a side effect), but it is usually unavoidable, and as pragmatists we accept destructive assignment when necessary.

Then, how can we replace destructive functions like `push!` and `pop!`? In most cases, destructive operations on arrays can be avoided by using iterators and generators (see MCMC2.0). However, for matrices and tensors we sometimes still need a destructive function to do operations in a memory-efficient way. In that case, of course, we accept using destructive functions as pragmatists, or you can formally write a pure wrapper function around the destructive operation to make your program look like pure functional code.
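
A minimal sketch of such a pure wrapper is shown below. The name `purepush` is hypothetical (it is not part of Julia); it simply copies the input before calling `push!`, so the caller's array is never modified.

```julia
# Hypothetical helper: a non-destructive counterpart of `push!`.
# It works on a copy, so the argument `a` is left untouched.
purepush(a::Vector{T}, x::T) where T = push!(copy(a), x)

a = [1, 2, 3]
b = purepush(a, 4)
@show a  # the original array is unchanged
@show b  # the new array has the extra element
```

The price of purity here is one extra allocation per call, which is exactly the speed/memory trade-off mentioned above.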

Then, what is a side effect?

In [2]:
println("Hello, World!")

Hello, World!


`println` is a "function," but it does more than a pure function is supposed to do. It prints words on your screen and returns nothing! Anything a function does beyond what a mathematically pure function should do, including such I/O operations, is called a side effect. Some side effects are avoidable and others are unavoidable. It is important to reduce side effects as much as possible, but as pragmatists we accept useful side effects like I/O operations, assignment, control syntax, etc. On the other hand, the most important thing to keep in mind is not to rewrite a global variable inside a function. Global variables should always be defined with `const`, and it is even recommended to pass global variables as arguments.

In [3]:
const b = 2
"""
square integers
"""
function square(x::Int64)::Int64
    return x ^ 2 # you can omit return
end
square(b)

4

Julia has dynamic typing, but it also has a strong type system. You can write code without declaring types, but if you want to speed up your code, you have to be conscious of types. "Type stability" is the most important issue for speeding up Julia code, so I will discuss it later.

~ under construction ~

Unfortunately, doing completely pure functional programming in Julia is not realistic. This is mostly a matter of speed in numerical calculations. Most pure functional languages are not suitable for numerics and are several times slower than C/Fortran. Speed and referential transparency are in most cases a trade-off, so you should not stick to purity. In modern pure functional languages, referential transparency is almost always guaranteed, but in Julia you have to write referentially transparent code consciously, which is a bit exhausting. Writing hybrid code that mixes the two styles may be important from a pragmatist's view.

## Higher-order functions

The most important feature of functional programming is higher-order functions. Higher-order functions are just functions that take other functions as arguments. This is possible because functions are first-class objects in Julia.

### map

The most important higher-order function is `map`. See how it works.

In [4]:
map(square, [1, 2, 3])

3-element Array{Int64,1}:
 1
 4
 9

In Julia you won't see this function very often because there is a very nice shorthand, the dot syntax.

In [5]:
square.([1, 2, 3])

3-element Array{Int64,1}:
 1
 4
 9

This code is the same as `map(square, ...)`. The `@.` macro is also useful: it adds a dot to every function call in the expression.

In [6]:
@. log(square([1, 2, 3]))

3-element Array{Float64,1}:
 0.0               
 1.3862943611198906
 2.1972245773362196

`map` is useful for parallelization. Just replace `map` with `pmap`, its parallel version. The details will be discussed in MCMC4.0.

In [7]:
using Distributed
addprocs(2)
@everywhere function squarese(x::Float64)::Float64
    println(x) # This is a side effect!
    x ^ 2
end
pmap(squarese, [1.0, 2.0, 3.0, 4.0])

      From worker 3:	2.0
      From worker 2:	1.0
      From worker 2:	3.0
      From worker 3:	4.0


4-element Array{Float64,1}:
  1.0
  4.0
  9.0
 16.0

When the work is parallelized, which element of the array is processed first is *a priori* unknown. In this case, writing a function without a side effect is very important. You should keep in mind that the function passed to `pmap` should not include a side effect.

The problem is that `map` in Julia does not do lazy evaluation. Each time we call `map`, the resulting array is stored in memory.

In [8]:
map(square, 1 : 3)

3-element Array{Int64,1}:
 1
 4
 9

`1 : 3` is a `UnitRange`, an abstract way to represent `[1, 2, 3]` without storing all the elements in memory (more abstractly it works as an iterator, see MCMC2.0), but the result becomes a vector, so the memory for the array is allocated when `map` is called.

In order to do lazy evaluation, use `Base.Generator` instead. In this way, the returned value is still something abstract called a generator (see MCMC2.0 for more details).

In [9]:
const gene = Base.Generator(square, 1 : 3) # this is not a Vector

Base.Generator{UnitRange{Int64},typeof(square)}(square, 1:3)

Again it is important to use a function without a side effect because when the function will be called is *a priori* unknown. `collect` transforms abstract things to arrays by calling all the lazy functions.

In [10]:
collect(gene) # Vector

3-element Array{Int64,1}:
 1
 4
 9
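
To see that a generator really postpones the calls, here is a small sketch using the generator-expression syntax `(f(x) for x in itr)`, which also builds a `Base.Generator`. The function `noisysquare` is hypothetical and contains a side effect on purpose, only so that the moment of each call becomes visible.

```julia
# Hypothetical function with a visible side effect, used only to
# observe when the calls actually happen.
function noisysquare(x::Int64)::Int64
    println("squaring ", x)
    x ^ 2
end

lazy = (noisysquare(x) for x in 1 : 3)  # nothing is printed yet
@show lazy isa Base.Generator
total = sum(lazy)  # the three calls happen only now
```

The lines `squaring 1`, `squaring 2`, `squaring 3` appear only when `sum` consumes the generator, and no intermediate array is allocated.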

Note that `IterTools.imap` has a more sophisticated implementation of this lazy evaluation. I will come back to this point later in MCMC2.0 and MCMC6.0.

### filter

The next important higher-order function is `filter`.

In [11]:
"""
Filtering odd numbers
"""
function fil(x::Int64)::Bool
    x & 1 # 1 for odd x, 0 for even x (like isodd(x)); converted to Bool on return
end
filter(fil, [1, 2, 3])

2-element Array{Int64,1}:
 1
 3

A lazy version is the following.

In [12]:
const iter = Iterators.filter(fil, 1 : 5)

Base.Iterators.Filter{typeof(fil),UnitRange{Int64}}(fil, 1:5)

This lazy function again returns something abstract called `Iterators.Filter`.

In [13]:
collect(iter)

3-element Array{Int64,1}:
 1
 3
 5

Finally, every abstract thing was embodied!

### reduce

In Julia, operators are functions.

In [14]:
+(2, 3)

5

This resembles Lisp.

~ figure will be inserted ~

In [15]:
reduce(+, [1, 2, 3])

6

It is simpler to just write:

In [16]:
sum([1, 2, 3])

6

Functions like `sum` and `prod` (`reduce` with `+` and `*`) are generally called reductions.

You should be careful if the function is not associative.

In [17]:
reduce(-, [3, 2, 1])

0

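Reduction is done from left to right here, i.e. `(3 - 2) - 1`, but `reduce` does not guarantee the order in general; use `foldl` or `foldr` when the order matters. `reduce` is usually used in combination with `map`.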

In [18]:
mapreduce(square, +, [1, 2, 3])

14
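
Coming back to the non-associative case, the following short sketch shows how `foldl` and `foldr` from Base fix the order of the reduction explicitly. The variable names `left` and `right` are chosen here for illustration.

```julia
left = foldl(-, [3, 2, 1])   # (3 - 2) - 1
right = foldr(-, [3, 2, 1])  # 3 - (2 - 1)
@show left
@show right
```

For associative operators like `+` the two agree, so plain `reduce` (or `sum`) is fine there.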

`accumulate` is a cousin of `reduce`:

In [19]:
accumulate(+, [1, 2, 3])

3-element Array{Int64,1}:
 1
 3
 6

If you have time, try writing a lazy version of `accumulate`!
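
One possible sketch of such a lazy version is given below, assuming the standard iteration interface (`Base.iterate`). The type name `LazyAccumulate` is hypothetical, and this is only one way to do it; newer Julia releases also provide a built-in `Iterators.accumulate`.

```julia
# Hypothetical lazy accumulator: stores only the operator and the
# underlying iterator, never the running totals.
struct LazyAccumulate{F, I}
    op::F
    itr::I
end

# Neither the length nor the element type is declared in advance.
Base.IteratorSize(::Type{<:LazyAccumulate}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:LazyAccumulate}) = Base.EltypeUnknown()

# First step: the first element is the first running total.
function Base.iterate(acc::LazyAccumulate)
    next = iterate(acc.itr)
    next === nothing && return nothing
    value, inner = next
    return value, (value, inner)
end

# Later steps: combine the running total with the next element.
function Base.iterate(acc::LazyAccumulate, state)
    total, inner = state
    next = iterate(acc.itr, inner)
    next === nothing && return nothing
    value, inner = next
    total = acc.op(total, value)
    return total, (total, inner)
end

finite = collect(LazyAccumulate(+, 1 : 3))
# Works even on an infinite iterator, because nothing is computed
# until it is requested.
partial = collect(Iterators.take(LazyAccumulate(+, Iterators.countfrom(1)), 4))
```

Because the running total lives in the iteration state, the struct itself stays immutable and no destructive function is needed.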

### foreach

`map`, `filter`, and `reduce` are the three most important functions. Compared to those, `foreach` is less important because it can be replaced by a "for loop." `foreach` just reduces the length of your code.

In [20]:
foreach(println, ["abc", "def"])

abc
def


is the same as

In [21]:
for str in ["abc", "def"]
    println(str)
end

abc
def


### zip

`zip` for arrays is not a higher-order function in a strict sense, but I will introduce it here because it is usually used in combination with other higher-order functions.

In [22]:
const z = zip([1, 2, 3], [4, 5, 6])

Base.Iterators.Zip2{Array{Int64,1},Array{Int64,1}}([1, 2, 3], [4, 5, 6])

`zip` is already lazy, so we have to `collect` it to see the elements.

In [23]:
collect(z)

3-element Array{Tuple{Int64,Int64},1}:
 (1, 4)
 (2, 5)
 (3, 6)

If one of the zipped ranges (or arrays/iterators) is exhausted, then the whole `zip` stops.

In [24]:
foreach(println, zip(1 : 2, 3 : 6))

(1, 3)
(2, 4)


It is useful to remember that "zip unzips zip."

In [25]:
const tup = ([1, 2, 3], [2, 3, 4], [3, 4, 5]) # named `tup` to avoid shadowing Base.tuple
collect(zip(zip(tup...)...)) # tup... means that the Tuple is expanded as arguments.

3-element Array{Tuple{Int64,Int64,Int64},1}:
 (1, 2, 3)
 (2, 3, 4)
 (3, 4, 5)

## Anonymous functions and Currying

Anonymous functions (lambda expressions) are very useful. First, there are two ways to define a function.

In [26]:
"""
Mean of two values
"""
function mean1(x, y)::Float64
    (x + y) / 2
end
mean1(2.0, 3.0)

2.5

In [27]:
mean2(x, y)::Float64 = (x + y) / 2
mean2(2.0, 3.0)

2.5

In addition, there is a way to define a function without a name.

In [28]:
mean3 = (x, y) -> (x + y) / 2

#3 (generic function with 1 method)

Since this function has no name, it is identified only by a number (`#3` here). A function itself is an object, so it can be assigned to a variable.

In [29]:
func = +
func(2, 3)

5

~ under construction ~

In [30]:
curriedfunc(x) = y -> func(x, y)
curriedfunc(2)(3)

5

In [31]:
map(curriedfunc(2), [1, 2, 3])

3-element Array{Int64,1}:
 3
 4
 5
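
The cells above curry one specific function by hand. As a hedged sketch, the same idea can be written once as a small higher-order helper; the name `curry` is hypothetical (Julia does not ship it), and it only handles two-argument functions.

```julia
# Hypothetical helper: turn a two-argument function into a chain of
# one-argument functions.
curry(f) = x -> y -> f(x, y)

added = curry(+)(2)(3)            # same as +(2, 3)
double = curry(*)(2)              # partially applied: y -> 2 * y
doubled = map(double, [1, 2, 3])
```

Partially applied functions like `double` are convenient as arguments to `map`, `filter`, and friends.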

## Recursion

In [32]:
"""
Fibonacci numbers

only works for n >= 0
"""
function fib(n::Int64)::Int64
    if n < 2
        n
    else
        fib(n - 2) + fib(n - 1)
    end
end
fib(30)

832040

This is slow because the same number is calculated many times.

In [33]:
"""
Fibonacci numbers with a tail call
"""
function fibtail(n::Int64, a = 0, b = 1)::Int64
    if n < 1
        a
    else
        fibtail(n - 1, b, a + b)
    end
end
fibtail(50)

12586269025

In most cases, writing functions by recursion is not a good strategy in Julia because tail call optimization is not supported (as of Julia 1.0). Writing a loop explicitly is much better in most cases.
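
For comparison, a loop version of the same calculation might look like the following sketch. The name `fibloop` is chosen here for illustration; the two local variables play the role of the accumulator arguments `a` and `b` of the tail-recursive version.

```julia
"""
Fibonacci numbers with an explicit loop

only works for n >= 0
"""
function fibloop(n::Int64)::Int64
    a, b = 0, 1
    for _ in 1 : n
        a, b = b, a + b  # same update as fibtail(n - 1, b, a + b)
    end
    a
end
fibloop(50)
```

The rebinding of `a` and `b` is local to the function, so `fibloop` is still pure when seen from outside.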

## Type stability

This is something specific to Julia, but very important. Julia has a strong type system, although Julia itself is dynamically typed. This is made possible by powerful type inference that runs before compilation.

~ under construction ~

This is very, very bad code.

In [34]:
θbad(x) = x > 0 ? 1.0 : 0

θbad (generic function with 1 method)

In [35]:
θbad.([-2, -1, 0, 1, 2])

5-element Array{Real,1}:
 0  
 0  
 0  
 1.0
 1.0

This is called type-unstable because the return type differs depending on the value of `x`. `Real` is an abstract type that includes both integers and floating-point numbers.
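
One way to check this without running the function on many values is to ask the compiler which return type it infers. The sketch below uses `Base.return_types`; the names `stepbad` and `stepgood` are hypothetical stand-ins, the first mirroring the unstable definition above and the second a stable variant for comparison. In a notebook, `@code_warntype stepbad(1)` gives a more detailed report.

```julia
# Hypothetical stand-ins for an unstable and a stable step function.
stepbad(x) = x > 0 ? 1.0 : 0     # Float64 or Int64 depending on the value
stepgood(x) = x > 0 ? 1.0 : 0.0  # always Float64

# Inferred return types for an Int64 argument (one entry per method).
badtypes = Base.return_types(stepbad, (Int64,))
goodtypes = Base.return_types(stepgood, (Int64,))
@show badtypes
@show goodtypes
```

A `Union` of several concrete types in the result is the sign of type instability; a single concrete type means the function is type-stable for that input type.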

You should instead write:

In [36]:
θint(x)::Int64 = x > 0 ? 1 : 0
θint.([-2, -1, 0, 1, 2])

5-element Array{Int64,1}:
 0
 0
 0
 1
 1

Or

In [37]:
θfloat(x)::Float64 = x > 0 ? 1.0 : 0.0
θfloat.([-2, -1, 0, 1, 2])

5-element Array{Float64,1}:
 0.0
 0.0
 0.0
 1.0
 1.0

However, I think it is better to determine the type based on the input.

In [38]:
"""
Heaviside theta
"""
function θ(x::T)::T where T
    ifelse(x > 0, 1, 0)
end
θ.([-2, -1, 0.0, 1, 2.0])

5-element Array{Float64,1}:
 0.0
 0.0
 0.0
 1.0
 1.0

I recommend writing type annotations as much as possible for every function.

## Tips: Nondeterministic unit test

Apparently, generating a (pseudo-)random number is a side effect because the return value changes every time (or, more accurately, the `rand()` function has its own internal state).

In [48]:
println(rand())
println(rand());

0.795663004491048
0.5060834942954711


Therefore, the following function is not referentially transparent.

In [49]:
function calcπ()
    hit = 0
    for i in 1 : 10000
        x = rand()
        y = rand()
        if x ^ 2 + y ^ 2 < 1.0
            hit += 1
        end
    end
    4.0 * (hit / 10000)
end
calcπ()

3.1276

In order to separate out the transparent part simply, just give the random values as arguments.

In [50]:
function calcπ(xlist::Vector{Float64}, ylist::Vector{Float64})
    zlist = zip(xlist, ylist)
    hit = length(collect(Iterators.filter(z -> z[1] ^ 2 + z[2] ^ 2 < 1.0, zlist)))
    4.0 * (hit / length(zlist))
end
calcπ(rand(10000), rand(10000))

3.1488
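
Once the random values come in as arguments, the function can be unit-tested deterministically. The following is a minimal sketch, assuming the standard library `Random` module; `estimateπ` is a hypothetical self-contained copy of the logic above, and the seed `2018` is arbitrary.

```julia
using Random  # standard library; provides MersenneTwister

# Hypothetical pure estimator with the same logic as the function above.
function estimateπ(xlist::Vector{Float64}, ylist::Vector{Float64})
    hit = count(z -> z[1] ^ 2 + z[2] ^ 2 < 1.0, zip(xlist, ylist))
    4.0 * (hit / length(xlist))
end

# Fixed inputs: one point inside the quarter circle, one outside.
@assert estimateπ([0.1, 0.9], [0.1, 0.9]) == 2.0

# Seeded inputs: the same seed always reproduces the same estimate.
rng1 = MersenneTwister(2018)
first_run = estimateπ(rand(rng1, 10000), rand(rng1, 10000))
rng2 = MersenneTwister(2018)
second_run = estimateπ(rand(rng2, 10000), rand(rng2, 10000))
@assert first_run == second_run
```

The nondeterminism is confined to the place where the random numbers are created, and the test controls that place through the explicit generator.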

In this way, the function `calcπ` can be made pure. Another way is the following.

~ under construction ~

It seems very much like overengineering. In modern functional languages like Haskell, there is a mechanism called a monad (especially the State monad) that elegantly keeps a function pure while hiding any side effect.