In [25]:
using Plots; gr()
using Interact

## Functions

A **function** is one of the most fundamental concepts in computing (and also in mathematics).
A function is a piece of a program that receives **input arguments**, processes them by performing calculations on them, and returns **outputs**.

Julia lets us define simple functions concisely. For example, the sigmoid function, which we will use extensively later in the course, is defined by the equation

$$\sigma(x) := \frac{1}{1 + \exp(-x)}.$$

In Julia, we can define this function with the following syntax:

In [26]:
σ(x) = 1 / (1 + exp(-x))

σ (generic function with 1 method)

We can plot this function to see what it looks like:

In [27]:
plot(σ, -5, 5)
hline!([0, 1], ls=:dash, lw=3)
vline!([0], ls=:dash, lw=3)

This function takes any real number as input and gives an output strictly between $0$ and $1$. It is continuous and smooth.
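A quick numerical check of this range. The sketch below redefines `σ` so that it runs on its own, and the sample inputs are arbitrary choices. It also illustrates one caveat: the open interval $(0, 1)$ describes the mathematical function, whereas in `Float64` arithmetic extreme inputs round to exactly $0$ or $1$.

```julia
# Sigmoid, as defined above
σ(x) = 1 / (1 + exp(-x))

σ(0.0)      # exactly 0.5, since exp(0) == 1
σ(10.0)     # close to, but below, 1
σ(-10.0)    # close to, but above, 0

# Floating-point limits (not a property of the mathematical function):
σ(40.0)     # exp(-40) ≈ 4e-18 is lost when added to 1, so this is exactly 1.0
σ(-800.0)   # exp(800) overflows to Inf, so this is exactly 0.0
```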

## Parameters

Instead of a single function, we can think about a whole class (set) of functions that look similar but differ in the value of a **parameter**. Let's make a new function that uses the previous $\sigma$ function but also has a parameter, $w$:

$$f_w(x) = f(x; w) = \sigma(w \, x).$$

Mathematically speaking, we can think of $f_w$ as a different function for each different value of the parameter $w$.

In Julia, this becomes

In [28]:
f(x, w) = σ(w * x)

f (generic function with 1 method)

Note that Julia just treats parameters as additional arguments to the function.

Mathematically we can write this in two different ways:

$$f(x; w) = f_w(x).$$
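To see how the parameter changes the function before exploring it interactively, we can evaluate $f$ at a few hand-picked values of $w$ (the specific numbers are arbitrary choices for illustration):

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

f(3.0, 0.0)    # w = 0 flattens the curve to the constant 1/2
f(1.0, 2.0)    # same as σ(2.0); a larger |w| makes the transition steeper
f(1.0, -2.0)   # a negative w flips the curve left to right
```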

We can now investigate the effect of $w$ interactively. To do so, we need a way of writing in Julia "the function of one variable $x$ that we obtain when we fix the value of $w$". We write this as an "anonymous function":

    x -> f(x, w)
    
We can read this as "the function that maps $x$ to the value of $f(x, w)$".
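A minimal sketch of an anonymous function outside `@manipulate`: with `w` fixed (here at the arbitrary value `1.5`), `x -> f(x, w)` behaves like any other one-argument function and can be passed to functions such as `map`, just as it is passed to `plot` below. The name `h` is only for illustration.

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

w = 1.5
h = x -> f(x, w)            # a function of x alone, with w = 1.5

h(2.0)                      # the same value as f(2.0, 1.5)
map(h, [-1.0, 0.0, 1.0])    # apply h to each input; the middle entry is σ(0) = 0.5
```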

In [29]:
@manipulate for w in -2:0.01:2
    plot(x->f(x, w), -5, 5, ylims=(0, 1))
end

## Fitting a function to data

Suppose we are given a single data point $(x_1, y_1) = (2, 0.8)$. We can try to "fit" a function $f_w$ by adjusting the parameter $w$ until the function passes through the data:

In [30]:
@manipulate for w in -2:0.01:2
    plot(x->f(x, w), -5, 5, ylims=(0, 1))
    scatter!([2], [0.8])
end

We can measure how far we are from the goal, for example by the vertical distance from the curve to the point.
We will use the square of this distance as our cost and call it $C$; it is a function of $w$:

$$C(w) = (y_1 - f(x_1, w))^2.$$
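As a worked example of this formula: at $w = 0$ the curve is the constant $1/2$, so $C(0) = (0.8 - 0.5)^2 = 0.09$. A small sketch that checks this:

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

x1, y1 = 2.0, 0.8
C(w) = (y1 - f(x1, w))^2    # squared vertical distance to the data point

C(0.0)                      # ≈ 0.09, up to floating-point rounding
```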

In [31]:
@manipulate for w in -2:0.01:2
    plot(x->f(x, w), -5, 5, ylims=(0, 1))

    x1, y1 = 2, 0.8
    
    plot!([x1, x1], [y1, f(x1, w)])
    scatter!([x1], [y1])
    title!("Distance^2 = $((y1 - f(x1, w))^2)")

end

Let's draw $C(w)$ as a function of the parameter $w$:

In [32]:
x0, y0 = 2.0, 0.8

plot(w->(y0 - f(x0, w))^2, -3, 3, xlabel="w", ylabel="C(w)", ylims=(0, 0.7))

We see that there is a special value of $w$ where $C$ reaches $0$, since for this value of $w$ the graph of $f_w$ passes exactly through the point $(x_1, y_1)$. We could find this value, $w^*$, by zooming in on that part of the graph.
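For a single data point we can also solve for $w^*$ exactly, because $\sigma$ can be inverted: $\sigma(z) = y$ exactly when $z = \log\big(y / (1 - y)\big)$ (the "logit"). Setting $\sigma(w x_1) = y_1$ gives $w^* = \log\big(y_1/(1-y_1)\big) / x_1 = \log(4)/2 = \log 2 \approx 0.693$. The sketch below checks this; `logit` is a helper defined here, not a library function.

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

x1, y1 = 2.0, 0.8

# Inverse of σ: σ(logit(y)) == y for 0 < y < 1
logit(y) = log(y / (1 - y))

# σ(w * x1) == y1  exactly when  w == logit(y1) / x1
w_star = logit(y1) / x1     # log(4) / 2 == log(2) ≈ 0.693

f(x1, w_star)               # ≈ 0.8, up to rounding
```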

In [33]:
x0, y0 = 2.0, 0.8
@manipulate for w in -2:0.1:2
    plot(w->(y0 - f(x0, w))^2, -3, 3, xlabel="w", ylabel="C(w)", ylims=(0, 0.7))
    vline!([w])
    title!("Vertical line at w = $w")
end

Why did we use such a complicated function $C$ with those squares inside? We could just take the distance (instead of the distance squared) using the absolute value function:

In [34]:
x0, y0 = 2.0, 0.8

plot(w->abs(y0 - f(x0, w)), -3, 3, xlabel="w", ylabel="C_abs(w)", ylims=(0, 0.7))

Now we see why squares are generally preferred: using the absolute value gives a cost function that is *not smooth*, with a sharp corner at its minimum. This makes it difficult to use methods from calculus to find the minimum. Nonetheless, non-smooth cost functions are also widely used nowadays.
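To make the corner concrete, this sketch compares one-sided finite-difference slopes of the two costs at $w^* = \log 2$, where $\sigma(2\log 2) = 1/(1 + 1/4) = 0.8$, so the curve passes through $(2, 0.8)$. The squared cost is flat there from both sides, while the absolute cost has slope of about $-0.32$ on the left and $+0.32$ on the right (the derivative of $\sigma(2w)$ at $w^*$ is $2 \cdot 0.8 \cdot 0.2 = 0.32$). The step `h` and the helper names `slope_left`/`slope_right` are our own illustrative choices.

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

x1, y1 = 2.0, 0.8
C_sq(w)  = (y1 - f(x1, w))^2     # squared distance
C_abs(w) = abs(y1 - f(x1, w))    # absolute distance

w_star = log(2.0)                # f(x1, w_star) ≈ y1
h = 1e-6                         # finite-difference step (arbitrary small value)

# One-sided slopes just to the left and right of w_star
slope_left(C)  = (C(w_star) - C(w_star - h)) / h
slope_right(C) = (C(w_star + h) - C(w_star)) / h

slope_left(C_sq), slope_right(C_sq)     # both ≈ 0: smooth minimum
slope_left(C_abs), slope_right(C_abs)   # ≈ -0.32 and +0.32: a corner
```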

## More data

Suppose there are now two data points to fit, the previous $(x_1, y_1)$ together with $(x_2, y_2) = (-3, 0.3)$.
We will calculate the cost function as the sum of the squared vertical distances from the graph to each data point:

In [35]:
xs = [2, -3]
ys = [0.8, 0.3]

@manipulate for w in -2:0.01:2
    plot(x->f(x, w), -5, 5, ylims=(0, 1))
    
    scatter!(xs, ys)

    for i in 1:2
        plot!([xs[i], xs[i]], [ys[i], f(xs[i], w)])
    end
    
    d² = sum(abs2, ys .- f.(xs, w))
    
    title!("Distance^2 = $(d²)")

end

In [36]:
?sum

```
sum(f, itr)
```

Sum the results of calling function `f` on each element of `itr`.

```jldoctest
julia> sum(abs2, [2; 3; 4])
29
```

```
sum(itr)
```

Returns the sum of all elements in a collection.

```jldoctest
julia> sum(1:20)
210
```

```
sum(A, dims)
```

Sum elements of an array over the given dimensions.

```jldoctest
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> sum(A, 1)
1×2 Array{Int64,2}:
 4  6

julia> sum(A, 2)
2×1 Array{Int64,2}:
 3
 7
```
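The cost in the interactive cell above combines two pieces of Julia syntax: the dot in `f.(xs, w)` broadcasts `f` over the elements of `xs` while reusing the scalar `w`, and `sum(abs2, v)` is the `sum(f, itr)` method documented above, which squares each element before adding. A small sketch with the arbitrary choice $w = 0$, where every prediction is $1/2$:

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

xs = [2, -3]
ys = [0.8, 0.3]
w = 0.0

f.(xs, w)                      # f applied to each x, same w: [0.5, 0.5]
ys .- f.(xs, w)                # elementwise residuals: ≈ [0.3, -0.2]
sum(abs2, ys .- f.(xs, w))     # 0.3^2 + 0.2^2 ≈ 0.13
```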


After experimenting with this for a while, it becomes clear that no value of $w$ makes the curve pass through both data points.
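We can confirm this with the inverse of $\sigma$: for $x \neq 0$, the curve $f_w$ passes through $(x, y)$ only when $w = \log\big(y/(1-y)\big) / x$. The first point requires $w = \log 2 \approx 0.693$, the second $w = \log(7/3)/3 \approx 0.282$, so no single $w$ satisfies both. A sketch (`logit` is a helper defined here):

```julia
# Inverse of σ: σ(logit(y)) == y for 0 < y < 1
logit(y) = log(y / (1 - y))

xs = [2, -3]
ys = [0.8, 0.3]

# For x ≠ 0, σ(w * x) == y exactly when w == logit(y) / x
w_needed = logit.(ys) ./ xs    # ≈ [0.693, 0.282]: a different w for each point
```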
Now we have the following cost function:

$$C(w) = \sum_i [y_i - f_w(x_i)]^2$$

Let's plot it:

In [37]:
C(w) = sum(abs2, ys .- f.(xs, w))

plot(C, -1, 1, xlabel="w", ylabel="C(w)", ylims=(0, 1.2))

We see that $C$ attains a minimum value close to (but not equal to) $0$ at a special value of $w$; let's call it $w^*$. From the graph, we can see that $w^* \simeq 0.4$. We could again zoom in on that region of the graph to estimate it more precisely.
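The "zooming in" can be automated in a brute-force way: evaluate $C$ on a fine grid of $w$ values and keep the smallest. This grid search is our own illustrative choice, not necessarily the method used later in the course; the grid spacing `0.001` is arbitrary and limits the accuracy of the estimate.

```julia
σ(x) = 1 / (1 + exp(-x))
f(x, w) = σ(w * x)

xs = [2, -3]
ys = [0.8, 0.3]
C(w) = sum(abs2, ys .- f.(xs, w))

ws = -1:0.001:1                 # the same w range as the plot, finely sampled
C_min, i = findmin(C.(ws))      # smallest cost and its position in ws
w_best = ws[i]                  # grid estimate of w*, to within about 0.001
```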

Isn't there a better way of using the computer to find this value of $w^*$?

## More parameters

If we add more parameters to a function, we may be able to improve how well it fits the data. For example, we could define a new function $g$ with another parameter, a shift or **bias**:

$$g(x; w, b) := \sigma(w \, x) + b$$

In [38]:
g(x, w, b) = σ(w*x) + b

g (generic function with 1 method)

Let's try to fit this to the data:

In [39]:
xs = [2, -3]
ys = [0.8, 0.3]

@manipulate for w in -2:0.01:2, b in -2:0.01:2
    plot(x->g(x, w, b), -5, 5, ylims=(0, 1))
    
    scatter!(xs, ys)

    for i in 1:2
        plot!([xs[i], xs[i]], [ys[i], g(xs[i], w, b)])
    end
    
    d² = sum(abs2, ys .- g.(xs, w, b))
    
    title!("Distance^2 = $(d²)")

end

You should be able to convince yourself that we can now make the curve pass through both points simultaneously. 
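This can also be checked numerically. Requiring $g(2; w, b) = 0.8$ and $g(-3; w, b) = 0.3$ and subtracting the two equations eliminates $b$, leaving $\sigma(2w) - \sigma(-3w) = 0.5$. The left side increases with $w$, from $0$ at $w = 0$ towards $1$, so there is exactly one root with $w > 0$. The sketch below finds it by bisection (a standard root-finding method, our choice here) and recovers $b$ by back-substitution; `gap` and `bisect` are helpers defined for illustration.

```julia
σ(x) = 1 / (1 + exp(-x))
g(x, w, b) = σ(w * x) + b

# Difference of the two fitting equations, with b eliminated
gap(w) = σ(2w) - σ(-3w) - 0.5

# Bisection: assumes h(lo) < 0 < h(hi) and h increasing on [lo, hi]
function bisect(h, lo, hi; tol=1e-12)
    while hi - lo > tol
        mid = (lo + hi) / 2
        h(mid) < 0 ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end

w_fit = bisect(gap, 0.0, 2.0)     # gap(0) = -0.5 < 0 and gap(2) > 0
b_fit = 0.8 - σ(2w_fit)           # from g(2, w, b) = 0.8

g(2, w_fit, b_fit), g(-3, w_fit, b_fit)   # ≈ (0.8, 0.3)
```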
Let's look at the graph of $C_2$, the cost function with both parameters:

In [40]:
C2(w, b) = sum(abs2, ys .- g.(xs, w, b))

C2 (generic function with 1 method)

In [43]:
plotlyjs()

Plots.PlotlyJSBackend()

In [44]:
ws = -2:0.05:2
bs = -2:0.05:2

surface(ws, bs, C2, alpha=0.8, zlims=(0,3))

If we rotate the surface, we can see that there is indeed a unique point $(w^*, b^*)$ where $C_2$ attains its minimum value, $0$.

If we add more data, however, we will again be unable to fit all of the data exactly; we can only attain a "best fit":

In [45]:
xs = [2, -3, -1, 1]
ys = [0.8, 0.3, 0.4, 0.4]

4-element Array{Float64,1}:
 0.8
 0.3
 0.4
 0.4

In [46]:

@manipulate for w in -2:0.01:2, b in -2:0.01:2
    plot(x->g(x, w, b), -5, 5, ylims=(-0.2, 1))
    
    scatter!(xs, ys)

    for i in 1:length(xs)
        plot!([xs[i], xs[i]], [ys[i], g(xs[i], w, b)])
    end
    
    d² = sum(abs2, ys .- g.(xs, w, b))
    
    title!("Distance^2 = $(d²)")

end

In [47]:
C2(w, b) = sum(abs2, ys .- g.(xs, w, b))

C2 (generic function with 1 method)

In [48]:
ws = -2:0.05:2
bs = -2:0.05:2

surface(ws, bs, C2, alpha=0.8, zlims=(0,3))
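Why can no $(w, b)$ fit all four points? If $w \neq 0$, then $g(x; w, b)$ is strictly monotonic in $x$, so it cannot take the same value $0.4$ at both $x = -1$ and $x = 1$; if $w = 0$, then $g$ is constant and cannot equal both $0.4$ and $0.8$. Hence the minimum of $C_2$ is strictly positive. A sketch that locates the lowest cost on the same grid as the surface plot (a brute-force search, our own illustrative choice):

```julia
σ(x) = 1 / (1 + exp(-x))
g(x, w, b) = σ(w * x) + b

xs = [2, -3, -1, 1]
ys = [0.8, 0.3, 0.4, 0.4]
C2(w, b) = sum(abs2, ys .- g.(xs, w, b))

ws = -2:0.05:2
bs = -2:0.05:2

# Tuples compare by their first entry (the cost) first
C_min, w_best, b_best = minimum((C2(w, b), w, b) for w in ws, b in bs)
```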