This notebook is an implementation of the Andrew Ng Coursera course material on machine learning. It is not part
of the official Coursera class material, nor is it officially supported by either Prof. Ng or Coursera.

Jupyter Notebook
----------------

In Jupyter notebooks you can execute Julia (and Python, R and more) code from within your browser. A notebook is organized in *cells*, which contain either Markdown text, figures, or actual (Julia) code. The code in a cell is executed by pressing "shift+enter" while the cursor is in that cell. Variables, functions etc. that are defined in a cell stay in memory after execution. They live in the global namespace and are therefore accessible from other cells too (but only once the defining cell has been executed). You can insert or remove cells from the menu bar.


Installation notes
------------------

This notebook is designed to run under Julia v0.6.

Julia comes with its own package manager, which is accessed through the **Pkg.method_name** syntax. To install a new package, simply type **Pkg.add("Package-Name")** in a Jupyter notebook or at the standard Julia command line (REPL).

For plotting we will use the Plots.jl package. Plots.jl is a metapackage that ties together multiple plotting libraries available for Julia and provides a unified interface. You can install different plotting backends, and you are encouraged to try some of them out and settle on the one that suits you best.
By default we will be using *PlotlyJS*.
--> If you are on Linux or OS X, make sure cmake is installed before continuing with the steps below to install PlotlyJS.

In [2]:
# Install missing packages for plotting.
# This cell has to be executed only the very first time you use this
# Jupyter notebook.
# Execute this cell with "shift+enter"

Pkg.add("Plots")
Pkg.add("Optim")
Pkg.add("DataFrames")
Pkg.update()

[1m[36mINFO: [39m[22m[36mPackage Plots is already installed
[39m[1m[36mINFO: [39m[22m[36mPackage Optim is already installed
[39m[1m[36mINFO: [39m[22m[36mPackage DataFrames is already installed
[39m[1m[36mINFO: [39m[22m[36mUpdating METADATA...
[39m[1m[36mINFO: [39m[22m[36mUpdating MbedTLS master...
[39m[1m[36mINFO: [39m[22m[36mComputing changes...
[39m[1m[36mINFO: [39m[22m[36mNo packages to install, update or remove
[39m

Machine Learning Online Class - Exercise 1: Linear Regression

Instructions
------------

This file contains code that helps you get started on the
linear regression exercise. You will need to complete the following functions:

   - plot_data()
   - compute_cost()


`x` refers to the population size in 10,000s and `y` to profit in $10,000s


# 1 Warm up

First let's get comfortable with executing code in the Jupyter notebook and some
Julia basics.
Base Julia functionality can be extended by importing packages.
For visualization of data we will be using the `Plots` package. Additionally the `include()` function is used to import code from single (user generated) files. Loading a package for the first time usually takes a while as it needs to be compiled first.

In [3]:
# First import some necessary packages and files.
using Plots
include("grad_descent.jl");

[1m[36mINFO: [39m[22m[36mPrecompiling module Reexport.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module StaticArrays.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module RecipesBase.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module PlotUtils.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module PlotThemes.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module Showoff.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module StatsBase.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module NaNMath.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module FileIO.
[39m[1m[36mINFO: [39m[22m[36mPrecompiling module Requires.
[39m

Learn how to use the `eye()` function to return the identity matrix. You can access help and documentation
by typing a question mark in front of the function name and hitting "shift+enter". Use `?eye` to learn about the
`eye` function and use it to create a 5 by 5 identity matrix `M`.

The line **eye([T::Type=Float64,] m::Integer, n::Integer)** in the docs lets you know that the function can take an optional argument specifying the type of the matrix elements, and two integers to define the shape of the matrix.

In [5]:
?eye

# The pound sign indicates comments in Julia.
# Remove the # from the last line and fill in the
# necessary code to create the matrix M
# M = eye(...)

search: [1me[22m[1my[22m[1me[22m K[1me[22m[1my[22m[1mE[22mrror sp[1me[22m[1my[22m[1me[22m [1me[22mlt[1my[22mp[1me[22m k[1me[22m[1my[22mtyp[1me[22m sup[1me[22mrt[1my[22mp[1me[22m cod[1me[22m_t[1my[22mp[1me[22md @cod[1me[22m_t[1my[22mp[1me[22md



```
eye([T::Type=Float64,] m::Integer, n::Integer)
```

`m`-by-`n` identity matrix. The default element type is [`Float64`](@ref).

```
eye(m, n)
```

`m`-by-`n` identity matrix.

```
eye([T::Type=Float64,] n::Integer)
```

`n`-by-`n` identity matrix. The default element type is [`Float64`](@ref).

```
eye(A)
```

Constructs an identity matrix of the same dimensions and type as `A`.

```jldoctest
julia> A = [1 2 3; 4 5 6; 7 8 9]
3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

julia> eye(A)
3×3 Array{Int64,2}:
 1  0  0
 0  1  0
 0  0  1
```

Note the difference from [`ones`](@ref).


Note that there are several methods of the function `eye()`. This means that several methods share the name `eye` but take different input parameters. Depending on the input parameters you provide, the compiler figures out which version to use. Take a look at all the methods of the function `eye` (or any other function you want to examine) with `methods(functionname)`. Being able to execute specialized functions depending on the input parameter types is a big part of what makes Julia fast.
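To make the idea of several methods under one name concrete, here is a minimal sketch using a made-up function. The name `describe` and its three methods are hypothetical and exist only for illustration; they are not part of Julia or of this exercise.

```julia
# Hypothetical function `describe`, defined only to illustrate methods.
describe(x::Integer) = "integer: $x"
describe(x::AbstractFloat) = "float: $x"
describe(x, y) = "two arguments"

println(describe(1))       # dispatches to the Integer method
println(describe(1.0))     # dispatches to the AbstractFloat method
println(describe(1, 2.0))  # dispatches to the two-argument method

# Count the methods that now exist under the name `describe`.
println(length(methods(describe)))
```

Calling `methods(describe)` on its own lists the three signatures, in the same way `methods(eye)` lists the variants of `eye`.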

In [6]:
methods(eye)

Finally, one possible source of confusion when using a Jupyter notebook for the first time is the fact that code can be executed "out of order". This results from the fact that code cells can be executed independently from each other, which can have unexpected consequences for the state of your variables. As an example look at the three code cells below (A to C). If you execute A then B then C, `mysum` will be equal to 10. However, if you execute A, B, B, C, then `mysum = 20`. Always remember to check for the right order of evaluation of your code cells.

In [17]:
# Cell A

mysum = 0

0

In [18]:
# Cell B

for i in 1:10
    mysum += 1
end

In [19]:
# Cell C

@show(mysum);

mysum = 10
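One possible way to make a cell insensitive to how often it is executed is to keep the running state inside a function. The sketch below is only an illustration of that idea; the helper name `count_to` is hypothetical and not part of the exercise.

```julia
# Hypothetical helper: the counter is local to the function, so every call
# starts from zero regardless of which cells were executed before.
function count_to(n)
    total = 0
    for i in 1:n
        total += 1
    end
    return total
end

# Re-running this cell any number of times assigns the same value.
mysum = count_to(10)
@show mysum
```

Because `mysum` is assigned rather than incremented, executing this cell twice does not change the result.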


# 2 Linear Regression with one Variable
In this part of this exercise, you will implement linear regression with one
variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for
profits and populations from the cities. You would like to use this data to help you select which city to expand
to next.

The file **data/ex1data1.txt** contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.

## 2.1 Loading and Plotting Data
Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot.

The dataset is loaded from the data file into the variables `x`
and `y` but you have to complete the code in `plot_data()`.

Find docs under https://juliaplots.github.io/ and try to get a result similar to the graph below.


![](../figures/Fig_1.png)

In [31]:
data = readdlm("../data/ex1data1.txt", ',')
x = data[:, 1]
y = data[:, 2]

function plot_data(x, y)
  # Plot the data points x and y into a new figure.
  # Instructions: Plot the training data into a figure using the
  #               "scatter" command. Set the axes labels using
  #               the "xaxis!" and "yaxis!" commands and change values
  #               of the configuration parameters in the scatter function
  #               to get a nice visualization.
  #
  # ====================== YOUR CODE HERE =====================

  scatter(x, y, marker=:o, markersize=1, color=:red, label="label")
  xaxis!("<your x label here>")
  yaxis!("<your y label here>")
end

plot_data(x, y)

## 2.2 Gradient Descent
In this part, you will fit the linear regression parameters $\theta$ to our dataset
using gradient descent.

### 2.2.1 Update Equations
The objective of linear regression is to minimize the cost function

$$ J(\theta) = \frac{1}{2m}\sum_{i=1}^m\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

where the hypothesis $h_{\theta}(x)$ is given by the linear model

$$ h_{\theta}(x) = \theta^\top x = \theta_0 + \theta_1x_1$$

Recall that the parameters of your model are the $\theta_j$ values. These are
the values you will adjust to minimize cost $J(\theta)$. One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
(simultaneously update $\theta_j$ for all $j$)

With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$, provided the learning rate $\alpha$ is small enough.
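The update rule follows from differentiating the cost function with respect to a single parameter. As a short worked derivation, using the convention $x_0^{(i)} = 1$ for the intercept term:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2m}\sum_{i=1}^m 2\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)\frac{\partial h_{\theta}(x^{(i)})}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

Each iteration therefore moves $\theta_j$ a step of size $\alpha$ against this partial derivative, which is where the factor $\frac{1}{m}$ and the term $x_j^{(i)}$ in the update come from.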

***
**Implementation note:** We store each example as a row in the matrix `X`. To take into account the intercept term $\theta_0$, we add an additional first column to `X` and set it to all ones. This allows us to treat $\theta_0$ as simply another ‘feature’.
***
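With the column of ones in place, the update for all $\theta_j$ can be written as one matrix expression. The following is a minimal sketch of that idea on a tiny synthetic data set. The function name `gradient_descent_sketch` and the `*_demo` variables are hypothetical; the notebook's own `gradient_descent!` lives in `grad_descent.jl` and may be implemented differently.

```julia
# Hypothetical sketch of batch gradient descent in vectorized form.
function gradient_descent_sketch(X, y, theta, alpha, iterations)
    m = length(y)
    theta = copy(theta)                      # leave the caller's theta untouched
    for iter in 1:iterations
        residual = X * theta .- y            # h_theta(x) - y for every example
        gradient = (X' * residual) ./ m      # all partial derivatives at once
        theta = theta .- alpha .* gradient   # simultaneous update of all theta_j
    end
    return theta
end

# Synthetic data generated from y = 1 + 2x (not the exercise data).
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = 1.0 .+ 2.0 .* x_demo
X_demo = [ones(length(x_demo)) x_demo]       # first column of ones for theta_0

theta_demo = gradient_descent_sketch(X_demo, y_demo, zeros(2), 0.1, 5000)
println(theta_demo)
```

Since the synthetic data lie exactly on a line, the recovered parameters should end up very close to the intercept 1 and slope 2 used to generate them.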

### 2.2.2 Implementation

We have already set up the data for linear regression. In the
following lines, we add another dimension to our data to accommodate the intercept term. We also initialize the parameters to 0 and set the learning rate alpha to 0.01.

As you perform gradient descent to minimize the cost function $J(\theta)$,
it is helpful to monitor the convergence by computing the cost. In this
section, you will implement a function to calculate $J(\theta)$ so you can check the
convergence of your gradient descent implementation.
Your next task is to complete the code of `compute_cost()` which
is a function that computes $J(\theta)$. As you are doing this, remember that the
variables `X` and `y` are not scalar values, but matrices whose rows represent
the examples from the training set.
Once you have completed the function and executed the code, you will see the cost
printed to the screen.
You should expect to see a cost of **J = 32.07**.
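As a quick sanity check on that number: at the initial parameters $\theta = 0$ the hypothesis is zero for every example, so the cost depends only on the profits:

$$ J(0) = \frac{1}{2m}\sum_{i=1}^m\left(0 - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^m\left(y^{(i)}\right)^2$$

If your `compute_cost()` returns something else for `theta = zeros(2, 1)`, the error is most likely in how the residuals are squared, summed or scaled.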

The next two cells will then perform gradient descent and plot the results.

In [32]:
function compute_cost(X, y, theta)
  # Compute cost for linear regression
  # Computes the cost J of using theta as the
  # parameter for linear regression to fit the data points in X and y

  # Instructions: Compute the cost of a particular choice of theta
  #               You should set J to the cost.
  
  # Initialize some useful values, such as the number of training examples m.
  m = length(y)
  
  # h = ...
  # J = ...

  # ====================== will be removed ======================
  h = X * theta;
  J = 1/(2*m) * sum((h .- y).^2)
  # ============================================================
  return J
end

m = length(y)
X = [ones(m) x]      # Add a column of ones to x
theta = zeros(2, 1)  # initialize fitting parameters

# Some gradient descent settings
iterations = 1500
alpha = 0.01

J = compute_cost(X, y, theta)
@show(J);

J = 32.072733877455676


In [33]:
J_history = gradient_descent!(theta, X, y, alpha, iterations);
@show(theta)
@show(J_history[end]);

theta = [-3.63029; 1.16636]
J_history[end] = 4.483388256587725


In [34]:
plot_data(x, y)
plot!(X[:,2], X*theta, color=:black, label="Model", show=true)

# Predict values for population sizes of 35,000 and 70,000
predict1 = ([1 3.5] * theta) * 10000
predict2 = ([1 7] * theta) * 10000

println("For population = 35,000, we predict a profit of $predict1")
println("For population = 70,000, we predict a profit of $predict2")

For population = 35,000, we predict a profit of [4519.77]
For population = 70,000, we predict a profit of [45342.5]
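These predictions can be checked by hand from the fitted parameters printed earlier (rounded to the digits shown there), remembering that `x` is in units of 10,000 people and `y` in units of \$10,000:

$$\begin{aligned} h_{\theta}(3.5) &\approx -3.63029 + 1.16636 \times 3.5 \approx 0.4520 \\ h_{\theta}(7) &\approx -3.63029 + 1.16636 \times 7 \approx 4.5342 \end{aligned}$$

Multiplying by 10,000 gives roughly \$4,520 and \$45,342, which agrees with the printed values up to the rounding of $\theta$.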




## 2.3 Visualizing $J(\theta_0, \theta_1)$
In this part you do not have to program anything, but make sure you understand the following code and the output it generates.

In [13]:
# Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

# initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

# Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
       t = [theta0_vals[i], theta1_vals[j]];    
       J_vals[i,j] = compute_cost(X, y, t);
    end
end

contour(theta0_vals, theta1_vals, J_vals', levels=60, title="J as function of theta")
scatter!([theta[1]], [theta[2]])
xlabel!("theta0")
ylabel!("theta1")
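The nested loop above can also be written as a two-dimensional array comprehension. The sketch below assumes Julia 0.7 or newer, where `range` replaces `linspace`, and it uses synthetic data and a 101-point grid rather than the exercise data. The names `cost_sketch` and the `*_demo` and `*_grid` variables are hypothetical.

```julia
# Assumes Julia 0.7 or newer, where `range` replaces `linspace`.
# Synthetic data generated from y = 1 + 2x (not the exercise data).
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = 1.0 .+ 2.0 .* x_demo
X_demo = [ones(length(x_demo)) x_demo]

# Same formula as compute_cost() in this notebook, renamed to avoid a clash.
cost_sketch(X, y, theta) = sum(abs2, X * theta .- y) / (2 * length(y))

theta0_grid = range(-10, stop=10, length=101)
theta1_grid = range(-1, stop=4, length=101)

# Comprehension: rows index theta0, columns index theta1.
J_grid = [cost_sketch(X_demo, y_demo, [t0, t1]) for t0 in theta0_grid, t1 in theta1_grid]

# The smallest grid value should sit at the parameters that generated the data.
best = argmin(J_grid)
println((theta0_grid[best[1]], theta1_grid[best[2]]))
```

As in the loop version, the matrix has to be transposed before it is passed to `contour`, because the rows correspond to `theta0` and the columns to `theta1`.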