# Week 2
## MATLAB Resources
### Onramp
- Go to: https://matlabacademy.mathworks.com/ and click on the MATLAB Onramp button to start learning MATLAB

### Tutorials
#### Get Started with MATLAB and MATLAB Online
- [What is MATLAB?](https://youtu.be/WYG2ZZjgp5M)\*
- [MATLAB Variables](https://youtu.be/0w9NKt6Fixk)\*
- [MATLAB as a Calculator](https://youtu.be/aRSkNpCSgWY)\*
- [MATLAB Functions](https://youtu.be/RJp46UVQBic)\*
- [Getting Started with MATLAB Online](https://youtu.be/XjzxCVWKz58)
- [Managing Files in MATLAB Online](https://youtu.be/B3lWLIrYjC0)

#### Vectors
- [Creating Vectors](https://youtu.be/R5Mnkrk9Mos)\*
- [Creating Uniformly Spaced Vectors](https://youtu.be/_zqTOV5yl8Y)\*
- [Accessing Elements of a Vector Using Conditions](https://youtu.be/8D04GW_foQ0)\*
- [Calculations with Vectors](https://youtu.be/VQaZ0TvjF0c)\*
- [Vector Transpose](https://youtu.be/vgRLwjHBmsg)

#### Visualization
- [Line Plots](https://youtu.be/-hhJoveE4sY)\*
- [Annotating Graphs](https://youtu.be/JyovEGPSdoI)\*
- [Multiple Plots](https://youtu.be/fBx8EFuXFLM)\*

#### Matrices
- [Creating Matrices](https://youtu.be/qdTdwTh6jMo)\*
- [Calculations with Matrices](https://youtu.be/mzzJ9gnMrYE)\*
- [Accessing Elements of a Matrix](https://youtu.be/uWPHxpTuZRA)\*
- [Matrix Creation Functions](https://youtu.be/VPcbpVd_mPA)\*
- [Combining Matrices](https://youtu.be/ejTr3ekTTyA)
- [Determining Array Size and Length](https://youtu.be/IF9-ffmxuy8)
- [Matrix Multiplication](https://youtu.be/4hsx3bdNjGk)
- [Reshaping Arrays](https://youtu.be/UQpDIHlFo8A)
- [Statistical Functions with Matrices](https://youtu.be/Y97W3_u7cM4)

#### MATLAB Programming
- [Logical Variables](https://youtu.be/bRMg4GsFDQ8)\*
- [If-Else Statements](https://youtu.be/JZSuU-Laigo)\*
- [Writing a FOR loop](https://youtu.be/lg65bzgvI5c)\*
- [Writing a WHILE Loop](https://youtu.be/PKH5lCMJXbk)
- [Writing Functions](https://youtu.be/GrcNN04eqXU)
- [Passing Functions as Inputs](https://youtu.be/aNCwR9dRjHs)

#### Troubleshooting
- [Using Online Documentation](https://youtu.be/54n5zJwR8aM)\*
- [Which File or Variable Am I Using?](https://youtu.be/Z09BvGeYNdE)
- [Troubleshooting Code with the Debugger](https://youtu.be/DB4aJMnZtNQ)

\* **Indicates content covered in Onramp**

## Multivariate Linear Regression
### Notation
- $m$ = number of training examples
- $n$ = number of features
- $x^{(i)}$ = input (features) of $i^{th}$ training example
- $x_j^{(i)}$ = value of feature $j$ in $i^{th}$ training example

### Hypothesis
- For convenience of notation, we define $x_0^{(i)} = 1$ for every training example $i$, so all $x_0$'s are equal to 1
- $h_{\theta}(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
- $x, \theta \in \mathbb{R}^{n+1}$
- This way it can also be written in vector form:
    - $h_{\theta}(x)=\theta^T x=\begin{bmatrix}\theta_0 & \theta_1 & \cdots & \theta_n\end{bmatrix} \cdot \begin{bmatrix}x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$
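
As a minimal sketch of the vector form above, the following Octave/MATLAB snippet evaluates the hypothesis for a single example and for all examples at once. The data in `X_raw` and the values in `theta` are made up for illustration.

```matlab
% Hypothetical data: m = 3 training examples, n = 2 features
X_raw = [2104 3; 1600 3; 2400 4];
m = size(X_raw, 1);

% Prepend the x_0 = 1 column, so X is m x (n+1)
X = [ones(m, 1), X_raw];

% Hypothetical parameter vector theta in R^(n+1)
theta = [50; 0.1; 20];

% Hypothesis for the first example: theta' * x, with x as a column vector
x1 = X(1, :)';
h1 = theta' * x1;

% Hypothesis for all examples at once: one prediction per row of X
h = X * theta;
```

Because every row of `X` is one example $x^{(i)}$ laid out as a row, `X * theta` computes $\theta^T x^{(i)}$ for all $i$ in a single matrix product.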
    
### Cost Function
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^2$$
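
A small sketch of how this cost could be computed in vectorized form. The toy data and the helper name `computeCost` are assumptions chosen for illustration, not part of the original notes.

```matlab
% Hypothetical toy data: the first column of X is x_0 = 1
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];

% J(theta) = 1/(2m) * sum of squared errors (computeCost is a hypothetical name)
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

% Cost for theta = [0; 0]: every prediction is 0
J_zero = computeCost(X, y, [0; 0]);

% Cost for theta = [0; 1]: the line y = x fits this data exactly
J_fit = computeCost(X, y, [0; 1]);
```

With this data, `J_zero` is $(1 + 4 + 9) / 6$ and `J_fit` is 0, since the second hypothesis passes through every point.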
    
### Gradient Descent $(n\geq 1)$
- Repeat until convergence, simultaneously updating $\theta_j$ for $j=0,\dots, n$:
$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}$$
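
The update rule can be sketched in vectorized form, which updates all $\theta_j$ simultaneously. The toy data, the learning rate `alpha` and the iteration count `num_iters` are illustrative assumptions.

```matlab
% Hypothetical toy data generated from y = 2 * x (first column is x_0 = 1)
X = [1 1; 1 2; 1 3];
y = [2; 4; 6];
m = length(y);

alpha = 0.1;        % learning rate (assumed value)
num_iters = 2000;   % fixed number of iterations (assumed value)
theta = zeros(size(X, 2), 1);

for iter = 1:num_iters
    % Gradient of J: (1/m) * sum over i of (h(x^(i)) - y^(i)) * x_j^(i), for all j
    gradient = (X' * (X * theta - y)) / m;

    % Simultaneous update of every theta_j
    theta = theta - alpha * gradient;
end
```

Computing the whole gradient before touching `theta` is what makes the update simultaneous: no $\theta_j$ is updated using an already modified value of another parameter. On this data `theta` approaches `[0; 2]`.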

### Feature Scaling
- Feature scaling speeds up gradient descent: $\theta$ descends quickly along features with small ranges and slowly along features with large ranges, so it oscillates inefficiently toward the optimum when the feature ranges differ widely
- To prevent this, we bring all features into roughly the same range, e.g.
    - $-1 \leq x_i \leq 1$ or
    - $-0.5 \leq x_i \leq 0.5$
- We combine **feature scaling** (the division) and **mean normalization** (the subtraction):
$$x_i := \frac{x_i - \mu_i}{s_i}$$
- $\mu_i$ is the **average** of all the values for feature $i$
- $s_i$ is the **range** of values (max - min) or the **standard deviation**
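
A minimal sketch of mean normalization and feature scaling, assuming the standard deviation is used for $s_i$. The data is made up, and `bsxfun` is used so the snippet does not rely on automatic broadcasting.

```matlab
% Hypothetical training data: m = 4 examples, n = 2 features
% (e.g. house size and number of bedrooms), without the x_0 column
X = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X);   % 1 x n row vector: average of each feature
s = std(X);     % 1 x n row vector: standard deviation of each feature
% s = max(X) - min(X);   % alternative: use the range instead

% Subtract the mean and divide by s, column by column.
% bsxfun expands the 1 x n vectors across all m rows.
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), s);

% Add the x_0 = 1 column only after scaling, so it stays equal to 1
X_scaled = [ones(size(X, 1), 1), X_norm];
```

Keep `mu` and `s` around: a new input has to be scaled with the same values before the learned $\theta$ is applied to it.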

### Debugging Gradient Descent
- Plot the cost function $J(\theta)$ (y-axis) against the number of gradient descent iterations (x-axis)
- $J(\theta)$ should decrease after every iteration; if it ever increases, you probably need to decrease the learning rate $\alpha$
- **Automatic convergence tests** are also possible:
    - You declare convergence if $J(\theta)$ decreases by less than $\varepsilon$ in one iteration, where $\varepsilon$ is some small value such as $10^{-3}$
    - However, in practice it's difficult to choose $\varepsilon$
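
Both ideas can be sketched together: record $J(\theta)$ after every iteration and stop once the decrease falls below a threshold. The data, `alpha`, `epsilon` and `max_iters` are assumed values for illustration.

```matlab
% Hypothetical toy data generated from y = 1 + 2 * x
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];
m = length(y);

alpha = 0.05;       % learning rate (assumed value)
epsilon = 1e-3;     % convergence threshold (assumed value)
max_iters = 5000;   % safety limit on the number of iterations

theta = zeros(2, 1);
J_history = zeros(max_iters, 1);
converged = false;

for iter = 1:max_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);

    if iter > 1
        decrease = J_history(iter - 1) - J_history(iter);
        if decrease < 0
            % J went up: the learning rate is probably too large
            disp('J increased; try a smaller alpha');
            break;
        elseif decrease < epsilon
            converged = true;
            break;
        end
    end
end

% Keep only the iterations that were actually run
J_history = J_history(1:iter);
```

Afterwards, `plot(1:numel(J_history), J_history)` gives the debugging plot described above. The explicit check for a negative decrease matters, because an increasing $J(\theta)$ would otherwise also satisfy "decreased by less than $\varepsilon$".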
    
### Features and Polynomial Regression
- We can combine multiple features into one feature, e.g. width and height into one feature, area (= width x height)
- Our hypothesis function also need not be linear in the original features if a straight line doesn't fit the data well
- We could use a quadratic, cubic or square root function etc.
    - Square root function example: $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$
- When choosing features this way, feature scaling becomes even more important: if $x_1$ ranges from 1 to 1000, then $x_1^2$ ranges from 1 to $10^6$ and $x_1^3$ from 1 to $10^9$
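
A short sketch of building polynomial features from one feature and scaling them. The values of `x1` are made up, and a cubic model is assumed; the square root model works the same way.

```matlab
% Hypothetical single feature, e.g. the size of a house (100 to 1000)
x1 = (100:100:1000)';
m = length(x1);

% Cubic model: h(x) = theta_0 + theta_1*x1 + theta_2*x1^2 + theta_3*x1^3
F = [x1, x1 .^ 2, x1 .^ 3];
% Square root model instead: F = [x1, sqrt(x1)];

% The ranges of the new features differ by several orders of magnitude
ranges = max(F) - min(F);

% Mean normalization and scaling by the standard deviation, per column
mu = mean(F);
s = std(F);
F_norm = bsxfun(@rdivide, bsxfun(@minus, F, mu), s);

% Design matrix with the x_0 = 1 column
X = [ones(m, 1), F_norm];
```

The model stays linear in $\theta$, so the same cost function and gradient descent update apply unchanged to this `X`.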

### Normal Equation
#### How it Works
- Another way of minimizing $J$
- Explicit, non-iterative, analytical way
- We minimize $J$ by explicitly taking its derivatives with respect to the $\theta_j$'s, setting them to zero and solving for $\theta$:
$$\theta = (X^T X)^{-1}X^Ty$$
- Don't forget to add the column $x_0^{(i)} = 1$ to the $X$ matrix
- There is **no need** to do feature scaling with the normal equation
- Because it needs to calculate the inverse of $X^T X$, it's slow if $n$ is very large
- As a broad rule, you should switch to gradient descent for $n \geq 10000$
- The normal equation has a runtime of $O(n^3)$, compared to gradient descent, which has a runtime of $O(kn^2)$
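
The normal equation translates almost literally into Octave/MATLAB. The following is a minimal sketch on made-up data generated from $y = 1 + 2x$.

```matlab
% Hypothetical toy data: one feature, y = 1 + 2 * x
x = [1; 2; 3; 4];
y = [3; 5; 7; 9];
m = length(y);

% Design matrix: add the x_0 = 1 column, no feature scaling needed
X = [ones(m, 1), x];

% theta = (X^T X)^(-1) X^T y, using pinv for the inverse
theta = pinv(X' * X) * X' * y;
```

Here `theta` comes out as approximately `[1; 2]`, without choosing a learning rate and without iterating.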

#### Noninvertibility
- Using `pinv` in Octave/MATLAB gives us a value of $\theta$ even if $X^T X$ is not invertible (singular/degenerate)
- If $X^T X$ is **noninvertible**, possible reasons are:
    - Redundant features, where two features are very closely related (i.e. linearly dependent)
    - Too many features (e.g. $m \leq n$)
        - In this case delete some features or
        - Use **regularization**
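
A sketch of the redundant-feature case, using made-up data in which the second feature is exactly twice the first. It illustrates that `pinv` still returns a usable $\theta$ although $X^T X$ is singular.

```matlab
% Hypothetical data with a redundant feature: x2 is linearly dependent on x1
x1 = [1; 2; 3; 4];
x2 = 2 * x1;
y = [2; 4; 6; 8];

X = [ones(4, 1), x1, x2];
A = X' * X;

% A is 3 x 3 but only has rank 2, so it has no ordinary inverse
r = rank(A);

% The pseudoinverse still yields a theta
theta = pinv(A) * X' * y;

% Predictions of this theta on the training data
predictions = X * theta;
```

The resulting `theta` is only one of infinitely many parameter vectors that fit this data equally well, which is why removing the redundant feature is usually the cleaner fix.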
        
## Octave/MATLAB Commands
### Basic Operations
```
% equal
1 == 2

% not equal
1 ~= 2

% and
1 && 0

% or
1 || 0

% xor
xor(1,0)

% change prompt (Octave only)
PS1('>> ')

% semicolon suppresses output
a = 3;

% display (print)
disp('Hello World')

% string format
sprintf('2 decimals: %0.2f', pi)

% change how many digits should be shown
format short
format long

% generate row vector start:step:end
1:0.25:2

% you can also leave out the step param i.e. start:end, this will by default increment by 1
1:5

% generate matrix consisting of ones (row count, column count)
ones(2,3)

% generate matrix consisting of zeros (row count, column count)
zeros(1,3)

% generate matrix consisting of random values between 0 and 1 (row count, column count)
rand(1,3)

% generate matrix consisting of normally distributed random values (row count, column count)
randn(1,3)

% plot histogram (data, optional: bin/bucket count) NOTE: in matlab histogram should be used instead of hist
hist(randn(1, 10000))

% generate identity matrix for the given dimension
eye(6)

% help for given function
help eye
```

### Moving Data Around
```
% number of rows
size(A, 1)

% number of columns
size(A, 2)

% gives the size of the longest dimension, but usually only used on vectors
length(A)

% current working directory
pwd

% change directory
cd

% list files and folders
ls

% load data
load featuresX.dat

% or, equivalently, with function-call syntax
load('featuresX.dat')

% shows variables in current scope
who

% or for the detailed view
whos

% remove variable from scope
clear featuresX

% get first 10 elements
priceY(1:10)

% saves variable v into file hello.mat
save hello.mat v

% clear all variables
clear

% saves in a human readable format (no metadata like variable name)
save hello.txt v -ascii

% fetch everything in the second row (":" means every element along that row/column)
A(2,:)

% fetch everything from first and third row
A([1 3], :)

% can also be used for assignments
A(:,2) = [10; 11; 12]

% append another column vector to right
A = [A, [100; 101; 102]]

% put all elements of A into a single vector
A(:)

% concat two matrices
C = [A B]

% or the same as
C = [A, B]

% or put it on the bottom
C = [A; B]
```

### Computing on Data
```
% multiply A11 with B11, A12 with B12 etc. (element-wise)
A .* B

% element-wise squaring
A .^ 2

% element-wise reciprocal
1 ./ A

% element-wise log
log(v)

% element-wise exp
exp(v)

% element-wise abs
abs(v)

% same as -1*v
-v

% increment every element by 1 (assumes v is a column vector)
v + ones(length(v), 1)

% or use this (+ and - are element-wise)
v + 1

% returns max value and index
[val, ind] = max(a)

% element-wise comparison
a < 3

% tells the indexes of the variables for which the condition is true
find(a < 3)

% generates a matrix of n x n dimension, where all rows, columns and diagonals sum up to the same value
magic(3)

% find used on matrices, returns rows and columns
[r,c] = find(A >= 7)

% adds up all elements
sum(a)

% product of all elements
prod(a)

% round down
floor(a)

% round up
ceil(a)

% element-wise max
max(A, B)

% column-wise max
max(A,[],1)

% or use
max(A)

% row-wise max
max(A,[],2)

% max element
max(max(A))

% or turn A into a vector
max(A(:))

% column-wise sum
sum(A,1)

% row-wise sum
sum(A,2)

% diagonal sum
sum(sum(A .* eye(length(A))))

% anti-diagonal sum
sum(sum(A .* flipud(eye(length(A)))))
```

### Plotting Data
```
% sample points from 0 to 0.98 in steps of 0.01
t = 0:0.01:0.98;

% plot given x and y data
plot(t, sin(t))

% keep the current plot so that the next plot is drawn on top of it
hold on

% add the cosine curve in red to the same figure
plot(t, cos(t), 'r')

% sets x-axis label
xlabel('time')

% sets y-axis label
ylabel('value')

% show legend
legend('sin', 'cos')

% show title
title('my plot')

% saves open plot as png
print -dpng 'myPlot.png'

% close open plot
close

% multiple plots
figure(1); plot(t, sin(t));
figure(2); plot(t, cos(t));

% divides plot into a 1x2 grid, access first element
subplot(1,2,1)

% set x-axis range to 0.5 -> 1 and y-axis range to -1 -> 1
axis([0.5 1 -1 1])

% clear plot
clf

% plot matrix
imagesc(A)

% show colorbar with values
colorbar

% change to gray colormap
colormap gray

% comma chaining of commands, useful e.g. if you want to change colormap etc. (output doesn't get suppressed like when using ";")
a=1, b=2, c=3
```