### Handin 2

# Info

Everything should be completed and approved in person. Groups are fine, since one randomly chosen person will have to present on behalf of the group.


The objectives for this handin are:
* Investigate loss curves
* Linear Regression
* Feature Encoding 
* Simple Interface with Dash
* Speeding up with Numba


# Task 1

Prove that there exists an $\alpha \in \mathbb{R}$ such that $y$ becomes 2.  (Taken from a math exam at BI Nydalen)

1) $\alpha x + y = 4$   
2) $-x + 3y = 2$  


In [7]:
import numpy as np
y = 2

for a in np.arange(0, 10, 0.5):
    for x in np.arange(0, 10, 0.5):
        if(a * x + y) == 4:
            if(-x + 3*y) == 2:
                print(a)
                break

0.5
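
The grid search above finds $\alpha = 0.5$ because the solution happens to lie on the 0.5-step grid. As a cross-check, a minimal sketch of the direct route: with $y = 2$ fixed, equation 2 determines $x$, and equation 1 then determines $\alpha$. The variable names are chosen here for illustration.

```python
# Fix y = 2 and solve the two equations in order.
y = 2.0

# Equation 2: -x + 3y = 2  =>  x = 3y - 2
x = 3 * y - 2

# Equation 1: alpha * x + y = 4  =>  alpha = (4 - y) / x  (valid because x != 0)
alpha = (4 - y) / x

print(x, alpha)
```

Since $x = 4 \neq 0$, the division is well defined, so such an $\alpha$ exists.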


# Task 2 -- Investigating the loss curve


We are going to investigate how an algorithm navigates the L2 loss curve, first using Fortuna, then using Gradient Descent (GD).

To this end we will use our very simple model $f_\theta(x) = \theta$ to model the training data given below.


### Task 2a
Visualize the model $f_\theta(x), \theta=0.34$ alongside the training data in the plot below.


In [8]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go


x_train = np.arange(0.0, 1.0, 0.025)
y_train = 0.4 + x_train * 0.55 + np.random.randn(x_train.shape[0])*0.2


figure = go.Figure(data=px.line(x=[0,1], y=[0.34, 0.34]).data + px.scatter(x=x_train, y=y_train, title="train dataset").data)
figure.show()

### Task 2b

Create a plot that shows the loss curve for $\theta$ in the range [0, 1], using the Mean Squared Error loss function.  
That is, $L(x, y) = \frac{1}{m} \sum_{k=1}^{m} (f_\theta(x_k) - y_k)^2$, where $m$ is the number of data points in the training set. Remember: $f_{\theta}(x) = \theta$.


Using the plot, find the value of $\theta$ that minimizes the loss.


In [9]:
# -- CODE -- for Task 2b goes here.
import plotly.express as px
import plotly.graph_objects as go


x_train = np.arange(0.0, 1, 0.025)
y_train = 0.4 + x_train * 0.55 + np.random.randn(x_train.shape[0])*0.2

def get_loss(theta, ys):
    return (1 / len(ys)) * ((np.ones_like(ys)*theta - ys)**2).sum()


best_loss = np.inf
best_theta = 0

loss_curve = np.array([])

for theta in np.arange(0, 1, 0.025):
    loss = get_loss(theta, y_train)

    loss_curve = np.append(loss_curve, loss)
    if loss < best_loss:
        best_theta = theta
        best_loss = loss

print(f'Best loss: {best_loss}, with theta: {best_theta}')


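thetas = np.arange(0, 1, 0.025)  # same grid as the loop above

# mark the lowest point found on the loss curve
low = px.scatter(x=[best_theta], y=[best_loss])

# plot the loss against theta (not against x_train)
figure = go.Figure(data=px.line(x=thetas, y=loss_curve).data + low.data)
figure.show()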


Best loss: 0.04709877607910298, with theta: 0.65
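
The same loss curve can be computed without a Python loop. The sketch below uses NumPy broadcasting and checks the grid minimum against the mean of the targets, which is the exact minimizer of the MSE for a constant model. The fixed seed and the `rng` generator are assumptions added for reproducibility; they are not part of the original cell.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
x_train = np.arange(0.0, 1.0, 0.025)
y_train = 0.4 + x_train * 0.55 + rng.standard_normal(x_train.shape[0]) * 0.2

thetas = np.arange(0.0, 1.0, 0.025)

# Broadcasting: (n_thetas, 1) - (1, m) -> (n_thetas, m), then average over the data axis.
loss_curve = ((thetas[:, None] - y_train[None, :]) ** 2).mean(axis=1)

best_theta = thetas[loss_curve.argmin()]

# For a constant model the MSE is minimized at the mean of y,
# so the grid minimum should sit within one grid step of it.
print(best_theta, y_train.mean())
```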


### Task 2c
Redo Task 2b, but this time the model should be $f_\theta(x) = ax + b$ with $a,b \in \theta$.
1) Set $b = 0.1$ and plot the loss curve over $a \in [-1, 1]$.
2) Set $b = 2.0$ and plot the loss curve over $a \in [-1, 1]$.  
Before you run the code, try to envision what the loss curves will look like.  
What did the actual loss curves look like?



In [21]:
# -- CODE -- for Task 2c goes here.
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

a = np.arange(-1, 1, 0.05)

x_train = np.arange(0.0, 1, 0.025)
y_train = a * np.random.randn(x_train.shape[0]) + 0.1

def get_loss(theta, ys):
    return (1 / len(ys)) * ((np.ones_like(ys)*theta - ys)**2).sum()


best_loss = np.inf
best_theta = 0

loss_curve = np.array([])

for theta in np.arange(0, 1, 0.025):
    loss = get_loss(theta, y_train)

    loss_curve = np.append(loss_curve, loss)
    if loss < best_loss:
        best_theta = theta
        best_loss = loss

print(f'Best loss: {best_loss}, with theta: {best_theta}')


scatter = px.scatter(x=x_train, y=y_train)
low = px.scatter(x=[best_theta], y=[best_loss])

figure = go.Figure(data=px.line(x=x_train, y=loss_curve).data + scatter.data)
figure.show()


Best loss: 0.43592420395835624, with theta: 0.15000000000000002
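
For comparison, here is a minimal sketch of the sweep that Task 2c describes: keep the training data from Task 2a, hold $b$ fixed, and evaluate the MSE of $f_\theta(x) = ax + b$ for each $a$ in $[-1, 1]$. The helper name `loss_over_a` and the fixed seed are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
x_train = np.arange(0.0, 1.0, 0.025)
y_train = 0.4 + x_train * 0.55 + rng.standard_normal(x_train.shape[0]) * 0.2

def loss_over_a(a_values, b, xs, ys):
    """MSE of f(x) = a*x + b for every a in a_values, with b held fixed."""
    preds = a_values[:, None] * xs[None, :] + b      # shape (n_a, m)
    return ((preds - ys[None, :]) ** 2).mean(axis=1)

a_values = np.arange(-1.0, 1.0, 0.05)
for b in (0.1, 2.0):
    curve = loss_over_a(a_values, b, x_train, y_train)
    best_a = a_values[curve.argmin()]
    print(f"b = {b}: best a = {best_a:.2f}, loss = {curve.min():.3f}")
```

Each curve is a parabola in $a$, and changing $b$ moves where its minimum lies. Passing `a_values` and `curve` to `px.line` gives the plot the task asks for.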


# Task 3
Train a linear regression model with an L2 loss on the training data using Gradient Descent. 
The code below gives a (non-vectorized) example of how the gradient is found.  

The gradient is found as:  
$ L = \frac{1}{2}(\hat{y} - y )^2 $  

$ \hat{y} = f_\theta(x) = \theta$  


$ \frac{\partial L}{\partial\theta} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \theta}$ (using the chain rule).

With:  
$\frac{\partial L}{\partial \hat{y}} = (\hat{y} - y) \times 1 = (\hat{y} - y)$  

$\frac{\partial \hat{y}}{\partial \theta} = 1  $


Gives us:  
$\frac{\partial L}{\partial\theta} = (\hat{y} - y)$
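
Averaging this per-sample gradient over the $m$ training points gives the step that the loop in this task takes. A short worked derivation follows, using $\eta$ for the learning rate and $\bar{y}$ for the mean of the targets (both symbols are introduced here and are not part of the notation above):

```latex
\begin{aligned}
\frac{1}{m}\sum_{k=1}^{m} (\theta - y_k) &= \theta - \bar{y} \\
\theta_{t+1} &= \theta_t - \eta\,(\theta_t - \bar{y}) \\
\theta_{t+1} - \bar{y} &= (1 - \eta)\,(\theta_t - \bar{y})
\end{aligned}
```

For this constant model, the distance between $\theta$ and $\bar{y}$ is therefore multiplied by $(1 - \eta)$ at every step, which is a useful lens for the questions that follow.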


### Questions:
1) Find a set of hyperparameters for which the algorithm converges from $\theta_{\text{init}} = 5.5$.  How can we determine if the algorithm has converged?  
2) Can you find a learning rate that the algorithm does not converge for?  
3) What is the "best" learning rate for this particular dataset?  



In [11]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go


def gradient_of_J(theta, x, y):
    # constant model: the prediction ignores x
    y_hat = theta
    
    # dL / dy_hat
    dLdy = (y_hat - y)
    
    # dy_hat / dTheta
    dy_HatdTheta = 1
    
    # chain rule
    dLdTheta = dLdy * dy_HatdTheta
    
    return dLdTheta


def calculate_l2_loss_non_vectorized(theta, xs, ys):
    loss = 0.0
    for k in range(ys.shape[0]):
        y_pred = theta 
        loss += (y_pred - ys[k])**2

    
    mean_loss = loss / ys.shape[0]
    return mean_loss


    

initial_theta = 5.5

learning_rate = 0.44
theta = np.array([initial_theta])
m = x_train.shape[0]
n_steps = 10

print("Running GD with initial theta: {:.2f}, learning rate: {} over {} datapoints for {} steps".format(
    theta.item(),
    learning_rate,
    m,
    n_steps))


search_history = []
for steps in range(n_steps):

    gradient_theta_sum = 0.0
    for k in range(m):
        gradient_theta_sum += gradient_of_J(theta, x_train[k], y_train[k])

    mean_gradient = (1/m) * gradient_theta_sum
    loss = calculate_l2_loss_non_vectorized(theta, x_train, y_train)

    print("[visit] theta: {:.2f} => loss: {:.2f}".format(theta.item(), loss.item()))

    # record the visited theta together with its own loss
    search_history.append((theta, loss))

    # update theta using GD
    theta = theta - (learning_rate * mean_gradient)

    


# quick helper to generate plots 
loss_x = np.arange(-4, 6, 0.01)

loss_y = np.array([calculate_l2_loss_non_vectorized(t, x_train, y_train) for t in loss_x])

fig = px.line(x=loss_x, y=loss_y, title="GD History : Marks are iterations.")


x_visit, _ = list(zip(*search_history))
x_visit = np.concatenate(x_visit)
y_visit = np.array([calculate_l2_loss_non_vectorized(t, x_train, y_train) for t in x_visit])

fig.add_trace(go.Scatter(x=x_visit, y=y_visit, name='GD history',
                         line = dict(color='firebrick', width=8, dash='dot')))

fig.show()

Running GD with initial theta: 5.50, learning rate: 0.44 over 40 datapoints for 10 steps
[visit] theta: 5.50 => loss: 29.01
[visit] theta: 3.14 => loss: 9.34
[visit] theta: 1.83 => loss: 3.17
[visit] theta: 1.09 => loss: 1.23
[visit] theta: 0.67 => loss: 0.63
[visit] theta: 0.44 => loss: 0.44
[visit] theta: 0.31 => loss: 0.38
[visit] theta: 0.24 => loss: 0.36
[visit] theta: 0.20 => loss: 0.35
[visit] theta: 0.18 => loss: 0.35
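
One common way to decide that the algorithm has converged is to stop when an update moves $\theta$ by less than a small tolerance. The sketch below shows that idea for the constant model. The function name, the `tol` and `max_steps` parameters, and the toy targets are assumptions made for illustration; this is not the handin data.

```python
import numpy as np

def gd_constant_model(ys, theta_init, learning_rate, tol=1e-6, max_steps=1000):
    """Gradient descent for f(x) = theta; stops when the update is smaller than tol."""
    theta = float(theta_init)
    for step in range(max_steps):
        mean_gradient = np.mean(theta - ys)     # mean of (y_hat - y) over the data
        new_theta = theta - learning_rate * mean_gradient
        if abs(new_theta - theta) < tol:        # converged: theta barely moves
            return new_theta, step + 1, True
        theta = new_theta
    return theta, max_steps, False              # step budget exhausted

ys = np.array([0.2, 0.5, 0.8])                  # toy targets, not the handin data
theta, steps, converged = gd_constant_model(ys, theta_init=5.5, learning_rate=0.44)
print(theta, steps, converged)
```

Other stopping criteria, such as a small gradient norm or a loss that no longer decreases, follow the same pattern.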


## Task 4: Gradient Descent
Below is a simple vectorized implementation of GD that can be used as a starting point. 
Please make sure you understand exactly HOW it works (so that you could have implemented one yourself).

1) Change the code to use Stochastic Gradient Descent.
2) Re-organize the code and add numba to make the SGD go pew pew (faster).

Numba: https://numba.readthedocs.io/en/stable/


In [20]:
import numpy as np
import plotly.express as px
from typing import Tuple, Union, List


def predict(theta, xs):
    return np.dot(xs, theta)

def J_squared_residual(theta, xs, y):
    h = predict(theta, xs)
    sr = ((h - y)**2).sum()
    return sr

def gradient_J_squared_residual(theta, xs, y):
    h = predict(theta, xs)
    grad = np.dot(xs.transpose(), (h - y))
    return grad


def get_subset(xs: np.ndarray, ys: np.ndarray, indexes: np.ndarray, batch_size: int) -> Union[Tuple[np.ndarray, np.ndarray], Tuple[None, None]]:
    if (len(xs) < batch_size) or (len(xs) != len(ys)) or (len(indexes) < batch_size):
        return None, None

    x_data = np.empty([batch_size, len(xs[0])])
    y_data = np.empty([batch_size, 1])

    np.random.shuffle(indexes)

    for i in range(0, batch_size):
        x_data[i] = xs[indexes[i]]
        y_data[i] = ys[indexes[i]]

    return x_data, y_data


# the dataset (already augmented with a column of ones so that we get an intercept coefficient)
data_x = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
data_y = np.array([[1.0], [1.5], [2.5]])

n_features = data_x.shape[1]
# variables we need
theta = np.zeros((n_features, 1))
learning_rate = 0.2
m = data_x.shape[0]

# run SGD (each iteration uses a random mini-batch from get_subset)
j_history = []
n_iters = 100
for it in range(n_iters):
    x, y = get_subset(data_x, data_y, np.arange(0, len(data_x)), int(len(data_x) * 0.4))
    if x is None:
        print("Bad Input In (get_subset)")
        break
    j = J_squared_residual(theta, x, y)
    j_history.append(j)

    theta = theta - (learning_rate * (1/m) * gradient_J_squared_residual(theta, x, y))

print("theta shape:", theta.shape)

# append the final result.
j = J_squared_residual(theta, data_x, data_y)
j_history.append(j)
print("The L2 error is: {:.2f}".format(j))


# find the L1 error.

y_pred = predict(theta, data_x)

l1_error = np.abs(y_pred - data_y).sum()
print("The L1 error is: {:.2f}".format(l1_error))


# Find the R^2
# if the data is normalized: use the normalized data not the original data (task 3 hint).
# https://en.wikipedia.org/wiki/Coefficient_of_determination
u = ((data_y - y_pred)**2).sum()
v = ((data_y - data_y.mean())** 2).sum()
print("R^2: {:.2f}".format(1 - (u/v)))
print(theta)



# plot the result
fig = px.scatter(j_history, title="J(theta) - Loss History")
fig.show()


theta shape: (2, 1)
The L2 error is: 0.01
The L1 error is: 0.11
R^2: 1.00
[[0.58094179]
 [0.93377051]]
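
One possible way to organize the second step is to move the whole SGD loop into a single function made of plain loops and scalars, which is the kind of code numba's `njit` compiles well. The following is a sketch under assumptions, not the reference solution: the function name `sgd`, its `seed` parameter, the one-sample-per-update scheme, and the flattened 1-D targets are all choices made here. The `try/except` fallback only exists so that the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit          # compiles the decorated function to machine code
except ImportError:                 # fallback so the sketch still runs without numba
    def njit(func):
        return func

@njit
def sgd(xs, ys, theta, learning_rate, n_iters, seed):
    """Plain SGD on squared error: one randomly drawn sample per update."""
    np.random.seed(seed)
    m, n = xs.shape
    for it in range(n_iters):
        k = np.random.randint(0, m)             # pick one sample index
        pred = 0.0
        for j in range(n):
            pred += xs[k, j] * theta[j]
        err = pred - ys[k]
        for j in range(n):                      # gradient of 0.5 * err^2 is err * x
            theta[j] -= learning_rate * err * xs[k, j]
    return theta

data_x = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
data_y = np.array([1.0, 1.5, 2.5])              # flattened targets (1-D)
theta = sgd(data_x, data_y, np.zeros(2), 0.1, 5000, 0)
print(theta)
```

The first call includes compilation time when numba is present, so any timing comparison should be made on a second call.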


# Isak Housing Inc

### Note: no pandas, sklearn or similar libraries should be used; numpy, dash, numba and plotly should be sufficient. Ask if you are unsure about a library.

The project consists of 5 parts: 

1) 

Go through the data and understand how to encode the various features. 
* Clean the data for potential noise and plainly wrong input.
* Make sure you identify how a linear model will be affected by the encoding scheme. 
* How do you handle missing data?
* Encode the features.
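
To make the encoding question concrete, the sketch below one-hot encodes a categorical feature and mean-imputes a missing numeric one using only numpy. One-hot encoding avoids imposing an artificial order on colors, which a single integer code would do in a linear model. The records, key names and values are hypothetical; the real fields in `train.jsonl` may differ, and mean imputation is only one of several reasonable choices.

```python
import numpy as np

# Hypothetical raw records; real key names and values in train.jsonl may differ.
houses = [
    {"color": "red", "size": 120.0},
    {"color": "blue", "size": None},     # missing size
    {"color": "green", "size": 80.0},
    {"color": None, "size": 100.0},      # missing color
]

colors = sorted({h["color"] for h in houses if h["color"] is not None})

def one_hot(color):
    """One column per known color; an unknown or missing color becomes all zeros."""
    row = np.zeros(len(colors))
    if color in colors:
        row[colors.index(color)] = 1.0
    return row

sizes = np.array([np.nan if h["size"] is None else h["size"] for h in houses])
size_missing = np.isnan(sizes).astype(float)                 # indicator column
sizes = np.where(np.isnan(sizes), np.nanmean(sizes), sizes)  # mean imputation

features = np.column_stack([
    np.ones(len(houses)),                                    # intercept
    np.array([one_hot(h["color"]) for h in houses]),
    sizes,
    size_missing,
])
print(features)
```

The extra `size_missing` column lets the model learn a separate offset for imputed rows instead of treating the imputed value as real.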

2) 
Train a linear model based on the data in 'train.jsonl'.  
Either write your own from scratch or modify the SGD example given in the handin.
The input should consist of two parts: the features for the house, and how many years into the future the price should be predicted at. 

Simplified Example:  
Input:[ features: 20m^2 house, built in 1979, .... | years: 4 ]  
Output: In 4 years the house will be worth 2 100 000 NOK.  
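
Since pandas is off the table, the `.jsonl` files can be read with the standard-library `json` module, one object per line. This is a minimal sketch: the helper name `read_jsonl` and the keys in the sample are hypothetical, and `io.StringIO` stands in for an opened file.

```python
import io
import json

def read_jsonl(handle):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]

# Stand-in for open('train.jsonl'); the keys shown are hypothetical.
sample = io.StringIO(
    '{"id": 1, "size": 20, "built": 1979}\n'
    '{"id": 2, "size": 35, "built": 2001}\n'
)
records = read_jsonl(sample)
print(len(records), records[0]["built"])
```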

3) 
Implement the dashboard interface so that it takes 3 parameters. (See 'isak_dashboard.py' on canvas for a starting point for the dashboard.)  
A) A file with house features and their cost. (example: 'houses_on_marked_setA_3_years.jsonl')  
B) A budget to buy houses with.  
C) How many years we can sit on the houses before we sell them.

Output: a list of houses to buy and our estimated earnings.
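
Choosing which houses to buy under a budget is a knapsack-style problem. The sketch below uses a simple greedy heuristic, buying the houses with the highest predicted profit per krone first. This is not guaranteed to be optimal, and the function name, tuple layout and numbers are made up for illustration.

```python
def pick_houses(houses, budget):
    """Greedy heuristic: buy by highest predicted profit per krone until the budget runs out.

    houses: list of (house_id, cost, predicted_future_price) tuples.
    """
    profitable = [h for h in houses if h[2] > h[1]]
    profitable.sort(key=lambda h: (h[2] - h[1]) / h[1], reverse=True)

    chosen, remaining, earnings = [], budget, 0.0
    for house_id, cost, future_price in profitable:
        if cost <= remaining:
            chosen.append(house_id)
            remaining -= cost
            earnings += future_price - cost
    return chosen, earnings

# Made-up numbers for illustration only.
candidates = [("a", 100.0, 130.0), ("b", 200.0, 210.0), ("c", 150.0, 140.0), ("d", 50.0, 80.0)]
print(pick_houses(candidates, budget=300.0))
```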

4) 
Test the interface on the different auctions found in the auctions folder on canvas. Record the output.   
The auction files are of the format: 
* houses_on_marked_setX_yearY.jsonl - where X is the set id, and Y is the number of years into the future the houses are sold.
* prices_in_future_setX_Y_years.jsonl - contains the prices (use the id field to connect the two datasets)  
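
Connecting the two files by id can be done with a plain dictionary lookup. The records and key names below are hypothetical stand-ins for the contents of the auction files.

```python
# Hypothetical records; the real files use an id field to link the two datasets.
houses = [{"id": 7, "size": 90}, {"id": 3, "size": 45}]
prices = [{"id": 3, "price": 1500000}, {"id": 7, "price": 2900000}]

price_by_id = {p["id"]: p["price"] for p in prices}

# Keep only houses that have a matching future price.
joined = [(h, price_by_id[h["id"]]) for h in houses if h["id"] in price_by_id]
print(joined)
```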

5) 
A client wants to know how the model works. By inspecting the weights (that is: $\theta$), give an 
overview of the most important factors in the model. Are there any houses that 
should be avoided wrt. future sale?
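
A simple starting point for such an overview is to rank the weights by absolute value. The sketch below does that. The weights and feature names are made up, and the ranking is only meaningful if the features were scaled to comparable ranges before training.

```python
import numpy as np

# Made-up weights and feature names, for illustration only.
feature_names = ["intercept", "size", "sun", "color_red", "years_ahead"]
theta = np.array([0.2, 1.4, 0.3, -0.9, 0.6])

# Magnitudes are only comparable if the features were scaled to similar ranges.
order = np.argsort(-np.abs(theta[1:])) + 1    # skip the intercept
for idx in order:
    print(f"{feature_names[idx]:>12}: {theta[idx]:+.2f}")
```

The sign of a weight shows the direction of its effect: a large negative weight on a feature points to houses that the model expects to sell for less.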



### Dataset
Features found in the dataset:

* Id: Transaction id.
* Built: The year the house was built.
* Color: The color of the house.  
* Size: The size of the house.  
* Sun: The % of the day the sun is hitting the house.

* Year: The year the house was sold.  
* Month: The month the house was sold (1 - Jan, 2 - Feb, ..., 12 - Dec)  
* Price: The price when it was sold in (Year)  




