# Experiment 10 - Implementing Non-Parametric Locally Weighted Regression on Non-linear Data

## Theory

### Parametric vs Non-Parametric Learning Algorithms
Parametric: a parametric algorithm has a fixed set of parameters, such as theta, whose optimal values we try to find while training on the data. Once we have found these values, we can put the data aside or erase it from the computer and use the model with its parameters alone to make predictions. Remember, the model is just a function.

Non-Parametric: in a non-parametric algorithm, you always have to keep the training data (along with any parameters) in memory to make predictions. That is why this type of algorithm may not be a good choice when the dataset is very large.
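To make the distinction concrete, here is a minimal sketch. The class names `ParametricLine` and `NearestNeighbourMean` are hypothetical and chosen only for illustration, and the nearest-neighbour average stands in for "some non-parametric method"; it is not the algorithm used in this experiment. The first model keeps just two numbers after fitting, while the second has to hold on to the whole training set to predict.

```python
import numpy as np

class ParametricLine:
    """Parametric: after fit(), only the slope and intercept are stored."""
    def fit(self, x, y):
        self.slope, self.intercept = np.polyfit(x, y, deg=1)
        return self

    def predict(self, x_new):
        return self.slope * x_new + self.intercept

class NearestNeighbourMean:
    """Non-parametric: the training data itself is kept for prediction."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, x, y):
        # Nothing is learned here; the data is simply stored.
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, x_new):
        # Average the targets of the k training points closest to x_new.
        nearest = np.argsort(np.abs(self.x - x_new))[:self.k]
        return self.y[nearest].mean()

x = np.linspace(0, 1, 50)
y = 2 * x + 1
line = ParametricLine().fit(x, y)
knn = NearestNeighbourMean(k=3).fit(x, y)
print(line.predict(0.5), knn.predict(0.5))
```

After `fit`, `line` could be saved as two floats, whereas `knn` is only usable as long as its copies of `x` and `y` stay in memory.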

### Need for NPLW Regression
We apply this regression technique specifically when the data to fit is non-linear. Linear Regression would fit a single straight line to such data, but that won’t work here: because the data is non-linear, the predictions would end up having large errors. We need to fit a curve so that the error is minimized.
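As a rough illustration of how badly a single straight line does on such data, the sketch below fits one with `np.polyfit` to the same two-cosine mixture that is generated later in this experiment. The seed, the sample size of 200, and the variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for repeatability

# Noisy two-cosine mixture on [0, 2), as used later in this experiment.
x = np.sort(rng.random(200)) * 2
y = np.cos(1.5 * np.pi * x) + np.cos(5 * np.pi * x) + rng.normal(0, 0.1, 200)

# Ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
line_mse = np.mean((y - (slope * x + intercept)) ** 2)

# Variance of the noise we added: the best error any model could hope for.
noise_var = 0.1 ** 2
print(f"MSE of straight line: {line_mse:.3f}, noise variance: {noise_var:.3f}")
```

The gap between the two numbers is the error caused by forcing a straight line through curved data.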

### How NPLW Regression Works
In locally weighted linear regression, we give the model the `x` where we want to make a prediction. The model then gives the `x(i)`’s around that `x` a high weight, close to one, and the remaining `x(i)`’s a low weight, close to zero, and fits a straight line to the weighted data.

This means that if we want to make a prediction for the green point on the x-axis (see figure below), the model gives a high weight to the `x(i)`’s near the circle above the green point, while all other `x(i)`’s get a weight close to zero. As a result, the model fits a straight line only to the data close to the circle. The same goes for the purple, yellow, and grey points on the x-axis.

![](./assets/NPLWR-graph.webp)

### Important Formulae
In Linear regression, we had the following loss function 

![](./assets/linear-regression-loss-function.webp)

The modified loss for locally weighted regression 

![](./assets/modified-loss-function.webp)

The only modification is `w(i)`, the weight for the `i`th training example, which is given by the weighting function

![](./assets/weighting-function.webp)

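`x` is the point where we want to make the prediction, `x(i)` is the `i`th training example, and `tau` is the bandwidth, which controls how quickly the weight falls off with distance from `x`.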

The value of this function is always between 0 and 1.

So, if we look at the function, we see that

    If |x(i)-x| is small, w(i) is close to 1.
    If |x(i)-x| is large, w(i) is close to 0.

The `x(i)`’s which are far from `x` get `w(i)` close to zero and the ones which are close to `x`, get `w(i)` close to 1.
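The behaviour of the weights can be checked numerically. The sketch below assumes the Gaussian form `w(i) = exp(-(x(i) - x)^2 / (2 * tau^2))`, which is what the code later in this experiment computes; the helper name `gaussian_weights` and the sample values are made up for illustration.

```python
import numpy as np

def gaussian_weights(x_train, x_query, tau):
    """Weight of every training point relative to the query point."""
    return np.exp(-((x_train - x_query) ** 2) / (2 * tau ** 2))

# Three points near the query x = 0.5 and two far from it.
x_train = np.array([0.0, 0.45, 0.5, 0.55, 1.0])
weights = gaussian_weights(x_train, x_query=0.5, tau=0.08)
print(np.round(weights, 4))
```

The training point that coincides with the query gets a weight of exactly 1, its close neighbours get weights somewhat below 1, and the two distant points get weights that are practically 0.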

### Calculating Error
In the loss function, this means that the error terms for the `x(i)`’s far from `x` are multiplied by almost zero, while those for the `x(i)`’s close to `x` are multiplied by almost 1. In short, the loss effectively sums only over the error terms of the `x(i)`’s that are close to `x`.
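A small numerical sketch of this effect is shown below. It assumes the loss is the weighted sum of squared residuals (any constant factor such as 1/2 is left out), and the function name `weighted_loss` and the three data points are hypothetical.

```python
import numpy as np

def weighted_loss(slope, intercept, x_train, y_train, x_query, tau):
    """Sum of squared residuals, each scaled by its Gaussian weight."""
    weights = np.exp(-((x_train - x_query) ** 2) / (2 * tau ** 2))
    residuals = y_train - (slope * x_train + intercept)
    return np.sum(weights * residuals ** 2)

# The first two points lie on y = 2x; the third is far away and far off the line.
x_train = np.array([0.0, 0.5, 5.0])
y_train = np.array([0.0, 1.0, 100.0])

local = weighted_loss(2.0, 0.0, x_train, y_train, x_query=0.25, tau=0.5)
plain = np.sum((y_train - 2.0 * x_train) ** 2)
print(local, plain)
```

The unweighted loss is dominated by the distant point, while the locally weighted loss around `x = 0.25` barely notices it.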

### Algorithm
There is a closed-form solution for this algorithm, which means that we do not have to train the model iteratively: for each point `x` where we want a prediction, we can calculate the parameter theta directly using the following formula.

![](./assets/closed-form-solution-for-theta.webp)
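As a quick sanity check, the sketch below evaluates the closed form `theta = (X^T W X)^-1 X^T W y`, which is the expression the `predict` function later in this experiment uses, on toy data that lies exactly on a line. The function name `local_theta` and the toy data are assumptions for illustration only.

```python
import numpy as np

def local_theta(X, y, x_query, tau):
    """Closed-form weighted least squares around x_query."""
    # Gaussian weight for each row of X, placed on the diagonal of W.
    weights = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(weights)
    return np.linalg.pinv(X.T @ W @ X) @ (X.T @ W @ y)

# Toy data on the line y = 3x + 2, with a column of ones for the bias term.
x = np.linspace(0, 1, 20)
X = np.column_stack([x, np.ones_like(x)])
y = 3 * x + 2

query = np.array([0.5, 1.0])
theta = local_theta(X, y, x_query=query, tau=0.2)
prediction = query @ theta
print(theta, prediction)
```

Because the data is perfectly linear, every local fit recovers the same slope and intercept; on non-linear data, theta changes from one query point to the next.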

# Code

## Importing Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import warnings

%matplotlib inline

warnings.filterwarnings('ignore')

## Generating Non-linear Data
A good way to create a non-linear dataset is to mix sinusoids with different frequencies. The dataset we will work with in this experiment is created with the following Python code.

First, set the number of samples to generate.

In [2]:
numberSamples = 500

Now, creating a lambda function that takes the input and maps it to the sum of two `cos` functions with different frequencies.

In [3]:
deLinearise = lambda X: np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)

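Now, creating `X` as sorted uniform random values in the range [0, 2) using `np.random.rand`, and `y` by applying the above function to `X` and adding Gaussian noise with a standard deviation of 0.1 using `np.random.randn`.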

In [4]:
X = np.sort(np.random.rand(numberSamples)) * 2
y = deLinearise(X) + np.random.randn(numberSamples) * 0.1

Here's what `X` looks like.

In [5]:
X, X.shape

(array([0.00975651, 0.01760754, 0.02249123, 0.0337934 , 0.03798259,
        0.03997827, 0.0406476 , 0.04604542, 0.05684867, 0.05778396,
        0.06299428, 0.06739615, 0.07297972, 0.07310793, 0.07521071,
        0.07902422, 0.08075793, 0.08307696, 0.08420137, 0.08500867,
        0.08585414, 0.0882585 , 0.09907335, 0.10169082, 0.10202946,
        0.10494266, 0.12672329, 0.13336665, 0.13912524, 0.1478204 ,
        0.15002329, 0.15242703, 0.15722036, 0.15770599, 0.15962885,
        0.16152884, 0.16394384, 0.17439408, 0.17465534, 0.17928118,
        0.19106266, 0.19207551, 0.19323635, 0.20035229, 0.20227658,
        0.20471835, 0.20528104, 0.20531759, 0.21166994, 0.21233725,
        0.21647175, 0.22708453, 0.23618164, 0.23876948, 0.23900659,
        0.23923307, 0.24328713, 0.25219293, 0.26501772, 0.26594532,
        0.2675287 , 0.27463363, 0.27532993, 0.29462189, 0.30193047,
        0.31482402, 0.31700156, 0.31778929, 0.317892  , 0.32278083,
        0.32454564, 0.33237558, 0.33437727, 0.33

And now, `y`.

In [6]:
y, y.shape

(array([ 1.89251840e+00,  1.89333061e+00,  1.85488772e+00,  1.75319884e+00,
         1.77754628e+00,  1.94294606e+00,  1.75184712e+00,  1.64894886e+00,
         1.48695773e+00,  1.58301967e+00,  1.60135063e+00,  1.46002097e+00,
         1.43025700e+00,  1.36824501e+00,  1.29987294e+00,  1.22952563e+00,
         1.18804753e+00,  1.24952890e+00,  1.15483843e+00,  1.13624685e+00,
         1.25366932e+00,  1.11023390e+00,  9.51811377e-01,  7.84659270e-01,
         7.90051955e-01,  8.67475027e-01,  3.50013931e-01,  3.38153563e-01,
         1.65719601e-01,  7.63799987e-02, -5.99003549e-02, -1.39603184e-01,
        -1.83036899e-01,  5.42738008e-02,  6.84130135e-02, -1.56364848e-01,
        -1.93379296e-01, -9.77615903e-02, -3.10547628e-01, -3.56565469e-01,
        -5.13671472e-01, -3.30888599e-01, -2.06410382e-01, -5.20060025e-01,
        -3.77924298e-01, -3.60263073e-01, -5.20903212e-01, -2.85019638e-01,
        -3.66561206e-01, -4.54544874e-01, -4.32227643e-01, -3.38218612e-01,
        -2.5

### Reshaping Arrays 

In [7]:
X = X.reshape(X.shape[0], 1)
X.shape

(500, 1)

In [8]:
y = y.reshape(y.shape[0], 1)
y.shape

(500, 1)

## Calculating Predictions

Defining a function to calculate the diagonal weight matrix. This function takes in the test point, the training data, and the value of `tau`, the bandwidth, which controls the size of the neighbourhood around the point. Training points well inside this neighbourhood get weights close to 1 and dominate the regression, while points far outside it get weights close to 0.

In [9]:
# Weight matrix in code. It is a diagonal matrix.
def calculateWeightMatrix(point, X, tau):
    '''
    The parameters of this function are,
        tau --> bandwidth
        X --> Training data.
        point --> the x where we want to make the prediction.
    '''

    # Squared distance between every training example x(i) and the point.
    squaredDistances = np.sum((X - point) ** 2, axis=1)

    # Gaussian weight for every training example, placed on the diagonal of W.
    w = np.diag(np.exp(-squaredDistances / (2 * tau * tau)))

    return w

Now, defining a function to predict the `y` value for a point, which uses the previously defined function to calculate the weight matrix.

In [10]:
def predict(X, y, point, tau):
    # m = number of training examples.
    m = X.shape[0]

    onesColumn = np.ones(m).reshape(m, 1)

    # Appending a column of ones to X so that theta also contains the bias (intercept) term.
    X_ = np.append(X, onesColumn, axis=1)

    # point is the x where we want to make the prediction. It may arrive as a
    # one-element array (a row of X), so it is flattened before the 1 for the
    # bias term is appended.
    point_ = np.append(np.ravel(point), 1.0)

    # Calculating the weight matrix using the calculateWeightMatrix function we wrote earlier.
    w = calculateWeightMatrix(point_, X_, tau)

    # Calculating parameter theta using the closed-form formula.
    theta = np.linalg.pinv(X_.T @ w @ X_) @ (X_.T @ w @ y)

    # Calculating the prediction.
    pred = np.dot(point_, theta)

    # Returning the theta and the prediction
    return theta, pred

In [11]:
def getPredictionsForSingleTau(numberPredictions, tau):
    # Empty list for storing predictions.
    predictions = []

    # Predicting for all numberPredictions values and storing them in predictions.
    for point in X:
        _, predicted = predict(X, y, point, tau)
        predictions.append(predicted)

    # Reshaping predictions
    predictions = np.array(predictions).reshape(numberPredictions, 1)

    return predictions

## Plotting Predictions

In [12]:
def plotPredictions(XTest, predictionList, tauList):
    plt.figure(figsize=(20, 15), dpi=80)

    plt.plot(X, y, "o", color='orange', alpha=0.3, label='Training Data Points')

    for i, prediction in enumerate(predictionList):
        plt.plot(XTest, prediction, label=f'tau = {tauList[i]}')

    plt.legend()
    plt.show()

## Putting it all Together

In [13]:
tau = 0.08

# List of bandwidths to compare. Every tau must be greater than 0,
# otherwise calculating the weights involves a division by zero.
tauList = [tau]

predictionList = []

for tau in tauList:
    predictions = getPredictionsForSingleTau(numberSamples, tau)
    predictionList.append(predictions)

plotPredictions(X, predictionList, tauList)

# References
1. [Medium Article](https://towardsdatascience.com/locally-weighted-linear-regression-in-python-3d324108efbf)
2. [Creating Random Non-Linear Data](https://www.oreilly.com/library/view/effective-amazon-machine/9781785883231/586af32b-cd7f-40f4-bfe1-bea45b67a804.xhtml)