<a href="https://colab.research.google.com/github/dyjdlopez/numeth2021/blob/main/Week%209-13%20-%20Curve%20Fitting%20Techniques/NuMeth_4_Curve_Fitting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Curve Fitting Techniques
$_{\text{©D.J. Lopez | 2021 | Computational Methods for Computer Engineers}}$

Curve fitting is one of the most widely used techniques in optimization, especially in business applications. Its uses range from engineering and signal applications, such as approximation and signal replication, to business applications in forecasting and operations optimization. In this module, we will discuss several techniques that can be used in curve fitting. Specifically, we will cover:

* Linear Regression
* Multiple Linear Regression
* Least-Squares Method (Normal Equation Method)
* Metrics of Regression
* Linear Interpolation
* Lagrange Method
* Newton's Method

## 4.1 Curves
When we talk about curves, we do not simply mean wavy lines or drawing elements. In this course, we treat curves as functions. In the previous lessons we saw the graphs of functions and called them curves as well. In this module, we are going to identify the function that describes a given set of data. We can use that function to identify missing data or to approximate new data.

### 4.1.1 Extrapolation
Extrapolation can be thought of as the approximation of data beyond the dataset, based on the given data, function, or curve describing that dataset. One method of data extrapolation that we will discuss in this course is regression.

### 4.1.2 Interpolation
Interpolation, like extrapolation, approximates data based on existing data, a function, or a curve. Rather than finding data beyond the given set, however, it finds more specific or missing data points within the dataset.
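
To make the distinction concrete, here is a minimal sketch. The sample points are hypothetical and taken from the line $y = 2x + 1$, so the expected answers are known. `np.interp` estimates a value inside the sampled range, while a fitted straight line is evaluated outside of it.

```python
import numpy as np

# Hypothetical samples taken from the line y = 2x + 1 (illustration only).
x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = 2.0 * x_known + 1.0

# Interpolation: estimate a value INSIDE the sampled range [0, 3].
y_inside = np.interp(1.5, x_known, y_known)

# Extrapolation: fit a straight line, then evaluate it OUTSIDE the range.
# np.polyfit returns the coefficients from the highest degree down.
slope, intercept = np.polyfit(x_known, y_known, deg=1)
y_outside = slope * 5.0 + intercept

print(f"interpolated y(1.5) = {y_inside:.3f}")
print(f"extrapolated y(5.0) = {y_outside:.3f}")
```

Since the samples lie exactly on a line, both estimates agree with $y = 2x + 1$; with noisy or curved data the extrapolated value is usually the less trustworthy of the two.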

## 4.2 Extrapolation / Regression

### 4.2.1 Linear Regression
As the name suggests, linear regression tries to find the best-fit straight line for a given dataset. It is one of the simplest yet most important algorithms in regression, since it is the foundation of many more complex regression techniques.

The goal of this algorithm is to find a linear equation that best describes a set of data. That linear equation is given as:
$$y = \omega_0 + \omega_1 X \\ _{\text{(Eq. 4.1)}}$$
where $X$ is the dataset while $y$ holds the corresponding values for each datapoint in $X$. The variable $\omega$ is called the weight of the dataset, consisting of $\omega_0$ and $\omega_1$. In other literature, $\omega_0$ is called the bias term (the intercept), sometimes written as $b$, while $\omega_1$ is the slope. The following equations are used to solve for $\omega_0$ and $\omega_1$:
$$\omega_0 = \bar{y}-\omega_1\bar{x}=\frac{\bar{y}\sum x_i^2-\bar{x}\sum x_i y_i}{\sum x^2_i-n\bar{x}^2}\\ _{\text{(Eq. 4.2.1)}}$$
$$\omega_1 = r\frac{\sigma_y}{\sigma_x}= \frac{\sum x_i y_i-\bar{x}\sum y_i}{\sum x^2_i-n\bar{x}^2}\\ _{\text{(Eq. 4.2.2)}}$$

$$\omega_1 = r\frac{\sigma_y}{\sigma_x}\\ _{\text{(Eq. 4.2.3)}}$$
where $r$ is the Pearson correlation solved as:
$$r = \frac{\sum((x-\bar{x})(y-\bar{y}))}{\sqrt{\sum(x-\bar{x})^2\sum(y-\bar{y})^2}}\\ _{\text{(Eq. 4.2.4)}}$$

$$\omega_0 =\bar{y}-\omega_1\bar{x} \\ _{\text{(Eq. 4.2.5)}}$$
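
As a quick check of the correlation form, the sketch below computes the slope as $r\,\sigma_y/\sigma_x$ and the intercept as $\bar{y}$ minus the slope times $\bar{x}$, for the model $y = \omega_0 + \omega_1 X$, then compares the result with `np.polyfit`. The function name `fit_line_via_correlation` and the demo data are assumptions made for illustration only.

```python
import numpy as np

def fit_line_via_correlation(x, y):
    """Return (w0, w1) for y = w0 + w1*x, using w1 = r * sigma_y / sigma_x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # Pearson correlation coefficient
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
    w1 = r * y.std() / x.std()      # slope
    w0 = y.mean() - w1 * x.mean()   # intercept (bias term)
    return w0, w1

# Hypothetical data: a noisy line with intercept 9 and slope 0.45.
rng = np.random.default_rng(0)
x_demo = np.arange(10, dtype=float)
y_demo = 9.0 + 0.45 * x_demo + rng.normal(0.0, 0.1, x_demo.size)

w0_demo, w1_demo = fit_line_via_correlation(x_demo, y_demo)

# Cross-check against NumPy's degree-1 polynomial fit (slope comes first).
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, deg=1)
print(f"correlation form: w0 = {w0_demo:.4f}, w1 = {w1_demo:.4f}")
print(f"np.polyfit      : w0 = {intercept_ref:.4f}, w1 = {slope_ref:.4f}")
```

Both routes should print the same weights up to floating-point rounding, which suggests the summation form and the correlation form describe the same line.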



In [None]:
'''
Since we are going to use datasets for this module, we will be generating dummy 
data with numpy. We will also use matplotlib for visualizing the results.
'''
import numpy as np
import matplotlib.pyplot as plt

In [None]:
n = 10
X = np.arange(0,n,1,dtype=np.float64)

m = np.random.uniform(0.4,0.5,(n,))
b = np.random.uniform(8,10,(n,))

y = m*X+b 

print(f"X: {X}")
print(f"y: {y}")
print(f"w1 approx = {m.mean()},w0 approx. = {b.mean()}")

plt.figure(figsize=(5,5))
plt.grid()
plt.scatter(X,y)
plt.show()

In [None]:
def linear_regressor(X,y):
  X = np.array(X)
  y = np.array(y)
  n = X.size
  w0 = (y.mean()*np.sum(X**2)-X.mean()*np.sum(X*y)) / (np.sum(X**2) - n*X.mean()**2)
  w1 = (np.sum(X*y) - X.mean()*np.sum(y)) / (np.sum(X**2) - n*X.mean()**2)
  return w0,w1
w0,w1 = linear_regressor(X,y)
print("Linear Regression Equation: y = {:.3f}x + {:.3f}".format(w1, w0))

In [None]:
## Plotting the Regression line
def show_regline(X,y,w1,w0):
  x_min, x_max = X.min() - 1, X.max() + 1
  linex = np.linspace(x_min, x_max)
  liney = w1*linex+w0
  plt.figure(figsize=(5,5))
  plt.grid()
  plt.scatter(X,y)
  plt.plot(linex, liney, c='red')
  plt.show()
show_regline(X,y,w1,w0)

In [None]:
def lin_reg(val,w0,w1):
  return w1*val + w0 #model
print(lin_reg(10, w0, w1))
X_new, y_new = X.copy(), y.copy()
for i in range(10,16):
  # np.append adds at the end; np.insert(..., -1, ...) would place the value before the last element
  X_new = np.append(X_new, i)
  y_new = np.append(y_new, lin_reg(i,w0,w1))
show_regline(X_new, y_new, w1, w0)

In [None]:
np.random.seed(100)
X_1 = np.arange(0, 20, 1)
y_1 = X_1 - 2 * (X_1 ** 2) + 0.5 * (X_1 ** 3) + np.random.normal(-3, 3, 20)

plt.figure(figsize=(5,5))
plt.grid()
plt.scatter(X_1, y_1)
plt.show()

In [None]:
w0_q,w1_q = linear_regressor(X_1, y_1)
show_regline(X_1,y_1,w1_q,w0_q)  # show_regline expects the slope (w1) before the bias (w0)

### 4.2.2 Multiple Linear Regression
Multiple linear regression (MLR), as the name suggests, uses more than one regressor in the algorithm. It can be used when a dataset has more than one feature. The MLR can be formulated as:
$$y = \omega_0 + \omega_1 x_1 + \omega_2 x_2 + ... + \omega_n x_n \\ _{\text{(Eq. 4.4)}}$$
where $\omega_0$ is the bias term while $\omega_1, \dots, \omega_n$ are the weights or slopes of the features $x_1, \dots, x_n$. The simplest way to implement an MLR algorithm is to loop over each feature and its data and compute the corresponding weights. In this course, we are going to use vectorization in implementing MLR. So instead of the linear equation in Eq. 4.4, we can re-form the equation into the matrix equation:
$$y = \omega^T X$$
where $\omega$ is a vector that includes all the weights of the features $\begin{bmatrix}\omega_0 \\ \omega_1 \\ \omega_2 \\ \vdots \\ \omega_n\end{bmatrix}$, while $X$ is the feature vector of one datapoint, with a leading 1 for the bias term, $\begin{bmatrix}1\\ x_1 \\ x_2 \\ \vdots \\ x_n\end{bmatrix}$.

We will use the **Normal Equation** in solving MLR. The Normal Equation minimizes the Least-Squares cost function and is formulated as:
$$\theta = (X^TX)^{-1}X^Ty \\ _{\text{(Eq. 4.5)}}$$
where $\theta$ is the hypothesis or model to be created (the vector of weights $\omega$), $X$ is the data matrix with one row per datapoint, and $y$ holds the labels or values corresponding to each datapoint. The term $(X^TX)^{-1}X^T$ is called the **pseudoinverse** or the **Moore-Penrose** inverse of $X$; this expression for it is valid when the columns of $X$ are linearly independent, so that $X^TX$ is invertible. The equation is called *normal* because the resulting residual $y - X\theta$ is normal (orthogonal) to the columns of $X$. Datasets that are safe for linear regression also have the properties of no Autocorrelation, Homoscedasticity, Non-multicollinearity, and Non-endogeneity. These properties will be discussed in depth in the Machine Learning Course of the AIDA Electives. 
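
The sketch below applies the Normal Equation to a small synthetic dataset. The data are hypothetical and generated from $y = 1 + 2x_1 - 3x_2$ so the expected weights are known. It solves the linear system $(X^TX)\theta = X^Ty$ with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical preference and not a requirement of the method.

```python
import numpy as np

# Hypothetical dataset: 6 samples, 2 features, generated from
# y = 1 + 2*x1 - 3*x2 so the expected weights are known in advance.
rng = np.random.default_rng(1)
features = rng.uniform(0.0, 5.0, size=(6, 2))
targets = 1.0 + 2.0 * features[:, 0] - 3.0 * features[:, 1]

# Design matrix: a leading column of ones carries the bias term.
design = np.column_stack([np.ones(len(features)), features])

# Normal equation written as a linear solve: (X^T X) theta = X^T y.
theta_normal = np.linalg.solve(design.T @ design, design.T @ targets)

# Cross-check with NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(design, targets, rcond=None)

print("normal equation:", np.round(theta_normal, 6))
print("np.linalg.lstsq:", np.round(theta_lstsq, 6))
```

With more samples than unknown weights and linearly independent columns, both approaches should recover weights close to $[1, 2, -3]$.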


In [None]:
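X = np.array([
              [1,2,3],
              [7,3,2],
              [9,6,10],
])
y = np.array([[4,3,8]]).T
bias = np.ones(y.shape)
# Design matrix: one row per datapoint, with a leading column of ones for the bias
X_train = np.append(bias, X, axis=1)
# With 3 datapoints and 4 unknown weights, X^T X is singular, so np.linalg.inv is unreliable here.
# np.linalg.pinv computes the Moore-Penrose pseudoinverse directly and gives the minimum-norm solution.
theta = np.linalg.pinv(X_train) @ y
for i in range(len(theta)):
  print(f"w{i} : {theta[i, 0]}")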

### 4.2.3 Metrics of Regression
To determine how reliable or accurate a regression model is, we can use the following statistics.

#### Measures of Reliability
Measures of reliability or predictability tell how reliable a model is at predicting new values. Some of the statistics used here are the R-Squared and the Adjusted R-Squared.

##### *R-Squared ($R^2$)*
The R-squared represents the proportion of the variance in the predicted variable that is explained by the inputs of a regression model. The formula is given as:

$$\text{R}^2 = 1 - \frac{\sum(y-\hat{y})^2}{\sum(y-\bar{y})^2} \\ _{\text{(Eq. 4.6)}}$$
where the numerator of the rational part is called the **residual sum of squares**, in which $\hat{y}$ is the prediction of the model and $y$ is from the testing dataset. The denominator of the rational part is called the **total sum of squares**.


##### *Adjusted R-Squared ($\text{Adj }R^2$)*
A modified version of the R-squared which has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if a new term improves the model more than would be expected by chance. The formula is given as:
$$\text{Adj. R}^2 = 1 - \left[\frac{(1-\text{R}^2)(n-1)}{n-p-1}\right] \\ _{\text{(Eq. 4.6)}}$$
where $n$ is the size of the sample and $p$ is the number of predictors.
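
A minimal sketch of both reliability measures follows. The function names `r_squared` and `adjusted_r_squared` and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y_true, y_pred, p):
    """Adjust R^2 for the number of predictors p, given n samples."""
    n = np.asarray(y_true).size
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical ground truths and predictions from a one-predictor model.
y_true_r2 = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred_r2 = np.array([2.8, 5.1, 7.3, 8.9, 10.9])

r2_demo = r_squared(y_true_r2, y_pred_r2)
adj_r2_demo = adjusted_r_squared(y_true_r2, y_pred_r2, p=1)
print(f"R^2 = {r2_demo:.4f}, Adj. R^2 = {adj_r2_demo:.4f}")
```

Here the residual sum of squares is 0.16 and the total sum of squares is 40, so $R^2 = 0.996$; the adjustment for one predictor and five samples lowers it slightly.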

#### Measures of Error 
Measures of error tell how "off" predicted values are from the true values or ground truth. These statistical measures can also serve as the cost function to minimize when optimizing a model.

##### *Mean Squared Error (MSE)*
The MSE estimates the deviation of the predictions from the ground truths by averaging the squared errors. It can also be interpreted as the mean of the squared Euclidean distances between the predictions and the ground truths. Because the errors are squared, large deviations dominate the score, so MSE is a good choice when large errors, such as those caused by outliers, should be penalized heavily. The formula is given as:
$$\text{MSE}=\frac{1}{n}\sum(y-\hat{y})^2 \\ _{\text{(Eq. 4.7)}}$$  

##### *Root Mean Squared Error (RMSE)*
The limitation of MSE as a measure of error is its interpretability, since it does not express the error in the original measurement units. The RMSE addresses this by taking the square root. The RMSE can also be considered the standard deviation of the residuals, whereas the MSE is their variance. The formula is given as:
$$\text{RMSE}= \sqrt{\text{MSE}} \\ _{\text{(Eq. 4.8)}}$$  

##### *Mean Absolute Error (MAE)*
The MAE, as its name suggests, takes the average of the Manhattan distances between the predictions and the ground truths. Because the errors are not squared, MAE is less sensitive to outliers, so it can be a better choice than MSE and RMSE when outliers should not dominate the score. The formula is given as:
$$\text{MAE}=\frac{1}{n}\sum{|y-\hat{y}|} \\ _{\text{(Eq. 4.9)}}$$  
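
The three error measures can be sketched as below. The function names and the sample values are hypothetical. The last prediction is deliberately far off to show how a single large miss affects each measure.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2)

def rmse(y_true, y_pred):
    """Square root of the MSE, in the original measurement units."""
    return np.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)))

# Hypothetical values; the last prediction misses by 4 units.
y_obs = np.array([2.0, 4.0, 6.0, 8.0])
y_hat = np.array([2.5, 3.5, 6.0, 12.0])

mse_demo, rmse_demo, mae_demo = mse(y_obs, y_hat), rmse(y_obs, y_hat), mae(y_obs, y_hat)
print(f"MSE = {mse_demo:.3f}, RMSE = {rmse_demo:.3f}, MAE = {mae_demo:.3f}")
```

The single large miss contributes 16 of the 16.5 total squared error, so it pulls the MSE (4.125) and RMSE well above the MAE (1.25).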

### 4.2.4 Applied Uses of Linear Regression
Referred discussion: [Applied Linear Regression](https://colab.research.google.com/github/dyjdlopez/numeth2021/blob/main/Week%209-13%20-%20Curve%20Fitting%20Techniques/NuMeth_4_5_Applied_Linear_Regression.ipynb)

## 4.3 Interpolation
Interpolation, as previously discussed, pertains to the approximation of values within a given range. This is unlike regression or extrapolation, which tries to approximate values beyond the range. Interpolation can be used to increase the resolution of values within such ranges.

### 4.3.1 Linear Interpolation
This method is the simplest implementation of interpolation and is based on a modified midpoint formula. Like linear regression, it works best for data that follow a linear equation and gives inaccurate approximations for polynomials of higher degree. The formula is given as:

$$ y = y_1 + \frac{y_2-y_1}{x_2-x_1}(x-x_1) \\ _{\text{(Eq. 4.9)}}$$
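
As a small worked example of the formula, the sketch below interpolates between two hypothetical points, $(2, 10)$ and $(4, 18)$, and cross-checks the result with `np.interp`. The helper name `linear_interpolate` is an assumption for illustration.

```python
import numpy as np

def linear_interpolate(x, x1, y1, x2, y2):
    """Estimate y at x on the straight line through (x1, y1) and (x2, y2)."""
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)

# Hypothetical bracketing points (2, 10) and (4, 18); estimate y at x = 3.
y_at_3 = linear_interpolate(3.0, 2.0, 10.0, 4.0, 18.0)

# np.interp performs the same piecewise-linear interpolation.
y_ref = np.interp(3.0, [2.0, 4.0], [10.0, 18.0])

print(y_at_3, y_ref)
```

The slope between the two points is $(18-10)/(4-2) = 4$, so moving one unit from $x_1 = 2$ gives $10 + 4 = 14$.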


In [None]:
np.random.seed(30)
X_2 = np.arange(0, 8, 1, dtype=float)
y_2 = X_2 + 4*(X_2 ** 2) - 3*np.random.normal(0, 3, X_2.size)

plt.figure(figsize=(5,5))
plt.grid()
plt.scatter(X_2, y_2)
plt.show()

We can see in this sample that there are gaps between consecutive values of the independent variable, for example between 2 and 3. We can apply linear interpolation to bridge such a gap.

In [None]:
def lin_interp(x, x1, x2, y1, y2): 
  return y1 + ((y2-y1)/(x2-x1)) * (x-x1)

In [None]:
# Interpolate at x = 2.5 using the neighboring samples at x = 2 and x = 3
y_25 = lin_interp(2.5, X_2[2], X_2[3], y_2[2], y_2[3])
print(y_25)
X_2new = X_2.copy()
y_2new = y_2.copy()
X_2new = np.insert(X_2new, 3, 2.5)
y_2new = np.insert(y_2new, 3, y_25)

plt.figure(figsize=(5,5))
plt.grid()
plt.scatter(X_2new, y_2new)
plt.plot(X_2new, y_2new)
plt.show()

In [None]:
## If we want to increase the resolution of the graph we can perform linear interpolation for every datapoint
## We need to make a routine using the formula we created.
inp = 0
X_2new = X_2.copy()
y_2new = y_2.copy()
for i, xi in enumerate(X_2):
  if i !=0:
    xi -= 0.5
    y = lin_interp(xi, X_2[i-1], X_2[i], y_2[i-1], y_2[i])
    print(xi, y)
    X_2new = np.insert(X_2new, 2*i-1, xi)
    y_2new = np.insert(y_2new, 2*i-1, y)

plt.figure(figsize=(12,12))
plt.grid()
plt.scatter(X_2, y_2)
plt.plot(X_2new, y_2new)
plt.show()

### 4.3.2 Lagrange Method
The Lagrange method is based on creating a polynomial of degree $n-1$, where $n$ is the number of points considered in the dataset. It can be characterized as:
$$ y(x) = P_1(x)y_1 + P_2(x)y_2 + P_3(x)y_3 + ... + P_n(x)y_n \\ _{\text{(Eq. 4.10)}}$$
This can also be expressed as:
$$ y(x) = \sum_{i=1}^n P_i(x)y_i \\ _{\text{(Eq. 4.11)}}$$
where $P_i(x)$ is the Lagrange polynomial coefficient of the $i$-th datapoint, formulated as:
$$ P_i(x) = \prod_{\substack{j=1 \\ j\neq i}}^n \frac{(x-x_j)}{(x_i-x_j)} \\ _{\text{(Eq. 4.12)}}$$
Eq. 4.11 can then be re-formulated as:
$$ y(x) = \sum_{i=1}^n  y_i \left(\prod_{\substack{j=1 \\ j\neq i}}^n \frac{(x-x_j)}{(x_i-x_j)} \right) \\ _{\text{(Eq. 4.13)}}$$
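
To see the formula at work on numbers that can be checked by hand, the sketch below uses three hypothetical nodes sampled from $y = x^2$. The helper names `lagrange_basis` and `lagrange_eval` are assumptions for illustration. Two properties are worth observing: the coefficients $P_i(x)$ sum to 1 at any $x$, and three nodes reproduce a quadratic exactly.

```python
import numpy as np

def lagrange_basis(x, i, nodes):
    """P_i(x): product over all j != i of (x - x_j) / (x_i - x_j)."""
    others = np.delete(nodes, i)
    return np.prod((x - others) / (nodes[i] - others))

def lagrange_eval(x, nodes, values):
    """Weighted sum of the data values with the P_i(x) as weights."""
    return sum(values[i] * lagrange_basis(x, i, nodes) for i in range(nodes.size))

# Hypothetical nodes sampled from y = x^2 (unevenly spaced on purpose).
nodes_demo = np.array([0.0, 1.0, 3.0])
values_demo = nodes_demo ** 2

x_query = 2.0
basis_demo = [lagrange_basis(x_query, i, nodes_demo) for i in range(nodes_demo.size)]
y_query = lagrange_eval(x_query, nodes_demo, values_demo)

print("P_i(2) =", np.round(basis_demo, 4), " sum =", round(float(sum(basis_demo)), 4))
print("y(2)   =", y_query)
```

At $x = 2$ the coefficients are $-\tfrac{1}{3}$, $1$, and $\tfrac{1}{3}$, giving $y(2) = 0\cdot(-\tfrac{1}{3}) + 1\cdot 1 + 9\cdot\tfrac{1}{3} = 4$, which matches $2^2$.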

In [None]:
def coeff(x,i,X):
  x_temp = np.delete(X,i)
  return ((x-x_temp)/(X[i]-x_temp)).prod()

In [None]:
x = 0.5
for i in range(X_2.size):
  Pi = coeff(x,i,X_2)
  print(Pi)

In [None]:
def lagrange(x,Y,X):
  y = 0
  for i in range(X.size):
    y += Y[i]*coeff(x,i,X)
  return y  

In [None]:
lagrange(0.5, y_2, X_2)

In [None]:
X_3new = X_2.copy()
y_3new = y_2.copy()
for i, xi in enumerate(X_2):
  if i !=0:
    xi -= 0.5
    y = lagrange(xi,y_2,X_2)  # interpolate from the original datapoints only
    X_3new = np.insert(X_3new, 2*i-1, xi)
    y_3new = np.insert(y_3new, 2*i-1, y)

plt.figure(figsize=(12,12))
plt.grid()
# plt.scatter(X_3new, y_3new)

plt.plot(X_3new, y_3new, label="Lagrange")
plt.plot(X_2new, y_2new, label="Linear", color='green')
plt.scatter(X_2, y_2, color='red')
plt.legend()
plt.show()
print(X_3new)
print(y_3new)

### 4.3.3 Newton's Method
Newton's method can be applied to datapoints to obtain Newton's polynomial. Unlike in Lagrange's method, when more data points are added in Newton's method, only the additional basis polynomials and their coefficients need to be calculated, while all existing basis polynomials and their coefficients remain unchanged. Due to the additional terms, the degree of the interpolation polynomial is higher and the approximation error may be reduced. The method can also be used when the spacing between the $x$ values is not uniform. Newton's polynomial is in the form:
$$y(x) = a_0 + (x-x_1)a_1 + (x-x_1)(x-x_2)a_2 + ... + (x-x_1)(x-x_2)\cdots(x-x_n)a_n \\_{\text{(Eq. 4.14)}}$$
The two steps in obtaining the polynomial are:
1. Dividing the Differences
2. Substitution

#### 4.3.3.1 Dividing the Differences
This step is done to obtain the coefficients of the polynomial. These coefficients are the $a_i$ from Eq. 4.14.
The divided differences are arranged in a table with one row per datapoint ($x_i$) and, besides the $x$ column, one column per coefficient, that is, the degree of the polynomial ($n$) plus 1. For example, for a cubic polynomial with 4 datapoints:

<table style="width:200%">
<tr><th>----(0)-----</th><th>----(1)-----</th><th>----(2)-----</th><th>----(3)-----</th><th>----(4)-----</th>
</tr>
<tr><td>$x_1$</td><td>$y_1^{(1)}=y_1$</td></tr>
<tr><td>$x_2$</td><td>$y_2^{(1)}=y_2$</td><td>$y_2^{(2)}$</td></tr>
<tr><td>$x_3$</td><td>$y_3^{(1)}=y_3$</td><td>$y_3^{(2)}$</td><td>$y_3^{(3)}$</td></tr>
<tr><td>$x_4$</td><td>$y_4^{(1)}=y_4$</td><td>$y_4^{(2)}$</td><td>$y_4^{(3)}$</td><td>$y_4^{(4)}$</td></tr>
</table>
The general equation to be used in deriving each $y_i^{(j+1)}$ is formulated as:
$$y_i^{(j+1)} = \frac{y_i^{(j)}-y_j^{(j)}}{x_i - x_j}, \text{for: }j = \{1,2,...,n\} \text{ and: } i = \{j+1,...,n+1\} \\_{\text{(Eq. 4.15)}}$$
The coefficients $a$ can be obtained from the main diagonal of the table, such that $a_{i-1} = y_i^{(i)}$ for $i = \{1,...,n+1\}$.
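
To make the table concrete, the sketch below fills it for four hypothetical datapoints sampled from $y = x^3$, where the coefficients can be verified by hand. The function name `divided_difference_coeffs` is an assumption for illustration, and the code uses 0-based indices, so column 0 holds the original $y$ values.

```python
import numpy as np

def divided_difference_coeffs(x_nodes, y_nodes):
    """Fill the divided-difference table and return (table, diagonal coefficients)."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    size = x_nodes.size
    table = np.zeros((size, size))
    table[:, 0] = y_nodes                      # first column: the raw y values
    for j in range(size - 1):                  # column being differenced
        for i in range(j + 1, size):           # rows below the diagonal entry
            table[i, j + 1] = (table[i, j] - table[j, j]) / (x_nodes[i] - x_nodes[j])
    return table, np.diag(table).copy()        # coefficients a lie on the main diagonal

# Hypothetical datapoints sampled from y = x^3.
x_cubic = np.array([0.0, 1.0, 2.0, 3.0])
y_cubic = x_cubic ** 3

table_demo, a_cubic = divided_difference_coeffs(x_cubic, y_cubic)
print(table_demo)
print("a =", a_cubic)
```

The diagonal gives $a = [0, 1, 3, 1]$. Substituting into the Newton form, $0 + x + 3x(x-1) + x(x-1)(x-2)$ expands to $x^3$, as expected.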



In [None]:
n = X_2.size
y_temp = np.zeros((n, n))
y_temp[:,0] = y_2
for j in range(n-1):
  for i in range(j+1, n):
    y_temp[i, j+1] = (y_temp[i,j]-y_temp[j,j])/(X_2[i] - X_2[j])
print(y_temp)

In [None]:
def newton_coeff(X,y):
  n = X.size
  y_temp = np.zeros((n, n))
  y_temp[:,0] = y
  for j in range(n-1):
    for i in range(j+1, n):
      y_temp[i, j+1] = (y_temp[i,j]-y_temp[j,j])/(X[i] - X[j])
  a = np.diag(y_temp)
  return y_temp, a

newton_coeff(X_2,y_2)

#### 4.3.3.2 Substitution
For the last step, the polynomial in Eq. 4.14 is evaluated at a given $x$ value. We can re-formulate Eq. 4.14 into its general form:
$$y(x) = a_0 + \sum^n_{i=1}\left[\prod^i_{j=1}(x-x_j)\right]a_i$$
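
The same polynomial can also be evaluated with nested multiplication, $a_0 + (x-x_1)\big[a_1 + (x-x_2)[a_2 + \dots]\big]$, which avoids recomputing the products for every term. This is a swapped-in evaluation scheme, not the approach used in the cells of this module. The function name `newton_eval` is hypothetical, and the demo coefficients are assumed to be the Newton coefficients of $y = x^3$ on the nodes $0, 1, 2, 3$.

```python
import numpy as np

def newton_eval(x, x_nodes, a):
    """Evaluate Newton's polynomial at x by nested multiplication."""
    result = a[-1]
    # Work inward-out: fold in one coefficient and one (x - x_k) factor per step.
    for k in range(len(a) - 2, -1, -1):
        result = a[k] + (x - x_nodes[k]) * result
    return result

# Assumed example: Newton coefficients of y = x^3 on the nodes 0, 1, 2, 3.
x_nodes_demo = np.array([0.0, 1.0, 2.0, 3.0])
a_demo = np.array([0.0, 1.0, 3.0, 1.0])

y_at_2_5 = newton_eval(2.5, x_nodes_demo, a_demo)

# Direct evaluation of the summation form, for comparison.
terms = [a_demo[i] * np.prod(2.5 - x_nodes_demo[:i]) for i in range(a_demo.size)]
y_direct = sum(terms)

print(y_at_2_5, y_direct)
```

Both evaluations give $2.5^3 = 15.625$. Note that the last node is never used as a factor, because a polynomial through $n+1$ points only needs the products up to $(x-x_n)$.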

In [None]:
### Newton coeff matrix
xp = 5.6
_, a = newton_coeff(X_2,y_2)
coeff_mat = np.zeros(n)
for i in range(0,n):  
  coeff_mat[i] = 1 if i==0 else np.prod(xp-X_2[:i])
yp = a @ coeff_mat
yp

In [None]:
def newton_interp(xp,y,X):
  n = X.size
  _, a = newton_coeff(X,y)
  coeff_mat = np.zeros(n)
  for i in range(0,n):  
    coeff_mat[i] = 1 if i==0 else np.prod(xp-X[:i])
  return a @ coeff_mat

In [None]:
X_4new = X_2.copy()
y_4new = y_2.copy()
for i, xi in enumerate(X_2):
  if i !=0:
    xi -= 0.5
    y = newton_interp(xi,y_2,X_2)  # interpolate from the original datapoints only
    X_4new = np.insert(X_4new, 2*i-1, xi)
    y_4new = np.insert(y_4new, 2*i-1, y)

plt.figure(figsize=(12,12))
plt.grid()

plt.plot(X_4new, y_4new, label="Newton", color="purple")
# plt.plot(X_3new, y_3new, label="Lagrange", color="blue")
plt.plot(X_2new, y_2new, label="Linear", color='green')
plt.scatter(X_2, y_2, color='red')
plt.legend()
plt.show()
