# Least Squares 

## Introduction with a linear function
In the lecture we started the discussion of the least squares method using the example of a linear function $f(x) = ax + b$ and a set of data points $(x_i, y_i)$, $i=1,\ldots,n$. We defined the error function as
$$
L(a,b) = \sum_{i=1}^n (f(x_i) - y_i)^2 = \sum_{i=1}^n r_i^2 = r^T r=\|r\|^2,
$$
where $f(x_i) = ax_i + b$ is the value of the linear function at $x_i$ and $r_i = f(x_i) - y_i$ is the residual for the $i$-th data point. The least squares method then consists of finding the values of $a$ and $b$ that minimize the error function $L(a,b)$:
$$
\min_{a,b} L(a,b).
$$
In general this is an optimization problem that can be solved using the methods discussed in the lecture. However, for the linear function $f(x) = ax + b$ we learned that the solution can be found in closed form. We can write the residual vector $r$ as
$$
r = \begin{pmatrix}
r_1 \\
r_2 \\
\vdots \\
r_n
\end{pmatrix} =
\begin{pmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}
\begin{pmatrix}
b \\
a
\end{pmatrix} -
\begin{pmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{pmatrix} = A p - y,
$$
where $p = (b,a)^T$ is the vector of parameters (intercept first, matching the column of ones in $A$). This is similar to the linear system $Ax = b$ that we discussed in the last exercise. However, this time our system of equations is overdetermined, i.e. we have more equations ($n$) than unknowns ($2$). In this case we cannot expect to find a $p$ such that $Ap = y$ (i.e. $r=0$). Instead we look for a $p$ such that $Ap \approx y$, i.e. $r \approx 0$. The least squares solution is defined as the $p$ that minimizes the error $\|r\|^2$. We can find this solution by solving the normal equations
$$
A^T A p^* = A^T y.
$$
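For reference, the normal equations follow from setting the gradient of $\|r\|^2 = \|Ap - y\|^2$ with respect to $p$ to zero:
$$
\|Ap - y\|^2 = p^T A^T A p - 2\, p^T A^T y + y^T y,
\qquad
\nabla_p \|Ap - y\|^2 = 2 A^T A p - 2 A^T y = 0
\;\Longrightarrow\;
A^T A p^* = A^T y.
$$
Because $A^T A$ is positive semidefinite, $\|Ap - y\|^2$ is convex in $p$, so any solution of the normal equations is a global minimizer.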
If $A$ has full column rank (for the line fit: at least two distinct $x_i$), $A^T A$ is invertible and the normal equations have the unique closed-form solution
$$
p^* = (A^T A)^{-1} A^T y.
$$
Remember that for a matrix $A$ with full column rank, $(A^T A)^{-1} A^T$ is the Moore-Penrose pseudoinverse of $A$, denoted by $A^+$, so that $p^* = A^+ y$. Hence, as long as $A$ is not rank deficient, we can compute the least squares solution $p^*$ for any given set of data points $(x_i, y_i)$, $i=1,\ldots,n$.
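As a quick sanity check, here is a minimal sketch that fits a line to three made-up points and computes $p^*$ in three ways: from the normal equations, with `pinv` from the `LinearAlgebra` standard library, and with the backslash operator. The data and the `_demo` names are purely illustrative. The column of ones comes first, so `p[1]` is the intercept and `p[2]` the slope.

```julia
using LinearAlgebra  # provides pinv

# Three illustrative data points (x_i, y_i); not taken from the exercise
x_demo = [0.0, 1.0, 2.0]
y_demo = [1.0, 3.0, 4.0]

# First column: ones (intercept), second column: x_i (slope)
A_demo = hcat(ones(length(x_demo)), x_demo)

# Normal equations: solve (A^T A) p = A^T y without forming an explicit inverse
p_normal = (A_demo' * A_demo) \ (A_demo' * y_demo)

# Moore-Penrose pseudoinverse A^+ applied to y
p_pinv = pinv(A_demo) * y_demo

# Backslash on the tall matrix A returns the least squares solution directly
p_backslash = A_demo \ y_demo

println("normal equations: ", p_normal)
println("pseudoinverse:    ", p_pinv)
println("backslash:        ", p_backslash)
```

For these three points all three approaches give an intercept of $7/6$ and a slope of $1.5$, up to rounding.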

## Linear Least Squares Tasks

### Setup

First, we'll need to generate a set of data points that roughly follow a linear trend but with some noise added. We'll use the function $f(x) = 2x + 3$ to generate the $y_i$ values. We'll then add some Gaussian noise to the $y_i$ values to simulate real-world measurements.

In [None]:
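using Pkg
isdir("learningF") || Pkg.generate("learningF")   # create the project only on the first run
Pkg.activate("learningF")
Pkg.add("Plots")
Pkg.add("Statistics")
Pkg.add("LinearAlgebra")                          # norm, pinv (standard library)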

In [None]:
using Plots
using Statistics
using LinearAlgebra

In [None]:
# Generate x values
x = LinRange(-10, 10, 100)

# Generate y values with added noise
y = 2 .* x .+ 3 .+ randn(length(x))

# Plot the data points
scatter(x, y, label="Data Points")

### Task: Matrix Formation

The next step is to form the matrix $A$. 

1. Construct the matrix $A$. Remember, the first column should be all ones (for the constant term in the linear function) and the second column should be the $x_i$ values.

### Task: Solving the Normal Equations

Now, we are ready to solve the normal equations to get the least squares solution.

1. Compute the matrix $A^T A$ and the vector $A^T y$, each in a separate cell, and inspect the results.

In [None]:
# Compute the matrix $A^T A$ 


In [None]:
# Compute the vector $A^T y$


2. Solve the system of equations $A^T A p = A^T y$ to find $p$ 

In [None]:
# Solve the linear system $A^T A p = A^T y$
p = ...

### Task: Verification

Finally, verify the solution you obtained. 

1. Compute the residuals $r_i$ and the mean squared error.

2. Plot the original data points, the true line $f(x)$, and the line corresponding to your solution. 
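For reference, the mean squared error is $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n r_i^2 = \frac{1}{n}\|r\|^2$. A minimal sketch on a made-up residual vector; `mse` and `r_example` are hypothetical names used only for illustration:

```julia
# Hypothetical helper: mean of the squared residuals, i.e. ||r||^2 / n
mse(r) = sum(abs2, r) / length(r)

r_example = [1.0, -2.0, 2.0]     # illustrative residuals, not from the exercise
println(mse(r_example))          # (1 + 4 + 4) / 3 = 3.0
```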

In [None]:
# Compute the residuals $r_i$ and the total mean squared error
residuals = ...
error = ...

In [None]:
# Plot the original data points, the true line $f(x)$, and the line corresponding to your solution.
scatter(x, y, label="Data Points")
plot!(x, p[1] .+ p[2] .* x, label="Least Squares Solution")
plot!(x, 2 .* x .+ 3, label="True Line")

### Task: Using the Julia built-in function
You can also use Julia's built-in backslash operator `\` to solve linear systems: if the system is overdetermined, `A \ y` returns the least squares solution.

1. Set up the linear system in the form $Ax=b$ (here $x = p$ and $b = y$), solve it with `\`, and compare the result with the solution you obtained earlier.
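According to the Julia documentation, for a rectangular `A` the backslash operator computes the solution with a pivoted QR factorization of `A` itself rather than through $A^T A$. One reason this matters: the 2-norm condition number of $A^T A$ is the square of that of $A$, so explicitly forming the normal equations can lose accuracy on ill-conditioned problems. The sketch below checks this on a made-up design matrix whose $x$ values are clustered far from zero; all `_cond` names and the data are illustrative only.

```julia
using LinearAlgebra  # provides cond

# Illustrative data: x values far from zero make the two columns nearly parallel
x_cond = collect(1000.0:1.0:1010.0)
A_cond = hcat(ones(length(x_cond)), x_cond)
y_cond = 2 .* x_cond .+ 3            # noise-free line y = 2x + 3

# The condition number of A^T A is the square of the condition number of A
println("cond(A)   = ", cond(A_cond))
println("cond(A'A) = ", cond(A_cond' * A_cond))

p_qr = A_cond \ y_cond                            # least squares via pivoted QR
p_ne = (A_cond' * A_cond) \ (A_cond' * y_cond)    # least squares via normal equations
println("backslash:        ", p_qr)
println("normal equations: ", p_ne)
```

On this noise-free example both routes should recover an intercept close to 3 and a slope close to 2; the point of the sketch is the squared condition number, which is what makes the QR route the safer default.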

In [None]:
# Solve the linear system using the built-in backslash operator
A = ...
b = ...
p = A \ b

# Compute the residuals $r_i$ and the total mean squared error
residuals = ...
error = ...
println("Error: ", error)

# Plot the original data points, the true line $f(x)$, and the line corresponding to your solution.
scatter(x, y, label="Data Points")
plot!(x, p[1] .+ p[2] .* x, label="Least Squares Solution")
plot!(x, 2 .* x .+ 3, label="True Line")

### Task: Implement a function that computes the least squares solution
Now we want to implement a function that computes the least squares solution for a given set of data points $(x_i, y_i)$, $i=1,\ldots,n$.

1. Implement a function `least_squares` that takes as input the vectors $x$ and $y$ and returns the least squares solution $p^*$ together with the matrix $A$. 

In [None]:
function least_squares(x, y)
    
    p, A
end

In [None]:
p, A = least_squares(x, y)
residuals = ...
error = ...
println("Error: ", error)

## Linear Least Squares using real world data

We measured the current $I$ of the first joint of our Openmanipulator robot for different goal currents $I_g$ and obtained the data points shown in the image below:

<img src="./omp_currents.png">

We want to find a linear function $I = f(I_g)$ that describes the relationship between the goal current $I_g$ and the actual current $I$, and we want to know how accurately we can predict the actual current $I$ for a given goal current $I_g$. First, let's load the data points into Julia.
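The preprocessing cell below asks you to make the data zero mean and unit variance (a z-score transform) and to undo this afterwards. A minimal sketch of that transform and its inverse on a made-up vector; `standardize`, `destandardize`, `v_demo` and `z_demo` are hypothetical names, not part of the exercise template. The `Statistics` functions are called with their module prefix so that the names `mean` and `std` stay free in the notebook.

```julia
using Statistics  # mean, std

# Hypothetical helpers: z-score transform and its inverse
standardize(v, μ, σ) = (v .- μ) ./ σ
destandardize(z, μ, σ) = z .* σ .+ μ

v_demo = [10.0, 20.0, 30.0, 40.0]                  # illustrative values, not robot data
μ, σ = Statistics.mean(v_demo), Statistics.std(v_demo)  # std uses the n-1 normalization
z_demo = standardize(v_demo, μ, σ)

println("mean(z) = ", Statistics.mean(z_demo))     # ≈ 0
println("std(z)  = ", Statistics.std(z_demo))      # ≈ 1
println(destandardize(z_demo, μ, σ))               # recovers v_demo up to rounding
```

Predictions made in standardized coordinates can be mapped back with the inverse transform, using the mean and standard deviation of the variable being predicted.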

In [None]:
Pkg.add("CSV")

In [None]:
using CSV

csv = CSV.File(joinpath(@__DIR__, "omp_currents.csv"))      # load and parse csv

x_train = csv.columns[5].column                      # select column 5 (Goal Current)
y_train = csv.columns[4].column                      # select column 4 (Present Current)

# named x_mean/x_std so that the Statistics functions mean/std are not shadowed
x_mean = Statistics.mean(x_train)         # mean of the goal current
x_std = Statistics.std(x_train)           # standard deviation of the goal current

function dataPreProcess(x, y)
    x = Float64.(x)
    y = Float64.(y)
    # make the data zero mean and unit variance
    ...
    x, y
end


function dataPostProcess(x, y)
    # data post-processing (undo the normalization to make the values human-friendly again)
    ... 
    x, y
end

# data pre processing
x_train, y_train = dataPreProcess(x_train, y_train)

In [None]:
# plot the goal current and the present current against the sample index
plot(x_train, label="Goal Current")
plot!(y_train, label="Present Current")

### Task: Compute the parameters of the linear function
Now we want to compute the parameters $p_1$ and $p_2$ of the linear function $I = f(I_g) = p_1 + p_2 I_g$ that best fits the data points.

1. Compute the parameters $p_1$ and $p_2$ of the linear function $I = f(I_g) = p_1 + p_2 I_g$ that best fits the data points, using the least squares function you implemented earlier.

2. Estimate the mean squared error of the linear function.

In [None]:
... 

In [None]:
# Plot the original data points and the line corresponding to your solution
scatter(x_train, y_train, xlabel="Goal Current", ylabel="Present Current", label="Training Data", legend=:topleft)
# plot the least squares solution
plot!(x_train, p[1] .+ p[2] .* x_train, label="Least Squares Solution")

## Solving underdetermined systems using the pseudoinverse
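If we have a linear system $Ax = b$ with more unknowns than equations, i.e. $A \in \mathbb{R}^{m \times n}$ with $m < n$, the system has infinitely many solutions (if it has any), and $A^T A$ is singular, so the normal equations no longer determine a unique $x$. If $A$ has full row rank, we can however define
$$
B = A^T
$$
and use the pseudoinverse $B^+$ of the tall matrix $B$,
$$
B^+ = (B^T B)^{-1} B^T = (A A^T)^{-1} A,
$$
with
$$
(B^+)^T = A^T (A A^T)^{-1}.
$$
The vector $x^* = (B^+)^T b = A^T (A A^T)^{-1} b$ solves the system, since $A x^* = A A^T (A A^T)^{-1} b = b$. Moreover, $x^*$ lies in the row space of $A$, which is orthogonal to the null space of $A$. Every other solution has the form $x^* + z$ with $Az = 0$, so $\|x^* + z\|^2 = \|x^*\|^2 + \|z\|^2 \ge \|x^*\|^2$. Hence, if we use the pseudoinverse, we find the solution with minimal norm, i.e. the solution that minimizes $\|x\|$. In this exercise we will use the pseudoinverse to find this minimum-norm solution of an underdetermined system of equations.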
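The minimum-norm property can be checked numerically. The sketch below uses a fixed, made-up $2\times 3$ matrix with full row rank, computes $x^* = A^T (A A^T)^{-1} b$, confirms $Ax^* = b$ and agreement with `pinv`, and then adds a null-space direction (from `LinearAlgebra.nullspace`) to obtain another exact solution with a larger norm. All `_demo` names are illustrative only.

```julia
using LinearAlgebra  # pinv, nullspace, norm

A_demo = [1.0 2.0 3.0;
          4.0 5.0 6.0]      # illustrative 2x3 matrix with full row rank
b_demo = [1.0, 2.0]

# Minimum-norm solution x* = A^T (A A^T)^{-1} b, solving with \ instead of inv
x_star = A_demo' * ((A_demo * A_demo') \ b_demo)

println("A x* - b      = ", A_demo * x_star - b_demo)      # ≈ 0: exact solution
println("matches pinv:   ", x_star ≈ pinv(A_demo) * b_demo)

# A unit vector z_null with A z_null = 0, so x* + z_null is also a solution
z_null = vec(nullspace(A_demo))
x_other = x_star + z_null
println("A x_other - b = ", A_demo * x_other - b_demo)     # ≈ 0 as well
println("norm(x*)      = ", norm(x_star))
println("norm(x_other) = ", norm(x_other))                 # larger, since z_null ⟂ x*
```

Here $x^* = (-1/18,\, 1/9,\, 5/18)^T$, and because $z$ is a unit vector orthogonal to $x^*$, $\|x^* + z\|^2 = \|x^*\|^2 + 1$.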

### Setup

We first need to create an underdetermined system of equations. Let's make this system a little simpler for ease of computation.

In [None]:
# A 2x3 matrix $A$ with random values (it has full row rank with probability 1).
A = rand(2, 3)

# 2-dimensional vector $b$ also with random values.
b = rand(2)

### Task: Compute the Pseudoinverse

Now we will compute the pseudoinverse $B^+$ of the transposed matrix $B = A^T$.

1. Compute the matrix $B$ by transposing $A$.

2. Compute the pseudoinverse $B^+$ using the formula above. This requires inverting the $2 \times 2$ matrix $B^T B = A A^T$.

In [None]:
# Compute the matrix $B$ by transposing $A$.

# Compute the pseudoinverse $B^+$ using B^+ = (B^T B)^{-1} B^T


### Task: Solve the Underdetermined System

With the pseudoinverse $B^+$, we can now solve for $x$.

1. Compute the solution $x = (B^+)^T b$.

2. Print the solution $x$.

In [None]:
# Compute the solution $x = (B^+)^T b$.

# Print the solution $x$.


### Task: Verification

Finally, we should verify the solution we obtained.

1. Compute the vector $Ax$ and compare it with $b$. Since $A$ has full row rank, they should agree up to rounding errors.

2. Compute the norm of $x$ 

3. Compare your solution with the one obtained using the backslash operator `\`, and compare the norms of the two solutions.
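According to the Julia documentation, for a rectangular matrix `A \ b` returns the minimum-norm least squares solution (computed with a pivoted QR factorization), so for a wide matrix with full row rank it should coincide with the pseudoinverse solution up to rounding. A sketch on a different made-up $2\times 3$ system; the `_wide` names are illustrative only:

```julia
using LinearAlgebra  # norm

A_wide = [2.0 0.0 1.0;
          0.0 1.0 1.0]      # illustrative matrix with full row rank
b_wide = [3.0, 2.0]

x_pinv = A_wide' * ((A_wide * A_wide') \ b_wide)   # (B^+)^T b with B = A^T
x_backslash = A_wide \ b_wide                       # Julia's built-in solver

println("difference: ", norm(x_pinv - x_backslash))
println("norms:      ", norm(x_pinv), " vs ", norm(x_backslash))
```

For this system both approaches give $x = (8/9,\, 7/9,\, 11/9)^T$.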

In [None]:
# Compute the vector $Ax$ and compare it with $b$. Since $A$ has full row rank, they should agree up to rounding errors.


# Compute the norm of $x$ and discuss why this solution is preferable when we have an underdetermined system.


# Compare with the solution obtained using \ and with its norm
