# Question 1

Load the data from the file *trajectory.csv*. The data consists of three rows representing a trajectory. The first row represents time, the second row represents the $x$-coordinate at each timestep, and the third row represents the $y$-coordinate.

**Q1 a (3 marks)**

Plot the $x$-coordinate against time (t-x figure) and the $y$-coordinate against time (t-y figure). Also plot the trajectory in 2D (x-y figure).

Use the variables 't', 'x' and 'y' for time, the x-coordinate and the y-coordinate respectively, as these variables are needed in later questions.
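A minimal sketch of Q1 a, assuming *trajectory.csv* is comma-separated with no header row (check this against the actual file). The helper names `load_trajectory` and `plot_trajectory` are hypothetical. To keep the sketch self-contained it first writes a synthetic three-row file to a temporary directory. Replace that with the real path.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def load_trajectory(path):
    """Read a 3-row CSV (time, x, y) and return the rows as 1-D arrays."""
    data = np.loadtxt(path, delimiter=",")  # assumed: comma-separated, no header
    return data[0], data[1], data[2]


def plot_trajectory(t, x, y):
    """Draw the t-x, t-y and x-y figures side by side and return the Figure."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    panels = ((t, x, "t", "x"), (t, y, "t", "y"), (x, y, "x", "y"))
    for ax, (h, v, hl, vl) in zip(axes, panels):
        ax.plot(h, v)
        ax.set_xlabel(hl)
        ax.set_ylabel(vl)
    fig.tight_layout()
    return fig


# Synthetic stand-in for trajectory.csv (NOT the real data).
t_demo = np.linspace(0.0, 1.0, 200)
demo = np.stack([t_demo, np.cos(2 * np.pi * t_demo), np.sin(4 * np.pi * t_demo)])
path = os.path.join(tempfile.mkdtemp(), "trajectory.csv")
np.savetxt(path, demo, delimiter=",")

t, x, y = load_trajectory(path)
fig = plot_trajectory(t, x, y)
```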

### Linear regression using two different sets of basis functions
 
The following code blocks show two attempts at linear regression that differ only in the choice of basis functions. The first uses exponential (Gaussian-shaped) basis functions, $\exp(-h(t - c)^2)$:

In [None]:
import numpy as np

basis_num = 50  # number of basis functions

def exponential_basis_function(t, basis_num):
    """Evaluate basis_num Gaussian-shaped bumps exp(-h * (t - c)**2) at a scalar time t."""
    c = np.zeros(basis_num)  # centres of the basis functions
    h = np.zeros(basis_num)  # widths (larger h means a narrower bump)
    for i in range(basis_num):
        c[i] = 1.0 / basis_num * i  # centres evenly spaced on [0, 1)
        h[i] = 50
    res = np.exp(-h * (t - c) ** 2)
    return res

def fit_exponential(basis_num, t):
    """Build the normalised design matrix Phi, one row per time step."""
    phi_pred = np.zeros((len(t), basis_num))  # shape: (len(t), basis_num)
    for idx, tt in enumerate(t):
        phi = exponential_basis_function(tt, basis_num)
        addsum = np.sum(phi, axis=-1)
        phi_pred[idx] = phi / addsum  # normalise so each row sums to 1
    return phi_pred
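The per-sample loop in `fit_exponential` can be replaced by NumPy broadcasting. The sketch below builds the same normalised matrix in one expression. The name `exponential_design_matrix` and its `width` parameter are hypothetical, not part of the assignment. Because the centres $c_j = j/\text{basis\_num}$ lie in $[0, 1)$, this basis implicitly assumes `t` is scaled to roughly that range.

```python
import numpy as np


def exponential_design_matrix(t, basis_num, width=50.0):
    """Vectorised counterpart of fit_exponential.

    Row i holds exp(-width * (t[i] - c_j)**2) normalised over j, with centres c_j = j / basis_num.
    """
    t = np.asarray(t, dtype=float)
    c = np.arange(basis_num) / basis_num                     # centres on [0, 1)
    phi = np.exp(-width * (t[:, None] - c[None, :]) ** 2)   # broadcast to (len(t), basis_num)
    return phi / phi.sum(axis=1, keepdims=True)             # each row sums to 1


t_demo = np.linspace(0.0, 1.0, 200)
Phi_demo = exponential_design_matrix(t_demo, basis_num=50)
print(Phi_demo.shape)                            # (200, 50)
print(np.allclose(Phi_demo.sum(axis=1), 1.0))    # True
```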

The second uses a cubic polynomial basis, $1, t, t^2, t^3$:

In [None]:
def polynomial_basis_function(t):
    # cubic polynomial basis: columns 1, t, t^2, t^3
    res1 = np.ones_like(t)
    res2 = t
    res3 = t**2
    res4 = t**3
    res = np.stack([res1, res2, res3, res4]).T  # shape: (len(t), 4)
    return res

Fit the 2D trajectory using the exponential basis functions and predict the 2D trajectory [x y] for all time inputs t:

In [None]:
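Phi = fit_exponential(basis_num, t)  # design matrix, shape (len(t), basis_num)
pos = np.stack([x, y]).T             # targets, shape (len(t), 2)
w = np.linalg.pinv(Phi.T @ Phi) @ Phi.T @ pos  # least-squares weights, shape (basis_num, 2)
predict_pos = Phi @ w                # predicted [x y], shape (len(t), 2)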
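The weight expression above is the normal-equations solution of least squares, $w = (\Phi^\top\Phi)^{+}\Phi^\top P$, where $P$ stacks $[x\ y]$ as columns. One weight column is fitted per coordinate. With 50 heavily overlapping bumps, $\Phi^\top\Phi$ can be numerically close to singular, which is why a pseudoinverse is used rather than a plain inverse. The sketch below uses synthetic data, not *trajectory.csv*, and a smaller 10-function basis. It checks that the formula agrees with `np.linalg.lstsq`, which solves the same problem from $\Phi$ directly without squaring its condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
t_demo = np.linspace(0.0, 1.0, 200)
pos_demo = np.stack([np.cos(2 * np.pi * t_demo), np.sin(2 * np.pi * t_demo)]).T  # (200, 2)
pos_demo += 0.01 * rng.standard_normal(pos_demo.shape)   # small synthetic noise

# A small design matrix: 10 normalised Gaussian bumps centred on [0, 1).
c = np.arange(10) / 10
Phi_demo = np.exp(-50.0 * (t_demo[:, None] - c) ** 2)
Phi_demo /= Phi_demo.sum(axis=1, keepdims=True)

w_normal = np.linalg.pinv(Phi_demo.T @ Phi_demo) @ Phi_demo.T @ pos_demo  # formula used above
w_lstsq, *_ = np.linalg.lstsq(Phi_demo, pos_demo, rcond=None)             # SVD-based solver

print(w_normal.shape)  # (10, 2)
print(np.allclose(w_normal, w_lstsq, atol=1e-6))
```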

**Q1 b (6 marks)**

b1. Predictions for the exponential basis functions are already provided above. Use linear regression to fit the trajectories with the polynomial basis functions (defined by 'polynomial_basis_function(t)').
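A minimal sketch of b1, assuming the same least-squares recipe as the exponential case. It uses `np.linalg.lstsq`, which solves the same problem as the pseudoinverse formula. The synthetic targets are themselves cubic, so the four polynomial weights should be recovered exactly. On the real data you would pass `polynomial_basis_function(t)` and `pos`. The helper name `fit_predict` is hypothetical.

```python
import numpy as np


def polynomial_basis_function(t):
    # Same cubic basis as the assignment: columns 1, t, t^2, t^3.
    return np.stack([np.ones_like(t), t, t**2, t**3]).T


def fit_predict(Phi, pos):
    """Least-squares weights for design matrix Phi and targets pos; returns (w, Phi @ w)."""
    w, *_ = np.linalg.lstsq(Phi, pos, rcond=None)
    return w, Phi @ w


# Synthetic targets that ARE cubic in t, so a 4-term basis fits them exactly.
t_demo = np.linspace(0.0, 1.0, 200)
pos_demo = np.stack([1 + 2 * t_demo - 3 * t_demo**3, t_demo**2]).T  # shape (200, 2)

Phi_poly = polynomial_basis_function(t_demo)                # shape (200, 4)
w_poly, predict_pos_poly = fit_predict(Phi_poly, pos_demo)
print(np.round(w_poly, 6))
```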

b2. Plot the $x$-coordinate of the original data set against time (i.e., t-x plot) in one figure. Do the same for the $y$-coordinate (i.e., t-y plot) in another figure. Now overlay the predictions using the exponential basis functions (i.e., plot "predicted x against t" and "original x against t" in the same figure, plot "predicted y against t" and "original y against t" in the same figure). Do the same for the predictions based on the polynomial basis functions.

b3. Also plot the 2D (i.e., x-y) trajectory of the original data and the predicted 2D trajectories for the exponential and polynomial basis functions.
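A possible plotting helper for b2 and b3. The function name `plot_overlays` and its dictionary argument are hypothetical. It draws the original data as dots and overlays one line per prediction in the t-x, t-y and x-y figures. Calling it once with `{"exponential": predict_pos}` and once with the polynomial predictions gives a separate set of figures per basis. The demo uses a synthetic circle and a deliberately crude constant "prediction", not the assignment data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_overlays(t, pos, predictions):
    """Overlay original [x y] data (shape (len(t), 2)) with each prediction in `predictions`.

    `predictions` maps a label to a (len(t), 2) array. Returns the t-x, t-y and x-y Figures.
    """
    fig_x, ax_x = plt.subplots()
    fig_y, ax_y = plt.subplots()
    fig_xy, ax_xy = plt.subplots()
    ax_x.plot(t, pos[:, 0], "k.", markersize=3, label="original")
    ax_y.plot(t, pos[:, 1], "k.", markersize=3, label="original")
    ax_xy.plot(pos[:, 0], pos[:, 1], "k.", markersize=3, label="original")
    for label, pred in predictions.items():
        ax_x.plot(t, pred[:, 0], label=label)
        ax_y.plot(t, pred[:, 1], label=label)
        ax_xy.plot(pred[:, 0], pred[:, 1], label=label)
    for ax, (xl, yl) in zip((ax_x, ax_y, ax_xy), (("t", "x"), ("t", "y"), ("x", "y"))):
        ax.set_xlabel(xl)
        ax.set_ylabel(yl)
        ax.legend()
    return fig_x, fig_y, fig_xy


# Synthetic demo (not trajectory.csv).
t_demo = np.linspace(0.0, 1.0, 100)
pos_demo = np.stack([np.cos(2 * np.pi * t_demo), np.sin(2 * np.pi * t_demo)]).T
figs = plot_overlays(t_demo, pos_demo, {"constant": np.zeros_like(pos_demo)})
```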

**Q1 c (8 marks)**

If you have done the previous question correctly, one set of basis functions appears to deliver much better predictions than the other. On closer inspection, you will find that the best predictions use many more basis functions than the poor ones. We must compare like with like, so modify the code so that you can control the number of basis functions. Then plot both sets of predictions for 2, 5, 10 and 50 basis functions. Show these 2D plots and comment qualitatively on whether you can really say that one set of basis functions outperforms the other.
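One way to make the number of polynomial basis functions adjustable is a monomial (Vandermonde) design matrix $1, t, \ldots, t^{n-1}$. With $n = 4$ this reproduces `polynomial_basis_function`. The sketch below is written under these assumptions. All function names are hypothetical. The exponential width is kept at 50 as in the original code, although you may want to revisit that as $n$ changes. The data are synthetic, not *trajectory.csv*. Monomials up to $t^{49}$ are badly conditioned, so the sketch uses `np.linalg.lstsq`, which truncates tiny singular values. The printed RMSE is only an optional quantitative companion to the qualitative plots the question asks for.

```python
import numpy as np


def polynomial_design_matrix(t, basis_num):
    """Monomial basis 1, t, ..., t**(basis_num - 1); shape (len(t), basis_num)."""
    return np.vander(np.asarray(t, dtype=float), N=basis_num, increasing=True)


def exponential_design_matrix(t, basis_num, width=50.0):
    """Normalised Gaussian-shaped basis with centres j / basis_num, as in fit_exponential."""
    c = np.arange(basis_num) / basis_num
    phi = np.exp(-width * (np.asarray(t, dtype=float)[:, None] - c) ** 2)
    return phi / phi.sum(axis=1, keepdims=True)


def predict(Phi, pos):
    """Least-squares prediction Phi @ w (lstsq copes better with ill-conditioned Phi)."""
    w, *_ = np.linalg.lstsq(Phi, pos, rcond=None)
    return Phi @ w


# Synthetic stand-in for trajectory.csv.
t_demo = np.linspace(0.0, 1.0, 200)
pos_demo = np.stack([np.cos(2 * np.pi * t_demo), np.sin(4 * np.pi * t_demo)]).T

for n in (2, 5, 10, 50):
    for name, make in (("polynomial", polynomial_design_matrix),
                       ("exponential", exponential_design_matrix)):
        pred = predict(make(t_demo, n), pos_demo)
        rmse = np.sqrt(np.mean((pred - pos_demo) ** 2))
        print(f"{name:11s} n={n:2d}  RMSE={rmse:.3g}")
```

For the plots, the 2D predictions `pred[:, 0]` against `pred[:, 1]` for each `n` can be drawn on top of the original trajectory.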

**Q1 d (5 marks)**

Consider which basis functions (beyond the exponential or polynomial basis functions) might be suitable to model the data in "data2.npy". In "data2.npy", the first column is time t and the second column is the x-coordinate. You should be able to get a good fit with a small set of basis functions.

Plot your prediction and the original data in the same figure (i.e., t-x).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# load and plot the data
data = np.load("data2.npy")  # shape (200, 2): column 0 is time, column 1 is the x-coordinate
t = data[:, 0]  # note: this overwrites the t loaded from trajectory.csv
target = data[:, 1]

plt.plot(t, target, label='target')
plt.legend()
plt.xlabel("t")
plt.ylabel("x")
plt.show()
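The shape of *data2.npy* is not described here, so the following is only one hypothetical option. If the t-x plot looks periodic, a short Fourier (cosine/sine) basis is a natural candidate. The period and the number of harmonics are assumptions you would read off the plot. The name `fourier_design_matrix` is hypothetical. The demo uses a synthetic periodic signal, not *data2.npy*, so the recovered weights only illustrate that the method works when the basis matches the data.

```python
import numpy as np


def fourier_design_matrix(t, period, num_harmonics):
    """Columns 1, cos(2*pi*k*t/period), sin(2*pi*k*t/period) for k = 1..num_harmonics."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, num_harmonics + 1):
        omega = 2.0 * np.pi * k / period
        cols.append(np.cos(omega * t))
        cols.append(np.sin(omega * t))
    return np.stack(cols, axis=1)  # shape (len(t), 2 * num_harmonics + 1)


# Synthetic signal with period 2 (NOT data2.npy).
t_demo = np.linspace(0.0, 10.0, 200)
target_demo = 0.5 + np.sin(np.pi * t_demo) + 0.3 * np.cos(2 * np.pi * t_demo)

Phi_demo = fourier_design_matrix(t_demo, period=2.0, num_harmonics=2)  # 5 basis functions
w_demo, *_ = np.linalg.lstsq(Phi_demo, target_demo, rcond=None)
prediction_demo = Phi_demo @ w_demo
print(Phi_demo.shape)  # (200, 5)
```

Plotting `prediction_demo` and `target_demo` against `t_demo` in the same figure gives the t-x comparison the question asks for.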

# Question 2

**Q2 a (4 marks)**

You are given a data array called "shape_array.npy" that comprises 7 samples organised as columns, one sample per column. Each column has the format $[x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_N, y_N, z_N]$, where $(x_i, y_i, z_i)$ is the $i$-th 3D point of a blood vessel. Plotting all the 3D points in one column gives the shape of that sample's blood vessel.

Plot seven figures to show the 3D blood vessel shape for each sample separately. Also plot two arbitrary shapes on top of each other to get a sense of how similar or dissimilar the shapes are.
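Because each column is laid out as $[x_1, y_1, z_1, \ldots]$, reshaping it to `(-1, 3)` gives one row per 3D point. The sketch below assumes `np.load("shape_array.npy")` returns the `(1845, 7)` array described in this question. The helper names are hypothetical. The demo builds 7 noisy synthetic helices instead of loading the real file.

```python
import matplotlib
matplotlib.use("Agg")  # omit in a notebook
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older Matplotlib)
import numpy as np


def column_to_points(column):
    """[x1, y1, z1, x2, ...] -> (N, 3) array with one 3D point per row."""
    return np.asarray(column).reshape(-1, 3)


def plot_shapes(shapes, labels):
    """Plot one or more (N, 3) point sets on a single 3D axis and return the Figure."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for pts, label in zip(shapes, labels):
        ax.plot(pts[:, 0], pts[:, 1], pts[:, 2], label=label)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    ax.legend()
    return fig


# Synthetic stand-in for shape_array.npy: 7 noisy helices of 615 points -> shape (1845, 7).
rng = np.random.default_rng(0)
s = np.linspace(0.0, 4.0 * np.pi, 615)
helix = np.stack([np.cos(s), np.sin(s), 0.1 * s], axis=1)  # (615, 3)
shape_array = np.stack([(helix + 0.05 * rng.standard_normal(helix.shape)).ravel()
                        for _ in range(7)], axis=1)         # (1845, 7)

figs = [plot_shapes([column_to_points(shape_array[:, i])], [f"sample {i}"]) for i in range(7)]
overlay = plot_shapes([column_to_points(shape_array[:, 0]), column_to_points(shape_array[:, 3])],
                      ["sample 0", "sample 3"])
```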

**Q2 b (10 marks)** 

Next, perform an eigendecomposition of the covariance matrix estimated from the given data array. Then project the original data onto a lower-dimensional space and reconstruct the data from that projection.

Proceed as follows:

1. Subtract the mean from the data, so that it is centred on the origin.

2. Estimate the covariance matrix from the centred data.

3. Calculate the eigenvectors and eigenvalues using NumPy functions.

4. Project the centred data (1845 dimensions) onto a lower-dimensional space (you need to choose a reasonable dimension).

5. Reconstruct the blood vessel shape from the lower-dimensional data obtained in step 4.
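The five steps above can be sketched with NumPy as follows. This is one possible implementation, assuming one sample per column (shape `(1845, 7)` for the real data). The function name `pca_eig` is hypothetical, and the demo uses random synthetic data rather than *shape_array.npy*. `np.linalg.eigh` is used because the covariance matrix is symmetric. It returns eigenvalues in ascending order, hence the re-sort.

```python
import numpy as np


def pca_eig(X, k):
    """PCA by eigendecomposition for X of shape (D, M), one sample per COLUMN.

    Returns mean (D,), top-k eigenvalues (k,), eigenvectors (D, k),
    projections (k, M) and reconstruction (D, M).
    """
    mean = X.mean(axis=1, keepdims=True)      # step 1: mean over samples
    Xc = X - mean                             #         centred data
    C = np.cov(Xc)                            # step 2: (D, D) covariance, rows = variables, ddof=1
    eigvals, eigvecs = np.linalg.eigh(C)      # step 3: symmetric -> eigh (ascending order)
    order = np.argsort(eigvals)[::-1]         #         sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :k]                        #         top-k eigenvectors as columns
    Z = U.T @ Xc                              # step 4: (k, M) low-dimensional coordinates
    X_rec = U @ Z + mean                      # step 5: back to D dimensions
    return mean.ravel(), eigvals[:k], U, Z, X_rec


# Synthetic stand-in: D = 60 (20 points x 3), M = 7 samples.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((60, 7))
mean, lam, U, Z, X_rec = pca_eig(X_demo, k=2)
print(Z.shape, X_rec.shape)  # (2, 7) (60, 7)
```

With only 7 samples, the centred data have rank at most 6, so at most 6 eigenvalues are non-zero (up to round-off). Choosing `k = 6` reproduces the data exactly, and smaller `k` trades accuracy for compression.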


As a sanity check, plot a blood vessel shape reconstructed from the eigenvectors on top of the original blood vessel shape. Explain how much data reduction you have achieved. Comment on your results.
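One way to quantify the data reduction is to count stored numbers. The sketch below assumes D = 1845 dimensions and M = 7 samples as in this question, and uses k = 3 purely as an illustrative choice. The helper name `storage_counts` is hypothetical. Each sample shrinks from D numbers to k numbers. However, the shared mean and the k eigenvectors must also be stored, which dominates when M is as small as 7.

```python
def storage_counts(D, M, k):
    """Numbers stored for the raw data versus a rank-k PCA representation."""
    raw = D * M                           # original (D, M) array
    coords = k * M                        # low-dimensional coordinates only
    total = coords + D * k + D            # plus k eigenvectors and the mean shape
    return raw, coords, total


raw, coords, total = storage_counts(D=1845, M=7, k=3)
print(raw, coords, total)  # 12915 21 7401
```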

**Q2 c (4 marks)** 

Research how to perform PCA with the *scikit-learn* library. Perform PCA and show the reconstructed data of any blood vessel shape on top of the original blood vessel shape. The fitted PCA object has attributes that correspond to the eigenvalues used for choosing the projection eigenvectors. Compare the eigenvalues and eigenvectors you computed in the previous question with those computed by *scikit-learn*, and compare the reconstructed coordinates from both methods. Comment on your results.
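A sketch of the comparison, assuming scikit-learn is installed and the demo data are synthetic (the real array is `(1845, 7)`). `sklearn.decomposition.PCA` expects one sample per row, hence the transposes. Its `explained_variance_` attribute holds the top eigenvalues of the covariance matrix estimated with `n_samples - 1` degrees of freedom, which matches `np.cov`'s default. `components_` holds the eigenvectors as rows, and `explained_variance_ratio_` gives each eigenvalue's share of the total variance. Eigenvectors are defined only up to sign, so the check compares absolute dot products.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in (D = 60 features, M = 7 samples).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 7))       # one sample per column, as in shape_array.npy
k = 2

# NumPy eigendecomposition, as in Q2 b.
mean = X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean))
order = np.argsort(eigvals)[::-1][:k]
lam_np, U_np = eigvals[order], eigvecs[:, order]
X_rec_np = U_np @ (U_np.T @ (X - mean)) + mean

# scikit-learn: rows are samples, so fit on X.T.
pca = PCA(n_components=k, svd_solver="full").fit(X.T)
lam_sk = pca.explained_variance_       # eigenvalues of the ddof=1 covariance
U_sk = pca.components_.T               # eigenvectors as columns, up to sign
X_rec_sk = pca.inverse_transform(pca.transform(X.T)).T

print(np.allclose(lam_np, lam_sk))
print(np.allclose(np.abs(np.sum(U_np * U_sk, axis=0)), 1.0))  # same directions, sign may flip
print(np.allclose(X_rec_np, X_rec_sk))
```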