# Fitting a line in 3D

Source: [Stack Overflow](https://stackoverflow.com/a/2333251)

If you are trying to predict one value from the other two, then you should use ```np.linalg.lstsq```, passing your independent variables (plus a column of 1's to estimate an intercept) as the ```a``` argument and your dependent variable as the ```b``` argument.
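As a rough illustration of the regression case, here is a minimal sketch that predicts ```z``` from ```x``` and ```y```. The data, the true coefficients (2.0, -0.5, 3.0), and the seed are made up for the example and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical data: z depends linearly on x and y, plus a little noise.
x = rng.uniform(-2, 5, size=200)
y = rng.uniform(1, 9, size=200)
z = 2.0 * x - 0.5 * y + 3.0 + rng.normal(scale=0.1, size=200)

# The `a` argument: independent variables plus a column of ones for the intercept.
A = np.column_stack((x, y, np.ones_like(x)))

# Solve A @ coef ~= z in the least-squares sense; z is the `b` argument.
coef, residuals, rank, singular_values = np.linalg.lstsq(A, z, rcond=None)
slope_x, slope_y, intercept = coef
```

The recovered ```slope_x```, ```slope_y``` and ```intercept``` should land close to the coefficients used to generate the data. Note that this minimizes the error in ```z``` only, which is a different problem from the best fitting line discussed next.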

If, on the other hand, you just want to get the best fitting line to the data, i.e. the line which, if you projected the data onto it, would minimize the sum of squared distances between each real point and its projection, then what you want is the first principal component.

One way to define it is the line that passes through the mean of your data and whose direction vector is the eigenvector of the covariance matrix corresponding to the largest eigenvalue. That said, ```eig(cov(data))``` is a really bad way to calculate it, since it does a lot of needless computation and copying and is potentially less accurate than using ```svd```. See below:

Generate some data that lies along a line
and perturb it with some Gaussian noise.

For a description of ```np.newaxis``` look here
[stackoverflow](https://stackoverflow.com/a/41267079).

For an explanation of ```j``` in ```np.mgrid``` look here
[Numpy Docs](https://numpy.org/doc/stable/reference/generated/numpy.mgrid.html).
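For a quick feel of both features, the small sketch below uses a 5-point grid (the size is arbitrary, picked only to keep the output short) and compares it with ```np.linspace```.

```python
import numpy as np

# A complex "step" such as 5j means "5 points, both endpoints included",
# which gives the same values as np.linspace(-2, 5, 5).
grid = np.mgrid[-2:5:5j]
same_as_linspace = np.allclose(grid, np.linspace(-2, 5, 5))

# np.newaxis inserts an axis of length 1, turning the 1-D array into a column,
# so that several such columns can be concatenated along axis=1.
column = grid[:, np.newaxis]
print(grid.shape, column.shape)  # (5,) (5, 1)
```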

In [None]:
import numpy as np

%matplotlib inline

x = np.mgrid[-2:5:120j]
y = np.mgrid[1:9:120j]
z = np.mgrid[-5:3:120j]

data = np.concatenate((x[:, np.newaxis], 
                       y[:, np.newaxis], 
                       z[:, np.newaxis]), 
                      axis=1)

data += np.random.normal(size=data.shape) * 0.4
pass

Calculate the mean of the points, i.e. the 'center' of the cloud.

Do an SVD on the mean-centered data.

Now ```vv[0]``` contains the first principal component, i.e. the direction
vector of the 'best fit' line in the least squares sense.

In [None]:
data_mean = data.mean(axis=0)
uu, dd, vv = np.linalg.svd(data - data_mean)
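As a sanity check on the claim that the SVD route and the covariance-eigenvector route describe the same line, the sketch below computes the direction both ways on freshly generated data. The names ```points```, ```direction_svd``` and ```direction_eig``` and the seed are assumptions of this example; ```np.linalg.eigh``` is used here because the covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Noisy points along a line, similar in spirit to the data above.
t = np.linspace(0, 1, 120)
points = np.column_stack((-2 + 7 * t, 1 + 8 * t, -5 + 8 * t))
points += rng.normal(scale=0.4, size=points.shape)

centered = points - points.mean(axis=0)

# Route 1: SVD of the centered data; the rows of vt are the principal directions.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
direction_svd = vt[0]

# Route 2: eigen-decomposition of the covariance matrix.
# eigh returns eigenvalues in ascending order, so the last column is the one we want.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
direction_eig = eigenvectors[:, -1]

# The sign of a direction vector is arbitrary, so compare up to sign.
alignment = abs(direction_svd @ direction_eig)
```

Here ```alignment``` should be 1 up to rounding, and the largest eigenvalue should equal ```s[0]**2 / (len(points) - 1)```.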

Now generate some points along this best fit line, for plotting.

I use ```-7, 7``` since the data extends roughly ```14``` units along
the line (the noise-free endpoints are about 13.3 apart) and we want
the parameter to have mean ```0``` (like the points we did the svd on).
Also, it's a straight line, so we only need two points.

Shift by the mean to get the line in the right place.

In [None]:
line_points = vv[0] * np.mgrid[-7:7:2j][:, np.newaxis]
line_points += data_mean

Verify that everything looks right.

In [None]:
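import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d  # registers the '3d' projection on older Matplotlib

# Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
# releases, so create the 3D axes through the figure instead.
ax = plt.figure().add_subplot(projection='3d')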
ax.scatter3D(*data.T)
ax.plot3D(*line_points.T)
plt.show()
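Besides the visual check, the quantity the fit minimizes, the squared perpendicular distance from each point to the line, can be computed directly. The helper ```distances_to_line``` below is a hypothetical name introduced for this sketch, not part of NumPy or of the original answer.

```python
import numpy as np

def distances_to_line(points, point_on_line, direction):
    """Perpendicular distance from each row of `points` to a line in 3D."""
    direction = direction / np.linalg.norm(direction)  # ensure unit length
    offsets = points - point_on_line
    along = offsets @ direction                        # signed length of each projection
    perpendicular = offsets - np.outer(along, direction)
    return np.linalg.norm(perpendicular, axis=1)

# Tiny hand-checkable example: distances to the x-axis.
pts = np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0], [5.0, 0.0, 0.0]])
d = distances_to_line(pts, np.zeros(3), np.array([1.0, 0.0, 0.0]))
print(d)  # [1. 3. 0.]
```

With the variables from this notebook, ```distances_to_line(data, data_mean, vv[0])``` would give the residual of every point, and the sum of their squares is what the first principal component minimizes.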