 ## Dimensionality reduction
 
Dimensionality reduction is a method for representing a given dataset with fewer features (dimensions) than it originally had. This has advantages for clustering, for classification, and for interpreting data in general.

In this section we will use an example taken from petrophysics where the task is to perform simple depth alignment of borehole logs. The examples are taken from a research paper:
[S. Acharya and K. Fabian, 2024](https://asmedigitalcollection.asme.org/OMAE/proceedings/OMAE2024/87868/V008T11A019/1202880)
The problem is illustrated by the figure below which shows the same log type, but measured at different times and with different systems resulting in a misalignment in depth.
![Log depth misalignment](clusterlogs.png)

# The data set
The data set we are going to use consists of four different well logs: bulk density, resistivity, gamma ray and neutron porosity.
Before machine learning is applied, each log is standardized to have a mean value of 0 and a standard deviation of 1, as shown in the figure below.

![Log normalization](dim-norm.png)
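The standardization step can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the paper: the function name `standardize` and the synthetic array `raw` (a stand-in for the four logs) are assumptions made for the example.

```python
import numpy as np

def standardize(logs):
    """Scale each column (one log per column) to zero mean and unit standard deviation."""
    logs = np.asarray(logs, dtype=float)
    return (logs - logs.mean(axis=0)) / logs.std(axis=0)

# Synthetic stand-in for the four logs (NOT real well data):
# columns mimic bulk density, resistivity, gamma ray and neutron porosity.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[2.4, 10.0, 80.0, 0.25],
                 scale=[0.2, 5.0, 30.0, 0.05],
                 size=(1000, 4))

X = standardize(raw)
print(X.mean(axis=0))  # close to 0 for every log
print(X.std(axis=0))   # close to 1 for every log
```

After this step the logs are comparable in scale, so no single log dominates the variance purely because of its units.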

    
# Principal Component Analysis

The log dataset is organized into a matrix $\mathbf{X}$ where each of the four columns contains
the (normalized) measurements of one log. There is one row for each measurement, and the number of measurements is of the order of $N=10^5$. The dimension of $\mathbf{X}$ is thus
$N \times 4$.
The $\mathbf{X}$ matrix can be factorized into three separate matrices using the so-called
singular value decomposition (SVD)

\begin{eqnarray}
\mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{W}^T
\end{eqnarray}

Here $\mathbf{\Sigma}$ is an $N \times 4$ matrix whose diagonal contains the so-called
singular values of $\mathbf{X}$, sorted from largest to smallest. $\mathbf{W}$ is a $4 \times 4$ matrix whose columns are orthogonal unit vectors, while $\mathbf{U}$ is an $N\times N$ matrix whose columns are orthogonal unit vectors.
Because the columns of $\mathbf{W}$ are orthonormal, $\mathbf{W}^T\mathbf{W}=\mathbf{I}$, and we can define a new transformed data matrix by the relation

\begin{eqnarray}
\mathbf{T} = \mathbf{X} \mathbf{W} = \mathbf{U}\mathbf{\Sigma}
\end{eqnarray}

The matrix $\mathbf{T}$ is the projection of the original data onto the orthogonal unit vectors
defined by the columns of $\mathbf{W}$.
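The factorization and the projection can be checked numerically with `numpy.linalg.svd`. The sketch below uses a small random matrix in place of the log data (an assumption for illustration). It also requests the "thin" SVD with `full_matrices=False`, so that $\mathbf{U}$ is $N \times 4$ rather than $N \times N$. For $N$ of the order of $10^5$ the full $\mathbf{U}$ would be impractically large, and the extra columns do not contribute to $\mathbf{T}$.

```python
import numpy as np

# Random stand-in for the standardized log matrix (NOT real data)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Thin SVD: U is N x 4, s holds the 4 singular values (largest first),
# and Wt is the transpose of W.
U, s, Wt = np.linalg.svd(X, full_matrices=False)
W = Wt.T

# Transformed data: projection of X onto the columns of W
T = X @ W

print(np.allclose(X, U @ np.diag(s) @ Wt))  # X = U Sigma W^T
print(np.allclose(T, U * s))                # T = X W = U Sigma
print(np.allclose(W.T @ W, np.eye(4)))      # columns of W are orthonormal
```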

# Dimensionality reduction
The matrix $\mathbf{T}$ can be approximated by a truncated matrix $\mathbf{T}_L$ that keeps
only the $L < 4$ largest singular values of $\mathbf{\Sigma}$.

\begin{eqnarray}
\mathbf{T}_L = \mathbf{X} \mathbf{W}_L = \mathbf{U}_L\mathbf{\Sigma}_L
\end{eqnarray}

Here $\mathbf{W}_L$ is a $4\times L$ matrix containing the first $L$ columns of $\mathbf{W}$, so that $\mathbf{T}_L$ is an $N\times L$ matrix. The first column of
$\mathbf{T}$ corresponds to the so-called
first principal component of $\mathbf{X}$. The second principal component corresponds to the second column
of $\mathbf{T}$, while the third and fourth principal components are given by the third and fourth columns.
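A minimal sketch of the truncation, assuming NumPy and a synthetic, centered data matrix. The helper name `pca_truncate` and the test data are hypothetical and only serve to show the shapes involved.

```python
import numpy as np

def pca_truncate(X, L):
    """Project X (N x d, centered columns) onto its first L principal directions."""
    _, _, Wt = np.linalg.svd(X, full_matrices=False)
    W_L = Wt[:L].T        # d x L: first L columns of W
    T_L = X @ W_L         # N x L: truncated, transformed data
    return T_L, W_L

# Synthetic correlated data standing in for the four logs (NOT real data):
# two hidden factors mixed into four columns, plus a little noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 4))
X = X - X.mean(axis=0)

T_L, W_L = pca_truncate(X, L=2)
print(W_L.shape)  # (4, 2)
print(T_L.shape)  # (500, 2)
```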

The principal components are ordered by the variance they capture: the first component is the direction along which the projected data have the largest variance, and the second, third and fourth components capture successively less. The idea is that the directions with the largest variance contain the largest amount of information. In the figure below the variance of each of the four principal components is plotted.

![Variance of principal components](dim-var.png)

We see that the first two principal components contain 80$\%$ of the variance. The dataset can then be reduced from a
dimension of $N\times 4$ to $N\times 2$.
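The share of variance carried by each component follows directly from the singular values: for a data matrix with centered columns, the variance of the $i$-th principal component is $\sigma_i^2/(N-1)$. The sketch below computes these shares and the smallest $L$ that reaches a given threshold. The function names and the synthetic data are assumptions for illustration; the 80$\%$ figure quoted above comes from the real logs, not from this example.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance carried by each principal component.

    Assumes the columns of X are centered (true after standardization).
    """
    s = np.linalg.svd(X, compute_uv=False)
    variances = s**2 / (X.shape[0] - 1)
    return variances / variances.sum()

def components_needed(ratio, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratio)
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))

# Synthetic correlated data (NOT the real logs)
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 4))
X = X + 0.3 * rng.normal(size=(1000, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

ratio = explained_variance_ratio(X)
print(ratio)                          # shares, largest first
print(components_needed(ratio, 0.8))  # smallest L covering 80% of the variance
```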

To identify the layers, the next step is to apply the k-means clustering algorithm to the transformed and
truncated dataset $\mathbf{T}_L$.
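To make the clustering step concrete, here is a minimal NumPy sketch of the standard k-means iteration (Lloyd's algorithm). It is not the implementation used in the paper, and the two synthetic blobs merely stand in for the rows of $\mathbf{T}_L$; in practice a library implementation would normally be preferred.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign points, then move centers to cluster means."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Two well-separated synthetic blobs standing in for T_L (NOT real data)
rng = np.random.default_rng(4)
blob_a = rng.normal(loc=[-5.0, -5.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b])

labels, centers = kmeans(points, k=2)
print(centers)
```

Because the rows of the log matrix are ordered by depth, plotting the resulting labels against depth shows where the cluster membership changes, which is how the layers are read off.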

The result of the procedure is shown in the figure below.

![pca](dim-pca.png)

Below is the result of using the k-means clustering algorithm on the full dataset.
Note that the PCA-based method detects an extra layer.

![kmeans](dim-kmeans.png)


# Autoencoder

An autoencoder works in a similar way to PCA-based dimensionality reduction. It is a neural network which learns to encode a data vector into a representation with lower dimensionality. The network also has a decoding stage which uses the encoded data to reconstruct the original data, i.e. it increases the dimensionality again.
The figure below shows the layout of the network.

![Autoencoder](dim-autoencoder.png)

The network is trained by using a loss function which minimizes the error between an input vector $\mathbf{x}$
and the output vector $\hat{\mathbf{x}}$ :

\begin{eqnarray}
  \mbox{min}\, ||\mathbf{x}-\hat{\mathbf{x}}||
\end{eqnarray}
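The training loop can be illustrated with a very small autoencoder written directly in NumPy. This is only a sketch under several assumptions that are not taken from the paper: a single hidden layer of two `tanh` units, a linear decoder, the squared norm averaged over all samples as loss, plain full-batch gradient descent, and synthetic data in place of the logs. The names `train_autoencoder` and `encode` are hypothetical.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, epochs=300, lr=0.05, seed=0):
    """Train a d -> n_hidden -> d autoencoder; return the encoder and loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    b_dec = np.zeros(d)
    history = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)      # encode: d -> n_hidden
        X_hat = Z @ W_dec + b_dec           # decode: n_hidden -> d
        err = X_hat - X
        history.append(np.mean(np.sum(err**2, axis=1)))
        # Gradients of the mean squared reconstruction error
        g_out = 2.0 * err / n
        g_W_dec = Z.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - Z**2)   # back through tanh
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Gradient descent update
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(data):
        """Output of the middle (bottleneck) layer."""
        return np.tanh(data @ W_enc + b_enc)

    return encode, history

# Synthetic correlated data standing in for the standardized logs (NOT real data)
rng = np.random.default_rng(5)
X = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 4))
X = X + 0.1 * rng.normal(size=(400, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode, history = train_autoencoder(X)
codes = encode(X)
print(codes.shape)              # (400, 2)
print(history[0], history[-1])  # reconstruction error before and after training
```

The array `codes` plays the same role as $\mathbf{T}_L$ in the PCA approach: a two-dimensional representation of each measurement.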

We can use an autoencoder network to perform dimensionality reduction on our log dataset, as shown in the figure below.

![Autoencoder applied to the log dataset](dim-autolog.png)

Here we reduce the dimensionality of the log dataset from four to two. The figure below shows the training error
and the estimation error for the log data sets as a function of the number of epochs.

![Training and estimation error versus number of epochs](dim-err.png)

The error function is reduced by approximately 50$\%$ over 100 epochs, which is not impressive, but tolerable.

The output from the middle layer of the autoencoder gives a new transformed dataset with two dimensions, which is used as input for the k-means clustering algorithm.
The result is shown in the figure below.

![Clustering on the autoencoder output](dim-autocluster.png)

Comparing this with the PCA-based clustering, we see that the autoencoder gives essentially the same results.

Both the PCA-based clustering and the autoencoder-based clustering perform better than k-means clustering on the original (not dimensionality-reduced) dataset. A possible explanation for this is that reducing the dimension of the input data set also removes noise from the data. The difference in performance is shown in the table below,
which summarises the clustering errors for the different algorithms.

![Clustering errors for the different algorithms](dim-res3.png)

Finally, the table below shows a summary of the effectiveness of the three different algorithms used in the study.

![Summary of the effectiveness of the three algorithms](dim-res4.png)






