Factor analysis is a widely used statistical method for linear
dimensionality reduction: it explains the correlations among many observed
variables in terms of a small number of unobserved common factors. The best
way to understand factor analysis is to consider a generative model.

### Basic factor analysis model

A basic factor
analysis model is of the form
$$
\begin{eqnarray}
x & = & Lf+\epsilon,
\end{eqnarray}
$$

where $x\in\mathbf{R}^{n}$ is the observed random vector, $f\in\mathbf{R}^{r}$
(with $r\leq n$) is a random vector of common factor variables or
scores, $L\in\mathbf{R}^{n\times r}$ is a matrix of factor loadings,
and $\epsilon\in\mathbf{R}^{n}$ is a vector of uncorrelated random
variables. 

The observed random vector $x$ may contain, for example, scores from a
series of achievement tests, psychological evaluations, or measures of
intellectual performance. Without loss of generality, we assume that $x$
is mean-centered, *i.e.*, $\mathbf{E}(x)=0$, that the vectors $f$ and
$\epsilon$ are uncorrelated, and that the covariance matrix of $f$ is the
identity matrix.

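We denote $\mathbf{cov}(\epsilon)=D=\mathbf{diag}(d_{1},d_{2},\ldots,d_{n})$.
Since $f$ and $\epsilon$ are uncorrelated and $\mathbf{cov}(f)=I$, we have
$\mathbf{cov}(x)=L\,\mathbf{cov}(f)L^{T}+\mathbf{cov}(\epsilon)$, so the
covariance matrix of $x$, denoted by $\Sigma$, can be written as: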
$$
\Sigma=X+D,
$$
where $X=LL^{T}$ is the covariance matrix corresponding to the common
factors. In factor analysis, we are given $N$ samples $x_{1},\ldots,x_{N}$
generated by the model (1), and we want to estimate the matrices $X$ and
$D$ from them.
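
The decomposition $\Sigma=LL^{T}+D$ can be checked numerically by drawing samples from the generative model and comparing their sample covariance with the model covariance. The following is a minimal sketch using only Julia's standard libraries. The sizes `n`, `r`, `N`, the loading matrix `L`, and the unique variances `d` are made-up illustrative values, and Gaussian draws are an assumption made only for convenience.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(1)                      # fixed seed so the run is repeatable

n, r, N = 5, 2, 200_000              # illustrative sizes, not taken from the text
L = randn(n, r)                      # hypothetical factor loading matrix
d = rand(n) .+ 0.5                   # hypothetical unique variances, all positive

# Draw N samples from x = L f + ε with cov(f) = I and cov(ε) = diag(d).
factors = randn(r, N)                # each column is one draw of f
noise   = sqrt.(d) .* randn(n, N)    # each column is one draw of ε
samples = L * factors + noise        # each column is one sample of x

Σ_model  = L * L' + Diagonal(d)      # covariance implied by the model
Σ_sample = cov(samples, dims = 2)    # sample covariance across the columns

rel_err = norm(Σ_sample - Σ_model) / norm(Σ_model)
println("relative error: ", rel_err)
```

With this many samples the relative error should be small, and it should shrink further as `N` grows.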

### Optimization problem under consideration

The optimization problem to determine the matrices $X,D$ can be written
as: 

$$
\begin{equation}
\begin{array}{ll}
\textrm{minimize} & \left\Vert \Sigma-X-D\right\Vert _{F}^{2}\\
\textrm{subject to} & D=\mathbf{diag}(d)\\
 & d\geq0\\
 & X\succeq0\\
 & \mathbf{rank}(X)\leq r\\
 & \Sigma-D\succeq0\\
 & \|X\|_{2}\leq M,
\end{array}
\end{equation}
$$

where $X\in\mathbf{S}^{n}$ and the diagonal matrix $D\in\mathbf{S}^{n}$
with nonnegative diagonal entries are the decision variables, and
$\Sigma\in\mathbf{S}_{+}^{n}$ (a positive semidefinite matrix), $r\in\mathbf{Z}_{+},$
and $M\in\mathbf{R}_{++}$ are the problem data. 

A proper solution for the optimization problem above requires that
both $X$ and $D$ are positive semidefinite. Furthermore, if $\Sigma-D$,
which is the covariance matrix for the common parts of the variables,
is not positive semidefinite, that would be as embarrassing as having
a negative unique variance in $D$, as noted by ten Berge [here, page 326](https://link.springer.com/chapter/10.1007/978-3-642-72253-0_44). To prevent this undesirable situation, we enforce the constraint $\Sigma-D\succeq0$.
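
To make the constraints concrete, the sketch below evaluates the objective and checks each constraint for a candidate pair $(X,d)$. It is only an illustration with the standard library: the function names `fa_objective` and `fa_feasible`, the tolerance `tol`, and the small test matrices are hypothetical and not part of the original problem statement.

```julia
using LinearAlgebra

# Objective of the problem: ‖Σ - X - D‖_F² with D = diag(d).
fa_objective(Σ, X, d) = norm(Σ - X - Diagonal(d))^2

# Check every constraint for a candidate (X, d), up to a tolerance `tol`.
function fa_feasible(Σ, X, d, r, M; tol = 1e-8)
    D = Diagonal(d)
    return all(d .>= -tol) &&                      # d ≥ 0
           eigmin(Symmetric(X)) >= -tol &&         # X ⪰ 0
           rank(X; atol = tol) <= r &&             # rank(X) ≤ r
           eigmin(Symmetric(Σ - D)) >= -tol &&     # Σ - D ⪰ 0
           opnorm(X, 2) <= M + tol                 # ‖X‖₂ ≤ M
end

L = [1.0 0.0; 1.0 1.0; 0.0 2.0]    # hypothetical 3×2 loading matrix
d = [0.5, 0.2, 0.1]                # hypothetical unique variances
X = L * L'                         # rank-2 common-factor covariance
Σ = X + Diagonal(d)

println(fa_feasible(Σ, X, d, 2, 10.0))   # candidate with rank bound r = 2
println(fa_objective(Σ, X, d))           # zero up to rounding error
println(fa_feasible(Σ, X, d, 1, 10.0))   # same candidate with rank bound r = 1
```

Here the candidate is built to reproduce $\Sigma$ exactly, so it is feasible for $r=2$ and becomes infeasible once the rank bound is tightened to $r=1$.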



### Nuclear norm heuristic

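The optimization problem above is nonconvex because of the rank
constraint. To approximately solve it, we use the nuclear norm heuristic,
which drops the rank constraint and instead penalizes the nuclear norm of
$X$ in the objective. In this heuristic, we solve the following relaxed
convex optimization problem: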

$$
\begin{array}{ll}
\textrm{minimize} & \left\Vert \Sigma-X-D\right\Vert _{F}^{2}+\lambda\left\Vert X\right\Vert _{*}\\
\textrm{subject to} & D=\mathbf{diag}(d)\\
 & d\geq0\\
 & X\succeq0\\
 & \Sigma-D\succeq0\\
 & \|X\|_{2}\leq M,
\end{array}
$$

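where $\lambda$ is a positive parameter that controls the rank of the
decision variable $X$: larger values of $\lambda$ penalize the nuclear
norm more heavily and tend to produce solutions of lower rank. Note that,
as $X$ is positive semidefinite, we have $\|X\|_{*}=\mathbf{tr}(X)$.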
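
The identity holds because the singular values of a positive semidefinite matrix coincide with its eigenvalues, and the eigenvalues sum to the trace. A quick numerical check, using an arbitrary randomly generated positive semidefinite matrix (an illustrative choice, not data from this post), might look like this:

```julia
using LinearAlgebra, Random

Random.seed!(0)
A = randn(6, 3)
X = A * A'                      # positive semidefinite, rank at most 3

nuclear = sum(svdvals(X))       # nuclear norm: sum of the singular values
println(nuclear, " vs ", tr(X)) # the two numbers agree up to rounding
```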

To compute the value of $\lambda$ corresponding to a desired $r$
such that $\mathbf{rank}(X)\leq r$, we solve the relaxed problem for
different values of $\lambda$ and pick the smallest value of $\lambda$
for which we have $\mathbf{rank}(X)\leq r$.
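
The search over $\lambda$ can be sketched as a simple scan over an increasing grid. To keep the example self-contained, the sketch below does not solve the full relaxed problem. As a loudly stated simplification, the stand-in solver `solve_standin` fixes $D=0$ and drops the bound $\|X\|_{2}\leq M$, in which case the minimizer has a closed form: each eigenvalue of $\Sigma$ is shrunk by $\lambda/2$ and clipped at zero. The names `numrank`, `solve_standin`, and `smallest_lambda`, the grid, and the toy $\Sigma$ are all hypothetical. In practice the `solve` argument would be replaced by a call to a semidefinite programming solver for the relaxed problem above.

```julia
using LinearAlgebra

# Numerical rank of a symmetric PSD matrix: count eigenvalues above `tol`.
numrank(X; tol = 1e-8) = count(>(tol), eigvals(Symmetric(X)))

# Stand-in solver (an assumption, NOT the full relaxed problem): with D = 0 and
# the bound ‖X‖₂ ≤ M dropped, minimizing ‖Σ - X‖_F² + λ tr(X) over X ⪰ 0 is
# solved by shrinking each eigenvalue of Σ by λ/2 and clipping at zero.
function solve_standin(Σ, λ)
    F = eigen(Symmetric(Σ))
    shrunk = max.(F.values .- λ / 2, 0.0)
    return F.vectors * Diagonal(shrunk) * F.vectors'
end

# Scan an increasing grid of λ values and return the first (hence the smallest
# on the grid) whose solution has rank at most r, or `nothing` if none does.
function smallest_lambda(solve, Σ, r, λs)
    for λ in sort(collect(λs))
        X = solve(Σ, λ)
        numrank(X) <= r && return (λ = λ, X = X)
    end
    return nothing
end

Σ = Matrix(Diagonal([5.0, 3.0, 1.0, 0.5]))   # toy covariance with known eigenvalues
result = smallest_lambda(solve_standin, Σ, 2, 0.0:0.25:12.0)
println("smallest λ on the grid: ", result.λ)
println("rank of the solution:   ", numrank(result.X))
```

The answer is only as fine as the grid, so a coarse scan followed by a finer one (or bisection) around the reported value is a natural refinement.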

In [3]:
using CSV
using DataFrames

In [12]:
present_dir = pwd()

"C:\\Users\\shuvo\\Google Drive\\GitHub\\blog\\codes"

In [16]:
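## Download the data into the current directory.
## `joinpath` builds the file path portably instead of hard-coding "\\".
data_path = download("https://userpage.fu-berlin.de/soga/300/30100_data_sets/food-texture.csv", joinpath(present_dir, "food-texture.csv"))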

"C:\\Users\\shuvo\\Google Drive\\GitHub\\blog\\codes\\food-texture.csv"

In [17]:
isfile("food-texture.csv")

true

In [19]:
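## Recent versions of CSV.jl require the sink type (here `DataFrame`) as the second argument.
df = CSV.read("food-texture.csv", DataFrame)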

| Row | Column1 | Oil | Density | Crispy | Fracture | Hardness |
|---|---|---|---|---|---|---|
|  | String | Float64 | Int64 | Int64 | Int64 | Int64 |
| 1 | B110 | 16.5 | 2955 | 10 | 23 | 97 |
| 2 | B136 | 17.7 | 2660 | 14 | 9 | 139 |
| 3 | B171 | 16.2 | 2870 | 12 | 17 | 143 |
| 4 | B192 | 16.7 | 2920 | 10 | 31 | 95 |
| 5 | B225 | 16.3 | 2975 | 11 | 26 | 143 |
| 6 | B237 | 19.1 | 2790 | 13 | 16 | 189 |
| 7 | B261 | 18.4 | 2750 | 13 | 17 | 114 |
| 8 | B264 | 17.5 | 2770 | 10 | 26 | 63 |
| 9 | B353 | 15.7 | 2955 | 11 | 23 | 123 |
| 10 | B360 | 16.4 | 2945 | 11 | 24 | 132 |


The dataset has 50 rows and 6 columns; only the first 10 rows are displayed above.