# Covariance Matrices in High Dimensions

### Sang-Yun Oh

# Covariance Matrices

* Suppose data consist of $p$ variables and $n$ measurements
* Variables: blood pressure, height, weight, etc.
* Measurements: $n$ subjects, units, items, etc.
* Data matrix: $\boldsymbol{Y}\in\mathbb{R}^{n\times p}$

## Covariance Matrix

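* For a single measurement $\boldsymbol{Y}\in\mathbb{R}^{p}$ (a random vector with mean $\boldsymbol{\mu}$), the covariance matrix is defined as
$$
\Sigma=E\left[(\boldsymbol{Y}-\boldsymbol{\mu})(\boldsymbol{Y}-\boldsymbol{\mu})^{\prime}\right]=\left(\sigma_{i j}\right)
$$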

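* Sample covariance matrix, with $\boldsymbol{Y}_{i}$ the $i$-th measurement and $\overline{\boldsymbol{Y}}$ the sample mean, is
$$
\mathbf{S}=\frac{1}{n} \sum_{i=1}^{n}\left(\boldsymbol{Y}_{i}-\overline{\boldsymbol{Y}}\right)\left(\boldsymbol{Y}_{i}-\overline{\boldsymbol{Y}}\right)^{\prime}
$$
* Other covariance estimators exist, such as the unbiased version with divisor $n-1$ and the regularized estimators discussed later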
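
The sample covariance formula can be checked numerically. The following is a minimal sketch on simulated data (the sizes, seed, and variable names are illustrative assumptions, not part of the slides); it builds $\mathbf{S}$ from centered rows and compares it with NumPy's built-in estimator using the same $1/n$ divisor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4                      # n measurements, p variables (toy sizes)
Y = rng.normal(size=(n, p))       # data matrix, one row per measurement

Y_bar = Y.mean(axis=0)            # sample mean vector, length p
centered = Y - Y_bar              # subtract the mean from every row
S = centered.T @ centered / n     # sum of outer products, divided by n

# np.cov treats rows as variables by default, so pass rowvar=False;
# bias=True selects the 1/n divisor used above (the default is n - 1).
S_numpy = np.cov(Y, rowvar=False, bias=True)
```

Here `centered.T @ centered` is the matrix form of the sum of outer products, so no explicit loop over measurements is needed.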

## Examples of Covariance Matrices

* Stocks: $p=4300+$ companies and about 20 trading days per month  
    Relationship between stocks (volatility structure)

* Genomics: $p\approx 20000$ genes and 100s of subjects  
    Co-expression of genes (gene relevance network)

* Neuroimaging: $p=90000$ voxels (or hundreds of aggregated ROIs) and thousands of time points  
    Functional connectivity network

* Ecology: $p=23$ environmental variables and $n=12$ locations  
    Community abundance (Warton 2008)
    
**Data are often high dimensional**

## Properties of Covariance Matrices

* Matrix $\Sigma$ is symmetric
* All eigenvalues are nonnegative
* Nonnegative definite
* If singular, dimensionality can be reduced
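
These properties are easy to see on a small constructed example. In the sketch below (simulated data; the third variable is deliberately built as the sum of the first two, which is an assumption made for illustration), the sample covariance is symmetric, has no negative eigenvalues, and is singular, so the data can be represented in two dimensions without loss.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
Z = rng.normal(size=(n, 2))
# Third variable is an exact linear combination of the first two.
Y = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + Z[:, 1]])

S = np.cov(Y, rowvar=False, bias=True)
eigvals, eigvecs = np.linalg.eigh(S)   # for symmetric matrices; eigenvalues ascending

is_symmetric = np.allclose(S, S.T)
min_eig = eigvals[0]                   # zero up to rounding error
rank = np.linalg.matrix_rank(S)        # only two directions carry variance

# Drop the zero-variance direction: project onto the other two eigenvectors.
scores = (Y - Y.mean(axis=0)) @ eigvecs[:, 1:]
```

The projection step is the same idea that underlies principal component analysis: directions with zero (or negligible) eigenvalues can be discarded.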

## Estimation of Covariance Matrices

* High dimensionality makes covariance estimation challenging
* $O(p^2)$ parameters (high estimator variance)
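* $\mathbf{S}$ is singular when $p\geq n$ (after centering, its rank is at most $n-1$)
* Even when $p<n$, sample eigenvalues are distorted: the largest are biased upward and the smallest downward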

![](images/eigenvalues.png)
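
Both effects can be reproduced in simulation. The sketch below assumes data drawn from $N_p(\boldsymbol{0}, I)$, so every true eigenvalue equals 1; the sample sizes and the helper `sample_eigenvalues` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_eigenvalues(n, p):
    """Eigenvalues of S for n draws from N_p(0, I); every true eigenvalue is 1."""
    Y = rng.normal(size=(n, p))
    S = np.cov(Y, rowvar=False, bias=True)
    return np.linalg.eigvalsh(S)

# p > n: S has rank at most n - 1, so at least p - n + 1 eigenvalues are zero.
eig_wide = sample_eigenvalues(n=20, p=50)
n_zero = int(np.sum(eig_wide < 1e-10))

# p < n: S is invertible, but its eigenvalues spread out around the true value 1.
eig_tall = sample_eigenvalues(n=100, p=50)
spread = (eig_tall.min(), eig_tall.max())
```

Even with twice as many measurements as variables, `spread` is typically far from `(1, 1)`, which is the distortion that regularization aims to correct.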

## Uses of Covariance Matrices

* Markowitz Portfolio Problem (encodes market volatility)
* Regression (OLS)  
$$
\widehat{\beta}_{\text {OLS}}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\prime} \boldsymbol{Y}
$$
* Canonical Correlation Analysis (CCA): associations between two sets of variables ([examples](https://stats.idre.ucla.edu/r/dae/canonical-correlation-analysis/))
* Input to clustering algorithm
* Inverse of $\Sigma$ is often needed, but its estimate $\mathbf{S}$ is not invertible in the high dimensional setting ($p\geq n$)
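
As one concrete case, the OLS formula requires inverting $\boldsymbol{X}^{\prime}\boldsymbol{X}$, which is the uncentered sample covariance of the predictors up to scaling. The sketch below uses simulated data with made-up coefficients (all names and sizes are assumptions) and then shows the rank deficiency that appears once there are more columns than rows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])      # made-up coefficients for the demo
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve (X'X) beta = X'y rather than forming the inverse explicitly.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# With more columns than rows, X'X has rank at most the number of rows (5 here),
# so the 10 x 10 matrix cannot be inverted.
X_wide = rng.normal(size=(5, 10))
gram_rank = np.linalg.matrix_rank(X_wide.T @ X_wide)
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice; it does not help in the rank-deficient case, where some form of regularization is needed.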

## Regularization

* Eigenvalue structure (Ledoit-Wolf, condition number)

* Sparsity pattern (graphical models)

* Structural assumptions (banding, tapering, low-rank)

All of these stabilize the estimate by accepting some bias in exchange for lower variance (bias-variance trade-off)

## Eigen-structure regularization

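* Ledoit-Wolf Estimator (Ledoit and Wolf, 2004): shrink $\mathbf{S}$ toward a scaled identity
$$
\widehat{\mathbf{\Sigma}}=\alpha \mu I+(1-\alpha) \mathbf{S}, \quad \mu=\operatorname{tr}(\mathbf{S}) / p, \quad \alpha \in[0,1]
$$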

* Condition number regularization (Won et al., 2009)
$$
\begin{array}{ll}
\operatorname{maximize} & l(\Sigma) \\
\text { subject to } & \operatorname{cond}(\Sigma) \leq \kappa_{\max },
\end{array}
$$  
    where $l(\Sigma)$ is the Gaussian log-likelihood

* Eigenvector regularization (sparse PCA, SVD, etc.)
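
The first two ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the published estimators: the shrinkage weight `alpha` is picked by hand (Ledoit and Wolf derive a data-driven choice), the shrinkage target is the identity scaled by the average eigenvalue of $\mathbf{S}$, and the eigenvalue clipping uses a hand-picked lower bound `tau` as a crude stand-in for the likelihood-based solution of Won et al. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 20
Y = rng.normal(size=(n, p))
S = np.cov(Y, rowvar=False, bias=True)

def linear_shrinkage(S, alpha):
    """Convex combination of a scaled identity and S; alpha in [0, 1] is hand-picked."""
    mu = np.trace(S) / S.shape[0]          # average eigenvalue of S
    return alpha * mu * np.eye(S.shape[0]) + (1 - alpha) * S

def clip_condition_number(S, kappa_max, tau):
    """Truncate eigenvalues of S to [tau, kappa_max * tau], keeping the eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(S)
    clipped = np.clip(eigvals, tau, kappa_max * tau)
    return (eigvecs * clipped) @ eigvecs.T  # V diag(clipped) V'

Sigma_shrunk = linear_shrinkage(S, alpha=0.3)
Sigma_clipped = clip_condition_number(S, kappa_max=10.0, tau=0.2)

cond_S = np.linalg.cond(S)
cond_shrunk = np.linalg.cond(Sigma_shrunk)
cond_clipped = np.linalg.cond(Sigma_clipped)
```

Both operations leave the eigenvectors of $\mathbf{S}$ unchanged and only move its eigenvalues: shrinkage pulls them toward their average, and clipping bounds their ratio by `kappa_max`.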

## Sparsity pattern (Graphical Model)

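* The precision matrix $\mathbf{\Theta}=\mathbf{\Sigma}^{-1}$ appears in many situations
* $\mathbf{\Theta}$ can be regularized directly; under Gaussianity, $\theta_{ij}=0$ means variables $i$ and $j$ are conditionally independent given the rest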

* Assume $\boldsymbol{Y}_{1}, \ldots, \boldsymbol{Y}_{n}$ are i.i.d. $N_{p}(\boldsymbol{0}, \boldsymbol{\Sigma})$, with likelihood
$$
L(\boldsymbol{\Sigma})=\frac{1}{(2 \pi)^{n p / 2}|\boldsymbol{\Sigma}|^{n / 2}} \exp \left\{-\frac{1}{2} \sum_{i=1}^{n} \boldsymbol{Y}_{i}^{\prime} \boldsymbol{\Sigma}^{-1} \boldsymbol{Y}_{i}\right\} .
$$

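* With $\mathbf{S}=\frac{1}{n}\sum_{i=1}^{n}\boldsymbol{Y}_{i}\boldsymbol{Y}_{i}^{\prime}$, the log-likelihood equals $\frac{n}{2}\left[\log |\boldsymbol{\Theta}|-\operatorname{tr}(\mathbf{S} \boldsymbol{\Theta})\right]$ plus a constant. Estimate $\boldsymbol{\Theta}$ by maximizing the $\ell_1$-penalized version:
$$
\ell_{P}(\boldsymbol{\Theta})=\log |\boldsymbol{\Theta}|-\operatorname{tr}(\mathbf{S} \boldsymbol{\Theta})-\lambda\|\boldsymbol{\Theta}\|_{1}
$$
    where $\lambda \geq 0$ controls the amount of sparsity and $\|\boldsymbol{\Theta}\|_{1}$ is the elementwise $\ell_1$ norm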
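
To make the objective concrete, the sketch below only evaluates $\ell_{P}$ at two candidate precision matrices; it does not solve the optimization problem. The data are simulated from $N_p(\boldsymbol{0}, I)$, the penalty is taken over all entries of $\boldsymbol{\Theta}$, and the value of `lam` and the function name `penalized_loglik` are assumptions made for illustration.

```python
import numpy as np

def penalized_loglik(Theta, S, lam):
    """log|Theta| - tr(S Theta) - lam * sum of |Theta_ij| (all entries penalized)."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return -np.inf                      # log-determinant is undefined when det <= 0
    return logdet - np.trace(S @ Theta) - lam * np.abs(Theta).sum()

rng = np.random.default_rng(5)
n, p = 200, 5
Y = rng.normal(size=(n, p))
S = Y.T @ Y / n                             # mean-zero model, so no centering

Theta_mle = np.linalg.inv(S)                # maximizer when lam = 0 and S is invertible
Theta_diag = np.diag(1.0 / np.diag(S))      # a sparse candidate: no off-diagonal entries

obj_mle_unpenalized = penalized_loglik(Theta_mle, S, lam=0.0)
obj_diag_unpenalized = penalized_loglik(Theta_diag, S, lam=0.0)
obj_mle_penalized = penalized_loglik(Theta_mle, S, lam=0.5)
obj_diag_penalized = penalized_loglik(Theta_diag, S, lam=0.5)
```

Without the penalty, $\mathbf{S}^{-1}$ scores at least as high as any other candidate. With the penalty switched on, the many small off-diagonal entries of $\mathbf{S}^{-1}$ become costly, so a sparse candidate can score higher, which is how the $\ell_1$ term encourages zeros in $\boldsymbol{\Theta}$.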

Reference: [High‐Dimensional Covariance Estimation](https://ucsb-primo.hosted.exlibrisgroup.com/permalink/f/1egv95m/01UCSB_ALMA51276966020003776) by Mohsen Pourahmadi