Skip to content

dsyoonstat/argpca

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

argpca: Adaptive Reference-Guided PCA for High-Dimension, Low-Sample Size Data

argpca is a Python implementation of Adaptive Reference-Guided PCA (ARG-PCA),
a method designed for high-dimension, low-sample-size (HDLSS) data when prior information about the true PC subspace is available, proposed in Yoon and Jung (2025).


Why ARG-PCA?

Classical PCA performs poorly in HDLSS settings: the sample PC subspace is inconsistent,
and the principal angles between the sample and true PC subspaces converge to a non-zero
random limit. When prior information is available---such as domain-specific directions
known (or believed) to be aligned with the true PC subspace---ARG-PCA leverages this to
improve estimation accuracy. A representative example is the normalized vector of ones,
often used in financial applications to reflect the common market factor,
as in the capital asset pricing model. The ARG PC subspace estimator asymptotically
outperforms the naive PCA based estimator. ARG-PCA is built based on ARG PC subspace estimator.


Features

  • ARG PC subspace estimator (compute_arg_pc_subspace)
  • ARG-PCA (ARGPCA) with a scikit-learn–compatible API
  • Fast PCA for HDLSS settings via Gram-matrix eigen-decomposition
  • Simulation and real-data examples reproducing the empirical results in Yoon and Jung (2025)

Installation

Clone the repository and install dependencies:

git clone https://github.com/dsyoonstat/argpca.git
cd argpca
pip install ".[examples]"

Quick Start

ARG-PCA

import numpy as np
from argpca.pca import ARGPCA

# X: (n, p) data matrix
# V: (r, p) reference vectors (rows)
X = ...
V = ...

model = ARGPCA(n_components=2)
model.fit(X, reference_vectors=V)

scores     = model.transform(X)     # (n, 2)
components = model.components_      # (2, p)

ARG PC subspace estimator

from argpca.pca import compute_arg_pc_subspace

U_arg = compute_arg_pc_subspace(
    samples=X,
    reference_vectors=V,
    n_components=2,
)

Public API

from argpca import ARGPCA, compute_arg_pc_subspace

# advanced / low-level utilities
from argpca import utils
# or, more explicitly:
from argpca.utils import GramSpectrum, compute_gram_spectrum, recover_spike_directions

Repository Structure

argpca/
 ├── src/argpca/
 │    ├── pca.py               # ARGPCA + subspace logic
 │    ├── utils.py             # GramSpectrum, Gram PCA utilities
 │    └── __init__.py
 │
 ├── examples/
 │    ├── simulation/          # Monte Carlo experiments
 │    │     ├── simulation.py
 │    │     ├── dgps.py
 │    │     └── metrics.py
 │    ├── realdata/            # NASDAQ 2024-12 analysis
 │    │     ├── realdata.py
 │    │     └── *.csv
 │
 ├── tests/                    # pytest unit tests
 └── pyproject.toml

Running the Simulations

To reproduce the simulation results from of the paper, run:

python examples/simulation/simulation.py

Results are stored under:

examples/simulation/results/

Real Data Example (NASDAQ 2024-12)

To reproduce the real data analysis results of the paper, run:

python examples/realdata/realdata.py

This script generates a scatter plot comparing:

  • ARGPCA PC1–PC2 scores
  • Standard PCA PC1–PC2 scores

Saved as:

examples/realdata/nasdaq_2024_12_pc_scores.png

For reference vectors, normalized vector of ones and 2024 mean log-returns were used.


License

This project is released under the MIT License.


Citation

If you use this package in academic work, please cite:

@article{Yoon2025,
  author    = {Yoon, Dongsun and Jung, Sungkyu},
  title     = {Adaptive Reference-Guided Estimation of Principal Component Subspace in High Dimensions},
  journal   = {Stat},
  volume    = {14},
  number    = {3},
  pages     = {e70081},
  year      = {2025},
  doi       = {10.1002/sta4.70081},
  url       = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.70081}
}

Contributing

Pull requests and bug reports are welcome.
Please use GitHub Issues for questions or feature requests.

About

Adaptive Reference-Guided Principal Component Analysis

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages