This repository provides a Probabilistic Principal Component Analysis (PPCA) model built with Google's JAX library. The model is a robust feature extraction and dimensionality reduction technique for high-dimensional, sparse multivariate data.
PPCA is a probabilistic approach to Principal Component Analysis (PCA), which allows for imputing missing values and estimating latent features in the data. By leveraging the power of JAX, this implementation ensures efficient and scalable computation, making it suitable for large-scale financial datasets.
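As a brief reminder of what "probabilistic" means here, the standard PPCA formulation (Tipping and Bishop) can be written as below. The symbols are the conventional ones from the literature and are not taken from this repository's code; $q$ is the latent dimension and $d$ the number of observed features.

```latex
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q), \qquad
\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d), \\
\mathbf{x} &= \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}
\;\;\Longrightarrow\;\;
\mathbf{x} \sim \mathcal{N}\!\left(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\top} + \sigma^2 \mathbf{I}_d\right), \\
\mathbf{z} \mid \mathbf{x} &\sim \mathcal{N}\!\left(\mathbf{M}^{-1}\mathbf{W}^{\top}(\mathbf{x} - \boldsymbol{\mu}),\; \sigma^2 \mathbf{M}^{-1}\right),
\qquad \mathbf{M} = \mathbf{W}^{\top}\mathbf{W} + \sigma^2 \mathbf{I}_q .
\end{aligned}
```

The posterior mean of $\mathbf{z}$ gives the latent features. Because the model defines a full Gaussian density over $\mathbf{x}$, missing entries can be imputed from their conditional expectation given the observed entries.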
The methodology used in this project was initially proposed in our research manuscript titled "Probabilistic PCA in High Dimensions: Stochastic Dimensionality Reduction on Sparse Multivariate Assets' Bars at High-Risk Regimes". This work presents a novel approach for analyzing portfolio behavior during periods of high market turbulence and risk by:
- Using information-driven bar techniques to synchronize and sample imbalanced sequence volumes.
- Applying an event-based sampling technique, the CUSUM filter, to create strategic trading plans based on volatility.
- Employing an improved version of the Gaussian Linear System called PPCA for feature extraction from the latent space.
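The event-based sampling step in the list above can be illustrated with a symmetric cumulative-sum (CUSUM) filter. This is a minimal sketch that assumes raw price differences and a fixed threshold; the manuscript may use returns and a volatility-based threshold instead, and `cusum_filter` is a hypothetical helper that is not part of ppcax.

```python
import numpy as np


def cusum_filter(values, threshold):
    """Return indices at which the cumulative up or down move exceeds `threshold`.

    Hypothetical helper for illustration only; not part of the ppcax package.
    """
    events = []
    s_pos, s_neg = 0.0, 0.0
    diffs = np.diff(np.asarray(values, dtype=float))
    # Index i refers to the position in `values` where the move completed.
    for i, d in enumerate(diffs, start=1):
        s_pos = max(0.0, s_pos + d)  # running sum of upward drift, floored at 0
        s_neg = min(0.0, s_neg + d)  # running sum of downward drift, capped at 0
        if s_pos > threshold:
            s_pos = 0.0              # reset after an upward event
            events.append(i)
        elif s_neg < -threshold:
            s_neg = 0.0              # reset after a downward event
            events.append(i)
    return events


# Illustrative price series (made up for this example).
prices = [100.0, 100.5, 101.2, 101.0, 99.8, 99.0, 99.1, 100.6]
events = cusum_filter(prices, threshold=1.0)
print(events)  # [2, 4, 7]
```

Only the observations at the returned indices would be kept for downstream modelling, so quiet periods contribute few samples and turbulent periods contribute many.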
Our findings suggest that PPCA is highly effective in estimating sparse data and forecasting the effects of individual assets within a portfolio under varying market conditions. This repository contains the core implementation of the PPCA model, demonstrating its capability to establish significant relationships among correlated assets during high-risk regimes.
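For readers unfamiliar with how a PPCA model is fitted, the following is a minimal NumPy sketch of the closed-form maximum-likelihood solution from Tipping and Bishop's formulation, assuming fully observed data. It is not this repository's JAX implementation; the function name `ppca_closed_form` and its return values are illustrative assumptions.

```python
import numpy as np


def ppca_closed_form(X, q):
    """Closed-form maximum-likelihood PPCA for complete data (illustrative only).

    X has shape (n_samples, n_features); q must be smaller than n_features.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / n                        # sample covariance (d x d)

    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loadings: top-q eigenvectors scaled by sqrt(lambda_i - sigma2).
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))

    # Posterior mean of the latent variables: M^{-1} W^T (x - mu).
    M = W.T @ W + sigma2 * np.eye(q)
    Z = np.linalg.solve(M, W.T @ Xc.T).T
    return W, mu, sigma2, Z


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # synthetic data for the example
W, mu, sigma2, Z = ppca_closed_form(X, q=3)
print(W.shape, Z.shape)                      # (10, 3) (200, 3)
```

The closed form only applies when every entry of `X` is observed. With missing values, an iterative scheme such as expectation-maximization is the usual alternative.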
.
├── LICENSE
├── README.md
├── config
├── data
│   ├── bars
│   ├── metadata
│   ├── sample
│   │   ├── r1
│   │   └── r2
│   └── tickers
├── models
├── notebooks
├── pyproject.toml
├── reports
│   ├── docs
│   ├── eval
│   ├── figures
│   └── train
├── src
│   ├── __init__.py
│   ├── eval
│   ├── ft_eng
│   ├── ppcax
│   │   ├── __init__.py
│   │   └── _ppcax.py
│   ├── preprocessing
│   └── utils
└── tests
    ├── __init__.py
    ├── gen_data.py
    └── test_ppcax.py
25 directories, 17 files
- Python: Ensure you have Python 3.10 or newer installed on your system.
- Clone the Repository
  git clone https://github.com/AI-Ahmed/ppcax.git
  cd ppcax
- Install Flit
  If you don't already have Flit installed, install it using pip:
  pip install flit
- Install the Package and Dependencies
  Install the package along with its dependencies using Flit:
  flit install --deps develop
  This command installs the ppcax package along with all required dependencies, including development and testing tools like pytest and flake8.
If you prefer to install the package directly from GitHub without cloning the repository:
pip install git+https://github.com/AI-Ahmed/ppcax
This command installs the latest version of ppcax from the main branch.
After installation, you can import the PPCA model in your Python code:
from ppcax import PPCA
To run the unit tests and ensure everything is working correctly:
- Navigate to the Project Directory
  If you haven't already, navigate to the project's root directory:
  cd ppcax
- Run Tests Using pytest
  pytest tests/test_ppcax.py
Here's a simple example of how to use the PPCA class:
import numpy as np
from ppcax import PPCA
# Generate some sample data
data = np.random.rand(100, 1000)
# Create a PPCA model instance
ppca_model = PPCA(q=150)
# Fit the model to the data
ppca_model.fit(data, use_em=True)
# Transform the data to the lower-dimensional space
transformed_data = ppca_model.transform(lower_dim_only=True)
print("Transformed Data Shape:", transformed_data.shape)
This project is licensed under the Apache License 2.0, which is a permissive open-source license that grants users extensive rights to use, modify, and distribute the software. See the LICENSE file for more details.
If you find this work useful in your research, please consider citing:
@article{Atwa2024,
author = {Ahmed Atwa and Ahmed Sedky and Mohamed Kholief},
title = {Probabilistic PCA in High Dimensions: Stochastic Dimensionality Reduction on Sparse Multivariate Assets' Bars at High-Risk Regimes},
journal = {SSRN Electronic Journal},
year = {2024},
note = {Available at SSRN: \url{https://ssrn.com/abstract=4874874} or \url{http://dx.doi.org/10.2139/ssrn.4874874}}
}
If you're planning to contribute to the project or modify the code, follow these steps to set up your development environment:
- Clone the Repository
  git clone https://github.com/AI-Ahmed/ppcax.git
  cd ppcax
- Create a Virtual Environment
  It's recommended to use a virtual environment to manage dependencies:
  python -m venv venv
  source venv/bin/activate  # On Windows use: venv\Scripts\activate
- Install Flit
  pip install flit
- Install the Package in Editable Mode
  For development and testing, install the package with the test extras:
  flit install --deps develop --extras test --symlink
  The --symlink option installs the package in editable mode, so changes to the code are immediately reflected without reinstallation.
- Install Pre-commit Hooks (Optional)
  If you use pre-commit for code formatting and linting:
  pip install pre-commit
  pre-commit install
- Run Tests
  pytest tests/test_ppcax.py
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For any questions or inquiries, please contact Ahmed Nabil Atwa.
Refer to the CHANGELOG for details on updates and changes to the project.
To publish a new version of the package to PyPI:
- Update the Version Number
  Increment the version number in pyproject.toml.
- Build the Package
  flit build
- Publish to PyPI
  flit publish
- Documentation: GitHub Package documentation
- Issue Tracker: GitHub Issues
- Source Code: GitHub Repository