GitHub - MaxHalford/prince: :crown: Multivariate exploratory data analysis in Python

Prince is a Python library for multivariate exploratory data analysis in Python. It includes a variety of methods for summarizing tabular data, including principal component analysis (PCA) and correspondence analysis (CA). Prince provides efficient implementations, using a scikit-learn API.

I made Prince when I was at university, back in 2016. I spent a significant amount of time in 2022 to revamp the entire package. It is thoroughly tested and supports many features, such as supplementary row/columns, as well as row/column weights.

Example usage

>>> import prince

>>> dataset = prince.datasets.load_decathlon()
>>> decastar = dataset.query('competition == "Decastar"')

>>> pca = prince.PCA(n_components=5)
>>> pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])
>>> pca.eigenvalues_summary
          eigenvalue % of variance % of variance (cumulative)
component
0              3.114        31.14%                     31.14%
1              2.027        20.27%                     51.41%
2              1.390        13.90%                     65.31%
3              1.321        13.21%                     78.52%
4              0.861         8.61%                     87.13%

>>> pca.transform(dataset).tail()
component                       0         1         2         3         4
competition athlete
OlympicG    Lorenzo      2.070933  1.545461 -1.272104 -0.215067 -0.515746
            Karlivans    1.321239  1.318348  0.138303 -0.175566 -1.484658
            Korkizoglou -0.756226 -1.975769  0.701975 -0.642077 -2.621566
            Uldal        1.905276 -0.062984 -0.370408 -0.007944 -2.040579
            Casarsa      2.282575 -2.150282  2.601953  1.196523 -3.571794

>>> chart = pca.plot(dataset)

This chart is interactive, which doesn't show on GitHub. The green points are the column loadings.

>>> chart = pca.plot(
...     dataset,
...     show_row_labels=True,
...     show_row_markers=False,
...     row_labels_column='athlete',
...     color_rows_by='competition'
... )

Installation

pip install prince

🎨 Prince uses Altair for making charts.

Methods

flowchart TD
    cat?(Categorical data?) --> |"✅"| num_too?(Numerical data too?)
    num_too? --> |"✅"| FAMD
    num_too? --> |"❌"| multiple_cat?(More than two columns?)
    multiple_cat? --> |"✅"| MCA
    multiple_cat? --> |"❌"| CA
    cat? --> |"❌"| groups?(Groups of columns?)
    groups? --> |"✅"| MFA
    groups? --> |"❌"| shapes?(Analysing shapes?)
    shapes? --> |"✅"| GPA
    shapes? --> |"❌"| PCA

Principal component analysis (PCA)

Correspondence analysis (CA)

Multiple correspondence analysis (MCA)

Multiple factor analysis (MFA)

Factor analysis of mixed data (FAMD)

Generalized procrustes analysis (GPA)

Correctness

Prince is tested against scikit-learn and FactoMineR. For the latter, rpy2 is used to run code in R, and convert the results to Python, which allows running automated tests. See more in the tests directory.

Citation

Please use this citation if you use this software as part of a scientific publication.

@software{Halford_Prince,
    author = {Halford, Max},
    license = {MIT},
    title = {{Prince}},
    url = {https://github.com/MaxHalford/prince}
}

License

The MIT License (MIT). Please see the license file for more information.

Name		Name	Last commit message	Last commit date
Latest commit History 455 Commits
.github		.github
docs		docs
prince		prince
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
.pylintrc		.pylintrc
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

Example usage

Installation

Methods

Principal component analysis (PCA)

Correspondence analysis (CA)

Multiple correspondence analysis (MCA)

Multiple factor analysis (MFA)

Factor analysis of mixed data (FAMD)

Generalized procrustes analysis (GPA)

Correctness

Citation

License

About

Uh oh!

Sponsor this project

Uh oh!

Uh oh!

Contributors 13

Uh oh!

Languages

Uh oh!

License

MaxHalford/prince

Folders and files

Latest commit

History

Repository files navigation

Example usage

Installation

Methods

Principal component analysis (PCA)

Correspondence analysis (CA)

Multiple correspondence analysis (MCA)

Multiple factor analysis (MFA)

Factor analysis of mixed data (FAMD)

Generalized procrustes analysis (GPA)

Correctness

Citation

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Sponsor this project

Uh oh!

Uh oh!

Contributors 13

Uh oh!

Languages