+++
title = "Correspondence analysis"
menu = "main"
weight = 2
toc = true
aliases = ["ca"]
+++

## Resources

- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf) has all the equations.
- [Correspondence analysis](https://cedric.cnam.fr/fichiers/art_3066.pdf) by Hervé Abdi and Michael Béra is great too, although it doesn't only cover CA.
- [L’Analyse Factorielle des Correspondances (AFC)](https://marie-chavent.perso.math.cnrs.fr/wp-content/uploads/2013/10/AFC.pdf) by Marie Chavent is short and sweet.

## Data

You can use correspondence analysis when you have a contingency table, that is, when you want to analyse the dependency between two categorical variables. For instance, the following dataset counts the number of voters per region for each candidate in the 2022 French presidential election.

In [1]:
import prince

dataset = prince.datasets.load_french_elections()
dataset[['Le Pen', 'Macron', 'Mélenchon', 'Abstention']].head()

| region | Le Pen | Macron | Mélenchon | Abstention |
|---|---|---|---|---|
| Auvergne-Rhône-Alpes | 943294 | 1175085 | 897434 | 1228490 |
| Bourgogne-Franche-Comté | 409639 | 394117 | 277899 | 456682 |
| Bretagne | 385393 | 647172 | 407527 | 543425 |
| Centre-Val de Loire | 347845 | 383851 | 251259 | 459528 |
| Corse | 42283 | 26795 | 19779 | 90636 |


☝️ *This dataset is already available as a contingency matrix. It's more common to start from a flat dataset. In that case, a contingency matrix can be obtained with the `pivot_table` function in `pandas`.*
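
As a minimal sketch of that conversion, assume a flat dataset with one row per (region, candidate) pair and a count column. The column name `votes` and the variable names are hypothetical; the counts are copied from the table above.

```python
import pandas as pd

# Hypothetical flat dataset: one row per (region, candidate) pair with a count.
flat = pd.DataFrame({
    "region": ["Bretagne", "Bretagne", "Corse", "Corse"],
    "candidate": ["Macron", "Le Pen", "Macron", "Le Pen"],
    "votes": [647172, 385393, 26795, 42283],
})

# Regions become rows, candidates become columns, cells hold the summed counts.
contingency = flat.pivot_table(
    index="region",
    columns="candidate",
    values="votes",
    aggfunc="sum",
    fill_value=0,
)
print(contingency)
```

If the flat dataset instead has one row per individual observation, `pd.crosstab(flat["region"], flat["candidate"])` counts the co-occurrences directly.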

## Fitting

In [2]:
ca = prince.CA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
ca = ca.fit(dataset)
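
For intuition about what fitting involves, here is a sketch of the textbook formulation of CA (the one laid out in the first resource above): a singular value decomposition of the standardized residuals of the contingency table. It is written with plain `numpy` on the five rows shown earlier and is not necessarily how `prince` implements it. All variable names are illustrative.

```python
import numpy as np

# Counts from the table above (columns: Le Pen, Macron, Mélenchon, Abstention).
N = np.array([
    [943294, 1175085, 897434, 1228490],  # Auvergne-Rhône-Alpes
    [409639, 394117, 277899, 456682],    # Bourgogne-Franche-Comté
    [385393, 647172, 407527, 543425],    # Bretagne
    [347845, 383851, 251259, 459528],    # Centre-Val de Loire
    [42283, 26795, 19779, 90636],        # Corse
], dtype=float)

P = N / N.sum()      # correspondence matrix (relative frequencies)
r = P.sum(axis=1)    # row masses
c = P.sum(axis=0)    # column masses

# Standardized residuals: departure from independence, scaled by the masses.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
eigenvalues = sv ** 2  # principal inertias, one per axis

# Principal coordinates of the rows and of the columns.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
```

The last axis is trivial (its eigenvalue is numerically zero), and the sign of each axis is arbitrary, so coordinates may be flipped relative to another implementation. Because only a subset of the data is used here, the numbers will not match the outputs in the following sections.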

## Eigenvalues

In [3]:
ca.eigenvalues_summary

| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 0.021 | 40.82% | 40.82% |
| 1 | 0.018 | 36.15% | 76.97% |
| 2 | 0.005 | 10.08% | 87.04% |
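
To make the percentages concrete: in the standard formulation, the eigenvalues sum to the total inertia, which equals the chi-square statistic of the table divided by the grand total, and each percentage is an eigenvalue's share of that total. The sketch below checks this with `numpy` on three rows of the dataset; it assumes this standard definition and uses illustrative variable names.

```python
import numpy as np

# Counts for three regions (columns: Le Pen, Macron, Mélenchon, Abstention).
N = np.array([
    [943294, 1175085, 897434, 1228490],  # Auvergne-Rhône-Alpes
    [385393, 647172, 407527, 543425],    # Bretagne
    [42283, 26795, 19779, 90636],        # Corse
], dtype=float)
n = N.sum()

# Chi-square statistic against the independence model.
expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
chi2 = ((N - expected) ** 2 / expected).sum()
total_inertia = chi2 / n

# Eigenvalues are the squared singular values of the standardized residuals.
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
eigenvalues = np.linalg.svd(S, compute_uv=False) ** 2

# Share of the total inertia carried by each axis.
share = eigenvalues / total_inertia
print(share.round(4))
```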


## Coordinates

In [4]:
ca.row_coordinates(dataset).head()

| region | 0 | 1 | 2 |
|---|---|---|---|
| Auvergne-Rhône-Alpes | -0.058638 | 0.038303 | 0.000937 |
| Bourgogne-Franche-Comté | -0.070815 | -0.077604 | -0.016357 |
| Bretagne | -0.083655 | 0.110491 | -0.058991 |
| Centre-Val de Loire | -0.024624 | -0.055799 | -0.046167 |
| Corse | 0.12737 | -0.281755 | 0.279328 |


In [5]:
ca.column_coordinates(dataset).head()

| candidate | 0 | 1 | 2 |
|---|---|---|---|
| Arthaud | -0.034732 | -0.091291 | -0.122722 |
| Dupont-Aignan | -0.094708 | -0.064696 | -0.023546 |
| Hidalgo | -0.137897 | 0.052846 | 0.101351 |
| Jadot | -0.126228 | 0.188836 | -0.031329 |
| Lassalle | -0.271867 | -0.091407 | 0.365112 |


## Visualization

In [6]:
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=True,
    show_column_markers=True,
    show_row_labels=False,
    show_column_labels=False
)

In [7]:
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=False,
    show_column_markers=False,
    show_row_labels=False,
    show_column_labels=True
)

## Contributions

In [8]:
ca.row_contributions_.head().style.format('{:.0%}')

| region | 0 | 1 | 2 |
|---|---|---|---|
| Auvergne-Rhône-Alpes | 2% | 1% | 0% |
| Bourgogne-Franche-Comté | 1% | 1% | 0% |
| Bretagne | 2% | 4% | 4% |
| Centre-Val de Loire | 0% | 1% | 2% |
| Corse | 0% | 2% | 8% |


In [9]:
ca.column_contributions_.head().style.format('{:.0%}')

| candidate | 0 | 1 | 2 |
|---|---|---|---|
| Arthaud | 0% | 0% | 1% |
| Dupont-Aignan | 1% | 0% | 0% |
| Hidalgo | 1% | 0% | 3% |
| Jadot | 3% | 7% | 1% |
| Lassalle | 8% | 1% | 61% |
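
A contribution tells how much of an axis's inertia is due to a given row or column. Under the usual definition (assumed here, not taken from the `prince` source), the contribution of row `i` to axis `k` is its mass times its squared coordinate, divided by the axis eigenvalue, so the contributions to each axis sum to 100%. A small `numpy` sketch on a subset of the data, with illustrative names:

```python
import numpy as np

# Counts for four regions (columns: Le Pen, Macron, Mélenchon).
N = np.array([
    [943294, 1175085, 897434],  # Auvergne-Rhône-Alpes
    [409639, 394117, 277899],   # Bourgogne-Franche-Comté
    [385393, 647172, 407527],   # Bretagne
    [42283, 26795, 19779],      # Corse
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

k = min(N.shape) - 1  # number of non-trivial axes
eigenvalues = sv[:k] ** 2
row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]

# Contribution of each row to each axis: mass * coordinate^2 / eigenvalue.
row_contrib = r[:, None] * row_coords ** 2 / eigenvalues
print(row_contrib.round(3))
```

Column contributions follow the same pattern, with the column masses `c` and the column coordinates in place of their row counterparts.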


## Cosine similarities

In [10]:
ca.row_cosine_similarities(dataset).head()

| region | 0 | 1 | 2 |
|---|---|---|---|
| Auvergne-Rhône-Alpes | 0.568331 | 0.2425 | 0.000145 |
| Bourgogne-Franche-Comté | 0.365626 | 0.439086 | 0.019507 |
| Bretagne | 0.212706 | 0.371061 | 0.105772 |
| Centre-Val de Loire | 0.076356 | 0.392078 | 0.268406 |
| Corse | 0.066825 | 0.327001 | 0.321391 |


In [11]:
ca.column_cosine_similarities(dataset).head()

| candidate | 0 | 1 | 2 |
|---|---|---|---|
| Arthaud | 0.024619 | 0.170088 | 0.307375 |
| Dupont-Aignan | 0.305277 | 0.142452 | 0.018869 |
| Hidalgo | 0.292428 | 0.042947 | 0.157968 |
| Jadot | 0.265642 | 0.5945 | 0.016364 |
| Lassalle | 0.30704 | 0.034709 | 0.553774 |
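
A cosine similarity (often written cos²) measures how well an axis represents a given row or column. Under the usual definition (assumed here), it is the squared coordinate on that axis divided by the squared distance of the point to the origin across all axes. The values for one point sum to 1 over all axes, which is why they sum to less than 1 when only a few components are kept, as in the tables above. A `numpy` sketch on a subset of the data, with illustrative names:

```python
import numpy as np

# Counts for four regions (columns: Le Pen, Macron, Mélenchon, Abstention).
N = np.array([
    [943294, 1175085, 897434, 1228490],  # Auvergne-Rhône-Alpes
    [409639, 394117, 277899, 456682],    # Bourgogne-Franche-Comté
    [385393, 647172, 407527, 543425],    # Bretagne
    [42283, 26795, 19779, 90636],        # Corse
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Row coordinates on every axis (the last one is numerically zero).
row_coords = (U * sv) / np.sqrt(r)[:, None]

# Squared coordinate on each axis over the squared distance to the origin.
squared = row_coords ** 2
row_cos2 = squared / squared.sum(axis=1, keepdims=True)

# Quality of representation when only the first two axes are kept.
quality_2d = row_cos2[:, :2].sum(axis=1)
print(quality_2d.round(3))
```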
