+++
title = "Multiple factor analysis"
menu = "main"
weight = 4
toc = true
aliases = ["mfa"]
+++

## Resources

- [*Multiple Factor Analysis* by Hervé Abdi](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)
- [*Multiple Factor Analysis: main features and application to sensory data* by Jérôme Pagès](http://factominer.free.fr/more/PagesAFM.pdf)

## Data

Multiple factor analysis (MFA) is meant to be used when you have groups of variables. In practice, it builds a PCA on each group. It then fits a global PCA on the results of the so-called partial PCAs.
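
The two-step procedure above can be sketched from scratch with NumPy. This is a minimal sketch, not prince's implementation. It assumes the standard MFA weighting described in the Abdi and Pagès papers listed in the resources: each group is divided by the first singular value of its partial PCA, so that no single group dominates the global PCA. The function name `mfa_sketch` and the synthetic data are hypothetical.

```python
import numpy as np


def mfa_sketch(X, groups, n_components=2):
    """Hypothetical from-scratch MFA using the standard group weighting.

    X: 2-D array, one row per individual and one column per variable.
    groups: dict mapping a group name to a list of column indices.
    """
    n = X.shape[0]
    # Standardise every column (zero mean, unit variance)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    weighted_blocks = []
    partial_first_eigenvalues = {}
    for name, cols in groups.items():
        block = Z[:, cols]
        # Partial PCA: the squared singular values of block / sqrt(n)
        # are the eigenvalues of the group's correlation matrix
        s = np.linalg.svd(block / np.sqrt(n), compute_uv=False)
        partial_first_eigenvalues[name] = s[0] ** 2
        # Divide by the first singular value so the block's largest
        # eigenvalue becomes 1 and no group dominates the global PCA
        weighted_blocks.append(block / s[0])

    # Global PCA on the concatenation of the weighted blocks
    W = np.hstack(weighted_blocks)
    _, s, Vt = np.linalg.svd(W / np.sqrt(n), full_matrices=False)
    eigenvalues = s[:n_components] ** 2
    row_coordinates = W @ Vt[:n_components].T
    return eigenvalues, row_coordinates, partial_first_eigenvalues


# Synthetic data: 14 rows and three groups of three variables each
rng = np.random.default_rng(0)
X = rng.normal(size=(14, 9))
groups = {"g1": [0, 1, 2], "g2": [3, 4, 5], "g3": [6, 7, 8]}
eigenvalues, coords, partial = mfa_sketch(X, groups)
```

Every weighted group has a largest eigenvalue of 1. As a result, the first global eigenvalue always lies between 1 and the number of groups.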

The dataset used in the following example contains Premier League results for the 2021-22, 2022-23 and 2023-24 seasons. Each season is a group of six variables: wins (W), draws (D), losses (L), goals for (GF), goals against (GA) and points (Pts). We thus want to consider each season separately whilst also having a global overview of each team. MFA is the perfect fit for this kind of situation.

```python
import prince

dataset = prince.datasets.load_premier_league()
dataset
```

| Team | 2021-22 W | 2021-22 D | 2021-22 L | 2021-22 GF | 2021-22 GA | 2021-22 Pts | 2022-23 W | 2022-23 D | 2022-23 L | 2022-23 GF | 2022-23 GA | 2022-23 Pts | 2023-24 W | 2023-24 D | 2023-24 L | 2023-24 GF | 2023-24 GA | 2023-24 Pts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arsenal | 22 | 3 | 13 | 61 | 48 | 69 | 26 | 6 | 6 | 88 | 43 | 84 | 28 | 5 | 5 | 91 | 29 | 89 |
| Aston Villa | 13 | 6 | 19 | 52 | 54 | 45 | 18 | 7 | 13 | 51 | 46 | 61 | 20 | 8 | 10 | 76 | 61 | 68 |
| Brentford | 13 | 7 | 18 | 48 | 56 | 46 | 15 | 14 | 9 | 58 | 46 | 59 | 10 | 9 | 19 | 56 | 65 | 39 |
| Brighton & Hove Albion | 12 | 15 | 11 | 42 | 44 | 51 | 18 | 8 | 12 | 72 | 53 | 62 | 12 | 12 | 14 | 55 | 62 | 48 |
| Chelsea | 21 | 11 | 6 | 76 | 33 | 74 | 11 | 11 | 16 | 38 | 47 | 44 | 18 | 9 | 11 | 77 | 63 | 63 |
| Crystal Palace | 11 | 15 | 12 | 50 | 46 | 48 | 11 | 12 | 15 | 40 | 49 | 45 | 13 | 10 | 15 | 57 | 58 | 49 |
| Everton | 11 | 6 | 21 | 43 | 66 | 39 | 8 | 12 | 18 | 34 | 57 | 36 | 13 | 9 | 16 | 40 | 51 | 40 |
| Liverpool | 28 | 8 | 2 | 94 | 26 | 92 | 19 | 10 | 9 | 75 | 47 | 67 | 24 | 10 | 4 | 86 | 41 | 82 |
| Manchester City | 29 | 6 | 3 | 99 | 26 | 93 | 28 | 5 | 5 | 94 | 33 | 89 | 28 | 7 | 3 | 96 | 34 | 91 |
| Manchester United | 16 | 10 | 12 | 57 | 57 | 58 | 23 | 6 | 9 | 58 | 43 | 75 | 18 | 6 | 14 | 57 | 58 | 60 |


```python
import pandas as pd

isinstance(dataset.columns, pd.MultiIndex)
```

```
True
```

## Fitting

The groups are specified by the `groups` argument when calling `fit`.

```python
groups = dataset.columns.levels[0].tolist()
groups
```

```
['2021-22', '2022-23', '2023-24']
```

```python
mfa = prince.MFA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mfa = mfa.fit(
    dataset,
    groups=groups,
    supplementary_groups=None
)
```

There are several ways to specify the groups:

- If the columns of the dataframe are a `MultiIndex`:
   - By default the groups are all the columns in the first level.
   - You can also pass a list with a subset of the columns in the first level.
- You can also pass a dict that maps group names to the desired columns.
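
The dict form is handy when the columns are not a `MultiIndex`. The sketch below uses pandas only and does not call prince. It flattens two seasons of the table above into single-level column names and builds the matching dict. The `"season stat"` naming scheme and the variable names `flat` and `groups` are hypothetical. Passing `flat` to `fit` with `groups=groups` should then be equivalent to the `MultiIndex` route, assuming the dict values are plain column names.

```python
import pandas as pd

# Two rows copied from the Premier League table above; the columns form a
# two-level MultiIndex: (season, statistic)
columns = pd.MultiIndex.from_product(
    [["2021-22", "2022-23"], ["W", "D", "L", "GF", "GA", "Pts"]]
)
df = pd.DataFrame(
    [[22, 3, 13, 61, 48, 69, 26, 6, 6, 88, 43, 84],
     [13, 6, 19, 52, 54, 45, 18, 7, 13, 51, 46, 61]],
    index=["Arsenal", "Aston Villa"],
    columns=columns,
)

# Flatten the MultiIndex into single strings such as "2021-22 W"
flat = df.copy()
flat.columns = [f"{season} {stat}" for season, stat in df.columns]

# Map each group name to the flattened column names that belong to it
groups = {
    season: [f"{season} {stat}" for stat in df[season].columns]
    for season in df.columns.get_level_values(0).unique()
}
print(groups["2022-23"])
```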

The `supplementary_groups` argument is expected to be a list of one or more existing group names.

## Eigenvalues

```python
mfa.eigenvalues_summary
```

| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 2.376 | 59.53% | 59.53% |
| 1 | 0.619 | 15.51% | 75.04% |
| 2 | 0.412 | 10.32% | 85.36% |
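
Each percentage in this table is an eigenvalue divided by the total inertia of the weighted data, so the figures can be checked by hand. The arithmetic below uses only the printed values. The bound on the first eigenvalue assumes prince applies the standard MFA weighting from the papers in the resources, where each group's largest eigenvalue is rescaled to 1.

```python
# Eigenvalues and "% of variance" shares copied from the summary above
eigenvalues = [2.376, 0.619, 0.412]
shares = [0.5953, 0.1551, 0.1032]

# Each share is eigenvalue / total inertia, so every ratio should give back
# the same total inertia (up to rounding in the printed table)
totals = [ev / share for ev, share in zip(eigenvalues, shares)]
print([round(t, 2) for t in totals])  # [3.99, 3.99, 3.99]

# Under the standard MFA weighting every group's largest eigenvalue is 1,
# so the first global eigenvalue must lie between 1 and the number of groups
n_groups = 3
print(1 <= eigenvalues[0] <= n_groups)  # True
```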


## Coordinates

The `MFA` class inherits from the `PCA` class, which means it provides access to the `PCA` methods and properties. For instance, the `row_coordinates` method returns the global coordinates of each team.

```python
mfa.row_coordinates(dataset)
```

| Team | Component 0 | Component 1 | Component 2 |
|---|---|---|---|
| Arsenal | 2.236971 | 1.034584 | 0.697651 |
| Aston Villa | -0.179988 | 0.580297 | 0.463962 |
| Brentford | -1.267447 | 0.696757 | -0.490607 |
| Brighton & Hove Albion | -0.800062 | -0.248918 | -0.904603 |
| Chelsea | 0.000108 | -1.253858 | -0.365442 |
| Crystal Palace | -1.325908 | -0.410853 | -0.809261 |
| Everton | -2.089219 | 0.184291 | 0.55233 |
| Liverpool | 2.063236 | -1.170222 | -0.419547 |
| Manchester City | 3.393773 | -0.160572 | -0.15116 |
| Manchester United | 0.189448 | 0.753614 | -0.007898 |


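However, the other `PCA` methods are not implemented yet; calling them raises a `NotImplementedError`. The `MFA` class also has its own methods, such as `group_row_coordinates`. This method returns the partial coordinates of each team, that is, its position on each component as seen from each group's variables: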

```python
mfa.group_row_coordinates(dataset)
```

| Team | 2021-22 comp. 0 | 2021-22 comp. 1 | 2021-22 comp. 2 | 2022-23 comp. 0 | 2022-23 comp. 1 | 2022-23 comp. 2 | 2023-24 comp. 0 | 2023-24 comp. 1 | 2023-24 comp. 2 |
|---|---|---|---|---|---|---|---|---|---|
| Arsenal | 2.582726 | -0.222694 | 5.302243 | 9.375186 | 8.365184 | -3.088474 | 13.152021 | 3.470683 | 5.617342 |
| Aston Villa | -4.508285 | 6.76279 | 3.360487 | 0.425462 | -0.131248 | 1.388393 | 2.062462 | -0.117726 | 0.459079 |
| Brentford | -4.824699 | 6.831446 | 2.321044 | -0.913798 | 1.65643 | -5.108336 | -8.488562 | -0.666794 | -2.719762 |
| Brighton & Hove Albion | -3.836427 | 0.863534 | -6.755643 | 1.232951 | 0.108793 | 1.357369 | -6.377201 | -3.766425 | -4.755869 |
| Chelsea | 5.327119 | -8.45477 | -3.978688 | -5.636604 | -4.60459 | 0.880535 | 0.3107 | -1.015164 | -1.003918 |
| Crystal Palace | -4.139202 | 1.363018 | -6.617784 | -5.658228 | -4.295532 | 0.216513 | -5.085854 | -1.679302 | -2.682664 |
| Everton | -7.578572 | 11.276745 | 3.99624 | -9.227286 | -8.594615 | 3.749277 | -6.645565 | -0.613473 | -1.545628 |
| Liverpool | 11.734074 | -14.796924 | -1.851494 | 2.978462 | 3.350864 | -2.858228 | 8.447234 | -1.689651 | 0.000318 |
| Manchester City | 12.520592 | -14.730261 | 0.218117 | 12.36563 | 11.578338 | -5.561443 | 13.208791 | 1.34951 | 3.646564 |
| Manchester United | -1.730052 | 2.061911 | -1.45246 | 4.946706 | 4.456987 | -0.769662 | -1.090109 | 1.940399 | 2.133472 |
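
In Abdi's formulation (see the resources above), the global coordinates are the barycentre, i.e. the mean, of the partial coordinates. A quick check on three teams from the two tables above suggests that prince scales the partial coordinates differently. The mean of a team's partial coordinates comes out as a constant multiple of its global coordinate, about 3.74 here. This is an observation drawn from the printed numbers, not documented behaviour.

```python
# Values copied from the row_coordinates and group_row_coordinates tables above
global_coords = {  # team: (component 0, component 1)
    "Arsenal": (2.236971, 1.034584),
    "Everton": (-2.089219, 0.184291),
    "Manchester City": (3.393773, -0.160572),
}
partial_coords = {  # team: one (component 0, component 1) pair per season
    "Arsenal": [(2.582726, -0.222694), (9.375186, 8.365184), (13.152021, 3.470683)],
    "Everton": [(-7.578572, 11.276745), (-9.227286, -8.594615), (-6.645565, -0.613473)],
    "Manchester City": [(12.520592, -14.730261), (12.36563, 11.578338), (13.208791, 1.34951)],
}

# Ratio between the mean of a team's partial coordinates and its global coordinate
ratios = []
for team, global_pair in global_coords.items():
    seasons = partial_coords[team]
    for k, global_value in enumerate(global_pair):
        mean_partial = sum(season[k] for season in seasons) / len(seasons)
        ratios.append(mean_partial / global_value)

print([round(r, 2) for r in ratios])  # every ratio rounds to 3.74
```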


## Visualization

```python
mfa.plot(
    dataset,
    x_component=0,
    y_component=1
)
```

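The first axis explains most of the variance (59.53%). It separates the strongest teams listed above (Manchester City, Arsenal, Liverpool) from the weakest ones (Everton, Crystal Palace, Brentford), so it can be read as an axis of overall performance across the three seasons.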
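
This reading of the first axis can be checked against the raw data. The sketch below uses only the teams and numbers printed above. It correlates each team's first global coordinate with its total points over the three seasons. `pearson` is a small hypothetical helper written out by hand to avoid extra dependencies.

```python
# team: (component 0 coordinate, [Pts 2021-22, Pts 2022-23, Pts 2023-24]),
# copied from the tables above
teams = {
    "Arsenal": (2.236971, [69, 84, 89]),
    "Aston Villa": (-0.179988, [45, 61, 68]),
    "Brentford": (-1.267447, [46, 59, 39]),
    "Brighton & Hove Albion": (-0.800062, [51, 62, 48]),
    "Chelsea": (0.000108, [74, 44, 63]),
    "Crystal Palace": (-1.325908, [48, 45, 49]),
    "Everton": (-2.089219, [39, 36, 40]),
    "Liverpool": (2.063236, [92, 67, 82]),
    "Manchester City": (3.393773, [93, 89, 91]),
    "Manchester United": (0.189448, [58, 75, 60]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


coords = [coord for coord, _ in teams.values()]
total_points = [sum(pts) for _, pts in teams.values()]
print(round(pearson(coords, total_points), 3))
```

A correlation close to 1 supports the "overall performance" reading. This is a sanity check on the rows shown, not a formal test.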

## Partial PCAs

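An MFA is essentially a PCA applied to the outputs of the partial PCAs, since a PCA is first fitted to each group. For instance, here are the variables of the 2022-23 group. The corresponding partial PCA is accessed by indexing the fitted `mfa` with the group name: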

```python
dataset['2022-23']
```

| Team | W | D | L | GF | GA | Pts |
|---|---|---|---|---|---|---|
| Arsenal | 26 | 6 | 6 | 88 | 43 | 84 |
| Aston Villa | 18 | 7 | 13 | 51 | 46 | 61 |
| Brentford | 15 | 14 | 9 | 58 | 46 | 59 |
| Brighton & Hove Albion | 18 | 8 | 12 | 72 | 53 | 62 |
| Chelsea | 11 | 11 | 16 | 38 | 47 | 44 |
| Crystal Palace | 11 | 12 | 15 | 40 | 49 | 45 |
| Everton | 8 | 12 | 18 | 34 | 57 | 36 |
| Liverpool | 19 | 10 | 9 | 75 | 47 | 67 |
| Manchester City | 28 | 5 | 5 | 94 | 33 | 89 |
| Manchester United | 23 | 6 | 9 | 58 | 43 | 75 |


```python
mfa['2022-23'].eigenvalues_summary
```

| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 4.374 | 72.89% | 72.89% |
| 1 | 1.245 | 20.74% | 93.64% |
| 2 | 0.32 | 5.34% | 98.97% |
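
The "% of variance" column of a partial PCA can also be checked by hand. The check assumes the partial PCA runs on standardised variables, which the printed numbers are consistent with. Under that assumption, its total inertia equals the number of variables in the group, six here. The sketch below uses only the printed eigenvalues, so the results match the table up to rounding. The last lines additionally assume the standard MFA weighting from the resources above.

```python
# Eigenvalues of the 2022-23 partial PCA, copied from the summary above
eigenvalues = [4.374, 1.245, 0.32]
n_variables = 6  # W, D, L, GF, GA, Pts

# On standardised variables the total inertia equals the number of variables,
# so "% of variance" should be roughly 100 * eigenvalue / n_variables
shares = [100 * ev / n_variables for ev in eigenvalues]
print([round(s, 2) for s in shares])

# Under the standard MFA weighting this group is divided by sqrt(4.374) before
# the global PCA, which leaves it with a total inertia of 6 / 4.374
group_inertia = n_variables / eigenvalues[0]
print(round(group_inertia, 2))  # 1.37
```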
