We will use two packages, FactoMineR and factoextra: FactoMineR performs the analysis, and factoextra provides ggplot2-based visualization.

In [41]:
library(FactoMineR)
library(ggplot2)
library(factoextra)

We’ll use the demo data set decathlon, which is loaded here from the FactoMineR package (factoextra ships a smaller, 27-row variant named decathlon2).

In [None]:
data(decathlon)
head(decathlon)

We start by subsetting active individuals and active variables for the principal component analysis.

In [None]:
decathlon.active = decathlon[1:23, 1:10]
head(decathlon.active)

The R code below computes a principal component analysis on the active individuals and variables. By default (scale.unit = TRUE), the PCA() function also standardises the data automatically.

In [None]:
res.pca = PCA(decathlon.active, graph = FALSE)
res.pca
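As a rough cross-check of what standardisation implies, the sketch below uses base R only. It assumes the first seven columns of the built-in mtcars data as a stand-in for decathlon.active, so that it runs without extra packages; the names X, eig.manual and pca.base are purely illustrative. A PCA on standardised variables amounts to an eigen-decomposition of the correlation matrix, so both routes should return the same eigenvalues.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]

# Route 1: eigen-decomposition of the correlation matrix
eig.manual <- eigen(cor(X))$values

# Route 2: base R PCA with centring and scaling switched on
pca.base <- prcomp(X, center = TRUE, scale. = TRUE)
eig.prcomp <- pca.base$sdev^2

# Both routes agree, and the eigenvalues sum to the number of variables
print(round(cbind(eig.manual, eig.prcomp), 4))
print(sum(eig.manual))
```

Because each standardised variable has unit variance, the total variance equals the number of variables.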

The eigenvalues measure the amount of variation retained by each principal component. Eigenvalues are large for the first PCs and small for the subsequent PCs. That is, the first PCs correspond to the directions with the maximum amount of variation in the data set.

We examine the eigenvalues to determine the number of principal components to retain. The eigenvalues and the proportion of variance (i.e., information) retained by each principal component (PC) can be extracted using the function get_eigenvalue().

In [None]:
eig.val = get_eigenvalue(res.pca)
eig.val
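The table returned above can be approximated by hand. The sketch below is a base R illustration, again assuming mtcars[, 1:7] as stand-in data; the column names are chosen to mirror those we expect from get_eigenvalue() and are not taken from its source.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]

# Eigenvalues of the correlation matrix = variance carried by each component
eigenvalue <- eigen(cor(X))$values

# Share of the total variance, and its running total, in percent
variance.percent <- 100 * eigenvalue / sum(eigenvalue)
cumulative.variance.percent <- cumsum(variance.percent)

eig.table <- data.frame(eigenvalue, variance.percent, cumulative.variance.percent,
                        row.names = paste0("Dim.", seq_along(eigenvalue)))
print(round(eig.table, 3))
```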

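The scree plot shows how much variance each dimension explains, and therefore which dimensions matter most. As a rule of thumb for standardised data, principal components with eigenvalues > 1 are considered important, because they carry more variance than a single original variable.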

In [None]:
fviz_eig(res.pca, addLabels = TRUE, ylim = c(0,50))
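The two usual selection rules can be written down in a few lines. This is a minimal sketch in base R, assuming mtcars[, 1:7] as stand-in data; the 80% threshold is an arbitrary illustrative choice, not a value prescribed by the packages used here.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
eigenvalue <- eigen(cor(X))$values

# Rule 1: keep components whose eigenvalue exceeds 1
keep.kaiser <- which(eigenvalue > 1)

# Rule 2: keep enough components to reach a chosen share of variance
# (80% is a hypothetical threshold used only for illustration)
cum.pct <- cumsum(100 * eigenvalue / sum(eigenvalue))
n.for.80 <- which(cum.pct >= 80)[1]

cat("Components with eigenvalue > 1:", keep.kaiser, "\n")
cat("Components needed for 80% of variance:", n.for.80, "\n")
```

The two rules need not agree, so it is worth reading them together with the scree plot.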

A simple way to extract the results for variables from a PCA output is the function get_pca_var(). It returns a list of matrices containing all the results for the active variables (coordinates, correlations between variables and axes, squared cosines and contributions).

In [None]:
var = get_pca_var(res.pca)
var

var$coord: the coordinates of the variables, used to draw the scatter plot.

In [None]:
#Co-ordinates
head(var$coord)

var$cos2: represents the quality of representation for variables on the factor map. It’s calculated as the squared coordinates: var.cos2 = var.coord * var.coord.

In [None]:
#Cos2 : quality on the factor map
head(var$cos2)
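The relation cos2 = coord * coord can be checked by hand. The sketch below uses base R and assumes mtcars[, 1:7] as stand-in data; var.coord and var.cos2 are illustrative names, and the sign of each eigenvector is arbitrary, which does not matter once the coordinates are squared.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
e <- eigen(cor(X))

# Variable coordinates: eigenvectors scaled by the square root of the eigenvalues
var.coord <- sweep(e$vectors, 2, sqrt(e$values), "*")
dimnames(var.coord) <- list(colnames(X), paste0("Dim.", seq_len(ncol(X))))

# Quality of representation: squared coordinates
var.cos2 <- var.coord^2

print(round(var.cos2[, 1:2], 3))
# Across all dimensions, the cos2 of each variable sums to 1
print(rowSums(var.cos2))
```

A variable whose cos2 on the first two dimensions is close to 1 is well represented on the factor map.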

var$contrib: contains the contributions (in percentage) of the variables to the principal components. The contribution of a variable (var) to a given principal component is (in percentage) : (var.cos2 * 100) / (total cos2 of the component).

In [None]:
#Contributions to the principal components
head(var$contrib)
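The contribution formula above can be reproduced directly. This is a base R sketch under the same assumption (mtcars[, 1:7] as stand-in data); it also shows that the total cos2 of a component is its eigenvalue.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
e <- eigen(cor(X))

# Squared coordinates (cos2) of the variables on every component
var.cos2 <- sweep(e$vectors, 2, sqrt(e$values), "*")^2
dimnames(var.cos2) <- list(colnames(X), paste0("Dim.", seq_len(ncol(X))))

# Contribution in percent: (cos2 * 100) / (total cos2 of the component)
total.cos2 <- colSums(var.cos2)
var.contrib <- sweep(var.cos2, 2, total.cos2, "/") * 100

print(round(var.contrib[, 1:2], 2))
# Each column of contributions sums to 100
print(colSums(var.contrib))
```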

The correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations.

In [None]:
#Correlation circle
head(var$coord,4)
fviz_pca_var(res.pca, col.var = 'blue', repel = TRUE)
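To see that the coordinates of the variables really are correlations, we can compare the two quantities in base R. The sketch assumes mtcars[, 1:7] as stand-in data and uses prcomp() rather than PCA(); the names var.cor and var.coord are illustrative.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
pca.base <- prcomp(X, center = TRUE, scale. = TRUE)

# Correlation of each original variable with each component score
var.cor <- cor(X, pca.base$x)

# Loadings scaled by the component standard deviations
var.coord <- sweep(pca.base$rotation, 2, pca.base$sdev, "*")

print(round(var.cor[, 1:2], 3))
# Largest absolute difference between the two matrices (numerically zero)
print(max(abs(var.cor - var.coord)))
```

Since correlations lie between -1 and 1, every variable falls inside the unit circle of the plot.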

The cos2 of the variables on all the dimensions can be visualised using the corrplot package. A bar plot of the variables' cos2 can also be created using the function fviz_cos2().

In [None]:
#Quality of representation
head(var$cos2,4)
library('corrplot')
corrplot(var$cos2, is.corr = FALSE)
fviz_cos2(res.pca, choice = 'var', axes = 1:2)

Transparency of the variables according to their cos2 values can be changed.

In [None]:
#Change the transparency
fviz_pca_var(res.pca, alpha.var = 'cos2')

The contribution of the variables can be extracted as follows:

In [None]:
#Contribution of variables to principal components
head(var$contrib,4)
corrplot(var$contrib, is.corr = FALSE)

The bar plot below shows the contribution of each variable to the first principal component.

In [None]:
#Contribution of variable to PC1
fviz_contrib(res.pca, choice = 'var', axes = 1, top = 10)

The bar plot below shows the contribution of each variable to the second principal component.

In [None]:
#Contribution of variable to PC2
fviz_contrib(res.pca, choice = 'var', axes = 2, top = 10)

The bar plot below shows the total contribution of each variable to the first two principal components.

In [None]:
#Contribution to PC1 and PC2
fviz_contrib(res.pca, choice = 'var', axes = 1:2, top = 10)
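When two axes are combined, the contributions have to be merged into a single value per variable. The sketch below assumes the eigenvalue-weighted average described in the factoextra documentation, contrib = (C1 * Eig1 + C2 * Eig2) / (Eig1 + Eig2), and reads the reference line of the bar plot as the uniform contribution 100 / p; both are our understanding rather than code taken from the package. It uses base R with mtcars[, 1:7] as stand-in data.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
e <- eigen(cor(X))

# Per-component contributions in percent
var.cos2 <- sweep(e$vectors, 2, sqrt(e$values), "*")^2
var.contrib <- sweep(var.cos2, 2, colSums(var.cos2), "/") * 100

# Combined contribution to the chosen axes, weighted by their eigenvalues
axes <- 1:2
eig <- e$values[axes]
contrib.12 <- as.vector(var.contrib[, axes] %*% eig) / sum(eig)
names(contrib.12) <- colnames(X)

# Contribution expected if all variables contributed equally
expected <- 100 / ncol(X)

print(round(sort(contrib.12, decreasing = TRUE), 2))
print(names(contrib.12)[contrib.12 > expected])
```

Variables above the expected value can be regarded as important for the selected axes.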

The most important (i.e., most contributing) variables can be highlighted on the correlation plot as follows:

In [None]:
fviz_pca_var(res.pca, col.var = 'contrib',
            gradient.cols = c('#00AFBB', '#E7B800', '#FC4E07'),
            repel = TRUE)

We can colour the variables by any custom continuous variable, provided that the colouring variable has the same length as the number of active variables in the PCA (here n = 10).

In [None]:
#Colour by a custom continuous variable
set.seed(42)
my.cont.var = rnorm(10)
fviz_pca_var(res.pca, col.var = my.cont.var,
            gradient.cols = c("blue","yellow","red"),
            legend.title = "Cont.var",
            repel = TRUE)

We can also colour the variables by groups defined by a qualitative/categorical variable. Since no grouping of the variables is available in our data set, we create one by clustering the variable coordinates into three groups with the k-means algorithm.

In [None]:
#Colour by groups(Clustering)
set.seed(42)
res.km = kmeans(var$coord, centers =3, nstart = 25)
grp = as.factor(res.km$cluster)
fviz_pca_var(res.pca, col.var = grp,
            palette = c("#0073C2FF", "#EFC000FF", "#868686FF"),
            legend.title = "Cluster",
            repel = TRUE)
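The grouping step can be inspected on its own, without the plot. This is a small base R sketch assuming mtcars[, 1:7] as stand-in data; it clusters the variables (not the individuals) on their PCA coordinates and lists the members of each cluster.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
pca.base <- prcomp(X, center = TRUE, scale. = TRUE)

# One row per variable: its coordinates on every component
var.coord <- sweep(pca.base$rotation, 2, pca.base$sdev, "*")

# k-means on the variable coordinates; nstart = 25 tries several random starts
set.seed(42)
km <- kmeans(var.coord, centers = 3, nstart = 25)
grp <- as.factor(km$cluster)

# Variables belonging to each cluster
print(split(names(grp), grp))
```

Cluster labels are arbitrary, so only the membership of each group is meaningful.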

Dimension description identifies the variables that are most significantly associated with a given principal component. Here, proba = 0.05 is the significance threshold used to select them.

In [None]:
#Dimension description
res.desc = dimdesc(res.pca, axes = c(1,2), proba = 0.05)
#description of dimension 1
res.desc$Dim.1
#description of dimension 2
res.desc$Dim.2
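The idea behind the description of a dimension can be sketched with a plain correlation test. We assume here that, for quantitative variables, the description boils down to testing the correlation between each variable and the component scores; this is an approximation of the principle, not the dimdesc() implementation. The sketch uses base R with mtcars[, 1:7] as stand-in data.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
pca.base <- prcomp(X, center = TRUE, scale. = TRUE)
dim1 <- pca.base$x[, 1]   # scores of the individuals on the first component

# Test the correlation of every variable with the first component
tests <- lapply(X, function(v) cor.test(v, dim1))
desc <- data.frame(
  correlation = sapply(tests, function(res) unname(res$estimate)),
  p.value     = sapply(tests, function(res) res$p.value)
)

# Keep the significant variables and sort them by correlation
desc <- desc[desc$p.value < 0.05, ]
desc <- desc[order(desc$correlation, decreasing = TRUE), ]
print(desc)
```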

The results for individuals can be extracted using the function get_pca_ind(). As for variables, it returns the coordinates, the cos2 and the contributions.

In [None]:
#Results for individuals
ind = get_pca_ind(res.pca)
ind

In [None]:
#Coordinates of individuals
head(ind$coord)

In [None]:
#Quality of representation (cos2) of the individuals
head(ind$cos2)

In [None]:
#Contribution of the individuals
head(ind$contrib)
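The three matrices for individuals are related in the same way as those for variables. The sketch below is a base R illustration assuming mtcars[, 1:7] as stand-in data. Coordinates obtained with prcomp() may differ from those of PCA() by a constant factor, depending on the divisor used for the standard deviation, but the ratios computed here are unaffected.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
pca.base <- prcomp(X, center = TRUE, scale. = TRUE)

# Coordinates: projections of the individuals on the components
ind.coord <- pca.base$x

# cos2: squared coordinate divided by the squared distance to the origin
ind.cos2 <- ind.coord^2 / rowSums(ind.coord^2)

# Contribution (percent): share of each individual in the variance of a component
ind.contrib <- 100 * sweep(ind.coord^2, 2, colSums(ind.coord^2), "/")

print(round(head(ind.cos2[, 1:2]), 3))
print(round(head(ind.contrib[, 1:2]), 2))
```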

In [None]:
#A Simple Plot
fviz_pca_ind(res.pca, repel = TRUE)

In [None]:
#coloured plot
fviz_pca_ind(res.pca, col.ind = "cos2",
            gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
            repel = TRUE)

In [None]:
#Change the point size
fviz_pca_ind(res.pca, pointsize = "cos2",
            pointshape = 21, fill = "#E7B800",
            repel = TRUE)

In [None]:
#Change both point size and colour
fviz_pca_ind(res.pca, col.ind = "cos2", pointsize = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE
             )

In [None]:
#Bar plot
fviz_cos2(res.pca, choice = "ind")

In [None]:
#Total contribution to Dim 1 and Dim 2
fviz_contrib(res.pca, choice = 'ind', axes = 1:2)

In [None]:
#Colour by a custom continuous variable
set.seed(42)
my.rand.var = rnorm(23)
fviz_pca_ind(res.pca, col.ind = my.rand.var,
            gradient.cols = c("blue", "yellow", "red"),
            legend.title = "Cont.Var",
            repel = TRUE)

In [None]:
#Supplementary elements
res.pca = PCA(decathlon, ind.sup = 24:27,
             quanti.sup = 11:12, quali.sup = 13, graph = FALSE)
res.pca
res.pca$quanti.sup
fviz_pca_var(res.pca, repel = TRUE)
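Supplementary elements do not take part in building the components; they are only positioned afterwards. The sketch below illustrates this in base R with mtcars[, 1:7] as stand-in data. The split into active and supplementary rows and columns is hypothetical and chosen only for the example.

```r
# Stand-in data (assumption): seven numeric columns of the built-in mtcars
X <- mtcars[, 1:7]
active.rows <- 1:28    # hypothetical split: rows 29:32 are supplementary
sup.rows    <- 29:32
active.cols <- 1:6     # column 7 (qsec) is treated as a supplementary variable

# The components are built from the active rows and columns only
pca.base <- prcomp(X[active.rows, active.cols], center = TRUE, scale. = TRUE)

# Supplementary individuals: centred and scaled with the ACTIVE means and
# standard deviations, then projected on the existing components
ind.sup.coord <- predict(pca.base, newdata = X[sup.rows, active.cols])

# Supplementary quantitative variable: correlation with the active scores
quanti.sup.coord <- cor(X[active.rows, 7], pca.base$x)

print(round(ind.sup.coord[, 1:2], 3))
print(round(quanti.sup.coord[, 1:2], 3))
```

Dropping the supplementary rows or columns would leave the components themselves unchanged.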

In [None]:
#Customised plot(colours changed)
fviz_pca_var(res.pca,
            col.var = "black",
            col.quanti.sup = "red", repel = TRUE)

In [None]:
#show only supplementary variables
fviz_pca_var(res.pca, invisible = "var") 

In [None]:
#Hide supplementary variables
fviz_pca_var(res.pca, invisible = "quanti.sup")


In [None]:
#Add the supplementary qualitative categories to the plot of individuals
p = fviz_pca_ind(res.pca, col.ind.sup = "blue", repel = TRUE)
p = fviz_add(p, res.pca$quali.sup$coord, color = "red")
p
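The points added for a supplementary qualitative variable can be understood as group centres. The sketch below assumes that the coordinates of each category are the mean coordinates of the individuals belonging to it, which is our reading of quali.sup$coord rather than a quotation of the FactoMineR code. It uses base R, with six mtcars columns as active variables and the number of cylinders as a stand-in qualitative variable.

```r
# Stand-in data (assumption): six numeric mtcars columns as active variables
X <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
# Stand-in qualitative variable: number of cylinders, not used in the PCA
grp <- factor(mtcars$cyl)

pca.base <- prcomp(X, center = TRUE, scale. = TRUE)

# Centre of each category: sum of the scores per group divided by group size
centroids <- rowsum(pca.base$x, grp) / as.vector(table(grp))

print(round(centroids[, 1:2], 3))
```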

In [None]:
#Colour individuals by the supplementary qualitative variable (column 13)
res.pca$quali.sup
fviz_pca_ind(res.pca, habillage = 13,
            addEllipses = TRUE, ellipse.type = "confidence",
            palette = "jco", repel = TRUE)