In [2]:
# source: https://stats.idre.ucla.edu/r/dae/canonical-correlation-analysis/
library("pacman")
pacman::p_load("CCA", "ggplot2")

Given several x variables and several y variables, canonical correlation analysis constructs two variates, `CVX1 = a1x1 + a2x2 + a3x3 + … + anxn` and `CVY1 = b1y1 + b2y2 + b3y3 + … + bmym`. The canonical weights `a1…an` and `b1…bm` are chosen to maximize the correlation between the canonical variates `CVX1` and `CVY1`.
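
The definition above can be checked numerically. The following is a minimal sketch on simulated data (not the mmreg data used below), and it assumes base R's `stats::cancor` rather than the `CCA` package, so that it runs without extra dependencies. The variable names are illustrative only.

```r
# Minimal sketch on simulated data; base R only (stats::cancor).
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)      # three x variables
Y <- cbind(X[, 1] + rnorm(n),        # y1 shares signal with x1
           X[, 2] + rnorm(n),        # y2 shares signal with x2
           rnorm(n), rnorm(n))       # y3 and y4 are pure noise

fit <- cancor(X, Y)

# First pair of canonical variates: weighted sums of the columns.
# Correlation is unaffected by centering, so the raw columns are used.
CVX1 <- X %*% fit$xcoef[, 1]
CVY1 <- Y %*% fit$ycoef[, 1]

# Their correlation equals the first canonical correlation
first_cc <- abs(cor(CVX1, CVY1))[1, 1]
print(all.equal(first_cc, fit$cor[1]))

# No single x-y pair correlates more strongly than the canonical variates
max_pairwise <- max(abs(cor(X, Y)))
print(first_cc >= max_pairwise)
```

The second check illustrates the "maximize" part: each individual x-y correlation is a special case of a weighted sum with all but one weight set to zero, so it cannot exceed the first canonical correlation.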

In [3]:
mm <- read.csv("https://stats.idre.ucla.edu/stat/data/mmreg.csv")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math", 
    "Science", "Sex")
summary(mm)

# Get the two inputs for CCA
psych <- mm[, 1:3]
acad <- mm[, 4:8]

    Control            Concept            Motivation          Read     
 Min.   :-2.23000   Min.   :-2.620000   Min.   :0.0000   Min.   :28.3  
 1st Qu.:-0.37250   1st Qu.:-0.300000   1st Qu.:0.3300   1st Qu.:44.2  
 Median : 0.21000   Median : 0.030000   Median :0.6700   Median :52.1  
 Mean   : 0.09653   Mean   : 0.004917   Mean   :0.6608   Mean   :51.9  
 3rd Qu.: 0.51000   3rd Qu.: 0.440000   3rd Qu.:1.0000   3rd Qu.:60.1  
 Max.   : 1.36000   Max.   : 1.190000   Max.   :1.0000   Max.   :76.0  
     Write            Math          Science           Sex       
 Min.   :25.50   Min.   :31.80   Min.   :26.00   Min.   :0.000  
 1st Qu.:44.30   1st Qu.:44.50   1st Qu.:44.40   1st Qu.:0.000  
 Median :54.10   Median :51.30   Median :52.60   Median :1.000  
 Mean   :52.38   Mean   :51.85   Mean   :51.76   Mean   :0.545  
 3rd Qu.:59.90   3rd Qu.:58.38   3rd Qu.:58.65   3rd Qu.:1.000  
 Max.   :67.10   Max.   :75.50   Max.   :74.20   Max.   :1.000  

In [5]:
# correlations within each set and between the two sets
matcor(psych, acad)

Xcor,Control,Concept,Motivation
Control,1.0,0.1711878,0.2451323
Concept,0.1711878,1.0,0.2885707
Motivation,0.2451323,0.2885707,1.0

Ycor,Read,Write,Math,Science,Sex
Read,1.0,0.6285909,0.6792757,0.6906929,-0.04174278
Write,0.62859089,1.0,0.6326664,0.5691498,0.24433183
Math,0.67927568,0.6326664,1.0,0.6495261,-0.0482183
Science,0.69069291,0.5691498,0.6495261,1.0,-0.13818587
Sex,-0.04174278,0.2443318,-0.0482183,-0.1381859,1.0

XYcor,Control,Concept,Motivation,Read,Write,Math,Science,Sex
Control,1.0,0.17118778,0.24513227,0.37356505,0.35887684,0.337269,0.32462694,0.11341075
Concept,0.1711878,1.0,0.28857075,0.06065584,0.01944856,0.0535977,0.06982633,-0.12595132
Motivation,0.2451323,0.28857075,1.0,0.21060992,0.25424818,0.1950135,0.11566948,0.09810277
Read,0.373565,0.06065584,0.21060992,1.0,0.62859089,0.6792757,0.69069291,-0.04174278
Write,0.3588768,0.01944856,0.25424818,0.62859089,1.0,0.6326664,0.56914983,0.24433183
Math,0.337269,0.0535977,0.19501347,0.67927568,0.6326664,1.0,0.64952612,-0.0482183
Science,0.3246269,0.06982633,0.11566948,0.69069291,0.56914983,0.6495261,1.0,-0.13818587
Sex,0.1134108,-0.12595132,0.09810277,-0.04174278,0.24433183,-0.0482183,-0.13818587,1.0


In [6]:
# run the canonical correlation analysis
cc1 <- cc(psych, acad)
# display the canonical correlations
cc1$cor

In [7]:
# raw canonical coefficients
cc1[3:4]

xcoef,1,2,3
Control,-1.2538339,-0.6214776,-0.6616896
Concept,0.3513499,-1.1876866,0.826721
Motivation,-1.2624204,2.0272641,2.0002283

ycoef,1,2,3
Read,-0.0446206,-0.004910024,0.021380576
Write,-0.035877112,0.042071478,0.091307329
Math,-0.023417185,0.004229478,0.009398182
Science,-0.005025152,-0.085162184,-0.109835014
Sex,-0.632119234,1.084642326,-1.794647036


In [8]:
# compute canonical loadings
cc2 <- comput(psych, acad, cc1)

# display canonical loadings
cc2[3:6]

corr.X.xscores,1,2,3
Control,-0.90404631,-0.3896883,-0.1756227
Concept,-0.02084327,-0.7087386,0.7051632
Motivation,-0.56715106,0.3508882,0.7451289

corr.Y.xscores,1,2,3
Read,-0.3900402,-0.06010654,0.01407661
Write,-0.4067914,0.01086075,0.02647207
Math,-0.3545378,-0.04990916,0.01536585
Science,-0.3055607,-0.1133698,-0.02395489
Sex,-0.1689796,0.12645737,-0.05650916

corr.X.yscores,1,2,3
Control,-0.419555307,-0.06527635,-0.0182632
Concept,-0.009673069,-0.11872021,0.07333073
Motivation,-0.26320691,0.05877699,0.07748681

corr.Y.yscores,1,2,3
Read,-0.840448,-0.35882541,0.1353635
Write,-0.8765429,0.06483674,0.2545608
Math,-0.7639483,-0.29794884,0.1477611
Science,-0.6584139,-0.67679761,-0.2303551
Sex,-0.3641127,0.75492811,-0.5434036


The tables above show the canonical loadings: the correlations between the observed variables and the canonical variates. The canonical variates themselves are a type of latent variable.
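
To make the definition of a loading concrete, here is a minimal sketch that computes loadings by hand. It assumes simulated data and base R's `stats::cancor` instead of `CCA::comput`; the object names (`load_x_own`, `load_x_cross`, and so on) are hypothetical and chosen for illustration.

```r
# Minimal sketch on simulated data; base R only (stats::cancor).
set.seed(2)
n <- 150
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
Y <- cbind(y1 = X[, 1] + rnorm(n),
           y2 = X[, 2] - X[, 3] + rnorm(n))

fit <- cancor(X, Y)
k <- length(fit$cor)                 # number of canonical dimensions (2 here)

# Canonical variate scores for each dimension.
# Centering is skipped because correlations ignore shifts.
x_scores <- X %*% fit$xcoef[, 1:k]
y_scores <- Y %*% fit$ycoef[, 1:k]

# Loadings: correlation of each observed variable with each variate
load_x_own   <- cor(X, x_scores)     # x variables vs. x variates
load_y_own   <- cor(Y, y_scores)     # y variables vs. y variates
load_x_cross <- cor(X, y_scores)     # x variables vs. y variates

# A cross-loading is the own-loading scaled by the canonical correlation
# of the same dimension.
scaled_own <- sweep(load_x_own, 2, fit$cor, `*`)
print(all.equal(load_x_cross, scaled_own, check.attributes = FALSE))
```

The same scaling relation can be seen in the tables above: each entry in the third table is the matching entry of the first table shrunk by a common per-column factor.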

In general, the number of canonical dimensions equals the number of variables in the smaller set; however, the number of statistically significant dimensions may be smaller still. Canonical dimensions, also known as canonical variates, are latent variables analogous to the factors obtained in factor analysis. For this model there are three canonical dimensions, of which only the first two are statistically significant. The significance tests use the R package "CCP".
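
As a rough illustration of how such a dimension test works, the sketch below computes Wilks' lambda for dimensions k through the last one and converts it to a p-value with Bartlett's chi-square approximation. This is a swapped-in, hand-rolled approximation on simulated data, not the F approximations that CCP reports; `wilks_bartlett` is a hypothetical helper and is not part of any package.

```r
# Hypothetical helper: Bartlett's chi-square approximation to Wilks' lambda.
# rho: canonical correlations; n: observations; p, q: variables per set.
wilks_bartlett <- function(rho, n, p, q) {
  m <- length(rho)
  k <- seq_len(m)
  # lambda[k] = product of (1 - rho_i^2) over dimensions i = k..m
  lambda <- rev(cumprod(rev(1 - rho^2)))
  chisq  <- -(n - 1 - (p + q + 1) / 2) * log(lambda)
  df     <- (p - k + 1) * (q - k + 1)
  data.frame(dims = paste(k, "to", m), wilks = lambda, chisq = chisq,
             df = df, p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Simulated data: only one true shared dimension between X and Y
set.seed(3)
n <- 300
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n), rnorm(n), rnorm(n), rnorm(n), rnorm(n))

fit <- cancor(X, Y)                  # base R, stats::cancor
print(length(fit$cor))               # min(3, 5) canonical dimensions

res <- wilks_bartlett(fit$cor, n, p = ncol(X), q = ncol(Y))
print(res)
```

Each row tests whether the listed dimensions are jointly zero, so the rows are read top to bottom: the first row that fails to reach significance marks where the useful dimensions end.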