A notebook that implements and tries to make sense of different representational similarity metrics (RSA with several distance measures, linear CKA, Procrustes)

In [1]:
import importlib
import numpy as np
import sklearn
from matplotlib import pyplot as plt
from os.path import join
import os
import seaborn as sns
import lib.utils_RSA as rsa
from lib.algos import *
from scipy.spatial import procrustes as scipro
import lib.utils_CKA as cka

importlib.reload(rsa)
importlib.reload(cka)

<module 'lib.utils_CKA' from '/home/alban/projects/SAYCam_Vs_EGO4D/lib/utils_CKA.py'>

In [2]:
### Load every activation set
dataset = 'ecoVal'
models  = ['ego', 'saycam', 'imagenet', 'supervised', 'random', 'resnet']
path2activations = f'/home/alban/Documents/activations_datadriven/%s_{dataset}/'

imagelists = {}
activations = {}
for model in models:
    with open(join(path2activations%model, 'imagepaths.txt'), 'r') as f:
        imagelists[model] = [line.strip() for line in f.readlines()]
    activations[model] = np.load(join(path2activations % model, 'cls_tokens.npy'))

activations[model].shape

(28250, 2048)

In [3]:
### check that images were shown in the same order for every model
assert all(imagelists[model] == imagelists['ego'] for model in models), 'image order differs between models'
imagelist = imagelists['ego'] # since they are the same, only consider one list

#### count images per category and list all categories in listcat
#### (assumes images are grouped by category; count ends up holding the size of the last category)
count = 0
cat = ''
listcat = list()
for i, imgp in enumerate(imagelist):
    current_cat = imgp.split('/')[7]
    if i == 0:
        cat = current_cat
        listcat.append(current_cat)
    if cat != current_cat:
        cat = current_cat
        listcat.append(current_cat)
        count = 1
    else:
        count += 1

nb_per_cat = count # in val, 50 images per category


In [4]:
### only select one image per category to play with metrics as a toy example
#activations_normalized = {}
for model in models:
    activations[model] = activations[model][::nb_per_cat]
    #activations_normalized[model] = activations[model].copy()
    #activations_normalized[model]

One thing I really want to check and understand is the supposed equivalence of linear CKA and RSA with squared L2 distance (cf. Williams, 2024).
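To make the claimed equivalence concrete, here is a minimal sketch on random toy data, not on the activations above. It assumes the standard definition of linear CKA on column-centered features and compares it with the cosine similarity between double-centered squared-Euclidean RDMs. All function names here are illustrative and are not the ones in `lib.utils_CKA` or `lib.utils_RSA`; in particular, double centering may differ from what `center = True` does in `rsa.Compute_sim_RDMs`.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) matrices."""
    Xc = X - X.mean(axis=0)  # center each feature across samples
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Yc.T @ Xc, 'fro') ** 2
    den = np.linalg.norm(Xc.T @ Xc, 'fro') * np.linalg.norm(Yc.T @ Yc, 'fro')
    return num / den

def sq_euclidean_rdm(X):
    """RDM of squared Euclidean distances between rows of X."""
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def double_center(D):
    """Subtract row and column means (H @ D @ H with H the centering matrix)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ D @ H

def cosine(A, B):
    """Cosine similarity between two matrices seen as flat vectors."""
    return (A * B).sum() / (np.linalg.norm(A) * np.linalg.norm(B))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 16))  # toy "activations" of model 1
Y = rng.normal(size=(40, 24))  # toy "activations" of model 2

cka_value = linear_cka(X, Y)
rsa_value = cosine(double_center(sq_euclidean_rdm(X)),
                   double_center(sq_euclidean_rdm(Y)))
print(cka_value, rsa_value)
```

The two numbers agree up to floating-point error because double centering the squared-Euclidean RDM yields -2 times the centered Gram matrix, and the factor -2 cancels in the cosine.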

In [5]:
### Compute RDMs for several metrics
RDMs = {}
metrics = ['pearson', 'L2', 'L2squared', 'L2_normalize', 'L2squared_normalize']
for i, model in enumerate(models):
    RDMs[model] = {}
    for m, metric in enumerate(metrics):
        RDMs[model][metric] = rsa.compute_RDMs(activations[model], metric = metric, display = False, title = f'{model}_{metric}')

In [6]:
### Compute differences between the different RDMs
for i, metric1 in enumerate(metrics[:-1]):
    for j, metric2 in enumerate(metrics[i+1:]):
        diff = list()
        for model in models:
            diff.append(np.absolute(RDMs[model][metric1] - RDMs[model][metric2]).mean())
        print(f'{metric1} VS {metric2} is {[float(x) for x in diff]}')


pearson VS L2 is [32.50833649008573, 21.752096619386524, 19.497312251897977, 250.87047294266716, 29.655785704871754, 21.271819620116332]
pearson VS L2squared is [1110.4616598258192, 503.07821217899874, 407.1868249234052, 64490.55313873477, 1005.489015213966, 499.64711303729115]
pearson VS L2_normalize is [0.4731603240540338, 0.4941509335263503, 0.49345840299192517, 0.43963581816421127, 0.438706026560215, 0.05448409964206643]
pearson VS L2squared_normalize is [0.7469598067727495, 0.5762094869882597, 0.5928130206757279, 0.8981276257105497, 0.6550420015053967, 0.12124288070142204]
L2 VS L2squared is [1077.9532470703125, 481.3262634277344, 387.6894836425781, 64239.67578125, 975.833251953125, 478.3752746582031]
L2 VS L2_normalize is [32.03517532348633, 21.257944107055664, 19.00385284423828, 250.4308319091797, 29.217079162597656, 21.23788833618164]
L2 VS L2squared_normalize is [31.76137924194336, 21.175886154174805, 18.904499053955078, 249.97230529785156, 29.000741958618164, 21.3850460052490

In absolute values, we find vastly different RDMs. How about in terms of correlations?

In [8]:
### Compute correlations between the different RDMs
print(models)
for i, metric1 in enumerate(metrics[:-1]):
    for j, metric2 in enumerate(metrics[i+1:]):
        diff = list()
        for model in models:
            diff.append(np.round(np.corrcoef(RDMs[model][metric1].flatten(),RDMs[model][metric2].flatten())[0,1], 3))
        print(f'{metric1} VS {metric2} is {[float(x) for x in diff]}')


['ego', 'saycam', 'imagenet', 'supervised', 'random', 'resnet']
pearson VS L2 is [0.871, 0.893, 0.899, 0.212, 0.982, 0.567]
pearson VS L2squared is [0.849, 0.876, 0.868, 0.13, 1.0, 0.494]
pearson VS L2_normalize is [0.948, 0.963, 0.96, 0.952, 0.982, 0.862]
pearson VS L2squared_normalize is [1.0, 1.0, 1.0, 1.0, 1.0, 0.846]
L2 VS L2squared is [0.959, 0.977, 0.977, 0.984, 0.982, 0.976]
L2 VS L2_normalize is [0.894, 0.891, 0.881, 0.28, 1.0, 0.599]
L2 VS L2squared_normalize is [0.871, 0.893, 0.899, 0.212, 0.982, 0.588]
L2squared VS L2_normalize is [0.787, 0.819, 0.796, 0.161, 0.982, 0.533]
L2squared VS L2squared_normalize is [0.85, 0.876, 0.869, 0.13, 1.0, 0.546]
L2_normalize VS L2squared_normalize is [0.948, 0.963, 0.96, 0.952, 0.982, 0.982]
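The correlations above are computed on the full flattened RDMs, which include the diagonal and both copies of every pair. A common alternative is to correlate only the off-diagonal upper triangle, since the shared zeros on the diagonal can inflate the correlation. The sketch below uses random toy data; `upper_triangle_corr` is a hypothetical helper, not part of `lib.utils_RSA`.

```python
import numpy as np

def upper_triangle_corr(rdm1, rdm2):
    """Pearson correlation between the off-diagonal upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm1, k=1)  # k=1 skips the diagonal
    return np.corrcoef(rdm1[iu], rdm2[iu])[0, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))  # toy activations: 30 items, 10 features

# squared Euclidean RDM (clipped at 0 to absorb rounding errors), then its square root
sq = (X ** 2).sum(axis=1)
rdm_l2sq = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
rdm_l2 = np.sqrt(rdm_l2sq)

full_corr = np.corrcoef(rdm_l2.flatten(), rdm_l2sq.flatten())[0, 1]
triu_corr = upper_triangle_corr(rdm_l2, rdm_l2sq)
print(f'full matrix: {full_corr:.3f}, upper triangle: {triu_corr:.3f}')
```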


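Within a model, the RDMs obtained with the various metrics are mostly highly correlated. The exceptions are the supervised model and the ResNet trained on SAYCam (last model): for both, the unnormalized L2 and L2squared RDMs correlate poorly with the pearson and normalized ones (as low as 0.13 for supervised). This suggests an effect of training algorithm and architecture. What does that mean?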

The highest correlations are found between pearson and L2squared_normalize (1.0 for every model except the ResNet, at 0.846), showing they are almost perfectly equivalent!
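One way to see where this near-equivalence could come from: if each activation vector is centered and scaled to unit norm, the squared Euclidean distance between two vectors equals 2(1 - r), with r their Pearson correlation. The sketch below checks this on toy data. It is an assumption that `L2squared_normalize` normalizes in this way; `center_and_unit_norm` is a hypothetical helper. If the normalization only rescales to unit norm without centering, the identity holds for cosine distance instead, which could be one reason for a correlation below 1.

```python
import numpy as np

def center_and_unit_norm(X):
    """Center each row (one activation vector) and scale it to unit L2 norm."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc / np.linalg.norm(Xc, axis=1, keepdims=True)

def sq_euclidean_rdm(Z):
    """Squared Euclidean distances between all pairs of rows, computed explicitly."""
    diff = Z[:, None, :] - Z[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, size=(20, 50))  # toy activations with a non-zero mean

pearson_rdm = 1 - np.corrcoef(X)  # 1 - r between rows

# with centering + unit norm: squared distance equals 2 * (1 - r)
rdm_centered = sq_euclidean_rdm(center_and_unit_norm(X))
identity_holds = np.allclose(rdm_centered, 2 * pearson_rdm)

# with unit norm only (no centering): the identity with Pearson breaks
rdm_uncentered = sq_euclidean_rdm(X / np.linalg.norm(X, axis=1, keepdims=True))
identity_without_centering = np.allclose(rdm_uncentered, 2 * pearson_rdm)

print(identity_holds, identity_without_centering)
```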

In [42]:
### Compute similarities between models using the various metrics.
sim_metrics = ['cosine', 'pearson', 'center_pearson'] # We only consider cosine and pearson as similarity metrics for now

SIMs = {} # save all similarity values in a dictionary
list_sim = {} # save all similarity values in a list to directly compare with CKA later
for sm, sim_metric in enumerate(sim_metrics):
    if sim_metric == 'pearson':
        center = False
    else:
        center = True
    if sim_metric == 'center_pearson':
        simmetric = 'pearson'
    else:
        simmetric = sim_metric
    SIMs[sim_metric] = {}
    list_sim[sim_metric] = {}
    for m, metric in enumerate(metrics):
        SIMs[sim_metric][metric] = {}
        list_sim[sim_metric][metric] = list()
        for i, model1 in enumerate(models[:-1]):
            SIMs[sim_metric][metric][model1] = {}
            for j, model2 in enumerate(models[i+1:]):
                sim = float(np.round(float(rsa.Compute_sim_RDMs(RDMs[model1][metric], RDMs[model2][metric], center = center, metric = simmetric)), 3))
                SIMs[sim_metric][metric][model1][model2] = sim
                list_sim[sim_metric][metric].append(float(sim))

In [45]:
print('pearson & L2squared')
print(list_sim['pearson']['L2squared'])
print('cosine & L2squared')
print(list_sim['cosine']['L2squared'])
print('centered-pearson & L2squared')
print(list_sim['center_pearson']['L2squared'])

pearson & L2squared
[0.621, 0.379, 0.081, 0.157, 0.443, 0.367, 0.064, 0.116, 0.592, 0.314, 0.095, 0.299, 0.02, 0.025, 0.069]
cosine & L2squared
[0.656, 0.497, 0.253, 0.195, 0.56, 0.559, 0.282, 0.176, 0.685, 0.416, 0.164, 0.496, 0.06, 0.243, 0.149]
centered-pearson & L2squared
[0.656, 0.497, 0.253, 0.195, 0.56, 0.559, 0.282, 0.176, 0.685, 0.416, 0.164, 0.496, 0.06, 0.243, 0.149]
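The printed lists are hard to read without knowing which model pair each value belongs to. The nested loops over `models[:-1]` and `models[i+1:]` visit pairs in the same order as `itertools.combinations(models, 2)`, so the values can be labeled as in the sketch below. The values are copied from the 'pearson & L2squared' output above; `labeled` is just an illustrative name.

```python
from itertools import combinations

models = ['ego', 'saycam', 'imagenet', 'supervised', 'random', 'resnet']

# same pair order as the nested loops used to fill list_sim
nested = [(m1, m2) for i, m1 in enumerate(models[:-1]) for m2 in models[i + 1:]]
pairs = list(combinations(models, 2))

# values copied from the 'pearson & L2squared' output above
values = [0.621, 0.379, 0.081, 0.157, 0.443, 0.367, 0.064, 0.116,
          0.592, 0.314, 0.095, 0.299, 0.02, 0.025, 0.069]

labeled = dict(zip(pairs, values))
for (m1, m2), value in labeled.items():
    print(f'{m1:>10} vs {m2:<10} {value}')
```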


In [26]:
### Perform CKA on the activations
CKA = {} # save all CKA values in a dictionary
list_cka = list() # save all CKA values in a list to directly compare with similarities previously computed
for i, model1 in enumerate(models[:-1]):
    CKA[model1] = {}
    for j, model2 in enumerate(models[i+1:]):
        CKA[model1][model2] = float(np.round(cka.linear_CKA(activations[model1], activations[model2]), 3))
        list_cka.append(CKA[model1][model2])

In [27]:
for sim_metric in sim_metrics:
    for metric in metrics:
        print(f'{sim_metric} {metric}')
        print(np.corrcoef(np.array(list_sim[sim_metric][metric]), np.array(list_cka))[0,1])

cosine pearson
0.9636555055360286
cosine L2
0.9668148947776151
cosine L2squared
0.9687370609694657
cosine L2_normalize
0.9621501145076792
cosine L2squared_normalize
0.9635437065120578
pearson pearson
0.8792020092141342
pearson L2
0.875860689120937
pearson L2squared
0.871332181937399
pearson L2_normalize
0.8930530457901726
pearson L2squared_normalize
0.8835810522621133
center_pearson pearson
0.9634542386173426
center_pearson L2
0.966166233913843
center_pearson L2squared
0.9687370604544859
center_pearson L2_normalize
0.9614215835294484
center_pearson L2squared_normalize
0.9633040745496696


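It seems that, independently of the distance measure used to build the RDMs, the resulting similarities are about equally correlated with linear CKA: around 0.96 to 0.97 for `cosine` and `center_pearson`, and lower, around 0.87 to 0.89, for `pearson` (computed with `center = False`).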

But what about actual values?

In [28]:
for sim_metric in sim_metrics:
    for metric in metrics:
        print(f'{sim_metric} {metric}')
        print(np.mean(np.absolute(np.array(list_sim[sim_metric][metric]) - np.array(list_cka))))

cosine pearson
0.051600001076857256
cosine L2
0.06040001014868419
cosine L2squared
0.06953333293398221
cosine L2_normalize
0.049000002443790436
cosine L2squared_normalize
0.052133337656656904
pearson pearson
0.09280000101327895
pearson L2
0.17540000250736873
pearson L2squared
0.18613333584070207
pearson L2_normalize
0.08853333523670834
pearson L2squared_normalize
0.09406666799783706
center_pearson pearson
0.051666667743523916
center_pearson L2
0.060933335236708325
center_pearson L2squared
0.06953333584070205
center_pearson L2_normalize
0.049533334410190595
center_pearson L2squared_normalize
0.05240000107685726


In [12]:
### Run custom procrustes analysis
#d, Z, T = procrustes(activations['saycam'], activations['ego'])
### Run scipy procrustes analysis as a control
#mtx1, mtx2, dsci = scipro(activations['saycam'], activations['ego'])
#print([d, dsci])
### --> Both algos agree with each other, and the disparity measures are pretty high (somewhat unexpectedly)
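
To get a feel for the scale of the disparity, here is a minimal sketch on toy data using `scipy.spatial.procrustes`, the same function imported above as `scipro`. It standardizes both matrices, so the disparity lies between 0 and 1: close to 0 when one matrix is a rotated, scaled and shifted copy of the other, and substantially higher for unrelated data. The toy shapes are arbitrary and the numbers say nothing about the actual activations.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))  # toy activations: 100 items, 20 features

# B_same: A after a random orthogonal transform, a rescaling and a shift
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))
B_same = 2.5 * A @ Q + 1.0

# B_rand: unrelated data of the same shape
B_rand = rng.normal(size=(100, 20))

_, _, d_same = procrustes(A, B_same)
_, _, d_rand = procrustes(A, B_rand)
print(f'same geometry: {d_same:.2e}, unrelated data: {d_rand:.3f}')
```

Comparing the disparity between two models against the value for unrelated data of the same shape may help judge whether a given disparity is actually high.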