# Step 3: Correlate RDM Matrices

This notebook correlates two RDMs (representational dissimilarity matrices, e.g., BV_CLIP vs THINGS_CLIP) using Spearman, Pearson, and Kendall correlations.

## Overview

This step:
1. Loads two RDM distance matrices (e.g., BV_CLIP and THINGS_CLIP)
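2. Verifies that both matrices have the same shape (the categories must already be in the same order in both files, e.g., alphabetical)
3. Extracts the lower triangle of each matrix (excluding the diagonal)
4. Computes Spearman, Pearson, and Kendall correlations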
5. Reports correlation statistics
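
The steps above can be sketched on toy data. This is a minimal illustration rather than the notebook's actual pipeline: the two 5x5 matrices are randomly generated, and the names `make_toy_rdm`, `rdm_a`, and `rdm_b` are made up for this example.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr, kendalltau


def make_toy_rdm(points):
    """Build a symmetric Euclidean distance matrix from row vectors (toy data only)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))
rdm_a = make_toy_rdm(points)
# Second RDM: the same points plus a little noise, so the two RDMs are related
rdm_b = make_toy_rdm(points + 0.1 * rng.normal(size=points.shape))

# Lower triangle without the diagonal: n * (n - 1) / 2 values per matrix
rows, cols = np.tril_indices_from(rdm_a, k=-1)
vec_a = rdm_a[rows, cols]
vec_b = rdm_b[rows, cols]

# Keep only the pairs that are finite in both vectors
mask = np.isfinite(vec_a) & np.isfinite(vec_b)
vec_a, vec_b = vec_a[mask], vec_b[mask]

rho, _ = spearmanr(vec_a, vec_b)
r, _ = pearsonr(vec_a, vec_b)
tau, _ = kendalltau(vec_a, vec_b)
print(f"n={len(vec_a)}  Spearman={rho:.3f}  Pearson={r:.3f}  Kendall={tau:.3f}")
```

Because an RDM is symmetric, the lower triangle already contains every unique pair; including the upper triangle or the diagonal would duplicate values and inflate the correlation.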

## Prerequisites

This step requires:
- The normalized and filtered RDMs produced by Step 2, saved as `.npy` distance matrices
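
The correlation code in this notebook only compares matrix shapes, so it relies on both RDMs listing their categories in the same order. If that is not guaranteed, the matrices can be aligned first. The sketch below assumes a list of category names is available for each matrix; the helper `align_rdms` and the label lists are hypothetical and are not part of the Step 2 outputs shown here.

```python
import numpy as np


def align_rdms(rdm1, labels1, rdm2, labels2):
    """Restrict two RDMs to their shared categories, in the same sorted order.

    Hypothetical helper: `labels1` / `labels2` are assumed to give the category
    name for each row/column of the corresponding matrix.
    """
    shared = sorted(set(labels1) & set(labels2))
    idx1 = [labels1.index(name) for name in shared]
    idx2 = [labels2.index(name) for name in shared]
    # np.ix_ selects the same rows and columns, so each result stays square
    return rdm1[np.ix_(idx1, idx1)], rdm2[np.ix_(idx2, idx2)], shared


# Toy example with made-up category names
labels1 = ["apple", "ball", "cup"]
labels2 = ["cup", "apple", "dog"]
rdm1 = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 3.0],
                 [2.0, 3.0, 0.0]])
rdm2 = np.array([[0.0, 5.0, 6.0],
                 [5.0, 0.0, 7.0],
                 [6.0, 7.0, 0.0]])

aligned1, aligned2, shared = align_rdms(rdm1, labels1, rdm2, labels2)
print(shared)
print(aligned1)
print(aligned2)
```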

## Setup and Imports

In [11]:
import numpy as np
import pandas as pd
from pathlib import Path
from scipy.stats import spearmanr, pearsonr, kendalltau
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

print("All imports successful!")

All imports successful!


## Configuration

**Please update the paths below according to your setup:**

In [12]:
# ============================================================================
# CONFIGURATION - UPDATE THESE PATHS FOR YOUR SETUP
# ============================================================================

# Input RDM matrices
INPUT_RDM1_PATH = "./bv_things_comp_12252025/bv_dinov3_filtered_zscored_hierarchical_163cats/distance_matrix_alphabetical.npy"  # First RDM matrix path
INPUT_RDM2_PATH = "./bv_things_comp_12252025/things_dinov3_filtered_zscored_hierarchical_163cats/distance_matrix_alphabetical.npy"  # Second RDM matrix path

# Output directory for correlation results
OUTPUT_DIR = "./bv_things_comp_12252025/correlation_results_12252025"  # Directory to save correlation results
OUTPUT_FILENAME = 'bv_things_dinov3_rdm_correlation_results_163cats_26filtered.txt'  # Output filename

print("Configuration loaded. Please review and update paths as needed.")

Configuration loaded. Please review and update paths as needed.


## Correlate RDM Matrices

In [13]:
print("="*60)
print("CORRELATING RDM MATRICES")
print("="*60)

# Convert paths to Path objects
rdm1_path = Path(INPUT_RDM1_PATH)
rdm2_path = Path(INPUT_RDM2_PATH)

# Check if files exist
if not rdm1_path.exists():
    raise FileNotFoundError(f"Error: {rdm1_path} not found. Please check INPUT_RDM1_PATH.")
if not rdm2_path.exists():
    raise FileNotFoundError(f"Error: {rdm2_path} not found. Please check INPUT_RDM2_PATH.")

# Load matrices
print(f"Loading RDM 1 from {rdm1_path}...")
matrix1 = np.load(rdm1_path)
print(f"  Shape: {matrix1.shape}")

print(f"Loading RDM 2 from {rdm2_path}...")
matrix2 = np.load(rdm2_path)
print(f"  Shape: {matrix2.shape}")

# Check if shapes match
if matrix1.shape != matrix2.shape:
    raise ValueError(f"Matrices have different shapes: {matrix1.shape} vs {matrix2.shape}. "
                     f"Please ensure both matrices have the same dimensions.")

# Extract lower triangle (excluding diagonal)
# k=-1 excludes the diagonal (k=0 would include it)
vec1 = matrix1[np.tril_indices_from(matrix1, k=-1)]
vec2 = matrix2[np.tril_indices_from(matrix2, k=-1)]

print(f"\nExtracted lower triangle (excluding diagonal): {len(vec1)} elements")

# Remove NaN/Inf values
mask = np.isfinite(vec1) & np.isfinite(vec2)
vec1_clean = vec1[mask]
vec2_clean = vec2[mask]

print(f"Valid elements: {len(vec1_clean)} / {len(vec1)}")

# Compute correlations
spearman_r, spearman_p = spearmanr(vec1_clean, vec2_clean)
pearson_r, pearson_p = pearsonr(vec1_clean, vec2_clean)
kendall_r, kendall_p = kendalltau(vec1_clean, vec2_clean)

# Print results
print("\n" + "="*60)
print("CORRELATION RESULTS")
print("="*60)
print(f"Spearman r: {spearman_r:.6f} (p = {spearman_p:.2e})")
print(f"Pearson r:  {pearson_r:.6f} (p = {pearson_p:.2e})")
print(f"Kendall τ: {kendall_r:.6f} (p = {kendall_p:.2e})")
print(f"\nMatrix 1 stats: Mean={vec1_clean.mean():.6f}, Std={vec1_clean.std():.6f}")
print(f"Matrix 2 stats: Mean={vec2_clean.mean():.6f}, Std={vec2_clean.std():.6f}")

# Save results to output directory
output_dir = Path(OUTPUT_DIR)
output_dir.mkdir(parents=True, exist_ok=True)

# Create results text file
# Write as UTF-8 so the "τ" character is saved correctly on every platform
results_file = output_dir / OUTPUT_FILENAME
with open(results_file, 'w', encoding='utf-8') as f:
    f.write("="*60 + "\n")
    f.write("RDM MATRIX CORRELATION RESULTS\n")
    f.write("="*60 + "\n\n")
    f.write(f"RDM 1: {rdm1_path}\n")
    f.write(f"RDM 2: {rdm2_path}\n\n")
    f.write(f"Matrix 1 shape: {matrix1.shape}\n")
    f.write(f"Matrix 2 shape: {matrix2.shape}\n\n")
    f.write(f"Extracted lower triangle (excluding diagonal): {len(vec1)} elements\n")
    f.write(f"Valid elements: {len(vec1_clean)} / {len(vec1)}\n\n")
    f.write("="*60 + "\n")
    f.write("CORRELATION RESULTS\n")
    f.write("="*60 + "\n")
    f.write(f"Spearman r: {spearman_r:.6f}\n")
    f.write(f"Spearman p-value: {spearman_p:.2e}\n")
    f.write(f"Pearson r: {pearson_r:.6f}\n")
    f.write(f"Pearson p-value: {pearson_p:.2e}\n")
    f.write(f"Kendall τ: {kendall_r:.6f}\n")
    f.write(f"Kendall p-value: {kendall_p:.2e}\n\n")
    f.write("Matrix Statistics:\n")
    f.write(f"  Matrix 1: Mean={vec1_clean.mean():.6f}, Std={vec1_clean.std():.6f}\n")
    f.write(f"  Matrix 2: Mean={vec2_clean.mean():.6f}, Std={vec2_clean.std():.6f}\n")

print(f"\nResults saved to: {results_file}")

CORRELATING RDM MATRICES
Loading RDM 1 from bv_things_comp_12252025/bv_dinov3_filtered_zscored_hierarchical_163cats/distance_matrix_alphabetical.npy...
  Shape: (163, 163)
Loading RDM 2 from bv_things_comp_12252025/things_dinov3_filtered_zscored_hierarchical_163cats/distance_matrix_alphabetical.npy...
  Shape: (163, 163)

Extracted lower triangle (excluding diagonal): 13203 elements
Valid elements: 13203 / 13203

CORRELATION RESULTS
Spearman r: 0.314168 (p = 2.84e-300)
Pearson r:  0.449798 (p = 0.00e+00)
Kendall τ: 0.214315 (p = 1.33e-298)

Matrix 1 stats: Mean=1.004837, Std=0.188553
Matrix 2 stats: Mean=1.005878, Std=0.066091

Results saved to: bv_things_comp_12252025/correlation_results_12252025/bv_things_dinov3_rdm_correlation_results_163cats_26filtered.txt
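
## Interpreting the p-values

The p-values above come from `scipy.stats`, which treats the 13203 matrix entries as independent observations. They are not: every entry shares its row and column category with many others, so these p-values are likely to be overconfident. The Pearson value of `0.00e+00` most likely reflects floating-point underflow rather than an exact zero. One common alternative is a Mantel-style permutation test that shuffles the category order of one matrix. The sketch below is not part of the original notebook; `mantel_spearman` is a hypothetical helper and the data are random toy matrices.

```python
import numpy as np
from scipy.stats import spearmanr


def mantel_spearman(rdm1, rdm2, n_perm=1000, seed=0):
    """Permutation test for the Spearman correlation between two square RDMs.

    Hypothetical helper (not in the original notebook). Rows and columns of
    `rdm2` are shuffled together, which keeps its internal structure while
    breaking the category correspondence with `rdm1`.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.tril_indices_from(rdm1, k=-1)
    vec1 = rdm1[rows, cols]
    observed, _ = spearmanr(vec1, rdm2[rows, cols])

    n = rdm1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        shuffled = rdm2[np.ix_(perm, perm)]
        rho_perm, _ = spearmanr(vec1, shuffled[rows, cols])
        if rho_perm >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Toy demo: two related distance matrices built from noisy copies of the same points
rng = np.random.default_rng(1)
points = rng.normal(size=(12, 4))
noisy = points + 0.2 * rng.normal(size=points.shape)
rdm_a = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
rdm_b = np.linalg.norm(noisy[:, None] - noisy[None, :], axis=-1)

rho, p = mantel_spearman(rdm_a, rdm_b, n_perm=200)
print(f"Spearman rho = {rho:.3f}, one-sided permutation p = {p:.4f}")
```

With `n_perm` permutations the smallest reportable p-value is `1 / (n_perm + 1)`, so the number of permutations sets the resolution of the test.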
