# <center><b> Correlations Between Variables <br> Continuous vs Continuous </b></center>

## Summary Table

| Correlation                                | Range     | Type of Relationship Captured | Sensitivity to Outliers | Distinguishing Feature                                                         |
| ------------------------------------------ | --------- | ----------------------------- | ----------------------- | ------------------------------------------------------------------------------ |
| **Pearson**                                | [-1, 1]   | Linear                        | **High**                | Measures pure linear association; optimal under normality                      |
| **Spearman (ρ)**                           | [-1, 1]   | Monotonic (linear or not)     | Medium                  | Pearson applied to ranks; captures non-linear monotonic relationships          |
| **Kendall τ-b**                            | [-1, 1]   | Monotonic                     | Low–Medium              | Based on concordant/discordant pairs; corrects for ties                        |
| **Kendall τ-c**                            | [-1, 1]   | Monotonic                     | Low–Medium              | Variant for non-square contingency tables; rarely used with continuous data    |
| **Distance Correlation**                   | [0, 1]    | General dependence            | Medium–High             | Population value is 0 **if and only if** the variables are independent         |
| **Hoeffding’s D**                          | [-0.5, 1] | General dependence            | Medium                  | Detects any departure from independence (continuous data)                      |
| **Blomqvist’s Beta**                       | [-1, 1]   | Central dependence            | **Low**                 | Uses only signs relative to the medians; ignores the tails                     |
| **MIC (Maximal Information Coefficient)**  | [0, 1]    | General functional dependence | Medium                  | Searches for the best grid; invariant to monotonic transformations             |
| **Biweight Midcorrelation**                | [-1, 1]   | Linear (robust)               | **Low**                 | Robust Pearson with median/MAD-based weights                                   |
| **Winsorized Correlation**                 | [-1, 1]   | Linear (robust)               | Low–Medium              | Replaces extreme values with the nearest retained value, then applies Pearson  |
| **Partial Correlation**                    | [-1, 1]   | Conditional linear            | Medium                  | X–Y relationship controlling for other variables                               |
| **Copula-based Correlation (τ)**           | [-1, 1]   | Monotonic                     | Low                     | Margin-free dependence via copulas                                             |
| **Quadrant Correlation (Q)**               | [-1, 1]   | Coarse monotonic              | **Very low**            | Uses only signs relative to the median                                         |
| **Percentage Bend Correlation (percbend)** | [-1, 1]   | Linear (robust)               | **Very low**            | Bounds the influence of extreme values                                         |
| **Shepherd’s Pi**                          | [-1, 1]   | Monotonic (robust)            | **Very low**            | Removes multivariate outliers, then computes Spearman                          |
| **Skipped Correlation**                    | [-1, 1]   | Monotonic (robust)            | **Very low**            | Detects and drops outliers automatically, then correlates the remaining points |


## Libraries

```python
# pip install numpy==2.2.2 --force-reinstall
```

```python
import base64
import os
import tempfile
from itertools import combinations

import dcor
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pingouin as pg
from copulas.bivariate import Clayton, Frank
from IPython.display import HTML
from matplotlib.animation import FuncAnimation, PillowWriter
from obscure_stats.association import blomqvist_beta
from scipy.stats import kendalltau, pearsonr, rankdata, spearmanr
from scipy.stats.mstats import winsorize
from statsmodels.distributions.empirical_distribution import ECDF
```

## Functions

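```python
def hoeffding_d_vec(x, y):
    """Hoeffding's D statistic (30x scale, range [-0.5, 1]) with tie corrections.
    O(n^2): for each point, all other points are compared against it. Requires n >= 5."""
    R, S = rankdata(np.asarray(x)), rankdata(np.asarray(y))
    N = len(R)
    Q = np.zeros(N)
    for i in range(N):
        # Q_i: 1 + number of points below and to the left of point i,
        # with partial credit for points tied with it in R and/or S
        Q[i] = (1
                + np.sum((R < R[i]) & (S < S[i]))
                + 0.25 * (np.sum((R == R[i]) & (S == S[i])) - 1)
                + 0.5 * np.sum((R == R[i]) & (S < S[i]))
                + 0.5 * np.sum((R < R[i]) & (S == S[i])))
    D1 = np.sum((Q - 1) * (Q - 2))
    D2 = np.sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
    D3 = np.sum((R - 2) * (S - 2) * (Q - 1))
    return 30 * ((N - 2) * (N - 3) * D1 + D2 - 2 * (N - 2) * D3) / (N * (N - 1) * (N - 2) * (N - 3) * (N - 4))
```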

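```python
def mic_corr_aprox(x, y, max_bins=30):
    """Rough MIC approximation: best normalized mutual information over
    equal-frequency (quantile) grids with bx in [2, max_bins] and by in {2, 3, 4}."""
    x, y = np.asarray(x), np.asarray(y)
    best = 0.0

    for bx in range(2, max_bins + 1):
        qx = np.quantile(x, np.linspace(0, 1, bx + 1))  # equal-frequency edges for x
        for by in [2, 3, 4]:
            qy = np.quantile(y, np.linspace(0, 1, by + 1))

            hist, _, _ = np.histogram2d(x, y, bins=[qx, qy])
            pxy = hist / hist.sum()  # joint cell probabilities
            px = pxy.sum(axis=1)     # marginal distribution of x
            py = pxy.sum(axis=0)     # marginal distribution of y

            mi = 0.0
            for i in range(bx):
                for j in range(by):
                    if pxy[i, j] > 0:
                        mi += pxy[i, j] * np.log2(pxy[i, j] / (px[i] * py[j]))

            # Normalize by log2(min(bx, by)), the largest MI attainable on this grid
            mi /= np.log2(min(bx, by))
            best = max(best, mi)

    return min(best, 1.0)
```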

```python
def quadrant_correlation(x, y):
    """Mean of sign(x - median(x)) * sign(y - median(y))."""
    x, y = np.asarray(x), np.asarray(y)
    sx = np.sign(x - np.median(x))
    sy = np.sign(y - np.median(y))
    return np.sum(sx * sy) / len(x)
```

```python
def animar_correlaciones_final(metric_fun, metric_name="Metric", color="tab:blue",
                               n_frames=60, n_points=160, max_dispersion=1.2,
                               outlier_frac=0.2, figsize=(9, 6),
                               duration=300, random_seed=42):
    """Animate metric_fun(x, y) on linear, monotonic and curved relationships,
    without (top row) and with (bottom row) outliers, while the noise decays.
    Returns the animation as an inline GIF (IPython HTML object)."""
    np.random.seed(random_seed)
    x = np.linspace(-3, 3, n_points)
    # Saturating (monotonic, non-linear) curve, rescaled to [-3, 2.6]
    y_mono = 3 * (1 - np.exp(-1.8 * (x + 3))) / (1 - np.exp(-1.8 * 6)) - 3
    y_mono = (y_mono - y_mono.min()) / (y_mono.max() - y_mono.min()) * 5.6 - 3
    relaciones = {
        "Linear": x,
        "Monotonic": y_mono,
        "Curved": -(x / 1.4) ** 2 + 2,
    }
    # Noise standard deviation per frame, decaying exponentially
    ruido_levels = max_dispersion * np.exp(-np.linspace(0, 4, n_frames))
    fig, axes = plt.subplots(2, 3, figsize=figsize)
    plt.subplots_adjust(wspace=0.25, hspace=0.35)
    scatters, labels = [], []
    for i, (k, y_true) in enumerate(relaciones.items()):
        for j in range(2):
            ax = axes[j, i]
            ax.set_xlim(x.min() - 0.5, x.max() + 0.5)
            ax.set_ylim(-3.5, 3.5)
            # Dashed identity line as a visual reference
            ax.plot([x.min() - 0.5, x.max() + 0.5],
                    [x.min() - 0.5, x.max() + 0.5], '--', color='gray', lw=1)
            title = f"{k}" + ("" if j == 0 else " + outliers")
            ax.set_title(title)
            sc = ax.scatter([], [], s=14, color=color, alpha=0.85)
            scatters.append((sc, y_true, title))
            # Text artist that shows the metric value on every frame
            labels.append(ax.text(0.02, 0.88, '', transform=ax.transAxes,
                                  fontsize=10, color=color, fontweight='bold'))
            ax.text(0.02, 0.95, metric_name, transform=ax.transAxes,
                    fontsize=9, color=color)

    def update(i):
        out = []
        for idx, (sc, y_true, title) in enumerate(scatters):
            y = y_true + np.random.normal(0, ruido_levels[i], size=y_true.shape)
            if "outliers" in title:
                # Shift a random fraction of the points by up to ±6 units
                n_out = int(len(y) * outlier_frac)
                idxs = np.random.choice(len(y), n_out, replace=False)
                y[idxs] += np.random.uniform(-6, 6, n_out)
            sc.set_offsets(np.c_[x, y])
            out.append(sc)
            # Accept metrics returning a scalar, a (statistic, p-value) pair or an array
            mf = metric_fun(x, y)
            val = mf[0] if isinstance(mf, (tuple, list, np.ndarray)) else mf
            labels[idx].set_text(f"{val:.2f}")
            out.append(labels[idx])
        return out

    anim = FuncAnimation(fig, update, frames=n_frames, interval=duration, blit=True)
    # Render to a temporary GIF and embed it as base64 so it displays inline
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".gif")
    tmp.close()
    anim.save(tmp.name, writer=PillowWriter(fps=1000 / duration))
    plt.close(fig)
    with open(tmp.name, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    os.unlink(tmp.name)
    return HTML(f'<img src="data:image/gif;base64,{b64}">')
```

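## Data

Three jointly normal variables with unit variances and pairwise correlations 0.75 (X1–X2), 0.88 (X1–X3) and 0.86 (X2–X3). X1 and X2 are compared throughout. X3 serves as the control variable in the partial correlation.

```python
np.random.seed(2)

mu = [0, 0, 0]
# With unit variances, the covariance matrix equals the correlation matrix
sigma = [
    [1.00, 0.75, 0.88],
    [0.75, 1.00, 0.86],
    [0.88, 0.86, 1.00]
]

X1, X2, X3 = np.random.multivariate_normal(mu, sigma, size=1000).T
df = pd.DataFrame({'X1': X1, 'X2': X2, 'X3': X3})
df.head(3)
```

|   | X1        | X2        | X3        |
| - | --------- | --------- | --------- |
| 0 | 0.666247  | 0.639946  | -0.104837 |
| 1 | -2.015213 | -0.759732 | -1.841398 |
| 2 | -0.741239 | 0.111497  | -0.773606 |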


## Pearson Correlation

```python
r, p_value = pearsonr(df['X1'], df['X2'])
print("Pearson coefficient:", r)
print("p-value:", p_value)
```

```
Pearson coefficient: 0.7632516610658809
p-value: 1.5823679175509708e-191
```

```python
animar_correlaciones_final(metric_fun=pearsonr, metric_name='Pearson', color='tab:blue', n_frames=100, duration=300)
```
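Pearson's r is the covariance divided by the product of the standard deviations. The sketch below uses simulated data (seed, sample size and slope are arbitrary choices). It checks that formula against `pearsonr` and shows how a single extreme value pulls r down, which illustrates the high outlier sensitivity listed in the summary table.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)  # population r = 0.8

# r = cov(x, y) / (sd(x) * sd(y)), written with centered vectors
xc, yc = x - x.mean(), y - y.mean()
r_manual = np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))
r_scipy = pearsonr(x, y)[0]

# One extreme value is enough to drag r down
y_out = y.copy()
y_out[0] = 50.0
r_out = pearsonr(x, y_out)[0]

print(r_manual, r_scipy, r_out)
```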

## Spearman Correlation

```python
rho, p_value = spearmanr(df['X1'], df['X2'])
print("Spearman coefficient:", rho)
print("p-value:", p_value)
```

```
Spearman coefficient: 0.7482076242076242
p-value: 5.0357046752749395e-180
```

```python
animar_correlaciones_final(metric_fun=spearmanr, metric_name='Spearman', color='#8B0000', n_frames=100, duration=300)
```
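Spearman's ρ is Pearson's r applied to ranks. The check below uses a strictly increasing but strongly non-linear relationship (y = e^(2x), an illustrative choice). Both routes give ρ = 1, while Pearson's r on the raw values stays well below 1.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.exp(2 * x)  # strictly increasing in x, strongly non-linear

rho_ranks, _ = pearsonr(rankdata(x), rankdata(y))  # Pearson on ranks
rho_scipy, _ = spearmanr(x, y)
r_raw, _ = pearsonr(x, y)                          # Pearson on raw values
print(rho_ranks, rho_scipy, r_raw)
```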

## Kendall Correlation

### Tau-b

```python
tau, p_value = kendalltau(df['X1'], df['X2'], variant='b')
print("Kendall tau-b coefficient:", tau)
print("p-value:", p_value)
```

```
Kendall tau-b coefficient: 0.5516676676676677
p-value: 2.0415305464214838e-150
```

τ-b is lower than Pearson's r even though both describe the same dependence. For bivariate normal data, τ = (2/π)·arcsin(ρ), and (2/π)·arcsin(0.76) ≈ 0.55.

```python
def kendall_tauB(x, y):
    # Return only the statistic so the animation can display it
    tau, _ = kendalltau(x, y, variant='b')
    return tau

animar_correlaciones_final(metric_fun=kendall_tauB, metric_name='Kendall B', color='#2ECC71', n_frames=100, duration=300)
```
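Without ties, τ-b reduces to (C − D) / (n(n − 1)/2), where C and D count concordant and discordant pairs. A brute-force O(n²) sketch on small simulated data reproduces `kendalltau`:

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
x = rng.normal(size=60)
y = x + rng.normal(size=60)

concordant = discordant = 0
for i, j in combinations(range(len(x)), 2):
    # +1 if the pair is ordered the same way in x and y, -1 if opposite
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    if s > 0:
        concordant += 1
    elif s < 0:
        discordant += 1

n_pairs = len(x) * (len(x) - 1) / 2
tau_manual = (concordant - discordant) / n_pairs
tau_scipy, _ = kendalltau(x, y, variant='b')
print(tau_manual, tau_scipy)
```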

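### Tau-c

```python
tau, p_value = kendalltau(df['X1'], df['X2'], variant='c')
print("Kendall tau-c coefficient:", tau)
print("p-value:", p_value)
```

```
Kendall tau-c coefficient: 0.5516676676676676
p-value: 2.0415305464214838e-150
```

With continuous data and no ties, τ-c coincides with τ-b. τ-c divides C − D by n²(m − 1)/(2m), where m is the smaller number of distinct values of either variable. Here m = n, which gives the same denominator n(n − 1)/2.

```python
def kendall_tauC(x, y):
    # Return only the statistic so the animation can display it
    tau, _ = kendalltau(x, y, variant='c')
    return tau

animar_correlaciones_final(metric_fun=kendall_tauC, metric_name='Kendall C', color='#E67E22', n_frames=100, duration=300)
```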
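τ-b and τ-c only differ when there are ties and the implied contingency table is not square. The example below builds a 3 × 2 table from ordinal data (made-up values for illustration) where the two diverge. It then repeats the comparison on a continuous sample, where they coincide.

```python
import numpy as np
from scipy.stats import kendalltau

# Ordinal data forming a 3 x 2 contingency table (many ties)
x = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
y = np.array([1, 1, 2, 1, 2, 2, 2, 2, 2, 1])
tau_b, _ = kendalltau(x, y, variant='b')
tau_c, _ = kendalltau(x, y, variant='c')
print(tau_b, tau_c)  # tau_b ≈ 0.320, tau_c = 0.36

# Continuous data: no ties, so both variants agree
rng = np.random.default_rng(3)
u = rng.normal(size=200)
v = u + rng.normal(size=200)
tb_cont, _ = kendalltau(u, v, variant='b')
tc_cont, _ = kendalltau(u, v, variant='c')
print(tb_cont, tc_cont)
```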

## Distance Correlation (dCor)

```python
dcor_value = dcor.distance_correlation(df['X1'], df['X2'])
print("Distance Correlation:", dcor_value)
```

```
Distance Correlation: 0.7157479624134814
```

```python
animar_correlaciones_final(metric_fun=dcor.distance_correlation, metric_name='dCor', color='#9B59B6', n_frames=100, duration=300)
```
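Distance correlation is built from double-centered matrices of pairwise distances. The sketch below implements the sample (V-statistic) version with NumPy. `distance_correlation_sketch` is a hypothetical helper, and `dcor.distance_correlation` is assumed to use the same estimator by default. On y = x² with x symmetric around 0, Pearson's r is near 0 while dCor is clearly positive.

```python
import numpy as np

def distance_correlation_sketch(x, y):
    """Sample distance correlation (V-statistic) via double-centered distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances in y
    # Double centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=400)
y = x**2  # dependent, but (nearly) uncorrelated

r_xy = np.corrcoef(x, y)[0, 1]
dcor_xy = distance_correlation_sketch(x, y)
print(r_xy, dcor_xy)
```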

## Hoeffding’s D

```python
Hoeffding = hoeffding_d_vec(df['X1'], df['X2'])
print(f"Hoeffding's D: {Hoeffding}")
```

```
Hoeffding's D: 0.2199775410756565
```

```python
animar_correlaciones_final(metric_fun=hoeffding_d_vec, metric_name='Hoeffding’s', color='#1ABC9C', n_frames=100, duration=300)
```
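Despite its name, `hoeffding_d_vec` loops over the points. The same statistic, with the same tie handling, can be computed with broadcasting at the cost of O(n²) memory. `hoeffding_d_broadcast` is a hypothetical name. The checks use three cases with known values: D = 1 for y = x, D ≈ 0 under independence, and D ≈ 0.25 (its population value) for y = x² with x uniform on (−1, 1).

```python
import numpy as np
from scipy.stats import rankdata

def hoeffding_d_broadcast(x, y):
    """Hoeffding's D (30x scale) with the same tie terms as hoeffding_d_vec, vectorized."""
    R = rankdata(np.asarray(x, dtype=float))
    S = rankdata(np.asarray(y, dtype=float))
    N = len(R)
    # Entry [i, j] compares point j against point i
    r_lt, r_eq = R[None, :] < R[:, None], R[None, :] == R[:, None]
    s_lt, s_eq = S[None, :] < S[:, None], S[None, :] == S[:, None]
    Q = (1 + (r_lt & s_lt).sum(axis=1)
         + 0.25 * ((r_eq & s_eq).sum(axis=1) - 1)
         + 0.5 * (r_eq & s_lt).sum(axis=1)
         + 0.5 * (r_lt & s_eq).sum(axis=1))
    D1 = np.sum((Q - 1) * (Q - 2))
    D2 = np.sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
    D3 = np.sum((R - 2) * (S - 2) * (Q - 1))
    return 30 * ((N - 2) * (N - 3) * D1 + D2 - 2 * (N - 2) * D3) / (
        N * (N - 1) * (N - 2) * (N - 3) * (N - 4))

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, size=300)
d_self = hoeffding_d_broadcast(x, x)                       # perfect monotone dependence
d_indep = hoeffding_d_broadcast(x, rng.uniform(size=300))  # independent
d_parab = hoeffding_d_broadcast(x, x**2)                   # non-monotone dependence
print(d_self, d_indep, d_parab)
```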

## Blomqvist’s Beta

```python
beta = blomqvist_beta(df['X1'], df['X2'])
print(f"Blomqvist's Beta: {beta}")
```

```
Blomqvist's Beta: 0.54
```

```python
animar_correlaciones_final(metric_fun=blomqvist_beta, metric_name='Blomqvist’s', color='#E84393', n_frames=100, duration=300)
```
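Blomqvist’s β depends only on the copula C at the center point (½, ½). For a bivariate normal with correlation ρ, Sheppard's formula gives C(½, ½) = ¼ + arcsin(ρ)/(2π), which yields a closed form. With the ρ = 0.75 used for X1–X2 in the Data section, it matches the printed 0.54:

```latex
\beta = 4\,C\!\left(\tfrac12,\tfrac12\right) - 1,
\qquad
C^{\text{Gauss}}_{\rho}\!\left(\tfrac12,\tfrac12\right) = \frac14 + \frac{\arcsin\rho}{2\pi}
\;\Longrightarrow\;
\beta = \frac{2}{\pi}\arcsin\rho,
\qquad
\frac{2}{\pi}\arcsin(0.75) \approx 0.540 .
```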

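## Maximal Information Coefficient (MIC)

`mic_corr_aprox` is only a rough approximation of the MIC of Reshef et al. It scans equal-frequency grids (bx from 2 to `max_bins`, by ∈ {2, 3, 4}) instead of optimizing the partitions. It also does not enforce the usual grid-size limit bx·by ≤ n^0.6. The reference estimator is implemented, for example, in the `minepy` package.

```python
mic_rest = mic_corr_aprox(df['X1'], df['X2'], max_bins=20)
print("MIC (approx.):", mic_rest)
```

```
MIC (approx.): 0.35145314066189115
```

```python
animar_correlaciones_final(metric_fun=mic_corr_aprox, metric_name='MIC Approx', color='#34495E', n_frames=100, duration=300)
```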
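The inner step of `mic_corr_aprox` is the mutual information of a 2-D histogram on equal-frequency bins. A vectorized version of that step (`grid_mutual_information` is a hypothetical helper) makes the normalization concrete: identical variables reach the maximum of 1, and independent ones stay near 0.

```python
import numpy as np

def grid_mutual_information(x, y, bx, by):
    """MI (bits) on a bx-by-by equal-frequency grid, divided by log2(min(bx, by))."""
    qx = np.quantile(x, np.linspace(0, 1, bx + 1))
    qy = np.quantile(y, np.linspace(0, 1, by + 1))
    hist, _, _ = np.histogram2d(x, y, bins=[qx, qy])
    pxy = hist / hist.sum()
    # Outer product of the marginals: the joint distribution under independence
    pind = np.outer(pxy.sum(axis=1), pxy.sum(axis=0))
    nz = pxy > 0  # empty cells contribute nothing
    mi = np.sum(pxy[nz] * np.log2(pxy[nz] / pind[nz]))
    return mi / np.log2(min(bx, by))

rng = np.random.default_rng(6)
x = rng.normal(size=2000)
mi_self = grid_mutual_information(x, x, 4, 4)                       # identical variables
mi_indep = grid_mutual_information(x, rng.normal(size=2000), 4, 4)  # independent
print(mi_self, mi_indep)
```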

## Biweight Midcorrelation

```python
result = pg.corr(x=df['X1'], y=df['X2'], method='bicor')
print("Biweight Midcorrelation: ")
print(result)
```

```
Biweight Midcorrelation: 
          n         r         CI95%          p-val  power
bicor  1000  0.752515  [0.72, 0.78]  3.117442e-183    1.0
```

```python
def Biweight_corr(x, y):
    result = pg.corr(x, y, method='bicor')
    return result['r'].values

animar_correlaciones_final(metric_fun=Biweight_corr, metric_name='Biweight', color='#27AE60', n_frames=100, duration=300)
```
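The biweight midcorrelation centers each variable at its median and scales it by the MAD. It then down-weights points with Tukey's biweight, which gives zero weight beyond c MADs. The sketch below follows that definition and assumes c = 9 with the unscaled MAD, the usual choice. Pingouin's internals may differ in details, and `bicor_sketch` is a hypothetical name.

```python
import numpy as np

def bicor_sketch(x, y, c=9.0):
    """Biweight midcorrelation from its usual definition (median, raw MAD, c = 9)."""
    def weighted_dev(v):
        v = np.asarray(v, dtype=float)
        med = np.median(v)
        mad = np.median(np.abs(v - med))     # raw (unscaled) MAD
        u = (v - med) / (c * mad)
        w = (1 - u**2) ** 2 * (np.abs(u) < 1)  # zero weight beyond c MADs
        return (v - med) * w
    a, b = weighted_dev(x), weighted_dev(y)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)
y_out = y.copy()
y_out[:5] = 100.0  # a few gross outliers

r_bicor_out = bicor_sketch(x, y_out)
r_pear_out = np.corrcoef(x, y_out)[0, 1]
print(bicor_sketch(x, y), r_bicor_out, r_pear_out)
```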

## Winsorized Correlation

```python
# The lowest and highest 5% of each variable are set to the nearest retained value
x_win = winsorize(df['X1'].copy(), limits=[0.05, 0.05])
y_win = winsorize(df['X2'].copy(), limits=[0.05, 0.05])

corr, p_value = pearsonr(x_win, y_win)
print("Winsorized correlation:", corr)
print("p-value:", p_value)
```

```
Winsorized correlation: 0.7490846927391712
p-value: 1.1325460403437327e-180
```

```python
def Winsorized_corr(x, y):
    x_win = winsorize(x.copy(), limits=[0.05, 0.05])
    y_win = winsorize(y.copy(), limits=[0.05, 0.05])
    corr, _ = pearsonr(x_win, y_win)
    return corr

animar_correlaciones_final(metric_fun=Winsorized_corr, metric_name='Winsorized Corr', color='#D35400', n_frames=100, duration=300)
```
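Winsorizing replaces values; it does not remove them. With `limits=[0.05, 0.05]`, the lowest and highest 5% of values are set to the nearest value that is kept. A tiny example makes this visible:

```python
import numpy as np
from scipy.stats.mstats import winsorize

a = np.arange(1, 21, dtype=float)     # 1, 2, ..., 20
w = winsorize(a, limits=[0.05, 0.05])  # 5% of 20 = 1 value on each side

# The minimum becomes 2 and the maximum becomes 19; the length is unchanged
print(np.asarray(w))
```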

## Partial Correlation

Correlation between X1 and X2 after removing the linear effect of X3 from both.

```python
pcorr = pg.partial_corr(data=df, x='X1', y='X2', covar=['X3'])
print(pcorr)
```

```
            n         r          CI95%     p-val
pearson  1000  0.001563  [-0.06, 0.06]  0.960652
```
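A partial correlation is the correlation of the residuals left after regressing X1 and X2 on X3. The sketch below re-simulates data with the same `sigma`, but with NumPy's `default_rng`, so the numbers differ from `df`. It checks the residual route against the closed-form expression. The population value implied by `sigma` is about −0.028. With n = 1000, the standard error is roughly 1/√n ≈ 0.03, so the r ≈ 0.0016 reported by pingouin is consistent with it.

```python
import numpy as np

rng = np.random.default_rng(8)
sigma = np.array([[1.00, 0.75, 0.88],
                  [0.75, 1.00, 0.86],
                  [0.88, 0.86, 1.00]])
X1, X2, X3 = rng.multivariate_normal(np.zeros(3), sigma, size=1000).T

# Route 1: correlate residuals of X1 ~ X3 and X2 ~ X3 (OLS with intercept)
Z = np.column_stack([np.ones_like(X3), X3])
res1 = X1 - Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]
res2 = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]
r_resid = np.corrcoef(res1, res2)[0, 1]

# Route 2: closed form from the pairwise sample correlations
r = np.corrcoef([X1, X2, X3])
r_formula = (r[0, 1] - r[0, 2] * r[1, 2]) / np.sqrt((1 - r[0, 2]**2) * (1 - r[1, 2]**2))

# Population partial correlation implied by sigma
rho_pop = (0.75 - 0.88 * 0.86) / np.sqrt((1 - 0.88**2) * (1 - 0.86**2))
print(r_resid, r_formula, rho_pop)  # rho_pop ≈ -0.028
```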


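## Copula-based Correlation

The values printed in this section and the following ones (copula τ = 1.0, Q = 1.0, percentage bend r ≈ 0.9997, Shepherd’s and skipped r = 1.0) do not match the `df` built in the Data section, where X1 and X2 have Pearson r ≈ 0.76 and Kendall's τ ≈ 0.55. They are not comparable with the earlier sections until these cells are re-run on that data.

```python
# Pseudo-observations in (0, 1]: empirical CDF of each variable
u = ECDF(df['X1'])(df['X1'])
v = ECDF(df['X2'])(df['X2'])
data = np.column_stack([u, v])

copula = Frank()
copula.fit(data)
print("Copula-based:", copula.tau)
```

```
Copula-based: 1.0
```

```python
def Copula_based_corr(x, y):
    u = ECDF(x)(x)
    v = ECDF(y)(y)
    data = np.column_stack([u, v])
    copula = Frank()
    copula.fit(data)
    return copula.tau

animar_correlaciones_final(metric_fun=Copula_based_corr, metric_name='Copula-based', color='#F39C12', n_frames=100, duration=300)
```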
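Kendall's τ depends only on ranks, so the pseudo-observation transform leaves it unchanged. For the Frank family, τ has the closed form τ = 1 − (4/θ)(1 − D₁(θ)), where D₁ is the first Debye function. If `Frank.fit` estimates θ by inverting this relation from the empirical τ (an assumption about the `copulas` internals), `copula.tau` should be close to the Kendall τ-b computed earlier (≈ 0.55). The sketch uses ranks/(n + 1) as pseudo-observations, a common convention that keeps values strictly inside (0, 1). The helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import kendalltau, rankdata

def frank_tau(theta):
    """Kendall's tau implied by a Frank copula with parameter theta > 0."""
    integrand = lambda t: t / np.expm1(t) if t != 0 else 1.0
    debye1 = quad(integrand, 0, theta)[0] / theta  # first Debye function D1(theta)
    return 1 - 4 / theta * (1 - debye1)

def frank_theta_from_tau(tau):
    """Invert frank_tau numerically (bracket valid for roughly 0 < tau < 0.96)."""
    return brentq(lambda th: frank_tau(th) - tau, 1e-6, 100)

rng = np.random.default_rng(9)
x = rng.normal(size=500)
y = x + rng.normal(size=500)

# Pseudo-observations strictly inside (0, 1)
u = rankdata(x) / (len(x) + 1)
v = rankdata(y) / (len(y) + 1)

tau_raw, _ = kendalltau(x, y)
tau_pseudo, _ = kendalltau(u, v)  # same value: tau only depends on ranks
theta = frank_theta_from_tau(tau_pseudo)
print(tau_raw, tau_pseudo, theta)
```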

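## Quadrant Correlation (Q)

Q is the average product of the signs of each variable relative to its median. With an even n and no observations tied at the medians, this is the usual sample version of Blomqvist’s β. On this `df`, it should therefore be close to the 0.54 obtained in that section.

```python
quadrant_correlation(df['X1'], df['X2'])
```

```
np.float64(1.0)
```

```python
animar_correlaciones_final(metric_fun=quadrant_correlation, metric_name='Quadrant Corr', color='#16A085', n_frames=100, duration=300)
```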
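Q only uses signs around the medians. A single gross outlier can therefore change it by at most 4/n: its own sign can flip, and one other point can cross the shifted median. Pearson's r, by contrast, can collapse. The sketch below uses simulated data with one corrupted observation; `quadrant_corr_sketch` is a hypothetical helper with the same definition as `quadrant_correlation`.

```python
import numpy as np

def quadrant_corr_sketch(x, y):
    # Mean product of the signs relative to each median
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean(np.sign(x - np.median(x)) * np.sign(y - np.median(y)))

rng = np.random.default_rng(10)
n = 1000
x = rng.normal(size=n)
y = 0.75 * x + np.sqrt(1 - 0.75**2) * rng.normal(size=n)
q_clean, r_clean = quadrant_corr_sketch(x, y), np.corrcoef(x, y)[0, 1]

# Replace a single observation with a gross outlier
y_bad = y.copy()
y_bad[0] = 1e6
q_bad, r_bad = quadrant_corr_sketch(x, y_bad), np.corrcoef(x, y_bad)[0, 1]

print(q_clean, q_bad)
print(r_clean, r_bad)
```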

## Percentage Bend Correlation

`beta` (0.2 here, which is also pingouin's default) controls how many observations are bent. Deviations from the median larger than the (1 − β) quantile of the absolute deviations are clipped.

```python
res = pg.corr(df['X1'], df['X2'], method='percbend', beta=0.2)
print("Percentage bend correlation")
print(res)
```

```
Percentage bend correlation
             n         r       CI95%  p-val  power
percbend  1000  0.999686  [1.0, 1.0]    0.0    1.0
```

```python
def percbend_corr(x, y):
    result = pg.corr(x, y, method='percbend')
    return result['r'].values

animar_correlaciones_final(metric_fun=percbend_corr, metric_name='Perc. Bend', color='#C0392B', n_frames=100, duration=300)
```
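Below is a sketch of the percentage bend estimator following Wilcox's usual definition. Each variable is standardized around a robust location φ and scaled by ω, the (1 − β) quantile of the absolute deviations from the median. The result is clipped to [−1, 1] before applying the Pearson formula. Details such as how the quantile index m is rounded are assumptions and may differ from pingouin, and `percbend_sketch` is a hypothetical name.

```python
import numpy as np

def percbend_sketch(x, y, beta=0.2):
    """Percentage bend correlation, sketched from its usual definition."""
    def bend(v):
        v = np.asarray(v, dtype=float)
        n = len(v)
        med = np.median(v)
        m = int((1 - beta) * n)
        omega = np.sort(np.abs(v - med))[m - 1]  # m-th smallest |v - median|
        psi = (v - med) / omega
        low, high = psi < -1, psi > 1
        i1, i2 = low.sum(), high.sum()
        # Robust location: non-extreme values plus omega for each extreme side
        phi = (omega * (i2 - i1) + v[~(low | high)].sum()) / (n - i1 - i2)
        return np.clip((v - phi) / omega, -1, 1)  # the "bend"
    a, b = bend(x), bend(y)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

rng = np.random.default_rng(12)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)
y_out = y.copy()
y_out[0] = 1e6  # one gross outlier

r_pb, r_pb_out = percbend_sketch(x, y), percbend_sketch(x, y_out)
r_pear_out = np.corrcoef(x, y_out)[0, 1]
print(r_pb, r_pb_out, r_pear_out)
```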

## Shepherd’s Pi Correlation

Shepherd’s π removes outliers detected with bootstrapped Mahalanobis distances and then computes Spearman's ρ on the remaining points. Pingouin reports the number of removed points in the `outliers` column.

```python
res = pg.corr(df['X1'], df['X2'], method='shepherd')
print("Shepherd’s pi correlation")
print(res)
```

```
Shepherd’s pi correlation
             n  outliers    r       CI95%  p-val  power
shepherd  1000        54  1.0  [1.0, 1.0]    0.0    1.0
```

```python
def shepherd_corr(x, y):
    result = pg.corr(x, y, method='shepherd')
    return result['r'].values

animar_correlaciones_final(metric_fun=shepherd_corr, metric_name='Shepherd', color='#7F8C8D', n_frames=100, duration=300)
```
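The core idea can be sketched in a few lines: screen bivariate outliers with Mahalanobis distances, then compute Spearman's ρ on what remains. This is a simplified illustration, not Shepherd’s π as implemented in pingouin, which uses bootstrapped distances. The plain (non-robust) covariance and the χ²(2) cutoff at 1 − α = 0.999 are assumptions, and the helper name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, spearmanr

def mahalanobis_filtered_spearman(x, y, alpha=0.001):
    """Drop points with squared Mahalanobis distance above the chi2(2) 1 - alpha
    quantile, then return Spearman's rho on the rest and the dropped indices."""
    X = np.column_stack([x, y]).astype(float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)  # squared distances
    keep = d2 <= chi2.ppf(1 - alpha, df=2)
    rho, _ = spearmanr(X[keep, 0], X[keep, 1])
    return rho, np.flatnonzero(~keep)

rng = np.random.default_rng(11)
x = rng.normal(size=300)
y = 0.8 * x + 0.6 * rng.normal(size=300)
# Three points that break the positive trend
x = np.append(x, [4.0, 4.2, 3.8])
y = np.append(y, [-4.0, -4.1, -3.9])

rho_raw, _ = spearmanr(x, y)
rho_clean, flagged = mahalanobis_filtered_spearman(x, y)
print(rho_raw, rho_clean, flagged)
```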

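## Skipped Correlation

In pingouin, the skipped correlation flags bivariate outliers with a projection method around a robust (MCD) center. It then correlates the remaining points, using Spearman by default.

```python
res = pg.corr(df['X1'], df['X2'], method='skipped')
print("Skipped correlation")
print(res)
```

```
Skipped correlation
            n  outliers    r       CI95%  p-val  power
skipped  1000        19  1.0  [1.0, 1.0]    0.0      1
```

```python
import warnings
warnings.filterwarnings("ignore")  # silence warnings raised while computing the metric on every frame

def skipped_corr(x, y):
    result = pg.corr(x, y, method='skipped')
    return result['r'].values

animar_correlaciones_final(metric_fun=skipped_corr, metric_name='Skipped Corr', color='#A04000', n_frames=100, duration=300)
```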