# $$\text{Pearson’s correlation coefficient}$$

What is Pearson Correlation?
Correlation measures how strongly two sets of data are related. The most common measure of correlation in statistics is the Pearson correlation, whose full name is the Pearson Product Moment Correlation (PPMC). It quantifies the strength and direction of the linear relationship between two sets of data and takes values between -1 and 1. In simple terms, it answers the question: how well can a straight line describe the data? Two letters are used to represent the Pearson correlation: the Greek letter rho (ρ) for a population and the letter “r” for a sample.
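To illustrate what "linear relationship" means here, the following minimal sketch compares a perfectly linear dependence with a perfectly deterministic but nonlinear one. The vectors `x`, `y_linear`, and `y_quadratic` are made up for illustration, and `cor` comes from Julia's `Statistics` standard library.

```julia
using Statistics  # standard library; provides cor

x = collect(-3:3)

# Perfectly linear relationship: y is a straight-line function of x
y_linear = 2 .* x .+ 1

# Deterministic but nonlinear relationship: y depends on x, yet not through a line
y_quadratic = x .^ 2

println(cor(x, y_linear))     # close to 1
println(cor(x, y_quadratic))  # close to 0
```

The second case shows that a Pearson correlation near zero rules out a linear trend only, not every kind of dependence.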

$$r_{xy}=\frac{s_{xy}}{s_xs_y}$$

$s_{x}$ and $s_{y}$ are the sample standard deviations, and $s_{xy}$ is the sample covariance.
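As a quick check of this definition, the sketch below computes the ratio directly with `cov` and `std` from Julia's `Statistics` standard library and compares it with the built-in `cor`. The data are the same values used in the example in this notebook; the names `r_def` and `r_builtin` are illustrative.

```julia
using Statistics  # standard library; provides cov, std, cor

x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]

# Definition: sample covariance divided by the product of the sample standard deviations
r_def = cov(x, y) / (std(x) * std(y))

# Built-in Pearson correlation, for comparison
r_builtin = cor(x, y)

println(r_def)
println(r_builtin)
```

Both values should agree up to floating-point rounding, because `cov` and `std` use the same $n-1$ normalization by default, and it cancels in the ratio.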

The explicit formula is

$$r_{xy}=\frac{n\Sigma x_iy_i-\Sigma x_i\Sigma y_i}{\sqrt{n\Sigma x_i^2-(\Sigma x_i)^2}\sqrt{n\Sigma y_i^2-(\Sigma y_i)^2}}$$
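The explicit formula follows from the definition by expanding the sample covariance and variances, assuming the usual $n-1$ normalization and writing $\bar{x}=\frac{1}{n}\Sigma x_i$ and $\bar{y}=\frac{1}{n}\Sigma y_i$:

$$s_{xy}=\frac{1}{n-1}\Sigma (x_i-\bar{x})(y_i-\bar{y})=\frac{n\Sigma x_iy_i-\Sigma x_i\Sigma y_i}{n(n-1)}$$

$$s_x^2=\frac{n\Sigma x_i^2-(\Sigma x_i)^2}{n(n-1)},\qquad s_y^2=\frac{n\Sigma y_i^2-(\Sigma y_i)^2}{n(n-1)}$$

Substituting into $r_{xy}=s_{xy}/(s_xs_y)$, the common factor $n(n-1)$ cancels between the numerator and the denominator, which leaves only the sums of $x_i$, $y_i$, $x_iy_i$, $x_i^2$, and $y_i^2$.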

# $\text{Example:}$

In [1]:
Edad=[43,21,25,42,57,59]               # age
Nivel_de_Glucosa=[99,65,79,75,87,81];  # glucose level

In [2]:
x=Edad
y=Nivel_de_Glucosa

########################################################

n=length(x)

# Numerator: n*Σ(x_i*y_i) - Σx_i*Σy_i
XIYI=Float64[]
for i in 1:n
    xiyi=x[i]*y[i]
    push!(XIYI,xiyi)
end

Σxiyi=sum(XIYI)

nΣxiyi=n*Σxiyi

ΣxiΣyi=sum(x)*sum(y)

P1=nΣxiyi-ΣxiΣyi

########################################################

# First denominator factor: sqrt(n*Σx_i² - (Σx_i)²)
XI²=Float64[]
for i in 1:n
    xi²=x[i]*x[i]
    push!(XI²,xi²)
end

Σxi²=sum(XI²)

nΣxi²=n*Σxi²

Σxi=sum(x)

ΣxiΣxi=Σxi^2

P2=sqrt(nΣxi²-ΣxiΣxi)

########################################################


# Second denominator factor: sqrt(n*Σy_i² - (Σy_i)²)
YI²=Float64[]
for i in 1:n
    yi²=y[i]*y[i]
    push!(YI²,yi²)
end

Σyi²=sum(YI²)

nΣyi²=n*Σyi²

Σyi=sum(y)

ΣyiΣyi=Σyi^2

P3=sqrt(nΣyi²-ΣyiΣyi)

########################################################

rxy=P1/(P2*P3)

0.5298089018901744
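The step-by-step computation above can also be written compactly with broadcasting. The following is a minimal sketch, not the notebook's original method; `pearson_r` is a hypothetical helper name introduced here for illustration.

```julia
# Hypothetical helper: explicit Pearson formula written with broadcasting
function pearson_r(x, y)
    n = length(x)
    n == length(y) || throw(ArgumentError("x and y must have the same length"))
    numerator = n * sum(x .* y) - sum(x) * sum(y)
    denominator = sqrt(n * sum(x .^ 2) - sum(x)^2) * sqrt(n * sum(y .^ 2) - sum(y)^2)
    return numerator / denominator
end

Edad = [43, 21, 25, 42, 57, 59]
Nivel_de_Glucosa = [99, 65, 79, 75, 87, 81]

println(pearson_r(Edad, Nivel_de_Glucosa))
```

For this data the result should match the value printed above, about 0.5298. Note that the explicit formula subtracts large sums, so for long vectors with large values a mean-centered computation is usually more robust numerically.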

In [3]:
rxy=round(rxy, digits=4)
print("The correlation coefficient is $rxy.")

The correlation coefficient is 0.5298.

# $$\text{Confidence intervals for } r_{xy}$$

In [5]:
r=0.629
n=40
z_crítica=1.96 # critical z value: ±1.96σ covers about 95% of a normal distribution
# (for comparison, ±1σ covers about 68%)

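z=0.5*log((1+r)/(1-r))   # Fisher z-transform of r
σz=sqrt(1/(n-3))         # approximate standard error of z

z_max=z+(σz*z_crítica)
z_min=z-(σz*z_crítica)

# Transform the limits back to the r scale (inverse Fisher transform, equal to tanh)
r_max=((exp(2*z_max)-1)/(exp(2*z_max)+1))
r_min=((exp(2*z_min)-1)/(exp(2*z_min)+1))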

print("rxy_min=$r_min,rxy_max=$r_max")

rxy_min=0.3948541003238791,rxy_max=0.7864211732076042
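The cell above applies the Fisher z-transformation: $z=\tfrac{1}{2}\ln\frac{1+r}{1-r}=\operatorname{artanh}(r)$ is approximately normal with standard error $1/\sqrt{n-3}$ (an approximation that assumes roughly bivariate normal data), and the limits are mapped back with $r=\tanh(z)$. A minimal sketch of the same calculation as a reusable function follows; `fisher_ci` and its keyword `z_crit` are hypothetical names introduced here, not part of the original notebook.

```julia
# Hypothetical helper: approximate confidence interval for r via the Fisher z-transform
function fisher_ci(r, n; z_crit=1.96)
    z = atanh(r)            # same as 0.5*log((1+r)/(1-r))
    σz = 1 / sqrt(n - 3)    # approximate standard error of z
    return tanh(z - z_crit * σz), tanh(z + z_crit * σz)
end

r_min, r_max = fisher_ci(0.629, 40)
println("rxy_min=$r_min, rxy_max=$r_max")
```

With `r=0.629` and `n=40` this should reproduce the interval printed above, roughly 0.395 to 0.786. The interval is not symmetric around `r`, because the transformation back to the r scale is nonlinear.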

## $$\text{END}$$

In [6]:
# A good reference for this: onlinestatbook

In [10]:
10^(-1.07823)

0.08351606045692339

In [9]:
10^(-1.46955)

0.033919543597565806