# Statistics of dependence

In many engineering and scientific applications, the problems being assessed involve multiple variables. In structural engineering, assessing the health of a structure means taking into account both the loads imposed on it and the decay of its building materials. In climate science, studying the effect of climate change on agricultural production requires considering the impact of temperature change, soil moisture and precipitation, amongst others, on vegetation.

These variables of interest are often "tied" to one another. When loads are imposed continuously on a bridge, the materials it is made of gradually decay. And as temperature increases, some of the soil moisture evaporates, which might affect precipitation later. How can we assess all these complex relations between variables of interest?

### Correlation

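A simple way to assess statistically whether two variables are related is their (linear) correlation, which describes how they change with respect to one another. One of the most popular measures is the Pearson correlation coefficient $r$, which ranges from $-1$ (perfect negative linear relationship) to $1$ (perfect positive linear relationship), with $0$ indicating no linear relationship: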

$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

Where:
- $X_i$ and  $Y_i$ are the individual data points,
- $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$,
- $n$ is the number of data points.
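
As a quick check of the formula above, the sketch below computes $r$ term by term and compares it with NumPy's built-in `np.corrcoef`. The helper name `pearson_r` and the small data set are hypothetical and chosen only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations X_i - mean(X)
    dy = y - y.mean()  # deviations Y_i - mean(Y)
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Hypothetical illustrative data (not taken from any real measurement)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_manual = pearson_r(x_demo, y_demo)
r_numpy = np.corrcoef(x_demo, y_demo)[0, 1]
print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Both values should agree up to floating-point rounding, since `np.corrcoef` evaluates the same quantity.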

In [None]:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact, FloatSlider

# Enable inline plotting for Jupyter Notebook
%matplotlib inline

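# Function to generate and plot data with a chosen target correlation
def plot_correlation(corr_value):
    # Generate data: mixing x with independent noise using the weights
    # corr_value and sqrt(1 - corr_value**2) keeps Var(y) = 1, so the
    # population Pearson correlation of (x, y) equals corr_value
    np.random.seed(42)
    x = np.random.randn(1000)
    noise = np.random.randn(1000)
    y = corr_value * x + np.sqrt(max(0.0, 1 - corr_value**2)) * noise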
    
    # Create a DataFrame
    df = pd.DataFrame({'X': x, 'Y': y})
    
    # Calculate and display the correlation coefficient
    correlation = df.corr().iloc[0, 1]
    print(f"Correlation coefficient: {correlation:.2f}")
    
    # Plot the data
    plt.figure(figsize=(8, 6))
    sns.scatterplot(x='X', y='Y', data=df)
    plt.title(f'Scatter Plot with Correlation: {correlation:.2f}')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.grid(True)
    plt.show()

# Interactive widget to adjust the correlation value
interact(plot_correlation, corr_value=FloatSlider(value=0.5, min=-1, max=1, step=0.1, description='Pearson Correlation'));



### Covariance

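Covariance is a measure of how two variables vary together, that is, of their joint variability. Its sign gives the direction of the linear relationship: positive when the variables tend to increase together, negative when one tends to decrease as the other increases. Its magnitude depends on the units and scale of the variables, so a large covariance does not by itself imply strong dependence. With $\mu_X$ and $\mu_Y$ denoting the means of $X$ and $Y$, the covariance is defined as: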

$$
\text{Cov}(X, Y) = \mathbb{E}[(X - \mu_X)(Y - \mu_Y)]
$$
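
The sketch below estimates the covariance from a sample and illustrates its dependence on scale: changing the units of one variable rescales the covariance but leaves the Pearson correlation unchanged. The helper `sample_covariance` and the synthetic variables `length_m` and `load` are hypothetical and used only for illustration. The estimator assumes the usual $n - 1$ denominator, which is also the default in `np.cov`.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample estimate of Cov(X, Y) using the n - 1 denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical synthetic data: a length in metres and a linearly related load
rng = np.random.default_rng(0)
length_m = rng.normal(10.0, 2.0, 500)
load = 3.0 * length_m + rng.normal(0.0, 1.0, 500)

# The same length expressed in centimetres
length_cm = 100.0 * length_m

cov_m = sample_covariance(length_m, load)
cov_cm = sample_covariance(length_cm, load)
corr_m = np.corrcoef(length_m, load)[0, 1]
corr_cm = np.corrcoef(length_cm, load)[0, 1]

print(f"covariance  (m): {cov_m:.3f}   (cm): {cov_cm:.3f}")
print(f"correlation (m): {corr_m:.3f}   (cm): {corr_cm:.3f}")
```

Dividing the covariance by the product of the two sample standard deviations recovers the Pearson correlation, which is why the correlation is unaffected by the change of units.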


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive

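# Function to generate and plot data with adjustable covariance
def plot_covariance(mean_x=0, mean_y=0, std_x=1, std_y=1, correlation=0):
    np.random.seed(42)
    # Draw two independent standard normal samples and mix them so that the
    # standardized variables have the requested correlation
    z1 = np.random.normal(0, 1, 1000)
    z2 = np.random.normal(0, 1, 1000)
    x = mean_x + std_x * z1
    y = mean_y + std_y * (correlation * z1 + np.sqrt(max(0.0, 1 - correlation**2)) * z2)

    # Calculate the sample covariance (off-diagonal entry of the 2x2 covariance
    # matrix); its population value here is correlation * std_x * std_y
    covariance = np.cov(x, y)[0, 1]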

    # Plotting the scatter plot of X vs Y
    plt.figure(figsize=(8, 6))
    plt.scatter(x, y, alpha=0.5)
    plt.title(f"Scatter Plot (Covariance: {covariance:.2f})", fontsize=14)
    plt.xlabel("X", fontsize=12)
    plt.ylabel("Y", fontsize=12)
    plt.grid(True)
    plt.show()

    return covariance


interactive_plot = interactive(plot_covariance, 
                               mean_x=(-5, 5, 1), 
                               mean_y=(-50, 50, 1), 
                               std_x=(0.1, 5, 0.1), 
                               std_y=(1, 10, 0.5), 
                               correlation=(-1, 1, 0.1))


interactive_plot



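In this plot we can see how the covariance responds as the spread of the variables and their correlation change. The two quantities are linked by $\text{Cov}(X, Y) = r \, \sigma_X \sigma_Y$, where $\sigma_X$ and $\sigma_Y$ are the standard deviations: covariance carries the units and scale of the variables, whereas correlation is normalized to the range $[-1, 1]$.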

### Copulas

While covariance and correlation give us an estimate for the 'relationship' between variables, they exhibit significant limitations. 
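
One such limitation can be seen in a minimal sketch: Pearson correlation only captures linear association, so two variables can be fully dependent and still have a correlation of essentially zero. The example below uses a hypothetical deterministic relation, $y = x^2$ on a grid symmetric around zero, chosen only for illustration.

```python
import numpy as np

# x on a grid symmetric around zero; y is completely determined by x
x_sym = np.linspace(-1.0, 1.0, 201)
y_sym = x_sym**2

# Pearson correlation and sample covariance of the pair
r_nonlinear = np.corrcoef(x_sym, y_sym)[0, 1]
cov_nonlinear = np.cov(x_sym, y_sym)[0, 1]

print(f"correlation: {r_nonlinear:.2e}, covariance: {cov_nonlinear:.2e}")
```

Both values are zero up to floating-point rounding even though knowing $x$ determines $y$ exactly, so a near-zero correlation or covariance should not be read as evidence of independence.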