# Parametric vs. non-parametric

### Testing for Normality

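```python
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

def check_normality(data, alpha=0.05):
    # Visual check - Q-Q plot of sample quantiles against normal quantiles
    stats.probplot(data, dist="norm", plot=plt)
    plt.title("Q-Q plot")
    plt.show()

    # Visual check - histogram with a kernel density estimate overlay
    sns.histplot(data, kde=True)
    plt.title("Distribution with KDE")
    plt.show()

    # Statistical test - Shapiro-Wilk (H0: the data come from a normal distribution)
    statistic, p_value = stats.shapiro(data)
    print(f"Shapiro-Wilk test: W = {statistic:.4f}, p-value = {p_value:.4f}")
    if p_value > alpha:
        print(f"p > {alpha}: no strong evidence against normality")
    else:
        print(f"p <= {alpha}: normality rejected")
    return statistic, p_value
```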

This function combines three complementary methods for checking normality:

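The Q-Q (quantile-quantile) plot provides a visual way to assess normality. It plots the quantiles of your data against the quantiles you would expect from a normal distribution. If the points fall close to the straight reference line, your data is approximately normal. Systematic deviations from this line suggest non-normality: an S-shaped pattern points to tails heavier or lighter than normal, and a consistent bow away from the line points to skewness. Outliers are easy to spot as isolated points far from the line at either end.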
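What the Q-Q plot shows can also be summarized numerically. The sketch below relies on the fact that stats.probplot, called without plot=..., returns the plotted points together with a least-squares reference line and its correlation r. The helper qq_summary is hypothetical (not part of SciPy): it reports r (closer to 1 means straighter) and the data point furthest from the line, which is where an outlier would show up.

```python
import numpy as np
from scipy import stats

def qq_summary(data):
    """Hypothetical helper: Q-Q correlation r and the data point furthest from the fitted line."""
    # Without plot=..., probplot returns the points it would draw plus a least-squares fit
    (theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(data, dist="norm")
    residuals = ordered_values - (slope * theoretical_q + intercept)
    furthest = ordered_values[np.argmax(np.abs(residuals))]
    return r, furthest

rng = np.random.default_rng(0)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)     # right-skewed: points bow away from the line
with_outlier = np.append(normal_data, 8.0)  # one extreme value added by hand

for name, sample in [("normal", normal_data), ("skewed", skewed_data), ("outlier", with_outlier)]:
    r, furthest = qq_summary(sample)
    print(f"{name:>8}: r = {r:.4f}, point furthest from the line = {furthest:.2f}")
```

The skewed sample should give a noticeably lower r than the normal one, and in the last case the hand-added value is the point furthest from the line.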

The histogram with a KDE (kernel density estimation) overlay gives you an intuitive view of your data's shape. The KDE curve is a smoothed estimate of the underlying distribution, which helps you see whether it approximates the classic bell curve of a normal distribution.
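To make the "does the KDE look like a bell curve?" comparison concrete, here is a minimal sketch assuming SciPy's gaussian_kde (Gaussian kernels, Scott's rule bandwidth by default) stands in for the seaborn overlay. The hypothetical helper kde_vs_normal measures the largest vertical gap between the KDE and a normal curve fitted with the sample mean and standard deviation.

```python
import numpy as np
from scipy import stats

def kde_vs_normal(data, grid_points=200):
    """Hypothetical helper: largest gap between the KDE and a normal curve fitted to the data."""
    data = np.asarray(data, dtype=float)
    kde = stats.gaussian_kde(data)  # Gaussian kernels, Scott's rule bandwidth by default
    grid = np.linspace(data.min(), data.max(), grid_points)
    # Normal curve with the sample mean and standard deviation
    normal_pdf = stats.norm.pdf(grid, loc=data.mean(), scale=data.std(ddof=1))
    return np.max(np.abs(kde(grid) - normal_pdf))

rng = np.random.default_rng(1)
bell = rng.normal(size=5000)
# Mixture of two well-separated normals: two humps instead of one bell
bimodal = np.concatenate([rng.normal(-3, 1, 2500), rng.normal(3, 1, 2500)])
print(f"normal sample:  {kde_vs_normal(bell):.3f}")
print(f"bimodal sample: {kde_vs_normal(bimodal):.3f}")
```

The bimodal sample should produce a much larger gap, mostly around its center, where the fitted normal curve peaks but the data have a trough.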

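The Shapiro-Wilk test provides a numerical assessment. Its null hypothesis is that the data come from a normal distribution, so a p-value above your alpha level (typically 0.05) means you don't have strong evidence against normality; it does not prove that the data are normal. The result also depends heavily on sample size: with large samples, even small, practically irrelevant deviations from normality lead to significant results (and SciPy's documentation notes that the p-value may be inaccurate for N > 5000), while with small samples the test has little power to detect real departures. That is why we always look at the visualizations too.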
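The sample-size effect can be seen with a quick simulation. This sketch assumes Student's t distribution with 5 degrees of freedom as an example of data that look bell-shaped but have somewhat heavier tails than normal. With 4000 observations the test should almost certainly reject normality; with 30 it will often not, though the exact p-value depends on the random draw.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Student's t with 5 degrees of freedom: symmetric and bell-shaped, but heavier-tailed than normal
population = stats.t(df=5)

p_values = {}
for n in (30, 4000):  # stay below 5000, where SciPy warns the p-value may be inaccurate
    sample = population.rvs(size=n, random_state=rng)
    statistic, p_value = stats.shapiro(sample)
    p_values[n] = p_value
    print(f"n = {n:>4}: W = {statistic:.4f}, p-value = {p_value:.4g}")
```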


### Testing for Independence
Independence of observations is mostly determined by the study design, but we can still check for certain types of dependence, such as autocorrelation between successive observations in data collected over time:

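```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

def check_autocorrelation(data, lags=20, alpha=0.05):
    # Bars outside the shaded (1 - alpha) confidence band suggest significant autocorrelation at that lag
    plot_acf(data, lags=lags, alpha=alpha)
    plt.show()
```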

### Testing for Homoscedasticity
Many parametric tests that compare groups, such as Student's t-test and one-way ANOVA, assume that the groups have equal variances (homoscedasticity). Here's how to check this assumption:

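```python
import scipy.stats as stats
import matplotlib.pyplot as plt

def check_homoscedasticity(group1, group2, alpha=0.05):
    # Levene's test (H0: the groups come from populations with equal variances)
    statistic, p_value = stats.levene(group1, group2)
    print(f"Levene's test: statistic = {statistic:.4f}, p-value = {p_value:.4f}")
    if p_value > alpha:
        print(f"p > {alpha}: no strong evidence of unequal variances")
    else:
        print(f"p <= {alpha}: equal variances rejected")

    # Visual check - boxplots of the spread in each group
    plt.boxplot([group1, group2])
    plt.xticks([1, 2], ["Group 1", "Group 2"])
    plt.title("Boxplot comparison")
    plt.show()
    return statistic, p_value
```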

This function uses two complementary approaches:

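Levene's test statistically compares the variances between groups. Like the normality test above, it is a hypothesis test: its null hypothesis is that all input samples come from populations with equal variances, so a small p-value is evidence that at least one variance differs. It is more robust than the classical F-test for equal variances because it still works reasonably well when the data aren't perfectly normal. SciPy's levene centers each group on its median by default (the Brown-Forsythe variant), which its documentation recommends for skewed distributions.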
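Levene's test is not limited to two groups. The sketch below, using simulated data chosen for illustration, passes three groups to stats.levene and compares its center options: "median" (the default, the Brown-Forsythe variant) and "mean" (Levene's original formulation). With one group having three times the standard deviation of the others, both versions should reject equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Three groups with the same mean; the third has three times the standard deviation
a = rng.normal(loc=10, scale=1, size=200)
b = rng.normal(loc=10, scale=1, size=200)
c = rng.normal(loc=10, scale=3, size=200)

p_values = {}
for center in ("median", "mean"):
    statistic, p_value = stats.levene(a, b, c, center=center)
    p_values[center] = p_value
    print(f"center={center!r}: statistic = {statistic:.2f}, p-value = {p_value:.3g}")
```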

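The boxplots provide a visual comparison of the spread in each group: each box spans the interquartile range (IQR) and the whiskers show the extent of the remaining non-outlier data, so similar box sizes and whisker lengths suggest similar variances. This visual check is crucial because, as with the Shapiro-Wilk test, a significant Levene's test result in a large sample may reflect a difference in variance too small to matter in practice.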
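Because the box in a boxplot spans the interquartile range, comparing box sizes amounts to comparing IQRs. This is a minimal sketch using scipy.stats.iqr on simulated groups; the helper box_heights is hypothetical, and the ratio of the largest to the smallest IQR is only an informal companion to the plot, not a formal test.

```python
import numpy as np
from scipy import stats

def box_heights(*groups):
    """Hypothetical helper: interquartile range of each group (the height of its box)."""
    return [stats.iqr(g) for g in groups]

rng = np.random.default_rng(3)
narrow = rng.normal(scale=1.0, size=500)
wide = rng.normal(scale=3.0, size=500)

heights = box_heights(narrow, wide)
print("IQRs:", [round(float(h), 2) for h in heights])
print("ratio of largest to smallest box:", round(float(max(heights) / min(heights)), 2))
```

Since the IQR of a normal distribution is proportional to its standard deviation, the ratio here should come out close to 3.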