# Colinearity and Other Statistical Methods

In this notebook I first measure colinearity and show that the situation is not bad: all variance inflation factors are close to 1. Then, I consider some other statistical methods:
1. Multiple linear regressions: we fix the bandit features (_uncertainty_ and _n-experiments_) and fit one linear regression for each combination of network features.
2. Ridge and Lasso regressions: both methods use regularization. Lasso is especially interesting because it can be viewed as a feature selection method. Overall, Lasso gives the desired results: for densify, Lasso only selects _Average Degree_ and not the other network features; for equalize, Lasso only selects _Gini_ and not _Clustering_.
3. Bootstrapped Elastic Net regressions: this method can be used for robust feature selection. It uses 5-fold cross-validation to select the optimal alpha and l1_ratio. To my surprise, it selected every network feature in most bootstrap samples (selection frequencies between 0.68 and 1.0).

## Setup

In [74]:
from itertools import chain, combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import ElasticNetCV, Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor
from tqdm.auto import tqdm

In [75]:
rename_dict = {
    'n_agents': 'Number of Agents',
    'p_rewiring': 'Probability of Rewiring',
    'uncertainty': 'Problem Easiness',
    'n_experiments': 'Number of Experiments',
    'share_of_correct_agents_at_convergence': 'Share of correct agents',  # TODO: verify that this label is correct
    'mean_degree': 'Mean Degree',
    'ba_degree':'BA-Degree',
    'convergence_step': 'Steps until convergence (log)',
    'approx_average_clustering_coefficient':'Clustering Coeff.',
    'average_degree': 'Average Degree',
    'degree_gini_coefficient': 'Degree Gini Coeff.',
    'avg_path_length': 'Avg. Path Length',
    'degree_entropy': 'Degree Entropy',
    'diameter': 'Diameter',
    'reachability_dominator_set_size': 'Reach. Dominator SS',
    'reachability_dominator_set_ratio': 'Reach. Dominator SR',
    'condensation_graph_size': 'Cond. Graph Size',
    'condensation_graph_ratio': 'Cond. Graph Ratio',
}

## Measuring colinearity: it's not bad!

In [76]:
def VIF_interpretation(score: float) -> str:
    """Map a VIF score to a rule-of-thumb label."""
    if score >= 5:
        return "high multicollinearity"
    elif score > 1.5:
        return "acceptable multicollinearity"
    elif np.isclose(score, 1.0):
        # Tolerant comparison: an exact float equality check is fragile.
        return "no multicollinearity"
    elif score > 1:
        return "minimal multicollinearity"
    else:
        return "something went wrong"


def compute_vif(
    df: pd.DataFrame,
    columns_corr: list[str] | None = None,
) -> pd.DataFrame:
    """Return the VIF and its interpretation for each column in `columns_corr`."""
    if columns_corr is None:
        columns_corr = ["degree_average", "degree_gini", "clustering_average"]
    # The constant gives each auxiliary regression an intercept.
    X_const = sm.add_constant(df[columns_corr])

    # Compute each VIF once and reuse it for the interpretation column.
    vifs = [
        variance_inflation_factor(X_const.values, i)
        for i in range(X_const.shape[1])
    ]
    vif_df = pd.DataFrame(
        {
            "variable": X_const.columns,
            "VIF": vifs,
            "VIF interpretation": [VIF_interpretation(v) for v in vifs],
        }
    )
    # Drop the row that belongs to the constant term.
    return vif_df[vif_df["variable"].isin(columns_corr)]


In [77]:
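def pearson_interpretation(score: float) -> str:
    """Map a Pearson correlation coefficient to a rule-of-thumb label."""
    score_abs = abs(score)
    if score_abs == 0:
        return "none"
    elif score_abs <= 0.3:
        return "weak"
    elif score_abs <= 0.5:
        return "medium"
    elif score_abs <= 1:
        return "strong"
    else:
        return "error"

def compute_correlations(
    df: pd.DataFrame,
    columns_corr: list[str] = ["degree_average", "degree_gini", "clustering_average"],
) -> None:
    """Display the Pearson correlation matrix and its interpretation."""
    df_corr = df[columns_corr].corr(method="pearson")
    display(df_corr)
    # The diagonal is always 1.0 and is therefore labelled "strong".
    # DataFrame.map applies the function element-wise (pandas >= 2.1).
    df_corr_interpretation = df_corr.map(pearson_interpretation)
    display(df_corr_interpretation)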

In [78]:
df_densify = pd.read_csv("data/simulation_densify.csv")
network_features = ["degree_average", "degree_gini","clustering_average"]
compute_correlations(df_densify, columns_corr=network_features)
compute_vif(df_densify, columns_corr=network_features)

|  | degree_average | degree_gini | clustering_average |
|---|---|---|---|
| degree_average | 1.0 | -0.147334 | -0.066165 |
| degree_gini | -0.147334 | 1.0 | 0.01524 |
| clustering_average | -0.066165 | 0.01524 | 1.0 |


|  | degree_average | degree_gini | clustering_average |
|---|---|---|---|
| degree_average | strong | weak | weak |
| degree_gini | weak | strong | weak |
| clustering_average | weak | weak | strong |


|  | variable | VIF | VIF interpretation |
|---|---|---|---|
| 1 | degree_average | 1.026477 | minimal multicollinearity |
| 2 | degree_gini | 1.022221 | minimal multicollinearity |
| 3 | clustering_average | 1.004428 | minimal multicollinearity |


In [79]:
df_equalize = pd.read_csv("data/simulation_equalize.csv")
network_features = ["degree_gini", "clustering_average"]
compute_correlations(df_equalize, columns_corr=network_features)
compute_vif(df_equalize, columns_corr=network_features)

|  | degree_gini | clustering_average |
|---|---|---|
| degree_gini | 1.0 | 0.168868 |
| clustering_average | 0.168868 | 1.0 |


|  | degree_gini | clustering_average |
|---|---|---|
| degree_gini | strong | weak |
| clustering_average | weak | strong |


|  | variable | VIF | VIF interpretation |
|---|---|---|---|
| 1 | degree_gini | 1.029354 | minimal multicollinearity |
| 2 | clustering_average | 1.029354 | minimal multicollinearity |
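
As a cross-check on the `equalize` numbers: with exactly two predictors, the VIF of either one reduces to $1/(1-r^2)$, where $r$ is their Pearson correlation. The sketch below uses NumPy only; the helper `vif_by_regression` and the synthetic data are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np


def vif_by_regression(X: np.ndarray, j: int) -> float:
    """VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)


# Two-predictor shortcut, using the correlation reported for `equalize`.
r = 0.168868
vif_two_predictors = 1.0 / (1.0 - r**2)
print(round(vif_two_predictors, 4))  # 1.0294

# Synthetic check that the regression-based VIF agrees with the shortcut.
rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
x2 = 0.2 * x1 + rng.normal(size=2000)
X_demo = np.column_stack([x1, x2])
r_demo = np.corrcoef(x1, x2)[0, 1]
vif_demo = vif_by_regression(X_demo, 0)
```

The shortcut value agrees with the 1.029354 reported by `compute_vif`, which suggests the VIF code behaves as intended on this dataset.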


## New multiple linear regressions

In [80]:
df_densify = pd.read_csv("data/simulation_densify.csv")
df_equalize = pd.read_csv("data/simulation_equalize.csv")

In [81]:
def stats_subsets(
        df: pd.DataFrame, 
        fixed_predictors = ['uncertainty', 'n_experiments'], 
        main_predictors = ['degree_average', 'degree_gini', 'clustering_average'], 
        target = 'conclusion'
    ):

    # Generate all subsets of main_predictors, including the empty one (fixed predictors only)
    def all_subsets(lst):
        return chain.from_iterable(combinations(lst, r) for r in range(0, len(lst)+1))

    df_renamed = df.rename(columns={"degree_gini_coefficient": "gini", "approx_average_clustering_coefficient": "clustering"})
    
    results = []

    for subset in all_subsets(main_predictors):
        predictors = list(subset) + fixed_predictors
        X = df_renamed[predictors]
        y = df_renamed[target]

        # Normalize predictors and keep column names by converting back to a DataFrame
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        X_scaled_df = pd.DataFrame(X_scaled, columns=predictors, index=X.index)
        
        # Add intercept (preserves column names so statsmodels will show informative coef names)
        X_scaled_df = sm.add_constant(X_scaled_df, has_constant='add')
        
        # Fit OLS model
        model = sm.OLS(y, X_scaled_df).fit()
        
        # Build a tidy results table for this model
        params = model.params
        pvals = model.pvalues.reindex(params.index)
        results_df = pd.DataFrame({
            'Predictor': params.index,
            'Coefficient': params.values,
            'P-value': pvals.values
        })

        # Print formatted output: predictor table and R-squared
        print(f"Predictors: {predictors}")
        print(f"R-squared: {model.rsquared:.4f}")
        print(f"R-squared-adj: {model.rsquared_adj:.4f}\n")
        print(results_df.to_string(index=False, float_format='{:.6f}'.format))

        print('\n'+'-'*90+'\n')

        # Optionally store results for later
        # results.append({'predictors': predictors, 'results_df': results_df, 'r2': model.rsquared})


In [82]:
stats_subsets(df_equalize, main_predictors=['degree_gini', 'clustering_average'])

Predictors: ['uncertainty', 'n_experiments']
R-squared: 0.4976
R-squared-adj: 0.4974

    Predictor  Coefficient  P-value
        const     0.762036 0.000000
  uncertainty     0.051198 0.000000
n_experiments     0.026178 0.000000

------------------------------------------------------------------------------------------

Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared: 0.5435
R-squared-adj: 0.5432

    Predictor  Coefficient  P-value
        const     0.762036 0.000000
  degree_gini    -0.017435 0.000000
  uncertainty     0.051191 0.000000
n_experiments     0.026517 0.000000

------------------------------------------------------------------------------------------

Predictors: ['clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.5010
R-squared-adj: 0.5007

         Predictor  Coefficient  P-value
             const     0.762036 0.000000
clustering_average    -0.004762 0.000000
       uncertainty     0.051210 0.000000
     n_experiments     0.026218

Notes:
- `equalize` keeps average degree fixed, but tries to maximally change Gini while keeping Clustering roughly constant.
- Adding Gini to the bandit predictors increases the $R^2$ by about 4.6 percentage points (0.4976 to 0.5435).
- Adding Clustering to the bandit predictors increases the $R^2$ by about 0.3 percentage points (0.4976 to 0.5010).
- The coefficients of Gini and Clustering do not change much across the different predictor subsets.
- The $R^2$ when including Gini is basically the same as the $R^2$ when including both Gini and Clustering.
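
The $R^2$ gains quoted in these notes are differences between nested models. A minimal sketch of that comparison follows, assuming plain least squares via NumPy; the helper `r2_ols` and the synthetic data are hypothetical stand-ins for the notebook's data.

```python
import numpy as np


def r2_ols(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(1)
n = 1000
base = rng.normal(size=(n, 2))   # stand-ins for the two bandit features
extra = rng.normal(size=(n, 1))  # stand-in for one network feature
y = base @ np.array([0.05, 0.03]) + 0.02 * extra[:, 0] + rng.normal(scale=0.05, size=n)

r2_base = r2_ols(base, y)
r2_full = r2_ols(np.column_stack([base, extra]), y)
gain_pp = 100 * (r2_full - r2_base)
print(f"base={r2_base:.4f}, full={r2_full:.4f}, gain={gain_pp:.2f} percentage points")

# The same arithmetic applied to the R^2 values reported above for `equalize`.
gain_gini_pp = 100 * (0.5435 - 0.4976)        # about 4.6
gain_clustering_pp = 100 * (0.5010 - 0.4976)  # about 0.3
```

Because the models are nested, the in-sample $R^2$ can never decrease when a predictor is added, so the size of the gain matters more than its sign.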

In [83]:
stats_subsets(df_densify)

Predictors: ['uncertainty', 'n_experiments']
R-squared: 0.4381
R-squared-adj: 0.4379

    Predictor  Coefficient  P-value
        const     0.758856 0.000000
  uncertainty     0.044228 0.000000
n_experiments     0.022850 0.000000

------------------------------------------------------------------------------------------

Predictors: ['degree_average', 'uncertainty', 'n_experiments']
R-squared: 0.4842
R-squared-adj: 0.4839

     Predictor  Coefficient  P-value
         const     0.758856 0.000000
degree_average     0.016064 0.000000
   uncertainty     0.044793 0.000000
 n_experiments     0.023054 0.000000

------------------------------------------------------------------------------------------

Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared: 0.4399
R-squared-adj: 0.4396

    Predictor  Coefficient  P-value
        const     0.758856 0.000000
  degree_gini    -0.003164 0.000065
  uncertainty     0.044214 0.000000
n_experiments     0.022891 0.000000

-------------

Notes:
- `densify` increases average degree, while trying to keep Gini and Clustering roughly constant.
- Adding average-degree to the bandit predictors only increases the $R^2$ by roughly 4.6 percentage points (0.4381 to 0.4842).
- The coefficients of average-degree, Gini and Clustering do not change much across the different predictor subsets.
- The $R^2$ when including average-degree is basically the same as the $R^2$ when including Gini and Clustering too.

## Ridge & Lasso

Lasso and Ridge regression are both regularization techniques used to prevent overfitting in linear regression models by adding a penalty term to the loss function.
 The key difference lies in the type of penalty applied: Ridge regression uses L2 regularization, which adds a penalty proportional to the sum of the squared coefficients, while Lasso regression uses L1 regularization, which adds a penalty proportional to the sum of the absolute values of the coefficients.
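
For reference, these are the two objectives as scikit-learn parameterizes them, with its `alpha` playing the role of the penalty strength and $\beta_0$ an unpenalized intercept. The data-fit term is scaled differently in the two estimators, so the same `alpha` value is not directly comparable between `Ridge` and `Lasso`.

$$
\hat\beta^{\text{ridge}} = \arg\min_{\beta_0,\,\beta} \; \lVert y - \beta_0 - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2
$$

$$
\hat\beta^{\text{lasso}} = \arg\min_{\beta_0,\,\beta} \; \frac{1}{2n} \lVert y - \beta_0 - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1
$$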

This fundamental difference leads to distinct behaviors. Ridge regression shrinks the coefficients of less important features toward zero but does not set them exactly to zero, meaning all predictors remain in the model.
 This approach is beneficial when all features are potentially relevant and the goal is to reduce overfitting without eliminating variables.
 In contrast, Lasso regression can set some coefficients exactly to zero, effectively performing automatic feature selection by excluding irrelevant or redundant predictors from the model.
 This results in a simpler, more interpretable model with fewer features.

The geometric interpretation explains this behavior: the L1 constraint in Lasso creates a diamond-shaped boundary with corners, making it more likely for the solution to land on a corner where a coefficient is zero. The L2 constraint in Ridge forms a circular boundary, which is rotationally invariant and less likely to produce zero coefficients.
 Consequently, Lasso is preferred when feature selection is important or when a sparse solution is desired, such as in high-dimensional data where only a few features are expected to be relevant. Ridge is more suitable when all features are believed to contribute to the outcome, such as in predicting house prices where multiple factors like size, location, and number of bedrooms are all potentially important.

Neither penalty makes the fit robust to outliers: both methods minimize a squared-error loss, and it is that loss, not the penalty term, that drives outlier sensitivity.
 Computationally, Ridge regression has a closed-form solution, whereas Lasso has none in general and is fit iteratively (scikit-learn uses coordinate descent), so it can be slower.
 Both methods require tuning a hyperparameter that controls the strength of regularization (often written lambda; called `alpha` in scikit-learn).
 In cases where features are correlated, Elastic Net, which combines both L1 and L2 penalties, can be advantageous by balancing feature selection and handling of correlated predictors.

Conclusion
- For densify: LASSO selects Average Degree as the most relevant network feature. It consistently drives Gini and Clustering to zero, as expected. 
- For equalize: LASSO selects Gini as the most relevant network feature. It consistently drives Clustering to zero, as expected. 

### Ridge

In [84]:
def ridge_subsets(
    df: pd.DataFrame, 
    fixed_predictors = ['uncertainty', 'n_experiments'], 
    main_predictors = ['degree_average', 'degree_gini', 'clustering_average'], 
    target = 'conclusion',
    alpha=1.0 # Regularization strength
):
    """
    Fits Ridge regression models for all subsets of main_predictors.
    """

    # Generate all subsets of main_predictors
    def all_subsets(lst):
        # Includes the empty subset for completeness, which results in fixed_predictors only
        return chain.from_iterable(combinations(lst, r) for r in range(0, len(lst)+1))
    
    for subset in all_subsets(main_predictors):
        predictors = list(subset) + fixed_predictors
        X = df[predictors]
        y = df[target]

        # Normalize predictors
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        X_scaled_df = pd.DataFrame(X_scaled, columns=predictors, index=X.index)
        
        # Fit Ridge model
        # NOTE: Ridge and Lasso automatically include an intercept (constant) by default.
        model = Ridge(alpha=alpha, random_state=42) # random_state for reproducibility
        model.fit(X_scaled_df, y)
        
        # Extract results
        # Intercept is stored separately in sklearn
        coef_names = ['Intercept'] + predictors
        coefficients = [model.intercept_] + list(model.coef_)
        r_squared = model.score(X_scaled_df, y) # R^2 on the training data
        
        # Build a tidy results table for this model
        results_df = pd.DataFrame({
            'Predictor': coef_names,
            'Coefficient': coefficients,
        })

        # Print formatted output: predictor table and R-squared
        print(f"--- Ridge Regression (alpha={alpha}) ---")
        print(f"Predictors: {predictors}")
        print(f"R-squared (Train): {r_squared:.4f}")
        print(f"Number of non-zero coefficients: {sum(1 for c in model.coef_ if abs(c) > 1e-9)}") # Always equal to len(predictors) in Ridge
        print('\n' + results_df.to_string(index=False, float_format='{:.6f}'.format))

        print('\n'+'-'*90+'\n')
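
The Ridge coefficients reported in this section differ from the OLS ones only in the last digits (for example 0.044219 versus 0.044228). The sketch below illustrates why, assuming the standard closed-form Ridge solution with an unpenalized intercept; `ridge_closed_form` and the synthetic data are illustrative assumptions, not the notebook's code.

```python
import numpy as np


def ridge_closed_form(X: np.ndarray, y: np.ndarray, alpha: float):
    """Center the data, then solve (X'X + alpha * I) b = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # same scaling as StandardScaler
y = 0.76 + X @ np.array([0.044, 0.023]) + rng.normal(scale=0.05, size=n)

_, coef_ols = ridge_closed_form(X, y, alpha=0.0)    # alpha = 0 is plain OLS
_, coef_ridge = ridge_closed_form(X, y, alpha=1.0)
shrink = coef_ridge / coef_ols
print(shrink)  # close to n / (n + 1) for standardized, nearly uncorrelated columns
```

With standardized columns, `X'X` is roughly `n` times the identity, so `alpha=1.0` shrinks each coefficient by a factor of about `n / (n + 1)`. On a dataset with thousands of rows that is negligible, which is one reason to tune `alpha` by cross-validation when real shrinkage is wanted.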

In [85]:
ridge_subsets(df_densify, alpha=1.0)

--- Ridge Regression (alpha=1.0) ---
Predictors: ['uncertainty', 'n_experiments']
R-squared (Train): 0.4381
Number of non-zero coefficients: 2

    Predictor  Coefficient
    Intercept     0.758856
  uncertainty     0.044219
n_experiments     0.022845

------------------------------------------------------------------------------------------

--- Ridge Regression (alpha=1.0) ---
Predictors: ['degree_average', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4842
Number of non-zero coefficients: 3

     Predictor  Coefficient
     Intercept     0.758856
degree_average     0.016060
   uncertainty     0.044783
 n_experiments     0.023050

------------------------------------------------------------------------------------------

--- Ridge Regression (alpha=1.0) ---
Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4399
Number of non-zero coefficients: 3

    Predictor  Coefficient
    Intercept     0.758856
  degree_gini    -0.003163
  uncertainty     0.

In [86]:
ridge_subsets(df_equalize, main_predictors=['degree_gini', 'clustering_average'], alpha=1.0)

--- Ridge Regression (alpha=1.0) ---
Predictors: ['uncertainty', 'n_experiments']
R-squared (Train): 0.4976
Number of non-zero coefficients: 2

    Predictor  Coefficient
    Intercept     0.762036
  uncertainty     0.051188
n_experiments     0.026173

------------------------------------------------------------------------------------------

--- Ridge Regression (alpha=1.0) ---
Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared (Train): 0.5435
Number of non-zero coefficients: 3

    Predictor  Coefficient
    Intercept     0.762036
  degree_gini    -0.017431
  uncertainty     0.051180
n_experiments     0.026511

------------------------------------------------------------------------------------------

--- Ridge Regression (alpha=1.0) ---
Predictors: ['clustering_average', 'uncertainty', 'n_experiments']
R-squared (Train): 0.5010
Number of non-zero coefficients: 3

         Predictor  Coefficient
         Intercept     0.762036
clustering_average    -0.004761
      

### LASSO

This function uses L1 regularization via sklearn.linear_model.Lasso. LASSO is particularly useful for feature selection as it can drive the coefficients of less important predictors exactly to zero.

In [87]:
def lasso_subsets(
    df: pd.DataFrame, 
    fixed_predictors = ['uncertainty', 'n_experiments'], 
    main_predictors = ['degree_average', 'degree_gini', 'clustering_average'], 
    target = 'conclusion',
    alpha=0.01 # A smaller alpha is often a better starting point for Lasso
):
    """
    Fits LASSO regression models for all subsets of main_predictors.
    """
    # Generate all subsets of main_predictors
    def all_subsets(lst):
        # Includes the empty subset for completeness, which results in fixed_predictors only
        return chain.from_iterable(combinations(lst, r) for r in range(0, len(lst)+1))

    for subset in all_subsets(main_predictors):
        predictors = list(subset) + fixed_predictors
        X = df[predictors]
        y = df[target]

        # Normalize predictors
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        X_scaled_df = pd.DataFrame(X_scaled, columns=predictors, index=X.index)
        
        # Fit LASSO model
        # NOTE: We use a small alpha (e.g., 0.01) to allow for some coefficients to remain non-zero.
        model = Lasso(alpha=alpha, random_state=42, max_iter=10000) # Increased max_iter for robustness
        model.fit(X_scaled_df, y)
        
        # Extract results
        # Intercept is stored separately in sklearn
        coef_names = ['Intercept'] + predictors
        coefficients = [model.intercept_] + list(model.coef_)
        r_squared = model.score(X_scaled_df, y) # R^2 on the training data
        
        # Build a tidy results table for this model
        results_df = pd.DataFrame({
            'Predictor': coef_names,
            'Coefficient': coefficients,
        })

        # Print formatted output: predictor table and R-squared
        print(f"--- LASSO Regression (alpha={alpha}) ---")
        print(f"Predictors: {predictors}")
        print(f"R-squared (Train): {r_squared:.4f}")
        # Count non-zero coefficients. LASSO is known for setting coefficients exactly to zero.
        print(f"Number of non-zero coefficients: {sum(1 for c in model.coef_ if abs(c) > 1e-9)}") 
        print('\n' + results_df.to_string(index=False, float_format='{:.6f}'.format))

        print('\n'+'-'*90+'\n')

In [88]:
lasso_subsets(df_densify, alpha=0.01)

--- LASSO Regression (alpha=0.01) ---
Predictors: ['uncertainty', 'n_experiments']
R-squared (Train): 0.4018
Number of non-zero coefficients: 2

    Predictor  Coefficient
    Intercept     0.758856
  uncertainty     0.034080
n_experiments     0.012702

------------------------------------------------------------------------------------------

--- LASSO Regression (alpha=0.01) ---
Predictors: ['degree_average', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4283
Number of non-zero coefficients: 3

     Predictor  Coefficient
     Intercept     0.758856
degree_average     0.005571
   uncertainty     0.034276
 n_experiments     0.012773

------------------------------------------------------------------------------------------

--- LASSO Regression (alpha=0.01) ---
Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4018
Number of non-zero coefficients: 2

    Predictor  Coefficient
    Intercept     0.758856
  degree_gini    -0.000000
  uncertainty    

Notes:
- LASSO consistently drives Gini and Clustering to zero, as expected.

In [89]:
lasso_subsets(df_equalize, main_predictors=['degree_gini', 'clustering_average'], alpha=0.01)

--- LASSO Regression (alpha=0.01) ---
Predictors: ['uncertainty', 'n_experiments']
R-squared (Train): 0.4673
Number of non-zero coefficients: 2

    Predictor  Coefficient
    Intercept     0.762036
  uncertainty     0.041160
n_experiments     0.016140

------------------------------------------------------------------------------------------

--- LASSO Regression (alpha=0.01) ---
Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4975
Number of non-zero coefficients: 3

    Predictor  Coefficient
    Intercept     0.762036
  degree_gini    -0.007242
  uncertainty     0.041156
n_experiments     0.016280

------------------------------------------------------------------------------------------

--- LASSO Regression (alpha=0.01) ---
Predictors: ['clustering_average', 'uncertainty', 'n_experiments']
R-squared (Train): 0.4673
Number of non-zero coefficients: 2

         Predictor  Coefficient
         Intercept     0.762036
clustering_average    -0.000000
   

Notes:
- LASSO consistently drives Clustering to zero, as expected.
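
One way to read these results: for standardized and uncorrelated predictors, scikit-learn's Lasso solution equals the OLS coefficient soft-thresholded by `alpha`. The predictors here are only approximately uncorrelated (VIFs near 1), so the sketch below is an approximation. The OLS values are copied from the regression outputs earlier in the notebook; `soft_threshold` is a hypothetical helper.

```python
import numpy as np


def soft_threshold(b: float, alpha: float) -> float:
    """Shrink b toward zero by alpha and clip at zero."""
    return float(np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0))


alpha = 0.01
# OLS coefficients from the models with a single network feature.
ols_densify = {"degree_average": 0.016064, "degree_gini": -0.003164}
ols_equalize = {"degree_gini": -0.017435, "clustering_average": -0.004762}

approx = {}
for name, table in [("densify", ols_densify), ("equalize", ols_equalize)]:
    for pred, b in table.items():
        approx[(name, pred)] = soft_threshold(b, alpha)
        print(f"{name:9s} {pred:20s} OLS={b:+.6f}  approx. LASSO={approx[(name, pred)]:+.6f}")
```

The approximation lands within about 0.0005 of the reported LASSO coefficients (0.005571 for Average Degree in densify, -0.007242 for Gini in equalize). It also suggests that, at `alpha=0.01`, a feature is dropped whenever its standardized OLS coefficient is smaller than roughly 0.01 in absolute value. The selection therefore depends on the chosen `alpha`.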

## Elastic Net

Note:
Elastic Net introduces bias by shrinking coefficient estimates toward zero through regularization (controlled by λ). This bias-variance trade-off reduces model variance and overfitting, often improving prediction accuracy. However, the magnitude of coefficients is underestimated, making them poor estimates of true effect sizes. While useful for feature selection and ranking, the coefficients should not be interpreted as unbiased measures of importance or causal impact.
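
For reference, this is the objective minimized by scikit-learn's `ElasticNet` and `ElasticNetCV`, writing $\rho$ for `l1_ratio` and $\alpha$ for the overall penalty strength (the λ of the note above):

$$
\min_{\beta_0,\,\beta} \; \frac{1}{2n} \lVert y - \beta_0 - X\beta \rVert_2^2 + \alpha \rho \lVert \beta \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert \beta \rVert_2^2
$$

Setting $\rho = 1$ recovers the Lasso objective, while smaller values of $\rho$ shift weight toward the Ridge-style squared penalty.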

In [90]:
def stats_enet(
        df: pd.DataFrame, 
        fixed_predictors: list[str]= ['uncertainty', 'n_experiments'], 
        main_predictors: list[str] = ['degree_average', 'degree_gini', 'clustering_average'], 
        target: str = 'conclusion',
        n_bootstraps: int = 100,
        l1_ratio: float | list[float]= [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
        selection_threshold: float = 1e-6
    ) -> None:
    """
    Perform bootstrapped Elastic Net regression with variable selection over all subsets of main predictors.

    Fits ElasticNetCV models on bootstrap samples for every combination of main_predictors 
    (e.g., 'degree_average', 'degree_gini', 'clustering_average') combined with fixed_predictors 
    (e.g., 'uncertainty', 'n_experiments'). For each predictor set:
    
    - Scales features using StandardScaler
    - Uses 5-fold cross-validation to select optimal alpha and l1_ratio
    - Aggregates coefficient means and selection frequencies across bootstraps
    - Computes average R²

    Note: l1_ratio can be set to a float or a list of floats.
    
    Parameters
    ----------
    df : pd.DataFrame
        Input data containing predictors and target variable.
    fixed_predictors : list of str, default ['uncertainty', 'n_experiments']
        Predictors included in all models.
    main_predictors : list of str, default ['degree_average', 'degree_gini', 'clustering_average']
        Predictors to evaluate in all possible subsets.
    target : str, default 'conclusion'
        Name of the target variable column in df.
    n_bootstraps : int, default 100
        Number of bootstrap iterations for stability assessment.
    l1_ratio: float or list of float, default [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]
        L1 ratio(s) used in cross-validation step.
    selection_threshold : float, default 1e-6
        Absolute coefficient threshold to count a variable as selected.

    Outputs
    -------
    Prints for each predictor subset:
        - List of predictors
        - Average R² across bootstraps
        - Average R² (adjusted) placeholder (currently not computed)
        - Table of predictors, their average (normalized) coefficients, and selection frequency
    """   

    def all_subsets(lst):
        return chain.from_iterable(combinations(lst, r) for r in range(0, len(lst)+1))
    
    for subset in all_subsets(main_predictors):
        predictors = list(subset) + fixed_predictors
        X = df[predictors]
        y = df[target]

        # Store coefficient and selection count per predictor
        coef_sum = {pred: 0.0 for pred in predictors}
        selection_count = {pred: 0 for pred in predictors}
        r2_sum = 0.0
        r2_adj_sum = 0.0

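        for _ in tqdm(range(n_bootstraps)):
            # Resample rows with replacement. NumPy's global RNG is not seeded here,
            # so results vary slightly between runs.
            sample_idx = np.random.choice(X.index, size=len(X), replace=True)
            X_boot, y_boot = X.loc[sample_idx], y.loc[sample_idx]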

            scaler = StandardScaler()
            X_scaled = scaler.fit_transform(X_boot)
            
            model = ElasticNetCV(
                cv=5, 
                random_state=42, 
                l1_ratio=l1_ratio, 
                max_iter=1000,
            )
            
            model.fit(X_scaled, y_boot)
            
            r2_sum += model.score(X_scaled, y_boot)
            # TODO: adjusted R² is not computed yet, so r2_adj_sum stays 0.0
            # and the "R-squared (adjusted)" line printed below is a placeholder.

            for i, pred in enumerate(predictors):
                if abs(model.coef_[i]) > selection_threshold:
                    coef_sum[pred] += model.coef_[i]
                    selection_count[pred] += 1
                    

        # Compute average coefficient and frequency
        results = []
        for pred in predictors:
            avg_coef = coef_sum[pred] / n_bootstraps
            freq = selection_count[pred] / n_bootstraps
            results.append({'Predictor': pred, 'Coefficient': avg_coef, 'Frequency': freq})
        
        results_df = pd.DataFrame(results)
        print(f"Predictors: {predictors}")
        print(f"R-squared: {r2_sum/n_bootstraps:.4f}\n")
        print(f"R-squared (adjusted): {r2_adj_sum/n_bootstraps:.4f}\n")
        print(results_df.to_string(index=False, float_format='{:.6f}'.format))
        print('\n' + '-'*90 + '\n')   
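
The adjusted $R^2$ printed by `stats_enet` is a placeholder. One possible way to fill it in is the usual OLS adjustment, sketched below. The helper `adjusted_r2` is hypothetical. For a penalized model the formula is only a heuristic, because the effective degrees of freedom are not simply the number of predictors.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Standard OLS adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Example with made-up numbers: R^2 = 0.5 on 101 samples and 2 predictors.
example = adjusted_r2(0.5, n_samples=101, n_predictors=2)
print(f"{example:.4f}")
```

Inside the bootstrap loop this could be accumulated as `r2_adj_sum += adjusted_r2(model.score(X_scaled, y_boot), len(y_boot), len(predictors))`, under the assumption that every predictor counts as one degree of freedom.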

In [91]:
stats_enet(df_densify)

Predictors: ['uncertainty', 'n_experiments']
R-squared: 0.4391

R-squared (adjusted): 0.0000

    Predictor  Coefficient  Frequency
  uncertainty     0.044329   1.000000
n_experiments     0.022764   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_average', 'uncertainty', 'n_experiments']
R-squared: 0.4831

R-squared (adjusted): 0.0000

     Predictor  Coefficient  Frequency
degree_average     0.015864   1.000000
   uncertainty     0.044808   1.000000
 n_experiments     0.022964   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared: 0.4402

R-squared (adjusted): 0.0000

    Predictor  Coefficient  Frequency
  degree_gini    -0.003120   1.000000
  uncertainty     0.044153   1.000000
n_experiments     0.022935   1.000000

------------------------------------------------------------------------------------------



Predictors: ['clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.4382

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
clustering_average     0.000086   0.680000
       uncertainty     0.043893   1.000000
     n_experiments     0.022622   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_average', 'degree_gini', 'uncertainty', 'n_experiments']
R-squared: 0.4832

R-squared (adjusted): 0.0000

     Predictor  Coefficient  Frequency
degree_average     0.015826   1.000000
   degree_gini    -0.000772   0.840000
   uncertainty     0.044551   1.000000
 n_experiments     0.022914   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_average', 'clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.4835

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
    degree_average     0.015977   1.000000
clustering_average     0.001102   0.860000
       uncertainty     0.044698   1.000000
     n_experiments     0.023042   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_gini', 'clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.4405

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
       degree_gini    -0.003013   1.000000
clustering_average    -0.000052   0.780000
       uncertainty     0.044101   1.000000
     n_experiments     0.022835   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_average', 'degree_gini', 'clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.4845

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
    degree_average     0.016005   1.000000
       degree_gini    -0.000712   0.820000
clustering_average     0.001072   0.900000
       uncertainty     0.044507   1.000000
     n_experiments     0.022927   1.000000

------------------------------------------------------------------------------------------



In [92]:
stats_enet(df_equalize, main_predictors=['degree_gini', 'clustering_average'])

Predictors: ['uncertainty', 'n_experiments']
R-squared: 0.4990

R-squared (adjusted): 0.0000

    Predictor  Coefficient  Frequency
  uncertainty     0.051401   1.000000
n_experiments     0.025963   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_gini', 'uncertainty', 'n_experiments']
R-squared: 0.5437

R-squared (adjusted): 0.0000

    Predictor  Coefficient  Frequency
  degree_gini    -0.017300   1.000000
  uncertainty     0.051223   1.000000
n_experiments     0.026466   1.000000

------------------------------------------------------------------------------------------



Predictors: ['clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.5005

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
clustering_average    -0.004715   1.000000
       uncertainty     0.051041   1.000000
     n_experiments     0.026131   1.000000

------------------------------------------------------------------------------------------



Predictors: ['degree_gini', 'clustering_average', 'uncertainty', 'n_experiments']
R-squared: 0.5450

R-squared (adjusted): 0.0000

         Predictor  Coefficient  Frequency
       degree_gini    -0.017039   1.000000
clustering_average    -0.001870   0.990000
       uncertainty     0.051039   1.000000
     n_experiments     0.026546   1.000000

------------------------------------------------------------------------------------------

