# Correlogram Analysis

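This notebook summarizes the results of the correlogram experiments: it builds grouped summary tables from the main results file and saves one PDF plot per correlogram curve.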

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import glob

## 1. Summary Statistics from Main Results

Read the main CSV results file and create a grouped summary table.

In [2]:
# Read main results file
df = pd.read_csv('outputs/results_2026_01_07_correlogram.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,problem,size,size_cat,solver,budget,dist_type,name_type,rep,elapsed,cor_length,onestep_cor,diameter
0,0,OneMax,8,small,correlogram,50,coarse,lin,0,22.642278,3.943671,0.754493,7.9644
1,1,OneMax,8,small,correlogram,50,coarse,str,0,27.572831,3.943671,0.754493,7.9644
2,2,OneMax,8,small,correlogram,50,fine,lin,0,23.381888,3.967602,0.725501,7.9644
3,3,OneMax,8,small,correlogram,50,fine,str,0,24.528706,3.967602,0.725501,7.9644
4,4,OneMax,16,medium,correlogram,200,coarse,lin,0,43.582565,7.817221,0.893213,16.0972
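
The `Unnamed: 0.1` and `Unnamed: 0` columns in the output above look like the columns pandas produces when a CSV that was written with its index is read back without `index_col`. The following is a minimal sketch of dropping them, assuming they hold nothing beyond row numbers. The toy CSV and the names `df_demo` and `df_clean` are illustrative and not part of the notebook.

```python
import io

import pandas as pd

# Toy CSV (illustrative) mimicking a file that was saved with its index twice.
raw = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,problem,size,cor_length\n"
    "0,0,OneMax,8,3.943671\n"
    "1,1,OneMax,8,3.943671\n"
)
df_demo = pd.read_csv(raw)

# Keep only the columns whose name does not start with "Unnamed".
df_clean = df_demo.loc[:, ~df_demo.columns.str.startswith("Unnamed")]
print(list(df_clean.columns))
```

Writing results with `to_csv(..., index=False)` avoids creating such columns in the first place.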


In [3]:
# Create groupby table across problem / size / dist_type / name_type
# Note: The main results file should have columns for diameter, cor_length, and onestep_cor
# If there are multiple reps, we'll aggregate (mean)

groupby_cols = ['problem', 'size', 'dist_type', 'name_type']

# Check if diameter is in the dataframe (it might not be if not saved)
# If not, we'll just show cor_length and onestep_cor
if 'diameter' in df.columns:
    summary_cols = ['diameter', 'cor_length', 'onestep_cor']
else:
    summary_cols = ['cor_length', 'onestep_cor']

summary_table = df.groupby(groupby_cols)[summary_cols].mean()
summary_table

problem,size,dist_type,name_type,diameter,cor_length,onestep_cor
OneMax,8,coarse,lin,7.9644,3.943671,0.754493
OneMax,8,coarse,str,7.9644,3.943671,0.754493
OneMax,8,fine,lin,7.9644,3.967602,0.725501
OneMax,8,fine,str,7.9644,3.967602,0.725501
OneMax,16,coarse,lin,16.0972,7.817221,0.893213
OneMax,16,coarse,str,16.0972,7.817221,0.893213
OneMax,16,fine,lin,16.0972,7.834287,0.880644
OneMax,16,fine,str,16.0972,7.834287,0.880644
Sphere,8,coarse,lin,16.0,6.46043,0.855054
Sphere,8,coarse,str,16.0,6.46043,0.855054
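
The cell above reduces multiple reps to their mean only. If the spread across reps matters, one option is to aggregate with several statistics at once. The sketch below uses made-up toy values and a hypothetical two reps per configuration; only the column names follow the results file.

```python
import pandas as pd

# Toy data (values are made up): two hypothetical reps per configuration.
toy = pd.DataFrame({
    "problem": ["OneMax"] * 4,
    "size": [8, 8, 8, 8],
    "dist_type": ["coarse", "coarse", "fine", "fine"],
    "name_type": ["lin"] * 4,
    "rep": [0, 1, 0, 1],
    "cor_length": [4.0, 3.0, 5.0, 5.0],
})

# Mean, sample standard deviation, and number of reps per group.
stats = (
    toy.groupby(["problem", "size", "dist_type", "name_type"])["cor_length"]
    .agg(["mean", "std", "count"])
)
print(stats)
```

The `count` column makes it easy to spot groups with a single rep, for which `std` is `NaN`.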


In [4]:
# Save the summary table to CSV
summary_table.to_csv('outputs/correlogram_summary_table.csv')
print("Summary table saved to outputs/correlogram_summary_table.csv")

Summary table saved to outputs/correlogram_summary_table.csv


## 2. Plot Correlogram Curves

Read each xy results file and save its correlogram curve as a PDF plot.

In [5]:
# Find all xy CSV files
xy_files = glob.glob('outputs/results_2026_01_07_correlogram_xy_*.csv')
print(f"Found {len(xy_files)} xy files to plot")

Found 16 xy files to plot


In [6]:
# Create plots for each xy file
for xy_file in xy_files:
    # Read the xy data
    xy_df = pd.read_csv(xy_file)
    
    # Derive a short label from the filename by stripping the common prefix
    # Format: results_2026_01_07_correlogram_xy_{problem}_{size}_{size_cat}_{budget}_{dist_type}_{name_type}_rep{rep}.csv
    filename = Path(xy_file).stem
    label = filename.replace('results_2026_01_07_correlogram_xy_', '')
    
    # Create the plot
    plt.figure(figsize=(8, 6))
    plt.plot(xy_df['x_axis'], xy_df['y_axis'], marker='o', linestyle='-', linewidth=2, markersize=6)
    plt.xlabel('Distance', fontsize=12)
    plt.ylabel('Correlation', fontsize=12)
    plt.title(label.replace('_', ' '), fontsize=10)
    plt.grid(True, alpha=0.3)
    plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    
    # Save as PDF next to the source CSV (only the final suffix is swapped)
    pdf_filename = Path(xy_file).with_suffix('.pdf')
    plt.savefig(pdf_filename, format='pdf', bbox_inches='tight')
    plt.close()
    
print(f"Created {len(xy_files)} PDF plots")

Created 16 PDF plots
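
The plotting loop uses the filename only as a title. If the individual fields are needed (for example, to group curves by problem), the filename format noted in the loop's comment can be parsed from the right. This is a sketch under assumptions: the helper `parse_xy_filename` is hypothetical, the example path is constructed from the stated format rather than taken from the output directory, and `size`, `budget`, and `rep` are assumed to be integers.

```python
from pathlib import Path

PREFIX = "results_2026_01_07_correlogram_xy_"


def parse_xy_filename(path):
    """Split an xy filename into its metadata fields (hypothetical helper)."""
    stem = Path(path).stem
    if not stem.startswith(PREFIX):
        raise ValueError(f"unexpected filename: {path}")
    body = stem[len(PREFIX):]
    # Split from the right so a problem name containing "_" stays intact.
    problem, size, size_cat, budget, dist_type, name_type, rep = body.rsplit("_", 6)
    if not rep.startswith("rep"):
        raise ValueError(f"missing rep field in: {path}")
    return {
        "problem": problem,
        "size": int(size),        # assumed to be an integer
        "size_cat": size_cat,
        "budget": int(budget),    # assumed to be an integer
        "dist_type": dist_type,
        "name_type": name_type,
        "rep": int(rep[len("rep"):]),
    }


# Example path built from the documented format (illustrative only).
meta = parse_xy_filename(
    "outputs/results_2026_01_07_correlogram_xy_OneMax_8_small_50_coarse_lin_rep0.csv"
)
print(meta)
```

Splitting from the right with `rsplit` is a design choice: the six trailing fields have a fixed shape, so anything left over is treated as the problem name.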


## 3. Summary Statistics by Problem Type

Means of the summary columns grouped by problem, by size, and by `dist_type` and `name_type`.

In [7]:
# Group by problem only
by_problem = df.groupby('problem')[summary_cols].mean()
print("\nMean values by problem:")
by_problem


Mean values by problem:


problem,diameter,cor_length,onestep_cor
OneMax,12.0308,5.890695,0.813463
Sphere,16.033667,7.526872,0.930087


In [8]:
# Group by size only
by_size = df.groupby('size')[summary_cols].mean()
print("\nMean values by size:")
by_size


Mean values by size:


size,diameter,cor_length,onestep_cor
8,9.319579,4.276,0.821786
16,18.744888,9.141567,0.921763


In [9]:
# Group by dist_type and name_type
by_type = df.groupby(['dist_type', 'name_type'])[summary_cols].mean()
print("\nMean values by dist_type and name_type:")
by_type


Mean values by dist_type and name_type:


dist_type,name_type,diameter,cor_length,onestep_cor
coarse,lin,18.0154,8.543982,0.859379
coarse,str,18.0154,8.543982,0.859379
fine,lin,10.049067,4.873585,0.884171
fine,str,10.049067,4.873585,0.884171
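
In the table above, the `lin` and `str` rows carry the same values for each `dist_type`. One way to check this programmatically is to pivot `name_type` into columns and compare them. The sketch below copies the `cor_length` values from that table into a small frame; the names `by_type_demo`, `wide`, and `same` are illustrative, and the tolerance of `1e-9` is an arbitrary choice.

```python
import pandas as pd

# cor_length values copied from the dist_type / name_type table above.
by_type_demo = pd.DataFrame({
    "dist_type": ["coarse", "coarse", "fine", "fine"],
    "name_type": ["lin", "str", "lin", "str"],
    "cor_length": [8.543982, 8.543982, 4.873585, 4.873585],
})

# One row per dist_type, one column per name_type.
wide = by_type_demo.pivot(index="dist_type", columns="name_type", values="cor_length")

# True if lin and str agree within the tolerance for every dist_type.
same = bool((wide["lin"] - wide["str"]).abs().max() < 1e-9)
print(same)
```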
