# Pixel size optimization

This notebook outlines how the data for the pixel size optimization was generated for the Leopard-EM manuscript.

The first step was to run pixel size optimization on all micrographs whose refine template results contained at least 50 LSU particles with a z-score above 8.
This was done using the following scripts and config file.
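The micrograph selection criterion described above can be sketched with pandas. This is a minimal illustration, not the script used for the manuscript. It assumes a per-particle table with hypothetical columns `micrograph` and `z_score`, so adjust the names to the actual refine template output.

```python
import pandas as pd

# Hypothetical column names: `micrograph` identifies the image and `z_score`
# is the per-particle refine template z-score. Adjust to the real CSV headers.
Z_SCORE_THRESHOLD = 8.0
MIN_PARTICLES = 50


def select_micrographs(particles: pd.DataFrame,
                       z_threshold: float = Z_SCORE_THRESHOLD,
                       min_particles: int = MIN_PARTICLES) -> list[str]:
    """Return micrographs with at least `min_particles` particles above `z_threshold`."""
    confident = particles[particles["z_score"] > z_threshold]
    counts = confident.groupby("micrograph").size()
    return sorted(counts[counts >= min_particles].index)


if __name__ == "__main__":
    demo = pd.DataFrame({
        "micrograph": ["a"] * 60 + ["b"] * 30,
        "z_score": [9.0] * 55 + [7.0] * 5 + [10.0] * 30,
    })
    print(select_micrographs(demo))  # ['a']
```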

TODO: Redo these runs with the new verbose options.

In [1]:
!cat run_optimize_template_all.sh


#!/bin/bash

# Load any necessary modules (adjust for your system)
# Print current shell and environment before activation
echo "=== ENVIRONMENT BEFORE ACTIVATION ==="
echo "Current shell: $SHELL"
echo "Current conda environments:"
conda env list
echo "Current Python: $(which python)"
echo "Current Python version: $(python --version 2>&1)"

# Activate leopard-em conda environment 
echo "=== ACTIVATING CONDA ENVIRONMENT ==="
source $(conda info --base)/etc/profile.d/conda.sh
conda activate leopard-em
ACTIVATION_STATUS=$?

# Check if activation succeeded
if [ $ACTIVATION_STATUS -ne 0 ]; then
    echo "ERROR: Failed to activate the leopard-em environment"
    echo "Available environments:"
    conda env list
    exit 1
fi

# Print environment details after activation
echo "=== ENVIRONMENT AFTER ACTIVATION ==="
echo "Active conda environment: $CONDA_PREFIX"
echo "Python interpreter: $(which python)"
echo "Python version: $(python --version 2>&1)"
echo "Conda packages in environment:"
conda

In [2]:
!cat run_optimize_template.py

from leopard_em.pydantic_models.managers import OptimizeTemplateManager

OPTIMIZE_YAML_PATH = "optimize_template_example_config.yaml"


def main() -> None:
    """Main function to run the optimize template program."""
    otm = OptimizeTemplateManager.from_yaml(OPTIMIZE_YAML_PATH)
    otm.run_optimize_template(
        output_text_path="results_optim_px_new/optimize_template_results_crop_4.txt",
        write_individual_csv=True,
        min_snr=8,
    )



if __name__ == "__main__":
    main()


In [3]:
!cat optimize_template_example_config.yaml

#####################################################
### OptimizeTemplateManager configuration example ###
#####################################################
# An example YAML configuration to modify.
# Call `OptimizeTemplateManager.from_yaml(path)` to load this configuration.
particle_stack:
  df_path: results/results_crop_4.csv  # Needs to be readable by pandas
  extracted_box_size: [528, 528]
  original_template_size: [512, 512]
pixel_size_coarse_search:
  enabled: true
  pixel_size_min: -0.05
  pixel_size_max: 0.05
  pixel_size_step: 0.01
pixel_size_fine_search:
  enabled: true
  pixel_size_min: -0.008 
  pixel_size_max: 0.008
  pixel_size_step: 0.001
preprocessing_filters:
  whitening_filter:
    do_power_spectrum: true
    enabled: true
    max_freq: 1.0  # In terms of Nyquist frequency
    num_freq_bins: null
  bandpass_filter:
    enabled: false
    falloff: null
    high_freq_cutoff: null
    low_freq_cutoff: null
computational_config:
  gpu_ids: 0
  num_cpus: 1
simulator:
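In the config above, `pixel_size_min` and `pixel_size_max` are negative and positive, which suggests they are offsets from a reference pixel size rather than absolute values. Assuming that, and assuming the fine search is centered on the best pixel size from the coarse search (our reading of the two sections, not something the config states), the two grids can be sketched as follows. `search_grid` and the example pixel sizes are hypothetical.

```python
def search_grid(center: float, offset_min: float, offset_max: float,
                step: float, decimals: int = 4) -> list[float]:
    """Pixel sizes from center+offset_min to center+offset_max in increments of step.

    Counts steps with round() instead of accumulating floats, so the end
    point is included exactly once.
    """
    n_steps = int(round((offset_max - offset_min) / step))
    return [round(center + offset_min + i * step, decimals) for i in range(n_steps + 1)]


if __name__ == "__main__":
    nominal_px = 0.933  # hypothetical starting pixel size in Å
    coarse = search_grid(nominal_px, -0.05, 0.05, 0.01)
    best_coarse = 0.923  # hypothetical stand-in for the best coarse pixel size
    fine = search_grid(best_coarse, -0.008, 0.008, 0.001)
    print(len(coarse), coarse[0], coarse[-1])  # 11 0.883 0.983
    print(len(fine), fine[0], fine[-1])        # 17 0.915 0.931
```

With these settings the coarse search would evaluate 11 pixel sizes (±0.05 Å in 0.01 Å steps) and the fine search 17 (±0.008 Å in 0.001 Å steps). Counting steps with `round()` sidesteps the inconsistent end points that `np.arange` can give with a non-integer step.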

In [None]:
!./run_optimize_template_all.sh

Next, we find the best pixel size for each particle, i.e. the pixel size at which its refined MIP is highest.

In [1]:
#!/usr/bin/env python3
"""
Script to find the best pixel size for each particle based on maximum refined_mip.

For each xenon folder in results_clean/, this script:
1. Reads all px_results_pix=*.csv files
2. For each particle, finds the pixel size with maximum refined_mip
3. Saves a particles_best_px.csv file with the best results for each particle
"""

import os
import glob
import pandas as pd


def extract_pixel_size(filename):
    """Extract pixel size from filename like 'px_results_pix=0.920.csv'"""
    basename = os.path.basename(filename)
    if 'pix=' in basename:
        pix_str = basename.split('pix=')[1].replace('.csv', '')
        return float(pix_str)
    return None


def process_folder(folder_path):
    """Process a single xenon folder to find best pixel size per particle"""
    print(f"\nProcessing: {folder_path}")
    
    # Find all px_results_pix=*.csv files
    px_files = glob.glob(os.path.join(folder_path, 'px_results_pix=*.csv'))
    
    if not px_files:
        print(f"  No px_results_pix=*.csv files found")
        return
    
    print(f"  Found {len(px_files)} pixel size files")
    
    # Dictionary to store data for each particle
    # particle_id -> list of (pix, refined_mip, defocus_u, defocus_v, relative_defocus, refined_relative_defocus)
    particle_data = {}
    
    # Read each file
    for px_file in px_files:
        pix = extract_pixel_size(px_file)
        if pix is None:
            continue
            
        try:
            df = pd.read_csv(px_file, index_col=0)
            
            # Iterate through particles in this file
            for idx, row in df.iterrows():
                particle_idx = row['particle_index']
                refined_mip = row['refined_mip']
                defocus_u = row['defocus_u']
                defocus_v = row['defocus_v']
                relative_defocus = row['relative_defocus']
                refined_relative_defocus = row['refined_relative_defocus']
                
                if particle_idx not in particle_data:
                    particle_data[particle_idx] = []
                
                particle_data[particle_idx].append({
                    'pix': pix,
                    'refined_mip': refined_mip,
                    'defocus_u': defocus_u,
                    'defocus_v': defocus_v,
                    'relative_defocus': relative_defocus,
                    'refined_relative_defocus': refined_relative_defocus
                })
        except Exception as e:
            print(f"  Error reading {px_file}: {e}")
            continue
    
    if not particle_data:
        print(f"  No particle data collected")
        return
    
    # For each particle, find the pixel size with max refined_mip
    best_results = []
    for particle_idx in sorted(particle_data.keys()):
        records = particle_data[particle_idx]
        
        # Find record with maximum refined_mip
        best_record = max(records, key=lambda x: x['refined_mip'])
        
        best_results.append({
            'particle_index': particle_idx,
            'best_pix': best_record['pix'],
            'best_refined_mip': best_record['refined_mip'],
            'defocus_u': best_record['defocus_u'],
            'defocus_v': best_record['defocus_v'],
            'relative_defocus': best_record['relative_defocus'],
            'refined_relative_defocus': best_record['refined_relative_defocus']
        })
    
    # Create DataFrame and save
    result_df = pd.DataFrame(best_results)
    output_file = os.path.join(folder_path, 'particles_best_px.csv')
    result_df.to_csv(output_file, index=False)
    print(f"  Saved {len(best_results)} particles to {output_file}")


def main():
    """Main function to process all xenon folders"""
    results_dir = '/data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean'
    
    if not os.path.exists(results_dir):
        print(f"Results directory not found: {results_dir}")
        return
    
    # Find all xenon folders
    xenon_folders = sorted(glob.glob(os.path.join(results_dir, 'xenon_*_refined_results')))
    
    print(f"Found {len(xenon_folders)} xenon folders to process")
    
    # Process each folder
    for folder in xenon_folders:
        process_folder(folder)
    
    print("\n=== Processing complete ===")


if __name__ == '__main__':
    main()



Found 70 xenon folders to process

Processing: /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_213_000_0.0_DWS_refined_results
  Found 23 pixel size files
  Saved 194 particles to /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_213_000_0.0_DWS_refined_results/particles_best_px.csv

Processing: /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_214_000_0.0_DWS_refined_results
  Found 21 pixel size files
  Saved 17 particles to /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_214_000_0.0_DWS_refined_results/particles_best_px.csv

Processing: /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_215_000_0.0_DWS_refined_results
  Found 23 pixel size files
  Saved 48 particles to /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean/xenon_215_000_0.0_DWS_refined_results/particles_best_px.csv

Processing: /data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/r
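The per-particle selection in the script above can also be written as a single `groupby`/`idxmax` over all pixel sizes. This is a sketch, assuming each `px_results_pix=*.csv` table has `particle_index` and `refined_mip` columns (as the script reads them); the function names are hypothetical, and unlike the script it keeps every column of the winning row. Building the combined table with pandas also keeps `particle_index` as an integer, whereas `iterrows()` returns each row as one Series and can upcast it to float when the other columns are floats.

```python
import re
from typing import Optional

import pandas as pd

# Matches file names such as 'px_results_pix=0.920.csv' (pattern from the script above).
PIX_PATTERN = re.compile(r"pix=([0-9]*\.?[0-9]+)\.csv$")


def pixel_size_from_name(filename: str) -> Optional[float]:
    """Parse the pixel size from a results file name, or return None."""
    match = PIX_PATTERN.search(filename)
    return float(match.group(1)) if match else None


def best_pixel_size_per_particle(tables: dict[float, pd.DataFrame]) -> pd.DataFrame:
    """Keep, for every particle_index, the row with the highest refined_mip.

    `tables` maps each pixel size to the per-particle table for that pixel size.
    """
    stacked = pd.concat(
        [df.assign(best_pix=pix) for pix, df in tables.items()], ignore_index=True
    )
    # idxmax returns the row label of the maximum refined_mip within each particle
    best_rows = stacked.loc[stacked.groupby("particle_index")["refined_mip"].idxmax()]
    return (best_rows.rename(columns={"refined_mip": "best_refined_mip"})
            .sort_values("particle_index")
            .reset_index(drop=True))


if __name__ == "__main__":
    demo = {
        0.920: pd.DataFrame({"particle_index": [0, 1], "refined_mip": [9.0, 12.0]}),
        0.930: pd.DataFrame({"particle_index": [0, 1], "refined_mip": [10.0, 11.0]}),
    }
    print(best_pixel_size_per_particle(demo))
```

For one results folder, `tables` could be built as `{pixel_size_from_name(p): pd.read_csv(p, index_col=0) for p in glob.glob(os.path.join(folder, 'px_results_pix=*.csv'))}` and the result written with `to_csv(..., index=False)`.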

And now we plot the data.
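Plot 4 reports the mean of the per-micrograph mean optimal pixel sizes with a 95% confidence interval from Student's t distribution with n-1 degrees of freedom: mean ± t(0.975, n-1)·s/√n, where s is the sample standard deviation across micrographs. The sketch below writes that interval out explicitly and compares it with the `scipy.stats.t.interval` call used in the plotting script; `mean_confidence_interval` is a hypothetical helper and the input values are made up for illustration.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(values, confidence: float = 0.95):
    """Two-sided t-based confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)  # sample std (ddof=1), like pandas .std()
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


if __name__ == "__main__":
    per_micrograph_means = [0.921, 0.923, 0.922, 0.925, 0.920]  # made-up values in Å
    low, high = mean_confidence_interval(per_micrograph_means)
    ref_low, ref_high = stats.t.interval(
        0.95, len(per_micrograph_means) - 1,
        loc=np.mean(per_micrograph_means),
        scale=stats.sem(per_micrograph_means),
    )
    print(f"[{low:.4f}, {high:.4f}]  vs scipy: [{ref_low:.4f}, {ref_high:.4f}]")
```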

In [3]:
#!/usr/bin/env python3
"""
Clean analysis script for optimal pixel size data.
Generates only:
- Plot 4: Histogram of mean optimal pixel size (SNR > 8)
- Plot 5: Normalized SNR vs absolute pixel size change
- Fit data CSV
- Fit parameters CSV
"""

import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit
from scipy.interpolate import interp1d
import matplotlib.ticker as ticker

# Set font
plt.rcParams['font.family'] = 'Nimbus Sans'
plt.rcParams['font.size'] = 7

# Constants
SNR_THRESHOLD = 8
MIN_PARTICLES = 50

def style_axis(ax):
    """Apply consistent styling to axis"""
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_alpha(0.6)
    ax.spines['bottom'].set_alpha(0.6)
    ax.tick_params(axis='both', which='major', labelsize=7, 
                   colors=(0, 0, 0, 0.6), width=0.5)
    ax.xaxis.label.set_alpha(0.6)
    ax.yaxis.label.set_alpha(0.6)

def gaussian(x, amplitude, center, sigma):
    # Uses exp(-((x - center) / sigma)**2), so sigma here is sqrt(2) times the standard deviation
    return amplitude * np.exp(-((x - center) / sigma)**2)

def lorentzian(x, amplitude, center, gamma):
    return amplitude / (1 + ((x - center) / gamma)**2)

def voigt(x, amplitude, center, sigma, gamma):
    # Product of a Gaussian and a Lorentzian term: a simplified stand-in,
    # not a true Voigt profile (which is their convolution)
    return amplitude * np.exp(-((x - center) / sigma)**2) / (1 + ((x - center) / gamma)**2)

def double_gaussian(x, amplitude1, center1, sigma1, amplitude2, center2, sigma2):
    return (amplitude1 * np.exp(-((x - center1) / sigma1)**2) + 
            amplitude2 * np.exp(-((x - center2) / sigma2)**2))

def exponential_decay(x, amplitude, center, decay_rate):
    return amplitude * np.exp(-decay_rate * np.abs(x - center))

def main():
    # Set up paths
    results_dir = '/data/papers/Leopard-EM_paper_data/xe30kv/optimize_px/results_clean'
    os.makedirs(results_dir, exist_ok=True)
    
    # Find all particles_best_px.csv files
    pattern = os.path.join(results_dir, 'xenon_*_refined_results', 'particles_best_px.csv')
    csv_files = glob.glob(pattern)
    
    print(f"Found {len(csv_files)} particles_best_px.csv files")
    
    # Read and combine all data
    all_data = []
    excluded_files = []
    
    for csv_file in csv_files:
        try:
            df = pd.read_csv(csv_file)
            # Add micrograph name to each particle
            micrograph_name = os.path.basename(os.path.dirname(csv_file))
            df['micrograph'] = micrograph_name
            
            if len(df) < MIN_PARTICLES:
                excluded_files.append(f"{micrograph_name}: {len(df)} particles")
                continue
            all_data.append(df)
        except Exception as e:
            print(f"Error reading {csv_file}: {e}")
            continue
    
    if excluded_files:
        print(f"\nExcluded {len(excluded_files)} files with < {MIN_PARTICLES} particles:")
        for f in excluded_files:
            print(f"  {f}")
    
    if not all_data:
        print("No data files found!")
        return
    
    # Combine all data
    combined_df = pd.concat(all_data, ignore_index=True)
    print(f"\nTotal particles before filtering: {len(combined_df)}")
    
    # Filter by SNR threshold (inclusive: keeps best_refined_mip >= SNR_THRESHOLD)
    filtered_df = combined_df[combined_df['best_refined_mip'] >= SNR_THRESHOLD].copy()
    print(f"Particles with SNR > {SNR_THRESHOLD}: {len(filtered_df)}")
    
    # Calculate mean and std for outlier removal
    mean_px = filtered_df['best_pix'].mean()
    std_px = filtered_df['best_pix'].std()
    
    # Remove outliers (>5 sigma from mean)
    outliers = np.abs(filtered_df['best_pix'] - mean_px) > 5 * std_px
    n_outliers = outliers.sum()
    if n_outliers > 0:
        print(f"Particles excluded (>5σ from mean): {n_outliers}")
        filtered_df = filtered_df[~outliers]
    
    print(f"Particles remaining after all filters: {len(filtered_df)}")
    
    # Calculate micrograph statistics
    micrograph_stats = filtered_df.groupby('micrograph').agg({
        'best_pix': ['mean', 'count'],
        'best_refined_mip': 'mean'
    }).round(4)
    micrograph_stats.columns = ['mean_optimal_px', 'particle_count', 'mean_SNR']
    micrograph_stats = micrograph_stats[micrograph_stats['particle_count'] >= MIN_PARTICLES]
    
    print(f"\nMicrographs after filtering: {len(micrograph_stats)}")
    print(f"Mean particles per micrograph: {micrograph_stats['particle_count'].mean():.1f}")
    
    # === PLOT 4: Histogram of mean optimal pixel size (SNR > 8) ===
    print(f"\nGenerating Plot 4 (SNR>{SNR_THRESHOLD})...")
    
    # Calculate statistics
    mean_px_mgr_p4 = micrograph_stats['mean_optimal_px'].mean()
    std_px_mgr_p4 = micrograph_stats['mean_optimal_px'].std()
    n_mgr_p4 = len(micrograph_stats)
    se_mgr_p4 = std_px_mgr_p4 / np.sqrt(n_mgr_p4)
    ci_mgr_p4 = stats.t.interval(0.95, n_mgr_p4-1, loc=mean_px_mgr_p4, scale=se_mgr_p4)
    
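    # Create bins centered on every 0.001 Å. Take the floor of the minimum and the
    # ceiling of the maximum so the smallest and largest means always fall inside
    # the outer bins (ceil/floor could drop them); this may add an empty bin at either end.
    min_px_mgr = micrograph_stats['mean_optimal_px'].min()
    max_px_mgr = micrograph_stats['mean_optimal_px'].max()
    first_center_mgr = np.floor(min_px_mgr * 1000) / 1000
    last_center_mgr = np.ceil(max_px_mgr * 1000) / 1000
    n_bins_mgr = int(round((last_center_mgr - first_center_mgr) / 0.001)) + 1
    bin_centers_mgr = first_center_mgr + 0.001 * np.arange(n_bins_mgr)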
    bin_edges_mgr = bin_centers_mgr - 0.0005
    bin_edges_mgr = np.append(bin_edges_mgr, bin_centers_mgr[-1] + 0.0005)
    
    fig4, ax4 = plt.subplots(figsize=(90/25.4, 60/25.4))
    
    ax4.hist(micrograph_stats['mean_optimal_px'], bins=bin_edges_mgr, alpha=0.7, 
             edgecolor='black', color='white', linewidth=0.8)
    ax4.axvline(x=mean_px_mgr_p4, color='black', linestyle='--', linewidth=0.8, 
                label=f'Mean = {mean_px_mgr_p4:.4f}')
    ax4.axvline(x=ci_mgr_p4[0], color='black', linestyle=':', linewidth=0.5, alpha=0.6,
                label=f'95% CI: [{ci_mgr_p4[0]:.4f}, {ci_mgr_p4[1]:.4f}]')
    ax4.axvline(x=ci_mgr_p4[1], color='black', linestyle=':', linewidth=0.5, alpha=0.6)
    ax4.set_xlabel('Mean Optimal Pixel Size (Å)')
    ax4.set_ylabel('Frequency (Micrographs)')
    ax4.legend(frameon=False, loc='upper right', fontsize=6, bbox_to_anchor=(0.98, 0.98))
    ax4.grid(False)
    style_axis(ax4)
    
    plt.tight_layout()
    output_plot4_png = os.path.join(results_dir, f'plot4_histogram_meanpx_SNR{SNR_THRESHOLD}.png')
    output_plot4_pdf = os.path.join(results_dir, f'plot4_histogram_meanpx_SNR{SNR_THRESHOLD}.pdf')
    plt.savefig(output_plot4_png, dpi=300, bbox_inches='tight')
    plt.savefig(output_plot4_pdf, bbox_inches='tight')
    print(f"Plot 4 (SNR>{SNR_THRESHOLD}) saved: {output_plot4_png} and .pdf (n={n_mgr_p4} micrographs)")
    plt.close()
    
    # === PLOT 5: Normalized SNR vs Absolute Pixel Size Change ===
    print(f"\nGenerating Plot 5 (absolute pixel size change)...")
    
    # Find all px_results_all.csv files
    all_csv_pattern = os.path.join(results_dir, 'xenon_*_refined_results', 'px_results_all.csv')
    all_csv_files = glob.glob(all_csv_pattern)
    
    print(f"Found {len(all_csv_files)} px_results_all.csv files")
    
    # Process each micrograph
    all_pct_data = []
    
    for csv_file in all_csv_files:
        try:
            df = pd.read_csv(csv_file)
            
            # Check for correct column names
            if 'Pixel Size (Å)' in df.columns:
                df['optimal_px'] = df['Pixel Size (Å)']
            elif 'px' in df.columns:
                df['optimal_px'] = df['px']
            else:
                continue
            
            if 'SNR' not in df.columns:
                continue
            
            # Remove outlier at 0.943 Å (np.isclose avoids an exact float comparison)
            df = df[~np.isclose(df['optimal_px'], 0.943)]
            
            if len(df) == 0:
                continue
                
            # Find max SNR and corresponding pixel size
            max_snr_idx = df['SNR'].idxmax()
            max_snr = df.loc[max_snr_idx, 'SNR']
            max_px = df.loc[max_snr_idx, 'optimal_px']
            
            # Normalize SNR to 1
            df['normalized_snr'] = df['SNR'] / max_snr
            
            # Signed pixel size change (Å) relative to the pixel size with maximum SNR
            df['abs_px_change'] = df['optimal_px'] - max_px
            
            # Round to 0.001 Å precision
            df['abs_px_change_rounded'] = np.round(df['abs_px_change'], 3)
            
            # Store data
            all_pct_data.append(df[['abs_px_change_rounded', 'normalized_snr']].copy())
            
        except Exception as e:
            print(f"Error processing {csv_file}: {e}")
            continue
    
    if not all_pct_data:
        print("No data found for Plot 5!")
        return
    
    # Combine all data
    combined_pct_data = pd.concat(all_pct_data, ignore_index=True)
    
    # Average data points within each 0.001 Å bin
    rounded_data = combined_pct_data.groupby('abs_px_change_rounded')['normalized_snr'].mean().reset_index()
    
    # Convert to percentage change (assuming a base pixel size of 0.933 Å)
    base_px = 0.933
    rounded_data['pct_change'] = (rounded_data['abs_px_change_rounded'] / base_px) * 100
    
    # Filter data to only include -4% to +4% range
    rounded_data = rounded_data[(rounded_data['pct_change'] >= -4.0) & (rounded_data['pct_change'] <= 4.0)]
    
    # Prepare data for fitting
    abs_px_change = rounded_data['abs_px_change_rounded'].values
    mean_snr_rounded = rounded_data['normalized_snr'].values
    pct_from_abs = rounded_data['pct_change'].values
    
    # Fit various functions to the rounded data
    fits = {}
    residuals = {}
    
    # Candidate models: name -> (function, initial guess p0)
    candidate_models = {
        'gaussian': (gaussian, [1.0, 0.0, 2.0]),
        'lorentzian': (lorentzian, [1.0, 0.0, 2.0]),
        'voigt': (voigt, [1.0, 0.0, 1.0, 2.0]),
        'double_gaussian': (double_gaussian, [0.5, 0.0, 1.0, 0.5, -1.0, 2.0]),
        'exponential': (exponential_decay, [1.0, 0.0, 0.1]),
    }
    for fit_name, (model, p0) in candidate_models.items():
        try:
            popt, _ = curve_fit(model, pct_from_abs, mean_snr_rounded, p0=p0, maxfev=10000)
            fits[fit_name] = popt
            residuals[fit_name] = np.sum((mean_snr_rounded - model(pct_from_abs, *popt))**2)
        except Exception:
            # e.g. RuntimeError when the fit does not converge; rank this model last
            residuals[fit_name] = np.inf
    
    # Choose best fit
    best_fit_name = min(residuals, key=residuals.get)
    print(f"Best fit for rounded data: {best_fit_name} (residuals: {residuals[best_fit_name]:.6f})")
    
    # Collect fit parameters; parameters a model does not use are left empty (None)
    param_names = {
        'gaussian': ['amplitude', 'center', 'sigma'],
        'lorentzian': ['amplitude', 'center', 'gamma'],
        'voigt': ['amplitude', 'center', 'sigma', 'gamma'],
        'double_gaussian': ['amplitude', 'center', 'sigma', 'amplitude2', 'center2', 'sigma2'],
        'exponential': ['amplitude', 'center', 'decay_rate'],
    }
    param_columns = ['amplitude', 'center', 'sigma', 'gamma',
                     'amplitude2', 'center2', 'sigma2', 'decay_rate']
    fit_params_data = []
    for fit_name, residual in residuals.items():
        if fit_name not in fits:
            continue
        row = {'fit_type': fit_name, 'residuals': residual}
        row.update(dict.fromkeys(param_columns))  # default every parameter to None
        row.update(zip(param_names[fit_name], fits[fit_name]))
        fit_params_data.append(row)
    
    # Save fit parameters CSV
    fit_params_df = pd.DataFrame(fit_params_data)
    fit_params_csv = os.path.join(results_dir, 'plot5_fit_parameters.csv')
    fit_params_df.to_csv(fit_params_csv, index=False)
    print(f"Fit parameters saved to: {fit_params_csv}")
    
    # Create fine x-axis for smooth plotting
    pct_fine_rounded = np.linspace(pct_from_abs.min(), pct_from_abs.max(), 500)
    
    # Generate fit curve based on best fit
    if best_fit_name == 'gaussian':
        snr_fit_rounded = gaussian(pct_fine_rounded, *fits['gaussian'])
    elif best_fit_name == 'lorentzian':
        snr_fit_rounded = lorentzian(pct_fine_rounded, *fits['lorentzian'])
    elif best_fit_name == 'voigt':
        snr_fit_rounded = voigt(pct_fine_rounded, *fits['voigt'])
    elif best_fit_name == 'double_gaussian':
        snr_fit_rounded = double_gaussian(pct_fine_rounded, *fits['double_gaussian'])
    elif best_fit_name == 'exponential':
        snr_fit_rounded = exponential_decay(pct_fine_rounded, *fits['exponential'])
    
    # Enforce maximum of 1.0
    snr_fit_rounded = np.minimum(snr_fit_rounded, 1.0)
    
    # Create Plot 5
    fig5, ax5 = plt.subplots(figsize=(90/25.4, 60/25.4))
    
    # Plot data points as x's
    ax5.plot(pct_from_abs, mean_snr_rounded, marker='x', color='black', 
             linestyle='None', markersize=3, markeredgewidth=0.5)
    
    # Plot fit
    ax5.plot(pct_fine_rounded, snr_fit_rounded, color='black', linestyle='-', linewidth=1.0)
    
    # Add vertical line at x=0
    ax5.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
    
    # Styling
    ax5.set_xlabel('% Pixel Size Change', fontsize=7)
    ax5.set_ylabel('Normalized z-score', fontsize=7)
    ax5.grid(True, alpha=0.2, linewidth=0.3)
    style_axis(ax5)
    
    plt.tight_layout()
    output_plot5_png = os.path.join(results_dir, 'plot5_normalized_SNR_vs_abs_px_change.png')
    output_plot5_pdf = os.path.join(results_dir, 'plot5_normalized_SNR_vs_abs_px_change.pdf')
    plt.savefig(output_plot5_png, dpi=300, bbox_inches='tight')
    plt.savefig(output_plot5_pdf, bbox_inches='tight')
    print(f"Plot 5 (absolute px change, 0.001 Å rounded) saved: {output_plot5_png} and .pdf")
    plt.close()
    
    # Save fit data to CSV with 0.1% increments
    pct_changes_csv = np.round(np.linspace(-4.0, 4.0, 81), 1)  # -4% to +4% in 0.1% increments
    
    # Interpolate the fit to the requested range
    fit_interp = interp1d(pct_fine_rounded, snr_fit_rounded, kind='linear', bounds_error=False, fill_value='extrapolate')
    snr_values_csv = fit_interp(pct_changes_csv)
    
    # Create DataFrame and save
    fit_data = pd.DataFrame({
        'pct_pixel_change': pct_changes_csv,
        'normalized_snr': snr_values_csv
    })
    fit_data_csv = os.path.join(results_dir, 'plot5_fit_data.csv')
    fit_data.to_csv(fit_data_csv, index=False)
    print(f"Plot 5 fit data saved to: {fit_data_csv}")
    
    print(f"\n=== Clean Analysis Complete ===")
    print(f"Generated files:")
    print(f"  - Plot 4: {output_plot4_png} and {output_plot4_pdf}")
    print(f"  - Plot 5: {output_plot5_png} and {output_plot5_pdf}")
    print(f"  - Fit parameters: {fit_params_csv}")
    print(f"  - Fit data: {fit_data_csv}")

if __name__ == "__main__":
    main()

Found 70 particles_best_px.csv files

Excluded 13 files with < 50 particles:
  xenon_270_000_0.0_DWS_refined_results: 2 particles
  xenon_245_000_0.0_DWS_refined_results: 3 particles
  xenon_255_000_0.0_DWS_refined_results: 1 particles
  xenon_265_000_0.0_DWS_refined_results: 1 particles
  xenon_264_000_0.0_DWS_refined_results: 1 particles
  xenon_274_000_0.0_DWS_refined_results: 40 particles
  xenon_268_000_0.0_DWS_refined_results: 29 particles
  xenon_220_000_0.0_DWS_refined_results: 8 particles
  xenon_271_000_0.0_DWS_refined_results: 3 particles
  xenon_215_000_0.0_DWS_refined_results: 48 particles
  xenon_214_000_0.0_DWS_refined_results: 17 particles
  xenon_253_000_0.0_DWS_refined_results: 37 particles
  xenon_290_000_0.0_DWS_refined_results: 8 particles

Total particles before filtering: 12366
Particles with SNR > 8: 12173
Particles excluded (>5σ from mean): 1
Particles remaining after all filters: 12172

Micrographs after filtering: 57
Mean particles per micrograph: 213.5

Gene
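The models fitted for Plot 5 use the parameterizations defined in the script; for example, the Gaussian is exp(-((x - center)/sigma)^2), so the `sigma` column in `plot5_fit_parameters.csv` is √2 times a standard deviation. The full width at half maximum (FWHM), in % pixel size change, is one parameterization-independent way to compare the fits. The sketch below gives closed forms for three of the models (solving f(x) = amplitude/2) and a grid-based fallback. The function definitions are copied from the script, while `ANALYTIC_FWHM`, `numeric_fwhm` and the example parameters are hypothetical.

```python
import numpy as np


def gaussian(x, amplitude, center, sigma):
    # Same parameterization as the plotting script: sigma = sqrt(2) * standard deviation
    return amplitude * np.exp(-((x - center) / sigma) ** 2)


def lorentzian(x, amplitude, center, gamma):
    return amplitude / (1 + ((x - center) / gamma) ** 2)


def exponential_decay(x, amplitude, center, decay_rate):
    return amplitude * np.exp(-decay_rate * np.abs(x - center))


# Closed-form FWHM for each single-peak model (solve f(x) = amplitude / 2)
ANALYTIC_FWHM = {
    "gaussian": lambda amplitude, center, sigma: 2 * abs(sigma) * np.sqrt(np.log(2)),
    "lorentzian": lambda amplitude, center, gamma: 2 * abs(gamma),
    "exponential": lambda amplitude, center, decay_rate: 2 * np.log(2) / decay_rate,
}


def numeric_fwhm(func, params, x_min=-10.0, x_max=10.0, n_points=200_001):
    """Width of the region where func >= half its maximum, on a dense grid.

    Only meaningful for single-peaked curves; accuracy is limited by the grid spacing.
    """
    x = np.linspace(x_min, x_max, n_points)
    y = func(x, *params)
    above = np.flatnonzero(y >= y.max() / 2)
    return x[above[-1]] - x[above[0]]


if __name__ == "__main__":
    example = (1.0, 0.2, 1.5)  # made-up amplitude, center (%), sigma (%)
    print(ANALYTIC_FWHM["gaussian"](*example), numeric_fwhm(gaussian, example))
```

`numeric_fwhm` can also be applied to the script's `voigt` or `double_gaussian` with their fitted parameters, as long as the curve has a single peak inside the grid range.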