# **pyCoreRelator** [![GitHub](https://img.shields.io/badge/GitHub-pyCoreRelator-blue?logo=github)](https://github.com/GeoLarryLai/pyCoreRelator) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.xxxxxxxx.svg)](https://doi.org/10.5281/zenodo.xxxxxxxx)
## **Workshop Notebook #5: Core Pair Correlation Analysis**   [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GeoLarryLai/pyCoreRelator/blob/main/pyCoreRelator_5_core_pair_analysis.ipynb)
This notebook demonstrates the general workflow for using modules from **pyCoreRelator** to correlate core log pairs using Dynamic Time Warping (DTW) with boundary constraints and age considerations.

### Key Functions from **pyCoreRelator**
- **`load_core_log_data()`**: Load and visualize core log data with picked datums
- **`load_core_age_constraints()`**: Load age constraint data for cores
- **`calculate_interpolated_ages()`**: Calculate interpolated ages for picked depths
- **`run_comprehensive_dtw_analysis()`**: Perform comprehensive DTW analysis on core pairs
- **`find_complete_core_paths()`**: Search for complete correlation paths
- **`find_best_mappings()`**: Identify best correlation mappings
- **`visualize_combined_segments()`**: Visualize combined DTW segments
- **`plot_correlation_distribution()`**: Plot quality metric distributions

For advanced usage and full parameter details, see [FUNCTION_DOCUMENTATION.md](https://github.com/GeoLarryLai/pyCoreRelator/blob/main/FUNCTION_DOCUMENTATION.md).
<hr>

# **Import Packages**
Load core correlation and DTW analysis functions from **pyCoreRelator**

In [None]:
import pandas as pd

from pyCoreRelator import (
    run_comprehensive_dtw_analysis,
    find_complete_core_paths,
    diagnose_chain_breaks,
    calculate_interpolated_ages,
    load_pickeddepth_ages_from_csv,
    visualize_combined_segments,
    visualize_dtw_results_from_csv,
    load_core_log_data,
    plot_correlation_distribution,
    find_best_mappings,
    load_core_age_constraints
)

%matplotlib inline

<hr>

# **Define Core Pair Configuration**

Configure the core pairs and data sources for correlation analysis.

## Select Core A

In [None]:
CORE_A = "M9907-25PC"
# CORE_A = "M9907-23PC"

## Select Core B


In [None]:
CORE_B = "M9907-23PC"
# CORE_B = "M9907-11PC"

## Define Log Columns and Data Paths

Specify which log types to use for correlation and configure file paths for both cores.

In [None]:
LOG_COLUMNS = ['hiresMS', 'CT', 'Lumin']  # Choose which logs to include
# LOG_COLUMNS = ['hiresMS']
DEPTH_COLUMN = 'SB_DEPTH_cm'

**Function: `load_core_log_data()`**

**What it does:**
1. Loads log data from multiple CSV files for a single core
2. Normalizes log values to [0, 1] range if requested
3. Loads picked datum depths from CSV file
4. Creates visualization with core images and log traces
5. Returns log data arrays and picked depth information

**Key Parameters:**
- `log_paths` *(dict)*: Dictionary mapping log names to file paths
- `core_name` *(str)*: Name identifier for the core
- `log_columns` *(list, default=None)*: List of log column names to extract (if None, uses all keys from log_paths)
- `depth_column` *(str, default='SB_DEPTH_cm')*: Name of the depth column
- `normalize` *(bool, default=True)*: Whether to normalize log values to [0, 1]
- `column_alternatives` *(dict, default=None)*: Optional. Dictionary of alternative log column names
- `core_img_1` *(str or array, default=None)*: Path to first core image (e.g., RGB) or pre-loaded image array
- `core_img_2` *(str or array, default=None)*: Path to second core image (e.g., CT) or pre-loaded image array
- `figsize` *(tuple, default=(20, 4))*: Figure size (width, height)
- `picked_datum` *(str, default=None)*: Path to CSV file with picked depths
- `categories` *(int/list/tuple/set, default=None)*: Category or categories to filter and display (None displays all)
- `show_bed_number` *(bool, default=False)*: If True, displays bed numbers next to category depth lines
- `cluster_data` *(dict, default=None)*: Dictionary containing cluster data with keys: 'depth_vals', 'labels_vals', 'k'
- `core_img_1_cmap_range` *(tuple, default=None)*: Color map range for first core image (min_value, max_value)
- `core_img_2_cmap_range` *(tuple, default=None)*: Color map range for second core image (min_value, max_value)
- `show_fig` *(bool, default=True)*: Whether to display the figure

**Returns:**
- `log` (numpy.ndarray): Combined normalized log data array
- `md` (numpy.ndarray): Measured depth array
- `picked_depths` (numpy.ndarray): Array of picked datum depths
- `interpreted_beds` (numpy.ndarray): Array of interpreted bed names corresponding to picked depths
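
The normalization step described above can be illustrated with plain NumPy. This is a minimal sketch of min-max scaling, not the code inside `load_core_log_data()`: the helper name `minmax_normalize`, the sample readings, and the column-wise layout of the combined array are assumptions made for illustration.

```python
import numpy as np

def minmax_normalize(values):
    """Rescale a 1-D log to [0, 1], ignoring NaNs; a constant log maps to zeros."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(values), np.nanmax(values)
    if hi == lo:
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical raw readings for two logs sampled at the same depths
raw_ms = np.array([12.0, 30.0, 21.0, 48.0])
raw_ct = np.array([900.0, 1100.0, 1000.0, 1300.0])

# Assumed layout: one row per depth sample, one column per log
log = np.column_stack([minmax_normalize(raw_ms), minmax_normalize(raw_ct)])
print(log.shape)
print(log)
```

Scaling each log independently keeps logs with very different units (for example magnetic susceptibility and CT numbers) on a comparable footing before they are compared.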


In [None]:
core_a_log_paths = {
    'hiresMS': f'example_data/processed_data/{CORE_A}/{CORE_A}_hiresMS_MLfilled.csv',
    'CT': f'example_data/processed_data/{CORE_A}/{CORE_A}_CT_MLfilled.csv',
    'Lumin': f'example_data/processed_data/{CORE_A}/{CORE_A}_RGB_MLfilled.csv'
}

core_a_rgb_img = f"example_data/processed_data/{CORE_A}/{CORE_A}_RGB.tiff"
core_a_ct_img = f"example_data/processed_data/{CORE_A}/{CORE_A}_CT.tiff"

In [None]:
log_a, md_a, picked_depths_a, interpreted_bed_a = load_core_log_data(
    log_paths=core_a_log_paths,
    core_name=CORE_A,
    log_columns=LOG_COLUMNS,
    depth_column=DEPTH_COLUMN,
    core_img_1=core_a_rgb_img,
    core_img_2=core_a_ct_img,
    picked_datum=f'example_data/picked_datum/{CORE_A}_pickeddepth.csv',
    categories=[1]
)

In [None]:
core_b_log_paths = {
    'hiresMS': f'example_data/processed_data/{CORE_B}/{CORE_B}_hiresMS_MLfilled.csv',
    'CT': f'example_data/processed_data/{CORE_B}/{CORE_B}_CT_MLfilled.csv',
    'Lumin': f'example_data/processed_data/{CORE_B}/{CORE_B}_RGB_MLfilled.csv'
}
core_b_rgb_img = f"example_data/processed_data/{CORE_B}/{CORE_B}_RGB.tiff"
core_b_ct_img = f"example_data/processed_data/{CORE_B}/{CORE_B}_CT.tiff"

In [None]:
log_b, md_b, picked_depths_b, interpreted_bed_b = load_core_log_data(
    log_paths=core_b_log_paths,
    core_name=CORE_B,
    log_columns=LOG_COLUMNS,
    depth_column=DEPTH_COLUMN,
    core_img_1=core_b_rgb_img,
    core_img_2=core_b_ct_img,
    picked_datum=f'example_data/picked_datum/{CORE_B}_pickeddepth.csv',
    categories=[1]
)

<hr>

# **Load Age Data and Estimate Ages for Datums**

## Configure Age Data Parameters

**Function: `load_core_age_constraints()`**

**What it does:**
Loads radiocarbon age constraint data from CSV files for a specific core.

**Key Parameters:**
- `core_name` *(str)*: Name identifier for the core
- `age_base_path` *(str)*: Directory path containing age data CSV files
- `data_columns` *(dict, default=None)*: Dictionary mapping standard field names to CSV column names. Expected keys: 'age', 'pos_error', 'neg_error', 'min_depth', 'max_depth', 'in_sequence', 'core', 'interpreted_bed'
- `mute_mode` *(bool, default=False)*: If True, suppress all print statements

**Required items in `data_columns` dictionary:**
- `'age'`: CSV column name containing calibrated radiocarbon ages (in years BP)
- `'pos_error'`: CSV column name containing positive 2-sigma age uncertainties (in years)
- `'neg_error'`: CSV column name containing negative 2-sigma age uncertainties (in years)
- `'min_depth'`: CSV column name containing minimum depth of the dated interval (in cm)
- `'max_depth'`: CSV column name containing maximum depth of the dated interval (in cm)
- `'in_sequence'`: CSV column name containing flag indicating if constraint is stratigraphically in sequence (boolean or 1/0)
- `'core'`: CSV column name containing core identifier/name
- `'interpreted_bed'`: CSV column name containing interpreted bed name or identifier

**Returns:**
- `depths` (list): Mean depths of age constraints
- `ages` (list): Calibrated radiocarbon ages in years BP
- `pos_errors` (list): Positive 2-sigma uncertainties
- `neg_errors` (list): Negative 2-sigma uncertainties
- `in_sequence_flags` (list): Boolean flags for stratigraphic sequence
- `core` (str): Core identifier
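
To make the role of `data_columns` concrete, the sketch below maps CSV column names to the standard field names and derives mean depths with pandas. It is an illustration under stated assumptions, not the internals of `load_core_age_constraints()`: the inline CSV rows and the core name `CORE-X` are made up, and the column names mirror the mapping used in this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file under `age_base_path`
csv_text = """core,mindepth_cm,maxdepth_cm,calib810_agebp,calib810_2sigma_pos,calib810_2sigma_neg,in_sequence,interpreted_bed
CORE-X,10,12,500,40,35,1,T1
CORE-X,50,54,1800,60,55,1,T3
CORE-X,30,32,2500,80,70,0,T2
"""

# Standard field name -> CSV column name
data_columns = {
    'age': 'calib810_agebp',
    'pos_error': 'calib810_2sigma_pos',
    'neg_error': 'calib810_2sigma_neg',
    'min_depth': 'mindepth_cm',
    'max_depth': 'maxdepth_cm',
    'in_sequence': 'in_sequence',
    'core': 'core',
    'interpreted_bed': 'interpreted_bed',
}

df = pd.read_csv(io.StringIO(csv_text))

# Invert the mapping so CSV columns are renamed to the standard field names
std = df.rename(columns={csv_col: field for field, csv_col in data_columns.items()})

# Mean depth of each dated interval, in cm
std['depth'] = (std['min_depth'] + std['max_depth']) / 2.0

# Keep only constraints flagged as stratigraphically in sequence
in_seq = std[std['in_sequence'].astype(bool)]

print(std['depth'].tolist())
print(in_seq['age'].tolist())
```

The third row is out of sequence (an older age above a younger one), so it is dropped from the in-sequence subset while still contributing a mean depth to the full table.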


In [None]:
staisch2024 = {
    'age': 'calib810_agebp',
    'pos_error': 'calib810_2sigma_pos', 
    'neg_error': 'calib810_2sigma_neg',
    'min_depth': 'mindepth_cm',
    'max_depth': 'maxdepth_cm',
    'in_sequence': 'in_sequence',
    'core': 'core',
    'interpreted_bed': 'interpreted_bed'
}

In [None]:
# Load age constraints for Core A
age_data_a = load_core_age_constraints(
    CORE_A,
    age_base_path='example_data/raw_data/C14age_data',
    data_columns=staisch2024
)

In [None]:
age_data_b = load_core_age_constraints(
    CORE_B,
    age_base_path='example_data/raw_data/C14age_data',
    data_columns=staisch2024
)

## Estimate Age for Every Picked Datum

**Function: `calculate_interpolated_ages()`**

**What it does:**
1. Interpolates ages for picked depths using age constraint data
2. Calculates age uncertainties using Monte Carlo, Linear, or Gaussian methods
3. Handles top and bottom boundary ages
4. Exports results to CSV and creates visualization plots

**Key Parameters:**
- `picked_datum` *(list)*: Depths to interpolate ages for
- `age_constraints_depths` *(list or pd.Series, default=None)*: List of mean depths for all age constraints (not required if age_data is provided)
- `age_constraints_ages` *(list, default=None)*: List of calibrated ages for all age constraints in years BP (not required if age_data is provided)
- `age_constraints_pos_errors` *(list, default=None)*: List of positive error values for all age constraints in years (not required if age_data is provided)
- `age_constraints_neg_errors` *(list, default=None)*: List of negative error values for all age constraints in years (not required if age_data is provided)
- `age_constraint_source_core` *(list, default=None)*: List of source core names for each age constraint (not required if age_data is provided)
- `age_constraints_in_sequence_flags` *(list, default=None)*: List indicating which age constraints are in sequence (not required if age_data is provided)
- `age_data` *(dict, default=None)*: Dictionary containing age constraint data from `load_core_age_constraints()`. If provided, individual age constraint parameters are not required. Expected keys: 'depths', 'ages', 'pos_errors', 'neg_errors', 'in_sequence_flags', 'core'
- `top_bottom` *(bool, default=True)*: Include top and bottom boundary depths/ages
- `top_age` *(float, default=0)*: Age at top of core in years BP
- `top_age_pos_error` *(float, default=0)*: Positive uncertainty of top age in years
- `top_age_neg_error` *(float, default=0)*: Negative uncertainty of top age in years
- `top_depth` *(float, default=0.0)*: Depth at top of core in cm
- `bottom_depth` *(float, default=None)*: Maximum depth of core in cm (if None, uses last in-sequence constraint depth)
- `uncertainty_method` *(str, default='MonteCarlo')*: Method for uncertainty calculation ('MonteCarlo', 'Linear', or 'Gaussian')
- `n_monte_carlo` *(int, default=10000)*: Number of Monte Carlo iterations (only used when uncertainty_method='MonteCarlo')
- `show_plot` *(bool, default=True)*: Whether to display age-depth plot
- `save_plot` *(bool, default=False)*: Whether to save the age-depth plot
- `plot_filename` *(str, default=None)*: Filename for saving the plot
- `core_name` *(str, default=None)*: Core name for plot title and file naming
- `export_csv` *(bool, default=True)*: Whether to export results to CSV
- `csv_filename` *(str, default=None)*: Output CSV filename (if None, default is '{core_name}_pickeddepth_age_{uncertainty_method}.csv')
- `print_ages` *(bool, default=True)*: Whether to print age data
- `mute_mode` *(bool, default=False)*: Whether to suppress console output

**Alternative Function: `load_pickeddepth_ages_from_csv()`**

**What it does:**
Loads pre-calculated interpolated ages from CSV file (skips the age interpolation calculation step).

**Key Parameters:**
- `pickeddepth_age_csv` *(str)*: Path to CSV file containing pre-calculated ages from `calculate_interpolated_ages()`

**Returns:**
- `depths` (numpy.ndarray): Picked datum depths
- `ages` (numpy.ndarray): Interpolated ages in years BP
- `pos_uncertainties` (numpy.ndarray): Positive age uncertainties
- `neg_uncertainties` (numpy.ndarray): Negative age uncertainties
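
The Monte Carlo option can be pictured as repeated linear interpolation through randomly perturbed age constraints. The sketch below is one plausible reading, not the implementation in `calculate_interpolated_ages()`: it assumes symmetric Gaussian errors with sigma equal to half the 2-sigma value, and all depths, ages, and errors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical in-sequence constraints: depth (cm), age (yr BP), 2-sigma error (yr)
depths = np.array([0.0, 100.0, 200.0])
ages = np.array([0.0, 1000.0, 3000.0])
two_sigma = np.array([75.0, 100.0, 150.0])  # symmetric here for simplicity

picked = np.array([50.0, 150.0])  # picked datum depths (cm)
n_iter = 10000

# One perturbed age per constraint per iteration (assumption: sigma = 2-sigma / 2)
draws = rng.normal(ages, two_sigma / 2.0, size=(n_iter, ages.size))

# Linearly interpolate every perturbed age model at the picked depths
samples = np.array([np.interp(picked, depths, d) for d in draws])

# Summarize each picked depth by its median and a 95% interval
median = np.median(samples, axis=0)
lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)

for z, m, lo, hi in zip(picked, median, lower, upper):
    print(f"{z:6.1f} cm: {m:7.1f} yr BP (+{hi - m:.1f} / -{m - lo:.1f})")
```

Asymmetric errors, such as the separate positive and negative 2-sigma columns used in this notebook, would need a two-sided sampling scheme that this sketch does not attempt.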


In [None]:
# Choose the method for age uncertainty calculation: 'MonteCarlo', 'Linear', or 'Gaussian'
sigma_method = 'MonteCarlo'

In [None]:
estimated_datum_ages_a = calculate_interpolated_ages(
    picked_datum=picked_depths_a,
    age_data=age_data_a,
    top_depth=0.0,
    bottom_depth=md_a[-1],
    top_age=0,
    top_age_pos_error=75,
    top_age_neg_error=75,
    uncertainty_method=sigma_method,
    core_name=CORE_A,
    csv_filename=f'example_data/picked_datum/{CORE_A}_pickeddepth_ages_{sigma_method}.csv'
)

In [None]:
# Alternative: Load pre-calculated ages from CSV
# estimated_datum_ages_a = load_pickeddepth_ages_from_csv(
#     pickeddepth_age_csv=f"example_data/picked_datum/{CORE_A}_pickeddepth_ages_{sigma_method}.csv"
# )

In [None]:
estimated_datum_ages_b = calculate_interpolated_ages(
    picked_datum=picked_depths_b,
    age_data=age_data_b,
    top_depth=0.0,
    bottom_depth=md_b[-1],
    top_age=0,
    top_age_pos_error=75,
    top_age_neg_error=75,
    uncertainty_method=sigma_method,
    core_name=CORE_B,
    csv_filename=f'example_data/picked_datum/{CORE_B}_pickeddepth_ages_{sigma_method}.csv'
)

In [None]:
# estimated_datum_ages_b = load_pickeddepth_ages_from_csv(
#     pickeddepth_age_csv=f"example_data/picked_datum/{CORE_B}_pickeddepth_ages_{sigma_method}.csv"
# )

<hr>

# **Run DTW Analysis Between Two Cores**

## Perform DTW Analysis on Segment Pairs

**Function: `run_comprehensive_dtw_analysis()`**

**What it does:**
1. Creates segments between picked datum boundaries
2. Identifies valid segment pairs based on depth and age constraints
3. Performs DTW analysis on each valid segment pair
4. Calculates DTW distances and quality metrics
5. Creates visualizations (DTW matrix, animations) of segment pairs

**Key Parameters:**
- `log_a` *(array)*: Core A log data
- `log_b` *(array)*: Core B log data
- `md_a` *(array)*: Core A measured depths
- `md_b` *(array)*: Core B measured depths
- `picked_datum_a` *(list, default=None)*: Picked datum depths for Core A
- `picked_datum_b` *(list, default=None)*: Picked datum depths for Core B
- `top_bottom` *(bool, default=True)*: Include top and bottom boundaries
- `top_depth` *(float, default=0.0)*: Starting depth for analysis
- `independent_dtw` *(bool, default=False)*: Use independent DTW for each log dimension
- `create_dtw_matrix` *(bool, default=False)*: Generate DTW distance matrix visualization
- `visualize_pairs` *(bool, default=True)*: Show segment pairs in matrix plot
- `visualize_segment_labels` *(bool, default=False)*: Show segment labels in visualizations
- `dtwmatrix_output_filename` *(str, default='SegmentPair_DTW_matrix.png')*: Filename for DTW matrix output
- `creategif` *(bool, default=False)*: Create animated GIF of segment correlations
- `gif_output_filename` *(str, default='SegmentPair_DTW_animation.gif')*: Filename for animation output
- `max_frames` *(int, default=100)*: Maximum number of frames in animation
- `debug` *(bool, default=False)*: Enable debug output
- `color_interval_size` *(float, default=10)*: Color interval size for visualizations
- `keep_frames` *(bool, default=True)*: Keep individual animation frames
- `age_consideration` *(bool, default=False)*: Apply age constraint analysis
- `ages_a` *(dict, default=None)*: Age data for Core A with keys: 'depths', 'ages', 'pos_uncertainties', 'neg_uncertainties'
- `ages_b` *(dict, default=None)*: Age data for Core B with keys: 'depths', 'ages', 'pos_uncertainties', 'neg_uncertainties'
- `restricted_age_correlation` *(bool, default=True)*: Use strict age overlap requirements
- `core_a_age_data` *(dict, default=None)*: Complete age constraint data for Core A from `load_core_age_constraints()`. Expected keys: 'in_sequence_ages', 'in_sequence_depths', 'in_sequence_pos_errors', 'in_sequence_neg_errors', 'core'
- `core_b_age_data` *(dict, default=None)*: Complete age constraint data for Core B from `load_core_age_constraints()`. Expected keys: 'in_sequence_ages', 'in_sequence_depths', 'in_sequence_pos_errors', 'in_sequence_neg_errors', 'core'
- `dtw_distance_threshold` *(float, default=None)*: Maximum allowed DTW distance for segment acceptance
- `exclude_deadend` *(bool, default=True)*: Filter out dead-end segment pairs
- `core_a_name` *(str, default=None)*: Core A identifier
- `core_b_name` *(str, default=None)*: Core B identifier
- `mute_mode` *(bool, default=False)*: Suppress all print output
- `pca_for_dependent_dtw` *(bool, default=False)*: Use PCA for dependent multidimensional DTW (if False, uses conventional multidimensional DTW)
- `dpi` *(int, default=None)*: Resolution for saved figures and GIF frames in dots per inch. If None, uses default (150)

**Returns:**
- `dtw_result` (dict): Dictionary containing all DTW analysis results with keys:
  - `dtw_correlation` (dict): DTW results for valid segment pairs (renamed from dtw_results)
  - `valid_dtw_pairs` (set): Set of valid segment pair tuples (a_idx, b_idx)
  - `segments_a` (list): Segment definitions for Core A
  - `segments_b` (list): Segment definitions for Core B
  - `depth_boundaries_a` (list): Depth boundary indices for Core A
  - `depth_boundaries_b` (list): Depth boundary indices for Core B
  - `dtw_distance_matrix_full` (numpy.ndarray): Full DTW distance matrix
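
For readers new to DTW, the sketch below shows the textbook dynamic-programming form on two 1-D sequences with an absolute-difference local cost. It is not the routine used by `run_comprehensive_dtw_analysis()`; the function name `dtw_distance` and the toy sequences are assumptions for illustration.

```python
import numpy as np

def dtw_distance(x, y):
    """Return the DTW distance and warping path between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # cost[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end to recover the warping path as (index_x, index_y) pairs
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n, m], path

# Toy segments: b repeats its first sample, as if deposited more slowly at the top
a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 2, 1, 0]
distance, path = dtw_distance(a, b)
print(distance)
print(path)
```

Because the two toy segments differ only by a repeated sample, the warping path absorbs the stretch and the accumulated cost is zero.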


In [None]:
# Determine if age constraints should be considered based on missing values in estimated ages
age_consideration = not (pd.isna(estimated_datum_ages_a['ages']).any() or pd.isna(estimated_datum_ages_b['ages']).any())
# age_consideration = False  # (manual override option for debugging)

# Set whether to enforce strict age overlap requirement for valid segment pairs
restricted_age_overlap = True

# Label the active age-consideration mode (used in output filenames)
if age_consideration:
    if restricted_age_overlap:
        YES_NO_AGE = 'restricted_age'   # If strict (restricted) age overlap is used
    else:
        YES_NO_AGE = 'loose_age'        # If looser age overlap acceptance is used
else:
    YES_NO_AGE = 'no_age'               # If age consideration is disabled

In [None]:
dtw_result = run_comprehensive_dtw_analysis(
    log_a,
    log_b,
    md_a,
    md_b,
    picked_datum_a=picked_depths_a,
    picked_datum_b=picked_depths_b,
    core_a_name=CORE_A,
    core_b_name=CORE_B,
    age_consideration=age_consideration,
    ages_a=estimated_datum_ages_a,
    ages_b=estimated_datum_ages_b,
    restricted_age_correlation=restricted_age_overlap,
    core_a_age_data=age_data_a,
    core_b_age_data=age_data_b,
    create_dtw_matrix=True,
    dtwmatrix_output_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/SegmentPair_DTW_matrix_{YES_NO_AGE}.png',
    creategif=True,
    gif_output_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/SegmentPair_DTW_animation_{YES_NO_AGE}.gif'
)

## Diagnose Connectivity for Topological Solution Search

**Function: `diagnose_chain_breaks()`**

**What it does:**
Analyzes the segment network to identify connectivity issues and potential breaks in the correlation chain. It traces all possible paths, identifies missing connections, counts the total number of complete paths, and finds the outermost ("far most") complete paths that bound the solution space.

**Key Parameters:**
- `dtw_result` *(dict)*: Dictionary containing DTW analysis results from `run_comprehensive_dtw_analysis()`. Expected keys: 'valid_dtw_pairs', 'segments_a', 'segments_b', 'depth_boundaries_a', 'depth_boundaries_b'

#### Estimating Total Possible Solutions

Given $n$ picked datums in a core, the number of segments is $S = 2n + 1$. The theoretical maximum number of segment pairs is $P_{\max} = S_A \times S_B$. DTW shortest-path filtering typically retains about 74.5% of these pairs, giving:

$$P_{\mathrm{valid}} \approx 0.745 \times (S_A \times S_B)$$

Total solution count follows a quadratic-in-log-space pattern (fitted on 97 Cascadia turbidite core pairs, $R^2 > 0.999$):

$$C \approx e^{4.395 (\ln P_{\mathrm{valid}})^2 - 43.179 \ln P_{\mathrm{valid}} + 116.872}$$

**Example:** Cores with 6-8 datums (13-17 segments) yield $10^{4}$-$10^{8}$ solutions; 30-31 datums (61-63 segments) yield $10^{20}$-$10^{24}$ solutions. Solution space grows exponentially with problem size.

**Returns:**
- `complete_paths` (list): All complete correlation paths found
- `num_complete_paths` (int): Total number of complete paths
- `chain_breaks` (list): Identified connectivity breaks in the network
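
The fitted estimate from "Estimating Total Possible Solutions" can be evaluated directly. The helper below (the name `estimate_log10_solutions` is hypothetical) plugs datum counts into the two formulas given there and returns $\log_{10} C$, which avoids overflow for large cores.

```python
import math

def estimate_log10_solutions(n_datums_a, n_datums_b, retained_fraction=0.745):
    """Return log10 of the estimated number of complete solutions."""
    s_a = 2 * n_datums_a + 1           # segments in Core A
    s_b = 2 * n_datums_b + 1           # segments in Core B
    p_valid = retained_fraction * s_a * s_b
    ln_p = math.log(p_valid)
    # Quadratic-in-log-space fit quoted in this notebook
    ln_c = 4.395 * ln_p ** 2 - 43.179 * ln_p + 116.872
    return ln_c / math.log(10)

for n in (6, 30):
    print(f"{n} datums per core -> about 10^{estimate_log10_solutions(n, n):.1f} solutions")
```

With 6 datums per core the estimate is on the order of $10^{4.7}$, and with 30 datums per core it is about $10^{22}$, both within the ranges quoted in the example.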


In [None]:
diagnostic_result = diagnose_chain_breaks(dtw_result)


<hr>

# **Find Stratigraphically Plausible Correlation Solutions**

## Search for All Complete DTW Paths

**Function: `find_complete_core_paths()`**

**What it does:**
1. Searches through all valid segment pairs to find complete correlation paths
2. Calculates quality metrics for each complete path
3. Uses shortest path algorithms to optimize search
4. Exports all valid mappings to CSV file

**Key Parameters:**
- `dtw_result` *(dict)*: Dictionary containing DTW analysis results from `run_comprehensive_dtw_analysis()`. Expected keys: 'dtw_correlation', 'valid_dtw_pairs', 'segments_a', 'segments_b', 'depth_boundaries_a', 'depth_boundaries_b', 'dtw_distance_matrix_full'
- `log_a` *(array)*: Core A log data for metric computation
- `log_b` *(array)*: Core B log data for metric computation
- `output_csv` *(str, default='complete_core_paths.csv')*: Output CSV filename for mappings
- `debug` *(bool, default=False)*: Enable detailed progress reporting
- `start_from_top_only` *(bool, default=True)*: Only start paths from top segments
- `batch_size` *(int, default=1000)*: Processing batch size for memory management
- `n_jobs` *(int, default=-1)*: Number of parallel jobs (-1 uses all CPU cores)
- `shortest_path_search` *(bool, default=True)*: Keep only shortest path lengths during search
- `shortest_path_level` *(int, default=2)*: Number of shortest unique lengths to keep (higher = more segments)
- `max_search_path` *(int, default=5000)*: Maximum number of paths kept per segment pair, which prevents memory overflow. Higher values make the solution search more comprehensive (e.g., 100000 typically works well).
- `output_metric_only` *(bool, default=False)*: If True, only output quality metrics without full path details
- `mute_mode` *(bool, default=False)*: Suppress all print output
- `pca_for_dependent_dtw` *(bool, default=False)*: Use PCA for dependent DTW quality calculations

**Returns:**
- `complete_path_search_result` (dict): Dictionary containing:
    - `complete_paths` (list): All complete correlation paths with quality metrics
    - `num_paths` (int): Total number of complete paths found
    - `csv_file` (str): Path to output CSV file containing all mappings


In [None]:
complete_path_search_result = find_complete_core_paths(
    dtw_result,
    log_a,
    log_b,
    output_csv=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/mappings_{YES_NO_AGE}.csv',
    shortest_path_level=2,
    max_search_path=5000 
)

## Visualize Mapping Results

**Function: `visualize_dtw_results_from_csv()`**

**What it does:**
1. Loads complete path mappings from CSV file
2. Creates visualizations for a representative subset of mappings
3. Generates animated GIFs showing correlation and DTW matrix views
4. Displays age constraints and interpreted bed correlations

**Key Parameters:**
- `input_mapping_csv` *(str)*: Path to CSV file containing DTW mapping results
- `dtw_result` *(dict)*: Dictionary containing DTW analysis results from `run_comprehensive_dtw_analysis()`. Expected keys: 'dtw_correlation', 'valid_dtw_pairs', 'segments_a', 'segments_b', 'depth_boundaries_a', 'depth_boundaries_b', 'dtw_distance_matrix_full'
- `log_a` *(array)*: Normalized log data for Core A
- `log_b` *(array)*: Normalized log data for Core B
- `md_a` *(array)*: Measured depth array for Core A
- `md_b` *(array)*: Measured depth array for Core B
- `color_interval_size` *(int, default=None)*: Step size for warping path visualization
- `max_frames` *(int, default=150)*: Maximum number of frames to generate
- `debug` *(bool, default=False)*: Enable debug output
- `creategif` *(bool, default=True)*: Whether to create GIF files
- `keep_frames` *(bool, default=False)*: Whether to keep individual PNG frames
- `correlation_gif_output_filename` *(str, default='CombinedDTW_correlation_mappings.gif')*: Output filename for correlation GIF
- `matrix_gif_output_filename` *(str, default='CombinedDTW_matrix_mappings.gif')*: Output filename for matrix GIF
- `visualize_pairs` *(bool, default=False)*: Whether to visualize segment pairs
- `visualize_segment_labels` *(bool, default=False)*: Whether to show segment labels
- `mark_depths` *(bool, default=True)*: Whether to mark depth boundaries
- `mark_ages` *(bool, default=True)*: Whether to mark age constraints
- `ages_a` *(dict, default=None)*: Age data dictionary for Core A picked depths
- `ages_b` *(dict, default=None)*: Age data dictionary for Core B picked depths
- `core_a_age_data` *(dict, default=None)*: Complete age constraint data from `load_core_age_constraints()`. Expected keys: 'in_sequence_ages', 'in_sequence_depths', 'in_sequence_pos_errors', 'in_sequence_neg_errors', 'core'
- `core_b_age_data` *(dict, default=None)*: Complete age constraint data from `load_core_age_constraints()`. Expected keys: 'in_sequence_ages', 'in_sequence_depths', 'in_sequence_pos_errors', 'in_sequence_neg_errors', 'core'
- `core_a_name` *(str, default=None)*: Core A identifier for labels
- `core_b_name` *(str, default=None)*: Core B identifier for labels
- `core_a_interpreted_beds` *(dict, default=None)*: Interpreted bed names for Core A
- `core_b_interpreted_beds` *(dict, default=None)*: Interpreted bed names for Core B
- `dpi` *(int, default=None)*: Resolution for saved frames and GIFs in dots per inch. If None, uses default (150)



In [None]:
visualize_dtw_results_from_csv(
    dtw_result,
    log_a,
    log_b,
    md_a,
    md_b,
    input_mapping_csv=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/mappings_{YES_NO_AGE}.csv',
    core_a_name=CORE_A,
    core_b_name=CORE_B,
    visualize_pairs=False,
    visualize_segment_labels=False,
    mark_depths=False,
    creategif=True,
    correlation_gif_output_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/CombinedDTW_correlation_mappings_{YES_NO_AGE}.gif',
    matrix_gif_output_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/CombinedDTW_matrix_mappings_{YES_NO_AGE}.gif',
    mark_ages=age_consideration,
    ages_a=estimated_datum_ages_a,
    ages_b=estimated_datum_ages_b,
    core_a_age_data=age_data_a,
    core_b_age_data=age_data_b,
    core_a_interpreted_beds=interpreted_bed_a,
    core_b_interpreted_beds=interpreted_bed_b
)

## Find Best Correlation Mapping

**Function: `find_best_mappings()`**

**What it does:**
1. Loads all complete path mappings from CSV
2. Filters mappings based on interpreted bed correlations (if provided)
3. Ranks mappings by quality metrics
4. Returns top-scored mapping(s)

**Key Parameters:**
- `input_mapping_csv` *(str)*: Path to mappings CSV file
- `top_n` *(int, default=10)*: Number of top mappings to return
- `filter_shortest_dtw` *(bool, default=True)*: Filter for shortest DTW paths
- `metric_weight` *(dict, default=None)*: Custom weights for quality metrics (e.g., {'corr_coef': 2.0, 'norm_dtw': 1.5})
- `core_a_picked_datums` *(array, default=None)*: Picked depths for Core A
- `core_b_picked_datums` *(array, default=None)*: Picked depths for Core B
- `core_a_interpreted_beds` *(dict, default=None)*: Interpreted bed names for Core A
- `core_b_interpreted_beds` *(dict, default=None)*: Interpreted bed names for Core B
- `dtw_result` *(dict, default=None)*: Dictionary containing DTW analysis results from `run_comprehensive_dtw_analysis()`. Expected keys: 'valid_dtw_pairs', 'segments_a', 'segments_b'. Required only for boundary correlation mode

**Returns:**
- `top_mapping_ids` (list): Mapping IDs of top-ranked solutions
- `top_mapping_pairs` (list): Segment pair lists for each top mapping
- `top_mappings_df` (pandas.DataFrame): DataFrame containing top mappings with all quality metrics
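
One way to picture weighted ranking is shown below with pandas. This is a sketch under assumptions, not the scoring used by `find_best_mappings()`: the table values are invented, the min-max scaling is a choice made here, and the direction of each metric (higher `corr_coef` is better, lower `norm_dtw` is better) is an assumption.

```python
import pandas as pd

# Hypothetical quality metrics for four complete mappings
mappings = pd.DataFrame({
    'mapping_id': [1, 2, 3, 4],
    'corr_coef': [0.62, 0.81, 0.78, 0.40],
    'norm_dtw': [0.30, 0.22, 0.10, 0.45],
})

metric_weight = {'corr_coef': 2.0, 'norm_dtw': 1.5}
higher_is_better = {'corr_coef': True, 'norm_dtw': False}  # assumed directions

score = pd.Series(0.0, index=mappings.index)
for metric, weight in metric_weight.items():
    col = mappings[metric]
    scaled = (col - col.min()) / (col.max() - col.min())  # rescale to [0, 1]
    if not higher_is_better[metric]:
        scaled = 1.0 - scaled  # flip so that larger always means better
    score += weight * scaled

ranked = mappings.assign(score=score).sort_values('score', ascending=False)
top_ids = ranked['mapping_id'].head(2).tolist()
print(ranked)
print(top_ids)
```

Mapping 3 ranks first here because its very low DTW cost outweighs the slightly higher correlation coefficient of mapping 2.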


In [None]:
top_mapping_ids, top_mapping_pairs, top_mappings_df = find_best_mappings(
    input_mapping_csv=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/mappings_{YES_NO_AGE}.csv',
    core_a_picked_datums=picked_depths_a,
    core_b_picked_datums=picked_depths_b,
    core_a_interpreted_beds=interpreted_bed_a,
    core_b_interpreted_beds=interpreted_bed_b,
    dtw_result=dtw_result
)

## Visualize Best Mapping

**Function: `visualize_combined_segments()`**

**What it does:**
1. Combines all segment pairs in the selected mapping
2. Creates comprehensive correlation visualization
3. Shows DTW paths, age constraints, and interpreted bed correlations
4. Saves high-quality figures in specified formats

**Key Parameters:**
- `dtw_result` *(dict)*: Dictionary containing DTW analysis results from `run_comprehensive_dtw_analysis()`. Expected keys: 'dtw_correlation', 'valid_dtw_pairs', 'segments_a', 'segments_b', 'depth_boundaries_a', 'depth_boundaries_b', 'dtw_distance_matrix_full'
- `log_a` *(array)*: Core A log data
- `log_b` *(array)*: Core B log data
- `md_a` *(array)*: Core A measured depths
- `md_b` *(array)*: Core B measured depths
- `segment_pairs_to_combine` *(list)*: List of tuples (a_idx, b_idx) for segment pairs to combine
- `correlation_save_path` *(str, default='CombinedSegmentPairs_DTW_correlation.png')*: Path to save correlation figure
- `matrix_save_path` *(str, default='CombinedSegmentPairs_DTW_matrix.png')*: Path to save DTW matrix figure
- `color_interval_size` *(int, default=None)*: Step size for warping path visualization
- `visualize_pairs` *(bool, default=True)*: Whether to show segment pair boundaries
- `visualize_segment_labels` *(bool, default=True)*: Whether to show segment labels
- `mark_depths` *(bool, default=True)*: Whether to mark picked depths
- `mark_ages` *(bool, default=False)*: Whether to mark age constraints
- `ages_a` *(dict, default=None)*: Age data for Core A picked depths
- `ages_b` *(dict, default=None)*: Age data for Core B picked depths
- `core_a_age_data` *(dict, default=None)*: Complete age constraint data for Core A from `load_core_age_constraints()`
- `core_b_age_data` *(dict, default=None)*: Complete age constraint data for Core B from `load_core_age_constraints()`
- `core_a_name` *(str, default=None)*: Core A identifier for labels
- `core_b_name` *(str, default=None)*: Core B identifier for labels
- `core_a_interpreted_beds` *(dict, default=None)*: Arrays of interpreted bed names corresponding to depth boundaries (when both provided with matching bed names, correlation lines will be drawn between cores)
- `core_b_interpreted_beds` *(dict, default=None)*: Arrays of interpreted bed names corresponding to depth boundaries (when both provided with matching bed names, correlation lines will be drawn between cores)

**Returns:**
- `combined_wp` (numpy.ndarray): Combined warping path spanning all selected segments
- `combined_quality` (dict): Aggregated quality metrics for the combined correlation


In [None]:
%matplotlib inline

_, _ = visualize_combined_segments(
    dtw_result,
    log_a,
    log_b,
    md_a,
    md_b,
    segment_pairs_to_combine=top_mapping_pairs[0],
    visualize_pairs=True,
    visualize_segment_labels=False,
    mark_depths=True,
    correlation_save_path=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/CombinedDTW_correlation_{YES_NO_AGE}_{top_mapping_ids[0]}.png',
    matrix_save_path=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/CombinedDTW_matrix_{YES_NO_AGE}_{top_mapping_ids[0]}.png',
    mark_ages=age_consideration,
    ages_a=estimated_datum_ages_a if age_consideration else None,
    ages_b=estimated_datum_ages_b if age_consideration else None,
    core_a_age_data=age_data_a if age_consideration else None,
    core_b_age_data=age_data_b if age_consideration else None,
    core_a_name=CORE_A,
    core_b_name=CORE_B,
    core_a_interpreted_beds=interpreted_bed_a,
    core_b_interpreted_beds=interpreted_bed_b
)

## Plot Quality Metric Distributions

**Function: `plot_correlation_distribution()`**

**What it does:**
1. Loads all mappings from CSV file
2. Extracts quality metrics for the target mapping
3. Plots histogram and probability distribution
4. Compares target mapping against all other solutions
5. Saves distribution plot as PNG

**Key Parameters:**
- `mapping_csv` *(str)*: Path to mappings CSV or Parquet file
- `target_mapping_id` *(int, default=None)*: ID of mapping to highlight in the plot (optional)
- `quality_index` *(str)*: Quality metric to plot - **required**. Options: 'corr_coef', 'norm_dtw', 'dtw_ratio', 'variance_deviation', 'perc_diag', 'match_min', 'match_mean', 'perc_age_overlap'
- `save_png` *(bool, default=True)*: Whether to save plot as PNG
- `png_filename` *(str, default=None)*: Output PNG filename (optional)
- `core_a_name` *(str, default=None)*: Core A name for plot title (optional)
- `core_b_name` *(str, default=None)*: Core B name for plot title (optional)
- `bin_width` *(float, default=None)*: Histogram bin width (auto if None, based on quality_index)
- `pdf_method` *(str, default='normal')*: PDF fitting method ('KDE', 'skew-normal', or 'normal')
- `kde_bandwidth` *(float, default=0.05)*: Bandwidth for KDE when pdf_method='KDE'
- `mute_mode` *(bool, default=False)*: If True, suppress all print statements
- `targeted_binsize` *(tuple, default=None)*: (synthetic_bins, bin_width) for consistent bin sizing with synthetic data
- `dpi` *(int, default=None)*: Resolution for saved figures in dots per inch. If None, uses default (150)

**Returns:**
- `fit_params` (dict): Dictionary containing distribution statistics including histogram data, PDF parameters, and percentile information
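
The comparison of a target mapping against all other solutions can be sketched with NumPy alone. The example below fits a normal distribution to synthetic metric values and reports where a target value falls. It is illustrative only: the values are random, the target is arbitrary, and the keys actually stored in `fit_params` are not reproduced.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlation coefficients standing in for 5000 mappings
values = rng.normal(0.5, 0.1, size=5000)
target_value = 0.72  # metric of the hypothetical target mapping

# Parameters of the fitted normal distribution
mu = values.mean()
sigma = values.std(ddof=1)

def normal_cdf(x):
    """CDF of the fitted normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Percentile of the target, from the data and from the fitted distribution
empirical_pct = 100.0 * np.mean(values <= target_value)
fitted_pct = 100.0 * normal_cdf(target_value)

# Histogram counts that a plot would display
counts, edges = np.histogram(values, bins=20)

print(f"mean={mu:.3f}, std={sigma:.3f}")
print(f"target percentile: empirical {empirical_pct:.1f}, fitted {fitted_pct:.1f}")
```

A target far in the upper tail of a metric where higher is better indicates that the selected mapping outperforms most of the alternative solutions.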



##### Plot Correlation Coefficient Distribution

In [None]:
_ = plot_correlation_distribution(
    mapping_csv=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/mappings_{YES_NO_AGE}.csv',
    target_mapping_id=top_mapping_ids[0],
    quality_index='corr_coef',
    core_a_name=CORE_A,
    core_b_name=CORE_B,
    save_png=True,
    png_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/r-values_distribution_{YES_NO_AGE}.png'
)

##### Plot Normalized DTW Cost Distribution

In [None]:
_ = plot_correlation_distribution(
    mapping_csv=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/mappings_{YES_NO_AGE}.csv',
    target_mapping_id=top_mapping_ids[0],
    quality_index='norm_dtw',
    core_a_name=CORE_A,
    core_b_name=CORE_B,
    save_png=True,
    png_filename=f'example_data/analytical_outputs/{CORE_A}_{CORE_B}/{"_".join(LOG_COLUMNS)}/norm_dtw_distribution_{YES_NO_AGE}.png'
)