# ecDNA Detection Pipeline Parameter Optimization and Validation

### This supplementary notebook includes:

#### Hyperparameter Tuning via Bayesian Optimization

- Explanation of method rationale.

- Detailed function explanations.

- Objective function definitions.

- Optimization methodology.


# Introduction and Setup


## Introduction
This notebook presents a systematic approach to optimize and evaluate an automated pipeline for detecting and counting extrachromosomal circular DNA (ecDNA) in Fluorescence in situ Hybridization (FISH) images. The pipeline processes RGB FISH images and corresponding DAPI grayscale images to enhance, detect, and classify ecDNAs, distinguishing them from chromosomal structures based on color and morphology. 

### Purpose
The primary goal is to tune the pipeline's hyperparameters using Bayesian Optimization to minimize the median absolute percentage error (MdAPE) against ground truth data, ensuring robustness against outliers. Subsequently, the optimized parameters are applied to a dataset of 388 images to assess the pipeline's accuracy and compare its performance with predictions from the MIA tool.

### Objectives
- Optimize the Image Processing Pipeline: Refine the pipeline by tuning hyperparameters, including new parameters in the top_hat_enhancement function, to improve ecDNA detection accuracy.
- Validate Performance: Apply the optimized parameters to process 388 ground truth images, comparing predictions against both ground truth and MIA predictions.

### Datasets
Note: The absolute paths used in this notebook (e.g., those ending in `FACS-FISH_redistribution_NCIH2170/...`) should be replaced with relative paths or a configuration file for reproducibility.

- **RGB Images**: FISH images with probes (e.g., HER2, MYC) stored in `FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/ground_truth_RGB`.
- **DAPI Images**: Grayscale images highlighting nuclei, stored in `FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/ground_truth_DAPI`.
- **Ground Truth**: CSV file (`output_ground_truth.csv`) with 388 entries, containing columns `unique_id` and `ground_truth_count`.
- **MIA Predictions**: CSV file (`output_mia.csv`) with columns `Image Name` and `Count` for comparison.
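
One way to avoid hard-coded absolute paths is to resolve everything from a single base directory. The sketch below is illustrative only: the environment variable `ECDNA_INPUT_DIR` and the `paths` dictionary are assumptions, not part of the original notebook, and the ground truth file name follows the path-definition cell later in this notebook.

```python
import os
from pathlib import Path

# ECDNA_INPUT_DIR is a hypothetical environment variable; when it is not set,
# a relative "input_directory" folder next to the notebook is assumed.
input_dir = Path(os.environ.get("ECDNA_INPUT_DIR", "input_directory"))

paths = {
    "rgb_folder": input_dir / "ground_truth_RGB",
    "dapi_folder": input_dir / "ground_truth_DAPI",
    "ground_truth_csv": input_dir / "GT.csv",  # name as used in the path-definition cell
    "mia_predictions_csv": input_dir / "output_mia.csv",
}

for name, path in paths.items():
    print(f"{name}: {path}")
```

With this layout, moving the dataset only requires changing one variable rather than editing four path strings.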


### Dependencies
The following libraries are required. Ensure the specified versions are installed for reproducibility:

In [2]:
import os
import cv2  
import numpy as np  
import math
import csv
import json
import pandas as pd  
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from bayes_opt import BayesianOptimization  # Bayesian-Optimization 1.2.0
from bayes_opt.event import Events
import matplotlib
import matplotlib.pyplot as plt  
import seaborn as sns

#### To install, use:

`pip install "opencv-python==4.5.5.*" numpy==1.21.0 pandas==1.3.0 bayesian-optimization==1.2.0 matplotlib==3.4.2 seaborn`

# Hyperparameter Tuning via Bayesian Optimization

## Introduction

The pipeline’s performance hinges on selecting good hyperparameters for the image processing and object detection steps. Grid search, which evaluates every parameter combination within specified ranges, was considered first. However, with 13 parameters and wide value ranges, the number of combinations grows exponentially, and grid search was deemed computationally infeasible. Bayesian Optimization was adopted instead. It models the objective function with a probabilistic surrogate (typically a Gaussian Process) and balances exploration (testing uncertain regions) against exploitation (refining promising regions), so it needs far fewer evaluations than grid search. This makes it well suited to expensive-to-evaluate functions, such as processing all 388 ground truth images per iteration.
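
To make the cost argument concrete, the following back-of-the-envelope sketch compares the number of pipeline evaluations. The choice of 5 candidate values per parameter is an assumption for illustration only; the Bayesian Optimization budget (15 initial points plus 85 iterations) is the one used later in this notebook.

```python
# Rough cost comparison between grid search and Bayesian Optimization.
n_params = 13            # number of tuned hyperparameters
values_per_param = 5     # hypothetical grid resolution (assumption)
images_per_eval = 388    # ground truth images processed per evaluation

grid_evaluations = values_per_param ** n_params
bayes_evaluations = 15 + 85  # init_points + n_iter

print(f"Grid search evaluations:           {grid_evaluations:,}")
print(f"Bayesian Optimization evaluations: {bayes_evaluations}")
print(f"Image runs (grid search):          {grid_evaluations * images_per_eval:,}")
print(f"Image runs (Bayesian):             {bayes_evaluations * images_per_eval:,}")
```

Even this coarse grid would need more than a billion evaluations, whereas the Bayesian run processes the image set 100 times.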

### Median Absolute Percentage Error (MdAPE)

#### Why the Median Absolute Percentage Error?

In our evaluation of pipeline performance, occasional outliers in predictions (e.g., extremely large or small counts compared to the ground truth) were observed. These outliers disproportionately affected mean-based metrics such as Mean Absolute Percentage Error (MAPE). To mitigate this effect, we selected the Median Absolute Percentage Error (MdAPE) as our primary metric.

MdAPE is defined mathematically as:

$$
\text{MdAPE} = \text{median} \left( \left| \frac{y_{\text{pred}} - y_{\text{true}}}{y_{\text{true}}} \right| \times 100\% \right)
$$

where:

- $y_{\text{pred}}$ is the predicted ecDNA count from our pipeline.
- $y_{\text{true}}$ is the manually annotated ground truth count.

The median ensures that large deviations in a minority of predictions do not disproportionately influence the overall metric.
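
The difference between MAPE and MdAPE is easiest to see on a small example. The counts below are made-up toy values (not data from this study), and `absolute_percentage_errors` is a hypothetical helper that applies the formula above per image.

```python
from statistics import mean, median

def absolute_percentage_errors(y_true, y_pred):
    """Per-image absolute percentage error, in percent."""
    return [100.0 * abs(p - t) / t for t, p in zip(y_true, y_pred)]

# Toy ground truth and predicted counts; the last prediction is an outlier.
y_true = [20, 50, 10, 40, 25]
y_pred = [22, 45, 11, 40, 100]

apes = absolute_percentage_errors(y_true, y_pred)
mape = mean(apes)
mdape = median(apes)

print("Per-image errors:", apes)
print("MAPE: ", mape)
print("MdAPE:", mdape)
```

A single badly miscounted image pulls the mean up to 66%, while the median stays at 10%, which reflects the typical image more faithfully.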

### Define file paths

In [3]:
rgb_folder = r"/work/users/b/e/behnamie/FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/ground_truth_RGB"
dapi_folder = r"/work/users/b/e/behnamie/FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/ground_truth_DAPI"
ground_truth_csv = r"/work/users/b/e/behnamie/FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/GT.csv"
mia_predictions_csv = r"/work/users/b/e/behnamie/FACS-FISH_redistribution_NCIH2170/version_2_Acc_87/2_optimization/input_directory/output_mia.csv"  # Adjust path as needed

# Load ground truth data
ground_truth_df = pd.read_csv(ground_truth_csv)


### Utility Functions

These functions handle image I/O and preprocessing:

In [4]:
def debug_save_image(image, name, step, out_folder, unique_id=""):
    """
    Saves intermediate images for debugging purposes.

    Parameters:
    image (numpy.ndarray): Image to save.
    name (str): Descriptive name of the image.
    step (int): Processing step number.
    out_folder (str): Output directory.
    unique_id (str): Unique identifier for the image.

    Returns:
    str: Filepath of the saved image.
    """
    filename = f"{unique_id}_{step:02d}_{name}.tif" if unique_id else f"{step:02d}_{name}.tif"
    filepath = os.path.join(out_folder, filename)
    # cv2.imwrite(filepath, image)  # Uncomment to actually write debug images
    print(f"Debug image path: {filepath}")
    return filepath

def extract_unique_id(filename, suffixes):
    """
    Extracts the base unique identifier from a filename by removing known suffixes.

    Parameters:
    filename (str): Name of the file.
    suffixes (list): List of possible suffixes to remove.

    Returns:
    str: Extracted unique identifier.
    """
    base = os.path.splitext(filename)[0]
    for suf in suffixes:
        if base.endswith(suf):
            return base[:-len(suf)]
    return base

def find_corresponding_dapi(unique_id, dapi_folder):
    """
    Locates the corresponding DAPI image for a given RGB image based on unique ID.

    Parameters:
    unique_id (str): Unique identifier of the image.
    dapi_folder (str): Directory containing DAPI images.

    Returns:
    str or None: Path to the DAPI image or None if not found.
    """
    dapi_suffixes = ["_DAPI", "_DAPI.tif", "_Merge.tif (RGB)", "_Merge.tif(RGB)",
                     "_Merge.tif (RGB).tif", "_Merge.tif(RGB).tif"]
    for fname in os.listdir(dapi_folder):
        if not fname.lower().endswith(('.tif', '.tiff', '.png')):
            continue
        uid = extract_unique_id(fname, dapi_suffixes)
        if uid == unique_id:
            return os.path.join(dapi_folder, fname)
    return None
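
The matching between RGB and DAPI files relies on stripping a known suffix from the file name. The sketch below repeats `extract_unique_id` so that it runs on its own; the file names are hypothetical examples. Because the extension is removed before suffixes are compared, a suffix entry that itself ends in `.tif` only matches files with a doubled extension.

```python
import os

def extract_unique_id(filename, suffixes):
    """Strip the extension, then the first matching suffix (same logic as above)."""
    base = os.path.splitext(filename)[0]
    for suf in suffixes:
        if base.endswith(suf):
            return base[:-len(suf)]
    return base

dapi_suffixes = ["_DAPI", "_DAPI.tif"]

# Hypothetical file names used only for illustration.
for name in ["sample_01_DAPI.tif", "sample_01_DAPI.tif.tif", "sample_01.png"]:
    print(name, "->", extract_unique_id(name, dapi_suffixes))
```

All three names reduce to the same identifier, so any of them would be paired with an RGB image named `sample_01.tif`.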

### Image Processing Functions

In [5]:
def mask_rgb_with_dapi(rgb_img, dapi_img):
    """
    Masks the RGB image using the DAPI image to focus on the nucleus.

    Parameters:
    rgb_img (numpy.ndarray): RGB image.
    dapi_img (numpy.ndarray): DAPI grayscale image.

    Returns:
    numpy.ndarray: Masked RGB image.

    Logic:
    Pixels where the DAPI image is black (intensity = 0) are set to black in the RGB image,
    isolating the region of interest (ROI) corresponding to the nucleus.
    """
    if len(dapi_img.shape) != 2:
        dapi_img = cv2.cvtColor(dapi_img, cv2.COLOR_BGR2GRAY)
    mask = (dapi_img == 0)
    rgb_masked = rgb_img.copy()
    rgb_masked[mask] = 0
    return rgb_masked

def top_hat_enhancement(gray, kernel_size=20, chrom_kernel_size=200, dampening_factor=0.6):
    """
    Enhances small bright features (ecDNAs) while suppressing large structures (chromosomes).

    Parameters:
    gray (numpy.ndarray): Grayscale image.
    kernel_size (int): Size of the structuring element for top-hat transformation.
    chrom_kernel_size (int): Size of the structuring element to estimate chromosomes.
    dampening_factor (float): Factor (0 < factor < 1) to dampen chromosome regions.

    Returns:
    numpy.ndarray: Enhanced image with suppressed chromosomes.

    Logic:

    Top-Hat Transformation: Morphological opening removes small features, and subtraction from the original image highlights ecDNAs.
    Chromosome Suppression: Morphological closing estimates large structures (chromosomes), 
    which are identified via Otsu thresholding and dampened to preserve nearby ecDNAs. """ 

    # Top-hat transformation
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (int(kernel_size), int(kernel_size)))
    opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, se)
    top_hat = cv2.subtract(gray, opened)
    top_hat_norm = cv2.normalize(top_hat, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
    
    # Estimate chromosomes
    chrom_se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (int(chrom_kernel_size), int(chrom_kernel_size)))
    chromosome_est = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, chrom_se)
    _, chrom_mask = cv2.threshold(chromosome_est, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    
    # Dampen chromosome regions
    top_hat_soft = top_hat_norm.astype(np.float32)
    top_hat_soft[chrom_mask == 255] *= dampening_factor
    return np.clip(top_hat_soft, 0, 255).astype(np.uint8)
    

def custom_clahe(gray, clip_limit, tile_grid_size):
    """
    Applies Contrast Limited Adaptive Histogram Equalization to enhance local contrast.

    Parameters:
    gray (numpy.ndarray): Grayscale image.
    clip_limit (float): Threshold for contrast limiting.
    tile_grid_size (int or tuple): Size of the grid for histogram equalization.

    Returns:
    numpy.ndarray: Enhanced image.

    Logic:
    CLAHE adjusts intensity histograms locally, improving visibility of ecDNAs against varying backgrounds.
    """
    tile_size = (int(tile_grid_size), int(tile_grid_size))
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_size)
    return clahe.apply(gray)

def apply_sharpening_gray(gray, strength):
    """
    Sharpens the image to enhance edges of ecDNAs.

    Parameters:
    gray (numpy.ndarray): Grayscale image.
    strength (float): Sharpening intensity.

    Returns:
    numpy.ndarray: Sharpened image.

    Logic:
    A high-pass filter amplifies edge gradients, making ecDNA boundaries more distinct.
    """
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32) * (strength / 5.0)
    return cv2.filter2D(gray, -1, kernel)

def apply_sigmoid(gray, cutoff, gain):
    """
    Applies a sigmoid function to binarize bright spots.

    Parameters:
    gray (numpy.ndarray): Grayscale image.
    cutoff (float): Intensity threshold for sigmoid.
    gain (float): Steepness of the sigmoid curve.

    Returns:
    numpy.ndarray: Binarized image.
    
    Logic:
    Maps intensities to a binary-like output, emphasizing ecDNAs as bright spots.
    """
    norm = gray.astype(np.float32) / 255.0
    c = cutoff / 255.0
    out = 1.0 / (1.0 + np.exp(-gain * (norm - c)))
    return (out * 255).astype(np.uint8)


def process_images(input_image, params, save_debug=False, output_folder=None, unique_id=""):
    """
    Processes the input image through the enhancement pipeline.

    Parameters:
    input_image (numpy.ndarray): Input RGB or grayscale image.
    [See individual function docstrings for other parameters]

    Returns:
    tuple: (simple_gray, enhanced) - Original grayscale and enhanced images.
    
    Logic:
    Sequentially applies top-hat enhancement, sharpening, CLAHE, and sigmoid adjustment to isolate ecDNAs.
    """ 
    if input_image.ndim == 3:
        gray = cv2.cvtColor(cv2.convertScaleAbs(input_image), cv2.COLOR_BGR2GRAY)
    else:
        gray = cv2.convertScaleAbs(input_image)
    
    simple_gray = gray.copy()
    th = top_hat_enhancement(gray, params['kernel_size'], params['chrom_kernel_size'], params['dampening_factor'])
    sharp = apply_sharpening_gray(th, params['strength'])    
    clahe_img = custom_clahe(sharp, params['clip_limit'], params['tile_grid_size'])
    enhanced = apply_sigmoid(clahe_img, params['cutoff'], params['gain'])
    
    if save_debug and output_folder:
        debug_save_image(simple_gray, "simple_gray", 1, output_folder, unique_id)
        debug_save_image(enhanced, "enhanced_gray", 2, output_folder, unique_id)
    
    return simple_gray, enhanced
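
The roles of `cutoff` and `gain` in `apply_sigmoid` are easier to judge on individual intensities. The following pure-Python sketch mirrors the same formula for a single pixel; `sigmoid_intensity` is a hypothetical helper, not part of the pipeline, and the values `cutoff=100` and `gain=25` are the ones used in the single-image test in this notebook.

```python
import math

def sigmoid_intensity(value, cutoff, gain):
    """Map one 8-bit intensity through the sigmoid used in apply_sigmoid."""
    norm = value / 255.0
    c = cutoff / 255.0
    out = 1.0 / (1.0 + math.exp(-gain * (norm - c)))
    return int(out * 255)  # truncation, like astype(np.uint8) for values in range

cutoff, gain = 100, 25
for value in (0, 50, 100, 150, 255):
    print(value, "->", sigmoid_intensity(value, cutoff, gain))
```

Intensities well below the cutoff are pushed toward 0 and those well above it toward 255, while a pixel exactly at the cutoff lands at mid-gray (127). A larger `gain` makes the transition steeper, so the output approaches a hard threshold.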

# Object Detection and Post-Processing

In [6]:
def merge_close_objects(objects, merge_distance):
    """
    Merges objects closer than a specified distance to avoid over-counting.

    Parameters:
    objects (list): List of detected objects with 'bbox', 'centroid', and 'area'.
    merge_distance (float): Maximum distance to merge objects.
    
    Returns:
    list: Merged objects.

    Logic:
    Combines nearby objects, likely fragments of the same ecDNA, based on centroid proximity.
    """

    merged_objects = []
    taken = [False] * len(objects)
    for i in range(len(objects)):
        if taken[i]:
            continue
        current = objects[i].copy()
        for j in range(i + 1, len(objects)):
            if taken[j]:
                continue
            if math.dist(current["centroid"], objects[j]["centroid"]) < merge_distance:
                x1, y1, w1, h1 = current["bbox"]
                x2, y2, w2, h2 = objects[j]["bbox"]
                current["bbox"] = (min(x1, x2), min(y1, y2), max(x1 + w1, x2 + w2) - min(x1, x2), max(y1 + h1, y2 + h2) - min(y1, y2))
                cx1, cy1 = current["centroid"]
                cx2, cy2 = objects[j]["centroid"]
                current["centroid"] = ((cx1 + cx2) / 2.0, (cy1 + cy2) / 2.0)
                current["area"] += objects[j]["area"]
                taken[j] = True
        merged_objects.append(current)
        taken[i] = True
    return merged_objects

def classify_as_white_or_ecDNA(roi, white_value_threshold=150, white_saturation_threshold=45):
    """
    Classifies objects as 'chromosome' (white) or 'ecDNA' based on HSV color.

    Parameters:
    roi (numpy.ndarray): Region of interest from RGB image.
    white_value_threshold (float): Minimum brightness (V) for white classification.
    white_saturation_threshold (float): Maximum saturation (S) for white classification.

    Returns:
    str: 'chromosome' or 'ecDNA'.

    Logic:
    In HSV space, white objects (chromosomes) have high brightness (V) and low saturation (S),
    while ecDNAs typically exhibit distinct colors.
    """

    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    _, S_mean, V_mean, _ = cv2.mean(hsv_roi)
    return "chromosome" if (V_mean > white_value_threshold and S_mean < white_saturation_threshold) else "ecDNA"

def object_detection_and_overlay(enhanced_img, rgb_img_path, params, output_folder, unique_id):
    """
    Detects objects in the enhanced image and classifies them using the RGB image.

    Parameters:
    enhanced_img (numpy.ndarray): Enhanced grayscale image.
    rgb_img_path (str): Path to the corresponding RGB image.
    [See individual function docstrings for other parameters]

    Returns:
    tuple: (total_count, counts) - Number of merged objects (both classes combined) and per-class counts (empty dict if the RGB image is not found).

    Logic:
    Uses connected components analysis to detect objects, merges close ones, and classifies them based on color.
    """
    _, thresh = cv2.threshold(enhanced_img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 2))
    cleaned = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    
    num_labels, _, stats, centroids = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    objects = [{"bbox": (stats[i, 0], stats[i, 1], stats[i, 2], stats[i, 3]), "area": stats[i, 4], "centroid": centroids[i]}
               for i in range(1, num_labels) if params['min_area'] <= stats[i, 4] <= params['max_area']]
    
    merged_objects = merge_close_objects(objects, params['merge_distance'])
    
    if os.path.exists(rgb_img_path):
        rgb_img = cv2.imread(rgb_img_path, cv2.IMREAD_COLOR)
        counts = {"chromosome": 0, "ecDNA": 0}
        for obj in merged_objects:
            x, y, w, h = obj["bbox"]
            roi = rgb_img[y:y+h, x:x+w]
            label = classify_as_white_or_ecDNA(roi, params['white_value_threshold'], params['white_saturation_threshold'])
            counts[label] += 1
    else:
        counts = {}
    
    return len(merged_objects), counts
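
The merging step is greedy: each time two objects are merged, the running centroid moves to the midpoint, and later candidates are compared against that updated position. The sketch below is a simplified variant of `merge_close_objects` that tracks only centroids (`merge_centroids` is a hypothetical name, and the points are toy values), which is enough to show this behavior.

```python
import math

def merge_centroids(centroids, merge_distance):
    """Greedy merge of centroids, following the loop structure of merge_close_objects."""
    merged = []
    taken = [False] * len(centroids)
    for i in range(len(centroids)):
        if taken[i]:
            continue
        current = centroids[i]
        for j in range(i + 1, len(centroids)):
            if taken[j]:
                continue
            if math.dist(current, centroids[j]) < merge_distance:
                # Unweighted midpoint, as in the pipeline function.
                current = ((current[0] + centroids[j][0]) / 2.0,
                           (current[1] + centroids[j][1]) / 2.0)
                taken[j] = True
        merged.append(current)
        taken[i] = True
    return merged

points = [(0.0, 0.0), (8.0, 0.0), (13.0, 0.0), (40.0, 0.0)]
merged = merge_centroids(points, merge_distance=10)
print(len(merged), merged)
```

The third point is 13 pixels from the first, which exceeds the merge distance, yet it is still absorbed because the centroid has already shifted to (4, 0) after the first merge. The result therefore depends on the order in which objects are visited.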

## Objective Function with MdAPE

In [7]:
def median_absolute_percentage_error(mape_list):
    """
    Computes the median absolute percentage error (MdAPE).

    Parameters:
        mape_list (list): List of percentage errors.

    Returns:
        float: Median of the absolute percentage errors.

    Logic:
        MdAPE provides a robust central tendency measure, minimizing the impact of outliers in error distributions.
    """

    return np.median(mape_list) if mape_list else 1e6

def process_single_image(row, params):
    """
    Processes a single image and computes its absolute percentage error (APE).

    Parameters:
        row (pandas.Series): Ground truth row with 'unique_id' and 'ground_truth_count'.
        params (dict): Hyperparameters for the pipeline.

    Returns:
        float: APE for the image in percent, or 1000 as a penalty when the ground truth
        count is 0, the RGB image cannot be read, or processing fails.

    Logic:
        Applies the full pipeline to an image and calculates the percentage error against ground truth.
    """
    try:
        unique_id = row["unique_id"]
        true_count = row["ground_truth_count"]
        if true_count == 0:
            return 1000  # High error for undefined MAPE

        rgb_path = os.path.join(rgb_folder, f"{unique_id}.tif")
        dapi_path = find_corresponding_dapi(unique_id, dapi_folder)

        rgb_img = cv2.imread(rgb_path, cv2.IMREAD_COLOR)
        if rgb_img is None:
            return 1000

        if dapi_path:
            dapi_img = cv2.imread(dapi_path, cv2.IMREAD_GRAYSCALE)
            if dapi_img is not None:
                rgb_img = mask_rgb_with_dapi(rgb_img, dapi_img)

        _, enhanced = process_images(rgb_img, params)
        total_count, _ = object_detection_and_overlay(enhanced, rgb_path, params, "", unique_id)
        return 100.0 * abs(total_count - true_count) / true_count

    except Exception as e:
        # Log the error (or print) and return a high error value
        print(f"Error processing image {row['unique_id']}: {e}")
        return 1000

    


def objective_function(kernel_size, clip_limit, tile_grid_size, strength, cutoff, gain,
                      merge_distance, min_area, max_area, white_value_threshold,
                      white_saturation_threshold, chrom_kernel_size, dampening_factor):
    """
    Objective function for Bayesian Optimization to minimize MdAPE.

    Parameters:
        [See individual function docstrings for parameter details]

    Returns:
        float: Negative MdAPE (for maximization in Bayesian Optimization).

    Logic:
        Evaluates the pipeline across all images in parallel, computing MdAPE to guide parameter optimization.
    """
    params = {
        'kernel_size': kernel_size,
        'strength': strength,
        'clip_limit': clip_limit,
        'tile_grid_size': tile_grid_size,
        'cutoff': cutoff,
        'gain': gain,
        'chrom_kernel_size': chrom_kernel_size,
        'dampening_factor': dampening_factor,
        'merge_distance': merge_distance,
        'min_area': min_area,
        'max_area': max_area,
        'white_value_threshold': white_value_threshold,
        'white_saturation_threshold': white_saturation_threshold
    }
    
    ground_truth_df = pd.read_csv(ground_truth_csv)
    mape_values = []

    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(process_single_image, row, params) for _, row in ground_truth_df.iterrows()]
        for future in as_completed(futures):
            mape_values.append(future.result())

    
    mdape = median_absolute_percentage_error(mape_values)
    return -mdape  # Negative for maximization
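
Two details of this objective are worth keeping in mind when reading the optimizer log. First, the optimizer maximizes, so the reported `target` is the negative MdAPE. Second, the optimizer proposes continuous values, while `top_hat_enhancement` and `custom_clahe` cast their size parameters with `int()`, so fractional proposals are truncated. The sketch below illustrates both points; `effective_params` and `target_to_mdape` are hypothetical helpers, and the numbers are taken from the first logged iteration.

```python
# Parameters that the pipeline casts with int() before use.
INT_CAST_PARAMS = ("kernel_size", "chrom_kernel_size", "tile_grid_size")

def effective_params(proposed):
    """Hypothetical helper: values the pipeline effectively uses after int() casts."""
    return {k: int(v) if k in INT_CAST_PARAMS else v for k, v in proposed.items()}

def target_to_mdape(target):
    """Convert the optimizer's target (negative MdAPE) back to MdAPE in percent."""
    return -target

proposed = {
    "kernel_size": 18.99,
    "chrom_kernel_size": 254.3,
    "tile_grid_size": 37.41,
    "dampening_factor": 0.699,
}
used = effective_params(proposed)
print(used)
print("MdAPE (%):", target_to_mdape(-99.68))
```

One consequence is that nearby proposals such as `kernel_size` 18.2 and 18.9 evaluate to the same pipeline, so the objective is piecewise constant along those dimensions.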

In [8]:
# Testing single image
row = ground_truth_df.iloc[0]  # Pick just one row for quick debugging
test_params = {
    'kernel_size': 30,
    'strength': 5.0,
    'clip_limit': 2.5,
    'tile_grid_size': 30,
    'cutoff': 100,
    'gain': 25,
    'chrom_kernel_size': 200,
    'dampening_factor': 0.5,
    'merge_distance': 10,
    'min_area': 5,
    'max_area': 700,
    'white_value_threshold': 150,
    'white_saturation_threshold': 50
}

print(f"Testing single image pipeline for: {row['unique_id']}")

single_mape = process_single_image(row, test_params)

print(f"Single image MAPE: {single_mape}")


Testing single image pipeline for: NCIH2170_HIGH_HER2_R1_G1_2404_S2_1_Merge(RGB)
Single image MAPE: 6.629834254143646


## Hyperparameter Tuning with Bayesian Optimization

The performance of our automated ecDNA counting pipeline depends heavily on the choice of hyperparameters (e.g., kernel sizes for morphological transformations, thresholds for image segmentation, and merging distances).
We initially considered grid search, which systematically evaluates every combination of parameters within specified ranges. However, given the high dimensionality and the wide range of possible values for each parameter, grid search proved computationally infeasible.
To explore the parameter space efficiently, we adopted Bayesian Optimization, which selects promising parameter configurations based on prior observations and requires far fewer evaluations than grid search.

### Parameters Tuned
- `kernel_size`: Controls the scale of features enhanced by top-hat transformation.
- `clip_limit`: Limits contrast in CLAHE to prevent over-amplification.
- `tile_grid_size`: Defines the locality of CLAHE adjustments.
- `strength`: Adjusts sharpening intensity.
- `cutoff` and `gain`: Shape the sigmoid function for binarization.
- `chrom_kernel_size`: Estimates chromosome size for suppression.
- `dampening_factor`: Reduces intensity of chromosome regions.
- `merge_distance`: Merges nearby objects to correct fragmentation.
- `min_area` and `max_area`: Filter objects by size.
- `white_value_threshold` and `white_saturation_threshold`: Classify objects by color.



In [8]:
if __name__ == '__main__':
    pbounds = {
        'kernel_size': (10, 50),
        'strength': (2.0, 9.5),
        'clip_limit': (0.1, 5.0),
        'tile_grid_size': (10, 50),
        'cutoff': (50, 150),
        'gain': (1, 50),
        'chrom_kernel_size': (100, 300),
        'dampening_factor': (0.1, 0.9),
        'merge_distance': (1, 20),
        'min_area': (1, 20),
        'max_area': (400, 1000),
        'white_value_threshold': (100, 200),
        'white_saturation_threshold': (20, 80)
    }
    
    optimizer = BayesianOptimization(
        f=objective_function,
        pbounds=pbounds,
        random_state=10,
        verbose=2
    )
    
    # Run optimization: 15 initial points for exploration, 85 iterations for refinement
    optimizer.maximize(init_points=15, n_iter=85)
    
    print("Best Parameters:", optimizer.max)
    
    # Save best parameters
    best_params = optimizer.max['params']
    with open('best_params.json', 'w') as f:
        json.dump(best_params, f)


|   iter    |  target   | chrom_... | clip_l... |  cutoff   | dampen... |   gain    | kernel... | max_area  | merge_... | min_area  | strength  | tile_g... | white_... | white_... |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1         | -99.68    | 254.3     | 0.2017    | 113.4     | 0.699     | 25.43     | 18.99     | 518.8     | 15.45     | 4.213     | 2.663     | 37.41     | 77.2      | 100.4     |
| 2         | -33.25    | 202.4     | 4.082     | 111.3     | 0.6774    | 15.3      | 46.71     | 828.7     | 11.31     | 3.701     | 4.8       | 36.97     | 46.51     | (output truncated)

### Conclusion

The validation results demonstrate the pipeline’s performance. The lower MdAPE and higher correlation of the pipeline suggest improved accuracy and consistency.

This study optimized an ecDNA detection pipeline using Bayesian Optimization, reaching a minimum MdAPE of 13.31%. Application to 388 images demonstrated the pipeline's performance relative to ground truth and MIA predictions.



## Acknowledgment

**Use of AI-Based Code Generation and Editing Tools**  
This project was enhanced by the use of AI tools—ChatGPT, Grok, and Copilot—which played significant roles in improving the code, explanations, and overall presentation of the work.

### Specific Contributions

- **ChatGPT**:
  - Added comments to various functions to improve code readability and understanding.
  - Assisted in optimizing functions and replacing less effective ones, leading to better performance and results.
  - Proposed the enhancement process, including the use of Bayesian Optimization, since grid search was not feasible.
  - Provided clear, simplified explanations of complex concepts to make the content more accessible.
  - Edited text for grammatical accuracy, clarity, and an appropriate tone.

- **Grok**:
  - Contributed to adding comments to functions for better documentation.
  - Helped optimize functions and suggested replacements that improved the project’s outcomes.
  - Assisted in refining explanations to ensure they were concise and easy to understand.


All three tools—ChatGPT, Grok, and Copilot—were used to simplify explanations, making them clearer and more accessible. They also assisted in editing the grammar and tone of the text to improve readability and professionalism.

- **Iterative Refinement Process:**
For most parts of this project, I provided the same prompts to ChatGPT, Grok, and Copilot to compare their responses. I then asked each tool to evaluate the outputs of the others and provide feedback. This feedback was shared back with the tools, allowing them to refine their suggestions iteratively. This process ensured that the final code, explanations, and results were optimized for accuracy, clarity, and effectiveness.

### Citations

- OpenAI. (2023). ChatGPT (version o3-mini-high for code and logic; version 4.5 for text editing) [Large language model]. https://openai.com/chatgpt
- xAI. (2023). Grok (version 3.0). Used for text generation and code refinement in this project. https://xai.com/grok
- Microsoft. Copilot (an AI assistant developed by Microsoft).



