# CLG Step 2: Single Neuron Signal Extraction & 3D Calibration

This notebook implements the core **"CLG" (Comprehensive Label-Guided)** step described in the paper.

**Key Functions:**
1.  **Load Data**: Reads registered functional calcium imaging data (TIFF) and the corresponding 3D structural segmentation masks (from Cellpose).
2.  **Map Slices**: Aligns functional imaging planes to the high-resolution structural 3D volume using a provided mapping file.
3.  **Extract Traces**: Extracts mean fluorescence traces for each ROI on every functional plane.
4.  **3D Calibration (Merge)**: Identifies ROIs belonging to the same 3D neuron (spanning multiple z-planes) based on unique Cellpose IDs and merges their signals to eliminate **axial overcounting**.
5.  **Save Results**: Exports the calibrated single-neuron activity matrix (`trace`), centroids (`pos`), and metadata for downstream analysis.

In [15]:
import os
import numpy as np
import pandas as pd
import tifffile
from skimage.measure import regionprops
from tqdm import tqdm
import scipy.io as sio
import matplotlib.pyplot as plt

BASE_DIR = r'yourpath/CLG-Volumetric-Imaging-Analysis-Framework/main/extraction/fish4_example'
EXPERIMENT = 'spon'

FUNC_DATA_DIR = os.path.join(BASE_DIR, EXPERIMENT, 'result_denoised_registed')
MAPPING_FILE  = os.path.join(BASE_DIR, EXPERIMENT, 'spon.txt')

MASK_DIR = BASE_DIR
MASK_FILENAME = 'C1-fish4 3d double_tiff_reconstructed_imNor_30_0.001000_jupyter_cp2masks.tif'

OUTPUT_DIR = os.path.join(BASE_DIR, EXPERIMENT, 'extraction_results')
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f'Processing Experiment: {EXPERIMENT}')
print(f'Functional data dir: {FUNC_DATA_DIR}')
print(f'Mask file: {os.path.join(MASK_DIR, MASK_FILENAME)}')
print(f'Output  dir: {OUTPUT_DIR}')

Processing Experiment: spon
Functional data dir: /home/jjzhao/CLG-Volumetric-Imaging-Analysis-Framework/main/extraction/fish4_example/spon/result_denoised_registed
Mask file: /home/jjzhao/CLG-Volumetric-Imaging-Analysis-Framework/main/extraction/fish4_example/C1-fish4 3d double_tiff_reconstructed_imNor_30_0.001000_jupyter_cp2masks.tif
Output  dir: /home/jjzhao/CLG-Volumetric-Imaging-Analysis-Framework/main/extraction/fish4_example/spon/extraction_results


## 1. Load Structural 3D Masks
Load the 3D segmentation result from Cellpose. Each unique integer ID in this volume represents a distinct 3D neuron.

In [16]:
mask_path = os.path.join(MASK_DIR, MASK_FILENAME)
print("Loading 3D Mask Volume...")
mask_3d = tifffile.imread(mask_path)
print(f"Mask Volume Shape: {mask_3d.shape}")

# Calculate 3D Centroids for all neurons (useful for visualization later)
print("Calculating 3D centroids...")
props = regionprops(mask_3d)
centroids_3d = {prop.label: prop.centroid for prop in props}
print(f"Found {len(centroids_3d)} unique 3D neurons.")

Loading 3D Mask Volume...
Mask Volume Shape: (301, 1024, 1024)
Calculating 3D centroids...
Found 67414 unique 3D neurons.
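
As a cross-check on the centroid step, the same per-label centroids can be computed with NumPy alone. The sketch below is illustrative and not part of the original pipeline: `label_centroids` is a hypothetical helper, and it assumes the mask holds non-negative integer labels with 0 as background, as Cellpose masks do.

```python
import numpy as np

def label_centroids(labels):
    """Return {label_id: (z, y, x)} centroids for a labelled volume (hypothetical helper).

    Assumes non-negative integer labels with 0 as background.
    """
    coords = np.nonzero(labels)                 # one index array per axis
    lab = labels[coords].astype(np.intp)        # label of every foreground voxel
    counts = np.bincount(lab)                   # voxel count per label
    # Sum of coordinates per label, one column per axis: shape (max_label + 1, ndim)
    sums = np.stack([np.bincount(lab, weights=axis_coords) for axis_coords in coords], axis=1)
    ids = np.nonzero(counts)[0]                 # labels that actually occur
    return {int(i): tuple(sums[i] / counts[i]) for i in ids}

# Toy volume: label 1 covers two voxels, label 3 covers one.
toy = np.zeros((2, 3, 3), dtype=np.uint16)
toy[0, 0, 0] = 1
toy[0, 0, 1] = 1
toy[1, 2, 2] = 3
toy_centroids = label_centroids(toy)
```

Each centroid is the mean voxel coordinate of the label in (Z, Y, X) array order, which is also how `regionprops` reports `centroid`.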


## 2. Load Slice Mapping
Read the text file that maps each functional z-plane (low resolution) to a specific structural z-plane (high resolution).

In [17]:
# Read mapping file (Format: FuncSlice-StructSlice, e.g., 1-48)
# Note: Adjust the separator ('-') or columns as per your specific file format
try:
    mapping_df = pd.read_csv(MAPPING_FILE, sep='-', header=None, names=['FuncSlice', 'StructSlice'])
except Exception as e:
    # Stop here: every later cell depends on mapping_df, so continuing would only
    # produce a confusing NameError further down.
    raise RuntimeError(f"Error loading mapping file {MAPPING_FILE}: {e}") from e

print("Mapping loaded successfully.")
print(mapping_df.head())

# Create a dictionary for easy lookup: func_z -> struct_z
slice_map = dict(zip(mapping_df['FuncSlice'], mapping_df['StructSlice']))

Mapping loaded successfully.
   FuncSlice  StructSlice
0          2           85
1          3           92
2          4           99
3          5          106
4          6          111
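
Before extraction it can help to validate the mapping against the depth of the mask volume. The following is a minimal sketch, not the notebook's own code: `parse_slice_map` is a hypothetical helper, and it assumes one `FuncSlice-StructSlice` pair per line with 1-based structural slice numbers (as implied by the `struct_z - 1` conversion used during extraction).

```python
import io

def parse_slice_map(text, n_struct_slices, sep='-'):
    """Parse 'func-struct' lines into {func_z: 0-based mask index} (hypothetical helper).

    Assumes structural slice numbers in the file are 1-based.
    """
    slice_map = {}
    for line_no, line in enumerate(io.StringIO(text), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        func_str, struct_str = line.split(sep)
        func_z, struct_z = int(func_str), int(struct_str)
        if not 1 <= struct_z <= n_struct_slices:
            raise ValueError(f"line {line_no}: structural slice {struct_z} is outside 1..{n_struct_slices}")
        if func_z in slice_map:
            raise ValueError(f"line {line_no}: functional slice {func_z} is mapped twice")
        slice_map[func_z] = struct_z - 1  # convert to a 0-based NumPy index
    return slice_map

# Two rows taken from the mapping shown above, checked against a 301-slice mask volume.
example_map = parse_slice_map("2-85\n3-92\n", n_struct_slices=301)
```

Failing early on an out-of-range or duplicated entry is a design choice; the extraction loop in this notebook instead skips such slices with a warning.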


## 3. Extract Raw Signals (Per Plane)
Iterate through each functional plane, find its corresponding mask slice, and extract activity traces.
Note: This step produces redundant traces, because a neuron that spans several z-planes is extracted once for every functional plane it intersects.

In [18]:
# Initialize lists to store raw extraction results
raw_traces = []
raw_nuc_indices = []
raw_positions = []
raw_areas = []
slice_info = []

# Determine range of slices (assuming file naming z_02.tif, etc.)
func_slices = sorted(slice_map.keys())

print("Starting Signal Extraction...")
total_neurons = 0
for func_z in tqdm(func_slices):
    # 1. Load Functional Image
    # Filename format: z_02.tif. Adjust pattern if needed.
    func_img_path = os.path.join(FUNC_DATA_DIR, f'z_{func_z:02d}.tif')
    if not os.path.exists(func_img_path):
        print(f"Warning: File not found {func_img_path}, skipping.")
        continue
    
    # 2. Get Corresponding Mask Slice
    struct_z = slice_map[func_z]
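    mask_z_idx = struct_z - 1  # Convert the slice number from the mapping file to a 0-based index
    if mask_z_idx < 0 or mask_z_idx >= mask_3d.shape[0]:
        # A negative index would silently wrap around to the end of the volume
        print(f"Warning: Mask slice {mask_z_idx} out of bounds. Skipping.")
        continue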

    # Print which slices we are processing
    print(f"Reading functional image slice {func_z}, corresponding to structural template slice {struct_z} (Python index: {mask_z_idx}).")

    func_img = tifffile.imread(func_img_path)
    
    # tifffile reads the image stack in (T, H, W) order by default
    num_frames, h, w = func_img.shape

    mask_slice = mask_3d[mask_z_idx, :, :]
    
    # 3. Iterate over each cell in this mask slice
    unique_cells = np.unique(mask_slice)
    unique_cells = unique_cells[unique_cells != 0] # Exclude background

    # Flatten image for faster indexing: (Time, Pixels)
    func_img_flat = func_img.reshape(num_frames, -1)

    cell_count_this_slice = 0
    for cell_id in unique_cells:
        # Find pixels belonging to this cell
        pixel_indices = np.where(mask_slice.ravel() == cell_id)[0]
        
        # Extract mean trace
        cell_pixels = func_img_flat[:, pixel_indices]
        mean_trace = np.mean(cell_pixels, axis=1)
        
        # Calculate 2D centroid on this slice for reference
        ys, xs = np.unravel_index(pixel_indices, (h, w))
        cy, cx = np.mean(ys), np.mean(xs)
        
        # Store Data
        raw_traces.append(mean_trace)
        raw_nuc_indices.append(cell_id)
        raw_positions.append([mask_z_idx, cy, cx]) # Z, Y, X
        raw_areas.append(len(pixel_indices))
        slice_info.append(func_z)
        cell_count_this_slice += 1
    
    total_neurons += cell_count_this_slice
    print(f"Extracted {cell_count_this_slice} cells in this slice, total extracted so far: {total_neurons}.")

print(f"Extraction complete. Found {len(raw_traces)} raw ROIs across all planes.")

Starting Signal Extraction...


  0%|          | 0/28 [00:00<?, ?it/s]

Reading functional image slice 15, corresponding to structural template slice 160 (Python index: 159).


100%|██████████| 28/28 [00:03<00:00,  9.30it/s]

Extracted 2478 cells in this slice, total extracted so far: 2478.
Extraction complete. Found 2478 raw ROIs across all planes.
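
The loop above scans the whole mask slice once per cell. One possible alternative, sketched here under the assumption that the functional stack is (T, H, W) and the mask slice is an (H, W) integer label image with 0 as background, computes all mean traces of a plane with `np.bincount`. The function name `mean_traces_by_label` is hypothetical.

```python
import numpy as np

def mean_traces_by_label(stack, labels):
    """Mean trace per label for one plane (hypothetical helper).

    stack:  (T, H, W) functional image stack
    labels: (H, W) integer mask slice, 0 = background
    Returns (ids, traces, areas) with traces of shape (N_labels, T).
    """
    flat_labels = labels.ravel()
    ids, inverse, areas = np.unique(flat_labels, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    flat_stack = stack.reshape(stack.shape[0], -1).astype(np.float64)  # (T, pixels)
    # Per-frame sum of pixel values for every label: shape (T, N_labels_including_background)
    sums = np.stack([np.bincount(inverse, weights=frame, minlength=len(ids)) for frame in flat_stack])
    means = (sums / areas).T  # (N_labels_including_background, T)
    keep = ids != 0           # drop the background label
    return ids[keep], means[keep], areas[keep]

# Toy plane: 3 frames of 2x3 pixels, two cells of two pixels each.
toy_stack = np.arange(18).reshape(3, 2, 3)
toy_labels = np.array([[0, 1, 1],
                       [2, 2, 0]])
toy_ids, toy_traces, toy_areas = mean_traces_by_label(toy_stack, toy_labels)
```

The returned `areas` can feed the small-ROI filter in the next section; per-slice centroids would still need a separate computation.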


## 4. 3D Calibration (The CLG Step)
Merge traces that belong to the same 3D neuron ID. This eliminates axial overcounting.

In [19]:
# Convert to numpy arrays for processing
raw_traces_arr = np.array(raw_traces)
raw_nuc_indices_arr = np.array(raw_nuc_indices)
raw_areas_arr = np.array(raw_areas)
raw_positions_arr = np.array(raw_positions)

# Filter out small ROIs (Area <= 4 pixels) - Denoising step
min_area = 4
valid_mask = raw_areas_arr > min_area

traces_filtered = raw_traces_arr[valid_mask]
indices_filtered = raw_nuc_indices_arr[valid_mask]

print(f"After filtering small areas (<={min_area}px): {len(indices_filtered)} ROIs remain.")

# --- MERGE DUPLICATES ---
unique_ids = np.unique(indices_filtered)
print(f"Merging traces... Found {len(unique_ids)} unique 3D neurons.")

merged_traces = []
merged_positions = []
merged_ids = []

for uid in tqdm(unique_ids):
    # Find all instances of this neuron across slices
    idx_matches = np.where(indices_filtered == uid)[0]
    
    if len(idx_matches) == 1:
        # No overlap, just take the single trace
        final_trace = traces_filtered[idx_matches[0]]
    else:
        # Overlap exists: Average the traces
        # (weighted averaging could also be implemented here using areas)
        final_trace = np.mean(traces_filtered[idx_matches], axis=0)
    
    # Get 3D Centroid from the original structural mask volume
    if uid in centroids_3d:
        # centroids_3d is (Z, Y, X)
        pos_3d = centroids_3d[uid]
    else:
        pos_3d = [np.nan, np.nan, np.nan]
        
    merged_traces.append(final_trace)
    merged_positions.append(pos_3d)
    merged_ids.append(uid)

merged_traces = np.array(merged_traces)
merged_positions = np.array(merged_positions)
merged_ids = np.array(merged_ids)

print(f"Final Calibrated Dataset: {merged_traces.shape[0]} Neurons, {merged_traces.shape[1]} Timepoints.")

After filtering small areas (<=4px): 2226 ROIs remain.
Merging traces... Found 2226 unique 3D neurons.


100%|██████████| 2226/2226 [00:00<00:00, 188437.66it/s]

Final Calibrated Dataset: 2226 Neurons, 599 Timepoints.
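
The merge cell notes that area-weighted averaging could replace the plain mean. A minimal sketch of that variant follows; `merge_by_id` is a hypothetical helper, and weighting each per-plane trace by its ROI pixel count is an assumption about what "weighted" should mean here, not the method the notebook uses.

```python
import numpy as np

def merge_by_id(traces, ids, areas):
    """Area-weighted merge of per-plane traces sharing a 3D neuron ID (hypothetical helper).

    traces: (N_rois, T), ids: (N_rois,), areas: (N_rois,) pixel counts per ROI.
    Returns (unique_ids, merged) with merged of shape (N_neurons, T).
    """
    traces = np.asarray(traces, dtype=np.float64)
    ids = np.asarray(ids)
    areas = np.asarray(areas, dtype=np.float64)
    unique_ids, inverse = np.unique(ids, return_inverse=True)
    weighted_sum = np.zeros((len(unique_ids), traces.shape[1]))
    # Accumulate area * trace into the row of the neuron each ROI belongs to
    np.add.at(weighted_sum, inverse, traces * areas[:, None])
    total_area = np.bincount(inverse, weights=areas, minlength=len(unique_ids))
    return unique_ids, weighted_sum / total_area[:, None]

# Toy data: neuron 5 appears on two planes (areas 1 and 3), neuron 7 on one plane.
toy_unique, toy_merged = merge_by_id(
    traces=[[1.0, 1.0], [3.0, 3.0], [10.0, 10.0]],
    ids=[5, 5, 7],
    areas=[1, 3, 2],
)
```

With equal areas this reduces to the plain mean used in the cell above; with unequal areas, planes that cut through more of the cell contribute more to the merged trace.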


## 5. Save Results
Save the processed data for Step 3 (Network Analysis).

In [20]:
# Store the results before and after calibration as .mat files only,
# for convenient downstream processing and comparison in MATLAB.

# Save the raw trace, pos, and nucIndex data before calibration (before merging)
sio.savemat(os.path.join(OUTPUT_DIR, 'CellTrace_before_calibration.mat'), 
            {
                'trace': raw_traces,               # (N_raw_rois, T)
                'pos': raw_positions,              # (N_raw_rois, 3)
                'nucIndex': raw_nuc_indices        # (N_raw_rois,)
            })

# Save the data after calibration (after merging)
sio.savemat(os.path.join(OUTPUT_DIR, 'CellTrace_after_calibration.mat'), 
            {
                'trace': merged_traces,            # (N_merged_neurons, T)
                'pos': merged_positions,           # (N_merged_neurons, 3)
                'nucIndex': merged_ids             # (N_merged_neurons,)
            })

print(f"Data (before & after calibration) saved to {OUTPUT_DIR}")

Data (before & after calibration) saved to /home/jjzhao/CLG-Volumetric-Imaging-Analysis-Framework/main/extraction/fish4_example/spon/extraction_results
