PytorchConnectomics · gnuoyea · Jun 14, 2025 · Jun 14, 2025
diff --git a/ves_analysis/LICENSE.md b/ves_analysis/LICENSE.md
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Micaela Aeyoung Roth
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/ves_analysis/README.md b/ves_analysis/README.md
@@ -0,0 +1,71 @@
+# ves_analysis
+
+## Instructions for running sample data
+
+#### Given files in `sample/sample_data`:
+- `7-13_mask.h5` mask of two adjacent neuron pieces (small chunk, not full neurons)
+	- Note: since we are using a small chunk of data in a neuron adjacency region instead of storing each neuron as an individual file, there is no stitching step for the sample.
+
+- `7-13_pred_filtered.h5` large vesicles within the target neuron (segid 62) in this sample region
+- `7-13_lv_label.txt` vesicle type classifications
+
+#### Generate metadata
+Run `python -i metadata_and_kdtree/kdTreeMeta.py --which_neurons "sample"`
+
+&rarr; Metadata will be saved to `sample/sample_outputs/sample_com_mapping.txt`
+
+#### Export statistics
+Run `python -i updated_export_stats.py --which_neurons “sample”`
+
+&rarr; Volume/diameter stats and list exports will be saved to `sample/sample_outputs/` in multiple files
+
+#### Generate thresholds
+Run `python -i lv_thresholds.py --which_neurons “sample”`
+
+&rarr; Thresholds spreadsheet will be saved to `sample/sample_outputs/lv_thresholds.xlsx`
+
+#### Generate pointcloud counts (note segid of target neuron is 62)
+Run `python -i vesicle_counts/pointcloud_near_counts.py --which_neurons "sample" --target_segid 62 --lv_threshold [lv threshold] --cv_threshold [cv threshold] --dv_threshold [dv threshold] --dvh_threshold [dvh threshold]` (replacing thresholds with those found from sheet exported in previous step)
+
+&rarr; Near neuron counts spreadsheet will be saved to `sample/sample_outputs/sample_near_counts.xlsx`
+
+
+
+## Analysis scripts
+
+### metadata_and_kdtree
+- `kdTreeMeta.py`: Convert format of dataset from binary mask to point cloud for each neuron, storing as a list of local coordinates of vesicle COMs and calculate statistics to store as corresponding attributes. Finally, construct kdtree from point cloud for density map, and export point cloud + metadata information into txt format for easy readability to dictionary format.
+
+### neuron_stitching
+- `stitching_new.py`: Stitching together all pieces of adjacent neurons into the bounding box of a target neuron using global coordinate offsets; stitching two at a time to reduce memory consumption.
+
+### vesicle_counts
+- `pointcloud_soma_counts.py`: Easily generalizable method to return the number of vesicles within a neuronal region given as a binary mask (in this case, the somas of the neurons). Uses extracted values for each vesicle COM coordinate from the mask of the neuron region to determine which vesicles are within the region.
+- `pointcloud_near_counts.py`: Extracts “near neuron” regions of interest via Euclidean Distance Transform for adjacent neuron pieces from stitched files (see `neuron_stitching/stitching_new.py`). Uses different thresholds for each vesicle type (see `vesicle_stats/lv_diameters.py`), then counts the number of vesicles of each type within their corresponding regions of interest, using the aforementioned method from `pointcloud_soma_counts.py`).
+- `LV_type_counts.py, SV_type_counts.py`: Given exported lists (from `pointcloud_near_counts.py`) of segmentation IDs of vesicles within regions of interest and segID to type mappings, returns vesicle counts separated by type. Allows for adaptability and efficiency in cases of changes to type classifications.
+- `slow_counts.py`: Slow, inefficient method—ignore if using point cloud metadata pipeline for vesicles. Manually counts the number of overlapping vesicles in a given region using vectorized binary mask operations (if both neurons and vesicles in form of binary masks).
+
+### neuron_stats
+- `surface_area.py`: Computes the surface area for a given binary mask of a neuron by generating a 1 voxel border using Euclidean distance transform and calculating the volume of the border region.
+
+### vesicle_stats
+- `updated_extract_stats.py`: Extract statistics from point cloud metadata txt format (`metadata_and_kdtree/metadata_and_kdtree.py`), save as a pandas dataframe, and export into spreadsheet format. Also export lists of volumes and diameters as txt files.
+- `lv_diameters.py`: Finding diameter-based thresholds for near neuron counts, to be used in `vesicle_counts/pointcloud_near_counts.py`.
+- `vesicle_volume_stats.py`: For calculating and exporting vesicle volumes only (not useful if using point cloud metadata format).
+- `LUX2_density.py`: Calculating more specific stats for a particular region of interest within the LUX2 neuron.
+
+## Alternate visualization scripts
+Currently UNUSED but functional alternate visualization methods, see `vesicleEM/ves_vis` for final visualization methods which are in use.
+
+### neuroglancer_heatmap
+- `heatmap.py`: Uses a Gaussian filter to calculate a heatmap image for vesicles within a given neuron, for visualization in Neuroglancer.
+- `ng_heatmap_vis.py`: Neuroglancer rendering script to display a full dataset heatmap (from individual neuron heatmap images created using `heatmap.py`), assembled together by projecting into 2D and using coordinate offsets. Normalizing values and using a 4th color channel along with necessary rotations to align files for later color visualization.
+- `ng_shader_script.glsl`: Shader script to plug into Neuroglancer visualization in order to render a gradient of colors based on values from 0.0 to 1.0.
+
+### neuroglancer_types_map
+- `types_visualization.py`: Generates “color coded” vesicle mask files given segmentation ID to type mappings by relabeling each vesicle to indicate its type.
+- `types_ng.py`: Neuroglancer rendering script to display the full dataset of vesicles color coded by type, using previously generated mask files and using the offset feature to align neurons accurately.
+
+### threshold_density_map
+- `color_new.py`: Generates a heatmap using a Gaussian filter, then classifies/colors vesicles according to their density value through three thresholds as a simpler method if a continuous heatmap is not necessary.
+- `color_new_ng.py`: Neuroglancer rendering script for thresholded heatmaps generated in `color_new.py`.
diff --git a/ves_analysis/metadata_and_kdtree/kdTreeMeta.py b/ves_analysis/metadata_and_kdtree/kdTreeMeta.py
@@ -0,0 +1,202 @@
+#author https://github.com/akgohain
+
+import os
+import gc
+import math
+import h5py
+import numpy as np
+from tqdm import tqdm
+import argparse
+from scipy import spatial
+from concurrent.futures import ProcessPoolExecutor, as_completed
+
+chunk_size = 25 #25            
+
+ds_factor = 1             
+
+voxel_dims = (30, 8 * ds_factor, 8 * ds_factor) 
+
+sample_dir = '/home/rothmr/hydra/sample/'
+
+
+#inpute into args to compute for all neurons
+names_20 = ["KR4", "KR5", "KR6", "SHL55", "PN3", "LUX2", "SHL20", "KR11", "KR10", 
+            "RGC2", "KM4", "NET12", "NET10", "NET11", "PN7", "SHL18", 
+            "SHL24", "SHL28", "RGC7", "SHL17"]
+
+
+def process_chunk_range(params):
+    start, end, file_path, ds_factor, name = params
+    chunk_stats = {}
+    with h5py.File(file_path, 'r', swmr=True) as f:
+        dset = f["main"]
+
+        chunk = dset[start:end, ::ds_factor, ::ds_factor]
+        unique_labels = np.unique(chunk)
+        for label in unique_labels:
+            if label == 0:
+                continue 
+            mask = (chunk == label)
+            coords = np.argwhere(mask)  
+            if coords.size == 0:
+                continue
+            coords[:, 0] += start
+            sum_coords = coords.sum(axis=0)
+            count = coords.shape[0]
+            if label in chunk_stats:
+                chunk_stats[label]['sum'] += sum_coords
+                chunk_stats[label]['count'] += count
+            else:
+                chunk_stats[label] = {'sum': sum_coords, 'count': count}
+        del chunk
+        gc.collect()
+    return chunk_stats
+
+def process_chunks(file_path, ds_factor, chunk_size, name, num_workers=None):
+    chunk_stats = {}
+    with h5py.File(file_path, 'r', swmr=True) as f:
+        full_shape = f["main"].shape
+
+    chunk_ranges = []
+    for start in range(0, full_shape[0], chunk_size):
+        end = min(start + chunk_size, full_shape[0])
+        chunk_ranges.append((start, end, file_path, ds_factor, name))
+
+    with ProcessPoolExecutor(max_workers=num_workers) as executor:
+        futures = [executor.submit(process_chunk_range, params) for params in chunk_ranges]
+        for future in tqdm(as_completed(futures), total=len(futures), desc="Processing chunks", ncols=100):
+            result = future.result()
+            for label, data in result.items():
+                if label in chunk_stats:
+                    chunk_stats[label]['sum'] += data['sum']
+                    chunk_stats[label]['count'] += data['count']
+                else:
+                    chunk_stats[label] = data
+    return chunk_stats
+
+
+def compute_metadata(chunk_stats, voxel_dims):
+    """
+    Computes the center-of-mass (COM), volume, and radius for each vesicle.
+    The volume is computed as: voxel_count * (30 * 64 * 64)
+    The radius (in nm) is computed using the micaela/shulin's method:
+       sqrt((voxel_count * (64*64)) / pi)
+    """
+    metadata = {}
+    for label, stats in chunk_stats.items():
+        com = stats['sum'] / stats['count']
+        volume_nm = stats['count'] * (voxel_dims[0] * voxel_dims[1] * voxel_dims[2])
+        radius_nm = math.sqrt((stats['count'] * (voxel_dims[1] * voxel_dims[2])) / math.pi)
+        metadata[label] = {
+            'com': com,
+            'volume_nm': volume_nm,
+            'radius_nm': radius_nm
+        }
+    return metadata
+
+def compute_density(metadata, voxel_dims, kd_radius=500):
+    """
+    Computes local vesicle density using a KDTree.
+    COM coordinates are converted from voxel indices to physical units (nm) and
+    then the density is calculated as: frequency / (kd_radius^2)
+    The density values are also normalized.
+    """
+    com_list = []
+    labels = []
+    for label, data in metadata.items():
+        com = data['com']
+        # Convert voxel indices to physical coordinates (nm)
+        physical_com = np.array([
+            com[0] * voxel_dims[0],
+            com[1] * voxel_dims[1],
+            com[2] * voxel_dims[2]
+        ])
+        com_list.append(physical_com)
+        labels.append(label)
+
+    print("total num of ves: ", len(com_list))
+    com_array = np.array(com_list)
+
+
+    # Build the KDTree and query neighbors within kd_radius (nm)
+    tree = spatial.KDTree(com_array)
+    neighbors = tree.query_ball_tree(tree, kd_radius)
+    frequency = np.array([len(n) for n in neighbors])
+    density = frequency / (kd_radius ** 2)
+
+    # Normalize the density values
+    min_density = np.min(density)
+    max_density = np.max(density)
+    if max_density > min_density:
+        normalized_density = (density - min_density) / (max_density - min_density)
+    else:
+        normalized_density = density
+
+    # Add density info to metadata
+    for i, label in enumerate(labels):
+        metadata[label]['density'] = density[i]
+        metadata[label]['normalized_density'] = normalized_density[i]
+    return metadata
+
+def process_vesicle_data(name, vesicle_type="lv"):
+    """
+    Processes vesicle data (either 'lv' or 'sv') using sequential chunking.
+    Computes COM, volume, radius, and density via KDTree.
+    The metadata is written out to a text file.
+
+    Note: Re-enumerates labels to ensure uniqueness across LV and SV datasets.
+    """
+    file_prefix = "vesicle_big_" if vesicle_type == "lv" else "vesicle_small_"
+    file_path = f"/data/projects/weilab/dataset/hydra/results/{file_prefix}{name}_30-8-8.h5"
+
+    #if sample
+    if(name=="sample"):
+        if(vesicle_type == "lv"): #should only be lv
+            file_path = f"{sample_dir}sample_data/7-13_pred_filtered.h5"
+
+    print(f"Starting chunked processing for {vesicle_type} data of {name}...")
+    chunk_stats = process_chunks(file_path, ds_factor, chunk_size, name)
+    metadata = compute_metadata(chunk_stats, voxel_dims)
+    metadata = compute_density(metadata, voxel_dims, kd_radius=500)
+
+    # Re-enumerate labels to ensure uniqueness by prefixing with vesicle type
+    unique_metadata = {}
+    for label, data in metadata.items():
+        new_label = f"{vesicle_type}_{label}"
+        unique_metadata[new_label] = data
+    metadata = unique_metadata
+
+    # Ensure output directory exists
+    output_dir = f"metadata/{name}/"
+    os.makedirs(output_dir, exist_ok=True)
+
+    output_file = f"metadata/{name}/{name}_{vesicle_type}_com_mapping.txt"
+
+    if(name=="sample"):
+        output_file = f"{sample_dir}sample_outputs/sample_com_mapping.txt"
+
+    with open(output_file, "w") as f:
+        for label, data in metadata.items():
+            f.write(f"{data['com']}: ('{vesicle_type}', {label}, {data['volume_nm']}, {data['radius_nm']}, {data['density']}, {data['normalized_density']})\n")
+    print(f"Chunked processing complete for {vesicle_type} data of {name}!")
+    return metadata
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--which_neurons", type=str, help="all or sample?") #enter as "all" or "sample"
+    args = parser.parse_args()
+    which_neurons = args.which_neurons
+
+    if(which_neurons=="sample"):
+        lv_metadata = process_vesicle_data("sample", vesicle_type="lv")
+        #only need LV for the near counts example pipeline
+
+    elif(which_neurons=="all"):
+        for name in names_20:
+            lv_metadata = process_vesicle_data(name, vesicle_type="lv")
+            sv_metadata = process_vesicle_data(name, vesicle_type="sv")
+
+
+
+