Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions ves_analysis/LICENSE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Micaela Aeyoung Roth

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
71 changes: 71 additions & 0 deletions ves_analysis/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# ves_analysis

## Instructions for running sample data

#### Given files in `sample/sample_data`:
- `7-13_mask.h5` mask of two adjacent neuron pieces (small chunk, not full neurons)
- Note: since we are using a small chunk of data in a neuron adjacency region instead of storing each neuron as an individual file, there is no stitching step for the sample.

- `7-13_pred_filtered.h5` large vesicles within the target neuron (segid 62) in this sample region
- `7-13_lv_label.txt` vesicle type classifications

#### Generate metadata
Run `python -i metadata_and_kdtree/kdTreeMeta.py --which_neurons "sample"`

→ Metadata will be saved to `sample/sample_outputs/sample_com_mapping.txt`

#### Export statistics
Run `python -i updated_export_stats.py --which_neurons “sample”`

→ Volume/diameter stats and list exports will be saved to `sample/sample_outputs/` in multiple files

#### Generate thresholds
Run `python -i lv_thresholds.py --which_neurons “sample”`

→ Thresholds spreadsheet will be saved to `sample/sample_outputs/lv_thresholds.xlsx`

#### Generate pointcloud counts (note segid of target neuron is 62)
Run `python -i vesicle_counts/pointcloud_near_counts.py --which_neurons "sample" --target_segid 62 --lv_threshold [lv threshold] --cv_threshold [cv threshold] --dv_threshold [dv threshold] --dvh_threshold [dvh threshold]` (replacing thresholds with those found from sheet exported in previous step)

→ Near neuron counts spreadsheet will be saved to `sample/sample_outputs/sample_near_counts.xlsx`



## Analysis scripts

### metadata_and_kdtree
- `kdTreeMeta.py`: Convert format of dataset from binary mask to point cloud for each neuron, storing as a list of local coordinates of vesicle COMs and calculate statistics to store as corresponding attributes. Finally, construct kdtree from point cloud for density map, and export point cloud + metadata information into txt format for easy readability to dictionary format.

### neuron_stitching
- `stitching_new.py`: Stitching together all pieces of adjacent neurons into the bounding box of a target neuron using global coordinate offsets; stitching two at a time to reduce memory consumption.

### vesicle_counts
- `pointcloud_soma_counts.py`: Easily generalizable method to return the number of vesicles within a neuronal region given as a binary mask (in this case, the somas of the neurons). Uses extracted values for each vesicle COM coordinate from the mask of the neuron region to determine which vesicles are within the region.
- `pointcloud_near_counts.py`: Extracts “near neuron” regions of interest via Euclidean Distance Transform for adjacent neuron pieces from stitched files (see `neuron_stitching/stitching_new.py`). Uses different thresholds for each vesicle type (see `vesicle_stats/lv_diameters.py`), then counts the number of vesicles of each type within their corresponding regions of interest, using the aforementioned method from `pointcloud_soma_counts.py`).
- `LV_type_counts.py, SV_type_counts.py`: Given exported lists (from `pointcloud_near_counts.py`) of segmentation IDs of vesicles within regions of interest and segID to type mappings, returns vesicle counts separated by type. Allows for adaptability and efficiency in cases of changes to type classifications.
- `slow_counts.py`: Slow, inefficient method—ignore if using point cloud metadata pipeline for vesicles. Manually counts the number of overlapping vesicles in a given region using vectorized binary mask operations (if both neurons and vesicles in form of binary masks).

### neuron_stats
- `surface_area.py`: Computes the surface area for a given binary mask of a neuron by generating a 1 voxel border using Euclidean distance transform and calculating the volume of the border region.

### vesicle_stats
- `updated_extract_stats.py`: Extract statistics from point cloud metadata txt format (`metadata_and_kdtree/metadata_and_kdtree.py`), save as a pandas dataframe, and export into spreadsheet format. Also export lists of volumes and diameters as txt files.
- `lv_diameters.py`: Finding diameter-based thresholds for near neuron counts, to be used in `vesicle_counts/pointcloud_near_counts.py`.
- `vesicle_volume_stats.py`: For calculating and exporting vesicle volumes only (not useful if using point cloud metadata format).
- `LUX2_density.py`: Calculating more specific stats for a particular region of interest within the LUX2 neuron.

## Alternate visualization scripts
Currently UNUSED but functional alternate visualization methods, see `vesicleEM/ves_vis` for final visualization methods which are in use.

### neuroglancer_heatmap
- `heatmap.py`: Uses a Gaussian filter to calculate a heatmap image for vesicles within a given neuron, for visualization in Neuroglancer.
- `ng_heatmap_vis.py`: Neuroglancer rendering script to display a full dataset heatmap (from individual neuron heatmap images created using `heatmap.py`), assembled together by projecting into 2D and using coordinate offsets. Normalizing values and using a 4th color channel along with necessary rotations to align files for later color visualization.
- `ng_shader_script.glsl`: Shader script to plug into Neuroglancer visualization in order to render a gradient of colors based on values from 0.0 to 1.0.

### neuroglancer_types_map
- `types_visualization.py`: Generates “color coded” vesicle mask files given segmentation ID to type mappings by relabeling each vesicle to indicate its type.
- `types_ng.py`: Neuroglancer rendering script to display the full dataset of vesicles color coded by type, using previously generated mask files and using the offset feature to align neurons accurately.

### threshold_density_map
- `color_new.py`: Generates a heatmap using a Gaussian filter, then classifies/colors vesicles according to their density value through three thresholds as a simpler method if a continuous heatmap is not necessary.
- `color_new_ng.py`: Neuroglancer rendering script for thresholded heatmaps generated in `color_new.py`.
202 changes: 202 additions & 0 deletions ves_analysis/metadata_and_kdtree/kdTreeMeta.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
#author https://github.com/akgohain

import os
import gc
import math
import h5py
import numpy as np
from tqdm import tqdm
import argparse
from scipy import spatial
from concurrent.futures import ProcessPoolExecutor, as_completed

chunk_size = 25 #25

ds_factor = 1

voxel_dims = (30, 8 * ds_factor, 8 * ds_factor)

sample_dir = '/home/rothmr/hydra/sample/'


#inpute into args to compute for all neurons
names_20 = ["KR4", "KR5", "KR6", "SHL55", "PN3", "LUX2", "SHL20", "KR11", "KR10",
"RGC2", "KM4", "NET12", "NET10", "NET11", "PN7", "SHL18",
"SHL24", "SHL28", "RGC7", "SHL17"]


def process_chunk_range(params):
start, end, file_path, ds_factor, name = params
chunk_stats = {}
with h5py.File(file_path, 'r', swmr=True) as f:
dset = f["main"]

chunk = dset[start:end, ::ds_factor, ::ds_factor]
unique_labels = np.unique(chunk)
for label in unique_labels:
if label == 0:
continue
mask = (chunk == label)
coords = np.argwhere(mask)
if coords.size == 0:
continue
coords[:, 0] += start
sum_coords = coords.sum(axis=0)
count = coords.shape[0]
if label in chunk_stats:
chunk_stats[label]['sum'] += sum_coords
chunk_stats[label]['count'] += count
else:
chunk_stats[label] = {'sum': sum_coords, 'count': count}
del chunk
gc.collect()
return chunk_stats

def process_chunks(file_path, ds_factor, chunk_size, name, num_workers=None):
chunk_stats = {}
with h5py.File(file_path, 'r', swmr=True) as f:
full_shape = f["main"].shape

chunk_ranges = []
for start in range(0, full_shape[0], chunk_size):
end = min(start + chunk_size, full_shape[0])
chunk_ranges.append((start, end, file_path, ds_factor, name))

with ProcessPoolExecutor(max_workers=num_workers) as executor:
futures = [executor.submit(process_chunk_range, params) for params in chunk_ranges]
for future in tqdm(as_completed(futures), total=len(futures), desc="Processing chunks", ncols=100):
result = future.result()
for label, data in result.items():
if label in chunk_stats:
chunk_stats[label]['sum'] += data['sum']
chunk_stats[label]['count'] += data['count']
else:
chunk_stats[label] = data
return chunk_stats


def compute_metadata(chunk_stats, voxel_dims):
"""
Computes the center-of-mass (COM), volume, and radius for each vesicle.
The volume is computed as: voxel_count * (30 * 64 * 64)
The radius (in nm) is computed using the micaela/shulin's method:
sqrt((voxel_count * (64*64)) / pi)
"""
metadata = {}
for label, stats in chunk_stats.items():
com = stats['sum'] / stats['count']
volume_nm = stats['count'] * (voxel_dims[0] * voxel_dims[1] * voxel_dims[2])
radius_nm = math.sqrt((stats['count'] * (voxel_dims[1] * voxel_dims[2])) / math.pi)
metadata[label] = {
'com': com,
'volume_nm': volume_nm,
'radius_nm': radius_nm
}
return metadata

def compute_density(metadata, voxel_dims, kd_radius=500):
"""
Computes local vesicle density using a KDTree.
COM coordinates are converted from voxel indices to physical units (nm) and
then the density is calculated as: frequency / (kd_radius^2)
The density values are also normalized.
"""
com_list = []
labels = []
for label, data in metadata.items():
com = data['com']
# Convert voxel indices to physical coordinates (nm)
physical_com = np.array([
com[0] * voxel_dims[0],
com[1] * voxel_dims[1],
com[2] * voxel_dims[2]
])
com_list.append(physical_com)
labels.append(label)

print("total num of ves: ", len(com_list))
com_array = np.array(com_list)


# Build the KDTree and query neighbors within kd_radius (nm)
tree = spatial.KDTree(com_array)
neighbors = tree.query_ball_tree(tree, kd_radius)
frequency = np.array([len(n) for n in neighbors])
density = frequency / (kd_radius ** 2)

# Normalize the density values
min_density = np.min(density)
max_density = np.max(density)
if max_density > min_density:
normalized_density = (density - min_density) / (max_density - min_density)
else:
normalized_density = density

# Add density info to metadata
for i, label in enumerate(labels):
metadata[label]['density'] = density[i]
metadata[label]['normalized_density'] = normalized_density[i]
return metadata

def process_vesicle_data(name, vesicle_type="lv"):
"""
Processes vesicle data (either 'lv' or 'sv') using sequential chunking.
Computes COM, volume, radius, and density via KDTree.
The metadata is written out to a text file.

Note: Re-enumerates labels to ensure uniqueness across LV and SV datasets.
"""
file_prefix = "vesicle_big_" if vesicle_type == "lv" else "vesicle_small_"
file_path = f"/data/projects/weilab/dataset/hydra/results/{file_prefix}{name}_30-8-8.h5"

#if sample
if(name=="sample"):
if(vesicle_type == "lv"): #should only be lv
file_path = f"{sample_dir}sample_data/7-13_pred_filtered.h5"

print(f"Starting chunked processing for {vesicle_type} data of {name}...")
chunk_stats = process_chunks(file_path, ds_factor, chunk_size, name)
metadata = compute_metadata(chunk_stats, voxel_dims)
metadata = compute_density(metadata, voxel_dims, kd_radius=500)

# Re-enumerate labels to ensure uniqueness by prefixing with vesicle type
unique_metadata = {}
for label, data in metadata.items():
new_label = f"{vesicle_type}_{label}"
unique_metadata[new_label] = data
metadata = unique_metadata

# Ensure output directory exists
output_dir = f"metadata/{name}/"
os.makedirs(output_dir, exist_ok=True)

output_file = f"metadata/{name}/{name}_{vesicle_type}_com_mapping.txt"

if(name=="sample"):
output_file = f"{sample_dir}sample_outputs/sample_com_mapping.txt"

with open(output_file, "w") as f:
for label, data in metadata.items():
f.write(f"{data['com']}: ('{vesicle_type}', {label}, {data['volume_nm']}, {data['radius_nm']}, {data['density']}, {data['normalized_density']})\n")
print(f"Chunked processing complete for {vesicle_type} data of {name}!")
return metadata

if __name__ == "__main__":

parser = argparse.ArgumentParser()
parser.add_argument("--which_neurons", type=str, help="all or sample?") #enter as "all" or "sample"
args = parser.parse_args()
which_neurons = args.which_neurons

if(which_neurons=="sample"):
lv_metadata = process_vesicle_data("sample", vesicle_type="lv")
#only need LV for the near counts example pipeline

elif(which_neurons=="all"):
for name in names_20:
lv_metadata = process_vesicle_data(name, vesicle_type="lv")
sv_metadata = process_vesicle_data(name, vesicle_type="sv")




Loading