Genentech/barcall

BarCall

A deep-learning approach to spot detection and base-calling for identifying sgRNA barcodes in optical pooled screening (OPS) data. This repository contains the code to train and evaluate the model, along with materials for a paper on this project. Evaluations compare BarCall against SCALLOPS and against the open-source spot-calling methods that Yuan Gao benchmarked during her internship.

Note: wherever the code says "Oracle", it refers to the hard BAD loss in the manuscript.

Requirements

pip install filelock pytorch_lightning torch kornia pandas tqdm "numpy<2"

Additional requirement for barcall/create_new_bc_dataset.py:

pip install git+https://github.com/Genentech/scallops.git

Usage

  • To run BarCall inference or training:

    • ISS images must be saved as a memmap, the cell segmentation must be saved as a memmap, and the mean and std of the ISS images must be saved. The functions for these steps are all defined in preprocess_image_files.py, and an example of running them on a snippet of the PERISCOPE zarr files is included in mini_tutorial.ipynb.
    • For training, a labeled dataset is required. Two are included here: spots_meta.parquet and l1_matching_unamimous_cliques.parquet. The former uses the output of DeepBase as new base-calling labels, and the latter uses SCALLOPS as base-calling labels.
    • For inference, checkpoints for BarCall with hard and soft BAD loss, respectively, are provided in checkpoints/.
    • BarCall inference can then be run either with barcall/5nm.sh or in barcall/barcall_inference.ipynb.
    • If running inference or training on a plate other than A, overwrite MEANS and STDS with the well_stats from preprocess_image_files.py (see barcall_inference.ipynb for an example with plate C).
  • To train or run a DeepBase model:

    • To train, run basecalling_model/train_bc.py with any of the parameters listed at the top of the script. Like BarCall, it requires a memmap of the ISS image, but not the cell map or the well stats. It uses the same l1_matching_unamimous_cliques.parquet dataset file as BarCall. Default parameters for DeepBase are in train_deepbase.sh.
    • To use it for inference, see basecalling_model/deep_base_eval.ipynb. This saves a .parquet with the barcode label for each spot provided to the model.
  • Table 1 is generated in table_3_1.ipynb.

  • Tables 2, 3, and 4, as well as Figure 3, are generated in spot_finding/barcall_evaluation.ipynb.

  • The .parquet files referenced in the manuscript are generated in spot_finding/barcall_inference.ipynb and basecalling/barcall_inference.ipynb.
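
The preprocessing described above (saving an image as a memmap, then recording its mean and std for normalization) can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the code in preprocess_image_files.py: the function names `save_as_memmap` and `channel_stats`, the file name, and the (cycle, channel, y, x) array layout are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_as_memmap(image, path):
    """Write an array to a raw memmap file; return the shape and dtype needed to reopen it.

    Hypothetical helper for illustration, not the repository's own function.
    """
    mm = np.memmap(path, dtype=image.dtype, mode="w+", shape=image.shape)
    mm[:] = image
    mm.flush()
    del mm  # release the file handle
    return image.shape, image.dtype


def channel_stats(path, shape, dtype):
    """Mean and std over the last two axes, assumed here to be (y, x)."""
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    means = np.asarray(mm.mean(axis=(-2, -1), dtype=np.float64))
    stds = np.asarray(mm.std(axis=(-2, -1), dtype=np.float64))
    del mm
    return means, stds


# Synthetic stand-in for an ISS image: 3 cycles, 4 channels, 16 x 16 pixels (assumed layout).
rng = np.random.default_rng(0)
image = rng.integers(0, 4096, size=(3, 4, 16, 16), dtype=np.uint16)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "iss_image.dat"
    shape, dtype = save_as_memmap(image, path)
    means, stds = channel_stats(path, shape, dtype)

# Normalize each (cycle, channel) plane with its own statistics.
normalized = (image - means[..., None, None]) / stds[..., None, None]
```

A raw memmap file stores no shape or dtype, so both have to be kept alongside the file and passed back when reopening it. Statistics computed this way on a different plate would play the role of the values that replace MEANS and STDS.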

Tutorial

See mini_tutorial.ipynb. It operates on a tiny crop of an ISS image from PERISCOPE, which keeps it small to store and quick to run.

Authors and acknowledgments

Thanks to Joshua Gould, Sergio Hleap, Bo Li, Yuan Gao, Monica Ge, Avtar Sing, Amy Chuong, and David Richmond for their leadership, support, advice, and contributions to this project.

Model Checkpoints

See spot_basecalling_model/checkpoints and basecalling_model/checkpoints
