A deep-learning approach to spot detection and base-calling for identifying sgRNA barcodes in OPS data. This repository contains the code to train and evaluate the model, as well as materials for a paper on this project. Evaluations compare against SCALLOPS and against Yuan Gao's internship benchmarking of open-source spot-calling methods.
Note that anywhere the code says "Oracle", it refers to the hard BAD loss in the manuscript.
pip install filelock pytorch_lightning torch kornia pandas tqdm "numpy<2"
Additional requirement for barcall/create_new_bc_dataset.py:
pip install git+https://github.com/Genentech/scallops.git
To run BarCall inference or training:
- ISS images must be saved as a memmap, the cell segmentation must be saved as a memmap, and the mean and std of the ISS images must be saved. These functions are all defined in preprocess_image_files.py, and an example of running them on the snippet of PERISCOPE zarr files is included in mini_tutorial.ipynb; a rough sketch of the expected outputs is shown after this list.
- For training, a dataset must be used. `spots_meta.parquet` and `l1_matching_unamimous_cliques.parquet` are included here. The former uses the output of DeepBase as base-calling labels, and the latter uses Scallops as base-calling labels.
- For inference, checkpoints for BarCall with hard and soft BAD loss, respectively, are provided in checkpoints/.
- BarCall inference can then be run either with `barcall/5nm.sh` or in `barcall/barcall_inference.ipynb`.
- If running inference or training on a plate other than A, you need to overwrite MEANS and STDS with the well_stats from preprocess_image_files.py (an example of this for plate C can be seen in barcall_inference.ipynb).
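As a minimal sketch of that preprocessing (the real helpers live in preprocess_image_files.py; the file names, array layout, and statistic axes below are illustrative assumptions, not the repository's API):

```python
# Illustrative only: the actual helpers are defined in preprocess_image_files.py.
# File names and the (cycles, channels, H, W) layout here are assumptions.
import numpy as np

iss = np.load("iss_image.npy")            # assumed (cycles, channels, H, W) ISS stack
cells = np.load("cell_segmentation.npy")  # assumed (H, W) integer label mask

# Save the ISS stack and the segmentation as memmaps so tiles can be read lazily.
iss_mm = np.memmap("iss.memmap", dtype=iss.dtype, mode="w+", shape=iss.shape)
iss_mm[:] = iss
iss_mm.flush()

cells_mm = np.memmap("cells.memmap", dtype=cells.dtype, mode="w+", shape=cells.shape)
cells_mm[:] = cells
cells_mm.flush()

# Per-channel mean/std ("well_stats") used to normalize the ISS images; for plates
# other than A these values replace the hard-coded MEANS and STDS.
means = iss.mean(axis=(0, 2, 3))
stds = iss.std(axis=(0, 2, 3))
np.save("well_stats.npy", np.stack([means, stds]))
```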
For a DeepBase model:
- To train, run basecalling_model/train_bc.py with any of the parameters at the top. It similarly requires a memmap of the ISS image, but not the cell map or the well stats. It uses the same `l1_matching_unamimous_cliques.parquet` dataset files as BarCall. Default parameters for DeepBase are in train_deepbase.sh.
- To use it for inference, see basecalling_model/deep_base_eval.ipynb. This saves a .parquet with the barcode label for each spot provided to the model; a sketch of loading that output follows this list.
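A rough example of working with that output is below; the output path and column names ("spot_id", "barcode") are assumptions for illustration, so check deep_base_eval.ipynb for the actual schema.

```python
# Sketch of inspecting DeepBase per-spot calls; path and column names are assumed.
import pandas as pd

calls = pd.read_parquet("deepbase_calls.parquet")  # hypothetical output file
print(calls.head())

# Distribution of called barcodes across spots (assumes a "barcode" column).
print(calls["barcode"].value_counts().head(10))
```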
Table 1 is generated in table_3_1.ipynb.
Tables 2, 3, and 4, as well as Figure 3, are all generated in spot_finding/barcall_evaluation.ipynb.
The .parquet files referenced in the manuscript are generated in spot_finding/barcall_inference.ipynb and basecalling/barcall_inference.ipynb.
See mini_tutorial.ipynb. It operates on a tiny crop of an ISS image from PERISCOPE so it is cheap to store and quick to run.
Thanks to Joshua Gould, Sergio Hleap, Bo Li, Yuan Gao, Monica Ge, Avtar Sing, Amy Chuong, and David Richmond for their leadership, support, advice, and contributions to this project.
See spot_basecalling_model/checkpoints and basecalling_model/checkpoints
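As a rough sketch of loading one of these checkpoints with pytorch_lightning (the class name BarCallModel, its import path, and the checkpoint file name are placeholders, not the repository's actual names):

```python
# Illustrative only: substitute the repository's real LightningModule subclass and
# checkpoint path; "BarCallModel" and the file name below are placeholders.
import torch
from barcall.model import BarCallModel  # hypothetical import path

model = BarCallModel.load_from_checkpoint("checkpoints/hard_bad_loss.ckpt")
model.eval()

with torch.no_grad():
    # `batch` would be a preprocessed ISS tile prepared as in preprocess_image_files.py.
    # preds = model(batch)
    pass
```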