Adding script to generate data for image pipeline (SCP-4584) #269
Conversation
Codecov Report
Base: 65.02% // Head: 65.02% // No change to project coverage 👍
Additional details and impacted files:
@@ Coverage Diff @@
## development #269 +/- ##
============================================
Coverage 65.02% 65.02%
============================================
Files 28 28
Lines 3714 3714
============================================
Hits 2415 2415
Misses 1299 1299
☔ View full report at Codecov.
Code looks good! I suggest some small maintainability refinements; definitely no blockers at this early prototyping stage.
import json
import os
import re
import argparse
import uuid
import gzip
import time
+1 for only using the standard library, at least early on!
# default level of precision
precision = 3
I like that this is somewhat abstracted. Per chat, a "contextual precision" approach might yield 10-100x smaller aggregate expression data sizes, as well as faster JSON.parse times for Image Pipeline and interactive end-user clients.
Abstracting this a bit early, as you've done here, makes that potential optimization that much easier.
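To make the idea concrete, here is a minimal sketch of precision-based rounding before serialization; round_expression and its signature are hypothetical illustrations, not the script's actual API:

import json

# Hypothetical sketch: round expression values before serializing so long
# floats don't bloat the JSON payload. Exact zeros stay ints so they
# serialize as a bare 0.
def round_expression(values, precision=3):
    return [0 if v == 0 else round(v, precision) for v in values]

print(json.dumps(round_expression([0.0, 1.23456, 2.0])))  # [0, 1.235, 2.0]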
def make_data_dir(name):
    """
    Make a directory to put output files in
Could you briefly explain the UUIDv4's benefit in this docstring?
Sure - this was mostly for local testing so that I could do multiple side-by-side runs and compare outputs.
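For illustration, a docstring along these lines could capture that rationale; the directory-naming scheme in this sketch is assumed, not copied from the script:

import os
import uuid

def make_data_dir(name):
    """
    Make a directory to put output files in.

    The directory name includes a UUIDv4 suffix so that multiple local
    runs can write output side by side without overwriting each other.
    """
    data_dir = f"{name}_{uuid.uuid4()}"  # naming scheme assumed for illustration
    os.makedirs(data_dir, exist_ok=True)
    return data_dir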
for row in cluster_file:
    cell = re.split(COMMA_OR_TAB, row)[0]
We use csv.reader from the standard library elsewhere in Ingest Pipeline for this sort of thing. But IIRC, that requires sniffing delimiters, which isn't needed here.
I wouldn't be surprised if that other approach is slightly faster, but I'm not confident it's notably so. Using this approach is fine by me; it just seemed worth commenting on.
That's a fair point, and as this eventually integrates with the rest of Ingest Pipeline, it will likely switch to that. But there are also the issues we see with MIME types and file extensions in the rest of ingest, and I figured that was a little more overhead than I wanted to take on for the initial PoC.
From what I've seen in Ingest Pipeline, we aren't sniffing delimiters; we're only assessing file suffixes. That's probably the root of the MIME type issues we've had. Jon's approach (or sniffing delimiters) would be helpful in addressing some of the issues we've had with file types.
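For reference, a minimal sketch of what delimiter sniffing with the standard library could look like, as an alternative to trusting file suffixes (the file path here is hypothetical):

import csv

with open("cluster_file.tsv", newline="") as f:
    # infer comma vs. tab from a sample of the file rather than its suffix
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    for row in csv.reader(f, dialect):
        cell = row[0]  # same first-column access as the re.split approach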
entities = []
for line in file:
    entry = re.split(ALL_DELIM, line.strip())[column]
    entities.append(entry)
return entities
This is a good case where a for loop is faster for humans to read than the equivalent list comprehension.
I wonder how much faster or slower a list comprehension would be here for machines, but I don't consider it urgent to benchmark.
Interesting - I didn't think there would be any performance implications, but this method was just complicated enough that handling the column offset made the list comprehension hard to read. However, given that we eventually want to speed this up as much as possible, I'll look into benchmarking the two and see which is faster (probably the list comprehension).
Did some profiling, and a list comprehension implementation of this (see cbef530) was ~20% faster, though we're only talking about tenths of seconds for this particular method. But faster is faster! And on reflection, I don't really think it hurts readability.
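For anyone who wants to reproduce the comparison, something like the following works; the sample data and the value of ALL_DELIM are assumptions, not the PR's actual test setup:

import re
import timeit

ALL_DELIM = ",|\t"  # assumed value; the real constant lives in the script
lines = [f"cell_{i}\tCluster1" for i in range(10_000)]
column = 0

def with_loop():
    entities = []
    for line in lines:
        entry = re.split(ALL_DELIM, line.strip())[column]
        entities.append(entry)
    return entities

def with_comprehension():
    return [re.split(ALL_DELIM, line.strip())[column] for line in lines]

# time 100 passes of each implementation over the same input
print(timeit.timeit(with_loop, number=100))
print(timeit.timeit(with_comprehension, number=100))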
matrix_file_path (String): path to matrix file
genes (List): gene names
barcodes (List): cell names
cluster_cells (List): cell names from cluster file
cluster_name (String): name of cluster object
data_dir (String): output data directory
FWIW, Python supports gradual typing. Parameter types are implicit by default, as they've historically been prior to Python 3.5, but type hints can be added as desired.
From what I understand, Python's take on types (which might also be Ruby's take?) is more readable than, say, TypeScript's. The latter, at least with strict settings, requires all parameters to have explicit types, which often contributes to TS code being speckled with unhelpful, visually noisy "any" types -- which in Python would simply be absent.
So while our Python shouldn't require types, they might be worth adding where we document them in docstrings -- once we're beyond the prototyping stage of development.
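To make that concrete, the docstring parameters above could translate into hints like this; the function name and body are placeholders, not the script's actual code:

from typing import List

def render_cluster_arrays(
    matrix_file_path: str,
    genes: List[str],
    barcodes: List[str],
    cluster_cells: List[str],
    cluster_name: str,
    data_dir: str,
) -> None:
    # placeholder body; hints mirror the documented parameters above
    ...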
jlchang left a comment
Output files and directories are generated as described. Everything looks good.
BACKGROUND
Work on Image Pipeline can proceed much more quickly if, rather than reading expression data from the API, it can read from pre-rendered data artifacts that represent gene-level expression data for a given matrix/cluster combination. This reduces load on the portal and drastically speeds up rendering images.
CHANGES
This adds render_expression_arrays.py, a scratch script that takes a given cluster and matrix (dense, or sparse with features/barcodes files) and writes out optimized, compressed JSON arrays of the resulting expression values from the matrix, filtered through the list of cells from the cluster. This mimics the expression attribute on expression visualization responses, including interpolating non-existent 0 values from sparse matrix files. These arrays can then be read directly by Image Pipeline, or even the Plotly front end in an instance of SCP.
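For context, a minimal sketch of how a consumer like Image Pipeline could read one of these artifacts; the file name is taken from the MANUAL TESTING example below:

import gzip
import json

# open the gzipped artifact in text mode and parse the JSON array,
# which mimics the expression attribute on visualization responses
with gzip.open("Dense_Example--Sergef.json.gz", "rt") as f:
    expression = json.load(f)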
MANUAL TESTING
- Run the script from the scripts/scratch_ingest directory
- In the Dense_Example data directory, validate that there are two output files
- Open Dense_Example--Sergef.json.gz in a text editor, and validate that the non-zero expression values have only 1 digit of precision, and the 0 values are all integers
- Run the remaining usage examples in render_expression_arrays.py and confirm they all execute