Improve matrix ingest error reporting (SCP-3832) #220
Merged
Dense expression matrix files that are generated incorrectly can include dataframe row indices. When this user error occurs, the row indices become the first column of the expression matrix and the gene names end up in the second column, where gene expression values are expected, causing ingest to fail.
This PR improves upon the existing error message by passing along the offending value for easier troubleshooting.
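As an illustration of the failure mode described above, here is a minimal sketch of a check that reports the first non-numeric value found where an expression value is expected. It is not the ingest pipeline's actual code: the function names, the sample matrices, and the message wording are all assumptions made for this example.

```python
import csv
import io


def first_non_numeric_value(matrix_text):
    """Return (line_number, value) for the first non-numeric expression
    value in a dense matrix, or None if every value parses as a number.

    Hypothetical helper for illustration; not part of ingest_pipeline.
    """
    reader = csv.reader(io.StringIO(matrix_text))
    next(reader, None)  # skip the header row (GENE plus cell names)
    for line_number, row in enumerate(reader, start=2):
        # Column 0 is expected to hold the gene name;
        # every remaining column should hold a numeric expression value.
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                return line_number, value
    return None


def describe(matrix_text):
    """Build an error message that passes along the offending value.

    The wording is an assumption, not the pipeline's real message.
    """
    result = first_non_numeric_value(matrix_text)
    if result is None:
        return "ok"
    line_number, value = result
    return f"Expected numeric expression value, got '{value}' on line {line_number}"


# A well-formed dense matrix: gene names in the first column.
good = "GENE,cell1,cell2\nItm2a,0,1.5\nSergef,2,0\n"

# The same matrix written out with dataframe row indices: the index
# occupies the first column and gene names shift into the second.
bad = ",GENE,cell1,cell2\n0,Itm2a,0,1.5\n1,Sergef,2,0\n"

print(describe(good))
print(describe(bad))
```

With the shifted matrix, the reported value is the gene name from the second column, which points directly at the stray index column. If the matrix was written with pandas, passing `index=False` to `DataFrame.to_csv` omits the row indices.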
To test, set your local instance's "Ingest Pipeline Docker Image" configuration to use
gcr.io/broad-singlecellportal-staging/scp-ingest-jlc_improve_error_msg:6b54eca
and upload a processed matrix, using the dense matrix file found at:
gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/ingest_manual_test/non-numeric_dense_value.csv
The error in the email notification should state the offending non-numeric value.
OR
run ingest_pipeline locally - you'll need both a valid study-id and a valid study-file-id
(recommendation: create a designated local study for such tests, upload a small text file as a file of type "other" and use its study-file-id)
python ingest_pipeline.py --study-id <your local test study-id> --study-file-id <your local test study-file-id> ingest_expression --matrix-file gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/ingest_manual_test/non-numeric_dense_value.csv --matrix-file-type dense --taxon-name 'Homo sapiens' --taxon-common-name human --ncbi-taxid 9606
The resulting user_log.txt file should contain the error, including the offending value.
This addresses SCP-3832