@jlchang
Contributor

@jlchang jlchang commented Jun 20, 2022

This PR modifies differential expression (DE) to convert only empty cells to --Unspecified-- and to leave NA-like strings (such as N/A) as-is, so annotation labels stay faithful to the values in the metadata file.

Background:
Currently, DE results do not display for annotation labels that are N/A-like strings. Pandas reads empty cells as numeric NaN, which is problematic for group annotations, so DE converts those values to --Unspecified-- as the DE annotation label. Unfortunately, pandas also converts a set of default NaN-recognized strings (like N/A) to NaN. We construct the file name for data retrieval from annotationList values, so if a value is one of the "pandas default NaN recognized values", the value in the filename is --Unspecified-- instead of N/A and the file is not found.
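The pandas behavior described above can be reproduced with a small sketch. This is not the code from this PR; the inline metadata and variable names are made up for illustration, and it assumes the metadata is read with `pandas.read_csv`. It uses the documented `keep_default_na` and `na_values` parameters so that only empty cells are treated as missing.

```python
import io

import pandas as pd

# Hypothetical metadata: NA-like strings, one empty cell, and one ordinary label
tsv = (
    "NAME\tcell_type__ontology_label\n"
    "cell1\tN/A\n"
    "cell2\tNaN\n"
    "cell3\tnull\n"
    "cell4\t\n"
    "cell5\tpyramidal neuron\n"
)

# Default parsing: "N/A", "NaN", "null" and the empty cell all become NaN
default = pd.read_csv(io.StringIO(tsv), sep="\t")
default_missing = int(default["cell_type__ontology_label"].isna().sum())

# Restrict NaN detection to empty cells only, then label those cells
preserved = pd.read_csv(
    io.StringIO(tsv), sep="\t", keep_default_na=False, na_values=[""]
)
labels = preserved["cell_type__ontology_label"].fillna("--Unspecified--").tolist()

print(default_missing)
print(labels)
```

With default parsing, four of the five labels are indistinguishable after loading; with the restricted NaN list, the three NA-like strings survive as distinct labels and only the empty cell becomes --Unspecified--.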

Manual test:
Activate the scp-ingest-pipeline repo virtualenv.
Then, from the scripts directory of the scp-ingest-pipeline repo, perform this setup:

source ../scripts/setup_mongo_dev.sh
unset BARD_HOST_URL

Run the DE job from the ingest directory of the scp-ingest-pipeline repo:
python ingest_pipeline.py --study-id addedfeed000000000000000 --study-file-id dec0dedfeed1111111111111 differential_expression --annotation-name cell_type__ontology_label --annotation-type group --annotation-scope study --matrix-file-path ../tests/data/differential_expression/de_dense_matrix.tsv --matrix-file-type dense --annotation-file ../tests/data/differential_expression/de_dense_metadata_na.txt --cluster-file ../tests/data/differential_expression/de_dense_cluster.tsv --cluster-name de_na --study-accession SCPna --differential-expression

Confirm that the job runs successfully and that the output files have the expected filenames:

de_na--cell_type__ontology_label--N_A--study--wilcoxon.tsv
de_na--cell_type__ontology_label--NaN--study--wilcoxon.tsv
de_na--cell_type__ontology_label--__Unspecified__--study--wilcoxon.tsv
de_na--cell_type__ontology_label--cholinergic_neuron--study--wilcoxon.tsv
de_na--cell_type__ontology_label--cranial_somatomotor_neuron--study--wilcoxon.tsv
de_na--cell_type__ontology_label--null--study--wilcoxon.tsv
de_na--cell_type__ontology_label--pyramidal_neuron--study--wilcoxon.tsv
de_na--cell_type__ontology_label--somatomotor_neuron--study--wilcoxon.tsv
de_na--cell_type__ontology_label--sympathetic_cholinergic_neuron--study--wilcoxon.tsv
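The expected filenames above suggest how a label maps to a file name: parts are joined with `--`, and characters such as `/`, `-` and spaces inside each part appear as underscores. A minimal sketch that reproduces this pattern follows. The function name `de_filename` and the sanitizing regex are assumptions chosen to match the listed names, not the pipeline's actual implementation.

```python
import re


def de_filename(cluster, annotation, label, scope, method="wilcoxon"):
    """Hypothetical helper: build a DE output file name from its parts.

    Each part has every non-word character replaced with "_", which is an
    assumption inferred from the expected filenames, then parts are joined
    with "--".
    """
    parts = (cluster, annotation, label, scope, method)
    clean = [re.sub(r"\W", "_", part) for part in parts]
    return "--".join(clean) + ".tsv"


for label in ["N/A", "--Unspecified--", "pyramidal neuron"]:
    print(de_filename("de_na", "cell_type__ontology_label", label, "study"))
```

Under this scheme, N/A and --Unspecified-- yield different file names (N_A and __Unspecified__), which is why a label silently converted to --Unspecified-- makes retrieval look for the wrong file.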

This PR satisfies SCP-4436.

@jlchang jlchang requested review from bistline, ehanna4 and eweitz June 20, 2022 23:16
@jlchang jlchang merged commit 27e55ae into development Jun 21, 2022
@jlchang jlchang deleted the jlc_handle_de_na branch June 21, 2022 15:35