Member

@eweitz eweitz commented Sep 7, 2023

This lets authors specify a wider variety of size and significance metrics for their differential expression data.

It also makes delimiter detection more robust and improves error handling for formats that aren't yet implemented. It integrates with the Core updates in broadinstitute/single_cell_portal_core#1874.

Automated tests have been updated to verify these changes. To manually test:

  • Run:

python ingest_pipeline.py --study-id addedfeed000000000000000 --study-file-id dec0dedfeed1111111111111 ingest_differential_expression --annotation-name General_Celltype --annotation-type group --annotation-scope study --cluster-name cluster_umap_txt --study-accession SCPdev --ingest-differential-expression --differential-expression-file ../tests/data/author_de/pval_lfc_pvaladj_seurat-like.tsv --method wilcoxon --size-metric avg_log2FC --significance-metric p_val_adj

  • Run:

head -n 2 cluster_umap_txt--General_Celltype--CSN1S1_macrophages--eosinophils--study--wilcoxon.tsv

  • Confirm output:
	genes	avg_log2FC	p_val_adj	pct.2	pct.1	p_val	cluster
0	ACE2	0.02746239327	0.1815101221	0.6656237486	0.3343762514	0.1815101221	3

This satisfies SCP-5241.
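
For readers unfamiliar with the flags in the manual test, here is a minimal sketch of how `--size-metric` and `--significance-metric` could be declared with `argparse`. The flag names come from the command above; `build_parser`, the `prog` name, and marking the options as required are assumptions for illustration, not the actual contents of `ingest/cli_parser.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the DE options; not the real cli_parser.py."""
    parser = argparse.ArgumentParser(prog="ingest_differential_expression")
    parser.add_argument("--differential-expression-file", required=True)
    parser.add_argument("--method", required=True)
    # Column names to treat as the size and significance metrics.
    # Marked required here only to avoid guessing at the real defaults.
    parser.add_argument("--size-metric", required=True)
    parser.add_argument("--significance-metric", required=True)
    return parser


# Same metric values as in the manual test command
args = build_parser().parse_args([
    "--differential-expression-file",
    "../tests/data/author_de/pval_lfc_pvaladj_seurat-like.tsv",
    "--method", "wilcoxon",
    "--size-metric", "avg_log2FC",
    "--significance-metric", "p_val_adj",
])
print(args.size_metric, args.significance_metric)
```

`argparse` converts the hyphenated flag names to underscored attributes, so the values are read as `args.size_metric` and `args.significance_metric`.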

codecov bot commented Sep 7, 2023

Codecov Report

Merging #321 (d8af4dc) into development (97a06dd) will decrease coverage by 0.05%.
Report is 2 commits behind head on development.
The diff coverage is 80.70%.

❗ Current head d8af4dc differs from pull request most recent head ede5bf3. Consider uploading reports for the commit ede5bf3 to get more accurate results

@@               Coverage Diff               @@
##           development     #321      +/-   ##
===============================================
- Coverage        73.77%   73.72%   -0.05%     
===============================================
  Files               31       30       -1     
  Lines             4129     4141      +12     
===============================================
+ Hits              3046     3053       +7     
- Misses            1083     1088       +5     

| Files Changed | Coverage |
|---|---|
| ingest/ingest_pipeline.py | ø |
| ingest/author_de.py | 80.00% |
| ingest/cli_parser.py | 100.00% |

@eweitz eweitz marked this pull request as ready for review September 7, 2023 18:59
@eweitz eweitz changed the title from "Parameterize size and significance metrics, improve robustness" to "Parameterize size and significance metrics, improve robustness (SCP-5241)" Sep 7, 2023
Contributor

@bistline bistline left a comment

Looks good - do we have associated tickets to update the author DE forms & Rails/Life Sciences API integration to collect and pass these values?

else:
delimiter = ','
data = pd.read_csv(file_path, delimiter)
# sep=None invokes detecting separator via csv.Sniffer, per
Contributor

Very nice
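
For context on the diff comment above: pandas can infer the separator when `read_csv` is called with `sep=None` and the Python engine, which relies on the standard library's `csv.Sniffer`. Below is a minimal sketch of that detection using `csv.Sniffer` directly. The helper name `detect_delimiter`, the restriction to comma and tab, and the sample rows are assumptions for illustration, not code from this PR.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",\t") -> str:
    """Guess the delimiter of a text sample (hypothetical helper)."""
    try:
        # Limit the sniffer to the candidate delimiters we expect to see
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error as error:
        raise ValueError(
            f"Could not detect a delimiter among {candidates!r}"
        ) from error


# Illustrative rows, not taken from the test data
tsv_sample = "genes\tavg_log2FC\tp_val_adj\nACE2\t0.027\t0.18\nTP53\t1.2\t0.001\n"
csv_sample = tsv_sample.replace("\t", ",")

print(repr(detect_delimiter(tsv_sample)))
print(repr(detect_delimiter(csv_sample)))
```

Sniffing the content rather than trusting the file extension means a tab-delimited file saved as `.csv` would still be parsed with the right separator.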

Member Author

eweitz commented Sep 7, 2023

> do we have associated tickets to update the author DE forms & Rails/Life Sciences API integration to collect and pass these values?

Good question -- I'm handling that as part of the ticket for https://github.com/broadinstitute/single_cell_portal_core/pull/1874/files. I just got a basic Core-Ingest integration passing.

@bistline
Contributor

Seeing the Codecov complaint makes me think we should apply the same logic from broadinstitute/single_cell_portal_core#1709 to this repository.

Member Author

eweitz commented Sep 12, 2023

Good call -- done! Having the same coverage checks across repos makes sense.

Contributor

@jlchang jlchang left a comment

Manual test performs as described. Minor nit, suggestion to clarify comment for future me.

Approving :)

comparison_metrics = sorted(comparison_metrics)

# Put qval first
# Rank significance 1st (ultimately ranked 2nd)
Contributor

Nit - Was a little confused by the wording... suggesting:
"Arrange significance in expected order (ultimately ranked 2nd)"

Member Author

Done in ede5bf3!

)

# Put logfoldchanges first
# Rank size 1st (ultimately ranked 1st)
Contributor

Nit - to parallel above suggestion (but the original comment was not confusing so use whatever wording feels natural):
"Arrange size in expected order (ultimately ranked first)"

Member Author

Done: ede5bf3
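
To make the ordering discussed in these two review threads concrete, here is a small sketch of the "sort, then move significance to the front, then move size to the front" sequence. The function name `order_metrics` and the list handling are assumptions for illustration; the real logic lives in `ingest/author_de.py` and may differ.

```python
def order_metrics(metrics: list[str], size: str, significance: str) -> list[str]:
    """Return metrics with size 1st and significance 2nd (hypothetical helper)."""
    ordered = sorted(metrics)

    # Arrange significance in expected order (ultimately ranked 2nd)
    ordered.remove(significance)
    ordered.insert(0, significance)

    # Arrange size in expected order (ultimately ranked 1st)
    ordered.remove(size)
    ordered.insert(0, size)

    return ordered


# Metric names taken from the header row in the manual test output
metrics = ["p_val", "avg_log2FC", "pct.1", "pct.2", "p_val_adj"]
print(order_metrics(metrics, size="avg_log2FC", significance="p_val_adj"))
# → ['avg_log2FC', 'p_val_adj', 'p_val', 'pct.1', 'pct.2']
```

Moving significance to the front first and size second is what leaves significance "ultimately ranked 2nd" once size is placed ahead of it.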

@eweitz eweitz merged commit 68a6fb4 into development Sep 15, 2023
@eweitz eweitz deleted the ew-parameterize-size-significance branch September 15, 2023 15:58
@eweitz eweitz mentioned this pull request Sep 25, 2023