Allow plotting selection metrics with partial data availability, for all species, and add shell injection protection #393
- Modified workflows/deepcsa.nf to allow plotting when ANY selection method is available (changed && to ||)
- Updated plot_selectionsideplots.py to check file existence before loading data
- Added graceful handling of missing data files with informative warnings
- Script now dynamically determines which tracks to plot based on available data
- No plots generated if no data files are found

Co-authored-by: FerriolCalvet <38539786+FerriolCalvet@users.noreply.github.com>
- Added check in plot_all_positive_selection to handle n_tracks == 0
- Added check to avoid saving figure when None is returned
- Improved robustness when no data is available

Co-authored-by: FerriolCalvet <38539786+FerriolCalvet@users.noreply.github.com>
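The two commits above can be pictured with a minimal sketch. It is not the code from plot_selectionsideplots.py: the names `available_tracks`, `plot_tracks` and `plot_and_save` are hypothetical, and a plain dict stands in for the figure object. It only illustrates the pattern described: keep the tracks whose file exists, return None when `n_tracks == 0`, and skip saving when None comes back.

```python
import os


def available_tracks(track_files):
    """Return the names of tracks whose data file exists, in input order.

    `track_files` maps a track name to a file path (or None). Hypothetical helper.
    """
    tracks = []
    for name, path in track_files.items():
        if path and os.path.exists(path):
            tracks.append(name)
        else:
            # Informative warning instead of a crash when a file is missing
            print(f"WARNING: no data file for track '{name}'; skipping it")
    return tracks


def plot_tracks(tracks):
    """Stand-in for the plotting function: returns None when there is nothing to plot."""
    n_tracks = len(tracks)
    if n_tracks == 0:
        return None
    # Placeholder for a real figure object (assumption for this sketch)
    return {"n_tracks": n_tracks, "tracks": list(tracks)}


def plot_and_save(track_files, save):
    """Plot the available tracks and call `save(fig)` only if a figure was produced."""
    fig = plot_tracks(available_tracks(track_files))
    if fig is None:
        return False
    save(fig)
    return True
```

With only one of the expected files on disk, `plot_and_save` would produce a single-track figure; with none, it returns False and `save` is never called.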
can you remove the requirement of only plotting whenever the species is homo_sapiens?
Looks good, I would like to test it and then give some more feedback
can you build some unit tests for this step so that we can test the different scenarios?
Added comprehensive unit tests in commit 84470d7. Created bin/test_plot_selectionsideplots.py with tests covering:
- File existence checks
- Dynamic track selection
- All data scenarios (all files, some files, no files)
- Backward compatibility with all methods
- Edge cases (empty gene lists, missing MAF)
- Partial data handling
Run with: python3 bin/test_plot_selectionsideplots.py
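As an illustration of the "all files, some files, no files" scenarios listed above, a test could look roughly like the sketch below. It is not taken from bin/test_plot_selectionsideplots.py: `existing_tracks` is a hypothetical helper defined inline so that the example is self-contained, and the `.tsv` file names are assumptions.

```python
import os
import tempfile
import unittest


def existing_tracks(track_files):
    """Hypothetical helper under test: names of tracks whose file exists."""
    return [name for name, path in track_files.items() if os.path.exists(path)]


class TestPartialData(unittest.TestCase):
    METHODS = ("omega", "oncodrive3d", "oncodrivefml", "indels")

    def setUp(self):
        # Fresh temporary directory per test, removed automatically afterwards
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)

    def _make_files(self, present):
        """Create files only for the methods in `present`; return every expected path."""
        paths = {}
        for method in self.METHODS:
            path = os.path.join(self._tmp.name, f"{method}.tsv")
            if method in present:
                with open(path, "w") as handle:
                    handle.write("GENE\n")
            paths[method] = path
        return paths

    def test_all_files(self):
        paths = self._make_files(self.METHODS)
        self.assertEqual(existing_tracks(paths), list(self.METHODS))

    def test_some_files(self):
        paths = self._make_files({"omega", "indels"})
        self.assertEqual(existing_tracks(paths), ["omega", "indels"])

    def test_no_files(self):
        paths = self._make_files(())
        self.assertEqual(existing_tracks(paths), [])
```

Saved as a module, a sketch like this can be run with `python3 -m unittest <module name>`.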
@copilot apply the suggested changes
…MAPPING
- Removed homo_sapiens check from plotting condition (line 548)
- Removed homo_sapiens check from DNA2PROTEINMAPPING (line 232)
- DNA2PROTEINMAPPING now runs for all species
- Added comprehensive unit tests for plot_selectionsideplots.py
- Tests cover file existence checks, dynamic track selection, and edge cases

Co-authored-by: FerriolCalvet <38539786+FerriolCalvet@users.noreply.github.com>
Changes applied in commit 84470d7:
- mutdensityqc works without requiring omega
- integrated plotting for saturation of subgenic regions
- works when not all data is available
Pull request overview
This PR enables plotting of selection metrics with partial data availability and removes species restrictions that previously limited plotting to homo_sapiens only. The changes allow the pipeline to gracefully handle scenarios where only some selection methods (omega, oncodrive3d, oncodrivefml, indels) are enabled, rather than requiring all four methods.
Key Changes:
- Changed workflow condition from AND to OR logic for plotting triggers - now runs when any selection method is available
- Removed homo_sapiens species restriction from plotting workflows
- Added dynamic file existence checks and track selection in plotting scripts
- Fixed bug where omega_missense_genes incorrectly referenced omega_truncating data
- Added comprehensive unit tests for partial data scenarios
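The AND-to-OR change in the first bullet is made in the Nextflow workflow. The Python sketch below only mirrors the logic for illustration: the function names are hypothetical, and a plain dict stands in for the pipeline's `params`.

```python
SELECTION_METHODS = ("omega", "oncodrive3d", "oncodrivefml", "indels")


def should_plot_before(params):
    """Old behaviour: plot only when all four selection methods are enabled (AND)."""
    return all(params.get(method, False) for method in SELECTION_METHODS)


def should_plot_after(params):
    """New behaviour: plot when at least one selection method is enabled (OR)."""
    return any(params.get(method, False) for method in SELECTION_METHODS)
```

A run with only `omega` enabled would be skipped under the old condition and plotted under the new one. A run with every method enabled passes both, which matches the backward-compatibility claim.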
Reviewed changes
Copilot reviewed 5 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| workflows/deepcsa.nf | Changed plotting condition from requiring all four methods to requiring any method; removed species restriction; added empty channel initializations for all_compiled_omegas and site_comparison_results |
| subworkflows/local/plottingsummary/main.nf | Added new inputs (all_mutations, expanded_panel) and new PLOTSATURATIONPROPORTIONS process call |
| modules/local/plot/saturation/main.nf | Made output optional to handle cases with no data |
| modules/local/plot/saturation/proportions/main.nf | New module for plotting saturation proportions |
| bin/plot_selectionsideplots.py | Added os.path.exists() checks for all data files; implemented dynamic track building; added None checks in plot_all_positive_selection; fixed omega_missense_genes bug |
| bin/saturation_in_genes.py | New script for generating gene saturation plots with multiple grouping modes |
| bin/plot_gene_saturation.py | Added file existence checks for omega, site selection, and oncodrive3d files; improved track list management; fixed groupby().apply() deprecation |
| bin/test_plot_selectionsideplots.py | New comprehensive test suite with 15+ test cases covering file existence, dynamic tracks, and partial data scenarios |
| nextflow_schema.json | Removed default value for use_custom_depths boolean; added custom_depths_table_usage to schema definitions |
- Enhanced sample name validation in check_samplesheet.py
- Only allow alphanumeric characters, underscores, hyphens, and dots
- Prevent sample names starting with hyphen (flag injection)
- Block shell metacharacters: semicolons, dollar signs, backticks, pipes, ampersands, redirects, quotes, slashes
- Added comprehensive test suite (15 tests) in test_check_samplesheet.py
- Prevents command injection attacks via malicious sample names in samplesheet

Co-authored-by: FerriolCalvet <38539786+FerriolCalvet@users.noreply.github.com>
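The rules in this commit message amount to an allow-list, which can be sketched as below. This is an assumption about how such a check might look, not the code in check_samplesheet.py: the name `is_safe_sample_name` and the exact pattern are hypothetical.

```python
import re

# Allow-list: letters, digits, underscores, dots and hyphens only.
# The first character may not be a hyphen, so a name cannot be read as a CLI flag.
_SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9_.][A-Za-z0-9_.-]*")


def is_safe_sample_name(name):
    """Return True if `name` contains only allowed characters and does not start with '-'."""
    # fullmatch (rather than match with a trailing '$') also rejects a trailing newline
    return _SAMPLE_NAME_RE.fullmatch(name) is not None
```

Because the pattern lists what is allowed, every shell metacharacter mentioned above (`;`, `$`, backticks, `|`, `&`, redirects, quotes, slashes) is rejected without being enumerated, and so is any character nobody thought to block.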
FerriolCalvet left a comment
all looks good and is tested!
The plotting pipeline required all four selection methods (omega, oncodrive3d, oncodrivefml, indels) to be enabled and was restricted to homo_sapiens only. If any methods were disabled/missing or a different species was used, no plots were generated. Additionally, sample names from the samplesheet were not validated, creating a potential shell injection vulnerability.
Changes
Workflow condition (workflows/deepcsa.nf)
- Plotting now runs when any selection method is enabled: params.omega || params.oncodrive3d || params.oncodrivefml || params.indels
- Removed the vep_species == 'homo_sapiens' check - plotting now works for all species
DNA2PROTEINMAPPING (workflows/deepcsa.nf)
- Removed the homo_sapiens check - DNA2PROTEINMAPPING now runs for all species
Dynamic track selection (bin/plot_selectionsideplots.py)
- Checks file existence before loading data (os.path.exists() for all data files)
- Handles the case with no tracks in plot_all_positive_selection()
Security: Shell injection protection (bin/check_samplesheet.py)
- Blocks shell metacharacters: ;, $, `, |, &, >, <, (, ), /, quotes, etc.
- Rejects sample names starting with -
Unit tests
- bin/test_plot_selectionsideplots.py: 15+ test cases for plotting functionality (run with python3 bin/test_plot_selectionsideplots.py)
- bin/test_check_samplesheet.py: 15 security test cases for samplesheet validation (run with python3 bin/test_check_samplesheet.py)
Bug fix
- omega_missense_genes was incorrectly referencing omega_truncating["GENE"]
Species-Specific Analysis
Restrictions Removed:
Restrictions Kept (valid reasons):
Behavior
Security Impact
- Malicious sample names are rejected (e.g. sample;rm -rf /)
Backward compatible - existing pipelines with all methods enabled and valid sample names work identically.