Add rnaseqc container. cleanup rseqc section

also cleanup HTML report and added precommit
cokelaer · Dec 13, 2023 · 84ee640
1 parent c980933
commit 84ee640

Showing 9 changed files with 263 additions and 243 deletions.
diff --git a/README.rst b/README.rst
@@ -8,7 +8,7 @@
     :alt: JOSS (journal of open source software) DOI
 
 .. image:: https://github.com/sequana/rnaseq/actions/workflows/main.yml/badge.svg
-   :target: https://github.com/sequana/rnaseq/actions/workflows/main.yaml 
+   :target: https://github.com/sequana/rnaseq/actions/workflows/main.yaml
 
 
 
@@ -17,7 +17,7 @@ This is is the **RNA-seq** pipeline from the `Sequana <https://sequana.readthedo
 :Overview: RNASeq analysis from raw data to feature counts
 :Input: A set of Fastq Files and genome reference and annotation.
 :Output: MultiQC and HTML reports, BAM and bigwig files, feature Counts, script to launch differential analysis
-:Status: Production. 
+:Status: Production.
 :Citation(sequana): Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
 :Citation(pipeline):
     .. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4047837.svg
@@ -40,13 +40,13 @@ Usage
     sequana_rnaseq --help
     sequana_rnaseq --input-directory DATAPATH --genome-directory genome --aligner star
 
-This creates a directory with the pipeline and configuration file. You will then need 
+This creates a directory with the pipeline and configuration file. You will then need
 to execute the pipeline::
 
     cd rnaseq
     sh rnaseq.sh  # for a local run
 
-This launch a snakemake pipeline. If you are familiar with snakemake, you can 
+This launch a snakemake pipeline. If you are familiar with snakemake, you can
 retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::
 
     snakemake -s rnaseq.rules -c config.yaml --cores 4 --stats stats.txt
@@ -80,7 +80,7 @@ Or use the conda.yaml file available in this repository. If you start a new
 environment from scratch, those commands will create the environment and install
 all dependencies for you::
 
-    conda create --name sequana_env python 3.7.3 
+    conda create --name sequana_env python 3.7.3
     conda activate sequana_env
     conda install -c anaconda qt pyqt>5
     pip install sequana
@@ -100,22 +100,22 @@ To use apptainer, initialise the pipeline with the --use-singularity option and
 Details
 ~~~~~~~~~
 
-This pipeline runs a **RNA-seq** analysis of sequencing data. It runs in 
-parallel on a set of input FastQ files (paired or not). 
+This pipeline runs a **RNA-seq** analysis of sequencing data. It runs in
+parallel on a set of input FastQ files (paired or not).
 A brief HTML report is produced together with a MultiQC report.
 
 This pipeline is complex and requires some expertise for the interpretation.
-Many online-resources are available and should help you deciphering the output. 
+Many online-resources are available and should help you deciphering the output.
 
 Yet, it should be quite straigtforward to execute it as shown above. The
-pipeline uses bowtie1 to look for ribosomal contamination (rRNA). Then, 
+pipeline uses bowtie1 to look for ribosomal contamination (rRNA). Then,
 it cleans  the data with cutapdat if you say so (your data may already be
-pre-processed). If no adapters are provided (default), reads are 
-trimmed for low quality bases only. Then, mapping is performed with standard mappers such as 
+pre-processed). If no adapters are provided (default), reads are
+trimmed for low quality bases only. Then, mapping is performed with standard mappers such as
 star or bowtie2 (--aligner option). Finally,
 feature counts are extracted from the previously generated BAM files. We guess
 the strand and save the feature counts into the directoy
-./rnadiff/feature_counts. 
+./rnadiff/feature_counts.
 
 The pipelines stops there. However, RNA-seq analysis are followed by a different
 analysis (DGE hereafter). Although the DGE is not part of the pipeline, you can
@@ -138,7 +138,7 @@ Rules and configuration details
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_rnaseq/main/sequana_pipelines/rnaseq/config.yaml>`_
-to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file. 
+to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
 
 
 .. warning:: the RNAseQC rule is switch off and is not currently functional in
@@ -158,6 +158,9 @@ Changelog
 ========= ====================================================================
 Version   Description
 ========= ====================================================================
+0.19.1    * add rnaseqc container.
+          * Update rseqc rules (redirection)
+          * cleanup onsuccess rule
 0.19.0    * Refactorisation to use click
 0.18.1    * fastp multiqc regression. Fixed missing sample names by updating
             multiqc_config and adding sample names in the output filename
@@ -166,28 +169,28 @@ Version   Description
           * BUG: Fix missing params (options) in star_mapping rule not taken
             into account
 0.17.1    * use new rulegraph / graphviz apptainer
-0.17.0    * fastp step changed to use sequana-wrappers. Slight change in 
+0.17.0    * fastp step changed to use sequana-wrappers. Slight change in
             config file. The reverse and forward adapter options called
             rev and fwd have been dropped in favor of a single adapters option.
-            v0.17.0 config and schema are not compatible with previous 
+            v0.17.0 config and schema are not compatible with previous
             versions.
           * Update singularity containers and add new one for fastp
-0.16.1    * fix bug in feature counts automatic strand balance detection. Was 
+0.16.1    * fix bug in feature counts automatic strand balance detection. Was
             always using the stranded case (2).
           * add singularity workflow for testing
           * fix documentation in config.yaml
-0.16.0    * star, salmon, bam_coverage are now in sequana wrappers, updated 
+0.16.0    * star, salmon, bam_coverage are now in sequana wrappers, updated
             the pipeline accordingly
-          * updated config file and schema to include resources inside the 
+          * updated config file and schema to include resources inside the
             config file (so as to use new --profile option)
           * set singularity images in all rules
-          * star wrappers has changed significantly to use star 
+          * star wrappers has changed significantly to use star
             recommandation. To keep using previous way, a legacy option
             is available and set to True in this version.
           * bamCoverage renamed in bam_coverage in the config file
           * multiqc_config removed redundant information and ordered
             the output in a coherent way (QC and then analysis)
-0.15.2    * Fix bowtie2 rule to use new wrappers. Use wrappers in 
+0.15.2    * Fix bowtie2 rule to use new wrappers. Use wrappers in
             add_read_group and mark_duplicates
 0.15.1    * Adapt to new bowtie2 align wrapper
 0.15.0    * fix typo reported in https://github.com/sequana/rnaseq/issues/12
@@ -199,7 +202,7 @@ Version   Description
             same genome directory.
           * Ribosomal is now estimated on the first 100,000 reads to speed up
             analysis
-          * --indexing and --force-indexing  options not required anymore. 
+          * --indexing and --force-indexing  options not required anymore.
             Indexing will be done automatically and not redone if present.
           * Use of the new sequana-wrappers repository
 0.13.0    * Major update to use the new sequana version and the RNADiff tools.
@@ -210,7 +213,7 @@ Version   Description
           * user interface has now a --skip-gff-check option. Better handling of
             input gff with more meaningful messages
           * integration of rseqc tool
-0.12.1    * indexing was always set to True in the config after 0.9.16 update. 
+0.12.1    * indexing was always set to True in the config after 0.9.16 update.
 0.12.0    * BUG fix: Switch mark_duplicates correctly beore feature counts
 0.11.0    * rnadiff one factor is simplified
           * When initiating the pipeline, provide information about the GFF
@@ -226,7 +229,7 @@ Version   Description
             created and used
           * fix the --do-igvtools and --do-bam-coverage with better doc
 0.10.0    * 9/12/2020
-          * Fixed bug in sequana/star_indexing for small genomes (v0.9.7). 
+          * Fixed bug in sequana/star_indexing for small genomes (v0.9.7).
             Changed the rnaseq requirements to benefit from this bug-fix that
             could lead to seg fault with star aligner for small genomes.
           * Report improved with strand guess and plot
@@ -235,32 +238,32 @@ Version   Description
           * In config file, bowtie section 'do' option is removed. This is now
             set automatically if rRNA_feature or rRNA_file is provided. This
             allows us to skip the rRNA mapping entirely if needed.
-          * fastq_screen should be functional. Default behaviour is off. If 
+          * fastq_screen should be functional. Default behaviour is off. If
             set only phiX174 will be search for. Users should build their own
             configuration file.
-          * star/bowtie1/bowtie2 have now their own sub-directories in the 
-            genome directory. 
+          * star/bowtie1/bowtie2 have now their own sub-directories in the
+            genome directory.
           * added --run option to start pipeline automatically (if you know
             what you are doing)
           * rnadiff option has now a default value (one_factor)
           * add strandness plot in the HTML summary page
-0.9.19    * Remove the try/except around tolerance (guess of strandness) to 
+0.9.19    * Remove the try/except around tolerance (guess of strandness) to
             make sure this is provided by the user. Final onsuccess benefits
             from faster GFF function (sequana 0.9.4)
-0.9.18    * Fix typo (regression bug) + add tolerance in schema + generic 
+0.9.18    * Fix typo (regression bug) + add tolerance in schema + generic
             title in multiqc_config. (oct 2020)
 0.9.17    * add the *tolerance* parameter in the feature_counts rule as a user
-            parameter (config and pipeline). 
-0.9.16    * Best feature_counts is now saved into rnadiff/feature_counts 
+            parameter (config and pipeline).
+0.9.16    * Best feature_counts is now saved into rnadiff/feature_counts
             directory and rnadiff scripts have been updated accordingly
           * the most probable feature count option is now computed more
             effectivily and incorporated inside the Snakemake pipeline (not in
-            the onsuccess) so that multiqc picks the best one (not the 3 
+            the onsuccess) so that multiqc picks the best one (not the 3
             results)
           * the target.txt file can be generated inside the pipeline if user
             fill the rnadiff/conditions section in the config file
           * indexing options are filled automatically when calling
-            sequana_rnaseq based on the presence/absence of the index 
+            sequana_rnaseq based on the presence/absence of the index
             of the aligner being used.
           * salmon now integrated and feature counts created (still WIP in
             sequana)
@@ -283,13 +286,13 @@ Version   Description
             analysis
 0.9.11    * Automatic guessing of the strandness of the experiment
 0.9.10    * Fix multiqc for RNAseQC rule
-0.9.9     * Fix RNAseQC rule, which is now available. 
+0.9.9     * Fix RNAseQC rule, which is now available.
           * Fix ability to use existing rRNA file as input
 0.9.8     * Fix indexing for bowtie1 to not be done if aligner is different
           * add new options: --feature-counts-options and --do-rnaseq-qc,
             --rRNA-feature
           * Based on the input GFF, we now check the validity of the rRNA
-            feature and feature counts options to check whether the feature 
+            feature and feature counts options to check whether the feature
             exists in the GFF
           * schema is now used to check the config file values
           * add a data test for testing and documentation
@@ -298,25 +301,25 @@ Version   Description
           * Possiblity to switch off cutadapt section
           * Fixing bowtie2 rule in sequana and update the pipeline accordingly
           * Include a schema file
-          * output-directory parameter renamed into output_directory (multiqc 
+          * output-directory parameter renamed into output_directory (multiqc
             section)
           * handle stdout correctly in fastqc, bowtie1, bowtie2 rules
 0.9.5     * Fixed https://github.com/sequana/sequana/issues/571
           * More cutadapt commands and sanity checks
           * Fixed bowtie2 options import in rnaseq.rules
-0.9.4  
-0.9.3     if a fastq_screen.conf is provided, we switch the fastqc_screen 
+0.9.4
+0.9.3     if a fastq_screen.conf is provided, we switch the fastqc_screen
           section ON automatically
 0.9.0     **Major refactorisation.**
 
-          * remove sartools, kraken rules. 
+          * remove sartools, kraken rules.
           * Indexing is now optional and can be set in the configuration.
           * Configuration file is simplified  with a general section to enter
-            the genome location and aligner. 
+            the genome location and aligner.
           * Fixed rules in  sequana (0.8.0) that were not up-to-date with
             several executables used in the  pipeline including picard,
             fastq_screen, etc. See Sequana Changelog for details with respect
-            to rules changes. 
-          * Copying the feature counts in main directory  ready to use for 
+            to rules changes.
+          * Copying the feature counts in main directory  ready to use for
             a differential analysis.
 ========= ====================================================================
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry]
 name = "sequana-rnaseq"
-version = "0.19.0"
+version = "0.19.1"
 description = "A RNAseq pipeline from raw reads to feature counts"
 authors = ["Sequana Team"]
 license = "BSD-3"

diff --git a/sequana_pipelines/rnaseq/cluster_config.json b/sequana_pipelines/rnaseq/cluster_config.json
diff --git a/sequana_pipelines/rnaseq/config.yaml b/sequana_pipelines/rnaseq/config.yaml
@@ -26,7 +26,7 @@ apptainers:
   igvtools: "https://zenodo.org/record/7022635/files/igvtools_2.12.0.img"
   graphviz: "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
   multiqc: "https://zenodo.org/record/10205070/files/multiqc_1.16.0.img"
-
+  rnaseqc: "https://zenodo.org/record/5799564/files/rnaseqc_2.35.0.img"
 
 # =========================================== Sections for the users
 
@@ -370,6 +370,8 @@ rnaseqc:
     do: false
     gtf_file:
     options: --coverage
+    resources:
+        mem: 8G
 
 
 # if be_file not provided, try to create one on the fly