circDetector (CD) : circular RNAs detection and annotation

Description:

circDetector (CD) is a computational tool for detecting and annotation of circular RNAs (circRNAs) from Total RNA-Seq data. This tool is implemented with the Snakemake workflow management system allowing reproducible and scalable data analyses. CD was developped to identify circRNAs from reads mapped to the reference genome with the STAR tool (Spliced Transcripts Alignment to a Reference) in Single-End (SE). It consists of identifying reads with a circular junction (chimeric reads) from the chimeric.out.junction files provided by STAR. CD also provides a file reporting all statistics of STAR-SE mapping. Mapping informations were extracted from the STAR file Log.final.out.

Note: The documentation of the CD tool is in progress.

  -r1 R1_INPUT_FILE, --r1_input_file R1_INPUT_FILE
                        R1 input file name
  -r2 R2_INPUT_FILE, --r2_input_file R2_INPUT_FILE
                        R2 input file name
  -min_cr CR_THRESHOLD, --cr_threshold CR_THRESHOLD
                        Minimum number of chimeric reads
  -tol TOLERANCE, --tolerance TOLERANCE
                        Tolerance of different positions to start and end
  -fmt {bed,gtf}, --output_file_format {bed,gtf}
                        Output file format : 0-based (bed) or 1-based (gtf)
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file path
  --verbose             Print more info

Example of a command executed by Snakemake:

python3 scripts/circRNA_detection.py -r1 -r2 -o circ_rnas.bed

Rule mappingstat:

CD provides a file reporting all statistics of STAR-SE mapping. Mapping informations were extracted from the STAR file Log.final.out.

Input: Example of a Log.final.out file:

Started job on |       Oct 26 05:21:11
                            Started mapping on |       Oct 26 05:29:19
                                   Finished on |       Oct 26 05:52:39
      Mapping speed, Million of reads per hour |       121.89

                         Number of input reads |       47400458
                     Average input read length |       150
                                   UNIQUE READS:
                  Uniquely mapped reads number |       33941471
                       Uniquely mapped reads % |       71.61%
                         Average mapped length |       149.32
                      Number of splices: Total |       8294615
           Number of splices: Annotated (sjdb) |       7882009
                      Number of splices: GT/AG |       8204363
                      Number of splices: GC/AG |       58632
                      Number of splices: AT/AC |       5362
              Number of splices: Non-canonical |       26258
                     Mismatch rate per base, % |       0.57%
                        Deletion rate per base |       0.02%
                       Deletion average length |       2.11
                       Insertion rate per base |       0.02%
                      Insertion average length |       1.74
                            MULTI-MAPPING READS:
       Number of reads mapped to multiple loci |       3381777
            % of reads mapped to multiple loci |       7.13%
       Number of reads mapped to too many loci |       28860
            % of reads mapped to too many loci |       0.06%
                                 UNMAPPED READS:
 Number of reads unmapped: too many mismatches |       0
      % of reads unmapped: too many mismatches |       0.00%
           Number of reads unmapped: too short |       9943085
                % of reads unmapped: too short |       20.98%
               Number of reads unmapped: other |       18180
                    % of reads unmapped: other |       0.04%
                                 CHIMERIC READS:
                      Number of chimeric reads |       129065
                           % of chimeric reads |       0.27%

Output: reports/mapping_stat.tsv is a tabulated file containing all the informations of the input file (column) for each sample (row).

Rule annotation:

-circ CIRC_RNA_FILE, --circ_rna_file CIRC_RNA_FILE
                        Circular RNA file path
  -fmt {bed,gtf}, --annot_format {bed,gtf}
                        Annotation file format
  -annot ANNOTATION_FILE, --annotation_file ANNOTATION_FILE
                        Annotation file path
  -oe EXONIC_OUTPUT_FILE, --exonic_output_file EXONIC_OUTPUT_FILE
                        Exonic output file path
  -oi INTRONIC_OUTPUT_FILE, --intronic_output_file INTRONIC_OUTPUT_FILE
                        Intronic output file path
  -o OUTPUT, --output OUTPUT
                        Annotation output file path
  --verbose             Print more info

Example of a command executed by Snakemake:

python3 scripts/circRNA_annotation.py -circ circ_rnas.bed -annot -o annotation_circRNAs.tsv

Note: Manual post-processing of data is required to remove short circRNAs.

Example Commands:

Launching the pipeline locally:

snakemake -p --jobs 8

Launching the pipeline on the Genotoul cluster:

sbatch circ_rnas.sh

Testing the detection rule locally for a single sample:

snakemake pig_testis_neonat-1/auzeville.bed -p --cores 1

Note: Make sure the circrnaenv environment is in this folder.

Contact & Support

For any bug report, feature request, or questions please fill out an issue through circRNA_project's issue page.

Copyright and License

This software is released under GNU General Public License (v3.0).

Name		Name	Last commit message	Last commit date
Latest commit History 255 Commits
biblio		biblio
scripts		scripts
testdata		testdata
AnnotationRules.md		AnnotationRules.md
LICENSE		LICENSE
README.md		README.md
Snakefile		Snakefile
circ_rnas.sh		circ_rnas.sh
config.yaml		config.yaml
metadata.tsv		metadata.tsv
workflow.svg		workflow.svg
workflow_CD.png		workflow_CD.png

Column	Type	Description
1	string	species (bos_taurus, sus_scrofa)
2	string	tissue (testis, liver)
3	string	sex (male, female)
4	string	sample (single, orient)
5	string	sample_unit (sample uniq name)
6	string	animal_name (specific name for each individual)
7	string	mapdir (file path to mapping files)

Column	Type	Description
1	string	sample (concatenation of species, tissue and animal_name)
2	string	sample_unit (sample uniq name)
3	string	species (bos_taurus = cow, sus_scrofa = pig)
4	string	sex (male, female)
5	string	mapdir (file path to mapping files)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

circDetector (CD) : circular RNAs detection and annotation

Description:

Table of contents:

Installation:

Compilation and configuration:

Required tools:

Setting the environment:

Preparing the config file:

Preparing the sample file:

Snakemake worflow:

Rule detection:

Rule mappingstat:

Rule annotation:

Example Commands:

Contact & Support

Copyright and License

About

Releases

Packages

Contributors 3

Languages

License

ccerutti88/circDetector

Folders and files

Latest commit

History

Repository files navigation

circDetector (CD) : circular RNAs detection and annotation

Description:

Table of contents:

Installation:

Compilation and configuration:

Required tools:

Setting the environment:

Preparing the config file:

Preparing the sample file:

Snakemake worflow:

Rule detection:

Rule mappingstat:

Rule annotation:

Example Commands:

Contact & Support

Copyright and License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages