DrjBreakpointFinder is a pipeline adapted from the software Cassis to discover Direct Repeat Junctions (DRJs) of proviral segments and to precisely localize the breakpoint, or excision site, inside the DRJ sequences.
DrjBreakpointFinder is a Genscale and BIPAA tool, developed by:
- Stéphanie Robin
- Claire Lemaitre
git clone git@github.com:stephanierobin/DrjBreakpointFinder.git
- perl
- R
- bash
- BLAST
- clustalw
- emboss
To activate a conda environment:
conda activate myenv
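Once the environment is active, it can help to confirm that the dependencies are reachable on the PATH before launching the pipeline. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and the executable names (`Rscript`, `blastn`, `clustalw`, and `needle` as a representative EMBOSS program) are assumptions about a typical installation, not names taken from pipeline.sh. Adjust them to match your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of DrjBreakpointFinder): report which of the
# given commands cannot be found on the PATH. Returns 0 only if all are found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The executable names below are assumptions; adapt them to your installation.
if check_tools perl Rscript bash blastn clustalw needle; then
    echo "all tools found"
else
    echo "some tools are missing (see messages above)"
fi
```

The helper only tests whether each command resolves; it does not check versions.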
sh pipeline.sh -r reads.fa -g genome.fa -i input_directory -o output_directory
Example with the small test dataset:
sh pipeline.sh -r test/reads.fasta -g test/genome.fasta -i test -o test/drj
- reads.fasta: a FASTA file containing reads from virus sequencing (circular form of the proviral segments).
- genome.fasta: a FASTA file containing the host genome sequence.
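A quick sanity check on the two input files can catch an empty or misnamed FASTA file before a long run. The following is a minimal sketch: `count_fasta_records` is a hypothetical helper that counts header lines (lines starting with `>`), and the paths are those of the test dataset used in the example command.

```shell
#!/bin/sh
# Hypothetical helper (not part of DrjBreakpointFinder): count the records in
# a FASTA file by counting its header lines (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Paths taken from the test dataset example; replace with your own inputs.
for f in test/reads.fasta test/genome.fasta; do
    if [ -s "$f" ]; then
        echo "$f: $(count_fasta_records "$f") sequence(s)"
    else
        echo "$f: missing or empty" >&2
    fi
done
```

This only counts headers; it does not validate the sequence content.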
An output directory containing:
- Seven subdirectories:
  - blast: Megablast results
  - readLength: table containing the lengths of reads
  - breakpoint: mismatch vectors, for each read-bac triplet
  - drjPairs_alignments: mismatch vectors, for each pair of DRJs
  - drjPairs_all_segments: intermediate results
  - drjPairs_figures: figures summarizing results, for each pair of DRJs
  - drjPairs_merged_segments: intermediate results
- A file, drjPairs_confirmed.tab, containing the coordinates of the confirmed DRJ pairs and the number of reads that allowed identification of each pair.
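After a run, the layout described above can be checked with a few lines of shell. This is a sketch under the assumption that drjPairs_confirmed.tab sits at the top level of the output directory alongside the seven subdirectories; `check_drj_output` is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not part of DrjBreakpointFinder): verify that an output
# directory contains the seven subdirectories and the summary file.
check_drj_output() {
    outdir=$1
    status=0
    for sub in blast readLength breakpoint drjPairs_alignments \
               drjPairs_all_segments drjPairs_figures drjPairs_merged_segments; do
        if [ ! -d "$outdir/$sub" ]; then
            echo "missing subdirectory: $outdir/$sub" >&2
            status=1
        fi
    done
    if [ ! -f "$outdir/drjPairs_confirmed.tab" ]; then
        echo "missing file: $outdir/drjPairs_confirmed.tab" >&2
        status=1
    fi
    return "$status"
}

# test/drj is the output directory used in the example run.
if check_drj_output test/drj; then
    echo "output layout looks complete"
fi
```

A passing check only confirms that the expected entries exist; it says nothing about whether any DRJ pairs were confirmed.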
If you use DrjBreakpointFinder, please cite:
Legeai F., Santos B.F., Robin S., Bretaudeau A., Dikow R.B., Lemaitre C., Jouan V., Ravallec M., Drezen J-M., Tagu D., Gyapay G., Zhou X., Liu S., Webb B.A., Brady S.G., and Volkoff A-N. 2019. Conserved and specific genomic features of endogenous polydnaviruses revealed by whole genome sequencing of two ichneumonid wasps. Preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/861310v1.
DrjBreakpointFinder is inspired by the software Cassis, in particular for the precise identification of the breakpoint (or excision site) location. For more details on the algorithms, see:
Lemaitre C., Tannier E., Gautier C., and Sagot M.-F. 2008. Precise detection of rearrangement breakpoints in mammalian genomes. BMC Bioinformatics, 9(1):286.
Baudet C., Lemaitre C., Dias Z., Gautier C., Tannier E., and Sagot M.-F. 2010. Cassis: Detection of genomic rearrangement breakpoints. Bioinformatics, 26(15):1897-1898.
To contact a developer, request help, or give feedback on DrjBreakpointFinder, please use the GitHub issue tracker: https://github.com/stephanierobin/DrjBreakpointFinder/issues