automating DCC circular RNA detection
-
dcc_starter.pl runs the whole DCC pipeline for one sample (two read files and a sample name); the matrix has to be created manually after the run has finished.
-
auto_automaker.pl runs DCC once for every line in its input file, then summarizes the results into a matrix for each group and for all samples.
-
or let find_circ/godfather.pl handle it all; see find_circ/godfather.pl for details
- shows the DCC commands that have been automated with this tool. you can change them if you prefer other filtering parameters
- can be ignored otherwise
- STAR Aligner installed
- DCC installed
- mapping files for annotation (see auto_find_circ for links and instructions; you will also need the STAR hg19 reference genome and the annotation in .gtf format)
- two read files for each sample, plus a sample name and a group name for auto_automaker.pl
- copy the samplesheet into the parent dir
- go into the dir where these scripts are
- perl auto_automaker.pl samplesheet (auto_automaker looks for the samplesheet in the parent directory, where all the infiles need to be as well)
- all dirs and the files in them will be created in the parent directory of the scripts; the dir containing the scripts will have no additional files after execution.
- the two input .fastq files will be deleted after the run has finished, so keep a copy somewhere else (auto_find_circ does not delete them, so maybe run that first)
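Because the input .fastq files are deleted after a run, it can help to copy them somewhere safe first. The following is only a minimal sketch of such a backup step and not part of the pipeline: the function name backup_fastqs and the backup directory are hypothetical, and it assumes that the samplesheet columns are separated by whitespace and that the read files sit next to the samplesheet.

```python
import shutil
from pathlib import Path


def backup_fastqs(samplesheet: Path, backup_dir: Path) -> list[Path]:
    """Copy the two read files named on each samplesheet line into backup_dir.

    Assumptions (not confirmed by the pipeline scripts): columns are
    whitespace-separated in the order read file 1, read file 2, sample name,
    group, and the read files are in the same directory as the samplesheet.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for line in samplesheet.read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        for name in fields[:2]:  # the first two columns are the read files
            target = backup_dir / name
            shutil.copy2(samplesheet.parent / name, target)
            copied.append(target)
    return copied


# Example call (paths are placeholders):
# backup_fastqs(Path("../infiles_for_auto_automaker.txt"), Path("../fastq_backup"))
```

Run this before starting auto_automaker.pl, since the originals are gone once the run has finished.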
~ head infiles_for_auto_automaker.txt
lineonefile1 linetwofile1 samplename1 group1
lineonefile2 linetwofile2 samplename2 group1
for each group, auto_automaker makes a directory named after the group. all resulting .csv files of that group are copied into it and concatenated into one big .csv file; matrixmaker.pl is then run with this file as input, followed by matrixtwo.pl.
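To check a samplesheet before a long run, the four-column layout and the grouping can be sketched as below. This is an illustration only, not code from auto_automaker.pl: parse_samplesheet and samples_by_group are hypothetical helper names, and splitting on any whitespace is an assumption about the column separator.

```python
from collections import defaultdict


def parse_samplesheet(text: str) -> list[tuple[str, ...]]:
    """Return one (read1, read2, samplename, group) tuple per non-empty line."""
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # ignore empty lines
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        rows.append(tuple(fields))
    return rows


def samples_by_group(rows: list[tuple[str, ...]]) -> dict[str, list[str]]:
    """Map each group name to the sample names that belong to it."""
    groups = defaultdict(list)
    for _read1, _read2, sample, group in rows:
        groups[group].append(sample)
    return dict(groups)
```

For the two example lines above, samples_by_group returns {"group1": ["samplename1", "samplename2"]}, so both samples would end up in the same group directory.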
expect about 20 minutes of runtime per sample with 10 CPUs
- for each sample, it creates a directory that holds all output files from DCC and the three(!) STAR alignments
- additionally, if you use groups, it copies the .tsv files from dcc_outreader.pl into the group directory and creates a matrixmaker.pl matrix
- all processed.csv files are also copied into the Day_Month dir created in the parent dir, and two matrices are created that contain information from all samples in the samplesheet