Skip to content

TrabucchiLab/miRBShunter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 

Repository files navigation

miRBShunter

A pipeline to identify miRNA binding sites from Ago2-CLIP data by de novo motif finding.

Please always use the last release!

*** Requirements. ***

Required Python libraries are:

  • Bio.Seq

  • wx (for the GUI)

Required programs installed and available in the path:

Genome sequences in fasta files are required as well. You can download them from UCSC web site or you can find them in Homer folder.

*** Installation. ***

Just simply untar the package in any destination folder:

     tar zxvf miRBShunter_v1.0.tar ./

You'll find four folders:

./miRBShunter_scripts: contains all scripts included in the pipeline

./example_data: contains input files needed to run the pipeline

./example_temp and ./example_output: folders to be used to run the pipeline on the example files

*** Using the miRBShunter pipeline. ***

miRBShunter could be run either via command line (shell) or using the user-friendly GUI. To run the GUI easily type:

     python miRBShunter_GUI.py

Before running the GUI, the users may want to check if the library wx is already installed.

Or you can use the command line script miRBShunter_script.py.

Input files:

peaks bed file: Coordinates of peaks found by peak calling tool. The file must contain 6 columns (chr, st, end, pval, reads, strand).

miRNA sequences file: a file with the most expressed miRNA in fasta format (DNA encoded mandatory). Alphabet to use: A,C,G,T.

Examples files are provided in the example_data folder.

     usage: python ./miRBShunter_script.py [options] [peaksbedfile] [mirnafastafile]

Options:

      -h, --help            show this help message and exit

      -o OUTDIR, --outdir=OUTDIR
                            directory for output files. Default is current directory.
      
      -t TEMPDIR, --tempdir=TEMPDIR
                            directory for temporary files. Default is current directory.
                            
      -d GENOMEFASTADIR, --genomefastadir=GENOMEFASTADIR
                            directory with genome fasta file. Default is current directory.
                            
      -b FASTABG, --fastaBG=FASTABG
                            file with background sequences in fasta format. If file is not provided, sequence scrumble will be done. Check Homer parameters for more details.
      
      -p PVALUE, --pvalue=PVALUE
                            pvalue threshold for peaks selection. Default is 0.001.
                            
      -r READS, --reads=READS
                            minimum number of reads for peaks selection. Default is 5.
                            
      -g GENOME, --genome=GENOME
                            Genome of the organism, either mm10 or hg19. Default mm10.
      
      -f PREFIX, --prefix=PREFIX
                            Prefix name for output files. Default is test.

Example procedure:

      python ./miRBShunter_script.py -o example_output/ -t example_temp/ -p 0.1 -r 2 -d /home/silvia/programs/homer/data/genomes/mm10/ -g mm10 -f test -b ./example_data/scrambleBg.fasta ./example_data/example_file_peaks.bed ./example_data/mirna_fasta_seq.txt

Output files:

*.anno: peaks annotated with Homer

*.annstats: statistics on annotated peaks

meme_motif: folder containing the motif selected with the combined score in meme format

results_table.txt: A table containing the information about the heteroduplex and the duplex score. The table can be loaded on excel.

For questions/comments please drop us an email: silvia.bottini@unice.fr

About

A pipeline to identify miRNA binding sites from Ago2-CLIP data by de novo motif finding

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published