microRNAseq analysis using bcbio for non model organisms #2427
Hi,
Thanks for the questions.
Sadly, for non-model organisms the only analysis that will run is seqcluster, which detects small RNA loci over the genome and quantifies their expression (seqcluster/counts.tsv); you can use that table with DESeq2. You can also visualize this data with https://github.com/lpantano/seqclusterViz: download the repo, open the index.html file, and load the seqclusterViz/seqcluster.db file.
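A minimal Python sketch of a first look at seqcluster/counts.tsv before handing it to DESeq2. It assumes, hypothetically, a tab-separated file with a header row whose first `n_meta` columns are cluster annotations (the first being the cluster id) and whose remaining columns are raw per-sample counts; check the header of your own file and adjust `n_meta`. The toy table and the `min_count`/`min_samples` thresholds are illustrative, not bcbio or seqcluster defaults.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_counts(handle: TextIO, n_meta: int = 3) -> Tuple[List[str], Dict[str, List[int]]]:
    """Read a tab-separated count table with a header row.

    Assumption: the first `n_meta` columns are cluster annotations (the first
    one is the cluster id); every remaining column holds one sample's counts.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[n_meta:]
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        counts[row[0]] = [int(float(value)) for value in row[n_meta:]]
    return samples, counts


def filter_low_counts(counts: Dict[str, List[int]], min_count: int = 10,
                      min_samples: int = 2) -> Dict[str, List[int]]:
    """Keep clusters with >= min_count reads in at least min_samples samples."""
    return {
        cluster: values
        for cluster, values in counts.items()
        if sum(value >= min_count for value in values) >= min_samples
    }


# Toy table; the column names are illustrative only.
toy = io.StringIO(
    "id\tnloci\tann\tDA_1164_01\tDA_1164_02\n"
    "1\t1\t\t120\t95\n"
    "2\t3\t\t0\t4\n"
)
samples, counts = load_counts(toy)
kept = filter_low_counts(counts)
print(samples, sorted(kept))
```

For the real file, open it with `open("counts.tsv", newline="")` and pass the handle to `load_counts`. Keep the counts raw: DESeq2 expects unnormalized integer counts and does its own normalization.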
miRDeep2 won't work on plants; for that we would need a plant-specific prediction tool. I know of one, but it is not integrated. I can work on that, but it will take a month or so. Once trimming happens, miRDeep2 may run successfully and give you some predictions, but I am not sure I would trust them much.
To enable trimming, set `trim_reads: True` under `algorithm` in the YAML file.
If you know of a similar species in miRBase, let me know and I can help you set up the files for it.
Thanks for trying bcbio. I'm happy to help get the trimming working, and to set up the miRBase annotation if you know of a similar species.
Cheers
On Jun 29, 2018, at 12:41 AM, WimSpee wrote:
Hi,
Do you expect that the microRNA-seq analysis capability provided by bcbio would make sense for analyzing microRNA-seq data from non-model organisms?
I am trying to see if I can process the Capsicum annuum microRNA-seq data generated in this project using bcbio:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA177852
I am new to microRNAseq analysis so I am not really sure how to run this analysis and I am also not sure what kind of output I should expect.
The following is the yaml file that I am using:
upload:
  dir: ../final
details:
  - analysis: smallRNA-seq
    algorithm:
      aligner: star # any other aligner is supported.
      # change adapter according project
      # adapters: ["TGGAATTCTCGGGTGC"]
      expression_caller: [ seqcluster, mirdeep2]
      # expression_caller: [trna, seqcluster, mirdeep2, mirge] Read docs to know how to use
      # miRge tools: https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq
      # species: hsa
    genome_build: my_ref
    #resources:
    #  atropos:
    #    options: ["-u 4", "-u -4"]
    #  mirge:
    #    options: ["-lib $PATH_TO_LIBS_FOLDER"]
This is the log file produced by the analysis.
[2018-06-27T18:55Z] grid_controller: System YAML configuration: /workspace/my_user/tmp_bcbio_1.1.0_development/data_dir/galaxy/bcbio_system.yaml
[2018-06-27T18:56Z] grid_controller: Timing: organize samples
[2018-06-27T18:56Z] grid_controller: ipython: organize_samples
[2018-06-27T18:56Z] exeuction_node_20: Using input YAML configuration: /leading_dir/config/DA_1164_samples-merged.yaml
[2018-06-27T18:56Z] exeuction_node_20: Checking sample YAML configuration: /leading_dir/config/DA_1164_samples-merged.yaml
[2018-06-27T18:56Z] exeuction_node_20: Testing minimum versions of installed programs
[2018-06-27T18:56Z] grid_controller: ipython: prepare_sample
[2018-06-27T18:56Z] grid_controller: Timing: adapter trimming
[2018-06-27T18:56Z] grid_controller: ipython: trim_srna_sample
[2018-06-27T19:26Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_01/DA_1164_01.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T20:17Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_02/DA_1164_02.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T21:00Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_03/DA_1164_03.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T21:42Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_04/DA_1164_04.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T22:20Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_05/DA_1164_05.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T22:55Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_06/DA_1164_06.clean.fastq.gz with --min_size 16 --min 1
[2018-06-27T23:31Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_07/DA_1164_07.clean.fastq.gz with --min_size 16 --min 1
[2018-06-28T00:28Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_08/DA_1164_08.clean.fastq.gz with --min_size 16 --min 1
[2018-06-28T01:11Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_09/DA_1164_09.clean.fastq.gz with --min_size 16 --min 1
[2018-06-28T01:54Z] exeuction_node_20: Collapsing /leading_dir/work/trimmed/DA_1164_10/DA_1164_10.clean.fastq.gz with --min_size 16 --min 1
[2018-06-28T02:13Z] grid_controller: Timing: prepare
[2018-06-28T02:13Z] grid_controller: ipython: seqcluster_prepare
[2018-06-28T03:05Z] exeuction_node_24: Prepare seqs.fastq with -minl 17 -maxl 40 -minc 2 --min_shared 0.1
[2018-06-28T03:08Z] grid_controller: Timing: alignment
[2018-06-28T03:08Z] grid_controller: ipython: srna_alignment
[2018-06-28T03:08Z] exeuction_node_24: Aligning lane DA_1164_01 with star aligner
[2018-06-28T03:11Z] exeuction_node_24: mirdeep2 Rfam file not instaled. Skipping...
[2018-06-28T03:11Z] grid_controller: Timing: small RNA annotation
[2018-06-28T03:11Z] grid_controller: ipython: srna_annotation
[2018-06-28T03:12Z] grid_controller: Timing: cluster
[2018-06-28T03:12Z] grid_controller: ipython: seqcluster_cluster
[2018-06-28T04:59Z] grid_controller: Timing: quality control
[2018-06-28T04:59Z] grid_controller: ipython: pipeline_summary
[2018-06-28T04:59Z] exeuction_node_20: QC: DA_1164_01 fastqc
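The "Collapsing ..." steps in the log reduce each trimmed FASTQ to unique sequences with their read counts. A toy Python sketch of that idea follows; mapping `--min_size` to a minimum read length and `--min` to a minimum abundance is an assumption about the flag semantics, and `collapse_reads` is a hypothetical name, not seqcluster's API.

```python
from collections import Counter
from typing import Dict, Iterable


def collapse_reads(reads: Iterable[str], min_size: int = 16, min_count: int = 1) -> Dict[str, int]:
    """Collapse identical reads into unique sequences with their abundance.

    Assumed meaning of the logged flags (hypothetical):
      min_size  ~ --min_size: shortest read length kept
      min_count ~ --min:      lowest abundance kept for a unique sequence
    """
    counts = Counter(read for read in reads if len(read) >= min_size)
    return {seq: n for seq, n in counts.items() if n >= min_count}


reads = [
    "TGACAGAAGAGAGTGAGCAC",   # 20 nt, seen twice
    "TGACAGAAGAGAGTGAGCAC",
    "ACGTACGT",               # 8 nt, shorter than min_size
    "TTCCACAGCTTTCTTGAACTG",  # 21 nt, seen once
]
print(collapse_reads(reads))
```

Collapsing first means every later step (alignment, clustering) works on unique sequences instead of millions of redundant reads.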
I am not sure how to specify that dnapi should be run for de novo adapter detection followed by adapter trimming. As far as I can tell, dnapi was not used for adapter trimming. The FastQC part of the MultiQC report shows that, in the 50 bp reads, the last 25 bp are almost 100% adapter sequence.
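When the small RNA insert is shorter than the read, sequencing runs on into the 3' adapter, which is what the FastQC adapter content shows. A simplified, exact-match Python sketch of 3' adapter removal follows. The adapter used here is the commented-out value from the YAML above and is only an assumption about this dataset; real trimmers such as atropos (listed in the commented resources section) or cutadapt also tolerate sequencing errors, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    1. If the whole adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends with the first k adapter bases
       (k >= min_overlap), the adapter ran off the end of the read: cut them.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    lowest = max(min_overlap, 1)  # never test an empty prefix
    # Try the longest partial overlap first.
    for k in range(min(len(adapter) - 1, len(read)), lowest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "TGGAATTCTCGGGTGC"     # assumption: the commented-out YAML value
insert = "TGACAGAAGAGAGTGAGCAC"  # toy 20 nt small RNA
print(trim_3prime_adapter(insert + ADAPTER + "CCAAGG", ADAPTER))  # full adapter
print(trim_3prime_adapter(insert + "TGGAAT", ADAPTER))            # partial adapter at the end
```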
As far as I can tell, Capsicum annuum is not in miRBase, so I did not enter a three-letter species code. I am not sure whether it makes sense to enter the species code of a somewhat related species
http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=sly
or whether I should just not provide a species code.
The analysis did not seem to produce many results (see the file list at the bottom of this comment), but then again I am not sure what to expect.
The lack of output might partly be because the mirdeep2 Rfam file was not installed/found. Should I have installed it myself?
[2018-06-28T03:08Z] exeuction_node_24: Aligning lane DA_1164_01 with star aligner
[2018-06-28T03:11Z] exeuction_node_24: mirdeep2 Rfam file not instaled. Skipping...
What I would expect as output from a microRNA-seq analysis is:
- identification/filtering of known or discovered non-microRNA sequences (either biological, e.g. other RNAs, or adapters)
- identification of known microRNA sequences from miRBase or a similar database
- per-sample BAM files with alignments of the microRNA sequences (I am not sure whether these should be against the genome, the transcriptome, or both, nor whether these alignments identify target loci/mRNAs, microRNA precursor loci/mRNAs, or both)
- microRNA target mRNA/gene prediction
- microRNA quantification
Do you think it is possible to get the above results using bcbio for microRNAseq data of a non model organism?
How would I then do that using bcbio? Is the yaml that I use correct? Should I add tRNA as an expression caller?
Since I am new to microRNA-seq, the bcbio microRNA-seq documentation is also a bit brief for me.
I would also very much appreciate it if you could point me to a recent best-practice method or review paper that describes the method(s) bcbio generally aims to provide for microRNA-seq analysis.
Thank you very much!
final/
final/DA_1164_05
final/DA_1164_05/qc
final/DA_1164_05/qc/fastqc
final/DA_1164_05/qc/fastqc/fastqc_report.html
final/DA_1164_05/qc/fastqc/fastqc_data.txt
final/DA_1164_05/qc/fastqc/DA_1164_05.zip
final/DA_1164_05/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_05/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_05/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_05/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_05/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_05/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_05/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_05/DA_1164_05-ready.trimming_stats
final/DA_1164_04
final/DA_1164_04/qc
final/DA_1164_04/qc/fastqc
final/DA_1164_04/qc/fastqc/fastqc_report.html
final/DA_1164_04/qc/fastqc/fastqc_data.txt
final/DA_1164_04/qc/fastqc/DA_1164_04.zip
final/DA_1164_04/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_04/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_04/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_04/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_04/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_04/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_04/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_04/DA_1164_04-ready.trimming_stats
final/DA_1164_09
final/DA_1164_09/qc
final/DA_1164_09/qc/fastqc
final/DA_1164_09/qc/fastqc/fastqc_report.html
final/DA_1164_09/qc/fastqc/fastqc_data.txt
final/DA_1164_09/qc/fastqc/DA_1164_09.zip
final/DA_1164_09/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_09/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_09/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_09/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_09/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_09/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_09/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_09/DA_1164_09-ready.trimming_stats
final/DA_1164_08
final/DA_1164_08/qc
final/DA_1164_08/qc/fastqc
final/DA_1164_08/qc/fastqc/fastqc_report.html
final/DA_1164_08/qc/fastqc/fastqc_data.txt
final/DA_1164_08/qc/fastqc/DA_1164_08.zip
final/DA_1164_08/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_08/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_08/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_08/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_08/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_08/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_08/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_08/DA_1164_08-ready.trimming_stats
final/DA_1164_07
final/DA_1164_07/qc
final/DA_1164_07/qc/fastqc
final/DA_1164_07/qc/fastqc/fastqc_report.html
final/DA_1164_07/qc/fastqc/fastqc_data.txt
final/DA_1164_07/qc/fastqc/DA_1164_07.zip
final/DA_1164_07/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_07/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_07/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_07/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_07/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_07/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_07/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_07/DA_1164_07-ready.trimming_stats
final/DA_1164_06
final/DA_1164_06/qc
final/DA_1164_06/qc/fastqc
final/DA_1164_06/qc/fastqc/fastqc_report.html
final/DA_1164_06/qc/fastqc/fastqc_data.txt
final/DA_1164_06/qc/fastqc/DA_1164_06.zip
final/DA_1164_06/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_06/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_06/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_06/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_06/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_06/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_06/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_06/DA_1164_06-ready.trimming_stats
final/DA_1164_01
final/DA_1164_01/qc
final/DA_1164_01/qc/fastqc
final/DA_1164_01/qc/fastqc/fastqc_report.html
final/DA_1164_01/qc/fastqc/fastqc_data.txt
final/DA_1164_01/qc/fastqc/DA_1164_01.zip
final/DA_1164_01/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_01/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_01/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_01/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_01/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_01/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_01/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_01/qc/small-rna
final/DA_1164_01/qc/small-rna/DA_1164_01.txt
final/DA_1164_01/DA_1164_01-ready.trimming_stats
final/DA_1164_03
final/DA_1164_03/qc
final/DA_1164_03/qc/fastqc
final/DA_1164_03/qc/fastqc/fastqc_report.html
final/DA_1164_03/qc/fastqc/fastqc_data.txt
final/DA_1164_03/qc/fastqc/DA_1164_03.zip
final/DA_1164_03/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_03/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_03/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_03/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_03/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_03/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_03/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_03/DA_1164_03-ready.trimming_stats
final/DA_1164_02
final/DA_1164_02/qc
final/DA_1164_02/qc/fastqc
final/DA_1164_02/qc/fastqc/fastqc_report.html
final/DA_1164_02/qc/fastqc/fastqc_data.txt
final/DA_1164_02/qc/fastqc/DA_1164_02.zip
final/DA_1164_02/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_02/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_02/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_02/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_02/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_02/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_02/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_02/DA_1164_02-ready.trimming_stats
final/DA_1164_10
final/DA_1164_10/qc
final/DA_1164_10/qc/fastqc
final/DA_1164_10/qc/fastqc/fastqc_report.html
final/DA_1164_10/qc/fastqc/fastqc_data.txt
final/DA_1164_10/qc/fastqc/DA_1164_10.zip
final/DA_1164_10/qc/fastqc/Per_base_sequence_quality.tsv
final/DA_1164_10/qc/fastqc/Per_tile_sequence_quality.tsv
final/DA_1164_10/qc/fastqc/Per_sequence_quality_scores.tsv
final/DA_1164_10/qc/fastqc/Per_base_sequence_content.tsv
final/DA_1164_10/qc/fastqc/Per_sequence_GC_content.tsv
final/DA_1164_10/qc/fastqc/Per_base_N_content.tsv
final/DA_1164_10/qc/fastqc/Sequence_Length_Distribution.tsv
final/DA_1164_10/DA_1164_10-ready.trimming_stats
final/2018-06-28_DA_1164_samples-merged
final/2018-06-28_DA_1164_samples-merged/programs.txt
final/2018-06-28_DA_1164_samples-merged/bcbio-nextgen.log
final/2018-06-28_DA_1164_samples-merged/bcbio-nextgen-commands.log
final/2018-06-28_DA_1164_samples-merged/project-summary.yaml
final/2018-06-28_DA_1164_samples-merged/report
final/2018-06-28_DA_1164_samples-merged/report/srna_report.rmd
final/2018-06-28_DA_1164_samples-merged/report/summary.csv
final/2018-06-28_DA_1164_samples-merged/multiqc
final/2018-06-28_DA_1164_samples-merged/multiqc/multiqc_report.html
final/2018-06-28_DA_1164_samples-merged/multiqc/report
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_08_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_04_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_07_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_06_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_02_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_05_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_10_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_01_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_09_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/report/metrics/DA_1164_03_bcbio.txt
final/2018-06-28_DA_1164_samples-merged/multiqc/multiqc_config.yaml
final/2018-06-28_DA_1164_samples-merged/multiqc/multiqc_data
final/2018-06-28_DA_1164_samples-merged/multiqc/multiqc_data/multiqc_data_final.json
final/2018-06-28_DA_1164_samples-merged/multiqc/list_files_final.txt
final/2018-06-28_DA_1164_samples-merged/seqcluster
final/2018-06-28_DA_1164_samples-merged/seqcluster/log
final/2018-06-28_DA_1164_samples-merged/seqcluster/log/run.log
final/2018-06-28_DA_1164_samples-merged/seqcluster/log/trace.log
final/2018-06-28_DA_1164_samples-merged/seqcluster/seqs_rmlw.bam_cov.tsv
final/2018-06-28_DA_1164_samples-merged/seqcluster/read_stats.tsv
final/2018-06-28_DA_1164_samples-merged/seqcluster/cluster.bed
final/2018-06-28_DA_1164_samples-merged/seqcluster/list_obj.pk
final/2018-06-28_DA_1164_samples-merged/seqcluster/list_obj_red.pk
final/2018-06-28_DA_1164_samples-merged/seqcluster/counts.tsv
final/2018-06-28_DA_1164_samples-merged/seqcluster/size_counts.tsv
final/2018-06-28_DA_1164_samples-merged/seqcluster/positions.bed
final/2018-06-28_DA_1164_samples-merged/seqcluster/counts_sequence.tsv
final/2018-06-28_DA_1164_samples-merged/seqcluster/seqcluster.json
final/2018-06-28_DA_1164_samples-merged/seqclusterViz
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/log
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/log/run.log
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/log/trace.log
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/profiles
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/profiles/344
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/profiles/5
final/2018-06-28_DA_1164_samples-merged/seqclusterViz/seqcluster.db
Hi,
Yes, those are good examples.
I would be happy to add this to bcbio. It would help if you tried some of these tools and told us the command lines you used. We try to implement tools that we can test with some data and that we know give reasonably good results. If you do that and find some that work well, I'll be happy to add them.
Using this other species with bcbio should work. If you have the hg19 or mm10 genome installed, you can generate the same kind of files for your genome and put them in the corresponding location.
I am assuming you have the genome set up and can locate the genome_name/build_name/seq folder that bcbio used when you ran the small RNA-seq pipeline.
You need to add the following to the genome-resources.yaml file:
srnaseq:
  srna_transcripts: ../srnaseq/srna-transcripts.gtf
  mirbase_hairpin: ../srnaseq/hairpin.fa
  mirbase_mature: ../srnaseq/mature.fa
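The `../srnaseq/...` entries are relative paths. Assuming they are resolved against the genome's `seq` folder (where the resources file lives), they point to a `srnaseq` folder sitting next to `seq`. A small Python sketch to check where each file is expected and which ones are still missing; the genome path in the example is hypothetical.

```python
import os
from typing import Dict

# Entries from the srnaseq section of the genome resources YAML.
SRNASEQ_ENTRIES = {
    "srna_transcripts": "../srnaseq/srna-transcripts.gtf",
    "mirbase_hairpin": "../srnaseq/hairpin.fa",
    "mirbase_mature": "../srnaseq/mature.fa",
}


def resolve_resource(seq_dir: str, rel_path: str) -> str:
    """Resolve a relative resources entry against the seq folder (assumed base)."""
    return os.path.normpath(os.path.join(seq_dir, rel_path))


def missing_srnaseq_files(seq_dir: str) -> Dict[str, str]:
    """Map each srnaseq entry whose file does not exist yet to its expected path."""
    missing = {}
    for key, rel_path in SRNASEQ_ENTRIES.items():
        path = resolve_resource(seq_dir, rel_path)
        if not os.path.isfile(path):
            missing[key] = path
    return missing


# Hypothetical layout: genome_name/build_name/seq
seq_dir = os.path.join("genomes", "Canuum", "my_ref", "seq")
for key, path in missing_srnaseq_files(seq_dir).items():
    print(f"{key}: expected at {path}")
```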
The srnaseq folder, which sits at the same level as the seq folder, needs these files:
From miRBase you can download mature.fa.gz, hairpin.fa.gz and miRNA.str.gz (ftp://mirbase.org/pub/mirbase/CURRENT/).
You'll need to prepare your files like this:
zcat hairpin.fa.gz | awk '{if ($0~/>sly/){name=$0; print name} else if ($0~/^>/){name=0};if (name!=0 && $0!~/^>/){print $0;}}' | sed 's/U/T/g' > hairpin.fa
zcat mature.fa.gz | awk '{if ($0~/>sly/){name=$0; print name} else if ($0~/^>/){name=0};if (name!=0 && $0!~/^>/){print $0;}}' | sed 's/U/T/g' > mature.fa
zcat miRNA.str.gz | awk '{if ($0~/sly/)print $0}' > miRNA.str
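If awk is unfamiliar, the same species filter for the FASTA files can be written in Python. This sketch keeps records whose header contains `>sly` (like the awk pattern) and converts U to T; unlike the `sed` step above, it rewrites only sequence lines and leaves headers untouched. The toy records are made up for illustration, and the miRNA.str filter is left to the awk command.

```python
import gzip
import io
from typing import Iterable, Iterator


def filter_mirbase_fasta(lines: Iterable[str], species: str = "sly") -> Iterator[str]:
    """Yield header and sequence lines for one miRBase species prefix, with U -> T."""
    tag = ">" + species
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = tag in line  # same test as the awk pattern />sly/
            if keep:
                yield line
        elif keep:
            # Sequence lines (possibly several per record) until the next header.
            yield line.replace("U", "T")


def subset_mirbase(in_gz: str, out_fa: str, species: str = "sly") -> None:
    """Write the species subset of a gzipped miRBase FASTA (e.g. hairpin.fa.gz)."""
    with gzip.open(in_gz, "rt") as src, open(out_fa, "w") as dst:
        for line in filter_mirbase_fasta(src, species):
            dst.write(line + "\n")


toy = io.StringIO(
    ">ath-miR-toy1 made-up record\n"
    "UGACAGAAGAGAGUGAGCAC\n"
    ">sly-miR-toy1 made-up record\n"
    "UGACAGAAGAGAGUGAGCAC\n"
)
print(list(filter_mirbase_fasta(toy)))
```

For the real downloads, `subset_mirbase("hairpin.fa.gz", "hairpin.fa")` and the same call for mature.fa.gz should produce files equivalent to the awk output apart from the header handling noted above.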
You can use a custom GTF for your species: if your genome comes from Ensembl and there is a gene annotation for it (ftp://ftp.ensemblgenomes.org/pub/release-39/plants/gtf/solanum_lycopersicum), you can put that file there as srna-transcripts.gtf. seqcluster uses that file to annotate the clusters it finds. I think the Ensembl annotation should work if you are using that genome.
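To illustrate what annotating clusters with a GTF involves, here is a naive Python overlap sketch. It is not seqcluster's implementation; it assumes a BED file with the cluster name in the fourth column (a hypothetical layout) and a GTF with `gene_id` attributes. The main subtlety is coordinates: BED is 0-based and half-open, GTF is 1-based and inclusive, so GTF starts are shifted by one before comparing.

```python
from typing import Dict, Iterable, List, Tuple

Feature = Tuple[str, int, int, str]  # chrom, 0-based start, end (exclusive), label


def parse_gtf(lines: Iterable[str], attr: str = "gene_id") -> List[Feature]:
    """Read GTF features, converting 1-based inclusive starts to 0-based half-open."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = {}
        for item in cols[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        label = attrs.get(attr, cols[2])  # fall back to the feature type
        features.append((cols[0], int(cols[3]) - 1, int(cols[4]), label))
    return features


def annotate_clusters(bed_lines: Iterable[str], features: List[Feature]) -> Dict[str, List[str]]:
    """Return, for each BED interval, the labels of overlapping GTF features."""
    result = {}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, name = fields[0], fields[3]
        start, end = int(fields[1]), int(fields[2])
        result[name] = sorted({label for c, s, e, label in features
                               if c == chrom and s < end and start < e})
    return result


gtf = ['1\ttoy\tmiRNA\t101\t121\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "miRNA";\n']
bed = ["1\t110\t130\tcluster_1\n", "1\t121\t140\tcluster_2\n"]
print(annotate_clusters(bed, parse_gtf(gtf)))
```

The nested loop is only for clarity; for a whole-genome annotation an interval tree or a tool such as bedtools is the practical choice.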
If you get the miRNA part and the clustering working, you can load all the data with this package:
https://lpantano.github.io/bcbioSmallRna/reference/loadSmallRnaRun.html
There is also a template for a quick QC analysis that shows how to get the count data and annotation:
https://github.com/lpantano/bcbioSmallRna/blob/master/inst/rmarkdown/templates/srnaseq/skeleton/skeleton.Rmd
I am currently working on this, so it is a good time to start using it.
I hope this helps.
Cheers
On Jul 4, 2018, at 10:25 AM, WimSpee wrote:
Hi Lorena Pantano.
Thank you for the information. I did not know that plant-specific tools were needed. Do you mean either of these two tools?
miRPlant: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-275
miRDeep-P: https://academic.oup.com/bioinformatics/article/27/18/2614/181153
The first paper mentions that different tools are needed because miRNA precursors are different (longer) in plants than in animals.
The most challenging problem in identifying novel plant miRNA is to find a suitable genomic region as a miRNA precursor candidate (to test whether it forms hairpins) because the majority of precursor miRNA in plants are between 100-200 bp [4], which is much longer than those in animals.
Do you know if there are other reasons plant-specific miRNA tools are needed?
I will try to use / look at the seqcluster results.
I will try with `trim_reads: True`.
I will also try the analysis with Solanum lycopersicum (miRBase sly) as the known miRNA data set. That species is somewhat close (also in the nightshade family), and according to one of the above papers miRNA sequences are conserved in plants.
http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=sly
Do I need to do anything to make use of the known sly miRNA sequences?
Do you know how, and by whom, the sequences for a species get added to miRBase?
It would be nice if the microRNA-seq functionality of bcbio worked for plants. I had hoped it would and did not expect a need for plant-specific tools. At the same time, I understand your primary focus is on other species, so it would only make sense if it's not too much work or if we could do part of it.
Thanks again for the information.
Hi @lpantano. I tried to run the same analysis with This resulted in the following error during adapter removal:
This seems to be sample specific. Other samples seem not to run into this error.
For this sample I am also not sure why
Hi, sorry about this. It seems that the tool we use to predict the adapter is not working on this sample. If you know the 3' adapter, I suggest adding it to the configuration, as in https://github.com/bcbio/bcbio-nextgen/blob/master/config/templates/illumina-srnaseq.yaml#L8. If you don't know it, you can ask the sequencing core. After adding the adapter, I suggest restarting the analysis from scratch. Let me know if that helps.
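When the adapter is unknown and automatic prediction fails, a crude first check is to count k-mers across reads: in small RNA libraries most reads run into the 3' adapter, so the most frequent k-mer is often a piece of it. The Python sketch below does only that. It is a naive frequency count, not the algorithm of the adapter-prediction tool bcbio uses, and its answer should be confirmed against the kit documentation or with the sequencing core.

```python
from collections import Counter
from typing import Iterable


def most_frequent_kmer(reads: Iterable[str], k: int = 10) -> str:
    """Return the most frequent k-mer over all reads (first seen wins ties)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    if not counts:
        raise ValueError("no read is at least k bases long")
    return counts.most_common(1)[0][0]


adapter = "TGGAATTCTCGGGTGC"  # known here only because the reads are synthetic
inserts = ["TGACAGAAGAGAGTGAGCAC", "TTCCACAGCTTTCTTGAACTG",
           "TCGGACCAGGCTTCATTCCCC", "TTTGGATTGAAGGGAGCTCTA"]
reads = [ins + adapter for ins in inserts]
print(most_frequent_kmer(reads))
```

The result is only a fragment; extending it base by base, or matching it against the adapters listed for the library kit, is needed before using it for trimming.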
Thanks, closing this as it seems like it's been answered.