This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

How to use SRA downloaded files in Falcon #428

Closed
ls2017 opened this issue Aug 4, 2016 · 5 comments

ls2017 commented Aug 4, 2016

I downloaded a dataset from SRA and converted it to *.fastq. It looks something like this:

@SRR1168519.1 length=302
ATTTTTGTCTGTCCGATTCTGATAGCAGGC
GCATATCAGATGAATCTGATGAGTCAACACTGGTTGGTTCGTTGCTCAGTAGTTATGTTCGTGTGGAGCGTCGTATTGGTATCGAGTCTGATTGTCAGTCATCGATGGTCATTAGTCACGTCCTTCCAGTAGTTCGTATCAACATGCTTCACTATTCTTGTTGTTGTAGATGTTATTCGTATTAGTGTGAGTGTCAGTAGTTACGCGTACAGTATCGGGATTTCGTAGCAGCGCGCGGCGTTGCGGAGTCAAGATTCATGGCTGGACTACGG
+SRR1168519.1 length=302
!"!!!"#$"##!!!"!!"!"#""""#$#"!"!""!!!!!""%"""!"!"#""!#"!!!"!#"!#!!!"!!!"""!!!!"""#!!"#"!"!""!"!!!!""#!!!""!!!"!#!"###"#""!"!!!##!#!#!"!"""!"$$!!"#"$""#"!!"!!#"!!#!!!"!"""!!""%#"$#"$"#"!!!"!!!!!"!!!"!"!"!$#%&%%$"""""""!#"!"!!""##"$!!!!!!!$$!!!!!#!!"!!!!%!"$"!!"""!!!!!!!"!!!!!!$$#"!"!!!"!$$#"!$!!!""!"""

After converting it with Falcon-formatter, it still does NOT work in Falcon.

It looks like the fasta files require strictly formatted headers carrying information such as the movie, the time of run start, the SMRT barcode, etc., and should look like this (copied from the ecoli example):

>m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058 RQ=0.831
TGGCATCTCATAAAGCCGCGCGGACGGGCAATAGCACTGGTTCGATTGTCTGGTGTTTATTCCCGGCTGT
TGGGCTGAGTTTGTGATCCCGGTGAACTTCTCGCATGCCGACAGCATCATGATCGGTGCGCTGTCTCCCT
GGCAAATAGAAGTTGTTCAATAACGCGCGCGACTGGCCGTTGGCCTCGGGCGGTTAGCGATGCATCGATG
TTTGCTGGGCTGCTAATTGTGCCCGATAATATGGTTGGTTCGGCACTAAACGACCAGCAAAAAAAAGCGT
GGGAGAACAGATGAAATTATTTACGCGGTAGTTCGTTTCGCCGCTGGCGGATTGTGATTTTGCTGGCTTG
GTCTTACCGTTTTCCTCTACGCGGCCCAATGCTGAGCTGGGTATCTATTCGTTATACGGCTCTGAAGGCT
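To make the layout of that header line concrete, here is a minimal Python sketch that splits it into its parts. It assumes the third field is a `start_end` pair of integers, as in the ecoli example; the names `HEADER_RE` and `parse_header` are hypothetical and are not part of Falcon or DAZZ_DB.

```python
import re

# Pattern for a PacBio-style read name: movie/well/start_end, then optional comments.
# The leading ">" is optional so the function also accepts a bare read name.
HEADER_RE = re.compile(
    r"^>?(?P<movie>[^/\s]+)/(?P<well>\d+)/(?P<span>\d+_\d+)(?:\s+(?P<comment>.*))?$"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not fit the pattern."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    start, end = (int(x) for x in m.group("span").split("_"))
    return {
        "movie": m.group("movie"),
        "well": int(m.group("well")),
        "start": start,
        "end": end,
        "comment": m.group("comment") or "",
    }
```

An SRA-style header such as `@SRR1168519.1 length=302` has no `/well/start_end` part, so `parse_header` returns `None` for it.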

My question: Can I make up some dummy variables equivalent for ">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058" to make Falcon work properly?

Or is there another way to dump the *.sra file I downloaded so that it works properly in Falcon?

pb-cdunn commented Aug 4, 2016

> My question: Can I make up some dummy variables equivalent for ">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058" to make Falcon work properly?

Yes. This is a restriction in DAZZ_DB/fasta2DB. The header must match >movie/well/blah plus comments if any. All reads from the same movie should be together in the file. well is an integer. blah is ignored.
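As a rough illustration of the dummy-header approach, the Python sketch below rewrites FASTQ records as FASTA with `>movie/well/start_end` headers. Several things here are assumptions rather than anything confirmed in this thread: the function name `fastq_to_dazzler_fasta`, the placeholder movie name, the use of a running counter as the well number, `0_<length>` as the third field, and strictly four-line FASTQ records. It has not been checked against fasta2DB.

```python
def fastq_to_dazzler_fasta(fastq, fasta,
                           movie="m000000_000000_00000_c000000000000000000000000_s1_p0",
                           width=70):
    """Copy reads from an open FASTQ handle to an open FASTA handle.

    Each read gets a made-up header ">movie/well/0_length". The movie name is a
    placeholder (assumption), and the well number is just a running counter.
    Assumes every FASTQ record is exactly four lines. Returns the read count.
    """
    well = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header line, got: %r" % header)
        seq = fastq.readline().strip()
        fastq.readline()  # '+' separator line, not needed
        fastq.readline()  # quality line, dropped in FASTA output
        fasta.write(">%s/%d/0_%d\n" % (movie, well, len(seq)))
        # Wrap the sequence at a fixed width, like the ecoli example above.
        for i in range(0, len(seq), width):
            fasta.write(seq[i:i + width] + "\n")
        well += 1
    return well
```

It could be called as `fastq_to_dazzler_fasta(open("reads.fastq"), open("reads.fasta", "w"))`, where both file names are hypothetical. Using a single movie name keeps all reads from that "movie" together in the file, as required above.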

@pb-jchin

@jingqinwu you will need to download the bax.h5 files and use pls2fasta to convert to proper fasta. SRA's fasta output does not encode proper information for assembly (yet).

rhallPB commented Sep 11, 2016

Depending on how the files were uploaded, they may or may not contain the data needed to format them correctly. Some useful reading:

http://microbe.net/2015/01/20/submit-data-to-ncbis-short-read-archive/
http://seqanswers.com/forums/showthread.php?t=56466

If the data isn't in the SRA, I would suggest contacting the authors of the study.

ls2017 commented Sep 13, 2016

@pb-jchin @rhallPB Many thanks for your suggestions.

@pb-jlandolin

See related issue here: pb-jlandolin/PacbioToSRA#2

If they were uploaded by PacBio, they should have links to the original bax.h5 files. You can click on the SRR id, then click on the "Download" tab, and download the original bax.h5 files instead of the .sra files:

[Screenshot: Screen Shot 2016-09-16 at 2.42.49 PM]
