Kraken Standard Database #45

Closed
gkarthik opened this issue Jun 23, 2016 · 4 comments

@gkarthik

Hello,
I'm trying to build the standard Kraken database. I gave the job 140 GB of memory and am running it on 16 threads.

I get this error after Kraken has downloaded the GI-to-taxid mapping file and the taxonomy dump from NCBI.

Found jellyfish v1.1.11
--2016-06-23 16:32:01--  ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
           => “gi_taxid_nucl.dmp.gz.2”
Resolving ftp.ncbi.nih.gov... 130.14.250.10, 2607:f220:41e:250::12
Connecting to ftp.ncbi.nih.gov|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy ... done.
==> SIZE gi_taxid_nucl.dmp.gz ... 1405001825
==> PASV ... done.    ==> RETR gi_taxid_nucl.dmp.gz ... done.
Length: 1405001825 (1.3G) (unauthoritative)

100%[=================================================================>] 1,405,001,825 7.42M/s   in 4m 42s  

2016-06-23 16:36:45 (4.74 MB/s) - “gi_taxid_nucl.dmp.gz.2” saved [1405001825]

Downloaded GI to taxon map
--2016-06-23 16:36:45--  ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
           => “taxdump.tar.gz”
Resolving ftp.ncbi.nih.gov... 130.14.250.12, 2607:f220:41e:250::12
Connecting to ftp.ncbi.nih.gov|130.14.250.12|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/taxonomy ... done.
==> SIZE taxdump.tar.gz ... 36397155
==> PASV ... done.    ==> RETR taxdump.tar.gz ... done.
Length: 36397155 (35M) (unauthoritative)

100%[===================================================================>] 36,397,155  10.2M/s   in 5.2s    

2016-06-23 16:36:51 (6.62 MB/s) - “taxdump.tar.gz” saved [36397155]

Downloaded taxonomy tree data

gzip: gi_taxid_nucl.dmp.gz: unexpected end of file

Is there something I'm missing?

Thanks.

@tseemann

Looks like your download was incomplete or corrupt.

Try downloading the .md5 file from ftp://ftp.ncbi.nih.gov/pub/taxonomy/ and use it to check that gi_taxid_nucl.dmp.gz was downloaded correctly.
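
For anyone following along, the checksum comparison could look roughly like the sketch below. It is only an illustration: the function names are made up for this example, and it assumes the .md5 file begins with the hex digest, as `md5sum` output does.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path, md5_path):
    """Compare a file's digest with the first token of its .md5 companion.

    Assumption: the .md5 file starts with the hex digest (md5sum-style).
    """
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(data_path) == expected
```

Calling `matches_md5_file("gi_taxid_nucl.dmp.gz", "gi_taxid_nucl.dmp.gz.md5")` would return `False` for a truncated download. The second file name is a guess at what the checksum file is called on the FTP site.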

@gkarthik
Author

gkarthik commented Jul 7, 2016

I downloaded the .md5 file and checked it; the file was downloaded correctly.
I also tried downloading gi_taxid_nucl.dmp.gz directly from ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz and unpacking it with gzip, and that works perfectly.
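
A scripted version of that unpacking test, as a rough sketch only (the helper name is hypothetical and this is not part of Kraken): it streams the archive through Python's `gzip` module and reports whether the end of the stream is reached cleanly, without writing the decompressed data to disk.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only reaching the end matters here.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early, e.g. a truncated download.
        # OSError / zlib.error: not gzip data, corrupt contents, or an
        # unreadable file.
        return False
    return True
```

A truncated archive, which is what "unexpected end of file" from gzip points to, makes this return `False`.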

Is there any other reason, such as the version of Kraken (I'm using 0.10.6), that could cause this error?

@tseemann

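@gkarthik it's because there was a previously downloaded file with the same name, so the new download was saved under a new name, first with a .1 and then a .2 extension. gzip then reads the older gi_taxid_nucl.dmp.gz, which is evidently incomplete, rather than the fresh copy.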

See 2016-06-23 16:36:45 (4.74 MB/s) - “gi_taxid_nucl.dmp.gz.2” saved [1405001825] in your first post.
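
To spot this situation before re-running the build, something like the following sketch could list the numbered copies sitting next to the original file. The function name is invented for this example, and it assumes the duplicates follow the `name.1`, `name.2` pattern seen in the log above.

```python
import re
from pathlib import Path


def find_numbered_copies(directory, name):
    """List files called name.1, name.2, ... in directory, ordered by number."""
    pattern = re.compile(re.escape(name) + r"\.(\d+)")
    copies = []
    for entry in Path(directory).iterdir():
        match = pattern.fullmatch(entry.name)
        if match and entry.is_file():
            copies.append((int(match.group(1)), entry))
    return [path for _, path in sorted(copies)]
```

If `find_numbered_copies(taxonomy_dir, "gi_taxid_nucl.dmp.gz")` returns anything, where `taxonomy_dir` stands for wherever the build stores its downloads, the unsuffixed file is probably a stale leftover from an earlier attempt and is worth deleting before trying again.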

@jenniferlu717
Collaborator

I don't recall the reason for the error, but I do remember that it should not interfere with your database build. We have updated the method of downloading the taxonomy, so this should no longer cause an issue with the newest version of Kraken.
