Don't re-download taxonomy files if already there #5

Closed
tseemann opened this issue Jun 27, 2018 · 3 comments

Comments

@tseemann

I was running `kraken-build --db XXX --download-taxonomy` and the 4th FTP download stalled, so I restarted it, and it started re-downloading everything from scratch. (It takes 10 hours to get the files from NCBI over FTP to my uni; they throttle FTP.)

https://github.com/DerrickWood/kraken2/blob/master/scripts/download_taxonomy.sh#L24-L27

  1. Could you use this wget option to avoid re-downloading?
         --continue
             Continue getting a partially-downloaded file.  This is useful when you want to finish up a
             download started by a previous instance of Wget, or by another program.
     Or only download if the file isn't already there: `if [ ! -r $FILE ]; then wget ... ; fi`?
  2. Do you know the Aspera URL for them? And could you use ascp if it is installed?
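
For illustration, the two options in the first item could look roughly like the sketch below. The function names `fetch_resume` and `fetch_if_missing` and the example.org URL are hypothetical and are not taken from download_taxonomy.sh.

```shell
#!/usr/bin/env bash

# Option A: let wget resume a partially downloaded file.
fetch_resume() {
  local url="$1"
  wget --continue "$url"
}

# Option B: skip the download if a readable local copy exists.
# Note: this does not check that the local copy is complete.
fetch_if_missing() {
  local url="$1"
  local file="${url##*/}"   # local name = last path component of the URL
  if [ ! -r "$file" ]; then
    wget "$url"
  fi
}

# Example call (placeholder URL, not a real NCBI path):
# fetch_if_missing "ftp://example.org/taxonomy/taxdump.tar.gz"
```

Option B is cheap, but it treats any readable file as done, including one left behind by an interrupted transfer.
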
@DerrickWood
Owner

I think the right approach there is to have separate dlflag files for each file, or actually store a checkpoint value somewhere so that downloads start after the last complete step. wget -c can run into issues if the remote copy has changed, which I'd expect to happen often enough that it would be confusing, and the -r file test wouldn't check to ensure the download was complete/successful.
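
A minimal sketch of the per-file flag idea, assuming a hypothetical helper `download_once` and hypothetical `*.dlflag` file names (the actual script may well differ). The flag is written only after the command exits successfully, so an interrupted or failed download is retried on the next run while finished ones are skipped.

```shell
#!/usr/bin/env bash

# download_once FLAG CMD [ARGS...]
# Run CMD only if FLAG does not exist; create FLAG only when CMD succeeds.
download_once() {
  local flag="$1"
  shift
  if [ -e "$flag" ]; then
    echo "Skipping, already completed: $flag" >&2
    return 0
  fi
  "$@" && touch "$flag"
}

# Example calls (placeholder URLs, not the real NCBI paths):
# download_once taxdump.dlflag wget "ftp://example.org/taxonomy/taxdump.tar.gz"
# download_once accmap.dlflag  wget "ftp://example.org/taxonomy/accession2taxid.gz"
```

One caveat with this sketch: if a partial file from a failed attempt is still on disk, plain wget saves the retry under a `.1` suffix rather than overwriting it, so a real script would probably remove the partial file first or pass `-O`.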

I don't know the Aspera URL (and haven't worked with Aspera much myself, as might be evident).

@tseemann
Author

I love the rsync part though - nice touch.

I realised after filing that you use touch-files for the other stuff. I guess it's just a matter of extending it to those 4 files. I suspect most people in EU / USA get fast FTP to NCBI.

@DerrickWood
Owner

OK, I went ahead and did the touch-file thing for the others in 98ac708. There are a few other wget calls in other scripts that will need to be wrapped in the same fashion, but I'll handle that in another issue.
