Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download Certain Files from SRA Using Aspera #109

Merged
merged 12 commits into from
Jan 31, 2018
Merged

Download Certain Files from SRA Using Aspera #109

merged 12 commits into from
Jan 31, 2018

Conversation

Miserlou
Copy link
Contributor

This adds support for Aspera rather FTP when downloading from certain SRA URLs. This results in an extremely significant download speed improvement.

A test is included. This test may fail on the UPenn network because of how Aspera works, but it works on Circle and from EC2.

Also included in this PR is the ability to add the word "noslow" into a commit message to avoid running the super long test during a build.

Copy link
Contributor

@kurtwheeler kurtwheeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks pretty good. I'm pretty happy with how little code had to change for this :)

command: ./workers/run_tests.sh
- run:
command:
echo $(git log --format=oneline -n 1 $CIRCLE_SHA1); if [[ $(git log --format=oneline -n 1 $CIRCLE_SHA1) = *"noslow"* ]]; then echo "Skipping slow tests.."; ./workers/run_tests.sh --exclude-tag=slow; else echo "Running all tests.."; ./workers/run_tests.sh; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you either break this into multiple lines or put it into a script? This is really hard to read.

# Install Aspera
ADD --chown=user:user https://download.asperasoft.com/download/sw/cli/3.7.7/aspera-cli-3.7.7.608.927cce8-linux-64-release.sh aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
RUN sh aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
RUN rm aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff isn't going to change very often. Can you put this into the Dockerfile.base? This will also DRY up the duplication between this file and Dockerfile.tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only one I'm leaving the same for now as there is no documentation for how ccdl/data_refinery_worker_base was built and published, so I think refactoring/documenting that is outside the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

--link nomad:nomad \
-i dr_worker_tests python3 manage.py test "$1" --no-input
# ex: data_refinery_workers.processors.test_salmon.SalmonTestCase.test_success
# ex: data_refinery_workers.downloaders.test_sra.DownloadSraTestCase.test_aspera_downloader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this file? It looks almost identical to workers/run_tests.sh, which is already capable of running a single test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

)
batch.save()

# This is converted to us Aspera
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh maybe "to use Aspera"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified

name="ERR036000_1.fastq.gz",
internal_location="IlluminaHiSeq2000/SALMON",
batch=batch)
# This is converted to us Aspera
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

""" Download file dispatcher. Dispatches to the FTP or Aspera downloader """

# SRA files have Apsera downloads.
if 'ftp.sra.ebi.ac.uk' in file.download_url and not force_ftp:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there files we download that do not have this in their URL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. At least from the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants