-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download Certain Files from SRA Using Aspera #109
Conversation
…ta-refinery into miserlou/aspera
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks pretty good. I'm pretty happy with how little code had to change for this :)
.circleci/config.yml
Outdated
command: ./workers/run_tests.sh | ||
- run: | ||
command: | ||
echo $(git log --format=oneline -n 1 $CIRCLE_SHA1); if [[ $(git log --format=oneline -n 1 $CIRCLE_SHA1) = *"noslow"* ]]; then echo "Skipping slow tests.."; ./workers/run_tests.sh --exclude-tag=slow; else echo "Running all tests.."; ./workers/run_tests.sh; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you either break this into multiple lines or put it into a script? This is really hard to read.
# Install Aspera | ||
ADD --chown=user:user https://download.asperasoft.com/download/sw/cli/3.7.7/aspera-cli-3.7.7.608.927cce8-linux-64-release.sh aspera-cli-3.7.7.608.927cce8-linux-64-release.sh | ||
RUN sh aspera-cli-3.7.7.608.927cce8-linux-64-release.sh | ||
RUN rm aspera-cli-3.7.7.608.927cce8-linux-64-release.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stuff isn't going to change very often. Can you put this into the Dockerfile.base? This will also DRY up the duplication between this file and Dockerfile.tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only one I'm leaving the same for now as there is no documentation for how ccdl/data_refinery_worker_base
was built and published, so I think refactoring/documenting that is outside the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
workers/run_individual_test.sh
Outdated
--link nomad:nomad \ | ||
-i dr_worker_tests python3 manage.py test "$1" --no-input | ||
# ex: data_refinery_workers.processors.test_salmon.SalmonTestCase.test_success | ||
# ex: data_refinery_workers.downloaders.test_sra.DownloadSraTestCase.test_aspera_downloader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this file? It looks almost identical to workers/run_tests.sh
, which is already capable of running a single test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
) | ||
batch.save() | ||
|
||
# This is converted to us Aspera |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this comment mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh maybe "to use Aspera"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified
name="ERR036000_1.fastq.gz", | ||
internal_location="IlluminaHiSeq2000/SALMON", | ||
batch=batch) | ||
# This is converted to us Aspera |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
""" Download file dispatcher. Dispatches to the FTP or Aspera downloader """ | ||
|
||
# SRA files have Apsera downloads. | ||
if 'ftp.sra.ebi.ac.uk' in file.download_url and not force_ftp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there files we download that do not have this in their URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. At least from the tests.
This adds support for Aspera rather FTP when downloading from certain SRA URLs. This results in an extremely significant download speed improvement.
A test is included. This test may fail on the UPenn network because of how Aspera works, but it works on Circle and from EC2.
Also included in this PR is the ability to add the word "noslow" into a commit message to avoid running the super long test during a build.