Getting credentials passed via IAM roles fails on beefy instance types #16

Closed
hannes-ucsc opened this issue Sep 24, 2015 · 7 comments

@hannes-ucsc
Contributor

This would allow s3am to be run on EC2 without a .boto.

@hannes-ucsc
Contributor Author

Can't reproduce the issue. s3am already works on toil-box. Will try on a toil cluster next.

@hannes-ucsc
Contributor Author

Works on a cluster too.

It appears that s3am already works with credentials coming from IAM roles. This is actually a Boto feature and as such should apply to Toil, too.

@jvivian Why do you need a .boto on toil nodes? Please reply and assign back to me.

@jvivian

jvivian commented Sep 29, 2015

I'll try and reproduce this tomorrow.

@jvivian

jvivian commented Sep 30, 2015

Here is how I reproduced this error:

Spawned a toil-cluster via cgcloud (key_config contains master.key and config.txt):
cgcloud create-toil-cluster --master-instance-type m3.large -t=c3.8xlarge -s=1 --shared-dir=key_config/

Applied the bind mount fix for docker:
sudo service docker.io stop && sleep 10 && sudo mv /var/lib/docker /mnt/ephemeral/var/lib/ && sudo mkdir /var/lib/docker && sudo mount --bind /mnt/ephemeral/var/lib/docker /var/lib/docker && sudo service docker.io start

rsynced over the toil and launch script to /home/mesosbox/ on the master:
https://github.com/BD2KGenomics/toil-scripts/blob/master/batch-alignment/batch_align.py
https://github.com/BD2KGenomics/toil-scripts/blob/master/batch-alignment/launch_alignment_mesos.sh

I prayed to the dark lord Cthulhu and launched the pipeline:
time ./launch_alignment_mesos.sh

When I got to the S3AM_upload step in the pipeline, it failed with the following series of errors:
http://pastebin.com/P5xstWGP

Copied over my .boto file to /home/mesosbox/ and /home/ubuntu/ and reran the pipeline with --restart. Restart occurred successfully and my job completed without further error.

Ran s3am cancel cgl-driver-projects-encrypted wcdt/exome_bams/DTB-005-BL-T.bam to clear the pending upload.

@hannes-ucsc
Contributor Author

It might be the same problem as in

https://groups.google.com/forum/#!topic/boto-users/bq0tMxNbjCg

which describes an intermittent problem. From the traceback in the pastebin I can tell that the authentication must have succeeded during the initial stages of s3am, since the failure is happening during a part upload.

@jvivian, I think you are running too many instances of s3am or each instance has too many children. s3am already parallelizes transfers using as many children as there are cores. With IAM roles, each child needs to obtain the credentials by requesting the EC2 metadata endpoint via HTTP. If you run many s3am instances with many children each, that might simply overload the metadata endpoint or hit some throttling. Reduce the number of children using s3am's --download-slots and --upload-slots options, or reduce the number of s3am instances by specifying a larger number of cores for each s3am job in Toil. Let's see if that helps.

@jvivian

jvivian commented Oct 1, 2015

S3AM was only called one time via subprocess and this was on a single master/slave setup so there were no other parallel calls being made.

@hannes-ucsc
Contributor Author

I see. I will try to repro on a c3.8xlarge where s3am will use 32 concurrent part uploads, each incurring a metadata endpoint request. Maybe 32 is enough to cause problems. If that's the case I may want to extract the credentials from the parent process's boto connection and somehow inject them into the child processes.
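
The idea of resolving credentials once in the parent and handing them to the children could look roughly like the sketch below. This is not s3am's actual code: `Credentials`, `upload_part`, `upload_all` and the `fetch_credentials` callable are hypothetical names made up for illustration, and the part upload is a stand-in that only returns what it was given.

```python
from collections import namedtuple
from functools import partial

# Hypothetical container for the temporary credentials of an IAM role.
Credentials = namedtuple("Credentials", ["access_key", "secret_key", "token"])


def upload_part(creds, part_number):
    """Stand-in for a part upload that uses the credentials it was handed."""
    # A real worker would build its S3 connection from creds here, instead of
    # letting the library look them up via the metadata endpoint.
    return (part_number, creds.access_key)


def upload_all(num_parts, fetch_credentials, pool_map=map):
    """Look up credentials once, then fan the part uploads out to workers."""
    creds = fetch_credentials()  # the only credential lookup in this sketch
    worker = partial(upload_part, creds)
    # pool_map defaults to the builtin map; a process pool's map method can be
    # passed instead, since the namedtuple and the partial are both picklable.
    return list(pool_map(worker, range(1, num_parts + 1)))
```

With 32 parts, as on a c3.8xlarge, `upload_all(32, fetch)` calls `fetch` once rather than 32 times. One caveat under this approach: credentials obtained from an IAM role are temporary, so a long transfer would presumably need to refresh them in the parent and pass the new values on.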

@hannes-ucsc changed the title from "Add support for getting credentials passed via IAM roles" to "Getting credentials passed via IAM roles fails on beefy instance types" on Jan 25, 2016