Deadlock with concurrent credentials fetch #137

Closed
kennyjwilli opened this issue Mar 25, 2020 · 2 comments

Comments

@kennyjwilli

#130 attempted to fix a deadlock issue while fetching credentials for aws-api calls. This fix does not work in the general case.

Dependencies

com.cognitect.aws/api       {:mvn/version "0.8.445"}
com.cognitect.aws/endpoints {:mvn/version "1.1.11.607"}
com.cognitect.aws/sts       {:mvn/version "741.2.504.0"}

Description with failing test case

We have an SQS message-processing service that runs 10 processing threads. All 10 of those threads have deadlocked on calls to aws-api. Each message-processing thread makes calls to our customers' AWS accounts using IAM AssumeRole. Each customer has a different IAM Role ARN and ExternalId, so the aws-api calls use a different CredentialsProvider for each customer. These calls occur in parallel.

If you have 4 or more CredentialsProviders running fetch at the same time, you will hit a deadlock.

I created a repro here.

I think the problem is the same as #130. Its fix only made it less likely to occur. In my repro, this is what is happening in each thread.

| invoke DescribeInstances
  | fetch-creds (AssumeRole)
    | invoke STS AssumeRole
      | fetch-creds (implicit)

The fetch-creds calls go through the async-fetch-pool, which has size 4. If you launch N threads to run the above flow, where N >= the async-fetch-pool size, you will get a deadlock. This occurs because every thread in the pool is occupied by an outer AssumeRole fetch-creds call, while the inner fetch-creds calls sit in the async-fetch-pool's queue. The queue will never get processed, since each outer call is waiting on its own queued inner call to finish.
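The starvation pattern described above can be sketched outside of aws-api. The following is a minimal Python stand-in, not the library's code: `nested_fetch_starves`, `outer_fetch`, and `inner_fetch` are hypothetical names, the pool size of 4 mirrors the async-fetch-pool size mentioned above, and the timeout exists only so the sketch terminates (the real flow would block indefinitely).

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def nested_fetch_starves(n_callers, pool_size=4, wait_timeout=0.5):
    """Return True if any outer fetch timed out waiting for its inner fetch."""
    pool = ThreadPoolExecutor(max_workers=pool_size)
    lock = threading.Lock()
    gate = threading.Event()
    started = 0
    # Open the gate once as many outer fetches are running as can run at once.
    needed = min(n_callers, pool_size)

    def inner_fetch():
        # Stand-in for the implicit credentials fetch behind STS AssumeRole.
        return "base-creds"

    def outer_fetch():
        nonlocal started
        with lock:
            started += 1
            if started >= needed:
                gate.set()
        gate.wait()  # hold until the outer fetches occupy their workers
        inner = pool.submit(inner_fetch)  # queued on the SAME pool
        try:
            inner.result(timeout=wait_timeout)
            return False
        except FutureTimeout:
            # The real flow would block here forever; the timeout lets the sketch exit.
            return True

    outers = [pool.submit(outer_fetch) for _ in range(n_callers)]
    starved = any([f.result() for f in outers])
    pool.shutdown()
    return starved
```

With the default pool size of 4, `nested_fetch_starves(3)` leaves one worker free to run the inner fetches, while `nested_fetch_starves(4)` fills every worker with an outer fetch, so the inner fetches never leave the queue.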

@dchelimsky
Contributor

Thank you @kennyjwilli for the thorough report.

@dchelimsky changed the title from "Deadlock with parallel credentials fetch" to "Deadlock with concurrent credentials fetch" on Mar 27, 2020
@dchelimsky
Contributor

Fixed in 0.8.456
