Custom endpoint for SQS (VPC Endpoint) does not work. #2114

Closed
hhk1989 opened this issue Sep 3, 2019 · 20 comments · Fixed by #2156

Comments

@hhk1989 hhk1989 commented Sep 3, 2019

Issue description

In my account, I have a VPC Interface Endpoint for SQS and a lambda function in the same VPC. The lambda job (Ruby 2.5 runtime) tries to send an SQS message to an SQS queue that the lambda execution role has permissions to, by using the VPC Endpoint. For this, I do something like the following

sqs_client = Aws::SQS::Client.new(credentials: ####, endpoint: 'https://vpce-<id>.sqs.us-west-2.vpce.amazonaws.com')
sqs_client.send_message(queue_url: 'https://sqs.us-west-2.amazonaws.com/<account_id>/test-queue', message_body: 'test')

I notice that the lambda always times out at the send_message call.

I notice that I am able to make other calls like STS (for assume_role) and SNS (for publish), from the lambda using their respective VPC Endpoints for STS and SNS. The VPC Endpoints are all configured the same (security groups, etc.) using terraform code.

I also noticed that I am able to make the send_message call from an AWS SDK CLI, in an EC2 instance in the same VPC. So, the following does work.

aws sqs send-message --queue-url 'https://sqs.us-west-2.amazonaws.com/<account_id>/test-queue' --endpoint-url 'https://vpce-<id>.sqs.us-west-2.vpce.amazonaws.com' --message-body "this works"

This makes me believe that setting the custom endpoint for SQS is what is not working here.

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-sqs 1.11.0

Version of Ruby, OS environment

Ruby 2.5 lambda runtime.
Amazon Linux (https://docs.aws.amazon.com/lambda/latest/dg/lambda-ruby.html)

Contributor

@cjyclaire cjyclaire commented Sep 4, 2019

Thanks for the information!

Some quick follow-up questions that might help with debugging:

  1. Could you try setting :http_wire_trace to true on the client so that HTTP wire log information is available in the Lambda logs?
  2. The CLI runs outside the Lambda environment. Do you also see the timeout when making the call with the Ruby SDK outside of Lambda (for example, from a local script)? This can help determine whether the problem is related to the Lambda environment configuration.

Author

@hhk1989 hhk1989 commented Sep 4, 2019

Hello, thanks for your response.

  1. I set :http_wire_trace as indicated and got the following response.
sqs_client = Aws::SQS::Client.new(
        credentials: Aws::Credentials.new(
            resp.credentials.access_key_id,
            resp.credentials.secret_access_key,
            resp.credentials.session_token,
        ),
        endpoint: 'https://vpce-<id>.sqs.us-west-2.vpce.amazonaws.com',
        http_wire_trace: true
)

sqs_client.send_message({
        queue_url: queue_url,
        message_body: 'test'
})

Output: opening connection to sqs.us-west-2.amazonaws.com:443....

So, it looks like the Ruby SDK is not honoring the custom endpoint and is connecting to sqs.us-west-2.amazonaws.com instead.

  2. Yes, it also timed out when I ran it from the Ruby SDK outside of Lambda. I tried again with aws-sdk-sqs (1.22.0) and http_wire_trace: true and saw the same output: opening connection to sqs.us-west-2.amazonaws.com:443...

Contributor

@cjyclaire cjyclaire commented Sep 4, 2019

Hmm, that's weird, I cannot reproduce this locally.

[image]

As you can see there, it's making the call to foo.com.

Contributor

@cjyclaire cjyclaire commented Sep 4, 2019

Ahh I see, it's for :queue_url, we have the plugin here for updating it

Contributor

@cjyclaire cjyclaire commented Sep 4, 2019

A bit of clarification here: when :queue_url is provided, the SDK uses that information to take care of the endpoint and signing region under the hood.
Quick question: is your queue also in the same VPC? Could you try updating the queue URL to use the endpoint pattern and let me know if it works?
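
For readers following along, one way to build a queue URL "with the endpoint pattern" is to swap only the host and keep the account ID and queue name in the path. This is a minimal sketch using Ruby's standard URI library. The helper name `with_endpoint_host`, the endpoint ID `vpce-0abc`, and the account ID `123456789012` are placeholders invented for illustration and do not come from the SDK or from this thread.

```ruby
require 'uri'

# Hypothetical helper (not part of the AWS SDK): returns the queue URL with
# its host replaced by the host of the given endpoint URL. The path
# (account ID and queue name) is left untouched.
def with_endpoint_host(queue_url, endpoint_url)
  queue = URI.parse(queue_url)
  queue.host = URI.parse(endpoint_url).host
  queue.to_s
end

# Placeholder values, assumed for illustration only.
public_queue_url = 'https://sqs.us-west-2.amazonaws.com/123456789012/test-queue'
vpce_endpoint    = 'https://vpce-0abc.sqs.us-west-2.vpce.amazonaws.com'

puts with_endpoint_host(public_queue_url, vpce_endpoint)
# prints https://vpce-0abc.sqs.us-west-2.vpce.amazonaws.com/123456789012/test-queue
```

Whether the SDK then signs the request correctly for such a URL is a separate question, which the replies below get into.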

Author

@hhk1989 hhk1989 commented Sep 4, 2019

So, I tried replacing the https://sqs.us-west-2.amazonaws.com with the https endpoint of the SQS VPC Interface endpoint and I get the following error.

{
  "errorMessage": "Credential should be scoped to a valid region, not 'sqs'. ",
  "errorType": "Function<Aws::SQS::Errors::SignatureDoesNotMatch>",
  "stackTrace": [
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/plugins/raise_response_errors.rb:15:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/idempotency_token.rb:17:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/param_converter.rb:24:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/response_paging.rb:10:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/plugins/response_target.rb:23:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/request.rb:70:in `send_request'",
    "/var/runtime/gems/aws-sdk-sqs-1.11.0/lib/aws-sdk-sqs/client.rb:1715:in `send_message'",
    "/var/task/lambda_function.rb:34:in `lambda_handler'"
  ]
}

Even when I try just a sqs_client.list_queues (where queue_url is not a param), I get

{
  "errorMessage": "Access to the resource https://vpce-<id>.sqs.us-west-2.vpce.amazonaws.com/ is denied.",
  "errorType": "Function<Aws::SQS::Errors::AccessDenied>",
  "stackTrace": [
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/plugins/raise_response_errors.rb:15:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/idempotency_token.rb:17:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/param_converter.rb:24:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/aws-sdk-core/plugins/response_paging.rb:10:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/plugins/response_target.rb:23:in `call'",
    "/var/runtime/gems/aws-sdk-core-3.47.0/lib/seahorse/client/request.rb:70:in `send_request'",
    "/var/runtime/gems/aws-sdk-sqs-1.11.0/lib/aws-sdk-sqs/client.rb:1194:in `list_queues'",
    "/var/task/lambda_function.rb:38:in `lambda_handler'"
  ]
}

Btw, let me know if you need more information from my end (I noticed that the needs-response tag is still attached to this issue). Thanks for your help with this.

Author

@hhk1989 hhk1989 commented Sep 4, 2019

We are planning on using the Private DNS feature of VPC Interface Endpoints (https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html), so this is not an issue for callers from within the VPC where the endpoints are located. (since we don't have to set the custom endpoint anyway in this case).

But it is potentially a problem for us for SQS callers from our DC, as we are still trying to figure out how to work with the private DNS feature and support both callers who want to go through the Private Link and those who just want to use the public SQS endpoint.

@Zpa48 Zpa48 commented Sep 6, 2019

We have the same problem. We are unable to access the SQS VPC endpoint from our on-premises servers via Direct Connect.
When queue_url is set, the SDK's queue_urls.rb plugin extracts the wrong region from the URL. Instead of the region, it extracts the string:

sqs

Then we get the following error:
Aws::SQS::Errors::ServiceError ... retrying SQS request with exponential backoff {:queue=>"sqs-endpoint-testing", :sleep_time=>1, :error=>#<Aws::SQS::Errors::SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'sqs'. >} .
If I change the code in queue_urls.rb to if region = url.to_s.split('.')[2], it works as expected.
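
The behavior described above can be reproduced without the SDK. The sketch below assumes, based on this comment, that the plugin picks the region by splitting the queue URL on dots and taking a fixed index; the helper name `region_by_index` and the sample IDs are made up for illustration.

```ruby
# Hypothetical stand-in for the region lookup described in the comment:
# split the URL on '.' and take the piece at a fixed position.
def region_by_index(queue_url, index)
  queue_url.to_s.split('.')[index]
end

# Placeholder URLs, assumed for illustration only.
public_url = 'https://sqs.us-west-2.amazonaws.com/123456789012/test-queue'
vpce_url   = 'https://vpce-0abc.sqs.us-west-2.vpce.amazonaws.com/123456789012/test-queue'

puts region_by_index(public_url, 1) # prints us-west-2
puts region_by_index(vpce_url, 1)   # prints sqs (the value rejected in the error)
puts region_by_index(vpce_url, 2)   # prints us-west-2
puts region_by_index(public_url, 2) # prints amazonaws
```

The last line suggests why switching the index to 2 is only a local patch: it fixes VPC endpoint URLs but would pick the wrong piece for the public URL format.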

Contributor

@cjyclaire cjyclaire commented Sep 17, 2019

Thanks for the information! In the long term, we should probably provide an option to disable the plugin per client. As a quick workaround for the signing region, could you try setting :sigv4_region on the client as well, such as sigv4_region: 'us-west-2'?

Would you mind trying that and letting me know if it works? Thoughts?

@Zpa48 Zpa48 commented Sep 19, 2019

If I remove the changes from queue_urls.rb (#2114 (comment)) and set the sigv4_region for the client like:
aws_options_hash[:sigv4_region] = "eu-west-1"
@logger.info("AWS Options", :aws_options_hash => aws_options_hash)
Aws::SQS::Client.new(aws_options_hash)

Log output:
AWS Options {:aws_options_hash=>{:credentials=>#<Aws::Credentials access_key_id="XXXXX">, :sigv4_region=>"eu-west-1", :region=>"eu-west-1", :endpoint=>"https://vpce-yyyyyyy-zzzzz.sqs.eu-west-1.vpce.amazonaws.com"}}
I still get the same error:
Aws::SQS::Errors::ServiceError ... retrying SQS request with exponential backoff {:queue=>"sqs-endpoint-testing-onprem", :sleep_time=>1, :error=>#<Aws::SQS::Errors::SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'sqs'. >}

Maybe I'm not setting the sigv4_region in the proper way.
Note: I'm using ruby SDK via the logstash SQS input plugin, just made minor changes to the plugin.

Contributor

@cjyclaire cjyclaire commented Oct 17, 2019

Apologies for the late reply, and thanks for the information.
I think the SDK should offer an easy option for disabling this plugin in different environments; I'll track this on our list.

Meanwhile, a quick hack as a workaround:

# klient is Aws::SQS::Client itself, so this removes the plugin
# for every SQS client created afterwards in this process.
klient = Aws::SQS::Client
klient.remove_plugin(Aws::SQS::Plugins::QueueUrls)
sqs_client = klient.new(
  # my options
)

@mattbooks mattbooks commented Oct 28, 2019

That plugin always sets the endpoint to the queue_url. I'm a little confused about why this is categorized as a feature request: the endpoint configuration (which is documented) does not work at all unless you jump through hoops to disable this plugin.

Contributor

@cjyclaire cjyclaire commented Oct 28, 2019

This is a feature request for making this work more easily in a Lambda environment with VPC settings or a proxy, where the client is not talking to the default AWS endpoint, etc. If you don't have trouble using the default behavior for SQS currently, you don't need to change anything either :)

@walterking walterking commented Oct 29, 2019

But when you aren't using the default endpoint (such as in a datacenter, where we can't), the behavior is broken because it ignores what you set, hence it should be a bug.

Contributor

@cjyclaire cjyclaire commented Oct 29, 2019

@walterking @mattbooks Apologies for misinterpreting your original comments. I thought you were asking whether you would need to skip this. It was originally tagged as a feature request because it concerned special environments where the default AWS endpoint is not used.

I fully understand that a broken endpoint experience is frustrating. I have opened a PR to address this issue, which is now under review.

Author

@hhk1989 hhk1989 commented Oct 29, 2019

Awesome, thanks @cjyclaire. We did see a similar issue in using the SQS SDK in Java - looks like this is the equivalent plugin there. (https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-sqs/src/main/java/com/amazonaws/services/sqs/QueueUrlHandler.java)

Do you need me to open a new issue there or is this something that will be fixed in Java as well ?

Contributor

@cjyclaire cjyclaire commented Oct 29, 2019

@hhk1989 Thanks so much for your patience! I'd appreciate it if you could open an issue in their repo so they can be notified in a timely manner :)
I believe this is a cross-SDK and CLI behavior; I'll bring the message to the team as well.

Author

@hhk1989 hhk1989 commented Oct 29, 2019

Thanks @cjyclaire. As referenced, I have opened aws/aws-sdk-java#2135 for Java. It's interesting that the Go SDK does seem to work correctly as intended :)

Contributor

@cjyclaire cjyclaire commented Oct 29, 2019

Thanks! Good to know, thanks for the update also :D

Contributor

@cjyclaire cjyclaire commented Nov 7, 2019

With more information gathered from an AWS support case, PR #2156 addresses the fix.
The SDK should support the VPC endpoint queue URL pattern and parse the region correctly. After the fix in that PR, you should be able to do the following (instead of hardcoding the endpoint per client):

sqs = Aws::SQS::Client.new(region: 'my-region')
sqs.send_message(
  queue_url: "https://vpce-xxx-yyy.sqs.my-region.vpce .....",
  ...
)
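
To illustrate what "correct region parsing" could look like for both URL shapes discussed in this thread, here is a rough sketch. It is not the code from PR #2156; the helper name `region_from_queue_url`, the sample IDs, and the rule "the region is the host label right after `sqs`" are assumptions made for this example only.

```ruby
require 'uri'

# Hypothetical helper, NOT the implementation from PR #2156.
# Assumption: in both the public and the VPC endpoint host names,
# the region is the label that directly follows the 'sqs' label.
def region_from_queue_url(queue_url)
  labels = URI.parse(queue_url).host.to_s.split('.')
  sqs_index = labels.index('sqs')
  return nil if sqs_index.nil? # unknown host shape: let the caller decide

  labels[sqs_index + 1]
end

# Placeholder URLs, assumed for illustration only.
public_url = 'https://sqs.us-west-2.amazonaws.com/123456789012/test-queue'
vpce_url   = 'https://vpce-0abc.sqs.us-west-2.vpce.amazonaws.com/123456789012/test-queue'
local_url  = 'http://localhost:9324/queue/test-queue'

puts region_from_queue_url(public_url)     # prints us-west-2
puts region_from_queue_url(vpce_url)       # prints us-west-2
p region_from_queue_url(local_url)         # prints nil
```

Returning nil for unrecognized hosts leaves room for the caller to fall back to the client's configured region instead of guessing. Other host formats would need their own handling.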