Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQS calls times out on EC2 with .NET, not boto3 #1339

Closed
esomir opened this issue Jul 9, 2019 · 6 comments
Closed

SQS calls times out on EC2 with .NET, not boto3 #1339

esomir opened this issue Jul 9, 2019 · 6 comments
Labels
bug This issue is a bug. closed-for-staleness module/sdk-core response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@esomir
Copy link

esomir commented Jul 9, 2019

I'm using the .NET AWS SDK for receiving messages on SQS, and I am experiencing some issues.
The call consistently times out on my EC2 instance, but works fine locally.

My EC2 uses IAM role based authentication on the instance whereas locally I use AWS tokens. I've tried AWS tokens on the AmazonSQSClient as well on the EC2, and it's still timing out. So I don't think it's an issue of receiving the permissions.

I run .NET Core in a Docker container, and it has access to the metadata endpoint for getting the IAM info:

root@f5772e5e1b60:/# curl http://169.254.169.254/latest/meta-data/iam/info
{
  "Code" : "Success",
  "LastUpdated" : "2019-07-09T06:47:54Z",
  "InstanceProfileArn" : "arn:aws:iam::XXXXXXXX:instance-profile/SQSSendSubscribeTrainingWebsite",
  "InstanceProfileId" : "XXXXXXX"
}

On the EC2, when I use boto3 do run the same operations, it works perfectly. So it's isolated to the .NET Core implementation.

.NET Core (works locally,but not on EC2):

static async Task Main(string[] args)
        {
            var sqsClient = new AmazonSQSClient("","",new AmazonSQSConfig()
            {
                RegionEndpoint = RegionEndpoint.EUWest1,
                ServiceURL = "http://sqs.eu-west-1.amazonaws.com"
            });
            
            var response = await sqsClient.ReceiveMessageAsync(new ReceiveMessageRequest()
            {
                QueueUrl = "https://sqs.eu-west-1.amazonaws.com/XXXXXXXXXXX/training-tool-extraction",
                AttributeNames = new List<string>(){"SentTimestamp"},
                MessageAttributeNames = new List<string>(){"All"}
            });
            
            Console.WriteLine(response.HttpStatusCode);
        }

Output:

Unhandled Exception: System.OperationCanceledException: The operation was canceled.
   at System.Net.Http.HttpClient.HandleFinishSendAsyncError(Exception e, CancellationTokenSource cts)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken) in D:\JenkinsWorkspaces\trebuchet-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\_mobile\HttpRequestMessageFactory.cs:line 524
   at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext) in D:\JenkinsWorkspaces\trebuchet-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\HttpHandler.cs:line 175
   at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext) in D:\JenkinsWorkspaces\trebuchet-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\Handlers\EndpointDiscoveryHandler.cs:line 79
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext) in D:\JenkinsWorkspaces\trebuchet-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\Handlers\CredentialsRetriever.cs:line 98
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext) in D:\JenkinsWorkspaces\trebuchet-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\RetryHandler\RetryHandler.cs:line 137
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
   at test.Program.Main(String[] args) in /test/Program.cs:line 22
   at test.Program.<Main>(String[] args)

Boto3 (works locally and on EC2):

sqs = boto3.client('sqs', region_name='eu-west-1')

queue_url = 'https://sqs.eu-west-1.amazonaws.com/XXXXXXXXXX/training-tool-extraction'

response = sqs.receive_message(
    QueueUrl=queue_url,
    AttributeNames=[
        'SentTimestamp'
    ],
    MessageAttributeNames=[
        'All'
    ]
)

print(response)

Output:

{'ResponseMetadata': {'RequestId': 'XXXX-XXX-XXXX-XXXX-XXXX', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'XXXX-XXXX-XXXX-XXXXX-XXXXX', 'date': 'Tue, 09 Jul 2019 07:09:23 GMT', 'content-type': 'text/xml', 'content-length': '240'}, 'RetryAttempts': 0}}

Any ideas on what can cause this? I don't think it's an issue resolving the URL, as I have it working in Boto3?

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>netcoreapp2.2</TargetFramework>
    </PropertyGroup>

    <ItemGroup>
      <PackageReference Include="AWSSDK.SQS" Version="3.3.100.43" />
    </ItemGroup>

</Project>
@klaytaybai
Copy link
Contributor

I was hoping this was a VPC Endpoint issue, but I don't think it is if you are succeeding with Boto. Which EC2 instance type are you using?

@klaytaybai klaytaybai added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jul 9, 2019
@esomir
Copy link
Author

esomir commented Jul 10, 2019

I was hoping this was a VPC Endpoint issue, but I don't think it is if you are succeeding with Boto. Which EC2 instance type are you using?

I was hoping that too, and I can't say for sure that it isn't an error from our side. We have VPC endpoints for SQS, so that should be set up correctly.

Our deployments are using AWS Fargate, but once I deployed to our staging environment I noticed that these calls began to fail. So I set up an EC2 (t3.medium) to debug it further (i.e. it's not just a problem on our EC2 instance).

I've tried both with the implicit auth (through the metadata endpoint and the EC2 IAM role) as well as providing access and secret keys. Both fails.

I've also tried using several of the VPC SQS endpoints we have.

Again, everything works perfectly local, so we only experienced the problem when deployed to AWS.

I'm not sure if boto3 does something else behind the scenes, but it really seems odd that this works.

@esomir
Copy link
Author

esomir commented Jul 10, 2019

I'm not sure what boto3 does, but it gets a valid address for the endpoint from somewhere.

Pinging the two VPC endpoints doesn't result in anything:

$ ping vpce-XXXXXXXX-XXXXXXXX.sqs.eu-west-1.vpce.amazonaws.com
PING vpce-XXXXXXXX-XXXXXXXX.sqs.eu-west-1.vpce.amazonaws.com (10.0.0.4) 56(84) bytes of data.
^C
--- vpce-XXXXXXXX-XXXXXXXX.sqs.eu-west-1.vpce.amazonaws.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3053ms

$ ping sqs.eu-west-1.amazonaws.com
PING sqs.eu-west-1.amazonaws.com (10.0.0.4) 56(84) bytes of data.
^C
--- sqs.eu-west-1.amazonaws.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2024ms

I tried sniffing the boto3 calls, and it immediately calls an IP address. This is a valid SQS endpoint. How does it resolve this?

Running the aws sqs cli command with this IP works perfectly:

$ aws sqs send-message --region eu-west-1 --endpoint-url http://XX.XX.XXX.XXX/ --queue-url http://XX.XX.XXX.XXX/XXXXXXXX/training-tool-extraction --message-body "Hello from Amazon SQS."

{
    "MD5OfMessageBody": "XXXXXXXX", 
    "MessageId": "XXXX-XXXX-XXXX-XXXX-XXXX"
}

I still cannot run the .NET call with that IP. It throws: Amazon.SQS.AmazonSQSException: Credential should be scoped to a valid region, not 'us-east-1'.

If I use the legacy endpoint (eu-west-1.queue.amazonaws.com), it works PERFECTLY in the .NET SDK:

namespace test
{
    class Program
    {
        static async Task Main(string[] args)
        {
            Console.WriteLine("starting...");
            var sqsClient = new AmazonSQSClient(new AmazonSQSConfig()
            {
                RegionEndpoint = RegionEndpoint.EUWest1,
                ServiceURL = "https://eu-west-1.queue.amazonaws.com"
            });

            var response = await sqsClient.ReceiveMessageAsync(new ReceiveMessageRequest()
            {
                QueueUrl = "https://eu-west-1.queue.amazonaws.com/XXXXXXXXXX/training-tool-extraction",
                AttributeNames = new List<string>(){"SentTimestamp"},
                MessageAttributeNames = new List<string>(){"All"}
            });

            Console.WriteLine(response.HttpStatusCode);
        }
    }
}

@klaytaybai
Copy link
Contributor

Thanks for that extra info. I'm not sure why the legacy endpoint would be working and the new format isn't, but that should be very helpful in the investigation.

@klaytaybai klaytaybai added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Jul 10, 2019
@klaytaybai klaytaybai removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Dec 2, 2019
@ashishdhingra ashishdhingra added bug This issue is a bug. module/sdk-custom needs-triage This issue or PR still needs to be triaged. module/sdk-core and removed module/sdk-custom labels Aug 7, 2020
@hunanniu hunanniu added the B label Oct 15, 2020
@ashishdhingra
Copy link
Contributor

Hi @esomir,

Good afternoon.

Please let us know the EC2 instance load and error frequency while you are getting this error. There is already a bug #1438 linked to this issue which talks about similar issue.

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Oct 15, 2020
@github-actions
Copy link

This issue has not recieved a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 30, 2020
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Nov 6, 2020
@github-actions github-actions bot closed this as completed Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. closed-for-staleness module/sdk-core response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
Projects
None yet
Development

No branches or pull requests

4 participants