Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-sigv4 does not work with URLs containing "=" and other special characters #13754

Closed
ashtuchkin opened this issue May 23, 2024 · 3 comments
Closed

Comments

@ashtuchkin
Copy link

I did this

I'm trying to use --aws-sigv4 to download S3 objects containing "=" in their key. Example data/asset_id=my-asset/dt=2024-05-22/data.parquet - this is called hive partitioning scheme and is popular in data lakes.

Here's how I call it:

BUCKET="lasca-prod"
KEY="data/asset_id=my-asset/dt=2024-05-22/data.parquet"
curl --aws-sigv4 "aws:amz:us-west-2:s3" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
    -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
    "https://s3.us-west-2.amazonaws.com/$BUCKET/$KEY"

Curl 8.8 returns the following error (formatted for readability):

<?xml version="1.0" encoding="UTF-8"?>
<Error>
    <Code>SignatureDoesNotMatch</Code>
    <Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
    <AWSAccessKeyId>snip</AWSAccessKeyId>
    <StringToSign>AWS4-HMAC-SHA256
20240523T061547Z
20240523/us-west-2/s3/aws4_request
ce199afb9ea57628628a3d3254934e1ee480977872d06b3da40baa5ae4d2f9a7</StringToSign>
    <SignatureProvided>snip</SignatureProvided>
    <StringToSignBytes>snip</StringToSignBytes>
    <CanonicalRequest>GET
/lasca-prod/data/asset_id%3Dmy-asset/dt%3D2024-05-22/data.parquet

host:s3.us-west-2.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20240523T061547Z
x-amz-security-token:snip 

host;x-amz-content-sha256;x-amz-date;x-amz-security-token
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</CanonicalRequest>
  ...
</Error>

Note that the path in CanonicalRequest is supposed to be quoted ("=" changed to "%3D"), whereas curl is not quoting it (confirmed by debug tracing). AWS docs say we need to apply their special URIEncode function to all the components of the canonical request.

I've created a basic patch to see if this is the actual problem and after applying it, the signature is accepted by AWS S3:

diff --git a/lib/http_aws_sigv4.c b/lib/http_aws_sigv4.c
index 98cc033..18567b4 100644
--- a/lib/http_aws_sigv4.c
+++ b/lib/http_aws_sigv4.c
@@ -423,6 +423,19 @@ static int compare_func(const void *a, const void *b)

 #define MAX_QUERYPAIRS 64

+static CURLcode canon_path(struct Curl_easy *data, const char *path, struct dynbuf *dq)
+{
+  CURLcode result = CURLE_OK;
+  for (const char *p = path; *p && !result; p++) {
+    if (*p == '=') {
+        result = Curl_dyn_addn(dq, "%3D", 3);
+    } else {
+        result = Curl_dyn_addn(dq, p, 1);
+    }
+  }
+  return result;
+}
+
 static CURLcode canon_query(struct Curl_easy *data,
                             const char *query, struct dynbuf *dq)
 {
@@ -540,6 +553,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   struct dynbuf canonical_headers;
   struct dynbuf signed_headers;
   struct dynbuf canonical_query;
+  struct dynbuf canonical_path;
   char *date_header = NULL;
   Curl_HttpReq httpreq;
   const char *method = NULL;
@@ -569,6 +583,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   /* we init those buffers here, so goto fail will free initialized dynbuf */
   Curl_dyn_init(&canonical_headers, CURL_MAX_HTTP_HEADER);
   Curl_dyn_init(&canonical_query, CURL_MAX_HTTP_HEADER);
+  Curl_dyn_init(&canonical_path, CURL_MAX_HTTP_HEADER);
   Curl_dyn_init(&signed_headers, CURL_MAX_HTTP_HEADER);

   /*
@@ -696,6 +711,10 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
   date[sizeof(date) - 1] = 0;

   result = canon_query(data, data->state.up.query, &canonical_query);
+  if(result)
+    goto fail;
+
+  result = canon_path(data, data->state.up.path, &canonical_path);
   if(result)
     goto fail;
   result = CURLE_OUT_OF_MEMORY;
@@ -708,7 +727,7 @@ CURLcode Curl_output_aws_sigv4(struct Curl_easy *data, bool proxy)
                   "%s\n" /* SignedHeaders */
                   "%.*s",  /* HashedRequestPayload in hex */
                   method,
-                  data->state.up.path,
+                  Curl_dyn_ptr(&canonical_path),
                   Curl_dyn_ptr(&canonical_query) ?
                   Curl_dyn_ptr(&canonical_query) : "",
                   Curl_dyn_ptr(&canonical_headers),

This seems similar to a known problem 16.1 aws-sigv4 does not sign requests with * correctly. I see that there was a PR to fix it, but it's now dead.

My intent with creating this issue is 1) to say that the scope of the bug is not limited to "*", but applies to any URL with any characters except alphanumeric and -, ., _ and ~, 2) hopefully resurrect the push to fix it.

I expected the following

The command above should work as-is.

curl/libcurl version

curl 8.8.0 (x86_64-pc-linux-gnu) libcurl/8.8.0 OpenSSL/3.3.0 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 c-ares/1.28.1 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.62.1 nghttp3/1.3.0
Release-Date: 2024-05-22
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Largefile libz NTLM PSL SSL threadsafe TLS-SRP TrackMemory UnixSockets zstd

operating system

Linux 5.15.0-1058-aws #64~20.04.1-Ubuntu SMP Tue Apr 9 11:12:27 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

@ashtuchkin
Copy link
Author

ashtuchkin commented May 23, 2024

Note, I've also tried to quote the path manually, e.g. using KEY="data/asset_id%3Dmy-asset/dt%3D2024-05-22/data.parquet" and is was signed correctly without any changes. Unfortunately, AWS S3 then says that there's no file.

@bagder
Copy link
Member

bagder commented May 24, 2024

/cc @outscale-mgo

outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
outscale-mgo added a commit to outscale-mgo/curl that referenced this issue May 28, 2024
I refactor canon_query, so I could use the encoding part
of the function to use it in the path.

As the path doesn't encode '/', but encode '=', I had to
add some conditions to know If I was doing the query or path encoding.

Also, instead of adding a `bool in_path` variable, I use `bool *found_equals`
to know if the function was called for the query or path,
as found_equals is used only in query_encoding.

fix: curl#13754

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
@bagder bagder closed this as completed in 768909d May 29, 2024
@ashtuchkin
Copy link
Author

Thank you! Confirmed master branch works for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants