New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty Accept-Encoding: Header doesn't override CURLOPT_ACCEPT_ENCODING on 2nd request with reused handle #785

Closed
rcanavan opened this Issue Apr 29, 2016 · 15 comments

Comments

Projects
None yet
3 participants
@rcanavan

The test program below (modified from examples/httpcustomheader.c) attempts to send 2 requests to a host, and reuse the curl handle. On the second request, it attempts to disable CURLOPT_ACCEPT_ENCODING by explicitly setting the Accept-Encoding:-header without a value.

This works as expected (i.e. no Accept-Encoding header is sent in the request) only if a "fresh" handle is used or no previous request with that handle has ever actually sent a request with the Accept-Encoding header set via CURLOPT_ACCEPT_ENCODING.

#include <stdio.h>
#include <curl/curl.h>
CURL *curl;

void request(char * url, int compress)
{
  CURLcode res;

  if(curl) {
    struct curl_slist *chunk = NULL;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    res = curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
    if (compress == 0) {
        chunk = curl_slist_append(chunk, "Accept-Encoding: ");
    }
    res = curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
    //curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

    res = curl_easy_perform(curl);
    if(res != CURLE_OK) fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));

    curl_easy_reset(curl);
    curl_slist_free_all(chunk);
  }
}

int main(void)
{
  curl_global_init(CURL_GLOBAL_ALL);
  curl = curl_easy_init();
  request("http://localhost:14080/headers.php", 1);
  request("http://localhost:14080/headers.php", 0);
  curl_easy_cleanup(curl);
  return 0;
}

@bagder bagder added the HTTP label Apr 29, 2016

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 29, 2016

Member

The problem here is that CURLOPT_ACCEPT_ENCODING set to an empty string has a meaning other than please send a blank header. The man page says:

If a zero-length string is set like "", then an Accept-Encoding: header containing all built-in supported encodings is sent.

So you're really asking for two mutually exclusive things in that request and it's not immediately clear which of the two instructions that should override the other...

Member

bagder commented Apr 29, 2016

The problem here is that CURLOPT_ACCEPT_ENCODING set to an empty string has a meaning other than please send a blank header. The man page says:

If a zero-length string is set like "", then an Accept-Encoding: header containing all built-in supported encodings is sent.

So you're really asking for two mutually exclusive things in that request and it's not immediately clear which of the two instructions that should override the other...

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 1, 2016

Member

Could we perhaps make this clearer in the documentation?

Member

bagder commented May 1, 2016

Could we perhaps make this clearer in the documentation?

@bagder bagder added the documentation label May 1, 2016

@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 1, 2016

Contributor

Perhaps it would help to mention that one may set CURLOPT_ACCEPT_ENCODING to a NULL pointer in order to disable it (both sending the header and auto decoding) in case it was previously enabled on the given easy handle.

BTW, I find the statement "identity, which does nothing" to be a bit misleading, as setting 'identity' does in fact enable auto decoding (e.g. if the header was overridden, or if the server sends gzip despite what the client has specified) and regardless, sending the header with 'identity' only isn't exactly nothing.
Also worth mentioning that the content length header received may not match the actual length of the data in case auto decoding is enabled (I reached the above understanding by troubleshooting a customer issue with our app).

Contributor

frenche commented May 1, 2016

Perhaps it would help to mention that one may set CURLOPT_ACCEPT_ENCODING to a NULL pointer in order to disable it (both sending the header and auto decoding) in case it was previously enabled on the given easy handle.

BTW, I find the statement "identity, which does nothing" to be a bit misleading, as setting 'identity' does in fact enable auto decoding (e.g. if the header was overridden, or if the server sends gzip despite what the client has specified) and regardless, sending the header with 'identity' only isn't exactly nothing.
Also worth mentioning that the content length header received may not match the actual length of the data in case auto decoding is enabled (I reached the above understanding by troubleshooting a customer issue with our app).

bagder added a commit that referenced this issue May 1, 2016

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 1, 2016

Member

Ok, I committed an update to this man page, have a look at CURLOPT_ACCEPT_ENCODING for the new version.

Better? Anything lacking now?

Member

bagder commented May 1, 2016

Ok, I committed an update to this man page, have a look at CURLOPT_ACCEPT_ENCODING for the new version.

Better? Anything lacking now?

frenche added a commit to frenche/curl that referenced this issue May 1, 2016

Follow up clarification on CURLOPT_ACCEPT_ENCODING
Clarify that CURLOPT_ACCEPT_ENCODING has to be explicitly disabled
only if it was previously enabled (as the default is NULL).

Mention possible content-length mismatch with sum of bytes reported
by write callbacks when auto decoding is enabled.

See #785
@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 1, 2016

Contributor

Yea, but take a look at little adjustments: frenche@7a329b8

Contributor

frenche commented May 1, 2016

Yea, but take a look at little adjustments: frenche@7a329b8

bagder added a commit that referenced this issue May 1, 2016

CURLOPT_ACCEPT_ENCODING.3: Follow-up clarification
Mention possible content-length mismatch with sum of bytes reported
by write callbacks when auto decoding is enabled.

See #785
@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 1, 2016

Member

Thanks! I merged the second part of that change. I'm not comfortable with adding the "if it was set to something else before you can disable it again with" text. That is sort of implied if you need to disable it, and I would rather not add that extra text for every man page description we have that tells how to switch an option back to default state.

Member

bagder commented May 1, 2016

Thanks! I merged the second part of that change. I'm not comfortable with adding the "if it was set to something else before you can disable it again with" text. That is sort of implied if you need to disable it, and I would rather not add that extra text for every man page description we have that tells how to switch an option back to default state.

@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 2, 2016

Contributor

Make sense, thanks!

Contributor

frenche commented May 2, 2016

Make sense, thanks!

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 2, 2016

Member

@rcanavan, you happy with this explanation or are you looking for something else?

Member

bagder commented May 2, 2016

@rcanavan, you happy with this explanation or are you looking for something else?

@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 2, 2016

Contributor

Hmm, I've looked again at @rcanavan's example and gave it a quick run, and the behavior actually seem strange indeed.
It looks like if a connection is reused then setting empty "Accept-Encoding:" does not disable sending of the header (if I forbid reused it'd work as expected).
I'll try to take a closer look later on.

Contributor

frenche commented May 2, 2016

Hmm, I've looked again at @rcanavan's example and gave it a quick run, and the behavior actually seem strange indeed.
It looks like if a connection is reused then setting empty "Accept-Encoding:" does not disable sending of the header (if I forbid reused it'd work as expected).
I'll try to take a closer look later on.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 2, 2016

Member

But again, you can't both set CURLOPT_ACCEPT_ENCODING (to non-NULL) and ask for disabling the header.

Member

bagder commented May 2, 2016

But again, you can't both set CURLOPT_ACCEPT_ENCODING (to non-NULL) and ask for disabling the header.

@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 2, 2016

Contributor

But the doc of CURLOPT_HTTPHEADER says:
If you add a header that is otherwise generated and used by libcurl internally, your added one will be used instead. If you add a header with no content as in 'Accept:' (no data on the right side of the colon), the internally used header will get disabled.

And it does work as one would expect if a connection is not reused.

Contributor

frenche commented May 2, 2016

But the doc of CURLOPT_HTTPHEADER says:
If you add a header that is otherwise generated and used by libcurl internally, your added one will be used instead. If you add a header with no content as in 'Accept:' (no data on the right side of the colon), the internally used header will get disabled.

And it does work as one would expect if a connection is not reused.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 2, 2016

Member

Perhaps something like this makes it behave more deterministic?

diff --git a/lib/http.c b/lib/http.c
index f3805cc..2a7280d 100644
--- a/lib/http.c
+++ b/lib/http.c
@@ -1915,10 +1915,14 @@ CURLcode Curl_http(struct connectdata *conn, bool *done)
     conn->allocptr.accept_encoding =
       aprintf("Accept-Encoding: %s\r\n", data->set.str[STRING_ENCODING]);
     if(!conn->allocptr.accept_encoding)
       return CURLE_OUT_OF_MEMORY;
   }
+  else {
+    Curl_safefree(conn->allocptr.accept_encoding);
+    conn->allocptr.accept_encoding = NULL;
+  }

 #ifdef HAVE_LIBZ
   /* we only consider transfer-encoding magic if libz support is built-in */

   if(!Curl_checkheaders(conn, "TE:") &&
Member

bagder commented May 2, 2016

Perhaps something like this makes it behave more deterministic?

diff --git a/lib/http.c b/lib/http.c
index f3805cc..2a7280d 100644
--- a/lib/http.c
+++ b/lib/http.c
@@ -1915,10 +1915,14 @@ CURLcode Curl_http(struct connectdata *conn, bool *done)
     conn->allocptr.accept_encoding =
       aprintf("Accept-Encoding: %s\r\n", data->set.str[STRING_ENCODING]);
     if(!conn->allocptr.accept_encoding)
       return CURLE_OUT_OF_MEMORY;
   }
+  else {
+    Curl_safefree(conn->allocptr.accept_encoding);
+    conn->allocptr.accept_encoding = NULL;
+  }

 #ifdef HAVE_LIBZ
   /* we only consider transfer-encoding magic if libz support is built-in */

   if(!Curl_checkheaders(conn, "TE:") &&
@frenche

This comment has been minimized.

Show comment
Hide comment
@frenche

frenche May 2, 2016

Contributor

This works, and it looks correct as well.

Contributor

frenche commented May 2, 2016

This works, and it looks correct as well.

@rcanavan

This comment has been minimized.

Show comment
Hide comment
@rcanavan

rcanavan May 2, 2016

That patch does indeed make libcurl behave consistently, not only accross reused connections, but also when adding values to Accept-Encoding: as opposed to trying to unset it. And it is exactly this consistency that I was looking for, and the assurance that other headers, when set first with a value via CURLOPT_HTTPHEADER and then unset, behave just as consistently when reusing a handle.

rcanavan commented May 2, 2016

That patch does indeed make libcurl behave consistently, not only accross reused connections, but also when adding values to Accept-Encoding: as opposed to trying to unset it. And it is exactly this consistency that I was looking for, and the assurance that other headers, when set first with a value via CURLOPT_HTTPHEADER and then unset, behave just as consistently when reusing a handle.

@bagder bagder closed this in 96eb9a8 May 2, 2016

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 2, 2016

Member

Yeah sorry for overseeing the consistency argument originally. I've just merged this.

Thanks both of you for your help!

Member

bagder commented May 2, 2016

Yeah sorry for overseeing the consistency argument originally. I've just merged this.

Thanks both of you for your help!

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.