Empty Accept-Encoding: Header doesn't override CURLOPT_ACCEPT_ENCODING on 2nd request with reused handle #785

Closed
rcanavan opened this Issue Apr 29, 2016 · 15 comments

Projects

None yet

3 participants

@rcanavan

The test program below (modified from examples/httpcustomheader.c) attempts to send 2 requests to a host, and reuse the curl handle. On the second request, it attempts to disable CURLOPT_ACCEPT_ENCODING by explicitly setting the Accept-Encoding:-header without a value.

This works as expected (i.e. no Accept-Encoding header is sent in the request) only if a "fresh" handle is used or no previous request with that handle has ever actually sent a request with the Accept-Encoding header set via CURLOPT_ACCEPT_ENCODING.

#include <stdio.h>
#include <curl/curl.h>
CURL *curl;

void request(char * url, int compress)
{
  CURLcode res;

  if(curl) {
    struct curl_slist *chunk = NULL;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    res = curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
    if (compress == 0) {
        chunk = curl_slist_append(chunk, "Accept-Encoding: ");
    }
    res = curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
    //curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

    res = curl_easy_perform(curl);
    if(res != CURLE_OK) fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));

    curl_easy_reset(curl);
    curl_slist_free_all(chunk);
  }
}

int main(void)
{
  curl_global_init(CURL_GLOBAL_ALL);
  curl = curl_easy_init();
  request("http://localhost:14080/headers.php", 1);
  request("http://localhost:14080/headers.php", 0);
  curl_easy_cleanup(curl);
  return 0;
}
@bagder bagder added the HTTP label Apr 29, 2016
@bagder
Member
bagder commented Apr 29, 2016

The problem here is that CURLOPT_ACCEPT_ENCODING set to an empty string has a meaning other than please send a blank header. The man page says:

If a zero-length string is set like "", then an Accept-Encoding: header containing all built-in supported encodings is sent.

So you're really asking for two mutually exclusive things in that request and it's not immediately clear which of the two instructions that should override the other...

@bagder
Member
bagder commented May 1, 2016

Could we perhaps make this clearer in the documentation?

@bagder bagder added the documentation label May 1, 2016
@frenche
Contributor
frenche commented May 1, 2016

Perhaps it would help to mention that one may set CURLOPT_ACCEPT_ENCODING to a NULL pointer in order to disable it (both sending the header and auto decoding) in case it was previously enabled on the given easy handle.

BTW, I find the statement "identity, which does nothing" to be a bit misleading, as setting 'identity' does in fact enable auto decoding (e.g. if the header was overridden, or if the server sends gzip despite what the client has specified) and regardless, sending the header with 'identity' only isn't exactly nothing.
Also worth mentioning that the content length header received may not match the actual length of the data in case auto decoding is enabled (I reached the above understanding by troubleshooting a customer issue with our app).

@bagder bagder added a commit that referenced this issue May 1, 2016
@bagder bagder CURLOPT_ACCEPT_ENCODING.3: clarified
As discussed in #785
ffd0e61
@bagder
Member
bagder commented May 1, 2016

Ok, I committed an update to this man page, have a look at CURLOPT_ACCEPT_ENCODING for the new version.

Better? Anything lacking now?

@frenche frenche added a commit to frenche/curl that referenced this issue May 1, 2016
@frenche frenche Follow up clarification on CURLOPT_ACCEPT_ENCODING
Clarify that CURLOPT_ACCEPT_ENCODING has to be explicitly disabled
only if it was previously enabled (as the default is NULL).

Mention possible content-length mismatch with sum of bytes reported
by write callbacks when auto decoding is enabled.

See #785
7a329b8
@frenche
Contributor
frenche commented May 1, 2016

Yea, but take a look at little adjustments: frenche@7a329b8

@bagder bagder added a commit that referenced this issue May 1, 2016
@frenche @bagder frenche + bagder CURLOPT_ACCEPT_ENCODING.3: Follow-up clarification
Mention possible content-length mismatch with sum of bytes reported
by write callbacks when auto decoding is enabled.

See #785
7987f5c
@bagder
Member
bagder commented May 1, 2016

Thanks! I merged the second part of that change. I'm not comfortable with adding the "if it was set to something else before you can disable it again with" text. That is sort of implied if you need to disable it, and I would rather not add that extra text for every man page description we have that tells how to switch an option back to default state.

@frenche
Contributor
frenche commented May 2, 2016

Make sense, thanks!

@bagder
Member
bagder commented May 2, 2016

@rcanavan, you happy with this explanation or are you looking for something else?

@frenche
Contributor
frenche commented May 2, 2016

Hmm, I've looked again at @rcanavan's example and gave it a quick run, and the behavior actually seem strange indeed.
It looks like if a connection is reused then setting empty "Accept-Encoding:" does not disable sending of the header (if I forbid reused it'd work as expected).
I'll try to take a closer look later on.

@bagder
Member
bagder commented May 2, 2016 edited

But again, you can't both set CURLOPT_ACCEPT_ENCODING (to non-NULL) and ask for disabling the header.

@frenche
Contributor
frenche commented May 2, 2016

But the doc of CURLOPT_HTTPHEADER says:
If you add a header that is otherwise generated and used by libcurl internally, your added one will be used instead. If you add a header with no content as in 'Accept:' (no data on the right side of the colon), the internally used header will get disabled.

And it does work as one would expect if a connection is not reused.

@bagder
Member
bagder commented May 2, 2016

Perhaps something like this makes it behave more deterministic?

diff --git a/lib/http.c b/lib/http.c
index f3805cc..2a7280d 100644
--- a/lib/http.c
+++ b/lib/http.c
@@ -1915,10 +1915,14 @@ CURLcode Curl_http(struct connectdata *conn, bool *done)
     conn->allocptr.accept_encoding =
       aprintf("Accept-Encoding: %s\r\n", data->set.str[STRING_ENCODING]);
     if(!conn->allocptr.accept_encoding)
       return CURLE_OUT_OF_MEMORY;
   }
+  else {
+    Curl_safefree(conn->allocptr.accept_encoding);
+    conn->allocptr.accept_encoding = NULL;
+  }

 #ifdef HAVE_LIBZ
   /* we only consider transfer-encoding magic if libz support is built-in */

   if(!Curl_checkheaders(conn, "TE:") &&
@frenche
Contributor
frenche commented May 2, 2016

This works, and it looks correct as well.

@rcanavan
rcanavan commented May 2, 2016

That patch does indeed make libcurl behave consistently, not only accross reused connections, but also when adding values to Accept-Encoding: as opposed to trying to unset it. And it is exactly this consistency that I was looking for, and the assurance that other headers, when set first with a value via CURLOPT_HTTPHEADER and then unset, behave just as consistently when reusing a handle.

@bagder bagder added a commit that closed this issue May 2, 2016
@bagder bagder http: make sure a blank header overrides accept_decoding
Reported-by: rcanavan
Assisted-by: Isaac Boukris
Closes #785
96eb9a8
@bagder bagder closed this in 96eb9a8 May 2, 2016
@bagder
Member
bagder commented May 2, 2016

Yeah sorry for overseeing the consistency argument originally. I've just merged this.

Thanks both of you for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment