crash: parse_response/2 #67

Closed
rflynn opened this Issue Apr 3, 2012 · 25 comments

Comments

Projects
None yet
3 participants

rflynn commented Apr 3, 2012

{'EXIT',
    {{function_clause,
         [{ibrowse_http_client,'-parse_response/2-lc$^0/1-0-',
              [<<>>],
              [{file,"src/ibrowse_http_client.erl"},
               {line,1012}]},
          {ibrowse_http_client,parse_response,2,
              [{file,"src/ibrowse_http_client.erl"},
               {line,1012}]},
          {ibrowse_http_client,handle_sock_data,2,
              [{file,"src/ibrowse_http_client.erl"},{line,285}]},
          {gen_server,handle_msg,5,
              [{file,"gen_server.erl"},{line,597}]},
          {proc_lib,init_p_do_apply,3,
              [{file,"proc_lib.erl"},{line,227}]}]},
     {gen_server,call,
         [<4646.13865.4>,
          {send_req,
              {{url,
                   "http://foohost:8010/foopath",
                   "foohost",8010,undefined,
                   undefined,
                   "/foopath",
                   http,hostname},
               [],head,<<>>,
               [{max_sessions,100},
                {max_pipeline_size,1000},
                {response_format,list}],
               30000}},
          30000]}}}}
Owner

cmullaparthi commented Apr 4, 2012

Thank you for the bug report. I see that you were trying to do a HEAD request? Can you please do the same with curl and send me the output? Alternatively, can you please turn tracing on in ibrowse and send me the output?

ibrowse:trace_on("foohost", 8010).

rflynn commented Apr 4, 2012

Thanks for the quick response. I am working on a reproducible test case.

rflynn commented Apr 4, 2012

I am having difficulty getting any trace output at all. It uses io:format and thus should go to the console, no?

Owner

cmullaparthi commented Apr 4, 2012

Can you send me your console screen dump please? Privately if you wish...
On Apr 4, 2012 4:23 PM, "Ryan Flynn" <
reply@reply.github.com>
wrote:

I am having difficulty getting any trace output at all. It uses io:format
and thus should go to the console, no?


Reply to this email directly or view it on GitHub:
#67 (comment)

rflynn commented Apr 4, 2012

Aha, I think I've found it:

2012-4-4_12:52:29:737 -- (foohost:8010) - Data recvd in state: get_header. Size: 144. <<"HTTP/1.1 400 Bad Request\r\nServer: Apache-Coyote/1.1\r\nTransfer-Encoding: chunked\r\n
Date: Wed, 04 Apr 2012 16:53:49 GMT\r\nConnection: close\r\n\r\n0\r\n\r\n">>
                                                                                                                                                                                             2012-4-4_12:52:29:738 -- (foohost:8010) - Recvd Header Data -> HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Date: Wed, 04 Apr 2012 16:53:49 GMT
Connection: close
----
2012-4-4_12:52:29:738 -- (foohost:8010) - Recvd headers
--- Headers Begin ---
HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked
Date: Wed, 04 Apr 2012 16:53:49 GMT
Connection: close
--- Headers End ---

2012-4-4_12:52:29:738 -- (foohost:8010) - HttpVsn: "HTTP/1.1" StatusCode: "400" Headers_1 -> [{"Server","Apache-Coyote/1.1"},{"Transfer-Encoding","chunked"},{"Date","Wed, 04 Apr 2012 16:53:49 GMT"},{"Connection","close"}]
2012-4-4_12:52:29:738 -- (foohost:8010) - Recvd Header Data -> 0
----
2012-4-4_12:52:29:738 -- (foohost:8010) - Recvd headers
--- Headers Begin ---
0
--- Headers End ---

2012-4-4_12:52:29:738 -- (foohost:8010) - HttpVsn: "HTTP/0.9" StatusCode: undefined Headers_1 -> <<>>
** exception exit: {{badmatch,{error,[]}},
...stack trace with just my stuff

As far as I can tell the body contents "0\r\n\r\n" are being treated as headers and blowing up.

In general it looks like parse_headers() can return results in forms other than [{X,Y},...] which the lower-case transformation on line 1012 requires... although there's also the issue of the body being parsed as a set of headers.

Owner

cmullaparthi commented Apr 4, 2012

Thank you for all this info. I'll try to put a fix later tonight unless you
beat me to it :-)
On Apr 4, 2012 7:26 PM, "Ryan Flynn" <
reply@reply.github.com>
wrote:

In general it looks like parse_headers() can return results in forms other
than [{X,Y},...] which the lower-case transformation on line 1012 requires.


Reply to this email directly or view it on GitHub:
#67 (comment)

Owner

cmullaparthi commented Apr 5, 2012

Hmm... I think this is because ibrowse is not expecting a body in the response because it is a HEAD request. And treating the remaining as a new response...which is wrong. Let me mull over the best way to fix this. I'll get back to you within the next day.

Owner

cmullaparthi commented Apr 5, 2012

I believe Apache is behaving a bit weirdly. It is not supposed to send anything in the body. Extract from the RFC 2616

4.4 Message Length
The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any 
transfer-codings have been applied. When a message-body is included with a message, the transfer-length of that 
body is determined by one of the following (in order of precedence):

1. Any response message which “MUST NOT” include a message-body (such as the 1xx, 204, and 304 
responses and any response to a HEAD request) is always terminated by the first empty line after the header 
fields, regardless of the entity-header fields present in the message.

But then again, it is a bit ambiguous. And this being Apache, I guess I'll have to fix it on my side.

rflynn commented Apr 5, 2012

I agree this is erroneous behavior on the server side; though even disregarding that there's the issue of parse_headers/2 returning values not in [{Key,Val},...] form

Owner

cmullaparthi commented Apr 6, 2012

Fixed in version 3.0.4

Thanks for your help.

rflynn commented Apr 6, 2012

Commit 1fc0bc4 makes ibrowse not crash, but it also ignores all my HEAD responses and causes all requests to time out. An explicit error would be much more helpful instead of ignoring the problem.

Owner

cmullaparthi commented Apr 6, 2012

Hi, seems to work ok for me. Can you send some traces please?

1> ibrowse:start().
{ok,<0.33.0>}
2> ibrowse:send_req("http://intranet/messenger", [], head).
{ok,"401",
    [{"Content-Length","454"},
     {"Content-Type","text/html"},
     {"Server","Microsoft-IIS/6.0"},
     {"WWW-Authenticate","Negotiate"},
     {"WWW-Authenticate","NTLM"},
     {"X-Powered-By","ASP.NET"},
     {"Date","Fri, 06 Apr 2012 19:31:22 GMT"}],
    []}
3> ibrowse:send_req("http://www.bbc.co.uk", [], head, [], [{proxy_host, "proxy"}, {proxy_port, 8080}, {proxy_user, "XXXXXX"}, {proxy_password, "XXXXX"}]).     
{ok,"200",
    [{"Date","Fri, 06 Apr 2012 19:32:31 GMT"},
     {"Content-Length","34358"},
     {"Content-Type","text/html"},
     {"Cache-Control","private, max-age=15"},
     {"Server","Apache"},
     {"Etag","\"511052b1a62142771cbb947bda28f1d5\""},
     {"Set-Cookie",
      "BBC-UID=142f171fb454ec9f65e86c7cd1445f10b1a1e4f2b49411ae4a91248e0cf6c06c0; expires=Tue, 05-Apr-16 19:32:31 GMT; path=/; domain=bbc.co.uk"},
     {"X-Cache-Action","MISS"},
     {"X-Cache-Age","0"},
     {"X-LB-NoCache","true"},
     {"Vary","Cookie"},
     {"Via","1.1 hatproxy01 (NetCache NetApp/6.0.7)"}],
    []}

seth commented Jul 30, 2012

Hi,

We are running into this problem I believe. I think you cannot easily reproduce it because you need a server setting transfer-encoding: chunked for head requests.

The AWS S3 server has this behavior. I think your intended fix works, but not the one implemented in the patch. I've added a line comment, but basically the clause that handles the HEAD response with transfer encoding chunked needs a == not a =/= so that it will match.

Owner

cmullaparthi commented Jul 31, 2012

Thanks Seth, I'll fix later today.

seth commented Jul 31, 2012

Much appreciated! I'm not sure how easy it would be to work into the existing code, but given that responses to HEAD requests should never include a body, I wonder if it would make sense to further short-circuit the handling such that both transfer-encoding and content-length are ignored when processing the response to a HEAD request. This would make the client more robust to variances in server implementation.

Owner

cmullaparthi commented Aug 1, 2012

Ignoring the transfer-encoding and content-length means that it will break HTTP pipelining! So unfortunately we can't do that.

Owner

cmullaparthi commented Aug 1, 2012

Seth, the problem must be something else. I believe the code is correct. What is is doing is checking that it is dealing with a HEAD request and the response does NOT have Chunked encoding. In such a (correct) case, it will return the response to the HEAD request straight away. If it is the broken case, it continues on to line 1085 which will match and continue.

Can you turn tracing on and send me (offline if you wish) the results?

seth commented Aug 1, 2012

Hmm. I'm not sure I'm understanding your comments. If the server sends
transfer-encoding: chunked in the response headers of a HEAD request,
we should also return the response straight away. Do we agree on that?
Because the behavior I'm seeing is that in such a case ibrowse just
waits until a connection timeout fires.

I'll see about a trace or other reproducible (by others) example.

On Tue, Jul 31, 2012 at 11:13 PM, Chandrashekhar Mullaparthi
reply@reply.github.com
wrote:

Seth, the problem must be something else. I believe the code is correct. What is is doing is checking that it is dealing with a HEAD request and the response does NOT have Chunked encoding. In such a (correct) case, it will return the response to the HEAD request straight away. If it is the broken case, it continues on to line 1085 which will match and continue.

Can you turn tracing on and send me (offline if you wish) the results?


Reply to this email directly or view it on GitHub:
#67 (comment)

Seth Falcon | Development Lead | Opscode | @sfalcon

Owner

cmullaparthi commented Aug 1, 2012

No - we don't agree :-)

My point is that ibrowse cannot straight away send the response to the caller, if the server is saying there is something in the body.

I could of course hide this detail from the caller (i.e respond to the caller, but still keep waiting for the server to finish sending the response), but that just becomes ugly. Let's do that only if there is no other option.

The other option is since the server is also sending a "Connection: close" header, may be I can let that take precedence. Can you confirm that you are also seeing this header in the server response?

seth commented Aug 1, 2012

Maybe I'm not understanding what is supposed to happen according to the spec.

I've been reading this: http://pretty-rfc.herokuapp.com/RFC2616#HEAD. When responding to HEAD requests, the server MUST NOT return a message-body. Even though the headers SHOULD be identical to what would be sent with a GET.

So that implies, for example, that a server should send "transfer-encoding: chunked" if that's what it would do for a GET and should set "content-lengh: 46" if a GET would return a 46 byte response body. In either of those cases it cannot send a body and be conforming to HTTP/1.1.

My reading is that servers should send headers on a HEAD request that would otherwise (other VERB) indicate a response body, but never send a body.

The S3 server is /not/ sending a connection:close header:

> 2012/08/01 17:34:41.265616  length=289 from=0 to=288
HEAD /bucket/REDACTED HTTP/1.1\r
authorization: AWS XXXXXXXXXXXXX\r
date: Wed, 01 Aug 2012 17:34:41 GMT\r
Content-Type: \r
Host: localhost:9898\r
Content-Length: 0\r
\r
< 2012/08/01 17:34:41.298517  length=254 from=0 to=253
HTTP/1.1 404 Not Found\r
x-amz-request-id: YYYYYYYYYY\r
x-amz-id-2: ABCABC\r
Content-Type: application/xml\r
Transfer-Encoding: chunked\r
Date: Wed, 01 Aug 2012 17:34:41 GMT\r
Server: AmazonS3\r\r

On Aug 1, 2012, at 8:13 AM, Chandrashekhar Mullaparthi wrote:

No - we don't agree :-)

My point is that ibrowse cannot straight away send the response to the caller, if the server is saying there is something in the body.

I could of course hide this detail from the caller (i.e respond to the caller, but still keep waiting for the server to finish sending the response), but that just becomes ugly. Let's do that only if there is no other option.

The other option is since the server is also sending a "Connection: close" header, may be I can let that take precedence. Can you confirm that you are also seeing this header in the server response?


Reply to this email directly or view it on GitHub:
#67 (comment)

Owner

cmullaparthi commented Aug 2, 2012

Yes, your understanding is correct. But if you see the original bug report, (#67 (comment)), the server was actually sending a chunk with zero length! And this is what I was trying to fix. Now looks like it has broken the valid case :-(

Other than introducing an option specifically to cater for the broken web server case, I'm not sure what else I can do. Does your code work properly if you use 3.0.3?

seth commented Aug 2, 2012

Yes, my code works properly using 3.0.3.

@cmullaparthi cmullaparthi added a commit that referenced this issue Aug 3, 2012

@cmullaparthi cmullaparthi Bug fix for issue #67 52ce596
Owner

cmullaparthi commented Aug 3, 2012

I've pushed a new version with a fix for this. A new option has been added {workaround, head_response_with_body}

Major version upgraded only because anyone relying on the fix in 3.0.4 need to change their code to use the new option. It should just work for you. Thanks for the help in resolving this.

seth commented Aug 3, 2012

I added a comment to the code. Did you mean for the following two clauses to use the same exact code?

            _ when Method =:= head,
                   Head_response_with_body =:= true ->
                %% This is not supposed to happen, but it does. An
                %% Apache server was observed to send an "empty"
                %% body, but in a Chunked-Transfer-Encoding way,
                %% which meant there was still a body.
                %% Issue #67 on Github
                {_, Reqs_1} = queue:out(Reqs),
                send_async_headers(ReqId, StreamTo, Give_raw_headers, State_1),
                State_1_1 = do_reply(State_1, From, StreamTo, ReqId, Resp_format,
                                     {ok, StatCode, Headers_1, []}),
                cancel_timer(T_ref, {eat_message, {req_timedout, From}}),
                State_2 = reset_state(State_1_1),
                State_3 = set_cur_request(State_2#state{reqs = Reqs_1}),
                parse_response(Data_1, State_3);
    _ when Method =:= head ->
                {_, Reqs_1} = queue:out(Reqs),
                send_async_headers(ReqId, StreamTo, Give_raw_headers, State_1),
                State_1_1 = do_reply(State_1, From, StreamTo, ReqId, Resp_format,
                                     {ok, StatCode, Headers_1, []}),
                cancel_timer(T_ref, {eat_message, {req_timedout, From}}),
                State_2 = reset_state(State_1_1),
                State_3 = set_cur_request(State_2#state{reqs = Reqs_1}),
                parse_response(Data_1, State_3);

(the weird indenting is because there is a mix of tabs and spaces)

Owner

cmullaparthi commented Aug 7, 2012

This is embarrassing! Thanks for spotting that. I've made sure it is correct this time. Changed version to 4.0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment