Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grizzly provider fails to handle HEAD with Content-Length header #150

Closed
cowtowncoder opened this issue Oct 12, 2012 · 24 comments
Closed

Grizzly provider fails to handle HEAD with Content-Length header #150

cowtowncoder opened this issue Oct 12, 2012 · 24 comments

Comments

@cowtowncoder
Copy link
Contributor

I am trying to use Grizzly provider (v1.7.6), and noticed a timeout for simple HEAD request.
Since this is local test, with 15 second timeout, it looks like this is due to blocking.
Same does not happen with Netty provider.

My best guess to underlying problem is that Grizzly provider expects there to be content to read since Content-Length is returned. This would be incorrect assumption, since HTTP specification explicitly states that HEAD requests may contain length indicator, but there is never payload entity to return.

Looking at Netty provider code, I can see explicit handling for this use case, where connection is closed and any content flushes (in case server did send something).
I did not see similar handling in Grizzly provider, but since implementation code structure is very different it may reside somewhere else.

@rlubke
Copy link
Contributor

rlubke commented Oct 15, 2012

I checked the Grizzly code, and there is logic to deal with HEAD requests and response bodies. I tested locally with AHC and didn't see any issues here.

Can you create a test case that I can run or perhaps get a thread dump when the hang occurs?

@cowtowncoder
Copy link
Contributor Author

Just to be clear: my problem is not with HEAD with response bodies: I don't send response body (since spec forbids that). Rather, it's with no body, and Content-Length > 0, to indicate length one would get for GET.

I wish the project I am working on was public (it's at private github). AHC tests were pretty hard to follow last time I tried, but do you know of specific ones I could look at, as a starting point / template? I will go ahead myself and try to see -- challenge is that someone has refactored AHC codebase, so while it will be nicer separation, it is also quite different from last release (1.7.6)
I will try to see if I can find HEAD tests.

Hangup breakpoint might make sense, just need to raise timeout value to higher. Although with async operation it may be bit tricky.

@rlubke
Copy link
Contributor

rlubke commented Oct 15, 2012

I understood what you meant. My response probably wasn't clear. I meant to say that it shouldn't expect content in the case of a HEAD request and so it shouldn't hang waiting for content.

You might look at api/src/test/java/com/ning/http/client/async/AsyncProvidersTest.asyncStatusHEADTest() as a template.

@cowtowncoder
Copy link
Contributor Author

Ok gotcha. Just wanted to verify -- I often read bug reports too cursorily myself. :)

Thanks for the pointer & apologies for lack of test cases so far.

@cowtowncoder
Copy link
Contributor Author

Looking at code I have, there is one other possible explanation: work-around for (old?) Netty prob with handling of chunk boundaries may be interfering. Hope to refactor code, then go back to trying out Grizzly provider.

Btw, what would be the latest stable grizzly version to use? AHC seems to use use "grizzly-websockets", 2.2.16.

@rlubke
Copy link
Contributor

rlubke commented Oct 15, 2012

The latest is 2.2.19. AHC should work fine with it.

@cowtowncoder
Copy link
Contributor Author

Ok. Tried upgrading to 2.2.19 (for some reason, websockets one is needed). Added explicit deps to grizzly-framework, grizzly-http, to reduce likelihood of transitive deps being to some old version.
Still get the timeout on HEAD, PUT, GET, DELETE work. Now need to figure out where to put the breakpoint when running test...

@rlubke
Copy link
Contributor

rlubke commented Oct 15, 2012

The Grizzly WS library is needed for WS support within AHC. Grizzly WS is always released at the same time as the http and core frameworks, so the versions should always match.

Not sure if you need a breakpoint yet. If you extend the timeout, you could connect to the process using visualvm and obtain a thread dump.

@cowtowncoder
Copy link
Contributor Author

This probably won't help a lot, but actual call is done using this part of API:


    final String path = resourcePath(server.resourceEndpoint(), contentId);
    try {
        BoundRequestBuilder reqBuilder = _httpClient.prepareHead(path);
        reqBuilder = addStandardParams(reqBuilder, config.getClientId());
        Future<Response> futurama = _httpClient.executeRequest(reqBuilder.build());
        Response resp;
        try {
            resp = futurama.get(timeout, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return new AHCHeadCallResult(CallFailure.timeout(server, startTime, System.currentTimeMillis()));
        }
        int statusCode = resp.getStatusCode();
        // one thing first: handle standard headers, if any?
        handleHeaders(server, resp, startTime);
        // call ok?
        if (!IOUtil.isHTTPSuccess(statusCode)) {
            // if not, why not? Any well-known problems? (besides timeout that was handled earlier)
            // then the default fallback
            String msg = getExcerpt(resp);
            return new AHCHeadCallResult(CallFailure.general(server, statusCode, startTime, System.currentTimeMillis(), msg));
        }
        FluentCaseInsensitiveStringsMap headers = resp.getHeaders();
        String lenStr = headers.getFirstValue(VagabondConstants.HTTP_HEADER_CONTENT_LENGTH);            

and failure is via TimeoutException; timeout set at 15000 msecs (or whatever); timeout exception seems to be missing stack trace info during debugging.
Server does see HEAD request,

@cowtowncoder
Copy link
Contributor Author

Oh Crivens! I can now make test pass by adding two things:

  1. Changing timeout from 15000 msecs to 500 msecs
  2. Adding 2 System.err.println() calls on server

so I suspect this may be a race condition of some kind. And those have been found from AHC codebase from time to time. :-(

For whatever it's worth, request is handled by a JAX-RS end point.

@cowtowncoder
Copy link
Contributor Author

Oh. One realization on timeouts; this may be explained by fact that the FIRST HEAD call fails, but retry succeeds -- but this only happens when whole-operation timeout limit is not reached. With 15secs it is.
This would suggest that there is some state dependency in there.

@cowtowncoder
Copy link
Contributor Author

I spent some more time ensuring that server implementation does not do something stupid, and as far as I can see it does not. I am also looking at AHC tests; one of HEAD tests is disabled, but it seems broken and not related to issue I am seeing.

At this point I think that the problem has to be related to reuse of the underlying HTTP connection; in my test I do a sequence of:

  • GET to see that resource does not exist (giving 404)
  • PUT resource
  • GET to verify contents
  • HEAD to verify access to headers (including Content-Length)

The fact that if retries are enabled and second try for HEAD works (with a "fresh" connection, I assume, since timeout probably forces closing of the first connection) suggests that this is somehow related to state handling.

I will next try to modify a test to do something like above.

@cowtowncoder
Copy link
Contributor Author

Managed to write a unit test, similar Head302Test. After failing first, I managed to get it work; initial failure was not related (for HEAD, handler must still access the output stream). Test does GET, HEAD sequence with grizzly provider.

One additional difference is that test uses Async listener approach, whereas production code uses blocking executeRequest variant.

I will next try changing my HEAD handler to use bare async handler. Maybe I can at least work around the issue, even if I can not figure out what exactly is going on.

@cowtowncoder
Copy link
Contributor Author

Change to pure async handler thing did not solve the problem, but it does give bit more visibility into state changes.
Callbacks are received as expected:

  1. Status of 200
  2. Headers, including Content-Length

but then I see two Throwables: first being java.io.IOException("Remotely Closed"), and second java.io.EOFException.
Both are passed to 'onThrowable', which is kind of interesting. I guess they are "normally" just eaten by a grue or something (ugh).
I'll check out stack traces, in case they might give some idea as to what is failing.

@cowtowncoder
Copy link
Contributor Author

Here goes:


java.io.IOException: Remotely Closed
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$AsyncHttpClientTransportFilter$1.failed(GrizzlyAsyncHttpProvider.java:740)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.notifyFailure(DefaultFilterChain.java:559)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.throwChain(DefaultFilterChain.java:443)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:162)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:770)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)


and


java.io.EOFException
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.read(TCPNIOTransport.java:875)
at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleRead(TCPNIOTransportFilter.java:74)
at org.glassfish.grizzly.filterchain.TransportFilter.handleRead(TransportFilter.java:173)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$AsyncHttpClientTransportFilter.handleRead(GrizzlyAsyncHttpProvider.java:753)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:770)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)


so these are 2 exceptions caught during processing of problematic HEAD request.

@cowtowncoder
Copy link
Contributor Author

Last comment of the day: timeout occurs for some reason 2 seconds after specified time (if I give 1500 msec, onThrowable gets called bit over 3500 msec after executing request). This on localhost, without real network access.
I guess there is probably some other additional internal timeout involved.

Also: in my test PUT plays a role; if it is commented out, failure does not occur. May try to add it in AHC test sequence tomorrow.

@rlubke
Copy link
Contributor

rlubke commented Oct 16, 2012

OK, great - will dig in further on my side. Thanks!

@rlubke
Copy link
Contributor

rlubke commented Oct 17, 2012

Could you please capture a wire trace of the http traffic for your failing case?

@cowtowncoder
Copy link
Contributor Author

This is from unit tests on localhost. Wonder what would be the easiest way to capture it? (sorry it has been a while since I used tcpdump). I can easily run the test on Eclipse, or probably from command line.

@rlubke
Copy link
Contributor

rlubke commented Oct 17, 2012

I'm a big fan of wireshark. Another good tool is ngrep.

@cowtowncoder
Copy link
Contributor Author

Ok, I think I got stuff captured, exported as XML. But github issue tracker has no attachment capabilities.
So I'll need to find a place to upload it in... let's see:

https://jira.codehaus.org/browse/JACKSON-435

(see the attachment)

@cowtowncoder
Copy link
Contributor Author

Data in question is just 12000 bytes of 0xAC (i.e. data is correct). But same happens with smaller payloads (120 bytes whatever).

@cowtowncoder
Copy link
Contributor Author

One thing I started thinking is that perhaps I am not reading response that PUT is returning; and that could mess up handling.

@rlubke
Copy link
Contributor

rlubke commented Aug 12, 2013

Closing this issue out as I'm unable to reproduce. Please feel free to re-open when you have time to revisit and have a test case we can run.

@rlubke rlubke closed this as completed Aug 12, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants