Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasync freezes using specialized clients on connecting to non-available server process #66

Closed
thabach opened this issue Sep 19, 2013 · 21 comments
Labels

Comments

@thabach
Copy link
Member

thabach commented Sep 19, 2013

Hi Jeanfrancois

When trying to connect to a server (based on Nettosphere) which is not up, the wasync client freezes when using the specialized clients (AtmosphereClient and SerializedClient) instead of throwing an exception. This is rooted in an unconditional future.get() in DefaultSocket::connect() :


} else {
            r.setUrl(request.uri().replace("ws", "http"));
            transportInUse.future(options.runtime().prepareRequest(r.build()).execute((AsyncHandler<String>) transportInUse));

            try {
                if (options.waitBeforeUnlocking() > 0) {
                    logger.info("Waiting {}, allowing the http connection to get handled by the server. To reduce the delay, make sure some bytes get written when the connection is suspendeded on the server", options.waitBeforeUnlocking());
                }
                if (request.queryString().containsKey("X-atmo-protocol")) {
                    f.get();
                } else {
                    f.get(options.waitBeforeUnlocking(), TimeUnit.MILLISECONDS);
                }
            } catch (Throwable t) {
                // Swallow  LOG ME
                logger.trace("", t);
            } finally {
                f.done();
            }
        }
        return this;

I am using the latest wasync Snapshot wasync-1.0.0-20130828.144509-85

I am not sure if putting the timeout on the blocking get() is the cure, so I did not submit a pull request this time around, regards Christian.

@jfarcand
Copy link
Member

Hi, added

    @Test
    public void testTimeoutAtmosphereClient() throws IOException, InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        AtmosphereClient client = ClientFactory.getDefault().newClient(AtmosphereClient.class);
        RequestBuilder request = client.newRequestBuilder()
                .method(Request.METHOD.GET)
                .uri(targetUrl)
                .trackMessageLength(true)
                .transport(transport());

        Socket socket = client.create();
        socket.on(new Function<ConnectException>() {

            @Override
            public void on(ConnectException t) {
                latch.countDown();
            }

        }).on(Event.CLOSE.name(), new Function<String>() {
            @Override
            public void on(String t) {
                logger.info("Connection closed");
            }
        }).open(request.build());

        assertTrue(latch.await(10, TimeUnit.SECONDS));

    }

But for me I do get the ConnectionException pretty fast. So it seems your use case is different, e.g no ConnectException is thrown, right?

@thabach
Copy link
Member Author

thabach commented Sep 19, 2013

Hi

here is the thing

  • when I do the final _loginSocket.open(request.build()) with the default client, I get an exception being thrown in my calling thread, as the result of calling open().
  • The same call with the special clients leads into the open() call to block forever.

The socket.on(new Function<IOException>()... is called in both cases.

In the debugger I see that I end up in the DefaultSocket line 188, which is a call to f.get(), that blocks forever, in the special clients case.

I just ran your Unit Test and it blocks forever on theopen() too...

I checked again and I am using the latest wasync wasync-1.0.0-20130828.144509-85.

Are there dependencies I need to check ? but DefaultSocket is in the wasync jar. So I am really confused now :) Christian.

@jfarcand
Copy link
Member

Ah you are using an old version. Pull again and let me know.

@thabach
Copy link
Member Author

thabach commented Sep 20, 2013

So I pulled again, and with your fix the behaviour is consistent between clients.

BUT (contrary to what I wrote above), open() does not throw at all on not being able to establish a connection. While this is consistent between clients now, it is unexpected to me. open() throws on non suitable transports, but does not throw on not being able to establish a connection (but only asynchronously notifies with a ConnectException via a socket::on callback).

Wouldn't it make more sense to :

     } else {
         f.get(options.waitBeforeUnlocking(), TimeUnit.MILLISECONDS);
     }
 } catch (Throwable t) {
      throw new IOException("Connection could not be established", t);
 } finally {
      f.done();
}

In addition the get(withTimeout) is throwing a TimeoutException once time elapses, but I would actually expect it to throw as soon as the ConnectionException is thrown (and signaled in the socket::on) in the AsyncHttpClient.

open() to me is inherently synchronous in nature. What ya think ?

@jfarcand
Copy link
Member

Salut, ya that make sense...but I really wanted to force the application to define Function to handle that case instead of throwing exception on open(). What is the issue of using Functin? Thanks!

@thabach
Copy link
Member Author

thabach commented Sep 24, 2013

Hi Jeanfrancois

there is nothing wrong with the asynchronous callback. To me open() is just too inherently synchronous an operation, establishing some initial conditions for a Socket to become operational, that I want to deal with its
potential failure in an asynchronous callback.

(I would also argue that oftentimes the asynchronous IOException handling on a connection that was established is different to handling an IOException indicating that an initial connection can't be established at all.)

I checked some other Netty stack implementations of mine and what I usually do is

            ChannelFuture future = _channel.connect(_remoteAddress);
            future.awaitUninterruptibly();
            if (!future.isSuccess()) {
                throw new SoupBinTCPConnectException("Could not establish a connection to address " + _remoteAddress + " !", future.getCause());
            }

So as an alternative to throwing in open(), you could design the API similar to Netty and return a Future (rather than the Socketitself) on which a client could choose to block for the asynchrounous execution of open(), waiting for it to either ...

  • succeed
  • throw an exception on failure directly
  • or allowing the exception to be retrieved from the future, which is only indicating failure rather than throwing directly (as is the case for the ChannelFuture of Netty).

Let me know what you think. ttyl.

@thabach
Copy link
Member Author

thabach commented Sep 24, 2013

Hi Jeanfrancois

with your fix to #66 I do run into serious problems now. The f.get(timeout, tu); which is a get(-1) leads to

  • an instant TimeOutException that is not signaled in the asynchronous function callback
  • does not wait in the get() for anything to be done/established, be it on the wasync or the Nettosphere side (like handshake and initial GET connection suspension).

It just leads to a connect() returning the Socket instantly, over which fire()s lead to undefined results.

Can't you change it to this instead ?

    if (request.queryString().containsKey("X-atmo-protocol")) {
                    f.get(options.waitBeforeUnlocking(), TimeUnit.MILLISECONDS);
                } else {
                    f.get(options.waitBeforeUnlocking(), TimeUnit.MILLISECONDS);
                }

Cheers, Christian.

@jfarcand
Copy link
Member

Hum...can you tell me how to reproduce this? It's strange all the tests are working fine and the sample....I will take a look as soon as I can. Sorry for all the issues. BTW the code above do the same in both case :-)

jfarcand added a commit that referenced this issue Sep 24, 2013
@thabach
Copy link
Member Author

thabach commented Sep 25, 2013

Hi Jeanfrancois

I think a lot of your tests work because of the waits you put following the open() before you start to fire(). I can provide you a test if you still think it`ll be beneficial later, but I want to argue from a pure logical standpoint first.

What I perform is an open() being followed by a fire().

  • With the original code before the fix, the open() did a blocking get() on a future representing the asynchronously executed "open-procedure" (which includes the handshake, if I am not mistaken). This was perfect, as we blocked on the socket for it to become ready for fire() calls.
  • With the fix, open() doesn't block on the future at all anymore but instantly returns, which leads my code to break on the subsequent fire(), as of the socket not being ready.

So the fix should be changed as done in your latest checkin, still doing a blocking get() but a time-constraint one.

So far so good, but I still have a problem with this everything being subject of race conditions.

Let's say that my call to open() returns. How am I now gonna determine if the "opening-procedure" was successful or not, the following is possible :

  • all fine, the get() did not time out, socket is ready to be fire()d on.
  • not quite ready yet, the socket timed out, the TimeOutException is swallowed in open() and the open() returns normally as in the first case.

This leaves me, the caller of open() having to find out, if the socket is ready or not. So one could argue that the caller needs to check if any asynchronous IOException was reported over the function callback mechanism.

If lucky, there is an IOException and the open() can be clearly considered having failed, but what if there is none ? Does that now mean that the socket is ready ? how could we tell ? we are only asynchronously called back on IOExceptions, so for how long would we wait in concluding that the socket is actually operational ?

I am of the strong opinion that we should have clear open() semantics as described in an earlier post.

open() should either :

  • return -> implying that the socket was successfully setup and ready to be fire()d on
  • throw an exception -> signaling either a timeout on opening, or an IOException (like server process unavailability) OR return a future to the caller -> on which the caller can block for the socket to become operational or error out.

With semantics like these the caller is not left with the heuristical / pretty impossible task of finding out if a socket is ready or not after calling open().

Lemme know what you think, cheers, Christian.

@jfarcand
Copy link
Member

OK good point. The issues are:

  • when the open is executed, if Atmosphere protocol is used, we must wait for the handshake to happens before marking the socket as ready. if the Atmosphere protocol is not used, we have no way to know if the socket connected or not => hence why I used the get() at the first place when Atmosphere protocol is used, and a timed get(time) when the are no way to know if the socket is connected or not. I don't think this can be modified
  • I do agree the behavior needs to change, we need to know if the open failed or not. My original design was to ask the application to pass a function. Now I do agree a Future would be a better idea, e.g let the application decide to block or not. Let me work on it now and see where I can go.

@jfarcand jfarcand reopened this Sep 25, 2013
@jfarcand
Copy link
Member

OK, fixed. Try 1.1.0-SNAPSHOT and let me know.

@thabach
Copy link
Member Author

thabach commented Sep 25, 2013

My tests are all working now, perfect ! Thx for being so responsive and open to my suggestions, I think the socket API did improve a lot with these changes (but I might be a bit biased, after all :) ) !

Cheers, Christian.

@jfarcand
Copy link
Member

@thabach You are always welcomed! I've also fixed an inconsistency with issue 68. Can you try the latest 1.1.0-SNAPSHOT and let me know if all works for you. I will go ahead and release 1.1.0 once I'm sure everything works on your side :-)

@thabach
Copy link
Member Author

thabach commented Sep 26, 2013

Hi Jeanfrancois

there seems to be a new problem now, in this snippet, different to RC5, the get() on the fire() future blocks forever.

        connection.connect('JohnD', 'PWD')
        println("after connect, connection is connected " + connection.isConnected())
        Future f = connection.send(new UnsequencedDataPacketBuilder().message([0, 1, 2, 3, 4, 5, 6, 7, 8, 9] as byte[]).build())
        f.get();

The code does open() the socket in connect(..) and then fire() in the send(), the call f.get() on the future returned from fire() blocks forever with the latest changes. Unfortunately I can only conclusively say that this works when downgrading to RC5, but not what changes introduced the freeze.

Let me know, I can work out a unit test later if you wish, Christian.

additional info: I am using a SerializedClient.

@jfarcand
Copy link
Member

@thabach Just to make sure. Do you get that problem with 1.0.0?

@jfarcand
Copy link
Member

@thabach Added a unit test. You can try to reproduce with it.

@jfarcand
Copy link
Member

@thabach Ok I'm able to reproduce with the SerializedClient. Fix coming!

@thabach
Copy link
Member Author

thabach commented Sep 26, 2013

I just tried now and with 1.0.0 it does not freeze on the fire() future's get() (but of course I had to accomodate the 1.0.0 open() semantics, i.e. heuristically wait after the return of the open() call, that is why I am prefering to run against RC5 :) ).

@thabach
Copy link
Member Author

thabach commented Sep 26, 2013

Great, looking forward to it, ready to test :)

@jfarcand
Copy link
Member

@thabach OK try latest version. Filled #69 to track the issue.

@thabach
Copy link
Member Author

thabach commented Sep 26, 2013

@jfarcand All works fine on my side now, good to go releasing 1.1.0 into wild :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants