Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-3737 Improve isOpen() to give accurate tcp connection status. #945

Closed

Conversation

cherrot
Copy link

@cherrot cherrot commented Mar 13, 2016

Origin issue: https://issues.apache.org/jira/browse/THRIFT-3737

To give user a more accurate connection state by invoking TSocket.isOpen().

The code refers to urllib3's implimentation.

@nsuke
Copy link
Member

nsuke commented Mar 13, 2016

Hi @cherrot thanks for looking into this.
This doesn't seem a right way to do it.
In their context, readable sockets may mean dropped connections.
It's not the case for us.

Also, I don't think we want to include 3rd party code for something as simple as this.
It would require us to put their MIT license into our NOTICE file etc.

@cherrot
Copy link
Author

cherrot commented Mar 13, 2016

readable sockets may mean dropped connections.

@nsuke Actually I didn't figure out what you mean. How can dropped connections (I mean, those in CLOSE_WAIT / CLOSE state) be OPENED connections?

When I'm calling this method, what I expect is getting the correct underlying connection state, not just telling me if I have closed it by hand. Or otherwise, how could I determine when I need reconnect the underlying connection in a semantic way, instead of sending to / reading from it directly?

Also, I referred to the 3rd party lib because I want to show that it's a common pattern to judge if a connection could be used, not because the LICENSE. I could reimplement this 10-line-code-block in my style if you guys prefer that :)

@nsuke
Copy link
Member

nsuke commented Mar 13, 2016

Actually I didn't figure out what you mean. How can dropped connections (I mean, those in CLOSE_WAIT / CLOSE state) be OPENED connections?

The method name is is_connection_dropped and negation of it is used for isOpen.

@asfbot
Copy link

asfbot commented Mar 13, 2016

Marcus Dushshantha Chandradasa on dev@thrift.apache.org replies:
Zz

@cherrot
Copy link
Author

cherrot commented Mar 13, 2016

Hi, after comparing the thrift lib for Java and C++ golang, I agree to obey the literal meaning of isOpen method by convention of Socket API. That is to say, a connection is open unless it was closed explicitly.

Now the problem is, since Thrift hides the underlying TSocket, unless using some tricks, I can't acces to the socket object from downstream application (say happybase in this case) to determine whether it should be closed. Then it becomes so easy for a normal user to create more and more connections until reaching the system limit.

So I hope TTransport could add an method calls something like shouldClose or isDropped, inside which, using something like select([sock], [], [], 0.0) to determine whether this socket should be closed. Or simply expose the TSocket object to the outer world.

FYI, according to http://code.activestate.com/recipes/408997-when-to-not-just-use-socketclose/#c5 :

Now, that incoming socket, which select just said was readable, has no data waiting. According to asyncore, twisted, and every other asynchronous package I've seen, when a socket is readable(), but has no data waiting, then the remote machine has disconnected.

@nsuke
Copy link
Member

nsuke commented Mar 13, 2016

I agree to obey the literal meaning of isOpen

To be clear, I'm not against introducing the feature you're suggesting, I've just commented on the approach.
In fact, as you may have noticed, C++ has peek() method that does "somewhat" similar thing.

FYI, according to http://code.activestate.com/recipes/408997-when-to-not-just-use-socketclose/#c5 :

It's same as the cases where read() returns 0 length, i.e. just plain old EOF, so you still need to check the available data.
The urllib3 code you took is assuming that any remaining data is invalid (it's true in their context).

@cherrot
Copy link
Author

cherrot commented Mar 14, 2016

Thanks for reminding me that @nsuke ;)
I'll commit & push a new implementation like the peek() method in C++ lib.

BTW do we need preserve the function name peek ? It seems confusing...

@cherrot cherrot force-pushed the THRIFT-3737.Accurate-connection-status branch from 241d58a to c32995d Compare March 14, 2016 10:06
@cherrot
Copy link
Author

cherrot commented Mar 15, 2016

...
Caused by: hudson.plugins.git.GitException: Command "git checkout -b master origin/master" returned status code 1:
stdout: lib/c_glib/src/thrift/c_glib/protocol/thrift_compact_protocol.c: needs merge
lib/c_glib/test/testcompactprotocol.c: needs merge

stderr: error: you need to resolve your current index first
...

It looks like Jenkins has some problem?

@nsuke
Copy link
Member

nsuke commented Mar 16, 2016

As further looking into your use case (happybase), I doubt it's what you wanted.
happybase does not seem to bother with transport state at all; by isOpen it only avoids "already opened" exceptions.
I'm all for reducing reinvented wheels in users' codebase if users are repeatedly implementing the same thing, but in this case, are there really those wheels ?

Note that almighty "this connection is fine" guarantee is very costly: even the current patch does not cover cases like invalidated file descriptor (i.e. closed socket) and half closed non-writable sockets.
Another crucial point to note is that even after a perfect guarantee, nothing is guaranteed a moment later, because we cannot share a lock/mutex with remote server, let alone with physical network.
So you still need the exact same TTranisportException error handling, and the most efficient way would probably be to try doing normal procedures anyway, catch exceptions and start over if errored.
I understand it looks stupid but I'd like to remind you of the word "keep it simple stupid".

@cherrot
Copy link
Author

cherrot commented Mar 17, 2016

happybase does not seem to bother with transport state at all; by isOpen it only avoids "already opened" exceptions.

As what I described in happybase issue#112, happybase use isOpen (after all there is no other method to do so) to check if it need to reopen the connection in its connection pool implementation. (See pool.py and connection.py). However in this use case, isOpen always returns True.

Consider the following case: suppose Thrift server would drop a connection which was idle for 2min, and the client side communicates with Thrift server every 1 to 5min. With the current implementation the client side would encounter extremely frequent network failures.

Note that almighty "this connection is fine" guarantee is very costly: even the current patch does not cover cases like invalidated file descriptor (i.e. closed socket) and half closed non-writable sockets.

Compared with the former case, this patch is quite efficient (no underlying network communication, no blocking).

And I don't think it breaks the KISS rule. At least you don't need to put your failover code everywhere.

Of course we cannot cover all the circumstances of a connection's possible state, but we can try our best to give a probably healthy connection for the moment. For other circumstances, it depends on the user's decision (to let it die or have a retry logic).

After all, what this patch does is providing a convenient way to test the connection. It never forces developers to use this in their application or libs.

@nsuke
Copy link
Member

nsuke commented Mar 18, 2016

happybase use isOpen (after all there is no other method to do so) to check if it need to reopen the connection

isOpen call seems to be for only lazy initialization but it might not the point we should discuss in here.
Just one more thing for happybase: from what I read in their except: block I have impression that they may like your contribution.

If happybase's pool needs something similar to urllib3's pool, it's perfectly possible without modifying Thrift: just store TSocket instance and use its handle attribute.
That way, the validation can be much simpler (and yet more complete) than this, because you can make a lot of assumptions which are specific to your use case.

@cherrot
Copy link
Author

cherrot commented Mar 21, 2016

A try-except block is surely needed when we do something relative to the physical world, but we are not discussing on that, are we? And a try-failed-retry logic is inefficient.

It's perfectly possible without modifying Thrift: just store TSocket instance and use its handle attribute.

Yes it is possible, but far from perfect: the TSocket object is initialized as a parameter for initializing the TTransport instance. Since TTransport does not want us to access the TSocket object (by setting it "private"), I think it has responsibility to maintain or at least tell the socket's state.

Furthermore, if this is not the approach you prefer, then why is there a peek method in C++ lib? If it can be used in the logic of the server side, it can surely be used in the client side.

@nsuke
Copy link
Member

nsuke commented Mar 25, 2016

And a try-failed-retry logic is inefficient.

System calls are much more inefficient than exceptions, at least for Python.
Also refer to EAFP https://docs.python.org/2/glossary.html .

the TSocket object is initialized as a parameter for initializing the TTransport instance.

It's the same as TTransport for TProtocol.
Nothing wrong with accessing TTransport or TSocket API directly, not through TProcotol.

@cherrot
Copy link
Author

cherrot commented Apr 3, 2016

Yes you are right, using isActive would invoke select/poll and recv system calls than a straight forward send.

As a lower layer library, I think it would be nice to provide a convenient API telling the underlying state. Users could decide whether to use this or not.

And what is the peek in C++ lib for? Is there anything wrong with it?

@cherrot
Copy link
Author

cherrot commented Jun 3, 2016

It seems that a sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) could solve most unhealthy connection problems ;)

@jeking3
Copy link
Contributor

jeking3 commented Oct 25, 2017

Reminder, this needs to move forward.

@nsuke
Copy link
Member

nsuke commented Oct 28, 2017

There's no real use case anymore. happybase does not use Apache Thrift any longer.

@jeking3
Copy link
Contributor

jeking3 commented Oct 28, 2017

If that is the case and the change is no longer desired, we need to close this out somehow.

@cherrot
Copy link
Author

cherrot commented Oct 29, 2017

OK I agree to close this PR.

@cherrot cherrot closed this Oct 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants