-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THRIFT-4621 Add THeader for Python #1583
Conversation
4a76744
to
20605ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spladug thanks, it looks pretty nice overall !
import struct | ||
import zlib | ||
|
||
from thrift.compat import BufferIO, binary_to_str, str_to_binary, byte_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: unused binary_to_str
and str_to_binary
(breaking style check CI job)
lib/py/src/server/TServer.py
Outdated
@@ -158,7 +176,15 @@ def serveClient(self, client): | |||
itrans = self.inputTransportFactory.getTransport(client) | |||
otrans = self.outputTransportFactory.getTransport(client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to create otrans
if using THeader ?
|
||
# for THeaderProtocol, we must use the same protocol instance for input | ||
# and output so that the response is in the same dialect that the | ||
# server detected the request was in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nicer to inform the users that we're ignoring outputProtocolFactory
if they explicitly provided one (self.inputProtocolFactory != self.outputProtocolFactory
would suffice to tell if they did ?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same things for other servers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, makes sense. I've added a check to the TServer
constructor to yell at the user if they explicitly pass a mix of THeaderProtocolFactory
and another factory. Is that what you had in mind?
self._client_type = THeaderClientType.HEADERS | ||
|
||
if not allowed_client_types: | ||
allowed_client_types = [THeaderClientType.HEADERS] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just make it default argument.
|
||
def set_header(self, key, value): | ||
assert isinstance(key, bytes), "header names must be bytes" | ||
assert isinstance(value, bytes), "header values must be bytes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising TypeError
would be nicer, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to accept str
for py3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see anything in the spec about the payloads needing to be text, so I didn't want to arbitrarily limit what the header values could be; being raw bytes allows e.g. a BinaryProtocol struct to be encoded in there if desired. In fact, at Reddit we actually do send an encoded struct in our headers right now.
return trans.read(length) | ||
|
||
|
||
def writeString(trans, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those 3 functions need at least _
prefix.
We could even define them inside function scope where we use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since they're not closing around anything in the scope they're used, I felt it was cleaner/clearer to just put 'em out here. That OK?
self.flags = 0 | ||
self.sequence_id = 0 | ||
self._protocol_id = THeaderSubprotocolID.BINARY | ||
self.max_frame_size = 0x3FFFFFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need _
prefix for this one since loosing up the limit can violate the protocol.
|
||
@property | ||
def cstringio_buf(self): | ||
if not self._has_ever_read: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what happens if we don't have this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look into this more. It's in the FBThrift implementation and to be honest I don't have a firm enough grasp on how the accelerated protocols interact with this to understand what might go wrong without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah, as far as I can tell it's unnecessary since the C code will call refill (which will read a frame) if this buffer is empty. Cool. Removed.
payload = transform_fn(payload) | ||
if transform_id not in write_transforms: | ||
write_transforms.append(transform_id) | ||
self._write_transforms = write_transforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's a good idea.
It gets in your way if, e.g., you want to save your request bandwidth with extra CPU time on client, and not want to spend CPU time on server and happy to fill more response bandwidth.
I'm inclined not to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll drop it. I was just mimicking FBThrift here but agree with your reasoning.
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to put a brief comment about unsupported features (like HTTP, and maybe non-blocking server) to save future users' time, here or somewhere noticeable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback (and the super helpful prior art and tests!)
I think I've addressed all the comments so far. New changes are in new commits so you can see the differences more easily. I'll squash down the new commits before merge.
|
||
def set_header(self, key, value): | ||
assert isinstance(key, bytes), "header names must be bytes" | ||
assert isinstance(value, bytes), "header values must be bytes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see anything in the spec about the payloads needing to be text, so I didn't want to arbitrarily limit what the header values could be; being raw bytes allows e.g. a BinaryProtocol struct to be encoded in there if desired. In fact, at Reddit we actually do send an encoded struct in our headers right now.
payload = transform_fn(payload) | ||
if transform_id not in write_transforms: | ||
write_transforms.append(transform_id) | ||
self._write_transforms = write_transforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll drop it. I was just mimicking FBThrift here but agree with your reasoning.
|
||
# for THeaderProtocol, we must use the same protocol instance for input | ||
# and output so that the response is in the same dialect that the | ||
# server detected the request was in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, makes sense. I've added a check to the TServer
constructor to yell at the user if they explicitly pass a mix of THeaderProtocolFactory
and another factory. Is that what you had in mind?
return trans.read(length) | ||
|
||
|
||
def writeString(trans, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since they're not closing around anything in the scope they're used, I felt it was cleaner/clearer to just put 'em out here. That OK?
2e53e4b
to
88de7e5
Compare
|
||
See doc/specs/HeaderFormat.md for details of the wire format. | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, thanks
Client: py
c6634ed
to
fa9006b
Compare
Thanks! All squashed down :) |
Looks good to me as well, but need tests to pass. I can re-kick the Travis ones but I don't know if I can do that for Appveyor because someone else owns the Appveyor account. |
Appveyor has been working (now it's a download failure) before squash and Travis passed, so it should be OK. |
Client: py
This implements the same subset of THeaderProtocol that the C++ implementation does. Specifically, it supports TBinaryProtocol and TCompactProtocol payloads and allows those same protocols to come in without headers framed/unframed for backwards compatibility. It also supports the gzip transform.
Both Python 2 and Python 3 pass the full test suite (feature and cross) with C++ thrown in the mix:
The only newly added known failures for this are those using the HTTP transport.