THRIFT-4621 Add THeader for Python #1583

Merged: 1 commit merged into apache:master on Sep 1, 2018

Contributor

@spladug spladug commented Aug 22, 2018

Client: py

This implements the same subset of THeaderProtocol that the C++ implementation does. Specifically, it supports TBinaryProtocol and TCompactProtocol payloads and, for backwards compatibility, also accepts those same protocols without headers, either framed or unframed. It also supports the gzip transform.
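As an illustration of the backwards-compatibility behaviour described above, here is a minimal sketch of how a receiver might tell the five client types apart by peeking at the first bytes of a message. It is not the code from this PR; the function names are hypothetical, and the magic values (0x0FFF for headers, 0x8001 for strict binary, 0x82 for compact) are assumptions taken from my reading of the Thrift wire formats.

```python
import struct

# Assumed magic values; see doc/specs/HeaderFormat.md for the authoritative ones.
HEADER_MAGIC = 0x0FFF        # top 16 bits of the first word inside a THeader frame
BINARY_VERSION_1 = 0x8001    # top 16 bits of a strict TBinaryProtocol message
COMPACT_PROTOCOL_ID = b"\x82"  # first byte of a TCompactProtocol message


def sniff_word(word):
    """Classify a 4-byte big-endian word (hypothetical helper)."""
    (value,) = struct.unpack("!I", word)
    if value >> 16 == BINARY_VERSION_1:
        return "binary"
    if word[:1] == COMPACT_PROTOCOL_ID:
        return "compact"
    if value >> 16 == HEADER_MAGIC:
        return "headers"
    return None


def detect_client_type(data):
    """Guess the client type from the first 8 bytes of a message."""
    first = sniff_word(data[:4])
    if first in ("binary", "compact"):
        # The message starts immediately, so there is no frame length.
        return "unframed " + first
    # Otherwise the first word is taken to be a frame length and the
    # second word tells us what the frame contains.
    inner = sniff_word(data[4:8])
    if inner == "headers":
        return "headers"
    if inner in ("binary", "compact"):
        return "framed " + inner
    raise ValueError("unrecognized client type")
```

A server built this way can answer in whatever dialect the request arrived in, which is why the same protocol instance has to be used for input and output.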

Both Python 2 and Python 3 pass the full test suite (feature and cross) with C++ thrown in the mix:

$ make cross
/usr/bin/python3 test/test.py --retry-count 5 --features .* --skip-known-failures --server cpp,c_glib,,,,,py,py3,,,,,,,,,,,
...
No unexpected failures.
0 failed of 40 tests in total.
/usr/bin/python3 test/test.py --retry-count 5 --skip-known-failures --server cpp,c_glib,,,,,py,py3,,,,,,,,,,, --client cpp,c_glib,,,,,py,py3,,,,,,,,,,, --regex ".*"
...
No unexpected failures.
0 failed of 712 tests in total.

The only newly added known failures for this are those using the HTTP transport.

--- a/test/known_failures_Linux.json
+++ b/test/known_failures_Linux.json
@@ -83,6 +83,8 @@
   "cpp-py3_compact-accelc_http-ip-ssl",
   "cpp-py3_compact_http-ip",
   "cpp-py3_compact_http-ip-ssl",
+  "cpp-py3_header_http-ip",
+  "cpp-py3_header_http-ip-ssl",
   "cpp-py3_json_http-ip",
   "cpp-py3_json_http-ip-ssl",
   "cpp-py3_multi-accel_http-ip",
@@ -101,6 +103,8 @@
   "cpp-py3_multic-multiac_http-ip-ssl",
   "cpp-py3_multic_http-ip",
   "cpp-py3_multic_http-ip-ssl",
+  "cpp-py3_multih-header_http-ip",
+  "cpp-py3_multih-header_http-ip-ssl",
   "cpp-py3_multij-json_http-ip",
   "cpp-py3_multij-json_http-ip-ssl",
   "cpp-py3_multij_http-ip",
@@ -113,6 +117,8 @@
   "cpp-py_compact-accelc_http-ip-ssl",
   "cpp-py_compact_http-ip",
   "cpp-py_compact_http-ip-ssl",
+  "cpp-py_header_http-ip",
+  "cpp-py_header_http-ip-ssl",
   "cpp-py_json_http-ip",
   "cpp-py_json_http-ip-ssl",
   "cpp-py_multi-accel_http-ip",
@@ -131,6 +137,8 @@
   "cpp-py_multic-multiac_http-ip-ssl",
   "cpp-py_multic_http-ip",
   "cpp-py_multic_http-ip-ssl",
+  "cpp-py_multih-header_http-ip",
+  "cpp-py_multih-header_http-ip-ssl",
   "cpp-py_multij-json_http-ip",
   "cpp-py_multij-json_http-ip-ssl",
   "cpp-py_multij_http-ip",
@@ -375,6 +383,8 @@
   "py-cpp_binary_http-ip-ssl",
   "py-cpp_compact_http-ip",
   "py-cpp_compact_http-ip-ssl",
+  "py-cpp_header_http-ip",
+  "py-cpp_header_http-ip-ssl",
   "py-cpp_json_http-ip",
   "py-cpp_json_http-ip-ssl",
   "py-d_accel-binary_http-ip",
@@ -396,6 +406,7 @@
   "py-hs_accelc-compact_http-ip",
   "py-hs_binary_http-ip",
   "py-hs_compact_http-ip",
+  "py-hs_header_http-ip",
   "py-hs_json_http-ip",
   "py-java_accel-binary_http-ip",
   "py-java_accel-binary_http-ip-ssl",
@@ -420,6 +430,8 @@
   "py3-cpp_binary_http-ip-ssl",
   "py3-cpp_compact_http-ip",
   "py3-cpp_compact_http-ip-ssl",
+  "py3-cpp_header_http-ip",
+  "py3-cpp_header_http-ip-ssl",
   "py3-cpp_json_http-ip",
   "py3-cpp_json_http-ip-ssl",
   "py3-d_accel-binary_http-ip",
@@ -441,6 +454,7 @@
   "py3-hs_accelc-compact_http-ip",
   "py3-hs_binary_http-ip",
   "py3-hs_compact_http-ip",
+  "py3-hs_header_http-ip",
   "py3-hs_json_http-ip",
   "py3-java_accel-binary_http-ip",
   "py3-java_accel-binary_http-ip-ssl",

@spladug spladug force-pushed the THRIFT-4621-THeader-for-Python branch from 4a76744 to 20605ce Compare August 22, 2018 20:37
Member

@nsuke nsuke left a comment

@spladug thanks, it looks pretty nice overall!

import struct
import zlib

from thrift.compat import BufferIO, binary_to_str, str_to_binary, byte_index
Member

nit: binary_to_str and str_to_binary are unused (this breaks the style check CI job)

https://travis-ci.org/apache/thrift/jobs/419362288#L9859

@@ -158,7 +176,15 @@ def serveClient(self, client):
itrans = self.inputTransportFactory.getTransport(client)
otrans = self.outputTransportFactory.getTransport(client)
Member

Is there any need to create otrans when using THeader?


# for THeaderProtocol, we must use the same protocol instance for input
# and output so that the response is in the same dialect that the
# server detected the request was in.
Member

It would be nicer to inform users that we're ignoring outputProtocolFactory if they explicitly provided one (would self.inputProtocolFactory != self.outputProtocolFactory suffice to tell if they did?).

Member

(same thing for the other servers)

Contributor Author

Cool, makes sense. I've added a check to the TServer constructor to yell at the user if they explicitly pass a mix of THeaderProtocolFactory and another factory. Is that what you had in mind?
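A check of the kind described in this reply might look roughly like the sketch below. The factory classes and the function name are hypothetical stand-ins, not the names used in the PR; the only point is that mixing a header factory with any other factory is rejected up front instead of silently ignoring one of them.

```python
class ProtocolFactorySketch:
    """Hypothetical stand-in for a protocol factory base class."""


class BinaryFactorySketch(ProtocolFactorySketch):
    """Hypothetical stand-in for a non-header protocol factory."""


class HeaderFactorySketch(ProtocolFactorySketch):
    """Hypothetical stand-in for THeaderProtocolFactory."""


def check_protocol_factories(input_factory, output_factory):
    """Reject a mix of a header factory and a non-header factory."""
    input_is_header = isinstance(input_factory, HeaderFactorySketch)
    output_is_header = isinstance(output_factory, HeaderFactorySketch)
    if input_is_header != output_is_header:
        raise ValueError(
            "header protocol must be used for both input and output, "
            "or for neither"
        )
```

Calling this from the server constructor surfaces the misconfiguration at startup, before any client connects.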

self._client_type = THeaderClientType.HEADERS

if not allowed_client_types:
    allowed_client_types = [THeaderClientType.HEADERS]
Member

Maybe just make it a default argument.
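One way to follow this suggestion, sketched with hypothetical names and string constants standing in for THeaderClientType, is to use an immutable tuple as the default. A list default would be shared between calls, which is the usual reason the `if not ...` pattern shows up in the first place.

```python
# Hypothetical stand-ins for THeaderClientType members.
HEADERS = "headers"
FRAMED_BINARY = "framed_binary"


class HeaderTransportSketch:
    """Illustrative only; not the transport class from this PR."""

    def __init__(self, allowed_client_types=(HEADERS,)):
        # A tuple default is immutable, so it is safe to share between
        # instances; copy into a frozenset for fast membership tests.
        self._allowed_client_types = frozenset(allowed_client_types)

    def is_allowed(self, client_type):
        return client_type in self._allowed_client_types
```

With this shape the default is visible in the signature, and callers who want backwards compatibility pass a longer tuple explicitly.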


def set_header(self, key, value):
    assert isinstance(key, bytes), "header names must be bytes"
    assert isinstance(value, bytes), "header values must be bytes"
Member

Raising TypeError would be nicer, IMO.

Member

Wouldn't it make more sense to accept str for py3?

Contributor Author

I didn't see anything in the spec about the payloads needing to be text, so I didn't want to arbitrarily limit what the header values could be; being raw bytes allows e.g. a BinaryProtocol struct to be encoded in there if desired. In fact, at Reddit we actually do send an encoded struct in our headers right now.
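To make the two points in this thread concrete, here is a small sketch that raises TypeError instead of asserting (assert statements are skipped when Python runs with -O) and keeps header values as raw bytes so that arbitrary binary data fits. The class name and the packed payload layout are hypothetical; a real caller could just as well store a serialized Thrift struct.

```python
import struct


class HeaderBagSketch:
    """Illustrative container for THeader key/value pairs."""

    def __init__(self):
        self._headers = {}

    def set_header(self, key, value):
        # Explicit checks survive `python -O`, unlike assert statements.
        if not isinstance(key, bytes):
            raise TypeError("header names must be bytes")
        if not isinstance(value, bytes):
            raise TypeError("header values must be bytes")
        self._headers[key] = value

    def get_headers(self):
        return dict(self._headers)


bag = HeaderBagSketch()
# Hypothetical binary payload: a 64-bit trace id followed by a 32-bit flag word.
bag.set_header(b"trace", struct.pack("!QI", 1234567890123, 42))
```

On Python 3 a str key or value is rejected, which keeps encoding decisions with the caller.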

    return trans.read(length)


def writeString(trans, value):
Member

I think those 3 functions need at least a _ prefix.
We could even define them inside the function scope where we use them.

Contributor Author

Since they're not closing around anything in the scope they're used, I felt it was cleaner/clearer to just put 'em out here. That OK?

self.flags = 0
self.sequence_id = 0
self._protocol_id = THeaderSubprotocolID.BINARY
self.max_frame_size = 0x3FFFFFFF
Member

This one needs a _ prefix, since loosening the limit can violate the protocol.
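A rough sketch of what keeping the limit private could look like follows. The class and method names are hypothetical; only the 0x3FFFFFFF ceiling comes from the quoted code.

```python
import struct

HARD_MAX_FRAME_SIZE = 0x3FFFFFFF  # ceiling taken from the quoted code


class FrameReaderSketch:
    """Illustrative only; not the transport class from this PR."""

    def __init__(self):
        # Underscore-prefixed so callers are not invited to raise it
        # beyond what the wire format allows.
        self._max_frame_size = HARD_MAX_FRAME_SIZE

    def check_frame_size(self, length_bytes):
        """Decode a 4-byte big-endian frame length and enforce the limit."""
        (length,) = struct.unpack("!I", length_bytes)
        if length > self._max_frame_size:
            raise ValueError("frame of %d bytes exceeds the limit" % length)
        return length
```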


@property
def cstringio_buf(self):
    if not self._has_ever_read:
Member

Curious what happens if we don't have this.

Contributor Author

I need to look into this more. It's in the FBThrift implementation and to be honest I don't have a firm enough grasp on how the accelerated protocols interact with this to understand what might go wrong without it.

Contributor Author

OK yeah, as far as I can tell it's unnecessary since the C code will call refill (which will read a frame) if this buffer is empty. Cool. Removed.

payload = transform_fn(payload)
if transform_id not in write_transforms:
write_transforms.append(transform_id)
self._write_transforms = write_transforms
Member

@nsuke nsuke Aug 24, 2018

I'm not sure this is a good idea.
It gets in the way if, for example, you want to spend extra CPU time on the client to save request bandwidth, but don't want to spend CPU time on the server and are happy to use more response bandwidth.

I'm inclined not to do this.

Contributor Author

Cool, I'll drop it. I was just mimicking FBThrift here but agree with your reasoning.
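The behaviour agreed on here, where reading a transformed request does not change what the receiver applies to its own writes, can be sketched as follows. The class is hypothetical, the transform id 0x01 for zlib is an assumption from my reading of the header format spec, and only a single transform is modelled.

```python
import zlib

ZLIB_TRANSFORM = 0x01  # assumed id; check doc/specs/HeaderFormat.md

_ENCODERS = {ZLIB_TRANSFORM: zlib.compress}
_DECODERS = {ZLIB_TRANSFORM: zlib.decompress}


class TransformStateSketch:
    """Tracks write transforms independently of what was read."""

    def __init__(self, write_transforms=()):
        self.write_transforms = list(write_transforms)

    def encode(self, payload):
        for transform_id in self.write_transforms:
            payload = _ENCODERS[transform_id](payload)
        return payload

    def decode(self, payload, read_transforms):
        # Undo in reverse order of application (an assumption of this sketch).
        for transform_id in reversed(list(read_transforms)):
            payload = _DECODERS[transform_id](payload)
        # Deliberately no change to self.write_transforms here: the peer's
        # choice to compress does not force us to compress our replies.
        return payload


client = TransformStateSketch([ZLIB_TRANSFORM])
server = TransformStateSketch()
request = b"x" * 100
wire = client.encode(request)
```

Here the client pays CPU to shrink the request while the server decodes it and still replies uncompressed, which is the trade-off the reviewer wanted to keep available.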

# specific language governing permissions and limitations
# under the License.
#

Member

It would be nice to put a brief comment about unsupported features (like HTTP, and maybe non-blocking server) to save future users' time, here or somewhere noticeable.

Contributor Author

@spladug spladug left a comment

Thanks for the feedback (and the super helpful prior art and tests!)

I think I've addressed all the comments so far. New changes are in new commits so you can see the differences more easily. I'll squash down the new commits before merge.

@spladug spladug force-pushed the THRIFT-4621-THeader-for-Python branch from 2e53e4b to 88de7e5 Compare August 24, 2018 20:31

See doc/specs/HeaderFormat.md for details of the wire format.

"""
Member

great, thanks

@spladug spladug force-pushed the THRIFT-4621-THeader-for-Python branch from c6634ed to fa9006b Compare August 27, 2018 16:02
@spladug
Contributor Author

spladug commented Aug 27, 2018

Thanks! All squashed down :)

@jeking3
Contributor

jeking3 commented Aug 29, 2018

Looks good to me as well, but we need the tests to pass. I can re-kick the Travis ones, but I don't know if I can do that for Appveyor because someone else owns the Appveyor account.

@nsuke
Member

nsuke commented Sep 1, 2018

Appveyor was passing before the squash (the current failure is a download error) and Travis passed, so it should be OK.

@nsuke nsuke merged commit 66a44c5 into apache:master Sep 1, 2018