UTF-8 handling #103

Closed
Lawouach opened this Issue Mar 25, 2012 · 6 comments

3 participants

Hi,

I'm using your utf8validator module (thanks a lot for it ;)) and much like you, I try to validate each frame. However, ws4py will fail the connection as soon as a frame contains an invalid payload whereas it seems autobahn will continue reading until the end frame even though it also validated the frame.

I'm not sure I understand what the expected behavior is. The spec is a bit unclear with regard to frame handling. Do we have to validate each frame's payload? Do we have to initiate the closing handshake when a frame has an invalid payload?

Thanks.

Contributor

zaphoyd commented Mar 25, 2012

The WebSocket specification requires only that messages with invalid payloads be rejected. This means a conforming implementation need not reject the message until all of its frames have been read. By validating incrementally at each frame (or each TCP packet), an implementation can fail sooner and not waste resources buffering the remainder of the message.

RFC 6455, Section 7.1.7 "Fail the WebSocket Connection", goes into more detail about failing the connection. Specifically, an implementation has two options when it receives an invalid payload (such as invalid UTF8). One is to drop the TCP connection immediately. The other is to send a close frame with the invalid payload close code (1007) and then drop the TCP connection.
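For illustration, here is a minimal sketch of the second option: building the close frame that carries status code 1007. The helper name `build_close_frame` and the reason text are made up for this example, and the frame is unmasked, so it is what a server would send (a client would additionally have to mask it).

```python
import struct

# Status code for "invalid payload data" (e.g. invalid UTF-8 in a text message).
CLOSE_INVALID_PAYLOAD = 1007


def build_close_frame(code, reason=""):
    """Build an unmasked close frame (hypothetical helper, server side only)."""
    # Close payload: 2-byte big-endian status code, then an optional UTF-8 reason.
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload must not exceed 125 bytes")
    # 0x88 = FIN bit set + opcode 0x8 (close). The second byte is the payload
    # length with the mask bit cleared.
    return bytes([0x88, len(payload)]) + payload


frame = build_close_frame(CLOSE_INVALID_PAYLOAD, "invalid UTF-8")
```

After sending this frame, the endpoint would drop the TCP connection rather than wait for the rest of the message.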

The Autobahn test suite follows the RFC definition of conformance with respect to failing the connection due to invalid payload data. The one exception is section 6.4, "Fail-Fast on invalid UTF8", which explicitly tests whether an implementation can fail earlier (i.e. on frame or TCP packet boundaries). This behavior is not required by the spec, but it can improve performance because the implementation does not waste resources buffering a message it already knows it will reject.

I will admit I'm a little fuzzy about "By validating incrementally at each frame (or each TCP packet), an implementation can fail sooner and not waste resources buffering the remainder of the message."

How do you decide a payload is invalid when your frame ends in the middle of a code point (as in case 6.2.3)?

Moreover, I still don't get why some tests in 6.6 are marked as failing (well, their termination is) when it's the test suite that takes a stand on how the connection should have been terminated.

http://www.defuze.org/oss/ws4py/testreports/servers/0.2.1/

Contributor

zaphoyd commented Mar 25, 2012

In the case of 6.2.3, the implementation would need to wait until the next frame/TCP packet to be able to determine whether the message is invalid, i.e. the message decoding state would need to be maintained across frame boundaries.
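A rough sketch of what "maintaining decoding state across frame boundaries" can look like, using Python's standard incremental UTF-8 decoder rather than Autobahn's utf8validator. The class name `MessageValidator` and its `feed` method are invented for this example; one instance is assumed per message.

```python
import codecs


class MessageValidator:
    """Validates one text message that arrives as several frame payloads."""

    def __init__(self):
        # The incremental decoder buffers a trailing partial code point
        # between calls, which is the state carried across frames.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def feed(self, payload, final=False):
        """Return False once the bytes seen so far cannot be valid UTF-8.

        Pass final=True for the last frame so that a code point left
        incomplete at the end of the message is also rejected.
        """
        try:
            self._decoder.decode(payload, final)
        except UnicodeDecodeError:
            return False
        return True


data = "κόσμε".encode("utf-8")
first, second = data[:3], data[3:]  # the split falls inside the second character

validator = MessageValidator()
first_ok = validator.feed(first)               # partial code point: keep waiting
second_ok = validator.feed(second, final=True)  # completed by the next frame
```

A frame that ends mid code point is therefore not an error by itself. It only becomes one if a later byte contradicts it, or if the message ends before the code point is completed.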

Regarding the 6.6.* tests, if you look at the wirelog, the server being tested is echoing the invalid UTF8 back (which should be a test fail), and the AB fuzzing client is then performing the "fail on payload violation" steps. The termination test "fails" because the fuzzer had to initiate the fail when normally the server being tested should have.

In this case, the AB fuzzer should have some logic to detect and flag this case as a full test failure rather than test pass+termination failure. Either way, ws4py is failing the test and may need some adjustments to its UTF8 validation logic.

Owner

oberstet commented Mar 25, 2012

Yeah, would say the same as Peter.

"""In this case, the AB fuzzer should have some logic to detect and flag this case as a full test failure rather than test pass+termination failure."""

Yep, also correct. This needs improvement in the test suite case: detect that something was returned by the testee and make it a full fail (instead of "wrong peer failed", which is just a consequence of the fuzzer itself detecting the echoed invalid UTF8).

This can be done today without new hooks in websocket.py since there is an option to turn off incoming payload validation. Then, if anything is echoed, the case can detect it and make it a full fail.

Right, I found the issue. I was relying only on the first flag of the validate quadruplet and wasn't inspecting the second flag, which says whether the data ends on a code point boundary. I've fixed my usage of utf8validator and it works fine. All the autobahn tests are now green with ws4py :)
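To make the fix concrete, here is a small sketch of the decision logic, assuming only what is described above: a first flag meaning "valid so far" and a second flag meaning "ends on a code point boundary". The function `frame_verdict` and its parameter names are hypothetical and are not part of utf8validator or ws4py.

```python
def frame_verdict(valid, ends_on_code_point, is_final_frame):
    """Decide what to do after validating one frame's payload.

    valid              -- first flag: no invalid byte sequence seen so far
    ends_on_code_point -- second flag: the data ends on a code point boundary
    is_final_frame     -- whether this frame completes the message
    """
    if not valid:
        # Invalid bytes can be rejected immediately (fail fast).
        return "fail"
    if is_final_frame and not ends_on_code_point:
        # The message is over but its last code point is truncated.
        return "fail"
    # A non-final frame may legitimately stop in the middle of a code point.
    return "ok" if is_final_frame else "continue"
```

Checking only the first flag would accept a message whose last frame stops mid code point, which is the kind of mistake described in this comment.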

Owner

oberstet commented Mar 28, 2012

Great!! Welcome to the club .. we are quite exclusive;)

oberstet closed this Feb 10, 2013
