Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

twisted/web/server.py:190 #3059

Closed
Dmole opened this issue Aug 2, 2017 · 14 comments
Closed

twisted/web/server.py:190 #3059

Dmole opened this issue Aug 2, 2017 · 14 comments
Milestone

Comments

@Dmole
Copy link
Contributor

Dmole commented Aug 2, 2017

tribler_7rc2.app

An error occurred during the request:{  
   "error":{  
      "message":"",
      "code":"UnicodeDecodeError",
      "handled":false,
      "trace":[  
         "  File \"twisted/web/server.py\", line 190, in process\n",
         "  File \"twisted/web/server.py\", line 241, in render\n",
         "  File \"twisted/web/resource.py\", line 250, in render\n",
         "  File \"Tribler/Core/Modules/restapi/downloads_endpoint.py\", line 224, in render_GET\n",
         "  File \"json/__init__.py\", line 243, in dumps\n",
         "  File \"json/encoder.py\", line 207, in encode\n",
         "  File \"json/encoder.py\", line 270, in iterencode\n"
      ]
   }
}
@qstokkink qstokkink added this to the V7.0 milestone Aug 23, 2017
@qstokkink
Copy link
Contributor

Looking at the source data this should be caused by one of the following fields in the "downloads" response:

  1. ["downloads"]["files"][_]["name"]
  2. ["downloads"]["destination"]
  3. ["downloads"]["name"]

We'll have to make sure that our unit tests cover these three cases.

@qstokkink qstokkink self-assigned this Sep 1, 2017
@qstokkink
Copy link
Contributor

@devos50 I did some research on JSON grammar & standards and I don't think it would be a good idea after all to simply force JSON to handle non-text characters. A custom implementation or even encode_ascii=False, encoding='latin_1' can screw with a lot of JSON parsers and url-based forwarding of other platforms.

A solution would be to simply encode ALL strings as base64 (or some other format which can be represented as ASCII in JSON):

  1. We will never get UnicodeDecodeErrors again (good)
  2. JSON message size will increase, up to a maximum of 133% (bad)
  3. JSON parsing time will increase (bad)

The alternative would be to do this by hand and find all instances where a JSON field contains binary data. (which is what we're doing now):

  1. UnicodeDecodeErrors will pop up once in a while and will be a pain to debug (bad)
  2. JSON message size will be as small as possible (good)
  3. JSON parsing time will be as fast as possible (good)

What it boils down to is: do we want slow and reliable or fast and unstable? Personally I vote for slow and reliable. Your thoughts?

@synctext
Copy link
Member

synctext commented Sep 1, 2017

reliable at all cost!

What protocol is this in our architecture?

@devos50
Copy link
Contributor

devos50 commented Sep 1, 2017

@qstokkink encoding all strings as base64 is something I've been thinking about a year ago but my main objection against this is that it's not friendly for developers anymore. There are some people who really like the RESTful API and serving them a base64-encoded response will require them to do an additional step in the command line. So I'm slightly more in favor of the current approach and debugging the UnicodeDecodeErrors.

@qstokkink
Copy link
Contributor

@synctext This concerns our GUI 🔃 Core communication / our public REST API.

@devos50 Alright, since you made the GUI code I'll trust your judgement on this. I'll still put in one level of indirection though, such that the error report shows which of the fields could not be encoded (instead of the generic "oops something went wrong").

@devos50
Copy link
Contributor

devos50 commented Sep 1, 2017

@qstokkink I'm not talking about the complexity it adds for us (we can easily adopt our code to support this, it's only one or two lines). The problem is that our API is not 'clean' anymore in a sense that if I want to use the command line to control the Tribler instance (which I do quite often to test new endpoints or get information about existing ones), it's more complicated.

@qstokkink
Copy link
Contributor

@devos50 Exactly, I agree completely: JSON was made as a human readable data container for interaction with an application. Pumping binary strings into JSON is actually misuse of JSON.

However, since we found ourselves in this situation, we do need to deal with these accidental binary strings somehow. So the only thing I suggest to do, is to make it easier to debug when someone does send a binary string (which JSON gives a vague error for, as seen in the error of o.p.).

@devos50
Copy link
Contributor

devos50 commented Sep 1, 2017

@qstokkink that's an ok suggestion. Maybe we should have some generic way to send some key/value info to our servers (only for debugging purposes of course)? Then we can capture the UnicodeDecodeError and be done with it.

@qstokkink
Copy link
Contributor

qstokkink commented Sep 1, 2017

@devos50 You mean forward all failed requests to our servers?

Also, I extended the UnicodeDecodeError message, what I have right now is:

Traceback (most recent call last):
  File "/home/quinten/tribler/tribler/Tribler/Core/Utilities/json_util.py", line 47, in <module>
    dumps({'a': ['ABCDEF', '\xFF', '\xFE'], '\xFF': 1, 5: {2: u'\u1234\x00', 3: '\xFC', 4: (1, 2, 3, '\xFF')}})
  File "/home/quinten/tribler/tribler/Tribler/Core/Utilities/json_util.py", line 40, in dumps
    raise UnicodeDecodeError(e.encoding, str(obj), e.start, e.end, fmtmessage)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x7b in position 0: could not dump:
	dict[a]->list->str::'\xff'
	dict[a]->list->str::'\xfe'
	dict[5]->dict[3]->str::'\xfc'
	dict[5]->dict[4]->tuple->str::'\xff'
	dict::'\xff'

@Dmole
Copy link
Contributor Author

Dmole commented Sep 1, 2017

Improving the error details is good.
I have not looked at this bit of code but generally there are more things to consider depending on what the payload is:

  • encodeURI instead of base64Encode so only non-ascii is converted.
  • deflate before base64Encode can reduce instead of increase size.
  • use a type or content aware wrapper around your json lib to avoid a strict API
    • eg: any string starting with "db64e:" or "urie:" should be decoded~

@qstokkink
Copy link
Contributor

I created a unit test for this and shoved unicode into every possible string. This has probably already been fixed since the release of rc2. I'll create a PR for the unit test itself.
Otherwise, we will have the extended report as implemented in the previous PR.

@Dmole
Copy link
Contributor Author

Dmole commented Sep 5, 2017

"UnicodeDecodeError" seems like you would need to "shoved invalid unicode into every possible string" in order to test it...

@qstokkink
Copy link
Contributor

@Dmole As far as Python's json is concerned there are no invalid unicode strings, only invalid strings (0x80-0xFF will error out). Usually this error occurs when an endpoint improperly handles unicode in its fields. It seems that this is improper string formatting though. This makes the error nigh impossible to debug with just this error message (which is why we added #3075).

For now our only realistic option would be to wait until this error occurs again.

@qstokkink
Copy link
Contributor

qstokkink commented Nov 9, 2017

Finally got this one while downloading Ubuntu:
afbeelding

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 183, in process
    self.render(resrc)
  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 234, in render
    body = resrc.render(self)
  File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
    return m(request)
  File "/home/quinten/Documents/tribler/Tribler/Core/Modules/restapi/downloads_endpoint.py", line 225, in render_GET
    return json.dumps({"downloads": downloads_json})
  File "/home/quinten/Documents/tribler/Tribler/Core/Utilities/json_util.py", line 100, in dumps
    raise error
UnicodeDecodeError: 'utf8' codec can't decode byte 0x7b in position 0: could not dump:
	dict[downloads]->list->dict[peers]->list->dict[extended_version]->str::'\xb5Torrent 1.6.1'

qstokkink added a commit to qstokkink/tribler that referenced this issue Nov 9, 2017
@devos50 devos50 closed this as completed Nov 9, 2017
devos50 pushed a commit to devos50/tribler that referenced this issue Jan 18, 2018
devos50 pushed a commit to devos50/tribler that referenced this issue Jan 18, 2018
devos50 pushed a commit to devos50/tribler that referenced this issue Jan 18, 2018
mitchellolsthoorn pushed a commit to blockchain-storage/tribler that referenced this issue Jan 24, 2018
@Dmole Dmole mentioned this issue Mar 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants