Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary entries by class HeaderReader are strange #1827

Closed
1 of 3 tasks
bmxp opened this issue Nov 24, 2019 · 12 comments
Closed
1 of 3 tasks

Dictionary entries by class HeaderReader are strange #1827

bmxp opened this issue Nov 24, 2019 · 12 comments
Assignees
Labels

Comments

@bmxp
Copy link

bmxp commented Nov 24, 2019

❓ I'm submitting a ...

  • 🐞 bug report
  • 🐣 feature request
  • ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?

From cheroot server the class HeaderReader gives strange dict entries like {"B'Status'": b'302 Found' } when using a predefined dictionary like the one in cherrypy.serving.response.headers.

❓ What is the motivation / use case for changing the behavior?
The result should not contain these strange dictionary entries.

πŸ’‘ To Reproduce

Steps to reproduce the behavior:
Consider the following code:

import tempfile
import cherrypy
from cheroot import server

response = tempfile.SpooledTemporaryFile(max_size = 1000, mode = "w+b")

response.write(b'Status: 302 Found\r\nLocation: index.php?\r\nContent-type: text/html; charset=UTF-8\r\n\r\n')

emptydict ={} 
response.seek(0)
hr = server.HeaderReader()
headers = hr(response,cherrypy.serving.response.headers)
print(headers)

print(80*'-')

response.seek(0)
headers = hr(response,emptydict)
print(headers)


print(80*'-')

otherdict ={ 'Content-Type':'text/html'} 
response.seek(0)
headers = hr(response,otherdict)
print(headers)

This will result in output like

{'Content-Type': 'text/html', 'Server': 'CherryPy/18.4.0', 'Date': 'Sat, 23 Nov 2019 20:10:51 GMT', "B'Status'": b'302 Found', "B'Location'": b'index.php?', "B'Content-Type'": b'text/html; charset=UTF-8'}
--------------------------------------------------------------------------------
{b'Status': b'302 Found', b'Location': b'index.php?', b'Content-Type': b'text/html; charset=UTF-8'}
--------------------------------------------------------------------------------
{'Content-Type': 'text/html', b'Status': b'302 Found', b'Location': b'index.php?', b'Content-Type': b'text/html; charset=UTF-8'}

πŸ’‘ Expected behavior

The first printed dict is not ok IMHO. The second and third printed dicts show the expected entries in dict.
A lookup like if 'Status' in dict ... will thus fail and a workaround ist needed.

πŸ“‹ Environment

  • Cheroot version: 8.2.1
  • CherryPy version: 18.4.0
  • Python version: 3.7.3
  • OS: Debian 10 (buster)
  • Browser: not relevant

πŸ“‹ Additional context

@bmxp
Copy link
Author

bmxp commented Nov 24, 2019

Just a small followup:
The dict used by cherrypy.serving.response.headers is from cherrypy.lib.httputil.HeaderMap()
There is a member function:

@staticmethod
  def transform_key(key):
      return str(key).title()

This will cause a bytes object like b'status' to result in "B'Status'" which I would think is wrong. At this place I think it must be decided to first transform the bytes into a string using encoding and not using simply str()

@webknjaz
Copy link
Member

@jaraco this one is odd. CherryPy is py3-only and Cheroot isn't. This is probably one of the issues here. What do you think we should do?

@bmxp
Copy link
Author

bmxp commented Nov 25, 2019

The dict keys should be of type str regardless wether they were bytes or str.
b'Status' --> "Status"
'Status' --> "Status"
Thus when the transform_key(key) encounters a bytes object, it should be converted to a str using ISO-8859-1 decoding. But this would also apply to a value from header. So one would need to thoroughly check all cases in values, to all over cheroot / httputil

@bmxp
Copy link
Author

bmxp commented Nov 25, 2019

Maybe the right place to put a change from bytes to str would indeed be within HeaderReader of cheroot...

@webknjaz
Copy link
Member

I think the best thing to do is to add some validation to cheroot.server.HeaderReader.__call__() method. But I'm not sure about the performance penalty.

@jaraco
Copy link
Member

jaraco commented Nov 27, 2019

CherryPy is py3-only and Cheroot isn't. This is probably one of the issues here. What do you think we should do?

I'm fine with making cheroot Python 3-only, now that the pending pull requests have been merged/released and there's been no major issues reported. But I still don't think that will fix this issue. Since CherryPy is already Python 3-only and the user is testing on Python 3, the issue will remain even after cheroot becomes Python 3-only.

The real problem is that cherrypy.lib.httputil.HeaderMap is still dancing around the Python 2 concept of bytes/string duality. We should figure out what the intention was behind the HeaderMap (should the keys be text or bytes) and eliminate the duality. I'd prefer for the HeaderMap to present the header keys/values as (unicode) text, but much of CherryPy still assumes bytes for its interfaces, so it may be non-trivial to change that expectation.

@jaraco
Copy link
Member

jaraco commented Nov 27, 2019

The issue can be readily replicated thus:

>>> from cherrypy.lib.httputil import HeaderMap
>>> HeaderMap([(b'foo', b'bar')])
{"B'Foo'": b'bar'}

@jaraco
Copy link
Member

jaraco commented Nov 27, 2019

It looks like the origin of str(key).title() is very old. Probably that can be improved.

This ticket probably should be against cherrypy.

@jaraco jaraco transferred this issue from cherrypy/cheroot Nov 27, 2019
jaraco added a commit that referenced this issue Nov 27, 2019
@webknjaz
Copy link
Member

@jaraco I think we still need to add bytes-validation to cheroot.server.HeaderReader.__call__() too.

@jaraco
Copy link
Member

jaraco commented Nov 27, 2019

I think we still need to add bytes-validation to cheroot.server.HeaderReader.__call__() too.

Yes, I think so too. Having mixed bytes/text in the HeaderMap is bad form. Would you be willing to write up a ticket about that issue?

@webknjaz
Copy link
Member

I guess so.

@jaraco jaraco closed this as completed in 399c637 Nov 27, 2019
@webknjaz
Copy link
Member

After looking into it closer, I don't think that such validation belongs there because __call__() just adds headers into the given dict, it doesn't do anything with what already exists there. Such validation would need to be done in the calling code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants