Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary entries by class HeaderReader are strange #1827

Closed
bmxp opened this issue Nov 24, 2019 · 12 comments
Closed

Dictionary entries by class HeaderReader are strange #1827

bmxp opened this issue Nov 24, 2019 · 12 comments
Assignees
Labels
bug

Comments

@bmxp
Copy link

@bmxp bmxp commented Nov 24, 2019

❓ I'm submitting a ...

  • 🐞 bug report
  • 🐣 feature request
  • ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?

From cheroot server the class HeaderReader gives strange dict entries like {"B'Status'": b'302 Found' } when using a predefined dictionary like the one in cherrypy.serving.response.headers.

❓ What is the motivation / use case for changing the behavior?
The result should not contain these strange dictionary entries.

πŸ’‘ To Reproduce

Steps to reproduce the behavior:
Consider the following code:

import tempfile
import cherrypy
from cheroot import server

response = tempfile.SpooledTemporaryFile(max_size = 1000, mode = "w+b")

response.write(b'Status: 302 Found\r\nLocation: index.php?\r\nContent-type: text/html; charset=UTF-8\r\n\r\n')

emptydict ={} 
response.seek(0)
hr = server.HeaderReader()
headers = hr(response,cherrypy.serving.response.headers)
print(headers)

print(80*'-')

response.seek(0)
headers = hr(response,emptydict)
print(headers)


print(80*'-')

otherdict ={ 'Content-Type':'text/html'} 
response.seek(0)
headers = hr(response,otherdict)
print(headers)

This will result in output like

{'Content-Type': 'text/html', 'Server': 'CherryPy/18.4.0', 'Date': 'Sat, 23 Nov 2019 20:10:51 GMT', "B'Status'": b'302 Found', "B'Location'": b'index.php?', "B'Content-Type'": b'text/html; charset=UTF-8'}
--------------------------------------------------------------------------------
{b'Status': b'302 Found', b'Location': b'index.php?', b'Content-Type': b'text/html; charset=UTF-8'}
--------------------------------------------------------------------------------
{'Content-Type': 'text/html', b'Status': b'302 Found', b'Location': b'index.php?', b'Content-Type': b'text/html; charset=UTF-8'}

πŸ’‘ Expected behavior

The first printed dict is not ok IMHO. The second and third printed dicts show the expected entries in dict.
A lookup like if 'Status' in dict ... will thus fail and a workaround ist needed.

πŸ“‹ Environment

  • Cheroot version: 8.2.1
  • CherryPy version: 18.4.0
  • Python version: 3.7.3
  • OS: Debian 10 (buster)
  • Browser: not relevant

πŸ“‹ Additional context

@bmxp

This comment has been minimized.

Copy link
Author

@bmxp bmxp commented Nov 24, 2019

Just a small followup:
The dict used by cherrypy.serving.response.headers is from cherrypy.lib.httputil.HeaderMap()
There is a member function:

@staticmethod
  def transform_key(key):
      return str(key).title()

This will cause a bytes object like b'status' to result in "B'Status'" which I would think is wrong. At this place I think it must be decided to first transform the bytes into a string using encoding and not using simply str()

@webknjaz

This comment has been minimized.

Copy link
Member

@webknjaz webknjaz commented Nov 25, 2019

@jaraco this one is odd. CherryPy is py3-only and Cheroot isn't. This is probably one of the issues here. What do you think we should do?

@bmxp

This comment has been minimized.

Copy link
Author

@bmxp bmxp commented Nov 25, 2019

The dict keys should be of type str regardless wether they were bytes or str.
b'Status' --> "Status"
'Status' --> "Status"
Thus when the transform_key(key) encounters a bytes object, it should be converted to a str using ISO-8859-1 decoding. But this would also apply to a value from header. So one would need to thoroughly check all cases in values, to all over cheroot / httputil

@bmxp

This comment has been minimized.

Copy link
Author

@bmxp bmxp commented Nov 25, 2019

Maybe the right place to put a change from bytes to str would indeed be within HeaderReader of cheroot...

@webknjaz

This comment has been minimized.

Copy link
Member

@webknjaz webknjaz commented Nov 25, 2019

I think the best thing to do is to add some validation to cheroot.server.HeaderReader.__call__() method. But I'm not sure about the performance penalty.

@jaraco

This comment has been minimized.

Copy link
Member

@jaraco jaraco commented Nov 27, 2019

CherryPy is py3-only and Cheroot isn't. This is probably one of the issues here. What do you think we should do?

I'm fine with making cheroot Python 3-only, now that the pending pull requests have been merged/released and there's been no major issues reported. But I still don't think that will fix this issue. Since CherryPy is already Python 3-only and the user is testing on Python 3, the issue will remain even after cheroot becomes Python 3-only.

The real problem is that cherrypy.lib.httputil.HeaderMap is still dancing around the Python 2 concept of bytes/string duality. We should figure out what the intention was behind the HeaderMap (should the keys be text or bytes) and eliminate the duality. I'd prefer for the HeaderMap to present the header keys/values as (unicode) text, but much of CherryPy still assumes bytes for its interfaces, so it may be non-trivial to change that expectation.

@jaraco

This comment has been minimized.

Copy link
Member

@jaraco jaraco commented Nov 27, 2019

The issue can be readily replicated thus:

>>> from cherrypy.lib.httputil import HeaderMap
>>> HeaderMap([(b'foo', b'bar')])
{"B'Foo'": b'bar'}
@jaraco

This comment has been minimized.

Copy link
Member

@jaraco jaraco commented Nov 27, 2019

It looks like the origin of str(key).title() is very old. Probably that can be improved.

This ticket probably should be against cherrypy.

@jaraco jaraco transferred this issue from cherrypy/cheroot Nov 27, 2019
@triage-new-issues triage-new-issues bot added the triage label Nov 27, 2019
jaraco added a commit that referenced this issue Nov 27, 2019
@jaraco jaraco mentioned this issue Nov 27, 2019
1 of 6 tasks complete
@webknjaz

This comment has been minimized.

Copy link
Member

@webknjaz webknjaz commented Nov 27, 2019

@jaraco I think we still need to add bytes-validation to cheroot.server.HeaderReader.__call__() too.

@webknjaz webknjaz added the bug label Nov 27, 2019
@triage-new-issues triage-new-issues bot removed the triage label Nov 27, 2019
@jaraco

This comment has been minimized.

Copy link
Member

@jaraco jaraco commented Nov 27, 2019

I think we still need to add bytes-validation to cheroot.server.HeaderReader.__call__() too.

Yes, I think so too. Having mixed bytes/text in the HeaderMap is bad form. Would you be willing to write up a ticket about that issue?

@webknjaz

This comment has been minimized.

Copy link
Member

@webknjaz webknjaz commented Nov 27, 2019

I guess so.

@jaraco jaraco closed this in 399c637 Nov 27, 2019
@webknjaz

This comment has been minimized.

Copy link
Member

@webknjaz webknjaz commented Nov 27, 2019

After looking into it closer, I don't think that such validation belongs there because __call__() just adds headers into the given dict, it doesn't do anything with what already exists there. Such validation would need to be done in the calling code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.