Accept language wip #335
This is a copy of the existing AcceptLanguage._match, with a name reflecting that we are moving away from the existing criterion for a match, and with additional documentation.
This is a copy of AcceptLanguage.__iter__, with tests (there were not any before) and additional documentation.
This is mostly a copy of AcceptLanguage.__contains__. Changes:
- It calls ._old_match() instead of ._match().
- It returns False instead of None when no matches found.
- Additional documentation and pending-deprecation warning.
- Tests changed to use @pytest.mark.parametrize.
This is mostly a copy of `Accept.best_match`, but with the call to ``self._match()`` updated to ``self._old_match()``, and with additional documentation.
This is mostly a copy of `Accept.quality`, with call to ``self._match`` updated to ``self._old_match``, and documentation added.
__iter__ returns an iterator, not a list.
A class returns True for .__nonzero__/.__bool__ by default, but we write a method so we can document it and have a consistent interface across the three AcceptLanguage classes.
This is a copy of NilAccept.best_match, with added documentation and pending deprecation warning.
There is no need to check offer. If we were to check, it would probably be better as an assert, but there is no need, as the offers are specified to be language tags in the documentation. This method was copied from NilAccept.best_match. The original corresponding method AcceptLanguage, AcceptLanguage.best_match, did not call _check_offer either (not even in ._match).
Specifically, using position in header as a tiebreaker does not mean it is an indicator of preference.
I reviewed only the docstrings. This is really good stuff, especially considering the mental gymnastics needed to put it in writing. I've requested minor improvements. Thank you!
src/webob/acceptparse.py
Outdated
that match the language ranges in the header according to the Basic
Filtering matching scheme, in descending order of preference, together
with the qvalue of the range each tag matched.
tags in the `language_tags` argument and returns the ones that match
The original version should have used double-backticks around "code-like" things, else they appear italicized in the rendered docs. Would you please amend this PR like so for all instances of language_tags:
``language_tags``
I think I looked this up at the time and found in PEP 287:
Text enclosed in single backquotes is recognized as "interpreted text", whose interpretation is application-dependent. In the context of a Python docstring, the default interpretation of interpreted text is as Python identifiers. The text will be marked up with a hyperlink connected to the documentation for the identifier given.
There is an example following that paragraph that shows the use of single backquotes for method parameters. And earlier in the PEP, it mentions the use for double backquotes, but for "program I/O or code snippets":
Inline literals use double-backquotes to indicate program I/O or code snippets. No markup interpretation (including backslash-escape [] interpretation) is done within inline literals.
Does that matter, or should I go with double backquotes?
I didn't know about that. Let's leave it as single backticks.
As an aside, I think I might have screwed up the markup in Pyramid docstrings along the way.
I was going to say that I didn't think Sphinx did anything special with single backticks other than italics, but I just checked and, there looks to be something potentially useful in the note at the top of this page.
Single backticks alone render in HTML markup as `<cite>`:
<cite>language_tags</cite>
As an aside, I wish Sphinx had used `<code>` or `<pre>` instead, because `<cite>` is semantically incorrect. ¯\_(ツ)_/¯
By default most web browsers use italics for styling `<cite>`. We could style the `<cite>` tags however we want, say as preformatted or monospace code, but italics are fine for now.
The page you linked uses a `:role:` preceding the single backticked item. You could try the `:any:` role preceding `language_tags` and see if magic happens or if the HTML markup changes to something not italicized. Else there might be a specific existing role under the Python domain in Sphinx. Finally we could define a custom role for parameters, if none of the existing ones are satisfactory, but I think that would be too much effort for very little or no benefit.
I think the idea of the single backticks in Sphinx and the `default_role` is that we wouldn't need to specify a role; there's an example here, from one of the links in the note. So if they are cross-references, `<cite>` kind of makes sense? But I don't think that works for parameters like `language_tags`, so it doesn't help us there.
Italics kinda makes sense to me for parameters, although `<cite>` doesn't without cross-references. If you'd prefer them styled differently, maybe we can look into it after GSoC?
I think we're pretty much in agreement. If doing this:
:any:`language_tags`
...does not generate a useful link or reference, then there's no point in pursuing it.
As far as styles go, I've opened up a new issue to resolve this part of the discussion.
Ah, I see what you mean.
I tried
:any:`language_tags`
and got an InvocationError: `'any' reference target not found: language_tags`. Skimming through the docs, it doesn't look like there's currently a way to cross-reference a parameter. I'll come back to your issue after the work on the headers is done and merged, to see if I can help (if it's not resolved by then). Thanks.
src/webob/acceptparse.py
Outdated
the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same
order as the order in the header of the ranges they matched. If the
This is less klunky: "order as that in the header"
src/webob/acceptparse.py
Outdated
tags in the `language_tags` argument and returns the ones that match
the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same
Let's break this into two sentences to be consistent with the next sentence. IOW, "...qvalue. If..."
src/webob/acceptparse.py
Outdated
order as the order in the header of the ranges they matched. If the
matched range is the same for two or more tags (i.e. their matched
ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the
Shorten: "returned in the same order as that in the"
src/webob/acceptparse.py
Outdated
ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the
`language_tags` argument. (If `language_tags` is unordered, e.g. if it
is a set or a dict, then that order may not be reliable.)
Remove parentheses, as this is really not a parenthetical statement, but a sentence of distinct importance.
src/webob/acceptparse.py
Outdated
the `language_tags` argument is used as tiebreaker. (If `language_tags`
is unordered, e.g. if it is a set or a dict, then that order may not be
reliable.)
and there is one or more ``*`` language range in the header, then: |
strip colon "then"
src/webob/acceptparse.py
Outdated
reliable.)
and there is one or more ``*`` language range in the header, then:
if any of the ``*`` language ranges have ``q=0``, the language tag
is filtered out. Otherwise, the language tag is considered a match.
"is filtered out, else the language tag..."
@stevepiercy Other than the one on backquotes that I had a question on, I've made the changes you requested. Thanks!
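To make the ordering and `*` rules in the docstring excerpts above concrete, here is a minimal sketch. It is not WebOb's implementation: the function name `basic_filtering_sketch`, the pre-parsed `ranges` input, and the choice to let the first matching specific range (or the first `*` range) decide a tag's qvalue are all assumptions made for illustration.

```python
def basic_filtering_sketch(ranges, language_tags):
    """Return [(tag, qvalue), ...] for the tags that match, best first.

    `ranges` is a list of (language_range, qvalue) pairs in header order.
    Hypothetical helper, simplified from the docstring under review.
    """
    def matches(language_range, tag):
        # A range matches a tag if they are equal or the range is a
        # prefix of the tag followed by "-" (case-insensitive).
        language_range, tag = language_range.lower(), tag.lower()
        return tag == language_range or tag.startswith(language_range + '-')

    star = [(pos, q) for pos, (r, q) in enumerate(ranges) if r == '*']
    results = []  # (tag, qvalue, position of the matched range)
    for tag in language_tags:
        # Assumption: the first specific range that matches decides the tag.
        hit = next(
            ((pos, q) for pos, (r, q) in enumerate(ranges)
             if r != '*' and matches(r, tag)),
            None,
        )
        if hit is None and star:
            # Only `*` ranges apply: any q=0 among them filters the tag out.
            if any(q == 0 for _, q in star):
                continue
            hit = star[0]
        if hit is None or hit[1] == 0:
            continue
        results.append((tag, hit[1], hit[0]))

    # Descending qvalue, then header position of the matched range. The sort
    # is stable, so remaining ties keep the order of `language_tags`.
    results.sort(key=lambda item: (-item[1], item[2]))
    return [(tag, q) for tag, q, _ in results]


header = [('de', 0.8), ('en', 0.8), ('fr', 0.5), ('*', 0.1)]
print(basic_filtering_sketch(header, ['fr', 'en-GB', 'en-US', 'ja', 'de']))
```

Because the final tiebreaker relies on the iteration order of `language_tags`, passing a set makes that last step unpredictable, which is the caveat the docstring spells out.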
Thank you for your diligence and setting me right.
Re-use OWS_re, and since we are assembling so many regexes by adding strings, switch all regexes to use addition instead of triple quotes, so that we don't have to remember to use re.VERBOSE when compiling.
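As a rough illustration of the approach in this commit message, the sketch below assembles a pattern purely by string addition. Only the name `OWS_re` comes from the message; its definition here and the fragments `lang_range_re`, `qvalue_re`, `weight_re`, and `item_re` are simplified assumptions, not WebOb's actual patterns.

```python
import re

# Assumed definition of optional whitespace: any run of spaces or tabs.
OWS_re = '[ \t]*'

# Hypothetical, simplified fragments (not WebOb's actual patterns).
lang_range_re = r'\*|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*'
qvalue_re = r'(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)'

# Plain concatenation: each fragment is wrapped in a group where it is used,
# so an alternation inside a fragment cannot leak into its neighbours.
weight_re = OWS_re + ';' + OWS_re + '[qQ]=(' + qvalue_re + ')'
item_re = '(' + lang_range_re + ')' + '(?:' + weight_re + ')?'

# No re.VERBOSE flag is needed, because no pattern relies on layout whitespace.
item_compiled_re = re.compile('^' + item_re + '$')

match = item_compiled_re.match('en-GB ; q=0.8')
print(match.groups())  # ('en-GB', '0.8')
```

Mixing the two styles is where mistakes tend to creep in: a fragment written for `re.VERBOSE` silently changes meaning if the flag is forgotten at compile time, whereas concatenated fragments behave the same wherever they are reused.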
This is looking great, I've got a couple of nitpicky issues and a question or two! I am looking forward to merging this massive improvement to the Accept Language handling!
src/webob/acceptparse.py
Outdated
def _match(self, mask, item):

@classmethod
def python_value_to_header_str(cls, value):
Add an underscore to the name. This way it is markedly non-public and makes it simpler to see so when reading the code.
src/webob/acceptparse.py
Outdated
return self._parsed

@classmethod
def parse(cls, value):
`AcceptLanguageValidHeader` would basically move wholesale into `AcceptLanguage` if I understand @mmerickel's comments correctly, and then the sub-classes that are `NoHeader` or `InvalidHeader` would have `.parse()` and others raise `ValueError` or some other exception?
Do I understand this right @mmerickel?
src/webob/acceptparse.py
Outdated
'The behavior of AcceptLanguageValidHeader.best_match is '
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,
Just make this a `DeprecationWarning`.
src/webob/acceptparse.py
Outdated
not already matched by other ranges within the header are
unacceptable.
"""
assert not (default_tag is None and default is None), \
Instead of using assert, use an if statement. assert statements may be turned off globally, and while I tend to believe that if you break it you've bought it, I'd prefer not to have asserts in anything but testing code.
Unless it is a condition that is not likely to ever happen, and is just a sanity check.
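For illustration, a minimal sketch of the requested change: an explicit check that still runs when Python is started with `-O`, which strips `assert` statements. The function name `pick_default` and the choice of `TypeError` are assumptions, not what the PR ended up using.

```python
def pick_default(default_tag=None, default=None):
    """Hypothetical helper showing an argument check without `assert`."""
    # An `if` plus `raise` is never optimized away, unlike `assert`.
    if default_tag is None and default is None:
        raise TypeError(
            '`default_tag` and `default` arguments cannot both be None.'
        )
    return default_tag if default_tag is not None else default


print(pick_default(default_tag='en'))  # en
```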
src/webob/acceptparse.py
Outdated
# whether it has been specified as not acceptable with a q=0 range in
# the header) or not (in which case we can just return the value).

assert default_range != '*', 'default_range cannot be *.'
Same thing here, no assert please.
src/webob/acceptparse.py
Outdated
except TypeError: # default is not a callable
return default

def quality(self, offer, modifier=1):
Return the quality of the given offer. Returns None if there is no match (not 0).
That's the docs on `.quality()` at the moment. Feel free to remove `modifier` since it is not documented what it does, or why it is set to 1 by default.
Set it to 1.
src/webob/acceptparse.py
Outdated
'The behavior of AcceptLanguageValidHeader.quality is'
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,
Make this a `DeprecationWarning`. I really don't think it is a good idea to keep this around for very long.
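A small sketch of what emitting and observing a `DeprecationWarning` can look like. The function `legacy_best_match` and its message are hypothetical stand-ins. The `simplefilter('always')` call is there because the default warning filters often hide `DeprecationWarning`, so a test or caller has to opt in to see it.

```python
import warnings


def legacy_best_match(offers):
    """Hypothetical stand-in for a method kept only for compatibility."""
    warnings.warn(
        'legacy_best_match is kept for backward compatibility and may be '
        'removed in a future release.',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return offers[0] if offers else None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # do not let default filters hide it
    result = legacy_best_match(['en', 'de'])

print(result, [w.category.__name__ for w in caught])
```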
def fget(request):
"""Get an object representing the header in the request."""
return create_accept_language_header(
Here we return a brand new `AcceptLanguage*` object each time, yet when we assign a new `AcceptLanguage` using fset we set it directly to the AcceptLanguage object.
Would it make sense here to check if it is a string, and if so then run the `create_accept_language_header` function, followed by setting the environ key to the newly created object? This way if a user calls `request.accept_language` twice they only have the overhead of creating the object the first time? If they modify `request.environ` directly it would no longer be an `AcceptLanguage` object and would correctly re-create it.
Another thing I noticed is that with your new `__add__` functions, since they don't modify the existing one but create a new one, something like this is not possible, or am I mistaken:
request.accept_language += 'nl-NL;q=0.7'
Since that would return a new object and `request.accept_language` would be untouched. I can live with that... just not what I expect at the moment based upon the other properties.
It doesn't assign a new `AcceptLanguage` when we fset though: fset only changes the header value in the environ, so that the next time we fget, a brand new `AcceptLanguage*` object is created using that new header value in the environ. This is the same as how it currently works in `accept_property`.
I can't quite understand the second paragraph? In terms of overhead, we talked a while back about whether we should cache the object, as a new object was being created on every attribute read, and you said that it wasn't a concern at that point. Is that related to what you're asking here?
With the `__add__` functions,
request.accept_language += 'nl-NL;q=0.7'
is the same as
request.accept_language = request.accept_language + 'nl-NL;q=0.7'
so it works :)
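The exchange above can be sketched with stand-in classes. `HeaderValue` and `Request` below are hypothetical and much simpler than WebOb's objects; they only show that `+=` on a property reads through the getter, calls `__add__`, and writes the result back through the setter, even though each read builds a new object.

```python
class HeaderValue:
    """Hypothetical immutable stand-in for the header object."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Return a new object instead of modifying this one.
        other = getattr(other, 'value', other)
        return HeaderValue(self.value + ', ' + other)

    def __str__(self):
        return self.value


class Request:
    """Hypothetical request whose property wraps an environ entry."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def accept_language(self):
        # fget: a brand new object on every read.
        return HeaderValue(self.environ['HTTP_ACCEPT_LANGUAGE'])

    @accept_language.setter
    def accept_language(self, value):
        # fset: only the header string in the environ changes.
        self.environ['HTTP_ACCEPT_LANGUAGE'] = str(value)


request = Request({'HTTP_ACCEPT_LANGUAGE': 'en;q=0.9'})
request.accept_language += 'nl-NL;q=0.7'
print(request.environ['HTTP_ACCEPT_LANGUAGE'])  # en;q=0.9, nl-NL;q=0.7
```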
Excellent. No worries, not sure what I was thinking. We are good here!
With these changes, I hope that we have satisfactorily fixed the issue brought up by the original poster in #256
@bertjwregeer: I've made the changes you requested; thanks for reviewing! With #256,
Yes, you may definitely comment on closed issues! Were you able to make the changes that @mmerickel suggested regarding the duck typing? I think that is important before merging this in!
@bertjwregeer Have made the changes, please see here.
Unless @mmerickel has any other suggestions or changes he would like to see, I am ready to merge this!
Thank you for your hard work!
.. autoclass:: AcceptLanguageInvalidHeader
:members: header_value, parsed, __init__, __add__, __contains__, __iter__,
__radd__, __str__, parse, basic_filtering, best_match, lookup,
quality
Can we remove the subclasses from the public api now?
We can if we combine the docstrings for all the subclasses for each method and put them all into `AcceptLanguage`. This would imo make some already complicated docstrings even more complicated, adding another layer to the branching. And as I mentioned before, the instances would still i.d. as the subclasses (I guess we could change the `__repr__` to indicate that it is a subclass of `AcceptLanguage`, so people would know to look there?) If you want me to go ahead and make the change, I will; it's not trivial, and will take some time and thought on how to combine the docstrings from the three subclasses for each method. (I would also have to do the same for the other three headers.)
Would it be possible to merge this so I can push changes for this and the other three headers before the deadline tomorrow to a more appropriately-named PR, or should I push the changes for all four headers to this one?
It's fine if we want to merge this now. I'm happy with that.
Part of the idea of making the subclasses "hidden" is not that people don't know about them but that they don't care. I'm more worried about the docs the user sees than the actual objects they see in the repl. With that in mind we'd want to make the docstrings focus on valid headers with just a sentence or two in each that explains what happens with invalid/missing data. I can't imagine that will make the docstrings that much more complex or harder to parse but I could be wrong!
I was worrying whether it would be clear enough that people should look at the docs for `AcceptLanguage` if the objects identify as the subclasses, but I understand what you are looking for, and will try to combine the docstrings and document one api under the base class after I've finished wrapping up the four headers and all the necessary submissions for gsoc. Could you please merge this PR if everything is ok, so I can open a new PR and push the other changes? (Sorry about the last-minute commits, they are minor changes to fix small mistakes.) Thanks!
Thanks again @whiteroses I'm happy with all of this.
Alright, any future work can be done in a different PR.
Thank you very much @whiteroses, any further changes can be made on a new PR against master!
Thanks @bertjwregeer!
Some remaining questions:
- `fget` in `accept_language_property`, like the existing `fget` in `accept_property`, creates a new instance every time. Is that necessary, and might it be good to cache the instance? Most people would probably not expect the header to be re-parsed every time they access `request.accept_language`.
- `warn()` with `DeprecationWarning` and `PendingDeprecationWarning` is now ignored by default from Python version 2.7. I saw quite a few `DeprecationWarning`s in WebOb -- is that an issue?
- [...] `.quality()` and `.best_match()`, so that they conform to the RFC in how they handle e.g. `q=0` and `*`, but they would still be their own unique algorithms that are not specified or mentioned in the RFCs. (And I'm pretty sure `best_match()` does not implement the matching scheme used in Apache, at least not for `Accept-Language`?) So they are probably not methods you'd want to maintain in the long run -- is it worth fixing those issues when we would like people to move away from them so you can deprecate them?
- `AcceptLanguageValidHeader`, `AcceptLanguageInvalidHeader`, and `AcceptLanguageNoHeader` all inherit from `AcceptLanguage`, so they can all be identified as an `AcceptLanguage` header. Would it be worth making `AcceptLanguage` an abstract base class and specifying the properties and methods that can be expected in an `AcceptLanguage` (sub)class?
- [...] `modifier` parameter in `Accept.quality()` and the `default_quality` parameter in `NilAccept.quality()`? (I explained the issues with the `modifier` parameter with comments in `AcceptLanguageValidHeader.quality()` and the `default_quality` parameter is not used at all in `NilAccept.quality`, and does not appear to match any corresponding parameter in `Accept.quality()` either.)

Please let me know if there is anything else you'd like me to change. Thanks!