Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept language wip #335

Merged
merged 99 commits into from Aug 28, 2017
Merged

Conversation

whiteroses
Copy link
Contributor

@whiteroses whiteroses commented Jul 25, 2017

Some remaining questions:

  1. How would you like me to do the imports in the tests? I can switch to module scope or fixtures, if either is your preference. (Module scope does seem to produce quite a lot of noise when one import is broken though, especially with tox and coverage, and I don't think fixtures are being used for imports in any WebOb tests at the moment.)
  2. Would you prefer I switch from test classes to test functions? I see benefits to using test classes (namespacing, so e.g. we wouldn't have to repeat the class name in every test function name, leaving more room to be more descriptive with the name of the test; classes can be folded in editors), but don't know the benefits of test functions over classes. Happy to learn, and switch if test functions are your preference.
  3. The fget in accept_language_property, like the existing fget in accept_property, creates a new instance every time. Is that necessary, and might it be good to cache the instance? Most people would probably not expect the header to be re-parsed every time they access request.accept_language.
  4. I did what I think you might want with the deprecation warnings, but please let me know if you'd like me to change or remove them. Also, calling warn() with DeprecationWarning and PendingDeprecationWarning is now ignored by default from Python version 2.7. I saw quite a few DeprecationWarnings in WebOb -- is that an issue?
  5. I think I can fix some of the issues in .quality() and .best_match(), so that they conform to the RFC in how they handle e.g. q=0 and *, but they would still be their own unique algorithms that are not specified or mentioned in the RFCs. (And I'm pretty sure best_match() does not implement the matching scheme used in Apache, at least not for Accept-Language?) So they are probably not methods you'd want to maintain in the long run -- is it worth fixing those issues when we would like people to move away from them so you can deprecate them?
  6. AcceptLanguageValidHeader, AcceptLanguageInvalidHeader, and AcceptLanguageNoHeader all inherit from AcceptLanguage, so they can all be identified as an AcceptLanguage header. Would it be worth making AcceptLanguage an abstract base class and specifying the properties and methods that can be expected in an AcceptLanguage (sub)class?
  7. What should I do about the modifier parameter in Accept.quality() and the default_quality parameter in NilAccept.quality()? (I explained the issues with the modifier parameter with comments in AcceptLanguageValidHeader.quality() and the default_quality parameter is not used at all in NilAccept.quality, and does not appear to match any corresponding parameter in Accept.quality() either.)

Please let me know if there is anything else you'd like me to change. Thanks!

This is a copy of the existing AcceptLanguage._match, with a name
reflecting that we are moving away from the existing criterion for a
match, and with additional documentation.
This is a copy of AcceptLanguage.__iter__, with tests (there were not
any before) and additional documentation.
This is mostly a copy of AcceptLanguage.__contains__.

Changes:
- It calls ._old_match() instead of ._match().
- It returns False instead of None when no matches found.
- Additional documentation and pending-deprecation warning.
- Tests changed to use @pytest.mark.parametrize.
This is mostly a copy of `Accept.best_match`, but with the call to
``self._match()`` updated to ``self._old_match()``, and with additional
documentation.
This is mostly a copy of `Accept.quality`, with call to ``self._match``
updated to ``self._old_match``, and documentation added.
__iter__ returns an iterator, not a list.
A class returns True for .__nonzero__/.__bool__ by default, but we write
a method so we can document it and have a consistent interface across
the three AcceptLanguage classes.
This is a copy of NilAccept.best_match, with added documentation and
pending deprecation warning.
There is no need to check offer. If we were to check, it would probably
be better as an assert, but there is no need, as the offers are
specified to be language tags in the documentation.

This method was copied from NilAccept.best_match. The original
corresponding method AcceptLanguage, AcceptLanguage.best_match, did not
call _check_offer either (not even in ._match).
Specifically, using position in header as a tiebreaker does not mean it
is an indicator of preference.
Copy link
Member

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed only the docstrings. This is really good stuff, especially considering the mental gymnastics needed to put it in writing. I've requested minor improvements. Thank you!

that match the language ranges in the header according to the Basic
Filtering matching scheme, in descending order of preference, together
with the qvalue of the range each tag matched.
tags in the `language_tags` argument and returns the ones that match
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original version should have used double-backticks around "code-like" things, else they appear italicized in the rendered docs. Would you please amend this PR like so for all instances of language_tags:

``language_tags``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I looked this up at the time and found in PEP 287:

Text enclosed in single backquotes is recognized as "interpreted text", whose interpretation is application-dependent. In the context of a Python docstring, the default interpretation of interpreted text is as Python identifiers. The text will be marked up with a hyperlink connected to the documentation for the identifier given.

There is an example following that paragraph that shows the use of single backquotes for method parameters. And earlier in the PEP, it mentions the use for double backquotes, but for "program I/O or code snippets":

Inline literals use double-backquotes to indicate program I/O or code snippets. No markup interpretation (including backslash-escape [] interpretation) is done within inline literals.

Does that matter, or should I go with double backquotes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about that. Let's leave it as single backticks.

As an aside, I think I might have screwed up the markup in Pyramid docstrings along the way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that I didn't think Sphinx did anything special with single backticks other than italics, but I just checked and, there looks to be something potentially useful in the note at the top of this page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single backticks alone render in HTML markup as <cite>:

<cite>language_tags</cite>

As an aside, I wish Sphinx had used <code> or <pre> instead, because <cite> is semantically incorrect. ¯\_(ツ)_/¯

By default most web browsers use italics for styling <cite>. We could style the <cite> tags however we want, say as preformatted or monospace code, but italics are fine for now.

The page you linked uses a :role: preceding the single backticked item. You could try the :any: role preceding language_tags and see if magic happens or if the HTML markup changes to something not italicized. Else there might be a specific existing role under the Python domain in Sphinx. Finally we could define a custom role for parameters, if none of the existing ones are satisfactory, but I think that would be too much effort for very little or no benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of the single backticks in Sphinx and the default_role is that we wouldn't need to specify a role — there's an example here, from one of the links in the note. So if they are cross-references, <cite> kind of makes sense? But I don't think that works for parameters like language_tags, so it doesn't help us there.

Italics kinda makes sense to me for parameters, although <cite> doesn't without cross-references. If you'd prefer them styled differently, maybe we can look into it after GSoC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're pretty much in agreement. If doing this:

:any:`language_tags`

...does not generate a useful link or reference, then there's no point in pursuing it.

As far as styles go, I've opened up a new issue to resolve this part of the discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean.

I tried

:any:`language_tags`

and got an InvocationError: 'any' reference target not found: language_tags. Skimming through the docs, it doesn't look like there's currently a way to cross-reference a parameter. I'll come back to your issue after the work on the headers is done and merged, to see if I can help (if it's not resolved by then). Thanks.

the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same
order as the order in the header of the ranges they matched. If the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less klunky: "order as that in the header"

tags in the `language_tags` argument and returns the ones that match
the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's break this into two sentences to be consistent with the next sentence. IOW, "...qvalue. If..."

order as the order in the header of the ranges they matched. If the
matched range is the same for two or more tags (i.e. their matched
ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten: "returned in the same order as that in the"

ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the
`language_tags` argument. (If `language_tags` is unordered, e.g. if it
is a set or a dict, then that order may not be reliable.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove parentheses, as this is really not a parenthetical statement, but a sentence of distinct importance.

the `language_tags` argument is used as tiebreaker. (If `language_tags`
is unordered, e.g. if it is a set or a dict, then that order may not be
reliable.)
and there is one or more ``*`` language range in the header, then:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip colon "then"

reliable.)
and there is one or more ``*`` language range in the header, then:
if any of the ``*`` language ranges have ``q=0``, the language tag
is filtered out. Otherwise, the language tag is considered a match.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is filtered out, else the language tag..."

@whiteroses
Copy link
Contributor Author

@stevepiercy Other than the one on backquotes that I had a question on, I've made the changes you requested. Thanks!

Copy link
Member

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your diligence and setting me right.

Re-use OWS_re, and since we are assembling so many regexes by adding
strings, switch all regexes to use addition instead of triple quotes, so
that we don't have to remember to use re.VERBOSE when compiling.
Copy link
Member

@digitalresistor digitalresistor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, I've got a couple of nitpicky issues and a question or two! I am looking forward to merging this massive improvement to the Accept Language handling!

def _match(self, mask, item):

@classmethod
def python_value_to_header_str(cls, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an underscore to the name. This way it is markedly non-public and makes it simpler to see so when reading the code.

return self._parsed

@classmethod
def parse(cls, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AcceptLanguageValidHeader would basically move wholesale into AcceptLanguage if I understand @mmerickel's comments correctly, and then the sub-classes that are NoHeader or InvalidHeader would have .parse() and others raise ValueError or some other exception?

Do I understand this right @mmerickel?

'The behavior of AcceptLanguageValidHeader.best_match is '
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make this a DeprecationWarning.

not already matched by other ranges within the header are
unacceptable.
"""
assert not (default_tag is None and default is None), \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using assert, use an if statement. assert statements may be turned off globally, and while I tend to believe that if you break it you've bought it, I'd prefer not to have asserts in anything put testing code.

Unless it is a condition that is not likely to ever happen, and is just a sanity check.

# whether it has been specified as not acceptable with a q=0 range in
# the header) or not (in which case we can just return the value).

assert default_range != '*', 'default_range cannot be *.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, no assert please.

except TypeError: # default is not a callable
return default

def quality(self, offer, modifier=1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return the quality of the given offer. Returns None if there is no match (not 0).

That's the docs on .quality() at the moment. Feel free to remove modifier since it is not documented what it does, or why it is set to 1 by default.

Set it to 1.

'The behavior of AcceptLanguageValidHeader.quality is'
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a DeprecationWarning. I really don't think it is a good idea to keep this around for very long.


def fget(request):
"""Get an object representing the header in the request."""
return create_accept_language_header(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we return a brand new AcceptLanguage* object each time, yet when we assign a new AcceptLanguage using fset we set it directly to the AcceptLanguage object.

Would it make sense here to check if it is a string, and if so then run the create_accept_language_header function, followed by setting the environ key to the newly created object? This way if a user calls request.accept_language twice they only have the overhead of creating the object the first time? If they modify request.environ directly it would no longer be an AcceptLanguage object and would correctly re-create it.


Another thing I noticed is that with your new __add__ functions since they don't modify the existing one but create a new one something like this is not possible, or am I mistaken:

request.accept_langauge += 'nl-NL;q=0.7'

Since that would return a new object and request.accept_language would be untouched. I can live with that... just not what I expect at the moment based upon the other properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't assign a new AcceptLanguage when we fset though: fset only changes the header value in the environ, so that the next time we fget, a brand new AcceptLanguage* object is created using that new header value in the environ. This is the same as how it current works in accept_property.

I can't quite understand the second paragraph? In terms of overhead, we talked a while back about whether we should cache the object, as a new object was being created on every attribute read, and you said that it wasn't a concern at that point — is that related to what you're asking here?

With the __add__ functions,

request.accept_language += 'nl-NL;q=0.7'

is the same as

request.accept_language = request.accept_language + 'nl-NL;q=0.7'

so it works :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. No worries, not sure what I was thinking. We are good here!

@digitalresistor
Copy link
Member

With these changes, I hope that we have satisfactorily fixed the issue brought up by the original poster in #256

@whiteroses
Copy link
Contributor Author

@bertjwregeer: I've made the changes you requested — thanks for reviewing! With #256, .lookup() covers that specific case; I can make a comment on the issue explaining this after the PR is merged. I'd also like to update some of the information in these issues, as my understanding of them has changed, and the information there may be useful to others (I was already finding these issues turning up near the top in search results when looking up various aspects of Accept* headers implementation!) We can still comment on closed issues, right?

@digitalresistor
Copy link
Member

Yes, you may definitely comment on closed issues!

Were you able to make the changes that @mmerickel suggested regarding the duck typing? I think that is important before merging this in!

@whiteroses
Copy link
Contributor Author

@bertjwregeer Have made the changes, please see here.

Copy link
Member

@digitalresistor digitalresistor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless @mmerickel has any other suggestions or changes he would like to see, I am ready to merge this!

Thank you for your hard work!

.. autoclass:: AcceptLanguageInvalidHeader
:members: header_value, parsed, __init__, __add__, __contains__, __iter__,
__radd__, __str__, parse, basic_filtering, best_match, lookup,
quality
Copy link
Member

@mmerickel mmerickel Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the subclasses from the public api now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can if we combine the docstrings for all the subclasses for each method and put them all into AcceptLanguage. This would imo make some already complicated docstrings even more complicated, adding another layer to the branching. And as I mentioned before, the instances would still i.d. as the subclasses (I guess we could change the __repr__ to indicate that it is a subclass of AcceptLanguage, so people would know to look there?) If you want me to go ahead and make the change, I will — it's not trivial, and will take some time and thought on how to combine the docstrings from the three subclasses for each method. (I would also have to do the same for the other three headers.)

Would it be possible to merge this so I can push changes for this and the other three headers before the deadline tomorrow to a more appropriately-named PR, or should I push the changes for all four headers to this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine if we want to merge this now. I'm happy with that.

Part of the idea of making the subclasses "hidden" is not that people don't know about them but that they don't care. I'm more worried about the docs the user sees than the actual objects they see in the repl. With that in mind we'd want to make the docstrings focus on valid headers with just a sentence or two in each that explains what happens with invalid/missing data. I can't imagine that will make the docstrings that much more complex or harder to parse but I could be wrong!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worrying whether it would be clear enough that people should look at the docs for AcceptLanguage if the objects identify as the subclasses, but I understand what you are looking for, and will try to combine the docstrings and document one api under the base class after I've finished wrapping up the four headers and all the necessary submissions for gsoc. Could you please merge this PR if everything is ok, so I can open a new PR and push the other changes? (Sorry about the last-minute commits, they are minor changes to fix small mistakes.) Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @whiteroses I'm happy with all of this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, any future work can be done in a different PR.

@digitalresistor digitalresistor merged commit 897da0e into Pylons:master Aug 28, 2017
@digitalresistor
Copy link
Member

Thank you very much @whiteroses, any further changes can be made on a new PR against master!

@whiteroses
Copy link
Contributor Author

Thanks @bertjwregeer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants