New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept language wip #335

Merged
merged 99 commits into from Aug 28, 2017

Conversation

Projects
None yet
4 participants
@whiteroses
Copy link
Contributor

whiteroses commented Jul 25, 2017

Some remaining questions:

  1. How would you like me to do the imports in the tests? I can switch to module scope or fixtures, if either is your preference. (Module scope does seem to produce quite a lot of noise when one import is broken though, especially with tox and coverage, and I don't think fixtures are being used for imports in any WebOb tests at the moment.)
  2. Would you prefer I switch from test classes to test functions? I see benefits to using test classes (namespacing, so e.g. we wouldn't have to repeat the class name in every test function name, leaving more room to be more descriptive with the name of the test; classes can be folded in editors), but don't know the benefits of test functions over classes. Happy to learn, and switch if test functions are your preference.
  3. The fget in accept_language_property, like the existing fget in accept_property, creates a new instance every time. Is that necessary, and might it be good to cache the instance? Most people would probably not expect the header to be re-parsed every time they access request.accept_language.
  4. I did what I think you might want with the deprecation warnings, but please let me know if you'd like me to change or remove them. Also, calling warn() with DeprecationWarning and PendingDeprecationWarning is now ignored by default from Python version 2.7. I saw quite a few DeprecationWarnings in WebOb -- is that an issue?
  5. I think I can fix some of the issues in .quality() and .best_match(), so that they conform to the RFC in how they handle e.g. q=0 and *, but they would still be their own unique algorithms that are not specified or mentioned in the RFCs. (And I'm pretty sure best_match() does not implement the matching scheme used in Apache, at least not for Accept-Language?) So they are probably not methods you'd want to maintain in the long run -- is it worth fixing those issues when we would like people to move away from them so you can deprecate them?
  6. AcceptLanguageValidHeader, AcceptLanguageInvalidHeader, and AcceptLanguageNoHeader all inherit from AcceptLanguage, so they can all be identified as an AcceptLanguage header. Would it be worth making AcceptLanguage an abstract base class and specifying the properties and methods that can be expected in an AcceptLanguage (sub)class?
  7. What should I do about the modifier parameter in Accept.quality() and the default_quality parameter in NilAccept.quality()? (I explained the issues with the modifier parameter with comments in AcceptLanguageValidHeader.quality() and the default_quality parameter is not used at all in NilAccept.quality, and does not appear to match any corresponding parameter in Accept.quality() either.)

Please let me know if there is anything else you'd like me to change. Thanks!

@stevepiercy

This comment has been minimized.

Copy link

stevepiercy commented on src/webob/acceptparse.py in b62746a Jul 7, 2017

I did not know that there was :rfc: reST markup that generates links. That's cool! I learned something!

@stevepiercy

This comment has been minimized.

Copy link

stevepiercy commented on tests/test_acceptparse.py in b62746a Jul 7, 2017

Are the two consecutive underscores in this function name a typographical error? See next one as well.

This comment has been minimized.

Copy link
Owner

whiteroses replied Jul 7, 2017

It's just so we can tell it's testing .parse(), i.e. test_{method name}__{what we're testing}.

whiteroses added some commits Jul 7, 2017

Add AcceptLanguage._old_match.
This is a copy of the existing AcceptLanguage._match, with a name
reflecting that we are moving away from the existing criterion for a
match, and with additional documentation.
Add AcceptLanguageValidHeader.__iter__.
This is a copy of AcceptLanguage.__iter__, with tests (there were not
any before) and additional documentation.
Add AcceptLanguageValidHeader.__contains__.
This is mostly a copy of AcceptLanguage.__contains__.

Changes:
- It calls ._old_match() instead of ._match().
- It returns False instead of None when no matches found.
- Additional documentation and pending-deprecation warning.
- Tests changed to use @pytest.mark.parametrize.
Add AcceptLanguageValidHeader.best_match.
This is mostly a copy of `Accept.best_match`, but with the call to
``self._match()`` updated to ``self._old_match()``, and with additional
documentation.
Add AcceptLanguageValidHeader.quality().
This is mostly a copy of `Accept.quality`, with call to ``self._match``
updated to ``self._old_match``, and documentation added.
Fix docstring.
__iter__ returns an iterator, not a list.
Add AcceptLanguageValidHeader.__nonzero__/.__bool__ and test.
A class returns True for .__nonzero__/.__bool__ by default, but we write
a method so we can document it and have a consistent interface across
the three AcceptLanguage classes.
@stevepiercy
Copy link
Member

stevepiercy left a comment

I reviewed only the docstrings. This is really good stuff, especially considering the mental gymnastics needed to put it in writing. I've requested minor improvements. Thank you!

that match the language ranges in the header according to the Basic
Filtering matching scheme, in descending order of preference, together
with the qvalue of the range each tag matched.
tags in the `language_tags` argument and returns the ones that match

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

The original version should have used double-backticks around "code-like" things, else they appear italicized in the rendered docs. Would you please amend this PR like so for all instances of language_tags:

``language_tags``

This comment has been minimized.

@whiteroses

whiteroses Aug 22, 2017

Contributor

I think I looked this up at the time and found in PEP 287:

Text enclosed in single backquotes is recognized as "interpreted text", whose interpretation is application-dependent. In the context of a Python docstring, the default interpretation of interpreted text is as Python identifiers. The text will be marked up with a hyperlink connected to the documentation for the identifier given.

There is an example following that paragraph that shows the use of single backquotes for method parameters. And earlier in the PEP, it mentions the use for double backquotes, but for "program I/O or code snippets":

Inline literals use double-backquotes to indicate program I/O or code snippets. No markup interpretation (including backslash-escape [] interpretation) is done within inline literals.

Does that matter, or should I go with double backquotes?

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

I didn't know about that. Let's leave it as single backticks.

As an aside, I think I might have screwed up the markup in Pyramid docstrings along the way.

This comment has been minimized.

@whiteroses

whiteroses Aug 22, 2017

Contributor

I was going to say that I didn't think Sphinx did anything special with single backticks other than italics, but I just checked and, there looks to be something potentially useful in the note at the top of this page.

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

Single backticks alone render in HTML markup as <cite>:

<cite>language_tags</cite>

As an aside, I wish Sphinx had used <code> or <pre> instead, because <cite> is semantically incorrect. ¯\_(ツ)_/¯

By default most web browsers use italics for styling <cite>. We could style the <cite> tags however we want, say as preformatted or monospace code, but italics are fine for now.

The page you linked uses a :role: preceding the single backticked item. You could try the :any: role preceding language_tags and see if magic happens or if the HTML markup changes to something not italicized. Else there might be a specific existing role under the Python domain in Sphinx. Finally we could define a custom role for parameters, if none of the existing ones are satisfactory, but I think that would be too much effort for very little or no benefit.

This comment has been minimized.

@whiteroses

whiteroses Aug 24, 2017

Contributor

I think the idea of the single backticks in Sphinx and the default_role is that we wouldn't need to specify a role — there's an example here, from one of the links in the note. So if they are cross-references, <cite> kind of makes sense? But I don't think that works for parameters like language_tags, so it doesn't help us there.

Italics kinda makes sense to me for parameters, although <cite> doesn't without cross-references. If you'd prefer them styled differently, maybe we can look into it after GSoC?

This comment has been minimized.

@stevepiercy

stevepiercy Aug 24, 2017

Member

I think we're pretty much in agreement. If doing this:

:any:`language_tags`

...does not generate a useful link or reference, then there's no point in pursuing it.

As far as styles go, I've opened up a new issue to resolve this part of the discussion.

This comment has been minimized.

@whiteroses

whiteroses Aug 24, 2017

Contributor

Ah, I see what you mean.

I tried

:any:`language_tags`

and got an InvocationError: 'any' reference target not found: language_tags. Skimming through the docs, it doesn't look like there's currently a way to cross-reference a parameter. I'll come back to your issue after the work on the headers is done and merged, to see if I can help (if it's not resolved by then). Thanks.

the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same
order as the order in the header of the ranges they matched. If the

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

This is less klunky: "order as that in the header"

tags in the `language_tags` argument and returns the ones that match
the header according to the matching scheme. The returned list is a
list of (language tag, qvalue) tuples, in descending order of qvalue;
if two or more tags have the same qvalue, they are returned in the same

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

Let's break this into two sentences to be consistent with the next sentence. IOW, "...qvalue. If..."

order as the order in the header of the ranges they matched. If the
matched range is the same for two or more tags (i.e. their matched
ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

Shorten: "returned in the same order as that in the"

ranges have the same qvalue and the same position in the header), then
they are returned in the same order as their order in the
`language_tags` argument. (If `language_tags` is unordered, e.g. if it
is a set or a dict, then that order may not be reliable.)

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

Remove parentheses, as this is really not a parenthetical statement, but a sentence of distinct importance.

the `language_tags` argument is used as tiebreaker. (If `language_tags`
is unordered, e.g. if it is a set or a dict, then that order may not be
reliable.)
and there is one or more ``*`` language range in the header, then:

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

strip colon "then"

reliable.)
and there is one or more ``*`` language range in the header, then:
if any of the ``*`` language ranges have ``q=0``, the language tag
is filtered out. Otherwise, the language tag is considered a match.

This comment has been minimized.

@stevepiercy

stevepiercy Aug 22, 2017

Member

"is filtered out, else the language tag..."

@whiteroses

This comment has been minimized.

Copy link
Contributor

whiteroses commented Aug 22, 2017

@stevepiercy Other than the one on backquotes that I had a question on, I've made the changes you requested. Thanks!

@stevepiercy
Copy link
Member

stevepiercy left a comment

Thank you for your diligence and setting me right.

Simplify regexes.
Re-use OWS_re, and since we are assembling so many regexes by adding
strings, switch all regexes to use addition instead of triple quotes, so
that we don't have to remember to use re.VERBOSE when compiling.
@bertjwregeer
Copy link
Member

bertjwregeer left a comment

This is looking great, I've got a couple of nitpicky issues and a question or two! I am looking forward to merging this massive improvement to the Accept Language handling!

def _match(self, mask, item):

@classmethod
def python_value_to_header_str(cls, value):

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Add an underscore to the name. This way it is markedly non-public and makes it simpler to see so when reading the code.

return self._parsed

@classmethod
def parse(cls, value):

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

AcceptLanguageValidHeader would basically move wholesale into AcceptLanguage if I understand @mmerickel's comments correctly, and then the sub-classes that are NoHeader or InvalidHeader would have .parse() and others raise ValueError or some other exception?

Do I understand this right @mmerickel?

'The behavior of AcceptLanguageValidHeader.best_match is '
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Just make this a DeprecationWarning.

not already matched by other ranges within the header are
unacceptable.
"""
assert not (default_tag is None and default is None), \

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Instead of using assert, use an if statement. assert statements may be turned off globally, and while I tend to believe that if you break it you've bought it, I'd prefer not to have asserts in anything put testing code.

Unless it is a condition that is not likely to ever happen, and is just a sanity check.

# whether it has been specified as not acceptable with a q=0 range in
# the header) or not (in which case we can just return the value).

assert default_range != '*', 'default_range cannot be *.'

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Same thing here, no assert please.

except TypeError: # default is not a callable
return default

def quality(self, offer, modifier=1):

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Return the quality of the given offer. Returns None if there is no match (not 0).

That's the docs on .quality() at the moment. Feel free to remove modifier since it is not documented what it does, or why it is set to 1 by default.

Set it to 1.

'The behavior of AcceptLanguageValidHeader.quality is'
'currently being maintained for backward compatibility, but it may'
' be deprecated in the future as it does not conform to the RFC.',
PendingDeprecationWarning,

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Make this a DeprecationWarning. I really don't think it is a good idea to keep this around for very long.


def fget(request):
"""Get an object representing the header in the request."""
return create_accept_language_header(

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 23, 2017

Member

Here we return a brand new AcceptLanguage* object each time, yet when we assign a new AcceptLanguage using fset we set it directly to the AcceptLanguage object.

Would it make sense here to check if it is a string, and if so then run the create_accept_language_header function, followed by setting the environ key to the newly created object? This way if a user calls request.accept_language twice they only have the overhead of creating the object the first time? If they modify request.environ directly it would no longer be an AcceptLanguage object and would correctly re-create it.


Another thing I noticed is that with your new __add__ functions since they don't modify the existing one but create a new one something like this is not possible, or am I mistaken:

request.accept_langauge += 'nl-NL;q=0.7'

Since that would return a new object and request.accept_language would be untouched. I can live with that... just not what I expect at the moment based upon the other properties.

This comment has been minimized.

@whiteroses

whiteroses Aug 23, 2017

Contributor

It doesn't assign a new AcceptLanguage when we fset though: fset only changes the header value in the environ, so that the next time we fget, a brand new AcceptLanguage* object is created using that new header value in the environ. This is the same as how it current works in accept_property.

I can't quite understand the second paragraph? In terms of overhead, we talked a while back about whether we should cache the object, as a new object was being created on every attribute read, and you said that it wasn't a concern at that point — is that related to what you're asking here?

With the __add__ functions,

request.accept_language += 'nl-NL;q=0.7'

is the same as

request.accept_language = request.accept_language + 'nl-NL;q=0.7'

so it works :)

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 24, 2017

Member

Excellent. No worries, not sure what I was thinking. We are good here!

@bertjwregeer

This comment has been minimized.

Copy link
Member

bertjwregeer commented Aug 23, 2017

With these changes, I hope that we have satisfactorily fixed the issue brought up by the original poster in #256

@whiteroses

This comment has been minimized.

Copy link
Contributor

whiteroses commented Aug 23, 2017

@bertjwregeer: I've made the changes you requested — thanks for reviewing! With #256, .lookup() covers that specific case; I can make a comment on the issue explaining this after the PR is merged. I'd also like to update some of the information in these issues, as my understanding of them has changed, and the information there may be useful to others (I was already finding these issues turning up near the top in search results when looking up various aspects of Accept* headers implementation!) We can still comment on closed issues, right?

@bertjwregeer

This comment has been minimized.

Copy link
Member

bertjwregeer commented Aug 23, 2017

Yes, you may definitely comment on closed issues!

Were you able to make the changes that @mmerickel suggested regarding the duck typing? I think that is important before merging this in!

@whiteroses

This comment has been minimized.

Copy link
Contributor

whiteroses commented Aug 24, 2017

@bertjwregeer Have made the changes, please see here.

@bertjwregeer
Copy link
Member

bertjwregeer left a comment

Unless @mmerickel has any other suggestions or changes he would like to see, I am ready to merge this!

Thank you for your hard work!

.. autoclass:: AcceptLanguageInvalidHeader
:members: header_value, parsed, __init__, __add__, __contains__, __iter__,
__radd__, __str__, parse, basic_filtering, best_match, lookup,
quality

This comment has been minimized.

@mmerickel

mmerickel Aug 28, 2017

Member

Can we remove the subclasses from the public api now?

This comment has been minimized.

@whiteroses

whiteroses Aug 28, 2017

Contributor

We can if we combine the docstrings for all the subclasses for each method and put them all into AcceptLanguage. This would imo make some already complicated docstrings even more complicated, adding another layer to the branching. And as I mentioned before, the instances would still i.d. as the subclasses (I guess we could change the __repr__ to indicate that it is a subclass of AcceptLanguage, so people would know to look there?) If you want me to go ahead and make the change, I will — it's not trivial, and will take some time and thought on how to combine the docstrings from the three subclasses for each method. (I would also have to do the same for the other three headers.)

Would it be possible to merge this so I can push changes for this and the other three headers before the deadline tomorrow to a more appropriately-named PR, or should I push the changes for all four headers to this one?

This comment has been minimized.

@mmerickel

mmerickel Aug 28, 2017

Member

It's fine if we want to merge this now. I'm happy with that.

Part of the idea of making the subclasses "hidden" is not that people don't know about them but that they don't care. I'm more worried about the docs the user sees than the actual objects they see in the repl. With that in mind we'd want to make the docstrings focus on valid headers with just a sentence or two in each that explains what happens with invalid/missing data. I can't imagine that will make the docstrings that much more complex or harder to parse but I could be wrong!

This comment has been minimized.

@whiteroses

whiteroses Aug 28, 2017

Contributor

I was worrying whether it would be clear enough that people should look at the docs for AcceptLanguage if the objects identify as the subclasses, but I understand what you are looking for, and will try to combine the docstrings and document one api under the base class after I've finished wrapping up the four headers and all the necessary submissions for gsoc. Could you please merge this PR if everything is ok, so I can open a new PR and push the other changes? (Sorry about the last-minute commits, they are minor changes to fix small mistakes.) Thanks!

This comment has been minimized.

@mmerickel

mmerickel Aug 28, 2017

Member

Thanks again @whiteroses I'm happy with all of this.

This comment has been minimized.

@bertjwregeer

bertjwregeer Aug 28, 2017

Member

Alright, any future work can be done in a different PR.

@bertjwregeer bertjwregeer merged commit 897da0e into Pylons:master Aug 28, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@bertjwregeer

This comment has been minimized.

Copy link
Member

bertjwregeer commented Aug 28, 2017

Thank you very much @whiteroses, any further changes can be made on a new PR against master!

@whiteroses

This comment has been minimized.

Copy link
Contributor

whiteroses commented Aug 28, 2017

Thanks @bertjwregeer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment