Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that the alphabet argument to text() is a sequence of unicode characters #1329

Closed
Zac-HD opened this issue Jun 14, 2018 · 5 comments
Closed
Labels
enhancement it's not broken, but we want it to be better

Comments

@Zac-HD
Copy link
Member

Zac-HD commented Jun 14, 2018

The alphabet argument to text() may be None, a strategy for generating unicode characters (typically the characters() strategy), or a sequence of unicode characters (possibly provided as a string).

>>> # This currently "works", and that's terrible.
>>> text(alphabet={1, lambda: None, ('ab', 'cd')}).example()
"('ab', 'cd')('ab', 'cd')111<function <lambda> at 0x00000073F8EE0730>"

However, there is very little validation of the sequence-of-characters form. We should deprecate use of non-sequence collections (as for sampled_from, hash randomisation breaks reproducibility), deprecate non-length-one elements (ie alphabet=['a', 'bc'] is bad), and error on non-string types instead of coercing them to text. Python2 str should emit a deprecation warning, since this is probably a common misuse and usually works.

A fixed implementation might look like

else:
    if not isinstance(alphabet, Sequence):
        note_deprecation(...)
        alphabet = list(alphabet)
    # Warning rather than error on Py2-str for now, to avoid a breaking change
    non_string = [c for c in alphabet if not isinstance(c, string_types)]
    if non_string:
        raise InvalidArgument(...)
    if PY2:
        non_unicode = [c for c in alphabet if not isinstance(c, text_type)]
        ...  # using note_deprecation
    not_one_char = [...]; if not_one_char: note_deprecation(...)
    char_strategy = sampled_from(alphabet)

Note that all error messages should show the repr of each 'bad' value, explain why it is bad, and usually suggest what the user should do instead. Filling that out, plus tests, would be a fantastic contribution.

If you'd like to work on this, let me know - I can lend a hand and also ensure that only one person works on any issue at a given time 😄

@Zac-HD Zac-HD added enhancement it's not broken, but we want it to be better good first issue labels Jun 14, 2018
@sushobhit27
Copy link
Contributor

@Zac-HD I would like to work on this issue. I guess I understand most of the things, except
"non-length-one elements (ie alphabet=['a', 'bc'] is bad),"
how ['a', 'bc'] is non-length elements?
Could you pls explain more.

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 15, 2018

Because 'bc' is a string of length two, rather than a single unicode character - and therefore it can violate the property that the length of a generated string will always be within the specified bounds. Eg:

>>> text(['abc'], min_size=2, max_size=2).example()
'abcabc'
>>> len(_)
6

@sushobhit27
Copy link
Contributor

@Zac-HD got it, so basically alphabet must be sequence of single characters.

@sushobhit27
Copy link
Contributor

@Zac-HD
I was working on this and got some queries regarding this issue:

if not isinstance(alphabet, Sequence):
        note_deprecation(...)
        alphabet = list(alphabet)

If I assume that above check is for catching things like like
alphabet=None or alphabet=123, alphabet=True etc.
Then, alphabet = list(alphabet) is gonna give TypeError.
So if someone is passing non iterable, shouldn't we first raise InvalidArgument instead of note_deprecation?

I have more or less created test cases for all scenario, except

 if PY2:
            non_unicode = [c for c in alphabet if not isinstance(c, text_type)]
            if non_unicode:
                note_deprecation

Shall I have conditional test case for PY2?
P.S: I could not find note_deprecation test cases in test/*, am I missing something?

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 18, 2018

The main purpose is actually to catch unordered collections, like alphabet=set('abc') - the problem being that because iteration order for e.g. sets is random, Hypothesis won't be able to reproduce its results or even shrink to the same value next time. Note: the conversion to list should be outside the conditional block. We'll then get a warning and the TypeError if you pass a non-collection, but that's OK as it's never worked.

You can put a Python2-specific test in tests/py2/ - this is almost always better than a conditional test.

To test a deprecated thing, write a test that passes but ignores the deprecation, then add the @checks_deprecated_behaviour decorator. The note_deprecation function only issues deprecations, so it's quite useless in tests (though short and worth reading). with pytest.warns(HypothesisDeprecationWarning) might help though 😄

sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 18, 2018
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 18, 2018
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 20, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 20, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix check-formats travis job.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 20, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix check-formats travis job.

fix lint issues after refactoring.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 20, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix check-format issues.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 24, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix check-format issues.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 26, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix failing test cases due to unicode warning.

fix mypy error.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 26, 2018
Close HypothesisWorks#1329

fix lint issues

refactor code as per review comments.
add RELEASE.rst

fix failing test cases due to unicode warning.

fix mypy error.
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jun 28, 2018
sushobhit27 added a commit to sushobhit27/hypothesis-python that referenced this issue Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better
Projects
None yet
Development

No branches or pull requests

2 participants