Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whitelist characters for characters strategy #668

Closed
maximkulkin opened this issue May 31, 2017 · 6 comments · Fixed by #726
Closed

Whitelist characters for characters strategy #668

maximkulkin opened this issue May 31, 2017 · 6 comments · Fixed by #726
Labels
enhancement it's not broken, but we want it to be better

Comments

@maximkulkin
Copy link
Contributor

Hypothesis has characters() strategy that is unicode aware and allows to specify which unicode categories to get characters from. It has white list of categories and black list of categories. Also, it has blacklist_characters parameter which allows to exclude certain characters.

What I would like to have is a whitelist_characters - add extra characters in addition to given categories to use during generation. E.g. characters(whitelist_categories=['Nd', 'Ll', 'Lu'], whitelist_characters='_')

@Zac-HD Zac-HD added enhancement it's not broken, but we want it to be better help-wanted labels Jun 1, 2017
@moreati
Copy link
Contributor

moreati commented Jul 10, 2017

I'm taking a stab at this

@moreati
Copy link
Contributor

moreati commented Jul 10, 2017

Clarification question:

  1. Should character in whitelisted_characters also be in addition to the min/max codepoint interval?
    i.e. should characters(min_codepoint=ord('0'), max_codepoint=ord('9'), whitelist_chars='_') generate '_'?

@Zac-HD
Copy link
Member

Zac-HD commented Jul 11, 2017

  • When 'the right thing to do' is not obvious, clear documentation is even more important. Whatever you do, make sure the characters() docs are concise and accurate.
  • For Hypothesis, it's better to generate surprising things than silently generate less than expected. I'd therefore generate all characters in the interval or whitelist categories, excluding those in blacklist categories.

@maximkulkin
Copy link
Contributor Author

@moreati Well, min/max_codepoint is a bit tricky here. Obviously, those params relate to codepoints you can not specify explicitly (e.g. characters of particular categories). I would say the answer is "Yes, in your example it should generate '_'".

@moreati
Copy link
Contributor

moreati commented Jul 17, 2017

Some questions regarding hypothesis.internal.charmap

  1. _union_interval_lists(x, y) seems to return the wrong result if and interval of y falls entirely within an interval of x, e.g.

     >>> _union_interval_lists([(0,10)], [(1,2), (5,17)])
     ((0, 2), (5, 17))
    

    Shouldn't that be ((0, 17),)? Is there something about Unicode category intervals that means this doesn't matter?

  2. None of the functions have a docstring. Is this intentional, or would such a patch be welcome?

@DRMacIver
Copy link
Member

_union_interval_lists(x, y) seems to return the wrong result if and interval of y falls entirely within an interval of x, e.g.
Shouldn't that be ((0, 17),)? Is there something about Unicode category intervals that means this doesn't matter?

I don't remember that code very well at this point, but no that sure looks like a bug to me.

None of the functions have a docstring. Is this intentional, or would such a patch be welcome?

In general any lack of internal documentation or commenting is not intentional and improving it is always welcome. :-)

moreati added a commit to moreati/hypothesis-python that referenced this issue Jul 18, 2017
moreati added a commit to moreati/hypothesis-python that referenced this issue Aug 13, 2017
moreati added a commit to moreati/hypothesis-python that referenced this issue Aug 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants