Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strategy for strings matching a regex #662

Closed
Zac-HD opened this issue May 30, 2017 · 20 comments
Closed

Strategy for strings matching a regex #662

Zac-HD opened this issue May 30, 2017 · 20 comments
Assignees
Labels
new-feature entirely novel capabilities or strategies

Comments

@Zac-HD
Copy link
Member

Zac-HD commented May 30, 2017

This seems like a fairly common desire, with @jreinhardt's pull #393 and @maximkulkin's hypothesis-regex package (which I would like to merge into upstream, with some minor changes).

Some design ideas:

  • calling the strategy regex risks confusion with a strategy for generating regex patterns, so a more specific name is probably in order. Actually, I'd integrate it into text() as a new keyword: matching=r'regex'.

  • we should pass through **kwargs to characters(), so that (eg) you can limit the output to ASCII. Using regex flags, as per Implement support for regex flags maximkulkin/hypothesis-regex#3, is a better solution.

@Zac-HD Zac-HD added the enhancement it's not broken, but we want it to be better label May 30, 2017
@mulkieran
Copy link
Contributor

@Zac-HD If hypothesis-regex package isn't going to go in real soon, I would like to ask @maximkulkin to prepare a PR against the list of external strategies in the docs. What do you think?

@Zac-HD
Copy link
Member Author

Zac-HD commented May 30, 2017

Sounds good to me! I think as a matter of principle we'd welcome such a pull regardless, and if/when we add an upstream version just note that next to the external link - to give appropriate credit and show that external strategies can be merged 😄

@mulkieran
Copy link
Contributor

Sounds good!

@maximkulkin
Copy link
Contributor

Guys, I have created #664. Let me know if I understood you correctly.

As of issues @Zac-HD mentioned, can you open them as hypothesis-regex issues, so that we can have discussions there. I have my own thoughts on those and I would like discussions to be tracked in proper places.

@Zac-HD
Copy link
Member Author

Zac-HD commented May 31, 2017

IMO the name is not at all problematic in an external package. When merging upstream, I think the cleanest way to implement this is as a keyword arg like text(matching=r'some regex') - conceptually it's "just" a really expressive constraint on the text strategy. This would avoid the naming problem entirely, at the cost of slightly more annoying implementation but a nicer API.

I've opened issues for everything I would want addressed before merging upstream, aside from integration with text() - obviously you don't have to address them, but if you do that would make integration easier for whoever does the merge (and you'll end up on the contributors list even if that's one of us, of course).

@DRMacIver
Copy link
Member

I'm not totally convinced integration with text is the way to go, mostly because I think combining it with custom characters() strategy is going to be a pain, and in general we'd like to avoid interacting arguments.

@DRMacIver
Copy link
Member

BTW I am definitely keen to get this included in core Hypothesis, nitpicking about API specifics aside. :-)

@Zac-HD
Copy link
Member Author

Zac-HD commented May 31, 2017

We came to basically the same conclusion in maximkulkin/hypothesis-regex#1, though I still think calling it regex could be misleading. Maybe text_matching(regex)?

@jreinhardt
Copy link

Sorry that I was inactive for so long. I will have some time on the weekend, I will process and incorporate all the feedback then.

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 1, 2017

@jreinhardt, FYI we're probably going to merge the hypothesis-regex package upstream instead of #393. You're welcome to keep working on it of course, but as an alternative it would be great to get your feedback on the issues and code in that repo.

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 2, 2017

For @maximkulkin and anyone else playing with regex, we can construct an inefficient-but-effective strategy simply by drawing printable characters and filtering out anything that can't be compiled as a regex.

def try_compile(pattern):
    try:
        return re.compile(pattern)
    except re.error:
        return None
strat = st.text(alphabet=string.printable).map(try_compile).filter(bool)

This should at least be noted in the documentation, if we don't just build a proper strategy out of it (mostly supporting flags and adding docs, I think).

@maximkulkin
Copy link
Contributor

@Zac-HD Looks like your example constructs a "valid regex", not a "string that matches regex" (which is the goal of our efforts).

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 2, 2017

Yep, the idea is that you can use this pattern strategy to test the text-matching-regex strategy, like:

# Don't use this one; better test in a later comment
@given(pattern_strategy.filter(lambda p: p.match('')))
def test_regex_strategy_minimisation(pattern):
    assert find(regex(pattern), lambda t: True) == ''

@given(pattern_strategy)
def test_regex_strategy_invariant(pattern):
    @given(regex(pattern))
    def inner_test(ex):
        assert re.match(pattern, ex)
    inner()

Bump up the max_examples setting and leave them running for a while, and you should discover the cases that the regex strategy doesn't cover.

@maximkulkin
Copy link
Contributor

I do not get this "minimisation" requirement (and, yes, I know what minimisation/shrinking means, I have used QuickCheck in Haskell before). Why empty string is a special case? Why find() always need to return empty string if it can? find() is not even documented. My understanding is that it just draws data until it finds one that is matched by predicate.

@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 3, 2017

find searches for and returns the smallest (shortlex) example matching the predicate, not just the first match. To get any example matching the predicate, strategy.filter(predicate).example() skips the minimisation step which can be quite expensive. Here are the find() docs, which I agree should be better - new issue #674.

The minimisation requirement is simply that the regex strategy should shrink towards the minimal text that matches it's pattern. Which actually gives me a better test:

@given(pattern_strategy)
def test_regex_strategy_minimisation(r):
    assert find(regex(r), lambda t: True) == find(st.text(), r.match)

I would actually expect this to fail as the implementation would be a branch-reordering nightmare, but it does describe the ideal behaviour.

There's nothing special about the empty string except that it's easy to tell that there's no shorter possible match, and I hadn't thought about the property enough to realise that we do in fact have a nice tool for finding the shortest match for a given regex.

@Zac-HD Zac-HD added new-feature entirely novel capabilities or strategies and removed enhancement it's not broken, but we want it to be better labels Jun 10, 2017
@Zac-HD
Copy link
Member Author

Zac-HD commented Jun 24, 2017

@maximkulkin, we now have a third (fourth?) pull about a regex strategy.

  • Are you happy for hypothesis-regex to be merged into upstream Hypothesis?
  • If so, can you make a pull request in the next few days, or would you prefer that someone (probably me) makes the pull? (You'll be credited regardless, of course)

@DRMacIver
Copy link
Member

BTW @maximkulkin, if and when we merge hypothesis-regex upstream, you'd be very welcome on the maintainers team if you wanted to join.

@maximkulkin
Copy link
Contributor

Guys, I will work on PR

@maximkulkin
Copy link
Contributor

PR is out: #708

@Zac-HD
Copy link
Member Author

Zac-HD commented Aug 23, 2017

Closed by #792 - thanks to everyone who contributed to this long process.

@Zac-HD Zac-HD closed this as completed Aug 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-feature entirely novel capabilities or strategies
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants