Strategy for strings matching a regex #662
|
@Zac-HD If the hypothesis-regex package isn't going to be merged soon, I would like to ask @maximkulkin to prepare a PR adding it to the list of external strategies in the docs. What do you think? |
|
Sounds good to me! I think as a matter of principle we'd welcome such a pull regardless, and if/when we add an upstream version, just note that next to the external link, to give appropriate credit and show that external strategies can be merged 😄 |
|
Sounds good! |
|
IMO the name is not at all problematic in an external package. When merging upstream, I think the cleanest way to implement this is as a keyword arg like … I've opened issues for everything I would want addressed before merging upstream, aside from integration with … |
|
I'm not totally convinced integration with text() is the way to go, mostly because I think combining it with a custom characters() strategy is going to be a pain, and in general we'd like to avoid interacting arguments. |
|
BTW I am definitely keen to get this included in core Hypothesis, nitpicking about API specifics aside. :-) |
|
We came to basically the same conclusion in maximkulkin/hypothesis-regex#1, though I still think calling it |
|
Sorry that I was inactive for so long. I will have some time on the weekend, I will process and incorporate all the feedback then. |
|
@jreinhardt, FYI we're probably going to merge the hypothesis-regex package upstream instead of #393. You're welcome to keep working on it of course, but as an alternative it would be great to get your feedback on the issues and code in that repo. |
|
For @maximkulkin and anyone else playing with regex, we can construct an inefficient-but-effective strategy simply by drawing printable characters and filtering out anything that can't be compiled as a regex:

```python
import re
import string

from hypothesis import strategies as st


def try_compile(pattern):
    try:
        return re.compile(pattern)
    except re.error:
        return None  # not a valid regex


strat = st.text(alphabet=string.printable).map(try_compile).filter(bool)
```

This should at least be noted in the documentation, if we don't just build a proper strategy out of it (mostly supporting flags and adding docs, I think). |
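A flags-aware variant of the same filter-what-compiles idea might look like the sketch below. `regex_patterns` and `FLAG_CHOICES` are hypothetical names, and the particular flag combinations sampled are an arbitrary choice for illustration.

```python
import re
import string

from hypothesis import given, strategies as st

# Hypothetical: an arbitrary selection of flag combinations to sample from.
FLAG_CHOICES = [0, re.IGNORECASE, re.MULTILINE, re.DOTALL, re.IGNORECASE | re.DOTALL]


def try_compile(pattern, flags=0):
    """Return the compiled pattern, or None if it is not a valid regex."""
    try:
        return re.compile(pattern, flags)
    except (re.error, OverflowError):
        # OverflowError covers e.g. absurdly large repeat counts like a{99999999999}.
        return None


# Hypothetical strategy: (source, flags) pairs that compile, as re.Pattern objects.
regex_patterns = st.builds(
    try_compile,
    st.text(alphabet=string.printable),
    st.sampled_from(FLAG_CHOICES),
).filter(lambda p: p is not None)


@given(regex_patterns)
def test_patterns_are_compiled(pattern):
    # Every drawn value is a compiled pattern, never None.
    assert isinstance(pattern, re.Pattern)
```

Sampling the flags alongside the source keeps each drawn value a plain `re.Pattern`, so downstream tests can read `pattern.flags` instead of carrying a separate argument.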
|
@Zac-HD Looks like your example constructs a "valid regex", not a "string that matches regex" (which is the goal of our efforts). |
|
Yep, the idea is that you can use this pattern strategy to test the text-matching-regex strategy, like:

```python
import re

from hypothesis import find, given

# `pattern_strategy` is the regex-pattern strategy from the previous comment;
# `regex` is the text-matching strategy under test.

# Don't use this one; better test in a later comment
@given(pattern_strategy.filter(lambda p: p.match('')))
def test_regex_strategy_minimisation(pattern):
    assert find(regex(pattern), lambda t: True) == ''


@given(pattern_strategy)
def test_regex_strategy_invariant(pattern):
    @given(regex(pattern))
    def inner_test(ex):
        assert re.match(pattern, ex)

    inner_test()
```

Bump up the max_examples setting and leave them running for a while, and you should discover the cases that the |
|
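For reference, current Hypothesis provides this as `st.from_regex`. Below is a sketch of the invariant test against it that draws the pattern and the example inside one test via `st.data()`, rather than nesting `@given`. The `PATTERNS` list is an arbitrary illustrative corpus, not one from this thread; a generated-pattern strategy could be swapped in.

```python
import re

from hypothesis import given, settings, strategies as st

# Illustrative corpus of patterns (assumption, chosen for this sketch only).
PATTERNS = [r"a+b?", r"[0-9]{2,4}", r"(foo|bar)baz", r"x\.y*"]


@settings(max_examples=200)
@given(st.data())
def test_from_regex_invariant(data):
    # Draw the pattern first, then an example for it, in one test body.
    pattern = data.draw(st.sampled_from(PATTERNS), label="pattern")
    example = data.draw(st.from_regex(pattern), label="example")
    # With the default fullmatch=False, from_regex only promises that the
    # example *contains* a match, so check with re.search, not re.match.
    assert re.search(pattern, example)
```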
I do not get this "minimisation" requirement (and, yes, I know what minimisation/shrinking means; I have used QuickCheck in Haskell before). Why is the empty string a special case? Why |
|
The minimisation requirement is simply that the regex strategy should shrink towards the minimal text that matches its pattern. Which actually gives me a better test:

```python
@given(pattern_strategy)
def test_regex_strategy_minimisation(r):
    assert find(regex(r), lambda t: True) == find(st.text(), r.match)
```

I would actually expect this to fail, as the implementation would be a branch-reordering nightmare, but it does describe the ideal behaviour. There's nothing special about the empty string except that it's easy to tell that there's no shorter possible match, and I hadn't thought about the property enough to realise that we do in fact have a nice tool for finding the shortest match for a given regex. |
|
@maximkulkin, we now have a third (fourth?) pull about a regex strategy.
|
|
BTW @maximkulkin, if and when we merge |
|
Guys, I will work on PR |
|
PR is out: #708 |
|
Closed by #792 - thanks to everyone who contributed to this long process. |
This seems like a fairly common desire, with @jreinhardt's pull #393 and @maximkulkin's hypothesis-regex package (which I would like to merge into upstream, with some minor changes).
Some design ideas:
- Calling the strategy `regex` risks confusion with a strategy for generating regex patterns, so a more specific name is probably in order. Actually, I'd integrate it into `text()` as a new keyword: `matching=r'regex'`.
- We should pass through `**kwargs` to `characters()`, so that (e.g.) you can limit the output to ASCII. Using regex flags, as per maximkulkin/hypothesis-regex#3 ("Implement support for regex flags"), is a better solution.
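For comparison with the `matching=` keyword idea: current Hypothesis exposes this as a separate strategy, `st.from_regex`, which reads flags from a compiled pattern and accepts `fullmatch=True` to require the whole string to match. A thin wrapper in the spirit of the proposal might look like this; `text_matching` is a hypothetical name, not a Hypothesis API.

```python
import re

from hypothesis import given, strategies as st


def text_matching(pattern, flags=0):
    """Hypothetical helper: strategy for strings that fully match `pattern`.

    Flags travel with the compiled pattern instead of being forwarded as
    **kwargs to characters(), so there are no interacting arguments.
    """
    return st.from_regex(re.compile(pattern, flags), fullmatch=True)


@given(text_matching(r"[a-c]{3}", re.IGNORECASE))
def test_text_matching(s):
    # Each example is three characters matching [a-c] case-insensitively.
    assert re.fullmatch(r"[a-c]{3}", s, re.IGNORECASE)
```

Keeping the regex out of `text()` follows the concern raised earlier in the thread about combining it with a custom `characters()` strategy.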