Produce a sample that does not match a regex #36

Closed
spacether opened this issue Aug 12, 2020 · 7 comments · Fixed by #38

@spacether

spacether commented Aug 12, 2020

Thank you so much for this package!
It looks great and I am looking forward to seeing if I can use it to generate regex samples for api documentation and test generation.

Is your feature request related to a problem? Please describe.
One can generate strings that match a regex pattern, but can the library also generate a string that does not match a pattern?
Your code lets us test the positive use case. If you provide sample_does_not_match_pattern functionality, then we can test the negative use case too.

Describe the solution you'd like
Could a method or function be created to produce a sample that does not match a regex pattern?

Describe alternatives you've considered
Randomly generating strings until a non-matching sample is found.
But that is non-deterministic and brute-forces the problem.
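
A minimal sketch of that brute-force alternative, for illustration only. The class and method names are hypothetical and not part of any library; it assumes lowercase ASCII candidates of a fixed length and uses only `java.util.regex` and `java.util.Random`.

```java
import java.util.Random;
import java.util.regex.Pattern;

public class BruteForceNonMatch {
    // Hypothetical helper (not a library API): tries random lowercase strings
    // of the given length until one fails to match the whole pattern.
    static String firstNonMatch(Pattern pattern, Random random, int length, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            StringBuilder candidate = new StringBuilder();
            for (int i = 0; i < length; i++) {
                candidate.append((char) ('a' + random.nextInt(26)));
            }
            if (!pattern.matcher(candidate).matches()) {
                return candidate.toString();
            }
        }
        throw new IllegalStateException("No non-matching sample found in " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Illustrative pattern; any pattern could be used here.
        Pattern pattern = Pattern.compile("def");
        // An unseeded Random may print a different sample on every run.
        System.out.println(firstNonMatch(pattern, new Random(), 3, 1000));
    }
}
```

The result depends entirely on the random source, and the number of attempts needed is unbounded in principle, which is why this is only a fallback.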

Additional context
I am considering using this library in https://github.com/OpenAPITools/openapi-generator

@spacether spacether added the enhancement New feature or request label Aug 12, 2020
@curious-odd-man
Owner

curious-odd-man commented Aug 12, 2020

Hello @spacether.

Thank you for your ticket.
Let me describe the solution I may be able to provide; please let me know whether it would satisfy you.

Solution:

Suppose we have a regex: abc[def] - a simple one.

I would then separate this regex into 2 parts: abc and [def].

To produce a string that does not match the initial pattern, I will:

  1. Select 1 to N parts (N is the total number of parts; 2 in this example).
  2. Invert each selected part. For example:
    1. For the abc part, I will produce a random 3-character string that is not abc.
    2. For the [def] part, I will change it to [^def].
  3. Generate a regex mixing "normal" parts and "inverted" parts.
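
The steps above can be sketched for the abc[def] example as follows. This is an illustrative sketch, not the library's implementation: all class and method names are hypothetical, the alphabet is assumed to be lowercase a-z, and the two parts are hard-coded rather than parsed from the regex.

```java
import java.util.Random;
import java.util.regex.Pattern;

public class NotMatchingSketch {
    // Inverted literal: a random lowercase string of the same length that differs from `literal`.
    // Assumes `literal` is non-empty.
    static String invertLiteral(String literal, Random random) {
        String result;
        do {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < literal.length(); i++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
            result = sb.toString();
        } while (result.equals(literal));
        return result;
    }

    // Inverted character class: a random lowercase letter NOT in `excluded`,
    // i.e. a sample of [^def] restricted to a-z (an assumption of this sketch).
    static char invertCharClass(String excluded, Random random) {
        char c;
        do {
            c = (char) ('a' + random.nextInt(26));
        } while (excluded.indexOf(c) >= 0);
        return c;
    }

    // "Normal" character class: a random member of `allowed`.
    static char sampleCharClass(String allowed, Random random) {
        return allowed.charAt(random.nextInt(allowed.length()));
    }

    // Builds a sample that does not match abc[def] by inverting one or both parts.
    static String notMatching(Random random) {
        int choice = random.nextInt(3); // 0 = invert first part, 1 = invert second part, 2 = invert both
        String first = (choice != 1) ? invertLiteral("abc", random) : "abc";
        char second = (choice != 0) ? invertCharClass("def", random) : sampleCharClass("def", random);
        return first + second;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("abc[def]");
        String sample = notMatching(new Random(42));
        // The sample keeps the overall shape of the pattern but never matches it.
        System.out.println(sample + " matches: " + pattern.matcher(sample).matches());
    }
}
```

Because at least one part is always inverted, the whole-string match fails while the remaining parts may still match.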

This approach should produce strings that partially match the regex.
I will also allow the user to specify the fraction of parts that should be inverted.
For example, if the user specifies 1, all parts will be inverted; if the user specifies 0, only one part will be inverted; and if the user specifies 0.5, half of the parts will be inverted.
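
One possible reading of that fraction parameter, as a sketch only. The method name and the rounding rule are assumptions; the proposal above fixes only the cases 0, 0.5 and 1.

```java
public class InversionCount {
    // Hypothetical mapping from the user-supplied fraction to a number of parts to invert.
    // Assumption: round to the nearest count, but always invert at least one part.
    static int partsToInvert(double fraction, int totalParts) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        if (totalParts < 1) {
            throw new IllegalArgumentException("totalParts must be at least 1");
        }
        return Math.max(1, (int) Math.round(fraction * totalParts));
    }

    public static void main(String[] args) {
        System.out.println(partsToInvert(1.0, 2)); // all parts: 2
        System.out.println(partsToInvert(0.0, 2)); // still one part: 1
        System.out.println(partsToInvert(0.5, 4)); // half of the parts: 2
    }
}
```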

I have not yet looked deeply into this approach, so there may be some hidden pitfalls, but overall it looks promising to me.

Is that approach clear to you? Do you think it will suit your needs?

@curious-odd-man curious-odd-man added this to the Version 1.2 milestone Aug 12, 2020
@spacether
Author

spacether commented Aug 12, 2020

Yup, I think that that approach will suit my needs.
For now your library generating samples that match is the most important use case for us.

These non-matching negative use cases are a nice-to-have for the future.
The most important need we would have, for both matching and non-matching samples, is that the first sample we get that does or doesn't match stays constant.

If our pattern is:
^def
and we get:
first_match = def
first_non_match = fed
As long as those samples always have those values, that works for us.
We need this determinism because the samples are used to generate documentation and tests.
We don't want values changing with every sample generation invocation.

Will your solution always give the same result for the first non matching sample?
The random 3 character long string does not sound deterministic.

@curious-odd-man
Owner

In the current implementation you can provide an initialized Random to get the same generated random values: public String generate(Random random);
For non-matching generation I will use the same approach: if an identical Random is provided, identical results will be returned.

You can refer to this test case.

A random 3-character string will be deterministic because its randomness comes from a pseudo-random number generator, e.g. Random, which is deterministic for a given seed. Refer to this post for an explanation.
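
A small sketch of why a seeded `java.util.Random` gives repeatable output. The helper `randomLetters` and the seed value are hypothetical and stand in for any generator that draws its randomness from the supplied `Random`.

```java
import java.util.Random;

public class SeededRandomDemo {
    // Hypothetical stand-in for a generator: builds a lowercase string from the given Random.
    static String randomLetters(Random random, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 123L; // arbitrary fixed seed
        String first = randomLetters(new Random(seed), 3);
        String second = randomLetters(new Random(seed), 3);
        // Two Random instances created with the same seed produce the same sequence.
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Fixing the seed in documentation or test generation therefore keeps the first sample stable across runs.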

@spacether
Author

That works great; thank you!

curious-odd-man added a commit that referenced this issue Aug 23, 2020
First steps.
Disabled performance tests for now.
curious-odd-man added a commit that referenced this issue Aug 23, 2020
Step 2. visitor and tests.
@curious-odd-man curious-odd-man linked a pull request Aug 23, 2020 that will close this issue
curious-odd-man added a commit that referenced this issue Aug 23, 2020
Step 3. Implemented some of nodes and tests.
curious-odd-man added a commit that referenced this issue Aug 23, 2020
Step 4. In reality i need to only implement terminal nodes.
Container nodes should work, because they will use terminal nodes inside.
curious-odd-man added a commit that referenced this issue Aug 23, 2020
Step 5. More tests and implementations.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Refactoring.
Logging really is not needed clean up to make library smaller.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Refactoring.
Analysis fixes.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Implementation of Choice and other final steps.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Unwrapped repeatable tests.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Fixed complex expressions groups generation.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
Fix for a coverage of tests.
curious-odd-man added a commit that referenced this issue Aug 25, 2020
@curious-odd-man
Owner

curious-odd-man commented Aug 25, 2020

Hello. I've completed the implementation of the requested feature.
Please refer to README for usage.
It is available as a snapshot release.

If something is not working or any additional features are needed, please open another ticket.

I did not implement any customization for now: from your description I understand it is not very important, and it would also add significant overhead.

@spacether
Author

Thank you! Openapi generator is now using your package to generate matching regex examples for the python-experimental generator OpenAPITools/openapi-generator#7157

@curious-odd-man
Owner

That is great! Thank you. It's always nice to know that someone uses your work 👍
