Lookaround does not work when it influences other part of pattern #63

Open
Pigeon-Barry opened this issue Feb 12, 2021 · 2 comments · May be fixed by #64
Labels: bug, enhancement


Pigeon-Barry commented Feb 12, 2021

There is a general issue with lookaround patterns.

Whenever a lookaround part of a pattern should influence another part of the pattern (that is, restrict the values the other part can produce), generation does not work correctly.

For example:

(?!B)[AB]

In this pattern the lookahead part (?!B) influences the [AB] part by limiting its set of valid values.
This should be supported.
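
To make the expected behavior concrete, here is a minimal sketch that uses only java.util.regex to show which values each pattern accepts. The class name LookaheadFilterDemo and the method accepted are made up for this illustration and are not part of RgxGen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LookaheadFilterDemo {
    // Returns the candidates that fully match the given regex.
    static List<String> accepted(String regex, List<String> candidates) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).matches()) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("A", "B");
        // Without the lookahead both values are valid.
        System.out.println(accepted("[AB]", candidates));      // [A, B]
        // With the lookahead only "A" is left.
        System.out.println(accepted("(?!B)[AB]", candidates)); // [A]
    }
}
```

A generator that supports this case should therefore only ever produce "A" for (?!B)[AB].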

Original request text:

Describe the bug

When using

new RgxGen("^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$").generate();

the following string is generated: '">MO281733', which does not conform to the regular expression.

To Reproduce
Steps to reproduce the behavior:

  1. With regex pattern '^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$'

  2. Call new RgxGen(pattern).generate()

  3. See error
    An invalid string is returned: '">MO281733'

Expected behavior

I expect a string such as 'AA222222D' to be returned, as it is valid against the regex. However, this is not the case.
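
The two strings above can be checked against the reported pattern with plain java.util.regex. This is only a verification sketch; the class name ReportedPatternCheck and the method conforms are made up for this example.

```java
import java.util.regex.Pattern;

final class ReportedPatternCheck {
    // The pattern from this report, copied verbatim.
    static final Pattern REPORTED = Pattern.compile(
        "^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$");

    // True when the whole candidate matches the reported pattern.
    static boolean conforms(String candidate) {
        return REPORTED.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(conforms("\">MO281733")); // false: the generated string
        System.out.println(conforms("AA222222D"));   // true: the kind of string expected
    }
}
```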


Environment:

  • OS: Windows 10

  • JDK/JRE version
    java version "14.0.1" 2020-04-14
    Java(TM) SE Runtime Environment (build 14.0.1+7)
    Java HotSpot(TM) 64-Bit Server VM (build 14.0.1+7, mixed mode, sharing)

  • RgxGen Version or commit id

    <dependency>
        <groupId>com.github.curious-odd-man</groupId>
        <artifactId>rgxgen</artifactId>
        <version>1.3</version>
    </dependency>


Pigeon-Barry added the bug label Feb 12, 2021
curious-odd-man added this to the Version 1.4 milestone Feb 12, 2021
curious-odd-man added a commit that referenced this issue Feb 13, 2021
Regression tests created.
curious-odd-man linked a pull request Feb 13, 2021 that will close this issue
curious-odd-man added the enhancement label Feb 13, 2021
curious-odd-man added a commit that referenced this issue Feb 13, 2021
curious-odd-man changed the title from "^((?!(BG|GB|KN|NK|NT|TN|ZZ)|(D|F|I|Q|U|V)[A-Z]|[A-Z](D|F|I|O|Q|U|V))[A-Z]{2})[0-9]{6}[A-D]?$ Generates an invalid String" to "Lookaround does not work when it influences other part of pattern" Feb 13, 2021
curious-odd-man added a commit that referenced this issue Feb 16, 2021
Ideas for the solution
curious-odd-man (Owner) commented

I can partially solve this issue and throw an exception in the cases where I cannot handle the lookaround.

My idea is to handle those patterns where the text matched by the lookaround is shorter than or equal in length to the part that the lookaround influences.
For example:
In (?!BG)[A-Z]{2}, the part under the negative lookahead is 2 characters long and the influenced part, [A-Z]{2}, is also 2 characters long. I can handle it by retrying the [A-Z]{2} part until it satisfies the restriction.
I could handle (?!B)[A-Z]{2} or (?!.X)[A-Z]{2} the same way.
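
A minimal sketch of this retry idea, assuming the influenced part is exactly [A-Z]{2} and the lookahead is negative. The class RetryInfluencedPart, its methods and the maxAttempts limit are hypothetical names for illustration and not RgxGen code.

```java
import java.util.Random;
import java.util.regex.Pattern;

final class RetryInfluencedPart {
    // Produces one random value for the influenced part [A-Z]{2}.
    static String twoUppercaseLetters(Random random) {
        char first = (char) ('A' + random.nextInt(26));
        char second = (char) ('A' + random.nextInt(26));
        return "" + first + second;
    }

    // Regenerates the [A-Z]{2} part until its beginning is no longer matched
    // by the body of the negative lookahead (for example "BG", "B" or ".X").
    static String generate(String lookaheadBody, Random random, int maxAttempts) {
        Pattern forbidden = Pattern.compile(lookaheadBody);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = twoUppercaseLetters(random);
            // lookingAt() anchors only at the start, which mirrors how a
            // lookahead inspects the text that follows its position.
            if (!forbidden.matcher(candidate).lookingAt()) {
                return candidate;
            }
        }
        throw new IllegalStateException(
            "No valid value found in " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(generate("BG", random, 100)); // any pair except "BG"
        System.out.println(generate("B", random, 100));  // never starts with "B"
        System.out.println(generate(".X", random, 100)); // second letter is never "X"
    }
}
```

This only works because the lookahead body is no longer than the influenced part, so the candidate alone contains everything the lookahead can see.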

Funnily enough, I could also handle this pattern:
[image]
A pattern like this one, though, could be hard to handle:
(?!X+)[A-Z]{2}[CDE]

curious-odd-man (Owner) commented

Or I could go the easy way first: generate the text and then verify that it satisfies all the lookaround constraints. If it does not, regenerate; if it does, return it to the user. It is brute force, but the easiest to implement. I can think about performance improvements for special cases later.
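
The brute-force approach could look roughly like the sketch below. It is an illustration under assumptions, not the actual implementation: GenerateAndVerify, generateMatching and maxAttempts are hypothetical names, the generator is passed in as a plain Supplier so the example does not depend on RgxGen, and verification is delegated to java.util.regex.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

final class GenerateAndVerify {
    // Asks the generator for candidates until one fully matches the pattern,
    // lookarounds included, or the attempt limit is reached.
    static String generateMatching(Supplier<String> generator,
                                   Pattern pattern,
                                   int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = generator.get();
            if (pattern.matcher(candidate).matches()) {
                return candidate; // verified, safe to hand to the user
            }
        }
        // The limit keeps an unsatisfiable pattern from looping forever.
        throw new IllegalStateException(
            "No matching text generated in " + maxAttempts + " attempts");
    }
}
```

In this thread the supplier would presumably wrap the generate() call from the report, for example () -> new RgxGen(regex).generate(). The cost depends on how often the unchecked generator produces a valid string, which is why special-case optimizations may still be worth adding later.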

curious-odd-man added a commit that referenced this issue Aug 21, 2021
First implementation
curious-odd-man added a commit that referenced this issue Aug 21, 2021
This kind of almost works
curious-odd-man added a commit that referenced this issue Aug 29, 2021
Fixing some tests.
curious-odd-man added a commit that referenced this issue Aug 29, 2021
Fixing some tests.
curious-odd-man added a commit that referenced this issue Aug 29, 2021
Fixing some tests.
curious-odd-man added a commit that referenced this issue Aug 29, 2021
curious-odd-man removed this from the Version 1.5 milestone Jun 1, 2023