Merge pull request #69 from curious-odd-man/dev
release 1.4
curious-odd-man committed Aug 12, 2022
2 parents 55f4f2a + dfd27c9 commit a32ad07
Showing 16 changed files with 150 additions and 76 deletions.
102 changes: 68 additions & 34 deletions README.md
@@ -22,6 +22,7 @@ This is a java library that, given a regex pattern, allows to:
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.curious-odd-man/rgxgen/badge.svg?style=plastic)](https://search.maven.org/search?q=a:rgxgen)
[![javadoc](https://javadoc.io/badge2/com.github.curious-odd-man/rgxgen/javadoc.svg?style=plastic)](https://javadoc.io/doc/com.github.curious-odd-man/rgxgen)


Build status:

| Latest Release | Latest snapshot |
@@ -31,7 +32,7 @@ Build status:

## Try it now!!!

Follow the link to Online IDE with created project: [JDoodle](https://www.jdoodle.com/a/2fPm).
Follow the link to Online IDE with created project: [JDoodle](https://www.jdoodle.com/a/2Q6T).
Enter your pattern and see the results.

## Usage
@@ -43,16 +44,42 @@ Enter your pattern and see the results.
<dependency>
<groupId>com.github.curious-odd-man</groupId>
<artifactId>rgxgen</artifactId>
<version>1.3</version>
<version>1.4</version>
</dependency>
```

#### The Latest SNAPSHOT:
```xml
<project>
<repositories>
<repository>
<id>snapshots-repository</id>
<url>https://oss.sonatype.org/content/repositories/snapshots/</url>
</repository>
</repositories>

<!-- .... -->

<dependency>
<groupId>com.github.curious-odd-man</groupId>
<artifactId>rgxgen</artifactId>
<version>1.5-SNAPSHOT</version>
</dependency>
</project>
```

Changes in snapshot:

None at the moment.

---
### Code:
```java
public class Main {
public static void main(String[] args){
RgxGen rgxGen = new RgxGen("[^0-9]*[12]?[0-9]{1,2}[^0-9]*"); // Create generator
String s = rgxGen.generate(); // Generate new random value
BigInteger estimation = rgxGen.numUnique(); // The estimation (not accurate, see Limitations) of how many unique values can be generated with this pattern.
Optional<BigInteger> estimation = rgxGen.getUniqueEstimation(); // The estimation (not accurate, see Limitations) of how many unique values can be generated with this pattern.
StringIterator uniqueStrings = rgxGen.iterateUnique(); // Iterate over unique values (not accurate, see Limitations)
String notMatching = rgxGen.generateNotMatching(); // Generate a string that does not match the pattern
}
@@ -78,35 +105,36 @@ public class Main {
<details>
<summary><b>Supported syntax</b></summary>

| Pattern | Description |
| ---------: |-------------|
| `.` | Any symbol |
| `?` | One or zero occurrences |
| `+` | One or more occurrences |
| `*` | Zero or more occurrences |
| `\r` | Carriage return `CR` character |
| `\t` | Tab ` ` character |
| `\n` | Line feed `LF` character. |
| `\d` | A digit. Equivalent to `[0-9]` |
| `\D` | Not a digit. Equivalent to `[^0-9]` |
| `\s` | Carriage Return, Space, Tab, Newline, Vertical Tab, Form Feed |
| `\S` | Anything, but Carriage Return, Space, Tab, Newline, Vertical Tab, Form Feed |
| `\w` | Any word character. Equivalent to `[a-zA-Z0-9_]` |
| `\W` | Anything but a word character. Equivalent to `[^a-zA-Z0-9_]` |
| `\i` | Places same value as capture group with index `i`. `i` is any integer number. |
| `\Q` and `\E` | Any characters between `\Q` and `\E`, including metacharacters, will be treated as literals. |
| `\b` and `\B` | These characters are ignored. No validation is performed! |
| `\xXX` and `\x{XXXX}` | Hexadecimal value of unicode characters 2 or 4 digits |
| `{a}` and `{a,b}` | Repeat a; or min a max b times. Use {n,} to repeat at least n times. |
| `[...]` | Single character from ones that are inside brackets. `[a-zA-Z]` (dash) also supported |
| `[^...]` | Single character except the ones in brackets. `[^a]` - any symbol except 'a' |
| `()` | To group multiple characters for the repetitions |
| `foo(?=bar)` and `(?<=foo)bar` | Positive lookahead and lookbehind. These are equivalent to `foobar` |
| `foo(?!bar)` and `(?<!foo)bar` | Negative lookahead and lookbehind. |
| <code>(a&#124;b)</code> | Alternatives |
| \\ | Escape character (use \\\\ (double backslash) to generate single \ character) |

RgxGen treats any other characters as literals and those are generated as is.
| Pattern | Description |
|-------------------------------:|----------------------------------------------------------------------------------------------|
| `.` | Any symbol |
| `?` | One or zero occurrences |
| `+` | One or more occurrences |
| `*` | Zero or more occurrences |
| `\r` | Carriage return `CR` character |
| `\t` | Tab ` ` character |
| `\n` | Line feed `LF` character. |
| `\d` | A digit. Equivalent to `[0-9]` |
| `\D` | Not a digit. Equivalent to `[^0-9]` |
| `\s` | Carriage Return, Space, Tab, Newline, Vertical Tab, Form Feed |
| `\S` | Anything, but Carriage Return, Space, Tab, Newline, Vertical Tab, Form Feed |
| `\w` | Any word character. Equivalent to `[a-zA-Z0-9_]` |
| `\W` | Anything but a word character. Equivalent to `[^a-zA-Z0-9_]` |
| `\i` | Places same value as capture group with index `i`. `i` is any integer number. |
| `\Q` and `\E` | Any characters between `\Q` and `\E`, including metacharacters, will be treated as literals. |
| `\b` and `\B` | These characters are ignored. No validation is performed! |
| `\xXX` and `\x{XXXX}`          | Unicode character given by 2 or 4 hexadecimal digits                                         |
| `\uXXXX`                       | Unicode character given by exactly 4 hexadecimal digits                                      |
| `{a}` and `{a,b}`              | Repeat exactly a times, or from a to b times. Use {n,} to repeat at least n times.           |
| `[...]`                        | A single character from those inside the brackets. Dash ranges such as `[a-zA-Z]` are supported |
| `[^...]`                       | A single character except those in the brackets. `[^a]` - any symbol except 'a'              |
| `()` | To group multiple characters for the repetitions |
| `foo(?=bar)` and `(?<=foo)bar` | Positive lookahead and lookbehind. These are equivalent to `foobar` |
| `foo(?!bar)` and `(?<!foo)bar` | Negative lookahead and lookbehind. |
| <code>(a&#124;b)</code> | Alternatives |
| \\ | Escape character (use \\\\ (double backslash) to generate single \ character) |

RgxGen treats any other characters as literals - those are generated as is.

</details>

@@ -194,8 +222,13 @@ public class Main {

## Limitations

### Lookahead and Lookbehind

Currently, these two have very limited support. Please refer to [#63](https://github.com/curious-odd-man/RgxGen/issues/63).
I'm currently working on a solution, but I cannot say when it will be ready.

### Estimation
`rgxGen.numUnique()` - might not be accurate, because it does not count actual unique values, but only the different states of each building block of the expression.
`rgxGen.getUniqueEstimation()` - might not be accurate, because it does not count actual unique values, but only the different states of each building block of the expression.
For example: `"(a{0,2}|b{0,2})"` will be estimated as 6, though the actual number of unique values is 5.
That is because the left and right alternatives can produce the same value (the empty string).
At the same time, `"(|(a{1,2}|b{1,2}))"` will be correctly estimated as 5, even though it generates the same values.
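
The overcount can be reproduced by hand. The sketch below is plain Java, not RgxGen's estimator: it lists the values of each alternative of `(a{0,2}|b{0,2})`, sums the per-alternative counts the way a per-block estimate would, and compares that with the size of the set of distinct values. `EstimationGap` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of why per-block estimation overcounts (not RgxGen code).
final class EstimationGap {
    // All values of c{min,max}; e.g. repeat('a', 0, 2) gives "", "a", "aa".
    static List<String> repeat(char c, int min, int max) {
        List<String> values = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                sb.append(c);
            }
            values.add(sb.toString());
        }
        return values;
    }

    // Per-block estimate: each alternative's count is summed independently.
    static int estimate() {
        return repeat('a', 0, 2).size() + repeat('b', 0, 2).size();
    }

    // Actual distinct values: the empty string is produced by both alternatives.
    static Set<String> distinct() {
        Set<String> values = new LinkedHashSet<>(repeat('a', 0, 2));
        values.addAll(repeat('b', 0, 2));
        return values;
    }
}
```

Here `estimate()` is 6 while `distinct()` holds only 5 strings, because `""` is counted once per alternative.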
@@ -207,7 +240,7 @@ For the similar reasons as with estimations - requested unique values iterator c
### Infinite patterns

By design, `a+`, `a*` and `a{n,}` patterns in regex imply that an unbounded number of characters should be matched.
When generating data that would mean values of infinite length might be generated.
When generating data, that would mean values of infinite length might be generated.
It is highly doubtful anyone would require a string of infinite length, so I've artificially limited repetitions in such patterns to 100 symbols when generating random values.
This value can be changed; please refer to the [configuration](https://github.com/curious-odd-man/RgxGen#configuration) section.
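
A minimal sketch of such a cap, assuming it acts as an upper bound on the repetition count so that `a*` behaves like `a{0,100}` and `a+` like `a{1,100}`. How RgxGen applies the limit to `a{n,}` with a large `n` is not stated here; the sketch simply never goes below `n`. `CappedRepetition` and `MAX_INFINITE_REPEAT` are hypothetical names, not RgxGen's configuration API.

```java
import java.util.Random;

// Hypothetical illustration of limiting "infinite" repetition (not RgxGen code).
final class CappedRepetition {
    // Assumed cap, matching the default of 100 described above.
    static final int MAX_INFINITE_REPEAT = 100;

    // Random value for c{min,}: length is drawn uniformly from [min, max(min, cap)].
    static String generate(char c, int min, Random random) {
        int max = Math.max(min, MAX_INFINITE_REPEAT);
        int length = min + random.nextInt(max - min + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```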

@@ -233,6 +266,7 @@ Though I found they have following issues:
1. All of them build a graph, which can easily produce an OOM exception. For example, pattern `a{60000}`, or [IPV6 regex pattern](https://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses).
1. Alternatives: only with 2 alternatives does each alternative have an equal probability of appearing in generated values. For example, for `(a|b)` the probabilities of a and b are equal. For `(a|b|c)` one would expect a, b and c with probability 33.(3)% each, but in practice the probabilities are a=50%, b=25% and c=25%. With longer lists of alternatives you might never get the last alternative.
1. They are quite slow
1. Lightweight. This library does not have any dependencies.
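
The 50/25/25 split is what you get when `(a|b|c)` is treated as nested binary choices, `(a|(b|c))`. The sketch below contrasts that with a single uniform pick over all alternatives using `java.util.Random.nextInt(n)`. It illustrates the two strategies only; it is not code from any of the libraries mentioned, and `AlternativeChoice` is a hypothetical name.

```java
import java.util.Random;

// Hypothetical comparison of two ways to choose among `count` alternatives.
final class AlternativeChoice {
    // Nested binary choice, (a|(b|c)): each step keeps the current alternative with
    // probability 1/2 and the last one takes whatever remains, so a=50%, b=25%, c=25%.
    static int pickNested(int count, Random random) {
        int index = 0;
        while (index < count - 1 && !random.nextBoolean()) {
            index++;
        }
        return index;
    }

    // Uniform choice: every alternative has probability 1/count.
    static int pickUniform(int count, Random random) {
        return random.nextInt(count);
    }

    // Counts how often each alternative is chosen over `trials` seeded draws.
    static int[] histogram(boolean uniform, int count, int trials, long seed) {
        Random random = new Random(seed);
        int[] hits = new int[count];
        for (int t = 0; t < trials; t++) {
            hits[uniform ? pickUniform(count, random) : pickNested(count, random)]++;
        }
        return hits;
    }
}
```

With 30000 trials over three alternatives, the nested histogram lands roughly at 15000/7500/7500 and the uniform one roughly at 10000 each.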

## Support

6 changes: 3 additions & 3 deletions pom.xml
@@ -6,7 +6,7 @@

<groupId>com.github.curious-odd-man</groupId>
<artifactId>rgxgen</artifactId>
<version>1.3</version>
<version>1.4</version>

<packaging>jar</packaging>

@@ -21,8 +21,8 @@
<min.maven.version>3.6.1</min.maven.version>

<!-- Dependencies versions -->
<junit.version>4.13.1</junit.version>
<jmh.version>1.26</jmh.version>
<junit.version>4.13.2</junit.version>
<jmh.version>1.35</jmh.version>

<!-- Plugins versions -->
<maven.compiler.plugin.version>3.8.1</maven.compiler.plugin.version>
@@ -18,7 +18,7 @@

import java.util.NoSuchElementException;

public class ArrayIterator extends StringIterator {
public class ArrayIterator implements StringIterator {

private final int aMaxIndex;
private final Character[] aStrings;
@@ -36,7 +36,7 @@ public boolean hasNext() {
}

@Override
public String nextImpl() {
public String next() {
++aIndex;
if (aIndex >= aStrings.length) {
throw new NoSuchElementException("Not enough elements in arrays");
@@ -22,7 +22,7 @@
import java.util.OptionalInt;
import java.util.TreeMap;

public class CaseVariationIterator extends StringIterator {
public class CaseVariationIterator implements StringIterator {
private final String aOriginalValue;
private final StringBuilder aValue;
private final TreeMap<Integer, Boolean> aSwitchableCharPositions; // true - lower, false - upper case
@@ -44,7 +44,7 @@ public CaseVariationIterator(String value) {
}

@Override
protected String nextImpl() {
public String next() {
if (!hasNext) {
throw new NoSuchElementException("No more variations");
}
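
`CaseVariationIterator` walks through upper/lower-case spellings of a value, tracking switchable positions in a `TreeMap`. As a simpler, eager stand-in for the same idea, the sketch below enumerates every case combination of the letters in a string with a bitmask and leaves non-letters unchanged. `CaseVariations.all` is a hypothetical helper, and its output order is not necessarily the iterator's order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical eager enumeration of case variations (not RgxGen code).
final class CaseVariations {
    static List<String> all(String value) {
        // Positions whose character has distinct lower- and upper-case forms.
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.toLowerCase(c) != Character.toUpperCase(c)) {
                positions.add(i);
            }
        }
        if (positions.size() > 30) {
            throw new IllegalArgumentException("Too many letters for an int bitmask");
        }
        List<String> result = new ArrayList<>();
        // Bit k of the mask decides the case of the k-th switchable position.
        for (int mask = 0; mask < (1 << positions.size()); mask++) {
            char[] chars = value.toCharArray();
            for (int bit = 0; bit < positions.size(); bit++) {
                int pos = positions.get(bit);
                chars[pos] = (mask & (1 << bit)) != 0
                        ? Character.toUpperCase(chars[pos])
                        : Character.toLowerCase(chars[pos]);
            }
            result.add(new String(chars));
        }
        return result;
    }
}
```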
@@ -18,7 +18,7 @@

import java.util.NoSuchElementException;

public class ChoiceIterator extends StringIterator {
public class ChoiceIterator implements StringIterator {
private final StringIterator[] aIterators;

private int aCurrentIteratorIndex;
@@ -33,7 +33,7 @@ public boolean hasNext() {
}

@Override
public String nextImpl() {
public String next() {
if (!aIterators[aCurrentIteratorIndex].hasNext()) {
++aCurrentIteratorIndex;
if (aCurrentIteratorIndex >= aIterators.length) {
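
From the fragment above, `ChoiceIterator` appears to exhaust the iterator of one alternative before moving on to the next (`++aCurrentIteratorIndex` once the current one has no more values). A minimal sketch of that chaining over plain `java.util.Iterator<String>`, with the hypothetical names `ChainedChoice` and `drain`; this version skips empty alternatives inside `hasNext()`, which may differ from the real class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for ChoiceIterator: yields every value of the first
// alternative, then every value of the second, and so on.
final class ChainedChoice implements Iterator<String> {
    private final List<Iterator<String>> alternatives;
    private int currentIndex;

    ChainedChoice(List<Iterator<String>> alternatives) {
        this.alternatives = alternatives;
    }

    @Override
    public boolean hasNext() {
        // Move past exhausted (or empty) alternatives without consuming values.
        while (currentIndex < alternatives.size() && !alternatives.get(currentIndex).hasNext()) {
            currentIndex++;
        }
        return currentIndex < alternatives.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more alternatives");
        }
        return alternatives.get(currentIndex).next();
    }

    // Convenience for demos: collects everything the iterator yields.
    static List<String> drain(Iterator<String> iterator) {
        List<String> values = new ArrayList<>();
        while (iterator.hasNext()) {
            values.add(iterator.next());
        }
        return values;
    }
}
```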
@@ -21,7 +21,7 @@
import java.util.NoSuchElementException;
import java.util.function.Supplier;

public class IncrementalLengthIterator extends StringIterator {
public class IncrementalLengthIterator implements StringIterator {
private final Supplier<StringIterator> aSupplier;
private final int aMin;
private final int aMax;
@@ -79,7 +79,7 @@ private void extendIterators() {
}

@Override
public String nextImpl() {
public String next() {
if (aCurrentLength == 0) {
++aCurrentLength;
return "";
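
Judging by its name, its fields (`aSupplier`, `aMin`, `aMax`) and the empty-string case above, `IncrementalLengthIterator` seems to produce repetitions from the shortest allowed length to the longest. The sketch below shows that ordering eagerly for a fixed alphabet: all strings of length `min`, then `min + 1`, up to `max`. `IncrementalLength.all` is a hypothetical helper that builds a list rather than iterating lazily.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: all strings over `alphabet` whose length is in
// [min, max], shortest first. Built eagerly for clarity.
final class IncrementalLength {
    static List<String> all(String alphabet, int min, int max) {
        List<String> result = new ArrayList<>();
        List<String> layer = new ArrayList<>();
        layer.add(""); // the only string of length 0
        for (int length = 0; length <= max; length++) {
            if (length >= min) {
                result.addAll(layer);
            }
            if (length == max) {
                break; // no need to build longer strings
            }
            // Extend every string of the current length by one character.
            List<String> next = new ArrayList<>();
            for (String prefix : layer) {
                for (char c : alphabet.toCharArray()) {
                    next.add(prefix + c);
                }
            }
            layer = next;
        }
        return result;
    }
}
```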
@@ -18,7 +18,7 @@

import java.util.regex.Pattern;

public class NegativeStringIterator extends StringIterator {
public class NegativeStringIterator implements StringIterator {
private final StringIterator aIterator;
private final Pattern aPattern;

@@ -30,7 +30,7 @@ public NegativeStringIterator(StringIterator iterator, Pattern pattern) {
}

@Override
protected String nextImpl() {
public String next() {
do {
aValue = aIterator.next();
} while (aPattern.matcher(aValue)
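
`NegativeStringIterator` draws values from another iterator and appears to keep drawing while the drawn value matches the pattern. The loop condition is cut off after `aPattern.matcher(aValue)` in this excerpt; the sketch below assumes a full-string `matches()` test, so a candidate is kept only when the pattern does not match it as a whole. `NotMatchingFilter` is a hypothetical name, and it collects results into a list instead of iterating lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: keep only candidates that the pattern does NOT match as a whole.
final class NotMatchingFilter {
    static List<String> notMatching(Iterator<String> candidates, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        while (candidates.hasNext()) {
            String value = candidates.next();
            // matches() requires the entire string to match, not just a substring.
            if (!pattern.matcher(value).matches()) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```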
@@ -22,7 +22,7 @@
import java.util.NoSuchElementException;
import java.util.function.Supplier;

public class PermutationsIterator extends StringIterator {
public class PermutationsIterator implements StringIterator {
private final StringIterator[] aIterators;

private boolean aInitialized;
@@ -47,7 +47,7 @@ public boolean hasNext() {
}

@Override
public String nextImpl() {
public String next() {
// Initialize all value
if (aInitialized) {
// Advance one of iterators
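
`PermutationsIterator` combines one value from each sub-iterator and, per the comment above, advances one of them per step. A common way to do this is odometer order: bump the rightmost position and carry into the next one to the left when it wraps. The sketch below applies that to lists of strings; which position RgxGen advances first is an assumption here, so the real output order may differ. `Odometer.combinations` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every concatenation of one value per block, in odometer order.
final class Odometer {
    static List<String> combinations(List<List<String>> blocks) {
        List<String> result = new ArrayList<>();
        for (List<String> block : blocks) {
            if (block.isEmpty()) {
                return result; // an empty block admits no combination
            }
        }
        int[] indexes = new int[blocks.size()];
        while (true) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < blocks.size(); i++) {
                sb.append(blocks.get(i).get(indexes[i]));
            }
            result.add(sb.toString());
            // Advance the rightmost position; on overflow reset it and carry left.
            int pos = blocks.size() - 1;
            while (pos >= 0) {
                indexes[pos]++;
                if (indexes[pos] < blocks.get(pos).size()) {
                    break;
                }
                indexes[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return result; // every position wrapped around: all combinations emitted
            }
        }
    }
}
```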
@@ -16,7 +16,9 @@
limitations under the License.
/* **************************************************************************/

public class ReferenceIterator extends StringIterator {
import java.util.NoSuchElementException;

public class ReferenceIterator implements StringIterator {
private StringIterator aOther;
private boolean hasNext = true;
private String aLast;
@@ -26,7 +28,10 @@ public void setOther(StringIterator other) {
}

@Override
protected String nextImpl() {
public String next() {
if (!hasNext()) {
throw new NoSuchElementException("Cannot return value second time");
}
hasNext = false;
return aOther.current();
}
@@ -18,7 +18,7 @@

import java.util.NoSuchElementException;

public class SingleValueIterator extends StringIterator {
public class SingleValueIterator implements StringIterator {
private final String aValue;

private boolean hasNext;
@@ -38,7 +38,7 @@ public boolean hasNext() {
}

@Override
public String nextImpl() {
public String next() {
if (!hasNext) {
throw new NoSuchElementException("Cannot return a value second time.");
}
@@ -17,25 +17,21 @@
/* **************************************************************************/

import java.util.Iterator;
import java.util.NoSuchElementException;

public abstract class StringIterator implements Iterator<String> {
@SuppressWarnings("IteratorNextCanNotThrowNoSuchElementException")
@Override
public String next() {
return nextImpl();
}
public interface StringIterator extends Iterator<String> {
/**
* Reset the iterator to the initial position.
* After reset it will start iterating from the first value.
* <p>
* Can be used to restart iterator that returns {@code false} when {@code hasNext()} is called.
*/
void reset();

/**
* This method returns correct value only on top level iterator.
* For other iterators 2 steps are required - next() and then current().
* Return same value as last call to {@code next()}.
* Behavior is not defined if method is called before {@code next()}
*
* @return next String.
* @throws NoSuchElementException if the iteration has no more elements
* @return Value returned by last call to {@code next()}.
*/
protected abstract String nextImpl();

public abstract void reset();

public abstract String current();
String current();
}
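
With `StringIterator` now an interface, each implementation overrides `next()` directly instead of the former `nextImpl()` hook, and `current()` is documented as returning the same value as the last `next()` call. A minimal sketch of a class honoring that contract: `StringIteratorContract` is a local stand-in that mirrors the interface shown above, and `OneValueIterator` is a hypothetical class modeled on `SingleValueIterator`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Local stand-in mirroring the StringIterator interface from the diff above.
interface StringIteratorContract extends Iterator<String> {
    void reset();

    String current();
}

// Hypothetical single-value implementation following that contract.
final class OneValueIterator implements StringIteratorContract {
    private final String value;
    private boolean hasNext = true;

    OneValueIterator(String value) {
        this.value = value;
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public String next() {
        if (!hasNext) {
            throw new NoSuchElementException("Cannot return a value second time.");
        }
        hasNext = false;
        return value;
    }

    @Override
    public void reset() {
        hasNext = true; // start over from the first (and only) value
    }

    @Override
    public String current() {
        return value; // same value the last next() returned
    }
}
```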
@@ -332,6 +332,17 @@ private int parseHexadecimal() {
return Integer.parseInt(hexValue, HEX_RADIX);
}

/**
* Parse unicode hexadecimal string into an integer value.
* Format: NNNN
*
* @return integer value
*/
private int parseUnicode() {
String hexValue = aCharIterator.next(4);
return Integer.parseInt(hexValue, HEX_RADIX);
}

/**
* Create group reference node.
* It starts after escape character AND first digit of group index.
@@ -392,6 +403,10 @@ private void handleEscapedCharacter(StringBuilder sb, Collection<Node> nodes, bo
sb.append((char) parseHexadecimal());
break;

case 'u':
sb.append((char) parseUnicode());
break;

case 'Q':
sb.append(aCharIterator.nextUntil("\\E"));
break;
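
The new `case 'u'` reads exactly four characters after `\u` and parses them in hexadecimal, mirroring the existing `\x` handling. The sketch below shows the same conversion for the three escape forms listed in the syntax table; `HexEscapes.decode` is a hypothetical helper that takes the text after the backslash and assumes well-formed input.

```java
// Hypothetical helper decoding the hexadecimal escapes from the syntax table.
// `body` is the text after the backslash, e.g. "x41", "x{263A}" or "u00E9".
final class HexEscapes {
    private static final int HEX_RADIX = 16;

    static char decode(String body) {
        if (body.startsWith("u")) {
            // Exactly four hex digits follow the 'u', as parseUnicode() reads them.
            return (char) Integer.parseInt(body.substring(1, 5), HEX_RADIX);
        }
        if (body.startsWith("x{")) {
            // Braced form: digits up to the closing brace.
            int end = body.indexOf('}');
            return (char) Integer.parseInt(body.substring(2, end), HEX_RADIX);
        }
        if (body.startsWith("x")) {
            // Short form: exactly two hex digits.
            return (char) Integer.parseInt(body.substring(1, 3), HEX_RADIX);
        }
        throw new IllegalArgumentException("Not a hexadecimal escape: " + body);
    }
}
```

Like the parser code, it casts the result to `char`, which is enough for the at most four hex digits these forms allow.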