Inverse regex #1

jayvdb · 2020-02-13T11:32:01Z

It appears that your inverse regex from https://www.mail-archive.com/python-list@python.org/msg125198.html isn't in this repo, or anywhere else. However, it is mentioned in https://stackoverflow.com/questions/17518554/how-to-reverse-a-regex-in-python/49389042 and there are a few unattributed copies on GitHub.

IMO it is a very useful solution, and warrants being its own Python library on PyPI for easy adoption.

https://github.com/pyparsing/pyparsing/blob/master/examples/invRegex.py and https://pypi.org/project/er/ exist, and some other data-fuzzing code is available, but few of them efficiently generate the full set of permutations of a regex. The old code is py2 only, but the changes needed to support py3 are quite small.

jayvdb · 2020-02-13T16:02:39Z

My current patch for py3 is:

--- regex_inverter.py   2020-02-13 17:25:40.000000000 +0700
+++ regex_inverter.py  2020-02-13 22:46:59.652203153 +0700
@@ -5,10 +5,11 @@
 import sre_parse
 import string
 
+# Note string.ascii_letters is not the same as \w which is unicode by default
 category_chars = {
     CATEGORY_DIGIT: string.digits,
     CATEGORY_SPACE: string.whitespace,
-    CATEGORY_WORD: string.digits + string.letters + '_'
+    CATEGORY_WORD: string.digits + string.ascii_letters + '_',
 }
 
 
@@ -26,7 +27,8 @@
     return string.printable
 
 
-def handle_branch((tok, val)):
+def handle_branch(val):
+    tok, val = val
     all_opts = []
     for toks in val:
         opts = permute_toks(toks)
@@ -49,10 +51,11 @@
     return [chr(val)]
 
 
-def handle_max_repeat((min, max, val)):
+def handle_max_repeat(val):
     """
     Handle a repeat token such as {x,y} or ?.
     """
+    (min, max, val) = val
     subtok, subval = val[0]
 
     if max > 5000:
@@ -76,7 +79,7 @@
 
 
 def handle_subpattern(val):
-    return list(permute_toks(val[1]))
+    return list(permute_toks(val[3]))
 
 
 def handle_tok(tok, val):
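
For anyone applying the patch, here is a minimal sketch of two of the py3 changes in isolation. It is not taken from regex_inverter.py: `WORD_CHARS`, `unpack_repeat` and `is_word_char` are hypothetical names used only for illustration. It shows that tuple parameters in a `def` have to be unpacked inside the body on Python 3, and that `string.ascii_letters` covers fewer characters than `\w`, which is Unicode-aware by default for `str` patterns.

```python
import re
import string

# Same character set the patch uses for CATEGORY_WORD.
WORD_CHARS = string.digits + string.ascii_letters + "_"


def unpack_repeat(val):
    # Python 2 allowed `def f((lo, hi, sub)):`; Python 3 does not,
    # so the tuple is unpacked in the function body instead.
    lo, hi, sub = val
    return lo, hi, sub


def is_word_char(ch, ascii_only=False):
    # Hypothetical helper: does a single character match \w?
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w", ch, flags) is not None


if __name__ == "__main__":
    print(unpack_repeat((0, 1, "x")))
    print(is_word_char("é"), "é" in WORD_CHARS)
    print(is_word_char("é", ascii_only=True))
```

So the generated strings only cover the ASCII subset of what `\w` can match, unless the pattern is compiled with `re.ASCII`.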

jayvdb · 2020-02-14T17:44:15Z

I see sre-yield also tries to do this, while input-generator, randre, and xeger (and rstr) each produce a single match.
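
To make the distinction concrete, here is a rough sketch of the two approaches. It does not use any of the libraries named above, and `all_matches` and `one_match` are hypothetical helpers. It assumes the pattern `[ab][01]` has already been expanded by hand into one set of candidate characters per position.

```python
import itertools
import random
import re

# Hand-expanded form of the pattern [ab][01]: one candidate set per position.
PATTERN = r"[ab][01]"
POSITIONS = ["ab", "01"]


def all_matches(positions):
    # Full enumeration: every combination of one character per position.
    for combo in itertools.product(*positions):
        yield "".join(combo)


def one_match(positions, rng):
    # Single sample: pick one character at random for each position.
    return "".join(rng.choice(chars) for chars in positions)


if __name__ == "__main__":
    everything = list(all_matches(POSITIONS))
    print(everything)
    sample = one_match(POSITIONS, random.Random(0))
    print(sample, re.fullmatch(PATTERN, sample) is not None)
```

Full enumeration grows multiplicatively with each position, which is why doing it efficiently matters for larger patterns.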

bjourne · 2020-02-15T14:31:51Z

Thank you. I'll take a look at this shortly

bjourne · 2020-03-23T00:46:47Z

Thank you for the suggestions! I've reimplemented the code and added it to my repository. I don't have time to maintain it as a Python package, but if you or anyone else needs it, feel free to copy the code.

jayvdb · 2020-03-23T03:15:19Z

Thanks @bjourne

bjourne closed this as completed Mar 23, 2020

This was referenced Mar 23, 2020

inverse_regexp.py Slater-Victoroff/PyFuzz#4

Open

inverse_regexp.py matiasherranz/talks-and-django-misc#1

Open

inverse_regexp.py CodeShane/codeshane-android#1

Open

inverse_regexp.py totalgood/guten#1

Open

inverse_regexp.py dkilcy/python-sandbox#1

Open
