Pyttern is a Python library for creating pattern files for Python code.
To install the pyttern library, you can use pip:
pip install git+https://github.com/JulienLie/pyttern.git
You can also clone the repository and install it manually:
- clone the repository
- go to the pyttern directory
- run the following command:
pip install -e .
We extended the Python syntax to create pattern files. Our new syntax includes the following elements:
Wildcard | Description |
---|---|
? | Match 1 element |
?* | Match 0 or more elements |
?name | Match 1 element and bind it to name |
?: | Match 1 element with a body |
?:* | Match the body of the wildcard in any indentation |
?<...> | Match if the inside of the wildcard is contained inside the matching node |
In addition to these wildcards, we added some optional elements to allow more options:
Option | Description |
---|---|
?[Type, ...] | Match 1 element of type Type |
?{n, m} | Match between n and m elements |
import Matcher

code = "code_file.py"
pattern = "pattern_file.py"
match = Matcher.match_files(pattern, code, strict_match=False, match_details=False)
if match:
    print("We found a match")
else:
    print("No match")
The match_files function takes 4 arguments:
- pattern_file: string. The path to the file describing the pattern.
- code_file: string. The path to the Python code file.
- strict_match: boolean (optional). When strict_match is set to True, a strict match is performed. A strict match requires an exact match between the code file and the pattern file, including code structure and syntax. If strict_match is set to False, a "soft" match is performed, which allows for flexibility in code sections using wildcards.
- match_details: boolean (optional). If match_details is set to True, the function returns a tuple (result, details), where result is a boolean value indicating whether the code matches the pattern. If result is True, details contains the match details. If result is False, details contains the error that prevented the match.
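Because the return value is either a boolean or a (result, details) tuple depending on match_details, calling code that must handle both shapes may want to normalize it first. The following is a minimal sketch; unpack_match is a hypothetical helper written for this illustration and is not part of Pyttern.

```python
from typing import Any, Optional, Tuple, Union

# Either a plain boolean or a (result, details) tuple, as described above.
MatchResult = Union[bool, Tuple[bool, Any]]


def unpack_match(result: MatchResult) -> Tuple[bool, Optional[Any]]:
    """Hypothetical helper: return (matched, details) for both return shapes."""
    if isinstance(result, tuple):
        matched, details = result
        return bool(matched), details
    # Plain boolean: no details are available.
    return bool(result), None


print(unpack_match(True))                      # (True, None)
print(unpack_match((False, "some error")))     # (False, 'some error')
```

With such a helper, the caller can always write `matched, details = unpack_match(...)` regardless of how match_details was set.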
The ? wildcard matches any single element in the code. For example, the pattern ? = 0 will match any assignment statement where the right-hand side is 0.

def ?():
    ? = 0
    return ?

def foo():
    x = 0
    return "bar"
The ?* wildcard matches zero or more elements in the code. For example, the pattern ?* will match any sequence of elements.

def foo(?*):
    ?*
    a = 0
    ?*
    return a

def foo(x, y, z):
    x = 1
    y = 2
    z = 3
    a = 0
    if a == 0:
        return a
    return a
The ?name wildcard matches a single element in the code and binds it to the name name. For example, the pattern ?name = 0 will match any assignment statement where the right-hand side is 0 and bind the left-hand side to the name name.

def foo(?name):
    ?name.append(0)
    return ?name

def foo(lst):
    lst.append(0)
    return lst
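In the example above, every occurrence of ?name lines up with the same identifier, lst. One way to picture this is a table of bindings in which a name, once bound, must keep the same value. The sketch below assumes that reading of the example; bind is a hypothetical function for illustration and does not reflect Pyttern's internals.

```python
from typing import Dict


def bind(bindings: Dict[str, str], name: str, value: str) -> bool:
    """Hypothetical helper: bind `name` to `value`, or check an existing binding.

    Returns False when `name` is already bound to a different value.
    """
    if name in bindings:
        return bindings[name] == value
    bindings[name] = value
    return True


bindings: Dict[str, str] = {}
# The three occurrences of ?name in the pattern all meet `lst` in the code.
print(all(bind(bindings, "name", seen) for seen in ["lst", "lst", "lst"]))  # True
# A different identifier would conflict with the existing binding.
print(bind(bindings, "name", "other"))  # False
```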
The ?: wildcard matches a single element in the code that has a body.

def foo():
    ?:
        x = 0
    return x

def foo():
    if True:
        x = 0
    return x
The ?:* wildcard matches its body at any level of indentation, including no additional nesting.

def foo():
    ?:*
        x = 0
    return x

def foo():
    if True:
        if True:
            x = 0
    return x

def foo():
    x = 0
    return x
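To make "any level of indentation" concrete, the sketch below uses Python's standard ast module to compute how deeply a statement is nested inside a function. It is only an illustration of the idea, not Pyttern's matching algorithm; depth_of, NESTED and FLAT are names invented here, and the search only descends into the body of compound statements (not else or finally branches).

```python
import ast
from typing import List, Optional

NESTED = "def foo():\n    if True:\n        if True:\n            x = 0\n    return x\n"
FLAT = "def foo():\n    x = 0\n    return x\n"


def depth_of(body: List[ast.stmt], target: str, depth: int = 0) -> Optional[int]:
    """Return the nesting depth of the first statement equal to `target`."""
    wanted = ast.dump(ast.parse(target).body[0])
    for stmt in body:
        if ast.dump(stmt) == wanted:
            return depth
        inner = getattr(stmt, "body", None)
        if isinstance(inner, list):  # compound statement such as if/for/while
            found = depth_of(inner, target, depth + 1)
            if found is not None:
                return found
    return None


print(depth_of(ast.parse(NESTED).body[0].body, "x = 0"))  # 2
print(depth_of(ast.parse(FLAT).body[0].body, "x = 0"))    # 0
```

Both depths are acceptable for the ?:* pattern above, since it does not constrain how many levels surround x = 0.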
The ?<...> wildcard matches a node that contains the pattern written inside the angle brackets.

def foo(x):
    y = ?<x>
    return y

def foo(x):
    y = 2*x + 1
    return y
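In the example above, ?<x> matches 2*x + 1 because the identifier x occurs somewhere inside that expression. The containment idea can be sketched with the standard ast module as follows; expression_contains_name is a hypothetical function for illustration, and Pyttern's own implementation may work differently.

```python
import ast


def expression_contains_name(source: str, name: str) -> bool:
    """Return True if the identifier `name` occurs anywhere in the expression."""
    tree = ast.parse(source, mode="eval")
    return any(
        isinstance(node, ast.Name) and node.id == name
        for node in ast.walk(tree)
    )


print(expression_contains_name("2*x + 1", "x"))  # True
print(expression_contains_name("2*z + 1", "x"))  # False
```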
You can combine wildcards to create more complex patterns.
TODO: Add more examples
def ?(?*):
    ?acc = 0
    ?:*
        for ? in ?:
            ?:*
                ?acc += ?
    return ?acc

def sum(lst):
    acc = 0
    for i in lst:
        acc += i
    return acc
In a soft match, the code may vary in structure and may include extra code, as long as the main matching criteria are met. In a strict match, the code must follow the pattern's structure and syntax precisely, with little to no allowance for variations or additional code outside the specified structure.
The wildcard ?![] is a notation that allows for a combination of strict and soft matching in certain parts of a code pattern. It is useful when you want to perform a soft match but have a strict match requirement within a specific section of code.
Let's consider an example to illustrate this. Suppose we have the following pattern:
def foo(bar):
    ?var = 0
    for ? in range(?*):
        ?![
            if ?:
                ?var += 1
        ]
In this pattern, each ? wildcard is a placeholder for a single element, such as an identifier. The ?var = 0 statement assigns the value 0 to a variable, which we'll refer to as x. The for ? in range(?*) loop iterates over a range of values, which we'll refer to as y. Finally, ?![ ... ] marks a section that must match strictly: the if statement it encloses may not contain any additional code.
Now, let's say we have the following code snippet:
def foo(bar):
    x = 0
    y = len(bar)
    for i in range(y):
        z = bar[i]
        if z:
            x += 1
    return x
We want to match this code snippet with the given pattern. Let's go through each line and see how the wildcard ?![] allows for matching.
- In the pattern, ?var = 0 matches the code x = 0 because the wildcard ?var represents the variable named x.
- The loop statement for ? in range(?*) in the pattern matches the code for i in range(y).
- Here, ? corresponds to the loop variable i, and ?* corresponds to the length of the bar list, which is stored in y.
- The strict match requirement ?![ ... ] checks the code within the if statement.
- In the pattern, ? represents the condition z, and ?var += 1 corresponds to x += 1 within the if block.
As a result, the code snippet matches the pattern because all the placeholders and the strict match requirements are satisfied.
However, if the if block contained additional code, such as a print statement like print("true"), the pattern would not match, because the strict match requirement ?![ ... ] does not accommodate that extra code.
In summary, the wildcard ?![] allows for a combination of soft and strict matching. It provides flexibility by allowing soft matches for variables and loop structures while enforcing strict matches for specific code sections. This helps in creating adaptable code patterns that can match similar code snippets with some variations.
To add some flexibility to our pattern we implemented different features.
We added some options to put on the wildcards:
- ?[Type1, Type2, ...] allows you to specify the type of the element to match.
- ?{n, m} allows you to specify the number of elements to match. It can be used with or without the type option. This option can be used in the following ways:
  1. ?{n, m}: Match between n and m elements
  2. ?{n, }: Match at least n elements
  3. ?{, m}: Match at most m elements
  4. ?{n}: Match exactly n elements

[//]: # (5. ?{0}: Create a not wildcard. For example, ?:{0} will ensure that the current element does not have a body.)
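The forms of the count option listed above can each be read as a pair of lower and upper bounds. The sketch below parses the text between the braces into such a pair, using None for "no upper bound" and assuming a lower bound of 0 when n is omitted. parse_bounds is a hypothetical helper for illustration; it is not Pyttern's parser.

```python
import re
from typing import Optional, Tuple

# Optional digits, optional comma, optional digits, all inside braces.
_QUANTIFIER = re.compile(r"^\{\s*(\d*)\s*(,?)\s*(\d*)\s*\}$")


def parse_bounds(text: str) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: turn '{n, m}', '{n, }', '{, m}' or '{n}' into bounds."""
    match = _QUANTIFIER.match(text.strip())
    if match is None:
        raise ValueError(f"not a count option: {text!r}")
    low, comma, high = match.groups()
    if not comma:
        # '{n}' form: exactly n elements.
        if not low or high:
            raise ValueError(f"not a count option: {text!r}")
        return int(low), int(low)
    # Assumption: a missing lower bound means 0, a missing upper bound means unbounded.
    return (int(low) if low else 0, int(high) if high else None)


print(parse_bounds("{1, 2}"))  # (1, 2)
print(parse_bounds("{2, }"))   # (2, None)
print(parse_bounds("{, 5}"))   # (0, 5)
print(parse_bounds("{3}"))     # (3, 3)
```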
The ?[Type, ...] option allows you to specify the type of the element to match. For example, the pattern ?[For] will match any for loop.
def foo():
    ?[For]:
        x = 0
    return x

def foo():
    for i in range(10):
        x = 0
    return x
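The type name For used above coincides with the node class ast.For in Python's standard ast module. Assuming type names follow the ast class names (an assumption of this sketch, not something stated here), the check can be pictured as an isinstance test; first_statement_has_type and CODE are names invented for this illustration.

```python
import ast
from typing import Sequence

CODE = "def foo():\n    for i in range(10):\n        x = 0\n    return x\n"


def first_statement_has_type(source: str, type_names: Sequence[str]) -> bool:
    """Check whether the first statement of the first function has one of the types."""
    func = ast.parse(source).body[0]
    first = func.body[0]
    # Assumption: each type name is the name of a node class in the ast module.
    allowed = tuple(getattr(ast, name) for name in type_names)
    return isinstance(first, allowed)


print(first_statement_has_type(CODE, ["For"]))          # True
print(first_statement_has_type(CODE, ["While", "If"]))  # False
```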
The ?{n, m} option allows you to specify the number of elements to match. For example, the pattern ?{1, 2} will match between 1 and 2 elements.

def foo():
    ?:{3} # I want to match exactly 3 levels of indentation
        x = 0
    return x

def foo():
    if True:
        if True:
            if True:
                x = 0
    return x
To combine several patterns with Boolean logic, we use a specific file structure. The file structure is composed as follows:

pattern_name
└── and
    ├── pattern1.pyt
    ├── pattern2.pyt
    └── or
        ├── not
        │   └── pattern3.pyt
        └── pattern4.pyt
The result is a Boolean logic tree. The and folder contains patterns that must all match. The or folder contains patterns of which at least one must match. The not folder contains patterns that must not match.
This allows the creation of more complex patterns with different files.
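The folder semantics can be sketched as a recursive evaluation over the tree, given a match result for each pattern file. In the sketch below the tree is a nested dictionary rather than a real directory, and the root folder is treated like and; both are assumptions made for illustration, and evaluate and build are hypothetical names, not part of Pyttern.

```python
from typing import Union

# A node is either a pattern's match result or a folder of child nodes.
Node = Union[bool, dict]


def evaluate(name: str, node: Node) -> bool:
    """Combine child results according to the folder name."""
    if isinstance(node, bool):
        return node  # leaf: result of matching one pattern file
    results = [evaluate(child, sub) for child, sub in node.items()]
    if name == "or":
        return any(results)
    if name == "not":
        return not any(results)  # none of the contained patterns may match
    return all(results)  # "and" folder, and (by assumption) the root folder


def build(p1: bool, p2: bool, p3: bool, p4: bool) -> dict:
    """Mirror the example structure, with one match result per pattern file."""
    return {
        "and": {
            "pattern1.pyt": p1,
            "pattern2.pyt": p2,
            "or": {"not": {"pattern3.pyt": p3}, "pattern4.pyt": p4},
        }
    }


print(evaluate("pattern_name", build(True, True, False, False)))  # True
print(evaluate("pattern_name", build(True, True, True, False)))   # False
```

In the first call pattern3 does not match, so the not folder is satisfied, which satisfies or and then and. In the second call pattern3 matches and pattern4 does not, so the or folder fails.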
To visualize the matching algorithm, we implemented a web visualization tool. If you installed Pyttern, you can run it using the following command:
pytternweb
You can then import your pyttern on the left side of the page and your code on the right side. You can control the matching algorithm using the buttons at the bottom of the page.