listregex

listregex implements the same functions as Python's stdlib re module (and a few more), but instead of operating only on strings, it operates on lists of arbitrary objects. If you've found yourself writing awkward code to extract subsequences from a list, and thought to yourself "this would be a tiny regex if my list was a string", then this is the library.

This is not a high-speed regex engine, as it currently uses naive backtracking in pure Python. On the other hand, there's greater flexibility in the patterns allowed, and even a mechanism for arbitrary tests.

Patterns can be:

A single literal value. Example: search(pattern=1, items=[1, 2]) matches [1].
A list/tuple of patterns, where the sub-patterns are matched sequentially. Example: search([1, 2], [0, 1, 2]) matches [1, 2].
A value from a helper function, such as optional(pattern), zero_or_more(pattern), end(), etc. Example: findall(repeat(1), my_list) finds all sequences of 1's.
Any combination of the above. Example: search(pattern=[1, repeat(negate(3)), 1], items=[0, 2, 1, 3, 1, 2, 1, 0]) matches [1, 2, 1].
A function that takes one parameter, the current match, and returns the number of following items that should be added to be match (note that True == 1 and False == 0). Returning 0 means no match and the engine backtracks. Examples:
- lambda m: 2 blindly accepts the next two items, such that findall(lambda m: 2, items) returns the items divided in pairs.
- lambda m: m.next % 2 == 0 checks if the next item is even, and if so, extends the match to include it.
- lambda m: m.items.count(m.next) > 1 matches all items that occur more than once.
- lambda m: m[0] > m.next compares the first item of the current match with the next.

from listregex import *

# Matches 1 and 3, optionally with a 2 between them:
fullmatch([1, optional(2), 3], [1, 3])
# Match(1, 3)

# A sequence of 1 or more items between 0 and 3:
search(repeat(lambda m: 0 < m.next <= 3), [0, 1, 2, 3, 4])
# Match(1, 2, 3)

from datetime import date, timedelta
from collections import namedtuple
Login = namedtuple('Login', 'country date')
logins = [
    Login('Germany', date(2020, 1, 1)),
    Login('Belgium', date(2020, 1, 2)),
    Login('Germany', date(2020, 3, 1)),
    Login('Germany', date(2020, 3, 2)),
    Login('Russia', date(2020, 3, 2)),
    Login('Russia', date(2020, 3, 2)),
    Login('Germany', date(2020, 3, 3)),
]
# Find suspicious logins by looking at quick country switches:
pattern = [
    # Start from any login...
    any(),
    
    # Followed by one or more logins at a different country...
    repeat(lambda m: m[0].country != m.next.country), 
    
    # Followed by a login at the original country within 2 days.
    lambda m: m.next.date - m[0].date < timedelta(days=2), 
]
search(pattern, logins)[1].country
# 'Russia'

# Collapses repeated elements.
sub([any(), zero_or_more(lambda m: m.next == m[0])], lambda m: [m[0]], [1, 2, 3, 3, 4, 5, 5])
# [1, 2, 3, 4, 5]

# Parses a binary array where elements are encoded as [length, *values].
findall(lambda m: int(m[0])+1, b'\x00\x01\x55\x02\x66\x66\x00')
# [b'\x00', b'\x01\x55', b'\x02\x66\x66', b'\x00']

# Finds all items that are bigger than the next, or at the `end` of the list.
# Uses `lookahead` to allow the next item to also be matched.
findall([any(), either(end(), lookahead(lambda m: m[0] > m.next))], [1, 2, 1, 3, 2, 4, 3, 1])
# [[2], [3], [4], [3], [1]]

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
README.md		README.md
listregex.py		listregex.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

listregex

About

Releases

Packages

Languages

boppreh/listregex

Folders and files

Latest commit

History

Repository files navigation

listregex

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages