# Welcome to REmatchLite!
- This program lets users experiment with the REmatch regex engine, created by Riveros et al., without knowledge of the RegEx or REQL languages.
- To use this notebook, you may need to install ipywidgets with `pip install ipywidgets`.
- You will also likely need to install pyrematch with `pip install pyrematch`.
- If a cell's GUI produces no output, rerun that cell manually. The "restart the kernel and rerun all cells" button seems to cause problems with ipywidgets.

Tip: For a sample text file to use, Project Gutenberg is a great resource. Example: https://www.gutenberg.org/cache/epub/84/pg84.txt


In [4]:
import ipywidgets as widgets
import pyrematch as REmatch
import codecs
import re

## Demonstration 1: Finding *all* Matches
- The key benefit of the REmatch RegEx engine is its **all-match semantics**.
- Typical RegEx engines do not return every possible match for the string you are looking for. When two matches overlap, the engine keeps one and discards the other.
- Below, you can compare the results of a simple REmatch query and a standard RegEx search, using a query of your choosing.
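
To make the difference concrete without installing anything, here is a minimal sketch that uses only Python's standard `re` module. The helper `all_match_spans` is a hypothetical name introduced for illustration; it imitates all-match output by brute force over every substring and is not how REmatch works internally.

```python
import re

def all_match_spans(pattern, text):
    """Return every (start, end) span whose substring fully matches pattern.

    Hypothetical helper: brute force over all substrings, so it is only
    suitable for short demo strings.
    """
    compiled = re.compile(pattern)
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly text[start:end]
            if compiled.fullmatch(text, start, end):
                spans.append((start, end))
    return spans

text = "kayakayakayakayak"

# Standard behavior: once a match is reported, the search resumes after it,
# so occurrences that overlap a reported match are skipped.
non_overlapping = [m.span() for m in re.finditer("kayak", text)]

# All-match behavior: every occurrence is reported, overlapping or not.
overlapping = all_match_spans("kayak", text)

print(non_overlapping)  # [(0, 5), (8, 13)]
print(overlapping)      # [(0, 5), (4, 9), (8, 13), (12, 17)]
```

The standard search reports two of the four occurrences of `kayak`, because the occurrences starting at positions 4 and 12 overlap matches that were already reported.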

In [5]:
documentEntry = (widgets.Text(
    value="kayakayakayakayak",
    placeholder='Type something...',
    description='Examine:',
    disabled=False   
))
print("Please enter a body of text to search:")
display(documentEntry)
patternEntry = (widgets.Text(
    value="kayak",
    placeholder='Type something...',
    description='Looking for:',
    disabled=False   
))
print("After that, please enter a string to search for within your body of text:")
display(patternEntry)

def compare_engines():
    # Formulate a REmatch query: !x{...} captures each match in variable x
    pattern = "!x{" + patternEntry.value + "}"
    query = REmatch.reql(pattern)
    print("Searching for pattern...")
    rematch_results = [match.group('x') for match in query.finditer(documentEntry.value)]
    print(f"REmatch found {len(rematch_results)} results: {rematch_results}")

    # Run the same pattern through Python's standard re module
    standard_results = re.findall(patternEntry.value, documentEntry.value)
    print(f"Standard RegEx found {len(standard_results)} results: {standard_results}")

# These names are unique to this cell. Later cells define their own
# get_results, button, and output; sharing those names would make this
# cell's click handler run the later cell's search once that cell has run.
compare_button = widgets.Button(description="Run Search!")
compare_output = widgets.Output(
    layout={'border': '1px solid black', 'padding': '24px'}
)

display(compare_button, compare_output)

def on_compare_clicked(b):
    with compare_output:
        print("RESULTS:")
        compare_engines()

compare_button.on_click(on_compare_clicked)

Please enter a body of text to search:

After that, please enter a string to search for within your body of text:

## Demonstration 2: Searching for words with a given prefix *and* the context in which the word was used.
- Standard RegEx engines typically scan left to right and report only non-overlapping matches (POSIX engines, for example, follow a leftmost-longest rule). Any match that overlaps one already reported is thrown away.
- Because overlapping matches are thrown away, it is difficult to have a standard RegEx engine grab a desired string *and* the context in which that string appeared: the desired string and its context overlap each other. This remains difficult even with lookahead operators.
- Finding the surrounding context is often useful. For example, suppose you are a linguist looking for examples of a particular prefix or word root in literature. Extracting the surrounding sentence, instead of just the word itself, shows how the word was used. Try it below!
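
The following sketch, using only Python's standard `re` module, illustrates the problem on a sentence that contains two words with the same prefix. The sample text, the variable names, and the two-pass workaround are assumptions made for illustration; they are not part of REmatch.

```python
import re

text = "The trees hid the treetops. A vast silence reigned over the land."
prefix = "tree"
escaped = re.escape(prefix)

# Single pass: capture the sentence and a prefixed word inside it.
# Each sentence can be matched only once, so only one word per sentence
# is reported (here the last one, because the leading [^.]* is greedy).
single_pass = re.findall(rf"([^.]*\b({escaped}\w+)[^.]*\.)", text)
single_pass_words = [word for _sentence, word in single_pass]

# Two-pass workaround: split into sentences first, then search each
# sentence separately so that every prefixed word keeps its context.
pairs = []
for sentence in re.findall(r"[^.]+\.", text):
    sentence = sentence.strip()
    for word in re.findall(rf"\b{escaped}\w+", sentence):
        pairs.append((word, sentence))

print(single_pass_words)  # ['treetops']
for word, sentence in pairs:
    print(f"{word!r} appears in: {sentence}")
```

The single-pass search misses `trees`, since both words would need the same sentence as context. An all-match engine can report both (word, sentence) pairs from a single query, which is what the cell below does with REmatch.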

In [6]:
uploader = widgets.FileUpload(
    description="Select a text file:",
    accept='.txt'
)
print("Please select a file to search:")
display(uploader)
wordEntry = (widgets.Text(
    placeholder='Type something...',
    disabled=False   
))
print("After that, please enter a prefix to search for within your selected file:")
display(wordEntry)

def get_results():
    fileContent = ""
    try:
        uploaded_file = uploader.value[0]
        fileContent = codecs.decode(uploaded_file.content, encoding="utf-8").replace("\n", " ").replace("\r", "")
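    except (IndexError, KeyError):  # no upload yet: value is an empty tuple or dict, depending on the ipywidgets version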
        print("No File Chosen! Using sample excerpt from Jack London's White Fang.")
        fileContent = "Dark spruce forest frowned on either side the frozen waterway. The trees had been stripped by a recent wind of their white covering of frost, and they seemed to lean towards each other, black and ominous, in the fading light. A vast silence reigned over the land. The land itself was a desolation, lifeless, without movement, so lone and cold that the spirit of it was not even that of sadness. There was a hint in it of laughter, but of a laughter more terrible than any sadness—a laughter that was mirthless as the smile of the sphinx, a laughter cold as the frost and partaking of the grimness of infallibility. It was the masterful and incommunicable wisdom of eternity laughing at the futility of life and the effort of life. It was the Wild, the savage, frozen-hearted Northland Wild."
    prefix = wordEntry.value
    if prefix == "":
        print("No Prefix Chosen! Using sample prefix 'tree'.\n---")
        prefix = "tree"
    # Formulate a query
    # Raw strings keep the backslashes in \. and \w intact without invalid-escape warnings
    pattern = r"(^|(\.))!sent{[^.]* !w1{" + prefix + r"\w+}( [^.]*)?\.}"
    query = REmatch.reql(pattern)
    # Execute the query and print the results
    for match in query.finditer(fileContent):
        word = match.group('w1')
        sentence = match.group('sent')
        print(f"The word '{word}' appears in the following sentence:\n {sentence}")
        print("\n----------------------------------------------------------")

button = widgets.Button(description="Run Search!")
output = widgets.Output(
    layout={'border': '1px solid black', 'padding': '24px'}
)

display(button, output)

def on_button_clicked(b):
    with output:
        print("RESULTS:")
        get_results()

button.on_click(on_button_clicked)

Please select a file to search:

After that, please enter a prefix to search for within your selected file: