
# [Rule Modifier] Rule splitting

## Problem statement

- The current UI can modify individual elements within a rule.
- Needed: the ability to modify the rule as a whole.

## Proposed work

Create a new prompt that splits a rule in either of the following two scenarios:

1. [<or> Tags](https://www.notion.so/or-Tags-83b18d9960614627b7a3e5430193689e?pvs=21)
2. [Rule too broad](https://www.notion.so/Rule-too-broad-26c11a6d02cc44c388710ce7d1a47915?pvs=21)

The resulting split rules would need:

- New IDs
  - `"BRIEFCATCH_{rand_int(40)}"`
- Rule `name` attribute suffixed with the split index, like so
  - `"BRIEFCATCH_PUNCHINESS_288.1"`
  - `"BRIEFCATCH_PUNCHINESS_288.2"`
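A minimal sketch of how the new IDs and names might be assigned. It assumes `rand_int(40)` means a random integer of up to 40 decimal digits (the note does not define it); `new_rule_id` and `relabel_split_rules` are hypothetical helper names, not existing project functions.

```python
import re
import secrets
from typing import List


def new_rule_id(digits: int = 40) -> str:
    # Assumption: rand_int(40) is a random integer with up to 40 decimal digits.
    return f"BRIEFCATCH_{secrets.randbelow(10 ** digits)}"


def relabel_split_rules(split_rules: List[str]) -> List[str]:
    """Give each split rule a fresh id and suffix its name with .1, .2, ..."""
    relabeled = []
    for idx, rule_xml in enumerate(split_rules, start=1):
        # Only the attributes on the opening <rule ...> tag are touched.
        rule_xml = re.sub(
            r'(<rule\b[^>]*?\bid=")[^"]*(")',
            lambda m: f"{m.group(1)}{new_rule_id()}{m.group(2)}",
            rule_xml,
            count=1,
        )
        rule_xml = re.sub(
            r'(<rule\b[^>]*?\bname="[^"]*)(")',
            lambda m, i=idx: f"{m.group(1)}.{i}{m.group(2)}",
            rule_xml,
            count=1,
        )
        relabeled.append(rule_xml)
    return relabeled
```

The replacements use functions rather than replacement strings so that digits in the new values cannot be misread as group references.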

### When deploying

- Use the rule modification checker (developed in a previous story) to ensure that the example and suggestion tags still match the pattern after the rules are split

# Success criteria

The criteria that must be met in order to consider this project a success.

- UI updated to allow rules to be split
  - "Separate branch of logic"
    - It is unclear what is meant here
  - Split option added to the dropdown
  - Once the user selects “Split”, a secondary select is probably needed to capture why the user wants to split the rule
    - The options could be `<or> tag` or `rule too broad`
      - If the rule contains no `<or>` tag, that option can be dropped
    - `<or>` tag splitting needs no further user input and no GPT call
    - If the rule is too broad, we need a field for user input and a GPT call to identify how best to split the rule
- Rule-splitting prompt written and tested
- The method that creates the PR to update the repo will need to change too
  - Instead of replacing a modified rule or only adding new rules, we must delete the original rule, which has now been split into N rules
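A rough sketch of two of the pieces above: choosing which split reasons to offer, and swapping the original rule for its N split rules. Both function names are hypothetical, and the sketch assumes the rules live under some wrapping element (such as a category) in a single XML document. Round-tripping through `ElementTree` drops comments and original formatting, so a real PR step may prefer a text-level replacement.

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def available_split_reasons(rule_xml: str) -> List[str]:
    """Options for the secondary select; the <or> option is dropped when no <or> tag exists."""
    reasons = ["rule too broad"]
    if re.search(r"<or\b", rule_xml):
        reasons.insert(0, "<or> tag")
    return reasons


def replace_rule_with_splits(
    grammar_xml: str, original_id: str, split_rules: List[str]
) -> str:
    """Delete the original rule and insert its split rules at the same position."""
    root = ET.fromstring(grammar_xml)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    target = next(
        (r for r in root.iter("rule") if r.get("id") == original_id), None
    )
    if target is None:
        raise ValueError(f"rule {original_id!r} not found")
    parent = parents[target]
    position = list(parent).index(target)
    parent.remove(target)
    for offset, rule_xml in enumerate(split_rules):
        parent.insert(position + offset, ET.fromstring(rule_xml))
    return ET.tostring(root, encoding="unicode")
```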


Using this rule as an example

```
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288">
    <pattern>
        <or>
            <token inflected="yes">inquire</token>
            <token>inquiry</token>
        </or>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquiry into|inquiry about|inquiry in|inquiry from">The <marker>inquiry as to</marker> the strong majority of documents and testimony sought continues.</example>
</rule>
```

We would need to split the rule into two rules:

`Rule 1` - just the inflected token for `inquire` (this one requires an update to the example tag)

```
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288.1">
    <pattern>
        <token inflected="yes">inquire</token>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquired into|inquired about|inquired in|inquired from">He <marker>inquired as to</marker> the willpower of the group.</example>
</rule>
```

`Rule 2` - just the token for `inquiry` (no change for the example tag)

```
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288.2">
    <pattern>
        <token>inquiry</token>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquiry into|inquiry about|inquiry in|inquiry from">The <marker>inquiry as to</marker> the strong majority of documents and testimony sought continues.</example>
</rule>
```


In [1]:
# Using this rule as an example

# Raw string: keeps backreferences such as \1 from being read as escape sequences.
original_rule = r"""
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288">
    <pattern>
        <or>
            <token inflected="yes">inquire</token>
            <token>inquiry</token>
        </or>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquiry into|inquiry about|inquiry in|inquiry from">The <marker>inquiry as to</marker> the strong majority of documents and testimony sought continues.</example>
</rule>
"""

# We would need to split the rule into two rules:

# `Rule 1` - just the inflected token for `inquire` (this one requires an update to the example tag)

rule_1 = r"""
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288.1">
    <pattern>
        <token inflected="yes">inquire</token>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquired into|inquired about|inquired in|inquired from">He <marker>inquired as to</marker> the willpower of the group.</example>
</rule>
"""

# `Rule 2` - just the token for `inquiry` (no change for the example tag)

rule_2 = r"""
<rule id="BRIEFCATCH_11012406027615556274904173201077833804" name="BRIEFCATCH_PUNCHINESS_288.2">
    <pattern>
        <token>inquiry</token>
        <token>as</token>
        <token>to</token>
    </pattern>
    <message>Would direct language...</message>
    <suggestion>\1 into</suggestion>
    <suggestion>\1 about</suggestion>
    <suggestion>\1 in</suggestion>
    <suggestion>\1 from</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"3.209","WORD":true,"OUTLOOK":true}</short>
    <example correction="inquiry into|inquiry about|inquiry in|inquiry from">The <marker>inquiry as to</marker> the strong majority of documents and testimony sought continues.</example>
</rule>
"""

# split on or operands


In [96]:
simple_rule = """
<rule id="BRIEFCATCH_331448315792705843437979608685430062094" name="BRIEFCATCH_PUNCHINESS_1872">
    <antipattern>
        <token postag="RB.*" postag_regexp="yes"/>
        <token inflected="yes">file</token>
        <token min="0"/>
        <token regexp="yes">motion|motions</token>
        <token min="0">seeking</token>
        <token>to</token>
    </antipattern>
    <pattern>
        <token inflected="yes">file<exception>filing</exception></token>
        <or>
            <token min="0" postag="PRP$"/>
            <token>a</token>
        </or>
        <token regexp="yes">motion|motions</token>
        <token min="0">seeking</token>
        <token>to</token>
    </pattern>
    <message>Would a stronger verb help engage the reader?|**Example** from Justice Kagan: "Lange **moved to suppress** all evidence obtained after the officer entered his garage[.]"|**Example** from Justice Kavanaugh: "Before trial, Edwards **moved to suppress** the videotaped confession on the ground that the confession was involuntary."|**Example** from Morgan Chu: "The defendants also **moved to transfer** another state court action to the state court considering the petitions."</message>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">move</match> to</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"5.319","WORD":true,"OUTLOOK":true}</short>
    <example correction="moved to">The prosecution <marker>filed a motion seeking to</marker> have the victim.</example>
    <example>It was a properly filed motion seeking to overturn the election.</example>
</rule>
"""

In [60]:
product_rule = r"""
<rule id="BRIEFCATCH_245927502998399442504807079542143713153" name="BRIEFCATCH_CONCISENESS_3959">
    <pattern>
        <token inflected="yes">assist</token>
        <or>
            <token min="0" postag="JJ.*|PRP" postag_regexp="yes"/>
            <token>the</token>
        </or>
        <or>
            <token min="0" postag="JJ.*|PRP|PRP\$" postag_regexp="yes"/>
            <token>the</token>
        </or>
        <token postag="N.*|PRP" postag_regexp="yes">
            <exception>all</exception>
            <exception>are</exception>
            <exception>being</exception>
            <exception>beliefs</exception>
            <exception>but</exception>
            <exception>by</exception>
            <exception>can</exception>
            <exception>circuit</exception>
            <exception>clear</exception>
            <exception>concerning</exception>
            <exception>concerns</exception>
            <exception>dissent</exception>
            <exception>does</exception>
            <exception>due</exception>
            <exception>even</exception>
            <exception>fails</exception>
            <exception>find</exception>
            <exception>finds</exception>
            <exception>get</exception>
            <exception>given</exception>
            <exception>having</exception>
            <exception>his</exception>
            <exception>hold</exception>
            <exception>holds</exception>
            <exception>if</exception>
            <exception>in</exception>
            <exception>left</exception>
            <exception>like</exception>
            <exception>likes</exception>
            <exception>long</exception>
            <exception>make</exception>
            <exception>makes</exception>
            <exception>may</exception>
            <exception>might</exception>
            <exception>must</exception>
            <exception>no</exception>
            <exception>note</exception>
            <exception>one</exception>
            <exception>or</exception>
            <exception>other</exception>
            <exception>prior</exception>
            <exception>regarding</exception>
            <exception>see</exception>
            <exception>then</exception>
            <exception>try</exception>
            <exception>will</exception>
        </token>
        <token>in</token>
        <token postag="VBG">
            <exception>regarding</exception>
            <exception>concerning</exception>
            <exception>pending</exception>
            <exception>following</exception>
            <exception>standing</exception>
            <exception>helping</exception>
            <exception>neighboring</exception>
            <exception>neighbouring</exception>
        </token>
    </pattern>
    <message>Would using fewer words help sharpen the point?|**Example** from Justice Sotomayor: “The Affordable Care Act did this by, among other things, providing tax credits to **help people buy** insurance and establishing online marketplaces where insurers could sell plans.”</message>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">help</match> \2 \3 \4 <match no="6" postag="V.*" postag_regexp="yes" postag_replace="VB"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"6.3201","WORD":true,"OUTLOOK":true}</short>
    <example correction="Help his brother find"><marker>Assist his brother in finding</marker> an apartment.</example>
</rule>
"""
# split_rule_by_or_operands(rule)

In [100]:
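import re
from typing import List, Optional


def extract_or_tag(rule_xml: str) -> Optional[str]:
    """Return the first <or>...</or> block in the rule, or None if there is none."""
    or_match = re.search(r"(<or>.*?</or>)", rule_xml, re.DOTALL)
    if not or_match:
        return None
    return or_match.group(1)


def extract_operands(or_input_string: str) -> List[str]:
    """Return every <token> element (self-closing or paired) inside an <or> block."""
    # [^>]* keeps the self-closing alternative from running past the end of an
    # opening tag and swallowing the token that follows it.
    token_pattern = r"(<token[^>]*/>|<token[^>]*>.*?</token>)"
    return re.findall(token_pattern, or_input_string, re.DOTALL)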


def split_rule_by_or_operands(input_rule: str) -> List[str]:
    """
    Split a rule into one rule per operand of its first <or> tag.

    TODO: currently does not handle the case where a rule has two <or> tags.
    """
    or_content = extract_or_tag(input_rule)
    if not or_content:
        # Nothing to split: return a one-element list so the return type stays consistent.
        return [input_rule]
    operand_list = extract_operands(or_content)

    # Split on the first occurrence only, so a repeated identical <or> block is kept.
    before, after = input_rule.split(or_content, 1)
    return [f"{before}{operand_str}{after}" for operand_str in operand_list]


# Re-splitting an already-split rule (no <or> tag left) returns it unchanged.
split_rule_by_or_operands(split_rule_by_or_operands(simple_rule)[0])[0]

'\n<rule id="BRIEFCATCH_331448315792705843437979608685430062094" name="BRIEFCATCH_PUNCHINESS_1872">\n    <antipattern>\n        <token postag="RB.*" postag_regexp="yes"/>\n        <token inflected="yes">file</token>\n        <token min="0"/>\n        <token regexp="yes">motion|motions</token>\n        <token min="0">seeking</token>\n        <token>to</token>\n    </antipattern>\n    <pattern>\n        <token inflected="yes">file<exception>filing</exception></token>\n        <token min="0" postag="PRP$"/>\n        <token regexp="yes">motion|motions</token>\n        <token min="0">seeking</token>\n        <token>to</token>\n    </pattern>\n    <message>Would a stronger verb help engage the reader?|**Example** from Justice Kagan: "Lange **moved to suppress** all evidence obtained after the officer entered his garage[.]"|**Example** from Justice Kavanaugh: "Before trial, Edwards **moved to suppress** the videotaped confession on the ground that the confession was involuntary."|**Example** 
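One possible way to address the TODO about rules with more than one `<or>` tag (as in `product_rule`) is to take the Cartesian product of the operands of every `<or>` block. This is only a sketch: `split_rule_by_all_or_tags` is a hypothetical name, the regexes are assumptions that mirror the helpers above, and `<or>` blocks inside antipatterns are expanded as well, which may or may not be wanted.

```python
import re
from itertools import product
from typing import List

OR_PATTERN = re.compile(r"<or>.*?</or>", re.DOTALL)
TOKEN_PATTERN = re.compile(r"<token[^>]*/>|<token[^>]*>.*?</token>", re.DOTALL)


def split_rule_by_all_or_tags(input_rule: str) -> List[str]:
    """Return one rule per combination of operands across every <or> tag."""
    or_blocks = OR_PATTERN.findall(input_rule)
    if not or_blocks:
        return [input_rule]
    # Text around the <or> blocks; always one more piece than there are blocks.
    pieces = OR_PATTERN.split(input_rule)
    operand_lists = [TOKEN_PATTERN.findall(block) for block in or_blocks]
    rules = []
    for combo in product(*operand_lists):
        parts = [pieces[0]]
        for operand, piece in zip(combo, pieces[1:]):
            parts.extend([operand, piece])
        rules.append("".join(parts))
    return rules
```

Each `<or>` block is replaced by exactly one token, so token positions (and therefore backreferences such as `\2` in the suggestions) are unchanged. For `product_rule`, which has two `<or>` tags with two operands each, this would yield four rules.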

# split rule that is too broad


In [26]:
unsplit_rule = """
<rule id="BRIEFCATCH_147725296952682099987839530434290533040" name="PUNCHINESS_377">
    <antipattern>
        <token regexp="yes">can|could|shall|should</token>
        <token>ascertain</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token min="0" skip="5"/>
        <token regexp="yes">intent|meaning|standing|truth</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token>the</token>
        <token>citizenship</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token>their</token>
    </antipattern>
    <antipattern>
        <token>ascertain</token>
        <token>whether</token>
    </antipattern>
    <pattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
    </pattern>
    <message>Would direct language convey your point just as effectively?|**Example** from Chief Justice Roberts: "The SEC . . . is not like an individual victim who relies on apparent injury to **learn of** a wrong."|**Example** from Judge Nalbandian: "The same day that her son lodged his complaints . . . , police . . . told her to come to the police station to **find out about** her son."|**Example** from Juanita Brooks: "In *Keystone*, after obtaining a patent, the patentee **learned of** a possible prior use of its invention."|**Example** from Jeff Lamken: "If the President **determined** that the SEC Commissioners were neglecting their duty to regulate the securities markets, he could remove the Commissioners[.]"|**Example** from YouTube’s terms of service: "Among other things, you can **find out about** YouTube Kids, the YouTube Partner Program . . . ."</message>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">determine</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">establish</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">discover</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">find</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">figure</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">decide</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">arrive</match> at</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match> of</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":9,"priority":"1.53","WORD":true,"OUTLOOK":true}</short>
    <example correction="determined|learned|learnt|established|discovered|found out|figured out|decided|arrived at|learned of|learnt of">She <marker>ascertained</marker> the item's whereabouts.</example>
    <example>We can ascertain their intent from the examples provided</example>
    <example>Ascertain its intent.</example>
    <example>To ascertain the citizenship.</example>
    <example>We couldn't ascertain their intent.</example>
    <example>It is crucial to ascertain whether the facts are true.</example>
</rule>
"""
split_rule_1 = """
<rule id="BRIEFCATCH_147725296952682099987839530434290533040" name="PUNCHINESS_377.1">
    <antipattern>
        <token regexp="yes">can|could|shall|should</token>
        <token>ascertain</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token min="0" skip="5"/>
        <token regexp="yes">intent|meaning|standing|truth</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token>the</token>
        <token>citizenship</token>
    </antipattern>
    <antipattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
        <token>their</token>
    </antipattern>
    <antipattern>
        <token>ascertain</token>
        <token>whether</token>
    </antipattern>
    <pattern>
        <token inflected="yes">ascertain<exception>ascertaining</exception></token>
    </pattern>
    <message>Would direct language convey your point just as effectively?|**Example** from Chief Justice Roberts: "The SEC . . . is not like an individual victim who relies on apparent injury to **learn of** a wrong."|**Example** from Judge Nalbandian: "The same day that her son lodged his complaints . . . , police . . . told her to come to the police station to **find out about** her son."|**Example** from Juanita Brooks: "In *Keystone*, after obtaining a patent, the patentee **learned of** a possible prior use of its invention."|**Example** from Jeff Lamken: "If the President **determined** that the SEC Commissioners were neglecting their duty to regulate the securities markets, he could remove the Commissioners[.]"|**Example** from YouTube’s terms of service: "Among other things, you can **find out about** YouTube Kids, the YouTube Partner Program . . . ."</message>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">determine</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">establish</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">discover</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">find</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">figure</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">decide</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">arrive</match> at</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match> of</suggestion>
    <short>{"ruleGroup":"BRIEFCATCH_PUNCHINESS_377","ruleGroupIdx":1,"isConsistency":false,"isStyle":true,"correctionCount":9,"priority":"1.53","WORD":true,"OUTLOOK":true}</short>
    <example correction="determined|learned|learnt|established|discovered|found out|figured out|decided|arrived at|learned of|learnt of">She <marker>ascertained</marker> the item's whereabouts.</example>
    <example>We can ascertain their intent from the examples provided</example>
    <example>Ascertain its intent.</example>
    <example>To ascertain the citizenship.</example>
    <example>We couldn't ascertain their intent.</example>
    <example>It is crucial to ascertain whether the facts are true.</example>
</rule>
"""
split_rule_2 = r"""
<rule id="BRIEFCATCH_147725296952682099987839530434290533041" name="PUNCHINESS_377.2">
    <pattern>
        <token inflected="yes">ascertain</token>
        <token>whether</token>
    </pattern>
    <message>Would direct language convey your point just as effectively?</message>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">determine</match> \2</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">decide</match> \2</suggestion>
    <short>{"ruleGroup":"BRIEFCATCH_PUNCHINESS_377","ruleGroupIdx":2,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"1.53","WORD":true,"OUTLOOK":true}</short>
    <example correction="determine whether|decide whether">They will <marker>ascertain whether</marker> a construction of the statute is fairly possible.</example>
</rule>
"""
explanation = """
The original rule has an antipattern for the sequence "ascertain whether".
We compared the ngram data for that sequence against the rule's suggestions applied
to it (which would be triggered if not for the antipattern), i.e.
"ascertain whether" vs. "decide whether", "determine whether", etc.
The results indicated that at least some of the suggestions, in particular
"determine whether" and "decide whether", score higher in ngram than the
original "ascertain whether".
That in turn calls for a rule split: we need to create an additional rule
just for the sequence "ascertain whether" with only the two suggestions
"decide" and "determine"."""
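The comparison described in `explanation` could be sketched roughly as below. The function name and the `ngram_score` callable are hypothetical; how ngram scores are actually looked up is not shown in this notebook, so the scorer is passed in rather than assumed.

```python
from typing import Callable, List


def suggestions_worth_splitting_out(
    original_phrase: str,
    candidate_phrases: List[str],
    ngram_score: Callable[[str], float],
) -> List[str]:
    """Return the candidates that score higher in ngram data than the original phrase."""
    baseline = ngram_score(original_phrase)
    return [p for p in candidate_phrases if ngram_score(p) > baseline]
```

If the returned list is non-empty for a sequence that an antipattern currently excludes, that sequence is a candidate for its own rule carrying only the returned suggestions.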

## algorithm

- input: `input_rule`
  - assume `input_rule` $\isin \{broad\_rules\}$
- extract the POS tags present in the rule


In [8]:
input_rule = unsplit_rule

In [9]:
from utils.utils import generate_simple_message, call_gpt

In [11]:
# generate_simple_message(toobroad_rule)

In [18]:
from utils.dynamic_rule_checking import (
    get_pos_tag_dicts_from_rule,
    POS_MAPS,
    get_similar_template_rules,
)

pos_tags = get_pos_tag_dicts_from_rule(input_rule, list(POS_MAPS.keys()))
_replace_pos = "\n".join([f"{v}" for k, v in pos_tags.items()])

print(_replace_pos)

RB Adverb and negation: easily, sunnily, suddenly, specifically, not
RBR Adverb, comparative: better, faster, quicker
RBS Adverb, superlative: best, fastest, quickest
RB_SENT Adverbial phrase including a comma that starts a sentence. #New tag (experimental) since LT 4.8. Specified in disambiguation.xml. Examples: However, Whenever possible, First of all, On the other hand,


In [24]:
# 1. find similar example rules
similar_rules = get_similar_template_rules(input_rule)
_replace_matching_examples = "\n".join(similar_rules)

# 2. grab POS tags present in the input XML rule
pos_tags_from_input = get_pos_tag_dicts_from_rule(
    input_rule,
    list(POS_MAPS.keys()),
)

# 3. grab POS tags present in the matching examples and merge them into one dict
pos_tags_from_examples = {}
for r in similar_rules:
    pos_tags_from_examples.update(
        get_pos_tag_dicts_from_rule(r, list(POS_MAPS.keys()))
    )

# 4. assemble the POS list used in the prompt from the input and the matching examples
pos_tags = {**pos_tags_from_input, **pos_tags_from_examples}

_replace_pos = "\n".join(str(v) for v in pos_tags.values())

In [14]:
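def rule_has_regex(xml_string):
    """True if any token or match uses regexp="yes" or postag_regexp="yes"."""
    # 'regexp="yes"' is also a substring of 'postag_regexp="yes"', so one check covers both.
    return 'regexp="yes"' in xml_string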


if rule_has_regex(input_rule):
    regex_instructions = """
    II. Regular Expressions Used in Rules
    RX(.*?) A token that can be any word, punctuation mark, or symbol.
    RX([a-zA-Z]*) A token that can be any word.
    RX([a-zA-Z]+) A token that can be any word.
    """
    _replace_regex = regex_instructions
else:
    _replace_regex = ""

In [None]:
PROMPT_TEMPLATE = """
You are a system focused on {task_description} 

Here are some examples:
{example_rules}   

Here are what the abbreviations mean when making modifications to the rules. 

I. Part of Speech Tags:
{part_of_speech} 

{regex_rules} 

III. Rules
Rules consist of a number of tokens, some are required and some are optional.
In the corrections, the first token is referred to as $0, the second $1, and so forth.
If at least one word or tag or regular expression appears inside parentheses/brackets, the entire string, including the parentheses/brackets, is considered a single token.
If at least two words or tags or regular expressions appear inside parentheses/brackets and if there is no “~” symbol at the end of the string, then any one of those words or tags or regular expressions is a required token in the string.
If at least one word or tag or regular expression appears inside parentheses/brackets and if there is a “~” symbol at the end of the string, then any one of those words or tags or regular expressions is an optional token in the string.
When a word or Part of Speech tag is preceded by “!”, that word or tag is excluded from the token. For example, "( CT(be) !been )" would include "be", "is", "am", "are", and "was", "were", and "being", but it would not include "been". Thus, the rule "( CT(be) !been ) happy" would flag "He was happy" but not "He had been happy".
“SKIP” is always followed by a cardinal number. The number tells you how many tokens can come between the preceding token and the next one. The string “dog SKIP4 cat”, for example, would flag “The dog likes the cat” but would not flag “The dog likes the neighbor’s old cat,” nor would it flag “The cat likes the dog”.
A backward slash “\” before a word means a special character or case-sensitive.
“CT” refers to the infinitive form of a verb that can be conjugated. “CT(read)”, for example, could be “reads”, “read”, “reading”, etc.

IV. Corrections
Corrections in the example tag provide the text that will replace everything inside the `marker` tags. Make sure when creating these, the corrected sentence would make sense when substituting in the correction. This would include no overlapping or duplicated words. However, and this is very important, if a word does not match the pattern for the rule, do not include it in the correction or within the marker tags.
Sometimes a rule has more than one possible correction. In that case, multiple alternative corrections are separated by the “@” symbol.
"""

_task_description = "splitting XML rules that are too broad. These rules encode substitutions for improving writing."
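The notebook stops before the template is filled in. One way to assemble the final prompt is `str.format` with the four named placeholders; this is a sketch, and the author may have intended a different mechanism. `build_split_prompt` is a hypothetical helper, and `TEMPLATE` is a shortened stand-in for `PROMPT_TEMPLATE` that uses the same placeholder names.

```python
# Shortened stand-in for PROMPT_TEMPLATE; the placeholder names match the original.
TEMPLATE = """You are a system focused on {task_description}

Here are some examples:
{example_rules}

I. Part of Speech Tags:
{part_of_speech}

{regex_rules}
"""


def build_split_prompt(
    task_description: str,
    example_rules: str,
    part_of_speech: str,
    regex_rules: str,
    template: str = TEMPLATE,
) -> str:
    """Fill the four named placeholders in the template."""
    return template.format(
        task_description=task_description,
        example_rules=example_rules,
        part_of_speech=part_of_speech,
        regex_rules=regex_rules,
    )
```

In this notebook the arguments would presumably be `_task_description`, `_replace_matching_examples`, `_replace_pos` and `_replace_regex`. `str.format` only parses braces in the template itself, so the JSON braces inside the `<short>` tags of the example rules pass through untouched.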