# Rule Creation from Ad Hoc Format
We need to take rules written in the custom ad hoc representation and output the correct XML representation of each rule. The current evaluation sheet is here: https://docs.google.com/spreadsheets/d/1IcKvUz15310M0p4P7Wbm06veaH5qeT_mo5FEoVjj8Lo/edit?pli=1#gid=58372770

### Util Functions

In [1]:
import openai
openai.api_key = input("Enter your api key: ")

In [2]:
from typing import List
from utils.logger import setup_logger
from dotenv import load_dotenv
load_dotenv()  # read environment variables from a local .env file
from utils.utils import call_gpt_with_backoff, generate_simple_message, search_pinecone_index

logger = setup_logger("notebook_logger")
# Pinecone index and namespace passed to search_pinecone_index below
index_name = "chatbot-knowledge-base"
namespace = "adhoc-rule-creation"




### Current Prompt

In [3]:
SYSTEM_PROMPT = """You are a system that takes in ad hoc rule syntax and some other info to then translate the rule into full xml rules. Here are some examples:

{{dynamic_examples}}


Here are some abbreviations and their meanings that will be helpful in creating these rules:
I.             Part of Speech Tags 
CC Coordinating conjunction: for, and, nor, but, or, yet, so			
CD Cardinal number: one, two, twenty-four			
DT Determiner: a, an, all, many, much, any, some, this			
EX Existential there: there (no other words)			
FW Foreign word: infinitum, ipso			
IN Preposition/subordinate conjunction: except, inside, across, on, through, beyond, with, without			
JJ Adjective: beautiful, large, inspectable			
JJR Adjective, comparative: larger, quicker			
JJS Adjective, superlative: largest, quickest			
LS List item marker: not used by LanguageTool			
MD Modal: should, can, need, must, will, would			
NN Noun, singular count noun: bicycle, earthquake, zipper			
NNS Noun, plural: bicycles, earthquakes, zippers			
NN:U Nouns that are always uncountable #new tag - deviation from Penn, examples: admiration, Afrikaans			
NN:UN Nouns that might be used in the plural form and with an indefinite article, depending on their meaning #new tag - deviation from Penn, examples: establishment, wax, afternoon			
NNP Proper noun, singular: Denver, DORAN, Alexandra			
NNPS Proper noun, plural: Buddhists, Englishmen			
ORD Ordinal number: first, second, twenty-third, hundredth #New tag (experimental) since LT 4.9. Specified in disambiguation.xml. Examples: first, second, third, twenty-fourth, seventy-sixth			
PCT Punctuation mark: (`.,;:…!?`) #new tag - deviation from Penn			
PDT Predeterminer: all, sure, such, this, many, half, both, quite			
POS Possessive ending: s (as in: Peter's)			
PRP Personal pronoun: everyone, I, he, it, myself			
PRP$ Possessive pronoun: its, our, their, mine, my, her, his, your			
RB Adverb and negation: easily, sunnily, suddenly, specifically, not			
RBR Adverb, comparative: better, faster, quicker			
RBS Adverb, superlative: best, fastest, quickest			
RB_SENT Adverbial phrase including a comma that starts a sentence. #New tag (experimental) since LT 4.8. Specified in disambiguation.xml. Examples: However, Whenever possible, First of all, On the other hand,			
RP Particle: in, into, at, off, over, by, for, under			
SENT_END: LanguageTool tags the last token of a sentence as both SENT_END and a regular part-of-speech tag.			
SENT_START: LanguageTool tags the first token of a sentence as both SENT_START and a regular part-of-speech tag.			
SYM Symbol: rarely used by LanguageTool (e.g. for 'DD/MM/YYYY')			
TO to: to (no other words)			
UH Interjection: aargh, ahem, attention, congrats, help			
VB Verb, base form: eat, jump, believe, be, have			
VBD Verb, past tense: ate, jumped, believed			
VBG Verb, gerund/present participle: eating, jumping, believing			
VBN Verb, past participle: eaten, jumped, believed			
VBP Verb, non-3rd ps. sing. present: eat, jump, believe, am (as in 'I am'), are			
VBZ Verb, 3rd ps. sing. present: eats, jumps, believes, is, has			
WDT wh-determiner: that, whatever, what, whichever, which (no other words)			
WP wh-pronoun: that, whatever, what, whatsoever, whomsoever, whosoever, who, whom, whoever, whomever, which (no other words)			
WP$ Possessive wh-pronoun: whose (no other words)			
WRB wh-adverb: however, how, wherever, where, when, why			
II.             Regular Expressions Used in Rules			
RX(.*?) A token that can be any word, punctuation mark, or symbol.			
RX([a-zA-Z]*) A token that can be any word.			
RX([a-zA-Z]+) A token that can be any word.			
III. Rules			
Rules consist of a number of tokens, some are required and some are optional.			
In the corrections, the first token is referred to as $0, the second $1, and so forth.			
If at least one word or tag or regular expression appears inside parentheses/brackets, the entire string, including the parentheses/brackets, is considered a single token.			
If at least two words or tags or regular expressions appear inside parentheses/brackets and if there is no “~” symbol at the end of the string, then any one of those words or tags or regular expressions is a required token in the string.			
If at least one word or tag or regular expression appears inside parentheses/brackets and if there is a “~” symbol at the end of the string, then any one of those words or tags or regular expressions is an optional token in the string.			
When a word or Part of Speech tag is preceded by “!”, that word or tag is excluded from the token. For example, "( CT(be) !been )" would include "be", "is", "am", "are", and "was", "were", and "being", but it would not include "been". Thus, the rule "( CT(be) !been ) happy" would flag "He was happy" but not "He had been happy".			
“SKIP” is always followed by a cardinal number. The number tells you how many tokens can come between the preceding token and the next one. The string “dog SKIP4 cat”, for example, would flag “The dog likes the cat” but would not flag “The dog likes the neighbor’s old cat,” nor would it flag “The cat likes the dog”.			
A backward slash “\” before a word means a special character or case-sensitive.			
“CT” refers to the infinitive form of a verb that can be conjugated. “CT(read)”, for example, could be “reads”, “read”, “reading”, etc.			
IV.          Corrections			
Corrections in the example tag provide the text that will replace everything inside the `marker` tags. Make sure when creating these, the corrected sentence would make sense when substituting in the correction. This would include no overlapping or duplicated words. However, and this is very important, if a word does not match the pattern for the rule, do not include it in the correction or within the marker tags.
Sometimes a rule has more than one possible correction. In that case, multiple alternative corrections are separated by the “@” symbol.			


Important Notes:
- Always set the rule id to `{new_rule_id}`
- Only return the rule XML, do not introduce it or wrap it with back ticks.
- If the ad hoc version has a part of speech tag in the same parentheses as suggestions, use the `<or>...</or>` tag with the part of speech tag as one token and the other options as a regexp token. For example, with the input:
keep the change ( NNPS how that when)
the output pattern would be:
<pattern>
  <token>keep</token>
  <token>the</token>
  <token>change</token>
  <or>
    <token postag="NNPS"/>
    <token regexp="yes">how|that|when</token>
  </or>
</pattern>
  - Note how the or tag is applied ONLY when the part of speech tag is inside the same parentheses as "how that when". Do not use the or tag if a part of speech tag is separate from other options. If using the or tag, make sure to use the regexp field and include multiple options in one token separated by `|`.
- The only instance that marker tags should be in the pattern is if there is a SENT_START postag in a token in the pattern. In this case, all tokens that succeed the SENT_START token need to be nested within marker tags, so that the SENT_START token is applied correctly.
- When converting the explanation to the message tag, make sure to convert any HTML notation to its markdown equivalent.
- The exception tags are only used for words that are marked with `!`. If you see you need to make an exception tag, make a note of this in your thoughts to determine which group of options needs to be exceptions and which are regular regexp.

Write your thoughts breaking down each part of the rule you are about to write, and surround these thoughts in tags like <THOUGHT>...</THOUGHT>. Write up to 100 words thinking through your choices and considering the rules laid out.
"""

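The correction notation described in the prompt (zero-based `$N` references, alternatives separated by `@`) lines up with the `<suggestion>` elements in the examples, where `<match no="..."/>` is one-based. Here is a minimal sketch of that mapping. The helper name `correction_to_suggestions` is hypothetical and not part of `utils`, and the sketch does not try to interpret notation such as `$3-$0`.

```python
import re


def correction_to_suggestions(correction: str) -> list[str]:
    """Turn an ad hoc correction string into <suggestion> elements.

    Hypothetical helper: `$N` is zero-based in the ad hoc format, while
    <match no="..."/> is one-based in the XML examples.
    """
    suggestions = []
    for alternative in correction.split("@"):
        alternative = alternative.strip()
        if not alternative:
            continue
        # Shift every $N reference by one to get the XML match number.
        body = re.sub(
            r"\$(\d+)",
            lambda m: f'<match no="{int(m.group(1)) + 1}"/>',
            alternative,
        )
        suggestions.append(f"<suggestion>{body}</suggestion>")
    return suggestions


for line in correction_to_suggestions("Remember $4 @ Recall $4"):
    print(line)
```

Because the shift is a plain text substitution, it may be worth checking the result against the pattern's token count before trusting it.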
### Test Dataset
Some example inputs and their expected outputs.

In [4]:
example_one = """Ad Hoc:
SENT_START keep in mind ( NNP how that the what when )
Rule Number:
30119
Correction:
Remember $4 @ Recall $4
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
Keep in mind George Orwell’s six rules. 
Corrected Test Sentence:
Remember George Orwell’s six rules.

XML Rule:"""
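Each test input is a series of `Label:` lines, each followed by its value, and the retrieval cells rely on the ad hoc pattern being the second line of the string. Here is a small sketch of splitting an input into named fields instead. The helper `parse_rule_input` is hypothetical, and it assumes every label sits on its own line ending with a colon.

```python
def parse_rule_input(text: str) -> dict[str, str]:
    """Split a labelled rule input into {label: value} (hypothetical helper)."""
    labels = {
        "Ad Hoc", "Rule Number", "Correction", "Category",
        "Explanation", "Test Sentence", "Corrected Test Sentence", "XML Rule",
    }
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and stripped[:-1] in labels:
            # A known label starts a new field.
            current = stripped[:-1]
            fields[current] = ""
        elif current is not None and stripped:
            # Join multi-line values with a single space.
            fields[current] = f"{fields[current]} {stripped}".strip()
    return fields


sample = """Ad Hoc:
( and is ) not without ( consequence consequences )
Rule Number:
30120
Correction:
$0 significant @ $0 weighty @ $0 consequential 
XML Rule:"""
fields = parse_rule_input(sample)
print(fields["Ad Hoc"])
```

Looking fields up by label rather than by line index should keep the search parameter correct even if the input layout shifts.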

In [5]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_one.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30119">\n    <pattern>\n            <token postag="SENT_START"/>\n            <marker>\n                    <token>keep</token>\n                    <token>in</token>\n                    <token>mind</token>\n                    <or>\n                            <token postag="NNP"/>\n                            <token regexp="yes">how|that|the|what|when</token>\n                    </or>\n            </marker>\n    </pattern>\n    <message>Would using fewer words help tighten the sentence?</message>\n    <suggestion>Remember <match no="5"/></suggestion>\n    <suggestion>Recall <match no="5"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.128","WORD":true,"OUTLOOK":true}</short>\n    <example correction="Remember George|Recall George"><marker>Keep in mind George</marker> Orwell`s six rules.</example>\n</rule>',
  'full_input': 

In [6]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
SENT_START keep in mind ( NNP how that the what when )
Rule Number:
30119
Correction:
Remember $4 @ Recall $4
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
Keep in mind George Orwell’s six rules. 
Corrected Test Sentence:
Remember George Orwell’s six rules.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30119">
    <pattern>
            <token postag="SENT_START"/>
            <marker>
                    <token>keep</token>
                    <token>in</token>
                    <token>mind</token>
                    <or>
                            <token postag="NNP"/>
                            <token regexp="yes">how|that|the|what|when</token>
                    </or>
            </marker>
    </pattern>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>Remember <match no="5"/></suggestion>
    <suggestion>Recall <match no="5"/></suggestion>
    <short>{"r

In [7]:
example_one_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_one)
example_one_response = call_gpt_with_backoff(messages=example_one_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_one_response[0])

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30119">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>keep</token>
            <token>in</token>
            <token>mind</token>
            <or>
                <token postag="NNP"/>
                <token regexp="yes">how|that|the|what|when</token>
            </or>
        </marker>
    </pattern>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>Remember <match no="5"/></suggestion>
    <suggestion>Recall <match no="5"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.128","WORD":true,"OUTLOOK":true}</short>
    <example correction="Remember George|Recall George"><marker>Keep in mind George</marker> Orwell’s six rules.</example>
</rule>


Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30119">
        <pattern>
                <token postag="SENT_START"/>
                <marker>
                        <token>keep</token>
                        <token>in</token>
                        <token>mind</token>
                        <or>
                                <token postag="NNP"/>
                                <token regexp="yes">how|that|the|what|when</token>
                        </or>
                </marker>
        </pattern>
        <message>Would using fewer words help tighten the sentence?</message>
        <suggestion>Remember <match no="5"/></suggestion>
        <suggestion>Recall <match no="5"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.128","WORD":true,"OUTLOOK":true}</short>
        <example correction="Remember George|Recall George"><marker>Keep in mind George</marker> Orwell`s six rules.</example>
</rule>
```
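
The model response and the expected output above differ in indentation, and in the apostrophe character in the example sentence, so a raw string comparison would report a mismatch. Here is a sketch of a comparison that ignores indentation. The helpers `normalize_rule_xml` and `rules_match` are hypothetical, and they deliberately leave apostrophe variants alone.

```python
def normalize_rule_xml(xml_text: str) -> str:
    """Strip per-line indentation and blank lines (hypothetical helper)."""
    lines = (line.strip() for line in xml_text.splitlines())
    return "\n".join(line for line in lines if line)


def rules_match(actual: str, expected: str) -> bool:
    """Compare two rule strings while ignoring indentation differences."""
    return normalize_rule_xml(actual) == normalize_rule_xml(expected)


actual = """<pattern>
    <token>not</token>
</pattern>"""
expected = """<pattern>
                <token>not</token>
        </pattern>                """
print(rules_match(actual, expected))
```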
---

In [8]:
example_two = """Ad Hoc:
( and is ) not without ( consequence consequences )
Rule Number:
30120
Correction:
$0 significant @ $0 weighty @ $0 consequential 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
The event is not without consequence. 
Corrected Test Sentence:
The event is significant.

XML Rule:"""

In [9]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_two.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30120">\n        <pattern>\n                <token regexp="yes">and|is</token>\n                <token>not</token>\n                <token>without</token>\n                <token regexp="yes">consequence|consequences</token>\n        </pattern>\n        <message>Would using fewer words help tighten the sentence?</message>\n        <suggestion><match no="1"/> significant</suggestion>\n        <suggestion><match no="1"/> weighty</suggestion>\n        <suggestion><match no="1"/> consequential</suggestion>\n        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"4.145","WORD":true,"OUTLOOK":true}</short>\n        <example correction="is significant|is weighty|is consequential">The event <marker>is not without consequence</marker>.</example>\n</rule>',
  'full_input': 'Ad Hoc:\n( and is ) not without ( consequence consequences )\nRule Number:\n30120\nCor

In [10]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
( and is ) not without ( consequence consequences )
Rule Number:
30120
Correction:
$0 significant @ $0 weighty @ $0 consequential 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
The event is not without consequence. 
Corrected Test Sentence:
The event is significant.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30120">
        <pattern>
                <token regexp="yes">and|is</token>
                <token>not</token>
                <token>without</token>
                <token regexp="yes">consequence|consequences</token>
        </pattern>
        <message>Would using fewer words help tighten the sentence?</message>
        <suggestion><match no="1"/> significant</suggestion>
        <suggestion><match no="1"/> weighty</suggestion>
        <suggestion><match no="1"/> consequential</suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3

In [11]:
example_two_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_two)
example_two_response = call_gpt_with_backoff(messages=example_two_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_two_response[0])

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30120">
        <pattern>
                <token regexp="yes">and|is</token>
                <token>not</token>
                <token>without</token>
                <token regexp="yes">consequence|consequences</token>
        </pattern>
        <message>Would using fewer words help tighten the sentence?</message>
        <suggestion><match no="1"/> significant</suggestion>
        <suggestion><match no="1"/> weighty</suggestion>
        <suggestion><match no="1"/> consequential</suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"4.145","WORD":true,"OUTLOOK":true}</short>
        <example correction="is significant|is weighty|is consequential">The event <marker>is not without consequence</marker>.</example>
</rule>


Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30120">
        <pattern>
                <token regexp="yes">and|is</token>
                <token>not</token>
                <token>without</token>
                <token regexp="yes">consequence|consequences</token>
        </pattern>
        <message>Would using fewer words help tighten the sentence?</message>
        <suggestion><match no="1"/> significant</suggestion>
        <suggestion><match no="1"/> weighty</suggestion>
        <suggestion><match no="1"/> consequential</suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"4.145","WORD":true,"OUTLOOK":true}</short>
        <example correction="is significant|is weighty|is consequential">The event <marker>is not without consequence</marker>.</example>
</rule>
```
---

In [12]:
example_three = """Ad Hoc:
CT(be) ( fairly quite rather somewhat ) ( afraid available clear difficult easy essential good important likely necessary possible ready similar sure true wrong )
Rule Number:
30122
Correction:
$0 $2
Category:
Conciseness
Explanation:
Would cutting this implied modifier help strengthen the sentence?
Test Sentence:
It is quite easy to rewrite an article. 
Corrected Test Sentence:
It is easy to rewrite an article.

XML Rule:"""

In [13]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_three.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30122">\n    <pattern>\n            <token inflected="yes">be</token>\n            <token regexp="yes">fairly|quite|rather|somewhat</token>\n            <token regexp="yes">afraid|available|clear|difficult|easy|essential|good|important|likely|necessary|possible|ready|similar|sure|true|wrong</token>\n    </pattern>\n    <message>Would cutting this implied modifier help strengthen the sentence?</message>\n    <suggestion><match no="1"/> <match no="3"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"3.249","WORD":true,"OUTLOOK":true}</short>\n    <example correction="is easy">It <marker>is quite easy</marker> to rewrite an article.</example>\n</rule>',
  'full_input': 'Ad Hoc:\nCT(be) ( fairly quite rather somewhat ) ( afraid available clear difficult easy essential good important likely necessary possible ready similar sure true wro

In [14]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
CT(be) ( fairly quite rather somewhat ) ( afraid available clear difficult easy essential good important likely necessary possible ready similar sure true wrong )
Rule Number:
30122
Correction:
$0 $2
Category:
Conciseness
Explanation:
Would cutting this implied modifier help strengthen the sentence?
Test Sentence:
It is quite easy to rewrite an article. 
Corrected Test Sentence:
It is easy to rewrite an article.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30122">
    <pattern>
            <token inflected="yes">be</token>
            <token regexp="yes">fairly|quite|rather|somewhat</token>
            <token regexp="yes">afraid|available|clear|difficult|easy|essential|good|important|likely|necessary|possible|ready|similar|sure|true|wrong</token>
    </pattern>
    <message>Would cutting this implied modifier help strengthen the sentence?</message>
    <suggestion><match no="1"/> <match no="3"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"is

In [15]:
example_three_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_three)
example_three_response = call_gpt_with_backoff(messages=example_three_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_three_response[0])

<THOUGHT>
The pattern consists of three tokens: a conjugatable form of "be", a set of adverbs, and a set of adjectives. The adverbs and adjectives are optional and can be any one of the listed words, so they will be represented with the regexp attribute. The correction suggests removing the second token, which is the adverb, to make the sentence more concise. The message will be adapted from the explanation, converting HTML to markdown if necessary. The example will show the correction applied to the test sentence.
</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30122">
    <pattern>
        <token inflected="yes">be</token>
        <token regexp="yes">fairly|quite|rather|somewhat</token>
        <token regexp="yes">afraid|available|clear|difficult|easy|essential|good|important|likely|necessary|possible|ready|similar|sure|true|wrong</token>
    </pattern>
    <message>Would cutting this implied modifier help strengthen the sentence?</message>
    <suggestion><match no

Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30122">
        <pattern>
                <token inflected="yes">be</token>
                <token regexp="yes">fairly|quite|rather|somewhat</token>
                <token regexp="yes">afraid|available|clear|difficult|easy|essential|good|important|likely|necessary|possible|ready|similar|sure|true|wrong</token>
        </pattern>
        <message>Would cutting this implied modifier help strengthen the sentence?</message>
        <suggestion><match no="1"/> <match no="3"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"3.249","WORD":true,"OUTLOOK":true}</short>
        <example correction="is easy">It <marker>is quite easy</marker> to rewrite an article.</example>
</rule>
```
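
Some responses, like the one above, begin with a `<THOUGHT>` block before the rule. Here is a sketch of separating the two and checking that the rule parses as XML. The helpers `split_thought_and_rule` and `is_well_formed` are hypothetical. The check uses the standard library's `xml.etree.ElementTree` and covers syntax only, not whether the rule is correct.

```python
import re
import xml.etree.ElementTree as ET


def split_thought_and_rule(response: str) -> tuple[str, str]:
    """Return (thought, rule_xml) from a model response (hypothetical helper)."""
    match = re.search(r"<THOUGHT>(.*?)</THOUGHT>", response, flags=re.DOTALL)
    thought = match.group(1).strip() if match else ""
    # Remove the thought block so only the rule XML remains.
    rule_xml = re.sub(r"<THOUGHT>.*?</THOUGHT>", "", response, flags=re.DOTALL).strip()
    return thought, rule_xml


def is_well_formed(rule_xml: str) -> bool:
    """True if the text parses as XML; this checks syntax only."""
    try:
        ET.fromstring(rule_xml)
        return True
    except ET.ParseError:
        return False


response = """<THOUGHT>
Three tokens: inflected "be", an adverb group, an adjective group.
</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30122">
    <pattern>
        <token inflected="yes">be</token>
    </pattern>
</rule>"""
thought, rule_xml = split_thought_and_rule(response)
print(is_well_formed(rule_xml))
```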
---

In [16]:
example_four = """Ad Hoc:
really ( V.*? !did !do !have !know !think !want !wanted )
Rule Number:
30123
Correction:
$1 
Category:
Conciseness
Explanation:
Would cutting this implied modifier help strengthen the sentence?
Test Sentence:
They may also wonder whether these two people really exist. 
Corrected Test Sentence:
They may also wonder whether these two people exist.

XML Rule:"""

In [17]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_four.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30123">\n    <pattern>\n            <token>really</token>\n            <token postag="V.*" postag_regexp="yes">\n                    <exception regexp="yes">did|do|have|know|think|want|wanted</exception>\n            </token>\n    </pattern>\n    <message>Would cutting this implied modifier help strengthen the sentence?</message>\n    <suggestion><match no="2"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"2.159","WORD":true,"OUTLOOK":true}</short>\n    <example correction="exist">They may also wonder whether these two people <marker>really exist</marker>.</example>\n</rule>',
  'full_input': 'Ad Hoc:\nreally ( V.*? !did !do !have !know !think !want !wanted )\nRule Number:\n30123\nCorrection:\n$1 \nCategory:\nConciseness\nExplanation:\nWould cutting this implied modifier help strengthen the sentence?\nTest Sentence:\nThey may al

In [18]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
really ( V.*? !did !do !have !know !think !want !wanted )
Rule Number:
30123
Correction:
$1 
Category:
Conciseness
Explanation:
Would cutting this implied modifier help strengthen the sentence?
Test Sentence:
They may also wonder whether these two people really exist. 
Corrected Test Sentence:
They may also wonder whether these two people exist.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30123">
    <pattern>
            <token>really</token>
            <token postag="V.*" postag_regexp="yes">
                    <exception regexp="yes">did|do|have|know|think|want|wanted</exception>
            </token>
    </pattern>
    <message>Would cutting this implied modifier help strengthen the sentence?</message>
    <suggestion><match no="2"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"2.159","WORD":true,"OUTLOOK":true}</short>
    <example correction="exist">They may also wond

In [19]:
example_four_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_four)
example_four_response = call_gpt_with_backoff(messages=example_four_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_four_response[0])

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30123">
    <pattern>
        <token>really</token>
        <token postag="V.*" postag_regexp="yes">
            <exception regexp="yes">did|do|have|know|think|want|wanted</exception>
        </token>
    </pattern>
    <message>Would cutting this implied modifier help strengthen the sentence?</message>
    <suggestion><match no="2"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"2.159","WORD":true,"OUTLOOK":true}</short>
    <example correction="exist">They may also wonder whether these two people <marker>really exist</marker>.</example>
</rule>


Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30123">
        <pattern>
                <token>really</token>
                <token postag="V.*" postag_regexp="yes">
                        <exception regexp="yes">did|do|have|know|think|want|wanted</exception>
                </token>
        </pattern>
        <message>Would cutting this implied modifier help strengthen the sentence?</message>
        <suggestion><match no="2"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"2.159","WORD":true,"OUTLOOK":true}</short>
        <example correction="exist">They may also wonder whether these two people <marker>really exist</marker>.</example>
</rule>
```
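
The expected output above turns the `!word` entries of a group into a single `<exception>` nested inside the token. Here is a sketch of that conversion for three group shapes seen in these examples: plain word options, the `V.*?` part-of-speech regex, and the `RX(.*?)` wildcard. The helper `group_to_token` is hypothetical, and it does not cover `CT(...)`, explicit part-of-speech tags, or the `~` optional marker.

```python
def group_to_token(group: str) -> str:
    """Convert one parenthesized ad hoc group into a <token> (hypothetical helper)."""
    text = group.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]  # drop only the outer parentheses
    items = text.split()
    exceptions = [item[1:] for item in items if item.startswith("!")]
    options = [item for item in items if not item.startswith("!")]

    if options == ["V.*?"]:
        # Mapping taken from the 30123 example: V.*? becomes a postag regex.
        attrs, content = ' postag="V.*" postag_regexp="yes"', ""
    elif options == ["RX(.*?)"]:
        # Wildcard: an empty token that matches anything.
        attrs, content = "", ""
    elif len(options) == 1:
        attrs, content = "", options[0]
    else:
        attrs, content = ' regexp="yes"', "|".join(options)

    if exceptions:
        exc = '<exception regexp="yes">' + "|".join(exceptions) + "</exception>"
        return f"<token{attrs}>{content}{exc}</token>"
    if not content:
        return f"<token{attrs}/>"
    return f"<token{attrs}>{content}</token>"


print(group_to_token("( V.*? !did !do !have !know !think !want !wanted )"))
```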

---

In [20]:
example_five = """Ad Hoc:
(CT(be) and but they i he they have ) not ( generally typically usually ) ( RX(.*?) !accepted !considered !known )
Rule Number:
30124
Correction:
$0 rarely $3 @ $0 seldom $3 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They are not generally definite articles.
Corrected Test Sentence:
They are rarely definite articles.

XML Rule:"""

In [21]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_five.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30124">                                        \n        <pattern>                                \n                <or>                        \n                        <token inflected="yes">be</token>                \n                        <token regexp="yes">and|but|they|i|he|they|have</token>                \n                </or>                        \n                <token>not</token>                        \n                <token regexp="yes">generally|typically|usually</token>                        \n                <token>                        \n                        <exception regexp="yes">accepted|considered|known</exception>                \n                </token>                        \n        </pattern>                                \n        <message>Would using fewer words help tighten the sentence?</message>                                \n        <suggestion><match no="1"/> r

In [22]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
(CT(be) and but they i he they have ) not ( generally typically usually ) ( RX(.*?) !accepted !considered !known )
Rule Number:
30124
Correction:
$0 rarely $3 @ $0 seldom $3 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They are not generally definite articles.
Corrected Test Sentence:
They are rarely definite articles.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30124">                                        
        <pattern>                                
                <or>                        
                        <token inflected="yes">be</token>                
                        <token regexp="yes">and|but|they|i|he|they|have</token>                
                </or>                        
                <token>not</token>                        
                <token regexp="yes">generally|typically|usually</token>                        
                <token>           

In [23]:
example_five_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_five)
example_five_response = call_gpt_with_backoff(messages=example_five_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_five_response[0])

<THOUGHT>
The pattern consists of a group of words that includes a conjugated form of "be" or any of the words "and", "but", "they", "i", "he", "they", "have". This group is followed by the word "not" and then by any of the adverbs "generally", "typically", "usually". The last token is any word except "accepted", "considered", "known". The correction suggests replacing "not" followed by the adverbs with "rarely" or "seldom", keeping the first and last tokens intact. The explanation is about conciseness, which is reflected in the message. The test sentence and corrected test sentence show the rule in action.
</THOUGHT>
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30124">
    <pattern>
        <or>
            <token inflected="yes">be</token>
            <token regexp="yes">and|but|they|i|he|they|have</token>
        </or>
        <token>not</token>
        <token regexp="yes">generally|typically|usually</token>
        <token>
            <exception regexp="yes">accepted|consi

Expected output: 
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30124">                                        
        <pattern>                                
                <or>                        
                        <token inflected="yes">be</token>                
                        <token regexp="yes">and|but|they|i|he|they|have</token>                
                </or>                        
                <token>not</token>                        
                <token regexp="yes">generally|typically|usually</token>                        
                <token>                        
                        <exception regexp="yes">accepted|considered|known</exception>                
                </token>                        
        </pattern>                                
        <message>Would using fewer words help tighten the sentence?</message>                                
        <suggestion><match no="1"/> rarely <match no="4"/></suggestion>                                
        <suggestion><match no="1"/> seldom <match no="4"/></suggestion>                                
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"4.174","WORD":true,"OUTLOOK":true}</short>                                
        <example correction="are rarely definite|are seldom definite">They <marker>are not generally definite</marker> articles.</example>                                
</rule>
```
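
The model response above is indented with four spaces, while the expected rule uses deeper indentation and trailing spaces, so a plain string comparison would report a mismatch even when the rules agree. A minimal sketch of an indentation-insensitive comparison follows; `normalize_rule_xml` and `rules_match` are hypothetical helper names, not part of `utils`, and the comparison is still sensitive to differences inside a line.

```python
def normalize_rule_xml(xml_text: str) -> str:
    """Strip leading/trailing whitespace from every line and drop blank lines."""
    stripped = (line.strip() for line in xml_text.splitlines())
    return "\n".join(line for line in stripped if line)


def rules_match(response_xml: str, expected_xml: str) -> bool:
    """Compare two rule strings while ignoring indentation and trailing spaces."""
    return normalize_rule_xml(response_xml) == normalize_rule_xml(expected_xml)
```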
---

In [24]:
example_six = """Ad Hoc:
CT(do) not ( generally typically usually ) ( VB !give !include !take )
Rule Number:
30125
Correction:
rarely $3-$0 @ seldom $3-$0 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They do not generally require a definite article.
Corrected Test Sentence:
They rarely require a definite article.

XML Rule:"""

In [25]:
# Retrieve stored examples similar to the ad hoc pattern, which is the second line of the input.
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_six.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">\n    <pattern>\n        <marker>\n            <token inflected="yes">do</token>\n            <token>not</token>\n            <token regexp="yes">generally|typically|usually</token>\n            <token postag="VB" postag_regexp="yes">\n                <exception regexp="yes">give|include|take</exception>\n            </token>\n        </marker>\n    </pattern>\n    <message>Would using fewer words help tighten the sentence?</message>\n    <suggestion>rarely <match no="4"/></suggestion>\n    <suggestion>seldom <match no="4"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.262","WORD":true,"OUTLOOK":true}</short>\n    <example correction="They rarely require|They seldom require">They <marker>do not generally require</marker> a definite article.</example>\n</rule>',
  'full_input': 'Ad Hoc:\nCT(do) not ( generally typically 

In [26]:
# Render each retrieved example as its input followed by its expected XML, separated by "###".
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
CT(do) not ( generally typically usually ) ( VB !give !include !take )
Rule Number:
30125
Correction:
rarely $3-$0 @ seldom $3-$0 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They do not generally required a definite article.
Corrected Test Sentence:
They rarely require a definite article.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <marker>
            <token inflected="yes">do</token>
            <token>not</token>
            <token regexp="yes">generally|typically|usually</token>
            <token postag="VB" postag_regexp="yes">
                <exception regexp="yes">give|include|take</exception>
            </token>
        </marker>
    </pattern>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>rarely <match no="4"/></suggestion>
    <suggestion>seldom <match no="4"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupI

In [27]:
example_six_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_six)
example_six_response = call_gpt_with_backoff(messages=example_six_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_six_response[0])

<THOUGHT>
The ad hoc rule specifies a pattern where the verb "do" can be in any conjugated form (CT), followed by "not", and then one of the adverbs "generally", "typically", or "usually". The next token is a verb (VB) that should not be "give", "include", or "take". The correction suggests replacing the phrase with "rarely" or "seldom" followed by the verb. The category is "Conciseness", and the explanation is about using fewer words. The test sentence and corrected test sentence show how the rule should be applied. The XML rule will reflect this pattern and provide two suggestions for correction.
</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <marker>
            <token inflected="yes">do</token>
            <token>not</token>
            <token regexp="yes">generally|typically|usually</token>
            <token postag="VB" postag_regexp="yes">
                <exception regexp="yes">give|include|take</exception>
            </token>
 

Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <marker>
            <token inflected="yes">do</token>
            <token>not</token>
            <token regexp="yes">generally|typically|usually</token>
            <token postag="VB" postag_regexp="yes">
                <exception regexp="yes">give|include|take</exception>
            </token>
        </marker>
    </pattern>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>rarely <match no="4"/></suggestion>
    <suggestion>seldom <match no="4"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.262","WORD":true,"OUTLOOK":true}</short>
    <example correction="They rarely require|They seldom require">They <marker>do not generally require</marker> a definite article.</example>
</rule>
```
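
Each example in this notebook repeats the same three cells: search the index with the ad hoc line, format the hits, and call the model. The steps could be wrapped in one function, sketched below. `run_example`, `format_examples`, and `build_system_prompt` are hypothetical names; the search, message-building, and model-calling steps are passed in as callables so the sketch does not assume anything about `utils` beyond what the cells above show.

```python
from typing import Callable, Dict, List

EXAMPLE_SEPARATOR = "\n\n###\n\n"


def format_examples(examples: List[Dict[str, str]]) -> str:
    """Render each retrieved example as its input followed by its expected XML."""
    return EXAMPLE_SEPARATOR.join(
        f"{item['full_input']}\n\n{item['expected_output']}" for item in examples
    )


def build_system_prompt(system_prompt: str, examples: List[Dict[str, str]]) -> str:
    """Fill the {{dynamic_examples}} slot with the formatted examples."""
    return system_prompt.replace("{{dynamic_examples}}", format_examples(examples))


def run_example(
    example: str,
    system_prompt: str,
    search: Callable[[str], List[Dict[str, str]]],
    make_messages: Callable[[str, str], list],
    call_model: Callable[[list], object],
):
    """Run the search -> format -> call sequence for one ad hoc input."""
    ad_hoc_line = example.split("\n")[1]  # second line holds the ad hoc pattern
    examples = search(ad_hoc_line)
    prompt = build_system_prompt(system_prompt, examples)
    return call_model(make_messages(prompt, example))


# Possible wiring with the notebook's own helpers (assumed to accept the
# keyword arguments used in the cells above):
# response = run_example(
#     example_six,
#     SYSTEM_PROMPT,
#     search=lambda q: search_pinecone_index(
#         index_name=index_name, namespace=namespace,
#         search_param=q, num_results=6, threshold=0.5),
#     make_messages=generate_simple_message,
#     call_model=lambda m: call_gpt_with_backoff(
#         messages=m, model="gpt-4-1106-preview", temperature=0, max_length=1480),
# )
```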
---

In [28]:
example_seven = """Ad Hoc:
( RX(.*?) !closed !him !prohibited !time !times !used ) except when ( he i it otherwise the there they we you )
Rule Number:
30132
Correction:
$0 unless $3
Category:
Fresh Language
Explanation:
Would direct language such as <i>unless</i> convey your point just as effectively?<linebreak/><linebreak/><b>Example</b> from Justice Sotomayor: “[I]t contends that no aged-out child may retain her priority date <b>unless</b> her petition is also eligible for automatic conversion.”<linebreak/><linebreak/><b>Example</b> from Office of Legal Counsel: “The 2019 Opinion reasoned that Congress lacks constitutional authority to compel the Executive Branch . . . even when a statute vests the committee with a right to the information, <b>unless</b> the information would serve a legitimate legislative purpose.”<linebreak/><linebreak/><b>Example</b> from Morgan Chu: “During this arbitration, [Defendant] stopped paying royalties and refused to pay anything <b>unless</b> ordered to do so.”<linebreak/><linebreak/><b>Example</b> from Paul Clement: “The bottom line is that there is no preemption <b>unless</b> state law conflicts with some identifiable federal statute.”<linebreak/><linebreak/><b>Example</b> from Andy Pincus: “The law does not permit a claim for defamation <b>unless</b> the allegedly false statement has caused actual harm.”<linebreak/><linebreak/><b>Example</b> from Microsoft’s Standard Contract: “Licenses granted on a subscription basis expire at the end of the applicable subscription period set forth in the Order, <b>unless</b> renewed.”
Test Sentence:
Omit except when it is part of a name.
Corrected Test Sentence:
Omit unless it is part of a name.

XML Rule:"""

In [29]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_seven.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_DIRECT_LANGUAGE_30132">\n        <pattern>\n                <token>\n                        <exception regexp="yes">closed|him|prohibited|time|times|used</exception>\n                </token>\n                <token>except</token>\n                <token>when</token>\n                <token regexp="yes">he|i|it|otherwise|the|there|they|we|you</token>\n        </pattern>\n        <message>Would direct language such as *unless* convey your point just as effectively?|**Example** from Justice Sotomayor: “[I]t contends that no aged-out child may retain her priority date **unless** her petition is also eligible for automatic conversion.”|**Example** from Office of Legal Counsel: “The 2019 Opinion reasoned that Congress lacks constitutional authority to compel the Executive Branch . . . even when a statute vests the committee with a right to the information, **unless** the information would serve a legitimate legislative purpose

In [30]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
( RX(.*?) !closed !him !prohibited !time !times !used ) except when ( he i it otherwise the there they we you )
Rule Number:
30132
Correction:
$0 unless $3
Category:
Fresh Language
Explanation:
Would direct language such as <i>unless</i> convey your point just as effectively?<linebreak/><linebreak/><b>Example</b> from Justice Sotomayor: “[I]t contends that no aged-out child may retain her priority date <b>unless</b> her petition is also eligible for automatic conversion.”<linebreak/><linebreak/><b>Example</b> from Office of Legal Counsel: “The 2019 Opinion reasoned that Congress lacks constitutional authority to compel the Executive Branch . . . even when a statute vests the committee with a right to the information, <b>unless</b> the information would serve a legitimate legislative purpose.”<linebreak/><linebreak/><b>Example</b> from Morgan Chu: “During this arbitration, [Defendant] stopped paying royalties and refused to pay anything <b>unless</b> ordered to do so.”<linebreak

In [31]:
example_seven_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_seven)
example_seven_response = call_gpt_with_backoff(messages=example_seven_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_seven_response[0])

<rule id="{new_rule_id}" name="BRIEFCATCH_DIRECT_LANGUAGE_30132">
  <pattern>
    <token>
      <exception regexp="yes">closed|him|prohibited|time|times|used</exception>
    </token>
    <token>except</token>
    <token>when</token>
    <token regexp="yes">he|i|it|otherwise|the|there|they|we|you</token>
  </pattern>
  <message>Would direct language such as **unless** convey your point just as effectively?\n\n**Example** from Justice Sotomayor: “[I]t contends that no aged-out child may retain her priority date **unless** her petition is also eligible for automatic conversion.”\n\n**Example** from Office of Legal Counsel: “The 2019 Opinion reasoned that Congress lacks constitutional authority to compel the Executive Branch . . . even when a statute vests the committee with a right to the information, **unless** the information would serve a legitimate legislative purpose.”\n\n**Example** from Morgan Chu: “During this arbitration, [Defendant] stopped paying royalties and refused to pay an

Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_DIRECT_LANGUAGE_30132">
        <pattern>
                <token>
                        <exception regexp="yes">closed|him|prohibited|time|times|used</exception>
                </token>
                <token>except</token>
                <token>when</token>
                <token regexp="yes">he|i|it|otherwise|the|there|they|we|you</token>
        </pattern>
        <message>Would direct language such as *unless* convey your point just as effectively?|**Example** from Justice Sotomayor: “[I]t contends that no aged-out child may retain her priority date **unless** her petition is also eligible for automatic conversion.”|**Example** from Office of Legal Counsel: “The 2019 Opinion reasoned that Congress lacks constitutional authority to compel the Executive Branch . . . even when a statute vests the committee with a right to the information, **unless** the information would serve a legitimate legislative purpose.”|**Example** from Morgan Chu: “During this arbitration, [Defendant] stopped paying royalties and refused to pay anything **unless** ordered to do so.”|**Example** from Paul Clement: “The bottom line is that there is no preemption **unless** state law conflicts with some identifiable federal statute.”|**Example** from Andy Pincus: “The law does not permit a claim for defamation **unless** the allegedly false statement has caused actual harm.”|**Example** from Microsoft's Standard Contract: “Licenses granted on a subscription basis expire at the end of the applicable subscription period set forth in the Order, **unless** renewed.”</message>
        <suggestion><match no="1"/> unless <match no="4"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"4.225","WORD":true,"OUTLOOK":true}</short>
        <example correction="Omit unless it"><marker>Omit except when it</marker> is part of a name.</example>
</rule>
```
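
In this expected output the explanation's HTML has been rewritten for the `<message>` tag: `<i>` becomes `*...*`, `<b>` becomes `**...**`, and each run of `<linebreak/>` becomes a single `|`. The model response above instead emitted `**unless**` and literal `\n\n`. A minimal sketch of that conversion follows, assuming the mapping seen in this example holds; `explanation_to_message` is a hypothetical helper, the expected output for rule 3240 renders `<b>` with single asterisks, and the sketch does not normalize apostrophes.

```python
import re


def explanation_to_message(explanation: str) -> str:
    """Convert the ad hoc explanation markup to the message format (assumed mapping)."""
    # One or more consecutive <linebreak/> tags collapse to a single pipe.
    text = re.sub(r"(?:<linebreak/>)+", "|", explanation)
    # Bold first, then italics, so the two tag types never interfere.
    text = re.sub(r"<b>(.*?)</b>", r"**\1**", text)
    text = re.sub(r"<i>(.*?)</i>", r"*\1*", text)
    return text
```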
---

In [32]:
example_eight = """Ad Hoc:
SENT_START in that case , ( however though ) , ( i he if in it she there this )
Rule Number:
30136
Correction:
But $7 @ Then $6 $7 @ But then $7
Category:
Flow
Explanation:
Could shortening your opening transition add punch and help lighten the style?<linebreak/><linebreak/><b>Example</b> from Chief Justice Roberts: “<b>But</b> that argument . . . confuses mootness with the merits.”
Test Sentence:
In that case, however, this subtitle should tell you.
Corrected Test Sentence:
But this subtitle should tell you.

XML Rule:"""

In [33]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_eight.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="BRIEFCATCH_145346392105646606287940325719406917958" name="BRIEFCATCH_FLOW_30136">\n    <pattern>\n        <token postag="SENT_START"/>\n        <marker>\n            <token>in</token>\n            <token>that</token>\n            <token>case</token>\n            <token>,</token>\n            <token regexp="yes">however|though</token>\n            <token>,</token>\n            <token regexp="yes">he|i|if|in|it|she|there|this</token>\n        </marker>\n    </pattern>\n    <message>Could shortening your opening transition add punch and help lighten the style?|**Example** from Chief Justice Roberts: “**But** that argument . . . confuses mootness with the merits.”</message>\n    <suggestion>But <match no="8"/></suggestion>\n    <suggestion>Then<match no="7"/> <match no="8"/></suggestion>\n    <suggestion>But then <match no="8"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"8.

In [34]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
SENT_START in that case , ( however though ) , ( i he if in it she there this )
Rule Number:
30136
Correction:
But $7 @ Then $6 $7 @ But then $7
Category:
Flow
Explanation:
Could shortening your opening transition add punch and help lighten the style?<linebreak/><linebreak/><b>Example</b> from Chief Justice Roberts: “<b>But</b> that argument . . . confuses mootness with the merits.”
Test Sentence:
In that case, however, this subtitle should tell you.
Corrected Test Sentence:
But this subtitle should tell you.

XML Rule:

<rule id="BRIEFCATCH_145346392105646606287940325719406917958" name="BRIEFCATCH_FLOW_30136">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>in</token>
            <token>that</token>
            <token>case</token>
            <token>,</token>
            <token regexp="yes">however|though</token>
            <token>,</token>
            <token regexp="yes">he|i|if|in|it|she|there|this</token>
        </marker>
    </patte

In [35]:
example_eight_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_eight)
example_eight_response = call_gpt_with_backoff(messages=example_eight_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_eight_response[0])

<THOUGHT>The rule starts with SENT_START which means the pattern should begin with a token that has the postag="SENT_START". Following this, the pattern includes a sequence of tokens that are part of the phrase "in that case," followed by a choice between "however" and "though," and then another choice between pronouns and the word "if." Since "however" and "though" are in the same parentheses, they will be combined into a single token with regexp="yes". The same applies to the pronouns and "if." The correction offers three alternatives, which means we will have three suggestion tags. The explanation contains HTML tags that need to be converted to markdown for the message tag. The example will be included with the correction options separated by a pipe symbol "|".</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_FLOW_30136">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>in</token>
            <token>that</token>
            <token>case</toke

Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_FLOW_30136">
        <pattern>
                <token postag="SENT_START"/>
                <marker>
                        <token>in</token>
                        <token>that</token>
                        <token>case</token>
                        <token>,</token>
                        <token regexp="yes">however|though</token>
                        <token>,</token>
                        <token regexp="yes">he|i|if|in|it|she|there|this</token>
                </marker>
        </pattern>
        <message>Could shortening your opening transition add punch and help lighten the style?|**Example** from Chief Justice Roberts: “**But** that argument . . . confuses mootness with the merits.”</message>
        <suggestion>But <match no="8"/></suggestion>
        <suggestion>Then<match no="7"/> <match no="8"/></suggestion>
        <suggestion>But then <match no="8"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"8.252","WORD":true,"OUTLOOK":true}</short>
        <example correction="But this|Then, this|But then this"><marker>In that case, however, this</marker> subtitle should tell you.</example>
</rule>
```
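
The response for this example opens with a `<THOUGHT>` block before the rule, so the rule has to be separated out before it can be compared or saved. A minimal sketch, assuming the response holds at most one `<rule>...</rule>` element; `extract_rule` is a hypothetical helper, and it returns `None` when no complete rule is present.

```python
import re
from typing import Optional

# Non-greedy match from the opening <rule ...> tag to the first closing </rule>.
RULE_PATTERN = re.compile(r"<rule\b.*?</rule>", re.DOTALL)


def extract_rule(response_text: str) -> Optional[str]:
    """Return the first complete <rule> element in the response, or None."""
    match = RULE_PATTERN.search(response_text)
    return match.group(0) if match else None
```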
---

In [36]:
example_nine = """Ad Hoc:
( RX(.*?) !for !in !on !that !through !to !with ) the use of the ( RX(.*?) !band !land !phrase !verb !word !words )
Rule Number:
30156
Correction:
$0 using $4 $5
Category:
Conciseness
Explanation:
Would using fewer words and cutting the <i>of</i> phrase help tighten the sentence?
Test Sentence:
But the use of the dictionary is wrong.
Corrected Test Sentence:
But using the dictionary is wrong.

XML Rule:"""

In [37]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_nine.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_PUNCHINESS_30156">\n    <pattern>\n            <token>\n                    <exception regexp="yes">for|in|on|that|through|to|with</exception>\n            </token>\n            <token>the</token>\n            <token>use</token>\n            <token>of</token>\n            <token>the</token>\n            <token>\n                    <exception regexp="yes">band|land|phrase|verb|word|words</exception>\n            </token>\n    </pattern>\n    <message>Would using fewer words and cutting the *of* phrase help tighten the sentence?</message>\n    <suggestion><match no="1"/> using <match no="5"/> <match no="6"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"6.286","WORD":true,"OUTLOOK":true}</short>\n    <example correction="But using the dictionary"><marker>But the use of the dictionary</marker> is wrong.</example>\n</rule>',
  'full_input': 'Ad

In [38]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
( RX(.*?) !for !in !on !that !through !to !with ) the use of the ( RX(.*?) !band !land !phrase !verb !word !words )
Rule Number:
30156
Correction:
$0 using $4 $5
Category:
Conciseness
Explanation:
Would using fewer words and cutting the <i>of</i> phrase help tighten the sentence?
Test Sentence:
But the use of the dictionary is wrong.
Corrected Test Sentence:
But using the dictionary is wrong.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_PUNCHINESS_30156">
    <pattern>
            <token>
                    <exception regexp="yes">for|in|on|that|through|to|with</exception>
            </token>
            <token>the</token>
            <token>use</token>
            <token>of</token>
            <token>the</token>
            <token>
                    <exception regexp="yes">band|land|phrase|verb|word|words</exception>
            </token>
    </pattern>
    <message>Would using fewer words and cutting the *of* phrase help tighten the sentence?</message>
    <sugges

In [39]:
example_nine_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_nine)
example_nine_response = call_gpt_with_backoff(messages=example_nine_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_nine_response[0])

<THOUGHT>The ad hoc rule provided requires the creation of a pattern that matches a phrase starting with any word except the listed prepositions, followed by "the use of the" and another word that is not in the list of exceptions like "band", "land", etc. The correction suggests replacing "the use of the" with "using". The explanation provided needs to be converted to markdown for the message tag. The test sentence and corrected test sentence provide a clear example of how the rule should be applied. Now, I will construct the XML rule accordingly.</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_PUNCHINESS_30156">
    <pattern>
        <token>
            <exception regexp="yes">for|in|on|that|through|to|with</exception>
        </token>
        <token>the</token>
        <token>use</token>
        <token>of</token>
        <token>the</token>
        <token>
            <exception regexp="yes">band|land|phrase|verb|word|words</exception>
        </token>
    </pattern>
    <message

Expected output:
```
<rule id="{new_rule_id}" name="BRIEFCATCH_PUNCHINESS_30156">
        <pattern>
                <token>
                        <exception regexp="yes">for|in|on|that|through|to|with</exception>
                </token>
                <token>the</token>
                <token>use</token>
                <token>of</token>
                <token>the</token>
                <token>
                        <exception regexp="yes">band|land|phrase|verb|word|words</exception>
                </token>
        </pattern>
        <message>Would using fewer words and cutting the *of* phrase help tighten the sentence?</message>
        <suggestion><match no="1"/> using <match no="5"/> <match no="6"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"6.286","WORD":true,"OUTLOOK":true}</short>
        <example correction="But using the dictionary"><marker>But the use of the dictionary</marker> is wrong.</example>
</rule>
```
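
The correction `$0 using $4 $5` becomes `<match no="1"/> using <match no="5"/> <match no="6"/>` in the expected rule: the ad hoc references are zero-based, the `<match>` numbers are one-based, and `@` separates alternative suggestions. A minimal sketch of that mapping, inferred from the examples in this notebook rather than from a specification; `correction_to_suggestions` is a hypothetical helper, and it does not handle forms such as `$3-$0` (rule 30125) or the spacing around punctuation tokens (rule 30136).

```python
import re
from typing import List


def correction_to_suggestions(correction: str) -> List[str]:
    """Turn an ad hoc correction string into <suggestion> elements (assumed mapping)."""
    suggestions = []
    for alternative in correction.split("@"):
        # Shift each zero-based $N reference to a one-based <match no="N+1"/>.
        body = re.sub(
            r"\$(\d+)",
            lambda m: f'<match no="{int(m.group(1)) + 1}"/>',
            alternative.strip(),
        )
        suggestions.append(f"<suggestion>{body}</suggestion>")
    return suggestions
```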
---

In [40]:
example_ten = """Ad Hoc:
( CT(be) and ) a bit ( JJ.*? more !much !of )
Rule Number:
30115
Correction:
$0 $3
Category:
Conciseness
Explanation:
Would cutting <i>a bit</i> help tighten the sentence?
Test Sentence:
The book does this and a bit more. 
Corrected Test Sentence:
The book does this and more. 

XML Rule:"""

In [41]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_ten.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="BRIEFCATCH_164054315699492609263729987293589324728" name="BRIEFCATCH_CONCISENESS_30115">\n        <pattern>\n            <or>\n                <token inflected="yes">be</token>\n                <token>and</token>\n            </or>\n            <token>a</token>\n            <token>bit</token>\n            <token postag="JJ.*" postag_regexp="yes">\n                <exception regexp="yes">more|much|of</exception>\n            </token>\n        </pattern>\n        <message>Would cutting *a bit* help tighten the sentence?</message>\n        <suggestion><match no="1"/> <match no="4"/></suggestion>\n        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"4.174","WORD":true,"OUTLOOK":true}</short>\n        <example correction="and more.">The book does this <marker>and a bit more</marker>.</example>\n</rule>',
  'full_input': 'Ad Hoc:\n( CT(be) and ) a bit ( JJ.*? more !much !of )\nRule Number:\n3011

In [42]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
( CT(be) and ) a bit ( JJ.*? more !much !of )
Rule Number:
30115
Correction:
$0 $3
Category:
Conciseness
Explanation:
Would cutting <i>a bit</i> help tighten the sentence?
Test Sentence:
The book does this and a bit more. 
Corrected Test Sentence:
The book does this and more. 

XML Rule:

<rule id="BRIEFCATCH_164054315699492609263729987293589324728" name="BRIEFCATCH_CONCISENESS_30115">
        <pattern>
            <or>
                <token inflected="yes">be</token>
                <token>and</token>
            </or>
            <token>a</token>
            <token>bit</token>
            <token postag="JJ.*" postag_regexp="yes">
                <exception regexp="yes">more|much|of</exception>
            </token>
        </pattern>
        <message>Would cutting *a bit* help tighten the sentence?</message>
        <suggestion><match no="1"/> <match no="4"/></suggestion>
        <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":

In [43]:
example_ten_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_ten)
example_ten_response = call_gpt_with_backoff(messages=example_ten_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_ten_response[0])

<THOUGHT>
The ad hoc rule starts with a conjugatable form of "be" (CT(be)) or the word "and", followed by the phrase "a bit" and then an adjective (JJ.*?), but not the words "more", "much", or "of". The correction suggests removing "a bit" from the sentence, which is represented by $0 $3, where $0 is the first token (CT(be) or "and") and $3 is the adjective. The explanation needs to be converted from HTML to markdown. The pattern will use an <or> tag for the first token and an exception tag for the adjective to exclude "more", "much", and "of".
</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30115">
    <pattern>
        <or>
            <token inflected="yes">be</token>
            <token>and</token>
        </or>
        <token>a</token>
        <token>bit</token>
        <token postag="JJ.*" postag_regexp="yes">
            <exception regexp="yes">more|much|of</exception>
        </token>
    </pattern>
    <message>Would cutting *a bit* help tighten the sentence?<

Expected output:
```
<rule id="BRIEFCATCH_164054315699492609263729987293589324728" name="BRIEFCATCH_CONCISENESS_30115">
    <pattern>
        <or>
                <token inflected="yes">be</token>
                <token>and</token>
        </or>
        <token>a</token>
        <token>bit</token>
        <token postag="JJ.*" postag_regexp="yes">
                <exception regexp="yes">more|much|of</exception>
        </token>
    </pattern>
    <message>Would cutting *a bit* help tighten the sentence?</message>
    <suggestion><match no="1"/> <match no="4"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":1,"priority":"4.174","WORD":true,"OUTLOOK":true}</short>
    <example correction="and more.">The book does this <marker>and a bit more</marker>.</example>
</rule>
```
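
In every expected rule shown here, the `correctionCount` field inside `<short>` equals the number of `<suggestion>` elements. That gives a cheap consistency check on a generated rule, sketched below with the standard library; `check_correction_count` is a hypothetical helper, and `ET.fromstring` raises `ParseError` if the rule is not well-formed XML.

```python
import json
import xml.etree.ElementTree as ET


def check_correction_count(rule_xml: str) -> bool:
    """Return True if <short>'s correctionCount equals the number of <suggestion> tags."""
    rule = ET.fromstring(rule_xml)  # raises ET.ParseError on malformed XML
    suggestions = rule.findall("suggestion")
    short = json.loads(rule.findtext("short"))
    return short["correctionCount"] == len(suggestions)
```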
---

In [44]:
example_eleven = """Ad Hoc:
a ( sudden ~ ) surprise move
Rule Number:
3240
Correction:
a surprise @ a move @ surprising @ unexpected
Category:
Fresh Language
Explanation:
<b>A surprise move</b> is a cliché. Could direct language convey your point just as effectively?
Test Sentence:
She made a sudden surprise move. 
Corrected Test Sentence:
She made a surprise.

XML Rule:"""

In [45]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_eleven.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="BRIEFCATCH_4496626169111403644393793089759868674587" name="BRIEFCATCH_FRESH_LANGUAGE_3240">\n    <pattern>\n        <token>a</token>\n        <token min="0">sudden</token>\n        <token>surprise</token>\n        <token>move</token>\n    </pattern>\n    <message>*A surprise move* is a cliché. Could direct language convey your point just as effectively?</message>\n    <suggestion>a surprise</suggestion>\n    <suggestion>a move</suggestion>\n    <suggestion>surprising</suggestion>\n    <suggestion>unexpected</suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"5.0","WORD":true,"OUTLOOK":true}</short>\n    <example correction="a surprise|a move|surprising|unexpected">She made <marker>a sudden surprise move</marker>.</example>\n</rule>',
  'full_input': 'Ad Hoc:\na ( sudden ~ ) surprise move\nRule Number:\n3240\nCorrection:\na surprise @ a move @ surprising @ unexpected\nCategory:\n

In [46]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
a ( sudden ~ ) surprise move
Rule Number:
3240
Correction:
a surprise @ a move @ surprising @ unexpected
Category:
Fresh Language
Explanation:
<b>A surprise move</b> is a cliché. Could direct language convey your point just as effectively?
Test Sentence:
She made a sudden surprise move. 
Corrected Test Sentence:
She made a surprise.

XML Rule:

<rule id="BRIEFCATCH_4496626169111403644393793089759868674587" name="BRIEFCATCH_FRESH_LANGUAGE_3240">
    <pattern>
        <token>a</token>
        <token min="0">sudden</token>
        <token>surprise</token>
        <token>move</token>
    </pattern>
    <message>*A surprise move* is a cliché. Could direct language convey your point just as effectively?</message>
    <suggestion>a surprise</suggestion>
    <suggestion>a move</suggestion>
    <suggestion>surprising</suggestion>
    <suggestion>unexpected</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"5.0",

In [47]:
example_eleven_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_eleven)
example_eleven_response = call_gpt_with_backoff(messages=example_eleven_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_eleven_response[0])

<THOUGHT>First, I'll create the pattern for the rule, which includes the word "a" followed by an optional "sudden", then "surprise" and "move". The "~" symbol after "sudden" indicates that it is optional, so I'll use the "min" attribute with a value of "0" for that token. Next, I'll translate the explanation into a message, converting the HTML bold tags to markdown asterisks. The correction section will include the alternatives separated by the "@" symbol. The example will show the incorrect usage with the marker tags around the phrase to be corrected and the corrected alternatives. Since "sudden" is optional, it will not appear in the corrected sentence if it's not part of the match. Lastly, I'll assign the rule ID and name based on the provided rule number and category.</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_FRESH_LANGUAGE_3240">
    <pattern>
        <token>a</token>
        <token min="0">sudden</token>
        <token>surprise</token>
        <token>move</token>
    <

Expected output:
```
<rule id="BRIEFCATCH_4496626169111403644393793089759868674587" name="BRIEFCATCH_FRESH_LANGUAGE_3240">
    <pattern>
        <token>a</token>
        <token min="0">sudden</token>
        <token>surprise</token>
        <token>move</token>
    </pattern>
    <message>*A surprise move* is a cliché. Could direct language convey your point just as effectively?</message>
    <suggestion>a surprise</suggestion>
    <suggestion>a move</suggestion>
    <suggestion>surprising</suggestion>
    <suggestion>unexpected</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":4,"priority":"5.0","WORD":true,"OUTLOOK":true}</short>
    <example correction="a surprise|a move|surprising|unexpected">She made <marker>a sudden surprise move</marker>.</example>
</rule>
```
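
The response above differs from the expected output only in its `id` placeholder (`{new_rule_id}`) and its leading `<THOUGHT>` block, so a plain string comparison would report a mismatch. The following is a minimal sketch of a structural comparison, assuming both sides are well-formed XML. The helper names `extract_rule`, `normalize`, and `rules_match` are hypothetical and are not part of `utils.utils`.

```python
import re
import xml.etree.ElementTree as ET


def extract_rule(text: str) -> str:
    """Return the first <rule>...</rule> block, skipping any <THOUGHT> preamble."""
    match = re.search(r"<rule\b.*?</rule>", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <rule> element found")
    return match.group(0)


def normalize(elem: ET.Element) -> tuple:
    """Reduce an element to a comparable tuple, ignoring indentation."""
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(normalize(child) for child in elem)
    return (elem.tag, attrs, text, tail, children)


def rules_match(generated: str, expected: str) -> bool:
    """Compare two rules structurally, ignoring the rule id (assumption: ids may differ)."""
    gen = ET.fromstring(extract_rule(generated))
    exp = ET.fromstring(extract_rule(expected))
    gen.attrib.pop("id", None)
    exp.attrib.pop("id", None)
    return normalize(gen) == normalize(exp)
```

Because leading and trailing whitespace is stripped from every text node, this check is lenient about spacing around `<marker>` and `<match>` elements; a stricter evaluation would need to keep that whitespace.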
---

In [48]:
example_twelve = """Ad Hoc:
CT(do) not ( generally typically usually ) ( VB !give !include !take )
Rule Number:
30125
Correction:
rarely $3-$0 @ seldom $3-$0
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They do not generally require a definite article.
Corrected Test Sentence:
They rarely require a definite article.

XML Rule:"""

In [49]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_twelve.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">\n    <pattern>\n        <marker>\n            <token inflected="yes">do</token>\n            <token>not</token>\n            <token regexp="yes">generally|typically|usually</token>\n            <token postag="VB" postag_regexp="yes">\n                <exception regexp="yes">give|include|take</exception>\n            </token>\n        </marker>\n    </pattern>\n    <message>Would using fewer words help tighten the sentence?</message>\n    <suggestion>rarely <match no="4"/></suggestion>\n    <suggestion>seldom <match no="4"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"5.262","WORD":true,"OUTLOOK":true}</short>\n    <example correction="They rarely require|They seldom require">They <marker>do not generally require</marker> a definite article.</example>\n</rule>',
  'full_input': 'Ad Hoc:\nCT(do) not ( generally typically 

In [50]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
CT(do) not ( generally typically usually ) ( VB !give !include !take )
Rule Number:
30125
Correction:
rarely $3-$0 @ seldom $3-$0 
Category:
Conciseness
Explanation:
Would using fewer words help tighten the sentence?
Test Sentence:
They do not generally required a definite article.
Corrected Test Sentence:
They rarely require a definite article.

XML Rule:

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <marker>
            <token inflected="yes">do</token>
            <token>not</token>
            <token regexp="yes">generally|typically|usually</token>
            <token postag="VB" postag_regexp="yes">
                <exception regexp="yes">give|include|take</exception>
            </token>
        </marker>
    </pattern>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>rarely <match no="4"/></suggestion>
    <suggestion>seldom <match no="4"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupI

In [51]:
example_twelve_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_twelve)
example_twelve_response = call_gpt_with_backoff(messages=example_twelve_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_twelve_response[0])

<THOUGHT>
The ad hoc rule specifies a pattern where the conjugated form of "do" is followed by "not" and one of the adverbs "generally", "typically", or "usually". The next token is a verb that is not "give", "include", or "take". The correction suggests replacing the phrase with "rarely" or "seldom" followed by the verb. The explanation will be converted to a message in the XML. The test sentence and corrected test sentence provide context for the example in the XML. The exception tag will be used for the verbs "give", "include", and "take".
</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <marker>
            <token inflected="yes">do</token>
            <token>not</token>
            <token regexp="yes">generally|typically|usually</token>
            <token postag="VB" postag_regexp="yes">
                <exception regexp="yes">give|include|take</exception>
            </token>
        </marker>
    </pattern>
    <message>Would using 

Expected output:
```
<rule id="BRIEFCATCH_263762808715424542820983160320978225970" name="BRIEFCATCH_CONCISENESS_30125">
    <pattern>
        <token inflected="yes">do</token>
        <token>not</token>
        <token regexp="yes">generally|typically|usually</token>
        <token postag="VB">
            <exception regexp="yes">give|include|take</exception>
        </token>
    </pattern>
    <filter class="org.languagetool.rules.en.AdvancedSynthesizerFilter" args="lemmaFrom:4 lemmaSelect:V.* postagFrom:1 postagSelect:V.*"/>
    <message>Would using fewer words help tighten the sentence?</message>
    <suggestion>rarely {suggestion}</suggestion>
    <suggestion>seldom {suggestion}</suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"4.174","WORD":true,"OUTLOOK":true}</short>
    <example correction="rarely require|seldom require">They <marker>do not generally require</marker> a definite article.</example>
</rule>
```
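
In this example the model emitted `<match no="4"/>`, while the expected rule uses an `AdvancedSynthesizerFilter` and a `{suggestion}` placeholder. Judging only from this one example, `$3-$0` appears to mean "the lemma of token 3, inflected like token 0", with the ad hoc indices being 0-based and the filter arguments 1-based. The sketch below encodes that reading. The function name `build_synth_rule_parts` is hypothetical, and the `V.*` selectors are copied from the expected output rather than derived from the input.

```python
import re
from typing import List, Tuple

# Matches references such as "$3-$0" (assumed meaning: lemma of $3, postag of $0).
SYNTH_REF = re.compile(r"\$(\d+)-\$(\d+)")
PLACEHOLDER = "{suggestion}"


def build_synth_rule_parts(correction: str) -> Tuple[str, List[str]]:
    """Build the <filter> tag and <suggestion> tags from an ad hoc correction line."""
    refs = set(SYNTH_REF.findall(correction))
    if len(refs) != 1:
        raise ValueError("expected exactly one distinct $A-$B reference")
    lemma_idx, postag_idx = (int(n) for n in refs.pop())

    # Ad hoc indices are assumed 0-based; the filter arguments are 1-based.
    filter_tag = (
        '<filter class="org.languagetool.rules.en.AdvancedSynthesizerFilter" '
        f'args="lemmaFrom:{lemma_idx + 1} lemmaSelect:V.* '
        f'postagFrom:{postag_idx + 1} postagSelect:V.*"/>'
    )

    # Alternatives are separated by "@" in the ad hoc correction line.
    suggestions = [
        "<suggestion>" + SYNTH_REF.sub(PLACEHOLDER, part.strip()) + "</suggestion>"
        for part in correction.split("@")
    ]
    return filter_tag, suggestions


filter_tag, suggestions = build_synth_rule_parts("rarely $3-$0 @ seldom $3-$0")
print(filter_tag)
print("\n".join(suggestions))
```

For the correction line of rule 30125 this reproduces the `<filter>` and `<suggestion>` lines of the expected output. Whether the same mapping holds for other rules would need to be checked against more examples.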
---

In [52]:
example_thirteen = """Ad Hoc:
SENT_START in that case , ( however though ) , ( i he if in it she there this )
Rule Number:
30136
Correction:
But $7 @ Then $6 $7 @ But then $7
Category:
Flow
Explanation:
Could shortening your opening transition add punch and help lighten the style?<linebreak/><linebreak/><b>Example</b> from Chief Justice Roberts: “<b>But</b> that argument . . . confuses mootness with the merits.”
Test Sentence:
In that case, however, this subtitle should tell you.
Corrected Test Sentence:
But this subtitle should tell you.

XML Rule:"""

In [53]:
relevant_examples = search_pinecone_index(
    index_name=index_name,
    namespace=namespace,
    search_param=example_thirteen.split("\n")[1],
    num_results=6,
    threshold=0.5
)
relevant_examples

[{'expected_output': '<rule id="BRIEFCATCH_145346392105646606287940325719406917958" name="BRIEFCATCH_FLOW_30136">\n    <pattern>\n        <token postag="SENT_START"/>\n        <marker>\n            <token>in</token>\n            <token>that</token>\n            <token>case</token>\n            <token>,</token>\n            <token regexp="yes">however|though</token>\n            <token>,</token>\n            <token regexp="yes">he|i|if|in|it|she|there|this</token>\n        </marker>\n    </pattern>\n    <message>Could shortening your opening transition add punch and help lighten the style?|**Example** from Chief Justice Roberts: “**But** that argument . . . confuses mootness with the merits.”</message>\n    <suggestion>But <match no="8"/></suggestion>\n    <suggestion>Then<match no="7"/> <match no="8"/></suggestion>\n    <suggestion>But then <match no="8"/></suggestion>\n    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"8.

In [54]:
formatted_examples_string = "\n\n###\n\n".join(f"{item['full_input']}\n\n{item['expected_output']}" for item in relevant_examples)
print(formatted_examples_string)

Ad Hoc:
SENT_START in that case , ( however though ) , ( i he if in it she there this )
Rule Number:
30136
Correction:
But $7 @ Then $6 $7 @ But then $7
Category:
Flow
Explanation:
Could shortening your opening transition add punch and help lighten the style?<linebreak/><linebreak/><b>Example</b> from Chief Justice Roberts: “<b>But</b> that argument . . . confuses mootness with the merits.”
Test Sentence:
In that case, however, this subtitle should tell you.
Corrected Test Sentence:
But this subtitle should tell you.

XML Rule:

<rule id="BRIEFCATCH_145346392105646606287940325719406917958" name="BRIEFCATCH_FLOW_30136">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>in</token>
            <token>that</token>
            <token>case</token>
            <token>,</token>
            <token regexp="yes">however|though</token>
            <token>,</token>
            <token regexp="yes">he|i|if|in|it|she|there|this</token>
        </marker>
    </patte

In [55]:
example_thirteen_messages = generate_simple_message(SYSTEM_PROMPT.replace("{{dynamic_examples}}", formatted_examples_string), example_thirteen)
example_thirteen_response = call_gpt_with_backoff(messages=example_thirteen_messages, model="gpt-4-1106-preview", temperature=0, max_length=1480)
print(example_thirteen_response[0])

<THOUGHT>First, I'll start by creating the rule ID and name using the provided rule number. Next, I'll construct the pattern. Since the ad hoc rule starts with SENT_START, I'll include a marker tag after the SENT_START token. The pattern will include a sequence of tokens that match the ad hoc rule, with the words "in that case," followed by a comma, and then a choice between "however" and "though" (using regexp), another comma, and finally a choice between several pronouns and the word "if" (also using regexp). The correction section will have three suggestions, corresponding to the three options provided, using match groups to refer to the correct tokens. The message will be adapted from the explanation, converting HTML to markdown. Lastly, I'll create an example that shows the correction in action.</THOUGHT>

<rule id="{new_rule_id}" name="BRIEFCATCH_FLOW_30136">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>in</token>
            <token>that<

Expected output:
```
<rule id="BRIEFCATCH_145346392105646606287940325719406917958" name="BRIEFCATCH_FLOW_30136">
    <pattern>
        <token postag="SENT_START"/>
        <marker>
            <token>in</token>
            <token>that</token>
            <token>case</token>
            <token>,</token>
            <token regexp="yes">however|though</token>
            <token>,</token>
            <token regexp="yes">he|i|if|in|it|she|there|this</token>
        </marker>
    </pattern>
    <message>Could shortening your opening transition add punch and help lighten the style?|**Example** from Chief Justice Roberts: “**But** that argument . . . confuses mootness with the merits.”</message>
    <suggestion>But <match no="8"/></suggestion>
    <suggestion>Then<match no="7"/> <match no="8"/></suggestion>
    <suggestion>But then <match no="8"/></suggestion>
    <short>{"ruleGroup":null,"ruleGroupIdx":0,"isConsistency":false,"isStyle":true,"correctionCount":3,"priority":"8.252","WORD":true,"OUTLOOK":true}</short>
    <example correction="But this|Then, this|But then this"><marker>In that case, however, this</marker> subtitle should tell you.</example>
</rule>
```
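
The expected rule maps the ad hoc references `$6` and `$7` to `<match no="7"/>` and `<match no="8"/>`, which suggests that ad hoc indices are 0-based (with `SENT_START` at position 0) while `<match no>` is 1-based. A minimal sketch of that translation follows; the function name `corrections_to_suggestions` is hypothetical, and the off-by-one rule is inferred from this example only.

```python
import re
from typing import List


def corrections_to_suggestions(correction: str) -> List[str]:
    """Turn an ad hoc correction line into <suggestion> tags."""
    suggestions = []
    # Alternatives are separated by "@" in the ad hoc format.
    for part in correction.split("@"):
        # Shift each $N reference by one to get the 1-based match number.
        body = re.sub(
            r"\$(\d+)",
            lambda m: f'<match no="{int(m.group(1)) + 1}"/>',
            part.strip(),
        )
        suggestions.append(f"<suggestion>{body}</suggestion>")
    return suggestions


for line in corrections_to_suggestions("But $7 @ Then $6 $7 @ But then $7"):
    print(line)
```

The sketch keeps the space before every reference, so the second suggestion comes out as `Then <match no="7"/> <match no="8"/>`. The expected output drops that space because token 7 is a comma, which this sketch does not handle.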
---