In [1]:
system_prompt = """
# Task
You are a system focused on making sure the XML rule <pattern> and <antipattern> tags match with the <example> tags. 

# Pattern -> Example rules
(1) The <example> tags that correspond to the <pattern> incorporate the suggestion as a `correction` field and surround the part of the sentence that matches the pattern with <marker>...</marker> tags.
  (1.a) The <marker> tags **must** surround the full pattern of the rule
(2) There **must** be an example for the pattern

# Antipattern -> Example rules
(1) The <example> tags that correspond to the <antipattern> do not contain `correction` fields or <marker>...</marker> tags; they just have an example sentence that includes a match for the <antipattern>
(2) A valid rule has only ONE <example> *per* <antipattern>. If a rule has three antipatterns, it needs three examples to be valid. The 1:1 ratio is crucial.


Here are some examples of how <pattern> and <antipattern> tags match <example> tags:


# Pattern Matching Examples
Pattern:
<pattern>
    <token inflected="yes">ascertain<exception>ascertaining</exception></token>
</pattern>
<suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">determine</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">establish</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">discover</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">find</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">figure</match> out</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">decide</match></suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">arrive</match> at</suggestion>
    <suggestion><match no="1" postag="(V.*)" postag_regexp="yes" postag_replace="$1">learn</match> of</suggestion>

Matching example:
<example correction="determined|learned|learnt|established|discovered|found out|figured out|decided|arrived at|learned of|learnt of">She <marker>ascertained</marker> the item's whereabouts.</example>


# Antipattern Matching Examples
--------
Antipattern:
<antipattern>
    <token regexp="yes">can|could|shall|should</token>
    <token>ascertain</token>
</antipattern>

Matching example:
<example>We can ascertain their intent from the examples provided</example>
--------
Antipattern:
<antipattern>
    <token inflected="yes">ascertain<exception>ascertaining</exception></token>
    <token>the</token>
    <token>citizenship</token>
</antipattern>

Matching example:
<example>To ascertain the citizenship.</example>
--------


Given the input rule, and following all instructions, write any missing examples that are needed to make it a valid rule. Think through which example tags are needed to complete the rule. You should respond in JSON format. Specifically, there should be (1) a string field `thought` where you show your reasoning about how to write a valid example XML; and (2) an array field `suggestions` where each item is an example tag.
"""

In [2]:
xml = """
<rule id="BRIEFCATCH_63718407441811696862639906069587787980" name="BRIEFCATCH_PUNCHINESS_673.2">
    <antipattern>
        <token min="0" regexp="yes">almost|virtually|following</token>
        <token>immediately</token>
        <token regexp="yes">after|thereafter</token>
        <token regexp="yes">\.|,|;|birth|his|its|the|their|world</token>
    </antipattern>
    <antipattern>
        <token>immediately</token>
        <token>after</token>
        <token postag="DT"/>
        <token regexp="yes">accident|acquisition|acute|attack|change|death|disaster|distribution|earthquake|election|end|event|exchange|fire|first|high|incident|initial|injury|last|major|meal|ownership|period|second|single|transfer|war</token>
    </antipattern>
    <pattern>
        <token min="0" regexp="yes">almost|virtually|following</token>
        <token>immediately</token>
        <token regexp="yes">after|thereafter<exception regexp="yes">&months;|&abbrevMonths;</exception><exception postag="CD"/></token>
    </pattern>
    <message>Would shorter words add punch?|**Example** from Justice Kagan: "The first step of the Government's argument derives from §7703(b)(2)'s second sentence. **Right after** stating that . . . ."|**Example** from Justice Breyer: "The case arose under . . . a statutory provision that Congress enacted **just after** the Civil War . . . to protect the rights of black citizens."</message>
    <suggestion>just after</suggestion>
    <suggestion>right after</suggestion>
    <short>{"ruleGroup":"BRIEFCATCH_PUNCHINESS_673","ruleGroupIdx":2,"isConsistency":false,"isStyle":true,"correctionCount":2,"priority":"3.321","WORD":true,"OUTLOOK":true}</short>
</rule>
"""

In [3]:
from dotenv import load_dotenv
import openai
import os

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

In [4]:
from utils.utils import call_gpt_with_backoff, generate_simple_message
import json

message = generate_simple_message(system_prompt, xml)
resp, usage = call_gpt_with_backoff(
    message,
    response_format="json_object",
    model="gpt-4-0125-preview",
)
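`call_gpt_with_backoff` comes from the project's own `utils` module, and its source is not shown here. As a rough illustration of the retry-with-exponential-backoff idea its name suggests, here is a generic sketch. The function `call_with_backoff` and its parameters are hypothetical and are not the project's implementation.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry.

    Hypothetical helper: `sleep` is injectable so the wait can be stubbed in tests.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; small jitter avoids synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

A real wrapper would likely catch only rate-limit or transient network errors rather than every exception; the broad `except` here keeps the sketch independent of any particular client library.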


In [5]:
try:
    suggestions = json.loads(resp)["suggestions"]
except (json.JSONDecodeError, KeyError):  # malformed JSON or missing field
    suggestions = []  # keep `suggestions` defined so later cells still run
    print("bad json")

In [18]:
suggestion_section = "\n" + "\n".join(["    " + s for s in suggestions])

In [19]:
# Insert the generated examples just before the closing </rule> tag.
end_of_xml_ix = xml.rfind("</rule>")
xml_out = xml[:end_of_xml_ix].rstrip() + suggestion_section + "\n" + xml[end_of_xml_ix:]

In [20]:
print(xml_out)


<rule id="BRIEFCATCH_63718407441811696862639906069587787980" name="BRIEFCATCH_PUNCHINESS_673.2">
    <antipattern>
        <token min="0" regexp="yes">almost|virtually|following</token>
        <token>immediately</token>
        <token regexp="yes">after|thereafter</token>
        <token regexp="yes">\.|,|;|birth|his|its|the|their|world</token>
    </antipattern>
    <antipattern>
        <token>immediately</token>
        <token>after</token>
        <token postag="DT"/>
        <token regexp="yes">accident|acquisition|acute|attack|change|death|disaster|distribution|earthquake|election|end|event|exchange|fire|first|high|incident|initial|injury|last|major|meal|ownership|period|second|single|transfer|war</token>
    </antipattern>
    <pattern>
        <token min="0" regexp="yes">almost|virtually|following</token>
        <token>immediately</token>
        <token regexp="yes">after|thereafter<exception regexp="yes">&months;|&abbrevMonths;</exception><exception postag="CD"/></token>
   