# Verify Legal Properties using Chain-of-Thought

This notebook verifies whether a specification satisfies a property using Chain-of-Thought (CoT) prompting and demonstrations. In-context learning performance has been shown to improve with both CoT and examples that demonstrate the completion. The notebook evaluates a CoT template that accepts an arbitrary number of examples. Examples are drawn from the training dataset, and results are collected on the testing dataset so that they can be compared directly with other experiments that use the same testing dataset.

In [1]:
import json

# load the design properties
with open('results/properties.json', 'r') as f:
    properties = json.load(f)

# load the ground truth data
with open('results/training.json', 'r') as f:
    training = json.load(f)
with open('results/testing.json', 'r') as f:
    testing = json.load(f)

print('Training Size %0.2f' % (len(training) / (len(training) + len(testing))))
print('Training: %i' % len(training))
print('Testing: %i' % len(testing))

Training Size 0.21
Training: 80
Testing: 304


In [2]:
from langchain.chat_models import ChatOpenAI
import os

model_names = {
    'gpt35': 'gpt-3.5-turbo-1106',
    'gpt4': 'gpt-4-0613',
    'gpt4p': 'gpt-4-1106-preview'
}

model_name = 'gpt35'

model = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model_name=model_names[model_name]
)

In [3]:
training[0]

{'id': 'SCR-A086',
 'base-spec': 'The user opens the mobile app and discovers a wide range of activities centered around horses and adventure. They can explore the virtual world by caring for and grooming horses, participating in jousting and wild west role-play, riding motorcycles, camping, and even creating their own unique videos. In this scenario, the user decides to focus on grooming and caring for horses in the stable. The app utilizes their knowledge of horse care techniques, preferred horse breeds, and riding experience to provide an engaging and realistic experience.',
 'prop-actions': "1. The app requests the user to provide personal information such as their name, age, and email address before they can access the horse grooming and care activities. This initial request for personal information establishes a power imbalance as the data controller (the app) requires this information from the data subject (the user) in order to access the desired activities. The user may feel c

## Sample Examples for Demonstration

This function formats examples sampled from the training dataset as demonstrations that conform to the expected requirement state. The input examples are expected to be restricted to a particular property. The state selects whether each example is paired with the positive (`'T'`) or negative (`'F'`) form of the property statement; in the negative case the expected answer is inverted. The examples are then packaged into a sub-template that is linearized and returned by the function.

In [4]:
def get_examples(examples, req_state):
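    """Linearize demonstration examples for the requirement state ('T' or 'F').

    Relies on the globals `properties`, `prop_state_str`, and `inverse`,
    which must be defined before this function is called.
    """
    text = ''
    for example in examples:
        # expected answer: the example's own state for the positive statement,
        # and its inverse for the negative statement
        expected = prop_state_str[example['prop-state']]
        if req_state != 'T':
            expected = inverse[expected]
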
        text += 'Scenario: %s\nStatement: %s\n\n###\n\nRationale: %s\nAnswer: %s\n\n# END\n\n' % (
            example['prop-actions'],
            properties[example['prop-code']]['axiom'][req_state],
            example['rationale'],
            expected
        )
    return text
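
The evaluation loop in this notebook passes `training[:example_count]`, which takes the first records regardless of their property. The following is a minimal sketch of property-restricted sampling, assuming each training record carries a `prop-code` key as used by `get_examples`. The helper name `sample_for_property` and its `seed` parameter are hypothetical and not part of the original notebook.

```python
import random

def sample_for_property(records, prop_code, k, seed=0):
    """Return up to k records whose 'prop-code' matches prop_code (hypothetical helper)."""
    pool = [r for r in records if r['prop-code'] == prop_code]
    rng = random.Random(seed)  # fixed seed so repeated runs draw the same demonstrations
    return rng.sample(pool, min(k, len(pool)))

# toy records for illustration only; real records come from results/training.json
toy_training = [
    {'id': 'A', 'prop-code': 'P1', 'prop-state': 'T'},
    {'id': 'B', 'prop-code': 'P2', 'prop-state': 'F'},
    {'id': 'C', 'prop-code': 'P1', 'prop-state': 'F'},
]
picked = sample_for_property(toy_training, 'P1', k=2)
print(sorted(r['id'] for r in picked))
```

The returned list could be passed to `get_examples` in place of `training[:example_count]`, so that the demonstrations match the property of the record under test.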

## Template 3: Chain-of-Thought

In [None]:
from langchain.prompts.chat import ChatPromptTemplate
from langchain.output_parsers.json import SimpleJsonOutputParser

inverse = {'True': 'False', 'False': 'True'}
prop_state_str = {'T': 'True', 'F': 'False'}

messages = [
    ('system', 'You are a helpful assistant.'),
    ('human', """Definition of {property}: {definition}

Read the following example scenarios and observe the rationale and answer about whether the statement is true or false. For the last scenario and statement, decide if the statement is true or false based on the definition, above. Respond by completing the Rationale and Answer using the same format. Do not elaborate.

{examples}Scenario: {prop-actions}
Statement: {requirement}

###

Rationale: """)

]

prompt1 = ChatPromptTemplate.from_messages(messages)
chain = prompt1 | model

example_count = 3
answers = []

for i in range(0, 10):
    for j, record in enumerate(testing):
        response = chain.invoke({
            'property': properties[record['prop-code']]['property'],
            'definition': properties[record['prop-code']]['rubric'],
            'base-spec': record['base-spec'],
            'prop-actions': record['prop-actions'],
            'requirement': properties[record['prop-code']]['axiom']['T'], # positive
            'examples': get_examples(training[:example_count], 'T')
        })
        answers.append([
            record['id'],
            i,
            record['prop-code'], # property code
            record['prop-state'], # spec polarity
            prop_state_str[record['prop-state']], # expected
            response.content # predicted
        ])

        response = chain.invoke({
            'property': properties[record['prop-code']]['property'],
            'definition': properties[record['prop-code']]['rubric'],
            'base-spec': record['base-spec'],
            'prop-actions': record['prop-actions'],
            'requirement': properties[record['prop-code']]['axiom']['F'], # negative
            'examples': get_examples(training[:example_count], 'F')
        })
        answers.append([
            record['id'],
            i,
            record['prop-code'], # property code
            record['prop-state'], # spec polarity
            inverse[prop_state_str[record['prop-state']]], # expected
            response.content # predicted
        ])
        print(j)
    print('### %i ###' % i)

In [6]:
with open('results/answers_%s_cot_%ishot.json' % (model_name, example_count), 'w') as f:
    json.dump(answers, f)
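
The last field of each saved row holds the raw completion (a rationale followed by an answer line), so it has to be parsed before it can be compared with the expected label. The following is a minimal sketch, assuming the model follows the `Answer: True` / `Answer: False` format shown in the demonstrations. `extract_answer` and `accuracy` are hypothetical helpers, the toy rows are invented for illustration, and completions without a recognizable answer line are counted as incorrect.

```python
import re

ANSWER_RE = re.compile(r'Answer:\s*(True|False)', re.IGNORECASE)

def extract_answer(completion):
    """Return 'True' or 'False' from the last 'Answer:' line, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].capitalize() if matches else None

def accuracy(rows):
    """rows follow the layout of `answers`: expected at index 4, raw completion at index 5."""
    if not rows:
        return 0.0
    correct = sum(1 for row in rows if extract_answer(row[5]) == row[4])
    return correct / len(rows)

# toy rows for illustration only; real rows come from the saved answers file
toy_answers = [
    ['X-1', 0, 'P1', 'T', 'True', 'The app asks for consent first.\nAnswer: True'],
    ['X-1', 0, 'P1', 'T', 'False', 'The app asks for consent first.\nAnswer: True'],
    ['X-2', 0, 'P1', 'F', 'False', 'No answer given'],
]
print(accuracy(toy_answers))
```

Taking the last match guards against a rationale that happens to quote an earlier `Answer:` line from the demonstrations.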