# Extracting information from legal documents

We have a collection of legal files, more precisely gift agreements, that record who gave what to whom. For each agreement we want to extract the names of the donor and the recipient as well as what was gifted. Browsing through the files shows that all of this information is there, but extracting it by hand would be tedious, so we will use ChatGPT to help us.

### Steps:

- Set up an OpenAI API key and save it in a `.env` file
- Start with a simple prompt to check that the API works
- Feed in the legal documents
    - **Limitation**: On the free tier of the API you can send at most 3 requests per minute, so we use a `sleep` command in the code for this step.
- Collect the results in a pandas DataFrame
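The rate limit in the third step can also be handled a little more precisely than with a fixed pause. A limit of 3 requests per minute means consecutive requests should start at least 60 / 3 = 20 seconds apart. The sketch below uses a hypothetical `RateLimiter` helper (not part of the notebook or of the `openai` package) that only sleeps for the part of that interval that has not already passed since the previous call. The clock and sleep functions are parameters purely so the helper can be tested without actually waiting.

```python
import time


class RateLimiter:
    """Hypothetical helper: enforce a minimum gap of period / max_calls
    seconds between the starts of consecutive calls."""

    def __init__(self, max_calls=3, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls  # 60 / 3 = 20 seconds
        self.clock = clock
        self.sleep = sleep
        self.last_call = None  # start time of the previous call, if any

    def wait(self):
        """Block until the next call is allowed, then record its start time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                # Sleep only for the part of the interval not yet elapsed.
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

In the processing loop, calling `limiter.wait()` on a single `limiter = RateLimiter(max_calls=3, period=60)` before each request would take the place of the fixed `time.sleep` inside `process_agreement`, so time spent waiting for the API itself counts toward the gap.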

In [1]:
import os
import openai
from dotenv import load_dotenv
_ = load_dotenv('chatgpt.env') 

openai.api_key  = os.getenv('OPENAI_API_KEY')

In [2]:
simple_prompt = 'How much is 2 + 2?'
response = openai.ChatCompletion.create(model='gpt-3.5-turbo',
                                        temperature=0, # degree of randomness of the output           
                                        messages=[{'role': 'user', 'content': simple_prompt}]
                                       )

In [3]:
print(response.choices[0].message.content)

2 + 2 equals 4.


Since our simple prompt works, we can move on to the original task. Let's first look at the content of one of our gift agreements:

In [4]:
with open('./documents/Gift_Agreement_2431.txt', 'r') as f:
    content = f.read()
    print(content)

Gift Agreement

This Gift Agreement ("Agreement") is entered into on this 20th day of September, 2022, ("Effective Date") between the following parties:

Donor:
Name: Michael Thompson
Address: 567 Pine Street, Cityville, USA
Email: michael.thompson@email.com

Recipient:
Name: Sarah Roberts
Address: 890 Oak Avenue, Cityville, USA
Email: sarah.roberts@email.com

    Gift Description:

The Donor, Michael Thompson, gifts and transfers to the Recipient, Sarah Roberts a Gift in the amount of $20,500 Cash as well as a Vintage armchair.

    Title and Ownership:

Michael Thompson represents that he is the legal owner of the Gift and has full authority to transfer ownership to Sarah Roberts.

    Consideration:

The Gift is given without any monetary or other valuable consideration by Sarah Roberts.

    Delivery:

Michael Thompson shall deliver the Gift to Sarah Roberts on or before the Effective Date, either physically or by any mutually agreed-upon method.

    Condition of Gift:

Michael Th

In [5]:
def generate_response(prompt):
    """
    Query OpenAI API to get response.
    """

    response = openai.ChatCompletion.create(model='gpt-3.5-turbo',
                                            temperature=0, # degree of randomness of output           
                                            messages=[{'role': 'user', 'content': prompt}]
                                           )

    return response.choices[0].message.content


def generate_prompt(gift_contract_text):
    """
    Create prompt that gets sent to OpenAI API.
    """

    prompt = f'''Extract just the donor name, recipient name and the exact gift from 
                 the contract. Give the result as JSON with fields Donor, 
                 Recipient and Gift. If there are several gifts break it 
                 into a simple list of strings. {gift_contract_text}'''
    
    return prompt
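The cells in this notebook use the interface of `openai` versions before 1.0. In version 1.0 of the package, `openai.ChatCompletion` was removed and requests go through a client object instead. A minimal sketch of the same query as `generate_response` for the newer interface follows; the name `generate_response_v1` is hypothetical, and the client is passed in as an argument so the function itself does not depend on how the key is loaded.

```python
def generate_response_v1(prompt, client, model='gpt-3.5-turbo'):
    """
    Same query as generate_response, written for openai>=1.0.
    `client` is expected to be an openai.OpenAI instance.
    """
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # degree of randomness of the output
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response.choices[0].message.content


# Usage (requires openai>=1.0 and OPENAI_API_KEY set, e.g. via load_dotenv):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment by default
# print(generate_response_v1('How much is 2 + 2?', client))
```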

We ask ChatGPT to return the result as JSON so that the output can easily be parsed and processed further in code.
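Since the answer comes back as plain text, `json.loads` raises an error if the model adds anything around the JSON. Also, the prompt only asks for a list when there are several gifts, so a single gift may come back as a plain string. A minimal validation step, sketched here as a hypothetical `parse_gift_json` function that is not part of the notebook, checks for the three expected fields and always returns `Gift` as a list.

```python
import json

EXPECTED_FIELDS = ('Donor', 'Recipient', 'Gift')


def parse_gift_json(output):
    """Parse the model's answer and normalise it (hypothetical helper).

    Raises ValueError if the answer is not a JSON object with the
    fields Donor, Recipient and Gift.
    """
    data = json.loads(output)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError(f'expected a JSON object, got {type(data).__name__}')
    missing = [field for field in EXPECTED_FIELDS if field not in data]
    if missing:
        raise ValueError(f'missing fields: {missing}')
    # Wrap a single gift given as a string so every record has the same shape.
    if isinstance(data['Gift'], str):
        data['Gift'] = [data['Gift']]
    return data


# Hypothetical answer with a single gift given as a plain string:
print(parse_gift_json('{"Donor": "A", "Recipient": "B", "Gift": "Vintage armchair"}'))
# prints {'Donor': 'A', 'Recipient': 'B', 'Gift': ['Vintage armchair']}
```

Catching `ValueError` around this call would let a batch run skip or log a malformed answer instead of stopping.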

Let's try it on one contract.

In [6]:
output = generate_response(generate_prompt(content))

In [12]:
import json
json.loads(output)

{'Donor': 'Michael Thompson',
 'Recipient': 'Sarah Roberts',
 'Gift': ['$20,500 Cash', 'Vintage armchair']}

Now let's apply this to all of our documents:

In [9]:
import json
import time
from glob import glob

files = glob('documents/*.txt')


def process_agreement(filename):
    """
    Read one agreement, send it to the API and parse the JSON answer.
    """
    # The free tier allows 3 requests per minute, i.e. one every 60 / 3 = 20 s.
    time.sleep(20)
    with open(filename, 'r') as f:
        contract_text = f.read()

    output = generate_response(generate_prompt(contract_text))

    return json.loads(output)


processed_agreements = []
for i, filename in enumerate(files):
    print(f'Processing {i}-th file')
    processed_agreements.append(process_agreement(filename))

Processing 0-th file
Processing 1-th file
Processing 2-th file
Processing 3-th file


Let's display the results in a pandas DataFrame:

In [10]:
import pandas as pd

In [15]:
df = pd.DataFrame(processed_agreements)

In [16]:
df 

|   | Donor | Recipient | Gift |
|---|---|---|---|
| 0 | Michael Thompson | Sarah Roberts | [$20,500 Cash, Vintage armchair] |
| 1 | John Anderson | Sarah Johnson | [Antique Gold Watch, Handcrafted Wooden Jewelr... |
| 2 | Jane Kasich | Luis Kepler | [Cash gift of $5,000 (USD)] |
| 3 | Matthew Anderson | Jessica Miller | [residential property located at 789 Elm Avenu... |
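Because the `Gift` column holds lists, each agreement occupies a single row no matter how many items were given. If one row per gifted item is more convenient for further analysis, pandas' `DataFrame.explode` can unnest the lists. The sketch below builds a small DataFrame from the two records shown in full above (the other rows are truncated in the display); on the notebook's own data the same call would be `df.explode('Gift', ignore_index=True)`.

```python
import pandas as pd

# Two records copied from the output above (the other rows are truncated there).
records = [
    {'Donor': 'Michael Thompson', 'Recipient': 'Sarah Roberts',
     'Gift': ['$20,500 Cash', 'Vintage armchair']},
    {'Donor': 'Jane Kasich', 'Recipient': 'Luis Kepler',
     'Gift': ['Cash gift of $5,000 (USD)']},
]
sample = pd.DataFrame(records)

# One row per gifted item; Donor and Recipient are repeated for each item,
# and ignore_index=True renumbers the rows 0..n-1 (pandas >= 1.1).
gifts = sample.explode('Gift', ignore_index=True)
print(gifts)
```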
