## Entity Extraction using GPT-3

This example shows how to extract domain-specific entities from unstructured text, such as legal documents and contracts, using zero-shot learning.

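### Step 1: Initialize the OpenAI client using your Azure OpenAI subscription

The code below reads the API key and resource endpoint from the AZURE_OPENAI_KEY and AZURE_OPENAI_ENDPOINT environment variables (loaded here from a .env file) into API_KEY and RESOURCE_ENDPOINT. Make sure both are set to the values from your Azure OpenAI subscription.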

In [1]:
from dotenv import load_dotenv
load_dotenv('../../src/.env') 

True

In [3]:
import openai
import os

GPT_ENGINE = "text-curie-001" #"curie-instruct"

API_KEY = os.getenv("AZURE_OPENAI_KEY")  # SET YOUR OWN API KEY HERE
RESOURCE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")  # SET A LINK TO YOUR RESOURCE ENDPOINT

openai.api_type = "azure"
openai.api_key = API_KEY
openai.api_base = RESOURCE_ENDPOINT
openai.api_version = "2022-06-01-preview"
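
If either environment variable is missing, os.getenv returns None and the failure only surfaces later, at request time. A minimal sketch of an early check follows; the helper name require_env and the placeholder values are assumptions made for illustration and are not part of the original notebook.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is missing or empty")
    return value


# Demonstration with a stand-in mapping (placeholder values, not real credentials).
demo_env = {
    "AZURE_OPENAI_KEY": "placeholder-key",
    "AZURE_OPENAI_ENDPOINT": "https://example-resource.openai.azure.com/",
}
api_key = require_env("AZURE_OPENAI_KEY", demo_env)
endpoint = require_env("AZURE_OPENAI_ENDPOINT", demo_env)
```

In the notebook itself, the same helper could be called without the second argument so that it reads from the real environment.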

### Step 2: Define entities to be extracted

For this example, we will extract details about maximum fines, illegal activities, and types of punishment.

In [4]:
entity_types = [
    "Maximum Fine",
    "Illegal Activity",
    "Punishment for Corporation",
    "Punishment for Person",
]
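
Python joins adjacent string literals, so a missing comma between two list entries silently merges them into a single entity name. The sketch below illustrates the effect and adds a small sanity check; validate_entity_types is a hypothetical helper introduced here, not part of the original notebook.

```python
# Without a comma, the two literals are concatenated into one string.
merged = ["Illegal Activity" "Punishment for Corporation"]
# With a comma, they stay separate entries.
separate = ["Illegal Activity", "Punishment for Corporation"]


def validate_entity_types(entity_types, expected_count):
    """Raise if the list has an unexpected length or contains duplicates."""
    if len(entity_types) != expected_count:
        raise ValueError(
            f"Expected {expected_count} entity types, got {len(entity_types)}"
        )
    if len(set(entity_types)) != len(entity_types):
        raise ValueError("Entity types must be unique")
    return entity_types


checked = validate_entity_types(separate, expected_count=2)
```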

### Step 3: Define texts to be analyzed

In this example, we copied two paragraphs from the Sherman Antitrust Act (Sections 1 and 2, codified at 15 U.S.C. §§ 1 and 2).

In [5]:
paragraph_1 = """
[Trusts, etc., in restraint of trade illegal; penalty]
Every contract, combination in the form of trust or otherwise, or conspiracy, in restraint of 
trade or commerce among the several States, or with foreign nations, is declared to be illegal. 
Every person who shall make any contract or engage in any combination or conspiracy 
hereby declared to be illegal shall be deemed guilty of a felony, and, on conviction thereof, 
shall be punished by fine not exceeding $100,000,000 if a corporation, or, if any other 
person, $1,000,000, or by imprisonment not exceeding 10 years, or by both said 
punishments, in the discretion of the court.
"""

paragraph_2 = """
[Monopolizing trade a felony; penalty]
Every person who shall monopolize, or attempt to monopolize, or combine or conspire with 
any other person or persons, to monopolize any part of the trade or commerce among the 
several States, or with foreign nations, shall be deemed guilty of a felony, and, on conviction 
thereof, shall be punished by fine not exceeding $100,000,000 if a corporation, or, if any 
other person, $1,000,000, or by imprisonment not exceeding 10 years, or by both said 
punishments, in the discretion of the court
"""

### Step 4: Create prompts

A prompt is the string passed to the Azure OpenAI API. It describes the task we want solved (here, which entities to recognize and in what output format) and includes the paragraph to analyze.

In [6]:
prompt_1 = f"Recognize the following entities: {', '.join(entity_types)} in the following paragraph using key:value pairs\n{paragraph_1}"
print(prompt_1)

Recognize the following entities: Maximum Fine, Illegal Activity, Punishment for Corporation, Punishment for Person in the following paragraph using key:value pairs

[Trusts, etc., in restraint of trade illegal; penalty]
Every contract, combination in the form of trust or otherwise, or conspiracy, in restraint of 
trade or commerce among the several States, or with foreign nations, is declared to be illegal. 
Every person who shall make any contract or engage in any combination or conspiracy 
hereby declared to be illegal shall be deemed guilty of a felony, and, on conviction thereof, 
shall be punished by fine not exceeding $100,000,000 if a corporation, or, if any other 
person, $1,000,000, or by imprisonment not exceeding 10 years, or by both said 
punishments, in the discretion of the court.



In [7]:
prompt_2 = f"Recognize the following entities: {', '.join(entity_types)} in the following paragraph using key:value pairs\n{paragraph_2}"
print(prompt_2)

Recognize the following entities: Maximum Fine, Illegal Activity, Punishment for Corporation, Punishment for Person in the following paragraph using key:value pairs

[Monopolizing trade a felony; penalty]
Every person who shall monopolize, or attempt to monopolize, or combine or conspire with 
any other person or persons, to monopolize any part of the trade or commerce among the 
several States, or with foreign nations, shall be deemed guilty of a felony, and, on conviction 
thereof, shall be punished by fine not exceeding $100,000,000 if a corporation, or, if any 
other person, $1,000,000, or by imprisonment not exceeding 10 years, or by both said 
punishments, in the discretion of the court
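
Both prompts use the same template, so the construction can be factored into one function. The following is a sketch only; build_prompt is a hypothetical helper name, and the template text is copied from the cells above.

```python
def build_prompt(entity_types, paragraph):
    """Build an entity-recognition prompt for one paragraph."""
    entity_list = ", ".join(entity_types)
    return (
        f"Recognize the following entities: {entity_list} "
        f"in the following paragraph using key:value pairs\n{paragraph}"
    )


# Small stand-in inputs for demonstration.
example_prompt = build_prompt(
    ["Maximum Fine", "Illegal Activity"],
    "Sample paragraph text.",
)
```

With this helper, the prompts for any number of paragraphs could be built with a list comprehension instead of repeating the f-string.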



### Step 5: Pass the prompts to the Azure OpenAI completion endpoint

Finally, we pass the prompts we created to the Azure OpenAI completion endpoint and parse the output.

In [8]:
def recognize_entities(prompt, engine=GPT_ENGINE):
    """Recognize entities in text using the Azure OpenAI completion endpoint."""
    response = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        temperature=0.0,
        max_tokens=100,
    )
    return response.choices[0].text


for prompt in [prompt_1, prompt_2]:
    found_entities = recognize_entities(prompt, engine=GPT_ENGINE)
    found_entities_list = found_entities.split("\n")
    found_entity_types = [
        entity.split(":")[0] for entity in found_entities_list if entity
    ]

    print(f"Entities found {found_entities}\n")

Entities found 
Maximum Fine: $100,000,000
Illegal Activity: Restraint of trade
Punishment for Corporation: $1,000,000
Punishment for Person in the following paragraph: $100,000,000

Entities found 
Maximum Fine: $100,000,000
Illegal Activity: Monopolizing trade
Punishment for Corporation: Fine not exceeding $1,000,000
Punishment for Person in the following paragraph using key:value pairs

[Monopolizing trade a felony; penalty]
Every person who shall monopolize, or attempt to monopolize, or combine or conspire with 
any other person or persons, to monopolize any part of the trade or commerce
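
The completions above are free text, and the model sometimes echoes part of the prompt instead of returning a clean key. A minimal parsing sketch that keeps only lines whose key exactly matches a requested entity type is shown below; parse_entities is a hypothetical helper, and the sample text is the first completion printed above.

```python
def parse_entities(completion_text, expected_types):
    """Parse 'key: value' lines into a dict, keeping only expected entity types."""
    parsed = {}
    for line in completion_text.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in expected_types:
            parsed[key] = value.strip()
    return parsed


expected = [
    "Maximum Fine",
    "Illegal Activity",
    "Punishment for Corporation",
    "Punishment for Person",
]
sample_output = """
Maximum Fine: $100,000,000
Illegal Activity: Restraint of trade
Punishment for Corporation: $1,000,000
Punishment for Person in the following paragraph: $100,000,000
"""
result = parse_entities(sample_output, expected)
# Entity types the model did not return under their exact name.
missing = [name for name in expected if name not in result]
```

Reporting the missing entity types makes malformed or truncated completions visible, rather than silently accepting whatever keys the model produced.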

