In [1]:
import langchain

##### Function calling. 
- Function calling lets us go beyond plain text generation: we can specify the structure of the output we want for a given task.
- It also enables dynamic information retrieval.

In [2]:
ceo_description = "Jane Doe, CEO of Tech Innovations Inc., has driven remarkable growth, increasing annual revenue by 50% over three years, launching industry-standard products, and expanding into five new international markets. She has also championed sustainability, significantly reducing the company's carbon footprint. Jane earns an annual salary of $2 million, with additional performance-based bonuses and stock options."

# Few-Shot Prompting for Information Extraction

## Overview

In this notebook, we explore how to prompt OpenAI's language models to extract specific pieces of information from a given text. The prompt states the fields and the output format explicitly; few-shot prompting goes one step further by including example input/output pairs to guide how the model formats its responses.

## Implementation

### Prompt Construction

We constructed a prompt that specifies the information to be extracted and provides the input text. Here’s an example of the prompt used:

In [12]:
prompt = f'''
Please extract the following information from the given text and return it as a JSON object:

name
company
Salary

This is the body of text to extract the information from:
{ceo_description}
'''

In [13]:
prompt

"\nPlease extract the following information from the given text and return it as a JSON object:\n\nname\ncompany\nSalary\n\nThis is the body of text to extract the information from:\nJane Doe, CEO of Tech Innovations Inc., has driven remarkable growth, increasing annual revenue by 50% over three years, launching industry-standard products, and expanding into five new international markets. She has also championed sustainability, significantly reducing the company's carbon footprint. Jane earns an annual salary of $2 million, with additional performance-based bonuses and stock options.\n"

In [14]:
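import os
from openai import OpenAI
# Read the key from the environment rather than hard-coding it in the notebook.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])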

In [15]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "user",
      "content": prompt
    }
  ]
)

In [16]:
response.choices[0].message.content

'{\n  "name": "Jane Doe",\n  "company": "Tech Innovations Inc.",\n  "Salary": "$2 million"\n}'

In [17]:
output = response.choices[0].message.content

In [18]:
import json
json.loads(output)

{'name': 'Jane Doe',
 'company': 'Tech Innovations Inc.',
 'Salary': '$2 million'}
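
A plain prompt does not guarantee that the reply is valid JSON; the model may wrap the object in a Markdown code fence or add surrounding text. The following is a minimal defensive sketch, assuming a hypothetical helper named `parse_json_reply` that returns `None` instead of raising when the reply cannot be parsed:

```python
import json
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_reply(text: str) -> Optional[dict]:
    """Parse a model reply that is expected to contain a single JSON object."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence (and any language tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence
        cleaned = "\n".join(lines)
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    # Only accept a JSON object, since the prompt asked for one.
    return parsed if isinstance(parsed, dict) else None


reply = '{\n  "name": "Jane Doe",\n  "company": "Tech Innovations Inc.",\n  "Salary": "$2 million"\n}'
print(parse_json_reply(reply))
```

In the notebook this would be called as `parse_json_reply(output)` in place of `json.loads(output)`.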

Asking only for the fields I need keeps the responses short, which helps me spend less on OpenAI ;). This is a small example of how a company could build a system that extracts information from its documents.

#### Let's define a function.

The structure of a function definition is described in the OpenAI Cookbook: https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models

In [38]:
ceo_custom_function = [
    {
        'name': "extract_ceo_info",
        'description': "Get the CEO information from the input documentation",
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {
                    'type': 'string',
                    'description': "Name of the CEO."
                },
                'company': {
                    'type': 'string',
                    'description': "The company the CEO works for."
                },
                'salary': {
                    'type': 'string',
                    'description': "How much the CEO earns."
                }
            },
            "required": ["name", "company", "salary"]
        }
    }
]
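
OpenAI has since deprecated the `functions` and `function_call` parameters in favor of `tools` and `tool_choice`. As a rough sketch, a definition like the one above can be wrapped for the newer parameter as follows; `as_tools` is a hypothetical helper name, and the schema is a compact copy repeated here so the example is self-contained:

```python
from typing import Any, Dict, List

# Compact copy of the function definition from the notebook.
ceo_function_def: Dict[str, Any] = {
    "name": "extract_ceo_info",
    "description": "Get the CEO information from the input documentation",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Name of the CEO."},
            "company": {"type": "string", "description": "The company the CEO works for."},
            "salary": {"type": "string", "description": "How much the CEO earns."},
        },
        "required": ["name", "company", "salary"],
    },
}


def as_tools(function_defs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap legacy function definitions in the envelope expected by `tools`."""
    return [{"type": "function", "function": fd} for fd in function_defs]


tools = as_tools([ceo_function_def])
```

The result would be passed as `tools=tools`, and the arguments would then be read from `response.choices[0].message.tool_calls[0].function.arguments` rather than from `message.function_call`.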

In [39]:
response2 = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user","content": prompt}],
  functions= ceo_custom_function
)

In [43]:
response2

ChatCompletion(id='chatcmpl-9pUL1lnezgOJYUWhu2FsM49Rr86Ia', choices=[Choice(finish_reason='function_call', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{"name":"Jane Doe","company":"Tech Innovations Inc.","salary":"$2 million"}', name='extract_ceo_info'), tool_calls=None))], created=1722058375, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=30, prompt_tokens=188, total_tokens=218))

In [44]:
response2.choices[0].message

ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{"name":"Jane Doe","company":"Tech Innovations Inc.","salary":"$2 million"}', name='extract_ceo_info'), tool_calls=None)

In [45]:
output2 = response2.choices[0].message.function_call.arguments

In [46]:
json.loads(output2)

{'name': 'Jane Doe',
 'company': 'Tech Innovations Inc.',
 'salary': '$2 million'}
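
The `required` list in the schema is a request to the model rather than a hard guarantee, so it can be worth checking the parsed arguments before using them. A minimal sketch, assuming a hypothetical helper `missing_required` and hand-written argument strings in place of a live response:

```python
import json
from typing import Any, Dict, List


def missing_required(arguments_json: str, function_def: Dict[str, Any]) -> List[str]:
    """Return the required parameter names that are absent from the arguments."""
    arguments = json.loads(arguments_json)
    required = function_def["parameters"].get("required", [])
    return [key for key in required if key not in arguments]


# Stand-in schema containing only the part this check reads.
schema = {"parameters": {"required": ["name", "company", "salary"]}}

complete = '{"name":"Jane Doe","company":"Tech Innovations Inc.","salary":"$2 million"}'
partial = '{"name":"Jane Doe"}'

print(missing_required(complete, schema))  # []
print(missing_required(partial, schema))   # ['company', 'salary']
```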

#### Advanced use of a function call.

In [47]:
ceo_description_1 = "Jane Doe, CEO of Tech Innovations Inc., has driven remarkable growth, increasing annual revenue by 50% over three years, launching industry-standard products, and expanding into five new international markets. She has also championed sustainability, significantly reducing the company's carbon footprint. Jane earns an annual salary of $2 million, with additional performance-based bonuses and stock options."
ceo_description_2 = "John Smith earns an annual salary of $2.5 million, complemented by performance bonuses and stock options. As CEO of Global Solutions Ltd., he has driven a 60% increase in annual revenue over the past four years, spearheaded the introduction of groundbreaking technologies, and expanded the company into seven new global markets. John has also emphasized employee well-being by launching innovative health and wellness programs"

In [49]:
ceo_info = [ceo_description_1, ceo_description_2]

In [51]:
for ceo in ceo_info:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": ceo}],
        functions=ceo_custom_function,
        function_call="auto",  # let the model decide whether to call the function
    )

    # The arguments arrive as a JSON string; parse them into a dict.
    ceo_details = json.loads(response.choices[0].message.function_call.arguments)
    print(ceo_details)

{'name': 'Jane Doe', 'company': 'Tech Innovations Inc.', 'salary': '$2 million'}
{'name': 'John Smith', 'company': 'Global Solutions Ltd.', 'salary': '$2.5 million'}
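
With `function_call='auto'` the model chooses whether to call the function, so `message.function_call` can be `None` when it answers in plain text instead. Below is a small sketch of a guard for that case; `arguments_or_none` is a hypothetical helper, and `SimpleNamespace` objects stand in for real response messages:

```python
import json
from types import SimpleNamespace
from typing import Optional


def arguments_or_none(message) -> Optional[dict]:
    """Return the parsed function-call arguments, or None if no call was made."""
    call = getattr(message, "function_call", None)
    if call is None:
        return None
    return json.loads(call.arguments)


# Stand-ins shaped like the messages shown earlier in the notebook.
with_call = SimpleNamespace(
    content=None,
    function_call=SimpleNamespace(
        name="extract_ceo_info",
        arguments='{"name":"Jane Doe","company":"Tech Innovations Inc.","salary":"$2 million"}',
    ),
)
text_only = SimpleNamespace(content="Could you share the document?", function_call=None)

print(arguments_or_none(with_call))
print(arguments_or_none(text_only))
```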


#### Using multiple functions.

In [75]:
ceo_info = [
    {
        'name': "extract_ceo_info",
        'description': "Get the CEO information from the input documentation",
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {
                    'type': 'string',
                    'description': "The CEO's name"
                },
                'company': {
                    'type': 'string',
                    'description': "The company the CEO works for."
                },
                'salary': {
                    'type': 'string',
                    'description': "How much is the CEO earning"
                }
            },
            "required": ["name", "company", "salary"]
        }
    }
]

ceo_achivement = [
    {
        'name': "extract_ceo_achievement",
        'description': "Get the CEO achievement from the input documentation",
        'parameters': {
            'type': 'object',
            'properties': {
                'field': {
                    'type': 'string',
                    'description': "The field in which the CEO made an achievement"
                },
                'revenue': {
                    'type': 'string',
                    'description': "By what percentage was revenue increased"
                },
                'achievement': {
                    'type': 'string',
                    'description': "What was the CEO achievement in short"
                }
            },
            "required": ["field", "revenue", "achievement"]
        }
    }
]

In [76]:
functions = [ceo_info[0],ceo_achivement[0]]

In [83]:
# `ceo_info` now holds a function definition, so iterate over the original
# descriptions; passing a definition dict as message content raises a 400 BadRequestError.
for ceo in [ceo_description_1, ceo_description_2]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": ceo}],
        functions=functions,  # offer both function definitions
        function_call="auto",
    )

    call = response.choices[0].message.function_call
    # With "auto" the model may answer in plain text, leaving function_call as None.
    if call is not None:
        print(call.name, json.loads(call.arguments))
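
When several functions are offered, `message.function_call.name` indicates which one the model picked, and the arguments can be routed accordingly. The sketch below assumes two hypothetical handler functions and uses a `SimpleNamespace` stand-in instead of a live response:

```python
import json
from types import SimpleNamespace


def handle_ceo_info(args: dict) -> str:
    return f"{args['name']} leads {args['company']} and earns {args['salary']}"


def handle_ceo_achievement(args: dict) -> str:
    return f"{args['field']}: {args['achievement']} (revenue up {args['revenue']})"


# Map each function name from the definitions to the code that consumes its arguments.
HANDLERS = {
    "extract_ceo_info": handle_ceo_info,
    "extract_ceo_achievement": handle_ceo_achievement,
}


def dispatch(function_call) -> str:
    """Route a function call to its handler based on the function name."""
    handler = HANDLERS.get(function_call.name)
    if handler is None:
        raise ValueError(f"Unexpected function name: {function_call.name}")
    return handler(json.loads(function_call.arguments))


# Stand-in for `response.choices[0].message.function_call`.
fake_call = SimpleNamespace(
    name="extract_ceo_info",
    arguments='{"name":"John Smith","company":"Global Solutions Ltd.","salary":"$2.5 million"}',
)
print(dispatch(fake_call))
```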