## Tracking token usage

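LangChain calculates the token usage and the resulting cost of OpenAI API calls on the client side: the token counts come back with each API response, and the cost is computed from a price table hard-coded in LangChain, with no separate billing query to OpenAI. The prices of all the models are configured at: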
    https://api.python.langchain.com/en/latest/_modules/langchain/callbacks/openai_info.html


As of today (June 30, 2023), the prices configured are as follows:<br>
<b>MODEL_COST_PER_1K_TOKENS</b> = {<br>
   <b> # GPT-4 input</b><br>
    "gpt-4": 0.03,<br>
    "gpt-4-0314": 0.03,<br>
    "gpt-4-0613": 0.03,<br>
    "gpt-4-32k": 0.06,<br>
    "gpt-4-32k-0314": 0.06,<br>
    "gpt-4-32k-0613": 0.06,<br>
   <b> # GPT-4 output</b><br>
    "gpt-4-completion": 0.06,<br>
    "gpt-4-0314-completion": 0.06,<br>
    "gpt-4-0613-completion": 0.06,<br>
    "gpt-4-32k-completion": 0.12,<br>
    "gpt-4-32k-0314-completion": 0.12,<br>
    "gpt-4-32k-0613-completion": 0.12,<br>
   <b> # GPT-3.5 input</b><br>
    "gpt-3.5-turbo": 0.0015,<br>
    "gpt-3.5-turbo-0301": 0.0015,<br>
    "gpt-3.5-turbo-0613": 0.0015,<br>
    "gpt-3.5-turbo-16k": 0.003,<br>
    "gpt-3.5-turbo-16k-0613": 0.003,<br>
 <b>   # GPT-3.5 output</b><br>
    "gpt-3.5-turbo-completion": 0.002,<br>
    "gpt-3.5-turbo-0301-completion": 0.002,<br>
    "gpt-3.5-turbo-0613-completion": 0.002,<br>
    "gpt-3.5-turbo-16k-completion": 0.004,<br>
    "gpt-3.5-turbo-16k-0613-completion": 0.004,<br>
   <b> # Others</b><br>
    "gpt-35-turbo": 0.002,  # Azure OpenAI version of ChatGPT<br>
    "text-ada-001": 0.0004,<br>
    "ada": 0.0004,<br>
    "text-babbage-001": 0.0005,<br>
    "babbage": 0.0005,<br>
    "text-curie-001": 0.002,<br>
    "curie": 0.002,<br>
    "text-davinci-003": 0.02,<br>
    "text-davinci-002": 0.02,<br>
    "code-davinci-002": 0.02,<br>
    "ada-finetuned": 0.0016,<br>
    "babbage-finetuned": 0.0024,<br>
    "curie-finetuned": 0.012,<br>
    "davinci-finetuned": 0.12,<br>
}

In [1]:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback



### Get OpenAI model

In [8]:
llm=OpenAI(model="text-davinci-003", temperature=0.9, n=4, best_of=4)
#best_of: generates best_of completions server-side and returns the "best"
#n: int = 1: How many completions to generate for each prompt.

<b>best_of</b>: generates best_of completions server-side and returns the "best"<br>
<b>n: int = 1:</b> How many completions to generate for each prompt.<br>

If <b>n</b> and <b>best_of</b> both equal 1 (the default), the number of generated tokens is at most max_tokens.<br>
If <b>n</b> (the number of completions returned) or <b>best_of</b> (the number of completions generated for consideration) is greater than 1, each request produces multiple outputs.<br>
In that case, the number of generated tokens is bounded by [ max_tokens * max(n, best_of) ].<br>
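
As a rough illustration of the bound above, the arithmetic can be sketched as follows. The helper name `max_generated_tokens` is hypothetical, and the value 256 for `max_tokens` is an assumption about the default used by the wrapper; substitute whatever value you actually configure.

```python
def max_generated_tokens(max_tokens, n=1, best_of=1):
    """Upper bound on completion tokens generated by a single request."""
    # The server generates max(n, best_of) candidate completions,
    # each of which may be up to max_tokens long.
    return max_tokens * max(n, best_of)

# Settings from the cell above: n=4, best_of=4 (max_tokens=256 is assumed).
print(max_generated_tokens(256, n=4, best_of=4))  # 1024
```

This is only an upper bound: completions that stop early use fewer tokens.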

In [15]:
with get_openai_callback() as cb:
    result=llm('What are three creative name for a scuba diving center')

In [16]:
print(result)



1. Abyss Adventures
2. Subaquatic Explorers
3. Aqua Wonders


In [17]:
print(cb)

Tokens Used: 99
	Prompt Tokens: 11
	Completion Tokens: 88
Successful Requests: 1
Total Cost (USD): $0.00198
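
The reported cost can be reproduced by hand from the price table. The snippet below is a minimal sketch, not LangChain's actual implementation: `estimate_cost` is a hypothetical helper, and the rule that completion tokens use the `<model>-completion` entry when one exists is an assumption inferred from the key names in the table.

```python
# Prices per 1K tokens, copied from the table above (June 30, 2023).
PRICES_PER_1K = {
    "text-davinci-003": 0.02,
    "gpt-3.5-turbo": 0.0015,
    "gpt-3.5-turbo-completion": 0.002,
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Hypothetical helper: cost in USD for one request."""
    prompt_price = PRICES_PER_1K[model]
    # Fall back to the prompt price when no separate output price is listed.
    completion_price = PRICES_PER_1K.get(model + "-completion", prompt_price)
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000

# Token counts taken from the callback output above.
cost = estimate_cost("text-davinci-003", prompt_tokens=11, completion_tokens=88)
print(f"${cost:.5f}")  # $0.00198
```

For `text-davinci-003` the input and output prices are the same, so this reduces to 99 tokens at $0.02 per 1K tokens.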


## Few Shot Learning

Few-shot learning is a style of prompting in which, apart from the prompt and the input variables, we also provide a few examples that the model can use to infer the "context", i.e. the pattern that the expected responses follow for a given user input.<br>
A FewShotPromptTemplate is used to create a prompt of this kind: a prefix, the formatted examples, and a suffix.

In [18]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

In [56]:
# Initial prompt template that will be used to format each example
template='''User: {query}\n\n
                {response}'''
input_var=['query','response'] # Each sample supplies both 'query' and 'response', so both are template variables
prompt_template=PromptTemplate(input_variables=input_var,template=template)

In [106]:
samples=[{'query':'Bob has 36 candy bars. He eats 29 of them. What does he have now?','response':'Bob has diabetes'},
        {'query':'There are 4 ghosts. One flies away. How many are left?','response':'None. Because ghosts are not real'}]
prefix='You are an LLM model that gives answers that are correct logically but are extremely funny. You are known for your wit, humor and creativity. Examples of your response are below:'
suffix='Respond to the question in the format user:{query} AI:'

#### Testing our template

In [107]:
print(prompt_template.format(**samples[0]))

User: Bob has 36 candy bars. He eats 29 of them. What does he have now?


                Bob has diabetes


In [108]:
fspt=FewShotPromptTemplate(input_variables=['query'],examples=samples,prefix=prefix, suffix=suffix, example_prompt=prompt_template,example_separator="\n\n")

<b>input_variables</b>: The variables used in your suffix, and hence the inputs to the overall FewShotPromptTemplate prompt<br>
<b>prefix</b>: The text placed before the examples. It can be used to set up your AI assistant by giving the model background instructions. In this example, we give the model a "personality"<br>
<b>suffix</b>: The text placed after the examples. Here it carries the user's query and acts as a pseudo output parser by telling the model which format to respond in
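
To make the roles of these pieces concrete, here is a simplified plain-Python sketch of how such a prompt might be assembled. It is not LangChain's code: `assemble_few_shot_prompt` is a hypothetical function, and the short example strings are made up for illustration.

```python
def assemble_few_shot_prompt(prefix, examples, example_template, suffix,
                             separator="\n\n", **inputs):
    """Hypothetical helper: join prefix, formatted examples and suffix."""
    # Render each example dict through the per-example template.
    example_strings = [example_template.format(**example) for example in examples]
    # Fill the user inputs into the prefix and suffix only.
    pieces = [prefix.format(**inputs), *example_strings, suffix.format(**inputs)]
    # Join the non-empty pieces with the separator.
    return separator.join(piece for piece in pieces if piece)

prompt = assemble_few_shot_prompt(
    prefix="Answer with a joke.",
    examples=[{"query": "Q1", "response": "A1"},
              {"query": "Q2", "response": "A2"}],
    example_template="User: {query}\nAI: {response}",
    suffix="User: {query}\nAI:",
    query="Q3",
)
print(prompt)
```

The separator plays the role of `example_separator`, and the keyword argument `query` corresponds to the single entry in `input_variables`.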

In [109]:
print(fspt)

input_variables=['query'] output_parser=None partial_variables={} examples=[{'query': 'Bob has 36 candy bars. He eats 29 of them. What does he have now?', 'response': 'Bob has diabetes'}, {'query': 'There are 4 ghosts. One flies away. How many are left?', 'response': 'None. Because ghosts are not real'}] example_selector=None example_prompt=PromptTemplate(input_variables=['query', 'response'], output_parser=None, partial_variables={}, template='User: {query}\n\n\n                {response}', template_format='f-string', validate_template=True) suffix='Respond to the question in the format user:{query} AI:' example_separator='\n\n' prefix='You are an LLM model that gives answers that are correct logically but are extremely funny. You are known for your wit, humor and creativity. Examples of your response are below:' template_format='f-string' validate_template=True


#### Check how the FewShotPromptTemplate works

In [110]:
user_query='What was the first thing that Queen Elizabeth did when she ascended the throne?'

#### What does the final prompt to the model look like?

In [111]:
fspt.format(query=user_query)

"You are an LLM model that gives answers that are correct logically but are extremely funny. You are known for your wit, humor and creativity. Examples of your response are below:\n\nUser: Bob has 36 candy bars. He eats 29 of them. What does he have now?\n\n\n                Bob has diabetes\n\nUser: There are 4 ghosts. One flies away. How many are left?\n\n\n                None. Because ghosts are not real\n\nRespond to the question in the format user:What was the first thing that Queen Elizabeth did when she ascended the throne? AI:"

#### Run the model

In [112]:
from langchain.llms import OpenAI  # completion-style model used below (ChatOpenAI is not needed here)

In [113]:
llm=OpenAI(temperature=0.0)

In [114]:
print(llm(fspt.format(query=user_query)))



User: What was the first thing that Queen Elizabeth did when she ascended the throne?

AI: She declared, "I shall be known as the Queen of puns!"


#### We can also run the model using a chain

In [115]:
from langchain.chains import LLMChain

In [116]:
prompt_chain=LLMChain(llm=llm, prompt=fspt)

In [117]:
user_query='How can Columbia control the problem of guerrillas taking over the country?'

In [118]:
prompt_chain.run(user_query)

'\n\nUser: How can Columbia control the problem of guerrillas taking over the country?\n\nAI: By offering them a better job opportunity, like becoming a professional ghostbuster!'

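The leading \n\n is just a display artifact: the notebook shows the string's repr, in which newlines appear as escape sequences. They are rendered as actual blank lines if we use:<br><br> <i>print(prompt_chain.run(user_query))</i>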
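
The difference between the two displays can be seen with any string that contains newlines. The snippet below is a small standalone sketch; the `text` value is a made-up stand-in for the chain's output.

```python
# A made-up stand-in for the string returned by the chain.
text = "\n\nUser: hi\n\nAI: hello"

# What a notebook shows for a bare expression: the repr, with escape sequences.
print(repr(text))

# What print() shows: the newlines are rendered as line breaks.
print(text)
```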