## This notebook covers the following topics

1. LLM models with langchain, getting started
2. Async APIs for LLMs
3. Custom LLM wrapper class
4. Caching LLM Calls through various design choices
5. Serialize LLM classes
6. Stream LLM and Chat model responses
7. Quantifying the cost based on the token usage

## The Models

1. **LLMs**
2. ChatModels
3. Text Embedding Models

### LLMs - getting started
LLMs (Large Language Models) share a standard interface in LangChain across providers such as OpenAI, HuggingFace, etc.

In [1]:
from langchain.llms import OpenAI

llm = OpenAI(model_name = "text-ada-001", n = 2, best_of = 2)

llm("Tell me a joke with programming anecdote")

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [7]:
# batch of requests -
llm_result = llm.generate(["Tell me a Scientific fact"] * 5)

In [8]:
llm_result

LLMResult(generations=[[Generation(text='\n\nThe Earth is the only "edient" on the planet.', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nThe moon has an area of 1,000 square kilometers.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe moon has an\n\nassorted Slang words for these things meaning different things to different people\n\n1. its a place where people go to die\n2. its a place where people can dream\n3. its a place where people walk\n4. its a place where peopleECE (electric chair)', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nThe universe is around one trillionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth 

In [9]:
len(llm_result.generations)

5

In [11]:
llm_result.generations[0:2]

[[Generation(text='\n\nThe Earth is the only "edient" on the planet.', generation_info={'finish_reason': 'stop', 'logprobs': None}),
  Generation(text='\n\nThe moon has an area of 1,000 square kilometers.', generation_info={'finish_reason': 'stop', 'logprobs': None})],
 [Generation(text='\n\nThe moon has an\n\nassorted Slang words for these things meaning different things to different people\n\n1. its a place where people go to die\n2. its a place where people can dream\n3. its a place where people walk\n4. its a place where peopleECE (electric chair)', generation_info={'finish_reason': 'stop', 'logprobs': None}),
  Generation(text='\n\nThe universe is around one trillionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of

In [12]:
llm_result.llm_output

{'token_usage': {'completion_tokens': 461,
  'total_tokens': 486,
  'prompt_tokens': 25},
 'model_name': 'text-ada-001'}
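The outputs above show how `generate` nests its results. There is one inner list per prompt, so five prompts give `len(llm_result.generations) == 5`, and each inner list holds `n = 2` generations. The reported `total_tokens` is `prompt_tokens + completion_tokens` (25 + 461 = 486). The sketch below reproduces that nesting with plain Python lists so you can see how to pair each prompt with its completions. The `Generation` dataclass here is a stand-in and not LangChain's own class.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Generation:
    """Stand-in for one completion; not LangChain's own Generation class."""
    text: str


def pair_prompts_with_texts(
    prompts: List[str], generations: List[List[Generation]]
) -> List[Tuple[str, List[str]]]:
    """Pair each prompt with the texts of its n completions."""
    if len(prompts) != len(generations):
        raise ValueError("expected one inner list of generations per prompt")
    return [(p, [g.text for g in gens]) for p, gens in zip(prompts, generations)]


# Five prompts with two completions each, mirroring n = 2 in the cells above.
prompts = ["Tell me a Scientific fact"] * 5
generations = [[Generation(f"fact {i}a"), Generation(f"fact {i}b")] for i in range(5)]

pairs = pair_prompts_with_texts(prompts, generations)
print(len(pairs), len(pairs[0][1]))  # 5 2

# total_tokens is the sum of the two counts reported in llm_output above.
token_usage = {"prompt_tokens": 25, "completion_tokens": 461}
print(token_usage["prompt_tokens"] + token_usage["completion_tokens"])  # 486
```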

In [14]:
llm.get_num_tokens("what a joke") # make sure you have installed tiktoken

3

### Async API for LLMs
LangChain supports async LLM calls through the `asyncio` library. This helps when you make multiple LLM calls concurrently, because these calls are network-bound.
We can use the ```agenerate``` method to call an OpenAI LLM asynchronously.

In [15]:
import time
import asyncio

from langchain.llms import OpenAI

In [20]:
def generate_serially():
    llm = OpenAI(temperature = 0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])
        print(resp.generations[0][0].text)

In [21]:
async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])
    print(resp.generations[0][0].text)

In [22]:
async def generate_concurrently():
    llm = OpenAI(temperature = 0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)

In [25]:
s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently()

elapsed = time.perf_counter() - s
print('\033[1m' + f"Concurrent executed in {elapsed:0.2f} seconds." + '\033[0m')

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print('\033[1m' + f"Serial executed in {elapsed:0.2f} seconds." + '\033[0m')



I'm doing well, thank you. How about you?


I'm doing well, thanks for asking. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you! How about yourself?


I'm doing well, thank you. How about yourself?


I'm doing well, thanks for asking. How about you?


I'm doing well, thank you. How about you?
Concurrent executed in 4.58 seconds.


I'm doing well. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?

I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thanks. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?
Serial executed in 10.07 seconds.

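The speed-up comes from overlapping the waits on the network. `asyncio.gather` starts all ten requests before any of them finishes, whereas the serial loop waits for each response in turn. The sketch below reproduces the effect without an API key. It is only an illustration: `fake_llm_call` is hypothetical, and `asyncio.sleep` with an assumed 0.05 s latency stands in for the network round trip. Run it as a script. Inside Jupyter, `asyncio.run` cannot be called from the already running loop, so you would `await` the coroutines instead, as the cell above does.

```python
import asyncio
import time

LATENCY_S = 0.05  # assumed per-request network latency for the simulation


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    await asyncio.sleep(LATENCY_S)
    return f"echo: {prompt}"


async def run_serially(n: int) -> list:
    # Each request completes before the next one starts.
    return [await fake_llm_call("Hello, how are you?") for _ in range(n)]


async def run_concurrently(n: int) -> list:
    # All coroutines are scheduled together, so their waits overlap.
    return await asyncio.gather(*(fake_llm_call("Hello, how are you?") for _ in range(n)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


serial_results, serial_s = timed(run_serially(10))
concurrent_results, concurrent_s = timed(run_concurrently(10))
print(f"serial: {serial_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```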

### Custom LLM Wrapper
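For a custom LLM we need to implement -
* Required method: `_call`, which takes a prompt (and optional stop sequences) and returns a string
* Required property: `_llm_type`, a short name for the model type
* Optional property: `_identifying_params`, which is shown when the LLM is printed

Below we implement a simple LLM that returns the first `n` characters of the input.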

In [27]:
from langchain.llms.base import LLM
from typing import Optional, List, Mapping, Any

In [53]:
class CustomLLM(LLM):
    
    n: int
    
    @property
    def _llm_type(self) -> str:
        return 'custom'
    
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[:self.n]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n,
                "LLM type": self._llm_type}

In [54]:
llm = CustomLLM(n = 10)

In [55]:
llm("This is a foobar thing")

'This is a '

In [56]:
print(llm)

CustomLLM
Params: {'n': 10, 'LLM type': 'custom'}
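`CustomLLM` above simply rejects `stop`. A wrapper that should honor stop sequences instead could cut its output at the earliest occurrence of any stop string. The helpers below (`truncate_at_stop`, `first_n_chars`) are a hypothetical plain-Python sketch of that behavior and are not part of LangChain's API.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Return text up to (not including) the earliest stop sequence found."""
    if not stop:
        return text
    # Positions of every non-empty stop string that occurs in the text.
    hits = [i for i in (text.find(s) for s in stop if s) if i != -1]
    return text[: min(hits)] if hits else text


def first_n_chars(prompt: str, n: int, stop: Optional[List[str]] = None) -> str:
    """Same output as CustomLLM._call, but applies stop sequences instead of raising."""
    return truncate_at_stop(prompt[:n], stop)


print(repr(first_n_chars("This is a foobar thing", 10)))           # 'This is a '
print(repr(first_n_chars("This is a foobar thing", 10, [" is"])))  # 'This'
```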


### Fake LLMs
Fake LLM classes are used for testing. They mimic real LLMs and simulate what would happen when a call goes to a real LLM, without making any network calls.

In [57]:
from langchain.llms.fake import FakeListLLM

In [58]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

In [59]:
# load tools -

tools = load_tools(["python_repl"])

In [62]:
# Simulating response - 

responses = [
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4"
]
llm = FakeListLLM(responses = responses)

In [63]:
llm

FakeListLLM(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x10e14e580>, responses=['Action: Python REPL\nAction Input: print(2 + 2)', 'Final Answer: 4'], i=0)

In [64]:
# Initialize Agent -
agent = initialize_agent(tools, llm, agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose = True)

In [65]:
# Run the agent -
agent.run("what is 2 + 2")



> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: 4

Thought:Final Answer: 4

> Finished chain.


'4'
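The idea behind `FakeListLLM` is simple. It hands back pre-scripted responses in order, which lets you test an agent's control flow deterministically. The following is a minimal plain-Python sketch of that idea. `ScriptedLLM` is a hypothetical name, and raising an error once the script runs out is a choice made here, not necessarily what LangChain does.

```python
from typing import List


class ScriptedLLM:
    """Hypothetical fake LLM: returns canned responses one after another."""

    def __init__(self, responses: List[str]):
        self.responses = list(responses)
        self.i = 0  # index of the next response to return

    def __call__(self, prompt: str) -> str:
        # The prompt is ignored; only the scripted order matters.
        if self.i >= len(self.responses):
            raise RuntimeError("ScriptedLLM ran out of responses")
        response = self.responses[self.i]
        self.i += 1
        return response


fake_llm = ScriptedLLM([
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4",
])
print(fake_llm("what is 2 + 2"))  # first scripted step: the tool call
print(fake_llm("Observation: 4"))  # second scripted step: Final Answer: 4
```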

### Cache LLM Calls - In Memory Cache
How to cache results of individual LLM calls.

In [67]:
from langchain.llms import OpenAI
from langchain.cache import InMemoryCache
import langchain

In [68]:
langchain.llm_cache = InMemoryCache()

In [69]:
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name = 'text-davinci-002', n = 2, best_of = 2)

In [70]:
%%time
# this call to the model is not cached
llm("Tell me a scientic fun facts")

CPU times: user 24 ms, sys: 3.63 ms, total: 27.6 ms
Wall time: 1.22 s


'\n\nThe average person farts 14 times a day.'

In [71]:
%%time
# the second time it is cached and it goes faster
llm("Tell me a scientic fun facts")

CPU times: user 214 µs, sys: 3 µs, total: 217 µs
Wall time: 227 µs


'\n\nThe average person farts 14 times a day.'
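Conceptually, an in-memory LLM cache is a dictionary lookup that runs before the network call. The sketch below keys a shared dictionary on the prompt together with a string that describes the model configuration. With that key, the same prompt sent to differently configured models does not collide. `CachedLLM`, `fake_model`, and this key layout are assumptions for illustration and not LangChain's internals.

```python
from typing import Callable, Dict, Tuple

# One cache shared by all wrappers, similar in spirit to a global llm_cache.
CACHE: Dict[Tuple[str, str], str] = {}


class CachedLLM:
    """Hypothetical wrapper that checks CACHE before calling the model."""

    def __init__(self, model: Callable[[str], str], llm_string: str):
        self.model = model
        self.llm_string = llm_string  # describes the model and its parameters
        self.calls = 0  # how many requests actually reached the model

    def __call__(self, prompt: str) -> str:
        key = (prompt, self.llm_string)
        if key not in CACHE:
            self.calls += 1
            CACHE[key] = self.model(prompt)
        return CACHE[key]


def fake_model(prompt: str) -> str:
    return f"answer to: {prompt}"  # stand-in for a real API call


davinci = CachedLLM(fake_model, "text-davinci-002|temperature=0.7")
ada = CachedLLM(fake_model, "text-ada-001|temperature=0.7")

davinci("Tell me a fun fact")
davinci("Tell me a fun fact")  # cache hit: the model is not called again
ada("Tell me a fun fact")      # different llm_string, so a separate entry
print(davinci.calls, ada.calls, len(CACHE))  # 1 1 2
```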

### Cache LLM Calls - SQLite Cache

In [83]:
!rm .langchain.db

rm: .langchain.db: No such file or directory


In [84]:
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path = ".langchain.db")

In [85]:
%%time
# first time not cached
llm("Tell me a scientic fun facts")

CPU times: user 35.8 ms, sys: 12.3 ms, total: 48.1 ms
Wall time: 1.71 s


'\n\nSome scientific fun facts are:\n\n-The average person has about 100,000 hairs on their head.\n\n-The human heart beats about 100,000 times a day.\n\n-The average person breathes about 23,000 times a day.\n\n-The average person blinks about 15 times a minute.'

In [86]:
%%time
# it is cached second time onwards
llm("Tell me a scientic fun facts")

CPU times: user 2.3 ms, sys: 1.3 ms, total: 3.6 ms
Wall time: 2.33 ms


'\n\nSome scientific fun facts are:\n\n-The average person has about 100,000 hairs on their head.\n\n-The human heart beats about 100,000 times a day.\n\n-The average person breathes about 23,000 times a day.\n\n-The average person blinks about 15 times a minute.'
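Unlike the in-memory cache, the SQLite cache writes its entries to a file (`.langchain.db` above), so they outlive the Python process. The sketch below demonstrates that property with Python's built-in `sqlite3` module. The table name, its columns, and the helper functions are assumptions for illustration and are not the schema LangChain uses.

```python
import os
import sqlite3
import tempfile


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a cache table keyed on prompt + LLM description."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache ("
        " prompt TEXT, llm TEXT, response TEXT, PRIMARY KEY (prompt, llm))"
    )
    return conn


def lookup(conn: sqlite3.Connection, prompt: str, llm: str):
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?", (prompt, llm)
    ).fetchone()
    return row[0] if row else None


def update(conn: sqlite3.Connection, prompt: str, llm: str, response: str) -> None:
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)", (prompt, llm, response)
        )


path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

conn = open_cache(path)
update(conn, "Tell me a fun fact", "text-davinci-002", "Octopuses have three hearts.")
conn.close()

# A fresh connection (think: a restarted kernel) still finds the entry.
conn = open_cache(path)
print(lookup(conn, "Tell me a fun fact", "text-davinci-002"))
conn.close()
```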

### Cache LLM Calls - Redis Cache
To use it, install and start a Redis server locally, e.g. with Homebrew:

```brew services start redis```

```brew services info redis```

```brew services stop redis```

In [77]:
from redis import Redis

In [78]:
from langchain.cache import RedisCache

In [79]:
langchain.llm_cache = RedisCache(redis_ = Redis())

In [80]:
%%time
# first time not cached
llm("Tell me a scientic fun facts")

CPU times: user 24.6 ms, sys: 5.08 ms, total: 29.6 ms
Wall time: 2.22 s


'\n\nThe average person produces about 1.5 liters of gas a day.'

In [81]:
%%time
# it is cached second time onwards
llm("Tell me a scientic fun facts")

CPU times: user 2.19 ms, sys: 2.17 ms, total: 4.37 ms
Wall time: 3.72 ms


'\n\nThe average person produces about 1.5 liters of gas a day.'

### Cache LLM Calls - GPTCache (Exact Match Caching)

GPTCache is a library for building a semantic cache that stores responses to LLM queries. This first example matches prompts exactly.

In [82]:
import gptcache
from gptcache.processor.pre import get_prompt
from gptcache.manager.factory import get_data_manager
from langchain.cache import GPTCache

In [87]:
# Give each cache its own file, so that caches for different LLM models do not affect each other.

In [88]:
i = 0
file_prefix = "data_map"

def init_gptcache_map(cache_obj: gptcache.Cache):
    global i
    cache_path = f'{file_prefix}_{i}.txt'
    cache_obj.init(
        pre_embedding_func = get_prompt,
        data_manager = get_data_manager(data_path = cache_path),
    )
    i += 1

langchain.llm_cache = GPTCache(init_gptcache_map)

In [89]:
%%time
# first time not cached
llm("Tell me a scientic fun facts")



CPU times: user 25.7 ms, sys: 12.6 ms, total: 38.3 ms
Wall time: 1.23 s


'\n\n-The average person farts 14 times a day.\n-The human brain is about 75% water.\n-The average person blinks around 15 times a minute.\n- humans have more than 5 million smell receptors.'

In [90]:
%%time
# expected to be cached from the second call onwards; in this run the answer changed, so the lookup missed
llm("Tell me a scientic fun facts")



CPU times: user 9.65 ms, sys: 2.56 ms, total: 12.2 ms
Wall time: 1.15 s


'\n\nA day on Venus lasts for 243 Earth days.'

### Cache LLM Calls - GPTCache (Similarity Match Caching)

In [91]:
import gptcache
from gptcache.processor.pre import get_prompt
from gptcache.manager.factory import get_data_manager
from langchain.cache import GPTCache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache import Cache
from gptcache.embedding import Onnx
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

In [92]:
# Avoid multiple caches using the same file, causing different llm model caches to affect each other

In [96]:
i = 0
file_prefix = "data_map"
llm_cache = Cache()

def init_gptcache_map(cache_obj: gptcache.Cache):
    global i
    cache_path = f'{file_prefix}_{i}.txt'
    onnx = Onnx()
    cache_base = CacheBase('sqlite')
    vector_base = VectorBase('faiss', dimension = onnx.dimension)
    data_manager = get_data_manager(cache_base, vector_base, max_size = 10, clean_size = 2)
    cache_obj.init(
        pre_embedding_func = get_prompt,
        embedding_func = onnx.to_embeddings,
        data_manager = data_manager,
        similarity_evaluation = SearchDistanceEvaluation(),
    )
    i += 1
    
langchain.llm_cache = GPTCache(init_gptcache_map)

In [97]:
%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")



CPU times: user 1.08 s, sys: 69.7 ms, total: 1.15 s
Wall time: 1.96 s


'\n\nHow do you catch a cheetah? You tie him to a post!'

In [98]:
%%time
# This is an exact match, so it should be found in the cache; in this run the answer changed, so the lookup missed
llm("Tell me a joke")



CPU times: user 1.56 s, sys: 13.7 ms, total: 1.57 s
Wall time: 1.19 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
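A similarity cache returns a stored answer when a new prompt is close enough to a cached one, not only when the two are identical. GPTCache does this with an embedding model (`Onnx` above), a vector store (`faiss`), and a distance evaluation. The toy sketch below replaces those with word-count vectors and cosine similarity so that it runs on the standard library alone. The 0.8 threshold and the `SimilarityCache` class are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real cache uses a neural model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SimilarityCache:
    """Hypothetical cache that returns the closest stored answer above a threshold."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller would query the LLM and call update()

    def update(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


cache = SimilarityCache()
cache.update("Tell me a joke", "Why did the chicken cross the road?")
print(cache.lookup("tell me a joke"))         # hit: identical after lowercasing
print(cache.lookup("Tell me a funny joke"))   # hit: cosine is about 0.89
print(cache.lookup("Summarize this speech"))  # None
```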

### Cache LLM Calls - SQLAlchemy Cache
You can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy.

In [99]:
# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

### Cache LLM Calls - Custom SQLAlchemy Schemas 
You can define your own declarative SQLAlchemyCache child class to customize the schema used for caching. For example, to support high-speed fulltext prompt indexing with Postgres, use:

In [107]:
from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base  # SQLAlchemy >= 1.4; older versions use sqlalchemy.ext.declarative
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

In [108]:
Base = declarative_base()

In [110]:
# class FulltextLLMCache(Base):
#     """ Postgres table for fulltext-indexed LLM Cache """
    
#     __tablename__ = "llm_cache_fulltext"
#     id = Column(Integer, Sequence('cache_id'), primary_key = True)
#     prompt = Column(String, nullable = False)
#     llm = Column(String, nullable = False)
#     idx = Column(Integer)  # index of the generation for this prompt
#     response = Column(String)
#     prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted = True))
#     # A second __table_args__ assignment would overwrite the first, so the GIN index and
#     # extend_existing (see https://stackoverflow.com/questions/27812250/sqlalchemy-inheritance-not-working)
#     # are combined in one tuple; the options dict must be its last element.
#     __table_args__ = (
#         Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using = "gin"),
#         {'extend_existing': True},
#     )
    
# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

### Optional Caching -
We can turn off caching for specific LLMs if needed, even when global caching is enabled.

In [134]:
llm = OpenAI(model_name = "text-davinci-002", n = 2, best_of = 2, cache = False)

In [135]:
%%time
llm("Tell me a joke")

CPU times: user 9.12 ms, sys: 2.52 ms, total: 11.6 ms
Wall time: 699 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

In [136]:
%%time
llm("Tell me a joke")

CPU times: user 7.93 ms, sys: 2.22 ms, total: 10.1 ms
Wall time: 905 ms


'\n\nA man walks into a bar and asks for a beer. The bartender says "You\'re out of luck. We\'ve been closed for fifteen minutes."'

### Optional Caching in Chains

We can also turn off caching for particular steps in a chain. Let's see this example -

Here, we load a map-reduce summarization chain. We cache the results of the map step, but not those of the combine (reduce) step.

In [1]:
from langchain.llms import OpenAI

In [2]:
llm = OpenAI(model_name = "text-davinci-002")
no_cache_llm = OpenAI(model_name = "text-davinci-002", cache = False)

In [3]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain

In [4]:
text_splitter = CharacterTextSplitter()

In [5]:
with open('../../../sample_data/state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

In [6]:
len(texts)

11

In [7]:
texts[0]

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with 

In [8]:
from langchain.docstore.document import Document
docs = [Document(page_content = t) for t in texts[:3]]

In [9]:
len(docs)

3

In [10]:
docs[0]

Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizen

In [11]:
from langchain.chains.summarize import load_summarize_chain

In [12]:
chain = load_summarize_chain(llm, chain_type = "map_reduce", reduce_llm = no_cache_llm)

In [13]:
%%time
chain.run(docs)

CPU times: user 241 ms, sys: 29.6 ms, total: 270 ms
Wall time: 5.65 s


'\n\nPresident Biden discusses the recent aggression from Russia and the response from the free world. He explains how America and its allies are working together to hold Russia accountable and outlines the various ways in which they are doing so. Finally, he warns the Russian oligarchs that the U.S. is coming for their ill-gotten gains.'

**Note** 

When we run it again, it runs substantially faster (2.84 s vs. 5.65 s wall time), but the final answer is different. This is because the map steps are cached, while the reduce step is not.

In [14]:
%%time
chain.run(docs)

CPU times: user 19 ms, sys: 3.92 ms, total: 22.9 ms
Wall time: 2.84 s


"\n\nPresident Biden discusses the sanctions against Russia and the importance of the NATO alliance. He also addresses the issue of Russian oligarchs and their ill-gotten gains. The United States is joining with its European allies to seize assets from Russians in response to Putin's actions in Ukraine. American airspace will be closed to Russian flights, and military, economic, and humanitarian aid will be given to Ukraine. Putin's aggression will not be tolerated, and the United States is prepared to defend its NATO allies if necessary."

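Both effects, the faster run and the changed summary, follow from where the cache sits. On the second run the map calls hit the cache, while the reduce call goes to the model again and produces a new combination. Below is a toy sketch of that arrangement. It is only an illustration: `functools.lru_cache` stands in for the LLM cache on the map step, and a counter stands in for a non-deterministic, uncached reduce model.

```python
import functools
import itertools

_reduce_runs = itertools.count(1)
map_calls = 0


@functools.lru_cache(maxsize=None)
def summarize_chunk(chunk: str) -> str:
    """Map step: cached, so each distinct chunk reaches the 'model' only once."""
    global map_calls
    map_calls += 1
    return chunk.split(".")[0]  # toy summary: the first sentence


def combine(summaries: tuple) -> str:
    """Reduce step: not cached, so every run produces a fresh result."""
    return f"run {next(_reduce_runs)}: " + " / ".join(summaries)


def map_reduce(chunks) -> str:
    return combine(tuple(summarize_chunk(c) for c in chunks))


docs = ["First chunk. Details.", "Second chunk. Details.", "Third chunk. Details."]
first = map_reduce(docs)
second = map_reduce(docs)
print(map_calls)        # 3: the second run reused every map result
print(first != second)  # True: the reduce step ran again
```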
### Serialize LLM classes

Write and read an LLM configuration to and from disk. This is useful if you want to save the configuration for a given LLM (e.g., the provider, the temperature, etc.).

In [16]:
from langchain.llms import OpenAI
from langchain.llms.loading import load_llm

In [15]:
### Loading

In [18]:
llm = load_llm("llm.json")

In [19]:
llm

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x10ae73760>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.7, max_tokens=256, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, n=1, best_of=1, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False, allowed_special=set(), disallowed_special='all')

In [20]:
llm = load_llm('llm.yaml')

In [21]:
llm

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x10ae73760>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.7, max_tokens=256, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, n=1, best_of=1, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False, allowed_special=set(), disallowed_special='all')

In [22]:
### Saving

In [23]:
# Uncomment to write the files that load_llm reads in the Loading cells above
# llm.save("llm.json")
# llm.save("llm.yaml")

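Conceptually, saving an LLM writes its configuration parameters (model name, temperature, and so on) to a file, and loading rebuilds an object from them. The plain-Python sketch below shows such a round trip with `json`. The `LLMConfig` dataclass, the helper functions, and the `_type` discriminator key are assumptions for illustration. The file LangChain writes may contain different fields.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class LLMConfig:
    """Hypothetical subset of an LLM's constructor parameters."""
    model_name: str = "text-davinci-003"
    temperature: float = 0.7
    max_tokens: int = 256


def save_config(cfg: LLMConfig, path: str) -> None:
    data = {"_type": "openai", **asdict(cfg)}  # '_type' records which class to rebuild
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def load_config(path: str) -> LLMConfig:
    with open(path) as f:
        data = json.load(f)
    if data.pop("_type") != "openai":
        raise ValueError("unsupported LLM type")
    return LLMConfig(**data)


path = os.path.join(tempfile.mkdtemp(), "llm.json")
save_config(LLMConfig(temperature=0.2), path)
print(load_config(path))  # LLMConfig(model_name='text-davinci-003', temperature=0.2, max_tokens=256)
```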
### Stream LLM and Chat Model Responses

In [24]:
from langchain.llms import OpenAI, Anthropic
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage

In [25]:
llm = OpenAI(streaming = True, callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]), verbose = True, temperature = 0)
resp = llm("Write me a poem about midnight sun.")



The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to see
It's a sight that's hard to believe
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a golden hue
The stars twinkle in the night sky too

The midnight sun is a sight to behold
It's beauty is something to behold
The sky is lit up in a

In [27]:
llm.generate(["Tell me a joke."])



Q: What did the fish say when it hit the wall?
A: Dam!

LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})

In [26]:
chat = ChatOpenAI(streaming = True, callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]), verbose = True, temperature = 0)
resp = chat([HumanMessage(content = "Write me a song about sparkling water.")])

Verse 1:
Clear and crisp, refreshing to the taste
Bubbles dance, effervescence in the glass
A thirst quencher, a healthy choice
Sparkling water, my favorite voice

Chorus:
Sparkling water, oh how you shine
A bubbly sensation, so divine
No sugar, no calories, just pure delight
Sparkling water, my drink of the night

Verse 2:
A perfect mixer, for any cocktail
A splash of lime, or a slice of grapefruit
A healthy alternative, to soda pop
Sparkling water, never gonna stop

Chorus:
Sparkling water, oh how you shine
A bubbly sensation, so divine
No sugar, no calories, just pure delight
Sparkling water, my drink of the night

Bridge:
From the mountains to the sea
Sparkling water, you're the key
To hydration and a healthy life
Sparkling water, you're my delight

Chorus:
Sparkling water, oh how you shine
A bubbly sensation, so divine
No sugar, no calories, just pure delight
Sparkling water, my drink of the night

Outro:
Sparkling water, you're the one
A refreshing drink, under the sun
I'll never
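With `streaming = True`, each new token goes to the callback handler as soon as it arrives. `StreamingStdOutCallbackHandler` writes each token to stdout, which is why the poem and the song print incrementally. The sketch below imitates that flow with a hypothetical word-level token generator and two stand-in handlers. Their `on_llm_new_token` method mirrors the name of LangChain's callback hook, but the classes here are not LangChain's.

```python
import sys
from typing import Iterator, List


class PrintTokens:
    """Stand-in handler: write each token to a stream as soon as it arrives."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.stream.write(token)
        self.stream.flush()


class CollectTokens:
    """Second stand-in handler, showing that several handlers can observe one stream."""

    def __init__(self):
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)


def fake_token_stream(text: str) -> Iterator[str]:
    # Hypothetical: a real API streams sub-word tokens over the network.
    for word in text.split(" "):
        yield word + " "


def stream_completion(text: str, handlers) -> str:
    pieces = []
    for token in fake_token_stream(text):
        for handler in handlers:
            handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


collector = CollectTokens()
result = stream_completion("The midnight sun is a sight to behold", [PrintTokens(), collector])
print()
print(len(collector.tokens))  # 8
```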

### How to track token usage

In [29]:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

In [30]:
llm = OpenAI(model_name = "text-davinci-002", n = 2, best_of = 2)
llm

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x10ae73760>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-002', temperature=0.7, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=2, best_of=2, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False, allowed_special=set(), disallowed_special='all')

In [31]:
with get_openai_callback() as cb:
    result = llm("tell me a joke")
    print(cb)

Tokens Used: 42
	Prompt Tokens: 4
	Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084


Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence:

In [32]:
with get_openai_callback() as cb:
    result = llm("tell me a joke")
    result2 = llm("tell me a joke")
    print(cb.total_tokens)

84


If a chain or agent with multiple steps runs inside the context manager, all of its steps are tracked.

In [33]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

from langchain.llms import OpenAI

In [34]:
llm = OpenAI(temperature = 0)
tools = load_tools(["serpapi", "llm-math"], llm = llm)
agent = initialize_agent(tools, llm, agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose = True)

In [35]:
with get_openai_callback() as cb:
    response = agent.run("Who is Sachin Tendulkar's son and what is his current age. Divide his age by 3")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")



> Entering new AgentExecutor chain...
I need to find out who Sachin Tendulkar's son is and his current age
Action: Search
Action Input: "Sachin Tendulkar's son"
Observation: Arjun Tendulkar
Thought: I need to find out Arjun Tendulkar's age
Action: Search
Action Input: "Arjun Tendulkar age"
Observation: 23 years
Thought: I need to divide 23 by 3
Action: Calculator
Action Input: 23/3
Observation: Answer: 7.666666666666667
Thought: I now know the final answer
Final Answer: Arjun Tendulkar is Sachin Tendulkar's son and his current age is 23 years old. His age divided by 3 is 7.666666666666667.

> Finished chain.
Total Tokens: 1349
Prompt Tokens: 1199
Completion Tokens: 150
Total Cost (USD): $0.02698
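`get_openai_callback` works as an accumulator. Every call made while the context manager is open adds its token counts, and the cost is derived from the totals. The costs above are consistent with a rate of $0.02 per 1,000 tokens: 42 × 0.02 / 1000 = $0.00084 and 1349 × 0.02 / 1000 = $0.02698. The sketch below reproduces that bookkeeping in plain Python. The `TokenTracker` class, the manual `record` calls, and the hard-coded rate are assumptions for illustration. Real prices depend on the model and change over time.

```python
from contextlib import contextmanager
from dataclasses import dataclass

PRICE_PER_1K_TOKENS = 0.02  # assumed USD rate, consistent with the outputs above


@dataclass
class TokenTracker:
    """Hypothetical accumulator for token usage across several calls."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.successful_requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def total_cost(self) -> float:
        return self.total_tokens / 1000 * PRICE_PER_1K_TOKENS


@contextmanager
def track_tokens():
    tracker = TokenTracker()
    yield tracker  # every call recorded inside the block accumulates here


with track_tokens() as cb:
    cb.record(prompt_tokens=4, completion_tokens=38)  # first "tell me a joke"
    cb.record(prompt_tokens=4, completion_tokens=38)  # second call, assumed same split
print(cb.total_tokens, cb.successful_requests)  # 84 2
print(f"${cb.total_cost:.5f}")                   # $0.00168
```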
