## The Models

1. **LLMs**
2. ChatModels
3. Text Embedding Models

### LLMs - getting started
The LLM class is a standard interface to Large Language Model providers such as OpenAI and HuggingFace.

In [1]:
from langchain.llms import OpenAI

llm = OpenAI(model_name = "text-ada-001", n = 2, best_of = 2)

llm("Tell me a joke with programing anecdote")

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [7]:
# batch of requests -
llm_result = llm.generate(["Tell me a Scientific fact"] * 5)

In [8]:
llm_result

LLMResult(generations=[[Generation(text='\n\nThe Earth is the only "edient" on the planet.', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nThe moon has an area of 1,000 square kilometers.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe moon has an\n\nassorted Slang words for these things meaning different things to different people\n\n1. its a place where people go to die\n2. its a place where people can dream\n3. its a place where people walk\n4. its a place where peopleECE (electric chair)', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nThe universe is around one trillionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth 

In [9]:
len(llm_result.generations)

5

In [11]:
llm_result.generations[0:2]

[[Generation(text='\n\nThe Earth is the only "edient" on the planet.', generation_info={'finish_reason': 'stop', 'logprobs': None}),
  Generation(text='\n\nThe moon has an area of 1,000 square kilometers.', generation_info={'finish_reason': 'stop', 'logprobs': None})],
 [Generation(text='\n\nThe moon has an\n\nassorted Slang words for these things meaning different things to different people\n\n1. its a place where people go to die\n2. its a place where people can dream\n3. its a place where people walk\n4. its a place where peopleECE (electric chair)', generation_info={'finish_reason': 'stop', 'logprobs': None}),
  Generation(text='\n\nThe universe is around one trillionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of a billionth of

In [12]:
llm_result.llm_output

{'token_usage': {'completion_tokens': 461,
  'total_tokens': 486,
  'prompt_tokens': 25},
 'model_name': 'text-ada-001'}
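
The counts in `token_usage` are related: prompt tokens plus completion tokens give the total. The sketch below copies the dictionary printed above into a literal (the variable names are only illustrative) and checks that relationship. It also derives an average completion length, assuming the 5 prompts with `n = 2` produced 10 generations in total.

```python
# Literal copy of the llm_output shown above, so no API call is needed.
llm_output = {
    "token_usage": {
        "completion_tokens": 461,
        "total_tokens": 486,
        "prompt_tokens": 25,
    },
    "model_name": "text-ada-001",
}

usage = llm_output["token_usage"]

# The total is the sum of prompt and completion tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

# Assumption: 5 prompts, each with n = 2 generations.
num_generations = 5 * 2
avg_completion = usage["completion_tokens"] / num_generations
print(f"{avg_completion:.1f} completion tokens per generation on average")
```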

In [14]:
llm.get_num_tokens("what a joke") # make sure you have installed tiktoken

3

### Async API for LLMs
The async API is built on the asyncio library. It is useful for calling multiple LLMs concurrently, because these calls are network-bound.
We can use the ```agenerate``` method to call an OpenAI LLM asynchronously.

In [15]:
import time
import asyncio

from langchain.llms import OpenAI

In [20]:
def generate_serially():
    llm = OpenAI(temperature = 0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])
        print(resp.generations[0][0].text)

In [21]:
async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])
    print(resp.generations[0][0].text)

In [22]:
async def generate_concurrently():
    llm = OpenAI(temperature = 0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)

In [25]:
s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently()

elapsed = time.perf_counter() - s
print('\033[1m' + f"Concurrent executed in {elapsed:0.2f} seconds." + '\033[0m')

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print('\033[1m' + f"Serial executed in {elapsed:0.2f} seconds." + '\033[0m')



I'm doing well, thank you. How about you?


I'm doing well, thanks for asking. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you! How about yourself?


I'm doing well, thank you. How about yourself?


I'm doing well, thanks for asking. How about you?


I'm doing well, thank you. How about you?
Concurrent executed in 4.58 seconds.


I'm doing well. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?

I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thanks. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?
Serial executed in 10.07 seconds.

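The same serial versus concurrent comparison can be tried without an API key. This is a minimal sketch in which `fake_llm_call` is a hypothetical stand-in for a network-bound request and `asyncio.sleep` plays the role of network latency; it is not LangChain code and the 0.1 second delay is an arbitrary assumption.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    await asyncio.sleep(delay)  # simulated network latency
    return f"echo: {prompt}"


async def run_serially(prompts):
    # Each call is awaited before the next one starts.
    return [await fake_llm_call(p) for p in prompts]


async def run_concurrently(prompts):
    # All calls are scheduled together and wait on the network at the same time.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))


def timed(coro_fn, prompts):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(prompts))
    return results, time.perf_counter() - start


prompts = ["Hello, how are you?"] * 10
serial_results, serial_time = timed(run_serially, prompts)
concurrent_results, concurrent_time = timed(run_concurrently, prompts)
print(f"Serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

`asyncio.run` is meant for a plain script. Inside Jupyter, where an event loop is already running, the coroutines would be awaited directly instead.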

### Custom LLM Wrapper
For a custom LLM we need to implement:
* Required method: _call
* Optional property: _identifying_params

The example below implements a simple LLM that returns the first N characters of the input.

In [27]:
from langchain.llms.base import LLM
from typing import Optional, List, Mapping, Any

In [53]:
class CustomLLM(LLM):
    
    n: int
    
    @property
    def _llm_type(self) -> str:
        return 'custom'
    
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[:self.n]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n,
               "LLM type": self._llm_type}

In [54]:
llm = CustomLLM(n = 10)

In [55]:
llm("This is a foobar thing")

'This is a '

In [56]:
print(llm)

CustomLLM
Params: {'n': 10, 'LLM type': 'custom'}
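
The `_call` method above rejects any `stop` argument. One alternative would be to honor stop sequences by cutting the output at the first one that appears. The sketch below shows that idea in plain Python; `truncate_at_stop` and `first_n_chars` are hypothetical helpers written for illustration and are not part of LangChain.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut text at the earliest occurrence of any stop sequence (hypothetical helper)."""
    if not stop:
        return text
    cut = len(text)
    for token in stop:
        idx = text.find(token)
        # Skip empty stop strings and stop strings that do not occur.
        if token and idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def first_n_chars(prompt: str, n: int, stop: Optional[List[str]] = None) -> str:
    """Same behaviour as the CustomLLM example, but honoring stop sequences."""
    return truncate_at_stop(prompt[:n], stop)


no_stop = first_n_chars("This is a foobar thing", 10)
with_stop = first_n_chars("This is a foobar thing", 10, stop=["is a"])
print(repr(no_stop))
print(repr(with_stop))
```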


### Fake LLMs
Fake LLM classes are used for testing. They mimic real LLMs and simulate what would happen when a call goes to a real LLM.

In [57]:
from langchain.llms.fake import FakeListLLM

In [58]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

In [59]:
# load tools -

tools = load_tools(["python_repl"])

In [62]:
# Simulating response - 

responses = [
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4"
]
llm = FakeListLLM(responses = responses)

In [63]:
llm

FakeListLLM(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x10e14e580>, responses=['Action: Python REPL\nAction Input: print(2 + 2)', 'Final Answer: 4'], i=0)

In [64]:
# Initialize Agent -
agent = initialize_agent(tools, llm, agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose = True)

In [65]:
# Run the agent -
agent.run("what is 2 + 2")



> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: 4

Thought:Final Answer: 4

> Finished chain.


'4'
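
The idea behind a list-backed fake can be sketched in a few lines: it ignores the prompt and hands back pre-written responses in order, which is why the agent run above is fully deterministic. `CannedResponses` below is a hypothetical stand-in written for illustration; it is not the `FakeListLLM` implementation.

```python
from typing import List


class CannedResponses:
    """Hypothetical stand-in: returns pre-written responses in order, ignoring the prompt."""

    def __init__(self, responses: List[str]) -> None:
        self.responses = responses
        self.i = 0  # index of the next response to return

    def __call__(self, prompt: str) -> str:
        response = self.responses[self.i]
        self.i += 1
        return response


fake = CannedResponses([
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4",
])
first = fake("what is 2 + 2")
second = fake("any other prompt")
print(first)
print(second)
```

In this sketch a third call would raise `IndexError`, so a test needs to supply as many responses as the code under test will request.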

### Cache LLM Calls - In Memory Cache

In [67]:
from langchain.llms import OpenAI
from langchain.cache import InMemoryCache
import langchain

In [68]:
langchain.llm_cache = InMemoryCache()

In [69]:
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name = 'text-davinci-002', n = 2, best_of = 2)

In [70]:
%%time
# this call to the model is not cached
llm("Tell me a scientic fun facts")

CPU times: user 24 ms, sys: 3.63 ms, total: 27.6 ms
Wall time: 1.22 s


'\n\nThe average person farts 14 times a day.'

In [71]:
%%time
# the second time it is cached and it goes faster
llm("Tell me a scientic fun facts")

CPU times: user 214 µs, sys: 3 µs, total: 217 µs
Wall time: 227 µs


'\n\nThe average person farts 14 times a day.'
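
The speed-up comes from looking up the prompt instead of calling the model again. A minimal sketch of that exact-match idea follows, assuming the cache is keyed on the prompt plus a string describing the model settings. `ExactMatchCache`, `cached_call` and `slow_model` are hypothetical names, and this is not the `InMemoryCache` implementation.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExactMatchCache:
    """Hypothetical in-memory cache keyed on (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, model_key: str) -> Optional[str]:
        return self._store.get((prompt, model_key))

    def update(self, prompt: str, model_key: str, response: str) -> None:
        self._store[(prompt, model_key)] = response


def cached_call(cache: ExactMatchCache, model_key: str, prompt: str,
                call_model: Callable[[str], str]) -> str:
    hit = cache.lookup(prompt, model_key)
    if hit is not None:
        return hit  # cache hit: the model is not called
    response = call_model(prompt)
    cache.update(prompt, model_key, response)
    return response


calls: List[str] = []


def slow_model(prompt: str) -> str:
    """Stand-in for a slow model call; records each real invocation."""
    calls.append(prompt)
    time.sleep(0.05)
    return f"response to: {prompt}"


cache = ExactMatchCache()
first = cached_call(cache, "demo-model", "Tell me a fact", slow_model)
second = cached_call(cache, "demo-model", "Tell me a fact", slow_model)
print(first == second, len(calls))
```

Because the key is the exact prompt string, even a one-character change in the prompt is a cache miss.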

### Cache LLM Calls - SQLite Cache

In [83]:
!rm .langchain.db

rm: .langchain.db: No such file or directory


In [84]:
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path = ".langchain.db")

In [85]:
%%time
# first time not cached
llm("Tell me a scientic fun facts")

CPU times: user 35.8 ms, sys: 12.3 ms, total: 48.1 ms
Wall time: 1.71 s


'\n\nSome scientific fun facts are:\n\n-The average person has about 100,000 hairs on their head.\n\n-The human heart beats about 100,000 times a day.\n\n-The average person breathes about 23,000 times a day.\n\n-The average person blinks about 15 times a minute.'

In [86]:
%%time
# it is cached second time onwards
llm("Tell me a scientic fun facts")

CPU times: user 2.3 ms, sys: 1.3 ms, total: 3.6 ms
Wall time: 2.33 ms


'\n\nSome scientific fun facts are:\n\n-The average person has about 100,000 hairs on their head.\n\n-The human heart beats about 100,000 times a day.\n\n-The average person breathes about 23,000 times a day.\n\n-The average person blinks about 15 times a minute.'
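
Unlike the in-memory cache, a database file survives a restart. The sketch below illustrates that with the standard `sqlite3` module: a response is written, the connection is closed, and the entry is read back through a new connection. The table layout and the helper names are assumptions made for illustration and do not reflect the schema `SQLiteCache` uses.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical cache table in the given database file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache ("
        "prompt TEXT, model_key TEXT, response TEXT, "
        "PRIMARY KEY (prompt, model_key))"
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, prompt: str, model_key: str) -> Optional[str]:
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ? AND model_key = ?",
        (prompt, model_key),
    ).fetchone()
    return row[0] if row else None


def update(conn: sqlite3.Connection, prompt: str, model_key: str, response: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
        (prompt, model_key, response),
    )
    conn.commit()


# Use a temporary directory so the demo does not touch .langchain.db.
db_path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

conn = open_cache(db_path)
update(conn, "Tell me a fact", "demo-model", "a cached response")
conn.close()

# A fresh connection still sees the entry, because it lives in the file.
conn = open_cache(db_path)
cached = lookup(conn, "Tell me a fact", "demo-model")
missing = lookup(conn, "an unseen prompt", "demo-model")
conn.close()
print(cached, missing)
```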

### Cache LLM Calls - Redis Cache
To use this cache, install Redis and start a Redis server locally (the commands below use Homebrew):

```brew services start redis```

```brew services info redis```

```brew services stop redis```

In [77]:
from redis import Redis

In [78]:
from langchain.cache import RedisCache

In [79]:
langchain.llm_cache = RedisCache(redis_ = Redis())

In [80]:
%%time
# first time not cached
llm("Tell me a scientic fun facts")

CPU times: user 24.6 ms, sys: 5.08 ms, total: 29.6 ms
Wall time: 2.22 s


'\n\nThe average person produces about 1.5 liters of gas a day.'

In [81]:
%%time
# it is cached second time onwards
llm("Tell me a scientic fun facts")

CPU times: user 2.19 ms, sys: 2.17 ms, total: 4.37 ms
Wall time: 3.72 ms


'\n\nThe average person produces about 1.5 liters of gas a day.'

### Cache LLM Calls - GPTCache (Exact Matching)

GPTCache is a library for creating a semantic cache that stores responses from LLM queries.

In [82]:
import gptcache
from gptcache.processor.pre import get_prompt
from gptcache.manager.factory import get_data_manager
from langchain.cache import GPTCache