# Models

LangChain works with three types of models:

- __LLMs__: models that take a text string as input and return a text string as output.
- __Chat Models__: models that are backed by a language model but expose a more structured API. They take a list of chat messages as input and return a chat message.
- __Text Embedding Models__: models that take text as input and return a list of floats.
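To make the three input/output shapes concrete, here is a minimal sketch using toy stand-ins. None of these names come from LangChain: `ChatMessage`, `toy_llm`, `toy_chat_model`, and `toy_embedding_model` are hypothetical and only illustrate the type signatures described above.

```python
from typing import List, NamedTuple


class ChatMessage(NamedTuple):
    """Hypothetical message type: a role plus text content."""

    role: str
    content: str


def toy_llm(prompt: str) -> str:
    # LLM shape: text in, text out.
    return prompt.upper()


def toy_chat_model(messages: List[ChatMessage]) -> ChatMessage:
    # Chat model shape: list of messages in, one message out.
    last = messages[-1]
    return ChatMessage(role="ai", content=f"You said: {last.content}")


def toy_embedding_model(text: str) -> List[float]:
    # Embedding model shape: text in, list of floats out.
    return [float(len(text)), float(text.count(" "))]
```

Real models replace the toy bodies with calls to a provider, but the shapes of the inputs and outputs stay the same.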

## LLMs

LangChain provides a standard interface through which you can interact with a variety of LLMs.

We can use an LLM to generate text:

In [3]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001", n=2, best_of=2)
print(llm("Tell me a joke"))



Why did the chicken cross the road?

To get to the other side!


We can also send a batch of prompts with `llm.generate`:

In [4]:
llm_result = llm.generate(["tell me a joke", "recite a poem about AI"])
print(llm_result)

generations=[[Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe AI of tomorrow\n\nWill be composed of machines\n\nThat can think, and feel\n\nAnd love, and want, and love\n\nAs if they were really people', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text="\n\n\n\nThe future is art,\nand you will be its most Needy Kitten\nYou will need all of your skill\nto make it look so difficult\nBut you won't, because you will be able to get what you want", generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'completion_tokens': 127, 'total_tokens': 137, 'prompt_tokens': 10}, 'model_name': 'text-ada-001'}


In [6]:
llm_result.generations[0]

[Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.', generation_info={'finish_reason': 'stop', 'logprobs': None})]


In [12]:
print(llm_result.generations[-1])

[Generation(text='\n\nThe AI of tomorrow\n\nWill be composed of machines\n\nThat can think, and feel\n\nAnd love, and want, and love\n\nAs if they were really people', generation_info={'finish_reason': 'stop', 'logprobs': None}), Generation(text="\n\n\n\nThe future is art,\nand you will be its most Needy Kitten\nYou will need all of your skill\nto make it look so difficult\nBut you won't, because you will be able to get what you want", generation_info={'finish_reason': 'stop', 'logprobs': None})]


Some providers return provider-specific information. This information is NOT standardized across providers.

In [15]:
llm_result.llm_output

{'token_usage': {'completion_tokens': 127,
  'total_tokens': 137,
  'prompt_tokens': 10},
 'model_name': 'text-ada-001'}

You can also estimate how many tokens a piece of text will take up in that model.

In [16]:
llm.get_num_tokens("what a joke")

3

### Using async API for LLMs

In [33]:
import asyncio

from langchain.llms import OpenAI


def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(5):
        resp = llm.generate(["hello, how are you?"])
        print(resp.generations[0][0].text)


async def async_generate(llm):
    resp = await llm.agenerate(["hello, how are you?"])
    print(resp.generations[0][0].text)


async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(5)]
    await asyncio.gather(*tasks)

In [31]:
import time

start = time.perf_counter()
await generate_concurrently()
elapsed = time.perf_counter() - start
elapsed



I'm doing well, thank you. How about you?


I'm doing well, thanks. How about you?


Hi there, I'm doing well. How about you?


I'm doing well, thank you. How about you?

I'm doing well, thank you. How are you?


1.4431627320000189

In [34]:
start = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - start
elapsed



I'm doing well, thank you for asking! How about you?


I'm doing well, thank you for asking! How about you?


I'm doing great, thank you for asking. How are you?


I'm doing well, thank you. How about yourself?


I'm doing great, thanks for asking! How about you?


4.626748034000002
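The timings above show why the async API helps: the requests are network-bound, so issuing them concurrently takes roughly as long as the slowest single request. The effect can be reproduced without an API key in a minimal sketch where `fake_llm_call` is a hypothetical stand-in that only sleeps; no LangChain or OpenAI code is involved.

```python
import asyncio
import time


async def fake_llm_call(delay: float = 0.1) -> str:
    # Stand-in for a network-bound LLM request (hypothetical; no API is called).
    await asyncio.sleep(delay)
    return "response"


async def run_serially(n: int) -> list:
    # Each call waits for the previous one to finish.
    return [await fake_llm_call() for _ in range(n)]


async def run_concurrently(n: int) -> list:
    # All calls are in flight at once; gather waits for all of them.
    return await asyncio.gather(*(fake_llm_call() for _ in range(n)))


def timed(coro) -> tuple:
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


serial_results, serial_time = timed(run_serially(5))
concurrent_results, concurrent_time = timed(run_concurrently(5))
print(f"serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

This sketch is written as a script. Inside a notebook, where an event loop is already running, `asyncio.run` cannot be used and the coroutines would be awaited directly, as in the cells above.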

### Writing a custom LLM wrapper

To write a custom wrapper, extend the `LLM` class and implement:

- a `_call` method that takes a string and an optional list of stop words, and returns a string
- an `_identifying_params` property (optional) that returns a dict and is used when printing the class

In [35]:
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM

In [36]:
class CustomLLM(LLM):
    n: int

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted")
        return prompt[: self.n]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying params."""
        return {"n": self.n}

In [39]:
llm = CustomLLM(n=10)
llm("This is a foobar string")

'This is a '

Printing the LLM displays its `_identifying_params`.

In [38]:
print(llm)

CustomLLM
Params: {'n': 10}


### Using the fake LLM

The fake LLM is useful for testing. It lets you:
- mock out calls to an LLM
- simulate what would happen if the LLM responded in a certain way

In [40]:
from langchain.agents import initialize_agent, load_tools
from langchain.llms.fake import FakeListLLM

In [55]:
tools = load_tools(["python_repl"])
responses = ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
llm = FakeListLLM(responses=responses)
llm

FakeListLLM(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x11883ccd0>, responses=['Action: Python REPL\nAction Input: print(2 + 2)', 'Final Answer: 4'], i=0)

In [56]:
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("what is 2 + 2")



> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: 4

Thought:Final Answer: 4

> Finished chain.


'4'
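The idea behind the fake LLM can be sketched in a few lines: it ignores the prompt and replays scripted responses in order, which is what lets the agent run above proceed deterministically. The `ReplayLLM` class below is a hypothetical illustration, not LangChain's implementation; the `responses` and `i` attribute names mirror the fields shown in the `FakeListLLM` output above.

```python
from typing import List


class ReplayLLM:
    """Hypothetical stand-in that replays canned responses in order."""

    def __init__(self, responses: List[str]) -> None:
        self.responses = responses
        self.i = 0  # index of the next response to return

    def __call__(self, prompt: str) -> str:
        # The prompt is ignored; the next scripted response is returned.
        response = self.responses[self.i]
        self.i += 1
        return response


replay = ReplayLLM(
    ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
)
first = replay("what is 2 + 2")
second = replay("Observation: 4")
```

Because the responses are fixed, a test can assert on exactly what the surrounding code does with each one.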

### Caching LLM calls

You can cache the results of individual LLM calls. The following cache types are available:

- In-memory cache
- SQLite cache
- Redis cache
- [GPTCache](https://github.com/zilliztech/GPTCache)
- SQLAlchemy cache


You can also turn off caching for specific LLMs.
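As a rough illustration of what an in-memory cache does, the sketch below memoizes results by model identity and prompt, so a repeated prompt does not trigger a second call. `InMemoryLLMCache` and `llm_key` are hypothetical names chosen for this example; this is not LangChain's cache implementation, and the lambda stands in for a real model.

```python
from typing import Callable, Dict, Tuple


class InMemoryLLMCache:
    """Hypothetical memoizing wrapper around any prompt -> text callable."""

    def __init__(self, llm_fn: Callable[[str], str], llm_key: str) -> None:
        self.llm_fn = llm_fn
        self.llm_key = llm_key  # identifies the model and its settings
        self.store: Dict[Tuple[str, str], str] = {}
        self.misses = 0

    def __call__(self, prompt: str) -> str:
        key = (self.llm_key, prompt)
        if key not in self.store:
            # Cache miss: call the underlying model once and remember the result.
            self.misses += 1
            self.store[key] = self.llm_fn(prompt)
        return self.store[key]


cached = InMemoryLLMCache(lambda p: f"echo: {p}", llm_key="toy-model")
a = cached("Tell me a joke")
b = cached("Tell me a joke")  # served from the cache
```

Including the model identity in the key keeps results from one model or configuration from being returned for another. The persistent cache types listed above follow the same lookup-then-store pattern with a different backing store.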



### Saving and loading LLMs

In [57]:
from langchain.llms import OpenAI
from langchain.llms.loading import load_llm

llm = OpenAI(model_name="text-ada-001", n=2, best_of=2)
llm.save("llm.json")
llm.save("llm.yaml")

In [58]:
!cat llm.json

{
    "model_name": "text-ada-001",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n": 2,
    "best_of": 2,
    "request_timeout": null,
    "logit_bias": {},
    "_type": "openai"
}

In [59]:
!cat llm.yaml

_type: openai
best_of: 2
frequency_penalty: 0
logit_bias: {}
max_tokens: 256
model_name: text-ada-001
n: 2
presence_penalty: 0
request_timeout: null
temperature: 0.7
top_p: 1


In [60]:
load_llm("llm.yaml")

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x11883ccd0>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-ada-001', temperature=0.7, max_tokens=256, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, n=2, best_of=2, model_kwargs={}, openai_api_key=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False)

In [61]:
load_llm("llm.json")

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x11883ccd0>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-ada-001', temperature=0.7, max_tokens=256, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, n=2, best_of=2, model_kwargs={}, openai_api_key=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False)

### Tracking Token Usage

Token usage tracking is currently implemented only for the OpenAI API.

In [63]:
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

# Cheapest model: https://platform.openai.com/docs/models/overview
llm = OpenAI(model_name="text-ada-001", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("tell me a joke")
    print(result)
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")



Why did the chicken cross the road?

To get to the other side!
Total Tokens: 42
Prompt Tokens: 4
Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $1.6800000000000002e-05
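The numbers in this output are consistent with simple arithmetic: total tokens are the sum of prompt and completion tokens, and the cost scales linearly with the total. The sketch below assumes a price of $0.0004 per 1,000 tokens for `text-ada-001`; that price is not stated in the text and is used here only because it reproduces the reported figure.

```python
# Assumed price for text-ada-001 (USD per 1,000 tokens); not stated in the text.
PRICE_PER_1K_TOKENS = 0.0004


def estimate_cost(total_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    # Cost scales linearly with the number of tokens billed.
    return total_tokens * price_per_1k / 1000


prompt_tokens = 4
completion_tokens = 38
total_tokens = prompt_tokens + completion_tokens  # matches "Total Tokens" above
cost = estimate_cost(total_tokens)
print(f"Estimated cost (USD): ${cost:.8f}")
```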
