# LLMs

Large Language Models (LLMs) are a core component of LangChain. LangChain does not provide its own LLMs; instead, it offers a standard interface for interacting with many different LLMs. Specifically, this interface takes a string as input and returns a string.

There are many LLM providers (OpenAI, Cohere, Hugging Face, etc.); the LLM class is designed to provide a standard interface for all of them.
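
As a rough illustration of what a "standard interface" means here, the sketch below defines a hypothetical `TextModel` protocol with a single string-in, string-out method and two toy providers that satisfy it. None of these names come from LangChain; they are stand-ins, and no real provider is called.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical stand-in for a string-in, string-out LLM interface."""

    def invoke(self, prompt: str) -> str: ...


class EchoModel:
    """Toy provider: returns the prompt unchanged."""

    def invoke(self, prompt: str) -> str:
        return prompt


class ShoutModel:
    """Toy provider: returns the prompt in upper case."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


def ask(model: TextModel, prompt: str) -> str:
    # The caller depends only on the shared interface, not on the provider.
    return model.invoke(prompt)


answers = [ask(m, "hello") for m in (EchoModel(), ShoutModel())]
print(answers)
```

Swapping one provider for another changes nothing in `ask`, which is the point of a shared interface.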

In [8]:
from dotenv import load_dotenv
load_dotenv()

from langchain_openai import OpenAI

llm = OpenAI(model='gpt-3.5-turbo-instruct')



### Calling the LLM

In [3]:
question = 'Tell a brief story about learning how to program'
for text in llm.stream(question):
    print(text, end='')



Samantha had always been fascinated by technology and computers. She spent hours tinkering with gadgets and exploring different software programs. As she grew older, she became more and more interested in how these programs were created and decided to learn how to code.

At first, Samantha was overwhelmed by the seemingly complex world of programming. She didn't know where to start or what language to learn. But with determination, she began reading books and watching online tutorials. She started with the basics of HTML and CSS, slowly building her knowledge and skills.

As she continued to learn, Samantha faced challenges and encountered errors in her code. But she didn't give up. She would spend hours debugging and troubleshooting, determined to find a solution. Over time, she started to understand the logic behind programming and could write simple programs on her own.

Samantha's hard work paid off when she landed an internship at a software company. There, she had the opportuni
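
The loop above consumes the response chunk by chunk instead of waiting for the full text. A minimal sketch of that pattern follows, using a hypothetical `fake_stream` generator in place of a real model. The word-level chunking is an assumption; real providers choose their own chunk boundaries.

```python
from typing import Iterator


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields one word at a time."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # Re-attach the separating space to every chunk except the last.
        yield word if i == len(words) - 1 else word + " "


chunks = []
for chunk in fake_stream("Samantha learned to code"):
    print(chunk, end="")  # printed as soon as each chunk arrives
    chunks.append(chunk)
print()

# Joining the chunks reproduces the full response.
full_text = "".join(chunks)
```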

### Concurrent calls

In [4]:
questions = [
    'What is sky?',
    'What is Earth?',
    'What are stars?'
]

llm.batch(questions)

["\n\nSky is the atmosphere or outer space seen from the Earth. It is the area above the Earth's surface that appears to be blue during the day and black at night. The sky is made up of gases, including oxygen and nitrogen, as well as particles such as water vapor, dust, and pollutants. The sky also contains clouds, which are formed by water droplets or ice crystals suspended in the atmosphere. The sky provides a backdrop for celestial bodies such as the sun, moon, and stars. ",
 "\n\nEarth is the third planet from the Sun and the only known planet to support life. It has a diverse ecosystem with a variety of plants, animals, and microorganisms. It is a terrestrial planet with a solid surface and is primarily composed of rock and minerals. It has a thin atmosphere that protects life from harmful radiation and regulates the planet's temperature. Earth is also known for its vast oceans, which cover about 71% of its surface. It rotates on its axis, causing day and night, and orbits around

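`batch` takes a list of inputs and returns one output per input, in the same order. The sketch below shows one way such a helper could be built with a thread pool. `slow_model` and `batch` are hypothetical names, and this is not a description of how LangChain implements it internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def slow_model(prompt: str) -> str:
    """Hypothetical stand-in for a network call to a model."""
    time.sleep(0.1)  # simulate latency
    return f"answer to: {prompt}"


def batch(fn: Callable[[str], str], prompts: list[str], max_workers: int = 4) -> list[str]:
    # pool.map runs the calls concurrently but yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, prompts))


questions = ["What is sky?", "What is Earth?", "What are stars?"]
results = batch(slow_model, questions)
print(results)
```
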
### ChatModels

ChatModels are a core component of LangChain.

A chat model is a language model that takes chat messages as input and returns chat messages as output (rather than plain text).

LangChain has integrations with many model providers (OpenAI, Cohere, Hugging Face, etc.) and exposes a standard interface for interacting with all of them.

In [6]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model='gpt-3.5-turbo-0125')

In [8]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content='You are an assistant that tells jokes'),
    HumanMessage(content='Quanto é 1 + 1?')
]

answer = chat.invoke(messages)

In [9]:
print(answer.content)

Depende, se for em um mundo matemático, é 2. Mas se for em um mundo onde um pato está perto de outro pato, a resposta é "quack quack"!


In [10]:
answer.response_metadata

{'token_usage': {'completion_tokens': 45,
  'prompt_tokens': 27,
  'total_tokens': 72,
  'completion_tokens_details': {'accepted_prediction_tokens': 0,
   'audio_tokens': 0,
   'reasoning_tokens': 0,
   'rejected_prediction_tokens': 0},
  'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}},
 'model_name': 'gpt-3.5-turbo-0125',
 'system_fingerprint': None,
 'finish_reason': 'stop',
 'logprobs': None}

In [11]:
# Using streaming

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content='You are an assistant that tells jokes'),
    HumanMessage(content='Quanto é 1 + 1?')
]

for text in chat.stream(messages):
    print(text.content, end='')


Depende, estamos falando de matemática ou de um jogo de adivinhação? Porque se for matemática, a resposta é 2. Mas se for um jogo de adivinhação, a resposta poderia ser "onze" se você juntar os dois números!

There are five message types:
 - HumanMessage: represents a message from the user. It usually consists of content only.
 - AIMessage: represents a message from the model. It may include additional_kwargs, for example tool_calls when using OpenAI tool calling.
 - SystemMessage: represents a system message, which tells the model how to behave. It usually consists of content only. Not every model supports it.
 - FunctionMessage: represents the result of a function call. Besides role and content, this message has a name parameter that carries the name of the function that was called to produce the result.
 - ToolMessage: represents the result of a tool call. It is distinct from FunctionMessage in order to match OpenAI's function and tool message types. Besides role and content, this message has a tool_call_id parameter that carries the id of the tool call that produced the result.
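
To make the list above concrete, the sketch below models the five message kinds as plain dataclasses and converts a few of them to role-tagged dictionaries. The class and field names mirror the description above, but the classes themselves are illustrative stand-ins, not LangChain's own implementation. The role strings follow OpenAI's chat format.

```python
from dataclasses import dataclass, asdict


@dataclass
class HumanMessage:
    content: str
    role: str = "user"


@dataclass
class AIMessage:
    content: str
    role: str = "assistant"


@dataclass
class SystemMessage:
    content: str
    role: str = "system"


@dataclass
class FunctionMessage:
    content: str
    name: str  # name of the function that produced this result
    role: str = "function"


@dataclass
class ToolMessage:
    content: str
    tool_call_id: str  # id of the tool call that produced this result
    role: str = "tool"


# Independent examples, not a valid conversation; "call_1" is a made-up id.
examples = [
    SystemMessage("You are an assistant that tells jokes"),
    HumanMessage("Quanto é 1 + 1?"),
    ToolMessage("2", tool_call_id="call_1"),
]

payload = [asdict(m) for m in examples]
for item in payload:
    print(item)
```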

### Few-shot prompting

In [4]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI()

In [5]:
from langchain_core.messages import HumanMessage, AIMessage

messages = [
    HumanMessage(content='How much is 1 + 1?'),
    AIMessage(content='2'),
    HumanMessage(content='How much is 10 * 5?'),
    AIMessage(content='50'),
    HumanMessage(content='How much is 10 + 3?'),
]

chat.invoke(messages)

AIMessage(content='13', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 52, 'total_tokens': 54, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-02421162-b6ed-4c41-894a-50044c094279-0', usage_metadata={'input_tokens': 52, 'output_tokens': 2, 'total_tokens': 54, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
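
The few-shot pattern above is just a list of alternating question and answer messages followed by the new question. Below is a small sketch of a helper that builds such a list from example pairs, using plain `(role, content)` tuples as a hypothetical stand-in for `HumanMessage` and `AIMessage`.

```python
def build_few_shot(examples: list[tuple[str, str]], question: str) -> list[tuple[str, str]]:
    """Interleave (question, answer) examples, then append the new question."""
    messages: list[tuple[str, str]] = []
    for q, a in examples:
        messages.append(("human", q))
        messages.append(("ai", a))
    messages.append(("human", question))
    return messages


few_shot = build_few_shot(
    [("How much is 1 + 1?", "2"), ("How much is 10 * 5?", "50")],
    "How much is 10 + 3?",
)
for role, content in few_shot:
    print(f"{role}: {content}")
```

The examples set the expected answer format (a bare number), which is why the model's reply above is simply `13`.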

## Using other models

In [2]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint

In [4]:
model = 'mistralai/Mixtral-8x7B-Instruct-v0.1'

llm = HuggingFaceEndpoint(repo_id=model)
chat = ChatHuggingFace(llm=llm)

In [5]:
from langchain_core.messages import HumanMessage, AIMessage

messages = [
    HumanMessage(content='How much is 1 + 1?'),
    AIMessage(content='2'),
    HumanMessage(content='How much is 10 * 5?'),
    AIMessage(content='50'),
    HumanMessage(content='How much is 10 + 3?'),
]

chat.invoke(messages)

AIMessage(content=' 13. The sum of 10 and 3 is 13.', additional_kwargs={}, response_metadata={}, id='run-f3ccd200-a853-49a2-9b2f-ced044b8c48f-0')

The chat model structure uses the LLM structure as its backend.
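
The debug trace in this section shows the chat messages reaching the underlying model as a single prompt string with `System:` and `Human:` prefixes. Here is a minimal sketch of that flattening, where `flatten_messages` and the `(role, content)` tuples are hypothetical and the prefix format is inferred from that trace.

```python
def flatten_messages(messages: list[tuple[str, str]]) -> str:
    """Join (role, content) pairs into one newline-separated prompt string."""
    return "\n".join(f"{role}: {content}" for role, content in messages)


prompt = flatten_messages([
    ("System", "You are an assistant that tells jokes"),
    ("Human", "Quanto é 1 + 1?"),
])
print(prompt)
```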

In [13]:
from langchain.globals import set_debug

set_debug(True)  # print a verbose trace for every run
chat.invoke(messages)
set_debug(False)

[32;1m[1;3m[llm/start][0m [1m[llm:ChatOpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "System: You are an assistant that tells jokes\nHuman: Quanto é 1 + 1?"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[llm:ChatOpenAI] [0ms] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "Depende, em que base você está contando? Em base 2, 1 + 1 = 10!",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "Depende, em que base você está contando? Em base 2, 1 + 1 = 10!",
            "additional_kwargs": {
              "refusal": null
            },
            "response_metadata": {
              "token_usage": {
                "com

## Caching

### In-memory cache

In [9]:
from langchain_openai.chat_models import ChatOpenAI

chat = ChatOpenAI(model='gpt-3.5-turbo-0125')

In [10]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content='You are an assistant that tells jokes'),
    HumanMessage(content='Quanto é 1 + 1?')
]

In [11]:
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

set_llm_cache(InMemoryCache())

Running for the first time

In [15]:
%%time

chat.invoke(messages)

CPU times: total: 0 ns
Wall time: 0 ns


AIMessage(content='Depende, em que base você está contando? Em base 2, 1 + 1 = 10!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 27, 'total_tokens': 53, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-d057a4f7-4cb2-4041-8494-da3200a57a53-0', usage_metadata={'input_tokens': 27, 'output_tokens': 26, 'total_tokens': 53, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
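
Conceptually, an in-memory cache is a dictionary keyed by the request, so a repeated prompt is answered without calling the model again. The sketch below is a minimal version under that assumption. `CachedModel` and `fake_model` are hypothetical; LangChain's cache also keys on the model settings, which this toy version ignores.

```python
from typing import Callable


class CachedModel:
    """Wrap a model function with a dictionary-based cache."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self.fn = fn
        self.cache: dict[str, str] = {}
        self.calls = 0  # counts real (uncached) model calls

    def invoke(self, prompt: str) -> str:
        if prompt not in self.cache:
            self.calls += 1
            self.cache[prompt] = self.fn(prompt)
        return self.cache[prompt]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"reply to: {prompt}"


model = CachedModel(fake_model)
first = model.invoke("Quanto é 1 + 1?")
second = model.invoke("Quanto é 1 + 1?")  # served from the cache
print(first == second, model.calls)
```

Because the cache lives in process memory, it is lost when the kernel restarts.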

### SQLite cache

In [16]:
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache

set_llm_cache(SQLiteCache(database_path='files/langchain_cache_db.sqlite'))

In [17]:
%%time

chat.invoke(messages)

CPU times: total: 0 ns
Wall time: 685 ms


AIMessage(content='Depende, estás a falar de matemática ou de amor? Porque, se for matemática, é 2. Mas se for amor, é 11!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 39, 'prompt_tokens': 27, 'total_tokens': 66, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-faff50d0-3c1d-4b51-b231-d8f149fa5778-0', usage_metadata={'input_tokens': 27, 'output_tokens': 39, 'total_tokens': 66, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

Running again

In [18]:
%%time

chat.invoke(messages)

CPU times: total: 93.8 ms
Wall time: 87.4 ms


AIMessage(content='Depende, estás a falar de matemática ou de amor? Porque, se for matemática, é 2. Mas se for amor, é 11!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 39, 'prompt_tokens': 27, 'total_tokens': 66, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-faff50d0-3c1d-4b51-b231-d8f149fa5778-0', usage_metadata={'input_tokens': 27, 'output_tokens': 39, 'total_tokens': 66, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
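
A SQLite-backed cache follows the same idea but stores entries in a database, so with a file path they can survive a restart. The sketch below uses the standard-library `sqlite3` module with an in-memory database for brevity. The table layout and the `lookup` and `update` helpers are assumptions for illustration, not LangChain's actual schema.

```python
import sqlite3
from typing import Optional

# ":memory:" keeps the example self-contained; a file path would persist the cache.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS llm_cache (prompt TEXT PRIMARY KEY, response TEXT)"
)


def lookup(prompt: str) -> Optional[str]:
    """Return the cached response for a prompt, or None on a cache miss."""
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ?", (prompt,)
    ).fetchone()
    return row[0] if row else None


def update(prompt: str, response: str) -> None:
    """Store (or overwrite) the response for a prompt."""
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache (prompt, response) VALUES (?, ?)",
        (prompt, response),
    )
    conn.commit()


miss = lookup("Quanto é 1 + 1?")  # nothing stored yet
update("Quanto é 1 + 1?", "2")
hit = lookup("Quanto é 1 + 1?")  # now served from the database
print(miss, hit)
```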