# How to Use LangChain to Build With LLMs - A Beginner's Guide

This notebook walks through the fundamentals of building with LLMs using LangChain's Python library. The only requirement is basic familiarity with Python; no machine learning experience is needed!

It accompanies [this freeCodeCamp article](https://www.freecodecamp.org/news/beginners-guide-to-langchain), which has additional explanations of steps and concepts.

This notebook uses the open `HuggingFaceH4/zephyr-7b-beta` chat model through Hugging Face, but LangChain also has a [wide range of other integrations](https://python.langchain.com/docs/integrations/chat/) to choose from, including Anthropic's Claude 3 models and OpenAI models like GPT-4.

Let's first install the required dependencies:

In [1]:
#%pip install -qU langchain_core

In [None]:
key = "key"

You can initialize a model like this:

In [None]:
import getpass
import os

if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")

Enter your token: ··········


In [None]:
%pip install --upgrade --quiet  langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate



In [None]:
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)

chat_model = ChatHuggingFace(llm=llm)


Then, we can invoke it like this:

In [None]:
chat_model.invoke("Tell me a joke about bears!")

AIMessage(content='Why did the bear break up with his girlfriend?\n\nBecause she kept being a pusher bear!\n\n(The punchline plays on the words "pusher" and "bear" both having similar beginning sounds, with "pusher bear" being a playful and made-up word for a bear that tries to influence or persuade an addiction in their partner.)', additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=77, prompt_tokens=30, total_tokens=107), 'model': '', 'finish_reason': 'eos_token'}, id='run-cd021e00-4619-477b-a6c4-25590260df9d-0')

You can see that the output is something called an AIMessage. This is because Chat Models use Chat Messages as input and output.

To illustrate what's going on, you can call the above with a more explicit list of messages:

In [None]:
from langchain_core.messages import HumanMessage

chat_model.invoke([
    HumanMessage("Tell me a joke about bears!")
])

AIMessage(content='Why did the bear break up with his girlfriend?\n\nBecause she kept being a pusher bear!\n\n(The punchline plays on the words "pusher" and "bear" both having similar beginning sounds, with "pusher bear" being a playful and made-up word for a bear that tries to influence or persuade an addiction in their partner.)', additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=77, prompt_tokens=30, total_tokens=107), 'model': '', 'finish_reason': 'eos_token'}, id='run-de45205e-793c-4b39-84d1-b11d87a60e41-0')

## Prompt Templates
Models are useful on their own, but it's often convenient to parameterize inputs so that you don't repeat boilerplate. LangChain provides Prompt Templates for this purpose.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

joke_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class comedian."),
    ("human", "Tell me a joke about {topic}")
])

In [None]:
joke_prompt.invoke({"topic": "beets"})

ChatPromptValue(messages=[SystemMessage(content='You are a world class comedian.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Tell me a joke about beets', additional_kwargs={}, response_metadata={})])
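To make the substitution step concrete, here is a minimal sketch in plain Python (no LangChain import) of what filling in a template amounts to. The `fill_messages` helper is hypothetical and only mimics the idea. The real `ChatPromptTemplate` returns message objects wrapped in a `ChatPromptValue`, as the output above shows.

```python
# Toy illustration only: this is NOT how ChatPromptTemplate is implemented.
# `fill_messages` is a hypothetical helper name.

def fill_messages(template, variables):
    """Return (role, text) pairs with {placeholders} replaced by variables."""
    return [(role, text.format(**variables)) for role, text in template]


joke_template = [
    ("system", "You are a world class comedian."),
    ("human", "Tell me a joke about {topic}"),
]

messages = fill_messages(joke_template, {"topic": "beets"})
for role, text in messages:
    print(f"{role}: {text}")
```

The same template can be reused with any topic, which is the boilerplate-saving point of Prompt Templates.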

## Chaining

You may have noticed that both the Prompt Template and Chat Model implement the `.invoke()` method. In LangChain terms, they are both instances of [Runnables](https://python.langchain.com/docs/expression_language/interface/).

You can compose Runnables into “chains” using the pipe (`|`) operator, which calls `.invoke()` on each step with the output of the previous one. Here's an example:

In [None]:
chain = joke_prompt | chat_model

In [None]:
chain.invoke({"topic": "beets"})

AIMessage(content='Why did the beetroot blush and run away?\n\nBecause the carrot called it crunchy!\n\n(I apologize for the poor joke, but beets might not be our funniest vegetable material.)', additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=50, prompt_tokens=51, total_tokens=101), 'model': '', 'finish_reason': 'eos_token'}, id='run-f1b9910d-3473-4656-b4bd-7c076ffa7c06-0')

The resulting chain is itself a Runnable and automatically implements `.invoke()` (as well as several other methods, as we'll see later). This is the foundation of [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language/get_started/).

Now, let's say you want to work with just the raw string output of the message. LangChain has a component called an Output Parser, which, as the name implies, is responsible for parsing the output of a model into a more accessible format. Since composed chains are also Runnable, you can again use the pipe operator:

In [None]:
from langchain_core.output_parsers import StrOutputParser

str_chain = chain | StrOutputParser()

# Equivalent to:
# str_chain = joke_prompt | chat_model | StrOutputParser()

In [None]:
str_chain.invoke({"topic": "beets"})

'Why did the beetroot blush and run away?\n\nBecause the carrot called it crunchy!\n\n(I apologize for the poor joke, but beets might not be our funniest vegetable material.)'
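To see why piping works, here is a toy version of the idea in plain Python. The `Step` class is hypothetical and far simpler than LangChain's Runnable interface. It only shows how overloading `|` can feed one step's output into the next, with a canned function standing in for the model.

```python
# Toy illustration only: a stripped-down stand-in for the Runnable idea.
# `Step` is a hypothetical class, not part of LangChain.

class Step:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


fill_prompt = Step(lambda inputs: f"Tell me a joke about {inputs['topic']}")
fake_model = Step(lambda prompt: {"content": f"(model reply to: {prompt})"})
get_text = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_model | get_text
result = toy_chain.invoke({"topic": "beets"})
print(result)
```

Because the composed object is itself a `Step`, it can be piped again, which mirrors how a chain can be extended with an output parser.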

## Streaming

One of the biggest advantages to composing chains with LCEL is the streaming experience.

All Runnables implement the `.stream()` method (and `.astream()` if you're working in async environments), including chains. This method returns a generator that will yield output as soon as it's available, which allows us to get output as quickly as possible.

While every Runnable implements `.stream()`, not all of them support multiple chunks. For example, if you call `.stream()` on a Prompt Template, it will just yield a single chunk with the same output as `.invoke()`.

In [None]:
for chunk in str_chain.stream({"topic": "beets"}):
    print(chunk, end="|")

Why did the beetroot blush and run away?

Because the carrot called it crunchy!

(I apologize for the poor joke, but beets might not be our funniest vegetable material.)|

Chains composed like `str_chain` will start streaming as early as possible, which in this case is at the Chat Model step.

Some Output Parsers (like the `StrOutputParser` used here) and many LCEL primitives can process streamed chunks from previous steps as they are generated. They essentially act as transform streams or passthroughs, so they do not disrupt streaming.
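The pass-through behavior can be sketched with plain Python generators. Everything here is a stand-in: `fake_model_stream` yields a canned reply word by word, and `str_parser_stream` is a hypothetical parser that transforms each chunk as it arrives instead of waiting for the full response.

```python
# Toy illustration only: plain generators standing in for a streaming
# model and a pass-through parser. None of these names come from LangChain.

def fake_model_stream(prompt):
    """Yield a canned reply one word at a time, like chunks from a model."""
    for word in ["Why", "did", "the", "beet", "blush?"]:
        yield {"content": word}


def str_parser_stream(chunks):
    """Transform each chunk as it arrives rather than collecting them all."""
    for chunk in chunks:
        yield chunk["content"]


pieces = []
for text in str_parser_stream(fake_model_stream("Tell me a joke about beets")):
    pieces.append(text)
    print(text, end="|")
print()
```

A step that needed the whole response before producing anything would have to exhaust the incoming generator first, which is what interrupts streaming.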

## How to Guide Generation with Context

LLMs are trained on large quantities of data and have some innate “knowledge” of various topics. Still, it's common to pass the model private or more specific data as context so that it can draw on that information when answering. If you've heard the term "RAG", or "retrieval-augmented generation", this is the core principle behind it.
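As a rough sketch of that principle in plain Python, the snippet below picks the relevant lines out of a source text and places them in a prompt. It reuses the revenue figures that appear later in this section. The `retrieve_lines` and `build_prompt` helpers are hypothetical, and simple keyword matching stands in for a real retriever.

```python
# Toy illustration only: a keyword match standing in for a real retriever.
# `retrieve_lines` and `build_prompt` are hypothetical helper names.

SOURCE = """
Old Ship Saloon 2023 quarterly revenue numbers:
Q1: $174782.38
Q2: $467372.38
Q3: $474773.38
Q4: $389289.23
"""


def _words(text):
    """Lowercase the words in text and trim surrounding punctuation."""
    return {word.strip("?:,.'").lower() for word in text.split()}


def retrieve_lines(question, source):
    """Keep only the source lines that share a word with the question."""
    question_words = _words(question)
    return [
        line for line in source.strip().splitlines()
        if question_words & _words(line)
    ]


def build_prompt(question, context_lines):
    """Place the retrieved lines ahead of the question."""
    context = "\n".join(context_lines)
    return f"Use the following context when responding:\n\n{context}\n\nQuestion: {question}"


question = "What was the Old Ship Saloon's total revenue in Q1 2023?"
relevant = retrieve_lines(question, SOURCE)
prompt_text = build_prompt(question, relevant)
print(prompt_text)
```

Only the heading and the Q1 line reach the prompt, so the model sees just the data that matters for this question.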

In [None]:
chat_model.invoke("What is the current date?")

AIMessage(content="I do not have real-time access to the date and time. Please check your device's clock or any other reliable source to find the current date. Here is the current date in utc as of july 6, 2021, 2:00 pm: wednesday, july 7, 2021.\n\nto find out the current date and time in your local time zone, please use online tools or consult your device's", additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=29, total_tokens=129), 'model': '', 'finish_reason': 'length'}, id='run-dcaf74f1-2ebc-4947-a6e9-fb573c304a01-0')

Now, let's pass it the correct date and see what happens:

In [None]:
from datetime import date

prompt = ChatPromptTemplate.from_messages([
    ("system", 'You know that the current date is "{current_date}".'),
    ("human", "{question}")
])

chain = prompt | chat_model | StrOutputParser()

chain.invoke({
    "question": "What is the current date?",
    "current_date": date.today()
})

"I don't have real-time access to the current date or time. However, when you ask me this question, I assume that you want to know the current date as of now. I suggest you check the date on your device's clock or any other reliable source to confirm the current date."

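In this run, the model still declined to state the date even though it appeared in the system prompt, a reminder that models don't always use the context they are given. Here's a more concrete example with a very specific question about a local restaurant: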

In [None]:
chat_model.invoke(
    "What was the Old Ship Saloon's total revenue in Q1 2023?"
)

AIMessage(content="I do not have access to real-time financial information or the financial statements of specific companies. Therefore, I am unable to provide you with the exact total revenue of the old ship saloon in q1 2023. You may refer to the company's financial reports or contact their accounting department for this information.", additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=67, prompt_tokens=43, total_tokens=110), 'model': '', 'finish_reason': 'eos_token'}, id='run-7e41be77-ee25-43b1-8f69-e371bfafa542-0')

However, if we can give the model more context, we can guide it to come up with a good answer:

In [None]:
SOURCE = """
Old Ship Saloon 2023 quarterly revenue numbers:
Q1: $174782.38
Q2: $467372.38
Q3: $474773.38
Q4: $389289.23
"""

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", 'You are a helpful assistant. Use the following context when responding:\n\n{context}.'),
    ("human", "{question}")
])

rag_chain = rag_prompt | chat_model | StrOutputParser()

rag_chain.invoke({
    "question": "What was the Old Ship Saloon's total revenue in Q1 2023?",
    "context": SOURCE
})


"Based on the given context, the Old Ship Saloon's total revenue in Q1 2023 was $174,782.38."

## Debugging

Because LLMs are non-deterministic, it becomes more and more important to see the internals of what's going on as your chains get more complex.

LangChain has a `set_debug()` function that turns on more granular logging of the chain internals:

In [None]:
%pip install -qU langchain

In [None]:
from langchain.globals import set_debug

set_debug(True)

from datetime import date

prompt = ChatPromptTemplate.from_messages([
    ("system", 'You know that the current date is "{current_date}".'),
    ("human", "{question}")
])

chain = prompt | chat_model | StrOutputParser()

chain.invoke({
    "question": "What is the current date?",
    "current_date": date.today()
})

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
[inputs]
[chain/start] [chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
[inputs]
[chain/end] [chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [chain:RunnableSequence > llm:ChatHuggingFace] Entering LLM run with input:
{
  "prompts": [
    "System: You know that the current date is \"2024-09-22\".\nHuman: What is the current date?"
  ]
}
[llm/end] [chain:RunnableSequence > llm:ChatHuggingFace] [39ms] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "I don't have real-time access to the current date or time. However, when you ask me this question, I assume that you want to know the current date as of now. I suggest you check the date on your device's clock or any othe

"I don't have real-time access to the current date or time. However, when you ask me this question, I assume that you want to know the current date as of now. I suggest you check the date on your device's clock or any other reliable source to confirm the current date."

You can see [this guide](https://python.langchain.com/docs/guides/development/debugging/) for more information on debugging.
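The start/end pattern in the debug log can be imitated by hand, which may help when reading it. The sketch below is plain Python and does not use `set_debug()`. The `traced` wrapper is hypothetical, and the two lambdas are canned stand-ins for the prompt and model steps.

```python
import time

# Toy illustration only: hand-rolled step logging, not LangChain's set_debug().
# `traced` is a hypothetical helper name.

def traced(name, func, log):
    """Wrap func so each call records start and end entries in `log`."""
    def wrapper(value):
        log.append(f"[{name}/start] input: {value!r}")
        started = time.perf_counter()
        result = func(value)
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.append(f"[{name}/end] [{elapsed_ms:.0f}ms] output: {result!r}")
        return result
    return wrapper


log = []
fill_prompt = traced(
    "prompt",
    lambda d: f"The date is {d['current_date']}. {d['question']}",
    log,
)
fake_model = traced("llm", lambda prompt: "canned reply", log)

answer = fake_model(fill_prompt({
    "current_date": "2024-09-22",
    "question": "What is the current date?",
}))
print("\n".join(log))
```

Each step contributes one start entry and one end entry, in the order the steps run.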

You can also use the `astream_events()` method to return this data. This is useful if you want to use intermediate steps in your application logic. Note that this is an async method, and requires an extra version flag since it's still in beta:

In [None]:
set_debug(False)

stream = chain.astream_events({
    "question": "What is the current date?",
    "current_date": date.today()
}, version="v1")

async for event in stream:
    print(event)

{'event': 'on_chain_start', 'run_id': 'f9142e0c-2707-43a0-8641-0c2abe80950f', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 9, 22)}}, 'parent_ids': []}
{'event': 'on_prompt_start', 'name': 'ChatPromptTemplate', 'run_id': 'a7017f12-7dfe-49fa-b77f-0cb24d84f05e', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 9, 22)}}, 'parent_ids': []}
{'event': 'on_prompt_end', 'name': 'ChatPromptTemplate', 'run_id': 'a7017f12-7dfe-49fa-b77f-0cb24d84f05e', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 9, 22)}, 'output': ChatPromptValue(messages=[SystemMessage(content='You know that the current date is "2024-09-22".', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the current date?', additional_kw
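If you only need certain intermediate steps, you can filter on each event's `event` field. The sketch below uses a hand-written async generator as a stand-in for `chain.astream_events()`, with event names copied from the output above. `fake_event_stream` and `collect` are hypothetical names, and the event payloads are left empty.

```python
import asyncio

# Toy illustration only: a hand-written async generator standing in for
# chain.astream_events(). Event names are copied from the output above.

async def fake_event_stream():
    for name in ["on_chain_start", "on_prompt_start", "on_prompt_end"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"event": name, "data": {}}


async def collect(event_type):
    """Gather only the events whose 'event' field matches event_type."""
    matches = []
    async for event in fake_event_stream():
        if event["event"] == event_type:
            matches.append(event)
    return matches


# In a plain script, start the event loop explicitly.
prompt_end_events = asyncio.run(collect("on_prompt_end"))
print(prompt_end_events)
```

Inside a notebook an event loop is already running, so `asyncio.run()` raises a `RuntimeError` there; use `await collect("on_prompt_end")` directly instead, just as the cell above uses a top-level `async for`.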

Finally, you can use an external service like [LangSmith](https://smith.langchain.com) to add tracing. Here's an example:

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "LANGCHAIN_API_KEY"  # placeholder: replace with your LangSmith API key

chain.invoke({
    "question": "What is the current date?",
    "current_date": date.today()
})

"I don't have real-time access to the current date or time. However, when you ask me this question, I assume that you want to know the current date as of now. I suggest you check the date on your device's clock or any other reliable source to confirm the current date."

LangSmith will capture the internals at each step, giving you a result [like this](https://smith.langchain.com/public/628a15bb-45c8-4d39-987a-2896684a66c2/r).

Due to the non-deterministic nature of LLMs, you can also tweak prompts and rerun model calls in a playground, as well as create datasets and test cases to evaluate changes to your app and catch regressions.
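The idea behind test cases can be sketched without any external service. The snippet below is plain Python, not the LangSmith API. `fake_chain` is a hypothetical stand-in that returns a canned answer, and `run_cases` checks that each answer contains an expected substring.

```python
# Toy illustration only: a hand-rolled check, not the LangSmith API.
# `run_cases` and `fake_chain` are hypothetical names.

def fake_chain(inputs):
    """Stand-in for chain.invoke(); returns a canned answer."""
    return "The Old Ship Saloon's total revenue in Q1 2023 was $174,782.38."


test_cases = [
    {
        "question": "What was the Old Ship Saloon's total revenue in Q1 2023?",
        "must_contain": "174,782.38",
    },
]


def run_cases(chain, cases):
    """Return the questions whose answers lack the expected substring."""
    failures = []
    for case in cases:
        answer = chain({"question": case["question"]})
        if case["must_contain"] not in answer:
            failures.append(case["question"])
    return failures


failures = run_cases(fake_chain, test_cases)
print("failures:", failures)
```

Rerunning the same cases after a prompt change gives a quick signal on whether previously correct answers have regressed.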