# RAG From Scratch

## Resources
- [YouTube Course](https://www.youtube.com/watch?v=sVcwVQRHIc8)
- [GitHub](https://github.com/langchain-ai/rag-from-scratch)
- [LangChain - freeCodeCamp](https://www.freecodecamp.org/news/beginners-guide-to-langchain/)


## How to Use LangChain to Build With LLMs – A Beginner's Guide

- [LangChain - Python Library](https://python.langchain.com/v0.2/docs/introduction/)

### Project Setup

In [None]:
%%capture
!pip install langchain_core langchain_anthropic

#### Export ANTHROPIC_API_KEY

In [None]:
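# `!export` runs in a throwaway subshell, so the variable never reaches the notebook kernel.
# `%env` sets it for the kernel process instead.
%env ANTHROPIC_API_KEY=sk-ant-api03-..............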

In [None]:
import os
from google.colab import userdata
api_key = userdata.get("ANTHROPIC_API_KEY")

# print(api_key)
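
Outside Colab, `userdata` is not available, so the key usually comes from an environment variable instead. The sketch below shows one way to read it; the `load_api_key` helper is hypothetical and not part of LangChain or the Anthropic SDK.

```python
import os

def load_api_key(env=None, name="ANTHROPIC_API_KEY"):
    """Return the API key from an environment mapping, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook.")
    return key

# Demonstration with a fake mapping so no real credential is needed.
demo_key = load_api_key({"ANTHROPIC_API_KEY": "placeholder-not-a-real-key"})
```

Calling `load_api_key()` with no arguments reads from `os.environ`, and the result can be passed as `api_key=` when constructing the chat model.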


In [None]:
from langchain_anthropic import ChatAnthropic

In [None]:
chat_model = ChatAnthropic(
    model="claude-3-sonnet-20240229",
    temperature=0,
    api_key=api_key
)

The `model` parameter is a string that matches one of [Anthropic’s supported models](https://docs.anthropic.com/claude/docs/models-overview#model-comparison). At the time of writing, Claude 3 Sonnet strikes a good balance between speed, cost, and reasoning capability.

`temperature` is a measure of the amount of randomness the model uses to generate responses. For consistency, in this tutorial, we set it to `0` but you can experiment with higher values for creative use cases.
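
To make the effect of `temperature` concrete, here is a minimal sketch of temperature-scaled sampling probabilities over a toy set of token scores. The scores and the `softmax_with_temperature` helper are illustrative assumptions. The provider's actual decoding may differ, and `temperature=0` is treated here as the greedy limit (always pick the highest-scoring token).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Greedy limit: all probability mass goes to the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.5]

greedy = softmax_with_temperature(toy_logits, 0)
default = softmax_with_temperature(toy_logits, 1.0)
creative = softmax_with_temperature(toy_logits, 2.0)
```

Higher temperatures flatten the distribution, so less likely tokens are sampled more often. Lower temperatures concentrate probability on the top candidate.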

Now, let’s try running it:

In [None]:
chat_model.invoke("Tell me a joke about bears!")

AIMessage(content="Here's a bear joke for you:\n\nWhy did the bear dissolve in water?\nBecause it was a polar bear!", response_metadata={'id': 'msg_01LZCLdUs6i6v2PBZHsffYAW', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 14, 'output_tokens': 30}}, id='run-1db9df40-af4a-45dc-ad79-4c9197d1a9c1-0', usage_metadata={'input_tokens': 14, 'output_tokens': 30, 'total_tokens': 44})

In [None]:
from langchain_core.messages import HumanMessage

In [None]:
chat_model.invoke([
    HumanMessage("Tell me a joke about bears!")
])

AIMessage(content="Here's a bear joke for you:\n\nWhy did the bear dissolve in water?\nBecause it was a polar bear!", response_metadata={'id': 'msg_01Pbwx3fQ8oFse5PHxjbSirq', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 14, 'output_tokens': 30}}, id='run-2a62538b-c9a6-4d83-974d-4e6bc9599179-0', usage_metadata={'input_tokens': 14, 'output_tokens': 30, 'total_tokens': 44})

### Prompt Templates

Models are useful on their own, but it’s often convenient to parameterize inputs so that you don’t repeat boilerplate. LangChain provides [Prompt Templates](https://python.langchain.com/docs/modules/model_io/prompts/) for this purpose.

![prompts](https://www.freecodecamp.org/news/content/images/2024/04/prompt_and_model--1-.png)

Prompt templates in LangChain

A simple example would be something like this:

In [None]:
from langchain_core.prompts import ChatPromptTemplate

In [None]:
joke_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class comedian."),
    ("human", "Tell me a joke about {topic}")
])

You can apply the template using the same `.invoke()` method as with Chat Models:

In [None]:
joke_prompt.invoke({"topic": "beets"})

ChatPromptValue(messages=[SystemMessage(content='You are a world class comedian.'), HumanMessage(content='Tell me a joke about beets')])

Let’s go over each step:

- You construct a prompt template consisting of templates for a `SystemMessage` and a `HumanMessage` using `from_messages`.
- You can think of `SystemMessages` as meta-instructions that are not part of the current conversation and only guide how the model responds.
- The prompt template contains `{topic}` in curly braces. This denotes a required parameter named `"topic"`.
- You invoke the prompt template with a dict with a key named `"topic"` and a value `"beets"`.
- The result contains the formatted messages.
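
Conceptually, the templating step in the list above is string substitution applied to each message. The sketch below imitates it with plain Python `str.format`. The `format_messages` helper is hypothetical and is not LangChain's implementation, which also builds message objects.

```python
def format_messages(template_messages, variables):
    """Fill {placeholders} in each (role, template) pair using str.format."""
    return [(role, template.format(**variables)) for role, template in template_messages]

joke_template = [
    ("system", "You are a world class comedian."),
    ("human", "Tell me a joke about {topic}"),
]

formatted = format_messages(joke_template, {"topic": "beets"})
print(formatted)

# A missing required parameter surfaces as a KeyError in this sketch.
try:
    format_messages(joke_template, {})
except KeyError as missing:
    print("missing variable:", missing)
```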

Next, we'll learn how to use this prompt template with your Chat Model.


### Chaining
You may have noticed that both the Prompt Template and Chat Model implement the `.invoke()` method. In LangChain terms, they are both instances of [Runnables](https://python.langchain.com/docs/expression_language/interface/).

You can compose Runnables into “chains” using the pipe (`|`) operator where you `.invoke()` the next step with the output of the previous one. Here’s an example:

In [None]:
chain = joke_prompt | chat_model

In [None]:
chain

ChatPromptTemplate(input_variables=['topic'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a world class comedian.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic'], template='Tell me a joke about {topic}'))])
| ChatAnthropic(model='claude-3-sonnet-20240229', temperature=0.0, anthropic_api_url='https://api.anthropic.com', anthropic_api_key=SecretStr('**********'), _client=<anthropic.Anthropic object at 0x7a0958f26620>, _async_client=<anthropic.AsyncAnthropic object at 0x7a0958f26dd0>)

The resulting `chain` is itself a Runnable and automatically implements `.invoke()` (as well as several other methods, as we’ll see later). This is the foundation of [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language/get_started/).
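
To see why piping two Runnables yields another Runnable, consider this toy sketch. The `Step` class is a hypothetical stand-in that only mimics the idea through Python's `__or__` operator overload. LangChain's real classes do considerably more.

```python
class Step:
    """A toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Composing two steps yields another Step, so chains can keep growing.
        return Step(lambda value: other.invoke(self.invoke(value)))

# Hypothetical stand-ins for a prompt template, a chat model, and an output parser.
toy_prompt = Step(lambda inputs: f"Tell me a joke about {inputs['topic']}")
toy_model = Step(lambda prompt_text: {"content": f"(model reply to: {prompt_text})"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({"topic": "beets"})
print(result)
```

Because `|` returns a `Step`, the composed object exposes the same `.invoke()` interface as its parts. The input dict goes to the first step in the sequence.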

Let’s invoke this new chain:

In [None]:
chain.invoke({"topic": "beets"})

AIMessage(content="Here's a beet joke for you:\n\nWhy did the beet blush? Because it saw the salad dressing!", response_metadata={'id': 'msg_01PJML6wq7vAes7sWoeUF5zn', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 21, 'output_tokens': 30}}, id='run-70636f8f-8d4b-42e7-8ab2-543badca3cfc-0', usage_metadata={'input_tokens': 21, 'output_tokens': 30, 'total_tokens': 51})

Now, let’s say you want to work with just the raw string output of the message. LangChain has a component called an [Output Parser](https://python.langchain.com/docs/modules/model_io/output_parsers/), which, as the name implies, is responsible for parsing the output of a model into a more accessible format. Since composed chains are also Runnable, you can again use the pipe operator:

In [None]:
from langchain_core.output_parsers import StrOutputParser

In [None]:
str_chain = chain | StrOutputParser()

# Equivalent to:
# str_chain = joke_prompt | chat_model | StrOutputParser()

In [None]:
joke = str_chain.invoke({"topic": "beets"})
joke

"Here's a beet joke for you:\n\nWhy did the beet blush? Because it saw the salad dressing!"

In [None]:
print(joke)

Here's a beet joke for you:

Why did the beet blush? Because it saw the salad dressing!


You still pass `{"topic": "beets"}` as input to the new `str_chain` because the first Runnable in the sequence is still the Prompt Template you declared before.

**Prompt, model, and output parser**

![prompt chain](https://www.freecodecamp.org/news/content/images/2024/04/prompt_model_and_output_parser--1-.png)

### Streaming
One of the biggest advantages of composing chains with LCEL is the streaming experience.

All Runnables implement the `.stream()` method (and `.astream()` if you’re working in async environments), including chains. This method returns a generator that will yield output as soon as it’s available, which allows us to get output as quickly as possible.

While every Runnable implements `.stream()`, not all of them support multiple chunks. For example, if you call `.stream()` on a Prompt Template, it will just yield a single chunk with the same output as `.invoke()`.

You can iterate over the output using `for ... in` syntax. Try it with the `str_chain` you just declared:

In [None]:
for chunk in str_chain.stream({"topic": "beets"}):
  print(chunk, end="|")

|Here|'s a b|eet joke for| you:

Why| did the beet| bl|ush? Because it| saw| the sal|ad dressing!||

Chains composed like `str_chain` will start streaming as early as possible, which in this case is the Chat Model in the chain.

Some Output Parsers (like the `StrOutputParser` used here) and many LCEL [Primitives](https://python.langchain.com/docs/expression_language/primitives/) are able to process streamed chunks from previous steps as they are generated – essentially acting as transform streams or passthroughs – and do not disrupt streaming.
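
The difference between a step that passes chunks through and one that disrupts streaming can be sketched with plain Python generators. All names below are hypothetical stand-ins, not LangChain components.

```python
def toy_model_stream(prompt_text):
    """Hypothetical model: yields its reply a few characters at a time."""
    reply = f"A joke about {prompt_text}!"
    for start in range(0, len(reply), 6):
        yield {"content": reply[start:start + 6]}

def toy_str_parser(message_chunks):
    """Transform stream: converts each chunk as it arrives instead of waiting."""
    for chunk in message_chunks:
        yield chunk["content"]

def toy_buffering_step(text_chunks):
    """A step that needs the full input, so it emits a single final chunk."""
    yield "".join(text_chunks).upper()

streamed = list(toy_str_parser(toy_model_stream("beets")))
buffered = list(toy_buffering_step(toy_str_parser(toy_model_stream("beets"))))
print(streamed)
print(buffered)
```

The parser yields one output per incoming chunk, so a consumer sees text immediately. The buffering step has to consume everything first and therefore produces only one chunk at the end.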

### How to Guide Generation with Context
LLMs are trained on large quantities of data and have some innate “knowledge” of various topics. Still, it’s common to pass the model private or more specific data as context when answering to glean useful information or insights. If you've heard the term "RAG", or "retrieval-augmented generation" before, this is the core principle behind it.

One of the simplest examples of this is telling the LLM what the current date is. Because an LLM is a snapshot of the data it was trained on, it can’t natively determine the current time. Here’s an example:

In [None]:
# chat_model = ChatAnthropic(model_name="claude-3-sonnet-20240229")

chat = chat_model.invoke("What is the current date?")
chat

AIMessage(content="Unfortunately, I don't actually have a concept of the current date and time. As an AI assistant without an integrated calendar, I don't have a way to track the specific date. I can only provide responses based on the conversational context provided to me.", response_metadata={'id': 'msg_019MwjWTjy99fSN3WUCoYEkX', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 13, 'output_tokens': 55}}, id='run-afb5a49d-9b31-4780-8c09-e1d7b51265b8-0', usage_metadata={'input_tokens': 13, 'output_tokens': 55, 'total_tokens': 68})

In [None]:
chat.content

"Unfortunately, I don't actually have a concept of the current date and time. As an AI assistant without an integrated calendar, I don't have a way to track the specific date. I can only provide responses based on the conversational context provided to me."

Now, let’s see what happens when you give the model the current date as context:

In [None]:
from datetime import date

In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You know that the current date is '{current_date}'."),
    ("human", "{question}")
])

chain = prompt | chat_model | StrOutputParser()

chain.invoke({
    "question": "What is the current date?",
    "current_date": date.today()
})

'The current date is 2024-08-16.'

Nice! Now, let's take it a step further. Language models are trained on vast quantities of data, but they don't know everything. Here's what happens if you directly ask the Chat Model a very specific question about a local restaurant:

In [None]:
chat_model.invoke(
    "What was the Old Ship Saloon's total revenue in Q1 2023?"
)

AIMessage(content="I'm sorry, I don't have access to specific financial data for a particular business like the Old Ship Saloon. As an AI assistant without direct connections to private company records, I don't have information about their revenues or other confidential financial details.", response_metadata={'id': 'msg_015qFped3YvjAeSxv3dt1pUx', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 25, 'output_tokens': 55}}, id='run-bb3a8b2f-4fd3-4fcf-bdf9-54149c96fd08-0', usage_metadata={'input_tokens': 25, 'output_tokens': 55, 'total_tokens': 80})

The model doesn't know the answer natively, or even know which of the many Old Ship Saloons in the world we may be talking about.

However, if we can give the model more context, we can guide it to come up with a good answer:

In [None]:
SOURCE = """
Old Ship Saloon 2023 quarterly revenue numbers:
Q1: $174782.38
Q2: $467372.38
Q3: $474773.38
Q4: $389289.23
"""

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", 'You are a helpful assistant. Use the following context when responding:\n\n{context}.'),
    ("human", "{question}")
])

rag_chain = rag_prompt | chat_model | StrOutputParser()

rag_chain.invoke({
    "question": "What was the Old Ship Saloon's total revenue in Q1 2023?",
    "context": SOURCE
})

"According to the provided context, the Old Ship Saloon's revenue in Q1 2023 was $174,782.38."

The result looks good! Note that augmenting generation with additional context is a very deep topic. In the real world, the context would likely be a longer financial document, or a portion of one, retrieved from some other data source. RAG is a powerful technique to answer questions over large quantities of information.
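
In a fuller pipeline, a retrieval step chooses which context to pass in. The sketch below uses naive word overlap as an assumed scoring method purely for illustration. Real systems typically use embeddings and a vector store, and the helper names here are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents):
    """Return the document sharing the most words with the question."""
    question_words = tokenize(question)
    return max(documents, key=lambda doc: len(question_words & tokenize(doc)))

def build_messages(question, context):
    """Stuff the retrieved context into the system message, as rag_prompt does."""
    system = (
        "You are a helpful assistant. "
        f"Use the following context when responding:\n\n{context}"
    )
    return [("system", system), ("human", question)]

documents = [
    # Figures copied from the SOURCE string used earlier in this tutorial.
    "Old Ship Saloon 2023 quarterly revenue numbers: Q1: $174782.38 Q2: $467372.38",
    # Hypothetical unrelated snippet, included only to give retrieval a choice.
    "Placeholder note about the opening hours of an unrelated venue.",
]

question = "What was the Old Ship Saloon's total revenue in Q1 2023?"
best_context = retrieve(question, documents)
messages = build_messages(question, best_context)
```

The resulting `messages` list has the same shape as the formatted output of `rag_prompt`, with the selected document in place of `{context}`.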

You can check out [LangChain’s retrieval-augmented generation (RAG) docs](https://python.langchain.com/docs/use_cases/question_answering/) to learn more.

### Debugging
Because LLMs are non-deterministic, it becomes more and more important to see the internals of what’s going on as your chains get more complex.

LangChain has a `set_debug()` function that prints more granular logs of the chain internals. Let’s see it with the above example.

First, we'll need to install the main `langchain` package for the entrypoint to import the method:

In [None]:
%%capture
!pip install langchain

In [None]:
from langchain.globals import set_debug

set_debug(True)

from datetime import date

prompt = ChatPromptTemplate.from_messages([
    ("system", 'You know that the current date is "{current_date}".'),
    ("human", "{question}")
])

chain = prompt | chat_model | StrOutputParser()

chain.invoke({
    "question": "What is the current date?",
    "current_date": date.today()
})

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
[0m[inputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] [2ms] Exiting Prompt run with output:
[0m[outputs]
[32;1m[1;3m[llm/start][0m [1m[chain:RunnableSequence > llm:ChatAnthropic] Entering LLM run with input:
[0m{
  "prompts": [
    "System: You know that the current date is \"2024-08-16\".\nHuman: What is the current date?"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[chain:RunnableSequence > llm:ChatAnthropic] [698ms] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "The current date is 2024-08-16.",
        "generation_info": null,
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
       

'The current date is 2024-08-16.'

You can see this [guide](https://python.langchain.com/docs/guides/development/debugging/) for more information on debugging.
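
For intuition about what step-level logging captures, here is a sketch of a wrapper that records each step's input, output, and elapsed time. The `traced` helper and its log format are hypothetical and unrelated to LangChain's own logger.

```python
import time

def traced(name, func, log):
    """Wrap func so that each call appends start/end records to log."""
    def wrapper(value):
        log.append((f"{name}/start", value))
        started = time.perf_counter()
        output = func(value)
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.append((f"{name}/end", output, round(elapsed_ms, 2)))
        return output
    return wrapper

trace_log = []

# Hypothetical stand-ins for the prompt and model steps.
format_prompt = traced(
    "prompt",
    lambda inputs: f"System: the current date is {inputs['current_date']}. "
                   f"Human: {inputs['question']}",
    trace_log,
)
fake_model = traced("llm", lambda prompt_text: "The current date is 2024-08-16.", trace_log)

answer = fake_model(format_prompt({
    "question": "What is the current date?",
    "current_date": "2024-08-16",
}))

for record in trace_log:
    print(record)
```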

You can also use the `astream_events()` [method](https://python.langchain.com/docs/expression_language/streaming/#using-stream-events) to return this data. This is useful if you want to use intermediate steps in your application logic. Note that this is an async method, and requires an extra `version` flag since it’s still in beta:

In [None]:
# Turn off debug mode for clarity
set_debug(False)

async def astream_events():
    # Each event describes one step of the run (start, stream, or end).
    stream = chain.astream_events({
        "question": "What is the current date?",
        "current_date": date.today()
    }, version="v1")

    async for event in stream:
        print(event)
        print("-----")

In [None]:
await astream_events()

{'event': 'on_chain_start', 'run_id': 'c924aadc-b89a-4818-ae83-0619c4beea15', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 8, 16)}}, 'parent_ids': []}
-----
{'event': 'on_prompt_start', 'name': 'ChatPromptTemplate', 'run_id': '37a31e6c-47e7-41fb-aeaf-cfb85808126a', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 8, 16)}}, 'parent_ids': []}
-----
{'event': 'on_prompt_end', 'name': 'ChatPromptTemplate', 'run_id': '37a31e6c-47e7-41fb-aeaf-cfb85808126a', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'question': 'What is the current date?', 'current_date': datetime.date(2024, 8, 16)}, 'output': ChatPromptValue(messages=[SystemMessage(content='You know that the current date is "2024-08-16".'), HumanMessage(content='What is the current date?')])}, 'parent_ids': []}
-----
{'event': 'on_cha


{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': '779601d2-d8c1-444f-a1df-b46322b3e9b3', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-sonnet-20240229', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'data': {'chunk': AIMessageChunk(content='', id='run-779601d2-d8c1-444f-a1df-b46322b3e9b3', usage_metadata={'input_tokens': 28, 'output_tokens': 0, 'total_tokens': 28})}, 'parent_ids': []}
-----
{'event': 'on_parser_start', 'name': 'StrOutputParser', 'run_id': 'd96de59a-630b-4279-93c9-0773394d24a7', 'tags': ['seq:step:3'], 'metadata': {}, 'data': {}, 'parent_ids': []}
-----
{'event': 'on_parser_stream', 'name': 'StrOutputParser', 'run_id': 'd96de59a-630b-4279-93c9-0773394d24a7', 'tags': ['seq:step:3'], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': []}
-----
{'event': 'on_chain_stream', 'run_id': 'c924aadc-b89a-4818-ae83-0619c4beea15', 'tags': [], 'metadata': {}, 'name': 'RunnableSequence', 'da
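
To use intermediate steps in application logic, you would typically filter the event stream by event type. The sketch below uses a hypothetical `fake_event_stream` generator in place of `chain.astream_events(...)`. Its event dicts are simplified versions of the ones printed above.

```python
import asyncio

async def fake_event_stream():
    """Hypothetical stand-in for chain.astream_events(...): yields event dicts."""
    yield {"event": "on_chain_start", "name": "RunnableSequence", "data": {}}
    for piece in ["The current ", "date is ", "2024-08-16."]:
        yield {
            "event": "on_parser_stream",
            "name": "StrOutputParser",
            "data": {"chunk": piece},
        }
    yield {"event": "on_chain_end", "name": "RunnableSequence", "data": {}}

async def collect_parser_chunks(events):
    """Keep only the parser's streamed chunks and ignore every other event."""
    chunks = []
    async for event in events:
        if event["event"] == "on_parser_stream":
            chunks.append(event["data"]["chunk"])
    return chunks

# In a notebook, use `await collect_parser_chunks(fake_event_stream())` instead.
parser_chunks = asyncio.run(collect_parser_chunks(fake_event_stream()))
print("".join(parser_chunks))
```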

Finally, you can use an external service like [LangSmith](https://smith.langchain.com/) to add tracing. Here’s an example:

In [None]:
%%capture
!pip install -U langsmith

In [None]:
# Sign up at <https://smith.langchain.com/>
# Set environment variables

import os

from google.colab import userdata

set_debug(False)

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_PROJECT"] = "pr-somber-escalator-100"

chain.invoke({
  "question": "What is the current date?",
  "current_date": date.today()
})

'The current date is 2024-08-16.'

LangSmith will capture the internals at each step, giving you a result [like this](https://smith.langchain.com/public/628a15bb-45c8-4d39-987a-2896684a66c2/r).

Because LLMs are non-deterministic, you can also tweak prompts and rerun model calls in a playground, as well as create datasets and test cases to evaluate changes to your app and catch regressions.
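
The idea of a dataset of test cases can be sketched without any external service. The harness below is a hypothetical illustration, not LangSmith's dataset or evaluation API. `fake_chain` stands in for `chain.invoke`.

```python
def run_test_cases(chain_fn, test_cases):
    """Run each input through chain_fn and check the expected substring appears."""
    results = []
    for case in test_cases:
        output = chain_fn(case["inputs"])
        results.append({
            "inputs": case["inputs"],
            "output": output,
            "passed": case["expected_substring"] in output,
        })
    return results

def fake_chain(inputs):
    """Hypothetical stand-in for chain.invoke: echoes the supplied date."""
    return f"The current date is {inputs['current_date']}."

test_cases = [
    {
        "inputs": {"question": "What is the current date?", "current_date": "2024-08-16"},
        "expected_substring": "2024-08-16",
    },
    {
        "inputs": {"question": "What day is it?", "current_date": "2024-01-01"},
        "expected_substring": "2024-01-01",
    },
]

results = run_test_cases(fake_chain, test_cases)
print(all(result["passed"] for result in results))
```

Rerunning the same cases after each prompt change gives a quick signal when a previously passing case starts to fail.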