# LangChain

This notebook contains a RAG example that uses Ollama.

Following this tutorial: [Build a Retrieval Augmented Generation (RAG) App](https://python.langchain.com/docs/tutorials/rag/)

## What is RAG?

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing and inserting appropriate information into the model prompt is known as Retrieval Augmented Generation (RAG).

LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

[reference](https://python.langchain.com/docs/tutorials/rag/#what-is-rag).

## Concepts

![rag_indexing](images/rag_indexing.png)
[reference](https://python.langchain.com/docs/tutorials/rag/#indexing)

## RAG steps

1. **Load**: First we need to load our data. This is done with Document Loaders.   
2. **Split**: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.
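3. **Embed**: An embeddings model converts each split into a vector (a list of numbers), so that texts with similar meaning end up close together and can be compared during retrieval.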
4. **Store**: We need somewhere to store and index our splits, so that they can be searched over later. This is often done using a VectorStore and Embeddings model.
5. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a Retriever.
6. **Generate**: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.
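The example below works with two one-sentence texts, so the **Load** and **Split** steps are effectively skipped. As a rough illustration of what splitting does, here is a minimal fixed-size character splitter with overlap. It is a simplified stand-in written for this note (`split_text` is a hypothetical helper), not LangChain's `RecursiveCharacterTextSplitter`, which also tries to break on paragraph and sentence boundaries.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a
    chunk boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


doc = "LangChain is the framework for building context-aware reasoning applications"
for chunk in split_text(doc, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

Real splitters usually measure size in characters or tokens and prefer natural boundaries; the overlap idea is the same.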

## Retrieval and generation

![rag_retrieval_generation](images/rag_retrieval_generation.png)
[reference](https://python.langchain.com/docs/tutorials/rag/#retrieval-and-generation)

## Building the example

The LangChain tutorial uses a hosted LLM service such as OpenAI. In this example, I use my local Ollama server with local LLMs instead.

To do that, I followed the [OllamaEmbeddings](https://python.langchain.com/docs/integrations/text_embedding/ollama/) integration guide.

In [1]:
# Installing packages
%pip install --quiet --upgrade langchain langchain-community langchain-chroma
# Installing Ollama embeddings
%pip install -qU langchain-ollama

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
# imports

# Ollama local imports
import requests
import time
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

# Ollama embeddings
from langchain_ollama import OllamaEmbeddings
# Adjust these variables as needed
# base_url="http://ollama.tempobr.com"
# base_url="http://ollama:11434"
base_url = "http://ollama-cpu:11434"
model = "llama3.2"
embeddings = OllamaEmbeddings(
    base_url=base_url,
    model=model,
)

In [3]:
# Create a vector store with a sample text
from langchain_core.vectorstores import InMemoryVectorStore

text = "LangChain is the framework for building context-aware reasoning applications"

vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)

In [4]:
# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()

# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is LangChain?")

# show the retrieved document's content
retrieved_documents[0].page_content

'LangChain is the framework for building context-aware reasoning applications'

In [5]:
single_vector = embeddings.embed_query(text)
print(str(single_vector)[:100])  # Show the first 100 characters of the vector

[-0.012316747, 0.0007064336, 0.016245157, -0.020877853, -0.0022321339, -0.011591902, -0.011317743, 0
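Each embedding is just a list of floats. To decide which stored text best matches a query, a vector store compares the query vector with every stored vector, commonly using cosine similarity (as far as I know, LangChain's `InMemoryVectorStore` uses cosine similarity by default). A minimal pure-Python sketch, using made-up 3-dimensional vectors rather than real Ollama output:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; real embeddings are much longer.
query = [1.0, 0.0, 1.0]
doc_langchain = [0.9, 0.1, 0.8]
doc_langgraph = [0.1, 1.0, 0.0]

print(cosine_similarity(query, doc_langchain))  # close to 1: similar direction
print(cosine_similarity(query, doc_langgraph))  # close to 0: unrelated direction
```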


In [6]:
text2 = (
    "LangGraph is a library for building stateful, multi-actor applications with LLMs"
)
two_vectors = embeddings.embed_documents([text, text2])
for vector in two_vectors:
    print(str(vector)[:100])  # Show the first 100 characters of the vector

[-0.012316747, 0.0007064336, 0.016245157, -0.020877853, -0.0022321339, -0.011591902, -0.011317743, 0
[-0.004660674, 0.0026602747, 0.004631577, -0.0038133785, -0.0017676974, -0.017241802, -0.0041689477,


In [7]:
vectorstore = InMemoryVectorStore.from_texts(
    [text, text2],
    embedding=embeddings,
)
retriever = vectorstore.as_retriever()

retrieved_documents = retriever.invoke("What is LangChain?")

retrieved_documents

[Document(id='392596b3-8aad-45cc-a9c3-5cf873343375', metadata={}, page_content='LangChain is the framework for building context-aware reasoning applications'),
 Document(id='976cea32-8d41-4dc5-a728-5082ec6b55e5', metadata={}, page_content='LangGraph is a library for building stateful, multi-actor applications with LLMs')]
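Both documents come back because the retriever returns the top `k` matches and, with only two documents stored, both fit within the default `k` (4 in the LangChain versions I have seen). Passing `search_kwargs={"k": 1}` to `vectorstore.as_retriever(...)` should return only the best match. The ranking itself can be sketched in plain Python; `top_k` is a hypothetical helper and the vectors are toy values, not real embeddings:

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query_vec, store, k=4):
    """Return up to k (text, score) pairs, most similar to query_vec first."""
    scored = ((text, cosine_similarity(query_vec, vec)) for text, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy "vector store": text -> embedding (made-up values).
store = {
    "LangChain is the framework for building context-aware reasoning applications": [0.9, 0.1, 0.8],
    "LangGraph is a library for building stateful, multi-actor applications with LLMs": [0.1, 1.0, 0.0],
}
query_vec = [1.0, 0.0, 1.0]

for text, score in top_k(query_vec, store, k=4):  # both documents, best first
    print(f"{score:.3f}  {text}")
print(top_k(query_vec, store, k=1))  # only the best match
```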

In [8]:
# Prompt template: tells the LLM how to answer using the retrieved context
template = """Question: {question}

Context: {context}

Answer: Answer the question according to the context."""
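`ChatPromptTemplate.from_template` treats `{question}` and `{context}` as placeholders (f-string style by default) and wraps the filled-in text as a single human message. For a simple template like this one, the substitution step looks roughly like Python's `str.format`; this sketch only shows the text the model ends up seeing, not the message wrapping:

```python
template = """Question: {question}

Context: {context}

Answer: Answer the question according to the context."""

context = (
    "LangChain is the framework for building context-aware reasoning applications\n"
    "LangGraph is a library for building stateful, multi-actor applications with LLMs"
)

# Fill both placeholders, approximating what the prompt template does here.
filled = template.format(question="What is LangChain?", context=context)
print(filled)
```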

In [9]:
# Instantiating the prompt object
prompt = ChatPromptTemplate.from_template(template)

In [10]:
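# Note: this rebinds `model` from the model-name string to the LLM object,
# so re-run the cell that sets `base_url` and `model` before re-running this one.
model = OllamaLLM(base_url=base_url, model=model)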

In [11]:
# Before sending a request, check that the Ollama server is running and reachable:
response = requests.get(base_url)
if response.status_code == 200:
    print(response.text)
else:
    print(f"Error: {response.status_code}")

Ollama is running
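If the server is down or unreachable, the check above raises an exception (`requests.get` raises a connection error) instead of reaching the `else` branch. A slightly more defensive sketch using only the standard library and a timeout; `ollama_is_up` is a hypothetical helper name, and it assumes the server's root URL answers with HTTP 200, as shown above:

```python
import urllib.request


def ollama_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if base_url answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError, refused connections and timeouts are all OSError
        # subclasses; ValueError covers a malformed URL.
        return False


# In the notebook: ollama_is_up(base_url)
print(ollama_is_up("http://127.0.0.1:9", timeout=1))  # port assumed to have nothing listening
```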


In [12]:
context = "\n".join([doc.page_content for doc in retrieved_documents])
print(context)

LangChain is the framework for building context-aware reasoning applications
LangGraph is a library for building stateful, multi-actor applications with LLMs
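This join can be wrapped in a small helper so it is reusable, for example inside a chain. A sketch of such a helper; `format_docs` and the `Doc` dataclass are hypothetical, with `Doc` standing in for LangChain's `Document`, which also exposes `page_content`:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, separator: str = "\n") -> str:
    """Join the text of retrieved documents into one context string."""
    return separator.join(doc.page_content for doc in docs)


docs = [
    Doc("LangChain is the framework for building context-aware reasoning applications"),
    Doc("LangGraph is a library for building stateful, multi-actor applications with LLMs"),
]
print(format_docs(docs))
print(format_docs(docs, separator="\n\n"))  # blank line keeps longer chunks visually separate
```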


In [13]:
# Run the prompt on the LLM
chain = prompt | model

start_time = time.time()
response = chain.invoke({"question": "What is LangChain?", "context": context})
end_time = time.time()
execution_time = end_time - start_time
print(response)
print("Execution Time:", execution_time)

LangChain is an open-source Python framework designed specifically for building context-aware reasoning applications that leverage large language models (LLMs) and graph neural networks. It enables developers to easily integrate various libraries such as Hugging Face's Transformers and PyTorch with their own custom logic, allowing them to build more sophisticated AI models tailored to specific domain requirements.
Execution Time: 8.75894021987915
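`time.time()` is fine for rough timing, but the Python docs describe `time.perf_counter()` as the clock with the highest available resolution for measuring short durations. A small context manager (hypothetical name `timed`) would also avoid repeating the start/end lines in every cell:

```python
import time
from contextlib import contextmanager
from typing import Optional


@contextmanager
def timed(label: str, results: Optional[dict] = None):
    """Measure the elapsed time of the enclosed block with perf_counter."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}
with timed("sleep", timings):
    time.sleep(0.1)

# In the notebook this would wrap the LLM call, e.g.:
# with timed("What is LangChain?"):
#     response = chain.invoke({"question": "What is LangChain?", "context": context})
```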


In [14]:
# Run the prompt on the LLM
chain = prompt | model

start_time = time.time()
response = chain.invoke({"question": "What is LangGraph?", "context": context})
end_time = time.time()
execution_time = end_time - start_time
print(response)
print("Execution Time:", execution_time)

LangGraph is a library used in conjunction with LangChain, which enables developers to build stateful, multi-actor applications that leverage Large Language Models (LLMs). This allows for more complex and dynamic reasoning capabilities within the framework of LangChain.
Execution Time: 6.536516189575195
