# LangChain

This notebook contains a RAG example using Ollama.

It follows this tutorial: [Build a Retrieval Augmented Generation (RAG) App](https://python.langchain.com/docs/tutorials/rag/)

## What is RAG?

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data available up to the point in time at which they were trained. If you want to build AI applications that can reason about private data, or about data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

Source: [What is RAG?](https://python.langchain.com/docs/tutorials/rag/#what-is-rag)
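
The retrieve-then-augment idea described above can be sketched in plain Python, without LangChain or Ollama. Everything in this snippet is made up for illustration: the documents, the `tokenize`, `retrieve` and `build_prompt` helpers, and the word-overlap scoring. A real RAG app would more likely use embeddings and a vector store for the retrieval step.

```python
import re

# Hypothetical "private" documents the model was never trained on.
DOCUMENTS = [
    "The office coffee machine is serviced every Friday at 9am.",
    "Project Falcon ships its first release in March.",
    "Expense reports must be submitted by the 5th of each month.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Insert the retrieved context into the prompt that would be sent to the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\n\nAnswer:"

question = "When does Project Falcon ship?"
context = retrieve(question, DOCUMENTS)
print(build_prompt(question, context))
```

The printed prompt contains the Project Falcon sentence as context, which is the "augmentation" step: the model answers from text it is handed rather than from its training data.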

## Concepts



In [1]:
# Imports
import requests
import time
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

In [2]:
# Prompt template: {question} is filled in at run time, and the
# "step by step" suffix nudges the model to explain its reasoning
template = """Question: {question}

Answer: Let's think step by step."""

In [3]:
# Instantiating the prompt object
prompt = ChatPromptTemplate.from_template(template)
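
To see roughly what text the model will receive, the placeholder substitution can be reproduced with plain `str.format`. This is only an approximation, assuming the template uses the default f-string style placeholders: `ChatPromptTemplate` also wraps the text in a chat message, which this sketch leaves out. The `template` string is repeated here so the snippet runs on its own.

```python
template = """Question: {question}

Answer: Let's think step by step."""

# Fill the {question} placeholder the same way the prompt object would.
rendered = template.format(question="What is LangChain?")
print(rendered)
```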

In [4]:
# Adjust these variables as needed
# base_url = "http://ollama.tempobr.com"
# base_url = "http://ollama:11434"
base_url = "http://ollama-cpu:11434"
model_name = "gemma2"
model = OllamaLLM(base_url=base_url, model=model_name)

In [5]:
# Before sending a request, check that the Ollama server is reachable:
response = requests.get(base_url)
if response.status_code == 200:
    print(response.text)
else:
    print(f"Error: {response.status_code}")

Ollama is running


In [6]:
# List all the models on the Ollama server
response = requests.get(f"{base_url}/api/tags")
if response.status_code == 200:
    print(response.text)
else:
    print(f"Error: {response.status_code}")

{"models":[{"name":"gemma2:latest","model":"gemma2:latest","modified_at":"2024-10-30T21:42:24.954979883Z","size":5443152417,"digest":"ff02c3702f322b9e075e9568332d96c0a7028002f1a5a056e0a6784320a4db0b","details":{"parent_model":"","format":"gguf","family":"gemma2","families":["gemma2"],"parameter_size":"9.2B","quantization_level":"Q4_0"}}]}
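
The raw JSON above is hard to scan. A minimal sketch of turning it into one readable line per model follows; `summarize_models` is a hypothetical helper, and `sample` is a trimmed copy of the response printed above so the snippet runs without a server.

```python
import json

# Trimmed copy of the /api/tags response printed above.
sample = '{"models":[{"name":"gemma2:latest","size":5443152417,"details":{"parameter_size":"9.2B","quantization_level":"Q4_0"}}]}'

def summarize_models(payload: str) -> list[str]:
    """Turn the /api/tags JSON text into one readable line per model."""
    lines = []
    for m in json.loads(payload).get("models", []):
        size_gb = m["size"] / 1e9  # bytes to decimal gigabytes
        details = m.get("details", {})
        lines.append(
            f'{m["name"]}: {size_gb:.1f} GB, '
            f'{details.get("parameter_size", "?")} parameters, '
            f'{details.get("quantization_level", "?")} quantization'
        )
    return lines

for line in summarize_models(sample):
    print(line)
```

In the notebook, calling `summarize_models(response.text)` should give the same kind of summary for the live response.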


In [7]:
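# Run the prompt on the LLM and time the call
chain = prompt | model

# perf_counter() is a monotonic clock, better suited to measuring intervals
start_time = time.perf_counter()
answer = chain.invoke({"question": "What is LangChain?"})
end_time = time.perf_counter()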
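
The `prompt | model` expression builds a pipeline in which the output of the left-hand step is passed to the right-hand step. The toy `Step` class below imitates that behaviour with Python's `__or__` operator. It is an illustration only, not LangChain's implementation, and `fake_model` is a stand-in that calls no LLM.

```python
class Step:
    """Toy stand-in for a chainable component (hypothetical, not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new step that runs a, then passes its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

fill_prompt = Step(lambda inputs: f"Question: {inputs['question']}")
fake_model = Step(lambda text: f"(model received) {text}")

toy_chain = fill_prompt | fake_model
print(toy_chain.invoke({"question": "What is LangChain?"}))
```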

In [8]:
# Calculating the execution time

execution_time = end_time - start_time
print("Execution Time:", execution_time)

Execution Time: 41.726380348205566
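
The manual start/end bookkeeping above can also be wrapped in a small context manager, which keeps the timing logic in one place when several calls need to be measured. This is a sketch: `timed` is a hypothetical helper, and `time.sleep` stands in for the `chain.invoke(...)` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print how long the enclosed block took, using a monotonic clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed calls are timed too.
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} s")

with timed("Execution Time"):
    time.sleep(0.1)  # placeholder for chain.invoke(...)
```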
