<a target="_blank" href="https://colab.research.google.com/github/sergiopaniego/RAG_local_tutorial/blob/main/example_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Simple RAG example with LangChain, Ollama and an open-source LLM

In this example, we first connect to a local LLM served by Ollama and send requests to it through LangChain. After that, we build a RAG application from a PDF file and extract details from that document.

<p align="center">
  <img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/07/langchain3.png" alt="Langchain Logo" width="20%">
  <img src="https://bookface-images.s3.amazonaws.com/logos/ee60f430e8cb6ae769306860a9c03b2672e0eaf2.png" alt="Ollama Logo" width="20%">
</p>

Sources:

* https://github.com/svpino/llm
* https://github.com/AIAnytime/Gemma-7B-RAG-using-Ollama/blob/main/Ollama%20Gemma.ipynb
* https://www.youtube.com/watch?v=-MexTC18h20&ab_channel=AIAnytime
* https://www.youtube.com/watch?v=HRvyei7vFSM&ab_channel=Underfitted


# Requirements

* Ollama installed locally

# Install the requirements

If an error is raised related to docarray, refer to this solution: https://stackoverflow.com/questions/76880224/error-using-using-docarrayinmemorysearch-in-langchain-could-not-import-docarray

In [None]:
!pip3 install langchain
!pip3 install langchain_pinecone
!pip3 install langchain[docarray]
!pip3 install docarray
!pip3 install pypdf

Collecting langchain_pinecone
  Downloading langchain_pinecone-0.2.5-py3-none-any.whl.metadata (1.3 kB)
Collecting pinecone<7.0.0,>=6.0.0 (from pinecone[async]<7.0.0,>=6.0.0->langchain_pinecone)
  Downloading pinecone-6.0.2-py3-none-any.whl.metadata (9.0 kB)
Collecting aiohttp<3.11,>=3.10 (from langchain_pinecone)
  Downloading aiohttp-3.10.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting langchain-tests<1.0.0,>=0.3.7 (from langchain_pinecone)
  Downloading langchain_tests-0.3.19-py3-none-any.whl.metadata (3.2 kB)
Collecting pytest-asyncio<1,>=0.20 (from langchain-tests<1.0.0,>=0.3.7->langchain_pinecone)
  Downloading pytest_asyncio-0.26.0-py3-none-any.whl.metadata (4.0 kB)
Collecting syrupy<5,>=4 (from langchain-tests<1.0.0,>=0.3.7->langchain_pinecone)
  Downloading syrupy-4.9.1-py3-none-any.whl.metadata (38 kB)
Collecting pytest-socket<1,>=0.6.0 (from langchain-tests<1.0.0,>=0.3.7->langchain_pinecone)
  Downloading pytest_socket-0.7.0-py3-non

# Select the LLM model to use

The model must be downloaded locally to be used, so if you want to run llama3, you should run:

```

ollama pull llama3

```

Check the list of models available for Ollama here: https://ollama.com/library

In [None]:
!pip install ollama

Collecting ollama
  Downloading ollama-0.4.8-py3-none-any.whl.metadata (4.7 kB)
Downloading ollama-0.4.8-py3-none-any.whl (13 kB)
Installing collected packages: ollama
Successfully installed ollama-0.4.8


In [None]:
!apt install pciutils -y
!curl -fsSL https://ollama.com/install.sh | sh
!ollama run llama3

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpci3 pci.ids
The following NEW packages will be installed:
  libpci3 pci.ids pciutils
0 upgraded, 3 newly installed, 0 to remove and 34 not upgraded.
Need to get 343 kB of archives.
After this operation, 1,581 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 pci.ids all 0.0~2022.01.22-1ubuntu0.1 [251 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libpci3 amd64 1:3.7.0-6 [28.9 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/main amd64 pciutils amd64 1:3.7.0-6 [63.6 kB]
Fetched 343 kB in 0s (1,182 kB/s)
Selecting previously unselected package pci.ids.
(Reading database ... 126333 files and directories currently installed.)
Preparing to unpack .../pci.ids_0.0~2022.01.22-1ubuntu0.1_all.deb ...
Unpacking pci.ids (0.0~2022.01.22-1ubuntu0.1) ...
Selecting previously unse

In [None]:
!pip install langchain_community

Collecting langchain_community
  Downloading langchain_community-0.3.22-py3-none-any.whl.metadata (2.4 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain_community)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading langchain_community-0.3.22-py3-none-any.whl (2.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m35.4 MB/s[0m 

In [None]:
#MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral:8x7b"
#MODEL = "gemma:7b"
#MODEL = "llama2"
MODEL = "llama3" # https://ollama.com/library/llama3

In [None]:
import os
import asyncio

# NB: Depending on the backend you are running, you may need to set these
# environment variables to get CUDA working.
# Add the CUDA binaries to PATH
os.environ['PATH'] += ':/usr/local/cuda/bin'
# Set LD_LIBRARY_PATH to include both /usr/lib64-nvidia and CUDA lib directories
os.environ['LD_LIBRARY_PATH'] = '/usr/lib64-nvidia:/usr/local/cuda/lib64'

async def run_process(cmd):
    print('>>> starting', *cmd)
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )

    # Forward each line of a stream to the notebook output as it arrives
    async def pipe(lines):
        async for line in lines:
            print(line.decode().strip())

    # Drain stdout and stderr concurrently until the process exits
    await asyncio.gather(pipe(process.stdout), pipe(process.stderr))

In [None]:
import asyncio
import threading

async def start_ollama_serve():
    await run_process(['ollama', 'serve'])

def run_async_in_thread(loop, coro):
    asyncio.set_event_loop(loop)
    loop.run_until_complete(coro)
    loop.close()

# Create a new event loop that will run in a new thread
new_loop = asyncio.new_event_loop()

# Start ollama serve in a separate thread so the cell won't block execution
thread = threading.Thread(target=run_async_in_thread, args=(new_loop, start_ollama_serve()))
thread.start()

>>> starting ollama serve
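
Because `ollama serve` is started in a background thread, the next cells can run before the server is listening. A minimal sketch of a readiness check follows; `wait_for_port` is a hypothetical helper (not part of Ollama or LangChain), and the address `127.0.0.1:11434` is assumed from the `OLLAMA_HOST` value printed in the server log.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 11434)` before `ollama pull` would then block until the server accepts connections, or return `False` after the timeout.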


In [None]:
!ollama pull llama3

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICy8D5jZhgPTQlBwm6KS2ZrDmidR/w8OCAFrq320ezOw

2025/04/27 16:43:35 routes.go:1232: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0

# Instantiate the LLM and the embedding model

In [None]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

model.invoke("Give me an inspirational quote")



time=2025-04-27T16:44:09.910Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:09.911Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
time=2025-04-27T16:44:10.079Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:10.105Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:10.105Z level=WARN source=ggml.go:152 msg="key not found" key=llama.vision.block_count default=0
time=2025-04-27T16:44:10.106Z level=WARN source=ggml.go:152 msg="key not found" key=llama.attention.key_length default=128
time=2025-04-27T16:44:10.106Z level=WARN source=ggml.go:152 msg="key not found" key=llama.attention.value_length default=128
time=2025-04-27T16:44:10.106Z level=INFO source=sched.go:722 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec5386

'Here\'s one:\n\n"Believe you can and you\'re halfway there." - Theodore Roosevelt\n\nI hope it inspires you to tackle your goals and dreams with confidence!'

In [None]:
model.invoke("What is 2+2?")

time=2025-04-27T16:44:13.520Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:13.521Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:44:13 | 200 |  159.753445ms |       127.0.0.1 | POST     "/api/generate"


'The answer to 2+2 is... 4!'

## Use a LangChain output parser to make the LLM output easier to read

In [None]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
response_from_model = model.invoke("Give me an inspirational quote")
parsed_response = parser.parse(response_from_model)
print(parsed_response)

time=2025-04-27T16:44:13.697Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:13.698Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:44:13 | 200 |  222.935808ms |       127.0.0.1 | POST     "/api/generate"
Here's one:

"Believe you can and you're halfway there." - Theodore Roosevelt



# Create the prompt template for the conversation with the instruction-tuned LLM

We can create a template to structure the conversation effectively.

This template lets us give the Large Language Model (LLM) some general context that is reused for every prompt, so the model has a consistent background for all interactions.

Additionally, we can include specific context relevant to the particular prompt. This helps the model understand the immediate scenario or topic before addressing the actual question. Following this specific context, we then present the actual question we want the model to answer.

By using this approach, we enhance the model's ability to generate accurate and relevant responses based on both the general and specific contexts provided.

In [None]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
prompt.format(context="Here is some context", question="Here is a question")

'\nAnswer the question based on the context below. If you can\'t\nanswer the question, answer with "I don\'t know".\n\nContext: Here is some context\n\nQuestion: Here is a question\n'

The model can answer prompts based on the context:

In [None]:
formatted_prompt = prompt.format(context="My parents named me Sergio", question="What's your name?")
response_from_model = model.invoke(formatted_prompt)
parsed_response = parser.parse(response_from_model)
print(parsed_response)

time=2025-04-27T16:44:14.049Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:14.050Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:44:14 | 200 |   94.462647ms |       127.0.0.1 | POST     "/api/generate"
Sergio!



But it can't answer what is not provided as context:

In [None]:
formatted_prompt = prompt.format(context="My parents named me Sergio", question="What's my age?")
response_from_model = model.invoke(formatted_prompt)
parsed_response = parser.parse(response_from_model)
print(parsed_response)

time=2025-04-27T16:44:14.151Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:14.152Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:44:14 | 200 |   82.220114ms |       127.0.0.1 | POST     "/api/generate"
I don't know


It even declines questions it could answer from its own knowledge, because the prompt restricts it to the given context:

In [None]:
formatted_prompt = prompt.format(context="My parents named me Sergio", question="What is 2+2?")
response_from_model = model.invoke(formatted_prompt)
parsed_response = parser.parse(response_from_model)
print(parsed_response)

time=2025-04-27T16:44:14.241Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:44:14.242Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:44:14 | 200 |   85.012077ms |       127.0.0.1 | POST     "/api/generate"
I don't know



# Load an example PDF to do Retrieval Augmented Generation (RAG)

For the example, you can select your own PDF.

In [None]:
from langchain_community.document_loaders import PyPDFLoader
# let colab access my google drive
from google.colab import drive
drive.mount('/content/drive')


loader = PyPDFLoader("/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf")
pages = loader.load_and_split()
#pages = loader.load()
pages

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


[Document(metadata={'producer': 'GPL Ghostscript 10.01.1', 'creator': 'PyPDF', 'creationdate': '2024-03-07T17:18:45+01:00', 'moddate': '2024-03-07T17:18:45+01:00', 'source': '/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Austin ¥ Boston ¥ Chicago ¥ Denver ¥ Harrisburg ¥ O lympia ¥ Sacramento ¥ Silicon Valley ¥ Washington, D.C.  \n \nARTIFICIAL INTELLIGENCE (AI) & GENERATIVE AI \n \nWhat is Artificial Intelligence? \n \nArtificial Intelligence (AI) is a field of science concerned with building machines that can reason, \nlearn, and act in such a way that would normally re quire human intelligence or that involves data \nwhose scale exceeds what humans can analyze.1 \n \nWhat is Generative Artificial Intelligence? \n \nAI has been around for decades, but the field has recently garnered significant attention due to \nadvancements in the subfield of generative AI, and the subsequent release of generative AI ch

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
text_documents = text_splitter.split_documents(pages)[:]

pages

[Document(metadata={'producer': 'GPL Ghostscript 10.01.1', 'creator': 'PyPDF', 'creationdate': '2024-03-07T17:18:45+01:00', 'moddate': '2024-03-07T17:18:45+01:00', 'source': '/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Austin ¥ Boston ¥ Chicago ¥ Denver ¥ Harrisburg ¥ O lympia ¥ Sacramento ¥ Silicon Valley ¥ Washington, D.C.  \n \nARTIFICIAL INTELLIGENCE (AI) & GENERATIVE AI \n \nWhat is Artificial Intelligence? \n \nArtificial Intelligence (AI) is a field of science concerned with building machines that can reason, \nlearn, and act in such a way that would normally re quire human intelligence or that involves data \nwhose scale exceeds what humans can analyze.1 \n \nWhat is Generative Artificial Intelligence? \n \nAI has been around for decades, but the field has recently garnered significant attention due to \nadvancements in the subfield of generative AI, and the subsequent release of generative AI ch
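
To make `chunk_size=400` and `chunk_overlap=20` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first tries to break on separators such as paragraph and line breaks); `split_with_overlap` is a hypothetical helper written only to show how consecutive chunks share their boundary characters.

```python
def split_with_overlap(text: str, chunk_size: int = 400, chunk_overlap: int = 20) -> list[str]:
    """Cut text into chunks of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so neighbours share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbours, which can help retrieval when the relevant passage straddles two chunks.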

# Store the PDF in a vector space.

From the LangChain docs:

`DocArrayInMemorySearch is a document index provided by Docarray that stores documents in memory. It is a great starting point for small datasets, where you may not want to launch a database server.`

The execution time of the following block depends on the length and complexity of the PDF provided. Keep it small and simple for this example.

In [None]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_documents, embedding=embeddings)

time=2025-04-27T16:47:04.846Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:04.847Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:04 | 200 |   66.334473ms |       127.0.0.1 | POST     "/api/embeddings"
time=2025-04-27T16:47:04.916Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:04.916Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:04 | 200 |   61.593702ms |       127.0.0.1 | POST     "/api/embeddings"
time=2025-04-27T16:47:04.982Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:04.983Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:05 | 200 |    62.58857ms |       127.0.0.1 | POST     "/api/embeddings"
time=2025-04-27T16:47:05.050Z level=WARN source=ggml.go:15

# Create a retriever that returns the most similar chunks to use as context

In [None]:
retriever = vectorstore.as_retriever()
retriever.invoke("artificial intelligence")

time=2025-04-27T16:47:12.093Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:12.094Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:12 | 200 |   41.435406ms |       127.0.0.1 | POST     "/api/embeddings"


[Document(metadata={'producer': 'GPL Ghostscript 10.01.1', 'creator': 'PyPDF', 'creationdate': '2024-03-07T17:18:45+01:00', 'moddate': '2024-03-07T17:18:45+01:00', 'source': '/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf', 'total_pages': 2, 'page': 1, 'page_label': '2'}, page_content='9 Ashish Vaswani et al., ÒAttention Is All You Need,Ó 2017, https://arxiv.org/pdf/1706.03762.pdf. \nFigure 2: Google Translate'),
 Document(metadata={'producer': 'GPL Ghostscript 10.01.1', 'creator': 'PyPDF', 'creationdate': '2024-03-07T17:18:45+01:00', 'moddate': '2024-03-07T17:18:45+01:00', 'source': '/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Until recently, AI was largely used for categorization and pattern recognition to understand and \nrecommend information.  Now, recent advancements in the field of AI enable us to use AI as a tool \nto create novel content.2 \n \nGenerative Artificial Intelligence (
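
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea in plain Python under loud assumptions: `embed` is a toy word-count embedding standing in for the Ollama embedding model, cosine similarity is used as one common distance measure, and `k=4` is an assumed default. It is not how `DocArrayInMemorySearch` is implemented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

With a real embedding model the vectors are dense and capture meaning rather than exact word matches, but the ranking step works the same way.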

# Ask questions about the document to extract details

In [None]:
# Retrieve the chunks most similar to the query; they will serve as the context
retrieved_context = retriever.invoke("What is impact of AI?")
print(retrieved_context)

time=2025-04-27T16:47:18.129Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:18.130Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:18 | 200 |   46.716707ms |       127.0.0.1 | POST     "/api/embeddings"
[Document(metadata={'producer': 'GPL Ghostscript 10.01.1', 'creator': 'PyPDF', 'creationdate': '2024-03-07T17:18:45+01:00', 'moddate': '2024-03-07T17:18:45+01:00', 'source': '/content/drive/MyDrive/AIMLTraining/LangChain/files/genAI.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='to create new content, including text, images, musi c, audio, video, and even computer code. 3  \nGenerative AI builds on existing foundation models, which are models trained on massive generalized \ndatasets that provide a starting point to develop s pecialized applications more quickly and cost-\neffectively.4 \n \nAI in Our Daily Lives'), Document(metadata={'producer': 'GPL Ghos

In [None]:
questions = [
    "What is the impact of AI in our daily life?",
    "Please summarize the document",
    "Does he know about Tensorflow?"
]

for question in questions:
    formatted_prompt = prompt.format(context=retrieved_context, question=question)
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)

    print(f"Question: {question}")
    print(f"Answer: {parsed_response}")
    print()

time=2025-04-27T16:47:22.952Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:22.953Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:23 | 200 |   708.68436ms |       127.0.0.1 | POST     "/api/generate"
Question: What is the impact of AI in our daily life?
Answer: I don't know. The provided context does not mention the specific impact of AI in daily life. It only provides general information about generative AI, its applications, and a brief history of artificial intelligence.

time=2025-04-27T16:47:23.665Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:23.666Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:24 | 200 |  663.164291ms |       127.0.0.1 | POST     "/api/generate"
Question: Please summarize the document
Answer: I don't know. The document appears to be a colle
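
The loop above formats every question with the same `retrieved_context`, which was retrieved for "What is impact of AI?" and is passed as a list of `Document` objects, metadata included. That may be one reason the answers fall back to "I don't know". A possible variation is sketched below: retrieve once per question and pass only the chunk text. `FakeDocument`, `FakeRetriever` and `EchoModel` are hypothetical stand-ins so the sketch runs offline; the assumption is that the notebook's `retriever` and `model` would be passed in their place, since both expose `invoke`.

```python
from dataclasses import dataclass

TEMPLATE = """
Answer the question based on the context below. If you can't
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


class FakeRetriever:
    """Hypothetical retriever: returns documents sharing a word with the query."""

    def __init__(self, documents):
        self.documents = documents

    def invoke(self, query):
        words = set(query.lower().split())
        return [d for d in self.documents if words & set(d.page_content.lower().split())]


class EchoModel:
    """Hypothetical model: returns the prompt so the flow can be inspected offline."""

    def invoke(self, prompt_text):
        return prompt_text


def build_prompt(question, retriever):
    docs = retriever.invoke(question)  # retrieve for this specific question
    context = "\n\n".join(doc.page_content for doc in docs)  # chunk text only, no metadata
    return TEMPLATE.format(context=context, question=question)


def answer(question, retriever, model):
    return model.invoke(build_prompt(question, retriever))
```

With the stand-ins, `answer("What is the impact of AI in our daily life?", FakeRetriever([...]), EchoModel())` returns the filled-in prompt, which makes it easy to check what context the model would actually see.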

# Loop to ask and answer questions continuously

In [None]:
while True:
    print("Say 'exit' or 'quit' to exit the loop")
    question = input('User question: ')
    print(f"Question: {question}")
    if question.lower() in ["exit", "quit"]:
        print("Exiting the conversation. Goodbye!")
        break
    formatted_prompt = prompt.format(context=retrieved_context, question=question)
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)
    print(f"Answer: {parsed_response}")
    print()

Say 'exit' or 'quit' to exit the loop
User question: tell me about AI
Question: tell me about AI
time=2025-04-27T16:47:52.430Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-27T16:47:52.431Z level=WARN source=types.go:570 msg="invalid option provided" option=tfs_z
[GIN] 2025/04/27 - 16:47:53 | 200 |  1.474840189s |       127.0.0.1 | POST     "/api/generate"
Answer: According to the context, AI is not explicitly mentioned as a topic within this document. However, we can extract some information about AI from the text.

AI is mentioned in the following sections:

1. Page 0: "to create new content, including text, images, music, audio, video, and even computer code." - This suggests that AI has the capability to generate content across various mediums.
2. Page 1 (Section 1956): "John McCarthy is credited for coining the phrase 'artificial intelligence'..." - This indicates that AI has been around since at least the mid-20th century.

Unfort