In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%cd /content/drive/MyDrive/Colab\ Notebooks/rag_training

/content/drive/MyDrive/Colab Notebooks/rag_training


In [3]:
%%bash

uv pip install haystack-ai
uv pip install "datasets"
uv pip install "sentence-transformers>=4.1.0"

Using Python 3.11.13 environment at: /usr
Audited 1 package in 252ms
Using Python 3.11.13 environment at: /usr
Audited 1 package in 245ms
Using Python 3.11.13 environment at: /usr
Audited 1 package in 249ms


In [4]:
import os
from getpass import getpass

from datasets import load_dataset
from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore

In [5]:
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

Enter OpenAI API key:··········
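
If the key needs to be checked later in the notebook, printing a redacted form keeps the secret out of the saved output. The following is a minimal sketch; `mask_secret`, `DEMO_API_KEY`, and the placeholder value are hypothetical names introduced here for illustration and are not part of Haystack or the OpenAI client.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Return a redacted form of a secret that is safe to print."""
    if len(value) <= visible:
        # Too short to reveal any part of it safely: hide everything.
        return "*" * len(value)
    hidden = len(value) - visible
    # Slice from an explicit index so that visible=0 reveals nothing.
    return "*" * hidden + value[hidden:]


# Hypothetical placeholder, used only so the sketch runs on its own.
demo_key = os.environ.get("DEMO_API_KEY", "sk-demo-0000000000abcd")
print(mask_secret(demo_key))
```

In the notebook itself, the same helper could be called on `os.environ["OPENAI_API_KEY"]` instead of the placeholder.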


In [7]:
# Confirm that the key is set without echoing the secret into the notebook output.
"OPENAI_API_KEY" in os.environ

True

In [8]:
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [9]:
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

In [10]:
for doc_idx, doc in enumerate(docs[:2]):
    print(f"Document idx {doc_idx}\n")
    print(f"Content: {doc.content}")
    print(f"Meta: {doc.meta}\n\n")

Document idx 0

Content: The Colossus of Rhodes (Ancient Greek: ὁ Κολοσσὸς Ῥόδιος, romanized: ho Kolossòs Rhódios Greek: Κολοσσός της Ρόδου, romanized: Kolossós tes Rhódou)[a] was a statue of the Greek sun-god Helios, erected in the city of Rhodes, on the Greek island of the same name, by Chares of Lindos in 280 BC. One of the Seven Wonders of the Ancient World, it was constructed to celebrate the successful defence of Rhodes city against an attack by Demetrius Poliorcetes, who had besieged it for a year with a large army and navy.
According to most contemporary descriptions, the Colossus stood approximately 70 cubits, or 33 metres (108 feet) high – approximately the height of the modern Statue of Liberty from feet to crown – making it the tallest statue in the ancient world.[2] It collapsed during the earthquake of 226 BC, although parts of it were preserved. In accordance with a certain oracle, the Rhodians did not build it again.[3] John Malalas wrote that Hadrian in his reign re-er

In [11]:
# InMemoryDocumentStore was already imported in cell 4.
document_store = InMemoryDocumentStore()

In [12]:
# Uses the OPENAI_API_KEY environment variable set in cell 5.
doc_embedder = OpenAIDocumentEmbedder()

In [13]:
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
print(f"Stored {len(docs_with_embeddings['documents'])} documents with embeddings in the document store.")

Calculating embeddings: 5it [00:03,  1.58it/s]

Stored 151 documents with embeddings in the document store.
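
The progress bar reports 5 iterations for 151 documents, which is consistent with the embedder sending documents to the API in batches. The arithmetic below is a sketch that assumes a batch size of 32; that value is an assumption about the embedder's default and should be checked against the installed Haystack version.

```python
import math

num_documents = 151  # from the "Stored 151 documents" output above
batch_size = 32      # assumption: the embedder's default batch size

# Number of API requests needed to cover every document.
num_batches = math.ceil(num_documents / batch_size)

# The final batch holds whatever is left after the full batches.
last_batch_size = num_documents - (num_batches - 1) * batch_size

print(num_batches)      # 5
print(last_batch_size)  # 23
```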





In [14]:
template = [
    ChatMessage.from_user(
        """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
    )
]

prompt_builder = ChatPromptBuilder(template=template)
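
To see roughly what the language model receives, the template can be approximated with plain string formatting. This is only a sketch: `render_prompt` and the toy passages are hypothetical, and the real `ChatPromptBuilder` renders the template with Jinja, so the exact whitespace will differ.

```python
def render_prompt(contents: list[str], question: str) -> str:
    """Approximate the chat template above with plain string formatting."""
    # One indented line per retrieved document, mirroring the for-loop body.
    context = "\n".join(f"    {text}" for text in contents)
    return (
        "Given the following information, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy stand-ins for retrieved document contents (not taken from the dataset).
toy_contents = ["First retrieved passage.", "Second retrieved passage."]
rendered = render_prompt(toy_contents, "What is in the context?")
print(rendered)
```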



In [15]:
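# The query embedder must use the same embedding model as doc_embedder,
# otherwise query and document vectors are not comparable.
text_embedder = OpenAITextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store)
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")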
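
Conceptually, an embedding retriever scores every stored vector against the query vector and returns the best matches. The sketch below illustrates that idea with dot-product scoring on toy three-dimensional vectors. The function names, the toy store, and the choice of dot product are assumptions made for illustration; this is not Haystack's implementation.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2):
    """Return the k (doc_id, score) pairs with the highest dot-product score."""
    scored = [(doc_id, dot(query, emb)) for doc_id, emb in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings: real ones have hundreds or thousands of dimensions.
toy_store = {
    "colossus": [0.9, 0.1, 0.0],
    "pyramid": [0.1, 0.9, 0.0],
    "lighthouse": [0.6, 0.2, 0.2],
}
query_embedding = [1.0, 0.0, 0.0]

results = top_k(query_embedding, toy_store, k=2)
print([doc_id for doc_id, _ in results])
```

When the vectors are normalized to unit length, ranking by dot product is the same as ranking by cosine similarity.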

In [16]:
basic_rag_pipeline = Pipeline()

basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

In [17]:
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7f337bd63050>
🚅 Components
  - text_embedder: OpenAITextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (List[ChatMessage])
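
The connections listed above form a directed graph, so the order in which components can run follows from a topological sort. The sketch below derives that order with the standard library. It only illustrates the data flow; Haystack's actual scheduling logic is more involved.

```python
from graphlib import TopologicalSorter

# (sender, receiver) pairs taken from the connection list above.
connections = [
    ("text_embedder", "retriever"),
    ("retriever", "prompt_builder"),
    ("prompt_builder", "llm"),
]

# TopologicalSorter expects a mapping of node -> set of predecessors.
graph: dict[str, set[str]] = {}
for sender, receiver in connections:
    graph.setdefault(sender, set())
    graph.setdefault(receiver, set()).add(sender)

run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
```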

In [18]:
question = "What does the Rhodes Statue look like?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0].text)

The Rhodes Statue, known as the Colossus of Rhodes, was a colossal statue of the Greek sun-god Helios. It is described as being approximately 33 meters (108 feet) high, making it one of the tallest statues of the ancient world. Although the exact appearance of the statue is not known, it is believed to have depicted Helios with curly hair and flame-like spikes radiating from the head, similar to representations found on contemporary Rhodian coins. It likely stood on a pedestal near the harbor entrance, and it is thought that the statue may have been constructed in a pose where Helios shielded his eyes with one hand, a common depiction associated with looking toward the sun. The statue was initially constructed using iron tie bars and bronze plates, with a stone-filled interior for stability.
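
Because the question has to be passed to both the text embedder and the prompt builder, a small wrapper avoids repeating the input dictionary for every query. This is a sketch: `ask` is a hypothetical helper, and `FakePipeline` and `FakeReply` are stand-ins that only mimic the input and output shape seen above so the example runs without an API key.

```python
from dataclasses import dataclass


@dataclass
class FakeReply:
    """Stand-in for a chat message that exposes a .text attribute."""
    text: str


class FakePipeline:
    """Stand-in with the same run() input and output shape as the pipeline above."""

    def run(self, data: dict) -> dict:
        question = data["prompt_builder"]["question"]
        return {"llm": {"replies": [FakeReply(text=f"(stub answer to: {question})")]}}


def ask(pipeline, question: str) -> str:
    """Send one question through the pipeline and return the first reply's text."""
    response = pipeline.run(
        {
            "text_embedder": {"text": question},
            "prompt_builder": {"question": question},
        }
    )
    return response["llm"]["replies"][0].text


print(ask(FakePipeline(), "Where was the Colossus built?"))
```

In the notebook, `ask(basic_rag_pipeline, question)` would replace the stub call.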
