### 1.  Installations and Settings 🛠️

In [1]:
%%bash
uv pip install -q llama-index-core
uv pip install -q llama-index-llms-groq
uv pip install -q llama-index-readers-file
uv pip install -q llama-index-embeddings-huggingface
uv pip install -q llama-index-embeddings-instructor

In [5]:
import os

In [7]:
from google.colab import userdata

# Set the tokens as environment variables
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")
os.environ["sina_hug_tocken"] = userdata.get("sina_hug_tocken")
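
Outside Colab, `google.colab.userdata` is not available. As a minimal sketch of a fallback, the helper below reads a key from an environment variable and fails early with a clear message. The name `require_env` and the variable `DEMO_API_KEY` are hypothetical and exist only for this illustration:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or add it as a Colab secret."
        )
    return value


# Demo with a placeholder value (never hard-code a real key like this).
os.environ["DEMO_API_KEY"] = "placeholder-value"
print(require_env("DEMO_API_KEY"))
```

Failing at setup time is usually easier to debug than an authentication error raised later by the first LLM call.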

In [29]:
from llama_index.llms.groq import Groq

# Groq model ID (check Groq's model list for the IDs currently available)
model = "llama3-70b-8192"

llm = Groq(
    model=model,
    # api_key=os.environ.get(
    #     "GROQ_API_KEY"
    # ),  # you can also enter your API key here, either hard-coded or read from another file
)

Now our model needs some information to work with.

In [15]:
%mkdir -p /content/data
!wget -O /content/data/The_Republic_of_Plato.txt https://www.gutenberg.org/cache/epub/55201/pg55201.txt

--2025-05-06 14:03:11--  https://www.gutenberg.org/cache/epub/55201/pg55201.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1457578 (1.4M) [text/plain]
Saving to: ‘/content/data/The_Republic_of_Plato.txt’


2025-05-06 14:03:12 (3.41 MB/s) - ‘/content/data/The_Republic_of_Plato.txt’ saved [1457578/1457578]



In [16]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("/content/data").load_data()

Large documents, like books, are too big for models to handle all at once. To make them easier to process, we split them into smaller parts, called chunks. Chunks can be split:

- By `sentences` – keeping full sentences together.
- By `tokens` – based on model input limits (tokens = pieces of words).
- By `meaning` (semantics) – keeping related ideas grouped.

In [17]:
from llama_index.core.node_parser import SentenceSplitter

text_splitter = SentenceSplitter(chunk_size=800, chunk_overlap=150)

docs = text_splitter.get_nodes_from_documents(documents)
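
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification: it counts whitespace-separated words, whereas `SentenceSplitter` counts tokens and tries to keep sentences whole. The function name `chunk_words` is hypothetical:

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of `chunk_size` that share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_words(words, chunk_size=4, chunk_overlap=1):
    print(" ".join(chunk))
```

The overlap means the last word of one chunk reappears at the start of the next, so an idea that straddles a boundary is not cut off from its context.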

**Creating vectors with embeddings**

Embeddings turn text into numbers that computers can compare. An embedding model maps each piece of text (here, each chunk) to a list of numbers that reflects its meaning. This list is called a vector. Texts with similar meanings get vectors that lie close together, so we can compare vectors to find the chunks most relevant to a question.

In [18]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
embeddings_folder = "/content/embedding_model/"


embeddings = HuggingFaceEmbedding(
    model_name=embedding_model,
    cache_folder=embeddings_folder,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [21]:
test_text = "Sina is my friend"
query_result = embeddings.get_text_embedding(test_text)
query_result

[-0.08157619088888168,
 0.04829923063516617,
 -0.09897464513778687,
 -0.017182614654302597,
 -0.03681368753314018,
 ...]

In [22]:
characters = len(test_text)
dimensions = len(query_result)
print(
    f"The {characters} character sentence was transformed into a {dimensions} dimension vector"
)

The 17 character sentence was transformed into a 384 dimension vector
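
Comparing two vectors usually means computing their cosine similarity: values near 1 mean the vectors point in almost the same direction, values near 0 mean they are unrelated. The sketch below uses made-up 3-dimensional vectors purely for illustration. The real vectors from `embeddings.get_text_embedding` have 384 dimensions, but the formula is the same:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both non-zero, same length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example (not real model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print("cat vs kitten: ", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

With real embeddings, the same function could be applied to the vectors of two sentences to check that related sentences score higher than unrelated ones.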


**Creating a vector database**

Imagine a library where books aren't just filed alphabetically, but also by their themes, characters, and emotions. A vector database works in a similar way: it stores the vector of every chunk and, given a question, returns the chunks whose vectors are closest to the question's vector, even when they share no keywords.

In [23]:
from llama_index.core import VectorStoreIndex

documents = SimpleDirectoryReader("/content/data").load_data()

vector_index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    embed_model=embeddings,
)
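
Conceptually, a vector index stores one vector per chunk and, at query time, returns the `top_k` chunks closest to the query vector. The brute-force sketch below illustrates only that idea. `ToyVectorStore` is a hypothetical class, the vectors are invented, and this is not how `VectorStoreIndex` is implemented internally:

```python
import heapq
import math


def normalize(vector):
    """Scale a non-zero vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, unit vector) pairs and scans them all."""

    def __init__(self):
        self._rows = []

    def add(self, text, vector):
        self._rows.append((text, normalize(vector)))

    def query(self, vector, top_k=2):
        q = normalize(vector)
        scored = (
            (sum(a * b for a, b in zip(q, vec)), text) for text, vec in self._rows
        )
        # Return the top_k (score, text) pairs, best match first.
        return heapq.nlargest(top_k, scored)


store = ToyVectorStore()
store.add("Justice in the city", [0.9, 0.1, 0.0])
store.add("The allegory of the cave", [0.1, 0.9, 0.1])
store.add("Education of the guardians", [0.7, 0.3, 0.1])

for score, text in store.query([1.0, 0.0, 0.0], top_k=2):
    print(f"{score:.2f}  {text}")
```

Scanning every row is fine for a toy example; real vector stores typically add indexing structures so that search stays fast as the number of chunks grows.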

Now we save the index to disk so that it doesn't have to be rebuilt in every session.

In [24]:
vector_index.storage_context.persist(persist_dir="/content/vector_index")

In [26]:
# The code below is for later, when we want to load the saved index from disk:

# from llama_index.core import StorageContext, load_index_from_storage

# storage_context = StorageContext.from_defaults(persist_dir="/content/vector_index")
# vector_index = load_index_from_storage(storage_context, embed_model=embeddings)

**Adding a prompt**

We can guide our model's behavior with a prompt.

In [28]:
from llama_index.core.prompts import PromptTemplate

input_template = """Here is the context:
{context_str}

Answer the question based only on the context above. Keep your answers short and succinct.
Question to be answered: {query_str}
Answer:"""

prompt = PromptTemplate(template=input_template)
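
At query time, the two placeholders are filled in: `{context_str}` with the retrieved chunks and `{query_str}` with the user's question. LlamaIndex does this internally. The sketch below reproduces the idea with plain `str.format`, using a shortened stand-in template and placeholder chunk text (both are assumptions for illustration):

```python
# Shortened stand-in for the notebook's template, with the same two placeholders.
demo_template = """Here is the context:
{context_str}

Question to be answered: {query_str}
Answer:"""

# Placeholder strings standing in for chunks returned by the retriever.
retrieved_chunks = [
    "First retrieved chunk (placeholder text).",
    "Second retrieved chunk (placeholder text).",
]

filled_prompt = demo_template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="Who is Plato?",
)
print(filled_prompt)
```

Printing the filled prompt like this is a quick way to see exactly what the LLM receives.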

**RAG - chaining it all together**

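This is the final piece of the puzzle: we now drive everything with a query engine. Our vector database, our prompt, and our LLM combine to give us retrieval-augmented generation (RAG): the engine retrieves the chunks most similar to the question, inserts them into the prompt, and asks the LLM to answer from them.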

In [30]:
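query_engine = vector_index.as_query_engine(
    llm=llm,
    text_qa_template=prompt,
    # In current LlamaIndex versions, "tree_summarize" takes its prompt from
    # summary_template, so the custom prompt is passed there too.
    summary_template=prompt,
    similarity_top_k=2,
    response_mode="tree_summarize",
)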

Let's test our model.

In [36]:
answer = await query_engine.aquery("Who is Plato?")
print(answer)

Plato is a philosopher who has deeply meditated on the 'way of life of Pythagoras' and his followers, and has written about ideal states and forms of government, criticizing both democracy and tyranny, and envisioning a voluntary rule over voluntary subjects.
