## Embedding and Retrieval

- LLMs are trained on a huge corpus of data, yet they lack information about an organization's internal or confidential data.
- Methods to make an LLM work with your data:
  - Fine-tuning
  - RAG (Retrieval-Augmented Generation)
- RAG is a first step: it provides additional context to the LLM so it can start using your internal data.
- Components of RAG
    - Data Preparation (Extraction, Chunking)
    - Embedding
    - Storing in Vector DB
    - Retrieval
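
The components above feed one final step: the retrieved chunks are placed into the prompt that is sent to the LLM. A minimal sketch of that step follows. The helper `build_rag_prompt` and the prompt wording are hypothetical and are not part of LlamaIndex.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that grounds the LLM in retrieved context.

    Hypothetical helper for illustration; the template wording is an assumption.
    """
    # Number each chunk so the individual pieces of context stay distinguishable.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_rag_prompt(
    "What is Software 2.0?",
    [
        "Software 2.0 is neural network weights.",
        "Hugging Face is like GitHub for Software 2.0.",
    ],
)
print(prompt)
```

The LLM never sees the vector database itself; it only sees whatever text the retriever selected, which is why chunking and retrieval quality matter so much.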

In [14]:
# Use LlamaIndex as the LLM framework

## Data Preparation

- Use a YouTube video transcript reader as the data loader for YouTube videos.
- Load the source and split it into granular chunks for embedding.

In [5]:

# Use YouTube transcripts as the data source
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

YOUTUBE_VIDEO_LINKS = [
    "https://youtu.be/LCEmiRjPEtQ?si=YYiRBrq7Ho3NYf6F",
    "https://youtu.be/c3b-JASoPi0?si=9kk_0wF7e5d5oXBY",
]

data_loader = YoutubeTranscriptReader()
documents = data_loader.load_data(
    ytlinks=YOUTUBE_VIDEO_LINKS,
)

print(f"Loaded {len(documents)} documents from YouTube transcripts.")

Loaded 2 documents from YouTube transcripts.


In [9]:
from llama_index.core.text_splitter import SentenceSplitter

CHUNK_SIZE = 500  # Maximum size of each chunk, measured in tokens
CHUNK_OVERLAP = 50  # Tokens shared between consecutive chunks

sentence_splitter = SentenceSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
)

chunks = sentence_splitter.get_nodes_from_documents(documents, show_progress=True)

print(f"The documents have been split into {len(chunks)} chunks.\n\n")

print(chunks[0].text)

Parsing nodes: 100%|██████████| 2/2 [00:00<00:00, 18.86it/s]

The documents have been split into 50 chunks.


Please welcome former director of AI
Tesla Andre Carpathy.
[Music]
Hello.
[Music]
Wow, a lot of people here. Hello.
Um, okay. Yeah. So I'm excited to be
here today to talk to you about software
in the era of AI. And I'm told that many
of you are students like bachelors,
masters, PhD and so on. And you're about
to enter the industry. And I think it's
actually like an extremely unique and
very interesting time to enter the
industry right now. And I think
fundamentally the reason for that is
that um software is changing uh again.
And I say again because I actually gave
this talk already. Um but the problem is
that software keeps changing. So I
actually have a lot of material to
create new talks and I think it's
changing quite fundamentally. I think
roughly speaking software has not
changed much on such a fundamental level
for 70 years. And then it's changed I
think about twice quite rapidly in the
last few years. And so there's just a
huge a
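
To make the roles of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a simplified sketch of overlapping windows. It is not how `SentenceSplitter` works internally: the real splitter counts tokens and prefers sentence boundaries, while this hypothetical `split_with_overlap` helper slides a fixed window over a list of words.

```python
def split_with_overlap(
    words: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Slide a fixed-size window over `words`, sharing `chunk_overlap` items
    between consecutive chunks. Illustrative only, not the LlamaIndex algorithm."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    result = []
    for start in range(0, len(words), step):
        result.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the input
    return result


words = [f"w{i}" for i in range(10)]
demo_chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for demo_chunk in demo_chunks:
    print(demo_chunk)
```

Each chunk starts with the last word of the previous one, so a sentence that straddles a boundary is less likely to lose its context in either chunk.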




## Embedding the source

- Use a local embedding model from Hugging Face.
- The embedding model used here is `intfloat/multilingual-e5-large-instruct`.

In [7]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

EMBEDDING_MODEL_NAME = "intfloat/multilingual-e5-large-instruct"
embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL_NAME, trust_remote_code=True)

In [15]:
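# Embed each chunk and attach the resulting vector to its node
for chunk in chunks:
    chunk.embedding = embed_model.get_text_embedding(chunk.text)

# Dimensionality of a single embedding vector
print(len(chunks[0].embedding))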

1024
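
Each chunk is now a 1024-dimensional vector, and retrieval works by comparing such vectors. Cosine similarity is a common comparison; the sketch below computes it with the standard library only, using tiny made-up vectors in place of real embeddings. The function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings here have 1024 dimensions.
same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

A higher value means the two texts point in a more similar direction in embedding space, regardless of vector length.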


In [16]:
# Use Qdrant as the vector database
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "Understanding_AI_Agents"

def get_vector_store():
    """Connect to the local Qdrant instance and wrap it as a LlamaIndex vector store."""
    client = qdrant_client.QdrantClient(url=QDRANT_URL)
    return QdrantVectorStore(
        client=client,
        collection_name=COLLECTION_NAME,
    )

qdrant_vector_store = get_vector_store()

In [17]:
# Add the chunks to the Qdrant vector store; the cell output lists the stored node IDs

qdrant_vector_store.add(
    nodes=chunks,
    show_progress=True,
)

['0dc628ed-2777-488d-a5b2-5f6a0db62ef1',
 '29148353-c4d0-4b6c-9d04-881a4e0ef786',
 '0dfe7f7d-374d-4b2d-884f-39d296cc1e5f',
 '76b0661b-65f6-47e6-a400-2be2128adf9d',
 'caab1d8e-d675-4bab-a47d-cf64c6918fa4',
 '22b39cb1-60b8-4466-b32f-c529414c421b',
 'dc4bbb96-bfa3-4f7c-b807-f7e477f86918',
 'ccab07b2-3209-4a13-9e99-f2402fda35c0',
 'df3ecf17-1a1b-4d07-9f5f-83aac58fb852',
 '79f3bb4f-a675-4709-bda7-fc42882f8ffd',
 'e7426219-08a8-4754-b115-41ef9953a64d',
 'df394fcc-7c7c-41ef-a99c-2782ce6caa2e',
 'cddb69c0-f9c5-4836-8c4e-8754691d55a7',
 'f8658ccb-da32-46c4-abdc-29c854cb86a6',
 '8eaf5bd1-4fb7-4964-9f24-4d427127369b',
 '9ded4459-0935-41ea-97c4-067c9f1b7124',
 '33a1ca58-507e-43bf-b7b1-7b5df09fcd6e',
 'dd5ac7cf-3a57-40b2-b558-4d9c391b81ae',
 '00de744b-2f9d-4b96-b46c-1d796f3abbf3',
 '2cf829c2-57dd-4e70-a532-22ff8abc175f',
 '127c310d-5185-43e3-9112-948b79e7bcbb',
 '36814c9b-ba99-4f7e-9c66-78dab5b4f7b7',
 '14ac6d04-7de9-4f16-a70e-28e684b840c1',
 '9229249c-9750-49d7-8c73-398b657fc3ff',
 'b358d3a6-9d32-
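
Conceptually, a vector collection maps an ID to a vector plus a payload, enforces one vector size for the whole collection, and overwrites an entry when the same ID is written again. The toy class below sketches those ideas in memory. `ToyVectorCollection` and its methods are hypothetical and are not the Qdrant API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyVectorCollection:
    """Hypothetical in-memory stand-in for a vector collection (not the Qdrant API)."""

    dim: int
    _points: dict[str, tuple[list[float], dict]] = field(default_factory=dict)

    def _normalize(self, vector: list[float]) -> list[float]:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors cannot be normalized")
        return [x / norm for x in vector]

    def upsert(self, point_id: str, vector: list[float], payload: dict) -> None:
        # Writing an existing ID replaces the earlier vector and payload.
        self._points[point_id] = (self._normalize(vector), payload)

    def search(self, query: list[float], top_k: int) -> list[tuple[str, float]]:
        # Vectors are stored normalized, so the dot product equals cosine similarity.
        q = self._normalize(query)
        scored = [
            (point_id, sum(a * b for a, b in zip(q, vec)))
            for point_id, (vec, _payload) in self._points.items()
        ]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


collection = ToyVectorCollection(dim=2)
collection.upsert("a", [1.0, 0.0], {"text": "chunk a"})
collection.upsert("b", [0.0, 1.0], {"text": "chunk b"})
collection.upsert("a", [1.0, 1.0], {"text": "chunk a, re-embedded"})  # same ID: overwrite
hits = collection.search([1.0, 0.9], top_k=1)
print(hits)
```

A real vector database adds persistence, approximate nearest-neighbour indexing, and filtering on payloads, which this sketch leaves out.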

## Retrieval

1. Now that the source documents are embedded and stored, we can embed a query and retrieve the most relevant chunks before summarization.
2. Create a retriever from the vector store to retrieve the relevant chunks.

In [22]:
from llama_index.core import QueryBundle

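query = "what is software 3.0? by Andrej Karpathy"

# Embed the query with the same model that embedded the chunks
query_embeddings = embed_model.get_text_embedding(query)

# Keep the query text alongside its precomputed embedding
query_bundle = QueryBundle(query_str=query, embedding=query_embeddings)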

print(len(query_embeddings))

1024


In [29]:
from llama_index.core import VectorStoreIndex

SIMILARITY_TOP_K = 3

vector_store_index = VectorStoreIndex.from_vector_store(
    vector_store=qdrant_vector_store, embed_model=embed_model
)

qdrant_retriever = vector_store_index.as_retriever(similarity_top_k=SIMILARITY_TOP_K)
retrieved_nodes = qdrant_retriever.retrieve(query_bundle)

In [30]:
for node in retrieved_nodes:
    print(f"Retrieved Node: {node.text}\n")
    print(f"Similarity Score: {node.score}\n")
    print("-" * 80)

Retrieved Node: Software 2.0
know are basically neural networks and
in particular the weights of a neural
network and you're not writing this code
directly you are most you are more kind
of like tuning the data sets and then
you're running an optimizer to create to
create the parameters of this neural net
and I think like at the time neural nets
were kind of seen as like just a
different kind of classifier like a
decision tree or something like that and
so I think it was kind of like um I
think this framing was a lot more
appropriate and now actually what we
have is kind of like an equivalent of
GitHub in the realm of software 2.0 And
I think the hugging face is basically
equivalent of GitHub in software 2.0.
And there's also model atlas and you can
visualize all the code written there. In
case you're curious, by the way, the
giant circle, the point in the middle,
uh these are the parameters of flux, the
image generator. And so anytime someone
tunes a on top of a flux model, you
basica
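
A top-k retriever always returns k nodes, even when some of them are only loosely related to the query. One common guard is to drop results below a score threshold. The sketch below shows the idea on made-up data: the `ScoredChunk` type, the scores, and the cutoff value are all illustrative assumptions and are not taken from the run above.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    """Hypothetical stand-in for a retrieved node with its similarity score."""

    text: str
    score: float


def apply_similarity_cutoff(
    results: list[ScoredChunk], cutoff: float
) -> list[ScoredChunk]:
    """Keep only results whose score is at least `cutoff`, preserving order."""
    return [result for result in results if result.score >= cutoff]


# Illustrative scores only; a sensible cutoff depends on the embedding model and data.
demo_results = [
    ScoredChunk("Software 2.0 is basically neural networks ...", 0.86),
    ScoredChunk("Hugging Face is the GitHub of Software 2.0 ...", 0.83),
    ScoredChunk("Wow, a lot of people here. Hello.", 0.71),
]
kept = apply_similarity_cutoff(demo_results, cutoff=0.8)
for result in kept:
    print(f"{result.score:.2f}  {result.text}")
```

LlamaIndex provides a comparable filter, `SimilarityPostprocessor` with a `similarity_cutoff` argument, in `llama_index.core.postprocessor`; check the documentation for your installed version before relying on it.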

## Further Reading & References

1. Embedding Models Benchmark (MTEB leaderboard) - https://huggingface.co/spaces/mteb/leaderboard
2. LlamaHub - https://llamahub.ai/