# Rag From Scratch: Overview [Open in Colab](https://colab.research.google.com/github/yonanicodes/rag/blob/main/rag_1.ipynb)

These notebooks walk through the process of building RAG app(s) from scratch.

They will build towards a broader understanding of the RAG langscape, as shown here:
<!--
![Screenshot 2024-03-25 at 8.30.33 PM.png](attachment:c566957c-a8ef-41a9-9b78-e089d35cf0b7.png) -->

## Enviornment

`(1) Packages`

In [88]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain
print("[INFO] Running in Google Colab, installing requirements.")
!pip install PyMuPDF # for reading PDFs with Python
!pip install tqdm # for progress bars
# !pip install sentence-transformers # for embedding models
# !pip install accelerate # for quantization model loading
# !pip install bitsandbytes # for quantizing models (less storage space)
# !pip install flash-attn --no-build-isolation # for faster attention mechanism = faster LLM inference
!pip install sentence-transformers # for embbeding a sentence in to numbers
!pip install langchain langchain_community

[INFO] Running in Google Colab, installing requirements.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/commands/install.py", line 447, in run
    conflicts = self._determine_conflicts(to_install)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/commands/install.py", line 578, in _determine_conflicts
    return check_install_conflicts(to_install)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/operations/check.py", line 101, in check_install_conflicts
    package_se

`(2) LangSmith`

https://docs.smith.langchain.com/

In [None]:
import os
os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] ='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'

`(3) API Keys`

In [None]:
# os.environ['OPENAI_API_KEY'] = <your-api-key>

## load the pdf data

In [None]:
# Requires !pip install PyMuPDF, see: https://github.com/pymupdf/pymupdf
import fitz # (pymupdf, found this is better than pypdf for our use case, note: licence is AGPL-3.0, keep that in mind if you want to use any code commercially)
from tqdm.auto import tqdm # for progress bars, requires !pip install tqdm

def text_formatter(text: str) -> str:
    """Performs minor formatting on text."""
    cleaned_text = text.replace("\n", " ").strip() # note: this might be different for each doc (best to experiment)

    # Other potential text formatting functions can go here
    return cleaned_text

# Open PDF and get lines/pages
# Note: this only focuses on text, rather than images/figures etc
def open_and_read_pdf(pdf_path: str) -> list[dict]:
    """
    Opens a PDF file, reads its text content page by page, and collects statistics.

    Parameters:
        pdf_path (str): The file path to the PDF document to be opened and read.

    Returns:
        list[dict]: A list of dictionaries, each containing the page number
        (adjusted), character count, word count, sentence count, token count, and the extracted text
        for each page.
    """
    doc = fitz.open(pdf_path)  # open a document
    pages_and_texts = []
    for page_number, page in tqdm(enumerate(doc)):  # iterate the document pages
        text = page.get_text()  # get plain text encoded as UTF-8
        text = text_formatter(text)
        pages_and_texts.append({"page_number": page_number - 41,  # adjust page numbers since our PDF starts on page 42
                                "page_char_count": len(text),
                                "page_word_count": len(text.split(" ")),
                                "page_sentence_count_raw": len(text.split(". ")),
                                "page_token_count": len(text) / 4,  # 1 token = ~4 chars, see: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
                                "text": text})
    return pages_and_texts






--- Page 1 ---
የዜግነት ትምህርት የመምህራን መምሪያ 8ኛ ክፍል የዜግነት ትምህርት የመምህራን መምሪያ 8ኛ ክፍል በትምህርት ሚኒስቴር አስተባባነት ተዘጋጅቶ ተስማምቶ የተተረጎመ ተርጓሚዎች 1. አስመራ ቤጊ 2. ሞቲ ድንቁ 3. ፍራንኦል ዋቅጅራ 4. ኃይሉ ብሩ 5. አብደና አመኑ አርታዒዎች 1. እየሩሳሌም ገርባባ 2. ፈጠነ ቡልቻ 3. አሸብር ሀብታሙ 4. ሰለሞን ንጉሴ 5. ስሪቃ ገመቹ ግራፍክስ ታደሰ ድንቁ ገምጋሚ ደረጀ መኮንን ዴሬሳ i

--- Page 2 ---
የዜግነት ትምህርት የመምህራን መምሪያ 8ኛ ክፍል © ኦሮሚያ ትምህርት ቢሮ 2014/2022 ይህ የተማሪዎች መጽሃፍ በኢፌዴሪ የትምህርት ምንስቴር አስተባባሪነት ተዘጋጅቶ በኦሮሚያ ትምህርት ቢሮ አሰተባባርነት ተስማምቶ ወደ አፋን ኦሮሞ የተተረጎመ ነዉ፡፡ የዚህ መጽሐፍ የባለቤትነት መብት በህግ የተጠበቀ ነው፡፡ከኦሮሚያ ትምህርት ቢሮ ፈቃድ ውጪ በሙሉም ሆነ በከፊል ማሳተምም ሆነ አባዝተው ማሰራጨት በህግ ያስጠይቃል፡፡ ii

--- Page 3 ---
የዜግነት ትምህርት የመምህራን መምሪያ 8ኛ ክፍል ማዉጫ ገጽ ምዕራፍ አንድ፣ መልካምነት ....................................................................................................................... 1 ይዘት1. የመልካምነት ፅንሰ ሃሳብ .......................................................................................................... 1 ይዘት2. የመልካምነት መርሆዎች .........................................................................................

In [47]:
pdf_path="./drive/MyDrive/Ethiopia_Constitution.pdf"
eng_pages_and_texts = open_and_read_pdf(pdf_path=pdf_path)
eng_pages_and_texts[:2]

0it [00:00, ?it/s]

[{'page_number': 1,
  'page_char_count': 1733,
  'page_word_count': 295,
  'page_sentence_count_raw': 1,
  'page_token_count': 433.25,
  'text': 'Constitution  of  The Federal Democratic Republic of Ethiopia    PREAMBLE    We, the Nations, Nationalities and Peoples of Ethiopia:   Strongly committed, in full and free exercise of our right to self-determination, to  building a political community founded on the rule of law and capable of ensuring  a lasting peace, guaranteeing a democratic order, and advancing our economic  and social development;   Firmly convinced that the fulfillment of this objective requires full respect of  individual and people’s fundamental freedoms and rights, to live together on the  basis of equality and without any sexual, religious or cultural discrimination;   Further convinced that by continuing to live with our rich and proud cultural  legacies in territories we have long inhabited, have, through continuous  interaction on various levels and forms of life

In [45]:
import random

random.sample(eng_pages_and_texts, k=3)

[{'page_number': 23,
  'page_char_count': 1358,
  'page_word_count': 250,
  'page_sentence_count_raw': 15,
  'page_token_count': 339.5,
  'text': '3. All international agreements and relations concluded, established or conducted by  the State shall protect and ensure Ethiopia’s right to sustainable development.   4. The basic aim of development activities shall be to enhance the capacity of  citizens for development and to meet their basic needs.   Article 44  Environmental Rights  1. All persons have the right to a clean and healthy environment.   2. All persons who have been displaced or whose livelihoods have been adversely  affected as a result of State programmes have the right to commensurate monetary  or alternative means of compensation, including relocation with adequate State  assistance.   CHAPTER FOUR  STATE STRUCTURE  Article 45  Form of Government  The Federal Democratic Republic of Ethiopia shall have a parliamentarian form  of government.   Article 46  States of the Fed

In [52]:
import pandas as pd

df = pd.DataFrame(eng_pages_and_texts)
df.head()

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,text
0,1,1733,295,1,433.25,Constitution of The Federal Democratic Repub...
1,2,1563,282,17,390.75,CHAPTER ONE GENERAL PROVISIONS Article 1 No...
2,3,1658,304,24,414.5,Article 6 Nationality 1. Any person of eithe...
3,4,1520,289,20,380.0,Article 11 Separation of State and Religion ...
4,5,2036,385,23,509.0,Article 16 The Right of the Security of Perso...


In [53]:
df.describe()

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count
count,50.0,50.0,50.0,50.0,50.0
mean,25.5,2094.12,378.34,22.4,523.53
std,14.57738,460.86802,81.131943,6.770283,115.217005
min,1.0,674.0,120.0,1.0,168.5
25%,13.25,1799.75,329.25,18.0,449.9375
50%,25.5,2044.5,370.0,23.0,511.125
75%,37.75,2454.75,438.0,26.0,613.6875
max,50.0,2907.0,518.0,39.0,726.75


In [55]:
from spacy.lang.en import English # see https://spacy.io/usage for install instructions

nlp = English()

# Add a sentencizer pipeline, see https://spacy.io/api/sentencizer/
nlp.add_pipe("sentencizer")
for item in tqdm(eng_pages_and_texts):
    item["sentences"] = list(nlp(item["text"]).sents)

    # Make sure all sentences are strings
    item["sentences"] = [str(sentence) for sentence in item["sentences"]]

    # Count the sentences
    item["page_sentence_count_spacy"] = len(item["sentences"])

  0%|          | 0/50 [00:00<?, ?it/s]

In [56]:
# Inspect an example
random.sample(eng_pages_and_texts, k=1)

[{'page_number': 5,
  'page_char_count': 2036,
  'page_word_count': 385,
  'page_sentence_count_raw': 23,
  'page_token_count': 509.0,
  'text': 'Article 16  The Right of the Security of Person  Every one has the right to protection against bodily harm.   Article 17  Right to Liberty  1. No one shall be deprived of his or her liberty except on such grounds and in  accordance with such procedure as are established by law.   2. No person may be subjected to arbitrary arrest, and no person may be detained  without a charge or conviction against him.   Article 18  Prohibition against Inhuman Treatment  1. Everyone has the right to protection against cruel, inhuman or degrading  treatment or punishment.   2. No one shall be held in slavery or servitude. Trafficking in human beings for  whatever purpose is prohibited.   3. No one shall be required to perform forced or compulsory labour.   4. For the purpose of sub-Article 3 of this Article the phrase "forced or compulsory  labour" shall not 

In [58]:
df = pd.DataFrame(eng_pages_and_texts)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy
count,50.0,50.0,50.0,50.0,50.0,50.0
mean,25.5,2094.12,378.34,22.4,523.53,22.12
std,14.58,460.87,81.13,6.77,115.22,6.81
min,1.0,674.0,120.0,1.0,168.5,1.0
25%,13.25,1799.75,329.25,18.0,449.94,17.25
50%,25.5,2044.5,370.0,23.0,511.12,22.5
75%,37.75,2454.75,438.0,26.0,613.69,26.0
max,50.0,2907.0,518.0,39.0,726.75,38.0


In [59]:
# Define split size to turn groups of sentences into chunks
num_sentence_chunk_size = 11

# Create a function that recursively splits a list into desired sizes
def split_list(input_list: list,
               slice_size: int) -> list[list[str]]:
    """
    Splits the input_list into sublists of size slice_size (or as close as possible).

    For example, a list of 17 sentences would be split into two lists of [[10], [7]]
    """
    return [input_list[i:i + slice_size] for i in range(0, len(input_list), slice_size)]

# Loop through pages and texts and split sentences into chunks
for item in tqdm(eng_pages_and_texts):
    item["sentence_chunks"] = split_list(input_list=item["sentences"],
                                         slice_size=num_sentence_chunk_size)
    item["num_chunks"] = len(item["sentence_chunks"])

  0%|          | 0/50 [00:00<?, ?it/s]

In [63]:
# Sample an example from the group (note: many samples have only 1 chunk as they have <=10 sentences total)
random.sample(eng_pages_and_texts, k=1)

[{'page_number': 25,
  'page_char_count': 2601,
  'page_word_count': 452,
  'page_sentence_count_raw': 37,
  'page_token_count': 650.25,
  'text': '4. Residents of Addis Ababa shall in accordance with the provisions of this  Constitution, be represented in the House of Peoples’ Representatives.   5. The special interest of the State of Oromia in Addis Ababa, regarding the  provision of social services or the utilization of natural resources and other similar  matters, as well as joint administrative matters arising from the location of Addis  Ababa within the State of Oromia, shall be respected. Particulars shall be  determined by law.   CHAPTER FIVE  THE STRUCTURE AND DIVISION OF POWERS   Article 50  Structure of the Organs of State  1. The Federal Democratic Republic of Ethiopia comprises the Federal Government  and the State members.   2. The Federal Government and the States shall have legislative, executive and  judicial powers.   3. The House of Peoples’ Representatives is the hi

In [65]:
# Create a DataFrame to get stats
df = pd.DataFrame(eng_pages_and_texts)
df.describe().round(2)

Unnamed: 0,page_number,page_char_count,page_word_count,page_sentence_count_raw,page_token_count,page_sentence_count_spacy,num_chunks
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,25.5,2094.12,378.34,22.4,523.53,22.12,2.5
std,14.58,460.87,81.13,6.77,115.22,6.81,0.65
min,1.0,674.0,120.0,1.0,168.5,1.0,1.0
25%,13.25,1799.75,329.25,18.0,449.94,17.25,2.0
50%,25.5,2044.5,370.0,23.0,511.12,22.5,2.5
75%,37.75,2454.75,438.0,26.0,613.69,26.0,3.0
max,50.0,2907.0,518.0,39.0,726.75,38.0,4.0


In [72]:
import re

# Split each chunk into its own item
eng_pages_and_chunks = []
for item in tqdm(eng_pages_and_texts):
    for sentence_chunk in item["sentence_chunks"]:
        chunk_dict = {}
        chunk_dict["page_number"] = item["page_number"]

        # Join the sentences together into a paragraph-like structure, aka a chunk (so they are a single string)
        joined_sentence_chunk = "".join(sentence_chunk).replace("  ", " ").strip()
        joined_sentence_chunk = re.sub(r'\.([A-Z])', r'. \1', joined_sentence_chunk) # ".A" -> ". A" for any full-stop/capital letter combo
        chunk_dict["sentence_chunk"] = joined_sentence_chunk

        # Get stats about the chunk
        chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
        chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])
        chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4 # 1 token = ~4 characters

        eng_pages_and_chunks.append(chunk_dict)

# How many chunks do we have?
len(eng_pages_and_chunks)

  0%|          | 0/50 [00:00<?, ?it/s]

125

In [75]:
# View a random sample
random.sample(eng_pages_and_chunks, k=1)

[{'page_number': 24,
  'sentence_chunk': 'Article 49 Capital City 1. Addis Ababa shall be the capital city of the Federal State. 2. The residents of Addis Ababa shall have a full measure of self-government. Particulars shall be determined by law. 3. The Administration of Addis Ababa shall be responsible to the Federal Government.',
  'chunk_char_count': 289,
  'chunk_word_count': 48,
  'chunk_token_count': 72.25}]

In [77]:
# Get stats about our chunks
df = pd.DataFrame(eng_pages_and_chunks)
df.describe().round(2)

Unnamed: 0,page_number,chunk_char_count,chunk_word_count,chunk_token_count
count,125.0,125.0,125.0,125.0
mean,25.14,819.18,133.46,204.79
std,13.86,370.08,60.76,92.52
min,1.0,33.0,6.0,8.25
25%,14.0,618.0,99.0,154.5
50%,25.0,830.0,133.0,207.5
75%,37.0,1052.0,172.0,263.0
max,50.0,1787.0,299.0,446.75


In [78]:
# Show random chunks with under 30 tokens in length if they are worth watching
min_token_length = 30
for row in df[df["chunk_token_count"] <= min_token_length].sample(5).iterrows():
    print(f'Chunk token count: {row[1]["chunk_token_count"]} | Text: {row[1]["sentence_chunk"]}')

Chunk token count: 18.5 | Text: 2. Human and democratic rights of citizens and peoples shall be respected.
Chunk token count: 8.25 | Text: On appearing before a court, they
Chunk token count: 22.25 | Text: He exercises overall supervision over the implementation of the country’s foreign policy.
Chunk token count: 26.5 | Text: 3. In all its decisions, the Council of Ministers is responsible to the House of Peoples’ Representatives.
Chunk token count: 18.25 | Text: 4. The armed forces shall at all times obey and respect the Constitution.


##Extract chunks





In [85]:
chunks =[doc["sentence_chunk"] for doc in eng_pages_and_chunks]
chunks[0]

'Constitution of The Federal Democratic Republic of Ethiopia  PREAMBLE  We, the Nations, Nationalities and Peoples of Ethiopia:  Strongly committed, in full and free exercise of our right to self-determination, to building a political community founded on the rule of law and capable of ensuring a lasting peace, guaranteeing a democratic order, and advancing our economic and social development;  Firmly convinced that the fulfillment of this objective requires full respect of individual and people’s fundamental freedoms and rights, to live together on the basis of equality and without any sexual, religious or cultural discrimination;  Further convinced that by continuing to live with our rich and proud cultural legacies in territories we have long inhabited, have, through continuous interaction on various levels and forms of life, built up common interest and have also contributed to the emergence of a common outlook;  Fully cognizant that our common destiny can best be served by rectify

##Define the Generative Ai

In [86]:

from langchain_core.runnables import Runnable
import google.generativeai as genai

genai.configure(api_key="AIzaSyB-6JkVlNsg89fp8tIJfpTwVcVS6g-Y5uQ")
gemini_model = genai.GenerativeModel("gemini-2.0-flash")

class GeminiLLM(Runnable):
    def invoke(self, input, config=None):
        # input is usually a dict with "messages" or a formatted string prompt
        # LangChain passes a dict like {'messages': [HumanMessage(...), ...]}
        if isinstance(input, dict) and "messages" in input:
            # Extract and join message contents
            prompt_str = "\n".join(m.content for m in input["messages"])
        else:
            prompt_str = str(input)

        response = gemini_model.generate_content(prompt_str)
        return response.text


llm = GeminiLLM()


In [87]:

from langchain.embeddings import HuggingFaceEmbeddings
# from langchain.embedddings import OpenAIEmbeddings
# Load a model like all-MiniLM-L6-v2 (fast and small)

# embeddings= OpenAIEmbeddings()

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

  embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
operator torchvision::nms does not exist

In [None]:
embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

## Part 1: Overview

[RAG quickstart](https://python.langchain.com/docs/use_cases/question_answering/quickstart)

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(pages)
len(splits)

209

In [None]:
import bs4
from langchain import hub

from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

#### INDEXING ####

# Embed
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=embeddings)


retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

#### RETRIEVAL and GENERATION ####

# Prompt
prompt = hub.pull("rlm/rag-prompt")

# LLM


# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)





In [None]:
# Question
docs = retriever.get_relevant_documents("ኮንፌዴራላዊ የመንግስት አወቃቀር ?")
# rag_chain.invoke("ሰባዊ መብት ምንድን ነው, if you dont know return me back the docs")
docs[0].page_content,len(docs)

('ነበር? \n➢ ተማሪዎቹ በጥንድ፣በቡድን የተከናወኑ ስራዎችን ምን ያህል አብረው ሰርተዋል? \n➢ በትምህርት  ክፍለ ጊዜ ተማሪዎቹ የሰለቹበት ነጥብ/ጉዳይ  አለ? \nበተጨማሪም \nለጋስነት ማለት ምን ማለት እንደሆነ ተማሪዎቹን  ጠይቋቸው። ተማሪዎቹ የለግስነትን \nአስፈላጊነት እንዲያስታውሱ የተለያዩ ጥያቄዎችን ጠይቁ።ተማሪዎቹ የተግባር ክንውኑን \nሲሰሩ በመከታታል ብቃታቸውን መዝግቡ። የእያንዳንዱን ተማሪ ተሳትፎና ሙከራ \nበመገምገም ግብረ መልስ  በመስጠት የክፍለ ትምህርቱን ማጠቃለያ ስጡ።',
 2)

## Part 2: Indexing

<!-- ![Screenshot 2024-02-12 at 1.36.56 PM.png](attachment:d1c0f19e-1f5f-4fc6-a860-16337c1910fa.png) -->

In [None]:
# Documents
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

[Count tokens](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb) considering [~4 char / token](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)

In [None]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string(question, "cl100k_base")

8

[Text embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai)

In [None]:
# from langchain_openai import OpenAIEmbeddings
# embd = OpenAIEmbeddings()
# query_result = embd.embed_query(question)
# document_result = embd.embed_query(document)
# len(query_result)

from langchain.embeddings import HuggingFaceEmbeddings

# Load a model like all-MiniLM-L6-v2 (fast and small)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

query_result = embeddings.embed_query("your question here")
document_result = embeddings.embed_query("your document here")

print(len(query_result))  # Should be 384 for MiniLM


Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=1.11.0->sentence-transformers)
 

  embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

384


[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is reccomended (1 indicates identical) for OpenAI embeddings.

In [None]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.4266147973019836


[Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/)

In [None]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

In [None]:
len(blog_docs)



1

[Splitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter)

> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.

In [None]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

In [None]:
len(splits)

52

In [None]:

type(splits[0]),splits[0]

(langchain_core.documents.base.Document,
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over p

[Vectorstores](https://python.langchain.com/docs/integrations/vectorstores/)

In [None]:
# Index


# from langchain_openai import OpenAIEmbeddings
# from langchain_community.vectorstores import Chroma
# vectorstore = Chroma.from_documents(documents=splits,
#                                     embedding=OpenAIEmbeddings())

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

## Retrival

In [None]:
docs = retriever.get_relevant_documents("What is Task Decomposition?")

  docs = retriever.get_relevant_documents("What is Task Decomposition?")


In [None]:
len(docs)

docs[0]

Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search)

## Part 4: Generation

![Screenshot 2024-02-12 at 1.37.38 PM.png](attachment:f9b0e284-58e4-4d33-9594-2dad351c569a.png)

In [None]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])

In [None]:
# # LLM
# llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)



from langchain_core.runnables import Runnable
import google.generativeai as genai

genai.configure(api_key="AIzaSyB-6JkVlNsg89fp8tIJfpTwVcVS6g-Y5uQ")
gemini_model = genai.GenerativeModel("gemini-2.0-flash")

class GeminiLLM(Runnable):
    def invoke(self, input, config=None):
        # input is usually a dict with "messages" or a formatted string prompt
        # LangChain passes a dict like {'messages': [HumanMessage(...), ...]}
        if isinstance(input, dict) and "messages" in input:
            # Extract and join message contents
            prompt_str = "\n".join(m.content for m in input["messages"])
        else:
            prompt_str = str(input)

        response = gemini_model.generate_content(prompt_str)
        return response.text

from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}"
)

llm = GeminiLLM()





In [None]:
# Chain
chain = prompt | llm

In [None]:
# Run
chain.invoke({"context":docs,"question":"What is Task Decomposition?"})



'Task Decomposition is a method of breaking down a complex task into smaller, more manageable steps. It can be done by LLMs with simple prompting, using task-specific instructions, or with human inputs. Chain of Thought (CoT) and Tree of Thoughts are techniques used for task decomposition.\n'

In [None]:
from langchain import hub
prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [None]:
prompt_hub_rag

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])

[RAG chains](https://python.langchain.com/docs/expression_language/get_started#rag-search-example)

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")