<a href="https://colab.research.google.com/github/Aditya100300/LLMs_from_scratch/blob/main/Chapter_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Cell 1: Installation of Dependencies

# We install several libraries here:
# - sentence_transformers (local, open-source embedding models)
# - openai (optional; only needed for the OpenAI API calls later in the notebook)
# - unstructured (loading text-based documents such as PDFs)
# - plotly (optional, for charts)
# - langchain (framework for retrieval-based LLM pipelines)
# - tiktoken (token counting for OpenAI models)
# - matplotlib (general-purpose plotting)
# - plus the extras in the %pip line below (gradio, pdf2image, wandb, chromadb, faiss-cpu, etc.)

!pip install sentence_transformers openai unstructured
!pip install plotly
!pip install langchain
!pip install tiktoken
!pip install matplotlib

# Quietly install/upgrade the extended set of libraries (-U upgrades, -qqq suppresses most output).
# faiss-cpu replaces faiss-gpu here: PyPI may have no faiss-gpu wheel for Colab's Python 3.11, and one
# failed package makes the whole line fail. FAISS is not used later in this notebook.
%pip install -Uqqq rich openai tiktoken wandb langchain unstructured tabulate pdf2image chromadb gradio faiss-cpu

# Optional: linkify-it-py adds URL/link detection in text
!pip install linkify-it-py


Collecting unstructured
  Downloading unstructured-0.17.2-py3-none-any.whl.metadata (24 kB)
Collecting filetype (from unstructured)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting python-magic (from unstructured)
  Downloading python_magic-0.4.27-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting emoji (from unstructured)
  Downloading emoji-2.14.1-py3-none-any.whl.metadata (5.7 kB)
Collecting dataclasses-json (from unstructured)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting python-iso639 (from unstructured)
  Downloading python_iso639-2025.2.18-py3-none-any.whl.metadata (14 kB)
Collecting langdetect (from unstructured)
  Downloading langdetect-1.0.9.tar.gz (981 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 981.5/981.5 kB 14.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting rapidfuzz (from unstructured)
  Downloading rapidfuzz-3.12.2-cp311

In [2]:
# Cell 2: Imports and Basic Setup

import os, sys
import pandas as pd
import numpy as np
from pathlib import Path
import torch
import csv

# Raise the csv module's per-field size limit (default 131072 characters) so very large fields can be read
csv.field_size_limit(sys.maxsize)

# Optional: enter your OpenAI API key without echoing it (only needed for the OpenAI cells)
# from getpass import getpass
# openai_key = getpass("Enter your OpenAI API key...")

print("Libraries imported successfully!")


Libraries imported successfully!
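`csv.field_size_limit(sys.maxsize)` works on 64-bit Linux such as Colab, but on platforms where a C `long` is 32 bits (notably Windows) the same call raises `OverflowError`. A common portable workaround, sketched below, retries with a smaller limit until the call succeeds. The helper name `raise_csv_field_limit` is hypothetical, not part of the original notebook.

```python
import csv
import sys

def raise_csv_field_limit():
    """Set the csv field size limit as high as the platform allows and return it."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform; back off and retry
            limit //= 10

new_limit = raise_csv_field_limit()
print("csv field size limit:", new_limit)
```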


In [3]:
# Cell 3: Basic Example of a Local BERT Sentence Embedding

# This step is optional: it shows a local sentence embedding with plain 'bert-base-uncased' via transformers.
# For retrieval, purpose-built sentence_transformers models such as 'all-MiniLM-L6-v2' or 'all-mpnet-base-v2' usually work better.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

text = "This is a short example text to see if BERT encodes it."
encoded = tokenizer.encode(text, return_tensors='pt')

with torch.no_grad():
    output = model(encoded)
    # We can get the average (mean) of last_hidden_state as a rough sentence embedding
    sentence_embedding = output.last_hidden_state.mean(dim=1)

print("Embedding shape:", sentence_embedding.shape)
print("Done local BERT example!")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Embedding shape: torch.Size([1, 768])
Done local BERT example!
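`last_hidden_state.mean(dim=1)` averages every token position, including `[CLS]` and `[SEP]`. That is fine for one unpadded sentence, but in a padded batch it would also average the padding vectors. The usual fix is masked mean pooling: sum only the real tokens (those where the tokenizer's `attention_mask` is 1) and divide by their count. The sketch below shows the arithmetic on plain Python lists; the name `masked_mean_pool` is hypothetical. In PyTorch the same idea is `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)` for hidden states `h` and mask `m`.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions where attention_mask is 0.

    token_vectors: list of T vectors (each a list of D floats) for one sequence
    attention_mask: list of T ints (1 = real token, 0 = padding)
    """
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for j in range(dim):
                totals[j] += vec[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in totals]

# Toy example: 3 real tokens followed by 1 padding token with large values
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(masked_mean_pool(tokens, mask))  # [3.0, 4.0]
```

Without the mask, the padding row would drag the average to `[27.25, 28.0]`, which is why batched embedding code needs it.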


In [4]:
# Cell 4: A sample text to demonstrate chunking and summarizing.
# Any text source works; here we store the text in a variable and save it to example_text.txt for the loader.

example_text = '''I would like to get your thoughts on the bond yield increase this week.
I'm not worried about the market downturn, but the sudden increase in yields.
On 2/16, the 10-year bond yields increased by almost 9%, and on 2/19, the yield increased by 5%.
Some experts recall the 'taper tantrum' of 2013, where bond prices crashed after the Fed
announced it would begin tapering quantitative easing.
Others see it as a normal feature of an economic recovery.
'''

with open('example_text.txt', 'w') as f:
    f.write(example_text)

print("Sample text saved to example_text.txt")


Sample text saved to example_text.txt


In [6]:
%pip install -U langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.20-py3-none-any.whl.metadata (2.4 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading langchain_community-0.3.20-py3-none-any.whl (2.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m28.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading httpx_sse-0.4.0-py3-none-any.whl (7.8 kB)
Downloading pydantic_settings-2.8.1-py3-none-any.whl (30 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, httpx-sse, pydantic-settings, langchain-community
Succes

In [7]:
# Cell 5: Using LangChain's text loader + text splitter

# Load the text file from disk
from langchain_community.document_loaders import TextLoader
loader = TextLoader("example_text.txt")

# load() returns a list of Document objects (here, one for the whole file)
docs = loader.load()
print(f"We have {len(docs)} document(s). First doc length in chars:", len(docs[0].page_content))

# Split into chunks of up to 100 characters (measured with len() by default), with no overlap
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
print(f"Number of chunked splits: {len(splits)}")

print("First chunk:\n", splits[0].page_content)


We have 1 document(s). First doc length in chars: 456
Number of chunked splits: 6
First chunk:
 I would like to get your thoughts on the bond yield increase this week.
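RecursiveCharacterTextSplitter tries its separators in order (by default paragraph breaks, newlines, spaces, then single characters) and falls back to a finer separator only when a piece is still longer than `chunk_size`. That is consistent with the six chunks above: with `chunk_size=100`, no two adjacent lines of the sample text fit in one chunk. The function below is a simplified re-implementation for illustration only (the name `recursive_split` is hypothetical); it ignores `chunk_overlap` and other details of LangChain's actual code.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (illustration, not LangChain's code)."""
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size character windows
        pieces = (text[i:i + chunk_size] for i in range(0, len(text), chunk_size))
        return [p.strip() for p in pieces if p.strip()]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current.strip():
            chunks.append(current.strip())  # flush what already fits
        if len(piece) > chunk_size:
            # This piece alone is too long: split it with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks

sample = """I would like to get your thoughts on the bond yield increase this week.
I'm not worried about the market downturn, but the sudden increase in yields.
On 2/16, the 10-year bond yields increased by almost 9%, and on 2/19, the yield increased by 5%.
Some experts recall the 'taper tantrum' of 2013, where bond prices crashed after the Fed
announced it would begin tapering quantitative easing.
Others see it as a normal feature of an economic recovery.
"""
split_chunks = recursive_split(sample, chunk_size=100)
print(len(split_chunks))  # 6
print(split_chunks[0])
```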


In [10]:
# Cell 6: Creating a Vector Store with an open-source embedding model
!pip install chromadb
from langchain_community.embeddings import HuggingFaceEmbeddings
# all-mpnet-base-v2 is a general-purpose sentence-transformers model (768-dim embeddings);
# 'all-MiniLM-L6-v2' (384-dim) is a smaller, faster alternative
model_name = "sentence-transformers/all-mpnet-base-v2"

# The first run downloads the model weights from the Hugging Face Hub
local_embedder = HuggingFaceEmbeddings(model_name=model_name)

# Embed every chunk and store it in an in-memory Chroma collection
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(splits, embedding=local_embedder)
retriever = db.as_retriever()

print("Vector store created with local embeddings!")


Collecting chromadb
  Using cached chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting build>=1.0.3 (from chromadb)
  Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)
Collecting chroma-hnswlib==0.7.6 (from chromadb)
  Downloading chroma_hnswlib-0.7.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (252 bytes)
Collecting fastapi>=0.95.2 (from chromadb)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb)
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-3.23.0-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.31.1-p
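A note on distance metrics, assuming Chroma's default squared-L2 distance (configurable per collection via the `hnsw:space` metadata; check the Chroma docs for your version): for unit-length embeddings, L2 ranking and cosine ranking agree, because

$$
\|\mathbf{q}-\mathbf{d}\|_2^2 = \|\mathbf{q}\|^2 + \|\mathbf{d}\|^2 - 2\,\mathbf{q}\cdot\mathbf{d} = 2 - 2\cos(\mathbf{q},\mathbf{d}) \qquad \text{if } \|\mathbf{q}\| = \|\mathbf{d}\| = 1
$$

so a smaller distance always means a larger cosine similarity. If the embeddings are not normalized, the two rankings can differ.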

In [11]:
# Cell 7: Running a quick retrieval step (no LLM yet)

# An open-source LLM from Hugging Face (e.g., a Llama 2 or OpenAssistant model) could turn these chunks
# into an answer; here we skip the generative step and only inspect what the retriever returns.

query = "Why did bond yields spike in 2013, and is it relevant to 2023?"
docs = retriever.invoke(query)  # replaces the deprecated get_relevant_documents()
print(f"Top doc chunks returned: {len(docs)}")

for i, doc in enumerate(docs):
    print(f"\n--- Chunk {i+1} (score unknown) ---\n{doc.page_content}")


Top doc chunks returned: 4

--- Chunk 1 (score unknown) ---
I'm not worried about the market downturn, but the sudden increase in yields.

--- Chunk 2 (score unknown) ---
I would like to get your thoughts on the bond yield increase this week.

--- Chunk 3 (score unknown) ---
Some experts recall the 'taper tantrum' of 2013, where bond prices crashed after the Fed

--- Chunk 4 (score unknown) ---
On 2/16, the 10-year bond yields increased by almost 9%, and on 2/19, the yield increased by 5%.
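Conceptually, `Chroma.from_documents` embeds every chunk once, and the retriever embeds the query and returns the k nearest chunks; LangChain's Chroma wrapper defaults to k=4, matching the four chunks above. The printout says "score unknown" because a retriever returns only documents; the vector store's `db.similarity_search_with_score(query, k=4)` also returns each chunk's distance. The runnable sketch below mimics that pipeline with a toy bag-of-words "embedding" and cosine similarity instead of a neural model, purely so it runs without downloads; `embed`, `cosine`, and `TinyVectorStore` are hypothetical names.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyVectorStore:
    """Minimal in-memory vector store: embed chunks once, rank them per query."""

    def __init__(self, texts):
        self.entries = [(text, embed(text)) for text in texts]

    def search_with_score(self, query, k=4):
        """Return up to k (text, similarity) pairs, most similar first."""
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

chunk_texts = [
    "I would like to get your thoughts on the bond yield increase this week.",
    "Some experts recall the 'taper tantrum' of 2013, where bond prices crashed after the Fed",
    "Others see it as a normal feature of an economic recovery.",
]
store = TinyVectorStore(chunk_texts)
results = store.search_with_score("taper tantrum 2013", k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Mind the direction when comparing with Chroma: cosine similarity here is higher-is-better, while `similarity_search_with_score` returns a distance, where lower is better.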


In [13]:
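# Cell 8: Generating an answer from the retrieved chunks with an LLM
# This cell uses OpenAI's chat model through LangChain's ChatOpenAI, so it needs an API key with available quota.
# The RetrievalQA chain itself is model-agnostic: an open-source model (e.g., ChatGLM2 or Dolly)
# wrapped as a LangChain LLM could be passed as `llm` instead.

import os
from getpass import getpass

# Ask for the key at runtime instead of hard-coding it in the notebook
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# chain_type="stuff" puts all retrieved chunks into a single prompt
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0.0),
    chain_type="stuff",
    retriever=retriever
)

result = chain.invoke({"query": query})
print(result["result"])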


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
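With `chain_type="stuff"`, all retrieved chunks go into one prompt. That is simple, but many or long chunks can overflow the model's context window. The sketch below shows the idea with a character budget that drops lower-ranked chunks once it is exceeded; characters are only a rough proxy for tokens (tiktoken would count real tokens), and the function name `build_stuff_prompt` and its prompt wording are hypothetical, not LangChain's template.

```python
def build_stuff_prompt(question, chunks, max_context_chars=2000):
    """Concatenate retrieved chunks (best first) into a single prompt.

    Chunks are added in order until the character budget would be exceeded.
    """
    selected, used = [], 0
    for chunk in chunks:
        cost = len(chunk) + 1  # +1 for the joining newline
        if used + cost > max_context_chars:
            break
        selected.append(chunk)
        used += cost
    context = "\n".join(selected)
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt_text = build_stuff_prompt(
    "Why did bond yields spike in 2013?",
    ["Chunk about the 2013 taper tantrum.", "Chunk about yields this week."],
    max_context_chars=40,  # tiny budget: only the first chunk fits
)
print(prompt_text)
```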

In [14]:
# Cell 9: Summarizing the entire text with a chain

# Summarize all chunks with a "map_reduce" chain: each chunk is summarized separately,
# then the partial summaries are combined into one final summary.
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI
llm = OpenAI(temperature=0.0)

# Pass `splits` (all chunks from Cell 5), not `docs` (only the chunks retrieved in Cell 7)
summary_chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = summary_chain.invoke({"input_documents": splits})["output_text"]
print(summary)




RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
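The map_reduce strategy first summarizes each document on its own (map) and then makes one more call over the joined partial summaries (reduce), so no single call has to hold the whole text. The sketch below shows only that control flow; `fake_summarize` is a hypothetical stand-in that keeps the first two words of each line where the real chain would call the LLM, so it runs offline.

```python
calls = []  # records every (fake) LLM call so the control flow is visible

def fake_summarize(text):
    """Hypothetical stand-in for an LLM call: keep the first two words of each line."""
    calls.append(text)
    firsts = [line.split()[:2] for line in text.splitlines() if line.strip()]
    return " ".join(word for words in firsts for word in words)

def map_reduce_summarize(documents, summarize):
    # Map step: summarize every document independently (these calls could run in parallel)
    partial_summaries = [summarize(doc) for doc in documents]
    # Reduce step: one more call over the joined partial summaries
    return summarize("\n".join(partial_summaries))

sample_docs = [
    "On 2/16 the 10-year yield rose sharply.",
    "Some experts recall the 2013 taper tantrum.",
    "Others see rising yields as normal in a recovery.",
]
summary_text = map_reduce_summarize(sample_docs, fake_summarize)
print(summary_text)  # On 2/16 Some experts Others see
print(len(calls))    # 4: three map calls plus one reduce call
```

For N chunks this costs N + 1 LLM calls, versus one call for "stuff", which is the price of never exceeding the context window.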

In [16]:
!pip install gradio

Collecting gradio
  Using cached gradio-5.23.3-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 (from gradio)
  Downloading safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)
Collecting semantic-version~=2.0

In [17]:
# Cell 10: Provide a Gradio UI for user queries
# Retrieval runs locally (open-source embeddings + Chroma); the answer step calls OpenAI when a key is set.

import os
import gradio as gr
from langchain.chat_models import ChatOpenAI

def answer_question(query):
    # 1) Retrieve the most relevant chunks
    docs = retriever.invoke(query)
    if len(docs) == 0:
        return "No relevant chunks found. Not sure."

    # 2) Combine the chunks into one context block
    combined_text = "\n".join(d.page_content for d in docs)

    # Without an API key, return the retrieved chunks instead of calling the LLM
    if not os.environ.get("OPENAI_API_KEY"):
        return "No OpenAI API key set. Retrieved chunks:\n\n" + combined_text

    # 3) Ask the LLM for a short answer grounded in the chunks
    llm = ChatOpenAI(temperature=0.0)
    prompt = f"""You are a helpful Q&A system.
The user asked: {query}
We have these relevant text chunks:
{combined_text}
Provide a short answer from the text. If there is not enough information, say "I don't know."
"""
    resp = llm.invoke(prompt)
    return resp.content

demo = gr.Interface(
    fn=answer_question,
    inputs="text",
    outputs="text",
    title="Open-source RAG Demo",
    description="Ask a question about bond yields or other text content loaded."
)

demo.launch(debug=False, share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://30e5ab67475101e324.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
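Because `answer_question` reads the global `retriever` and builds `ChatOpenAI` itself, it is hard to test without network access and an API key. One possible restructuring (hypothetical, not part of the original notebook) passes retrieval and generation in as plain functions, so the same logic runs offline with fakes and falls back to showing the retrieved chunks when no LLM is configured. The names `make_answer_fn`, `fake_retrieve`, and `offline_answer` are illustrative.

```python
def make_answer_fn(retrieve, generate=None):
    """Build a Q&A function from pluggable steps.

    retrieve: callable(query) -> list of chunk strings
    generate: optional callable(prompt) -> answer string; if None,
              the retrieved chunks are returned instead of an LLM answer.
    """
    def answer(query):
        chunks = retrieve(query)
        if not chunks:
            return "No relevant chunks found. Not sure."
        context = "\n".join(chunks)
        if generate is None:
            return "Retrieved chunks (no LLM configured):\n" + context
        prompt = (
            "You are a helpful Q&A system.\n"
            f"The user asked: {query}\n"
            f"Relevant text chunks:\n{context}\n"
            "Give a short answer from the text. "
            "If there is not enough information, say \"I don't know.\""
        )
        return generate(prompt)
    return answer

# Offline fake retriever: returns a chunk only for questions mentioning 2013
def fake_retrieve(query):
    return ["Some experts recall the 'taper tantrum' of 2013."] if "2013" in query else []

offline_answer = make_answer_fn(fake_retrieve)
print(offline_answer("What happened in 2013?"))
print(offline_answer("Unrelated question"))
```

In the notebook, the real steps could be plugged in as `make_answer_fn(lambda q: [d.page_content for d in retriever.invoke(q)], lambda p: ChatOpenAI(temperature=0.0).invoke(p).content)` and the result passed to `gr.Interface(fn=...)`.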




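Summary:

Installing the libraries (LangChain, unstructured, Hugging Face transformers and sentence-transformers, Chroma, Gradio, etc.) so the whole notebook runs in Colab. The embedding and retrieval steps use only open-source components; the answer and summarization cells call OpenAI and need an API key with available quota.

Computing local embeddings, first with plain BERT (mean-pooled hidden states) and then with LangChain's HuggingFaceEmbeddings wrapper around a sentence-transformers model.

Building a Chroma vector store from the chunked document embeddings.

Using a retriever to fetch the chunks most relevant to a user query.

Generating answers and summaries with an LLM: the notebook uses OpenAI models via RetrievalQA and a map_reduce summarize chain, and an open-source model could be swapped in instead.

Wrapping the pipeline in a Gradio interface for interactive queries.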