
# Retrieval-Only RAG (In-Memory) with **LlamaIndex** on *Tiny Shakespeare*

This notebook demonstrates a **RAG pipeline focused on retrieval only**: no LLM is called and no answer is generated.
We use **LlamaIndex** with an **in-memory vector store** and a single **chunking technique** (`SentenceSplitter`).

**What you'll see**
1. Download Tiny Shakespeare.
2. Chunk the text with `SentenceSplitter` (chunk size 1000 tokens, overlap 100 tokens).
3. Build an in-memory `VectorStoreIndex` with **HuggingFace** embeddings.
4. Run retrieval: compute a query embedding, fetch top-k chunks, and inspect similarities & previews.

> Data source: [Tiny Shakespeare (raw)](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)


## 0) Setup

In [1]:

# Install dependencies (comment this line out if they are already installed):
%pip install -q llama-index llama-index-core llama-index-embeddings-huggingface sentence-transformers faiss-cpu tiktoken matplotlib scikit-learn requests

import os, time, textwrap
from pathlib import Path
import requests

from llama_index.core import Document, VectorStoreIndex, Settings, StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.vector_stores import SimpleVectorStore

import numpy as np
import pandas as pd






## 1) Download Tiny Shakespeare

In [2]:

URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
DATA_PATH = Path("tinyshakespeare.txt")

if not DATA_PATH.exists():
    print("Downloading dataset ...")
    r = requests.get(URL, timeout=60)
    r.raise_for_status()
    DATA_PATH.write_text(r.text, encoding="utf-8")
    print("Saved to", DATA_PATH.resolve())
else:
    print("Using cached", DATA_PATH.resolve())

raw_text = DATA_PATH.read_text(encoding="utf-8")
print("Characters in corpus:", len(raw_text))
print("First 400 chars:\n", raw_text[:400])


Downloading dataset ...
Saved to /Users/spartan/Documents/236/Assignment/Assignment 3/tinyshakespeare.txt
Characters in corpus: 1115394
First 400 chars:
 First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it 


## 2) Configure Embeddings (HuggingFace) + No LLM

In [3]:

EMBED_MODEL_NAME = os.environ.get("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")

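# Clear Hugging Face token variables for this process; the model is public and needs no authentication.
for var in ["HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN", "HUGGINGFACE_HUB_TOKEN"]:
    os.environ.pop(var, None)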

Settings.embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL_NAME)


print("Embedding model:", EMBED_MODEL_NAME)
# Sanity: compute one embedding to show shape
q_test = "Who is Romeo in love with?"
vec = Settings.embed_model.get_text_embedding(q_test)
print(" embedding length:", len(vec))


Embedding model: sentence-transformers/all-MiniLM-L6-v2
 embedding length: 384
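
The retrieval step later compares the query vector with each chunk vector. As a minimal sketch of that comparison, the function below computes cosine similarity on plain Python lists. The name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the notebook; real embeddings from this model have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Toy vectors standing in for 384-dimensional embeddings
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Vectors pointing the same way score close to 1 regardless of their length, which is why cosine similarity is a common choice for sentence embeddings.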


## 3) Chunking with `SentenceSplitter`

In [4]:

# Single chunking strategy
splitter = SentenceSplitter(chunk_size=1000, chunk_overlap=100)

doc = Document(text=raw_text)

# Convert to nodes
nodes = splitter.get_nodes_from_documents([doc])
print(f"Created {len(nodes)} nodes. Example preview:")
print(textwrap.shorten(nodes[0].get_content().replace("\n", " "), width=200))


Created 338 nodes. Example preview:
First Citizen: Before we proceed any further, hear me speak. All: Speak, speak. First Citizen: You are all resolved rather to die than to famish? All: Resolved. resolved. First Citizen: First, [...]
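
To make the role of `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is not the algorithm `SentenceSplitter` uses (the library counts tokens and prefers sentence boundaries); the helper `chunk_with_overlap` and the whitespace "tokens" are assumptions made for illustration.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks

# Whitespace-separated words stand in for tokens
words = "Before we proceed any further hear me speak".split()
chunks = chunk_with_overlap(words, chunk_size=4, chunk_overlap=1)
for c in chunks:
    print(" ".join(c))
```

Each window repeats the last word of the previous one, so a sentence that straddles a boundary is still partly visible in both chunks.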


## 4) Build an **in-memory** Vector Index

In [5]:

# Create a simple in-memory vector store and storage context
simple_vs = SimpleVectorStore()  # in memory for this session
storage_context = StorageContext.from_defaults(vector_store=simple_vs)

# Build the VectorStoreIndex directly from our nodes
t0 = time.time()
index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)
build_sec = round(time.time() - t0, 2)

print("Index built in", build_sec, "s")
print("Vector store type:", type(index.vector_store).__name__)

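# Count the stored vectors by inspecting the serialized store;
# several candidate key names are tried because the layout is not assumed here.
n_vectors = None
vs = index.vector_store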

if hasattr(vs, "to_dict"):
    d = vs.to_dict()
    for key in ("embedding_dict", "id_to_embedding", "doc_id_to_embedding"):
        if isinstance(d.get(key), dict):
            n_vectors = len(d[key])
            break

print("Stored embeddings:", n_vectors if n_vectors is not None else f"(unknown; nodes={len(nodes)})")



Index built in 2.2 s
Vector store type: SimpleVectorStore
Stored embeddings: 338
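
Conceptually, an in-memory vector store is a mapping from node id to embedding that is scanned in full at query time. The class below is a hypothetical stand-in written for illustration, not the `SimpleVectorStore` API. It assumes vectors are normalized when added, so that a dot product at query time equals cosine similarity.

```python
import heapq
import math

class TinyVectorStore:
    """Hypothetical in-memory vector store (illustrative; not the LlamaIndex API)."""

    def __init__(self):
        self._vectors = {}  # node_id -> unit-length embedding

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vec]

    def add(self, node_id, embedding):
        # Normalize once at insert time so queries only need a dot product
        self._vectors[node_id] = self._normalize(embedding)

    def query(self, embedding, top_k=4):
        """Return up to top_k (node_id, score) pairs, best first."""
        q = self._normalize(embedding)
        scored = (
            (node_id, sum(a * b for a, b in zip(q, vec)))
            for node_id, vec in self._vectors.items()
        )
        return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

store = TinyVectorStore()
store.add("chunk-a", [1.0, 0.0])
store.add("chunk-b", [0.0, 1.0])
store.add("chunk-c", [1.0, 1.0])
hits = store.query([2.0, 0.1], top_k=2)
print([node_id for node_id, _ in hits])
```

A brute-force scan like this is linear in the number of stored vectors, which is fine for a few hundred chunks such as the 338 nodes above.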


## 5) Retrieval-only: inspect embeddings, scores, and text previews

In [6]:


def retrieve_only(query: str, k: int = 4):
    # Get a retriever from the index (no LLM involved)
    retriever = index.as_retriever(similarity_top_k=k)
    results = retriever.retrieve(query)  # list[NodeWithScore]

    # Get query embedding
    q_vec = np.array(Settings.embed_model.get_text_embedding(query))

    # Build a table of the store's similarity score per hit; each chunk is
    # re-embedded so the caller can compare its vector with the query vector
    rows = []
    d_vecs = []
    for rank, r in enumerate(results, start=1):
        text = r.node.get_content()
        src = r.node.metadata.get("source", "tinyshakespeare")
        d_vec = np.array(Settings.embed_model.get_text_embedding(text))
        d_vecs.append(d_vec)
        rows.append({
            "rank": rank,
            "store_score": round(r.score, 6),
            "chunk_len": len(text),
            "preview": text,
            "source": src,
        })
        print(text,"\n ---------------------- END -------------------- \n")

    df = pd.DataFrame(rows)  # pandas is already imported as pd in Setup
    print(f"Query embedding shape: {q_vec.shape}  (dim={q_vec.size})")
    if len(d_vecs):
        print(f"Doc embeddings shape: ({len(d_vecs)}, {d_vecs[0].size})")
        print("Query embedding first 8 values:", np.round(q_vec[:8], 4))
    # Parentheses make explicit that only the third element is conditional
    return df, q_vec, (np.array(d_vecs) if len(d_vecs) else None)

q = "who is Juliet in love with?"
print("\n Query:", q)
df, qv, dvs = retrieve_only(q, k=4)
print(df)



 Query: who is Juliet in love with?
Nurse:
What's this? what's this?

JULIET:
A rhyme I learn'd even now
Of one I danced withal.

Nurse:
Anon, anon!
Come, let's away; the strangers all are gone.

Chorus:
Now old desire doth in his death-bed lie,
And young affection gapes to be his heir;
That fair for which love groan'd for and would die,
With tender Juliet match'd, is now not fair.
Now Romeo is beloved and loves again,
Alike betwitched by the charm of looks,
But to his foe supposed he must complain,
And she steal love's sweet bait from fearful hooks:
Being held a foe, he may not have access
To breathe such vows as lovers use to swear;
And she as much in love, her means much less
To meet her new-beloved any where:
But passion lends them power, time means, to meet
Tempering extremities with extreme sweet.

ROMEO:
Can I go forward when my heart is here?
Turn back, dull earth, and find thy centre out.

BENVOLIO:
Romeo! my cousin Romeo!

MERCUTIO:
He is wise;
And, on my lie, hath stol'n hi