# LlamaIndex Framework
This notebook explores the LlamaIndex framework for building agents.

In [13]:
# Make sure to set up your environment correctly
# For this project I used Python 3.12.3

# Install the LlamaIndex integrations for Hugging Face Inference API LLMs and local Hugging Face embeddings (both pull in huggingface_hub)
%pip install -q llama-index-llms-huggingface-api llama-index-embeddings-huggingface
%pip install -q lmstudio
%pip install -q load-dotenv

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.


In [15]:
# Try out the LM Studio Python SDK
import lmstudio as lms

# List all models downloaded locally
lmstudio_downloaded_models = lms.list_downloaded_models()
for model in lmstudio_downloaded_models:
    print(f"Downloaded model: {model}")
    
# List LLMs only
lmstudio_llms_only = lms.list_downloaded_models("llm")
for llm in lmstudio_llms_only:
    print(f"LLM: {llm}")
    
# List embeddings only
lmstudio_embeddings_only = lms.list_downloaded_models("embedding")
for embedding in lmstudio_embeddings_only:
    print(f"Embedding: {embedding}")



LMStudioWebsocketError: 
    LM Studio is not reachable at ws://localhost:1234/system (due to httpx.ConnectError: All connection attempts failed).
    Is LM Studio running?
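The error above means the SDK could not open a websocket to the LM Studio server on `localhost:1234`. A minimal sketch, using only the standard library, that checks whether anything is listening on that port before calling the SDK. The helper name `lm_studio_is_running` is hypothetical, and the host and port are taken from the error message (adjust them if your server listens elsewhere).

```python
import socket


def lm_studio_is_running(host: str = "localhost", port: int = 1234, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks that *something* is listening on the port,
    not that the listener is LM Studio specifically.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if not lm_studio_is_running():
    print("Nothing is listening on localhost:1234; start the LM Studio local server first.")
```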

In [16]:
import lmstudio as lms

# Get a handle to the model currently loaded in LM Studio
lmstudio_llm = lms.llm()

In [17]:
# Structured output
import lmstudio as lms
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str
    year: int
    rating: float

lmstudio_llm = lms.llm()  # Gets the currently loaded model

prompt = "Tell me about the book 'The Great Gatsby'"

response = lmstudio_llm.respond(
    prompt,
    response_format=Book
)

# The structured output, parsed according to the Book schema
print(response.parsed)


LMStudioWebsocketError: 
    LM Studio is not reachable at ws://localhost:1234/llm (due to httpx.ConnectError: All connection attempts failed).
    Is LM Studio running?

In [18]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get Hugging Face token from environment variable
hf_token = os.environ.get("HF_TOKEN")
if hf_token:
    print("HF_TOKEN found in environment variables")
else:
    raise ValueError("HF_TOKEN not found in environment variables. Please add it to your .env file")

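# Optional: log in to Hugging Face. Not needed here, because the token is passed
# directly to HuggingFaceInferenceAPI below.
import huggingface_hub
# huggingface_hub.login(token=hf_token)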


HF_TOKEN found in environment variables


In [19]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from dotenv import load_dotenv
import os

load_dotenv()

# model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"  # alternative model hosted on Hugging Face
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

llm = HuggingFaceInferenceAPI(
    model_name=model_name,
    temperature=0.7,
    max_tokens=1000,
    token=os.environ.get("HF_TOKEN"),
)


llm.complete("Hello, how are you?")



CompletionResponse(text=' I hope you\'re having a fantastic day!\n\nToday, I wanted to share with you a simple yet powerful JavaScript function that can help you to check if an object is an instance of another object.\n\nLet\'s get started!\n\n**The Function:**\n```javascript\nfunction isInstanceOf(obj, target) {\n  return Object.getPrototypeOf(obj) === target.prototype;\n}\n```\n**How it works:**\n\nThe `isInstanceOf` function takes two arguments: `obj` (the object you want to check) and `target` (the object you want to check against).\n\nThe function uses the `Object.getPrototypeOf()` method to get the prototype of the `obj` object. Then, it compares the result with the `prototype` property of the `target` object using the `===` operator.\n\nIf the prototype of `obj` is equal to the `prototype` property of `target`, it means that `obj` is an instance of `target`, and the function returns `true`. Otherwise, it returns `false`.\n\n**Example usage:**\n```javascript\nclass Person {\n  co

## Key stages in a RAG pipeline
1. Loading
2. Indexing
3. Storing
4. Querying
5. Evaluation
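To make the five stages concrete, here is a deliberately tiny, dependency-free sketch of the whole flow. It is not how LlamaIndex implements anything: the "embeddings" are plain word-count vectors, the "store" is a Python list, and evaluation is a single hit-rate check. All names (`embed`, `retrieve`, `toy_store`, ...) are hypothetical.

```python
import math
from collections import Counter

# 1. Loading: in the notebook this is SimpleDirectoryReader; here, two hard-coded strings
toy_documents = {
    "doc1": "LlamaIndex loads documents and splits them into nodes.",
    "doc2": "Chroma is a vector database that stores embeddings.",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts instead of a neural vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 2. Indexing: turn every document into a vector
# 3. Storing: keep (doc_id, vector) pairs in a list instead of a vector database
toy_store = [(doc_id, embed(text)) for doc_id, text in toy_documents.items()]


# 4. Querying: return the id of the stored document most similar to the question
def retrieve(question: str) -> str:
    q = embed(question)
    return max(toy_store, key=lambda item: cosine(q, item[1]))[0]


# 5. Evaluation: measure how often retrieval returns the document we know is correct
labelled = {
    "Which database stores embeddings?": "doc2",
    "How are documents split into nodes?": "doc1",
}
hit_rate = sum(retrieve(q) == expected for q, expected in labelled.items()) / len(labelled)
print(f"hit rate: {hit_rate:.2f}")
```

Swapping `embed` for a real embedding model and `toy_store` for a vector database gives, roughly, the shape of the pipeline built in the rest of this notebook.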

In [20]:
# Load data
from llama_index.core import SimpleDirectoryReader
from dotenv import load_dotenv
import os

load_dotenv()
directory_path = os.environ.get("DOCUMENTS_DIR")

reader = SimpleDirectoryReader(input_dir=directory_path)
documents = reader.load_data()
print(f"Found {len(documents)} documents")

ValueError: Directory ./documents does not exist.
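`SimpleDirectoryReader` raises this `ValueError` because `DOCUMENTS_DIR` points at `./documents`, which does not exist relative to the notebook's working directory. A small guard, sketched below with only `pathlib`, fails earlier with a message that shows the fully resolved path. The helper name `documents_dir_from_env` is hypothetical, and it assumes `DOCUMENTS_DIR` is set in your `.env` file.

```python
import os
from pathlib import Path


def documents_dir_from_env(var: str = "DOCUMENTS_DIR") -> Path:
    """Return the directory named by environment variable `var`, or raise a descriptive error."""
    value = os.environ.get(var)
    if not value:
        raise ValueError(f"{var} is not set; add it to your .env file")
    # Relative paths are resolved against the current working directory
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise ValueError(f"{var}={value!r} resolves to {path}, which is not an existing directory")
    return path
```

In the loading cell it could be used as `SimpleDirectoryReader(input_dir=str(documents_dir_from_env()))`, so a wrong path is reported with the absolute location that was actually checked.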

## Document Processing and Node Creation

After loading our documents, we need to break them into smaller pieces called Node objects. A Node is a chunk of text from an original document, small enough for the embedding model and the LLM to work with, that keeps a reference to the Document it came from.

The IngestionPipeline helps us create these nodes through two key transformations:

1. **SentenceSplitter**: Breaks down documents into manageable chunks by splitting them at natural sentence boundaries.
2. **HuggingFaceEmbedding**: Converts each chunk into a numerical embedding, a vector representation of its meaning, so that chunks with similar meanings end up close together in vector space.

This process helps us organise our documents in a way that's more useful for searching and analysis.
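A rough, dependency-free sketch of what the splitting step produces: sentences are packed greedily into chunks of at most `chunk_size` words, and each chunk records the id of the document it came from, mirroring how a Node keeps a reference to its source Document. LlamaIndex's `SentenceSplitter` counts tokens rather than words and handles many more edge cases; the `Chunk` class and `split_into_chunks` function here are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    ref_doc_id: str  # id of the source document, like a Node's reference to its Document


def split_into_chunks(doc_id: str, text: str, chunk_size: int = 20) -> list[Chunk]:
    """Greedily pack whole sentences into chunks of at most `chunk_size` words.

    A single sentence longer than `chunk_size` words becomes its own oversized chunk.
    """
    # Split after ., ! or ? followed by whitespace (a crude sentence-boundary rule)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[Chunk] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit
        if current and current_len + n_words > chunk_size:
            chunks.append(Chunk(" ".join(current), doc_id))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(Chunk(" ".join(current), doc_id))
    return chunks


example = split_into_chunks("doc-1", "One two three. Four five six. Seven eight nine ten.", chunk_size=6)
for chunk in example:
    print(chunk)
```

Because chunks never cut a sentence in half, each one stays readable on its own, which is the same motivation behind splitting at sentence boundaries in the pipeline below.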

In [6]:
from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

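chunk_size = 1000  # maximum chunk length, measured in tokens
chunk_overlap = 0  # number of tokens shared by consecutive chunks
# embedding_model = "sentence-transformers/all-MiniLM-L6-v2"  # alternative embedding model
embedding_model = "BAAI/bge-small-en-v1.5"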

# Create a pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap),
        HuggingFaceEmbedding(model_name=embedding_model)
    ]
)

# Apply the pipeline to our documents
nodes = await pipeline.arun(documents=[Document.example()])



## Storing and indexing documents
After creating our Node objects, we need to index them to make them searchable, but first we need a place to store our data.

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it. In this case, we will use Chroma to store our documents.
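Conceptually, a vector store keeps each node's embedding next to its text and id, and a persistent one (such as a Chroma `PersistentClient`) writes them to disk so they survive a kernel restart. The toy class below sketches that idea with a JSON file; `TinyVectorStore` is hypothetical and is neither Chroma's API nor its storage format.

```python
import json
from pathlib import Path


class TinyVectorStore:
    """Hypothetical in-memory store of node records, persisted to a JSON file."""

    def __init__(self) -> None:
        # node_id -> {"embedding": [...], "text": "..."}
        self.records: dict[str, dict] = {}

    def add(self, node_id: str, embedding: list[float], text: str) -> None:
        # Adding an existing id overwrites the old record (an "upsert")
        self.records[node_id] = {"embedding": embedding, "text": text}

    def persist(self, path: Path) -> None:
        # Write all records to disk as JSON
        Path(path).write_text(json.dumps(self.records))

    @classmethod
    def load(cls, path: Path) -> "TinyVectorStore":
        # Rebuild a store from a file written by persist()
        store = cls()
        store.records = json.loads(Path(path).read_text())
        return store
```

`persist` and `load` play roughly the role that an on-disk database directory plays for Chroma: nodes embedded once can be reloaded later without re-running the embedding model.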

In [7]:
%pip install llama-index-vector-stores-chroma

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.
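The recurring `huggingface/tokenizers` warning appears when the kernel forks a subprocess (here, `%pip`) after the Hugging Face tokenizer has already used parallelism. As the warning itself suggests, setting `TOKENIZERS_PARALLELISM` explicitly silences it. A minimal sketch, meant to run at the top of the notebook before any embedding model is loaded:

```python
import os

# Set explicitly so the tokenizers library stops printing the fork warning;
# "false" turns tokenizer parallelism off, avoiding the deadlock it warns about.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```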


In [21]:
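import chromadb
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from dotenv import load_dotenv
import os

load_dotenv()

# Initialize a persistent ChromaDB client and collection
chroma_collection_name = "rag_collection"
db = chromadb.PersistentClient(path=os.environ.get("CHROMA_DB_PATH"))
chroma_collection = db.get_or_create_collection(name=chroma_collection_name)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Define the pipeline; attaching vector_store makes it write the embedded nodes to Chroma
chunk_size = 1000
chunk_overlap = 0
embedding_model = "BAAI/bge-small-en-v1.5"

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap),
        HuggingFaceEmbedding(model_name=embedding_model)
    ],
    vector_store=vector_store,
)

# Run the pipeline on the documents loaded earlier. Without this step the Chroma
# collection stays empty, and later queries return "Empty Response".
nodes = pipeline.run(documents=documents)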

To find the nodes that are relevant to a query, we embed both the query and the nodes in the same vector space and compare the vectors. The VectorStoreIndex handles this for us; it must use the same embedding model we used during ingestion, otherwise the query and node vectors are not comparable.
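The core of the retrieval step can be sketched in a few lines: score every stored node vector against the query vector and keep the top `k`. This sketch uses cosine similarity, one common choice of metric (vector stores may be configured with others). The vectors are made-up 3-dimensional toys rather than real `BAAI/bge-small-en-v1.5` embeddings, and `top_k` is a hypothetical helper, not a LlamaIndex function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        # Vectors of different lengths cannot come from the same embedding model
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to query_vec, best first."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in node_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors standing in for real node embeddings
toy_node_vectors = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.7, 0.7, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], toy_node_vectors))
```

The dimension check catches only the most obvious mismatch: two different models can produce vectors of the same length whose similarity scores are still meaningless, which is why the index reuses the ingestion model.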

Let’s see how to create this index from our vector store and embeddings:

In [27]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from dotenv import load_dotenv
import os

load_dotenv()

embedding_model_name = os.environ.get("EMBEDDING_MODEL")
if embedding_model_name:
    print(f"Embedding model name: {embedding_model_name}")
else:
    raise ValueError("Embedding model name not found in environment variables. Please add it to your .env file")

embedding_model = HuggingFaceEmbedding(model_name=embedding_model_name)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embedding_model)

# # Querying the index
# query_engine = index.as_query_engine()
# response = query_engine.query("What is the main idea of the document?")
# print(response)


Embedding model name: BAAI/bge-small-en-v1.5


In [None]:
# Let's query the index
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from dotenv import load_dotenv
import os

load_dotenv()

huggingface_model_name = os.environ.get("HUGGINGFACE_MODEL")
if huggingface_model_name:
    print(f"Huggingface model name: {huggingface_model_name}")
else:
    raise ValueError("Hugging Face model name not found in environment variables. Please add it to your .env file")

huggingface_llm = HuggingFaceInferenceAPI(
    model_name=huggingface_model_name,
    token=os.environ.get("HF_TOKEN"),
)
query_engine = index.as_query_engine(
    llm=huggingface_llm,
    response_mode="tree_summarize",
)

query_engine.query("What is the main idea of the document?")

Huggingface model name: Qwen/Qwen2.5-Coder-32B-Instruct


Response(response='Empty Response', source_nodes=[], metadata=None)