![title](img/LlamaIndex.png)

LlamaIndex is a versatile data framework for connecting custom data sources to large language models (LLMs).
It acts as a bridge that lets LLMs understand and reason over private or domain-specific data, which opens up a wide range of
LLM applications. Here is how LlamaIndex achieves this, broken down into its core functionalities:

1. **Loading:** LlamaIndex simplifies ingesting data from a wide variety of sources and formats, including databases, documents, and APIs. It can load data from 160+ data sources and formats (APIs, PDFs, Word documents, SQL, etc.).
2. **Indexing:** It then indexes the data, creating structured representations that LLMs can work with efficiently. It integrates with 40+ vector stores for similarity search, document stores for raw data, graph stores for representing relationships, and SQL databases for structured data.
3. **Querying:** LlamaIndex retrieves the context relevant to a query from the indexed data. This simplifies orchestrating production-ready LLM workflows, whether you are building simple prompt chains, advanced Retrieval-Augmented Generation (RAG) systems, or autonomous agents.
4. **Evaluating:** A comprehensive suite of modules streamlines the evaluation of LLM output. This is essential for refining your application and ensuring it meets your performance requirements.
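The four stages can be illustrated with a toy, dependency-free sketch. It is not how LlamaIndex works internally: the in-memory `DOCS` dictionary, the bag-of-words "embedding", and the word-overlap "groundedness" check are all illustrative assumptions standing in for real loaders, embedding models, vector stores, and LLM-based evaluators.

```python
import math
import re
from collections import Counter

# 1. Loading: a hypothetical in-memory data source standing in for files, APIs or databases
DOCS = {
    "nlp.txt": "Natural Language Processing lets computers understand and generate human language.",
    "cricket.txt": "Cricket is a bat-and-ball game played between two teams of eleven players.",
}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# 2. Indexing: a term-frequency vector stands in for a learned embedding
def embed(text: str) -> Counter:
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = {name: embed(text) for name, text in DOCS.items()}

# 3. Querying: return the name of the document most similar to the query
def retrieve(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))

# 4. Evaluating: crude groundedness = share of answer words that appear in the retrieved context
def groundedness(answer: str, context: str) -> float:
    answer_words = set(tokenize(answer))
    if not answer_words:
        return 0.0
    return len(answer_words & set(tokenize(context))) / len(answer_words)

best = retrieve("What is natural language processing?")
print(best)  # nlp.txt
print(round(groundedness("computers understand human language", DOCS[best]), 2))  # 1.0
```

Real systems replace each toy piece with a proper component, but the flow (load, index, retrieve, check the answer against the retrieved context) is the same.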

## Applications

This framework is instrumental in building applications such as:
* Intelligent chatbots that can answer questions based on your company's internal knowledge base.
* Data-driven analysis tools that can extract insights from complex datasets.
* Personalized content generation systems that can create tailored content based on user preferences and data.

### How to use LlamaIndex with local LLMs

#### 1. Install the required libraries and Ollama


```python
!pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface
```


We will use *BAAI/bge-base-en-v1.5* as the embedding model and a Llama model served through Ollama as the LLM (*llama3.1* in the run below; *llama3.2* and *gemma3* are listed as alternatives).

Ollama is a user-friendly tool for setting up and running LLMs on a local machine. Instead of relying on cloud APIs, it lets you experiment with cutting-edge open models and integrate them into applications where privacy and local execution matter.

**Key benefits**

1. *Enhanced privacy:* Sensitive data stays on your machine, giving greater control and security.
2. *Lower latency:* There is no network round trip to a remote API (actual generation speed still depends on your local hardware).
3. *Offline functionality:* Applications keep working without an internet connection.
4. *Cost-effectiveness:* There are no recurring per-call costs of cloud-based LLM services.

To install Ollama, follow the [README](https://github.com/ollama/ollama?tab=readme-ov-file). Then download the model you plan to use, for example `ollama pull llama3.1`.

#### 2. Choose the model and start Ollama

##### 2.1 Define the LLM model

```python
# Choose the model
# OLLAMA_MODEL = 'llama3.2:1b'  # 1B parameters
# OLLAMA_MODEL = 'llama3.2'     # 3B parameters
# OLLAMA_MODEL = 'gemma3'       # 4B parameters
OLLAMA_MODEL = 'llama3.1:latest'  # 8B parameters

# Export settings as environment variables: silence the Hugging Face
# tokenizers fork warning and expose the model name to shell commands
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OLLAMA_MODEL"] = OLLAMA_MODEL
!echo $OLLAMA_MODEL
```

```
llama3.1:latest
```


##### 2.2 Start Ollama

```python
# Start the Ollama server locally as a background process
import subprocess
import time
import requests

OLLAMA_COMMAND = "ollama serve"
ollama_process = None

# subprocess.Popen returns immediately, leaving the server running in the background
try:
    ollama_process = subprocess.Popen(
        OLLAMA_COMMAND.split(),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    print(f"✅ Ollama server started with PID {ollama_process.pid}")
except Exception as e:
    print(f"❌ Error starting Ollama: {e}")
    raise  # surface the error in the notebook instead of calling exit()

# Give the server a few seconds to start
time.sleep(5)

# Check that the server is up and running
try:
    response = requests.get("http://localhost:11434", timeout=5)  # default Ollama port
    if response.status_code == 200:
        print("✅ Ollama server is up and running!")
    else:
        print(f"⚠️ Server responded with status code: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("❌ Could not connect to the Ollama server. Check that it is running correctly.")
```

```
✅ Ollama server started with PID 23468
✅ Ollama server is up and running!
```
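The fixed `time.sleep(5)` can be too short on a slow machine and needlessly long on a fast one. A common alternative is to poll until the server answers or a deadline passes. The sketch below uses only the standard library; `wait_until` and `ollama_is_up` are hypothetical helper names, and the URL assumes Ollama's default port 11434 used above.

```python
import time
import urllib.request
from typing import Callable

def wait_until(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

def ollama_is_up(url: str = "http://localhost:11434") -> bool:
    """Hypothetical helper: True if the server answers the URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and connection errors all derive from OSError
        return False
```

Usage, with the server starting in the background: `wait_until(ollama_is_up, timeout=60)` returns as soon as the server responds, or `False` after 60 seconds.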


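#### 3. Import the libraries

```python
# PID of the background Ollama process; call ollama_process.terminate() to stop it when done
ollama_process.pid
```

```
23468
```

```python
import asyncio  # needed for asyncio.run() when running outside a notebook
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.agent.workflow import AgentWorkflow
```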

##### 3.1 Call Ollama from LlamaIndex

```python
from IPython.display import Markdown, display

llm = Ollama(model=OLLAMA_MODEL, request_timeout=360.0)

# Run the completion
response = llm.complete(
    "Who is Sachin Tendulkar and why is he called the master blaster? Summarize in 3 bullet points"
)

# Display the response as Markdown
display(Markdown(response.text))
```

Here are three bullet points summarizing who Sachin Tendulkar is and why he's known as the "Master Blaster":

• **International Cricketer**: Sachin Ramesh Tendulkar is a legendary Indian cricketer who played for the Indian national team from 1989 to 2013. He holds numerous records in international cricket, including becoming the highest run-scorer in both Test and One-Day International (ODI) formats.

• **Batting Genius**: Known for his exceptional batting skills, Tendulkar is renowned for his ability to score runs in all conditions, against all types of bowling. He was particularly effective at the top of the order, where he would often set up big partnerships and lay a solid foundation for his team's innings.

• **Master Blaster Nickname**: The "Master Blaster" nickname was given to Tendulkar due to his incredible batting prowess, which earned him widespread recognition as one of the greatest batsmen in cricket history. He is credited with revolutionizing Indian cricket and inspiring a generation of young cricketers, cementing his place as an iconic figure in world cricket.
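`llm.complete` hides the transport. Ollama itself also exposes a REST API, so the same question can be sent without LlamaIndex. The minimal standard-library sketch below assumes the server started above on the default port and Ollama's `/api/generate` endpoint with `stream` set to `false`, which returns a single JSON object whose `response` field holds the generated text. It is not how the `llama-index-llms-ollama` integration is implemented.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default port, as used earlier

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate POST request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def generate(model: str, prompt: str, timeout: float = 360.0) -> str:
    """Send the prompt and return the generated text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `print(generate("llama3.1:latest", "Who is Sachin Tendulkar? Answer in one sentence."))`.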

#### 4. Data Ingestion and Indexing

```python
# Load all documents from the ./data directory (PDFs, text files, Word documents, ...)
documents = SimpleDirectoryReader("data").load_data()

# Embed the documents and build an in-memory vector index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
)
```
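Before embedding, LlamaIndex splits each loaded document into smaller chunks (nodes), so retrieval can return a focused passage instead of a whole file. The sketch below shows the idea with a fixed-size word window and overlap; `chunk_words` and its window sizes are illustrative assumptions, and LlamaIndex's own splitters work on tokens and sentence boundaries rather than raw words.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(12))
print(chunk_words(text, chunk_size=5, overlap=2))
# ['w0 w1 w2 w3 w4', 'w3 w4 w5 w6 w7', 'w6 w7 w8 w9 w10', 'w9 w10 w11']
```

The overlap keeps a sentence that straddles a boundary available in both neighbouring chunks, at the cost of some duplicated storage.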

#### 5. Query Engine

```python
# Create a query engine that answers questions using the local LLM
query_engine = index.as_query_engine(
    llm=Ollama(model=OLLAMA_MODEL, request_timeout=360.0)
)
```
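Conceptually, a query engine retrieves the chunks most similar to the question and hands them to the LLM together with the question. The toy sketch below shows only that prompt-assembly step; the similarity scores, `top_k`, and the prompt wording are illustrative assumptions, not LlamaIndex's actual templates.

```python
def build_rag_prompt(question: str, scored_chunks: list[tuple[float, str]], top_k: int = 2) -> str:
    """Keep the `top_k` highest-scoring chunks and wrap them in a question-answering prompt."""
    best = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:top_k]
    context = "\n---\n".join(chunk for _, chunk in best)
    return (
        "Use the following context to answer.\n"
        f"{context}\n"
        "Answer the question using only the context above.\n"
        f"Question: {question}\nAnswer:"
    )

# (similarity score, chunk text) pairs, as a retriever might return them
chunks = [
    (0.82, "NLP studies how computers process human language."),
    (0.15, "Cricket is played with a bat and ball."),
    (0.64, "Tokenization splits text into units."),
]
print(build_rag_prompt("What is NLP?", chunks))
```

The low-scoring cricket chunk is dropped, so the LLM only sees the two passages most relevant to the question.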

#### 6. Tool definitions

```python
# Define a simple calculator tool (returns a float so fractional results are not truncated)
def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

# Define a document search tool backed by the RAG query engine
async def search_documents(query: str) -> str:
    """Search documents using RAG."""
    response = await query_engine.aquery(query)
    return str(response)
```
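An agent can only pick a tool if it is told the tool's name, purpose, and arguments. LlamaIndex derives this from the function name, type hints, and docstring, which is why both tools above carry docstrings. The sketch below approximates the idea with `inspect`; `describe_tool` and its output shape are hypothetical simplifications, not the exact schema LlamaIndex sends to the model.

```python
import inspect
from typing import Any, Callable

def type_name(annotation: Any) -> str:
    """Readable name for a parameter annotation ('any' when there is none)."""
    if annotation is inspect.Parameter.empty:
        return "any"
    return getattr(annotation, "__name__", str(annotation))

def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Summarize a function as a tool spec: name, description and parameter types."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {name: type_name(p.annotation) for name, p in sig.parameters.items()},
    }

def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

print(describe_tool(multiply))
# {'name': 'multiply', 'description': 'Useful for multiplying two numbers.', 'parameters': {'a': 'float', 'b': 'float'}}
```

A vague docstring leads to a vague description, so the model may pick the wrong tool; clear docstrings are part of the prompt.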

#### 7. Agent Workflow

```python
# Create an agent workflow that can call both tools
agent = AgentWorkflow.from_tools_or_functions(
    [search_documents, multiply],
    llm=Ollama(model=OLLAMA_MODEL, request_timeout=360.0),
    system_prompt=(
        "You are a helpful assistant that can multiply two numbers "
        "and search through documents to answer questions."
    ),
)

async def main():
    response = await agent.run(
        "What is Natural Language Processing? Give me a definition from the document. "
        "Also, what is 8 * 8? Use the tools and give a detailed answer."
    )

    # Extract the text from the agent response, whatever shape it has
    if hasattr(response, "response") and hasattr(response.response, "content"):
        final_text = response.response.content
    elif hasattr(response, "text"):
        final_text = response.text
    else:
        final_text = str(response)

    display(Markdown(final_text))

# Jupyter already runs an event loop, so the coroutine can be awaited directly.
# In a plain Python script, use asyncio.run(main()) instead.
await main()
```


Based on the tool call responses, I can provide you with a detailed answer.

Natural Language Processing (NLP) is a rapidly evolving field that seeks to understand, interpret, and generate human language using computational methods. This definition comes from various sources, including academic papers and online articles related to NLP.

Regarding your second question, 8 multiplied by 8 equals 64.
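Under the hood, an agent loop asks the LLM which tool to call, runs it, and feeds the result back until the model can answer. The toy sketch below replaces the LLM with a hard-coded plan (an assumption, purely for illustration) so the dispatch logic can run on its own; like the `multiply` and `search_documents` pair above, it mixes a plain and an `async` tool. `fake_search` is a hypothetical stand-in for the RAG tool.

```python
import asyncio
import inspect
from typing import Any, Callable

def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b

async def fake_search(query: str) -> str:
    """Hypothetical stand-in for search_documents; returns a canned passage."""
    await asyncio.sleep(0)
    return f"Passage about: {query}"

TOOLS: dict[str, Callable[..., Any]] = {"multiply": multiply, "search_documents": fake_search}

async def call_tool(name: str, kwargs: dict[str, Any]) -> Any:
    """Run a tool by name, awaiting the result if the tool is async."""
    result = TOOLS[name](**kwargs)
    return await result if inspect.isawaitable(result) else result

async def run_plan(plan: list[tuple[str, dict[str, Any]]]) -> list[str]:
    """Execute (tool, arguments) steps in order, as if an LLM had chosen them."""
    observations = []
    for name, kwargs in plan:
        observations.append(f"{name} -> {await call_tool(name, kwargs)}")
    return observations

# A hard-coded plan standing in for the tool calls an LLM would choose
plan = [
    ("search_documents", {"query": "Natural Language Processing"}),
    ("multiply", {"a": 8, "b": 8}),
]
```

In a script, `asyncio.run(run_plan(plan))` returns `['search_documents -> Passage about: Natural Language Processing', 'multiply -> 64']`; inside Jupyter, where an event loop is already running, use `await run_plan(plan)` instead.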

#### 8. Store the RAG index

Persist the index to a folder (`storage` in this case) so the documents do not have to be re-processed and re-embedded every time the code runs.

```python
# Persist the index to disk
index.storage_context.persist(persist_dir="storage")

# Later (e.g. in a new session), reload the index instead of re-embedding the documents
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(
    storage_context,
    # must be the same embedding model that was used to build the index
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"),
)

query_engine = index.as_query_engine(
    llm=Ollama(model=OLLAMA_MODEL, request_timeout=360.0)
)
```
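The same load-or-build pattern can be sketched without LlamaIndex: build and save the index only when nothing is stored yet, otherwise load it. The JSON format, the `index.json` file name, and the `load_or_build` helper are assumptions for illustration; LlamaIndex's `persist` writes its own set of files. In practice one would wrap the `persist` and `load_index_from_storage` calls above in the same existence check.

```python
import json
from pathlib import Path
from typing import Callable

def load_or_build(persist_dir: str, build: Callable[[], dict[str, list[float]]]) -> dict[str, list[float]]:
    """Load a saved {doc_id: vector} index from persist_dir, or build and save it on first run."""
    path = Path(persist_dir) / "index.json"
    if path.exists():
        return json.loads(path.read_text())
    index = build()  # the expensive step: embedding every document
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index))
    return index
```

On the first call `build` runs and the result is written to disk; every later call with the same `persist_dir` skips the embedding step entirely.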