This notebook:
1. Creates three query engines, one for each category of documents.
2. Uses a query router to route each query to the corresponding query engine.
3. Adds an agent that calls the query engines as tools.

In [1]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings, StorageContext, load_index_from_storage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.faiss import FaissVectorStore
from IPython.display import Markdown, display
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

import os

# Read the Hugging Face token from the environment instead of hardcoding it in the notebook.
HF_TOKEN = os.environ.get("HF_TOKEN")


In [2]:
import mlflow
import time
import dagshub
import faiss
import nest_asyncio
# import openai

nest_asyncio.apply()

In [3]:
# Connecting to DagsHub repo
dagshub.init(repo_owner='Omdena', repo_name='SriLankaChapter_RegulatoryDecisionMaking', mlflow=True)

In [4]:
# Connecting to Mlflow
mlflow.start_run(run_name = "LlamaIndex - Seventh experiment.")

<ActiveRun: >

In [23]:
# Ollama
Settings.llm = Ollama(model="llama3.2:1b", request_timeout=360.0)
# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

In [6]:
# Loading the documents
def load_documents(directory_path):
    reader = SimpleDirectoryReader(directory_path)
    documents = reader.load_data()
    return documents

In [24]:
# Build Vector Index for the documents
def build_index(documents):
    # Sentence-aware splitter: 512-token chunks with a 75-token overlap
    text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=75)
    Settings.text_splitter = text_splitter

    # Embedding dimension of BAAI/bge-base-en-v1.5
    d = 768
    faiss_index = faiss.IndexFlatL2(d)
    vector_store = FaissVectorStore(faiss_index=faiss_index)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        transformations=[text_splitter],
    )

    return index
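
The effect of `chunk_size=512` and `chunk_overlap=75` can be illustrated with a simplified sliding window. This is only a sketch: `chunk_with_overlap` is a hypothetical helper that works on a plain token list, whereas `SentenceSplitter` also respects sentence boundaries and uses a real tokenizer, so its chunks will differ.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=75):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous
    one, so consecutive windows share chunk_overlap tokens.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Illustrative input: 1000 dummy tokens.
tokens = [f"w{i}" for i in range(1000)]
chunks = chunk_with_overlap(tokens)
print([len(c) for c in chunks])
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing and embedding more text.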

In [8]:
# Save Into Faiss Vector Store
def save_index(index, dir_path):
    index.storage_context.persist(persist_dir=dir_path)

In [9]:
mlflow.log_param("embedding_model", "BAAI/bge-base-en-v1.5")
mlflow.log_param("LLM", "llama3.2:1b")
mlflow.log_param("Vector Store", "FAISS")

'FAISS'

In [25]:
# Load index from disk
def load_index(dir_path):
    vector_store = FaissVectorStore.from_persist_dir(dir_path)
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store, persist_dir=dir_path, 
    )
    index = load_index_from_storage(storage_context=storage_context)
    return index

In [11]:
# Query the Index
def query_index(index, user_input):
    query_engine = index.as_query_engine()

    start_time = time.time()
    response = query_engine.query(user_input)
    latency = time.time() - start_time
    print(response)
    return response, latency
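
The timing pattern in `query_index` recurs later in the notebook, so it could be factored into a small helper. The sketch below is an assumption, not part of the original code: `timed` and `fake_query` are hypothetical names, and it uses `time.perf_counter()`, which is monotonic and better suited to measuring intervals than `time.time()`.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_query(text):
    # Hypothetical stand-in for query_engine.query(text).
    time.sleep(0.01)
    return text.upper()


result, latency = timed(fake_query, "hello")
print(result, f"{latency:.3f}s")
```

In the notebook, the same helper could wrap `query_engine.query` or `agent.chat` so that every call is measured the same way.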

In [12]:
import logging

# Set up logging
# logging.basicConfig(level=logging.DEBUG)
logging.basicConfig(level=logging.ERROR)

In [13]:
circular_path = "./storage/circular_vectorstore"
regulation_path = "./storage/regulation_vectorstore"
other_path = "./storage/other_vectorstore"

In [33]:
# Forward slashes avoid invalid escape sequences such as "\d" and work on every OS.
circular_docs = load_documents("./data/TRI")
regulation_docs = load_documents("./data/Tea_Board")
other_docs = load_documents("./data/Others")

circular_index = build_index(circular_docs)
regulation_index = build_index(regulation_docs)
other_index = build_index(other_docs)

In [34]:
# Saving the indices.
save_index(circular_index, circular_path)
save_index(regulation_index, regulation_path)
save_index(other_index, other_path)

In [35]:
circular_index = load_index(circular_path)
regulation_index = load_index(regulation_path)
other_index = load_index(other_path)
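
Building, saving, and then reloading each index on every run repeats the expensive embedding step. One possible alternative, sketched below under stated assumptions, is to build only when nothing has been persisted yet. `load_or_build` is a hypothetical helper, and the JSON-based `build`, `save`, and `load` functions are stand-ins for the notebook's `build_index`, `save_index`, and `load_index`.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir, build_fn, load_fn, save_fn):
    """Load a persisted index if persist_dir has content, else build and save it."""
    persist_dir = Path(persist_dir)
    if persist_dir.is_dir() and any(persist_dir.iterdir()):
        return load_fn(persist_dir)
    index = build_fn()
    persist_dir.mkdir(parents=True, exist_ok=True)
    save_fn(index, persist_dir)
    return index


# Stand-ins for build_index / save_index / load_index (illustration only).
build_calls = []


def build():
    build_calls.append(1)
    return {"docs": 3}


def save(index, directory):
    (directory / "index.json").write_text(json.dumps(index))


def load(directory):
    return json.loads((directory / "index.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "circular_vectorstore"
    first = load_or_build(target, build, load, save)   # builds and saves
    second = load_or_build(target, build, load, save)  # loads from disk

print(first == second, len(build_calls))
```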

In [36]:
# Building Query Engines for different indices
circular_query_engine = circular_index.as_query_engine()
regulation_query_engine = regulation_index.as_query_engine()
other_query_engine = other_index.as_query_engine()

In [37]:
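# Creating Query Engine Tools for different Query Engines.
# Note: no name= is passed, so every tool gets the default name "query_engine_tool".
# Pass a distinct name= to each tool if the agent needs to tell them apart.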
circular_tool = QueryEngineTool.from_defaults(
    query_engine=circular_query_engine,
    description=(
        "Useful for retrieving specific context from the Circulars."
    ),
)
regulation_tool = QueryEngineTool.from_defaults(
    query_engine=regulation_query_engine,
    description=(
        "Useful for retrieving specific context from the Regulations."
    ),
)
other_tool = QueryEngineTool.from_defaults(
    query_engine=other_query_engine,
    description=(
        "Useful for retrieving specific context from the Other documents."
    ),
)

In [29]:
# Defining Router Query Engine
Settings.llm = Ollama(model="llama3.2:1b", request_timeout=360.0)
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        circular_tool,
        regulation_tool,
        other_tool,
    ],
    verbose=True
)
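
The dispatch structure of a single-selector router can be sketched without an LLM: a selector picks exactly one tool from the tool descriptions, and the router forwards the query to it. The sketch below swaps the LLM selector for a naive word-overlap score, so it shows only the control flow, not how `LLMSingleSelector` decides. `Tool`, `select_tool`, `route`, and the descriptions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def select_tool(query: str, tools: List[Tool]) -> int:
    """Return the index of the tool whose description shares the most words
    with the query. Ties go to the first tool."""
    query_words = set(query.lower().split())
    scores = [
        len(query_words & set(tool.description.lower().split()))
        for tool in tools
    ]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool]) -> Tuple[str, str]:
    """Select one tool and forward the query to it."""
    chosen = tools[select_tool(query, tools)]
    return chosen.name, chosen.run(query)


# Hypothetical descriptions, more specific than the ones used in this notebook.
tools = [
    Tool("circular_tool", "circulars on field sampling and fertilizer",
         lambda q: f"[circulars] {q}"),
    Tool("regulation_tool", "regulations and acts of the tea board",
         lambda q: f"[regulations] {q}"),
    Tool("other_tool", "other documents",
         lambda q: f"[others] {q}"),
]

name, answer = route("Which regulations apply to tea export?", tools)
print(name)
```

Whatever the selector, it only sees the descriptions, so descriptions that spell out what each document set covers give it more to work with than near-identical ones.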

In [30]:
# Querying using the Query Router
user_input = "What is the procedure for sampling on a level land?"
start_time = time.time()  
response = query_engine.query(user_input)
latency = time.time() - start_time
print(response)

Selecting query engine 2: Youth and Sports, Government of India, Ministry of Human Resource Development Letter No. (A)(VIII) of 2007 dated 21st November 2007 as per clause (iii) of Part-II, sub-clause (g), Circular on Guidelines for Conducting Field Experiments in Educational Institutions..
file_path: c:\Users\amulya\OneDrive\Desktop\ML Practice\Silanka Chapter\Development\LlamaIndex\data\Others\1958_20No_2002.o.txt

On a level land, the sampling procedure typically involves dividing the land into smaller sections or plots for assessment and evaluation. The process may include:

1. Planning: Identifying the specific areas to be sampled based on factors such as soil type, slope, and topography.
2. Selection of sampling units: Determining the number and size of plots or sections to be sampled, ensuring that they are representative of the land's characteristics.
3. Marking boundaries: Establishing clear boundaries for each sampling unit to prevent confusion and ensure a

In [38]:
# Building Agent
llm = Ollama(model="llama3.2:1b", request_timeout=360.0)
agent_worker = FunctionCallingAgentWorker.from_tools(
    [circular_tool, regulation_tool, other_tool],
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [39]:
# Querying using Agent
start_time = time.time()  
response = agent.chat(user_input)
latency = time.time() - start_time
print(response)

Added user message to memory: What is the procedure for sampling on a level land?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "procedures for sampling on level land"}
=== Function Output ===
For sample sampling on level land, the Board may consider the following procedures:

1. **Visual inspection**: The Board can visually inspect the land to determine if it is suitable for sampling. If the land is uneven or has any obstructions that could affect the accuracy of the sample, the Board may not allow the land for sampling.

2. **Elevation data collection**: If the land is already known to be level, the Board might decide to collect elevation data from nearby points on the land. This can help identify areas that are likely to have variable elevations and avoid sampling those areas.

3. **Geographic information system (GIS)**: The Board could use a GIS software to create a digital representation of the land and its topography. This can help identify pat

In [40]:
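# Log the latency of the most recent call (the agent query above).
# The router latency was overwritten, and no response-quality metric is logged here.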
mlflow.log_metric("response_latency", latency)
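
Because `latency` is reassigned by the agent query, only one of the two measurements reaches MLflow. A possible way to keep both is to store each latency under its own key and log them in a loop. This is a sketch: `log_latencies` is a hypothetical helper, the numbers are placeholders rather than measurements from this run, and a plain dict stands in for `mlflow.log_metric` so the example runs on its own.

```python
def log_latencies(latencies, log_metric):
    """Log each latency under its own metric name and return the names used."""
    names = []
    for stage, seconds in latencies.items():
        name = f"{stage}_latency"
        log_metric(name, seconds)
        names.append(name)
    return names


# Placeholder values for illustration only, not results from this notebook.
latencies = {"router": 1.0, "agent": 2.0}

# A dict stands in for the tracking backend here.
logged = {}
names = log_latencies(latencies, logged.__setitem__)
print(logged)
```

In the notebook, passing `mlflow.log_metric` as the second argument would record `router_latency` and `agent_latency` as separate metrics on the active run.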


In [41]:
# End the run
mlflow.end_run()

2024/10/22 16:21:45 INFO mlflow.tracking._tracking_service.client: 🏃 View run LlamaIndex - Seventh experiment. at: https://dagshub.com/Omdena/SriLankaChapter_RegulatoryDecisionMaking.mlflow/#/experiments/0/runs/d6d4d624399b46edb229c2fe4f65fa02.
2024/10/22 16:21:45 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/Omdena/SriLankaChapter_RegulatoryDecisionMaking.mlflow/#/experiments/0.
