In [46]:
%pip install spacy


### **Setup**

In [None]:
%pip install -r requirements.txt

In [None]:
%pip install wandb
%pip install openai
%pip install llama_index
%pip install python-dotenv


In [6]:
# Importing required libraries
import warnings
import os
import openai
from pathlib import Path
from dotenv import load_dotenv
from llama_index.llms import OpenAI
import wandb

# Configuring warnings and environmental variables
warnings.filterwarnings("ignore")
WANDB_PROJECT = "test_local_v2"

In [7]:
import pandas as pd

### **Read Documents**

In [None]:
# Loading the PDFReader from llama_index
from llama_index import VectorStoreIndex, download_loader

PDFReader = download_loader("PDFReader")
loader = PDFReader()
documents = loader.load_data(file=Path("./data/sample.pdf"))

### **Initialize W&B**

In [27]:
# Weights & Biases (W&B) is used for tracking experiments, visualizing data, and sharing insights. We initialize it here for our project.

from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, WandbCallbackHandler

wandb_args = {"project": WANDB_PROJECT, "name": "baseline-rag"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)
callback_manager = CallbackManager([wandb_callback])

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /home/codespace/.netrc
wandb: Streaming LlamaIndex events to W&B at https://wandb.ai/ai4energysaas/test_local_v2/runs/s4sd39c9
wandb: `WandbCallbackHandler` is currently in beta.
wandb: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.


### **LLM Model**

In [28]:
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

In [29]:
load_dotenv()
openai.api_key = os.getenv(
    "OPENAI_API_KEY"
)

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
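
If the key is missing, `os.getenv` silently returns `None`, and the problem may only surface later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for illustration, not part of `dotenv`, `openai`, or LlamaIndex.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this notebook, not a library API.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "add it to your .env file or export it in the shell."
        )
    return value
```

It could be used after `load_dotenv()` as `openai.api_key = require_env("OPENAI_API_KEY")`.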

In [None]:
# Test if the model is working
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)

### **Baseline RAG**

The ServiceContext in LlamaIndex bundles the resources used during indexing and querying, such as the LLM, the embedding model, and the callback manager. We set it up with the required configuration.

In [None]:
%pip install transformers

In [None]:
service_context = ServiceContext.from_defaults( # create a new service context
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5", # set the language model and the embedding model
    callback_manager=callback_manager
)

In [None]:
# Setting up the ServiceContext with the language model and embedding model
embed_model = "local:BAAI/bge-small-en-v1.5"
service_context = ServiceContext.from_defaults(
    llm=llm, 
    embed_model=embed_model, 
    callback_manager=callback_manager
)

The VectorStoreIndex in LlamaIndex is responsible for chunking the documents, embedding the chunks, and storing the resulting vectors. We create it here.
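
To make the three steps (chunk, embed, store and retrieve) concrete, here is a toy, dependency-free sketch. It is not how LlamaIndex implements them: the word-count "embedding", the fixed word-based chunk size, and every function name below are assumptions chosen only for illustration. A real embedding model such as `BAAI/bge-small-en-v1.5` returns dense float vectors instead.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8) -> list[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector, standing in for a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Return the `top_k` stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up text for illustration; the "store" is just a list of (chunk, vector) pairs
text = "Assistive technology supports communication. Contact numbers are listed."
store = [(chunk, embed(chunk)) for chunk in chunk_text(text, chunk_size=4)]
print(retrieve("contact numbers", store))  # ['Contact numbers are listed.']
```

The query engine built in the next cell performs the same retrieval step with real embeddings and then passes the retrieved chunks to the LLM.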

In [35]:
# Creating the VectorStoreIndex for document handling
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Converting the index to a query engine for retrieval
query_engine = index.as_query_engine()

wandb: Logged trace tree to W&B.


### **Query Engine**

In [36]:
# Defining a function to display responses
from llama_index.response.notebook_utils import display_response


def query_and_display(question):
    response = query_engine.query(question)
    display_response(response)

In [37]:
# English: "What is assistive technology for people with disabilities?"
query_and_display("O que é tecnologia assistiva a pessoas com deficiência?")

wandb: Logged trace tree to W&B.


**`Final Response:`** Assistive technology is a field of knowledge that promotes functionality and communication for people with disabilities. It encompasses low-tech resources, such as communication boards, and high-tech resources, such as switches, expanded keyboards, and communicators. These resources are intended for users with motor and/or intellectual impairments, pervasive developmental disorders, and other sensory disabilities. The goal of assistive technology is to facilitate inclusion at different levels: family, educational, and social.

In [38]:
# English: "How does the agency work?"
query_and_display("Como o orgão funciona?")

wandb: Logged trace tree to W&B.


**`Final Response:`** The agency works by offering low- and high-tech resources to users with motor and/or intellectual impairments, pervasive developmental disorders, and other sensory disabilities. It provides opportunities for people with disabilities to demonstrate their full potential and relies on partners who interpret and respond to their attempts at interaction and communication. The target audience consists of children and adults with deficits in oral and/or written communication, including neurological disorders acquired through cerebrovascular disease, traumatic brain injury, and movement disorders.

In [39]:
# English: "Who is it for?"
query_and_display("Para quem serve?")

wandb: Logged trace tree to W&B.


**`Final Response:`** The assistive technology service is for people with disabilities, including children and adults with motor or intellectual impairments, pervasive developmental disorders, and other sensory disabilities. It also serves people with neurological disorders acquired through cerebrovascular disease, traumatic brain injury, and movement disorders.

In [40]:
# Closing the W&B run after queries
wandb_callback.finish()



## **Evaluation**

We now move to the evaluation phase where we will assess the performance of our RAG setup using different metrics.

### **Generating Eval Questions**

To evaluate, we need questions. Writing them by hand is tedious, so let's use the DatasetGenerator already available in LlamaIndex, together with the GPT-3.5 API, to generate them for us. Alternatively, you can use a local LLM.

In [41]:
# Importing necessary modules for evaluation
import copy
import random
import nest_asyncio
import pandas as pd
from llama_index.evaluation import (
    DatasetGenerator,
    RelevancyEvaluator,
    ResponseEvaluator,
    RetrieverEvaluator,
)

In [43]:
# Initialize W&B for evaluation
embed_model="local:BAAI/bge-small-en-v1.5"
wandb_args = {"project": WANDB_PROJECT, "name": "eval-questions-generation"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)
callback_manager = CallbackManager([wandb_callback])
llm_eval = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
    llm=llm_eval, 
    embed_model=embed_model, 
    callback_manager=callback_manager
)

In [44]:
# Setting up the documents and generating questions for evaluation
random_documents = copy.deepcopy(documents)

# Shuffle the documents and keep 2 of them at random, just to make the evaluation quicker
random.shuffle(random_documents)
random_documents = random_documents[:2]
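
Because the shuffle above is unseeded, each run may evaluate a different pair of documents. A minimal sketch of a reproducible alternative follows; `sample_documents` is a hypothetical helper and the seed value 42 is arbitrary.

```python
import random


def sample_documents(docs: list, k: int = 2, seed: int = 42) -> list:
    """Return up to `k` items chosen at random, reproducibly, without mutating `docs`."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return rng.sample(docs, min(k, len(docs)))


pages = ["page-1", "page-2", "page-3", "page-4"]  # stand-ins for Document objects
subset = sample_documents(pages)
```

Note that `sample` returns references to the original objects rather than deep copies, so wrap the result in `copy.deepcopy` if the documents will be modified.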

In [47]:
# Generating questions from the documents for evaluation
data_generator = DatasetGenerator.from_documents(
    random_documents, service_context=service_context, num_questions_per_chunk=2
)

# Applying nest_asyncio to run async code in Jupyter
nest_asyncio.apply()
eval_questions = data_generator.generate_questions_from_nodes()

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Failed to log trace tree to W&B: list index out of range

wandb: Logged trace tree to W&B.


In [48]:
eval_questions[:3]

['What is the purpose of the Assistive Technology project in the field of disability support?',
 'How can individuals with acquired neurological disorders benefit from the services provided by the Centers for People with Disabilities?',
 'What are the contact numbers for the Centro Municipal de Referência da Pessoa com Deficiência in Vila Isabel, Irajá, Santa Cruz, and Mato Alto?']

Ideally, you want to save the evaluation questions as an artifact in W&B. This way you can easily view, share, and reuse them.

In [49]:
import wandb

In [50]:
# Persisting the questions to a text file in a W&B artifact, so they can be loaded later
# Create an artifact object
artifact = wandb.Artifact(name="eval-questions", type="text")

# Add the list of questions as a file to the artifact
with artifact.new_file("questions.txt", mode="w") as f:
    f.write("\n".join(eval_questions))

# Log the artifact to W&B
wandb.log_artifact(artifact)

<Artifact eval-questions>

In [None]:
# Look up the artifact (requires an active W&B run)
artifact = wandb.use_artifact("eval-questions:v0")

# Download the artifact contents to a local directory
artifact_dir = artifact.download()

# Read the list of questions from the downloaded file
with open(Path(artifact_dir) / "questions.txt", "r") as f:
    questions = f.read().split("\n")

# Print the list of questions
print(questions)
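
Splitting on `"\n"` assumes that no generated question contains a line break. A hedged alternative, sketched with only the standard library, is to store the list as JSON. The helper names `save_questions` and `load_questions` and the file name `questions.json` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_questions(questions: list[str], path: Path) -> None:
    """Write the questions as a JSON array, preserving any embedded newlines."""
    path.write_text(json.dumps(questions, ensure_ascii=False, indent=2), encoding="utf-8")


def load_questions(path: Path) -> list[str]:
    """Read the questions back from the JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Made-up questions for illustration; the second one contains a line break
sample = ["What is assistive technology?", "Which documents\nare required?"]
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "questions.json"
    save_questions(sample, file_path)
    restored = load_questions(file_path)
```

A file written this way could then be attached to the artifact with `artifact.add_file(...)` instead of `artifact.new_file(...)`.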

In [53]:
wandb_callback.finish()



### **Evaluation on the validation set**

In [54]:
# Initialize W&B for response evaluation
wandb_args = {"project": WANDB_PROJECT, "name": "baseline-evaluation"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)
callback_manager = CallbackManager([wandb_callback])

wandb: Streaming LlamaIndex events to W&B at https://wandb.ai/ai4energysaas/test_local_v2/runs/eg0eoweu
wandb: `WandbCallbackHandler` is currently in beta.
wandb: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.


In [55]:
# Preparing the data for evaluation
question_df = pd.DataFrame(columns=["questions"], data=eval_questions)
question_df.head()

| | questions |
|---|---|
| 0 | What is the purpose of the Assistive Technolog... |
| 1 | How can individuals with acquired neurological... |
| 2 | What are the contact numbers for the Centro Mu... |
| 3 | What documents are required for registration a... |


In [56]:
# Setup for evaluating the responses
llm_eval = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_eval = ServiceContext.from_defaults(
    llm=llm_eval, 
    callback_manager=callback_manager
)

In [None]:
# Running the evaluation using BatchEvalRunner
from llama_index.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

faithfulness_evaluator = FaithfulnessEvaluator(service_context=service_context_eval)
relevancy_evaluator = RelevancyEvaluator(service_context=service_context_eval)
runner = BatchEvalRunner(
    {"faithfulness": faithfulness_evaluator, "relevancy": relevancy_evaluator},
    workers=8,
)

In [None]:
import time

# Wait for 60 seconds to ensure the rate limit has reset
time.sleep(60)

# Retry the API call
eval_results = await runner.aevaluate_queries(index.as_query_engine(), queries=eval_questions)
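
A fixed 60-second sleep gives the call only one more chance and may wait longer than needed. A common alternative is to retry with exponential backoff. The sketch below is a generic, synchronous helper, so it suits one-at-a-time calls such as `query_engine.query(...)` rather than the awaited batch call above. The name `retry_with_backoff` and its parameters are hypothetical. In real use, `retry_on` would presumably be narrowed to the client's rate-limit exception (for example `openai.RateLimitError` in the v1 OpenAI client).

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `func()` until it succeeds, doubling the wait after each failure.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    between attempts and re-raises the last error once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

For example, `retry_with_backoff(lambda: query_engine.query(question), base_delay=20)` would wait 20, 40, 80, ... seconds between attempts.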

In [61]:
import time

from llama_index.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

# Method 1: if your OpenAI API plan is not rate-limited, you can evaluate
# in parallel with BatchEvalRunner to speed up the process.
# (`service_context_gpt4` must be defined first.)
# faithfulness_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)
# relevancy_gpt4 = RelevancyEvaluator(service_context=service_context_gpt4)

# runner = BatchEvalRunner(
#     {"faithfulness": faithfulness_gpt4, "relevancy": relevancy_gpt4},
#     workers=8,
# )

# eval_results = await runner.aevaluate_queries(
#     index.as_query_engine(), queries=eval_questions[0:5]
# )


# Method 2: query and evaluate the questions one by one, pausing between calls
def evaluate_query_engine(evaluator, query_engine, questions):
    """Return the number of passing responses and the per-question 0/1 results."""
    total_correct = 0
    all_results = []
    if isinstance(evaluator, FaithfulnessEvaluator):
        print("Use FaithfulnessEvaluator")
    elif isinstance(evaluator, RelevancyEvaluator):
        print("Use RelevancyEvaluator")

    for query in questions:
        print(f"Questions: {query}")
        response = query_engine.query(query)
        # Pass the query too: RelevancyEvaluator needs it to judge relevance
        feedback = evaluator.evaluate_response(query=query, response=response).feedback
        eval_result = 1 if "YES" in feedback else 0
        total_correct += eval_result
        all_results.append(eval_result)
        time.sleep(4)  # crude throttle to stay under the API rate limit

    return total_correct, all_results
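
The fixed pause in the loop above is a rough guess. The rate-limit error recorded further down in this notebook reports a limit of 3 requests per minute, which corresponds to at least 20 seconds between requests. A minimal sketch of a throttle that derives the pause from the limit follows; `MinIntervalThrottle` is a hypothetical class, and the injectable `clock` and `sleep` arguments are there only to make it easy to test.

```python
import time


class MinIntervalThrottle:
    """Keep consecutive calls at least `60 / requests_per_minute` seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to respect the minimum interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `throttle.wait()` before every request-bearing step would replace the fixed sleep. Keep in mind that a single question can trigger several API requests (one for the query and at least one for the evaluation), so each of them counts against the limit.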

In [63]:
# eval for Faithfulness/hallucination
query_engine = index.as_query_engine()
faithfulness_evaluator = FaithfulnessEvaluator(service_context=service_context)
total_correct, all_results = evaluate_query_engine(faithfulness_evaluator, query_engine, eval_questions)
print(f"Faithfulness:  Scored {total_correct} out of {len(eval_questions)} questions correctly.")

Use FaithfulnessEvaluator
Questions: What is the purpose of the Assistive Technology project in the field of disability support?

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Questions: How can individuals with acquired neurological disorders benefit from the services provided by the Centers for People with Disabilities?

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Questions: What are the contact numbers for the Centro Municipal de Referência da Pessoa com Deficiência in Vila Isabel, Irajá, Santa Cruz, and Mato Alto?

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Questions: What documents are required for registration at the Centro Municipal de Referência da Pessoa com Deficiência?

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Faithfulness:  Scored 4 out of 4 questions correctly.


In [59]:
eval_results = await runner.aevaluate_queries(
    index.as_query_engine(), queries=eval_questions
)

Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.

Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range

RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-GRsbNU0IHqzP3btNPHiUYBbY on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}

Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range
Failed to log trace tree to W&B: list index out of range


The current integration of wandb and LlamaIndex is not perfect, as the failed trace-logging messages above show, so we need a workaround to log our information properly. Fortunately, it is fairly easy: we just use the wandb library itself.

In [None]:
# Make a dataframe from the results.
faithfulness_df = pd.DataFrame.from_records(
    [eval_result.dict() for eval_result in eval_results["faithfulness"]]
)
relevancy_df = pd.DataFrame.from_records(
    [eval_result.dict() for eval_result in eval_results["relevancy"]]
)
relevancy_df.head()

In [None]:
# Save the questions, faithfulness_df, and relevancy_df to CSV.
# dropna(axis=1) drops every column that contains a missing value.
question_df.to_csv("questions.csv", index=False)
faithfulness_df.dropna(axis=1).to_csv("faithfulness.csv", index=False)
relevancy_df.dropna(axis=1).to_csv("relevancy.csv", index=False)
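
Alongside the CSV files, it can help to reduce each metric to a single pass rate. The sketch below assumes that each evaluation record carries a boolean `passing` field, as the records produced by `eval_result.dict()` are expected to. `pass_rate` is a hypothetical helper, and the example records are made up for illustration rather than taken from this notebook.

```python
def pass_rate(records: list[dict]) -> float:
    """Fraction of records whose `passing` field is truthy (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("passing")) / len(records)


# Made-up records for illustration only; not results from this notebook
example = {
    "faithfulness": [{"passing": True}, {"passing": True}],
    "relevancy": [{"passing": True}, {"passing": False}],
}
summary = {metric: pass_rate(rows) for metric, rows in example.items()}
print(summary)  # {'faithfulness': 1.0, 'relevancy': 0.5}
```

A dictionary like `summary` could then be passed to `wandb.log` within an active run, which is one way to record the scores through the wandb library directly.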