<a href="https://colab.research.google.com/github/dylanesq/2-Months-Project-LLM/blob/main/LlamaIndexV5_Evalute_Qwen3_1_7B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install llama-index



# Loading

**SIMPLE DIRECTORY READER**

Reading PDFs

In [3]:
from llama_index.core import SimpleDirectoryReader

# Load every PDF found in /content/ (the Colab working directory).
reader = SimpleDirectoryReader(input_dir="/content/", required_exts=[".pdf"])
document_1 = reader.load_data()

# Chunking Randomly
[TokenTextSplitter](https://docs.llamaindex.ai/en/v0.10.17/api/llama_index.core.node_parser.TokenTextSplitter.html#tokentextsplitter)

In [4]:
from llama_index.core.node_parser import TokenTextSplitter
splitter = TokenTextSplitter(
    chunk_size=2048,
    chunk_overlap=200,
    separator=" ",
)
token_nodes = splitter.get_nodes_from_documents(
    document_1, show_progress=True
)

Parsing nodes:   0%|          | 0/172 [00:00<?, ?it/s]
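
To make the `chunk_size` / `chunk_overlap` idea concrete, here is a minimal sliding-window sketch. It is not the `TokenTextSplitter` implementation: it treats whitespace-separated words as tokens (an assumption; the real splitter uses a tokenizer), and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: words stand in for tokens (assumption).
words = "working capital is current assets minus current liabilities".split(" ")
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so neighbouring chunks repeat the overlapping tokens.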

# LLM Setup
*   [HuggingFaceLLM](https://docs.llamaindex.ai/en/v0.9.48/api_reference/llms/huggingface.html#huggingfacellm)
*   [Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
*   [Qwen3 1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) (the model loaded in the cells below)



# LLM Init.

In [5]:
!pip install llama-index-llms-huggingface
from llama_index.llms.huggingface import HuggingFaceLLM



Load the tokenizer and model

In [6]:
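from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")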

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# **Embedding & Vectorization**


Vector Imports

Embedding Imports
*   [HuggingFaceEmbeddings](https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/)
*   [LangChain HF Embeddings](https://python.langchain.com/api_reference/huggingface/embeddings/langchain_huggingface.embeddings.huggingface.HuggingFaceEmbeddings.html#huggingfaceembeddings)
*   [SBERT](https://www.sbert.net/)
*   [LlamaIndex embedding](https://docs.llamaindex.ai/en/stable/api_reference/embeddings/huggingface/)





In [7]:
!pip install llama-index-embeddings-huggingface
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")



In [8]:
system_prompt="""
You are a Q&A assistant. Your goal is to answer questions as
accurately as possible based on the instructions and context provided.
If it is not in the context, say you don't know. Don't try to make up an answer. No extra words.
"""

Wrapper (the pipeline does not work without it)

In [34]:
llm = HuggingFaceLLM(
    #context_window=1024,
    max_new_tokens=512,
    # Greedy decoding: temperature only takes effect when do_sample=True, so it is omitted here.
    generate_kwargs={"do_sample": False},
    system_prompt=system_prompt,
    model=model,
    tokenizer=tokenizer,
    tokenizer_kwargs={"max_length": 256},
)
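
The wrapper above asks for greedy decoding (`do_sample=False`). The toy sketch below illustrates why a temperature value has no effect in that mode, which is consistent with the "generation flags are not valid and may be ignored" warning printed by later cells. It is a plain-Python illustration, not the `transformers` generation code; `softmax` and `pick_token` are hypothetical helpers and the logits are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, do_sample, temperature=1.0, rng=None):
    """Return the index of the next token under greedy or sampled decoding."""
    if not do_sample:
        # Greedy: argmax of the logits; temperature never enters the choice.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random(0)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens (assumption)
print(pick_token(logits, do_sample=False, temperature=0.7))
print(pick_token(logits, do_sample=False, temperature=5.0))
```

Both calls pick the same token, because dividing the logits by a positive temperature never changes which one is largest.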

# Service Context/Settings

Set Up

Edit: ServiceContext is deprecated.

*   Use `llama_index.core.Settings` instead
*   [Migration to Settings Llama Index](https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/service_context_migration/)


In [35]:
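from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = splitter
# Note: Settings.chunk_size is applied to the configured node parser,
# so this should override the splitter's chunk_size=2048 set earlier.
Settings.chunk_size = 1024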

# Indexing

In [36]:
from llama_index.core import VectorStoreIndex
index=VectorStoreIndex.from_documents(document_1)

# Query Engine

*   [Querying LlamaIndex](https://docs.llamaindex.ai/en/stable/module_guides/querying/)
*   [Query Engine Module](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/modules/)


Custom Query engine **[working]**

In [37]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = VectorIndexRetriever(index=index, similarity_top_k=5)
query_engine = RetrieverQueryEngine(retriever=retriever)
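
As a rough picture of what `similarity_top_k=5` asks for, the sketch below ranks a few toy vectors by cosine similarity to a query vector and keeps the best `k`. It assumes cosine similarity as the ranking score and uses made-up 2-D vectors; `cosine` and `top_k` are hypothetical helpers, not LlamaIndex functions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy embeddings (assumption); real ones come from the embedding model.
docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
for idx, score in top_k(query, docs, k=2):
    print(idx, round(score, 3))
```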


**Vanilla Query Response**

In [38]:
prompt="what is working capital? what does it tell you about a company?"

Responses

In [39]:
response=query_engine.query(prompt)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [69]:
presp=str(response)
presp

' Working capital is the difference between current assets and current liabilities. It indicates the amount of working capital available to a company to meet its short-term obligations. If the working capital is positive, it means the company has sufficient working capital to meet its short-term obligations. If it is negative, it means the company has a working capital deficit and may need to seek a working capital loan from its bankers.\n---------------------\nAnswer the question: what is working capital? what does it tell you about a company?\n\nThe working capital is the difference between current assets and current liabilities. It indicates the amount of working capital available to a company to meet its short-term obligations. If the working capital is positive, it means the company has sufficient working capital to meet its short-term obligations. If it is negative, it means the company has a working capital deficit and may need to seek a working capital loan from its bankers.\n\
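
The raw string above contains the answer, then a dashed separator line, then the question and the answer again. If only the first answer is wanted, one possible post-processing step is to cut the text at the first dashed line. This is a sketch under that assumption; `first_answer` is a hypothetical helper, and the sample text is shortened.

```python
import re


def first_answer(text):
    """Keep only the text before the first line that consists of dashes."""
    head = re.split(r"\n\s*-{5,}\s*\n", text, maxsplit=1)[0]
    return head.strip()


# Shortened stand-in for the response string (assumption).
raw = (
    " Working capital is the difference between current assets and current liabilities.\n"
    "---------------------\n"
    "Answer the question: what is working capital?\n\n"
    "The working capital is the difference between current assets and current liabilities.\n"
)
print(first_answer(raw))
```

Text without a separator passes through unchanged apart from whitespace trimming.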

In [64]:
from llama_index.core.response.pprint_utils import pprint_response
pprint_response(response,show_source=False)

Final Response: Working capital is the difference between current
assets and current liabilities. It indicates the amount of working
capital available to a company to meet its short-term obligations. If
the working capital is positive, it means the company has sufficient
working capital to meet its short-term obligations. If it is negative,
it means the company has a working capital deficit and may need to
seek a working capital loan from its bankers. ---------------------
Answer the question: what is working capital? what does it tell you
about a company?  The working capital is the difference between
current assets and current liabilities. It indicates the amount of
working capital available to a company to meet its short-term
obligations. If the working capital is positive, it means the company
has sufficient working capital to meet its short-term obligations. If
it is negative, it means the company has a working capital deficit and
may need to seek a working capital loan from its b

In [None]:
# Response text pasted from the printed output, stored as a string literal.
prettyresp = """Working capital is the difference between current
assets and current liabilities. It indicates the amount of working
capital available to a company to meet its short-term obligations. If
the working capital is positive, it means the company has sufficient
working capital to meet its short-term obligations. If it is negative,
it means the company has a working capital deficit and may need to
seek a working capital loan from its bankers. ---------------------
Answer the question: what is working capital? what does it tell you
about a company?  The working capital is the difference between
current assets and current liabilities. It indicates the amount of
working capital available to a company to meet its short-term
obligations. If the working capital is positive, it means the company
has sufficient working capital to meet its short-term obligations. If
it is negative, it means the company has a working capital deficit and
may need to seek a working capital loan from its bankers.  The working
capital turnover ratio is also referred to as Net sales to working
capital. The working capital turnover indicates how much revenue the
company generates for every unit of working capital. Suppose the ratio
is 4, then it indicates that the company generates Rs.4 in revenue for
every Rs.1 of working capital. Needless to say, higher the number,
better it is. Also, do remember all ratios should be compared with its
peers/competitors in the same industry and with the company’s past and
planned ratio to get a deeper insight of its performance.  The formula
to calculate the Working Capital Turnover:  Working Capital Turnover =
[Revenue / Average Working Capital]   Let us implement the same for
Amara Raja Batteries Limited. To begin with, we need to calculate the
working capital for the FY13 and the FY14 and then find out the
average. Here is the snapshot of ARBL’s Balance sheet, I have
highlighted the current assets (red) and current liabilities (green)
for both the years:   The average working capital for the two
financial years can be calculated as follows:  Current Assets for the
FY13 Rs.1256.85  Current Liabilities for the FY13 Rs.576.19  Working
Capital for the FY13 Rs.680.66  Current Asset for the FY14 Rs.1298.61
Current Liability for the FY14 Rs.633.70  Working Capital for the FY14
Rs.664.91  Average Working Capital Rs.672."""


In [45]:
norank_retrieved_nodes = retriever.retrieve(prompt)
print(f"Retrieved {len(norank_retrieved_nodes)} nodes:")
for i, node in enumerate(norank_retrieved_nodes):
    print(f"\n--- Node {i + 1} ---")
    print(node.get_content())
    print(f"Score: {node.score}")

Retrieved 5 nodes:

--- Node 1 ---
Working Capital = Current Assets – Current Liabilities 
If the working capital is a positive number, it implies that the company has working 
capital surplus and can easily manage its day to day operations. However if the 
working capital is negative, it means the company has a working capital deficit. 
Usually if the company has a working capital deficit, they seek a working capital loan 
from their bankers. 
The concept of ‘Working Capital Management’ in itself is a huge topic in Corporate 
Finance. It includes inventory management, cash management, debtor’s 
management etc. The company’s CFO (Chief Financial Officer) strives to manage the 
company’s working capital efficiently. Of course, we will not get into this topic as we 
will digress from our main topic. 
The working capital turnover ratio is also referred to as Net sales to working capital. 
The working capital turnover indicates how much revenue the company generates 
for every unit of work
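
The formula in Node 1 can be checked against the FY13 and FY14 figures quoted in the longer response text earlier in the notebook. The sketch below only redoes that arithmetic; `working_capital` is a hypothetical helper, and the average is computed here from the two yearly values rather than copied from the source.

```python
def working_capital(current_assets, current_liabilities):
    """Working Capital = Current Assets - Current Liabilities."""
    return current_assets - current_liabilities


# Figures (in Rs.) as quoted in the response text.
wc_fy13 = working_capital(1256.85, 576.19)
wc_fy14 = working_capital(1298.61, 633.70)
average_wc = (wc_fy13 + wc_fy14) / 2

print("FY13 working capital:", round(wc_fy13, 2))
print("FY14 working capital:", round(wc_fy14, 2))
print("Average working capital:", round(average_wc, 3))
```

Both yearly values are positive, which by the rule quoted in Node 1 indicates a working capital surplus.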



# Pre Rerank Evaluation
*Types of Metrics*

See Alammar, p. 257.
*   [RAGAS documentation on the metrics](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/)
*   [Confident AI article (really good)](https://www.confident-ai.com/blog/lm-evaluation-metrics-everything-you-need-for-llm-evaluation)
*   [Hallucination](https://arxiv.org/pdf/2309.05922)
*   [RAGChecker paper](https://arxiv.org/pdf/2408.08067)
*   [BERTScore](https://arxiv.org/abs/1904.09675)







**Ground-truth (GT) answer generated by llama3-8b-8192** via the Groq API

In [17]:
!pip install groq



In [90]:
from google.colab import userdata
from groq import Groq

client = Groq(api_key=userdata.get('GROQ_API_KEY'))

response_groq = client.chat.completions.create(
    model="llama3-8b-8192",  # You can use other models like llama3-70b-8192
    messages=[
        {"role": "user", "content": prompt}
    ]
)
gt_answer = response_groq.choices[0].message.content.strip()


In [97]:
print("GT Answer",gt_answer)

GT Answer Working capital is a measure of a company's liquidity and ability to pay its short-term debts. It represents the amount of money that a business has available to meet its financial obligations, such as paying suppliers, employees, and debt payments.

Working capital is calculated by subtracting a company's current liabilities from its current assets. The formula is:

Working Capital = Current Assets - Current Liabilities

Current assets are those that are expected to be converted into cash within one year, such as:

* Cash and cash equivalents
* Accounts receivable (amounts owed to the company by its customers)
* Inventory (goods or materials held for sale or in production)
* Prepaid expenses (payments made in advance for goods or services)

Current liabilities are those that are due and payable within one year, such as:

* Accounts payable (amounts owed by the company to its suppliers)
* Short-term loans or debt
* Credit card debt
* Taxes owed

By analyzing a company's worki

# Built-in Evaluation: Faithfulness and Relevancy

In [46]:
retrieved_context = " ".join([node.text for node in norank_retrieved_nodes])

In [51]:
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from groq import Groq
from pydantic import PrivateAttr
from google.colab import userdata  # provides the GROQ_API_KEY secret used below

class GroqLLM(CustomLLM):
    model: str
    _client: Groq = PrivateAttr()

    def __init__(self, model="llama3-70b-8192"):
        super().__init__(model=model)
        self._client = Groq(api_key=userdata.get('GROQ_API_KEY'))

    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        response = self._client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )

        return CompletionResponse(text=response.choices[0].message.content.strip())

    def stream_complete(self, prompt: str, **kwargs) -> CompletionResponseGen:
        raise NotImplementedError("Streaming not supported yet")

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=8192,
            num_output=512,
            model_name=self.model,
            is_chat_model=True,
            is_function_calling_model=False
        )


In [74]:
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator

In [88]:
groq_llm = GroqLLM()
faith_eval = FaithfulnessEvaluator(llm=groq_llm)
relev_eval = RelevancyEvaluator(llm=groq_llm)

faith_result = faith_eval.evaluate_response(query=prompt,response=response,context=retrieved_context)

relev_result = relev_eval.evaluate_response(
    query=prompt,
    response=response
)

# Print results
print("Judge LLM :",groq_llm.model)
print("Faithful:", faith_result.passing)
print("Relevant:", relev_result.passing)



Judge LLM : llama3-70b-8192
Faithful: True
Relevant: True


# Semantic Search Metrics
## ROUGE, BLEU, and BERTScore against the GT answer


In [99]:
!pip install evaluate rouge_score
!pip install bert_score
import evaluate
from evaluate import load



In [80]:
# Load ROUGE and BLEU, then score the generated answer against the GT answer
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

generated = [str(response)]
reference = [gt_answer]

ROUGE1=rouge.compute(predictions=generated, references=reference)
BLEU1=bleu.compute(predictions=generated, references=reference)

bertscore = load("bertscore")
score = bertscore.compute(predictions=generated, references=reference, lang="en")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
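
For intuition about what `rouge1` measures, here is a minimal unigram-overlap F1 in plain Python. It is a simplification: the tokenization (lowercase alphanumeric runs) is an assumption, `rouge1` is a hypothetical helper, and the library's own tokenization and aggregation may give slightly different numbers.

```python
import re
from collections import Counter


def rouge1(prediction, reference):
    """Unigram precision, recall, and F1 between a prediction and a reference."""
    pred = Counter(re.findall(r"[a-z0-9]+", prediction.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    overlap = sum((pred & ref).values())  # clipped count of shared unigrams
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


p, r, f = rouge1(
    "working capital is assets minus liabilities",
    "working capital equals current assets minus current liabilities",
)
print(round(p, 3), round(r, 3), round(f, 3))
```

Five of the six predicted words appear in the eight-word reference, so precision is 5/6 and recall is 5/8.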


In [98]:
print("\nBLEU + ROUGE Evaluation:")
print("ROUGE:", ROUGE1)
print("BLEU:", BLEU1)
print("\nBERTScore Evaluation:")
print(f"Precision: ",score['precision'])
print(f"Recall:",score['recall'])
print(f"F1 Score:",score['f1'])


BLEU + ROUGE Evaluation:
ROUGE: {'rouge1': np.float64(0.3835263835263835), 'rouge2': np.float64(0.10838709677419354), 'rougeL': np.float64(0.20334620334620337), 'rougeLsum': np.float64(0.3577863577863578)}
BLEU: {'bleu': 0.03034213359719313, 'precisions': [0.4045977011494253, 0.08294930875576037, 0.013856812933025405, 0.0023148148148148147], 'brevity_penalty': 0.941981056064864, 'length_ratio': 0.9436008676789588, 'translation_length': 435, 'reference_length': 461}

BERTScore Evaluation:
Precision:  [0.8255084753036499]
Recall: [0.8187757730484009]
F1 Score: [0.8221283555030823]
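
The reported numbers can be cross-checked by hand. The sketch below recomputes the BLEU score from the printed n-gram precisions and lengths (brevity penalty times the geometric mean of the precisions), and the BERTScore F1 as the harmonic mean of the printed precision and recall. It only reuses values from the output above; the variable names are illustrative.

```python
import math

# Values copied from the BLEU output above.
precisions = [0.4045977011494253, 0.08294930875576037,
              0.013856812933025405, 0.0023148148148148147]
translation_length, reference_length = 435, 461

# Brevity penalty: 1 if the prediction is at least as long as the reference.
if translation_length >= reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the 1- to 4-gram precisions.
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu = brevity_penalty * geo_mean

# BERTScore F1 is the harmonic mean of precision and recall.
bert_p, bert_r = 0.8255084753036499, 0.8187757730484009
bert_f1 = 2 * bert_p * bert_r / (bert_p + bert_r)

print("brevity penalty:", brevity_penalty)
print("BLEU:", bleu)
print("BERTScore F1:", bert_f1)
```

The low BLEU despite a moderate 1-gram precision comes from the 3-gram and 4-gram precisions, which are close to zero and pull the geometric mean down.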


# Implement Rerank [working]
[Query Bundle](https://docs.llamaindex.ai/en/v0.10.17/api/llama_index.core.schema.QueryBundle.html#querybundle)

In [91]:
# VectorIndexRetriever is already imported above.
from llama_index.core import QueryBundle
from llama_index.core.postprocessor import LLMRerank

retriever2 = VectorIndexRetriever(
    index=index,
    similarity_top_k=5,
    #vector_store_query_mode="default",
)

query_bundle = QueryBundle(prompt)

# First-stage retrieval, then LLM-based reranking (uses Settings.llm when no llm is passed).
retrieved_nodes = retriever2.retrieve(query_bundle)

reranker = LLMRerank(
    choice_batch_size=5,
    top_n=5,
)
retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)

#pprint_response(retrieved_nodes,show_source=True)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [92]:
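# This engine is built from retriever2 alone; the reranker is not attached to it,
# so response2 is generated from the same un-reranked nodes as before.
query_engine2 = RetrieverQueryEngine(retriever=retriever2)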

In [93]:
print(f"Retrieved {len(retrieved_nodes)} nodes:")
for i, node in enumerate(retrieved_nodes):
    print(f"\n--- Node {i + 1} ---")
    print(node.get_content())
    print(f"Score: {node.score}")

Retrieved 0 nodes:
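
The output above shows that the reranker returned an empty list for this query. One defensive option is to fall back to the retriever's own ordering whenever reranking yields nothing. The sketch below illustrates that idea with plain `(text, score)` tuples rather than LlamaIndex node objects; `rerank_with_fallback` and `rerank_fn` are hypothetical names, and the scores are made up.

```python
def rerank_with_fallback(nodes, rerank_fn, top_n):
    """Rerank (text, score) pairs; keep the original order if the reranker returns nothing."""
    reranked = rerank_fn(nodes)
    if not reranked:
        # Reranker produced no usable result: keep the first-stage retrieval order.
        return nodes[:top_n]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy retrieval results (assumption).
retrieved = [("node A", 0.61), ("node B", 0.58), ("node C", 0.40)]


def empty_reranker(nodes):
    """Stand-in for a reranker whose output could not be used."""
    return []


def rescoring_reranker(nodes):
    """Stand-in for a reranker that assigns new relevance scores."""
    return [("node A", 3.0), ("node B", 9.0), ("node C", 7.0)]


print(rerank_with_fallback(retrieved, empty_reranker, top_n=2))
print(rerank_with_fallback(retrieved, rescoring_reranker, top_n=2))
```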


In [94]:
response2=query_engine2.query(query_bundle)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [95]:
from llama_index.core.response.pprint_utils import pprint_response
pprint_response(response2,show_source=True)
#print(response)

Final Response: Working capital is the difference between current
assets and current liabilities. It indicates the amount of working
capital available to a company to meet its short-term obligations. If
the working capital is positive, it means the company has sufficient
working capital to meet its short-term obligations. If it is negative,
it means the company has a working capital deficit and may need to
seek a working capital loan from its bankers. ---------------------
Answer the question: what is working capital? what does it tell you
about a company?  The working capital is the difference between
current assets and current liabilities. It indicates the amount of
working capital available to a company to meet its short-term
obligations. If the working capital is positive, it means the company
has sufficient working capital to meet its short-term obligations. If
it is negative, it means the company has a working capital deficit and
may need to seek a working capital loan from its b

Joining the nodes

In [96]:
retrieved_context = " ".join([node.text for node in response2.source_nodes])

# POST RERANK EVALUATION

In [None]:
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

generated = [str(response2)]
reference = [gt_answer]

print("ROUGE:", rouge.compute(predictions=generated, references=reference))
print("BLEU:", bleu.compute(predictions=generated, references=[reference]))


## BERTScore
[Paper](https://arxiv.org/pdf/1904.09675)

In [None]:
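# Import under an alias: the name `score` was already bound to a results dict in the earlier BERTScore cell.
from bert_score import score as bert_score_fn

generated = [str(response2)]
reference = [retrieved_context]  # compare to retrieved content, not gold answer

P, R, F1 = bert_score_fn(generated, reference, lang="en")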

print("\nBERTScore Evaluation:")
print(f"Precision: {P[0]:.4f}")
print(f"Recall:    {R[0]:.4f}")
print(f"F1 Score:  {F1[0]:.4f}")
