# Installing and importing the packages

In [4]:
# Used to build the RAG pipeline and manage indexing, retrieval, and query processing
!pip install llama-index
# Enables the use of Hugging Face embedding models within the LlamaIndex framework
!pip install llama-index-embeddings-huggingface
# Allows fine-tuning and inference of language models using lightweight adapter techniques
!pip install peft
# Provides support for loading and running quantized (4-bit) language models efficiently
!pip install auto-gptq
# Offers optimized transformer model runtimes for improved inference performance
!pip install optimum
# Enables low-bit (8-bit/4-bit) matrix operations for memory-efficient LLM usage
!pip install bitsandbytes



In [5]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Loads a Hugging Face embedding model to convert text chunks into dense vector representations
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
# Settings manages global config, SimpleDirectoryReader loads documents from a folder, and VectorStoreIndex builds a vector index from them
from llama_index.core.retrievers import VectorIndexRetriever
# Retrieves the most relevant chunks from the vector index based on a query embedding
from llama_index.core.query_engine import RetrieverQueryEngine
# Combines a retriever and a language model to create a RAG pipeline that answers questions
from llama_index.core.postprocessor import SimilarityPostprocessor
# Filters out retrieved nodes whose similarity score to the query falls below a cutoff

# Settings

In [6]:
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Sets the embedding model used to convert text into vectors for indexing and retrieval
Settings.llm = None
# Disables the default LLM, so LlamaIndex is used for retrieval only (it falls back to MockLLM)
Settings.chunk_size = 256
# Defines the maximum number of tokens per text chunk when splitting
Settings.chunk_overlap = 25
# Specifies how many tokens should overlap between consecutive chunks

LLM is explicitly disabled. Using MockLLM.
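
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a sliding-window splitter over a token list. The helper `chunk_tokens` is hypothetical and only illustrates the idea; the real LlamaIndex splitter also tries to respect sentence boundaries, so actual chunk boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=25):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous one,
    so consecutive windows share chunk_overlap tokens.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy "document" of 600 tokens, represented by their positions
tokens = list(range(600))
chunks = chunk_tokens(tokens)
chunk_lengths = [len(c) for c in chunks]
print(chunk_lengths)
```

With the settings used here, a 600-token text yields three chunks, and the last 25 tokens of one chunk reappear at the start of the next.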


# Article into Vector DB

In [7]:
# Import Colab file upload module and the os module for file operations
from google.colab import files
import os

# Create a folder named "articles" if it doesn't already exist
os.makedirs("articles", exist_ok=True)

# Manually upload the PDF of the "LLM-as-a-Judge" article
uploaded = files.upload()

# Move each file into the "articles" folder
for filename in uploaded.keys():
    os.rename(filename, os.path.join("articles", filename))

Saving LLM.pdf to LLM.pdf


In [8]:
# Load all documents from the "articles" folder into a list of Document objects
documents = SimpleDirectoryReader("articles").load_data()

In [9]:
# Store into vector DB
index = VectorStoreIndex.from_documents(documents)

# Search function

In [10]:
# Specify the number of chunks to retrieve
top_k = 3

# Retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)

In [11]:
# Builds a query engine that uses the retriever to find relevant chunks and filters them based on a minimum similarity score of 0.5
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
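
The retrieval step can be pictured as "score every chunk, keep the `top_k` best, then drop anything under the cutoff". The sketch below is a toy version of that logic in plain Python. It assumes cosine similarity as the score, and the names `cosine_similarity`, `retrieve`, and `toy_chunks` are illustrative, not LlamaIndex APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunk_vecs, top_k=3, similarity_cutoff=0.5):
    """Return (name, score) pairs for the top_k chunks that also pass the cutoff."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)   # highest similarity first
    top = scored[:top_k]        # retriever step: keep the k best
    # Postprocessing step: discard chunks below the similarity cutoff
    return [(name, score) for score, name in top if score >= similarity_cutoff]


# Hypothetical 2-dimensional embeddings, for illustration only
toy_chunks = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [1.0, 1.0],
    "chunk_c": [0.0, 1.0],
    "chunk_d": [-1.0, 0.0],
}
results = retrieve([1.0, 0.0], toy_chunks)
print(results)
```

Because the cutoff is applied after the top-k selection, the result can hold fewer than `top_k` entries.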

# Retrieve the best docs

In [12]:
# query documents
query = "What are the two main challenges that hinder the widespread application of the 'LLM-as-a-Judge' approach?"
response = query_engine.query(query)

In [13]:
# Build the context from the source nodes that survived the similarity cutoff
# Iterating over the nodes (instead of range(top_k)) avoids an IndexError when fewer than top_k nodes remain
context = "Context:\n"
for node in response.source_nodes:
    context = context + node.text + "\n\n"
# Print the context to check that the retrieved passages are relevant
print(context)

Context:
As a solution to this persistent dilemma,
“LLM-as-a-Judge” has emerged as a promising idea to combine the strengths of the above two
evaluation methods. Recent studies have shown that this idea can merges the scalability of automatic
methods with the detailed, context-sensitive reasoning found in expert judgments [19, 81, 163, 210,
220]. Moreover, LLMs may become sufficiently flexible to handle multimodal inputs [ 18] under
appropriate prompt learning or fine-tuning [64]. These advantages suggest that the LLM-as-a-Judge
approach could serve as a novel and broadly applicable paradigm for addressing complex and
open-ended evaluation problems.
LLM-as-a-Judge holds significant potential as a scalable and adaptable evaluation framework
compared to aforementioned two traditional methods [160]. However, the widespread application
of this idea is hindered by two key challenges. The first challenge lies in the absence of a systematic
review, which highlights the lack of formal definiti

# Import the LLM we need to generate the response

In [14]:
from peft import PeftModel, PeftConfig
# 'peft' (Parameter-Efficient Fine-Tuning) is a library for training and loading lightweight adapters on top of a frozen base model
# PeftModel wraps a base model with trained adapter weights, and PeftConfig holds the adapter's configuration
from transformers import AutoModelForCausalLM, AutoTokenizer
# 'transformers' is a library which simplifies working with pre-trained language models
# AutoModelForCausalLM is used to load a model that generates text
# AutoTokenizer helps in converting text into tokens that the model can understand

# Load a pre-trained language model from Hugging Face Model Hub
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
# model_name identifies the checkpoint to load (here, a GPTQ-quantized build of Mistral 7B Instruct v0.2)

# 'AutoModelForCausalLM.from_pretrained' loads this model and prepares it for generating text
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",  # Automatically places the model on the available device(s)
                                             trust_remote_code=False,  # Do not execute custom Python code shipped in the model repository
                                             revision="main")  # Load from the repository's "main" branch

# Load the adapter configuration
config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
# PeftConfig reads the adapter's configuration from the "shawhin/shawgpt-ft" repository (useful for inspection; the next call loads it again on its own)

# Attach the fine-tuned adapter to the base model
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")
# This wraps the pre-trained model with the adapter weights from "shawhin/shawgpt-ft"; the base weights themselves stay unchanged

# Load the tokenizer for the model (helps convert text into a format the model can process)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# AutoTokenizer loads a tokenizer that converts regular text into tokens (pieces of words or symbols)
# 'use_fast=True' makes the tokenizer work faster by using optimized methods
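
To give an intuition for what "applying an adapter" means, the sketch below assumes a LoRA-style adapter, one common PEFT method. The source does not state which method `shawhin/shawgpt-ft` uses, so treat this as an illustration only. In LoRA, the frozen weight matrix `W` is combined with a low-rank update `B @ A` scaled by `alpha / r`. The helper names and the tiny matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA-adapted layer effectively uses."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as W
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Frozen 3x3 base weight and a rank-1 adapter (B is 3x1, A is 1x3)
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]

w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=2, r=1)
print(w_eff)
```

The adapter stores only `A` and `B`, which is why the adapter download is far smaller than the base model.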


# Use the LLM with instructions

In [15]:
# Instructions for an AI analyzing scientific articles
instructions_string = """You are an AI system designed to analyze and interpret scientific articles. Your responses should focus on providing clear, accurate explanations of scientific concepts and findings, adjusting the level of detail based on the complexity of the article. If necessary, you can explain technical terms in simple language to ensure accessibility while maintaining scientific rigor. Provide insights, summarize key points, and clarify any ambiguities in the article.

Please respond to the following comment.
"""
# Wraps the instructions and the user's comment in the [INST] ... [/INST] tags expected by Mistral Instruct models
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment} \n[/INST]'''


In [17]:
# Comment input
comment = "What are the two main challenges that hinder the widespread application of the 'LLM-as-a-Judge' approach?"

# Prompt generation to see what we send to the LLM
prompt = prompt_template(comment)
print(prompt)

[INST] You are an AI system designed to analyze and interpret scientific articles. Your responses should focus on providing clear, accurate explanations of scientific concepts and findings, adjusting the level of detail based on the complexity of the article. If necessary, you can explain technical terms in simple language to ensure accessibility while maintaining scientific rigor. Provide insights, summarize key points, and clarify any ambiguities in the article.

Please respond to the following comment.
 
What are the two main challenges that hinder the widespread application of the 'LLM-as-a-Judge' approach? 
[/INST]
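
Note that the prompt printed above contains only the instructions and the question; the `context` string built from the retrieved chunks is not part of it. One way the context could be spliced in is sketched below. The helper `build_rag_prompt`, the extra "Use the context above" sentence, and the placeholder context are assumptions for illustration, not part of the original notebook.

```python
def build_rag_prompt(instructions, context, comment):
    """Wrap instructions, retrieved context, and the question in [INST] tags."""
    return (
        f"[INST] {instructions}\n\n"
        f"{context}\n"
        "Please respond to the following comment. Use the context above if it is helpful.\n\n"
        f"{comment}\n[/INST]"
    )


demo_prompt = build_rag_prompt(
    instructions="You are an AI system designed to analyze and interpret scientific articles.",
    # Placeholder: in the notebook this would be the `context` string built from the source nodes
    context="Context:\n<retrieved chunk text goes here>\n",
    comment="What are the two main challenges that hinder the widespread application of the 'LLM-as-a-Judge' approach?",
)
print(demo_prompt)
```

Placing the context before the question lets the model ground its answer in the retrieved passages instead of relying only on what it learned during training.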


In [19]:
# Switch the model to evaluation mode (disables dropout)
model.eval()

# The tokenizer converts the prompt into token IDs ('input_ids') and an 'attention_mask'
# that marks which positions hold real tokens; both are moved to the device the model is on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate at most 280 new tokens. Passing the attention mask and an explicit pad token ID
# avoids the warnings that generate() otherwise emits
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=280,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back into human-readable text
print(tokenizer.batch_decode(outputs)[0])


<s> [INST] You are an AI system designed to analyze and interpret scientific articles. Your responses should focus on providing clear, accurate explanations of scientific concepts and findings, adjusting the level of detail based on the complexity of the article. If necessary, you can explain technical terms in simple language to ensure accessibility while maintaining scientific rigor. Provide insights, summarize key points, and clarify any ambiguities in the article.

Please respond to the following comment.
 
What are the two main challenges that hinder the widespread application of the 'LLM-as-a-Judge' approach? 
[/INST] The commenter is referring to the "LLM-as-a-Judge" approach, which is a method for using large language models (LLMs) to make legal judgments. The two main challenges that hinder the widespread application of this approach are:

1. Lack of transparency and explainability: LLMs make decisions based on complex patterns and relationships in data, which can be difficult