# RAG system evaluation

In this notebook we will explore how our RAG system behaves as we change two of its parameters:

1. `top_k_retrieve`
2. `top_k_rank`

We will measure:
1. `response time`
2. `result quality`

We will also try a newer approach, [LLM-as-a-Judge](https://huggingface.co/learn/cookbook/en/llm_judge), using [Ragas](https://docs.ragas.io/en/stable/) for evaluation.
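
To make the LLM-as-a-Judge idea concrete, here is a minimal sketch that is independent of Ragas: a judge prompt plus a parser for the score the judge returns. The prompt wording, the 1 to 5 scale, and the names `build_judge_prompt` and `parse_judge_score` are assumptions made for illustration; they are not part of this project or of the Ragas API.

```python
import re
from typing import Optional

# Hypothetical judge prompt; the wording and the 1-5 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading an answer produced by a RAG system.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate how well the answer addresses the question on a scale from 1 to 5.\n"
    "Reply with a line of the form 'Score: <number>'."
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template with one question/answer pair."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_judge_score(judge_reply: str) -> Optional[int]:
    """Extract the integer after 'Score:'; return None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", judge_reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 5 else None
```

In this notebook the prompt could be sent to a judge model through `RagChatUser.ask_llm`, and the reply passed to `parse_judge_score`. Returning `None` for malformed replies keeps unparseable judgments out of the averages instead of silently counting them as a score.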

We will use LLMs from OpenAI and from [Nebius AI Studio API](https://studio.nebius.ai/).

A demo of Nebius AI Studio usage is [here](https://github.com/2001092236/LLM-engineering/blob/main/week_1/LLM_engineering_Nebius_AI_Studio_Demo.ipynb).

For a more detailed description, see my [GitHub](https://github.com/2001092236/LLM-engineering/blob/main/week_2/9.%20Practice%20RAG%20project.md).

The process:

1. Create a set of $N=10$ questions about HF internals and save them to a `.csv` file.
2. Measure relevance with [Ragas](https://docs.ragas.io/en/stable/).
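
The first step can be sketched with the standard `csv` module. This is a minimal example under assumptions: the single `question` column and the helper names `save_questions` and `load_questions` are hypothetical, and the actual file layout used in the project may differ.

```python
import csv
from pathlib import Path


def save_questions(path: Path, questions: list[str]) -> None:
    """Write one question per row under a 'question' header (assumed column name)."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question"])
        writer.writerows([q] for q in questions)


def load_questions(path: Path) -> list[str]:
    """Read the questions back in file order."""
    with path.open(newline="", encoding="utf-8") as f:
        return [row["question"] for row in csv.DictReader(f)]
```

Going through `csv` rather than joining strings by hand keeps questions that contain commas or quotes intact on the round trip.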

## 1. Set up functions and libraries

In [None]:
# query: str - the text query
# chat_id: Optional[str] - chat id used to maintain conversation history
# model: str = "gpt-4o-mini" - name of the LLM
# use_reranker: bool = True - whether to rerank after retrieval
# top_k_retrieve: int = 20 - number of items to retrieve
# top_k_rank: int = 4 - number of items to return after reranking
# max_out_tokens: int = 512 - max number of tokens. Used to drop retrieved pieces
#     of content if the total prompt length goes beyond the max context length.
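
The parameters listed above map naturally onto a small request object. The sketch below is a hypothetical `RagRequest` dataclass (not part of the service) that reuses the defaults from the list and adds a sanity check that `top_k_rank` does not exceed `top_k_retrieve`. That check is an assumption about sensible usage, not documented service behavior.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RagRequest:
    """Hypothetical container for the /rag/ request body; defaults follow the list above."""
    query: str
    chat_id: Optional[str] = None
    model: str = "gpt-4o-mini"
    use_reranker: bool = True
    top_k_retrieve: int = 20
    top_k_rank: int = 4
    max_out_tokens: int = 512

    def __post_init__(self) -> None:
        if not self.query:
            raise ValueError("query must not be empty")
        # Assumed constraint: reranking cannot return more items than were retrieved.
        if self.top_k_rank > self.top_k_retrieve:
            raise ValueError("top_k_rank must not exceed top_k_retrieve")

    def to_payload(self) -> dict:
        """Return a JSON-serializable dict suitable for `json=` in an HTTP client."""
        return asdict(self)
```

Validating the combination once, at construction time, means a parameter sweep fails fast on a bad grid point instead of after a 60-second request timeout.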

In [139]:
import httpx
from collections import defaultdict

class RagChatUser:
    
    def __init__(self, user_data: dict) -> None:

        # Known fields default to None so attribute access never fails.
        all_fields = (
            'api_key',
            'username',
            'password',
        )
        for k in all_fields:
            setattr(self, k, None)

        # Override the defaults with whatever the caller supplied.
        for k, v in user_data.items():
            setattr(self, k, v)

        # An existing API key means the user is already registered.
        if self.api_key is not None:
            return

        # Otherwise sign up with username and password to obtain a key.
        username, password = user_data['username'], user_data['password']
        with httpx.Client() as client:
            try:
                response = client.post(
                    "http://0.0.0.0:8001/signup/",
                    data={
                        # 'grant_type': 'password',
                        "username": username,
                        "password": password,
                        # 'scope': '',                # Optional, can be left empty
                        # 'client_id': 'string',      # Replace with actual client ID
                        # 'client_secret': 'string'   # Replace with actual client secret
                    },
                    headers={'accept': 'application/json',
                                'Content-Type': 'application/x-www-form-urlencoded'},
                    timeout=httpx.Timeout(60.0),
                )
            except httpx.ConnectError:
                raise Exception("Failed to connect to chat service")

        if response.status_code != 200:
            print("Error: failed to create a new user. Try another username.")
            raise Exception(response.json()["detail"])
        
        print(f"{response.json()['message']}")

        self.api_key = response.json()['api_key']
        print(response.json())



    def ask_rag(self,
                  query: str,
                  chat_id: str | None = None,
                  use_reranker: bool = False, 
                  top_k_retrieve: int = 20, 
                  top_k_rank: int = 3, 
                  model: str = 'gpt-4o-mini',
                  ) -> tuple[str, float]:
        
        if len(query) == 0:
            raise Exception("The request is empty, please type something in")

        with httpx.Client() as client:
            try:
                response = client.post(
                    "http://0.0.0.0:8001/rag/",
                    json={"query": query,
                            "chat_id": chat_id,
                            "model": model,
                            "use_reranker": use_reranker, 
                            "top_k_retrieve": top_k_retrieve, 
                            "top_k_rank": top_k_rank,
                            "max_out_tokens": 512},
                    
                    timeout=httpx.Timeout(60.0),
                    headers={"Authorization": self.api_key,
                                'Content-Type': 'application/json',
                                'accept': 'application/json',
                                },
                )
            except httpx.ConnectError:
                raise Exception("Failed to connect to RAG service")

        if response.status_code != 200:
            raise Exception(response.json()["detail"])
        

        return response.json()['response'], response.elapsed.total_seconds()

    async def async_ask_rag(self,
                  query: str,
                  chat_id: str | None = None,
                  use_reranker: bool = False, 
                  top_k_retrieve: int = 20, 
                  top_k_rank: int = 3, 
                  model: str = 'gpt-4o-mini',
                  ) -> tuple[str, float]:
        
        if len(query) == 0:
            raise Exception("The request is empty, please type something in")

        async with httpx.AsyncClient() as client:
            try:
                response = await client.post(
                    "http://0.0.0.0:8001/rag/",
                    json={"query": query,
                            "chat_id": chat_id,
                            "model": model,
                            "use_reranker": use_reranker, 
                            "top_k_retrieve": top_k_retrieve, 
                            "top_k_rank": top_k_rank,
                            "max_out_tokens": 512},
                    
                    timeout=httpx.Timeout(60.0),
                    headers={"Authorization": self.api_key,
                                'Content-Type': 'application/json',
                                'accept': 'application/json',
                                },
                )
            except httpx.ConnectError:
                raise Exception("Failed to connect to RAG service")

        if response.status_code != 200:
            raise Exception(response.json()["detail"])
        

        return response.json()['response'], response.elapsed.total_seconds()
    
    def ask_llm(self,
                        query: str,
                        model: str = 'gpt-4o-mini',
                        chat_id: str | None = None,
                        ) -> str:
        
        if len(query) == 0:
            raise Exception("The request is empty, please type something in")

        with httpx.Client() as client:
            try:
                response = client.post(
                    "http://0.0.0.0:8001/chat/",
                    json={"message": query,
                            "chat_id": chat_id,
                            "model": model,
                            },
                    
                    timeout=httpx.Timeout(60.0),
                    headers={"Authorization": self.api_key,
                                'Content-Type': 'application/json',
                                'accept': 'application/json',
                                },
                )
            except httpx.ConnectError:
                raise Exception("Failed to connect to chat service")

        if response.status_code != 200:
            raise Exception(response.json()["detail"])
        

        return response.json()['response']

    async def async_ask_llm(self,
                        query: str,
                        model: str = 'gpt-4o-mini',
                        chat_id: str | None = None,
                        ) -> str:
        
        if len(query) == 0:
            raise Exception("The request is empty, please type something in")

        async with httpx.AsyncClient() as client:
            try:
                response = await client.post(
                    "http://0.0.0.0:8001/chat/",
                    json={"message": query,
                            "chat_id": chat_id,
                            "model": model,
                            },
                    
                    timeout=httpx.Timeout(60.0),
                    headers={"Authorization": self.api_key,
                                'Content-Type': 'application/json',
                                'accept': 'application/json',
                                },
                )
            except httpx.ConnectError:
                raise Exception("Failed to connect to chat service")

        if response.status_code != 200:
            raise Exception(response.json()["detail"])
        

        return response.json()['response']
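
The async methods make it possible to send several questions at once. Below is a minimal sketch, assuming a coroutine with the same call shape as `async_ask_rag` (a question in, a `(response, seconds)` tuple out). The names `ask_many`, `fake_async_ask`, and `max_concurrency` are hypothetical, and a stub stands in for the real service so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable

AsyncAskFn = Callable[[str], Awaitable[tuple[str, float]]]


async def ask_many(
    ask: AsyncAskFn,
    questions: list[str],
    max_concurrency: int = 4,
) -> list[tuple[str, float]]:
    """Run `ask` on every question with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(question: str) -> tuple[str, float]:
        async with semaphore:
            return await ask(question)

    # gather returns results in the same order as `questions`.
    return await asyncio.gather(*(guarded(q) for q in questions))


async def fake_async_ask(question: str) -> tuple[str, float]:
    """Offline stand-in for `async_ask_rag`: canned answer, zero latency."""
    await asyncio.sleep(0)
    return f"answer to: {question}", 0.0
```

With a `RagChatUser` instance `chat`, the call inside a notebook cell would be `await ask_many(chat.async_ask_rag, questions)`. Concurrent requests can inflate per-request latency, so sequential calls are probably the safer choice when response time itself is being measured.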

In [140]:
import os
from openai import OpenAI, AsyncOpenAI
from collections import defaultdict
import requests
import json

# API keys are read from environment variables rather than hard-coded.
# The variable names below are assumptions; set them to match your setup.
chat_gpt_configs = {
    'proxyai': {'base_url': 'https://api.proxyapi.ru/openai/v1', 'api_key': os.environ.get('PROXYAPI_API_KEY')},
    'nebius': {'base_url': 'https://api.studio.nebius.ai/v1/', 'api_key': os.environ.get('NEBIUS_API_KEY')},
}


## 2. `top_k_retrieve` and `top_k_rank` analysis

In [141]:
chat = RagChatUser({'api_key': 'token-2447608109062291213'})

In [142]:
res = chat.ask_llm(
    query="Give me a code to train HF model. 5 lines of code",
    model='llama-3-70b',
)

In [143]:
res

'Here\'s a simple example of training a Hugging Face (HF) model using the `Trainer` class:\n\n```python\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nfrom transformers import Trainer, TrainingArguments\nfrom datasets import load_dataset\n\n# Load dataset and model\ndataset = load_dataset("glue", "sst2")\nmodel = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")\ntokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")\n\n# Define training arguments and trainer\ntraining_args = TrainingArguments("test_trainer", num_train_epochs=3, per_device_train_batch_size=16, per_device_eval_batch_size=64, evaluation_strategy="epoch", learning_rate=5.0e-5)\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset["train"], eval_dataset=dataset["validation"], compute_metrics=lambda pred: {"accuracy": (pred.label_ids == (pred.label_ids > 0.5)).mean()})\n```\nThis code doesn\'t actually train the model - you would nee

In [144]:
res_rag = chat.ask_rag(
    query="Give me a code to train HF model. 5 lines of code",
    model='llama-3-70b',
)

In [145]:
res_rag

('Here is a code snippet to train a Hugging Face (HF) model based on the provided context:\n\n```python\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained("model_checkpoint")\ntokenizer = AutoTokenizer.from_pretrained("model_checkpoint")\n\ntrain_dataset = tokenizer(train_text, return_tensors="pt", truncation=True, padding="max_length")\ntrainer = AutoTrainer(\n    model=model,\n    train_dataset=train_dataset,\n    dataset_text_field=\'text\',\n    max_seq_length=512,\n)\ntrainer.train()\n```',
 4.279853)
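
Since `ask_rag` returns a `(response, elapsed_seconds)` tuple, a parameter sweep reduces to calling it over a grid and averaging the timings. The following is a sketch under stated assumptions: `sweep_latency` and `fake_ask` are hypothetical names, pairs with `top_k_rank > top_k_retrieve` are skipped as presumably meaningless, and a stub replaces the real service so the code runs offline.

```python
from itertools import product
from statistics import mean
from typing import Callable

# Any callable shaped like RagChatUser.ask_rag: keyword args in, (response, seconds) out.
AskFn = Callable[..., tuple[str, float]]


def sweep_latency(
    ask: AskFn,
    questions: list[str],
    retrieve_values: list[int],
    rank_values: list[int],
) -> dict[tuple[int, int], float]:
    """Return the mean response time per (top_k_retrieve, top_k_rank) pair."""
    results: dict[tuple[int, int], float] = {}
    for k_retrieve, k_rank in product(retrieve_values, rank_values):
        if k_rank > k_retrieve:
            continue  # assumed invalid: cannot keep more items than were retrieved
        timings = [
            ask(query=q, use_reranker=True, top_k_retrieve=k_retrieve, top_k_rank=k_rank)[1]
            for q in questions
        ]
        results[(k_retrieve, k_rank)] = mean(timings)
    return results


def fake_ask(query: str, use_reranker: bool, top_k_retrieve: int, top_k_rank: int) -> tuple[str, float]:
    """Offline stub: pretends latency grows linearly with the number of retrieved items."""
    return f"answer to: {query}", 0.5 * top_k_retrieve
```

With the real client, `chat.ask_rag` can be passed in place of `fake_ask`. The resulting dictionary is keyed by parameter pair, which makes it easy to turn into a table or heatmap of response time.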

In [138]:
res

"To train a model using Hugging Face's `transformers` library, you can follow these steps. Below is a minimal example assuming you have a dataset ready and you're using a PyTorch model like BERT for a text classification task:\n\n```python\nfrom transformers import BertForSequenceClassification, Trainer, TrainingArguments\n\nmodel = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)\ntraining_args = TrainingArguments(output_dir='./results', num_train_epochs=3, per_device_train_batch_size=8)\ntrainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)\n\ntrainer.train()\n```\n\nThis code assumes that you have a dataset `train_dataset` already loaded and preprocessed. Adjust `num_labels`, `num_train_epochs`, and batch size as needed for your specific task."

## 3. comparison of 3 LLMs

## 4. comparison of 3 LLMs

## 5. different embedding model

## 6. multi-turn conversation