# **Preparations**

*The backend code is based on the Open-Source AI Cookbook documentation:*

https://huggingface.co/learn/cookbook/rag_evaluation


In [None]:
!pip install -q torch transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets

In [None]:
%reload_ext autoreload
%autoreload 2

In [None]:
%pip install ragatouille



In [None]:
from tqdm.auto import tqdm
import pandas as pd
from typing import Optional, List, Tuple
import json
import datasets

# prevent pandas from truncating long text when displaying columns
pd.set_option("display.max_colwidth", None)

In [None]:
from huggingface_hub import notebook_login

# log in with the Hugging Face Hub API token
notebook_login()


# **Loading the dataset**

### Loading the Wikipedia files

In [None]:
ds = datasets.load_dataset("wikipedia", "20220301.simple", split="train")

In [None]:
ds

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument

langchain_docs = [LangchainDocument(page_content=doc["text"], metadata={"source": doc["title"]}) for doc in tqdm(ds)]
# wrap each text and its source in a list of LangchainDocument objects; for this dataset the source is taken from the `title` column

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
# a splitting algorithm that cuts the text into chunks according to the parameters above,
# trying the listed separators in order to break the text into smaller pieces


docs_processed = []
for doc in langchain_docs:
    docs_processed += text_splitter.split_documents([doc])

# once the splitter is configured, we iterate over the document list we created and split every document
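
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries the separators in order); `chunk_with_overlap` is a hypothetical helper written only for illustration.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters sharing `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample_chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(sample_chunks)
```

Each chunk starts with the last character of the previous one, which is the effect the overlap is meant to have: text near a chunk boundary is seen in both neighbouring chunks.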

### Alternative loading - original files

In [None]:
ds = datasets.load_dataset("m-ric/huggingface_doc", split="train")
# load the dataset

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument

langchain_docs = [LangchainDocument(page_content=doc["text"], metadata={"source": doc["source"]}) for doc in tqdm(ds)]
# wrap each text and its source in a list of LangchainDocument objects

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
# a splitting algorithm that cuts the text into chunks according to the parameters above,
# trying the listed separators in order to break the text into smaller pieces


docs_processed = []
for doc in langchain_docs:
    docs_processed += text_splitter.split_documents([doc])

# once the splitter is configured, we iterate over the document list we created and split every document

  0%|          | 0/2647 [00:00<?, ?it/s]

In [None]:
from huggingface_hub import InferenceClient


repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

llm_client = InferenceClient(
    model=repo_id,
    timeout=120,
)
# instantiate InferenceClient from the huggingface_hub library, passing the Mistral AI model as a parameter,
# and set the request timeout in seconds

def call_llm(inference_client: InferenceClient, prompt: str):
    response = inference_client.post(
        json={
            "inputs": prompt,
            "parameters": {"max_new_tokens": 1000},
            "task": "text-generation",
        },
    )
    return json.loads(response.decode())[0]["generated_text"]

# helper function that sends the prompt to the model and returns the generated text

call_llm(llm_client, "This is a test context")

'This is a test context for the `@mui/material` library.\n\n## Installation\n\n```sh\nnpm install @mui/material\n```\n\n## Usage\n\n```jsx\nimport React from \'react\';\nimport { Button } from \'@mui/material\';\n\nfunction App() {\n  return (\n    <div className="App">\n      <Button variant="contained" color="primary">\n        Hello World\n      </Button>\n    </div>\n  );\n}\n\nexport default App;\n```\n\n## Documentation\n\n- [Material-UI](https://material-ui.com/)\n- [Material Design](https://material.io/)'
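
The last line of `call_llm` assumes the response body is a JSON list whose first element carries a `generated_text` field. The sketch below isolates that decoding step and adds a shape check; `extract_generated_text` is a hypothetical helper, and the bytes used here are hand-written stand-ins, not a real API response.

```python
import json


def extract_generated_text(raw_response: bytes) -> str:
    """Decode a payload shaped like [{"generated_text": "..."}]."""
    payload = json.loads(raw_response.decode())
    if not isinstance(payload, list) or not payload:
        # for example an error object returned instead of generations
        raise ValueError(f"Unexpected response payload: {payload!r}")
    return payload[0]["generated_text"]


# hand-written bytes standing in for a real response
fake_response = json.dumps([{"generated_text": "This is a test context for ..."}]).encode()
print(extract_generated_text(fake_response))
```

Failing loudly on an unexpected payload makes a misbehaving endpoint easier to diagnose than a bare `KeyError` or `TypeError` would.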

In [None]:
# as a test, see how the model reacts to a question and how it answers
result = call_llm(llm_client, "What is the meaning of life?")
print(result)

What is the meaning of life?

This is a question that has puzzled philosophers, theologians, and scientists for centuries. It is a question that has been asked in many different ways, and it has been given many different answers.

Some people believe that the meaning of life is to seek happiness and fulfillment. Others believe that it is to serve a higher power or to follow a set of moral principles. Still others believe that the meaning of life is to learn and grow, to experience new things and to become the best version of ourselves that we can be.

Ultimately, the meaning of life is a personal and subjective question. It is up to each individual to decide what gives their life meaning and purpose. For some, this may be a religious or spiritual belief. For others, it may be a career or a hobby. And for others still, it may be a relationship or a family.

No matter what gives your life meaning, it is important to remember that the journey is just as important as the destination. The e

In [None]:
QA_generation_prompt = """
Your task is to write a factoid question and an answer given a context.
Your factoid question should be answerable with a specific, concise piece of factual information from the context.
Your factoid question should be formulated in the same style as questions users could ask in a search engine.
This means that your factoid question MUST NOT mention something like "according to the passage" or "context".

Provide your answer as follows:

Output:::
Factoid question: (your factoid question)
Answer: (your answer to the factoid question)

Now here is the context.

Context: {context}\n
Output:::"""

## Generating the questions

In [None]:
import random

N_GENERATIONS = 10
# generate only 10 questions to keep the run short and avoid API access errors
print(f"Generating {N_GENERATIONS} QA couples...")
# also print how many we are generating

outputs = []
for sampled_context in tqdm(random.sample(docs_processed, N_GENERATIONS)):
    # create the question-answer pairs here; random.sample draws a random sample of chunks
    output_QA_couple = call_llm(llm_client, QA_generation_prompt.format(context=sampled_context.page_content))
    # on every iteration we ask the LLM client for one pair, using the prepared QA_generation_prompt as the template,
    # so that the questions can be generated in bulk
    try:
        question = output_QA_couple.split("Factoid question: ")[-1].split("Answer: ")[0]
        answer = output_QA_couple.split("Answer: ")[-1]
        assert len(answer) < 300, "Answer is too long"
        # discard answers that are too long
        outputs.append(
            {
                "context": sampled_context.page_content,
                "question": question,
                "answer": answer,
                "source_doc": sampled_context.metadata["source"],
            }
        )
        # append the new entry to the outputs list
    except Exception:
        continue

Generating 10 QA couples...


  0%|          | 0/10 [00:00<?, ?it/s]
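
The parsing step inside the loop can be tried in isolation. The sketch below follows the same `split` logic but, unlike the loop above, also strips whitespace and rejects outputs that lack the two markers; `parse_qa_output` and the sample string are hypothetical and only illustrate the idea.

```python
from typing import Optional, Tuple


def parse_qa_output(raw: str, max_answer_len: int = 300) -> Optional[Tuple[str, str]]:
    """Return (question, answer), or None when the output does not follow the template."""
    if "Factoid question: " not in raw or "Answer: " not in raw:
        return None  # the model ignored the requested format
    question = raw.split("Factoid question: ")[-1].split("Answer: ")[0].strip()
    answer = raw.split("Answer: ")[-1].strip()
    if not question or not answer or len(answer) >= max_answer_len:
        return None  # empty fields or an overly long answer
    return question, answer


example_output = "Factoid question: What is the capital of France?\nAnswer: Paris"
print(parse_qa_output(example_output))
```

Returning `None` instead of raising keeps the calling loop free of a broad `try`/`except`, at the cost of one explicit check per generation.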

Here we also define the prompts for the next run.

In [None]:
question_groundedness_critique_prompt = """
You will be given a context and a question.
Your task is to provide a 'total rating' scoring how well one can answer the given question unambiguously with the given context.
Give your answer on a scale of 1 to 5, where 1 means that the question is not answerable at all given the context, and 5 means that the question is clearly and unambiguously answerable with the context.

Provide your answer as follows:

Answer:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 5)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here are the question and context.

Question: {question}\n
Context: {context}\n
Answer::: """

question_relevance_critique_prompt = """
You will be given a question.
Your task is to provide a 'total rating' representing how useful this question can be to machine learning developers building NLP applications with the Hugging Face ecosystem.
Give your answer on a scale of 1 to 5, where 1 means that the question is not useful at all, and 5 means that the question is extremely useful.

Provide your answer as follows:

Answer:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 5)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here is the question.

Question: {question}\n
Answer::: """

question_standalone_critique_prompt = """
You will be given a question.
Your task is to provide a 'total rating' representing how context-independent this question is.
Give your answer on a scale of 1 to 5, where 1 means that the question depends on additional information to be understood, and 5 means that the question makes sense by itself.
For instance, if the question refers to a particular setting, like 'in the context' or 'in the document', the rating must be 1.
The questions can contain obscure technical nouns or acronyms like Gradio, Hub, Hugging Face or Space and still be a 5: it must simply be clear to an operator with access to documentation what the question is about.

For instance, "What is the name of the checkpoint from which the ViT model is imported?" should receive a 1, since there is an implicit mention of a context, thus the question is not independent from the context.

Provide your answer as follows:

Answer:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 5)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here is the question.

Question: {question}\n
Answer::: """

In [None]:
print("Generating critique for each QA couple...")
for output in tqdm(outputs):
    evaluations = {
        "groundedness": call_llm(
            llm_client,
            question_groundedness_critique_prompt.format(context=output["context"], question=output["question"]),
        ),
        "relevance": call_llm(
            llm_client,
            question_relevance_critique_prompt.format(question=output["question"]),
        ),
        "standalone": call_llm(
            llm_client,
            question_standalone_critique_prompt.format(question=output["question"]),
        ),
    }
    # fill the three prompts defined above with the question (and context) while iterating over the list;
    # the critiques fall into three groups: groundedness, relevance, standalone; call_llm is used here as well
    try:
        for criterion, evaluation in evaluations.items():
            score, rationale = (
                int(evaluation.split("Total rating: ")[-1].strip()),  # the score
                evaluation.split("Total rating: ")[-2].split("Evaluation: ")[1],  # the rationale
            )
            output.update(
                {
                    f"{criterion}_score": score,
                    f"{criterion}_eval": rationale,
                }
            )
    # while processing the results we extract the scores and the rationales
    except Exception:
        continue
    # move on to the next QA couple

Generating critique for each QA couple...


  0%|          | 0/7 [00:00<?, ?it/s]
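
The `split`-based extraction above raises as soon as the model appends anything after the number (for example "4 out of 5"), and the `except` branch then skips the remaining criteria for that QA couple. A regular expression is one possible, more tolerant alternative. This is only a sketch: `CRITIQUE_PATTERN` and `parse_critique` are hypothetical names, and the pattern assumes the rating is a single digit from 1 to 5.

```python
import re
from typing import Optional, Tuple

# rationale = text after "Evaluation:", score = first digit 1-5 after "Total rating:"
CRITIQUE_PATTERN = re.compile(
    r"Evaluation:\s*(?P<rationale>.*?)\s*Total rating:\s*(?P<score>[1-5])",
    re.DOTALL,
)


def parse_critique(raw: str) -> Optional[Tuple[int, str]]:
    """Return (score, rationale), or None when the critique cannot be parsed."""
    match = CRITIQUE_PATTERN.search(raw)
    if match is None:
        return None
    return int(match.group("score")), match.group("rationale")


print(parse_critique("Evaluation: The context states the answer directly.\nTotal rating: 4 out of 5"))
```

A `None` result can then be handled per criterion, so one malformed critique does not discard the other two.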

Next we build a DataFrame and filter it down to the QA pairs rated best (a score of at least 4 on all three criteria); both the unfiltered and the filtered table are displayed.

In [None]:
import pandas as pd

pd.set_option("display.max_colwidth", None)

generated_questions = pd.DataFrame.from_dict(outputs)

print("Evaluation dataset before filtering:")
display(
    generated_questions[
        [
            "question",
            "answer",
            "groundedness_score",
            "relevance_score",
            "standalone_score",
        ]
    ]
)
generated_questions = generated_questions.loc[
    (generated_questions["groundedness_score"] >= 4)
    & (generated_questions["relevance_score"] >= 4)
    & (generated_questions["standalone_score"] >= 4)
]
print("============================================")
print("Final evaluation dataset:")
display(
    generated_questions[
        [
            "question",
            "answer",
            "groundedness_score",
            "relevance_score",
            "standalone_score",
        ]
    ]
)

eval_dataset = datasets.Dataset.from_pandas(generated_questions, split="train", preserve_index=False)

Evaluation dataset before filtering:


| | question | answer | groundedness_score | relevance_score | standalone_score |
|---|---|---|---|---|---|
| 0 | Who fixed the typo in iterator variable name in run_predict function?\n | Freddy Aboulton | 5.0 | 1.0 | 1.0 |
| 1 | Which video shows how to work with Hugging Face models on Amazon SageMaker?\n | Working with Hugging Face models on Amazon SageMaker | | | |
| 2 | What is the BLEU score for a translation that is too short?\n | The BLEU score for a translation that is too short is 0.0. | 1.0 | 3.0 | 5.0 |
| 3 | Why were there larger than expected drops for the upscaled 384/512 in21k fine-tune weights?\n | The context suggests that the larger than expected drops for the upscaled 384/512 in21k fine-tune weights may be due to possible missing details or sensitivity of the 21k FT to small preprocessing. | 3.0 | 4.0 | 2.0 |
| 4 | What is the top 1 accuracy of dla34 on ImageNet?\n | 74.62% | 5.0 | 1.0 | 5.0 |
| 5 | What does the Inflation Reduction Act lower?\n | The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs.\n</tf>\n</frameworkcontent> | 5.0 | 1.0 | 5.0 |
| 6 | How to enable calibration for a tensor quantizer in pytorch_quantization?\n | To enable calibration for a tensor quantizer in pytorch_quantization, you can use the `enable_calib()` method of the quantizer module. For example, if the quantizer module is named `module`, you can enable calibration using `module.enable_calib()`. | 5.0 | 4.0 | 5.0 |


Final evaluation dataset:


| | question | answer | groundedness_score | relevance_score | standalone_score |
|---|---|---|---|---|---|
| 6 | How to enable calibration for a tensor quantizer in pytorch_quantization?\n | To enable calibration for a tensor quantizer in pytorch_quantization, you can use the `enable_calib()` method of the quantizer module. For example, if the quantizer module is named `module`, you can enable calibration using `module.enable_calib()`. | 5.0 | 4.0 | 5.0 |
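
The filter keeps a row only when all three scores are at least 4, and a missing score (shown as an empty cell, i.e. NaN) never passes a `>=` comparison. The following plain-Python sketch mirrors that logic on the scores of rows 0, 1 and 6 from the table; `keep_row` and `CRITERIA` are hypothetical names, and the notebook itself does this with a pandas boolean mask.

```python
import math
from typing import Dict

CRITERIA = ("groundedness_score", "relevance_score", "standalone_score")


def keep_row(row: Dict[str, float], threshold: float = 4) -> bool:
    """True only if every criterion score is present and at least `threshold`."""
    # comparisons with NaN are always False, so rows with missing scores are dropped
    return all(row.get(name, math.nan) >= threshold for name in CRITERIA)


rows = [
    {"groundedness_score": 5.0, "relevance_score": 1.0, "standalone_score": 1.0},
    {"groundedness_score": math.nan, "relevance_score": math.nan, "standalone_score": math.nan},
    {"groundedness_score": 5.0, "relevance_score": 4.0, "standalone_score": 5.0},
]
kept = [index for index, row in enumerate(rows) if keep_row(row)]
print(kept)
```

Only the last of the three sample rows survives, which matches the single row left in the final evaluation dataset.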


# **Building the RAG system**

In [None]:
from langchain.docstore.document import Document as LangchainDocument

RAW_KNOWLEDGE_BASE = [
    LangchainDocument(page_content=doc["text"], metadata={"source": doc["source"]}) for doc in tqdm(ds)
]
# build another list of LangchainDocument objects from the source dataset, in the same format as before

  0%|          | 0/2647 [00:00<?, ?it/s]

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import AutoTokenizer


def split_documents(
    chunk_size: int,
    knowledge_base: List[LangchainDocument],
    tokenizer_name: str,
) -> List[LangchainDocument]:
    # split the documents into chunks of at most `chunk_size` tokens, again with a text splitter
    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        AutoTokenizer.from_pretrained(tokenizer_name),
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size / 10),
        add_start_index=True,
        strip_whitespace=True,
        separators=["\n\n", "\n", ".", " ", ""],
    )
    # collect the chunks produced by the text splitter into a list

    docs_processed = []
    for doc in knowledge_base:
        docs_processed += text_splitter.split_documents([doc])

    # remove duplicates so that each chunk appears only once in the final output
    unique_texts = {}
    docs_processed_unique = []
    for doc in docs_processed:
        if doc.page_content not in unique_texts:
            unique_texts[doc.page_content] = True
            docs_processed_unique.append(doc)

    return docs_processed_unique
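
The deduplication at the end of `split_documents` keeps the first occurrence of every chunk text and preserves the original order. On plain strings the same idea can be sketched with `dict.fromkeys`; `deduplicate_texts` is a hypothetical helper, and real chunks would still need their metadata carried along as in the function above.

```python
from typing import Iterable, List


def deduplicate_texts(texts: Iterable[str]) -> List[str]:
    """Drop repeated strings while keeping the first occurrence of each, in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(texts))


unique_chunks = deduplicate_texts(["chunk A", "chunk B", "chunk A", "chunk C", "chunk B"])
print(unique_chunks)
```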

## 2.2 Setting up the retriever embeddings

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
import os


def load_embeddings(
    langchain_docs: List[LangchainDocument],
    chunk_size: int,
    embedding_model_name: Optional[str] = "thenlper/gte-small",
) -> FAISS:
    # load the embedding model and create a FAISS index for the documents
    embedding_model = HuggingFaceEmbeddings(
        model_name=embedding_model_name,
        multi_process=True,
        model_kwargs={"device": "cuda"},  # change this if you do not want to run on CUDA
        encode_kwargs={"normalize_embeddings": True},  # every embedding is normalized to length 1,
        # so cosine similarity can be computed directly as a dot product, which simplifies the analysis
    )

    # check whether the index for these embeddings already exists on disk
    index_name = f"index_chunk:{chunk_size}_embeddings:{embedding_model_name.replace('/', '~')}"
    index_folder_path = f"./data/indexes/{index_name}/"
    if os.path.isdir(index_folder_path):
        return FAISS.load_local(
            index_folder_path,
            embedding_model,
            distance_strategy=DistanceStrategy.COSINE,
        )
    # if it does not, run the code below to build, save and return the FAISS index
    else:
        print("Index not found, generating it...")
        docs_processed = split_documents(
            chunk_size,
            langchain_docs,
            embedding_model_name,
        )
        knowledge_index = FAISS.from_documents(
            docs_processed, embedding_model, distance_strategy=DistanceStrategy.COSINE
        )
        knowledge_index.save_local(index_folder_path)
        return knowledge_index
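
The reason for `normalize_embeddings=True` can be checked numerically: for vectors of length 1, cosine similarity and the dot product coincide. The sketch below uses small hand-picked vectors rather than real embeddings, and all helper names are hypothetical.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale the vector so that its Euclidean length becomes 1."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
cosine = cosine_similarity(a, b)
dot_of_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cosine, dot_of_normalized)
```

Both printed values agree up to floating-point rounding, which is why an index over normalized vectors can rank by inner product and still reflect cosine similarity.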

## 2.3 Reader - LLM

Setting up the RAG prompt template

In [None]:
RAG_PROMPT_TEMPLATE = """
<|system|>
Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.
If the answer cannot be deduced from the context, do not give an answer.</s>
<|user|>
Context:
{context}
---
Now here is the question you need to answer.

Question: {question}
</s>
<|assistant|>
"""

Introducing the reader LLM, which reads the retrieved documents in order to answer the question

In [None]:
from langchain_community.llms import HuggingFaceHub

repo_id = "HuggingFaceH4/zephyr-7b-beta"
READER_MODEL_NAME = "zephyr-7b-beta"

READER_LLM = HuggingFaceHub(
    huggingfacehub_api_token="HF_HUB_KEY",  # placeholder for the HF_HUB_KEY token: put your own Hugging Face Hub API token here
    repo_id=repo_id,
    task="text-generation",  # task type
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,  # number of most likely tokens considered
        "temperature": 0.1,  # degree of freedom in generation
        "repetition_penalty": 1.03,  # penalty for repetitions
    },  # additional generation parameters
)



In [None]:
from ragatouille import RAGPretrainedModel
from langchain_core.vectorstores import VectorStore
from langchain_core.language_models.llms import LLM

# question: the question the RAG model will answer
# llm: the LLM used as the reader
# knowledge_index: the vector store holding the knowledge the RAG model answers from

def answer_with_rag(
    question: str,
    llm: LLM,
    knowledge_index: VectorStore,
    reranker: Optional[RAGPretrainedModel] = None,
    num_retrieved_docs: int = 30,
    num_docs_final: int = 7,
) -> Tuple[str, List[LangchainDocument]]:
    # answer the question with RAG, based on the knowledge index
    # gather documents with the retriever
    relevant_docs = knowledge_index.similarity_search(query=question, k=num_retrieved_docs)
    relevant_docs = [doc.page_content for doc in relevant_docs]  # keep only the text

    # optionally rerank the results
    if reranker:
        relevant_docs = reranker.rerank(question, relevant_docs, k=num_docs_final)
        relevant_docs = [doc["content"] for doc in relevant_docs]

    relevant_docs = relevant_docs[:num_docs_final]

    # build the final prompt
    context = "\nExtracted documents:\n"
    context += "".join([f"Document {str(i)}:::\n" + doc for i, doc in enumerate(relevant_docs)])

    final_prompt = RAG_PROMPT_TEMPLATE.format(question=question, context=context)

    # generate the answer
    answer = llm(final_prompt)

    return answer, relevant_docs

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
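
To see what the reader LLM actually receives, the context-building step of `answer_with_rag` can be reproduced without a retriever or a model. The sketch below copies the formatting logic into a hypothetical `build_context` helper and feeds it hand-written strings instead of retrieved chunks.

```python
from typing import List


def build_context(documents: List[str], num_docs_final: int = 7) -> str:
    """Format the top documents in the same way as `answer_with_rag`."""
    selected = documents[:num_docs_final]  # keep only the best-ranked documents
    context = "\nExtracted documents:\n"
    context += "".join(f"Document {i}:::\n" + doc for i, doc in enumerate(selected))
    return context


sample_context = build_context(
    ["First chunk.\n", "Second chunk.\n", "Third chunk.\n"],
    num_docs_final=2,
)
print(sample_context)
```

The documents are joined without a separator, so a chunk that does not end with a newline runs straight into the next `Document i:::` header; the sample strings above end with `\n` for that reason.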
