# 👀 Monitoring and curating LLMs for ethics and bias using RLHF

In this tutorial, we explore a method to address `bias` in language models by enhancing the input data using `Reinforcement Learning with Human Feedback`. This approach aligns with ethical considerations by actively involving human judgment to guide the learning process, ensuring a more balanced and fair representation in the model's outputs.

The steps are as follows:

1. Test our LLM using [Giskard](https://www.giskard.ai/) and analyze the results
2. Create an Argilla `Feedback Dataset` according to the outputs of Giskard
3. Provide Human Feedback to remove bias and improve the model
4. Train a `reward model`

## Introduction

Language models, despite their ability to perform various natural language processing tasks, often reflect biases and ethical concerns. These biases include a range of categories such as age, gender, race, ethnicity and others ([Huang et al., 2023](https://arxiv.org/pdf/2309.14345.pdf)), and extend to issues such as misinformation, toxicity and hallucinations.

The root of these concerns lies in the fact that language models are trained on large datasets that replicate real-world characteristics, inadvertently perpetuating these biases and creating false associations. This leads to both technical and ethical problems, including the risk of reinforcing societal prejudices against marginalised groups.

Addressing these biases requires intervention that can be at different stages of model training and output generation, as suggested by several studies ([(Yeh et al., 2023)](https://aclanthology.org/2023.rocling-1.37.pdf), [(Liang et al., 2021)](https://proceedings.mlr.press/v139/liang21a/liang21a.pdf), [(Garimella et al., 2021)](https://aclanthology.org/2021.findings-acl.397.pdf)). As retraining a language model is not efficient, a notable strategy is the incorporation of human feedback. In this method, the model's responses are evaluated by human raters, and the feedback is used to develop a reward model. This reward model, in turn, guides a reinforcement learning process to adjust the parameters of the language model.

The essence of this approach is to make the outputs of the language model more closely match human norms and values. In this way, bias is reduced, robustness is increased, and the model's outputs are more ethically aligned and socially responsible, minimizing their potential negative impact on society.

## Running Argilla

For this tutorial, you will need to have an Argilla server running. There are two main options for deploying and running Argilla:


**Deploy Argilla on Hugging Face Spaces**: If you want to run tutorials with external notebooks (e.g., Google Colab) and you have an account on Hugging Face, you can deploy Argilla on Spaces with a few clicks:

[![deploy on spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg)](https://huggingface.co/login?next=%2Fnew-space%3Ftemplate%3Dargilla%2Fargilla-template-space)

For details about configuring your deployment, check the [official Hugging Face Hub guide](https://huggingface.co/docs/hub/spaces-sdks-docker-argilla).


**Launch Argilla using Argilla's quickstart Docker image**: This is the recommended option if you want [Argilla running on your local machine](../../getting_started/quickstart.html). Note that this option will only let you run the tutorial locally and not with an external notebook service.

For more information on deployment options, please check the Deployment section of the documentation.

<div class="alert alert-info">

Tip
    
This tutorial is a Jupyter Notebook. There are two options to run it:

- Use the Open in Colab button at the top of this page. This option allows you to run the notebook directly on Google Colab. Don't forget to change the runtime type to GPU for faster model training and inference.
- Download the .ipynb file by clicking on the View source link at the top of the page. This option allows you to download the notebook and run it on your local machine or on a Jupyter Notebook tool of your choice.
</div>

## Set up the Environment

To complete this tutorial, you will need to install the Argilla client and a few third-party libraries using `pip`:

In [None]:
# %pip install --upgrade pip
%pip install argilla -qqq
%pip install "giskard[llm]" --upgrade
%pip install "langchain<=0.0.301" "pypdf<=3.17.0" "faiss-cpu<=1.7.4" "openai<=0.28.1" "tiktoken<=0.5.1"
%pip install avidtools

Let's make the needed imports:

In [None]:
import argilla as rg
from argilla.feedback import TrainingTask
from argilla.feedback import ArgillaTrainer

import os
import json
import pandas as pd
import openai
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple

from langchain.llms import OpenAI
from langchain.chains.base import Chain
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA, load_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

from giskard import Dataset, Model, scan

If you are running Argilla using the Docker quickstart image or a public Hugging Face Spaces, you need to init the Argilla client with the `URL` and `API_KEY`:

In [None]:
# Replace api_url with the url to your HF Spaces URL if using Spaces
# Replace api_key if you configured a custom API key
# Replace workspace with the name of your workspace
rg.init(
    api_url="http://localhost:6900", 
    api_key="owner.apikey",
    workspace="admin"
)

If you're running a private Hugging Face Space, you will also need to set the [HF_TOKEN](https://huggingface.co/settings/tokens) as follows:

In [None]:
# # Set the HF_TOKEN environment variable
# import os
# os.environ['HF_TOKEN'] = "your-hf-token"

# # Replace api_url with the url to your HF Spaces URL
# # Replace api_key if you configured a custom API key
# rg.init(
#     api_url="https://[your-owner-name]-[your_space_name].hf.space", 
#     api_key="admin.apikey",
#     extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
# )

In [None]:
# Your openAI key is needed for testing the model
os.environ['OPENAI_API_KEY'] = 'sk-...'
openai.api_key = os.environ["OPENAI_API_KEY"]

### Enable Telemetry

We gain valuable insights from how you interact with our tutorials. To improve ourselves in offering you the most suitable content, using the following lines of code will help us understand that this tutorial is serving you effectively. Though this is entirely anonymous, you can choose to skip this step if you prefer. For more info, please check out the [Telemetry](../../reference/telemetry.md) page.

In [None]:
try:
    from argilla.utils.telemetry import tutorial_running
    tutorial_running()
except ImportError:
    print("Telemetry is introduced in Argilla 1.20.0 and not found in the current installation. Skipping telemetry.")

## Testing the LLM

[Giskard.ai](https://www.giskard.ai/) is a platform that allows to test LLMs for bias and ethical concerns. By automatically creating tests and evaluation reports, it allows to identify the needed corrections and improve your models. In this case, we will use its open-source python library to test an LLM.

In order to test the LLM, we will use the [Report on migration and asylum 2022](https://ec.europa.eu/commission/presscorner/detail/en/ip_22_5985) from the European Commission. This report reviews the developments in migration and asylum in the EU and also points out the main challenges. In addition, even if it is optional, we will wrap up a giskard dataset which will contain some questions as reference for the testing part.

In [49]:
# Indicate the url to the report
REPORT_URL = "https://commission.europa.eu/system/files/2023-01/report-migration-asylum-2022.pdf"

In [56]:
# Indicate the name of the query column
TEXT_COLUMN_NAME = "query"

giskard_dataset = Dataset(pd.DataFrame({
    TEXT_COLUMN_NAME: [
        "According to the migration and asylum report, what are the key challenges in Europe?",
        "How can migration influence in Europe?",
        "What strategies does the migration and asylum report recommend for managing migration in Europe?",
        "What are the main reasons for migration?",
        "How does the report assess the effectiveness of current asylum procedures in Europe?",
        "How should the cross-border cooperation on migration be improved?",
    ]
}), target=None)

In addition, we will set some constants regarding the model. In this case, we will use the `gpt-3.5-turbo-instruct` and in our prompt we will indicate the instructions that the model should follow.

In [50]:
LLM_NAME = "gpt-3.5-turbo-instruct"

PROMPT_TEMPLATE = """You are a helpful assistant working on the migration department made by Giskard.
Your task is to answer common questions on migration and asylum in Europe.
You will be given a question and relevant excerpts from the Report on Migration and Asylum (2022).
Please provide short and clear answers based on the provided context. Be polite and helpful.

Context:
{context}

Question:
{question}

Your answer:
"""

Now, we will create a QA system which retrieves the data from our report and uses the LLM to answer questions. For this purpose, we will use `FAISS` which storages the chuncks of context and `LangChain` which integrates the LLM with the retriever.

In [51]:
# Pre-process the report to work as context
context_storage_cache = None
def get_context_storage() -> FAISS:
    global context_storage_cache
    if context_storage_cache is None:
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
        docs = PyPDFLoader(REPORT_URL).load_and_split(text_splitter)
        context_storage_cache = FAISS.from_documents(docs, OpenAIEmbeddings())
    return context_storage_cache

# Create the chain
llm = OpenAI(model=LLM_NAME, temperature=0)
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
qa_system = RetrievalQA.from_llm(llm=llm, retriever=get_context_storage().as_retriever(), prompt=prompt)

After creating the QA system, we will create a custom `Giskard.Model` object which will be used to test the LLM. After that, we [will wrap it up indicating the needed parameters](https://docs.giskard.ai/en/latest/open_source/scan/scan_llm/index.html): the input `model` (our `qa_system`); the `model_type`, as working with LLM is always `text_generation`; the `name` (used as metadata); the `description` of the model used to generate the testing prompt and the `feature_names` which will be the columns of our dataset.

In [46]:
# Define a custom Giskard model wrapper.
class FAISSRAGModel(Model):
    def model_predict(self, df: pd.DataFrame) -> pd.DataFrame:
        return df[TEXT_COLUMN_NAME].apply(lambda x: self.model.run({"query": x}))

    # Save the model and the retriever
    def save_model(self, path: str, *args, **kwargs):
        out_dest = Path(path)
        self.model.save(out_dest.joinpath("model.json"))
        db = self.model.retriever.vectorstore
        db.save_local(out_dest.joinpath("faiss"))

    # Load the model and the retriever
    @classmethod
    def load_model(cls, path: str, *args, **kwargs) -> Chain:
        src = Path(path)
        db = FAISS.load_local(src.joinpath("faiss"), OpenAIEmbeddings())
        chain = load_chain(src.joinpath("model.json"), retriever=db.as_retriever())
        return chain

# Wrap up the QA chain
giskard_model = FAISSRAGModel(
    model=qa_system,
    model_type="text_generation",
    name="Migration and Asylum Question Answering",
    description="This model answers questions about migration and asylum in Europe based on the Migration and Asylum Report from the European Commission.",
    feature_names=[TEXT_COLUMN_NAME]
)

Finally, we will scan our LLM using the [scan method](https://docs.giskard.ai/en/latest/reference/scan/index.html#giskard.scanner.scan) which will generate a report with the results of the testing. In this case, we will focus only on some [issues](https://github.com/Giskard-AI/giskard/blob/main/giskard/scanner/issues.py) regarding bias and ethics. Note that this process can take a while reaching 30 minutes if a complete analysis is ran.

In [None]:
# Scan the model
results = scan(giskard_model, giskard_dataset, only=["hallucination", "stereotype", "ethical", "harmfulness", "sensitive information disclosure"])

In [None]:
# Display the results
display(results)

They provide the option to save the report in various formats. In our scenario, we will choose to save it as an `avidoc` file, ensuring that no information is lost. Alternatively, you can opt to save the report in `html` format, which preserves the report's display layout.

In [None]:
# Save the results in html
results.to_html('results.html')

# Save the results in avidoc
results.to_avid('results.avidoc')

## Create a Feedback Dataset

Now, we will use the tests input and the model outputs to create a dataset in Argilla that will be used to include human feedback for our reward model. So, let's start by reading the report and save the information in a dataframe.

In [20]:
# Indicate the path of the avidoc file
filename = 'results.avidoc'

# Read and process the avidoc file
data_list = []
with open(filename, 'r') as file:
    lines = file.readlines()

    # Note that each test type is saved in a different line
    for line in lines:
        data = json.loads(line)

        for metric in data.get('metrics', []):
            for example in metric.get('results', {}).get('examples', []):
                text = example.get('input_vars', {}).get('text', '')
                model_output = example.get('model_output', '')

                data_list.append({'input_question': text, 'model_output': model_output})

# Create a dataframe with input questions and model outputs
df = pd.DataFrame(data_list)

In [None]:
df

Once, we have organized the data, we will create a `FeedbackDataset` object. This dataset will include two fields for the original instructions and responses, and two questions that the annotators will fill in with the proper information. Then, we will push the dataset to the Argilla UI. Lastly, we will add the records.

In [29]:
# Create and push a feedback dataset
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="instruction"), rg.TextField(name="response")],
    questions=[
        rg.TextQuestion(name="new-instruction", title="Write a helpful, harmless, accurate instruction for the user response"),
        rg.TextQuestion(name="new-response", title="Write a helpful, harmless, accurate response to the user question"),
    ],
)
dataset = dataset.push_to_argilla(name="bias_dataset", workspace="argilla")

In [None]:
# Create the records and add them to the dataset
records = [
    rg.FeedbackRecord(
        fields={"instruction": row['input_question'], "response": row['model_output']},
        suggestions = [
        {
            "question_name": "new-instruction",
            "value": row['input_question'],
        },
        {
            "question_name": "new-response",
            "value": row['model_output'],
        }
    ],
    )
    for index, row in df.iterrows()
]
dataset.add_records(records)

## Train the reward model

After the annotators have submitted their feedback, we will use it to train a reward model.

In [33]:
annotated_dataset = rg.FeedbackDataset.from_argilla(name="bias_dataset", workspace="argilla")

We now have to define the formatting function that, thanks to the annotations, will create the reward model's input. 

In [37]:
# Indicate the template
template = """\
### Instruction: {instruction}\n
### Response: {response}"""

# Define the formatting function
def formatting_func(sample: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    og_instruction = sample["instruction"]
    og_response = sample["response"]
    rejected = template.format(instruction=og_instruction, response=og_response)

    for instruction, response in zip(sample["new-instruction"], sample["new-response"]):
        if response["status"] == "submitted":
            chosen = template.format(
                instruction=instruction["value"],
                response=response["value"],
            )
            if chosen != rejected:
                yield chosen, rejected

task = TrainingTask.for_reward_modeling(formatting_func=formatting_func)

Finally, we will train the reward model using the `train` method.

In [None]:
from argilla.feedback import ArgillaTrainer

trainer = ArgillaTrainer(
    dataset=ds,
    task=task,
    framework="trl",
    model="distilroberta-base",
)
trainer.train(output_dir="reward_model")