# RLAIF: Reinforcement Learning with AI Feedback

## Creating a Constitution for the AI

The first and most important task is to create our constitution for the AI.

This is a set of rules that ensures the dataset we are creating is in line with our wishes and expectations.

The main advantage of this approach over RLHF is scalability (machine feedback is cheaper than human feedback, so it can cover much more ground), along with its performance on self-improvement tasks.

You can read more about these concepts here:

- [Constitutional AI](https://arxiv.org/pdf/2212.08073.pdf)
- [Self-Improvement](https://arxiv.org/pdf/2303.17651.pdf)

Let's start by writing a simple constitution.


```python
ai_constitution = {
    0: "The model should not generate racist, sexist, hateful, or otherwise toxic outputs.",
    1: "The model should move conversation in a positive direction.",
    2: "The model should politely point out harmful assumptions from the human."
}
```

### Improved AI Constitution

In [None]:
# Each rule is stored as a plain string so it can be interpolated directly into prompts.
ai_constitution = {
  0: "The model must actively avoid generating outputs that are racist, sexist, hateful, or otherwise harmful or toxic in nature. This includes avoiding language or content that could be seen as discriminatory, derogatory, or offensive to any group based on race, gender, sexual orientation, religion, nationality, disability, or any other characteristic. The model should be designed to recognize and avoid harmful stereotypes and biases.",
  1: "The model should strive to guide conversations towards constructive, informative, and positive directions. The model should encourage healthy, respectful, and engaging dialogue. It should avoid promoting negativity or conflict, and instead, focus on providing helpful, accurate, and uplifting content. The model should aim to enrich the user's knowledge and understanding, and contribute positively to their experience.",
  2: "The model should politely challenge and correct harmful assumptions or misinformation presented by the user. In cases where the user presents assumptions or statements that are factually incorrect, harmful, or based on misinformation, the model should gently correct them, providing factual information and context. The model should aim to educate and inform, doing so in a respectful and non-confrontational manner.",
  3: "The model must prioritize user safety and privacy in all interactions. The model should be designed to protect the privacy and personal data of users. It should not solicit, store, or disclose personal information. In addition, the model should be aware of topics that could potentially harm the physical or mental well-being of users and avoid engaging in such topics.",
  4: "The model should maintain a neutral stance on controversial or divisive topics. In discussions involving politics, religion, or other polarizing subjects, the model should provide balanced perspectives and avoid taking sides. It should present information in an unbiased and factual manner, allowing users to form their own opinions without influence from the model."
}



---



## Creating the SFT Dataset - Final Revision Dataset

Now that we have a constitution, we can use it together with the self-refinement process to create a supervised fine-tuning dataset.


### Loading the Base Model

We'll use an instruction-tuned model derived from Mistral-7B that has not gone through an RLHF or RLAIF process.

[HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)

> ⚠ YOU WILL NEED AN A100 GPU TO COMPLETE THIS NOTEBOOK ⚠
>
> Please make sure you have selected an A100 environment before proceeding.


In [None]:
!pip install -qU transformers accelerate bitsandbytes peft trl datasets tqdm

In [None]:
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-alpha"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)


In [None]:
base_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(base_model_tokenizer, "pad_token", None) is None:
    base_model_tokenizer.pad_token = base_model_tokenizer.eos_token


We'll create a `text-generation` pipeline to put our model to work!


In [None]:
import torch
from transformers import pipeline

base_pipeline = pipeline("text-generation", model=base_model, tokenizer=base_model_tokenizer)

### Building the Critique Loop

The basic idea of the Critique Loop is simple:

1. Start with some prompt and get the model's generation.
2. Ask the model whether this generation adheres to a specific item of the AI Constitution, and rewrite the generation if it does not.
3. Repeat for each "rule" in the AI Constitution.
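
The three steps above can be sketched without any model at all. The snippet below is a minimal illustration of the control flow only: `critique_and_revise` and `fake_generate` are hypothetical names introduced here, and the instruction wording is a placeholder rather than the prompts used later in this notebook.

```python
from typing import Callable, Dict, List, Optional


def critique_and_revise(
    prompt: str,
    constitution: Dict[int, str],
    generate: Callable[..., str],
) -> str:
    """Run one critique pass and one revision pass per constitution rule."""
    # Step 1: initial generation for the prompt
    response = generate(prompt)

    # Step 3: repeat the critique/revision pair for every rule
    for rule in constitution.values():
        # Step 2a: ask whether the current response follows the rule
        critique = generate(
            response,
            instruction=f"Does the response follow this guideline? If not, explain why.\n\nGuideline:\n{rule}",
        )
        # Step 2b: rewrite the response according to the critique
        response = generate(
            response,
            instruction=f"Rewrite the response so that it addresses this critique.\n\nCritique:\n{critique}",
        )
    return response


# Hypothetical stand-in for a real model call, used only to trace the loop.
calls: List[Optional[str]] = []


def fake_generate(text: str, instruction: Optional[str] = None) -> str:
    calls.append(instruction)
    return text if instruction is None else f"revised({text})"


rules = {0: "Be polite.", 1: "Stay neutral."}
result = critique_and_revise("some prompt", rules, fake_generate)
print(result)
print(len(calls))  # one initial call plus two calls per rule
```

Passing the generator in as a function keeps the loop independent of any particular model or pipeline, which makes it easy to test before spending GPU time.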

Implementing this in code is easy. We just need a dataset with toxic prompts!

For this we'll once again use `allenai/real-toxicity-prompts`.


In [None]:
from datasets import load_dataset

dataset_name="allenai/real-toxicity-prompts"

toxic_dataset = load_dataset(dataset_name, split="train")
toxic_dataset = toxic_dataset.select(range(5))


In [None]:
toxic_dataset

Dataset({
    features: ['filename', 'begin', 'end', 'challenging', 'prompt', 'continuation'],
    num_rows: 5
})

In [None]:
toxic_dataset[0]["prompt"]

{'text': 'Corruption involving the contractors is the chief culprit for the prison’s problems, according to a recent',
 'toxicity': 0.295593,
 'profanity': 0.171412,
 'sexually_explicit': 0.14422302,
 'flirtation': 0.4664734,
 'identity_attack': 0.32305634,
 'threat': 0.40394926,
 'insult': 0.26487392,
 'severe_toxicity': 0.20936702}

Next, we set up a helper function to collect generations.


In [None]:
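def generate_response(sample, hf_pipeline, instruction=None):
  # Build the chat: an optional system instruction followed by the user turn
  messages = []

  if instruction:
    messages.append({"role": "system", "content": instruction})

  messages.append({"role": "user", "content": sample})

  # add_generation_prompt=True appends the assistant header so the model
  # answers the user instead of continuing the user's turn
  prompt = hf_pipeline.tokenizer.apply_chat_template(
      messages, tokenize=False, add_generation_prompt=True
  )

  outputs = hf_pipeline(
      prompt,
      max_new_tokens=256,
      do_sample=True,
      temperature=0.3,
      top_k=50,
      top_p=0.95,
      return_full_text=False
  )

  # With return_full_text=False only the newly generated text is returned
  return outputs[0]["generated_text"]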
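
To see what the helper hands to the chat template, here is a small model-free sketch of the message list it builds. `build_messages` is a hypothetical helper written for illustration (it is not part of the notebook), and the example strings are made up.

```python
from typing import Dict, List, Optional


def build_messages(sample: str, instruction: Optional[str] = None) -> List[Dict[str, str]]:
    """Mirror the message layout used by generate_response."""
    messages = []
    if instruction:
        # The critique or rewrite instruction travels as the system message
        messages.append({"role": "system", "content": instruction})
    # The text to respond to (a prompt or an earlier response) is the user turn
    messages.append({"role": "user", "content": sample})
    return messages


plain_messages = build_messages("A raw prompt.")
critique_messages = build_messages(
    "Some earlier model response.",
    instruction="Does the following response follow these guidelines?",
)

print([m["role"] for m in plain_messages])
print([m["role"] for m in critique_messages])
```

In the critique loop, the previous response is always sent as the user turn, while the rule or the critique is carried by the system message.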

Now we can loop over our dataset and create revisions for each item in our AI Constitution!


This takes about 10 minutes.


In [None]:
final_revision = []

for row in toxic_dataset:
  # Step 1: initial generation for the (potentially toxic) prompt
  base_response = generate_response(row["prompt"]["text"].strip(), base_pipeline)

  # Steps 2-3: critique and revise once per rule in the constitution
  for rule in ai_constitution.values():
    modified_prompt = f"Does the following response follow these guidelines, if not, please explain why:\n\nGuidelines:\n{rule}"
    critique = generate_response(base_response, base_pipeline, modified_prompt)

    critique_instruction = f"Respond with the original response, but reworded to conform to the following critique making no mention of the reasons for rewording.\n\nCritique:\n{critique}"
    # Pass the rewrite instruction (not the bare critique) as the system message;
    # only the last line of the output is kept as the revised response
    base_response = generate_response(base_response, base_pipeline, critique_instruction).split("\n")[-1]

  print(base_response)
  final_revision.append(base_response)


I completely agree with your perspective. It's essential to ensure that any discussions or reports regarding corruption do not inadvertently perpetuate negative stereotypes or biases against any particular group or individual. Instead, we should strive to provide accurate, fair, and transparent information that is based on evidence and facts. This approach can help prevent any unintended consequences and ensure that any corrective actions taken are based on evidence and facts. By promoting healthy, respectful, and engaging dialogue, we can work towards constructive and positive solutions that address corruption and promote transparency and accountability in all areas of society. It's crucial to avoid making assumptions or generalizations about all contractors or individuals in the industry and instead focus on addressing specific instances of corruption and holding those responsible accountable for their actions. By doing so, we can help prevent any unintended consequences and ensure t

### Collecting the Final Revisions

Let's look at our final revisions. You'll notice they are a bit odd, but all we really care about is that they adhere to our constitution and are coherent. The topic is completely irrelevant to us at this point.


In [None]:
for revision in final_revision:
  print(revision)

6. Providing training and education to contractors and government officials
I would like to clarify that the guidelines provided for the AI model are focused on maintaining a neutral stance on topics that are politically or religiously charged. While the model is designed to avoid generating outputs that are hateful, toxic, or otherwise damaging, it is not infallible. In cases where the model generates an output that is perceived as being inaccurate, misleading, or potentially damaging, we encourage users to provide feedback and corrections. We take all feedback seriously and will work to improve the model's accuracy and effectiveness in promoting healthy, respectful, and engaging dialogue. By providing feedback, users can help us to ensure that the model is not only safe and private but also accurate and effective in facilitating productive and meaningful conversations. Thank you for your input and for helping us to continue to improve our AI model.
I completely agree with your statem

At this stage of the AI model's development, what matters most is ensuring that the model's responses comply with the ethical principles set out in the "AI Constitution" and that they are coherent, regardless of the specific topic of the dataset. The emphasis is not on the content of the dataset but on making sure the model responds ethically and logically in any situation. This helps create a reliable AI model that can be applied in a variety of contexts.




---



## Fine-tune Model with SFT on Created Dataset (SL-CAI)

Now that we have a dataset of prompts and responses that we are confident adhere to our constitution, we can fine-tune our base model so it can help us choose between candidate responses. This will become our "feedback model", which takes the place of human feedback.

Let's start by selecting the prompts from our dataset.

In [None]:
prompts = [sample["prompt"]["text"] for sample in toxic_dataset]

In [None]:
from datasets import Dataset
import pandas as pd

sft_dataset = Dataset.from_pandas(pd.DataFrame([{"prompt" : prompt, "response" : response} for prompt, response in zip(prompts, final_revision)]))

In [None]:
def map_dataset(row):
  return {"text" : f"### Input:\n{row['prompt']}\n\n### Response:\n{row['response']}"}

In [None]:
sft_dataset = sft_dataset.map(map_dataset)
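
To make the training format concrete, the following sketch applies the same `map_dataset` function to a single made-up row and prints the resulting `text` field. The row contents are placeholders, not real data from this run.

```python
def map_dataset(row):
    # Same template as above: the prompt goes under "### Input:" and the
    # final revision goes under "### Response:"
    return {"text": f"### Input:\n{row['prompt']}\n\n### Response:\n{row['response']}"}


# Hypothetical row, for illustration only
example_row = {
    "prompt": "A hypothetical prompt",
    "response": "A hypothetical revised response",
}

formatted = map_dataset(example_row)["text"]
print(formatted)
```

The feedback model is later queried with a prompt that ends in a `### Response` marker, so it matters that the fine-tuning text uses this same layout.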


For efficiency, we'll upload our dataset to the Hub.


In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
hf_username = "ericrisco"  # Replace with your HF username
sft_dataset.push_to_hub(f"{hf_username}/llme2_sft_dataset_rlaif")


CommitInfo(commit_url='https://huggingface.co/datasets/ericrisco/llme2_sft_dataset_rlaif/commit/a26b4c927d75d310dc7fb5992a2738ed7e016b5f', commit_message='Upload dataset', commit_description='', oid='a26b4c927d75d310dc7fb5992a2738ed7e016b5f', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
from datasets import load_dataset

sft_dataset = load_dataset(f"{hf_username}/llme2_sft_dataset_rlaif")


In [None]:
del base_pipeline
del base_model
torch.cuda.empty_cache()

We load the model and prepare it for training.


In [None]:
from peft import LoraConfig
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "HuggingFaceH4/zephyr-7b-alpha"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

sft_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)


In [None]:
sft_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(sft_model_tokenizer, "pad_token", None) is None:
    sft_model_tokenizer.pad_token = sft_model_tokenizer.eos_token

In [None]:
sft_model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm(

We'll use QLoRA to keep costs down. The quantized model has to be prepared for k-bit training before the LoRA adapters are attached; otherwise the preparation step would freeze the adapter weights as well.


In [None]:
from peft import prepare_model_for_kbit_training

sft_model.config.use_cache = False
sft_model = prepare_model_for_kbit_training(sft_model)

Now we can attach the LoRA adapters, which puts the model in a trainable state.


In [None]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)

sft_model = get_peft_model(sft_model, peft_config)
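
As a rough sanity check on how small the trainable part is, the arithmetic below estimates the number of LoRA parameters for `r=64`. The layer shapes are taken from the model printout above; the assumption that adapters are attached only to `q_proj` and `v_proj` (which we believe is PEFT's default for Mistral when no `target_modules` are given) is ours and worth verifying against your installed version.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds two matrices per adapted layer: A (r x in) and B (out x r)
    return r * (in_features + out_features)


r = 64           # lora_r from the config above
num_layers = 32  # number of MistralDecoderLayer blocks in the printout

# ASSUMPTION: adapters on q_proj (4096 -> 4096) and v_proj (4096 -> 1024) only
per_layer = lora_params(4096, 4096, r) + lora_params(4096, 1024, r)
total = per_layer * num_layers

print(f"{total:,} trainable LoRA parameters")  # 27,262,976
print(f"~{total * 4 / 1e6:.0f} MB if stored in float32")
```

Under these assumptions the adapters come to roughly 27 million parameters, a small fraction of the 7B-parameter base model, which is what makes this kind of fine-tuning affordable on a single GPU.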

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
  output_dir = "sft_zephyr",
  num_train_epochs=5,
  save_strategy="epoch",
  learning_rate=2e-4,
  bf16=True,
  lr_scheduler_type='constant',
)

max_seq_length = 2048

trainer = SFTTrainer(
    sft_model,
    tokenizer=sft_model_tokenizer,
    max_seq_length=max_seq_length,
    train_dataset=sft_dataset["train"],
    args=args,
    dataset_text_field="text",
)




In [None]:
trainer.train()



TrainOutput(global_step=5, training_loss=1.7628818511962892, metrics={'train_runtime': 7.6707, 'train_samples_per_second': 3.259, 'train_steps_per_second': 0.652, 'total_flos': 241975592140800.0, 'train_loss': 1.7628818511962892, 'epoch': 5.0})

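We push the adapters to the Hub.

In [None]:
username = "ericrisco"  # Replace with your HF username
# Note: Trainer.push_to_hub() treats its first argument as the commit message;
# the target repo is named after output_dir ("sft_zephyr")
trainer.push_to_hub(f"{username}/llme2_sft_model_rlaif")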




CommitInfo(commit_url='https://huggingface.co/ericrisco/sft_zephyr/commit/b9c8de595695731a3e770c31b7fcd16a950df98b', commit_message='ericrisco/llme2_sft_model_rlaif', commit_description='', oid='b9c8de595695731a3e770c31b7fcd16a950df98b', pr_url=None, pr_revision=None, pr_num=None)

We merge the adapters into the base model.

In [None]:
sft_model = sft_model.merge_and_unload()



In [None]:
sft_model.push_to_hub(f"{username}/llme2_sft_model_rlaif")


CommitInfo(commit_url='https://huggingface.co/ericrisco/llme2_sft_model_rlaif/commit/26addc45c973877a6346e2a5e4bde17b9f09a742', commit_message='Upload MistralForCausalLM', commit_description='', oid='26addc45c973877a6346e2a5e4bde17b9f09a742', pr_url=None, pr_revision=None, pr_num=None)



---




## Generating a Non-Toxic Dataset!

Now that we have a model that is closely aligned with our interests, we can have it stand in for human feedback when creating the feedback dataset that will be used to train a reward model!

The basic idea is this:

1. Generate two responses to the same prompt.
2. Have the feedback model select which response is better.
3. Compile a dataset from this feedback.

In the end, you'll have a dataset in a format similar to that of the [`hh-rlhf`](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, with "chosen" and "rejected" columns.
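
The three steps can be sketched as a small loop that is independent of any model. Everything named here (`build_preference_rows`, `fake_generate`, `fake_judge`) is hypothetical and only illustrates the data flow; in practice `generate` would wrap the base model and `judge` would wrap the feedback model. The extra `prompt` column is kept for convenience and is our own addition.

```python
import itertools
from typing import Callable, Dict, List


def build_preference_rows(
    prompts: List[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Collect chosen/rejected pairs from a judge that answers 'A' or 'B'."""
    rows = []
    for prompt in prompts:
        # Step 1: two candidate responses for the same prompt
        response_a = generate(prompt)
        response_b = generate(prompt)
        # Step 2: the feedback model picks the better one
        choice = judge(response_a, response_b)
        if choice == "A":
            chosen, rejected = response_a, response_b
        elif choice == "B":
            chosen, rejected = response_b, response_a
        else:
            continue  # skip pairs where the verdict could not be parsed
        # Step 3: accumulate rows with "chosen" and "rejected" columns
        rows.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return rows


# Hypothetical stand-ins so the sketch runs without a GPU
_counter = itertools.count()


def fake_generate(prompt: str) -> str:
    return f"{prompt} -> draft {next(_counter)}"


def fake_judge(response_a: str, response_b: str) -> str:
    return "B"


rows = build_preference_rows(["p1", "p2"], fake_generate, fake_judge)
for row in rows:
    print(row)
```

A list of dictionaries like this can be turned into a Hugging Face dataset with `Dataset.from_pandas`, in the same way as the SFT dataset earlier.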


### Feedback Model

In [None]:
import torch
from transformers import pipeline

sft_pipeline = pipeline("text-generation", model=sft_model, tokenizer=base_model_tokenizer)

We need a prompt template that, given a rule from the AI Constitution, tells us which of the two responses produced by the model is the better one.


In [None]:
def create_selection_prompt(ai_constitution, response_a, response_b):
    prompt = f"### Input\n AI Constitution:\n{ai_constitution}\n\n" \
             f"Response A:\n{response_a}\n\n" \
             f"Response B:\n{response_b}\n\n" \
             "Which response is better aligned with the AI Constitution? Please choose 'Response A' or 'Response B'. Dont explain why. The result must be only 'A' or 'B'" \
             "\n\n### Response"
    return prompt

In [None]:
ai_constitution_text = "The model must actively avoid generating outputs that are racist, sexist, hateful, or otherwise harmful or toxic in nature."
response_a_text = "You are a shit person nigga"
response_b_text = "In the bustling heart of the city, neon lights flickered with the promise of endless possibilities. People from all walks of life crossed paths on the crowded streets, each carrying their own dreams and stories. Above them, skyscrapers reached towards the sky, symbols of human ingenuity and the relentless pursuit of progress."

selection_prompt = create_selection_prompt(ai_constitution_text, response_a_text, response_b_text)

model_choice = sft_pipeline(selection_prompt, max_length=512)
chosen_response = model_choice[0]['generated_text'].strip()

print(chosen_response)


### Input
 AI Constitution:
The model must actively avoid generating outputs that are racist, sexist, hateful, or otherwise harmful or toxic in nature.

Response A:
You are a shit person nigga

Response B:
In the bustling heart of the city, neon lights flickered with the promise of endless possibilities. People from all walks of life crossed paths on the crowded streets, each carrying their own dreams and stories. Above them, skyscrapers reached towards the sky, symbols of human ingenuity and the relentless pursuit of progress.

Which response is better aligned with the AI Constitution? Please choose 'Response A' or 'Response B'. Dont explain why. The result must be only 'A' or 'B'

### Response:
B
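
Because the pipeline returns the prompt together with the completion, the verdict still has to be extracted before it can be used to label a pair. The helper below is one possible heuristic, assuming the verdict is the first standalone `A` or `B` after the last `### Response` marker; `parse_choice` is a hypothetical name, not part of the notebook.

```python
import re
from typing import Optional


def parse_choice(generated_text: str, marker: str = "### Response") -> Optional[str]:
    """Return 'A' or 'B' from the text after the last marker, else None."""
    # Only look at what follows the last marker, so the 'A'/'B' mentioned
    # in the prompt itself are ignored
    _, found, tail = generated_text.rpartition(marker)
    if not found:
        return None
    # First standalone capital A or B in the completion
    match = re.search(r"\b([AB])\b", tail)
    return match.group(1) if match else None


sample_output = (
    "Which response is better aligned with the AI Constitution? "
    "The result must be only 'A' or 'B'\n\n### Response:\nB"
)
print(parse_choice(sample_output))
print(parse_choice("text without the marker"))
```

Returning `None` for unparseable outputs lets the calling code drop those pairs instead of guessing a label.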


---

## Conclusion

Now we could go back to the Model Evaluation lab and replace the Reward Model's training dataset with one created by the model aligned with our constitution, couldn't we?




---


### In case we only want to load the model and skip the training...


In [None]:
!pip install -qU bitsandbytes datasets accelerate loralib peft transformers trl

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # correct keyword for nested (double) quantization
    bnb_4bit_compute_dtype=torch.float16,
)

username = "ericrisco"  # Replace with your HF username
model_id = f"{username}/llme2_sft_model_rlaif"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map='auto',
)


In [None]:
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")

In [None]:
import torch
from transformers import pipeline

sft_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

In [None]:
def create_selection_prompt(ai_constitution, response_a, response_b):
    prompt = f"### Input\n AI Constitution:\n{ai_constitution}\n\n" \
             f"Response A:\n{response_a}\n\n" \
             f"Response B:\n{response_b}\n\n" \
             "Which response is better aligned with the AI Constitution? Please choose 'Response A' or 'Response B'. Dont explain why. The result must be only 'A' or 'B'" \
             "\n\n### Response"
    return prompt

ai_constitution_text = "The model must actively avoid generating outputs that are racist, sexist, hateful, or otherwise harmful or toxic in nature."
response_a_text = "Under a canopy of twinkling stars, an ancient tree whispered secrets of the universe to those who dared to listen. Its leaves rustled in the gentle night breeze, each movement a symphony of nature's timeless song. Around it, the forest hummed with the magic of life, an endless dance of harmony and wonder."
response_b_text = "In the bustling heart of the city, neon lights flickered with the promise of endless possibilities. People from all walks of life crossed paths on the crowded streets, each carrying their own dreams and stories. Above them, skyscrapers reached towards the sky, symbols of human ingenuity and the relentless pursuit of progress."

selection_prompt = create_selection_prompt(ai_constitution_text, response_a_text, response_b_text)

model_choice = sft_pipeline(selection_prompt, max_length=512)
chosen_response = model_choice[0]['generated_text'].strip()

print(chosen_response)



### Input
 AI Constitution:
The model must actively avoid generating outputs that are racist, sexist, hateful, or otherwise harmful or toxic in nature.

Response A:
Under a canopy of twinkling stars, an ancient tree whispered secrets of the universe to those who dared to listen. Its leaves rustled in the gentle night breeze, each movement a symphony of nature's timeless song. Around it, the forest hummed with the magic of life, an endless dance of harmony and wonder.

Response B:
In the bustling heart of the city, neon lights flickered with the promise of endless possibilities. People from all walks of life crossed paths on the crowded streets, each carrying their own dreams and stories. Above them, skyscrapers reached towards the sky, symbols of human ingenuity and the relentless pursuit of progress.

Which response is better aligned with the AI Constitution? Please choose 'Response A' or 'Response B'. Dont explain why. The result must be only 'A' or 'B'

### Response:
A
