<a href="https://colab.research.google.com/github/FrancoArtico/question-generator-LLM/blob/main/Fine_Tunning_%7C_LLaMa_3_1_8B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Medium tutorial: https://medium.com/@rschaeffer23/how-to-fine-tune-llama-3-1-8b-instruct-bf0a84af7795

### Installation

In [None]:
!pip install accelerate peft bitsandbytes transformers trl


### Imports

In [None]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
import os

In [None]:
from google.colab import userdata, drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from huggingface_hub import notebook_login
notebook_login()

#from google.colab import userdata
#userdata.get('HUGGINGFACE_ACCESS_TOKEN')


## Preprocessing the dataset

In [None]:
def preprocesamiento(dataset):
  # Drop the columns that are not needed for question generation (including the answers)
  columnas_a_eliminar = ['id', 'answers', 'title']
  dataset = dataset.remove_columns([col for col in columnas_a_eliminar if col in dataset.column_names])

  # Keep only the rows whose context starts with "Biografía"
  dataset = dataset.filter(lambda fila: fila["context"].startswith("Biografía"))

  return dataset
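
To make the effect of this preprocessing step concrete, the sketch below applies the same two operations (dropping columns, then keeping rows whose context starts with "Biografía") to a couple of toy rows. It uses plain dictionaries rather than a `datasets.Dataset`; the rows and the helper name `preprocess_rows` are hypothetical and exist only for illustration.

```python
# Toy stand-ins for SQAC rows; the values are invented for illustration.
rows = [
    {"id": "1", "title": "A", "context": "Biografía \nTexto uno.",
     "question": "¿Q1?", "answers": {"text": ["x"]}},
    {"id": "2", "title": "B", "context": "Historia \nTexto dos.",
     "question": "¿Q2?", "answers": {"text": ["y"]}},
]

def preprocess_rows(rows, drop=("id", "answers", "title"), prefix="Biografía"):
    """Drop unused columns, then keep rows whose context starts with `prefix`."""
    slim = [{k: v for k, v in row.items() if k not in drop} for row in rows]
    return [row for row in slim if row["context"].startswith(prefix)]

kept = preprocess_rows(rows)
print(kept)
```

Only the first row survives, and it keeps just the `context` and `question` fields, which are the two columns the prompt formatting step relies on.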

In [None]:
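def format_prompt(example):
    # Join the instruction, the context and the target question into the single text field used for training
    example['formatted_text'] = "{'prompt': Genera una pregunta en base al siguiente texto " + example['context']+ ", 'completion': " + example['question'] + "}"
    return example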


In [None]:
dataset = load_dataset("/content/drive/MyDrive/SQAC", split="train")
dataset_entrenamiento = preprocesamiento(dataset)
formatted_dataset = dataset_entrenamiento.map(format_prompt)
print(formatted_dataset)

The repository for SQAC contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/SQAC.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Filter:   0%|          | 0/15036 [00:00<?, ? examples/s]

Map:   0%|          | 0/156 [00:00<?, ? examples/s]

Dataset({
    features: ['context', 'question', 'formatted_text'],
    num_rows: 156
})


In [None]:
print(formatted_dataset['formatted_text'][0])

{'prompt': Genera una pregunta en base al siguiente texto Biografía 
La noble familia armenia Curcuas, –cuyo apellido original, Gurgen (en armenio, Գուրգեն), fue helenizado a Curcuas– se labró un puesto destacado en la aristocracia militar y latifundista de la Anatolia del siglo IX. Junto con Juan Curcuas, surgieron personajes que destacaron dentro de la política bizantina, como su homónimo abuelo, quien había sido comandante del selecto tagma de hikanatoi durante el reinado de Basilio I (867-886), su hermano Teófilo Curcuas, que descolló como general, su propio hijo, Romano Curcuas, y su sobrino nieto, Juan Tzimisces, que reinó como emperador entre 969 y 976., 'completion': ¿Cuál era el apellido de la familia Curcuas antes de ser helenizado?}
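
The sample above wraps the context and question in a dict-like string. As a point of comparison, here is a minimal sketch that lays out the same kind of pair with the Llama 3 header tokens that appear elsewhere in this notebook. The function name `to_chat_text` and the example pair are hypothetical, and this is not the format the notebook actually trains on.

```python
INSTRUCTION = "Genera una pregunta en base al siguiente texto"

def to_chat_text(context: str, question: str) -> str:
    """Build one training string: a user turn (instruction + context)
    followed by an assistant turn (the target question)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{INSTRUCTION}: {context}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
    )

text = to_chat_text(
    "Biografía \nKurt Gödel nació en 1906.",
    "¿En qué año nació Kurt Gödel?",
)
print(text)
```

If the tokenizer already prepends `<|begin_of_text|>` on its own, that token would need to be dropped from the string to avoid a duplicate.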


## Loading the model

In [None]:
# Load the tokenizer and the 4-bit quantized model
def obtener_modelo_y_tokenizer(model_id):
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  tokenizer.pad_token = tokenizer.eos_token

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_compute_dtype=torch.bfloat16,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_use_double_quant=True
      #bnb_4bit_compute_dtype="float16",
  )

  model = AutoModelForCausalLM.from_pretrained(
      model_id, quantization_config=bnb_config, device_map="auto"
  )

  model.config.use_cache=False
  model.config.pretraining_tp=1

  return model, tokenizer
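
As a rough illustration of why the model is loaded in 4-bit, the sketch below estimates the weight storage at 16 and 4 bits per parameter. The parameter count is an assumption (about 8.03 billion for Llama 3.1 8B), and the estimate ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 8.03e9  # assumed parameter count for Llama 3.1 8B

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # half-precision weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit quantized weights
print(f"16-bit: {bf16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the 16-bit footprint, which is what makes an 8B model fit on a single Colab GPU.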

In [None]:
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model, tokenizer = obtener_modelo_y_tokenizer(model_id)


## Generating a test response

In [None]:
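def formatted_prompt(prompt) -> str:
    # Llama 3 chat layout: each header is followed by a blank line, with no indentation inside the string
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )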

In [None]:
from transformers import GenerationConfig

def generar_respuesta(pregunta):
  prompt = formatted_prompt(pregunta)
  generation_config = GenerationConfig(
      max_new_tokens=60,
      pad_token_id=tokenizer.eos_token_id
  )
  inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
  outputs = model.generate(**inputs, generation_config=generation_config)
  # Decode only the newly generated tokens; skip_special_tokens=True leaves special tokens out of the text
  respuesta = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
  return respuesta
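
The slice in `generar_respuesta` relies on `generate` returning the prompt tokens followed by the continuation for decoder-only models. A minimal sketch of the same idea with plain lists (the token ids are made up):

```python
prompt_ids = [128000, 11, 22, 33]        # made-up token ids for the prompt
output_ids = prompt_ids + [44, 55, 66]   # prompt followed by the generated continuation

# Same idea as outputs[0][inputs["input_ids"].shape[-1]:]
new_tokens = output_ids[len(prompt_ids):]
print(new_tokens)
```

Decoding only `new_tokens` is what keeps the prompt text out of the returned answer.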

In [None]:
generar_respuesta("¿Qué versión eres?")

'\nSoy un modelo de inteligencia artificial de lenguaje desarrollado por Meta AI, lo que significa que puedo procesar y generar texto en base a la información que me es proporcionada.'

## Training the model

In [None]:
output_model="llama3.1-8B-Instruct-Fine-tuned-to-generate-spanish-questions"

In [None]:
# Function to format the training data
#def formatted_train(input,response)->str:
#    return f'''
#    <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#    {input}<|eot_id|>
#    <|start_header_id|>assistant<|end_header_id|>
#    {response}<|eot_id|>'''

In [None]:
#def formatting_prompts_func(dataset):
#    output_texts = []
#    for i in range(len(dataset['context'])):
#        text = f"### Question: {dataset['context'][i]}\n ### Answer: {dataset['question'][i]}"
#        output_texts.append(text)
#    return output_texts

In [None]:
#import pandas as pd
#def prepare_train_datav2(data):
    # Convert the data to a Pandas DataFrame
#    data_df = pd.DataFrame(data)
#    print("1\n")
#    print(data_df)
    # Create a new column called "text"
#    data_df["text"] = data_df[["prompt", "response"]].apply(lambda x: "<|im_start|>user\n" + x["prompt"] + " <|im_end|>\n<|im_start|>assistant\n" + x["response"] + "<|im_end|>\n", axis=1)
#    print(data_df)

    # Create a new Dataset from the DataFrame
#    data = Dataset.from_pandas(data_df)
#    print("3\n")
#    print(data)
#    return data

In [None]:
#datos = prepare_train_datav2(dataset_entrenamiento)


In [None]:
# Next, define the LoRA settings for training:
peft_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
    )
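
To give a sense of how small the adapter is, the sketch below counts the parameters LoRA would add with `r=8`. The `LoraConfig` above does not set `target_modules`, so the sketch assumes PEFT's defaults for this architecture are `q_proj` and `v_proj`; the layer shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) are likewise assumptions about Llama 3.1 8B, not values read from the notebook.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight."""
    return r * (d_in + d_out)

# Assumed Llama 3.1 8B shapes.
HIDDEN, KV_DIM, LAYERS = 4096, 8 * 128, 32
R, ALPHA = 8, 16  # values from the LoraConfig above

# Assumed targets: q_proj (HIDDEN -> HIDDEN) and v_proj (HIDDEN -> KV_DIM) in every layer.
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total = LAYERS * per_layer
scaling = ALPHA / R  # factor applied to the low-rank update

print(total, scaling)
```

Under these assumptions the adapter has about 3.4 million trainable parameters, roughly 13.6 MB if stored as float32, and the low-rank update is scaled by 2.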

In [None]:
training_arguments = TrainingArguments(
        output_dir=output_model,
        # Metrics reporting. None does not disable it: every installed integration
        # (such as wandb) stays active. Pass report_to="none" to turn reporting off.
        report_to=None,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        optim="paged_adamw_32bit",
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        save_strategy="epoch",
        logging_steps=10,
        num_train_epochs=3,  # ignored here because max_steps is set
        max_steps=250,
        fp16=True,
        push_to_hub=True
    )
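
Because `max_steps` takes precedence over `num_train_epochs`, it helps to work out how many passes over the data these settings imply. The arithmetic below uses the 156 rows left after filtering and assumes a single GPU; the variable names are illustrative.

```python
dataset_rows = 156       # rows left after filtering
per_device_batch = 1     # per_device_train_batch_size
grad_accum_steps = 16    # gradient_accumulation_steps
max_steps = 250
n_devices = 1            # assumption: a single Colab GPU

effective_batch = per_device_batch * grad_accum_steps * n_devices
samples_seen = max_steps * effective_batch
epochs = samples_seen / dataset_rows

print(effective_batch, samples_seen, round(epochs, 2))
```

That is about 25.6 passes over the 156 examples rather than 3, which is consistent with the epoch value the trainer reports at the end of the run.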

In [None]:
trainer = SFTTrainer(
        model=model,
        train_dataset=formatted_dataset,
        peft_config=peft_config,
        dataset_text_field="formatted_text",
        args=training_arguments,
        tokenizer=tokenizer,
        packing=False,
        max_seq_length=1024
    )


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


max_steps is given, it will override any value given in num_train_epochs


In [None]:
#Then we finally get to train/tune the model!
trainer.train()


[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.




Step,Training Loss
10,1.9443
20,1.6387
30,1.5063
40,1.3952
50,1.2181
60,1.024
70,0.8723
80,0.7114
90,0.5539
100,0.4146


TrainOutput(global_step=250, training_loss=0.5260978257656097, metrics={'train_runtime': 5703.1129, 'train_samples_per_second': 0.701, 'train_steps_per_second': 0.044, 'total_flos': 5.088180725091533e+16, 'train_loss': 0.5260978257656097, 'epoch': 25.641025641025642})


---



## Testing the fine-tuned model

In [None]:
!pip install accelerate peft bitsandbytes transformers trl



In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
# Installing More Dependencies
!pip install datasets
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM
from transformers import pipeline
from trl import SFTTrainer
import os



In [None]:
model_id='Doberfran/llama3.1-8B-Instruct-Fine-tuned-to-generate-spanish-questions'

In [None]:
def get_model_and_tokenizer(model_id):
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  tokenizer.pad_token = tokenizer.eos_token
  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=True
  )
  model = AutoModelForCausalLM.from_pretrained(
      model_id, quantization_config=bnb_config, device_map="auto"
  )
  model.config.use_cache=False
  model.config.pretraining_tp=1
  return model, tokenizer

In [None]:
model, tokenizer = get_model_and_tokenizer(model_id)



In [None]:
from transformers import GenerationConfig
from time import perf_counter

def generate_response(user_input):
  # formatted_prompt is the helper defined earlier in this notebook; re-run that cell in a fresh session
  prompt = formatted_prompt(user_input)
  generation_config = GenerationConfig(penalty_alpha=0.6,do_sample = True,
      top_k=5,temperature=0.5,repetition_penalty=1.2,
      max_new_tokens=60,pad_token_id=tokenizer.eos_token_id
  )
  start_time = perf_counter()
  inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
  outputs = model.generate(**inputs, generation_config=generation_config)
  theresponse = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(theresponse)
  output_time = perf_counter() - start_time
  print(f"Time taken for inference: {round(output_time,2)} seconds")
  return theresponse
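
The `top_k=5` and `temperature=0.5` settings shape how the next token is drawn. The following is a minimal pure-Python sketch of that idea on a toy vocabulary, not the library's implementation: it keeps the five highest scores, divides them by the temperature, and applies a softmax. The function name and the logits are hypothetical, and `repetition_penalty` is left out.

```python
import math
import random

def top_k_temperature_probs(logits, top_k=5, temperature=0.5):
    """Keep the top_k highest logits, scale by 1/temperature, softmax over the survivors."""
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {i: w / total for i, w in zip(kept, weights)}

logits = [2.0, 1.0, 0.5, 0.1, -1.0, -3.0, 4.0]  # toy scores for a 7-token vocabulary
probs = top_k_temperature_probs(logits)

# Draw one token id according to the filtered, re-normalized distribution.
token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, token)
```

A temperature below 1 sharpens the distribution toward the highest-scoring token, while the top-k cut removes the long tail entirely.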


In [None]:
!pip install wikipedia

import wikipedia

# Set the library language to Spanish
wikipedia.set_lang("es")

# Fetch the article text
godel = wikipedia.page("Kurt Gödel").content

model = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16,
                  "quantization_config": {"load_in_4bit": True}},
)




In [None]:
prompt = [
    {"role":"system", "content": "Eres un generador de preguntas de respuesta corta y de tipo verdadero falso para lectura comprensiva, sin incluir las respuestas"},
    {"role":"user", "content": "Haceme 10 preguntas de lectura comprensiva de tipo respuesta corta, sobre el siguiente texto: " + godel},
    ]

output = model(
    prompt,
    max_new_tokens=512,
    do_sample=False
    )

assistant_response = output[0]['generated_text'][-1]["content"]
print(assistant_response)


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


1. ¿En qué año nació Kurt Friedrich Gödel?
2. ¿Qué disciplinas estudió Kurt Friedrich Gödel en la Universidad de Viena?
3. ¿Qué teorema demostró Kurt Friedrich Gödel en 1931?
4. ¿Qué título recibió Kurt Friedrich Gödel en 1933?
5. ¿Qué año emigró Kurt Friedrich Gödel a los Estados Unidos?
6. ¿Qué universo construyó Kurt Friedrich Gödel en 1940?
7. ¿Qué premio recibió Kurt Friedrich Gödel en 1951?
8. ¿Qué año se convirtió Kurt Friedrich Gödel en profesor emérito?
9. ¿Qué enfermedad mental sufrió Kurt Friedrich Gödel en sus últimos años?
10. ¿Qué película de 2023 incluye a Kurt Friedrich Gödel como personaje secundario?
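
Since the model returns the questions as a numbered list in a single string, a small helper can split them for later use. This is a minimal sketch with a hypothetical function name, `split_questions`, shown on two lines taken from the output above.

```python
import re

def split_questions(text: str) -> list[str]:
    """Return the question text of every line that looks like '<number>. <question>'."""
    pattern = re.compile(r"^\s*\d+\.\s*(.+?)\s*$", re.MULTILINE)
    return pattern.findall(text)

sample = """1. ¿En qué año nació Kurt Friedrich Gödel?
2. ¿Qué teorema demostró Kurt Friedrich Gödel en 1931?"""

questions = split_questions(sample)
print(questions)
```

In the notebook, the same helper could be applied to `assistant_response` to obtain a list of strings.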
