## Setup

In [1]:
!pip install -U git+https://github.com/lvwerra/trl.git
!pip install -q -U datasets bitsandbytes einops wandb torch
# peft (Parameter-Efficient Fine-Tuning) is a Hugging Face library for adapting large models by training only a small set of extra parameters (e.g. LoRA)
!pip install -U git+https://github.com/huggingface/peft.git
# transformers is a library for training and using NLP models
!pip install -U transformers
!pip install -U tokenizers
# seaborn is a statistical data visualization library
!pip install -U seaborn
# accelerate is a Hugging Face library for running model training on GPUs and TPUs
!pip install -U accelerate
!pip install -U evaluate
!pip install -U bitsandbytes
!pip install -U git+https://github.com/huggingface/huggingface_hub

Collecting git+https://github.com/lvwerra/trl.git
  Cloning https://github.com/lvwerra/trl.git to /tmp/pip-req-build-lbk2jq49
  Running command git clone --filter=blob:none --quiet https://github.com/lvwerra/trl.git /tmp/pip-req-build-lbk2jq49
  Resolved https://github.com/lvwerra/trl.git to commit 7fc970983c0c9bf154edf89b7be94a2b7348c972
  Preparing metadata (setup.py) ... done
Collecting accelerate
  Downloading accelerate-0.22.0-py3-none-any.whl (251 kB)
     251.2/251.2 kB 34.9 MB/s eta 0:00:00
Building wheels for collected packages: trl
  Building wheel for trl (setup.py) ... done
  Created wheel for trl: filename=trl-0.6.1.dev0-py3-none-any.whl size=110081 sha256=9ade200bc14c5ccee9740df3acfbf59ace4ff1baf56c2f6dcb3ce935a28c098a
  Stored in directory: /tmp/pip-ephem-wheel-cache-ym9_oq6x/wheels/ab/81/88/2e3ddd7591b397b560da92477ae2578b9b6f16f97a57ef49e1
Successfully built trl
Installing 

In [2]:
import transformers # transformers is from Hugging Face
from transformers import LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig # model, tokenizer, and quantization-config classes from transformers
import os # os provides operating-system interfaces (environment variables, paths)
import sys # sys gives access to the Python interpreter and its runtime
import wandb # wandb is the Weights & Biases experiment-tracking client
from peft import ( # peft is from Hugging Face
    LoraConfig, # LoraConfig holds the LoRA (Low-Rank Adaptation) settings; LoRA is a fine-tuning technique, not a model
    get_peft_model, # get_peft_model wraps a base model with the configured adapters
    get_peft_model_state_dict, # get_peft_model_state_dict returns the state dict of the adapter weights
    prepare_model_for_kbit_training, # prepare_model_for_kbit_training prepares a quantized (k-bit) model for training
    PeftModel
)
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
import torch # torch (PyTorch) is a tensor and deep learning library
import datasets  # datasets is a library for loading and processing datasets
import pandas as pd # pandas is a data analysis library
from huggingface_hub import login # huggingface_hub is from Hugging Face
import matplotlib.pyplot as plt # matplotlib is a data visualization library

In [26]:
OUTPUT_DIR = "/notebooks/puntos-de-control2" # output directory for the experiments
DEVICE = "cuda" if torch.cuda.is_available() else "cpu" # use the GPU if one is available, otherwise fall back to the CPU
DEVICE

'cuda'

In [4]:
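access_token = os.environ["HF_TOKEN"] # read the Hugging Face token from the environment (HF_TOKEN is an assumed variable name); never hardcode a token in a notebook
login(token=access_token)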

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Load the LLaMA model

In [5]:
parameters = "7b-chat"
BASE_MODEL = f"meta-llama/Llama-2-{parameters}-hf" # base model: Llama 2 chat with 7B parameters
# if a pretrained copy of the model has been saved locally, it lives under Models_of_Llama/Llama_base_{parameters}
pathBase = f"Models_of_Llama/Llama_base_{parameters}" # base path

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
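
To see why the model is loaded in 4-bit, the following sketch estimates the storage needed for the weights alone. It assumes a round 7 billion parameters and ignores quantization constants, activations, and optimizer state, so the numbers are only a rough guide; `weight_gib` is a hypothetical helper, not part of any library.

```python
# Back-of-the-envelope weight storage for a 7B-parameter model.
# Assumption: exactly 7e9 parameters; the real checkpoint differs slightly.
NUM_PARAMS = 7_000_000_000


def weight_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_gib(NUM_PARAMS, 16)  # half precision
nf4_gib = weight_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"fp16 weights : {fp16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

The 4-bit figure is a quarter of the half-precision one, which is what makes fine-tuning feasible on a single GPU.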

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    #torch_dtype=torch.float16,
    quantization_config=bnb_config,
    #load_in_8bit=True,
    device_map="auto"
)

In [7]:
model.tie_weights()

In [8]:
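mymodel = 'BrunoGR/LLaMA-2-7bChat-modified' # repository that provides the modified tokenizer
tokenizer = AutoTokenizer.from_pretrained(mymodel)
tokenizer.pad_token = tokenizer.eos_token # reuse the end-of-sequence token as the padding token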

In [9]:
model.resize_token_embeddings(len(tokenizer.get_vocab()))

You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 32018. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


Embedding(32018, 4096)
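
The warning above suggests padding the embedding size via `pad_to_multiple_of`. As a minimal sketch, the padded size can be computed by rounding the vocabulary size up to the next multiple. The multiple of 64 is an assumption chosen for illustration, and `round_up_to_multiple` is a hypothetical helper.

```python
def round_up_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


vocab_size = 32018   # size reported by the resize call above
multiple = 64        # assumption: a Tensor Core friendly multiple

padded_vocab_size = round_up_to_multiple(vocab_size, multiple)
extra_rows = padded_vocab_size - vocab_size

print(padded_vocab_size, extra_rows)

# With a recent transformers release, the padded size could be requested
# directly (untested here, shown only as a hint):
# model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
```

The extra rows are unused embedding slots; they cost a little memory in exchange for better-aligned matrix shapes.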

## Dataset

In [10]:
data = datasets.load_dataset("BrunoGR/Emo_support")

Generating test split:   0%|          | 0/27445 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/112347 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2001 [00:00<?, ? examples/s]

In [11]:
def generate_prompt(data_point):
    # truncate the text to 2460 characters (not tokens), intended to keep prompts within the 1024-token limit
    text = data_point["texto"][:2460]

    return {'Prompt':f"Intruccion: Analiza la Consulta, y determina la Emocion.\n {text} \n### {data_point['etiqueta']} </s>"}



In [12]:
train_data = (data["train"].map(generate_prompt)) # build the training prompts; no tokenization happens at this point

Map:   0%|          | 0/112347 [00:00<?, ? examples/s]

In [13]:
train_data['Prompt'][1]

'Intruccion: Analiza la Consulta, y determina la Emocion.\n Consulta: <  Vi una película que quería ver desde hace tiempo. > \n### Emocion: optimismo </s>'
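
The prompt construction can be checked without downloading the dataset. The sketch below repeats the template from `generate_prompt` and applies it to a record reconstructed from the output above; the exact field values are an assumption inferred from that output. It also shows that the slice limits characters rather than tokens.

```python
MAX_CHARS = 2460  # same character cap as the slice in generate_prompt


def generate_prompt(data_point: dict) -> dict:
    """Build the training prompt from a record with 'texto' and 'etiqueta'."""
    text = data_point["texto"][:MAX_CHARS]
    return {"Prompt": f"Intruccion: Analiza la Consulta, y determina la Emocion.\n {text} \n### {data_point['etiqueta']} </s>"}


# Assumption: field values reconstructed from the printed prompt.
sample = {
    "texto": "Consulta: <  Vi una película que quería ver desde hace tiempo. >",
    "etiqueta": "Emocion: optimismo",
}
prompt = generate_prompt(sample)["Prompt"]
print(prompt)

# A 5000-character text is cut to 2460 characters, regardless of token count.
long_record = {"texto": "x" * 5000, "etiqueta": "Emocion: optimismo"}
long_prompt = generate_prompt(long_record)["Prompt"]
print(long_prompt.count("x"))
```

Because the cap is in characters, a text with many rare subwords can still exceed the token limit; the trainer's `max_seq_length` is what ultimately bounds the sequence.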

In [14]:
print(f"Original:\n{train_data['Prompt'][1]}\ntokenizado{tokenizer.tokenize(train_data['Prompt'][1])}\n codificado:{tokenizer.encode(train_data['Prompt'][1])}")

Original:
Intruccion: Analiza la Consulta, y determina la Emocion.
 Consulta: <  Vi una película que quería ver desde hace tiempo. > 
### Emocion: optimismo </s>
tokenizado['▁Intruccion', ':', '▁Analiza', '▁la', '▁Consulta', ',', '▁y', '▁determ', 'ina', '▁la', '▁Emocion', '.', '<0x0A>', '▁Consulta', ':', '▁', '<', '▁▁', '▁Vi', '▁una', '▁película', '▁que', '▁quer', 'ía', '▁ver', '▁desde', '▁hace', '▁tiempo', '.', '▁', '>', '▁▁', '<0x0A>', '##', '#', '▁Emocion', ':', '▁optimismo', '▁', '</s>']
 codificado:[1, 32000, 29901, 32016, 425, 32003, 29892, 343, 3683, 1099, 425, 32002, 29889, 13, 32003, 29901, 29871, 29966, 259, 10630, 1185, 19053, 712, 22320, 1553, 1147, 5125, 20470, 13924, 29889, 29871, 29958, 259, 13, 2277, 29937, 32002, 29901, 32007, 29871, 2]


In [15]:
val_data = (data["validation"].map(generate_prompt))

Map:   0%|          | 0/2001 [00:00<?, ? examples/s]

In [16]:
model.config.pretraining_tp = 1
LoRA_TARGET_MODULES = [ # modules of the original language model that LoRA will adapt
    "q_proj", # q_proj is the query projection
    "v_proj", # v_proj is the value projection
]

LoRA_DROPOUT = 0.05
config = LoraConfig( # LoRA adapter configuration
    r=16, # rank of the low-rank update matrices added to each target weight matrix
    lora_alpha=32, # scaling factor; the low-rank update is scaled by lora_alpha / r
    target_modules=LoRA_TARGET_MODULES,
    lora_dropout=LoRA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
#model_train = get_peft_model(model_train, config) # wrap the model with the LoRA adapters
#model_train.print_trainable_parameters() # print the model's trainable parameters
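
As a rough check on how small the LoRA adapters are, the sketch below counts the parameters that this configuration would add. It assumes the usual Llama-2-7B shapes (hidden size 4096, 32 decoder layers, square `q_proj` and `v_proj` matrices); these are assumptions rather than values read from the loaded model, and `lora_params` is a hypothetical helper.

```python
HIDDEN_SIZE = 4096    # matches the embedding width printed earlier
NUM_LAYERS = 32       # assumption: Llama-2-7B decoder depth
R = 16                # LoRA rank from the config
LORA_ALPHA = 32       # LoRA alpha from the config
TARGET_MODULES = ["q_proj", "v_proj"]  # assumed to be 4096 x 4096 each


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total_trainable = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"per module : {per_module:,}")
print(f"total      : {total_trainable:,}")
print(f"scaling    : {scaling}")
```

Under these assumptions the adapters hold only a few million parameters, a tiny fraction of the roughly 7 billion frozen base weights.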

# Fine Tune
This section fine-tunes the model on the Emo_support data.

## Training arguments
This section configures the training run by defining the `transformers.TrainingArguments` passed to the trainer.

In [39]:
BATCH_SIZE = 128 # effective batch size: number of texts per optimizer update
MICRO_BATCH_SIZE = 8 # micro-batch size: number of texts processed on the GPU at once
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE # gradient accumulation steps
training_arguments = transformers.TrainingArguments( # training configuration
    per_device_train_batch_size=MICRO_BATCH_SIZE, # micro-batch size
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS, # gradient accumulation steps
    warmup_steps=200, # learning-rate warmup steps
    num_train_epochs=2, # number of training epochs
    learning_rate=5e-5, # peak learning rate
    adam_beta1=0.9, # Adam beta1, same value as in the LLaMA paper
    adam_beta2=0.95, # Adam beta2, same value as in the LLaMA paper
    adam_epsilon=1e-8, # Adam epsilon (the transformers default value)
    weight_decay=0.1,
    fp16=True, # use 16-bit mixed precision
    logging_steps=10, # log every 10 steps
    optim="adamw_torch", # AdamW optimizer, PyTorch implementation
    evaluation_strategy="steps", # evaluate at fixed step intervals
    save_strategy="steps", # save checkpoints at fixed step intervals
    eval_steps=100, # evaluate the model every 100 steps
    save_steps=100, # save a checkpoint every 100 steps
    output_dir="notebooks/Fine_tune/chkpoint", # output directory
    save_total_limit=6, # keep at most 6 checkpoints
    load_best_model_at_end=True, # load the best checkpoint when training ends
    #report_to="wandb", # report metrics to Weights & Biases
    seed=1,
    lr_scheduler_type="cosine", # cosine schedule, as in the LLaMA paper
    max_grad_norm=1.0, # gradient clipping, as in the LLaMA paper
)
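
To make these settings concrete, the sketch below estimates the number of optimizer updates and traces a linear-warmup, cosine-decay learning rate. It assumes a single GPU and the 112347 training examples reported when the dataset was generated; the exact step count used by the trainer can differ by version and device count, and `lr_at` is a hypothetical helper that only approximates the scheduler.

```python
import math

TRAIN_EXAMPLES = 112_347          # train split size reported above
MICRO_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 128 // MICRO_BATCH_SIZE
EPOCHS = 2
WARMUP_STEPS = 200
PEAK_LR = 5e-5

# Assumption: one device, so each micro-batch holds MICRO_BATCH_SIZE texts.
micro_batches_per_epoch = math.ceil(TRAIN_EXAMPLES / MICRO_BATCH_SIZE)
updates_per_epoch = micro_batches_per_epoch // GRAD_ACCUM_STEPS
total_updates = updates_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_updates - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"updates per epoch: {updates_per_epoch}, total: {total_updates}")
for step in (0, 100, 200, total_updates // 2, total_updates):
    print(step, f"{lr_at(step):.2e}")
```

With evaluation and checkpointing every 100 steps, this estimate implies on the order of 17 evaluations over the whole run.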

In [None]:
os.environ['WANDB_API_KEY'] = '<YOUR_WANDB_API_KEY>' # placeholder: supply your own key and keep real keys out of the notebook

In [36]:
response_template_with_context = "\n### Emocion:"
instruction_template = "Intruccion:" # spelled exactly as in the prompts built by generate_prompt
response_template_ids = tokenizer.encode(response_template_with_context, add_special_tokens=False)[2:]  # drop the leading context tokens so the ids match how the template is tokenized inside the prompts
print(response_template_ids)
collator = DataCollatorForCompletionOnlyLM(response_template_ids, instruction_template=instruction_template, tokenizer=tokenizer)


[2277, 29937, 32002, 29901]
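
The idea behind completion-only training can be sketched in plain Python: every token up to and including the response template gets the label -100, so the loss is computed only on the emotion label and the end-of-sequence token. This is a simplified illustration, not the collator's actual implementation; the ids are copied from the encoded example printed earlier, and `find_sublist` and `mask_prompt` are hypothetical helpers.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips

# Token ids of the example prompt, copied from the output shown earlier.
input_ids = [1, 32000, 29901, 32016, 425, 32003, 29892, 343, 3683, 1099,
             425, 32002, 29889, 13, 32003, 29901, 29871, 29966, 259, 10630,
             1185, 19053, 712, 22320, 1553, 1147, 5125, 20470, 13924, 29889,
             29871, 29958, 259, 13, 2277, 29937, 32002, 29901, 32007, 29871, 2]
response_template_ids = [2277, 29937, 32002, 29901]


def find_sublist(seq: list, sub: list) -> int:
    """Return the start index of the first occurrence of sub in seq, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_prompt(ids: list, template: list) -> list:
    """Keep labels only for the tokens that follow the response template."""
    start = find_sublist(ids, template)
    if start == -1:
        # Template not found: ignore the whole example.
        return [IGNORE_INDEX] * len(ids)
    end = start + len(template)
    return [IGNORE_INDEX] * end + ids[end:]


labels = mask_prompt(input_ids, response_template_ids)
print(labels[-5:])
```

This also shows why the template ids must match the in-context tokenization exactly: if the sublist search fails, the example contributes nothing to the loss.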


In [40]:
trainer = SFTTrainer(
    model=model,
    data_collator=collator,
    train_dataset=train_data,
    eval_dataset=val_data,
    peft_config=config,
    dataset_text_field="Prompt",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=training_arguments,
)

In [42]:
trainer.train(resume_from_checkpoint=False) # train the model
trainer.save_model("/content/gdrive/MyDrive/Tesis/Modelos/LLaMA2_models/2nd_EmoLLama2") # save the model

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss,Validation Loss
