<a href="https://colab.research.google.com/github/doraemonidol/NLP/blob/main/fine_tuning_and_testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Are we running in Colab
try:
  from google.colab import drive
  IN_COLAB=True
except:
  IN_COLAB=False

if IN_COLAB:
  print("We're running Colab")

We're running Colab


In [2]:
if IN_COLAB:
  # Mount the Google Drive at mount
  mount='/content/gdrive'
  print("Colab: mounting Google drive on ", mount)

  drive.mount(mount)

  # Switch to the directory on the Google Drive that you want to use
  import os
  drive_root = mount + "/My Drive/Colab Notebooks/Llama2-Translation/"

  # Create drive_root if it doesn't exist
  create_drive_root = True
  if create_drive_root:
    print("\nColab: making sure ", drive_root, " exists.")
    os.makedirs(drive_root, exist_ok=True)

  # Change to the directory
  print("\nColab: Changing directory to ", drive_root)
  %cd $drive_root

Colab: mounting Google drive on  /content/gdrive
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).

Colab: making sure  /content/gdrive/My Drive/Colab Notebooks/Llama2-Translation/  exists.

Colab: Changing directory to  /content/gdrive/My Drive/Colab Notebooks/Llama2-Translation/
/content/gdrive/My Drive/Colab Notebooks/Llama2-Translation


In [3]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# SetUp Directory

In [4]:
%pwd

'/content/gdrive/My Drive/Colab Notebooks/Llama2-Translation'

In [5]:
# %cd ..

In [6]:
%ls

test.csv  train.csv  validation.csv  [0m[01;34mworking[0m/


In [7]:
# %mkdir working/results/
![ ! -d working/results/ ] && mkdir -p working/results/

# Necessary Installs and Imports

## Installs

In [8]:
%pip install -U datasets transformers trl accelerate peft bitsandbytes



## HuggingFace SetUp

In [9]:
# from huggingface_hub import notebook_login
# notebook_login()

In [10]:
# !python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('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')"

## Imports

In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model, PeftModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
import torch

# Model SetUp

In [12]:
model_name = "meta-llama/Llama-3.2-1B"

compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=compute_dtype,
            bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(model_name,quantization_config=bnb_config, device_map={"": 0})
model = prepare_model_for_kbit_training(model)

## Tokenizer

In [13]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, add_eos_token=True)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "left"
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


Embedding(128257, 2048)

## Load Model

In [14]:
train = pd.read_csv('train.csv', header = 0)
test = pd.read_csv('test.csv', header = 0)
validation = pd.read_csv('validation.csv', header = 0)

from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_pandas(train)
test_dataset = Dataset.from_pandas(test)
validation_dataset = Dataset.from_pandas(validation)

dataset = DatasetDict({'train': train_dataset, 'test': test_dataset, 'validation': validation_dataset})

In [15]:
dataset

DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 26036
    })
    test: Dataset({
        features: ['translation'],
        num_rows: 136
    })
    validation: Dataset({
        features: ['translation'],
        num_rows: 6509
    })
})

In [16]:
print(dataset['validation']['translation'][:5])

['短 長 枯 樹 枝 頭 淚 ###>nhánh cây khô giọt lệ ngắn dài', '縱 壑 之 而 諸 穴 窟 ###>cá tung tăng trong hốc lên xuống nơi các hang', '制 外 困 安 民 息 盜 令 人 千 载 佩 威 風 ###>chống giặc đến yên dân dẹp trộm uy phong nhiều ngàn thuở cùng người', '翰 墨 寧 期 有 美 姻 ###>được nên duyên đẹp bút nghiên mừng', '使 汝 不 知 鳧 鶩 肥 ###>mà để cho mày chẳng béo như mòng vịt']


# LoRA Configuration

In [17]:
peft_config = LoraConfig(
            lora_alpha=16,
            lora_dropout=0.05,
            r=16,
            bias="none",
            task_type="CAUSAL_LM",
            target_modules= ["down_proj","up_proj","gate_proj"]
)

In [18]:
# peft_config = LoraConfig(
#             lora_alpha=16,
#             lora_dropout=0.05,
#             r=64,
#             bias="none",
#             task_type="CAUSAL_LM",
#             target_modules= ["q_proj","up_proj","o_proj","k_proj","down_proj","gate_proj","v_proj"]
# )

# Training Hyperparameters

In [19]:
training_arguments = SFTConfig(
        output_dir="working/results/",
        evaluation_strategy="steps",
        optim="paged_adamw_8bit",
        save_steps=100,
        log_level="debug",
        logging_steps=100,
        learning_rate=1e-4,
        eval_steps=100,
        fp16=True,
        do_eval=True,
        per_device_train_batch_size=48,
        per_device_eval_batch_size=48,
        gradient_accumulation_steps=2,
        warmup_steps=50,
        max_steps=500,
        max_seq_length=48, # Increased max_seq_length
        dataset_text_field="translation",
        lr_scheduler_type="linear",
        report_to=None
)



# Training with TRL

In [20]:
!nvidia-smi

Sat Dec 28 11:46:08 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P0              27W /  70W |   4151MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [21]:
import wandb
wandb.init(project='test', entity='vinai_pbdang-org', mode="disabled")
# wandb.init(project='test', entity='vinai_pbdang-org', settings=wandb.Settings(init_timeout=600))

In [22]:
# import wandb
# wandb.init(project="test", settings=wandb.Settings(init_timeout=600))

trainer = SFTTrainer(
        model=model,
        train_dataset=dataset['train'],
        eval_dataset=dataset['validation'],
        peft_config=peft_config,
        tokenizer=tokenizer,
        args=training_arguments
)
trainer.train()

  trainer = SFTTrainer(


Map:   0%|          | 0/26036 [00:00<?, ? examples/s]

Map:   0%|          | 0/6509 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend
Currently training with a batch size of: 48
***** Running training *****
  Num examples = 26,036
  Num Epochs = 2
  Instantaneous batch size per device = 48
  Total train batch size (w. parallel, distributed & accumulation) = 96
  Gradient Accumulation steps = 2
  Total optimization steps = 500
  Number of trainable parameters = 7,864,320
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


Step,Training Loss,Validation Loss
100,4.7362,3.794999
200,3.65,3.54371
300,3.4589,3.438057



***** Running Evaluation *****
  Num examples = 6509
  Batch size = 48
Saving model checkpoint to working/results/checkpoint-100
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-1B/snapshots/4e20de362430cd3b72f300e6b0f18e50e7166e08/config.json
Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  

KeyboardInterrupt: 

# Inference: Translate with Llama 2

## Base Model SetUp

In [23]:
base_model = "meta-llama/Llama-3.2-1B"
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
        base_model, device_map={"": 0}, quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "left"
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-1B/snapshots/4e20de362430cd3b72f300e6b0f18e50e7166e08/config.json
Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-3.2-1B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat

Embedding(128257, 2048)

## Initialize Adapter (Fine-Tuned-Model)

In [24]:
# Fetched from Kaggle Output
model = PeftModel.from_pretrained(model, "working/results/checkpoint-300/")

In [None]:
# Uploaded to Hugging Face Model Hub
# model = PeftModel.from_pretrained(model, "musfiqdehan/Llama-2-7b-ft-mt-Bengali-to-English-sm")

# Testing Manually

In [39]:
my_text = "花好月圓"
# my_text = m
my_text = ' '.join(list(my_text))
print(my_text)
# Ánh sáng không thua nhật nguyệt, soi khắp trời Nam, như phượng múa lân chầu, càng tăng thêm gấm vóc;
# Quê hương lừng lẫy danh hiền, thuần phong Mỹ tục vững bền giang sơn.
# Trải ba mùa lại tới mùa xuân khí ấm tới thì hoa tự khai suy lường máy khí số phải nên cảnh giác

prompt = my_text+" ###>"

tokenized_input = tokenizer(prompt, return_tensors="pt")
input_ids = tokenized_input["input_ids"].cuda()

generation_output = model.generate(
        input_ids=input_ids,
        num_beams=6,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=130
)
for seq in generation_output.sequences:
    output = tokenizer.decode(seq, skip_special_tokens=True)
    print(output.split("###>")[1].strip())

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


花 好 月 圓
hoa đẹp trăng tròn rực rỡ như ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc ngọc
