<a href="https://www.kaggle.com/code/aisuko/ft-smollm-135m-instruct-on-hf-ultrafeedback?scriptVersionId=192965113" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
!pip install -U -q transformers==4.39.3
!pip install -U -q accelerate==0.28.0
!pip install -U -q datasets==2.18.0
# !pip install -U -q peft==0.10.0
!pip install -U -q bitsandbytes==0.43.1
!pip install -U -q trl==0.8.6

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.6.1 requires cubinlinker, which is not installed.
cudf 24.6.1 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.6.1 requires ptxcompiler, which is not installed.
cuml 24.6.1 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.6.1 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.6.1 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.5.0 which is incompatible.
distributed 2024.5.1 requires dask==2024.5.1, but you have dask 2024.7.0 which is incompatible.
gcsfs 2024.5.0 requires fsspec==2024.5.0, but you have fsspec 2024.2.0 which is incompatible.
rapids-dask-dependency 24.6.0a0 requires dask==2024.5.1, but you have dask 2024.7.0 which is incompatible.
s3fs 2024.5.0 requires fsspec==2024.5.0.*, but you have fsspec 2024.2.0 which is incom

In [2]:
import warnings

warnings.filterwarnings("ignore")

In [3]:
import os
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
login(token=user_secrets.get_secret("HUGGINGFACE_TOKEN"))

os.environ["WANDB_API_KEY"]=user_secrets.get_secret("WANDB_API_KEY")
os.environ["WANDB_PROJECT"] = "Fine-tuning HuggingFace SmolLM-135M-Instruct with ultrafeedback"
os.environ["WANDB_NAME"] = "ft-smollm-135M-instruct-on-hf-ultrafeedback"
os.environ["MODEL_NAME"] = "HuggingFaceTB/SmolLM-135M-Instruct"
os.environ["TOKENIZER_NAME"] = "HuggingFaceTB/SmolLM-135M-Instruct"
os.environ["DATASET"] = "HuggingFaceH4/ultrafeedback_binarized"

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [4]:
from datasets import load_dataset

ds=load_dataset(os.getenv("DATASET"), split=["train_prefs","test_prefs"])
ds

Generating train_prefs split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating train_sft split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating test_prefs split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Generating test_sft split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Generating train_gen split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating test_gen split:   0%|          | 0/1000 [00:00<?, ? examples/s]

[Dataset({
     features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
     num_rows: 61135
 }),
 Dataset({
     features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
     num_rows: 2000
 })]

In [5]:
train_ds=ds[0].shuffle(seed=42).select(range(35000))
eval_ds=ds[1].shuffle(seed=42).select(range(1000))

print(train_ds)
print(eval_ds)

Dataset({
    features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
    num_rows: 35000
})
Dataset({
    features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
    num_rows: 1000
})


In [6]:
from transformers import AutoTokenizer

tokenizer=AutoTokenizer.from_pretrained(os.getenv("TOKENIZER_NAME"))

# decoder-only model: reuse the EOS token for padding instead of adding a separate pad token
tokenizer.pad_token=tokenizer.eos_token

In [7]:
import torch, multiprocessing

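def preprocess(x):
    # Render the chosen/rejected message lists into chat-formatted strings,
    # which is the text format ORPOTrainer tokenizes later.
    x["chosen"]=tokenizer.apply_chat_template(x["chosen"], tokenize=False)
    x["rejected"]=tokenizer.apply_chat_template(x["rejected"], tokenize=False)
    return x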

train_ds=train_ds.map(preprocess, num_proc=multiprocessing.cpu_count(), load_from_cache_file=False)
eval_ds=eval_ds.map(preprocess, num_proc=multiprocessing.cpu_count(), load_from_cache_file=False)

Map (num_proc=4):   0%|          | 0/35000 [00:00<?, ? examples/s]

Map (num_proc=4):   0%|          | 0/1000 [00:00<?, ? examples/s]
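
To make the effect of `preprocess` concrete, here is a minimal sketch of what rendering a message list to text looks like. It assumes a ChatML-style template (`<|im_start|>role ... <|im_end|>`); `render_chatml` is a hypothetical stand-in written for illustration, not the tokenizer's own code, and the example pair is a toy record rather than a row from the dataset.

```python
# Hypothetical stand-in for tokenizer.apply_chat_template(..., tokenize=False),
# assuming a ChatML-style template. The real template ships with the tokenizer.
def render_chatml(messages, add_generation_prompt=False):
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so generation continues from here.
        text += "<|im_start|>assistant\n"
    return text

# Toy preference pair (made up for illustration).
example = {
    "chosen": [
        {"role": "user", "content": "Where is Berlin?"},
        {"role": "assistant", "content": "Berlin is in Germany."},
    ],
    "rejected": [
        {"role": "user", "content": "Where is Berlin?"},
        {"role": "assistant", "content": "I am not sure."},
    ],
}

# Same shape of transformation as preprocess(): message lists become strings.
rendered = {key: render_chatml(msgs) for key, msgs in example.items()}
print(rendered["chosen"])
```

Both strings share the same user turn and differ only in the assistant turn, which is the contrast a preference objective learns from.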

In [8]:
from transformers import AutoModelForCausalLM
model=AutoModelForCausalLM.from_pretrained(
    os.getenv("MODEL_NAME"),
    torch_dtype=torch.float16,
#     device_map={"": 0},
    device_map="cuda",
    trust_remote_code=True
)

model.gradient_checkpointing_enable()
model.device

device(type='cuda', index=0)

In [9]:
from trl import ORPOTrainer, ORPOConfig

orpo_config=ORPOConfig(
    output_dir=os.getenv("WANDB_NAME"),
    evaluation_strategy="steps",
    do_eval=True,
    optim="adamw_8bit",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=4,
    log_level="debug",
    logging_steps=100,
    learning_rate=3e-4,
    eval_steps=100,
    save_strategy="epoch", # checkpoints are written once per epoch, so save_steps is not needed
    num_train_epochs=1,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    beta=0.1, # beta is ORPO's lambda
    max_length=1024,
    report_to="tensorboard",
    run_name=os.getenv('WANDB_NAME')
)

trainer = ORPOTrainer(
        model=model,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        args=orpo_config,
        tokenizer=tokenizer,
)

trainer.train()


Map:   0%|          | 0/35000 [00:00<?, ? examples/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (3109 > 2048). Running this sequence through the model will result in indexing errors


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 35,000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 2
  Total optimization steps = 4,375
  Number of trainable parameters = 134,515,008
Could not estimate the number of tokens of the input, floating-point operations will not be computed
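
The step counts in the training banner follow directly from the configuration. The sketch below redoes that arithmetic. It assumes a single data-parallel worker (consistent with the reported total batch size of 8) and that warmup steps are the ceiling of `total_steps * warmup_ratio`; the variable names are illustrative.

```python
import math

# Values taken from the config and the training banner above.
num_examples = 35_000
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_train_epochs = 1
warmup_ratio = 0.1
eval_steps = 100

# One optimizer step consumes batch_size * accumulation examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
total_steps = (num_examples // effective_batch) * num_train_epochs

# Assumption: warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Evaluations triggered by the "steps" strategy during the run.
eval_runs = total_steps // eval_steps

print(effective_batch, total_steps, warmup_steps, eval_runs)  # 8 4375 438 43
```

With each evaluation pass taking roughly 126 seconds in this run, the periodic evaluations account for a noticeable share of the total runtime.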


Step,Training Loss,Validation Loss,Runtime,Samples Per Second,Steps Per Second,Rewards/chosen,Rewards/rejected,Rewards/accuracies,Rewards/margins,Logps/rejected,Logps/chosen,Logits/rejected,Logits/chosen,Nll Loss,Log Odds Ratio,Log Odds Chosen
100,2.2684,1.125755,126.0759,7.932,1.983,-0.130074,-0.130179,0.468,0.000105,-1.301792,-1.30074,17.883713,17.778275,1.05141,-0.743454,0.008248
200,1.1427,1.138304,126.189,7.925,1.981,-0.129507,-0.129539,0.474,3.2e-05,-1.295389,-1.295073,28.967312,28.610441,1.063348,-0.749561,0.011674
300,1.135,1.130501,126.2705,7.92,1.98,-0.12897,-0.128756,0.464,-0.000214,-1.287559,-1.289698,32.89053,32.529854,1.054724,-0.75777,0.011712
400,1.15,1.135388,126.3526,7.914,1.979,-0.130294,-0.12969,0.462,-0.000604,-1.296897,-1.302939,35.126678,34.745575,1.059154,-0.762345,0.007326
500,1.1138,1.134546,126.4571,7.908,1.977,-0.131099,-0.130889,0.455,-0.00021,-1.308889,-1.310988,36.930759,36.574543,1.058839,-0.75707,0.014787
600,1.1617,1.136389,126.3699,7.913,1.978,-0.131171,-0.130859,0.466,-0.000312,-1.308595,-1.311713,38.410069,38.066883,1.060192,-0.761974,0.020426
700,1.136,1.134092,126.4339,7.909,1.977,-0.131852,-0.131376,0.461,-0.000476,-1.31376,-1.318518,40.197121,39.832649,1.058078,-0.760139,0.014524
800,1.155,1.134862,126.467,7.907,1.977,-0.13188,-0.131369,0.462,-0.000511,-1.31369,-1.318805,41.281212,40.944908,1.058808,-0.760542,0.015279
900,1.185,1.153327,126.4864,7.906,1.976,-0.133869,-0.133051,0.457,-0.000818,-1.330508,-1.338689,42.593811,42.306679,1.076638,-0.766889,0.017125
1000,1.1612,1.124505,126.5859,7.9,1.975,-0.130972,-0.130105,0.455,-0.000867,-1.30105,-1.309716,43.618687,43.303822,1.048016,-0.764893,0.011102


***** Running Evaluation *****
  Num examples = 1000
  Batch size = 4

TrainOutput(global_step=4375, training_loss=1.1421776419503349, metrics={'train_runtime': 22450.0321, 'train_samples_per_second': 1.559, 'train_steps_per_second': 0.195, 'total_flos': 0.0, 'train_loss': 1.1421776419503349, 'epoch': 1.0})
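
The evaluation columns can be cross-checked against the ORPO objective. The sketch below assumes the formulation from the ORPO paper (loss = NLL minus `beta` times the log-sigmoid of the log odds ratio) and that the `Log Odds Ratio` column is the batch mean of that log-sigmoid term, which is how the reported numbers line up. `orpo_terms` is a hypothetical helper written for illustration, not TRL's implementation, and the toy pair uses made-up log-probabilities.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def orpo_terms(logp_chosen, logp_rejected, nll_loss, beta):
    """Per-pair ORPO terms from length-averaged log-probabilities (hypothetical helper)."""
    # log(odds(chosen) / odds(rejected)) with odds(p) = p / (1 - p)
    log_odds = (logp_chosen - logp_rejected) - (
        math.log1p(-math.exp(logp_chosen)) - math.log1p(-math.exp(logp_rejected))
    )
    ratio = log_sigmoid(log_odds)
    return log_odds, ratio, nll_loss - beta * ratio

# Toy pair (made-up numbers): the chosen answer is more likely than the rejected one.
log_odds, ratio, loss = orpo_terms(-1.0, -1.5, nll_loss=1.0, beta=0.1)

# Cross-check with the step-100 evaluation row reported above.
beta = 0.1
nll, mean_ratio = 1.05141, -0.743454            # Nll Loss, Log Odds Ratio
recomputed_loss = nll - beta * mean_ratio       # compare with Validation Loss
recomputed_reward_chosen = beta * -1.30074      # beta * Logps/chosen
print(round(recomputed_loss, 6), round(recomputed_reward_chosen, 6))
```

The recomputed values agree with the reported validation loss (1.125755) and `Rewards/chosen` (-0.130074) at step 100. Applying `orpo_terms` to the averaged log-probabilities will not reproduce the `Log Odds Ratio` column exactly, because that column averages the per-example term rather than evaluating it at the mean.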

In [10]:
kwargs={
    'model_name': os.getenv("WANDB_NAME"),
    'finetuned_from': os.getenv('MODEL_NAME'),
#     'tasks': 'Text-Generation',
#     'dataset_tags':'',
    'dataset': os.getenv("DATASET")
}

tokenizer.push_to_hub(os.getenv("WANDB_NAME"))
trainer.push_to_hub(**kwargs)

README.md:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

tokenizer config file saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/tokenizer_config.json
Special tokens file saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/special_tokens_map.json
Uploading the following files to aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback: tokenizer_config.json,README.md,special_tokens_map.json,vocab.json,tokenizer.json,merges.txt
Saving model checkpoint to ft-smollm-135M-instruct-on-hf-ultrafeedback
Configuration saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/config.json
Configuration saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/generation_config.json
Model weights saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/model.safetensors
tokenizer config file saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/tokenizer_config.json
Special tokens file saved in ft-smollm-135M-instruct-on-hf-ultrafeedback/special_tokens_map.json
Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Languag

CommitInfo(commit_url='https://huggingface.co/aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback/commit/167144aed08259cd07706475aa0a08f7b75ee4e9', commit_message='End of training', commit_description='', oid='167144aed08259cd07706475aa0a08f7b75ee4e9', pr_url=None, pr_revision=None, pr_num=None)

In [11]:
model=AutoModelForCausalLM.from_pretrained(
    os.getenv("WANDB_NAME"), 
    torch_dtype=torch.float16, 
    device_map="cuda", 
    trust_remote_code=True
)

chat=[
    [{"role":"user","content":"How is vanilla cultivated?"}],
    [{"role": "user", "content": "How much money do I have if I have one dollar?"}],
    [{"role": "user", "content": "Where is Berlin?"}],
    [{"role": "user", "content": "Give me a list of 5 European countries."}],
    [{"role": "user", "content": "What is AI?"}],
    [{"role": "user", "content": "What can you do right? Exactly?"}]
]


for c in chat:
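    # add_generation_prompt=True appends the assistant header so generation starts inside the assistant turn
    p=tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=True)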
    inputs = tokenizer(p, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, do_sample=True, pad_token_id=tokenizer.eos_token_id, top_p=0.9, max_new_tokens=150)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(result)

loading configuration file ft-smollm-135M-instruct-on-hf-ultrafeedback/config.json
Model config LlamaConfig {
  "_name_or_path": "ft-smollm-135M-instruct-on-hf-ultrafeedback",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 30,
  "num_key_value_heads": 3,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "float16",
  "transformers_version": "4.39.3",
  "use_cache": true,
  "vocab_size": 49152
}

loading weights file ft-smollm-135M-instruct-on-hf-ultrafeedback/model.safetensors
Instantiating LlamaForCausalLM model under default dtype torch.float

user
How is vanilla cultivated?
assistant
Monetization is a term used to describe the method of producing something by treating it like a tangible object or artifact. In the context of vanilla, it's a creative and innovative process that involves the use of a special flavor spice called vanilla extract, which has been brewed, mixed, and then filtered to produce a natural and creamy sensation.

The process of cultivating vanilla is known as the "rolling" or "rolling" of vanilla, as it involves rolling the spice, which is often known for its unique flavor, into a ball, which is then rolled into a tinyball. The process is similar to the rolling of tea, where a powdered tea leaves are combined with a lot of spices and other ingredients to create a
user
How much money do I have if I have one dollar?
assistant
If you have one dollar, you don't have any money, and there's no cash to generate income. All you can earn is a direct payment from your bank account or another type of credit card. It

# Helpful papers

* ORPO: Monolithic Preference Optimization without Reference Model: https://arxiv.org/abs/2403.07691