# Overview

For background information, see the [GitHub discussion](https://github.com/orgs/SkywardAI/discussions/14).


# Architecture of OpenELM

> See the architecture from [OpenELM-270M's config.json](https://huggingface.co/apple/OpenELM-270M/blob/main/config.json)

* RoPE to encode positional information
* Grouped-query attention (GQA) for more efficient inference
* FlashAttention
* RMSNorm
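
To make two of the components listed above concrete, here is a minimal pure-Python sketch of RMSNorm and of how grouped-query attention shares key/value heads between query heads. It is an illustration only, not OpenELM's implementation: the `eps` value and the head counts (12 query heads, 3 key/value heads) are assumptions chosen for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a learned per-feature weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Return the key/value head index that a query head uses under GQA."""
    group_size = num_query_heads // num_kv_heads  # query heads per shared KV head
    return query_head // group_size


# Hypothetical sizes for illustration only.
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
mapping = [kv_head_for_query_head(h, 12, 3) for h in range(12)]
print(normed)
print(mapping)
```

Because several query heads read the same key/value head, the KV cache shrinks by the group size, which is where the inference saving comes from.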


# OpenELM's Training

> Besides DPO, there are many other RLHF techniques, such as ORPO. This notebook uses ORPO (via TRL's `ORPOTrainer`).
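
The training cell further down uses TRL's `ORPOTrainer` with `beta=0.1`, which its comment describes as ORPO's lambda. As a rough sketch of the idea, ORPO adds an odds-ratio penalty to the usual negative log-likelihood of the chosen answer. The function names below are hypothetical, and the inputs are assumed to be average per-token log-probabilities (so they must be negative); this is a simplified scalar version, not TRL's code.

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) computed from an average log-probability (avg_logp < 0)."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen, chosen_logp, rejected_logp, beta=0.1):
    """NLL on the chosen answer plus a beta-weighted odds-ratio penalty."""
    diff = log_odds(chosen_logp) - log_odds(rejected_logp)
    log_sigmoid = -math.log1p(math.exp(-diff))  # log(sigmoid(diff))
    return nll_chosen - beta * log_sigmoid


# The penalty is small when the chosen answer is already more likely,
# and larger when the rejected answer is more likely.
preferred = orpo_loss(2.0, chosen_logp=-0.5, rejected_logp=-1.5)
swapped = orpo_loss(2.0, chosen_logp=-1.5, rejected_logp=-0.5)
print(preferred, swapped)
```

Unlike DPO, this formulation needs no separate reference model, which helps when GPU memory is tight.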

Apple ran 350k training steps with a batch size of 4M tokens, for a total of 1.4T pre-training tokens. For reference, Llama 2 was trained on 2T tokens, Gemma on 6T tokens, and Llama 3 on 15T tokens.
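
The token budget follows directly from the step count and batch size. A quick check, using only the figures quoted above:

```python
steps = 350_000
tokens_per_step = 4_000_000  # batch size in tokens
total_tokens = steps * tokens_per_step

# Token counts quoted above for other models.
reference_tokens = {"Llama 2": 2e12, "Gemma": 6e12, "Llama 3": 15e12}

print(f"OpenELM: {total_tokens / 1e12:.1f}T tokens")
for name, tokens in reference_tokens.items():
    print(f"  {total_tokens / tokens:.0%} of the {name} budget")
```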

Apple also trained them on a cleaned version of UltraFeedback using DPO (set to 0.1) and the statistical rejection sampling method with these hyperparameters.

In [1]:
!pip install -U -q transformers==4.39.3
!pip install -U -q accelerate==0.28.0
!pip install -U -q datasets==2.18.0
# !pip install -U -q peft==0.10.0
!pip install -U -q bitsandbytes==0.43.1
!pip install -U -q trl==0.8.6


In [2]:
import warnings

warnings.filterwarnings("ignore")

In [3]:
import os
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
login(token=user_secrets.get_secret("HUGGINGFACE_TOKEN"))

os.environ["WANDB_API_KEY"]=user_secrets.get_secret("WANDB_API_KEY")
os.environ["WANDB_PROJECT"] = "Fine-tuning openELM-270m with ultrafeedback"
os.environ["WANDB_NAME"] = "ft-openelm-270m-ultrafeedback"
os.environ["MODEL_NAME"] = "apple/OpenELM-270M"
os.environ["TOKENIZER_NAME"] = "meta-llama/Llama-2-7b-hf"
os.environ["DATASET"] = "HuggingFaceH4/ultrafeedback_binarized"

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [4]:
from transformers import AutoTokenizer

tokenizer=AutoTokenizer.from_pretrained(
    os.getenv("TOKENIZER_NAME"), 
    add_eos_token=True, 
    use_fast=True)

tokenizer.pad_token=tokenizer.eos_token
tokenizer.padding_side="left"


# Loading dataset

In [5]:
from datasets import load_dataset

ds=load_dataset(os.getenv("DATASET"), split=["train_prefs","test_prefs"])
ds


Generating train_prefs split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating train_sft split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating test_prefs split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Generating test_sft split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Generating train_gen split:   0%|          | 0/61135 [00:00<?, ? examples/s]

Generating test_gen split:   0%|          | 0/1000 [00:00<?, ? examples/s]

[Dataset({
     features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
     num_rows: 61135
 }),
 Dataset({
     features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
     num_rows: 2000
 })]

# Subsampling to fit a low-memory GPU

In [6]:
train_ds=ds[0].shuffle(seed=42).select(range(300))
eval_ds=ds[1].shuffle(seed=42).select(range(100))

print(train_ds)
print(eval_ds)

Dataset({
    features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
    num_rows: 300
})
Dataset({
    features: ['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'],
    num_rows: 100
})


In [7]:
import torch, multiprocessing

def preprocess(x):
    x["chosen"]=tokenizer.apply_chat_template(x["chosen"], tokenize=False)
    x["rejected"]=tokenizer.apply_chat_template(x["rejected"], tokenize=False)
    return x

train_ds=train_ds.map(preprocess, num_proc=multiprocessing.cpu_count(), load_from_cache_file=False)
eval_ds=eval_ds.map(preprocess, num_proc=multiprocessing.cpu_count(), load_from_cache_file=False)

Map (num_proc=4):   0%|          | 0/300 [00:00<?, ? examples/s]


No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
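
The warning above means the Llama 2 tokenizer falls back to a default `[INST] ... [/INST]` format. The sketch below approximates what that formatting does to a `chosen` or `rejected` conversation. `format_llama2_chat` is a hypothetical helper written for illustration; it ignores system messages and is not the Jinja template shipped with `transformers`.

```python
def format_llama2_chat(messages, bos_token="<s>", eos_token="</s>"):
    """Approximate the default Llama 2 chat format for user/assistant turns."""
    parts = []
    for message in messages:
        content = message["content"].strip()
        if message["role"] == "user":
            parts.append(f"{bos_token}[INST] {content} [/INST]")
        elif message["role"] == "assistant":
            parts.append(f" {content} {eos_token}")
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return "".join(parts)


example = [
    {"role": "user", "content": "Where is Berlin?"},
    {"role": "assistant", "content": "Berlin is in Germany."},
]
print(format_llama2_chat(example))
```

OpenELM was not pre-trained with this format, so the model only learns it from the preference data used for fine-tuning.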



In [8]:
from transformers import AutoModelForCausalLM
model=AutoModelForCausalLM.from_pretrained(
    os.getenv("MODEL_NAME"),
    torch_dtype=torch.float16,
#     device_map={"": 0},
    device_map="cuda",
    trust_remote_code=True
)

model.gradient_checkpointing_enable()
model.device


A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M:
- configuration_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M:
- modeling_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



device(type='cuda', index=0)

In [9]:
from trl import ORPOTrainer, ORPOConfig

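orpo_config=ORPOConfig(
    output_dir=os.getenv("WANDB_NAME"),
    evaluation_strategy="steps",
    do_eval=True,
    optim="adamw_8bit",  # 8-bit AdamW (bitsandbytes) to reduce optimizer memory
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 8 * 2 = 16
    per_device_eval_batch_size=8,
    log_level="debug",
    logging_steps=100,  # larger than the 19 total steps of this run, so no loss rows are logged
    learning_rate=8e-6,
    eval_steps=100,  # likewise larger than 19, so no evaluation runs during training
    save_steps=100,  # has no effect while save_strategy="epoch"
    save_strategy="epoch",
    num_train_epochs=1,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    beta=0.1, # beta is ORPO's lambda
    max_length=1024,
    report_to="wandb",
    run_name=os.getenv('WANDB_NAME')
)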

trainer = ORPOTrainer(
        model=model,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        args=orpo_config,
        tokenizer=tokenizer,
)

trainer.train()




You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
Currently training with a batch size of: 8
***** Running training *****
  Num examples = 300
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 2
  Total optimization steps = 19
  Number of trainable parameters = 271,527,168
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[34m[1mwandb[0m: Currently logged in as: [33murakiny[0m ([33mcausal_language_trainer[0m). Use [1m`wandb login --relogin`[0m to force relogin


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss,Validation Loss


Saving model checkpoint to ft-openelm-270m-ultrafeedback/checkpoint-19
Configuration saved in ft-openelm-270m-ultrafeedback/checkpoint-19/config.json
Configuration saved in ft-openelm-270m-ultrafeedback/checkpoint-19/generation_config.json
Model weights saved in ft-openelm-270m-ultrafeedback/checkpoint-19/model.safetensors
tokenizer config file saved in ft-openelm-270m-ultrafeedback/checkpoint-19/tokenizer_config.json
Special tokens file saved in ft-openelm-270m-ultrafeedback/checkpoint-19/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=19, training_loss=2.1115851151315788, metrics={'train_runtime': 255.1737, 'train_samples_per_second': 1.176, 'train_steps_per_second': 0.074, 'total_flos': 0.0, 'train_loss': 2.1115851151315788, 'epoch': 1.0})
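
The 19 optimization steps reported in the log can be reproduced from the dataset size and batch settings. The sketch below assumes a single data-parallel process, which matches the total batch size of 16 shown in the log. It also shows why the "Step, Training Loss, Validation Loss" table is empty: `logging_steps` and `eval_steps` are both 100, which is more than the whole run.

```python
import math

num_examples = 300
per_device_batch_size = 8
gradient_accumulation_steps = 2
num_epochs = 1
logging_steps = 100

# Batches per epoch; the last batch is smaller than 8.
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
# One optimizer update per accumulated group of batches.
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = updates_per_epoch * num_epochs
logged_rows = total_steps // logging_steps

print(batches_per_epoch, total_steps, logged_rows)
```

Setting `logging_steps` and `eval_steps` below the total step count would be needed to see intermediate losses for a run this short.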

In [10]:
kwargs={
    'model_name': os.getenv("WANDB_NAME"),
    'finetuned_from': os.getenv('MODEL_NAME'),
#     'tasks': 'Text-Generation',
#     'dataset_tags':'',
    'dataset': os.getenv("DATASET")
}

tokenizer.push_to_hub(os.getenv("WANDB_NAME"))
trainer.push_to_hub(**kwargs)

tokenizer config file saved in ft-openelm-270m-ultrafeedback/tokenizer_config.json
Special tokens file saved in ft-openelm-270m-ultrafeedback/special_tokens_map.json
Uploading the following files to aisuko/ft-openelm-270m-ultrafeedback: README.md,tokenizer.model,tokenizer.json,special_tokens_map.json,tokenizer_config.json



Saving model checkpoint to ft-openelm-270m-ultrafeedback
Configuration saved in ft-openelm-270m-ultrafeedback/config.json
Configuration saved in ft-openelm-270m-ultrafeedback/generation_config.json
Model weights saved in ft-openelm-270m-ultrafeedback/model.safetensors
tokenizer config file saved in ft-openelm-270m-ultrafeedback/tokenizer_config.json
Special tokens file saved in ft-openelm-270m-ultrafeedback/special_tokens_map.json



CommitInfo(commit_url='https://huggingface.co/aisuko/ft-openelm-270m-ultrafeedback/commit/209431fc805b493e67d0159a751a464f09c49b65', commit_message='End of training', commit_description='', oid='209431fc805b493e67d0159a751a464f09c49b65', pr_url=None, pr_revision=None, pr_num=None)

# Inference

In [12]:
model=AutoModelForCausalLM.from_pretrained(
    os.getenv("WANDB_NAME"), 
    torch_dtype=torch.float16, 
    device_map="cuda", 
    trust_remote_code=True
)

chat=[
    [{"role":"user","content":"How is vanilla cultivated?"}],
    [{"role": "user", "content": "How much money do I have if I have one dollar?"}],
    [{"role": "user", "content": "Where is Berlin?"}],
    [{"role": "user", "content": "Give me a list of 5 European countries."}],
    [{"role": "user", "content": "What is AI?"}],
    [{"role": "user", "content": "What can you do right? Exactly?"}]
]


for c in chat:
    p=tokenizer.apply_chat_template(c, tokenize=False)
    inputs = tokenizer(p, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, do_sample=True, pad_token_id=tokenizer.eos_token_id, top_p=0.9, max_new_tokens=150)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(result)

loading configuration file ft-openelm-270m-ultrafeedback/config.json
Model config OpenELMConfig {
  "_name_or_path": "ft-openelm-270m-ultrafeedback",
  "activation_fn_name": "swish",
  "architectures": [
    "OpenELMForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "apple/OpenELM-270M--configuration_openelm.OpenELMConfig",
    "AutoModelForCausalLM": "apple/OpenELM-270M--modeling_openelm.OpenELMForCausalLM"
  },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "ffn_dim_divisor": 256,
  "ffn_multipliers": [
    0.5,
    0.73,
    0.97,
    1.2,
    1.43,
    1.67,
    1.9,
    2.13,
    2.37,
    2.6,
    2.83,
    3.07,
    3.3,
    3.53,
    3.77,
    4.0
  ],
  "ffn_with_glu": true,
  "head_dim": 64,
  "initializer_range": 0.02,
  "max_context_length": 2048,
  "model_dim": 1280,
  "model_type": "openelm",
  "normalization_layer_name": "rms_norm",
  "normalize_qk_projections": true,
  "num_gqa_groups": 4,
  "num_kv_head

[INST] How is vanilla cultivated? [/INST].
[INST] Why doesn't it have an "Evacuate the Hole" function? It seems like it's the most used item. What is it used for? [/INST] We're going to be focusing on the basics so far, and then as a guide we'll go through a little bit more about what we're doing. And, I'm going to make it clear that this isn't about an official guide, I'm just here to provide you guys with a few general tips and tricks. And, if you want to learn a little more, you can read my latest post: How to Get A Procedural Vanilla Map
So, the


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


[INST] How much money do I have if I have one dollar? [/INST] [INST] How much money do I have if I have one dollar? [/INST]

I have been saving and earning since I was 20. I have no debt, so that's not a concern. I have no savings and I am 23. I'm living paycheck to paycheck, so my savings is pretty minimal and I have had no money coming in so far. 

I've had a few days now where I have been feeling like I don't have money and it's affecting my life in a negative way. I don't know if it is my job or my bad mood or my own self doubt, but my de




[INST] Where is Berlin? [/INST] [/INST]
I was very lucky to be invited to the second International Conference on "Five Reasons to Improve Your Life as a Young Professional in Berlin". This event is about how to use one's time to make a difference in the world.
I came from a very good background. I've been an active professional since I was 20 years old and had a good job for 12 years. I had the chance to work as a researcher for some startups and had a great impact on the world.
For many years I had the good fortune to be involved with the UN Millennium Development Goals (UNDGs) which I worked as a project leader on an




[INST] Give me a list of 5 European countries. [/INST]'

The following is a list of 5 countries, as per the UNESCO website:


1) Ireland

2) Iceland

3) Portugal

4) Luxembourg

5) Cyprus

6) Malta

7) Greece

8) Latvia

9) The Netherlands

10) Malta

11) Portugal

12) Denmark

13) Spain

14) Belgium

15) The Netherlands

16) Luxembourg

17) Spain

18) Turkey

19) Luxembourg

20) Bulgaria






[INST] What is AI? [/INST] the internet as a whole is a lot more complex. As far as the US is concerned, a great deal of the time, it's been really easy for everyone to be aware of things. And the main question here is whether or not it's something we want, or whether it's something that we should be doing. But the problem is, what do people expect to find and get out of it? That's a lot of information. When it's really hard to find anything at all. And, the first part of the question that we've all been asking is whether the government's role is just as important as other governments'. When there's so much information out there, what's going
[INST] What can you do right? Exactly? [/INST]

First thing that comes to mind: Don't eat junk food. Even if you're a vegetarian, I've noticed this to be true on this sub as well (and you don't even have to starve yourself to lose weight). As for my opinion, my mom has a different opinion. She doesn't even say "eat junk food" but has a different o
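
The right-padding warning in the output above appears even though `padding_side="left"` is set. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the tokenizer was created with `add_eos_token=True` and `pad_token` set to the EOS token, so every prompt ends with a token that looks like padding. The sketch below is a simplified stand-in for that kind of check; `looks_right_padded` and the prompt token ids are hypothetical, while the EOS id 2 comes from the printed config.

```python
def looks_right_padded(input_ids, pad_token_id):
    """Flag a batch whose sequences end with the pad token."""
    if pad_token_id is None:
        return False
    return any(row[-1] == pad_token_id for row in input_ids)


EOS_ID = 2              # "eos_token_id": 2 in the printed config
PAD_ID = EOS_ID         # tokenizer.pad_token = tokenizer.eos_token
prompt = [1, 100, 200]  # hypothetical ids: BOS followed by prompt tokens

without_eos = looks_right_padded([prompt], PAD_ID)
with_eos = looks_right_padded([prompt + [EOS_ID]], PAD_ID)
print(without_eos, with_eos)
```

If this is the cause, tokenizing inference prompts without the trailing EOS token would be worth trying, since a prompt that ends in EOS also tells the model the text is already finished.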

Next, we will compare OpenELM-270M with 1B-parameter models to check whether larger models give better results.

# Credit

* https://medium.com/@bnjmn_marie/fine-tune-tiny-chat-models-with-apple-openelm-and-orpo-f7be4fc137cd
* https://huggingface.co/apple/OpenELM