How to Fine-tune the Model? #37

Open
berkecanrizai opened this issue Apr 20, 2023 · 17 comments
Labels: question (further information is requested)

Comments

@berkecanrizai

Hi, I want to fine-tune the 7B model. Am I supposed to download the provided checkpoint and fine-tune it as shown in this repo: https://github.com/EleutherAI/gpt-neox#using-custom-data? Would they be compatible, and has anyone here given it a shot? Thanks.

@berkecanrizai berkecanrizai changed the title How To Fine-tune the Model? How to Fine-tune the Model? Apr 20, 2023
@Ph0rk0z

Ph0rk0z commented Apr 21, 2023

It looks like it needs much more training before any fine-tuning you do will be worth it.

@devonkinghorn

What about using the databricks-dolly-15k dataset? https://github.com/databrickslabs/dolly/tree/master/data
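For anyone exploring that route, here is a rough sketch (not from this thread) of loading databricks-dolly-15k with the Hugging Face datasets library and flattening each record into a single training string; the prompt template is purely illustrative:

```python
from datasets import load_dataset

# databricks-dolly-15k records have: instruction, context, response, category
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    # Illustrative prompt format; adapt it to whatever template you fine-tune with.
    context = f"\n{example['context']}" if example["context"] else ""
    return {"text": f"### Instruction:\n{example['instruction']}{context}\n\n### Response:\n{example['response']}"}

dolly_text = dolly.map(to_text, remove_columns=dolly.column_names)
```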

@Ph0rk0z

Ph0rk0z commented Apr 21, 2023

From: https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=0

| Model | RAM | lambada (ppl) | lambada (acc) | hellaswag (acc_norm) | winogrande (acc) | piqa (acc) | coqa (f1) | average |
|---|---|---|---|---|---|---|---|---|
| stablelm-base-alpha-7b | 32 | 17.6493 | 41.06% | 41.22% | 50.12% | 66.76% | | 49.79% |

Look at the perplexity scores of this model right now: it is worse than a 500M model. Wait until they finish it. You can thumbs-down me 100x, but it won't fix it.

@samuelazran

@Ph0rk0z any idea what the plan is for the release date of further checkpoints? I think training it on more than 1 trillion tokens could give it an advantage compared to other pre-trained models.

@Ph0rk0z

Ph0rk0z commented Apr 22, 2023

I wish I knew... I don't work for them. Hopefully they finish training and get rid of the disclaimers. Then this will be a great model for long contexts, the first I've found besides RWKV.

@snirbenyosef

I'm also interested in fine-tuning the model on my book. Is it possible?
Does anyone have an idea how to start?
I tried with transformers, but the result was not that good. I just gave it some raw text, so I guess I will need to process that text first.
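One common way to prepare a long text such as a book is to tokenize the whole thing and pack it into fixed-length blocks instead of feeding raw lines. A rough sketch, assuming the Hugging Face datasets/transformers libraries and a hypothetical plain-text file book.txt:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-base-alpha-7b")
block_size = 1024  # tokens per training example

# "book.txt" is a placeholder path for your own text file.
raw = load_dataset("text", data_files={"train": "book.txt"})

def tokenize(examples):
    return tokenizer(examples["text"])

def group_texts(examples):
    # Concatenate all token ids, then split into equal-sized blocks;
    # for causal-LM training the labels are just a copy of the inputs.
    concatenated = sum(examples["input_ids"], [])
    total_len = (len(concatenated) // block_size) * block_size
    input_ids = [concatenated[i:i + block_size] for i in range(0, total_len, block_size)]
    return {"input_ids": input_ids, "labels": [list(ids) for ids in input_ids]}

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)
```

Each resulting block can then be passed to a standard Trainer as a causal-LM example.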

@juanps90

I'm interested in fine-tuning as well. Does anyone have any recommendations for this?

@Ph0rk0z

Ph0rk0z commented Apr 22, 2023

https://github.com/oobabooga/text-generation-webui/blob/main/docs/Using-LoRAs.md

https://github.com/johnsmith0031/alpaca_lora_4bit

If you want to try to make a LoRA.
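For reference, a rough sketch (not from this thread) of what a LoRA fine-tune of the alpha checkpoints could look like with the Hugging Face peft library, as an alternative to the 4-bit repo above; the hyperparameters are illustrative, not tuned:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "stabilityai/stablelm-base-alpha-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# The StableLM alpha checkpoints use the GPT-NeoX architecture, whose fused
# attention projection is named "query_key_value".
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```

The resulting model can be trained with a normal transformers Trainer; the links above cover the 4-bit route if VRAM is tight.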

@mcmonkey4eva
Member

You can fine-tune now with the links Ph0rk0z posted above, but... yeah, wait for the next release. The alphas are just that: initial alphas, not meant for real usage, just meant for open public development.

@aamir-gmail

> Hi, I want to fine-tune the 7B model. Am I supposed to download the provided checkpoint and fine-tune it as shown in this repo: https://github.com/EleutherAI/gpt-neox#using-custom-data? Would they be compatible, and has anyone here given it a shot? Thanks.

I have a training script for the 7B and 3B; where can I send it?

@snirbenyosef

> Hi, I want to fine-tune the 7B model. Am I supposed to download the provided checkpoint and fine-tune it as shown in this repo: https://github.com/EleutherAI/gpt-neox#using-custom-data? Would they be compatible, and has anyone here given it a shot? Thanks.
>
> I have a training script for the 7B and 3B; where can I send it?

@aamir-gmail can you send it to me at snirben@gmail.com, please?

@vonbarnekowa

@aamir-gmail, it would be cool if you could share it here.

@Shaila96

Shaila96 commented May 2, 2023

@aamir-gmail could you please send it to me as well? shaila.zaman96@gmail.com

@aamir-gmail

Here you go, the full training script:

```python
# Developed by Aamir Mirza
#
# Setup:
#   1. Create a conda virtual environment with Python 3.9.
#   2. Install PyTorch 1.13.1 (not 2.0):
#        conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
#   3. Install the latest transformers:
#        conda install -c conda-forge transformers
#   4. Install DeepSpeed from GitHub (not pip), built with CPU Adam optimizer support:
#        git clone https://github.com/microsoft/DeepSpeed
#        DS_BUILD_CPU_ADAM=1 pip install .
#   5. Install accelerate via pip, plus: pip install Ninja
#   6. conda install -c conda-forge mpi4py
#
# Train via the command line, for example (in my case 2x RTX 3090 24 GB):
#   deepspeed train_gptNX_v2.py --num_gpus=2

from transformers import (
    GPTNeoXForCausalLM,
    GPTNeoXTokenizerFast,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
import os

os.environ['OMPI_MCA_opal_cuda_support'] = 'true'
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

# If you have a single GPU, change this to "1".
os.environ["WORLD_SIZE"] = "2"

# Change this to your requirement, for example 4096 (max).
MAX_LEN = 1024

stage2_config = """{
    "bf16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto"
}"""


class CustomTrainer(Trainer):
    def compute_loss(self, model_a, inputs_a, return_outputs=False):
        # Standard causal-LM loss: the labels are the input_ids themselves.
        outputs = model_a(**inputs_a, labels=inputs_a["input_ids"])
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss


tokenizer = GPTNeoXTokenizerFast.from_pretrained("stabilityai/stablelm-base-alpha-3b")


def process_data(examples):
    texts = examples["text"]
    # Remove empty lines
    texts = [text for text in texts if len(text) > 0 and not text.isspace()]
    # Remove lines that are too long
    texts = [text for text in texts if len(text) < 512]
    # Remove lines that are too short
    texts = [text for text in texts if len(text) > 16]
    # Add a newline character
    texts = [text + ' ' + '\n' for text in texts]
    examples["text"] = texts
    return examples


# Process the dataset's "text" column: use the tokenizer to get input_ids and attention_mask.
def process_data_add_mask(examples):
    text = examples['text']
    tokenizer.pad_token = tokenizer.eos_token
    # Tokenize text
    encoded_dict = tokenizer(
        text,
        padding=True,
        truncation=True,
        max_length=MAX_LEN
    )
    # Add input_ids and attention_mask to the example
    examples['input_ids'] = encoded_dict['input_ids']
    examples['attention_mask'] = encoded_dict['attention_mask']
    return examples


imdb_dataset = load_dataset('imdb')
imdb_dataset_train = imdb_dataset['train']
imdb_dataset_train = imdb_dataset_train.shuffle()
imdb_dataset_train = imdb_dataset_train.map(process_data, batched=True, remove_columns=['label'])
imdb_dataset_val = imdb_dataset['test']
imdb_dataset_val = imdb_dataset_val.shuffle()
imdb_dataset_val = imdb_dataset_val.map(process_data, batched=True, remove_columns=['label'])
train_dataset = imdb_dataset_train.map(process_data_add_mask, remove_columns=["text"], batched=True)
val_dataset = imdb_dataset_val.map(process_data_add_mask, remove_columns=["text"], batched=True)

model = GPTNeoXForCausalLM.from_pretrained("stabilityai/stablelm-base-alpha-3b")

# An absolute path is required for the DeepSpeed config;
# you can use the JSON above to create your own config file.
z_optimiser = '/two-tb/train_GPTNX/zeromq_config/stablelm-base-alpha-3b_config.json'
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="pt")
training_args_v2 = TrainingArguments(
    output_dir="./trained_model",
    learning_rate=2e-5,
    save_total_limit=2,
    fp16=True,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=12,
    evaluation_strategy="epoch",
    deepspeed=z_optimiser,
    num_train_epochs=1
)

# Set up the trainer
trainer = CustomTrainer(
    model=model,
    args=training_args_v2,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()

trainer.save_model()
```
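One note on the script above: it defines the stage2_config JSON string but passes a file path to TrainingArguments(deepspeed=...), so that config file has to exist on disk before launching. A minimal sketch that writes it out once, assuming the same path used in the script:

```python
import json
import os

# Path taken from the script above; adjust it to your own filesystem.
config_path = '/two-tb/train_GPTNX/zeromq_config/stablelm-base-alpha-3b_config.json'
os.makedirs(os.path.dirname(config_path), exist_ok=True)
with open(config_path, 'w') as f:
    # Round-trip through json to validate the config before writing it.
    json.dump(json.loads(stage2_config), f, indent=4)
```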

@aamir-gmail

aamir-gmail commented May 3, 2023 via email

@Appleyc

Appleyc commented May 6, 2023 via email

@aamir-gmail

aamir-gmail commented May 22, 2023 via email
