
support sft training on d2l #100

Closed · wants to merge 3 commits

Conversation

llauraa23 (Collaborator):

Out of memory on a single 24 GB GPU during training.

"from nltk.tokenize import word_tokenize\n",
"\n",
"# from openai import openai_object\n",
"openai.api_key = \"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\""
CambioML (Owner):

let's remove openai.api_key.
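For reference, a minimal sketch of the usual alternative, assuming the key is supplied through an environment variable rather than committed in the notebook:

import os

import openai

# Read the key from the environment (e.g. export OPENAI_API_KEY in the shell)
# instead of hardcoding it in a committed notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]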

CambioML (Owner):

could you please help describe why this is called auto-rater.ipynb? It looks like this file is used for generating a QA dataset.

llauraa23 (Collaborator, Author):

This script for auto evaluation is incomplete. It also contains redundant code from the earlier QA generation code. Since the latest version of pykoi/uniflow already has an auto-rater, shall we remove my related commits?

CambioML (Owner):

let's rename this file to supervised_finetuning_demo_d2l.py to indicate this is a demo file.

llauraa23 (Collaborator, Author):

okay

# run supervised finetuning
from peft import LoraConfig

config = RLHFConfig(
    base_model_path="mistralai/Mistral-7B-Instruct-v0.1",
    dataset_type="local_csv",
    dataset_name="data/chapter22_trnvalfromseed_data_processed.csv",
CambioML (Owner):

qq: I do not see this file in the data folder. Is it missing?

llauraa23 (Collaborator, Author):

I am not sure if I should add these data files.

    r=512,
    lora_alpha=1024,
    lora_dropout=0.05,
    # commented-out candidates: "gate_proj", "up_proj", "down_proj", "lm_head"
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
CambioML (Owner):

qq: what is the target_modules parameter used for?

llauraa23 (Collaborator, Author):

It specifies the modules (e.g., which components of the attention layers) that get updated during PEFT training.
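For illustration, a minimal sketch of how target_modules is consumed by peft; the values mirror the config in this diff, while the model-loading line is an assumption added for completeness:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model assumed for this sketch; the PR passes it via RLHFConfig above.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

lora_config = LoraConfig(
    r=512,
    lora_alpha=1024,
    lora_dropout=0.05,
    # Only these attention projections receive trainable LoRA adapters;
    # every other weight in the base model stays frozen.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small fraction of trainable weights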


batch["labels"] = labels

return batch
CambioML (Owner):

nit: add a new line at the end of the file. If you have not done so, please set up your dev environment following https://www.notion.so/goldpiggy/Python-Linter-and-formatter-Setup-30fb3b81f0904af889832e4c697c5ec9?pvs=4

llauraa23 (Collaborator, Author):

Thanks! I resolved it and ran pylint on other files as well.

self.model.resize_token_embeddings(len(self.tokenizer))

# dh: try the customized data collator that only predicts the answer part
data_collator = DataCollatorForCompletionOnlyLM(
CambioML (Owner):

qq: shall we make this configurable to avoid breaking the existing way of running the code?

llauraa23 (Collaborator, Author):

done

packing=True,
data_collator=data_collator,
CambioML (Owner):

qq: could you please help explain in the PR description why we added this data_collator when we did not need it before?

CambioML (Owner):

Got it: this is for training on the instruction-following objective by masking out the query, rather than on the causal language model objective over every next token.

llauraa23 (Collaborator, Author):

okay

@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
         train_dataset = ConstantLengthDataset(
             tokenizer,
             dataset["train"],
-            formatting_func=self.prepare_sample_text,
+            formatting_func=self.prepare_d2l_text,
CambioML (Owner):

qq: same as my comments above, we should make this configurable to maintain the old functionality.

llauraa23 (Collaborator, Author):

done

@@ -0,0 +1,45 @@
"""Demo for the supervised fine tuning.

python -m example.rlhf.supervised_finetuning_demo
CambioML (Owner):

nit: it should be python -m example.rlhf.supervised_finetuning_demo_d2l.py.

llauraa23 (Collaborator, Author):

corrected
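For context, a rough sketch of what the renamed demo might look like, stitched together from the fragments quoted in this PR; the pykoi import path, the SupervisedFinetuning and train_and_save names, and the output directory are assumptions and may not match the final file:

"""Demo for the supervised fine tuning.

python -m example.rlhf.supervised_finetuning_demo_d2l
"""
# (python -m takes the module path, so the .py suffix is dropped when running it)

from pykoi.rlhf import RLHFConfig, SupervisedFinetuning  # import path assumed

# run supervised finetuning
config = RLHFConfig(
    base_model_path="mistralai/Mistral-7B-Instruct-v0.1",
    dataset_type="local_csv",
    dataset_name="data/chapter22_trnvalfromseed_data_processed.csv",
    # LoRA settings (r=512, lora_alpha=1024, ...) omitted here; see the diff above.
)
rlhf_step1_sft = SupervisedFinetuning(config)
rlhf_step1_sft.train_and_save("./models/rlhf_step1_sft")  # method and path assumed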

from typing import Any, Dict, List, Tuple, Union

import numpy as np
from transformers import DataCollatorForLanguageModeling


# Custom collator that masks out the prompt so the loss is computed on the
# completion (answer) tokens only.
class DataCollatorForCompletionOnlyLM(DataCollatorForLanguageModeling):
CambioML (Owner):

qq: in this example, https://huggingface.co/docs/trl/sft_trainer#advanced-usage, it looks like it directly does from trl import SFTTrainer, DataCollatorForCompletionOnlyLM.

It looks like DataCollatorForCompletionOnlyLM does what we want, masking out the question from the training objective, and trl already has an implementation. I am curious why we wrote our own version of DataCollatorForCompletionOnlyLM here.
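For comparison, a minimal sketch of the trl-provided collator along the lines of the linked docs; the response_template string and the tiny in-memory dataset are placeholders for illustration:

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM, SFTTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # model used in this PR's config
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder dataset; the PR actually trains on a local CSV of chapter Q/A pairs.
dataset = Dataset.from_dict(
    {"text": ["### Question: What does SFT stand for?\n### Answer: Supervised fine-tuning."]}
)

# Every token before the response template gets label -100, so the loss is
# computed on the answer tokens only.
response_template = "### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    data_collator=collator,
    packing=False,  # trl's completion-only collator requires packing to be disabled
)
trainer.train()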

llauraa23 (Collaborator, Author):

It is Rachel and Yunfan's customized implementation.


Comment on lines +189 to +191
INTRO_BLURB = (
"Below is an instruction that describes a task. Write a response that appropriately completes the request."
)
CambioML (Owner):

qq: is INTRO_BLURB needed for SFT?

Unless users always put this as part of their system prompt, I am wondering what happens if a user forgets to include it in their system prompt. It might hurt inference performance.
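To make the concern concrete, a small sketch (the section headers below are an assumed Alpaca-style layout, not necessarily the exact template in this PR): whatever template wraps INTRO_BLURB during fine-tuning should also be used to build prompts at inference time, otherwise there is a train/inference mismatch.

INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Build the same prompt layout for both training examples and inference queries."""
    return f"{INTRO_BLURB}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


# Training text and inference prompt share one template, so the fine-tuned model
# never sees a prompt layout it was not trained on.
train_text = build_prompt("Summarize chapter 22.") + "Chapter 22 covers ..."
inference_prompt = build_prompt("Summarize chapter 22.")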

llauraa23 (Collaborator, Author):

This is Yunfei's prompt, and I kept it in order to reproduce his result in pykoi. I agree with your concern about the case you mentioned.

@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
train_dataset = ConstantLengthDataset(
CambioML (Owner):

One caveat here is that ConstantLengthDataset always packs your dataset into chunks of seq_length, breaking one coherent c + q + a into multiple data points if len(c + q + a) > seq_length.

Meanwhile, the DataCollatorForCompletionOnlyLM implementation for SFTTrainer is meant to mask the query and train the objective on the response only.

However, I am a bit confused: with this setup your dataset is not prepared to train on the response only (with the query masked out), but still follows the causal language model objective of predicting every next token.
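A pure-Python sketch of the caveat, with made-up token IDs: packing slices the concatenated stream into fixed-length blocks, so a per-sample "mask the query" rule no longer lines up with block boundaries.

# Two hypothetical tokenized samples; in each, the first three tokens are the
# query and the last three are the answer.
samples = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
]

seq_length = 4
stream = [tok for sample in samples for tok in sample]
blocks = [stream[i:i + seq_length] for i in range(0, len(stream), seq_length)]
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# The middle block mixes the answer of sample 1 with the query of sample 2,
# so a per-sample "set query labels to -100" mask cannot be applied cleanly
# once the samples have been packed.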

llauraa23 (Collaborator, Author):

Does the collator mask out the query?
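For what "mask out the query" means at the label level, a tiny sketch using the standard transformers convention that label -100 is ignored by the loss:

import torch

# Toy sequence: positions 0-3 are query tokens, positions 4-6 are answer tokens.
input_ids = torch.tensor([[11, 12, 13, 14, 21, 22, 23]])
labels = input_ids.clone()
labels[0, :4] = -100  # -100 is the ignore_index used by the cross-entropy loss

# A completion-only collator should produce labels like this, so only the
# answer tokens contribute to the training loss.
print(labels)  # tensor([[-100, -100, -100, -100,   21,   22,   23]])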

CambioML (Owner):

@llauraa23 Take a look at huggingface/trl#1083 (comment) and https://huggingface.co/docs/trl/sft_trainer#advanced-usage

llauraa23 (Collaborator, Author):

Thanks! I will have a look over the weekend. I have to work on the DPO and the post writing.

CambioML (Owner):

Closing this PR because it is also included in #101. The future work is to split #101 into two PRs, one for SFT and one for DPO.

CambioML closed this on Jan 26, 2024.