# Full Stack Practice of LLM Training - LLM Post-Training @ RLChina 2024

- Author: [Cheng Deng](https://www.cdeng.net/) [✉️](mailto:davendw49@gmail.com), [Jun Wang](http://www0.cs.ucl.ac.uk/staff/jun.wang/)

---
## Main Task

In this part, we walk through post-training a large language model (LLM): adapting it to a specific task or domain, and aligning it with human preferences using basic RLHF methods, including PPO and DPO. Whether you are fine-tuning a model to generate more relevant outputs for a customer-support chatbot or specializing it for scientific text summarization, post-training lets you customize the pretrained model to your needs.

In this section, we set up the environment and fine-tune the model we pre-trained earlier. The roadmap has moved forward, so keep going!

![](https://www.cdeng.net/resources/imgs/RLChina24/c.png)

Here is the prerequisite knowledge required:

- `Transformers` model
- `DataLoader` method
- Basic machine learning and OS knowledge


## Supervised Fine-tuning with Instruction Tuning Data



In [None]:
!pip install datasets

In [2]:
import json
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForSeq2Seq

In [4]:
# Load the Alpaca Dataset
# Assuming you've downloaded the Alpaca dataset as a JSON file named 'alpaca_data.json'
with open("alpaca_data.json", "r") as f:
    alpaca_data = json.load(f)

# Convert it to a Hugging Face Dataset
dataset = Dataset.from_list(alpaca_data)
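
The loading step assumes every record carries `instruction`, `input`, and `output` fields, since those are the keys the formatting step reads later. Below is a minimal sketch of checking that assumption before building the dataset. The sample records and the `validate_records` helper are made up for illustration and are not part of the Alpaca release.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def validate_records(records):
    """Return only the records that have every required key as a string."""
    valid = []
    for record in records:
        if all(isinstance(record.get(key), str) for key in REQUIRED_KEYS):
            valid.append(record)
    return valid

# Made-up sample in the Alpaca layout; "input" may be an empty string.
sample_json = """
[
  {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
  {"instruction": "Name a primary color.", "input": "", "output": "Red"},
  {"instruction": "Record without an output field", "input": ""}
]
"""

records = json.loads(sample_json)
clean_records = validate_records(records)
print(f"kept {len(clean_records)} of {len(records)} records")
```

Filtering up front keeps a malformed record from raising a `KeyError` in the middle of `dataset.map`.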

In [None]:
# Load the Model and Tokenizer
model_name = "gpt2"  # Replace with the path to your own pre-trained checkpoint or another causal LM on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add padding token if not available (e.g., for GPT-2)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

In [None]:
# Tokenize the Dataset
MAX_LENGTH = 512

def format_instruction(example):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n"
    return {"prompt": prompt, "output": example["output"]}

def tokenize_function(example):
    formatted = format_instruction(example)
    # Tokenize without padding; the data collator pads each batch dynamically.
    prompt_ids = tokenizer(
        formatted["prompt"], truncation=True, max_length=MAX_LENGTH
    )["input_ids"]
    # Append EOS so the model learns where a response ends.
    output_ids = tokenizer(
        formatted["output"], truncation=True, max_length=MAX_LENGTH, add_special_tokens=False
    )["input_ids"] + [tokenizer.eos_token_id]

    # Concatenate prompt and response, then truncate to the maximum length.
    input_ids = (prompt_ids + output_ids)[:MAX_LENGTH]
    # Mask prompt tokens with -100 so the loss covers only the response.
    labels = ([-100] * len(prompt_ids) + output_ids)[:MAX_LENGTH]

    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }

# Drop the raw text columns so only model inputs reach the collator.
tokenized_dataset = dataset.map(
    tokenize_function, batched=False, remove_columns=dataset.column_names
)

# Prepare Data Collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    pad_to_multiple_of=8,
)
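
The two ideas in the cell above, masking prompt tokens with `-100` in the labels and letting the collator pad each batch, can be sketched in plain Python. This is only an illustration under stated assumptions: the token IDs are made up, and `build_example` and `pad_batch` are hypothetical helpers that mimic the behavior rather than the library's actual implementation.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def build_example(prompt_ids, response_ids, eos_id, max_length):
    """Concatenate prompt and response, masking the prompt in the labels."""
    response_ids = response_ids + [eos_id]
    input_ids = (prompt_ids + response_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_length]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }

def pad_batch(examples, pad_id, multiple_of=8):
    """Right-pad each example to the batch maximum, rounded up to a multiple."""
    longest = max(len(ex["input_ids"]) for ex in examples)
    target = -(-longest // multiple_of) * multiple_of  # ceiling to the multiple
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        gap = target - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * gap)
        batch["attention_mask"].append(ex["attention_mask"] + [0] * gap)
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * gap)
    return batch

# Toy token IDs (hypothetical, not produced by a real tokenizer).
first = build_example([11, 12, 13], [21, 22], eos_id=0, max_length=16)
second = build_example([11, 12], [21], eos_id=0, max_length=16)
batch = pad_batch([first, second], pad_id=0)
print(batch["labels"][0])
```

Padding positions get `-100` in the labels and `0` in the attention mask, so they contribute neither to the loss nor to attention.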

In [8]:
# Training Arguments
training_args = TrainingArguments(
    output_dir="./alpaca-finetuned-model",
    evaluation_strategy="steps",  # Named eval_strategy in newer transformers releases
    learning_rate=5e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    save_strategy="steps",
    logging_dir="./logs",
    logging_steps=10,
    load_best_model_at_end=True,
    save_total_limit=2,
    fp16=True,  # Use FP16 if your GPU supports it
)
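
To get a feel for what these arguments imply, the number of optimizer steps can be estimated from the dataset size and batch size. The sketch below is a rough approximation, assuming no gradient accumulation and a final partial batch that is kept. The dataset size of 52,000 is a hypothetical figure, and `training_steps` is an illustrative helper, not a `transformers` function.

```python
import math

def training_steps(num_examples, per_device_batch_size, num_epochs, num_devices=1):
    """Approximate optimizer steps: one step per batch, partial last batch kept."""
    effective_batch = per_device_batch_size * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs

def num_log_events(total_steps, logging_steps):
    """How many times the training loss is logged over the run."""
    return total_steps // logging_steps

# Hypothetical dataset size; batch size, epochs, and logging interval match the arguments above.
total = training_steps(num_examples=52_000, per_device_batch_size=8, num_epochs=1)
print(total, num_log_events(total, logging_steps=10))
```

In the `transformers` versions we are aware of, leaving `eval_steps` unset with a step-based evaluation strategy makes it fall back to `logging_steps`, so it may be worth setting `eval_steps` explicitly if evaluating every 10 steps is too frequent.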

In [11]:
# Hold out 100 examples for evaluation so they do not overlap with the training data
split_dataset = tokenized_dataset.train_test_split(test_size=100, seed=42)

In [None]:
# Set Up Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split_dataset["train"],
    eval_dataset=split_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Train the Model
trainer.train()

# Save the Model
trainer.save_model("./alpaca-finetuned-model")
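
Once the model is saved, it can be queried with the same prompt template that was used during training. The following is a minimal sketch, assuming the checkpoint and tokenizer were written to `./alpaca-finetuned-model` as above. `build_prompt` and `generate_response` are illustrative helper names, and the generation settings are placeholder choices.

```python
def build_prompt(instruction, input_text=""):
    """Recreate the training-time prompt template for a new instruction."""
    return f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"

def generate_response(instruction, input_text="",
                      model_dir="./alpaca-finetuned-model", max_new_tokens=128):
    """Load the fine-tuned checkpoint and generate a response for one prompt."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(build_prompt(instruction, input_text), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

prompt = build_prompt("Summarize the text.", "LoRA adds low-rank adapters.")
print(prompt)
```

Calling `generate_response("Summarize the text.", "...")` after training should return only the model's continuation. Keeping the template identical to the training format matters, because the model has only seen responses that follow `### Response:`.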

## Train with LoRA

- TODO

In [None]:
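import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Placeholder values (assumptions): point base_model at your own LLaMA checkpoint
base_model = "path/to/llama-checkpoint"
device_map = "auto"

# Load the base model in 8-bit (requires the bitsandbytes package and a CUDA GPU)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

tokenizer = LlamaTokenizer.from_pretrained(base_model)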

In [None]:
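from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Illustrative hyperparameters (assumed values; tune them for your model and data)
lora_r = 8
lora_alpha = 16
lora_dropout = 0.05
lora_target_modules = ["q_proj", "v_proj"]

# Newer peft releases name this helper prepare_model_for_kbit_training
# (formerly prepare_model_for_int8_training)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    target_modules=lora_target_modules,
    lora_dropout=lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # Report how few parameters LoRA trains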
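
To see why LoRA is cheap, it helps to count the parameters an adapter adds. For a frozen weight of shape `d_out x d_in`, LoRA trains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and scales their product by `lora_alpha / r`. The sketch below works through the arithmetic. The hidden size of 4096 and the values `r=8` and `lora_alpha=16` are assumptions chosen for illustration.

```python
def lora_param_count(d_out, d_in, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def trainable_fraction(d_out, d_in, rank):
    """Adapter parameters relative to the frozen weight matrix it wraps."""
    return lora_param_count(d_out, d_in, rank) / (d_out * d_in)

def lora_scaling(lora_alpha, rank):
    """Factor applied to the low-rank update B @ A before adding it to W."""
    return lora_alpha / rank

# Assumed sizes for illustration: a square 4096 x 4096 projection with rank 8.
hidden, rank, alpha = 4096, 8, 16
added = lora_param_count(hidden, hidden, rank)
fraction = trainable_fraction(hidden, hidden, rank)
print(added, f"{fraction:.4%}", lora_scaling(alpha, rank))
```

Under these assumptions each adapted projection trains well under one percent of the weights it modifies, which is why the optimizer state stays small even when the base model is large.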