<a href="https://colab.research.google.com/github/peremartra/Apress_LLProjects_Book/blob/main/P2-MHF/7_2_Aligning_DPO_phi3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div>
    <h1>Large Language Models Projects</h1>
    <h3>Apply and Implement Strategies for Large Language Models</h3>
    <h2>7.1-Creating and Publishing Your Own LLM</h2>
    <h3>Aligning a Phi-3 model with DPO.</h3>
    <p>by <b>Pere Martra</b></p>
</div>



The revolution we're currently experiencing around Large Language Models began with the emergence of ChatGPT and its GPT-3.5 model.

Something different had been done with GPT-3.5, which was actually a derivative of GPT-3, a model that did not generate nearly as much excitement as its successor.

Many people, including myself, believe that the main difference was the use of alignment with RLHF (Reinforcement Learning from Human Feedback).

Nowadays RLHF has been displaced by a technique that achieves the same result in a much more efficient way: DPO - Direct Preference Optimization.

Both DPO and RLHF are alignment techniques that require a dataset containing correct and incorrect responses to the same prompt.

But from here, the differences begin. RLHF uses this dataset to train a second model, called a reward model, which will be used in the alignment process. DPO, on the other hand, uses the dataset directly to train the final model. This is the main difference between the two techniques.

As you can imagine, DPO is a more direct technique that requires fewer resources. When we're talking about models with tens of billions of parameters, any reduction in resource consumption can result in significant cost savings.

The implementation of DPO that we are going to use is the one developed by Hugging Face in their TRL library, which stands for Transformer Reinforcement Learning. DPO can be considered a reinforcement learning technique, where the model is rewarded during its training phase based on its responses.
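To make the idea concrete, here is a minimal sketch of the DPO objective for a single preference pair, written in plain Python rather than with TRL. It assumes you already have the summed log-probabilities of the chosen and rejected responses, both under the model being trained and under a frozen reference copy of it. The function name `dpo_loss` and all the numbers are illustrative and are not part of the notebook.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, from sequence log-probabilities."""
    # How much more (or less) likely each response became relative to the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # The implicit rewards are these shifts scaled by beta.
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities, only to show the behaviour.
untrained = dpo_loss(-200.0, -450.0, -200.0, -450.0)  # policy identical to reference
improved = dpo_loss(-198.0, -470.0, -200.0, -450.0)   # chosen answer favoured more
print(round(untrained, 4), round(improved, 4))
```

When the trained model is identical to the reference, the loss equals log 2 (about 0.693). It decreases as the model raises the probability of the chosen answer relative to the rejected one, with no separate reward model involved.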

_______________________

Since it is necessary to save the created model, the notebook mounts a disk in Google Drive. If you run it locally, you don't need to execute this line of code. You can also run it on Google Colab without mounting a disk in your Drive, but then you'll lose the saved model every time you close the session, as it will be saved in the temporary directory of the Google Colab session.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
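If you want the same notebook to run both on Colab and locally, one option is to mount Drive only when the `google.colab` package is available. This is a minimal sketch under that assumption; the helper names `running_in_colab` and `checkpoint_dir` are hypothetical and are not used elsewhere in the notebook.

```python
import os

def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab)
        return True
    except ImportError:
        return False

def checkpoint_dir(in_colab: bool, name: str = "apress_checkpoint") -> str:
    """Pick a persistent folder on Drive in Colab, or a local folder otherwise."""
    if in_colab:
        return f"/content/drive/MyDrive/{name}"
    return os.path.join(os.getcwd(), name)

if running_in_colab():
    from google.colab import drive
    drive.mount("/content/drive")

print(checkpoint_dir(running_in_colab()))
```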


Now that the disk is mounted, it's time to load the necessary libraries:

The only one that might be new to you is the trl library, which stands for Transformer Reinforcement Learning. You'll be importing the DPOTrainer class from this library, which you'll use to perform the DPO fine-tuning of the model.

In [None]:
!pip install -q datasets==2.19.1
!pip install -q trl==0.8.6
!pip install -q peft==0.11.1
!pip install -q transformers==4.41.0
!pip install -q bitsandbytes==0.43.1
!pip install -q sentencepiece==0.1.99
!pip install -q accelerate==0.30.1
!pip install -q huggingface_hub==0.23.1


In [None]:
#Import necessary classes.
import gc
import torch

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from trl import DPOTrainer
import bitsandbytes as bnb

from getpass import getpass


Another necessary step is to log in to Hugging Face.

In [None]:
hf_token = getpass("Hugging Face: ")

Hugging Face: ··········


In [None]:
!huggingface-cli login --token $hf_token

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Format dataset

The model I've chosen is Microsoft's Phi-3 mini with the 4k context window. It's a 3.8B-parameter model that is very competitive and in many cases outperforms 7B-parameter models.
I've chosen a small model so that it can be trained with few resources, on Google Colab or on a modest GPU.


In [None]:
model_name = "microsoft/Phi-3-mini-4k-instruct"
new_model = "phi-3-mini-dpo-apress"

In [None]:
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Before you begin training the model, it is necessary to load the dataset and transform it to fit the format required by the DPOTrainer class, which consists of three fields: the prompt, the chosen answer, and the rejected answer.

I'm loading just a few rows of the dataset; feel free to use the full dataset if you have enough time.

Training on the full dataset for 20 epochs takes roughly 2 hours on an A100 GPU. With the configuration in this notebook, you need only about 40 minutes on an L4 GPU.

In [None]:
# Load dataset
dataset_original =  load_dataset("argilla/distilabel-capybara-dpo-7k-binarized",
                                 split='train[300:2500]')
dataset_eval = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized",
                            split='train[:300]')

# Save columns
original_columns = dataset_original.column_names
print(original_columns)

Downloading readme:   0%|          | 0.00/11.7k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/156M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7563 [00:00<?, ? examples/s]

['source', 'conversation', 'original_response', 'generation_prompt', 'raw_generation_responses', 'new_generations', 'prompt', 'chosen', 'rejected', 'rating_chosen', 'rating_rejected', 'chosen_model', 'rejected_model']


In [None]:
dataset_original

Dataset({
    features: ['source', 'conversation', 'original_response', 'generation_prompt', 'raw_generation_responses', 'new_generations', 'prompt', 'chosen', 'rejected', 'rating_chosen', 'rating_rejected', 'chosen_model', 'rejected_model'],
    num_rows: 2200
})

The dataset has many more columns than are actually necessary for the DPO process. However, I'm going to take advantage of a couple of them to filter the data to be used.

In [None]:
dataset_filtered = dataset_original.filter(
  lambda r: r["rating_chosen"]>=4.5 and r["rating_rejected"] <= 2.5
)

Filter:   0%|          | 0/2200 [00:00<?, ? examples/s]

This first filter only retrieves those rows where the rating of the chosen response is very high and the rating of the discarded responses is very low. This is a way to facilitate the model's learning, although it's also possible that it doesn't help in the last epochs of training.
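The predicate can be tried on a few rows without downloading anything. The sketch below applies the same thresholds to a plain Python list; the rows are hypothetical and contain only the two rating columns the filter looks at.

```python
def keep_row(row, min_chosen=4.5, max_rejected=2.5):
    """Same predicate as the notebook's first filter."""
    return row["rating_chosen"] >= min_chosen and row["rating_rejected"] <= max_rejected

# Hypothetical rows; only the two rating columns matter here.
rows = [
    {"rating_chosen": 5.0, "rating_rejected": 2.0},  # clear preference: kept
    {"rating_chosen": 4.5, "rating_rejected": 4.0},  # both answers good: dropped
    {"rating_chosen": 3.5, "rating_rejected": 1.0},  # chosen answer mediocre: dropped
    {"rating_chosen": 4.5, "rating_rejected": 2.5},  # exactly on both thresholds: kept
]
kept = [r for r in rows if keep_row(r)]
print(len(kept), "of", len(rows), "rows kept")
```

In the notebook, this step takes the 2,200 loaded rows down to 400, as the progress bars of the next cell show.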

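I'm going to apply a second filter to keep the prompt length under control, as the selected model only accepts a context of about 4,000 tokens (the 4k in its name). The filter also keeps only single-turn conversations, that is, rows whose chosen conversation has fewer than three messages.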

In [None]:
dataset_filtered = dataset_filtered.map(lambda r: {"messages": len(r["chosen"])}).filter(lambda r: r["messages"]<3 and len(r["prompt"]) + len(r["chosen"]) + len(r["rejected"]) < 3800)


Map:   0%|          | 0/400 [00:00<?, ? examples/s]

Filter:   0%|          | 0/400 [00:00<?, ? examples/s]

In [None]:
dataset_filtered

Dataset({
    features: ['source', 'conversation', 'original_response', 'generation_prompt', 'raw_generation_responses', 'new_generations', 'prompt', 'chosen', 'rejected', 'rating_chosen', 'rating_rejected', 'chosen_model', 'rejected_model', 'messages'],
    num_rows: 169
})

The dataset still has all the columns, but the number of rows has been significantly reduced. Let me warn you that 169 rows are too few to perform proper training; again, this reduction is there so that the notebook can be executed in just a few minutes and still produce results.
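One detail about the length filter is worth keeping in mind: `len` on a string counts characters, and `len` on a list counts its elements. In this dataset `chosen` and `rejected` are lists of messages (the formatting function later indexes them with `[-1]["content"]`), so the 3800 limit acts as a rough proxy rather than a token count. Below is a minimal sketch of a stricter, character-based check, assuming each message is a dict with a `content` string. The helper names and the sample row are hypothetical.

```python
def conversation_chars(messages):
    """Total number of characters across all messages of a conversation."""
    return sum(len(m["content"]) for m in messages)

def fits_budget(row, max_chars=3800, max_messages=2):
    """Keep single-turn rows whose longest conversation stays under max_chars."""
    if len(row["chosen"]) > max_messages:
        return False
    longest = max(conversation_chars(row["chosen"]),
                  conversation_chars(row["rejected"]))
    return longest < max_chars

# Hypothetical row with a short chosen answer and a very long rejected one.
row = {
    "chosen": [{"role": "user", "content": "What is 2 + 2?"},
               {"role": "assistant", "content": "4"}],
    "rejected": [{"role": "user", "content": "What is 2 + 2?"},
                 {"role": "assistant", "content": "5" * 5000}],
}
print(len(row["chosen"]))                 # number of messages, not characters
print(conversation_chars(row["chosen"]))  # characters in the chosen conversation
print(fits_budget(row))                   # the long rejected answer exceeds the budget
```

For an exact limit you would count tokens with the tokenizer instead; a character budget is usually the more conservative of the two, since a token typically spans more than one character.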

In [None]:
#Repeat the same filters with the Validation Dataset.
dataset_eval_filtered = dataset_eval.filter(
  lambda r: r["rating_chosen"]>=4.5 and r["rating_rejected"] <= 2.5
)
dataset_eval_filtered = dataset_eval_filtered.map(lambda r: {"messages": len(r["chosen"])}).filter(lambda r: r["messages"]<3 and len(r["prompt"]) + len(r["chosen"]) + len(r["rejected"]) < 3800)
dataset_eval_filtered

Filter:   0%|          | 0/300 [00:00<?, ? examples/s]

Map:   0%|          | 0/51 [00:00<?, ? examples/s]

Filter:   0%|          | 0/51 [00:00<?, ? examples/s]

Dataset({
    features: ['source', 'conversation', 'original_response', 'generation_prompt', 'raw_generation_responses', 'new_generations', 'prompt', 'chosen', 'rejected', 'rating_chosen', 'rating_rejected', 'chosen_model', 'rejected_model', 'messages'],
    num_rows: 28
})

Now, it's a matter of creating a function to adapt the dataset's structure to what's required by the DPOTrainer class.

I have to confess that I've cheated a little bit. The function comes from the Hugging Face dataset card. I only had to remove an error that they had missed.

In summary, the function takes a row and keeps only the three necessary columns. It also applies light formatting to the responses, which I've adapted to the format the model requires by appending the `<|end|>` token to each response.


In [None]:
def chatml_format(example):
    # get everything except the last message as input
    prompt = tokenizer.apply_chat_template(example["chosen"][:-1], tokenize=False,
                                           add_generation_prompt=True)
    # get the last assistant responses
    chosen = example["chosen"][-1]["content"] + "<|end|>\n"
    rejected = example["rejected"][-1]["content"] + "<|end|>\n"

    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

In [None]:
original_columns

['source',
 'conversation',
 'original_response',
 'generation_prompt',
 'raw_generation_responses',
 'new_generations',
 'prompt',
 'chosen',
 'rejected',
 'rating_chosen',
 'rating_rejected',
 'chosen_model',
 'rejected_model']

I'll use the dataset's map function to execute the transformation on each row, and also remove the original columns.

In [None]:
# Format dataset
dataset = dataset_filtered.map(
    chatml_format,
    remove_columns=dataset_filtered.column_names
)
# Print sample
dataset[12]

Map:   0%|          | 0/169 [00:00<?, ? examples/s]

{'prompt': '<s><|user|>\nSolve 36146684-304553543134. Only respond with math and no words.<|end|>\n<|assistant|>\n',
 'chosen': '36146684 - 304553543134 = -304517396450<|end|>\n',
 'rejected': '(36146684 / 3134) * (36146684 mod 3134) + (30455354 / 17) * (30455354 mod 17) = 11415845286790903\nAlternatively, using prime factorization and the Chinese Remainder Theorem:\n36146684 = 2^5 * 9573, 30455354 = 2 * 29 * 4171\n36146684 mod 9573 = 4332, 30455354 mod 29 = 13, 30455354 mod 4171 = 3965\n(36146684 / 9573) * 4332 + (30455354 / 29) * 13 + (30455354 / 4171) * 3965 = 11415845286790903<|end|>\n'}

In [None]:
# Format dataset
dataset_eval = dataset_eval_filtered.map(
    chatml_format,
    remove_columns=original_columns
)

Map:   0%|          | 0/28 [00:00<?, ? examples/s]

In [None]:
# Print sample
dataset_eval[2]

{'prompt': '<s><|user|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<|end|>\n<|assistant|>\n',
 'chosen': 'The sum of 9319357631 and 595 is 9319358226.<|end|>\n',
 'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<|end|>\n',
 'messages': 2}

The format follows the Phi-3 chat template:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct



<|user|>
I am going to Paris, what should I see?<|end|>

<|assistant|>
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."<|end|>

<|user|>
What is so great about #1?<|end|>
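In the notebook, the tokenizer's `apply_chat_template` produces this layout. The sketch below is only a hand-written approximation that reproduces the single-turn sample printed earlier, so you can see how the prompt, chosen and rejected fields are assembled. The function names `phi3_prompt` and `to_dpo_row` are hypothetical, the `<s>` prefix is copied from that printed sample, and the rejected answer is a placeholder.

```python
def phi3_prompt(messages, add_generation_prompt=True, bos="<s>"):
    """Render messages in the Phi-3 chat layout shown above."""
    text = bos
    for message in messages:
        text += f"<|{message['role']}|>\n{message['content']}<|end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the answer.
        text += "<|assistant|>\n"
    return text

def to_dpo_row(chosen, rejected):
    """Build the prompt/chosen/rejected triple from two conversations."""
    return {
        "prompt": phi3_prompt(chosen[:-1]),
        "chosen": chosen[-1]["content"] + "<|end|>\n",
        "rejected": rejected[-1]["content"] + "<|end|>\n",
    }

question = "Solve 36146684-304553543134. Only respond with math and no words."
chosen = [{"role": "user", "content": question},
          {"role": "assistant", "content": "36146684 - 304553543134 = -304517396450"}]
rejected = [{"role": "user", "content": question},
            {"role": "assistant", "content": "A long, incorrect derivation."}]  # placeholder
row = to_dpo_row(chosen, rejected)
print(repr(row["prompt"]))
```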



In [None]:
dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 169
})

In [None]:
dataset_eval

Dataset({
    features: ['prompt', 'chosen', 'rejected', 'messages'],
    num_rows: 28
})

## Train model with DPO

## Finetuning with DPOTrainer.

Time to start working with the necessary configurations to perform alignment using DPO.



In [None]:
# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    #target_modules=['o_proj', 'qkv_proj'] #phi-3
    target_modules="all-linear"
)

The value of **r** sets the rank of the matrices that LoRA adds; the higher the value, the more parameters are trained. Values around 8 are often cited as the upper limit recommended for small models; this notebook uses 16 to give the adapter somewhat more capacity.

To further accentuate the weight of the new training, I use the **lora_alpha** value. It's a scaling factor for the update learned by the LoRA layers: the update is multiplied by lora_alpha / r before being added to the frozen weights. In the case of DPO, I've seen values as high as 128.

The recommendation is that **lora_alpha** should be double the value of **r**. Since **r** varies depending on the model size, you may end up with a very high lora_alpha value if you want to fine-tune a large model and, for example, specify an **r** of 64.
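A quick calculation shows what these two values mean in practice. The sketch below counts the parameters that a LoRA adapter adds to a single linear layer and computes the scaling factor lora_alpha / r. The layer width of 3072 is an assumption chosen for illustration, and the helper names are hypothetical.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer."""
    # Matrix A has shape (r, d_in) and matrix B has shape (d_out, r).
    return r * d_in + d_out * r

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update before it is added to the layer output."""
    return lora_alpha / r

d = 3072  # assumed layer width, for illustration only
full = d * d
for r in (8, 16, 64):
    added = lora_params(d, d, r)
    print(f"r={r:>2}: {added:>7} trainable params "
          f"({added / full:.2%} of the layer), scaling={lora_scaling(2 * r, r)}")
```

Note that when lora_alpha is always set to twice r, the scaling factor stays at 2 regardless of the rank; only the number of trainable parameters changes.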

In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

The quantization configuration holds no secrets: we are reducing the precision of the model's weights to 4 bits.
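To get a feel for what 4-bit loading saves, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It uses the 3.8B parameter count quoted earlier and deliberately ignores quantization constants, any layers kept in higher precision, activations and optimizer state, so treat the figures as rough lower bounds. The helper name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 3.8e9  # Phi-3 mini parameter count quoted earlier
for label, bits in (("float32", 32), ("bfloat16", 16), ("4-bit NF4", 4)):
    print(f"{label:>9}: {weight_memory_gb(params, bits):.1f} GB")
```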

In [None]:
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config
)
model.config.use_cache = False

`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The next step is to create the training parameters.

In [None]:
# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    remove_unused_columns=True,
    learning_rate=5.0e-06,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    lr_scheduler_type="cosine",
    num_train_epochs=6,
    save_strategy="epoch",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=2,
    bf16=True,
    report_to="none",
)




I'll try to explain the value of the most important and specific ones.

**lr_scheduler_type**="cosine": The learning rate is adjusted according to a cosine schedule. It starts at the value specified in learning_rate and then gradually decreases.

**warmup_steps**=2: For the first two optimizer steps, the learning rate increases from zero up to learning_rate instead of decreasing. The aim is to stabilize the start of the learning process.

**gradient_accumulation_steps**=2: To save memory, the gradients are accumulated over two batches before the model weights are updated, which gives an effective batch size of 4.

With these parameters, I've tried to find a training setup with low memory requirements, thanks to the use of gradient accumulation, gradient checkpointing, a small batch size, and the use of bf16 along with the paged_adamw_32bit optimizer.
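The interplay between batch size, gradient accumulation, warmup and the cosine schedule can be sketched in a few lines of plain Python. This is an approximation of the usual linear-warmup plus cosine-decay rule, assuming a single GPU; the helper names are hypothetical. Note that warmup is counted in optimizer steps, not epochs.

```python
import math

def optimizer_steps(num_rows, batch_size, grad_accum, epochs):
    """Number of weight updates for a run on a single device."""
    batches_per_epoch = math.ceil(num_rows / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs

def cosine_lr(step, total_steps, base_lr, warmup_steps):
    """Linear warmup from 0 to base_lr, then cosine decay towards 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = optimizer_steps(num_rows=169, batch_size=2, grad_accum=2, epochs=6)
print("effective batch size:", 2 * 2)
print("optimizer steps:", total)
for step in (0, 1, 2, total // 2, total):
    print(step, f"{cosine_lr(step, total, 5.0e-06, warmup_steps=2):.2e}")
```

With the 169 filtered rows, this arithmetic gives 252 optimizer steps, which matches the `global_step=252` reported in the training output further down.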

In [None]:
# Create DPO trainer
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    eval_dataset=dataset_eval,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=2048,
    max_length=2048,
)



Map:   0%|          | 0/169 [00:00<?, ? examples/s]

Map:   0%|          | 0/28 [00:00<?, ? examples/s]

The parameters for DPOTrainer are quite simple. You need to pass it the configurations you've created, the training and evaluation datasets, and the maximum lengths of the prompt and of the whole sequence (prompt plus response).

The beta value of 0.1 is a common default that balances the new training against the model's base knowledge: it controls how strongly the model is kept close to the original (reference) model. If you want the new training to have more weight, perhaps because you're training for a very specific task, you could specify a lower beta value.
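One way to see why a lower beta lets the model move further is to ask how large the log-probability margin between chosen and rejected answers (measured relative to the reference model) must be to reach a given loss. For a single pair the DPO loss is -log(sigmoid(beta * margin)), so the margin can be solved for directly. The function name and the target loss of 0.2 are illustrative assumptions.

```python
import math

def margin_needed(target_loss, beta):
    """Log-ratio margin at which -log(sigmoid(beta * margin)) equals target_loss."""
    return -math.log(math.expm1(target_loss)) / beta

for beta in (0.05, 0.1, 0.5):
    print(f"beta={beta}: margin of {margin_needed(0.2, beta):.1f} nats for a loss of 0.2")
```

Halving beta doubles the margin required for the same loss, so the optimizer has to push the model further away from the reference before the loss flattens out.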

In [None]:
# Fine-tune model with DPO
trainer.train()

Could not estimate the number of tokens of the input, floating-point operations will not be computed


Epoch,Training Loss,Validation Loss,Rewards/chosen,Rewards/rejected,Rewards/accuracies,Rewards/margins,Logps/rejected,Logps/chosen,Logits/rejected,Logits/chosen
0,0.556,0.467086,-0.04665,-0.827436,0.785714,0.780786,-466.585724,-197.412384,2.150229,1.9278
2,0.2176,0.423314,-0.43775,-2.05102,0.785714,1.61327,-478.821533,-201.323395,2.031296,1.829692
4,0.1522,0.443292,-0.545608,-2.304958,0.785714,1.75935,-481.360931,-202.401962,1.998406,1.805197
5,0.1481,0.445689,-0.538263,-2.287733,0.75,1.74947,-481.18869,-202.328537,2.000554,1.809534




TrainOutput(global_step=252, training_loss=0.2573409667090764, metrics={'train_runtime': 2648.8686, 'train_samples_per_second': 0.383, 'train_steps_per_second': 0.095, 'total_flos': 0.0, 'train_loss': 0.2573409667090764, 'epoch': 5.929411764705883})

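It seems to have worked reasonably well: the reward margin between chosen and rejected responses grows from 0.78 to about 1.75. There may be an overfitting issue, though, where the model adapts better to the training data than to the evaluation data: the training loss keeps falling to 0.148, while the validation loss reaches its minimum of 0.423 at epoch 2 and then rises slightly. To mitigate overfitting, you could expand the dataset and try increasing the **lora_dropout** parameter in **LoraConfig**.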
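The logged metrics can be checked with a few lines of Python. The sketch below copies the relevant columns from the training table above, confirms that the reported margin is simply the chosen reward minus the rejected reward, and prints the gap between validation and training loss as a rough overfitting signal. The tuple layout is just a convenience for this example.

```python
# (epoch, train_loss, val_loss, reward_chosen, reward_rejected, reported_margin)
rows = [
    (0, 0.5560, 0.467086, -0.046650, -0.827436, 0.780786),
    (2, 0.2176, 0.423314, -0.437750, -2.051020, 1.613270),
    (4, 0.1522, 0.443292, -0.545608, -2.304958, 1.759350),
    (5, 0.1481, 0.445689, -0.538263, -2.287733, 1.749470),
]

for epoch, train, val, chosen, rejected, reported in rows:
    margin = chosen - rejected  # Rewards/margins = Rewards/chosen - Rewards/rejected
    gap = val - train           # a growing gap hints at overfitting
    print(f"epoch {epoch}: margin={margin:.6f} (reported {reported}), gap={gap:.4f}")

best_epoch = min(rows, key=lambda r: r[2])[0]
print("lowest validation loss at epoch", best_epoch)
```

Since the checkpoints are saved every epoch, the one with the lowest validation loss is a reasonable candidate to keep instead of the last one.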


## Upload model

In [None]:
#PATH_MODEL="/content/drive/MyDrive/final_checkpoint"
PATH_MODEL="/content/drive/MyDrive/apress_checkpoint"

In [None]:
# Save artifacts
trainer.model.save_pretrained(PATH_MODEL)
tokenizer.save_pretrained(PATH_MODEL)





('/content/drive/MyDrive/apress_checkpoint/tokenizer_config.json',
 '/content/drive/MyDrive/apress_checkpoint/special_tokens_map.json',
 '/content/drive/MyDrive/apress_checkpoint/tokenizer.model',
 '/content/drive/MyDrive/apress_checkpoint/added_tokens.json',
 '/content/drive/MyDrive/apress_checkpoint/tokenizer.json')

Execute this cell only if you are having memory issues. (Not you, of course, I mean your environment 🤗).

In [None]:
# Flush memory
# Uncomment the next line to release the trainer and the quantized model first.
#del trainer, model
gc.collect()
torch.cuda.empty_cache()

Now, you're going to load the original model again, but this time in its unquantized format.

In [None]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Next, the original model and the saved adapter are merged.

In [None]:
model = PeftModel.from_pretrained(base_model, PATH_MODEL)
model = model.merge_and_unload()

 The model that you have in memory is now a combination of the base model and the adapter that you have trained. You can now save this new model and upload it to Hugging Face.
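Conceptually, merging folds the low-rank update into the original weight matrix, W' = W + (lora_alpha / r) · B · A, so that no extra layers are needed at inference time. The toy example below illustrates that formula with plain Python lists; it is a sketch of the standard LoRA formulation, not of the internals of `merge_and_unload`, and all values are made up.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 layer with a rank-1 adapter (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]    # shape (r, d_in)  = (1, 2)
B = [[1.0], [2.0]]   # shape (d_out, r) = (2, 1)
x = [[3.0], [1.0]]   # input as a column vector

merged = merge_lora(W, A, B, lora_alpha=2, r=1)
# Output of the unmerged model: base layer plus the scaled adapter path.
with_adapter = [[base[0] + 2.0 * upd[0]]
                for base, upd in zip(matmul(W, x), matmul(B, matmul(A, x)))]
print(matmul(merged, x), with_adapter)
```

Both paths produce the same output, which is why the merged model can be saved and served as an ordinary model without the PEFT wrapper.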

In [None]:
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

('phi-3-mini-dpo-apress/tokenizer_config.json',
 'phi-3-mini-dpo-apress/special_tokens_map.json',
 'phi-3-mini-dpo-apress/tokenizer.model',
 'phi-3-mini-dpo-apress/added_tokens.json',
 'phi-3-mini-dpo-apress/tokenizer.json')

In [None]:
model.push_to_hub(new_model,
                  private=True,
                  use_temp_dir=False,
                  token=hf_token)
tokenizer.push_to_hub(new_model,
                      private=True,
                      use_temp_dir=False,
                      token=hf_token)

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/oopere/phi-3-mini-dpo-apress/commit/8f3a1219ba0aa22a62690f089fad5dd8a70be601', commit_message='Upload tokenizer', commit_description='', oid='8f3a1219ba0aa22a62690f089fad5dd8a70be601', pr_url=None, pr_revision=None, pr_num=None)

## Inference

Let's test the new model and compare it with the original.

In [None]:
# Format prompt
message = [
    {"role": "user", "content": "3713841893836/4?\nLimit your response to mathematical expressions and symbols."}
]

In [None]:
#new_model="oopere/martra-test-phi-3-mini-dpo"
tokenizer_new_model = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer_new_model.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline_new = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer_new_model
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
# Generate text
sequences = pipeline_new(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_p=0.2,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


<s><|user|>
3713841893836/4?
Limit your response to mathematical expressions and symbols.<|end|>
<|assistant|>
 3713841893836 / 4 = 928460473459


PERFECT! The new model only returns numbers, as we want!
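It's worth verifying both properties rather than trusting the eye: that the answer is arithmetically correct, and that it contains only mathematical symbols. A minimal sketch with hypothetical helper names, using the response printed above:

```python
import re

def answer_is_exact(numerator, denominator, claimed_quotient):
    """True when the division has no remainder and matches the claimed quotient."""
    quotient, remainder = divmod(numerator, denominator)
    return remainder == 0 and quotient == claimed_quotient

def only_math(text):
    """True when the text contains nothing but digits, spaces and basic math symbols."""
    return re.fullmatch(r"[\d\s+\-*/=().,]+", text) is not None

response = " 3713841893836 / 4 = 928460473459"
print(answer_is_exact(3713841893836, 4, 928460473459))
print(only_math(response))
```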

If you don't want to wait for the training to finish, just test my model on Hugging Face. It has been trained with the same dataset for 2 hours on an A100 GPU on Colab.

In [None]:
# Test my model on Hugging Face.
new_model="oopere/martra-phi-3-mini-dpo"
tokenizer_new_model = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer_new_model.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline_new = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer_new_model
)



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
# Generate text
sequences = pipeline_new(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_p=0.2,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


<s><|user|>
3713841893836/4?
Limit your response to mathematical expressions and symbols.<|end|>
<|assistant|>
 3713841893836 / 4 = 928460473459


# Summary

The model alignment process has been a complete success. The truth is, with the Hugging Face libraries, everything is straightforward.

The challenge is knowing about the technique, when to apply it, and having the necessary data.

In this notebook, you've addressed the first two points.