# **Teaching SmolLM to do grammatical error correction**

The goal of this small project is to train a SmolLM-135M model to perform grammatical error correction (GEC) using the Grammarly CoEdIT dataset. This [dataset](https://huggingface.co/datasets/grammarly/coedit), derived from the [CoEdIT project](https://arxiv.org/abs/2305.09857), provides a rich collection of text editing instructions and examples. The task involves several key steps that mimic conventional alignment processes:

## **2.1 Supervised Fine-Tuning (SFT) on Training Data**

### Major steps
* Fine-tune the [SmolLM-135M model](https://huggingface.co/HuggingFaceTB/SmolLM-135M) using the CoEdIT dataset, which includes input sentences with grammatical errors and their corrected versions.
* Use the training GEC portion of the CoEdIT dataset to teach the model how to correct grammatical errors effectively.
* Calculate the BLEU score on the validation set to evaluate the model's performance in generating grammatically correct sentences. Ensure that this evaluation process is reusable for later comparisons.
* Search for an optimal set of hyperparameters, such as the learning rate.
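
The hyperparameter search in the last step can be sketched as a simple sweep over log-spaced learning rates. This is a minimal sketch, not the search used in this notebook: `log_spaced`, `sweep_learning_rates`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for "fine-tune with this learning rate, then return the validation BLEU".

```python
import math
from typing import Callable, Dict, List, Tuple


def log_spaced(low: float, high: float, num: int) -> List[float]:
    """Return `num` values spaced evenly on a log scale between low and high."""
    if num < 2:
        return [low]
    step = (math.log10(high) - math.log10(low)) / (num - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(num)]


def sweep_learning_rates(
    score_fn: Callable[[float], float], candidates: List[float]
) -> Tuple[float, Dict[float, float]]:
    """Score every candidate and return the best one together with all scores."""
    scores = {lr: score_fn(lr) for lr in candidates}
    best_lr = max(scores, key=scores.get)
    return best_lr, scores


def toy_score(lr: float) -> float:
    # Hypothetical stand-in for "train with lr, return validation BLEU".
    # It peaks at lr = 1e-4 purely for illustration.
    return 1.0 - abs(math.log10(lr) + 4.0) / 10.0


candidates = log_spaced(1e-6, 1e-2, 5)
best_lr, scores = sweep_learning_rates(toy_score, candidates)
print(f"best learning rate: {best_lr:.0e}")
```

In a real run, `score_fn` would wrap the training and BLEU evaluation functions, so each candidate costs one full fine-tuning run.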

In [None]:
! pip install datasets
! pip install trl
! pip install fast_edit_distance
! pip install evaluate


In [None]:
from datasets import load_dataset
from datasets import Dataset

# Download the GEC data
full_train_ds = load_dataset("grammarly/coedit", split="train")
full_test_ds = load_dataset("grammarly/coedit", split="validation")

Generating train split:   0%|          | 0/69071 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1712 [00:00<?, ? examples/s]

In [None]:
# Filter examples, keeping only GEC task
def filter_dataset(input_dataset: Dataset) -> Dataset:

    # Filter the dataset for GEC values only
    filtered_dataset = input_dataset.filter(lambda example: example['task'] == 'gec')

    return filtered_dataset

full_train_ds = filter_dataset(full_train_ds)
full_test_ds = filter_dataset(full_test_ds)

In [None]:
print(f"Number of train samples: {len(full_train_ds)}")
print(f"Number of test samples: {len(full_test_ds)}")

Number of train samples: 19823
Number of test samples: 485


The expected numbers of train and test samples are 19823 and 485, respectively.

In [None]:
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "HuggingFaceTB/SmolLM-135M"

# Load the model and the tokenizer from huggingface
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
#     load_in_8bit=True,
#     attn_implementation="flash_attention_2",
#     device_map="auto"
).to(device)

# Add a padding token to the tokenizer
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
    # set the pad token for the model
    model.config.pad_token_id = tokenizer.pad_token_id

In [None]:
print("The new vocab size is", model.config.vocab_size)

The new vocab size is 49153


In [None]:
# Tokenize input with tokenizer to test out the newly added pad token
inputs = tokenizer(
    ["Hello, how are you?", "I just know that I am a fine boy"],
    padding=True,  # Automatically pads to the longest sequence
    return_tensors="pt"
)

print("Input IDs:", inputs['input_ids'])
print("Attention Mask:", inputs['attention_mask'])

Input IDs: tensor([[49152, 49152, 49152, 19556,    28,   638,   359,   346,    47],
        [   57,   915,   699,   338,   339,   744,   253,  4979,  7706]])
Attention Mask: tensor([[0, 0, 0, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1]])


In [None]:
# Get the token that represents the eos in the tokenizer
eos_token = tokenizer.eos_token
eos_token_id = tokenizer.eos_token_id

print("End of sentence token: ", eos_token)
print("End of sentence token ID: ", eos_token_id)

End of sentence token:  <|endoftext|>
End of sentence token ID:  0


In [None]:
# TRL - Transformer Reinforcement Learning -- https://huggingface.co/docs/trl/en/index
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM

# Run SFT - Supervised Fine-Tuning
def formatting_prompts_func(input_dataset):

    output_text = []

    for i in range(len(input_dataset["src"])):

        instruction = input_dataset["src"][i]
        response = input_dataset["tgt"][i]
        # Create the instruction tuning input with an eos token
        text = f"### Instruction: {instruction}\n ### Response: {response}{tokenizer.eos_token}"

        output_text.append(text)

    return output_text

def train_model_wit_sft(
    train_dataset,
    eval_dataset,
    load=False,
    model_path=None,
    model=None
):

    if load:
        if model_path is None:
            raise ValueError(
                "If 'load' is True, 'model_path' must be provided."
            )
        model = AutoModelForCausalLM.from_pretrained(model_path).to(device)
    else:
        # Make sure the needed arguments are set
        if model is None:
            raise ValueError(
                "The 'model' must be provided when 'load' is False."
            )

#         peft_config = LoraConfig(
#             r=16,
#             lora_alpha=32,
#             lora_dropout=0.05,
#             bias="none",
#             task_type="CAUSAL_LM",
#         )

        sft_config = SFTConfig(
            output_dir="./results",        # output directory for model predictions and checkpoints
            learning_rate=1e-2,            # learning rate
            num_train_epochs=3,            # number of training epochs
#             warmup_steps=4,                # number of warmup steps for learning rate scheduler
#            gradient_accumulation_steps=3, # gradient accumulation steps
            lr_scheduler_type="cosine",    # type of learning rate scheduler
            report_to="none",              # Do not report to any model tracking software
            max_seq_length=512,            # The maximum length of each data point instance
            packing=False,                 # Don't combine different data points
            eval_strategy="steps",
        )

        instruction_template = "### Instruction:"
        response_template = " ### Response:"

        collator = DataCollatorForCompletionOnlyLM(
            instruction_template=instruction_template,
            response_template=response_template,
            tokenizer=tokenizer,
            mlm=False
        )

        trainer = SFTTrainer(
            model=model,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            args=sft_config,
            formatting_func=formatting_prompts_func,
            data_collator=collator,
        )

        trainer.train()
        if model_path is not None:
            trainer.save_model(model_path)

    return model
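
The completion-only collator used above restricts the loss to the response tokens. The idea can be sketched in plain Python as follows. This is a simplified illustration, not TRL's implementation: the token IDs are made up, and `find_subsequence` and `mask_prompt_labels` are hypothetical helper names.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence: List[int], pattern: List[int]) -> int:
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids: List[int], response_ids: List[int]) -> List[int]:
    """Copy input_ids into labels, hiding everything up to and including the response marker."""
    start = find_subsequence(input_ids, response_ids)
    if start == -1:
        # Marker not found: no token contributes to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(response_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Toy token IDs (made up): 1-3 = instruction text, 90-91 = "### Response:" marker,
# 7-8 = corrected sentence, 0 = EOS.
input_ids = [1, 2, 3, 90, 91, 7, 8, 0]
labels = mask_prompt_labels(input_ids, response_ids=[90, 91])
print(labels)
```

Only the corrected sentence and the EOS token keep their labels, so the model is trained to produce the correction and to stop, not to reproduce the instruction.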

In [None]:
# # Mount personal drive to save model
# from google.colab import drive
# drive.mount('/content/drive')

# Unzip the file if needed
import zipfile

with zipfile.ZipFile('/content/drive/MyDrive/Cohere Application/smol_model-20240915T135500Z-001.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/drive/MyDrive/Cohere Application')

# !unzip /content/drive/MyDrive/Cohere Application/sollm.zip -d /content/drive/MyDrive/Cohere Application/

In [None]:
# model_path = '/kaggle/input/smollm/transformers/default/1/smol_model'
# model_path = '/kaggle/working/'
model_path = "/content/drive/MyDrive/Cohere Application/smol_model"

model = train_model_wit_sft(
    train_dataset=full_train_ds,
    eval_dataset=full_test_ds,
    load=True,
    model_path=model_path,
    model=model
)

In [None]:
# Example of how to run inference on a single example
text = "Fix grammatically: I likes turtles"

# Quick test if your model works properly
def format_text(text: str) -> str:

    # Formatting the input that was adopted for training
    model_input = f"### Instruction: {text}\n ### Response: "

    return model_input

def process_output(model_output):

    if "### Response:" in model_output:
        processed_output = model_output.split("### Response:")[1].strip()
    else:
        processed_output = model_output

    return processed_output

def make_inference(model, tokenizer, prompt):

    # Tokenize the input text
    inputs = tokenizer(
        format_text(prompt),
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    ).to(device)

    # Run inference
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        pad_token_id=tokenizer.pad_token_id
    )

    # print("Original decoded: ", tokenizer.decode(outputs[0]), "\n")

    # format_text_length = len(tokenizer(format_text(prompt))['input_ids'])
    # output_text = tokenizer.decode(outputs[0][format_text_length:], skip_special_tokens=True).strip()
    # Remove the input prompt from the output
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    output_text = process_output(output_text)

    return output_text

print(make_inference(model, tokenizer, text))

I like turtles.


Expected output: I like turtles.

### Evaluating the fine-tuned model

In [None]:
import evaluate
import gc
from tqdm import tqdm

def batch_collate_eval(batch):

    prompts = [format_text(item["src"]) for item in batch]
    new_batch = dict(
        tokenizer(
            prompts,
            max_length=128,
            padding=True,
            truncation=True,
            return_tensors="pt"
        ).to(device)
    )
    new_batch['tgt'] = [item['tgt'] for item in batch]
    new_batch['src'] = [item['src'] for item in batch]

    return new_batch

# BLEU Score
def evaluate_model(model, tokenizer, ds):

    # Generate predictions batch by batch and collect them with their targets for bleu.compute below.

    # Create a dataloader for the evaluation
    eval_dataloader = torch.utils.data.DataLoader(
        ds,
        batch_size=20,
        shuffle=False,
        collate_fn=batch_collate_eval
    )

    preds = []
    targets = []

    batch_bar = tqdm(total=len(eval_dataloader), dynamic_ncols=True, leave=False, position=0, desc='Evaluation')

    for i, batch in enumerate(eval_dataloader):

        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        tgt = batch['tgt']

        outputs = model.generate(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
            max_new_tokens=128,
            pad_token_id=tokenizer.pad_token_id
        )

        # Remove the input prompt from the output
        output_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        preds.extend([process_output(text.strip()) for text in output_texts])
        targets.extend(tgt)

        batch_bar.set_postfix(iteration="{}".format(i))

        batch_bar.update()

    batch_bar.close()

    # Compute the BLEU score
    bleu = evaluate.load("bleu")
    results = bleu.compute(predictions=preds, references=targets)

    return results["bleu"]

In [None]:
# Evaluate model, use the function given above
evaluate_model(model, tokenizer, full_test_ds)



0.4561982696887417

The expected BLEU score after one epoch of SFT is ~0.48.
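
To make the metric concrete, here is a minimal sketch of a corpus-level BLEU computation in plain Python. It is a simplification under stated assumptions: whitespace tokenization, a single reference per prediction, and no smoothing. The `bleu` metric from `evaluate` applies its own tokenizer, so its scores will not match this sketch exactly. `ngrams` and `corpus_bleu` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(predictions: List[str], references: List[str], max_order: int = 4) -> float:
    """Simplified corpus-level BLEU with one reference per prediction."""
    matches = [0] * max_order
    totals = [0] * max_order
    pred_len = ref_len = 0

    for pred, ref in zip(predictions, references):
        pred_tokens, ref_tokens = pred.split(), ref.split()
        pred_len += len(pred_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_order + 1):
            pred_counts = ngrams(pred_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((pred_counts & ref_counts).values())
            totals[n - 1] += sum(pred_counts.values())

    if min(matches) == 0 or pred_len == 0:
        return 0.0  # no smoothing: any n-gram order without a match gives a score of 0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    brevity_penalty = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"BLEU: {score:.4f}")
```

Because the score is a geometric mean of n-gram precisions, a single changed word lowers the score at every n-gram order it touches, which is one reason BLEU can disagree with a human reading of GEC output.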

## **2.2 Create a preference optimization dataset**

### Major steps
* *Generate Output Variants* -- For each input sentence in the training set, I use the fine-tuned model to generate two different output variants.
  * Different decoding strategies, such as varying the temperature or the beam size, produce diverse outputs. I pick the strategies based on the desired balance between diversity and quality.

* *Preference Annotation* -- I then measure the edit distance between each **generated variant** and the **ground truth correction**. The variant with the lower edit distance is labelled "chosen" and the one with the higher edit distance "rejected."
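
The annotation rule above can be sketched in plain Python. The `levenshtein` function below is a pure-Python stand-in for `edit_distance` from `fast_edit_distance`, assuming that library computes plain character-level Levenshtein distance. `label_pair` is a hypothetical helper name.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def label_pair(variant_1: str, variant_2: str, ground_truth: str) -> Dict[str, str]:
    """Mark the variant closer to the ground truth as chosen; ties favour variant_1."""
    if levenshtein(variant_1, ground_truth) <= levenshtein(variant_2, ground_truth):
        return {"chosen": variant_1, "rejected": variant_2}
    return {"chosen": variant_2, "rejected": variant_1}


pair = label_pair(
    variant_1="I likes turtles.",
    variant_2="I like turtles.",
    ground_truth="I like turtles.",
)
print(pair)
```

Note that when both variants are identical, the pair carries no preference signal, so it may be worth filtering such pairs out before preference training.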

### NOTE:
Beyond edit distance, it would make sense to use a metric that captures the meaning of words rather than only counting surface edits. This could help retain data points that would otherwise be rejected for needing many edits, even though they are more grammatical and preserve the meaning. I could use a reference-based metric such as QuestEval with its BERTScore variant to check meaning preservation in grammatical corrections, together with a sentence-level metric such as GLEU. This is left for future work.

In [None]:
from fast_edit_distance import edit_distance
import pandas as pd

# Create preference optimization dataset

def generate_ds_variants(model, tokenizer, ds, batch_size):

    set_seed(42) # Set a seed for repeatability

    generated_dataset = {
        "chosen_variants": [],
        "rejected_variants": [],
    }

    # Create a dataloader for the dataset
    eval_dataloader = torch.utils.data.DataLoader(
        ds,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=batch_collate_eval
    )

    batch_bar = tqdm(total=len(eval_dataloader), dynamic_ncols=True, leave=False, position=0, desc='Dataset Generation')

    for i, batch in enumerate(eval_dataloader):

        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        ground_truth_batch = batch['tgt']

        with torch.no_grad():
            # Generate the tokens in two variants
            var1_outputs = model.generate(
                input_ids=input_ids.to(device),
                attention_mask=attention_mask.to(device),
                max_new_tokens=128,
                pad_token_id=tokenizer.pad_token_id,
                num_beams=8,
                early_stopping=True
            )
            var2_outputs = model.generate(
                input_ids=input_ids.to(device),
                attention_mask=attention_mask.to(device),
                max_new_tokens=128,
                pad_token_id=tokenizer.pad_token_id,
                do_sample=True,
                temperature=0.9,
                top_k=40,
                top_p=0.9
            )
        # Batch decode the generated tokens
        decoded_var1_outputs = tokenizer.batch_decode(
            var1_outputs,
            skip_special_tokens=True
        )
        decoded_var2_outputs = tokenizer.batch_decode(
            var2_outputs,
            skip_special_tokens=True
        )
        # process their outputs to obtain cleaner output
        decoded_var1_outputs = [process_output(output.strip()) for output in decoded_var1_outputs]
        decoded_var2_outputs = [process_output(output.strip()) for output in decoded_var2_outputs]

        # confirm the sizes of the output
        assert len(decoded_var1_outputs) == len(decoded_var2_outputs) == len(ground_truth_batch)

        for var1, var2, ground_truth in zip(decoded_var1_outputs, decoded_var2_outputs, ground_truth_batch):

            if edit_distance(var1, ground_truth) <= edit_distance(var2, ground_truth):
                # Favour the beam search approach
                generated_dataset["chosen_variants"].append(var1)
                generated_dataset["rejected_variants"].append(var2)
            else:
                # Favour the sampling approach
                generated_dataset["chosen_variants"].append(var2)
                generated_dataset["rejected_variants"].append(var1)

        batch_bar.set_postfix(iteration="{}".format(i))

        torch.cuda.empty_cache()
        gc.collect()

        batch_bar.update()

    batch_bar.close()

    return generated_dataset

In [None]:
generated_dataset = generate_ds_variants(
  model=model,
  tokenizer=tokenizer,
  ds=full_train_ds,
  batch_size=25
)

# Create a dataframe from the dataset
generated_dataset_df = pd.DataFrame(generated_dataset)
# Save the dataset
generated_dataset_df.to_csv("/kaggle/working/generated_DPO_dataset.csv", index=False)

## NOTE: Load this dataset before running the DPO

In [None]:
# (Load and) Visualize the created dataset -- display at least 5 lines of the dataset.
generated_dataset_df = pd.read_csv("/content/drive/MyDrive/Cohere Application/generated_DPO_dataset.csv")
generated_dataset_df.head(5)

|   | chosen_variants | rejected_variants |
|---|-----------------|-------------------|
| 0 | For example, countries with a lot of deserts c... | For example, countries with a lot of deserts c... |
| 1 | As the number of people grows, the need for a ... | As the number of people grows, the need for a ... |
| 2 | Besides some technological determinists that a... | Besides some technologically determinists that... |
| 3 | Safety is one of the crucial problems that man... | 5 safety is one of the crucial problems that m... |
| 4 | On the one hand, more and more viruses and hac... | On the one hand, more and more viruses and hac... |


## **2.3 Run Direct Preference Optimization (DPO)**

### Major steps
* Use the preference optimization dataset to further train the model through DPO, a method that leverages human-like preferences for model training.
* After running DPO, measure the BLEU score on the test set. Compare this performance to the baseline established during the SFT phase.
* Search for an optimal set of hyperparameters, such as the learning rate and number of epochs.
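
For intuition, the per-pair DPO objective can be sketched in plain Python. This is a minimal sketch of the standard sigmoid DPO loss, not TRL's implementation. The inputs are assumed to be sequence log-probabilities (sums over response tokens) under the policy and the frozen reference model, and `beta=0.2` matches the value passed to `DPOConfig` in this notebook.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.2,
) -> float:
    """Sigmoid DPO loss for a single (chosen, rejected) pair."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward)


# At the start of training the policy equals the reference, so the loss is log(2).
initial = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# After the policy shifts probability mass toward the chosen output, the loss drops.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
print(f"initial: {initial:.4f}, improved: {improved:.4f}")
```

A training loss that stays close to log(2) ≈ 0.693 therefore suggests that the policy has moved only slightly away from the reference model.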

In [None]:
# ! pip install -U flash-attn
# ! pip install -U bitsandbytes
# ! pip install -U peft

In [None]:
import os
from trl import DPOConfig, DPOTrainer, CPOTrainer, CPOConfig
from transformers import AutoModelForCausalLM
from datasets import Dataset
# from peft import LoraConfig, get_peft_model

In [None]:
# process the missing values in the generated dataset
generated_dataset_df = generated_dataset_df.fillna("")

# Add the prompt to the dataframe
generated_dataset_df["prompt"] = full_train_ds["src"]

# Define a custom template for formatting
template = "### Instruction: {}\n ### Response: {}"

# Iterate through each row using iterrows()
for index, row in generated_dataset_df.iterrows():

    formatted_text_1 = template.format(row['prompt'], row['chosen_variants'])
    formatted_text_2 = template.format(row['prompt'], row['rejected_variants'])

    generated_dataset_df.at[index, 'chosen_variants'] = formatted_text_1
    generated_dataset_df.at[index, 'rejected_variants'] = formatted_text_2


generated_dataset_df.head()

|   | chosen_variants | rejected_variants | prompt |
|---|-----------------|-------------------|--------|
| 0 | ### Instruction: Remove all grammatical errors... | ### Instruction: Remove all grammatical errors... | Remove all grammatical errors from this text: ... |
| 1 | ### Instruction: Improve the grammaticality: A... | ### Instruction: Improve the grammaticality: A... | Improve the grammaticality: As the number of p... |
| 2 | ### Instruction: Improve the grammaticality of... | ### Instruction: Improve the grammaticality of... | Improve the grammaticality of this sentence: B... |
| 3 | ### Instruction: Remove all grammatical errors... | ### Instruction: Remove all grammatical errors... | Remove all grammatical errors from this text: ... |
| 4 | ### Instruction: Fix grammaticality in this se... | ### Instruction: Fix grammaticality in this se... | Fix grammaticality in this sentence: On one ha... |


In [None]:
# Create the dataset for the DPO
dpo_dataset_dict = {
    "prompt": [f"### Instruction: {prompt}\n ### Response: " for prompt in full_train_ds["src"]], # Add the template for the prompt
    "chosen": generated_dataset_df["chosen_variants"].tolist(),
    "rejected": generated_dataset_df["rejected_variants"].tolist()
}

dpo_dataset = Dataset.from_dict(dpo_dataset_dict)

assert len(dpo_dataset_dict["prompt"]) == len(dpo_dataset_dict["chosen"]) == len(dpo_dataset_dict["rejected"])

In [None]:
torch.cuda.empty_cache()

In [None]:
# Run Direct Preference Optimization (DPO)

model_path = '/content/drive/MyDrive/Cohere Application/smol_model'

# Create the model to be trained
model_DPO = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
).to(device)

In [None]:
training_args = DPOConfig(
    output_dir="./results_dpo",
    beta=0.2,
    num_train_epochs=3,            # number of training epochs
    report_to="none",              # Do not report to any model tracking software
    max_length=512,                # The maximum length of each data point instance
    max_prompt_length=512,         # The maximum length of the prompt
    learning_rate=5e-7
)

dpo_trainer = DPOTrainer(
    model=model_DPO, # New model to be trained - Further finetuning the SFT trained model
#     ref_model=model, # The reference model - This would be the same model copied by the trainer
    args=training_args,
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,  # for visual language models, use tokenizer=processor instead
)

dpo_trainer.train()



Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss |
|------|---------------|
| 500 | 0.6259 |
| 1000 | 0.6255 |
| 1500 | 0.6119 |
| 2000 | 0.6227 |
| 2500 | 0.6114 |
| 3000 | 0.5883 |
| 3500 | 0.6018 |
| 4000 | 0.5989 |
| 4500 | 0.6048 |
| 5000 | 0.5967 |


TrainOutput(global_step=7434, training_loss=0.6039258512015108, metrics={'train_runtime': 1463.392, 'train_samples_per_second': 40.638, 'train_steps_per_second': 5.08, 'total_flos': 0.0, 'train_loss': 0.6039258512015108, 'epoch': 3.0})

In [None]:
# Evaluate the DPO model with the evaluate_model function
evaluate_model(model_DPO, tokenizer, full_test_ds)



0.449365944007056

# **Exploring Alternative DPO Variants for Improved Model Performance**

## Trying a different variant of DPO

### Major Steps
* Choose a variant of DPO or another preference-based optimization method that could potentially enhance the model's performance.
* Describe the specific differences in this approach compared to the initial DPO method used.
* Train the model using this alternative DPO method and measure its performance on the test set using the BLEU score.
* Compare these results with the baseline performance achieved during the initial Supervised Fine-Tuning (SFT) and the first DPO implementation.
* Select a few GEC examples after the SFT, DPO, and DPO-variant phases and compare the quality of the corrections. Which one would a human prefer?

## NOTE
Since I generated the dataset with a model that is far from optimal, it is worth checking whether anchoring to the reference model is what keeps the trained model in check.

It would be interesting to see how Contrastive Preference Optimization (CPO) performs. CPO does not use a reference model, so the optimization is bounded by the quality of the training data.

It would also be interesting to see how adding a loss like SimPO influences this, since it is supposed to stabilize training. I tried other variants of DPO, but they kept diverging (the loss dropped to zero) and the models generated nothing.
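
To show how these reference-free objectives differ from DPO, here is a minimal per-pair sketch in plain Python, based on my reading of the CPO and SimPO papers. It is not TRL's implementation: the inputs are assumed to be sequence log-probabilities under the policy only, the negative log-likelihood term is left unnormalised for simplicity, and the `beta` and `gamma` defaults are illustrative values.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def cpo_loss(chosen_logp: float, rejected_logp: float, beta: float = 0.2) -> float:
    """Reference-free preference term plus a negative log-likelihood term on the chosen output."""
    preference_term = -log_sigmoid(beta * (chosen_logp - rejected_logp))
    nll_term = -chosen_logp
    return preference_term + nll_term


def simpo_loss(
    chosen_logp: float,
    rejected_logp: float,
    chosen_len: int,
    rejected_len: int,
    beta: float = 2.0,
    gamma: float = 0.5,
) -> float:
    """Length-normalised, reference-free preference loss with a target margin gamma."""
    chosen_reward = beta * chosen_logp / chosen_len
    rejected_reward = beta * rejected_logp / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)


print(f"CPO:   {cpo_loss(-4.0, -8.0):.4f}")
print(f"SimPO: {simpo_loss(-4.0, -8.0, chosen_len=4, rejected_len=4):.4f}")
```

Neither loss contains a reference log-probability, so nothing pulls the policy back toward the SFT model when the chosen and rejected labels are noisy.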

In [None]:
# Create the model to be trained
model_path = '/content/drive/MyDrive/Cohere Application/smol_model'


model_cPO = AutoModelForCausalLM.from_pretrained(model_path).to(device)

training_args = CPOConfig(
    output_dir="./results_cpo",
    beta=0.2,
    num_train_epochs=3,            # number of training epochs
    report_to="none",              # Do not report to any model tracking software
    max_length=512,                # The maximum length of each data point instance
    max_prompt_length=512,         # The maximum length of the prompt
)

cpo_trainer = CPOTrainer(
    model=model_cPO, # New model to be trained - Further finetuning the SFT trained model
    args=training_args,
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,  # for visual language models, use tokenizer=processor instead
)

cpo_trainer.train()



Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss |
|------|---------------|
| 500 | 0.6788 |
| 1000 | 0.6655 |
| 1500 | 0.6661 |
| 2000 | 0.6634 |
| 2500 | 0.6544 |
| 3000 | 0.4863 |
| 3500 | 0.4829 |
| 4000 | 0.4811 |
| 4500 | 0.4903 |
| 5000 | 0.4722 |


TrainOutput(global_step=7434, training_loss=0.5250388210359234, metrics={'train_runtime': 1446.9556, 'train_samples_per_second': 41.099, 'train_steps_per_second': 5.138, 'total_flos': 0.0, 'train_loss': 0.5250388210359234, 'epoch': 3.0})

In [None]:
evaluate_model(model_cPO, tokenizer, full_test_ds)



0.4260040325311729

In [None]:
# Sample a small portion of the dataset
num_samples = 5
sampled = full_test_ds.select(range(num_samples))

In [None]:
print("ORIGINAL INSTRUCTIONS\n")
for sample in sampled:
  print(sample["src"])
  print()

print("SFT GENERATIONS\n")
for sample in sampled:
  print(make_inference(model, tokenizer, sample["src"]))
  print()

print("DPO GENERATIONS\n")
for sample in sampled:
  print(make_inference(model_DPO, tokenizer, sample["src"]))
  print()

print("CPO GENERATIONS\n")
for sample in sampled:
  print(make_inference(model_cPO, tokenizer, sample["src"]))
  print()

ORIGINAL INSTRUCTIONS

Fix grammaticality: First of all, from you read just to found in the poems or novel what well-known critic have already found out, you looses the pleasures of reading something which is expecting to be a new experience to you.

Fix grammatical errors: Their research shown that before Hurricane Sandy only " about 50 percent during resident used the emergency departments, " and " only about 35 percents sought inpatient cares there and less than 10 percent used the hospitals when needing surgeries with any kind. "

Fix grammar: It been widely blelieved tha every student interested within some subject which might not be interested by other students so it is difficult to forced students to study subjects which they unwilling to study it, otherwise they will fail in it and because of that they will feel disappointed to do any thing and this a significant issue.

Fix grammatical errors: This is why I totally agree like the following comments: " My upbringings teaches me

COMMENTS:

Interestingly, although the BLEU score ranks the SFT model higher, the DPO model's outputs read better.

As a human, I would go for the DPO results. This is probably a shortcoming of automatic metrics that do not take word meaning into account.

My other experiment, with CPO, was a complete failure. It is important to keep the reference model as a guide and not stray from it based on the dataset alone, which makes sense here because the dataset itself is not of trusted quality. Given more time, I would have used a metric like SOME, which is optimized for human evaluations by fine-tuning BERT separately for grammaticality, fluency, and meaning preservation, for the dataset preparation. I would also have optimized the SFT model and used a better diversity criterion that yields two variants of clearly different quality. This was fun though, and I learned more than I anticipated.