## GPT for code dictation

The challenge is twofold. First, we have to get GPT-2 to understand LaTeX, which is quite different from the natural-language text, such as English, on which GPT-2 was initially trained. Second, we have to teach GPT-2 to translate English into LaTeX, a task that involves not only translation but also an understanding of the context and semantics of the text.

Our data? This might come as a shock, but there is no dataset for this specific task anywhere online. So, we took it upon ourselves to write 50 simple examples of English-to-LaTeX translation. This is by far the smallest dataset used in this book, but it will be a great aid in exploring just how much transfer learning will help us here. With only 50 examples, we will need to rely on GPT-2's recognition of a translation task and its ability to transfer that knowledge to this task.

In [1]:
from transformers import AutoTokenizer, TextDataset, DataCollatorForLanguageModeling, AutoModelForCausalLM, pipeline, \
                         Trainer, TrainingArguments
import pandas as pd
from datasets import Dataset


In [2]:
MODEL = 'gpt2'

tokenizer = AutoTokenizer.from_pretrained(MODEL)  # load the standard GPT-2 tokenizer

tokenizer.pad_token = tokenizer.eos_token  # set the pad token to avoid a warning


In [3]:
data = pd.read_csv('english_to_latex.csv')

print(data.shape)

data.head(2)

(50, 2)


Unnamed: 0,English,LaTeX
0,integral from a to b of x squared,"\int_{a}^{b} x^2 \,dx"
1,integral from negative 1 to 1 of x squared,"\int_{-1}^{1} x^2 \,dx"


In [4]:
data.head(10)

Unnamed: 0,English,LaTeX
0,integral from a to b of x squared,"\int_{a}^{b} x^2 \,dx"
1,integral from negative 1 to 1 of x squared,"\int_{-1}^{1} x^2 \,dx"
2,integral from negative 1 to infinity of x cubed,"\int_{-1}^{\inf} x^3 \,dx"
3,integral from 0 to infinity of x squared,"\int_{0}^{\inf} x^2 \,dx"
4,integral from 0 to infinity of y squared,"\int_{0}^{\inf} y^2 \,dy"
5,integral from 1 to 2 of x over 2,"\int_{1}^{2} \frac{x}{2} \,dx"
6,f of x equals x squared,f(x) = x^2
7,h of x equals x squared,h(x) = x^2
8,g of x equals x squared,g(x) = x^2
9,g of x equals x to the eighth power,g(x) = x^8


In [5]:
# Add our singular prompt
CONVERSION_PROMPT = 'Convert English to LaTeX\n'  # LaTeX conversion task

CONVERSION_TOKEN = 'LaTeX:'


# This is our "training prompt" that we want GPT2 to recognize and learn
training_examples = f'{CONVERSION_PROMPT}English: ' + data['English'] + '\n' + CONVERSION_TOKEN + ' ' + data['LaTeX'].astype(str)

print(training_examples[0])


Convert English to LaTeX
English: integral from a to b of x squared
LaTeX: \int_{a}^{b} x^2 \,dx


In [6]:
task_df = pd.DataFrame({'text': training_examples})

task_df.head(2)

Unnamed: 0,text
0,Convert English to LaTeX\nEnglish: integral fr...
1,Convert English to LaTeX\nEnglish: integral fr...


In [7]:
# adding the EOS token at the end so the model knows when to stop predicting

task_df['text'] = task_df['text'].map(lambda x: f'{x}{tokenizer.eos_token}')
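At inference time the same template can be reused with the LaTeX part left empty, so that the model's continuation is the translation. The helpers below are a minimal sketch of that idea: `build_inference_prompt` and `extract_latex` are hypothetical names that do not appear in the notebook, and the EOS string is assumed to be GPT-2's `<|endoftext|>`.

```python
CONVERSION_PROMPT = 'Convert English to LaTeX\n'
CONVERSION_TOKEN = 'LaTeX:'
EOS_TOKEN = '<|endoftext|>'  # assumption: GPT-2's end-of-text marker


def build_inference_prompt(english: str) -> str:
    """Format an English phrase like a training example, stopping right after 'LaTeX:'."""
    return f'{CONVERSION_PROMPT}English: {english}\n{CONVERSION_TOKEN}'


def extract_latex(generated: str) -> str:
    """Return the text after the first 'LaTeX:' marker, cut at the EOS token if present."""
    _, _, answer = generated.partition(CONVERSION_TOKEN)
    return answer.split(EOS_TOKEN, 1)[0].strip()


prompt = build_inference_prompt('f of x equals x squared')
print(prompt)
```

Keeping the inference prompt byte-for-byte identical to the training template matters here, because with only 50 examples the model has little chance to learn variations of the format.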

In [8]:
latex_data = Dataset.from_pandas(task_df)  # turn a pandas DataFrame into a Dataset



In [9]:
latex_data

Dataset({
    features: ['text'],
    num_rows: 50
})

In [10]:
def preprocess(examples):  
    # tokenize our text but don't pad because our collator will pad for us dynamically
    return tokenizer(examples['text'], truncation=True)

latex_data = latex_data.map(preprocess, batched=True)

latex_data = latex_data.train_test_split(train_size=.8)

Map:   0%|          | 0/50 [00:00<?, ? examples/s]

In [11]:
latex_data

DatasetDict({
    train: Dataset({
        features: ['text', 'input_ids', 'attention_mask'],
        num_rows: 40
    })
    test: Dataset({
        features: ['text', 'input_ids', 'attention_mask'],
        num_rows: 10
    })
})
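Note that the split above is not seeded, so which 40 rows end up in the training set can change between runs (`Dataset.train_test_split` accepts a `seed` argument for this). The following pure-Python sketch only illustrates the idea of a reproducible 80/20 split; `split_indices` is a hypothetical helper, not the library's implementation.

```python
import random


def split_indices(n_rows: int, train_size: float, seed: int):
    """Shuffle row indices with a fixed seed and cut them into train/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_size)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(50, 0.8, seed=42)
print(len(train_idx), len(test_idx))  # 40 10
```

With a test set of only 10 examples, a fixed seed makes it easier to compare evaluation losses across experiments.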

In [12]:
latex_data['train'][0]

{'text': 'Convert English to LaTeX\nEnglish: y squared over x^2\nLaTeX: \\frac{y^2}{x^2}<|endoftext|>',
 'input_ids': [3103,
  1851,
  3594,
  284,
  4689,
  49568,
  198,
  15823,
  25,
  331,
  44345,
  625,
  2124,
  61,
  17,
  198,
  14772,
  49568,
  25,
  3467,
  31944,
  90,
  88,
  61,
  17,
  18477,
  87,
  61,
  17,
  92,
  50256],
 'attention_mask': [1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1]}

In [13]:
# standard data collator for auto-regressive language modelling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
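With `mlm=False`, the collator pads each batch to its longest sequence and uses a copy of the input IDs as labels, leaving the one-position shift to the model. The sketch below is a simplified pure-Python approximation of that behaviour, not the library's code; `collate_causal_lm` is a hypothetical name. One difference worth verifying for your installed version: the library, to our knowledge, masks every label equal to the pad token ID, which can include the real EOS token when the pad and EOS tokens share an ID as they do here.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def collate_causal_lm(batch, pad_token_id):
    """Pad token-id lists to the longest one and build labels for next-token loss."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels}


batch = collate_causal_lm([[3103, 1851, 50256], [3103, 50256]], pad_token_id=50256)
print(batch['labels'])
```

Padding per batch rather than to a global maximum is why the `preprocess` function does not pad: short examples are not inflated to the length of the longest example in the dataset.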

In [14]:
latex_gpt2 = AutoModelForCausalLM.from_pretrained(MODEL)

In [15]:
latex_gpt2

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

# Attempt 1 at fine-tuning GPT-2 on a LaTeX conversion task

In [18]:
training_args = TrainingArguments(
    output_dir="./english_to_latex",
    overwrite_output_dir=True, # overwrite the content of the output directory
    num_train_epochs=5, # number of training epochs
    per_device_train_batch_size=1, # batch size for training
    per_device_eval_batch_size=20,  # batch size for evaluation
    load_best_model_at_end=True,
    logging_steps=5,
    log_level='info',
    evaluation_strategy='epoch',
    save_strategy='epoch',
    #use_mps_device=True
)

trainer = Trainer(
    model=latex_gpt2,
    args=training_args,
    train_dataset=latex_data["train"],
    eval_dataset=latex_data["test"],
    data_collator=data_collator,
)

trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


  0%|          | 0/1 [00:00<?, ?it/s]

Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: gagfaster. Use `wandb login --relogin` to force relogin


{'eval_loss': 4.203288555145264,
 'eval_runtime': 3.4476,
 'eval_samples_per_second': 2.901,
 'eval_steps_per_second': 0.29}
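The `eval_loss` above is a mean per-token cross-entropy (in nats), so it can be easier to interpret as a perplexity. The snippet below is a small worked conversion; the `perplexity` helper is our own illustrative name.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


initial_eval_loss = 4.203288555145264  # eval_loss reported above, before any fine-tuning
print(f'perplexity before fine-tuning: {perplexity(initial_eval_loss):.1f}')
```

A perplexity in the high sixties means that, before fine-tuning, stock GPT-2 is on average about as uncertain as if it were choosing uniformly among several dozen tokens at each position of our prompt format.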

In [19]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 40
  Num Epochs = 5
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 200
  Number of trainable parameters = 124,439,808


  0%|          | 0/200 [00:00<?, ?it/s]

{'loss': 4.1111, 'learning_rate': 4.875e-05, 'epoch': 0.12}
{'loss': 2.5837, 'learning_rate': 4.75e-05, 'epoch': 0.25}
{'loss': 1.6822, 'learning_rate': 4.6250000000000006e-05, 'epoch': 0.38}
{'loss': 1.7097, 'learning_rate': 4.5e-05, 'epoch': 0.5}
{'loss': 1.3972, 'learning_rate': 4.375e-05, 'epoch': 0.62}
{'loss': 1.2991, 'learning_rate': 4.25e-05, 'epoch': 0.75}
{'loss': 1.1156, 'learning_rate': 4.125e-05, 'epoch': 0.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 1.2092, 'learning_rate': 4e-05, 'epoch': 1.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./english_to_latex\checkpoint-40
Configuration saved in ./english_to_latex\checkpoint-40\config.json
Configuration saved in ./english_to_latex\checkpoint-40\generation_config.json


{'eval_loss': 0.9055206179618835, 'eval_runtime': 2.8575, 'eval_samples_per_second': 3.5, 'eval_steps_per_second': 0.35, 'epoch': 1.0}


Model weights saved in ./english_to_latex\checkpoint-40\pytorch_model.bin


{'loss': 0.6179, 'learning_rate': 3.875e-05, 'epoch': 1.12}
{'loss': 0.8465, 'learning_rate': 3.7500000000000003e-05, 'epoch': 1.25}
{'loss': 0.5951, 'learning_rate': 3.625e-05, 'epoch': 1.38}
{'loss': 0.82, 'learning_rate': 3.5e-05, 'epoch': 1.5}
{'loss': 0.6109, 'learning_rate': 3.375000000000001e-05, 'epoch': 1.62}
{'loss': 0.7012, 'learning_rate': 3.2500000000000004e-05, 'epoch': 1.75}
{'loss': 0.9065, 'learning_rate': 3.125e-05, 'epoch': 1.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 0.811, 'learning_rate': 3e-05, 'epoch': 2.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./english_to_latex\checkpoint-80
Configuration saved in ./english_to_latex\checkpoint-80\config.json
Configuration saved in ./english_to_latex\checkpoint-80\generation_config.json


{'eval_loss': 0.7492607831954956, 'eval_runtime': 2.6741, 'eval_samples_per_second': 3.74, 'eval_steps_per_second': 0.374, 'epoch': 2.0}


Model weights saved in ./english_to_latex\checkpoint-80\pytorch_model.bin


{'loss': 0.7313, 'learning_rate': 2.8749999999999997e-05, 'epoch': 2.12}
{'loss': 0.4878, 'learning_rate': 2.7500000000000004e-05, 'epoch': 2.25}
{'loss': 0.6402, 'learning_rate': 2.625e-05, 'epoch': 2.38}
{'loss': 0.3136, 'learning_rate': 2.5e-05, 'epoch': 2.5}
{'loss': 0.448, 'learning_rate': 2.375e-05, 'epoch': 2.62}
{'loss': 0.455, 'learning_rate': 2.25e-05, 'epoch': 2.75}
{'loss': 0.5764, 'learning_rate': 2.125e-05, 'epoch': 2.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 0.6171, 'learning_rate': 2e-05, 'epoch': 3.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./english_to_latex\checkpoint-120
Configuration saved in ./english_to_latex\checkpoint-120\config.json
Configuration saved in ./english_to_latex\checkpoint-120\generation_config.json


{'eval_loss': 0.7866764068603516, 'eval_runtime': 2.614, 'eval_samples_per_second': 3.826, 'eval_steps_per_second': 0.383, 'epoch': 3.0}


Model weights saved in ./english_to_latex\checkpoint-120\pytorch_model.bin


{'loss': 0.5864, 'learning_rate': 1.8750000000000002e-05, 'epoch': 3.12}
{'loss': 0.5278, 'learning_rate': 1.75e-05, 'epoch': 3.25}
{'loss': 0.511, 'learning_rate': 1.6250000000000002e-05, 'epoch': 3.38}
{'loss': 0.428, 'learning_rate': 1.5e-05, 'epoch': 3.5}
{'loss': 0.3756, 'learning_rate': 1.3750000000000002e-05, 'epoch': 3.62}
{'loss': 0.5387, 'learning_rate': 1.25e-05, 'epoch': 3.75}
{'loss': 0.4331, 'learning_rate': 1.125e-05, 'epoch': 3.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 0.5542, 'learning_rate': 1e-05, 'epoch': 4.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./english_to_latex\checkpoint-160
Configuration saved in ./english_to_latex\checkpoint-160\config.json
Configuration saved in ./english_to_latex\checkpoint-160\generation_config.json


{'eval_loss': 0.7714928388595581, 'eval_runtime': 2.449, 'eval_samples_per_second': 4.083, 'eval_steps_per_second': 0.408, 'epoch': 4.0}


Model weights saved in ./english_to_latex\checkpoint-160\pytorch_model.bin


{'loss': 0.4106, 'learning_rate': 8.75e-06, 'epoch': 4.12}
{'loss': 0.3094, 'learning_rate': 7.5e-06, 'epoch': 4.25}
{'loss': 0.2339, 'learning_rate': 6.25e-06, 'epoch': 4.38}
{'loss': 0.3559, 'learning_rate': 5e-06, 'epoch': 4.5}
{'loss': 0.4612, 'learning_rate': 3.75e-06, 'epoch': 4.62}
{'loss': 0.479, 'learning_rate': 2.5e-06, 'epoch': 4.75}
{'loss': 0.4318, 'learning_rate': 1.25e-06, 'epoch': 4.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 0.4322, 'learning_rate': 0.0, 'epoch': 5.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./english_to_latex\checkpoint-200
Configuration saved in ./english_to_latex\checkpoint-200\config.json
Configuration saved in ./english_to_latex\checkpoint-200\generation_config.json


{'eval_loss': 0.7662519812583923, 'eval_runtime': 2.3396, 'eval_samples_per_second': 4.274, 'eval_steps_per_second': 0.427, 'epoch': 5.0}


Model weights saved in ./english_to_latex\checkpoint-200\pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./english_to_latex\checkpoint-80 (score: 0.7492607831954956).


{'train_runtime': 527.7856, 'train_samples_per_second': 0.379, 'train_steps_per_second': 0.379, 'train_loss': 0.8088667315244674, 'epoch': 5.0}


TrainOutput(global_step=200, training_loss=0.8088667315244674, metrics={'train_runtime': 527.7856, 'train_samples_per_second': 0.379, 'train_steps_per_second': 0.379, 'train_loss': 0.8088667315244674, 'epoch': 5.0})
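Because `load_best_model_at_end=True` is set and no other metric is named, the Trainer keeps the checkpoint with the lowest evaluation loss. The sketch below reproduces that selection from the `eval_loss` values logged above; `best_checkpoint` is a hypothetical helper written for illustration.

```python
# eval_loss per checkpoint, copied from the training log above
eval_losses = {
    'checkpoint-40': 0.9055206179618835,
    'checkpoint-80': 0.7492607831954956,
    'checkpoint-120': 0.7866764068603516,
    'checkpoint-160': 0.7714928388595581,
    'checkpoint-200': 0.7662519812583923,
}


def best_checkpoint(losses: dict) -> str:
    """Return the checkpoint name with the lowest evaluation loss."""
    return min(losses, key=losses.get)


print(best_checkpoint(eval_losses))  # checkpoint-80
```

The training loss keeps falling after epoch 2 while the evaluation loss does not improve on it, which is the usual sign of overfitting on a dataset this small.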

In [21]:
book_data = TextDataset(
    tokenizer=tokenizer,
    file_path='latex-guide-cos423.txt',  # a LaTeX cheat sheet used as extra pre-training text
    block_size=128
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False,  # mlm=False: causal (next-token) LM rather than masked LM
)

latex_gpt2 = AutoModelForCausalLM.from_pretrained(MODEL)

training_args = TrainingArguments(
    output_dir="./math_book",
    overwrite_output_dir=True, # overwrite the content of the output directory
    num_train_epochs=10, # number of training epochs
    per_device_train_batch_size=2, # batch size for training
    per_device_eval_batch_size=32,  # batch size for evaluation
    load_best_model_at_end=True,
    logging_steps=10,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    #use_mps_device=True
)

trainer = Trainer(
    model=latex_gpt2,
    args=training_args,
    data_collator=data_collator,
    train_dataset=book_data.examples[:int(len(book_data.examples)*.8)],
    eval_dataset=book_data.examples[int(len(book_data.examples)*.8):]
)

Loading features from cached file cached_lm_GPT2TokenizerFast_128_latex-guide-cos423.txt [took 0.100 s]
loading configuration file config.json from cache at C:\Users\user/.cache\huggingface\hub\models--gpt2\snapshots\11c5a3d5811f50298f278a704980280950aedb10\config.json
Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_par

In [22]:
trainer.evaluate()  # initial loss for the cheat sheet

***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


{'eval_loss': 2.3705146312713623,
 'eval_runtime': 9.7187,
 'eval_samples_per_second': 1.235,
 'eval_steps_per_second': 0.103}
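The 12 evaluation examples reported above, together with the 47 training examples in the training log, are consistent with the cheat sheet being cut into 59 blocks of 128 tokens and then sliced 80/20 in file order. The sketch below illustrates that chunking and slicing under stated assumptions: the token stream is a stand-in of made-up length (the real token count is not given), and dropping the trailing partial block reflects our understanding of `TextDataset` rather than a quoted implementation.

```python
def chunk_into_blocks(token_ids, block_size):
    """Split a flat token list into full blocks, dropping any trailing partial block."""
    return [
        token_ids[start:start + block_size]
        for start in range(0, len(token_ids) - block_size + 1, block_size)
    ]


def split_80_20(examples):
    """Mirror the slicing used above: first 80% for training, the rest for evaluation."""
    cut = int(len(examples) * .8)
    return examples[:cut], examples[cut:]


# Stand-in token stream: 59 full blocks of 128 plus a partial block (hypothetical length).
fake_tokens = list(range(59 * 128 + 17))
blocks = chunk_into_blocks(fake_tokens, block_size=128)
train, evaluation = split_80_20(blocks)
print(len(blocks), len(train), len(evaluation))  # 59 47 12
```

Since the slice is not shuffled, the evaluation set is simply the last fifth of the document, so its content may differ in style from the training portion.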

In [23]:
trainer.train()

***** Running training *****
  Num examples = 47
  Num Epochs = 10
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 240
  Number of trainable parameters = 124,439,808


  0%|          | 0/240 [00:00<?, ?it/s]

{'loss': 2.3348, 'learning_rate': 4.791666666666667e-05, 'epoch': 0.42}
{'loss': 1.8919, 'learning_rate': 4.5833333333333334e-05, 'epoch': 0.83}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-24
Configuration saved in ./math_book\checkpoint-24\config.json
Configuration saved in ./math_book\checkpoint-24\generation_config.json


{'eval_loss': 1.800382137298584, 'eval_runtime': 9.1703, 'eval_samples_per_second': 1.309, 'eval_steps_per_second': 0.109, 'epoch': 1.0}


Model weights saved in ./math_book\checkpoint-24\pytorch_model.bin


{'loss': 1.5392, 'learning_rate': 4.375e-05, 'epoch': 1.25}
{'loss': 1.7714, 'learning_rate': 4.166666666666667e-05, 'epoch': 1.67}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-48
Configuration saved in ./math_book\checkpoint-48\config.json
Configuration saved in ./math_book\checkpoint-48\generation_config.json


{'eval_loss': 1.7820323705673218, 'eval_runtime': 8.4437, 'eval_samples_per_second': 1.421, 'eval_steps_per_second': 0.118, 'epoch': 2.0}


Model weights saved in ./math_book\checkpoint-48\pytorch_model.bin


{'loss': 1.758, 'learning_rate': 3.958333333333333e-05, 'epoch': 2.08}
{'loss': 1.3062, 'learning_rate': 3.7500000000000003e-05, 'epoch': 2.5}
{'loss': 1.42, 'learning_rate': 3.541666666666667e-05, 'epoch': 2.92}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-72
Configuration saved in ./math_book\checkpoint-72\config.json
Configuration saved in ./math_book\checkpoint-72\generation_config.json


{'eval_loss': 1.7941612005233765, 'eval_runtime': 9.2241, 'eval_samples_per_second': 1.301, 'eval_steps_per_second': 0.108, 'epoch': 3.0}


Model weights saved in ./math_book\checkpoint-72\pytorch_model.bin


{'loss': 0.9094, 'learning_rate': 3.3333333333333335e-05, 'epoch': 3.33}
{'loss': 1.2751, 'learning_rate': 3.125e-05, 'epoch': 3.75}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-96
Configuration saved in ./math_book\checkpoint-96\config.json
Configuration saved in ./math_book\checkpoint-96\generation_config.json


{'eval_loss': 1.8194289207458496, 'eval_runtime': 9.0477, 'eval_samples_per_second': 1.326, 'eval_steps_per_second': 0.111, 'epoch': 4.0}


Model weights saved in ./math_book\checkpoint-96\pytorch_model.bin


{'loss': 1.1598, 'learning_rate': 2.916666666666667e-05, 'epoch': 4.17}
{'loss': 0.9004, 'learning_rate': 2.7083333333333332e-05, 'epoch': 4.58}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


{'loss': 1.2563, 'learning_rate': 2.5e-05, 'epoch': 5.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-120
Configuration saved in ./math_book\checkpoint-120\config.json
Configuration saved in ./math_book\checkpoint-120\generation_config.json


{'eval_loss': 1.8632580041885376, 'eval_runtime': 9.1643, 'eval_samples_per_second': 1.309, 'eval_steps_per_second': 0.109, 'epoch': 5.0}


Model weights saved in ./math_book\checkpoint-120\pytorch_model.bin


{'loss': 0.8823, 'learning_rate': 2.2916666666666667e-05, 'epoch': 5.42}
{'loss': 1.0693, 'learning_rate': 2.0833333333333336e-05, 'epoch': 5.83}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-144
Configuration saved in ./math_book\checkpoint-144\config.json
Configuration saved in ./math_book\checkpoint-144\generation_config.json


{'eval_loss': 1.9114995002746582, 'eval_runtime': 8.7895, 'eval_samples_per_second': 1.365, 'eval_steps_per_second': 0.114, 'epoch': 6.0}


Model weights saved in ./math_book\checkpoint-144\pytorch_model.bin


{'loss': 0.9486, 'learning_rate': 1.8750000000000002e-05, 'epoch': 6.25}
{'loss': 0.9845, 'learning_rate': 1.6666666666666667e-05, 'epoch': 6.67}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-168
Configuration saved in ./math_book\checkpoint-168\config.json
Configuration saved in ./math_book\checkpoint-168\generation_config.json


{'eval_loss': 1.9399720430374146, 'eval_runtime': 9.2008, 'eval_samples_per_second': 1.304, 'eval_steps_per_second': 0.109, 'epoch': 7.0}


Model weights saved in ./math_book\checkpoint-168\pytorch_model.bin


{'loss': 0.7079, 'learning_rate': 1.4583333333333335e-05, 'epoch': 7.08}
{'loss': 0.7384, 'learning_rate': 1.25e-05, 'epoch': 7.5}
{'loss': 0.7703, 'learning_rate': 1.0416666666666668e-05, 'epoch': 7.92}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-192
Configuration saved in ./math_book\checkpoint-192\config.json
Configuration saved in ./math_book\checkpoint-192\generation_config.json


{'eval_loss': 1.9770740270614624, 'eval_runtime': 8.6418, 'eval_samples_per_second': 1.389, 'eval_steps_per_second': 0.116, 'epoch': 8.0}


Model weights saved in ./math_book\checkpoint-192\pytorch_model.bin


{'loss': 0.9469, 'learning_rate': 8.333333333333334e-06, 'epoch': 8.33}
{'loss': 0.7242, 'learning_rate': 6.25e-06, 'epoch': 8.75}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-216
Configuration saved in ./math_book\checkpoint-216\config.json
Configuration saved in ./math_book\checkpoint-216\generation_config.json


{'eval_loss': 1.9836373329162598, 'eval_runtime': 9.1966, 'eval_samples_per_second': 1.305, 'eval_steps_per_second': 0.109, 'epoch': 9.0}


Model weights saved in ./math_book\checkpoint-216\pytorch_model.bin


{'loss': 0.707, 'learning_rate': 4.166666666666667e-06, 'epoch': 9.17}
{'loss': 0.655, 'learning_rate': 2.0833333333333334e-06, 'epoch': 9.58}


***** Running Evaluation *****
  Num examples = 12
  Batch size = 32


{'loss': 0.7198, 'learning_rate': 0.0, 'epoch': 10.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_book\checkpoint-240
Configuration saved in ./math_book\checkpoint-240\config.json
Configuration saved in ./math_book\checkpoint-240\generation_config.json


{'eval_loss': 1.9988597631454468, 'eval_runtime': 9.1585, 'eval_samples_per_second': 1.31, 'eval_steps_per_second': 0.109, 'epoch': 10.0}


Model weights saved in ./math_book\checkpoint-240\pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./math_book\checkpoint-48 (score: 1.7820323705673218).


{'train_runtime': 1540.3496, 'train_samples_per_second': 0.305, 'train_steps_per_second': 0.156, 'train_loss': 1.1407005727291106, 'epoch': 10.0}


TrainOutput(global_step=240, training_loss=1.1407005727291106, metrics={'train_runtime': 1540.3496, 'train_samples_per_second': 0.305, 'train_steps_per_second': 0.156, 'train_loss': 1.1407005727291106, 'epoch': 10.0})
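The step counts and learning rates in the logs follow directly from the training arguments. The sketch below recomputes them, assuming the default base learning rate of 5e-5 and a linear decay to zero with no warmup, which matches the logged values; `total_steps` and `linear_lr` are illustrative helpers, not library functions.

```python
import math


def total_steps(num_examples, batch_size, epochs):
    """Optimizer steps with no gradient accumulation on a single device."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step, num_steps, base_lr=5e-5):
    """Learning rate under linear decay to zero with no warmup."""
    return base_lr * (num_steps - step) / num_steps


print(total_steps(40, 1, 5))    # 200
print(total_steps(47, 2, 10))   # 240
print(linear_lr(10, 240))       # close to the 4.7917e-05 logged at step 10
```

The odd-sized cheat-sheet run gets 24 steps per epoch because the last batch of each epoch holds a single example.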

In [24]:
trainer.save_model()

Saving model checkpoint to ./math_book
Configuration saved in ./math_book\config.json
Configuration saved in ./math_book\generation_config.json
Model weights saved in ./math_book\pytorch_model.bin


# Restart training now with our own pre-trained "foundation" model

In [26]:
# load our GPT-2 that was further pre-trained on the LaTeX cheat sheet
math_latex_gpt2 = AutoModelForCausalLM.from_pretrained('./math_book')

training_args = TrainingArguments(
    output_dir="./math_english_to_latex",
    overwrite_output_dir=True, #overwrite the content of the output directory
    num_train_epochs=5, # number of training epochs
    per_device_train_batch_size=1, # batch size for training
    per_device_eval_batch_size=20,  # batch size for evaluation
    load_best_model_at_end=True,
    logging_steps=5,
    log_level='info',
    evaluation_strategy='epoch',
    save_strategy='epoch',
    #use_mps_device=True
)

trainer = Trainer(
    model=math_latex_gpt2,
    args=training_args,
    train_dataset=latex_data["train"],
    eval_dataset=latex_data["test"],
    data_collator=data_collator,
)

trainer.evaluate()  # loss is starting slightly lower than before

loading configuration file ./math_book\config.json
Model config GPT2Config {
  "_name_or_path": "./math_book",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 50257
}

loading we

Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.30.2"
}

All model checkpoint weights were used when initializing GPT2LMHeadModel.

All the weights of GPT2LMHeadModel were initialized from the model checkpoint at ./math_book.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
loading configuration file ./math_book\generation_config.json
Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.30.2"
}

Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
PyTorch: setting up 

  0%|          | 0/1 [00:00<?, ?it/s]

Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


{'eval_loss': 3.7863895893096924,
 'eval_runtime': 4.7473,
 'eval_samples_per_second': 2.106,
 'eval_steps_per_second': 0.211}
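To put "slightly lower" into numbers, the two starting losses can be compared directly, since both were measured on the same 10 held-out examples. A small worked calculation, with `relative_drop` as an illustrative helper:

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


gpt2_start = 4.203288555145264        # initial eval_loss, stock GPT-2
math_gpt2_start = 3.7863895893096924  # initial eval_loss, after cheat-sheet pre-training

print(f'{relative_drop(gpt2_start, math_gpt2_start):.1%}')
```

This works out to roughly a tenth lower before any task-specific training, though with only 10 evaluation examples the figure should be read as indicative rather than precise.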

In [27]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 40
  Num Epochs = 5
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 200
  Number of trainable parameters = 124,439,808


  0%|          | 0/200 [00:00<?, ?it/s]

{'loss': 3.7394, 'learning_rate': 4.875e-05, 'epoch': 0.12}
{'loss': 2.262, 'learning_rate': 4.75e-05, 'epoch': 0.25}
{'loss': 1.4936, 'learning_rate': 4.6250000000000006e-05, 'epoch': 0.38}
{'loss': 1.7374, 'learning_rate': 4.5e-05, 'epoch': 0.5}
{'loss': 1.3354, 'learning_rate': 4.375e-05, 'epoch': 0.62}
{'loss': 1.2795, 'learning_rate': 4.25e-05, 'epoch': 0.75}
{'loss': 1.0899, 'learning_rate': 4.125e-05, 'epoch': 0.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 1.1859, 'learning_rate': 4e-05, 'epoch': 1.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_english_to_latex\checkpoint-40
Configuration saved in ./math_english_to_latex\checkpoint-40\config.json
Configuration saved in ./math_english_to_latex\checkpoint-40\generation_config.json


{'eval_loss': 0.9071900248527527, 'eval_runtime': 2.5106, 'eval_samples_per_second': 3.983, 'eval_steps_per_second': 0.398, 'epoch': 1.0}


Model weights saved in ./math_english_to_latex\checkpoint-40\pytorch_model.bin


{'loss': 0.5503, 'learning_rate': 3.875e-05, 'epoch': 1.12}
{'loss': 0.7844, 'learning_rate': 3.7500000000000003e-05, 'epoch': 1.25}
{'loss': 0.555, 'learning_rate': 3.625e-05, 'epoch': 1.38}
{'loss': 0.7919, 'learning_rate': 3.5e-05, 'epoch': 1.5}
{'loss': 0.5989, 'learning_rate': 3.375000000000001e-05, 'epoch': 1.62}
{'loss': 0.6916, 'learning_rate': 3.2500000000000004e-05, 'epoch': 1.75}
{'loss': 0.9072, 'learning_rate': 3.125e-05, 'epoch': 1.88}


The following columns in the evaluation set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text. If text are not expected by `GPT2LMHeadModel.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 10
  Batch size = 20


{'loss': 0.8636, 'learning_rate': 3e-05, 'epoch': 2.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to ./math_english_to_latex\checkpoint-80
Configuration saved in ./math_english_to_latex\checkpoint-80\config.json
Configuration saved in ./math_english_to_latex\checkpoint-80\generation_config.json


{'eval_loss': 0.7646149396896362, 'eval_runtime': 2.4512, 'eval_samples_per_second': 4.08, 'eval_steps_per_second': 0.408, 'epoch': 2.0}


Model weights saved in ./math_english_to_latex\checkpoint-80\pytorch_model.bin


{'loss': 0.7774, 'learning_rate': 2.8749999999999997e-05, 'epoch': 2.12}
{'loss': 0.4572, 'learning_rate': 2.7500000000000004e-05, 'epoch': 2.25}
{'loss': 0.6182, 'learning_rate': 2.625e-05, 'epoch': 2.38}
{'loss': 0.3167, 'learning_rate': 2.5e-05, 'epoch': 2.5}
{'loss': 0.4562, 'learning_rate': 2.375e-05, 'epoch': 2.62}
{'loss': 0.4961, 'learning_rate': 2.25e-05, 'epoch': 2.75}
{'loss': 0.5313, 'learning_rate': 2.125e-05, 'epoch': 2.88}


{'loss': 0.5984, 'learning_rate': 2e-05, 'epoch': 3.0}


Saving model checkpoint to ./math_english_to_latex\checkpoint-120
Configuration saved in ./math_english_to_latex\checkpoint-120\config.json
Configuration saved in ./math_english_to_latex\checkpoint-120\generation_config.json


{'eval_loss': 0.7987378835678101, 'eval_runtime': 2.5266, 'eval_samples_per_second': 3.958, 'eval_steps_per_second': 0.396, 'epoch': 3.0}


Model weights saved in ./math_english_to_latex\checkpoint-120\pytorch_model.bin


{'loss': 0.6198, 'learning_rate': 1.8750000000000002e-05, 'epoch': 3.12}
{'loss': 0.5154, 'learning_rate': 1.75e-05, 'epoch': 3.25}
{'loss': 0.5097, 'learning_rate': 1.6250000000000002e-05, 'epoch': 3.38}
{'loss': 0.4074, 'learning_rate': 1.5e-05, 'epoch': 3.5}
{'loss': 0.4055, 'learning_rate': 1.3750000000000002e-05, 'epoch': 3.62}
{'loss': 0.5098, 'learning_rate': 1.25e-05, 'epoch': 3.75}
{'loss': 0.4168, 'learning_rate': 1.125e-05, 'epoch': 3.88}


{'loss': 0.512, 'learning_rate': 1e-05, 'epoch': 4.0}


Saving model checkpoint to ./math_english_to_latex\checkpoint-160
Configuration saved in ./math_english_to_latex\checkpoint-160\config.json
Configuration saved in ./math_english_to_latex\checkpoint-160\generation_config.json


{'eval_loss': 0.7477323412895203, 'eval_runtime': 2.5146, 'eval_samples_per_second': 3.977, 'eval_steps_per_second': 0.398, 'epoch': 4.0}


Model weights saved in ./math_english_to_latex\checkpoint-160\pytorch_model.bin


{'loss': 0.4391, 'learning_rate': 8.75e-06, 'epoch': 4.12}
{'loss': 0.3068, 'learning_rate': 7.5e-06, 'epoch': 4.25}
{'loss': 0.2539, 'learning_rate': 6.25e-06, 'epoch': 4.38}
{'loss': 0.35, 'learning_rate': 5e-06, 'epoch': 4.5}
{'loss': 0.4575, 'learning_rate': 3.75e-06, 'epoch': 4.62}
{'loss': 0.4442, 'learning_rate': 2.5e-06, 'epoch': 4.75}
{'loss': 0.3808, 'learning_rate': 1.25e-06, 'epoch': 4.88}


{'loss': 0.4189, 'learning_rate': 0.0, 'epoch': 5.0}


Saving model checkpoint to ./math_english_to_latex\checkpoint-200
Configuration saved in ./math_english_to_latex\checkpoint-200\config.json
Configuration saved in ./math_english_to_latex\checkpoint-200\generation_config.json


{'eval_loss': 0.7468925714492798, 'eval_runtime': 2.5896, 'eval_samples_per_second': 3.862, 'eval_steps_per_second': 0.386, 'epoch': 5.0}


Model weights saved in ./math_english_to_latex\checkpoint-200\pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./math_english_to_latex\checkpoint-200 (score: 0.7468925714492798).


{'train_runtime': 559.3815, 'train_samples_per_second': 0.358, 'train_steps_per_second': 0.358, 'train_loss': 0.7766265428066254, 'epoch': 5.0}


TrainOutput(global_step=200, training_loss=0.7766265428066254, metrics={'train_runtime': 559.3815, 'train_samples_per_second': 0.358, 'train_steps_per_second': 0.358, 'train_loss': 0.7766265428066254, 'epoch': 5.0})
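
The `learning_rate` values in the log above drop by the same amount at every logging step and reach 0.0 at epoch 5.0, which is consistent with a linear decay to zero with no warmup. The sketch below reproduces a few of the logged values under that reading. The starting rate of 5e-5 and the absence of warmup are assumptions inferred from the log, not read from the `TrainingArguments` used in this notebook; `linear_decay_lr` is a hypothetical helper.

```python
# Sketch: reproduce the logged learning rates under an ASSUMED linear-decay schedule.
# initial_lr=5e-5 and "no warmup" are inferred from the log, not from the notebook's config.

def linear_decay_lr(step, total_steps=200, initial_lr=5e-5):
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    remaining = max(0, total_steps - step)
    return initial_lr * remaining / total_steps

steps_per_epoch = 40  # checkpoints in the log were saved at steps 40, 80, ..., 200

for epoch in (2.0, 3.0, 4.0, 5.0):
    step = int(epoch * steps_per_epoch)
    print(f"epoch {epoch}: lr = {linear_decay_lr(step):.3e}")
```

Step 45 (epoch 1.12 in the log) gives 3.875e-05 under this formula, matching the first logged value after the epoch-1 checkpoint.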

In [28]:
trainer.save_model()  # save this model

Saving model checkpoint to ./math_english_to_latex
Configuration saved in ./math_english_to_latex\config.json
Configuration saved in ./math_english_to_latex\generation_config.json
Model weights saved in ./math_english_to_latex\pytorch_model.bin
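
The training log reports that the best model was loaded from `checkpoint-200` before this save. As a quick check, the sketch below picks the checkpoint with the lowest evaluation loss from the values printed in the log (rounded to four decimals). It assumes that "best" means lowest `eval_loss`, which is what the logged score suggests; the dictionary is typed in by hand from the log, not read from the trainer.

```python
# Evaluation losses copied by hand from the training log (rounded to 4 decimals),
# keyed by the global step of the checkpoint saved right after each evaluation.
eval_loss_by_step = {
    40: 0.9072,
    80: 0.7646,
    120: 0.7987,
    160: 0.7477,
    200: 0.7469,
}

# Lower evaluation loss is better, so the best checkpoint minimizes it.
best_step = min(eval_loss_by_step, key=eval_loss_by_step.get)
print(f"best checkpoint: checkpoint-{best_step} "
      f"(eval_loss={eval_loss_by_step[best_step]})")
```

Note how small the margin is: the evaluation loss barely moves between steps 160 and 200 while the training loss keeps falling, which is what one would expect from such a small dataset.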


In [29]:
loaded_model = AutoModelForCausalLM.from_pretrained('./math_english_to_latex')
latex_generator = pipeline('text-generation', model=loaded_model, tokenizer=tokenizer)

text_sample = 'g of x equals integral from 0 to 1 of x squared'
conversion_text_sample = f'{CONVERSION_PROMPT}English: {text_sample}\n{CONVERSION_TOKEN}'

print(latex_generator(
    conversion_text_sample, num_beams=2, early_stopping=True, temperature=0.7,
    max_new_tokens=24
)[0]['generated_text'])

Convert English to LaTeX
English: g of x equals integral from 0 to 1 of x squared
LaTeX: g(x) = \int_{0}^{1} x^2 \,dx^2 \,dx
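
The pipeline returns the prompt followed by the model's continuation, so the LaTeX has to be cut out of the full string before it can be used. The sketch below shows one way to do that; `extract_completion` is a hypothetical helper, and the prompt text is reconstructed from the printed output above rather than taken from the notebook's `CONVERSION_PROMPT` and `CONVERSION_TOKEN` definitions.

```python
def extract_completion(prompt, generated_text):
    """Return the first line the model generated after the prompt."""
    if not generated_text.startswith(prompt):
        raise ValueError("generated text does not start with the prompt")
    completion = generated_text[len(prompt):].strip()
    # Keep only the first line; anything after a newline is treated as run-on text.
    return completion.splitlines()[0] if completion else ""

# Prompt and output reconstructed from the cell above (assumed format).
prompt = ("Convert English to LaTeX\n"
          "English: g of x equals integral from 0 to 1 of x squared\n"
          "LaTeX:")
generated = prompt + r" g(x) = \int_{0}^{1} x^2 \,dx^2 \,dx"

print(extract_completion(prompt, generated))
```

This only isolates the answer. It does not repair it: the trailing `^2 \,dx` that the model repeated is still there.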


In [30]:
# Another example
text_sample = 'r of x is sum from 0 to x of x squared'
conversion_text_sample = f'{CONVERSION_PROMPT}English: {text_sample}\n{CONVERSION_TOKEN}'

print(latex_generator(
    conversion_text_sample, num_beams=5, early_stopping=True, temperature=0.7,
    max_length=len(tokenizer.encode(conversion_text_sample)) + 20
)[0]['generated_text'])

Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 50256,
  "do_sample": true,
  "eos_token_id": 50256,
  "max_length": 50,
  "transformers_version": "4.30.2"
}

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Convert English to LaTeX
English: r of x is sum from 0 to x of x squared
LaTeX: \frac{r}{x^2} x^2 \,dx^2} x^
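
This output is not valid LaTeX: it has a closing brace with no matching opening brace. A crude sanity check such as the one sketched below can flag outputs like this automatically. `braces_balanced` is a hypothetical helper, it ignores escaped braces such as `\{`, and passing it says nothing about whether the formula is the right one.

```python
def braces_balanced(latex):
    """Return True if every '{' has a matching '}' in the correct order."""
    depth = 0
    for ch in latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' appeared before its '{'
                return False
    return depth == 0

# The two generations shown so far, copied from the cell outputs.
first_output = r"g(x) = \int_{0}^{1} x^2 \,dx^2 \,dx"
second_output = r"\frac{r}{x^2} x^2 \,dx^2} x^"

print(braces_balanced(first_output))   # True
print(braces_balanced(second_output))  # False
```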


In [31]:
# Same sample, but passed without the task prompt used during fine-tuning
print(latex_generator(
    text_sample, num_beams=5, early_stopping=True, temperature=0.7,
    max_length=len(tokenizer.encode(conversion_text_sample)) + 20
)[0]['generated_text'])

r of x is sum from 0 to x of x squared
x^2 \,dx^2 \,dx^2 \,dx^2 \,dx^2 \,dx^2 \,dx^
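
Without the task prompt, the model has no cue to stop and simply repeats a fragment. Wrapping the prompt format in one function makes it harder to leave the prefix out by accident. The sketch below assumes the values of `CONVERSION_PROMPT` and `CONVERSION_TOKEN` that the printed outputs suggest; they are redefined here only to keep the example self-contained, and `build_prompt` is a hypothetical helper.

```python
# ASSUMED values, inferred from the generated text printed in earlier cells.
CONVERSION_PROMPT = "Convert English to LaTeX\n"
CONVERSION_TOKEN = "LaTeX:"

def build_prompt(english):
    """Wrap an English phrase in the same format used for the generation calls above."""
    return f"{CONVERSION_PROMPT}English: {english}\n{CONVERSION_TOKEN}"

print(build_prompt("r of x is sum from 0 to x of x squared"))
```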


In [32]:
# Try a few-shot prompt with standard (non-fine-tuned) GPT-2
non_finetuned_latex_generator = pipeline(
    'text-generation', model=AutoModelForCausalLM.from_pretrained(MODEL), tokenizer=tokenizer
)

# Raw strings keep the LaTeX backslashes literal; every shot repeats the task prompt
few_shot_prompt = CONVERSION_PROMPT + r"""English: f of x is sum from 0 to x of x squared
LaTeX: f(x) = \sum_{0}^{x} x^2
###
""" + CONVERSION_PROMPT + r"""English: f of x equals integral from 0 to pi of x to the fourth power
LaTeX: f(x) = \int_{0}^{\pi} x^4 \,dx
###
""" + CONVERSION_PROMPT + r"""English: pi to the 8th power
LaTeX:"""

print(non_finetuned_latex_generator(
    few_shot_prompt, num_beams=1, early_stopping=True, temperature=0.1,
    max_length=len(tokenizer.encode(few_shot_prompt)) + 20
)[0]['generated_text'])
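
Hand-written few-shot prompts are easy to get subtly wrong, for example through backslash escapes in a regular Python string. The sketch below assembles the same kind of prompt from a list of (English, LaTeX) pairs. `build_few_shot_prompt` and its parameters are hypothetical, and the task line and `###` separator are assumptions based on the prompt format used in this notebook.

```python
def build_few_shot_prompt(examples, query,
                          task="Convert English to LaTeX", separator="###"):
    """Join solved examples and one open query into a single few-shot prompt."""
    blocks = [
        f"{task}\nEnglish: {english}\nLaTeX: {latex}"
        for english, latex in examples
    ]
    # The final block ends at the conversion token so the model fills in the LaTeX.
    blocks.append(f"{task}\nEnglish: {query}\nLaTeX:")
    return f"\n{separator}\n".join(blocks)

# Raw strings keep the LaTeX backslashes literal.
examples = [
    ("f of x is sum from 0 to x of x squared", r"f(x) = \sum_{0}^{x} x^2"),
    ("f of x equals integral from 0 to pi of x to the fourth power",
     r"f(x) = \int_{0}^{\pi} x^4 \,dx"),
]

few_shot = build_few_shot_prompt(examples, "pi to the 8th power")
print(few_shot)
```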

In [80]:
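# Just ask standard GPT-2, using the same prompt format as the fine-tuned model
text_sample = 'pi to the 16th power'
conversion_text_sample = f'{CONVERSION_PROMPT}English: {text_sample}\n{CONVERSION_TOKEN}'

print(non_finetuned_latex_generator(
    conversion_text_sample, num_beams=1, early_stopping=True, temperature=0.1,
    max_length=len(tokenizer.encode(conversion_text_sample)) + 20
)[0]['generated_text'])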

Convert English to LaTeX
English: pi to the 16th power
LaTeX: pi to the 16th power
LaTeX: pi to the 16th power
LaTeX:
