# aitextgen Training Hello World

_Last Updated: Feb 21, 2021 (v.0.4.0)_

by Max Woolf

A "Hello World" Tutorial to show how training works with aitextgen, even on a CPU!

In [1]:
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen
import os, os.path



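First, download this [text file of Shakespeare's plays](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt) to the folder containing this notebook, then put the name of the downloaded file into the cell below. Any text file is handled the same way: the run recorded here sets `file_name` to `number_theory.json`, which is why the generated samples further down contain math problems rather than Shakespeare.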

In [2]:
file_name = "number_theory.json"



You can now train a custom Byte Pair Encoding Tokenizer on the downloaded text!

This will save one file: `aitextgen.tokenizer.json`, which contains the information needed to rebuild the tokenizer.

In [3]:
train_tokenizer(file_name)
tokenizer_file = 'aitextgen.tokenizer.json'
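To make the idea of Byte Pair Encoding concrete, here is a minimal, character-level sketch of the merge-learning loop. It is an illustration only, not what `train_tokenizer` runs internally; the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) are hypothetical and the toy text is made up.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(text, num_merges):
    """Learn up to `num_merges` merges from whitespace-split text (toy version)."""
    words = dict(Counter(tuple(w) for w in text.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, vocab_words = learn_bpe("low low low lower", num_merges=2)
print(merges)
```

Each merge turns the most frequent adjacent pair into a new vocabulary entry, so frequent character sequences end up as single tokens.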






`GPT2ConfigCPU()` is a mini variant of GPT-2 optimized for CPU-training.

For example, the number of input tokens here is 64, vs. 1024 for base GPT-2, which dramatically speeds up training.

In [4]:
config = GPT2ConfigCPU()
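One rough way to see why a 64-token input is so much cheaper than 1024: self-attention compares every position with every other position, so the score matrix grows with the square of the input length. The sketch below only counts those entries for a single attention head; it ignores every other difference between the two configs, and `attention_score_entries` is a hypothetical helper.

```python
def attention_score_entries(context_len):
    """Entries in one head's attention score matrix: one per (query, key) pair."""
    return context_len * context_len


small = attention_score_entries(64)
base = attention_score_entries(1024)
print(small, base, base // small)  # 4096 1048576 256
```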

Instantiate aitextgen using the created tokenizer and config.

In [5]:
ai = aitextgen(tokenizer_file=tokenizer_file, config=config)

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}



You can build datasets for training by creating `TokenDataset`s, which automatically process the dataset into blocks of the appropriate size.

In [6]:
data = TokenDataset(file_name, tokenizer_file=tokenizer_file, block_size=64)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
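The warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal way to do that from the notebook, assuming the cell runs before the tokenizer is first used (for example right after the imports):

```python
import os

# Disable tokenizers' internal parallelism up front so forked worker
# processes do not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```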


100%|██████████| 4378/4378 [00:00<00:00, 29892.30it/s]
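As a rough picture of what `block_size=64` means, the sketch below cuts a token sequence into fixed-length windows. It assumes overlapping windows that advance one token at a time; that is an assumption for illustration, and the real `TokenDataset` may slice differently. `sliding_blocks` is a hypothetical helper.

```python
def sliding_blocks(token_ids, block_size):
    """Return every contiguous window of `block_size` tokens."""
    return [token_ids[i:i + block_size]
            for i in range(len(token_ids) - block_size + 1)]


tokens = list(range(10))  # stand-in for real token IDs
blocks = sliding_blocks(tokens, block_size=4)
print(len(blocks), blocks[0], blocks[-1])
```

Each window is one training sample, so a shorter `block_size` gives the model less context per sample but makes each step cheaper.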


Train the model! It saves `pytorch_model.bin` to the `trained_model` folder periodically and again on completion. On a 2020 8-core iMac, this took ~25 minutes to run.

The configuration below processes 400,000 subsets of tokens (8 * 50000), which is about one pass through all the data (1 epoch). Ideally you'll want multiple passes through the data and a training loss below `2.0` for coherent output. That is harder to reach when training a model from scratch, but with long enough training you can get there!
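The arithmetic behind that estimate can be written out as a small helper. `samples_seen` and `approx_epochs` are hypothetical names, and the dataset size passed in the example is a placeholder chosen to match the "about one pass" statement, not a measured value.

```python
def samples_seen(batch_size, num_steps):
    """Total training samples processed: one batch per step."""
    return batch_size * num_steps


def approx_epochs(batch_size, num_steps, num_training_samples):
    """Approximate number of passes through a dataset of the given size."""
    return samples_seen(batch_size, num_steps) / num_training_samples


print(samples_seen(8, 50000))            # 400000
print(approx_epochs(8, 50000, 400_000))  # 1.0 (placeholder dataset size)
```

To target several passes, scale `num_steps` by the number of epochs you want while keeping `batch_size` fixed.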

In [17]:
ai.train(data, batch_size=8, num_steps=50000, generate_every=5000, save_every=500)

pytorch_model.bin already exists in /trained_model and will be overwritten!
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


  0%|          | 0/50000 [00:00<?, ?it/s]

Configuration saved in trained_model/generation_config.json

1,000 steps reached: saving model to /trained_model
Loss: 2.300 — Avg: 2.271:   2%|▏         | 1000/50000 [00:51<42:02, 19.43it/s]

1,500 steps reached: saving model to /trained_model
Loss: 2.180 — Avg: 2.250:   3%|▎         | 1500/50000 [01:09<37:39, 21.47it/s]

2,000 steps reached: saving model to /trained_model
Loss: 2.250 — Avg: 2.217:   4%|▍         | 2000/50000 [01:28<35:20, 22.64it/s]

2,500 steps reached: saving model to /trained_model
Loss: 2.270 — Avg: 2.224:   5%|▌         | 2500/50000 [01:46<33:49, 23.41it/s]

3,000 steps reached: saving model to /trained_model
Loss: 2.130 — Avg: 2.163:   6%|▌         | 3000/50000 [02:05<32:41, 23.96it/s]

3,500 steps reached: saving model to /trained_model
Loss: 2.210 — Avg: 2.175:   7%|▋         | 3500/50000 [02:23<31:47, 24.37it/s]

4,000 steps reached: saving model to /trained_model
Loss: 2.080 — Avg: 2.099:   8%|▊         | 4000/50000 [02:42<31:03, 24.68it/s]

4,500 steps reached: saving model to /trained_model
Loss: 2.120 — Avg: 2.093:   9%|▉         | 4500/50000 [03:00<30:24, 24.94it/s]

5,000 steps reached: saving model to /trained_model
Loss: 2.140 — Avg: 2.064:  10%|█         | 5000/50000 [03:18<29:49, 25.14it/s]

5,000 steps reached: generating sample texts.

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

s, we retrady to find the largest possible value of $S$ is $s$, and $s=5=30$, so $L_{7=8$, so $L_{L_{L_7}=30_{8-1=8+7+E=9$.

5,500 steps reached: saving model to /trained_model
Loss: 2.050 — Avg: 2.051:  11%|█         | 5500/50000 [03:37<29:18, 25.31it/s]

6,000 steps reached: saving model to /trained_model
Loss: 2.070 — Avg: 2.047:  12%|█▏        | 6000/50000 [03:55<28:48, 25.45it/s]

6,500 steps reached: saving model to /trained_model
Loss: 1.990 — Avg: 2.025:  13%|█▎        | 6500/50000 [04:14<28:20, 25.58it/s]

7,000 steps reached: saving model to /trained_model
Loss: 1.930 — Avg: 1.999:  14%|█▍        | 7000/50000 [04:32<27:54, 25.68it/s]

7,500 steps reached: saving model to /trained_model
Loss: 2.030 — Avg: 1.988:  15%|█▌        | 7500/50000 [04:51<27:29, 25.77it/s]

8,000 steps reached: saving model to /trained_model
Loss: 1.970 — Avg: 1.971:  16%|█▌        | 8000/50000 [05:09<27:04, 25.85it/s]

8,500 steps reached: saving model to /trained_model
Loss: 2.010 — Avg: 1.976:  17%|█▋        | 8500/50000 [05:28<26:41, 25.91it/s]

9,000 steps reached: saving model to /trained_model
Loss: 1.890 — Avg: 1.961:  18%|█▊        | 9000/50000 [05:46<26:18, 25.97it/s]

9,500 steps reached: saving model to /trained_model
Loss: 1.920 — Avg: 1.930:  19%|█▉        | 9500/50000 [06:04<25:55, 26.03it/s]

10,000 steps reached: saving model to /trained_model
Loss: 1.900 — Avg: 1.925:  20%|██        | 10000/50000 [06:23<25:33, 26.08it/s]

10,000 steps reached: generating sample texts.

: $a$ and $b^2$ and $b$ are factors. So there are $\\boxed{7}$ such factors."
},{
    "problem": "What is the greatest common divisor of $b$ and $c$?",
    "level": "Level 3",
    "type

10,500 steps reached: saving model to /trained_model
Loss: 1.920 — Avg: 1.909:  21%|██        | 10500/50000 [06:42<25:12, 26.11it/s]

11,000 steps reached: saving model to /trained_model
Loss: 1.960 — Avg: 1.908:  22%|██▏       | 11000/50000 [07:00<24:51, 26.15it/s]

11,500 steps reached: saving model to /trained_model
Loss: 1.950 — Avg: 1.886:  23%|██▎       | 11500/50000 [07:18<24:29, 26.20it/s]

12,000 steps reached: saving model to /trained_model
Loss: 1.850 — Avg: 1.857:  24%|██▍       | 12000/50000 [07:37<24:10, 26.20it/s]

12,500 steps reached: saving model to /trained_model
Loss: 1.890 — Avg: 1.877:  25%|██▌       | 12500/50000 [07:56<23:49, 26.24it/s]

13,000 steps reached: saving model to /trained_model
Loss: 1.830 — Avg: 1.876:  26%|██▌       | 13000/50000 [08:14<23:28, 26.27it/s]

13,500 steps reached: saving model to /trained_model
Loss: 1.850 — Avg: 1.863:  27%|██▋       | 13500/50000 [08:33<23:07, 26.31it/s]

14,000 steps reached: saving model to /trained_model
Loss: 1.820 — Avg: 1.835:  28%|██▊       | 14000/50000 [08:51<22:46, 26.34it/s]

14,500 steps reached: saving model to /trained_model
Loss: 1.870 — Avg: 1.848:  29%|██▉       | 14500/50000 [09:09<22:26, 26.36it/s]

15,000 steps reached: saving model to /trained_model
Loss: 1.820 — Avg: 1.826:  30%|███       | 15000/50000 [09:28<22:06, 26.39it/s]

15,000 steps reached: generating sample texts.

ence $(2a+b)(a+b)(b+b)=a+c$. Then $2b+d$. Then $a$ is divisible by $a$, and $b$ is agative, so \\[2a+b+b+d=b+c)(b+d

15,500 steps reached: saving model to /trained_model
Loss: 1.870 — Avg: 1.837:  31%|███       | 15500/50000 [09:46<21:46, 26.41it/s]

16,000 steps reached: saving model to /trained_model
Loss: 1.870 — Avg: 1.847:  32%|███▏      | 16000/50000 [10:05<21:26, 26.43it/s]

16,500 steps reached: saving model to /trained_model
Loss: 1.760 — Avg: 1.809:  33%|███▎      | 16500/50000 [10:23<21:06, 26.44it/s]

17,000 steps reached: saving model to /trained_model
Loss: 1.860 — Avg: 1.804:  34%|███▍      | 17000/50000 [10:42<20:47, 26.45it/s]

17,500 steps reached: saving model to /trained_model
Loss: 1.790 — Avg: 1.809:  35%|███▌      | 17500/50000 [11:01<20:28, 26.45it/s]

18,000 steps reached: saving model to /trained_model
Loss: 1.820 — Avg: 1.801:  36%|███▌      | 18000/50000 [11:20<20:08, 26.47it/s]

18,500 steps reached: saving model to /trained_model
Loss: 1.840 — Avg: 1.788:  37%|███▋      | 18500/50000 [11:38<19:49, 26.49it/s]

19,000 steps reached: saving model to /trained_model
Loss: 1.750 — Avg: 1.780:  38%|███▊      | 19000/50000 [11:57<19:30, 26.49it/s]

19,500 steps reached: saving model to /trained_model
Loss: 1.760 — Avg: 1.784:  39%|███▉      | 19500/50000 [12:15<19:10, 26.50it/s]

20,000 steps reached: saving model to /trained_model
Loss: 1.850 — Avg: 1.780:  40%|████      | 20000/50000 [12:34<18:51, 26.52it/s]

20,000 steps reached: generating sample texts.

ing these equations: $S$ and $S$ must equal $4$ and $H=5$. The other of the numbers from $1$ to $9$ to $8,$ and $9,$ and $17,$ $17,$ and our answer is $\\boxed{3}$."
},{

20,500 steps reached: saving model to /trained_model
Loss: 1.820 — Avg: 1.753:  41%|████      | 20500/50000 [12:52<18:32, 26.52it/s]

21,000 steps reached: saving model to /trained_model
Loss: 1.770 — Avg: 1.782:  42%|████▏     | 21000/50000 [13:11<18:12, 26.54it/s]

21,500 steps reached: saving model to /trained_model
Loss: 1.780 — Avg: 1.763:  43%|████▎     | 21500/50000 [13:30<17:54, 26.53it/s]

22,000 steps reached: saving model to /trained_model
Loss: 1.730 — Avg: 1.741:  44%|████▍     | 22000/50000 [13:49<17:35, 26.53it/s]

22,500 steps reached: saving model to /trained_model
Loss: 1.780 — Avg: 1.756:  45%|████▌     | 22500/50000 [14:08<17:17, 26.51it/s]

23,000 steps reached: saving model to /trained_model
Loss: 1.830 — Avg: 1.748:  46%|████▌     | 23000/50000 [14:27<16:58, 26.51it/s]

23,500 steps reached: saving model to /trained_model
Loss: 1.710 — Avg: 1.724:  47%|████▋     | 23500/50000 [14:46<16:39, 26.51it/s]

24,000 steps reached: saving model to /trained_model
Loss: 1.710 — Avg: 1.736:  48%|████▊     | 24000/50000 [15:04<16:20, 26.53it/s]

24,500 steps reached: saving model to /trained_model
Loss: 1.720 — Avg: 1.696:  49%|████▉     | 24500/50000 [15:23<16:00, 26.54it/s]

25,000 steps reached: saving model to /trained_model
Loss: 1.710 — Avg: 1.698:  50%|█████     | 25000/50000 [15:41<15:41, 26.55it/s]

25,000 steps reached: generating sample texts.

 answer is $\\boxed{988103 = \\boxed{3154}$."
},{
    "problem": "A books in she puts. To find the mavens each marching fac " "ticifth he puts,

25,500 steps reached: saving model to /trained_model
Loss: 1.720 — Avg: 1.718:  51%|█████     | 25500/50000 [16:00<15:23, 26.54it/s]

26,000 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.711:  52%|█████▏    | 26000/50000 [16:19<15:04, 26.55it/s]

26,500 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.712:  53%|█████▎    | 26500/50000 [16:37<14:44, 26.55it/s]

27,000 steps reached: saving model to /trained_model
Loss: 1.680 — Avg: 1.718:  54%|█████▍    | 27000/50000 [16:56<14:25, 26.56it/s]

27,500 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.700:  55%|█████▌    | 27500/50000 [17:14<14:06, 26.57it/s]

28,000 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.714:  56%|█████▌    | 28000/50000 [17:33<13:47, 26.58it/s]

28,500 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.685:  57%|█████▋    | 28500/50000 [17:51<13:28, 26.59it/s]

29,000 steps reached: saving model to /trained_model
Loss: 1.740 — Avg: 1.710:  58%|█████▊    | 29000/50000 [18:10<13:09, 26.60it/s]

29,500 steps reached: saving model to /trained_model
Loss: 1.740 — Avg: 1.707:  59%|█████▉    | 29500/50000 [18:28<12:50, 26.61it/s]

30,000 steps reached: saving model to /trained_model
Loss: 1.650 — Avg: 1.689:  60%|██████    | 30000/50000 [18:47<12:31, 26.62it/s]

30,000 steps reached: generating sample texts.

 numbers ends with a number with an integer divisible by 4 is divisible by 11. Also, $n = 41$ and $n = 21$ is divisible by 18 and 26. Since $n$ is divisible by 6, $n$ is a multiple of $5$ must be

30,500 steps reached: saving model to /trained_model
Loss: 1.680 — Avg: 1.670:  61%|██████    | 30500/50000 [19:05<12:12, 26.62it/s]

31,000 steps reached: saving model to /trained_model
Loss: 1.710 — Avg: 1.706:  62%|██████▏   | 31000/50000 [19:24<11:53, 26.63it/s]

Loss: 1.780 — Avg: 1.704:  63%|██████▎   | 31480/50000 [19:57<11:44, 26.28it/s]

32,000 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.669:  64%|██████▍   | 32000/50000 [20:35<11:34, 25.90it/s]

32,500 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.654:  65%|██████▌   | 32500/50000 [20:54<11:15, 25.90it/s]

33,000 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.662:  66%|██████▌   | 33000/50000 [21:14<10:56, 25.90it/s]

33,500 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.666:  67%|██████▋   | 33500/50000 [21:33<10:37, 25.90it/s]

34,000 steps reached: saving model to /trained_model
Loss: 1.630 — Avg: 1.643:  68%|██████▊   | 34000/50000 [21:52<10:17, 25.90it/s]

34,500 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.671:  69%|██████▉   | 34500/50000 [22:12<09:58, 25.88it/s]

35,000 steps reached: saving model to /trained_model
Loss: 1.680 — Avg: 1.666:  70%|███████   | 35000/50000 [22:32<09:39, 25.88it/s]

35,000 steps reached: generating sample texts.

s:\n\\begin{align*}\n(a,b)\\equiv (b + b) + b\\equiv 2 \\equiv 4\\pmod{5} \\quad\\implies\\equiv bb+c\\equiv b\\pmod{5}$, we have\n\\[c \\equiv b

35,500 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.643:  71%|███████   | 35500/50000 [22:51<09:20, 25.88it/s]

36,000 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.659:  72%|███████▏  | 36000/50000 [23:11<09:01, 25.87it/s]

36,500 steps reached: saving model to /trained_model
Loss: 1.620 — Avg: 1.650:  73%|███████▎  | 36500/50000 [23:31<08:42, 25.86it/s]

37,000 steps reached: saving model to /trained_model
Loss: 1.590 — Avg: 1.633:  74%|███████▍  | 37000/50000 [23:51<08:22, 25.85it/s]

37,500 steps reached: saving model to /trained_model
Loss: 1.620 — Avg: 1.641:  75%|███████▌  | 37500/50000 [24:11<08:03, 25.84it/s]

38,000 steps reached: saving model to /trained_model
Loss: 1.640 — Avg: 1.634:  76%|███████▌  | 38000/50000 [24:31<07:44, 25.83it/s]

38,500 steps reached: saving model to /trained_model
Loss: 1.690 — Avg: 1.647:  77%|███████▋  | 38500/50000 [24:51<07:25, 25.82it/s]

39,000 steps reached: saving model to /trained_model
Loss: 1.650 — Avg: 1.637:  78%|███████▊  | 39000/50000 [25:11<07:06, 25.80it/s]

39,500 steps reached: saving model to /trained_model
Loss: 1.740 — Avg: 1.624:  79%|███████▉  | 39500/50000 [25:31<06:46, 25.80it/s]

40,000 steps reached: saving model to /trained_model
Loss: 1.590 — Avg: 1.623:  80%|████████  | 40000/50000 [25:50<06:27, 25.79it/s]

40,000 steps reached: generating sample texts.

, we have $a + b \\equiv b \\equiv b^{-1} \\pmod{0, \\begin{align*}\nabc &= (a_1)^2 + (a_2a_2 + a_0 + \\cdots + a_1 + a_0).. \\end

40,500 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.630:  81%|████████  | 40500/50000 [26:10<06:08, 25.78it/s]

41,000 steps reached: saving model to /trained_model
Loss: 1.650 — Avg: 1.628:  82%|████████▏ | 41000/50000 [26:31<05:49, 25.77it/s]

41,500 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.635:  83%|████████▎ | 41500/50000 [26:51<05:29, 25.76it/s]

42,000 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.634:  84%|████████▍ | 42000/50000 [27:10<05:10, 25.75it/s]

42,500 steps reached: saving model to /trained_model
Loss: 1.650 — Avg: 1.634:  85%|████████▌ | 42500/50000 [27:30<04:51, 25.75it/s]

43,000 steps reached: saving model to /trained_model
Loss: 1.660 — Avg: 1.619:  86%|████████▌ | 43000/50000 [27:50<04:31, 25.74it/s]

43,500 steps reached: saving model to /trained_model
Loss: 1.570 — Avg: 1.600:  87%|████████▋ | 43500/50000 [28:10<04:12, 25.73it/s]

44,000 steps reached: saving model to /trained_model
Loss: 1.620 — Avg: 1.620:  88%|████████▊ | 44000/50000 [28:31<03:53, 25.71it/s]

44,500 steps reached: saving model to /trained_model
Loss: 1.550 — Avg: 1.611:  89%|████████▉ | 44500/50000 [28:51<03:34, 25.70it/s]

45,000 steps reached: saving model to /trained_model
Loss: 1.670 — Avg: 1.631:  90%|█████████ | 45000/50000 [29:11<03:14, 25.69it/s]

45,000 steps reached: generating sample texts.

s of the first $12$ days is $2+\\boxed{\\frac{2}{3}=6$ times, which day of the week."
},{
    "problem": "What is the greatest common divisor of $579$ and $7000$ and $

45,500 steps reached: saving model to /trained_model
Loss: 1.580 — Avg: 1.621:  91%|█████████ | 45500/50000 [29:31<02:55, 25.68it/s]

46,000 steps reached: saving model to /trained_model
Loss: 1.630 — Avg: 1.603:  92%|█████████▏| 46000/50000 [29:51<02:35, 25.68it/s]

46,500 steps reached: saving model to /trained_model
Loss: 1.630 — Avg: 1.629:  93%|█████████▎| 46500/50000 [30:11<02:16, 25.67it/s]

47,000 steps reached: saving model to /trained_model
Loss: 1.650 — Avg: 1.615:  94%|█████████▍| 47000/50000 [30:31<01:56, 25.67it/s]

47,500 steps reached: saving model to /trained_model
Loss: 1.570 — Avg: 1.618:  95%|█████████▌| 47500/50000 [30:50<01:37, 25.66it/s]

48,000 steps reached: saving model to /trained_model
Loss: 1.540 — Avg: 1.602:  96%|█████████▌| 48000/50000 [31:11<01:17, 25.65it/s]

48,500 steps reached: saving model to /trained_model
Loss: 1.590 — Avg: 1.602:  97%|█████████▋| 48500/50000 [31:31<00:58, 25.64it/s]

49,000 steps reached: saving model to /trained_model
Loss: 1.630 — Avg: 1.594:  98%|█████████▊| 49000/50000 [31:52<00:39, 25.63it/s]

49,500 steps reached: saving model to /trained_model
Loss: 1.630 — Avg: 1.605:  99%|█████████▉| 49500/50000 [32:12<00:19, 25.62it/s]

50,000 steps reached: saving model to /trained_model
Loss: 1.600 — Avg: 1.606: 100%|██████████| 50000/50000 [32:32<00:00, 25.61it/s]

50,000 steps reached: generating sample texts.

. The solution is $10111_6}+\\frac{100\\frac{1001_6}{100_6=\\frac{1001_6$1001_6=1331_6=\\boxed{33_6}$."
},{
    "problem": "Find the difference

`Trainer.fit` stopped: `max_steps=50000` reached.


Configuration saved in trained_model/generation_config.json
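To follow the loss trend without reading every progress line, the `Loss` and `Avg` values can be pulled out of the log text. The sketch below assumes the line format shown in the output above; `parse_loss_line` and `perplexity` are hypothetical helpers, and the perplexity conversion assumes the reported loss is a cross-entropy in nats.

```python
import math
import re

# Matches the "Loss: <x> <separator> Avg: <y>" prefix of a progress line.
LOSS_PATTERN = re.compile(r"Loss: (\d+\.\d+) \S+ Avg: (\d+\.\d+)")


def parse_loss_line(line):
    """Return (loss, avg_loss) from a progress line, or None if it has no loss."""
    match = LOSS_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def perplexity(loss):
    """Convert a cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)
```

Under that assumption, the `2.0` loss target mentioned earlier corresponds to a perplexity of about 7.4.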


Generate text from your trained model!

In [18]:
ai.generate(1, prompt="What is 1+1")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}



**What is 1+1**)(1)(1+1)(1+1+1)=6+1=12(3)$, so $\\boxed{56}$."
},{
    "problem": "In a crate is in the finding the remainder when $3x+1$ is divided by


You can reload your trained model at any time by pointing aitextgen at the `trained_model` folder, which holds the `pytorch_model.bin` weights and the config, and at the tokenizer file.

In [9]:
ai2 = aitextgen(model_folder="trained_model",
                tokenizer_file="aitextgen.tokenizer.json")

Generate config GenerationConfig {
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}

loading configuration file trained_model/generation_config.json
Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.26.1"
}
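Before reloading, it can help to confirm that the saved files are actually where you expect them. The helper below is a hypothetical convenience, not part of aitextgen, and the `config.json` file name is an assumption about what the training run writes next to `pytorch_model.bin`.

```python
import os


def missing_artifacts(model_folder, tokenizer_file,
                      expected=("pytorch_model.bin", "config.json")):
    """Return the expected model/tokenizer paths that do not exist on disk."""
    paths = [os.path.join(model_folder, name) for name in expected]
    paths.append(tokenizer_file)
    return [path for path in paths if not os.path.isfile(path)]


missing = missing_artifacts("trained_model", "aitextgen.tokenizer.json")
if missing:
    print("Missing files:", missing)
```

An empty list means every expected file was found.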



# MIT License

Copyright (c) 2021 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.