# Explaining MinGPT 
> This post walks through the minGPT model implementation by <a href="https://github.com/karpathy/minGPT">Andrej Karpathy</a>, applied to an addition dataset.

- toc: true 
- badges: true
- comments: true
- sticky_rank: 1
- author: Rayan
- image: images/diagram.png
- categories: [MinGPT, transformers]

> `Objective: "To train a GPT model on a dedicated addition dataset to see if a Transformer can learn to add."`

Our objective is inspired by the addition section of the GPT-3 paper, "Language Models are Few-Shot Learners": https://arxiv.org/pdf/2005.14165v4.pdf

In [1]:
#hide
# Imports
import math
import logging
import gc
import os
import numpy as np
import torchvision
import torch
import matplotlib.pyplot as plt
import random
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data.dataloader import DataLoader

In [2]:
#collapse-hide
#Seeding 
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Seed every random number generator so that runs are reproducible
set_seed(42)

To generate data, we define a custom addition dataset class. The sum of two
n-digit numbers has at most n+1 digits, so our encoding is simply the n-digit
first number, the n-digit second number, and the (n+1)-digit result, each
zero-padded on the left where needed and all concatenated together. Because each
addition problem is so structured, there is no need to bother the model with
encoding +, =, or other tokens. Every sequence has the same length and contains
only the raw digits of the addition problem.

As a few examples, the 2-digit problems:
- 85 + 50 = 135 becomes the sequence [8, 5, 5, 0, 1, 3, 5]
- 6 + 39 = 45 becomes the sequence [0, 6, 3, 9, 0, 4, 5]
etc.
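
The encoding described above can be sketched in a few lines. This is a minimal illustration rather than the dataset class itself; the helper name `encode_addition` and its signature are assumptions made for this example.

```python
from typing import List


def encode_addition(a: int, b: int, ndigit: int) -> List[int]:
    """Encode a + b as a flat digit sequence (hypothetical helper).

    Layout: ndigit digits of a, ndigit digits of b, ndigit + 1 digits of a + b,
    each zero-padded on the left.
    """
    limit = 10 ** ndigit
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError(f"operands must be in [0, {limit})")
    result_width = ndigit + 1
    text = f"{a:0{ndigit}d}{b:0{ndigit}d}{a + b:0{result_width}d}"
    return [int(ch) for ch in text]


print(encode_addition(85, 50, ndigit=2))  # [8, 5, 5, 0, 1, 3, 5]
print(encode_addition(6, 39, ndigit=2))   # [0, 6, 3, 9, 0, 4, 5]
```

Every encoded problem has exactly `3 * ndigit + 1` tokens, which is why no separator tokens are needed.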

We will also train GPT only on the final n+1 digits, because the two n-digit
operands are always assumed to be given. So when we give GPT an exam later,
we will, for example, feed it the sequence [0, 6, 3, 9], which encodes that we'd
like to add 6 + 39, and hope that the model completes the integer sequence with [0, 4, 5] in 3 sequential steps.
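
One way to train only on the result digits is to mask the targets that fall inside the given operands. The sketch below assumes next-token prediction and a loss that skips targets equal to -100, which is the default `ignore_index` of PyTorch's cross-entropy loss. The helper name `make_training_pair` is hypothetical, and plain Python lists stand in for tensors.

```python
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy


def make_training_pair(digits, ndigit):
    """Build (input, target) lists for next-token prediction (illustrative)."""
    x = list(digits[:-1])  # the model sees every token except the last
    y = list(digits[1:])   # and predicts the token that follows each position
    # The first 2*ndigit - 1 targets are operand digits that are always given,
    # so they are masked out and contribute nothing to the loss.
    n_given = 2 * ndigit - 1
    y[:n_given] = [IGNORE_INDEX] * n_given
    return x, y


x, y = make_training_pair([0, 6, 3, 9, 0, 4, 5], ndigit=2)
print(x)  # [0, 6, 3, 9, 0, 4]
print(y)  # [-100, -100, -100, 0, 4, 5]
```

Only the last three targets remain unmasked, matching the three sequential steps in which the model is expected to produce [0, 4, 5].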