# The Architecture Behind GPT{1,2} #

The *weights* for GPT2 are publicly available; those for GPT3 are not.  The [paper behind GPT1](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) is very accessible.

The GPT1 architecture:

![16_11.png](attachment:de5889a7-821e-47c3-b836-c049366cedce.png)

## Some Features of GPT 1 ##

1. Model training consists of two stages.
2. First, the model is pre-trained on unlabeled text to predict the next token.
3. The model is then *fine-tuned* on labeled data to optimize a specific target task.
4. The model is based on the Transformer (specifically, a multi-layer Transformer decoder).
5. The authors evaluate four types of language tasks: (1) natural language inference, (2) question answering, (3) semantic similarity, and (4) text classification.
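
The 'next token' objective in item 2 can be sketched in plain Python. This is only an illustration, not the paper's implementation: the helper names `next_token_pairs` and `cross_entropy`, the token ids, and the uniform stand-in "model" are all made up for the example.

```python
import math

def next_token_pairs(token_ids):
    """Split a sequence into (context, target) pairs: each prefix predicts the next token."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

def cross_entropy(probs, target_id):
    """Negative log-probability that the model assigned to the true next token."""
    return -math.log(probs[target_id])

# Hypothetical token ids for a four-token sentence, with a vocabulary of size 5.
token_ids = [3, 1, 4, 1]
vocab_size = 5

pairs = next_token_pairs(token_ids)

# Stand-in "model": assigns the same probability to every token, whatever the context.
uniform_probs = [1.0 / vocab_size] * vocab_size

# Pre-training minimizes the average of this loss over all positions.
loss = sum(cross_entropy(uniform_probs, target) for _, target in pairs) / len(pairs)

for context, target in pairs:
    print(f"context={context} -> target={target}")
print(f"average loss: {loss:.4f}")
```

A model that has learned nothing scores `log(vocab_size)` here; pre-training pushes the loss below that by putting more probability on the token that actually follows each context.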

## The HuggingFace Transformers Library ##

The primary 'hub' for transformer-based models and related datasets is [HuggingFace](https://huggingface.co/).

+ What you can do with [HuggingFace Transformers](https://huggingface.co/docs/transformers/en/index).  (Note the supported frameworks--in particular, the level of PyTorch support.)
+ Installing the **transformers library** is really easy.  See the [Quickstart Guide](https://huggingface.co/docs/transformers/en/quicktour).
+ Using the **transformers library** *can* also be easy with the [pipeline function](https://huggingface.co/docs/transformers/v4.38.2/en/main_classes/pipelines#transformers.pipeline).  See the [Quickstart Guide](https://huggingface.co/docs/transformers/en/quicktour) for an example.
+ PyTorch has its own page on Transformers that is [even more detailed](https://pytorch.org/hub/huggingface_pytorch-transformers/).  Most of its links point to the HuggingFace website, but note its outline of the process:

   + Tokenization
   + Model Selection
   + Model Head Selection
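
To make the "Model Head Selection" step concrete, here is a rough sketch of how the choice of head changes what comes out of the same GPT-2 backbone. It assumes `torch` and `transformers` are installed. The tiny `GPT2Config` values and the hand-written token ids are hypothetical stand-ins (a real workflow would get ids from a tokenizer and load pretrained weights), chosen so the example runs without downloading anything.

```python
import torch
from transformers import (
    GPT2Config,
    GPT2Model,
    GPT2LMHeadModel,
    GPT2ForSequenceClassification,
)

torch.manual_seed(0)

# Model selection: a deliberately tiny, randomly initialized GPT-2 (hypothetical sizes).
config = GPT2Config(
    vocab_size=100, n_positions=32, n_embd=16, n_layer=2, n_head=2,
    bos_token_id=1, eos_token_id=1, pad_token_id=0, num_labels=2,
)

# Tokenization: stand-in token ids for one four-token sentence.
input_ids = torch.tensor([[5, 17, 42, 8]])

# Head selection: the same architecture with three different tops.
backbone = GPT2Model(config).eval()                        # no head
lm_model = GPT2LMHeadModel(config).eval()                  # next-token head
classifier = GPT2ForSequenceClassification(config).eval()  # classification head

with torch.no_grad():
    hidden = backbone(input_ids).last_hidden_state  # one vector per token
    lm_logits = lm_model(input_ids).logits          # one vocabulary score row per token
    cls_logits = classifier(input_ids).logits       # one score per label for the sequence

print(tuple(hidden.shape))      # (batch, tokens, n_embd)
print(tuple(lm_logits.shape))   # (batch, tokens, vocab_size)
print(tuple(cls_logits.shape))  # (batch, num_labels)
```

The backbone is the same in all three cases; only the final layer, and therefore the shape and meaning of the output, changes.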
 
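The native HuggingFace Transformers library's `pipeline` function is even easier to use, since it bundles tokenization, model selection, and head selection into a single call.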

## Using GPT-2 to Generate New Text ##

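```python
from transformers import pipeline, set_seed

# Load GPT-2 behind a text-generation pipeline (weights are downloaded on first use).
generator = pipeline('text-generation', model='gpt2')

# Fix the random seed so the sampled continuations are reproducible.
set_seed(123)

# Request three continuations; max_length counts the prompt tokens plus the generated ones.
generator("Hey readers, today is",
          max_length=20,
          num_return_sequences=3)
```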

```
[{'generated_text': 'Hey readers, today is the third day in a row where I am starting to get a little fed'},
 {'generated_text': 'Hey readers, today is a very important weekend, and thanks to all of you, will be a'},
 {'generated_text': 'Hey readers, today is the third day of the New Year after I posted a series on the Internet'}]
```
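
The three continuations differ because generation here draws each next token at random from the model's predicted distribution, and `set_seed` makes those draws repeatable. The toy sketch below mimics that behaviour in plain Python. The vocabulary, the `TOY_LOGITS` table, and the helper names are all invented for illustration and have nothing to do with GPT-2's real weights or the pipeline's internals.

```python
import math
import random

VOCAB = ["today", "is", "the", "third", "day"]

# Hypothetical next-token scores: row i holds the logits for whatever follows token i.
TOY_LOGITS = [
    [0.1, 2.0, 0.5, 0.2, 0.3],
    [0.2, 0.1, 2.0, 0.4, 1.0],
    [0.3, 0.2, 0.1, 2.0, 1.5],
    [0.5, 0.4, 0.3, 0.1, 2.0],
    [1.0, 1.5, 0.8, 0.3, 0.1],
]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_continuation(start_id, n_tokens, rng):
    """Sample n_tokens one at a time, each conditioned on the previous token."""
    ids = [start_id]
    for _ in range(n_tokens):
        probs = softmax(TOY_LOGITS[ids[-1]])
        ids.append(rng.choices(range(len(VOCAB)), weights=probs, k=1)[0])
    return [VOCAB[i] for i in ids]

def generate(seed, num_return_sequences=3, n_tokens=5):
    """Seed once, then draw several continuations from the same random stream."""
    rng = random.Random(seed)
    return [" ".join(sample_continuation(0, n_tokens, rng))
            for _ in range(num_return_sequences)]

first_run = generate(seed=123)
second_run = generate(seed=123)

for text in first_run:
    print(text)
print(first_run == second_run)  # True: same seed, same draws
```

Within one run the sequences can differ because each uses fresh random draws; across runs with the same seed the whole set is reproduced exactly.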

## Using GPT2 for 'Sentiment Analysis' ##

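The library also provides [`GPT2ForSequenceClassification`](https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/gpt2#transformers.GPT2ForSequenceClassification), a GPT2 model with a linear classification head on top, which can be fine-tuned for tasks such as sentiment analysis.  It classifies from the last token of each sequence, so it needs to know which token id marks padding.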
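
As a rough sketch of what fine-tuning this class involves, the example below runs one training step on two made-up, already-tokenized "reviews". It assumes `torch` and `transformers` are installed. The tiny configuration, the token ids, and the labels are hypothetical and the weights are random, so the predictions are meaningless; the point is only the mechanics of padding, labels, and the loss.

```python
import torch
from transformers import GPT2Config, GPT2ForSequenceClassification

torch.manual_seed(0)

# Hypothetical tiny configuration so the sketch runs without downloading weights.
config = GPT2Config(
    vocab_size=100, n_positions=32, n_embd=16, n_layer=2, n_head=2,
    bos_token_id=1, eos_token_id=1,
    pad_token_id=0,  # lets the model find the last non-padding token
    num_labels=2,    # e.g. 0 = negative, 1 = positive
)
model = GPT2ForSequenceClassification(config)

# Two made-up token-id sequences; the second is right-padded with id 0.
input_ids = torch.tensor([[5, 17, 42, 8, 23],
                          [9, 31, 77, 0, 0]])
attention_mask = (input_ids != config.pad_token_id).long()
labels = torch.tensor([1, 0])

# One fine-tuning step: passing labels makes the model return a cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model.train()
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference: one row of label scores per input sequence.
model.eval()
with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
predicted = logits.argmax(dim=-1)

print(tuple(logits.shape), predicted.tolist())
```

With the real checkpoint, one would typically load the model with `GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)`. Because GPT-2's tokenizer has no padding token by default, a common workaround is to reuse the end-of-text token as padding and set `model.config.pad_token_id` to match. The classification head is newly initialized in that case, so the model needs fine-tuning on labeled examples before its predictions mean anything.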