# GPT
## Introduction to GPT
<br> GPT (Generative Pre-trained Transformer) is a model built on the Transformer architecture that focuses on generative pre-training. It uses only the decoder part of the Transformer, pre-trains it autoregressively (each token is predicted from the tokens before it), and learns to generate text from a large amount of text data.
<br> Transformer models are usually trained from scratch for each specific task. GPT instead uses **pre-training and fine-tuning**: it is first pre-trained on large-scale unlabeled text to learn general language patterns and knowledge, and it is then adapted to a specific task or application by fine-tuning on a smaller task-specific dataset.
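
The two stages can be written as objectives. The formulas below follow the GPT-1 paper (Radford et al., 2018). The notation is that paper's and is not defined elsewhere in this section: $\mathcal{U} = \{u_1, \dots, u_n\}$ is the unlabeled token corpus, $k$ is the context window size, $\Theta$ are the model parameters, $\mathcal{C}$ is a labeled dataset whose examples are token sequences $x^1, \dots, x^m$ with label $y$, and $\lambda$ is a weighting coefficient.

$$
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)
$$

$$
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \dots, x^m)
$$

$$
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})
$$

Pre-training maximizes $L_1$ and fine-tuning maximizes $L_2$. The paper reports that keeping language modeling as an auxiliary objective during fine-tuning ($L_3$) improved generalization and sped up convergence.
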
<br> Because it is pre-trained as a generative language model, GPT is particularly effective at language generation tasks such as text generation, dialogue systems, and content creation assistance. Pre-training also helps it capture complex language patterns and context. Thanks to this extensive pre-training, GPT can generalize to a wide range of tasks with little or no task-specific training data, although fine-tuning can further improve its performance on specific tasks.
### Framework of GPT
<img src="https://d3i71xaburhd42.cloudfront.net/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035/4-Figure1-1.png" width="800">
Figure: (left) Transformer architecture and training objectives used in GPT-1. (right) Input transformations for fine-tuning on different tasks.

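- Unsupervised pre-training: A multi-layer Transformer decoder, a variant of the Transformer, is used as the language model and is trained on an unlabeled corpus of tokens.
- Supervised fine-tuning: The pre-trained parameters are adapted to the supervised target task.
- Task-specific input transformations: A traversal-style approach converts structured inputs (for example, sentence pairs, or a document with a question and candidate answers) into a single ordered token sequence that the pre-trained model can process.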
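
The input transformations in the right half of the figure can be sketched in a few lines. This is a minimal illustration, not the original implementation: the special-token strings `<s>`, `$`, and `<e>` are placeholders chosen here, and the function names are hypothetical. The orderings follow the description in the GPT-1 paper.

```python
from typing import List

# Placeholder special tokens (hypothetical strings chosen for illustration).
START, DELIM, EXTRACT = "<s>", "$", "<e>"


def classification_input(text: List[str]) -> List[str]:
    """Single text: wrap it in start and extract tokens."""
    return [START, *text, EXTRACT]


def entailment_input(premise: List[str], hypothesis: List[str]) -> List[str]:
    """Premise and hypothesis joined by a delimiter token."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]


def similarity_inputs(a: List[str], b: List[str]) -> List[List[str]]:
    """Similarity has no inherent ordering, so both orderings are produced."""
    return [entailment_input(a, b), entailment_input(b, a)]


def multiple_choice_inputs(
    context: List[str], question: List[str], answers: List[List[str]]
) -> List[List[str]]:
    """One sequence per candidate answer: context + question, delimiter, answer."""
    return [
        [START, *context, *question, DELIM, *answer, EXTRACT] for answer in answers
    ]


print(entailment_input("a man sleeps".split(), "a person rests".split()))
```

Each resulting sequence is fed through the same pre-trained model. For similarity and multiple choice, the per-sequence outputs are then combined (summed or normalized across candidates) by a small task-specific head.
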
### Development of GPT
- GPT-1: Released in 2018, it has 12 Transformer layers and about 117 million parameters.
- GPT-2: The model is much larger and comes in several sizes. The largest version has 48 Transformer layers and about 1.5 billion parameters (reported as 1542M in the paper).
- GPT-3: Compared with GPT-2, both the model size (up to 175 billion parameters) and the amount of pre-training data increased substantially. The training corpus consists of Internet text, including web pages, books, news, and Wikipedia.
- InstructGPT: Making a language model larger does not by itself make it better at following user intent. InstructGPT addresses this by fine-tuning the model with human feedback so that it follows human instructions more reliably in practice.
- ChatGPT: Released by OpenAI in 2022 and built on the GPT-3.5 series, it is optimized mainly for dialogue: it takes the dialogue history into account in its input and output and adds control over dialogue strategy.
- GPT-4: In casual conversation, the difference between GPT-4 and GPT-3.5 or GPT-3 is small. Once task complexity passes a sufficient threshold, however, differences emerge: GPT-4 is more reliable, more creative, and able to handle more nuanced instructions.
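
Model sizes like those in the list above follow almost directly from depth and hidden width. The sketch below counts parameters for a GPT-2 style decoder. Several values are assumptions taken from the published GPT-2 configurations rather than from this text: hidden sizes of 768 (smallest) and 1600 (largest), a vocabulary of 50257 tokens, a context length of 1024, biases on every linear layer, a 4x feed-forward expansion, and an output layer that shares weights with the token embedding. The function name is hypothetical.

```python
def gpt2_param_count(n_layer: int, d_model: int,
                     vocab_size: int = 50257, n_ctx: int = 1024) -> int:
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused Q/K/V projection (d x 3d) and output projection (d x d), with biases.
    attention = 4 * d_model * d_model + 4 * d_model
    # Feed-forward: d -> 4d -> d, with biases.
    mlp = 8 * d_model * d_model + 5 * d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    per_layer = attention + mlp + layer_norms
    # Final LayerNorm after the last block.
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_layer_norm


small = gpt2_param_count(n_layer=12, d_model=768)
largest = gpt2_param_count(n_layer=48, d_model=1600)
print(f"12 layers, width 768:  {small:,}")    # 124,439,808
print(f"48 layers, width 1600: {largest:,}")  # 1,557,611,200
```

Under these assumptions the 48-layer configuration comes out at roughly 1.5 billion parameters, and almost all of them sit in the per-layer attention and feed-forward weights rather than in the embeddings.
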
## References
1. GPT-1 "[Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
2. "[Deep contextualized word representations](https://arxiv.org/abs/1802.05365)" by Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer.
3. GPT-2 "[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)" by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
4. GPT-3 "[Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)" by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, et al.
5. InstructGPT "[Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)" by Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al.
6. GPT-4 "[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774v2)" by OpenAI.
7. "[GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models](https://arxiv.org/abs/2303.10130)" by Tyna Eloundou, Sam Manning, Pamela Mishkin and Daniel Rock.

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# The input prompt
text = "He worked as a"

# Tokenize the prompt into input IDs and an attention mask (PyTorch tensors)
encoded_input = tokenizer(text, return_tensors='pt')

# Generate a continuation; max_length counts the prompt tokens too
output_sequences = model.generate(
    input_ids=encoded_input['input_ids'],
    attention_mask=encoded_input['attention_mask'],
    max_length=16,
    num_return_sequences=1,
    pad_token_id=model.config.eos_token_id  # GPT-2 has no pad token, so reuse EOS
)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)

print(generated_text)
```

Output:

```text
He worked as a security guard at the airport for the past two years.
```

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# The input prompt
text = "He worked as a"

# Tokenize the prompt into input IDs and an attention mask (PyTorch tensors)
encoded_input = tokenizer(text, return_tensors='pt')

# Same prompt, but allow a longer continuation (up to 100 tokens in total)
output_sequences = model.generate(
    input_ids=encoded_input['input_ids'],
    attention_mask=encoded_input['attention_mask'],
    max_length=100,
    num_return_sequences=1,
    pad_token_id=model.config.eos_token_id  # GPT-2 has no pad token, so reuse EOS
)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)

print(generated_text)
```

Output:

```text
He worked as a security guard at the airport for the past two years.

"I'm not sure if he's a terrorist or not," said a source close to the investigation. "He's a very good person. He's a very good person. He's a very good person."

The source said the FBI is still looking into the case.

The FBI is also looking into the possibility that the man was involved in a terrorist attack in New York City.
```

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# The input prompt
text = "Bob is good boy, but"

# Tokenize the prompt into input IDs and an attention mask (PyTorch tensors)
encoded_input = tokenizer(text, return_tensors='pt')

# Generate a continuation of up to 50 tokens in total
output_sequences = model.generate(
    input_ids=encoded_input['input_ids'],
    attention_mask=encoded_input['attention_mask'],
    max_length=50,
    num_return_sequences=1,
    pad_token_id=model.config.eos_token_id  # GPT-2 has no pad token, so reuse EOS
)

# Decode the generated token IDs back into text
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)

print(generated_text)
```

Output:

```text
Bob is good boy, but he's not a good boy. He's a bad boy. He's a bad boy. He's a bad boy. He's a bad boy. He's a bad boy. He's a bad boy. He
```
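
The repeated sentences in the outputs above are typical of greedy decoding: when `generate` is called without sampling or beam-search arguments, it by default picks the single most likely next token at every step, which can lock the model into a loop. The toy sketch below illustrates the effect without downloading a model. The next-token table is entirely hypothetical (it is not derived from GPT-2), and it conditions only on the previous token, which is a simplification of a real autoregressive model.

```python
import random

# Hypothetical next-token probabilities keyed by the previous token.
NEXT = {
    "He": {"'s": 0.9, "is": 0.1},
    "'s": {"a": 1.0},
    "is": {"a": 1.0},
    "a": {"bad": 0.6, "good": 0.4},
    "bad": {"boy": 1.0},
    "good": {"boy": 1.0},
    "boy": {".": 1.0},
    ".": {"He": 0.7, "<eos>": 0.3},
}


def generate(prompt, max_length, sample=False, seed=0):
    """Extend `prompt` one token at a time until `max_length` or <eos>."""
    rng = random.Random(seed)
    tokens = list(prompt)
    while len(tokens) < max_length:
        dist = NEXT[tokens[-1]]
        if sample:
            # Sampling: draw the next token in proportion to its probability.
            nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        else:
            # Greedy: always take the most probable next token.
            nxt = max(dist, key=dist.get)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens


# Greedy never picks <eos> here because "He" is always more likely after ".".
print(" ".join(generate(["He"], max_length=12)))  # He 's a bad boy . He 's a bad boy .
print(" ".join(generate(["He"], max_length=12, sample=True, seed=1)))
```

In `transformers`, the analogous switch is passing `do_sample=True` to `generate`, optionally with `top_k` or `top_p` to restrict the candidates. Sampled outputs vary from run to run unless a seed is fixed.
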
