<h1><a href="https://huggingface.co/transformers/model_doc/gpt2.html">HuggingFace OpenAI GPT2</a> <a href="https://huggingface.co/exbert/?model=gpt2&modelKind=bidirectional&sentence=The%20girl%20ran%20to%20a%20local%20pub%20to%20escape%20the%20din%20of%20her%20city.&layer=11&heads=..&threshold=0.7&tokenInd=null&tokenSide=null&maskInds=..&hideClsSep=false">Transformer Visualizer</a></h1>
<h4><a href="https://huggingface.co/transformers/main_classes/processors.html">List of Data Processors</a></h4>
<h4><a href="https://huggingface.co/transformers/pretrained_models.html">List of Pretrained Models</a></h4>
<h4><a href="https://huggingface.co/transformers/main_classes/tokenizer.html">List of Tokenizers</a></h4>
<h4><a href="https://huggingface.co/transformers/main_classes/pipelines.html">List of Pipelines</a></h4>
<h4><a href="https://huggingface.co/transformers/main_classes/optimizer_schedules.html">List of Optimizers</a></h4>

<h3>Installation</h3>

>pip install transformers\[torch]

>pip install transformers\[tf-cpu]

If you don’t have any specific environment variable set, the cache directory will be at ~/.cache/torch/transformers/.
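As a rough sketch of that lookup, the snippet below resolves a cache directory from an environment variable and falls back to the default path mentioned above. It assumes the overriding variable is `TRANSFORMERS_CACHE`; the helper name `resolve_cache_dir` is hypothetical and simplifies whatever the library does internally.

```python
import os

# Default location quoted in the text above.
DEFAULT_CACHE = os.path.join("~", ".cache", "torch", "transformers")

def resolve_cache_dir(env=None):
    """Hypothetical helper: choose the cache directory as the text describes."""
    env = os.environ if env is None else env
    # Assumption: TRANSFORMERS_CACHE is the variable that overrides the default.
    path = env.get("TRANSFORMERS_CACHE") or DEFAULT_CACHE
    return os.path.expanduser(path)

print(resolve_cache_dir())
```

A single call can also be pointed at a different location by passing `cache_dir=` to `from_pretrained`.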


<h2>Text Generation</h2>
In text generation (a.k.a. open-ended text generation), the goal is to produce coherent text that continues a given context. The following example shows how GPT-2 can be used in a pipeline to generate text. By default, models apply Top-K sampling when used in pipelines, as set in their respective configurations (see the gpt-2 config, for example).

<pre><code>
from transformers import pipeline

text_generator = pipeline("text-generation")

# do_sample=False turns sampling off, so the continuation is deterministic
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))

# Separate snippet: load GPT-2 directly (TensorFlow classes) and run it on a text
from transformers import GPT2Tokenizer, <em>TFGPT2Model</em>

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = <em>TFGPT2Model.from_pretrained('gpt2')</em>

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
</code></pre>

>>[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]


Here, the model generates text with a maximum total length of 50 tokens from the context “As far as I am concerned, I will”. Because do_sample=False is passed, the continuation is decoded greedily rather than sampled, so it is not random. The default arguments of PreTrainedModel.generate() can be overridden directly in the pipeline, as shown above for max_length.
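To make the difference between greedy decoding and Top-K sampling concrete, here is a minimal sketch of how a single next token could be chosen under each strategy. The logits and the five-token vocabulary are made up for illustration, and the helper functions are not part of the transformers API.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    # Greedy decoding: always take the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_pick(logits, k, rng):
    # Top-K sampling: keep the k best tokens, renormalize, then sample one.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]

# Toy next-token scores over a hypothetical 5-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0, -3.0]
rng = random.Random(0)

print(greedy_pick(logits))  # prints 0, the index of the best token
print([top_k_pick(logits, k=2, rng=rng) for _ in range(5)])  # each entry is 0 or 1
```

Greedy decoding returns the same token on every call, while Top-K sampling can vary between runs but never leaves the k best candidates.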

<h2>Text Summarization</h2>
Summarization is the task of condensing a document or an article into a shorter text.

An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was created for the task of summarization. If you would like to fine-tune a model on a summarization task, various approaches are described in this document.

Here is an example of using a pipeline for summarization. It uses a Bart model that was fine-tuned on the CNN / Daily Mail dataset.
<pre><code>
from transformers import pipeline

summarizer = pipeline("summarization")
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
<em>...</em> 
<em>...</em> 
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18."""
</code></pre>

Because the summarization pipeline depends on the PreTrainedModel.generate() method, we can override the default arguments of PreTrainedModel.generate() directly in the pipeline, as shown below for max_length and min_length.

<pre><code>
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
</code></pre>

This outputs the following summary:
>>[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]

<h4>Using a Model and Tokenizer</h4>
Here is an example of doing summarization using a model and a tokenizer. The process is the following:

1. Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as Bart or T5.

2. Define the article that should be summarized.

3. Add the T5-specific prefix “summarize: ”.

4. Use the PreTrainedModel.generate() method to generate the summary.

<pre><code>
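from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for encoder-decoder models
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

# T5 uses a max_length of 512, so we truncate the article to 512 tokens.
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)

# Decode the generated token ids back into text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>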
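The `length_penalty=2.0` and `num_beams=4` arguments above affect how beam search ranks candidate summaries. The sketch below assumes the commonly used length-normalized score, the sum of token log-probabilities divided by `length ** length_penalty`; the function `beam_score` and the toy log-probabilities are illustrative and not taken from the library.

```python
def beam_score(token_logprobs, length_penalty=1.0):
    """Length-normalized score of one beam hypothesis (assumed formula)."""
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

# Two made-up hypotheses: a short one and a longer one with a lower total log-prob.
short_hyp = [-0.5, -0.5]             # 2 tokens, total log-prob -1.0
long_hyp = [-0.4, -0.4, -0.4, -0.4]  # 4 tokens, total log-prob -1.6

for penalty in (0.0, 2.0):
    s = beam_score(short_hyp, penalty)
    l = beam_score(long_hyp, penalty)
    winner = "long" if l > s else "short"
    print(penalty, round(s, 3), round(l, 3), winner)
```

Under this assumption, a penalty of 0.0 compares raw log-probabilities and prefers the short hypothesis, while a penalty of 2.0 divides by the squared length and prefers the long one, which is why values above 1.0 tend to give longer summaries.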

<h2>Fine Tuning</h2>
<h3> Fine-tuning with <a href="https://huggingface.co/transformers/main_classes/trainer.html">Trainer</a></h3>
<pre>
<code>
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

with training_args.strategy.scope():
    model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = TFTrainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()
</code>
</pre>
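The `warmup_steps=500` argument above ramps the learning rate up from zero before any decay begins. The sketch below illustrates the idea with linear warmup followed by linear decay; the decay shape the trainer actually applies may differ, and `base_lr`, `total_steps`, and the function `lr_at` are illustrative assumptions.

```python
def lr_at(step, base_lr=5e-5, warmup_steps=500, total_steps=3000):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return base_lr * step / warmup_steps
    # Decay phase: scale down proportionally to the steps that remain.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

for step in (0, 250, 500, 1750, 3000):
    print(step, lr_at(step))
```

The rate peaks at `base_lr` exactly when warmup ends and returns to zero at `total_steps`.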

<h3>Fine-tuning with native PyTorch</h3>
<pre>
<code>
import torch
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(device)
model.train()

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

optim = AdamW(model.parameters(), lr=5e-5)

for epoch in range(3):
    for batch in train_loader:
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()

model.eval()
</code>
</pre>

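The base GPT-2 model can also be loaded directly to extract hidden states:

<pre><code>
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2', return_dict=True)

# return_tensors="pt" yields PyTorch tensors, matching the PyTorch GPT2Model class
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

# Hidden states of the final layer, one vector per input token
last_hidden_states = outputs.last_hidden_state
</code></pre>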