**Install Transformers library**

In [3]:
!pip install transformers


# Sequence Classification

Sequence classification is the task of assigning a sequence to one of a given number of classes.

## Sentiment analysis
Sentiment analysis identifies whether a sequence is positive or negative.

### Pipelines
The `sentiment-analysis` pipeline leverages a model fine-tuned on SST-2, which is a [GLUE](https://gluebenchmark.com/) task.

### glue/sst2

The [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html) consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment (positive/negative) of a given sentence.

In [4]:
from transformers import pipeline

nlp_sc = pipeline('sentiment-analysis')

print(nlp_sc('I love learning new things!'))

print(nlp_sc('I hate unlearning old things!'))

[{'label': 'POSITIVE', 'score': 0.9997938275337219}]
[{'label': 'NEGATIVE', 'score': 0.9741692543029785}]
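
Each pipeline call returns a list with one dictionary per input, holding a `label` and a `score`. As a minimal sketch of working with that structure, the snippet below formats the two results printed above. `describe` is a hypothetical helper introduced for illustration, not part of Transformers.

```python
# Results copied from the pipeline output above.
results = [
    {'label': 'POSITIVE', 'score': 0.9997938275337219},
    {'label': 'NEGATIVE', 'score': 0.9741692543029785},
]


def describe(result):
    """Format one pipeline result as 'LABEL (xx.x%)'. Hypothetical helper."""
    return f"{result['label']} ({result['score'] * 100:.1f}%)"


summaries = [describe(r) for r in results]
for line in summaries:
    print(line)
```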


## Paraphrases
Paraphrase detection is sequence classification in which a model determines whether two sequences are paraphrases of each other.

### glue/mrpc
The [Microsoft Research Paraphrase Corpus](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

### Using a Tokenizer and a Model

In [47]:
import tensorflow as tf

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

Some layers from the model checkpoint at bert-base-cased-finetuned-mrpc were not used when initializing TFBertForSequenceClassification: ['dropout_183']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at bert-base-cased-finetuned-mrpc.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [48]:
classes = ["not paraphrase", "is paraphrase"]

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

In [49]:
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, return_tensors="tf")

not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, return_tensors="tf")


In [50]:
paraphrase_classification_logits = model(paraphrase)
print(paraphrase_classification_logits)

not_paraphrase_classification_logits = model(not_paraphrase)
print(not_paraphrase_classification_logits)

TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-0.3494552,  1.9003875]], dtype=float32)>, hidden_states=None, attentions=None)
TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.5386368, -2.219714 ]], dtype=float32)>, hidden_states=None, attentions=None)


In [51]:
paraphrase_results = tf.nn.softmax(paraphrase_classification_logits[0], axis=1).numpy()[0]
not_paraphrase_results = tf.nn.softmax(not_paraphrase_classification_logits[0], axis=1).numpy()[0]

In [52]:
print(tokenizer.decode(paraphrase["input_ids"][0]))
for i in range(len(classes)):
    print(f"{classes[i]}: {round(paraphrase_results[i] * 100)}%")

print()

print(tokenizer.decode(not_paraphrase["input_ids"][0]))
for i in range(len(classes)):
    print(f"{classes[i]}: {round(not_paraphrase_results[i] * 100)}%")

[CLS] The company HuggingFace is based in New York City [SEP] HuggingFace's headquarters are situated in Manhattan [SEP]
not paraphrase: 10%
is paraphrase: 90%

[CLS] The company HuggingFace is based in New York City [SEP] Apples are especially bad for your health [SEP]
not paraphrase: 94%
is paraphrase: 6%
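
The percentages above come from applying a softmax to the two logits of each output. As a worked check that does not need TensorFlow, the following sketch recomputes them in plain Python from the logits printed earlier. The `softmax` function here is a hand-written stand-in for `tf.nn.softmax`, written under the assumption that a single example is processed at a time.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["not paraphrase", "is paraphrase"]

# Logits copied from the two TFSequenceClassifierOutput values printed earlier.
paraphrase_logits = [-0.3494552, 1.9003875]
not_paraphrase_logits = [0.5386368, -2.219714]

paraphrase_probs = softmax(paraphrase_logits)
not_paraphrase_probs = softmax(not_paraphrase_logits)

for name, prob in zip(classes, paraphrase_probs):
    print(f"{name}: {round(prob * 100)}%")
```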


### Fine-tuning a Model (BERT)

In [55]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfd

from transformers import (
    BertTokenizer,
    TFBertForSequenceClassification,
    glue_convert_examples_to_features)

In [56]:
# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')

data = tfd.load('glue/mrpc')

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:absl:Load dataset info from /root/tensorflow_datasets/glue/mrpc/1.0.0
INFO:absl:Reusing dataset glue (/root/tensorflow_datasets/glue/mrpc/1.0.0)
INFO:absl:Constructing tf.data.Dataset for split None, from /root/tensorflow_datasets/glue/mrpc/1.0.0


In [57]:
data

{'test': <PrefetchDataset shapes: {idx: (), label: (), sentence1: (), sentence2: ()}, types: {idx: tf.int32, label: tf.int64, sentence1: tf.string, sentence2: tf.string}>,
 'train': <PrefetchDataset shapes: {idx: (), label: (), sentence1: (), sentence2: ()}, types: {idx: tf.int32, label: tf.int64, sentence1: tf.string, sentence2: tf.string}>,
 'validation': <PrefetchDataset shapes: {idx: (), label: (), sentence1: (), sentence2: ()}, types: {idx: tf.int32, label: tf.int64, sentence1: tf.string, sentence2: tf.string}>}

In [58]:
# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')



In [59]:
train_dataset

<FlatMapDataset shapes: ({input_ids: (None,), token_type_ids: (None,), attention_mask: (None,)}, ()), types: ({input_ids: tf.int32, token_type_ids: tf.int32, attention_mask: tf.int32}, tf.int64)>

In [60]:
train_dataset = train_dataset.shuffle(100).batch(32)
valid_dataset = valid_dataset.batch(32)

In [61]:
for sample in train_dataset.batch(1):
    tf.print(sample)
    break

({'attention_mask': [[[1 1 1 ... 0 0 0]
  [1 1 1 ... 0 0 0]
  [1 1 1 ... 0 0 0]
  ...
  [1 1 1 ... 0 0 0]
  [1 1 1 ... 0 0 0]
  [1 1 1 ... 0 0 0]]], 'input_ids': [[[101 1130 1382 ... 0 0 0]
  [101 1130 6036 ... 0 0 0]
  [101 1760 6700 ... 0 0 0]
  ...
  [101 3957 4994 ... 0 0 0]
  [101 13643 117 ... 0 0 0]
  [101 1109 5835 ... 0 0 0]]], 'token_type_ids': [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]}, [[1 1 1 ... 1 1 1]])
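
In the batch above, each `input_ids` row ends in zeros and the matching `attention_mask` row marks real tokens with 1 and padding with 0. A minimal sketch of that padding step, assuming right-padding with ID 0, is shown below. `pad_to_length` is a hypothetical helper, and its truncation is simplified: it just cuts the list, whereas a real tokenizer also keeps the final `[SEP]`.

```python
def pad_to_length(input_ids, max_length, pad_id=0):
    """Truncate or right-pad a list of token IDs and build its attention mask."""
    ids = input_ids[:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# Hypothetical short sequence: 101 is the [CLS] ID seen in the batch above, 102 is [SEP].
ids, mask = pad_to_length([101, 1130, 1382, 102], max_length=8)
print(ids)
print(mask)
```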


In [62]:
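# Prepare training: compile the tf.keras model with an optimizer, loss and metric.
# A small learning rate is set explicitly because Adam's default (1e-3) is usually too large for fine-tuning BERT.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)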
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
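
To make the loss and metric concrete, here is a plain-Python sketch of what sparse categorical cross-entropy with `from_logits=True` and sparse categorical accuracy compute for integer labels. The logits and labels are hypothetical values chosen for illustration, and the functions are hand-written stand-ins rather than the Keras implementations.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def is_correct(logits, label):
    """True when the highest logit sits at the labelled class index."""
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    return predicted == label


# Hypothetical batch: two examples, two classes, integer labels.
batch_logits = [[-0.35, 1.90], [0.54, -2.22]]
batch_labels = [1, 0]

losses = [sparse_categorical_crossentropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
accuracy = sum(is_correct(l, y) for l, y in zip(batch_logits, batch_labels)) / len(batch_labels)
print(mean_loss, accuracy)
```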

In [63]:
# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=1, validation_data=valid_dataset)

In [64]:
!mkdir -p saved_model


In [65]:
model.save_pretrained('saved_model/')

In [66]:
# Load the TensorFlow model
model_tf = TFBertForSequenceClassification.from_pretrained('saved_model/')

Some layers from the model checkpoint at saved_model/ were not used when initializing TFBertForSequenceClassification: ['dropout_479']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at saved_model/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [67]:
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."

In [68]:
inputs_tf = tokenizer.encode_plus(sentence_0, sentence_1, 
                                  add_special_tokens=True, 
                                  return_tensors='tf')
inputs_tf

{'input_ids': <tf.Tensor: shape=(1, 19), dtype=int32, numpy=
array([[  101,  1188,  1844,  1108,  8080,  1114,  1117,  9505,   119,
          102,  1230,  9505,  1127, 12173,  1114,  1142,  1844,   119,
          102]], dtype=int32)>, 'token_type_ids': <tf.Tensor: shape=(1, 19), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
      dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 19), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
      dtype=int32)>}
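
The output above shows how a sentence pair is laid out for BERT: `[CLS]` (101), the first sentence, `[SEP]` (102), the second sentence, and a final `[SEP]`, with `token_type_ids` set to 0 for the first segment and 1 for the second. The sketch below rebuilds that layout from the per-sentence IDs visible in the output. `build_pair` is a hypothetical helper, not the tokenizer's own implementation.

```python
CLS_ID, SEP_ID = 101, 102  # special-token IDs visible in the output above


def build_pair(ids_a, ids_b):
    """Assemble [CLS] A [SEP] B [SEP] with segment IDs and an attention mask."""
    input_ids = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    # Segment 0 covers [CLS], sentence A and the first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# Token IDs of the two sentences, copied from the output above without special tokens.
sentence_0_ids = [1188, 1844, 1108, 8080, 1114, 1117, 9505, 119]
sentence_1_ids = [1230, 9505, 1127, 12173, 1114, 1142, 1844, 119]

encoded = build_pair(sentence_0_ids, sentence_1_ids)
print(encoded["token_type_ids"])
```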

In [69]:
pred_tf = np.argmax(model_tf(inputs_tf['input_ids'], 
                             token_type_ids=inputs_tf['token_type_ids'])[0].numpy())

In [70]:
print("sentence_1 is", "a paraphrase" if pred_tf else "not a paraphrase", "of sentence_0")

sentence_1 is a paraphrase of sentence_0


# Token Classification - Named Entity Recognition

Named Entity Recognition (NER) is the task of classifying tokens according to a class, for example identifying a token as a person, an organisation or a location.

Each token is identified as belonging to one of 9 classes:

* O, Outside of a named entity
* B-MISC, Beginning of a miscellaneous entity right after another miscellaneous entity
* I-MISC, Miscellaneous entity
* B-PER, Beginning of a person’s name right after another person’s name
* I-PER, Person’s name
* B-ORG, Beginning of an organisation right after another organisation
* I-ORG, Organisation
* B-LOC, Beginning of a location right after another location
* I-LOC, Location

### Using Pipeline

In [72]:
from transformers import pipeline

nlp_ner = pipeline('ner')
nlp_ner('Hugging Face is a French company based in New-York.')

[{'end': 2,
  'entity': 'I-ORG',
  'index': 1,
  'score': 0.9970937967300415,
  'start': 0,
  'word': 'Hu'},
 {'end': 7,
  'entity': 'I-ORG',
  'index': 2,
  'score': 0.9345751404762268,
  'start': 2,
  'word': '##gging'},
 {'end': 12,
  'entity': 'I-ORG',
  'index': 3,
  'score': 0.9787060618400574,
  'start': 8,
  'word': 'Face'},
 {'end': 24,
  'entity': 'I-MISC',
  'index': 6,
  'score': 0.9981995820999146,
  'start': 18,
  'word': 'French'},
 {'end': 45,
  'entity': 'I-LOC',
  'index': 10,
  'score': 0.9983047246932983,
  'start': 42,
  'word': 'New'},
 {'end': 46,
  'entity': 'I-LOC',
  'index': 11,
  'score': 0.8913456201553345,
  'start': 45,
  'word': '-'},
 {'end': 50,
  'entity': 'I-LOC',
  'index': 12,
  'score': 0.9979523420333862,
  'start': 46,
  'word': 'York'}]
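
The pipeline reports one entry per word piece, so `Hu`, `##gging` and `Face` appear separately. One way to recover whole entities is to merge consecutive pieces of the same type and slice the original text with their character offsets. The sketch below does this with the fields printed above. `group_entities` is a hypothetical helper written for illustration, and the merging rule (adjacent `index`, same entity type, no `B-` tag) is an assumption rather than the library's own grouping logic.

```python
text = 'Hugging Face is a French company based in New-York.'

# Fields copied from the pipeline output above (scores omitted for brevity).
tokens = [
    {'index': 1, 'entity': 'I-ORG', 'start': 0, 'end': 2},
    {'index': 2, 'entity': 'I-ORG', 'start': 2, 'end': 7},
    {'index': 3, 'entity': 'I-ORG', 'start': 8, 'end': 12},
    {'index': 6, 'entity': 'I-MISC', 'start': 18, 'end': 24},
    {'index': 10, 'entity': 'I-LOC', 'start': 42, 'end': 45},
    {'index': 11, 'entity': 'I-LOC', 'start': 45, 'end': 46},
    {'index': 12, 'entity': 'I-LOC', 'start': 46, 'end': 50},
]


def group_entities(tokens, text):
    """Merge adjacent word pieces of the same entity type into (type, text) spans."""
    groups = []
    previous = None
    for tok in tokens:
        kind = tok['entity'].split('-', 1)[1]
        adjacent = previous is not None and tok['index'] == previous['index'] + 1
        starts_new = tok['entity'].startswith('B-')
        if groups and adjacent and not starts_new and groups[-1]['type'] == kind:
            groups[-1]['end'] = tok['end']  # extend the current span
        else:
            groups.append({'type': kind, 'start': tok['start'], 'end': tok['end']})
        previous = tok
    return [(g['type'], text[g['start']:g['end']]) for g in groups]


entities = group_entities(tokens, text)
print(entities)
```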

### Using a model and a tokenizer



In [73]:
import tensorflow as tf

from transformers import TFAutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

Some layers from the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing TFBertForTokenClassification: ['dropout_147']
- This IS expected if you are initializing TFBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForTokenClassification were initialized from the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.


In [74]:
label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC"    # Location
]

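# Note: the two string literals are joined without a space, so the text reads "veryclose",
# which is why the tokens 'very', '##c', '##lose' appear in the predictions.
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
           "close to the Manhattan Bridge."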

In [75]:
# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))

inputs = tokenizer.encode(sequence, return_tensors="tf")

outputs = model(inputs)

In [31]:
outputs

TFTokenClassifierOutput([('logits',
                          <tf.Tensor: shape=(1, 33, 9), dtype=float32, numpy=
                          array([[[ 9.445715  , -2.5186012 , -1.6298739 , -2.0444107 ,
                                   -2.2338536 , -1.7153221 , -0.43147385, -2.0465662 ,
                                    1.1865822 ],
                                  [ 0.792237  , -2.860722  , -0.7416676 , -3.2710087 ,
                                   -0.81230533, -1.724039  ,  9.034602  , -2.5430124 ,
                                   -0.47171772],
                                  [ 1.9150637 , -2.2887528 ,  0.10271975, -3.2836504 ,
                                   -0.05480344, -1.2911512 ,  6.83962   , -2.345627  ,
                                   -0.75989074],
                                  [ 0.65483934, -2.8548815 , -0.22843987, -3.4153838 ,
                                    0.27152863, -1.3837163 ,  7.8292046 , -2.879648  ,
                                   -0.38423

In [76]:
predictions = tf.argmax(outputs[0], axis=2)

print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy())])

[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
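
The predictions are obtained by taking, for every token, the index of the largest logit and looking it up in `label_list`. As a small worked check, the sketch below applies that rule in plain Python to the first three complete rows of the logits printed above, which correspond to `[CLS]`, `Hu` and `##gging`. The `argmax` function is a hand-written stand-in for `tf.argmax`.

```python
label_list = ["O", "B-MISC", "I-MISC", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

# First three rows of the logits tensor printed above (tokens [CLS], Hu, ##gging).
logit_rows = [
    [9.445715, -2.5186012, -1.6298739, -2.0444107, -2.2338536,
     -1.7153221, -0.43147385, -2.0465662, 1.1865822],
    [0.792237, -2.860722, -0.7416676, -3.2710087, -0.81230533,
     -1.724039, 9.034602, -2.5430124, -0.47171772],
    [1.9150637, -2.2887528, 0.10271975, -3.2836504, -0.05480344,
     -1.2911512, 6.83962, -2.345627, -0.75989074],
]


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


predicted = [label_list[argmax(row)] for row in logit_rows]
print(predicted)
```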


# Language Modeling

Language modeling is the task of fitting a model to a corpus, which can be domain-specific. All popular transformer-based models are trained using a variant of language modeling, e.g. BERT with masked language modeling and GPT-2 with causal language modeling.

Language modeling can also be useful outside of pre-training, for example to shift the model distribution toward a specific domain: take a language model trained on a very large corpus and fine-tune it on a news dataset.

## Masked Language Modeling
Masked language modeling is the task of masking tokens in a sequence with a masking token, and prompting the model to fill that mask with an appropriate token.
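
As a minimal illustration of the masking step, the sketch below replaces one token with a mask token and keeps the original token as the target the model should recover. It assumes whitespace splitting in place of a real tokenizer and a BERT-style `[MASK]` string; `mask_token_at` is a hypothetical helper.

```python
MASK_TOKEN = "[MASK]"  # BERT-style mask token; other models use a different string


def mask_token_at(tokens, position):
    """Return a copy of tokens with one position masked, plus the hidden target."""
    masked = list(tokens)
    target = masked[position]
    masked[position] = MASK_TOKEN
    return masked, target


# Whitespace splitting stands in for a real subword tokenizer here.
tokens = "Hugging Face is a French company based in Paris".split()
masked_tokens, target = mask_token_at(tokens, len(tokens) - 1)
print(" ".join(masked_tokens))
print(target)
```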

### Using pipelines

In [33]:
from transformers import pipeline

nlp_fill = pipeline('fill-mask')
nlp_fill('Hugging Face is a French company based in ' + nlp_fill.tokenizer.mask_token)

[{'score': 0.27758949995040894,
  'sequence': 'Hugging Face is a French company based in Paris',
  'token': 2201,
  'token_str': ' Paris'},
 {'score': 0.14941208064556122,
  'sequence': 'Hugging Face is a French company based in Lyon',
  'token': 12790,
  'token_str': ' Lyon'},
 {'score': 0.045764174312353134,
  'sequence': 'Hugging Face is a French company based in Geneva',
  'token': 11559,
  'token_str': ' Geneva'},
 {'score': 0.04576260223984718,
  'sequence': 'Hugging Face is a French company based in France',
  'token': 1470,
  'token_str': ' France'},
 {'score': 0.04067583009600639,
  'sequence': 'Hugging Face is a French company based in Brussels',
  'token': 6497,
  'token_str': ' Brussels'}]

### Using a model and a tokenizer

In [34]:
import tensorflow as tf
# TFAutoModelForMaskedLM replaces the deprecated TFAutoModelWithLMHead for masked language models
from transformers import TFAutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = TFAutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

Some layers from the model checkpoint at distilbert-base-cased were not used when initializing TFDistilBertForMaskedLM: ['activation_13']
- This IS expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertForMaskedLM were initialized from the model checkpoint at distilbert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForMaskedLM for predictions without further training.


In [35]:
sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."

In [36]:
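# Named input_ids rather than input to avoid shadowing the Python built-in
input_ids = tokenizer.encode(sequence, return_tensors="tf")

# Position of the mask token within the (single) encoded sequence
mask_token_index = tf.where(input_ids == tokenizer.mask_token_id)[0, 1]
mask_token_index

<tf.Tensor: shape=(), dtype=int64, numpy=23>

In [37]:
token_logits = model(input_ids)[0]
mask_token_logits = token_logits[0, mask_token_index, :]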

top_5_tokens = tf.math.top_k(mask_token_logits, 5).indices.numpy()

for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
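
The candidates above are the five vocabulary entries with the highest logits at the mask position. A plain-Python sketch of that top-k selection follows. The scores are hypothetical values over a tiny made-up vocabulary, not real model logits, and `top_k` is a hand-written stand-in for `tf.math.top_k`.

```python
import heapq


def top_k(scores, k):
    """Return the k highest-scoring (token, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical logits over a tiny vocabulary; a real model scores every vocabulary entry.
mask_logits = {
    "reduce": 9.1, "increase": 8.4, "decrease": 8.0,
    "offset": 7.2, "improve": 6.9, "the": 0.5, "banana": -3.0,
}

template = "would help {} our carbon footprint."
best = top_k(mask_logits, 3)
candidates = [template.format(token) for token, _ in best]
for sentence in candidates:
    print(sentence)
```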


## Causal Language Modeling

Causal language modeling is the task of predicting the token following a sequence of tokens. 

__Note__: Transformers also provides a `text-generation` pipeline for causal language modeling; this section uses the tokenizer and model directly.

### Using the tokenizer and model

In [38]:
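import tensorflow as tf
# TFAutoModelForCausalLM replaces the deprecated TFAutoModelWithLMHead for causal language models
from transformers import TFAutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")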

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [39]:
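sequence = "Hugging Face is based in DUMBO, New York City, and is"
# Named input_ids rather than input to avoid shadowing the Python built-in
input_ids = tokenizer.encode(sequence, return_tensors="tf")

In [40]:
# do_sample=True samples each next token, so the generated text differs between runs
generated = model.generate(input_ids, max_length=50, do_sample=True)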

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


In [41]:
resulting_string = tokenizer.decode(generated.numpy()[0])
print(resulting_string)

Hugging Face is based in DUMBO, New York City, and is produced by the film's writer Matt Kindt.<|endoftext|>
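
Generation repeats one step: look at the tokens so far, obtain a distribution over the next token, pick one, and append it until a length limit is reached. The sketch below shows that loop with a toy lookup table standing in for the language model. The table, its probabilities and the `generate` function are all hypothetical; this is not how GPT-2 computes its distribution, only the control flow around it.

```python
import random

# Toy next-token table standing in for a language model (purely hypothetical).
NEXT_TOKEN_PROBS = {
    "Hugging": {"Face": 1.0},
    "Face": {"is": 0.7, "was": 0.3},
    "is": {"based": 0.6, "a": 0.4},
    "was": {"based": 1.0},
    "a": {"company": 1.0},
    "based": {"in": 1.0},
    "in": {"DUMBO": 0.6, "Paris": 0.4},
}


def generate(prompt, max_length, do_sample=False, seed=0):
    """Extend prompt one token at a time, greedily or by sampling."""
    rng = random.Random(seed)
    tokens = list(prompt)
    while len(tokens) < max_length:
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation: treat as end of sequence
            break
        if do_sample:
            next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        else:
            next_token = max(dist, key=dist.get)  # greedy: most probable token
        tokens.append(next_token)
    return tokens


greedy = generate(["Hugging"], max_length=6)
sampled = generate(["Hugging"], max_length=6, do_sample=True)
print(" ".join(greedy))
print(" ".join(sampled))
```

Greedy decoding always returns the same continuation, while sampling can follow lower-probability branches, which is why the text generated with `do_sample=True` varies between runs.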


# Translation

Translation is the task of translating a text from one language to another.

__Note__: Translation is currently supported by `T5` for the following language mappings:
  * English-to-French (`translation_en_to_fr`)
  * English-to-German (`translation_en_to_de`)
  * English-to-Romanian (`translation_en_to_ro`)
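
`T5` selects the translation direction from a text prefix placed in front of the input. A minimal sketch of building such inputs for the three tasks listed above is shown below. The German prefix matches the one used later in this notebook; the French and Romanian prefixes are assumed to follow the same pattern, and `build_t5_input` is a hypothetical helper.

```python
# Task prefixes for the three pipeline task names listed above.
# The French and Romanian strings are assumed to mirror the German one.
T5_PREFIXES = {
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}


def build_t5_input(task, text):
    """Prepend the task prefix that tells T5 which translation to perform."""
    if task not in T5_PREFIXES:
        raise ValueError(f"unsupported task: {task}")
    return T5_PREFIXES[task] + text


prompt = build_t5_input(
    "translation_en_to_de",
    "Hugging Face is a technology company based in New York and Paris",
)
print(prompt)
```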

### Using Pipeline

In [42]:
from transformers import pipeline

# English to French
nlp_en2fr = pipeline('translation_en_to_fr')
nlp_en2fr("HuggingFace is a French company that is based in New York City. HuggingFace's mission is to solve NLP one commit at a time")

[{'translation_text': 'HuggingFace est une entreprise française basée à New York.'}]

### Using a model and a tokenizer

In [43]:
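# TFAutoModelForSeq2SeqLM replaces the deprecated TFAutoModelWithLMHead for encoder-decoder models
from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer

model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")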



All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [44]:
inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", 
                          return_tensors="tf")

In [45]:
outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)

print(outputs)

tf.Tensor(
[[    0 11560  3896  8881   229   236     3 14366 15377   181 11216    16
    368  1060    64  1919     5]], shape=(1, 17), dtype=int32)


In [46]:
resulting_string = tokenizer.decode(outputs.numpy()[0])
print(resulting_string)

<pad> Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.
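
The decoded string starts with `<pad>` because the generated IDs begin with T5's start token (ID 0). Passing `skip_special_tokens=True` to `tokenizer.decode` removes such markers. As a rough plain-Python equivalent, the sketch below strips them from the decoded text; `strip_special_tokens` is a hypothetical helper and the token list is limited to the two markers assumed relevant for T5.

```python
SPECIAL_TOKENS = ("<pad>", "</s>")  # T5's padding and end-of-sequence markers


def strip_special_tokens(text):
    """Remove special-token markers and collapse the leftover whitespace."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return " ".join(text.split())


# Decoded string copied from the output above.
decoded = "<pad> Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris."
cleaned = strip_special_tokens(decoded)
print(cleaned)
```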


**References:**
  * https://huggingface.co/transformers/quickstart.html
  * https://github.com/huggingface/transformers/tree/master/notebooks

**Pretrained models:**
  * https://huggingface.co/transformers/pretrained_models.html