In [2]:
from transformers import pipeline



In [3]:
# Get a trained sentiment-analysis classifier

classifier = pipeline("sentiment-analysis") # many other tasks are available
result = classifier("The actors were very convincing")
result

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


[{'label': 'POSITIVE', 'score': 0.9998143315315247}]
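
The `score` in the result above is a probability. The sketch below shows one way such a value can arise, assuming the model emits one raw logit per class and that a softmax is applied before the most likely class is reported. The logits are hypothetical, not the model's actual outputs, and `softmax` and `top_label` are illustrative helper names rather than library functions.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most likely label and its probability, in the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for a two-class sentiment model (assumed label order).
example = top_label([-4.2, 4.4], {0: "NEGATIVE", 1: "POSITIVE"})
print(example)
```

A large gap between the two logits pushes the winning probability close to 1, which is why a confident prediction shows a score such as 0.9998.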

In [19]:
# Example of bias: a neutral statement of political affiliation is scored as negative

classifier(["I am Democrat"])

[{'label': 'NEGATIVE', 'score': 0.9240571856498718}]
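
A single sentence is weak evidence of bias. One way to make the check more systematic is to hold the sentence fixed and vary only the group term. The helper below is a sketch under stated assumptions: `probe_bias`, the template, and the term list are hypothetical, and `classifier` is assumed to accept a list of strings and return one dict with `label` and `score` keys per string, as in the cells above.

```python
def probe_bias(classifier, template, terms):
    """Score one sentence per term and return a signed score for each term.

    The sign is positive for a POSITIVE label and negative otherwise,
    so scores for different terms can be compared directly.
    """
    sentences = [template.format(term=term) for term in terms]
    results = classifier(sentences)
    signed = {}
    for term, result in zip(terms, results):
        sign = 1.0 if result["label"] == "POSITIVE" else -1.0
        signed[term] = sign * result["score"]
    return signed
```

With the classifier from the earlier cell, a call might look like `probe_bias(classifier, "I am a {term}", ["Democrat", "Republican"])`. Large differences between terms in an otherwise identical sentence suggest that the model has learned an association with the term itself.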

## Another example: generating text

In [1]:
# Load the pre-trained OpenAI GPT model (TensorFlow version)

from transformers import TFOpenAIGPTLMHeadModel

model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFOpenAIGPTLMHeadModel: ['h.7.attn.bias', 'h.0.attn.bias', 'h.4.attn.bias', 'h.5.attn.bias', 'h.10.attn.bias', 'h.8.attn.bias', 'h.9.attn.bias', 'h.6.attn.bias', 'h.11.attn.bias', 'h.3.attn.bias', 'h.2.attn.bias', 'h.1.attn.bias']
- This IS expected if you are initializing TFOpenAIGPTLMHeadModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFOpenAIGPTLMHeadModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).

In [2]:
# Load the tokenizer that matches the model

from transformers import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.


In [4]:
print(tokenizer("hello everyone"))

{'input_ids': [3570, 1473], 'attention_mask': [1, 1]}


In [5]:
prompt_text = "This royal throne of kings, this sceptred isle"
# Encode the prompt as a (1, sequence_length) TensorFlow tensor of token ids
encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="tf")
encoded_prompt

<tf.Tensor: shape=(1, 10), dtype=int32, numpy=
array([[  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187]], dtype=int32)>

In [6]:
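# Sample 5 continuations of prompt_text, each adding up to 40 new tokens

num_sequences = 5
length = 40

generated_sequences = model.generate(
    input_ids=encoded_prompt,
    do_sample=True,  # sample from the distribution instead of greedy decoding
    max_length=length + len(encoded_prompt[0]),  # prompt tokens + new tokens
    temperature=1.0,  # 1.0 leaves the distribution unscaled
    top_k=0,  # 0 disables top-k filtering
    top_p=0.9,  # nucleus sampling: keep the most likely tokens covering 90% of the mass
    repetition_penalty=1.0,  # 1.0 applies no penalty
    num_return_sequences=num_sequences,
)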

generated_sequences

<tf.Tensor: shape=(5, 50), dtype=int32, numpy=
array([[  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,   240,   668,   781,   481,  3032,   240,   488,   781,
          487,  1072,   507,   715,   513,   756,   239,   487,   603,
          485,   513,   240,   244,   547,  2021,   240,   812,   512,
          851,   481,  2817,  3859,   481,  1119,   498,   246,   618,
          257,   488,   674,   812,   512],
       [  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,   267,   520,   636,   580,   481, 22685,   239, 40477,
          244,   921,   481,  1813,   597,   239,   244, 40477,   491,
          929,   240, 20991,   866,   481,  1002,  5740,   485,   513,
         1173,   240,   674,   481,  2216,  1351,   485,  2071,   239,
          998,   507,  2337,  1879,   240],
       [  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,   509,   246,  3458,  1101, 15834,   240,   488,   487,
         1313
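
The `top_p=0.9` setting used in the generation call selects nucleus sampling. The NumPy sketch below illustrates the idea for a single decoding step: sort the tokens by probability, keep the smallest set whose cumulative probability reaches `top_p`, renormalize, and sample from that set. It is a simplified illustration, not the library's implementation, and `top_p_filter` and `sample_next_token` are hypothetical helper names.

```python
import numpy as np


def top_p_filter(logits, top_p=0.9, temperature=1.0):
    """Return a renormalized distribution restricted to the top-p nucleus."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the maximum for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()

    order = np.argsort(-probs)  # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep every token up to and including the first one that brings
    # the cumulative probability to top_p or beyond.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


def sample_next_token(logits, top_p=0.9, temperature=1.0, rng=None):
    """Draw one token id from the nucleus of the distribution."""
    if rng is None:
        rng = np.random.default_rng()
    probs = top_p_filter(logits, top_p, temperature)
    return int(rng.choice(len(probs), p=probs))
```

Because the low-probability tail is removed before sampling, the continuations stay varied (each of the 5 sequences differs) while very unlikely tokens are never chosen.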

In [7]:
# Decode each generated sequence of token ids back into text

for sequence in generated_sequences:
    text = tokenizer.decode(sequence, clean_up_tokenization_spaces=True)
    print(text)
    print("-" * 80)

this royal throne of kings, this sceptred isle, just before the battle, and before he put it over her head. he said to her, " my child, will you let the metal accept the love of a king? and then will you
--------------------------------------------------------------------------------
this royal throne of kings, this sceptred isle! she would be the regent. 
 " take the ring now. " 
 at first, melaina thought the voice belonged to her mother, then the figure began to speak. though it sounded human,
--------------------------------------------------------------------------------
this royal throne of kings, this sceptred isle was a godless entity, and he held it the god's right to command a council which did not open and to speak, nor indeed discuss, or take part in a discussion which would make anything
--------------------------------------------------------------------------------
this royal throne of kings, this sceptred isle of the angaraks as home for the forest ; now nor have their 
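
Because generation stops once `max_length` tokens are reached, the decoded samples usually end mid-sentence. One possible post-processing step is to cut each sample after its last sentence terminator. The helper below is a naive sketch: `trim_to_last_sentence` is a hypothetical name, and it ignores cases such as closing quotation marks or abbreviations.

```python
def trim_to_last_sentence(text, terminators=".!?"):
    """Cut text after its last sentence terminator.

    The text is returned unchanged if it contains no terminator.
    """
    last = max(text.rfind(ch) for ch in terminators)
    return text if last == -1 else text[: last + 1]
```

It could be applied to each decoded string inside the decoding loop, for example `print(trim_to_last_sentence(text))`.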