In [58]:
import tensorflow as tf
from transformers import pipeline, AutoConfig, AutoTokenizer

from transformers.models.gpt2 import TFGPT2Model, TFGPT2LMHeadModel

**Masked language models**

In [5]:
predictor = pipeline("fill-mask", model = "bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [6]:
predictor("This is a black [MASK].")

[{'score': 0.09645822644233704,
  'token': 3482,
  'token_str': 'box',
  'sequence': 'this is a black box.'},
 {'score': 0.09154438227415085,
  'token': 4920,
  'token_str': 'hole',
  'sequence': 'this is a black hole.'},
 {'score': 0.03539649769663811,
  'token': 2338,
  'token_str': 'book',
  'sequence': 'this is a black book.'},
 {'score': 0.012868347577750683,
  'token': 2862,
  'token_str': 'list',
  'sequence': 'this is a black list.'},
 {'score': 0.012549272738397121,
  'token': 2208,
  'token_str': 'game',
  'sequence': 'this is a black game.'}]

In [8]:
predictor("This is a [MASK] cat.")

[{'score': 0.06788135319948196,
  'token': 2502,
  'token_str': 'big',
  'sequence': 'this is a big cat.'},
 {'score': 0.05169161781668663,
  'token': 2304,
  'token_str': 'black',
  'sequence': 'this is a black cat.'},
 {'score': 0.04585549980401993,
  'token': 4968,
  'token_str': 'domestic',
  'sequence': 'this is a domestic cat.'},
 {'score': 0.030389586463570595,
  'token': 2312,
  'token_str': 'large',
  'sequence': 'this is a large cat.'},
 {'score': 0.023371988907456398,
  'token': 3748,
  'token_str': 'wild',
  'sequence': 'this is a wild cat.'}]

In [14]:
predictor("The most appropriate activity for men in their 40s is [MASK].")

[{'score': 0.07647540420293808,
  'token': 5742,
  'token_str': 'swimming',
  'sequence': 'the most appropriate activity for men in their 40s is swimming.'},
 {'score': 0.050409186631441116,
  'token': 13039,
  'token_str': 'hiking',
  'sequence': 'the most appropriate activity for men in their 40s is hiking.'},
 {'score': 0.04389666020870209,
  'token': 3788,
  'token_str': 'walking',
  'sequence': 'the most appropriate activity for men in their 40s is walking.'},
 {'score': 0.036691341549158096,
  'token': 5613,
  'token_str': 'dancing',
  'sequence': 'the most appropriate activity for men in their 40s is dancing.'},
 {'score': 0.03491625934839249,
  'token': 8434,
  'token_str': 'cooking',
  'sequence': 'the most appropriate activity for men in their 40s is cooking.'}]

In [15]:
predictor("The most appropriate activity for women in their 40s is [MASK].")

[{'score': 0.05800117552280426,
  'token': 8434,
  'token_str': 'cooking',
  'sequence': 'the most appropriate activity for women in their 40s is cooking.'},
 {'score': 0.047593854367733,
  'token': 5742,
  'token_str': 'swimming',
  'sequence': 'the most appropriate activity for women in their 40s is swimming.'},
 {'score': 0.038005974143743515,
  'token': 5613,
  'token_str': 'dancing',
  'sequence': 'the most appropriate activity for women in their 40s is dancing.'},
 {'score': 0.03511248901486397,
  'token': 3752,
  'token_str': 'reading',
  'sequence': 'the most appropriate activity for women in their 40s is reading.'},
 {'score': 0.03419367969036102,
  'token': 13039,
  'token_str': 'hiking',
  'sequence': 'the most appropriate activity for women in their 40s is hiking.'}]
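
One way to read the two outputs above side by side is to check which completions appear for only one of the prompts and how the rank of a shared completion shifts. The following is a minimal sketch in plain Python: the token lists are copied from the outputs above (scores omitted), while the helper names `tokens_only_in_first` and `rank_of` are hypothetical and not part of the `transformers` API.

```python
# Top-5 completions copied from the two outputs above, in ranked order.
men_top5 = ["swimming", "hiking", "walking", "dancing", "cooking"]
women_top5 = ["cooking", "swimming", "dancing", "reading", "hiking"]


def tokens_only_in_first(first, second):
    """Return the tokens of `first` that do not occur in `second`, keeping order."""
    second_set = set(second)
    return [tok for tok in first if tok not in second_set]


def rank_of(token, ranked):
    """Return the 1-based rank of `token` in the ranked list."""
    return ranked.index(token) + 1


only_men = tokens_only_in_first(men_top5, women_top5)
only_women = tokens_only_in_first(women_top5, men_top5)
print("only for men:", only_men)
print("only for women:", only_women)
print("rank of 'cooking':", rank_of("cooking", men_top5), "vs", rank_of("cooking", women_top5))
```

To apply the same helpers to live pipeline results, the token lists could be built with something like `[r["token_str"] for r in predictor(prompt)]`, assuming the result format printed above.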

**Sentiment analysis**

In [16]:
sentiment_predictor = pipeline("sentiment-analysis", model = "finiteautomata/bertweet-base-sentiment-analysis")

emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0
Device set to use cpu


In [22]:
sentiment_predictor("You are a great person.")

[{'label': 'LABEL_0', 'score': 0.5667186975479126}]

Note: the execution counter shows that this cell was run as In [22], after `sentiment_predictor` had been reassigned to `distilbert-base-uncased` in In [21]. The `LABEL_0` result above therefore comes from that model, not from the BERTweet sentiment model.

In [19]:
sentiment_predictor("Are you ok?")

[{'label': 'NEU', 'score': 0.9219158291816711}]

In [20]:
sentiment_predictor("You are a mean kid.")

[{'label': 'NEG', 'score': 0.9779518842697144}]

In [21]:
sentiment_predictor = pipeline("sentiment-analysis", model = "distilbert-base-uncased")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device set to use cpu


In [23]:
sentiment_predictor("You are a great person.")

[{'label': 'LABEL_0', 'score': 0.5667186975479126}]

In [24]:
sentiment_predictor("Are you ok?")

[{'label': 'LABEL_0', 'score': 0.5767320990562439}]

In [25]:
sentiment_predictor("You are a mean kid.")

[{'label': 'LABEL_0', 'score': 0.5762676000595093}]
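
All three inputs receive `LABEL_0` with a score near 0.57, which is consistent with the warning above that the classification head of `distilbert-base-uncased` was newly initialized and never trained. As a rough illustration (not the pipeline's actual computation), a two-class softmax over logits that differ only slightly stays close to 0.5. The logit values below are made up for illustration, and `two_class_softmax` is a hypothetical helper.

```python
import math


def two_class_softmax(logit_0, logit_1):
    """Return the softmax probabilities for two logits."""
    # Subtract the maximum for numerical stability.
    m = max(logit_0, logit_1)
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    total = e0 + e1
    return e0 / total, e1 / total


# Hypothetical logits from an untrained head: small and nearly equal.
p0, p1 = two_class_softmax(0.14, -0.14)
print(round(p0, 3), round(p1, 3))
```

Because such a head has not learned anything about sentiment, its label and score say little about the input text.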

In [28]:
gpt2_model = TFGPT2Model(config=AutoConfig.from_pretrained("gpt2"))

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--gpt2/snapshots/607a30d783dfa663caf39e06633721c8d4cfcd7e/config.json
Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_vers

`TFGPT2Model` does not accept raw text as input; the text first has to be converted to token IDs with a tokenizer.

In [34]:
gpt2_model.config

GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.47.1",
  "use_cache": true,
  "vocab_size": 50257
}

In [37]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")

In [40]:
tokenizer.encode("This is a black cat.")

[1212, 318, 257, 2042, 3797, 13]

In [43]:
tokenizer.add_special_tokens({"pad_token": "<|endoftext|>"})

1

In [44]:
tokenizer.special_tokens_map

{'bos_token': '<|endoftext|>',
 'eos_token': '<|endoftext|>',
 'unk_token': '<|endoftext|>',
 'pad_token': '<|endoftext|>'}

In [45]:
tokenizer("This is a black cat.", padding="max_length")

{'input_ids': [1212, 318, 257, 2042, 3797, 13, 50256, 50256, 50256, ...
(output truncated: the six token IDs are followed by a long run of the padding ID 50256)

In [48]:
tokenizer.decode([1212, 318, 257, 2042, 3797, 13, 50256])

'This is a black cat.<|endoftext|>'
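
The padding behaviour shown above can be sketched in plain Python. This is only an illustration of the idea, not the tokenizer's implementation: the helper `pad_to_length` is hypothetical, and it assumes right-padding with the `<|endoftext|>` ID 50256 plus an attention mask that marks real tokens with 1 and padding with 0.

```python
def pad_to_length(token_ids, pad_id, max_length):
    """Right-pad `token_ids` with `pad_id` and build the matching attention mask."""
    if len(token_ids) > max_length:
        raise ValueError("sequence is longer than max_length")
    n_pad = max_length - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * n_pad
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token IDs for "This is a black cat." as returned by tokenizer.encode above.
ids = [1212, 318, 257, 2042, 3797, 13]
padded = pad_to_length(ids, pad_id=50256, max_length=10)
print(padded["input_ids"])
print(padded["attention_mask"])
```

The sketch pads to 10 positions to keep the output short. In the notebook, `padding="max_length"` without an explicit `max_length` pads to 1024 positions, which matches `n_positions` in the config.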

In [50]:
model_input = tokenizer("This is a black cat.", padding="max_length", return_tensors="tf")

In [61]:
# gpt2_model(input_ids=model_input["input_ids"], attention_mask=model_input["attention_mask"])

In [53]:
output = gpt2_model(input_ids=model_input["input_ids"], attention_mask=model_input["attention_mask"])

In [54]:
output.keys()

odict_keys(['last_hidden_state', 'past_key_values'])

In [57]:
output["last_hidden_state"].shape

TensorShape([1, 1024, 768])

In [62]:
gpt2_lm = TFGPT2LMHeadModel(config = AutoConfig.from_pretrained('gpt2'))


In [80]:
# gpt2_lm(input_ids=model_input["input_ids"], attention_mask=model_input["attention_mask"])

In [64]:
result = gpt2_lm(input_ids=tf.constant(model_input["input_ids"]))["logits"]

In [68]:
tf.argmax(result, axis=-1)[0].numpy()

array([ 4823, 17452, 31457, ...,  4477,  4477,  4477])

In [66]:
tokenizer.decode(tf.argmax(result, axis=-1)[0].numpy())

' graph unanim denomin suffer sluggishiron unanim unanimTPTPTPenvenv Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols Nichols modes Nichols Nichols Nichols Nichols Nichols Nichols modes modes Nichols modes modes modes modes modes modes modes modes modes Nichols modes modes modes modes modes Nichols modes modes modes modes modes modes modes modes modes modes modes Stability modes Nichols modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes modes Stability modes modes modes modes modes modes modes modes modes modes m
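
`tf.argmax(result, axis=-1)` picks, for every position, the vocabulary index with the largest logit. The sketch below reproduces that step on a toy example in plain Python; the three-word vocabulary, the logit values, and the helper name `greedy_tokens` are invented for illustration. The gibberish decoded above is what one would expect here, because `TFGPT2LMHeadModel(config=...)` builds the architecture with randomly initialized weights. Loading the trained checkpoint with `TFGPT2LMHeadModel.from_pretrained("gpt2")` should give more sensible predictions.

```python
def greedy_tokens(logits):
    """Return the index of the largest logit for each position (row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical vocabulary and logits of shape (sequence_length, vocab_size).
vocab = ["cat", "black", "."]
logits = [
    [0.1, 2.0, -1.0],   # position 0: "black" has the largest logit
    [1.5, 0.3, 0.2],    # position 1: "cat"
    [-0.5, 0.0, 3.0],   # position 2: "."
]

token_ids = greedy_tokens(logits)
print(token_ids)
print(" ".join(vocab[i] for i in token_ids))
```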

In [79]:
# gpt2_lm.generate(do_sample = True, input_ids=model_input["input_ids"], attention_mask=model_input["attention_mask"], max_length = 2000)
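
With `do_sample=True`, `generate` draws each next token at random from the predicted distribution instead of always taking the most likely one. Below is a minimal sketch of that single sampling step, assuming the probabilities are already computed; the vocabulary, the probabilities, and the helper name `sample_token` are made up for illustration.

```python
import random


def sample_token(probabilities, rng):
    """Draw one index at random, weighted by `probabilities`."""
    indices = list(range(len(probabilities)))
    return rng.choices(indices, weights=probabilities, k=1)[0]


vocab = ["cat", "dog", "box"]
probs = [0.7, 0.2, 0.1]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [vocab[sample_token(probs, rng)] for _ in range(10)]
print(samples)
```

Tokens with higher probability appear more often in `samples`, but less likely tokens can still be drawn, which is why sampled text varies from run to run when no seed is fixed.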

Softmax activation function
$$ \sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} $$

To control how large the differences between the class probabilities are, we divide each logit by a temperature t > 0 before applying softmax. With t = 1 the result is the standard softmax above.

Smaller t (t < 1): increases the differences between probabilities (sharper distribution).

Larger t (t > 1): reduces the differences between probabilities (smoother distribution).

$$ \sigma(z_i) = \frac{e^{z_i / t}}{\sum_{j=1}^{n} e^{z_j / t}} $$
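
The effect of the temperature can be checked numerically. The following sketch implements the formula above in plain Python; the function name `softmax_with_temperature` and the example logits are chosen for illustration only.

```python
import math


def softmax_with_temperature(logits, t=1.0):
    """Softmax of logits / t; t must be positive."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / t for z in logits]
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

For these logits the largest probability shrinks as t grows, so the distribution moves from sharp toward uniform.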