In [21]:
import tensorflow as tf
from transformers import pipeline
print(tf.config.list_physical_devices())

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


# Full Pipeline

In [22]:
classifier = pipeline('sentiment-analysis')

classifier(
    [
        "I am so excited for this HuggingFace course",
        "I hate this so much!"
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceCla

[{'label': 'POSITIVE', 'score': 0.9996844530105591},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

## Preprocessing using a Custom Tokenizer

In [23]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [24]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
# return_tensors: "tf" -> TensorFlow, "pt" -> PyTorch, "np" -> NumPy
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)

{'input_ids': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
array([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662,
        12172,  2607,  2026,  2878,  2166,  1012,   102],
       [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,
            0,     0,     0,     0,     0,     0,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(2, 16), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
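The padded `input_ids` and the `attention_mask` above can be reproduced by hand. The following is a minimal sketch in plain Python, not the tokenizer's actual implementation: `pad_batch` is a hypothetical helper, and the pad ID of 0 is read off the padded row in the output above.

```python
# Token ID lists copied from the tokenizer output above, with the padding removed.
PAD_ID = 0  # assumption: the pad token ID, as seen in the padded row above

unpadded = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662,
     12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]


def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build the attention mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(unpadded)
print(input_ids[1])
print(attention_mask[1])
```

The shorter sentence has 8 real tokens, so it receives 8 pad IDs and 8 zeros in its mask to match the 16-token sentence.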


## Pass through Model

In [25]:
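from transformers import TFAutoModel

# TFAutoModel loads only the base Transformer (no task head), so the
# 'classifier' and 'pre_classifier' weights in the checkpoint go unused.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModel.from_pretrained(checkpoint)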

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertModel: ['classifier', 'pre_classifier', 'dropout_19']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.


In [26]:
outputs = model(**inputs)
# Shape is (batch size, sequence length, hidden size).
print(outputs.last_hidden_state.shape)

(2, 16, 768)


## Model Heads

A model head takes the high-dimensional hidden-state vectors as input and projects them onto a different dimension, typically the number of output labels for the task. Heads are usually composed of one or a few linear layers:

![Model Arch](transformer_and_head.svg)

In this diagram, the model is represented by its embeddings layer and the subsequent layers. The embeddings layer converts each input ID in the tokenized input into a vector that represents the associated token. The subsequent layers manipulate those vectors using the attention mechanism to produce the final representation of the sentences.
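To make the projection concrete, here is a rough NumPy sketch of a sequence-classification head. It is an illustration under stated assumptions, not the library's code: the hidden states and all weights are random stand-ins, the variable names are hypothetical, and the two-linear-layer layout is only loosely modeled on the `pre_classifier` and `classifier` layers named in the loading log above.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, seq_len, hidden_size, num_labels = 2, 16, 768, 2

# Stand-in for outputs.last_hidden_state: random values, not real model output.
hidden_states = rng.normal(size=(batch_size, seq_len, hidden_size))

# Hypothetical, randomly initialized head weights (a trained head would load these).
w_pre = rng.normal(scale=0.02, size=(hidden_size, hidden_size))
b_pre = np.zeros(hidden_size)
w_cls = rng.normal(scale=0.02, size=(hidden_size, num_labels))
b_cls = np.zeros(num_labels)

# Use the vector at position 0 (the [CLS] token) as the sentence representation.
pooled = hidden_states[:, 0, :]                   # shape (2, 768)
pooled = np.maximum(pooled @ w_pre + b_pre, 0.0)  # linear layer + ReLU, shape (2, 768)
logits = pooled @ w_cls + b_cls                   # shape (2, 2): one score per label

print(logits.shape)
```

The head reduces a `(2, 16, 768)` tensor of hidden states to a `(2, 2)` tensor of logits: one row per sentence, one column per label.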


In [27]:
from transformers import TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

print(f"Output Logits: {outputs.logits}")
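# Softmax over the label axis turns each row of logits into probabilities.
for idx, probs in enumerate(tf.nn.softmax(outputs.logits, axis=-1)):
    print(f"----Input: {raw_inputs[idx]}")
    # Look up labels by class index rather than relying on dict ordering.
    for label_id, prob in enumerate(probs):
        print(f"{model.config.id2label[label_id]}: {prob}")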

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_297']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Output Logits: [[-1.5606991  1.6122841]
 [ 4.169231  -3.346447 ]]
----Input: I've been waiting for a HuggingFace course my whole life.
NEGATIVE: 0.04019516333937645
POSITIVE: 0.9598048329353333
----Input: I hate this so much!
NEGATIVE: 0.9994558691978455
POSITIVE: 0.000544184644240886
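The probabilities above follow directly from the logits. As a sanity check, the sketch below recomputes the softmax in plain Python using the logit values printed above; the `softmax` helper is written here for illustration, and the `id2label` mapping is assumed from the label order in the output.

```python
import math

# Logits copied from the output above.
logits = [
    [-1.5606991, 1.6122841],
    [4.169231, -3.346447],
]
# Assumed from the printed label order: index 0 is NEGATIVE, index 1 is POSITIVE.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(row):
    """Exponentiate each logit and normalize so the row sums to 1."""
    shift = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probs = [softmax(row) for row in logits]
# The predicted label is the index with the highest probability.
predictions = [id2label[row.index(max(row))] for row in probs]

for row, label in zip(probs, predictions):
    print(label, [round(p, 4) for p in row])
```

Because softmax preserves ordering, taking the argmax of the raw logits gives the same predicted label as taking it over the probabilities.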


# Loading any Model

In [29]:
from transformers import TFAutoModel

bert_model = TFAutoModel.from_pretrained("bert-base-uncased")
gpt_model = TFAutoModel.from_pretrained("gpt2")

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [32]:
from transformers import BertConfig
# Start from the pretrained config and override the number of encoder layers.
bert_config = BertConfig.from_pretrained("bert-base-uncased", num_hidden_layers=10)
print(bert_config)

BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 10,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
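To get a feel for what changing `num_hidden_layers` does, the sketch below estimates parameter counts from the config values printed above. It is a back-of-the-envelope calculation, not something read from the library: it assumes the standard BERT encoder layout (query, key, value, and output projections, two feed-forward linear layers, and two LayerNorms per layer), and all helper names are hypothetical.

```python
# Values copied from the BertConfig printed above.
config = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 10,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}


def linear_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm_params(size):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


def encoder_layer_params(cfg):
    """Assumed layout: 4 attention projections, 2 feed-forward layers, 2 LayerNorms."""
    h, ff = cfg["hidden_size"], cfg["intermediate_size"]
    attention = 4 * linear_params(h, h) + layer_norm_params(h)
    feed_forward = linear_params(h, ff) + linear_params(ff, h) + layer_norm_params(h)
    return attention + feed_forward


def embedding_params(cfg):
    """Word, position, and token-type embedding tables plus one LayerNorm."""
    h = cfg["hidden_size"]
    rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    return rows * h + layer_norm_params(h)


head_dim = config["hidden_size"] // config["num_attention_heads"]
per_layer = encoder_layer_params(config)
encoder_total = config["num_hidden_layers"] * per_layer

print(f"dimension per attention head: {head_dim}")
print(f"parameters per encoder layer: {per_layer:,}")
print(f"encoder parameters with 10 layers: {encoder_total:,}")
print(f"embedding parameters: {embedding_params(config):,}")
```

Under these assumptions, each encoder layer removed from the config saves roughly 7 million parameters, while the embedding tables are unaffected.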



In [34]:
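from transformers import TFBertModel

# Building the model from a config alone gives randomly initialized weights;
# no pretrained checkpoint is loaded here.
bert_model = TFBertModel(bert_config)

bert_model.save_pretrained("my-bert-model")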

In [35]:
bert_model = TFAutoModel.from_pretrained("./my-bert-model/")

All model checkpoint layers were used when initializing TFBertModel.

Some layers of TFBertModel were not initialized from the model checkpoint at ./my-bert-model/ and are newly initialized: ['bert/encoder/layer_._5/attention/self/key/bias:0', 'bert/encoder/layer_._6/intermediate/dense/bias:0', 'bert/encoder/layer_._3/attention/self/key/kernel:0', 'bert/encoder/layer_._4/attention/output/dense/bias:0', 'bert/encoder/layer_._4/attention/self/value/kernel:0', 'bert/encoder/layer_._8/attention/self/value/kernel:0', 'bert/encoder/layer_._0/output/dense/bias:0', 'bert/encoder/layer_._9/output/LayerNorm/beta:0', 'bert/encoder/layer_._3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_._9/attention/self/key/kernel:0', 'bert/encoder/layer_._9/output/dense/kernel:0', 'bert/embeddings/word_embeddings/weight:0', 'bert/encoder/layer_._5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_._0/attention/output/dense/bias:0', 'bert/encoder/layer_._5/attention/output/dense/kernel:0', 'bert/

In [42]:
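# `tokenizer` and `model` are the DistilBERT tokenizer and sequence-classification
# model loaded earlier. padding=True keeps the batch rectangular if the
# sequences tokenize to different lengths.
sequences = ["Hello!", "Cool.", "Nice!"]
encoded_sequences = tokenizer(sequences, padding=True, return_tensors='tf')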
print(encoded_sequences)
example_2_outputs = model(encoded_sequences)
print(example_2_outputs)

{'input_ids': <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 101, 7592,  999,  102],
       [ 101, 4658, 1012,  102],
       [ 101, 3835,  999,  102]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)>}
TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[-3.7234988,  3.9690614],
       [-4.2218633,  4.580667 ],
       [-4.285249 ,  4.6165533]], dtype=float32)>, hidden_states=None, attentions=None)
