### Loading the model

In [1]:
from transformers import BertConfig, TFBertModel

# Building the config
config = BertConfig()

# Building the model from the config (weights are randomly initialized)
model = TFBertModel(config)


In [2]:
config

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
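
As a rough sanity check on what these defaults imply, the sketch below derives the per-head dimension and an approximate parameter count from the printed values. The helper `count_bert_params` is hypothetical (it is not part of transformers) and assumes the standard BERT encoder layout with a pooler and no task head; treat the result as an estimate to compare against the model itself.

```python
# Values copied from the BertConfig printed above.
hidden_size = 768
num_attention_heads = 12
num_hidden_layers = 12
intermediate_size = 3072
max_position_embeddings = 512
type_vocab_size = 2
vocab_size = 30522

# Each attention head works on an equal slice of the hidden vector.
head_dim = hidden_size // num_attention_heads


def count_bert_params():
    """Hypothetical helper: estimate weights assuming the standard BERT layout."""
    layer_norm = 2 * hidden_size  # scale and bias vectors
    # Word, position and token-type embedding tables plus one LayerNorm.
    embeddings = (
        (vocab_size + max_position_embeddings + type_vocab_size) * hidden_size
        + layer_norm
    )
    # Query, key, value and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden) plus LayerNorm.
    feed_forward = (
        hidden_size * intermediate_size + intermediate_size
        + intermediate_size * hidden_size + hidden_size
        + layer_norm
    )
    # Dense layer applied to the first token's hidden state.
    pooler = hidden_size * hidden_size + hidden_size
    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


print(head_dim)
print(count_bert_params())
```

Under these assumptions, most of the weights sit in the 12 encoder layers, with the word-embedding table as the next largest block.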

### Loading a pretrained model

In [4]:
# Load the pretrained weights (downloaded on first use, then cached)
model = TFBertModel.from_pretrained("bert-base-cased")
model

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


<transformers.models.bert.modeling_tf_bert.TFBertModel at 0x291ef4640>

### Inputs

In [5]:
sequences = ["Hello!", "Cool.", "Nice!"]

In [6]:
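# Token IDs for the sequences above; 101 and 102 are BERT's [CLS] and [SEP] tokens.
# Note: these IDs appear to come from an uncased BERT vocabulary, so regenerate them
# with the tokenizer that matches the checkpoint before relying on the outputs.
encoded_sequences = [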
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
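
All three sequences encode to four tokens, so the nested list is already rectangular. A ragged batch cannot be converted to a dense tensor by `tf.constant`, so a small check like the one below can catch the problem early. The helper `is_rectangular` is hypothetical and is only a sketch; it is not part of transformers or TensorFlow.

```python
def is_rectangular(batch):
    """Hypothetical helper: True if every sequence in the batch has the same length."""
    return len({len(seq) for seq in batch}) <= 1


encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

# A made-up ragged batch, used only to illustrate the failing case.
ragged = [[101, 7592, 999, 102], [101, 102]]

print(is_rectangular(encoded_sequences))
print(is_rectangular(ragged))
```

In practice, ragged batches are usually made rectangular by padding the shorter sequences with the `pad_token_id` from the config (0 here).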

In [8]:
import tensorflow as tf

# Convert the nested list of token IDs into a (batch, seq_len) tensor
model_inputs = tf.constant(encoded_sequences)
model_inputs

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 101, 7592,  999,  102],
       [ 101, 4658, 1012,  102],
       [ 101, 3835,  999,  102]], dtype=int32)>
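
The tensor's shape reads as (batch size, sequence length). Since the base model returns one hidden vector per input token, the shape of its `last_hidden_state` can be predicted from the input shape and the checkpoint's `hidden_size` (768 here). The helper `expected_hidden_shape` below is hypothetical and only illustrates that arithmetic.

```python
def expected_hidden_shape(input_shape, hidden_size=768):
    """Hypothetical helper: shape of the per-token hidden states for a given input."""
    batch_size, seq_len = input_shape
    return (batch_size, seq_len, hidden_size)


print(expected_hidden_shape((3, 4)))
```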

In [9]:
# Forward pass; last_hidden_state has shape (batch, seq_len, hidden_size)
output = model(model_inputs)
output

TFBaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=<tf.Tensor: shape=(3, 4, 768), dtype=float32, numpy=
array([[[ 4.4495693e-01,  4.8276237e-01,  2.7797231e-01, ...,
         -5.4032277e-02,  3.9393473e-01, -9.4769850e-02],
        [ 2.4942882e-01, -4.4092998e-01,  8.1772339e-01, ...,
         -3.1916562e-01,  2.2992229e-01, -4.1171607e-02],
        [ 1.3667561e-01,  2.2517815e-01,  1.4502043e-01, ...,
         -4.6914484e-02,  2.8224206e-01,  7.5565889e-02],
        [ 1.1788861e+00,  1.6738467e-01, -1.8187001e-01, ...,
          2.4671446e-01,  1.0440780e+00, -6.1970316e-03]],

       [[ 3.6435831e-01,  3.2464463e-02,  2.0257670e-01, ...,
          6.0110882e-02,  3.2451272e-01, -2.0995531e-02],
        [ 7.1865952e-01, -4.8725182e-01,  5.1740390e-01, ...,
         -4.4011989e-01,  1.4553049e-01, -3.7544839e-02],
        [ 3.3223230e-01, -2.3270883e-01,  9.4875634e-02, ...,
         -2.5268146e-01,  3.2171986e-01,  8.1131514e-04],
        [ 1.2523218e+00,  3.5754350e-01,