# Models

In [1]:
from transformers import BertConfig, TFBertModel

Get Model Config

In [2]:
# Getting Model Configuration
config = BertConfig()
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.35.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
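To get a feel for what these fields imply, here is a rough sketch that derives the per-head dimension and an approximate parameter count from the values printed above. The `count_bert_params` helper is hypothetical (it is not part of `transformers`), and it assumes the standard BERT encoder layout plus the pooling layer, so treat the total as an estimate.

```python
# Values copied from the printed BertConfig above.
cfg = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}

# Each attention head works on an equal slice of the hidden vector.
head_size = cfg["hidden_size"] // cfg["num_attention_heads"]


def count_bert_params(c):
    """Hypothetical helper: estimate weights + biases of a base BERT encoder."""
    h, ff = c["hidden_size"], c["intermediate_size"]
    layer_norm = 2 * h  # scale and shift vectors
    embeddings = (
        c["vocab_size"] + c["max_position_embeddings"] + c["type_vocab_size"]
    ) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # query, key, value, output
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + c["num_hidden_layers"] * (attention + feed_forward) + pooler


total_params = count_bert_params(cfg)
print(head_size)
print(f"{total_params:,}")
```

With the default configuration this works out to 64 dimensions per head and roughly 109 million parameters, most of them in the 12 encoder layers.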



Build Model from Config

In [3]:
# Build a model from the config above

model = TFBertModel(config)

The model above has no trained weights: it is initialized with random weights according to the configuration, so its output will be gibberish.

Load Pre-Trained Weights

In [4]:
model = TFBertModel.from_pretrained("bert-base-cased")

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

Save Model to a Local Directory

In [5]:
model.save_pretrained('checkpoints')

Let's use these weights and produce some output.

In [7]:
sequences = ["Hello!", "Cool.", "Nice!"]

These raw strings can't be fed to the model directly. The model needs a rectangular numerical array as input, which a tokenizer can produce.

In [8]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoded_sequence = tokenizer(sequences)
print(encoded_sequence)

{'input_ids': [[101, 8667, 106, 102], [101, 13297, 119, 102], [101, 8835, 106, 102]], 'token_type_ids': [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]}
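All three sequences above happen to tokenize to four IDs each, so the batch is already rectangular. When lengths differ, the shorter sequences have to be padded; the tokenizer can do this itself with `tokenizer(sequences, padding=True)`. The plain-Python sketch below only illustrates the idea. The `pad_batch` helper and the second example sequence are made up for illustration, and the pad ID of 0 is taken from `pad_token_id` in the config shown earlier.

```python
def pad_batch(batch_ids, pad_id=0):
    """Hypothetical helper: right-pad ID lists to the longest one."""
    max_len = max(len(ids) for ids in batch_ids)
    padded, mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        mask.append([1] * len(ids) + [0] * n_pad)
    return padded, mask


# First row is "Hello!" from the output above; the second row is an
# illustrative shorter sequence (special tokens 101 and 102 only).
ragged = [[101, 8667, 106, 102], [101, 102]]
input_ids, attention_mask = pad_batch(ragged)
print(input_ids)
print(attention_mask)
```

When padding is present, the attention mask should be passed to the model along with the input IDs so that the padded positions are ignored.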


Let's give these input IDs to the model and see the output.

In [10]:
import tensorflow as tf

model_input = tf.constant(encoded_sequence.input_ids)
print(model_input)

tf.Tensor(
[[  101  8667   106   102]
 [  101 13297   119   102]
 [  101  8835   106   102]], shape=(3, 4), dtype=int32)


In [11]:
output = model(model_input)
print(output)

TFBaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=<tf.Tensor: shape=(3, 4, 768), dtype=float32, numpy=
array([[[ 0.62828964,  0.21656758,  0.560512  , ...,  0.01361646,
          0.615793  , -0.17120235],
        [ 0.6108095 , -0.22526626,  0.92628855, ..., -0.30280808,
          0.4499951 , -0.07135769],
        [ 0.8039674 ,  0.18094489,  0.70756704, ..., -0.0684973 ,
          0.48369724, -0.07738485],
        [ 1.3290284 ,  0.2359512 ,  0.4566581 , ...,  0.1508819 ,
          0.9621055 , -0.48411724]],

       [[ 0.31276435,  0.17181471,  0.20987779, ..., -0.0721087 ,
          0.4918776 , -0.13833432],
        [ 0.15445176, -0.37572706,  0.7187129 , ..., -0.31295148,
          0.2821989 ,  0.18830812],
        [ 0.41229045,  0.37207627,  0.54835176, ...,  0.07883389,
          0.56807595, -0.27571857],
        [ 0.8356334 ,  0.39642993, -0.41206455, ...,  0.183796  ,
          1.6364969 , -0.4806332 ]],

       [[ 0.53993607,  0.25642586,  0.25112194, ..., -0.176017

In [12]:
print(output.last_hidden_state.shape)

(3, 4, 768)


These outputs are hidden states (one 768-dimensional vector per token), not final predictions. Further postprocessing is needed to extract meaning from them; it is covered in upcoming notebooks of this course.
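As one possible illustration of such a step, the sketch below takes the vector at the first token position of each sequence in `last_hidden_state` (the `[CLS]` token, ID 101) and passes it through a small linear layer followed by a softmax. Everything here is hypothetical: the 3-dimensional vectors stand in for the real 768-dimensional ones, and the weights are made up rather than trained.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vector, weights, bias):
    """Compute weights @ vector + bias for a list-based matrix."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# Toy stand-in for last_hidden_state: 2 sequences, 2 tokens, 3 features.
hidden = [
    [[0.6, 0.2, 0.5], [0.6, -0.2, 0.9]],
    [[0.3, 0.1, 0.2], [0.1, -0.3, 0.7]],
]

# Made-up (untrained) weights of a 2-class head.
weights = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.0]

# Position 0 of every sequence holds the [CLS] vector.
cls_vectors = [sequence[0] for sequence in hidden]
probabilities = [softmax(linear(v, weights, bias)) for v in cls_vectors]
print(probabilities)
```

In practice a trained task-specific head plays the role of the made-up weights here; the sketch only shows the general shape of the computation.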