## Creating a BERT Transformer model

### Load and Save a model

In [1]:
from transformers import BertConfig, TFBertModel

# Build the configuration object (default BERT settings)
config = BertConfig()

# Building the model from the config
model = TFBertModel(config)

In [2]:
# The configuration contains many attributes that are used to build the model:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.41.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



* the hidden_size attribute defines the size of the hidden_states vector  
* num_hidden_layers defines the number of layers the Transformer model has
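
As a rough check on how these attributes determine the model's size, the parameter count can be worked out from the configuration values alone. The sketch below is plain Python arithmetic, not a call into transformers. The function name `bert_parameter_count` is made up for illustration, and the breakdown assumes the standard BERT layout (embeddings, identical encoder layers and a pooler) with no task-specific head.

```python
def bert_parameter_count(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    type_vocab_size=2,
):
    """Estimate the number of weights in a BERT encoder (hypothetical helper)."""
    layer_norm = 2 * hidden_size  # one scale and one bias per hidden unit

    # Word, position and token-type embedding tables, followed by a LayerNorm.
    embeddings = (
        (vocab_size + max_position_embeddings + type_vocab_size) * hidden_size
        + layer_norm
    )

    # A dense layer has in_dim * out_dim weights plus out_dim biases.
    def dense(in_dim, out_dim):
        return in_dim * out_dim + out_dim

    # Query, key, value and output projections, followed by a LayerNorm.
    attention = 4 * dense(hidden_size, hidden_size) + layer_norm
    # Expansion to intermediate_size and back, followed by a LayerNorm.
    feed_forward = (
        dense(hidden_size, intermediate_size)
        + dense(intermediate_size, hidden_size)
        + layer_norm
    )
    encoder = num_hidden_layers * (attention + feed_forward)

    pooler = dense(hidden_size, hidden_size)
    return embeddings + encoder + pooler


# Each attention head works on hidden_size / num_attention_heads dimensions.
head_dim = 768 // 12

print(head_dim)                # 64
print(bert_parameter_count())  # 109482240
```

With the default values printed above, this comes to roughly 110 million parameters. Note that num_attention_heads does not change the total; it only controls how the hidden_size dimensions are split between heads.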

Building the model from a configuration, as above, initialises all of its weights with random values.  
The model runs in this state, but its output is meaningless until it has been trained.

Let's use a pre-trained model:

In [1]:
from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-cased")

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from
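
The warning above is expected in this case. Every unused weight name starts with `cls.`, which is where the checkpoint stores its pre-training heads (masked-language-modelling under `cls.predictions`, next-sentence-prediction under `cls.seq_relationship`). The bare TFBertModel has no such heads, so there is nothing to load those weights into. A quick way to confirm this, using only the names the log prints (a plain-Python sketch, not part of the transformers API):

```python
# Weight names copied from the warning in the log above.
unused_weights = [
    "cls.predictions.transform.LayerNorm.weight",
    "cls.seq_relationship.weight",
    "cls.predictions.transform.dense.weight",
    "cls.predictions.transform.LayerNorm.bias",
    "cls.predictions.transform.dense.bias",
    "cls.predictions.bias",
    "cls.seq_relationship.bias",
]

# Reduce each name to its first two components to see which heads are involved.
heads = sorted({".".join(name.split(".")[:2]) for name in unused_weights})

print(all(name.startswith("cls.") for name in unused_weights))  # True
print(heads)  # ['cls.predictions', 'cls.seq_relationship']
```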

"This model is now initialized with all the weights of the checkpoint.  
It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task.  
By training with pretrained weights rather than from scratch, we can quickly achieve good results."

In [2]:
# Save the model locally with:
# model.save_pretrained("directory_on_my_computer")

# This writes two files:
# config.json
# tf_model.h5

# config.json holds the attributes needed to rebuild the model’s architecture.
# tf_model.h5 is known as the state dictionary; it contains all of the model’s weights.
# The two files go hand in hand: the configuration describes the architecture,
# while the weights are the model’s parameters.
# Reload later with: TFBertModel.from_pretrained("directory_on_my_computer")

### Inference with a model

"Transformer models can only process numbers — numbers that the tokenizer generates. But before we discuss tokenizers, let’s explore what inputs the model accepts."  

* The tokenizer takes text as input, splits it into tokens, and maps each token to a number (its input ID). The IDs are then packed into the tensor type that the model's framework requires:
    * A list of encoded sequences: a list of lists (of numbers)
    * Converted into a tensor
    * Submitted as input to the model

In [3]:
sequences = ["Hello!", "Cool.", "Nice!"]

# passing these through a tokenizer yields lists of input IDs:
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

# into tensors:
import tensorflow as tf

model_inputs = tf.constant(encoded_sequences)
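
A tensor has to be rectangular, so the conversion above only works because every encoded sequence has the same length. The following check is a minimal plain-Python sketch of that requirement; `batch_shape` is a hypothetical helper, not a TensorFlow function. It also shows that each sequence starts with ID 101 and ends with ID 102, which in the standard BERT vocabularies are the special [CLS] and [SEP] tokens.

```python
def batch_shape(batch):
    """Return (num_sequences, sequence_length), or raise if the batch is ragged."""
    lengths = {len(sequence) for sequence in batch}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return len(batch), lengths.pop()


encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]

print(batch_shape(encoded_sequences))  # (3, 4)

# Every sequence is wrapped in the same start and end IDs.
print({(seq[0], seq[-1]) for seq in encoded_sequences})  # {(101, 102)}
```

Sequences of different lengths would make `batch_shape` raise, which mirrors why a ragged list of lists cannot be turned directly into a regular tensor.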

In [4]:
output = model(model_inputs)  # output.last_hidden_state has shape (3, 4, 768): (batch, sequence length, hidden_size)