In [None]:
!pip install transformers

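# We can build a BERT model with the `transformers` library directly from a configuration and train the whole transformer on our custom dataset. With this method no pretrained weights are loaded: the model starts from randomly initialized weights, so it has to be trained from scratch.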

In [9]:
from transformers import BertConfig, TFBertModel

# Building the config
config = BertConfig()

# Building the model from the config
model = TFBertModel(config)


In [10]:
config

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.29.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
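The config values above fully determine the size of the model, so the parameter count can be worked out by hand. The sketch below is plain Python (no `transformers` needed). It assumes the standard BERT layout: three embedding tables plus a LayerNorm, `num_hidden_layers` identical encoder layers, and the pooler layer that `TFBertModel` adds on top of the `[CLS]` token. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                     intermediate_size=3072, max_position_embeddings=512,
                     type_vocab_size=2, with_pooler=True):
    """Count the trainable parameters of a standard BERT encoder from config values."""
    h, i = hidden_size, intermediate_size
    layer_norm = 2 * h  # LayerNorm has a scale (gamma) and a shift (beta) vector

    # Embeddings: word + position + token-type tables, followed by one LayerNorm
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm

    # One encoder layer: self-attention block + feed-forward block
    attention = 3 * (h * h + h)             # query, key, value projections (weights + biases)
    attention += (h * h + h) + layer_norm   # attention output projection + LayerNorm
    feed_forward = (h * i + i) + (i * h + h) + layer_norm  # intermediate + output + LayerNorm
    per_layer = attention + feed_forward

    pooler = (h * h + h) if with_pooler else 0  # dense layer applied to the [CLS] vector
    return embeddings + num_hidden_layers * per_layer + pooler


print(bert_param_count())                  # default BertConfig values -> 109482240
print(bert_param_count(vocab_size=28996))  # bert-base-cased vocabulary size -> 108310272
```

Under these assumptions the default `BertConfig` gives about 109.5 million parameters. `bert-base-cased` uses a smaller vocabulary (28,996 tokens), so its count is slightly lower; comparing with `model.count_params()` on a built model should confirm whether the assumed layout matches.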

# The method below loads the weights of a pretrained model, so we can do transfer learning and fine-tune on our custom dataset with much less training time.

In [16]:
from transformers import TFBertModel, AutoTokenizer

model = TFBertModel.from_pretrained("bert-base-cased")

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [12]:
model.save_pretrained("./mymodel")

This saves the model (its configuration and weights) to the local directory `./mymodel`.

In [15]:
%ls mymodel

config.json  tf_model.h5


It saved two files: `config.json`, which describes the model architecture (the same fields as the config printed above), and `tf_model.h5`, which contains all the weights of the pretrained BERT model.
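Because `config.json` is plain JSON, the saved architecture can be inspected without loading the model at all. A minimal sketch using only the standard library; the helper name `describe_saved_model` is hypothetical, and the field names are the ones shown in the `BertConfig` printout above.

```python
import json
from pathlib import Path


def describe_saved_model(model_dir="./mymodel"):
    """Return a few architecture fields from the config.json written by save_pretrained."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    keys = ("model_type", "num_hidden_layers", "num_attention_heads",
            "hidden_size", "vocab_size")
    # .get() returns None instead of raising if a field is missing from the file
    return {key: config.get(key) for key in keys}
```

In this notebook, `describe_saved_model("./mymodel")` would read the file written by `save_pretrained` above.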

In [17]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

In [27]:
tokens = tokenizer("hello dilip pokhrel", return_tensors="tf", padding=True)

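Tokenize the input sentence. `return_tensors="tf"` returns TensorFlow tensors, and `padding=True` pads every sentence in a batch to the length of the longest one (with a single sentence, nothing is padded).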
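What `padding=True` and the `attention_mask` do can be sketched in plain Python. This mimics padding to the longest sequence on the right, using the `pad_token_id` of 0 from the config; it is an illustration, not the tokenizer's actual implementation, and `pad_batch` is a hypothetical name. The shorter id lists in the second example are made up for illustration.

```python
def pad_batch(batch_ids, pad_token_id=0):
    """Pad token-id lists to the longest sequence and build matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token the model should attend to, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# A single sentence (as in the cell above): nothing to pad, the mask is all ones
single = pad_batch([[101, 19082, 4267, 10913, 185, 5926, 8167, 1883, 102]])

# Two sentences of different lengths: the shorter one gets pad ids and mask zeros
pair = pad_batch([[101, 19082, 102], [101, 19082, 4267, 10913, 102]])
print(pair["input_ids"])       # [[101, 19082, 102, 0, 0], [101, 19082, 4267, 10913, 102]]
print(pair["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```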

In [28]:
tokens

{'input_ids': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=
array([[  101, 19082,  4267, 10913,   185,  5926,  8167,  1883,   102]],
      dtype=int32)>, 'token_type_ids': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=array([[0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}

In [23]:
outputs = model(tokens)

Pass the tokens through the model to get a high-dimensional contextual vector for every token in the input sentence.
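Note that `last_hidden_state` holds one vector per token, not one per sentence. A common way to turn it into a single sentence vector is masked mean pooling: average the token vectors while skipping padded positions. (Another option is the `[CLS]` token's vector, which `TFBertModel` also exposes, after an extra dense layer, as `outputs.pooler_output`.) The sketch below is a swapped-in illustration using plain Python lists and a toy hidden size of 2 instead of TensorFlow tensors; `mean_pool` is a hypothetical helper.

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors into one vector per sentence, ignoring padded positions.

    last_hidden_state: nested lists shaped (batch, seq_len, hidden)
    attention_mask:    nested lists shaped (batch, seq_len), 1 = real token, 0 = padding
    """
    pooled = []
    for token_vectors, mask in zip(last_hidden_state, attention_mask):
        hidden = len(token_vectors[0])
        total = [0.0] * hidden
        count = 0
        for vector, keep in zip(token_vectors, mask):
            if keep:  # only real tokens contribute to the average
                total = [t + v for t, v in zip(total, vector)]
                count += 1
        pooled.append([t / max(count, 1) for t in total])  # max() avoids division by zero
    return pooled


# Toy batch: 1 sentence, 3 tokens, hidden size 2 (BERT-base uses 768).
# The last token is padding, so its large values are ignored.
states = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(mean_pool(states, mask))  # [[2.0, 3.0]]
```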

In [25]:
outputs.last_hidden_state.shape

TensorShape([1, 9, 768])

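Looking at the shape of the output: the input has only three words, yet the sequence length is 9. The tokenizer adds a special `[CLS]` token at the start (id 101) and a `[SEP]` token at the end (id 102), and BERT's WordPiece tokenizer splits words that are not in its vocabulary, such as the uncommon names here, into several subword pieces. Together these give 9 tokens, and each token gets a vector of length 768 (the `hidden_size` in the config). So the shape (1, 9, 768) means 1 sentence (batch size), 9 tokens (sequence length), and a 768-dimensional vector per token.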
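The split into subword pieces follows WordPiece's greedy longest-match-first rule: take the longest prefix of the word that is in the vocabulary, then continue with the remainder, marking continuation pieces with `##`. The sketch below is simplified (it skips BERT's basic pre-tokenization such as punctuation splitting), and the toy vocabulary is hypothetical, chosen only so the count comes out to 9 like above; it is not the real `bert-base-cased` split. The real pieces can be checked with `tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].numpy().tolist())`.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocab
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(sentence, vocab):
    """Split on whitespace, apply WordPiece, and wrap with BERT's special tokens."""
    tokens = ["[CLS]"]
    for word in sentence.split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Hypothetical toy vocabulary, NOT the real bert-base-cased vocab
toy_vocab = {"hello", "di", "##lip", "po", "##k", "##h", "##rel"}
print(tokenize("hello dilip pokhrel", toy_vocab))
# ['[CLS]', 'hello', 'di', '##lip', 'po', '##k', '##h', '##rel', '[SEP]']
```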