<a href="https://colab.research.google.com/github/abolfazlaghdaee/LLM_journey/blob/main/Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Create Transformer**

In [None]:
!pip install transformers



In [None]:
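from transformers import BertConfig, TFBertModel

# Build the default BERT configuration (architecture hyperparameters only)
config = BertConfig()

# Build the model from the configuration; its weights are randomly initialized
model = TFBertModel(config)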

In [None]:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
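
The printed values determine how large the model is. As a rough sketch, assuming the standard BERT encoder layout (token, position, and segment embeddings; in each layer a self-attention block and a feed-forward block, each followed by layer normalization; and a final pooling layer), the parameter count can be worked out in plain Python. The helper name `count_bert_params` is hypothetical and is not part of 🤗 Transformers.

```python
def count_bert_params(vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                      intermediate_size=3072, max_position_embeddings=512,
                      type_vocab_size=2):
    """Estimate the number of weights in a BERT encoder with a pooling layer."""
    h = hidden_size
    layer_norm = 2 * h  # one scale and one bias vector
    # Token, position, and segment embedding tables, then a layer norm
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    # Query, key, value, and output projections (weights + biases), then a layer norm
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> intermediate -> h) with biases, then a layer norm
    feed_forward = (h * intermediate_size + intermediate_size) \
        + (intermediate_size * h + h) + layer_norm
    # Dense layer applied to the first token's hidden state
    pooler = h * h + h
    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


# Each attention head works on hidden_size / num_attention_heads dimensions
head_dim = 768 // 12
total = count_bert_params()
print(head_dim)       # 64
print(f"{total:,}")   # 109,482,240
```

Passing a smaller `num_hidden_layers` or `hidden_size` to the helper gives a quick feel for how each setting in the configuration affects model size.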



In [None]:
model = TFBertModel.from_pretrained("bert-base-cased")


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

In the code sample above we didn’t use `BertConfig`, and instead loaded a pretrained model via the `bert-base-cased` identifier. This is a model checkpoint that was trained by the authors of BERT themselves; you can find more details about it in its model card.


This model is now initialized with all the weights of the checkpoint. It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task. By training with pretrained weights rather than from scratch, we can quickly achieve good results.

**Saving methods**

Saving a model is as easy as loading one: we use the `save_pretrained()` method, which is analogous to the `from_pretrained()` method:

In [None]:
model.save_pretrained("directory_on_my_computer")

In [None]:
! ls directory_on_my_computer

config.json  tf_model.h5


If you take a look at the config.json file, you’ll recognize the attributes necessary to build the model architecture. This file also contains some metadata, such as where the checkpoint originated and what 🤗 Transformers version you were using when you last saved the checkpoint.

The tf_model.h5 file contains all your model’s weights (it is the TensorFlow counterpart of what PyTorch calls the state dictionary). The two files go hand in hand: the configuration describes your model’s architecture, while the weights file stores your model’s parameters.
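
Because `config.json` is plain JSON, it can be inspected with the standard library. The sketch below is a minimal example that assumes the directory name used above; `read_saved_config` is a hypothetical helper, and which keys are present depends on what was actually saved.

```python
import json
from pathlib import Path


def read_saved_config(save_dir):
    """Load config.json from a directory written by save_pretrained()."""
    with open(Path(save_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


save_dir = Path("directory_on_my_computer")  # assumed: same directory as above
if (save_dir / "config.json").exists():
    saved = read_saved_config(save_dir)
    # .get() avoids a KeyError if a key is missing from this particular file
    for key in ("model_type", "hidden_size", "num_hidden_layers", "transformers_version"):
        print(key, "=", saved.get(key))
```

The weights file is binary and is not meant to be read this way; loading it goes through `from_pretrained()`.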

**Using a Transformer model for inference**

Now that you know how to load and save a model, let’s try using it to make some predictions. Transformer models can only process numbers — numbers that the tokenizer generates. But before we discuss tokenizers, let’s explore what inputs the model accepts.

In [None]:
sequences = ["Hello!", "Cool.", "Nice!"]


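The tokenizer converts these to vocabulary indices, which are typically called input IDs. Each sequence is now a list of numbers! Note that the IDs below appear to come from an uncased BERT vocabulary, so the tokenizer that matches `bert-base-cased` would likely produce different values. The resulting output is: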

In [None]:
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
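
To make the text-to-IDs step concrete, here is a toy sketch; it is not the real BERT tokenizer. It assumes a tiny hand-written vocabulary whose entries are inferred by lining up the three sequences above with their IDs, an uncased-style lookup (the text is lowercased first), and that 101 and 102 are the special tokens marking the start and end of each sequence. The names `toy_vocab` and `toy_encode` are hypothetical.

```python
import re

# Assumed toy vocabulary, inferred from the sequences and IDs shown above
toy_vocab = {"[CLS]": 101, "[SEP]": 102, "hello": 7592, "cool": 4658,
             "nice": 3835, "!": 999, ".": 1012}


def toy_encode(text):
    """Lowercase, split into words and punctuation, and look up each token's ID."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [toy_vocab["[CLS]"]] + [toy_vocab[t] for t in tokens] + [toy_vocab["[SEP]"]]


sequences = ["Hello!", "Cool.", "Nice!"]
print([toy_encode(s) for s in sequences])
# [[101, 7592, 999, 102], [101, 4658, 1012, 102], [101, 3835, 999, 102]]
```

A real tokenizer additionally splits unknown words into subword pieces and uses a vocabulary of tens of thousands of entries, so this lookup only illustrates the idea of mapping tokens to indices.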

In [None]:
import tensorflow as tf

model_inputs = tf.constant(encoded_sequences)
model_inputs

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 101, 7592,  999,  102],
       [ 101, 4658, 1012,  102],
       [ 101, 3835,  999,  102]], dtype=int32)>

**Using the tensors as inputs to the model**

Making use of the tensors with the model is extremely simple — we just call the model with the inputs:



In [None]:
output = model(model_inputs)
output

TFBaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=<tf.Tensor: shape=(3, 4, 768), dtype=float32, numpy=
array([[[ 4.4495720e-01,  4.8276266e-01,  2.7797195e-01, ...,
         -5.4032512e-02,  3.9393452e-01, -9.4770171e-02],
        [ 2.4942856e-01, -4.4093007e-01,  8.1772351e-01, ...,
         -3.1916574e-01,  2.2992244e-01, -4.1171826e-02],
        [ 1.3667521e-01,  2.2517829e-01,  1.4501986e-01, ...,
         -4.6914719e-02,  2.8224176e-01,  7.5565726e-02],
        [ 1.1788852e+00,  1.6738594e-01, -1.8187129e-01, ...,
          2.4671327e-01,  1.0440764e+00, -6.1971731e-03]],

       [[ 3.6435854e-01,  3.2464165e-02,  2.0257674e-01, ...,
          6.0110305e-02,  3.2451269e-01, -2.0995576e-02],
        [ 7.1865946e-01, -4.8725176e-01,  5.1740396e-01, ...,
         -4.4012007e-01,  1.4553027e-01, -3.7544809e-02],
        [ 3.3223265e-01, -2.3270954e-01,  9.4876289e-02, ...,
         -2.5268194e-01,  3.2171953e-01,  8.1103947e-04],
        [ 1.2523210e+00,  3.5754240e-01,