# Guide to Using a BERT Model with Transformers

## Installation

In [None]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.32.0-py3-none-any.whl (7.5 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## Import modules

In [None]:
import torch
from transformers import BertTokenizer, BertForQuestionAnswering

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Example-1: Use BERT Tokenizer and BERT pre-trained models

### Load the pre-trained BERT model and its tokenizer.

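The model checkpoint should be one fine-tuned for question answering, and the tokenizer should be the one that checkpoint was trained with. Loading both from the same model name ensures they match: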

In [None]:
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"

tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Tokenization:

Tokenization is the process of converting raw text into a format suitable for input to the model. For BERT, this involves breaking text into tokens and mapping them to their corresponding IDs.
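
To make the two steps concrete (splitting text into tokens, then mapping tokens to IDs), here is a minimal pure-Python sketch of greedy longest-match sub-word splitting in the style of WordPiece. It is an illustration only, not the library's implementation: the vocabulary `TOY_VOCAB` and the helpers `wordpiece`, `toy_tokenize`, and `to_ids` are hypothetical names made up for this example, and the real BERT vocabulary and pre-tokenization rules are far more extensive.

```python
# Hypothetical toy vocabulary (token -> ID); a real BERT vocabulary is far larger.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "what": 4, "is": 5, "the": 6, "vision": 7, "of": 8,
    "ai": 9, "planet": 10, "access": 11, "##ible": 12, "?": 13,
}


def wordpiece(word, vocab=TOY_VOCAB):
    """Greedily split one word into the longest sub-word pieces found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: treat the whole word as unknown
        pieces.append(match)
        start = end
    return pieces


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace and '?', then apply wordpiece per word."""
    words = text.lower().replace("?", " ? ").split()
    return [piece for word in words for piece in wordpiece(word, vocab)]


def to_ids(tokens, vocab=TOY_VOCAB):
    """Map each token to its integer ID."""
    return [vocab[token] for token in tokens]


tokens = toy_tokenize("What is the vision of AI Planet?")
print(tokens)
print(to_ids(tokens))
print(wordpiece("accessible"))
```

With this toy vocabulary, `accessible` is not a single entry, so it is split into `access` and `##ible`; a word with no matching pieces falls back to `[UNK]`.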

In [None]:
# Example context
context = """
AI Planet is a global AI community with headquarters in Belgium and India.
It started with the vision to make AI education accessible to everyone and build AI for good to solve key challenges of humanity.
As part of our community initiatives, we provide free AI and data science courses by industry experts from large tech companies or
startups worldwide. Over 300K+ learners from 150+ countries have benefited since our inception in 2020.
"""

In [None]:
question = "What is the vision of AI Planet?"

In [None]:
# Tokenize the question and context together as one sequence pair; return PyTorch tensors
inputs = tokenizer(question, context, return_tensors="pt")
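
When a BERT tokenizer receives two texts, it encodes them as a single sequence: `[CLS]`, the first text, `[SEP]`, the second text, `[SEP]`, with `token_type_ids` marking which segment each position belongs to. The sketch below mimics that layout with plain Python lists. The function name `build_pair_inputs` and the pre-split token lists are hypothetical, and the real tokenizer returns integer `input_ids` rather than the token strings shown here for readability.

```python
def build_pair_inputs(question_tokens, context_tokens):
    """Lay out a (question, context) pair in BERT's two-segment format."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question, and the first [SEP];
    # segment 1 covers the context and the final [SEP].
    token_type_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    # No padding in this sketch, so every position is a real token.
    attention_mask = [1] * len(tokens)
    return {
        "tokens": tokens,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# Made-up, already-tokenized inputs for illustration.
pair = build_pair_inputs(
    ["what", "is", "the", "vision", "?"],
    ["make", "ai", "education", "accessible"],
)
for token, segment in zip(pair["tokens"], pair["token_type_ids"]):
    print(f"{token:12} segment={segment}")
```

Because question and context share one sequence, the start and end positions the model predicts later are indices into this combined sequence, not into the context alone.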

### Model inference

In [None]:
output = model(**inputs)

### Predictions

In [None]:
output

QuestionAnsweringModelOutput(loss=None, start_logits=tensor([[-6.2164, -4.6178, -7.4844, -7.4729, -7.8819, -8.5008, -8.3274, -9.4713,
         -9.5067, -6.2164, -0.8900, -4.5244, -6.1605, -5.0710, -3.4343, -4.2045,
         -5.6482, -7.3996, -6.0562, -7.8513, -5.3013, -8.0156, -4.7465, -6.2164,
          2.3201, -1.2765, -1.3671,  3.1735,  1.5344,  5.9816,  6.5448,  1.2525,
         -0.4587, -0.9794, -3.6875, -2.2243, -3.2219,  2.1678, -1.9828, -4.5240,
         -2.9454,  0.8003, -0.1549, -3.1594, -3.3971, -6.1947, -2.4343, -4.1118,
         -5.3309, -6.9898, -8.1613, -5.1530, -6.0295, -5.9806, -8.0785, -3.5412,
         -4.1596, -4.5303, -5.6748, -8.4360, -6.4948, -7.5843, -5.4967, -8.4173,
         -7.1801, -7.2999, -8.6525, -7.7577, -7.4749, -7.8930, -8.6425, -6.6666,
         -8.8628, -7.4559, -7.4648, -5.2379, -7.1483, -8.4080, -8.1202, -6.1591,
         -8.1705, -7.4948, -8.5753, -7.3179, -7.7387, -6.0562, -7.3644, -6.8786,
         -5.9852, -7.3483, -4.0782, -8.1759, -6.2161]],


In [None]:
# Pick the highest-scoring start and end token positions for the first (only) batch item
answer_start_index = int(torch.argmax(output.start_logits, dim=-1)[0])
answer_end_index = int(torch.argmax(output.end_logits, dim=-1)[0])

- **output.start_logits**: the model's score (logit) for each input token being the start of the answer span.
- **output.end_logits**: the model's score (logit) for each input token being the end of the answer span.
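
Taking the argmax of each logit vector independently works for this example, but nothing forces the chosen end position to come after the chosen start. A common alternative is to score every valid `(start, end)` pair by the sum of its two logits and keep the best one. The following is a minimal sketch of that idea in plain Python, not the method used in this notebook: the function names `argmax` and `best_span`, the `max_answer_len` parameter, and the logit values are all made up for illustration.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with start <= end and the highest summed logit."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Made-up logits where independent argmax yields an invalid span.
start_logits = [0.1, 0.2, 3.0, 0.5, 4.0]
end_logits = [0.3, 5.0, 0.1, 2.5, 0.2]

print(argmax(start_logits), argmax(end_logits))  # start after end: not a valid span
print(best_span(start_logits, end_logits))
```

In this toy input the independent argmax picks start 4 and end 1, which would decode to an empty answer, while the joint search returns the valid span `(2, 3)`.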

The two indices are token positions in the input sequence. Slice the input IDs between them (inclusive) and use the tokenizer to decode those IDs back into text.

In [None]:
predict = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict)

'make ai education accessible to everyone and build ai for good to solve key challenges of humanity'

### Important Note: 

Tokenizers and models in the Transformers library are designed to be used in matched pairs. When selecting a model and tokenizer, make sure they are compatible and intended for the same architecture. For example, you should use a BERT tokenizer with a BERT model, a GPT-2 tokenizer with a GPT-2 model, and so on.

The Hugging Face model hub provides a variety of pre-trained models and their associated tokenizers. You can find the exact model name on the model's hub page or in the documentation. For instance, if you're using BERT for sequence classification, you might use `bert-base-uncased` for both the tokenizer and the model.

Remember to check the specific model's documentation for usage details related to inputs, outputs, and special features, as they can vary based on the model's architecture and intended task. The Transformers library documentation and Hugging Face's GitHub repository are excellent resources to explore more about different models, tokenizers, and their applications.