
In [None]:
# Installing Transformers library
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/00/92/6153f4912b84ee1ab53ab45663d23e7cf3704161cb5ef18b0c07e207cef2/transformers-4.7.0-py3-none-any.whl (2.5MB)
Collecting huggingface-hub==0.0.8
  Downloading https://files.pythonhosted.org/packages/a1/88/7b1e45720ecf59c6c6737ff332f41c955963090a18e72acbcbeac6b25e86/huggingface_hub-0.0.8-py3-none-any.whl
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/d4/e2/df3543e8ffdab68f5acc73f613de9c2b155ac47f162e725dcac87c521c11/tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3MB)

In [None]:
import transformers

## 1. Working with Pipelines

A pipeline connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

### 1. Sentiment-Analysis

By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when we create the `classifier` object.

There are three main steps in this pipeline:
- The text is preprocessed into a format the model can understand.
- The preprocessed inputs are passed to the model.
- The predictions of the model are post-processed, so that we can make sense of them.
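The three steps can be illustrated with a toy, pure-Python stand-in. Everything here is hypothetical: the word-score table `TOY_WEIGHTS` and the functions `preprocess`, `toy_model` and `postprocess` replace a real tokenizer and pretrained network, but the shape of the flow (preprocess, then model, then a softmax-based postprocess) mirrors the list above.

```python
import math

# Hypothetical toy "model": per-word logit contributions for (NEGATIVE, POSITIVE).
TOY_WEIGHTS = {
    "hate": (2.0, -2.0),
    "love": (-2.0, 2.0),
    "smile": (-0.5, 1.0),
    "agony": (1.0, -0.5),
}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Step 1: turn raw text into model inputs (here, lowercase words)."""
    return [word.strip("?!.,") for word in text.lower().split()]


def toy_model(tokens):
    """Step 2: map the inputs to raw, unnormalized scores (logits)."""
    logits = [0.0, 0.0]
    for token in tokens:
        neg, pos = TOY_WEIGHTS.get(token, (0.0, 0.0))
        logits[0] += neg
        logits[1] += pos
    return logits


def postprocess(logits):
    """Step 3: softmax the logits and report the most probable label."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(toy_model(preprocess(text)))


print(toy_pipeline("I hate this so much!"))
# label 'NEGATIVE' with a score of about 0.982
```

A real pipeline swaps each toy function for the checkpoint's tokenizer, network and label mapping, but the calling pattern stays the same.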

In [None]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
classifier("With that smile, what agony are you attempting to hide?")

[{'label': 'POSITIVE', 'score': 0.961536705493927}]

In [None]:
# we can pass several sentences too

classifier([
    'I hate this so much!',
    'You are getting mistaken by those lines.'
])

[{'label': 'NEGATIVE', 'score': 0.9994558095932007},
 {'label': 'NEGATIVE', 'score': 0.9983620047569275}]

### 2. Zero-shot classification

The task is to classify texts that haven't been labelled.

This pipeline allows us to specify which labels to use for the classification, so that we don't have to rely on the labels of the pretrained model.

This pipeline is called `zero-shot` because we don't need to fine-tune the model on the data to use it. It can directly return probability scores for any list of labels we want!

In [None]:
classifier = pipeline('zero-shot-classification')

classifier(
        "Do you think this Transformers library is that useful?",
        candidate_labels=['education','politics','business'])

{'labels': ['business', 'education', 'politics'],
 'scores': [0.43002849817276, 0.3783530592918396, 0.19161853194236755],
 'sequence': 'Do you think this Transformers library is that useful?'}

In [None]:
classifier(
        "Do you think this Transformers library is that useful?",
        candidate_labels=['school','high-school','comedy','gibberish'])

{'labels': ['comedy', 'school', 'gibberish', 'high-school'],
 'scores': [0.3527127504348755,
  0.25979650020599365,
  0.24319197237491608,
  0.1442987620830536],
 'sequence': 'Do you think this Transformers library is that useful?'}
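Notice that the scores in both outputs above sum to (almost exactly) 1. As far as we understand the default setup, the zero-shot pipeline uses a natural-language-inference model: each candidate label is turned into a hypothesis (by default of the form "This example is {}."), the model scores how strongly the text entails it, and, when only one label is expected per text (`multi_label=False`, the default), those entailment scores are normalized with a softmax across the labels. The sketch below shows only that final normalization and ranking step; the logits are invented and `zero_shot_scores` is a hypothetical helper.

```python
import math


def zero_shot_scores(entailment_logits):
    """Softmax per-label entailment logits and rank labels from best to worst.

    `entailment_logits` maps each candidate label to the logit an NLI model
    assigned to "entailment" for the pair (text, "This example is {label}.").
    """
    labels = list(entailment_logits)
    exps = [math.exp(entailment_logits[label]) for label in labels]
    total = sum(exps)
    ranked = sorted(zip(labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in ranked],
            "scores": [score for _, score in ranked]}


# Hypothetical logits, made up for illustration.
result = zero_shot_scores({"education": 1.2, "politics": 0.5, "business": 1.3})
print(result["labels"])                # ['business', 'education', 'politics']
print(round(sum(result["scores"]), 6))  # 1.0
```

Because of this normalization, adding or removing a candidate label changes the scores of all the others, which is why the two calls above give different numbers for related labels.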

### 3. Text generation

The task is to generate text. We provide a prompt and the model auto-completes it by generating the remaining text, similar to the predictive-text feature available on phones. Text generation involves randomness, so re-running the cells below will give different outputs.

In [None]:
generator = pipeline('text-generation')
generator('Hey girl friend')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hey girl friend..." she whispered. They continued along with the others, making off to the exit.\n\nThen the redhead\'s mother, Katelyn, told her that there was some kind of trouble in the city or something.\n\n'}]

We can control how many different sequences are generated with the argument `num_return_sequences`, and the total length of the output text (counted in tokens, including the prompt) with the argument `max_length`.
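The randomness comes from sampling: at each step the next token is drawn from the model's probability distribution instead of always taking the most likely one. The toy sampler below illustrates that, together with how `max_length` and `num_return_sequences` shape the output. The bigram table and the `sample_next` and `generate` functions are hypothetical; a real model predicts subword tokens from a large vocabulary, so `max_length` there counts tokens rather than words.

```python
import math
import random

# Hypothetical toy "language model": next-word logits given the previous word.
BIGRAM_LOGITS = {
    "let": {"the": 2.0, "us": 1.0},
    "the": {"world": 1.5, "people": 1.0, "money": 0.5},
    "world": {"know": 2.0, "see": 0.5},
    "people": {"know": 1.0, "see": 1.0},
    "money": {"talk": 1.0},
    "us": {"know": 1.0},
    "know": {"that": 2.0, "the": 0.5},
    "see": {"that": 1.0},
    "talk": {"that": 1.0},
    "that": {"the": 1.0, "we": 1.0},
    "we": {"know": 1.0, "see": 1.0},
}


def sample_next(prev, rng, temperature=1.0):
    """Sample the next word from softmax(logits / temperature)."""
    candidates = BIGRAM_LOGITS[prev]
    words = list(candidates)
    weights = [math.exp(candidates[w] / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(prompt, max_length=8, num_return_sequences=2, temperature=1.0, seed=None):
    """Return `num_return_sequences` samples, each at most `max_length` words
    long *including* the prompt words."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        while len(words) < max_length and words[-1] in BIGRAM_LOGITS:
            words.append(sample_next(words[-1], rng, temperature))
        outputs.append(" ".join(words))
    return outputs


for text in generate("let the world know that", max_length=8, seed=0):
    print(text)
```

Fixing `seed` makes the toy output reproducible; lowering `temperature` sharpens the distribution so the most likely word is picked more often.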

In [None]:
generator('let the world know that', num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'let the world know that this is where the money and all these services are being made. Let everyone who gets a refund of a gift or loan be aware and be sure to apply promptly. Also, the government of the Philippines will make use of all'},
 {'generated_text': 'let the world know that the only way you get to this top level organization… is by having a good idea."\n\nThe man is the man most interested in making money… He does not work for an investment firm, but is working for something'}]

### 4. Fill-Mask

The task is to fill in the blanks in a given text.

In [None]:
unmasker = pipeline('fill-mask')
unmasker('We can get-along pretty <mask> if we start slow and see how it goes.',
        top_k=2)

[{'score': 0.37884223461151123,
  'sequence': 'We can get-along pretty quickly if we start slow and see how it goes.',
  'token': 1335,
  'token_str': ' quickly'},
 {'score': 0.19452127814292908,
  'sequence': 'We can get-along pretty fast if we start slow and see how it goes.',
  'token': 1769,
  'token_str': ' fast'}]

The `top_k` argument controls how many possibilities we want to be displayed.

Note that here the model fills in the special `<mask>` word, which is often referred to as a <i>mask token</i>. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models.
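Conceptually, `top_k` keeps the k most probable vocabulary entries for the masked position and substitutes each one into the sentence. The sketch below illustrates this with made-up probabilities; `top_k_predictions` is a hypothetical helper, not the library's code, and its `mask_token` argument stands in for the model-specific mask string (`<mask>` here, `[MASK]` for the BERT model used later). With a real tokenizer loaded, that string is available as `tokenizer.mask_token`.

```python
import heapq


def top_k_predictions(token_probs, template, mask_token="<mask>", top_k=2):
    """Return the `top_k` most probable fillers for `mask_token` in `template`.

    `token_probs` maps candidate tokens to the probability a masked-language
    model assigned them at the masked position.
    """
    best = heapq.nlargest(top_k, token_probs.items(), key=lambda kv: kv[1])
    return [
        {"token_str": tok, "score": p, "sequence": template.replace(mask_token, tok)}
        for tok, p in best
    ]


# Hypothetical probabilities, made up for illustration.
probs = {"quickly": 0.38, "fast": 0.19, "well": 0.12, "slowly": 0.05}
for pred in top_k_predictions(probs, "We get along pretty <mask>.", top_k=2):
    print(pred["sequence"], round(pred["score"], 2))
```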

### 5. Named entity recognition

NER is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [None]:
ner = pipeline('ner', grouped_entities=True)
ner('My name is Karna, and I live in Gurgaon, India. I work at MS')

  f'`grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="{aggregation_strategy}"` instead.'


[{'end': 16,
  'entity_group': 'PER',
  'score': 0.9981767,
  'start': 11,
  'word': 'Karna'},
 {'end': 39,
  'entity_group': 'LOC',
  'score': 0.9977821,
  'start': 32,
  'word': 'Gurgaon'},
 {'end': 46,
  'entity_group': 'LOC',
  'score': 0.99955285,
  'start': 41,
  'word': 'India'},
 {'end': 60,
  'entity_group': 'ORG',
  'score': 0.97967917,
  'start': 58,
  'word': 'MS'}]

In [None]:
ner('I have started working at Coca Cola India at its office in Bangalore')

[{'end': 41,
  'entity_group': 'ORG',
  'score': 0.9988556,
  'start': 26,
  'word': 'Coca Cola India'},
 {'end': 68,
  'entity_group': 'LOC',
  'score': 0.9986413,
  'start': 59,
  'word': 'Bangalore'}]

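We pass `grouped_entities=True` when creating the pipeline to tell it to regroup the parts of the sentence that correspond to the same entity; for example, "Coca Cola India" comes back as a single ORG entity rather than as separate tokens. As the warning in the first NER output shows, newer versions deprecate this argument in favour of `aggregation_strategy`.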
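Roughly, grouping merges consecutive tokens whose tags point to the same entity type. The sketch below assumes IOB-style token tags (`B-` begins an entity, `I-` continues it, `O` is outside any entity) and takes a group's score as the mean of its token scores. It is a simplified illustration, not the library's implementation (which also handles subword tokens); `group_entities` is hypothetical, and only the character offsets come from the example sentence, the tags and scores are made up.

```python
from statistics import mean


def group_entities(text, token_preds):
    """Merge consecutive tokens tagged with the same entity type.

    `token_preds` is a list of (tag, score, start, end) tuples, one per token,
    where start/end are character offsets into `text`.
    """
    groups = []
    open_entity = None  # entity type of the group that can still be extended
    for tag, score, start, end in token_preds:
        if tag == "O":
            open_entity = None
            continue
        prefix, entity = tag.split("-", 1)
        if prefix == "I" and entity == open_entity:
            groups[-1]["scores"].append(score)
            groups[-1]["end"] = end
        else:
            groups.append({"entity_group": entity, "scores": [score],
                           "start": start, "end": end})
            open_entity = entity
    return [
        {"entity_group": g["entity_group"], "score": mean(g["scores"]),
         "word": text[g["start"]:g["end"]], "start": g["start"], "end": g["end"]}
        for g in groups
    ]


text = "I have started working at Coca Cola India at its office in Bangalore"
# Hypothetical token-level predictions: (tag, score, start, end).
token_preds = [
    ("O", 0.99, 0, 1), ("O", 0.99, 2, 6), ("O", 0.99, 7, 14),
    ("O", 0.99, 15, 22), ("O", 0.99, 23, 25),
    ("B-ORG", 0.99, 26, 30), ("I-ORG", 0.98, 31, 35), ("I-ORG", 0.97, 36, 41),
    ("O", 0.99, 42, 44), ("O", 0.99, 45, 48), ("O", 0.99, 49, 55),
    ("O", 0.99, 56, 58), ("B-LOC", 0.99, 59, 68),
]
for entity in group_entities(text, token_preds):
    print(entity)
```

The `start`/`end` values of the two groups (26 to 41 and 59 to 68) match the offsets in the pipeline output above.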


### 6. Question Answering

The task is to answer questions using information from a given context. The pipeline extracts the answer from the provided context; it does not generate new text.

In [None]:
question_answerer = pipeline('question-answering')

question_answerer(
                question='Where do I work?',
                context='I have started working at Google India recently.')

{'answer': 'Google India', 'end': 38, 'score': 0.9411473870277405, 'start': 26}
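Extractive question-answering models typically produce two scores per token, one for "the answer starts here" and one for "the answer ends here"; the answer is the valid span with the highest combined score. A minimal sketch of that selection step, with invented logits and whitespace tokens standing in for a real model and tokenizer (`best_span` is a hypothetical helper). The `start` and `end` values in the output above are character offsets into the context, as the last line checks.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick the (start, end) token pair maximizing start_logit + end_logit,
    subject to start <= end and a maximum answer length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


context = "I have started working at Google India recently."
tokens = context.rstrip(".").split()
# Hypothetical per-token logits; a real QA model produces these.
start_logits = [0.1, 0.0, 0.2, 0.5, 1.0, 6.0, 2.0, 0.3]
end_logits = [0.0, 0.1, 0.1, 0.2, 0.5, 2.5, 5.5, 1.0]

s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # Google India
print(context[26:38])             # Google India (the pipeline's start/end offsets)
```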

### 7. Summarization

Summarization is the task of reducing a text to a shorter text while keeping all or most of the important aspects referenced in the text.

In [None]:
summarizer = pipeline("summarization")
summarizer("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""")

To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

##### The pipelines above are mostly for demonstration purposes: each was built for a specific task and cannot perform variations of it.

## Bias and limitations

In [None]:
# an example to showcase limitations of a model

unmasker = pipeline('fill-mask', model='bert-base-uncased')
result = unmasker('This man works as a [MASK].')
print([r['token_str'] for r in result])

result = unmasker('This woman works as a [MASK].')
print([r['token_str'] for r in result])

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']


- When asked to fill in the missing word in these two sentences, the model returns no occupation in common for the two; most of its answers are occupations usually associated with one specific gender (e.g. 'businessman' for the man, 'waitress' and 'maid' for the woman).
- And yes, 'prostitute' ended up in the top 5 possibilities the model associates with 'woman' and 'work'.

When we use these tools, we should therefore keep in the back of our mind that the original model we are using could very easily generate sexist, racist, or homophobic content. Fine-tuning the model on our data won't make this intrinsic bias disappear.

# 2. Behind the Pipeline

## 2.1 Preprocessing with a tokenizer

Preprocessing with a tokenizer involves three steps:
- Splitting the input into words, subwords, or symbols (called tokens).
- Mapping each token to an integer.
- Adding additional inputs that may be useful to the model.
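The three steps above can be mimicked with a tiny hand-written vocabulary. The IDs are the ones that appear for 'I hate this so much!' in the tokenizer output further down (in this vocabulary, 101 and 102 are the special `[CLS]` and `[SEP]` tokens). The `VOCAB` fragment and `toy_encode` are hypothetical; a real tokenizer has a full vocabulary and splits unknown words into subwords.

```python
import re

# Hypothetical vocabulary fragment; the IDs match the tokenizer output for
# 'I hate this so much!' (101 = [CLS], 102 = [SEP] in this vocabulary).
VOCAB = {"[CLS]": 101, "[SEP]": 102, "i": 1045, "hate": 5223,
         "this": 2023, "so": 2061, "much": 2172, "!": 999}


def toy_encode(text):
    # 1. Split into tokens: lowercase, separating words from punctuation.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # 2. Map each token to its integer ID.
    ids = [VOCAB[token] for token in tokens]
    # 3. Add the special tokens the model saw during pretraining.
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]


print(toy_encode("I hate this so much!"))
# [101, 1045, 5223, 2023, 2061, 2172, 999, 102]
```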

All this preprocessing needs to be done in exactly the same way as when the model was pretrained. To do this, we use the `AutoTokenizer` class and its `from_pretrained` method. Using the checkpoint name of our model, it will automatically fetch the data associated with the model's tokenizer and cache it.

Since the default checkpoint of the `sentiment-analysis` pipeline is `distilbert-base-uncased-finetuned-sst-2-english`, we can load its tokenizer:

In [None]:
from transformers import AutoTokenizer

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we have the tokenizer, we can directly pass our sentences to it and we'll get back a dict that's ready to feed to our model! The only thing left to do is to convert the list of input IDs to tensors. 

Transformer models only accept <i>tensors</i> as input. To specify the type of tensors we want to get back (PyTorch, TensorFlow, or plain NumPy), we use the `return_tensors` argument; here we ask for TensorFlow tensors with `return_tensors='tf'`.

In [None]:
raw_inputs = [
    'I have been waiting to meet Elon Musk my whole life',
    'I hate this so much!'
]

inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors='tf')
print(inputs)

{'input_ids': <tf.Tensor: shape=(2, 15), dtype=int32, numpy=
array([[  101,  1045,  2031,  2042,  3403,  2000,  3113,  3449,  2239,
        14163,  6711,  2026,  2878,  2166,   102],
       [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,
            0,     0,     0,     0,     0,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(2, 15), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}


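The output itself is a dict containing two keys: `input_ids`, which holds the token IDs of each sentence (the shorter sentence padded with 0s), and `attention_mask`, which marks real tokens with 1 and padding with 0.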
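With `padding=True`, every sequence in the batch is extended to the length of the longest one, and the attention mask tells the model which positions to ignore. A minimal sketch of that behaviour, reusing the unpadded IDs from the output above; `pad_batch` is a hypothetical helper, and the pad ID 0 matches the padding seen there.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad every sequence to the longest one and build the attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded IDs taken from the tokenizer output above.
batch = [
    [101, 1045, 2031, 2042, 3403, 2000, 3113, 3449, 2239,
     14163, 6711, 2026, 2878, 2166, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]
padded = pad_batch(batch)
print(padded["attention_mask"][1])
# [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```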

## 2.2 Going through the model

Transformers provides a `TFAutoModel` class, which also has a `from_pretrained` method.

In [None]:
from transformers import TFAutoModel

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
model = TFAutoModel.from_pretrained(checkpoint)

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertModel: ['dropout_19', 'pre_classifier', 'classifier']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.


This architecture contains only the base Transformer module: given some inputs, it outputs what we'll call `hidden states`, also known as `features`. For each model input, we'll retrieve a high-dimensional vector representing the <b>contextual understanding of that input by the Transformer model.</b>

The output of the Transformer module is usually large. It is a tensor that generally has three dimensions:
- Batch size (the number of sequences processed at a time, 2 in this example)
- Sequence length (the length of the numerical representation of the sequence, 15 in this example)
- Hidden size (the vector dimension of each model input, 768 in this example)

In [None]:
# let us see the dimension

outputs = model(inputs)
print(outputs.last_hidden_state.shape)

(2, 15, 768)


## 2.3 Model heads

In [None]:
# For our example we need a model with a sequence classification head, so we don't use the TFAutoModel class

from transformers import TFAutoModelForSequenceClassification

checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(inputs)

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_38']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
print(outputs.logits.shape)

(2, 2)


Since we have just two sentences and two labels, the output logits have shape (2, 2): one row per sentence and one column per label.

## 2.4 Postprocessing the output

In [None]:
print(outputs.logits)

tf.Tensor(
[[-3.287958   3.457257 ]
 [ 4.1692314 -3.3464472]], shape=(2, 2), dtype=float32)


In [None]:
import tensorflow as tf

predictions = tf.math.softmax(outputs.logits, axis=-1)
print(predictions)

tf.Tensor(
[[1.1751133e-03 9.9882489e-01]
 [9.9945587e-01 5.4418424e-04]], shape=(2, 2), dtype=float32)


In [None]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
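Putting the postprocessing steps together: the softmax turns each row of logits into probabilities, and the index of the largest one is looked up in `id2label`. The pure-Python sketch below reuses the logits printed above (the `softmax` helper is our own, not a library function) and should reproduce the TensorFlow probabilities up to float rounding.

```python
import math


def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
# Logits copied from the model output above.
logits = [[-3.287958, 3.457257], [4.1692314, -3.3464472]]

for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(id2label[best], round(probs[best], 4))
# POSITIVE 0.9988
# NEGATIVE 0.9995
```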

# 3. Models

## 3.1 Creating a Transformer

Let's work with a BERT model.

In [None]:
from transformers import BertConfig, TFBertModel

# Building the config
config = BertConfig()

# Building the model from the config
model = TFBertModel(config)

In [None]:
print(config)

BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.7.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



### Different loading methods

Creating a model from the default configuration initializes it with random values. The model can be used as such, but it will output gibberish.

Loading a Transformer model that is already trained is simple: use the `from_pretrained` method.

In [None]:
model = TFBertModel.from_pretrained('bert-base-cased')

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


### Saving methods

In [None]:
model.save_pretrained('directory_here')

## 3.2 Using a Transformer model for inference

In [None]:
sequences = [
    'Hello!',
    'Cool.',
    'Nice!'
]

In [None]:
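# Note: `tokenizer` is still the distilbert-base-uncased tokenizer from section 2.1,
# which produced the output below. Its IDs do not match the bert-base-cased
# vocabulary, so for meaningful results load the matching tokenizer first, e.g.:
# tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
encoded_sequences = tokenizer(sequences, return_tensors='tf')
print(encoded_sequences)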

{'input_ids': <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 101, 7592,  999,  102],
       [ 101, 4658, 1012,  102],
       [ 101, 3835,  999,  102]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)>}


In [None]:
output = model(encoded_sequences)


In [None]:
print(output)

TFBaseModelOutputWithPooling(last_hidden_state=<tf.Tensor: shape=(3, 4, 768), dtype=float32, numpy=
array([[[ 4.4495684e-01,  4.8276263e-01,  2.7797201e-01, ...,
         -5.4032281e-02,  3.9393449e-01, -9.4770037e-02],
        [ 2.4942881e-01, -4.4092983e-01,  8.1772339e-01, ...,
         -3.1916580e-01,  2.2992201e-01, -4.1171677e-02],
        [ 1.3667591e-01,  2.2517806e-01,  1.4502057e-01, ...,
         -4.6914808e-02,  2.8224209e-01,  7.5566083e-02],
        [ 1.1788853e+00,  1.6738535e-01, -1.8187082e-01, ...,
          2.4671350e-01,  1.0440770e+00, -6.1969673e-03]],

       [[ 3.6435843e-01,  3.2464169e-02,  2.0257643e-01, ...,
          6.0109977e-02,  3.2451314e-01, -2.0995550e-02],
        [ 7.1865946e-01, -4.8725188e-01,  5.1740396e-01, ...,
         -4.4011998e-01,  1.4553036e-01, -3.7544712e-02],
        [ 3.3223274e-01, -2.3270920e-01,  9.4876140e-02, ...,
         -2.5268203e-01,  3.2171994e-01,  8.1119360e-04],
        [ 1.2523212e+00,  3.5754323e-01, -5.1321659e-02, .

In [None]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

sequence = 'Using a Transformer network is simple'
tokens = tokenizer.tokenize(sequence)

print(tokens)

['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple']
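The `##` prefix marks a subword that continues the previous token. BERT tokenizers use the WordPiece algorithm; the sketch below shows the greedy longest-match-first idea behind it on a single word at a time. It is simplified (real tokenizers also split on punctuation, cap word length and use a vocabulary of tens of thousands of entries), and both `wordpiece` and the tiny `vocab` set are hypothetical.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word.

    Pieces after the first one are looked up with the '##' continuation prefix.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a match.
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical vocabulary fragment.
vocab = {"Using", "a", "Trans", "##former", "network", "is", "simple"}
sentence = "Using a Transformer network is simple"
tokens = [piece for word in sentence.split() for piece in wordpiece(word, vocab)]
print(tokens)
# ['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple']
```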


In [None]:
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

[7993, 170, 13809, 23763, 2443, 1110, 3014]


In [None]:
decoded_string = tokenizer.decode(ids)
print(decoded_string)

Using a Transformer network is simple
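Decoding reverses the two previous steps: IDs are mapped back to tokens, and `##` pieces are glued onto the token before them. A simplified sketch using the ID/token pairs from the two outputs above (`toy_decode` is hypothetical; the real `tokenizer.decode` also handles special tokens and spacing around punctuation):

```python
def toy_decode(ids, id_to_token):
    """Map IDs back to tokens and glue '##' continuation pieces onto the previous token."""
    words = []
    for token in (id_to_token[i] for i in ids):
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


# ID/token pairs taken from the outputs above.
tokens = ['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple']
ids = [7993, 170, 13809, 23763, 2443, 1110, 3014]
id_to_token = dict(zip(ids, tokens))
print(toy_decode(ids, id_to_token))  # Using a Transformer network is simple
```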


## Handling multiple sequences