### Table of Contents

- What is a Language Model in NLP?
- Building an N-gram Language Model
- Building a Neural Language Model
- Natural Language Generation using OpenAI’s GPT-2

### What is a Language Model in NLP?

A language model learns to predict the probability of a sequence of words. But why do we need to learn the probability of words? Let’s understand that with an example.

I’m sure you have used Google Translate at some point. We all use it to translate one language to another for varying reasons. This is an example of a popular NLP application called Machine Translation.

In Machine Translation, you take a sequence of words in one language and convert it into another language. A system can produce many candidate translations, so you want to compute the probability of each candidate to decide which one is the most plausible.
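To make "the probability of a sequence of words" a little more concrete: a language model typically breaks the probability of a whole sequence into a product of next-word probabilities using the chain rule. The symbols $w_1, \dots, w_n$ (the words of the sequence) are notation introduced here for illustration, not something used elsewhere in this article.

$$
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
$$

Each candidate translation gets a score of this form, and the candidate with the highest probability is preferred.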

This ability to model the rules of a language as probabilities is very powerful for NLP-related tasks. Language models are used in speech recognition, machine translation, part-of-speech tagging, parsing, Optical Character Recognition, handwriting recognition, information retrieval, and many other everyday tasks.

### Types of Language Models
There are primarily two types of Language Models:

- Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMM) and certain linguistic rules to learn the probability distribution of words.
- Neural Language Models: These are new players in the NLP town and have surpassed the statistical language models in their effectiveness. They use different kinds of Neural Networks to model language.

Now that you have a pretty good idea about Language Models, let’s start building one!



### What are N-grams (unigram, bigram, trigrams)?
An N-gram is a sequence of N tokens (or words).

Let’s understand N-gram with an example. Consider the following sentence:

“I love reading blogs about data science on Analytics Vidhya.”

A 1-gram (or unigram) is a one-word sequence. For the above sentence, the unigrams would simply be: “I”, “love”, “reading”, “blogs”, “about”, “data”, “science”, “on”, “Analytics”, “Vidhya”.

A 2-gram (or bigram) is a two-word sequence, like “I love”, “love reading”, or “Analytics Vidhya”. And a 3-gram (or trigram) is a three-word sequence, like “I love reading”, “about data science” or “on Analytics Vidhya”.

Fairly straightforward stuff!
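Here is a minimal sketch of how the unigrams, bigrams and trigrams above could be extracted in plain Python. The helper name `ngrams` and the whitespace tokenization are my own assumptions for illustration; a real pipeline would usually use a proper tokenizer.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "I love reading blogs about data science on Analytics Vidhya"
tokens = sentence.split()  # naive whitespace tokenization

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(len(unigrams), len(bigrams), len(trigrams))  # 10 9 8
print(bigrams[0])    # ('I', 'love')
print(trigrams[-1])  # ('on', 'Analytics', 'Vidhya')
```

A sentence with 10 words yields 10 unigrams, 9 bigrams and 8 trigrams, since each window of N words starts one position later than the previous one.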

### Building a Neural Language Model

Deep Learning has been shown to perform really well on many NLP tasks such as Text Summarization and Machine Translation. Since these tasks are essentially built upon Language Modeling, there has been a tremendous research effort, with great results, to use Neural Networks for Language Modeling.

We can essentially build two kinds of language models – character level and word level. And even under each category, we can have many subcategories based on the simple fact of how we are framing the learning problem. We will be taking the most straightforward approach – building a character-level language model.

In [8]:
import numpy as np
import pandas as pd
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import LSTM, Dense, GRU, Embedding
from keras.callbacks import EarlyStopping, ModelCheckpoint



In [11]:
data_text = """The unanimous Declaration of the thirteen united States of America, When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.

He has refused his Assent to Laws, the most wholesome and necessary for the public good.

He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them.

He has refused to pass other Laws for the accommodation of large districts of people, unless those people would relinquish the right of Representation in the Legislature, a right inestimable to them and formidable to tyrants only.

He has called together legislative bodies at places unusual, uncomfortable, and distant from the depository of their public Records, for the sole purpose of fatiguing them into compliance with his measures.

He has dissolved Representative Houses repeatedly, for opposing with manly firmness his invasions on the rights of the people.

He has refused for a long time, after such dissolutions, to cause others to be elected; whereby the Legislative powers, incapable of Annihilation, have returned to the People at large for their exercise; the State remaining in the mean time exposed to all the dangers of invasion from without, and convulsions within.

He has endeavoured to prevent the population of these States; for that purpose obstructing the Laws for Naturalization of Foreigners; refusing to pass others to encourage their migrations hither, and raising the conditions of new Appropriations of Lands.

He has obstructed the Administration of Justice, by refusing his Assent to Laws for establishing Judiciary powers.

He has made Judges dependent on his Will alone, for the tenure of their offices, and the amount and payment of their salaries.

He has erected a multitude of New Offices, and sent hither swarms of Officers to harrass our people, and eat out their substance.

He has kept among us, in times of peace, Standing Armies without the Consent of our legislatures.

He has affected to render the Military independent of and superior to the Civil power.

He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended Legislation:

For Quartering large bodies of armed troops among us:

For protecting them, by a mock Trial, from punishment for any Murders which they should commit on the Inhabitants of these States:

For cutting off our Trade with all parts of the world:

For imposing Taxes on us without our Consent:

For depriving us in many cases, of the benefits of Trial by Jury:

For transporting us beyond Seas to be tried for pretended offences

For abolishing the free System of English Laws in a neighbouring Province, establishing therein an Arbitrary government, and enlarging its Boundaries so as to render it at once an example and fit instrument for introducing the same absolute rule into these Colonies:

For taking away our Charters, abolishing our most valuable Laws, and altering fundamentally the Forms of our Governments:

For suspending our own Legislatures, and declaring themselves invested with power to legislate for us in all cases whatsoever.

He has abdicated Government here, by declaring us out of his Protection and waging War against us.

He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people.

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

He has constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country, to become the executioners of their friends and Brethren, or to fall themselves by their Hands.

He has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.

In every stage of these Oppressions We have Petitioned for Redress in the most humble terms: Our repeated Petitions have been answered only by repeated injury. A Prince whose character is thus marked by every act which may define a Tyrant, is unfit to be the ruler of a free people.

Nor have We been wanting in attentions to our Brittish brethren. We have warned them from time to time of attempts by their legislature to extend an unwarrantable jurisdiction over us. We have reminded them of the circumstances of our emigration and settlement here. We have appealed to their native justice and magnanimity, and we have conjured them by the ties of our common kindred to disavow these usurpations, which, would inevitably interrupt our connections and correspondence. They too have been deaf to the voice of justice and of consanguinity. We must, therefore, acquiesce in the necessity, which denounces our Separation, and hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends.

We, therefore, the Representatives of the united States of America, in General Congress, Assembled, appealing to the Supreme Judge of the world for the rectitude of our intentions, do, in the Name, and by Authority of the good People of these Colonies, solemnly publish and declare, That these United Colonies are, and of Right ought to be Free and Independent States; that they are Absolved from all Allegiance to the British Crown, and that all political connection between them and the State of Great Britain, is and ought to be totally dissolved; and that as Free and Independent States, they have full Power to levy War, conclude Peace, contract Alliances, establish Commerce, and to do all other Acts and Things which Independent States may of right do. And for the support of this Declaration, with a firm reliance on the protection of divine Providence, we mutually pledge to each other our Lives, our Fortunes and our sacred Honor."""

### Preprocessing the Text Data
We perform only basic text preprocessing since this data does not have much noise. We lowercase the text to maintain uniformity, strip the possessive 's and every non-letter character, and remove words shorter than 3 characters:

In [12]:
import re

def text_cleaner(text):
    # lowercase the text
    newString = text.lower()
    # drop the possessive 's (e.g. "nature's" -> "nature")
    newString = re.sub(r"'s\b", "", newString)
    # replace every non-letter character (punctuation, digits) with a space
    newString = re.sub("[^a-zA-Z]", " ", newString)
    # keep only words with at least 3 characters
    long_words = [word for word in newString.split() if len(word) >= 3]
    return " ".join(long_words)

# preprocess the text
data_new = text_cleaner(data_text)

In [13]:
data_new

'the unanimous declaration the thirteen united states america when the course human events becomes necessary for one people dissolve the political bands which have connected them with another and assume among the powers the earth the separate and equal station which the laws nature and nature god entitle them decent respect the opinions mankind requires that they should declare the causes which impel them the separation hold these truths self evident that all men are created equal that they are endowed their creator with certain unalienable rights that among these are life liberty and the pursuit happiness that secure these rights governments are instituted among men deriving their just powers from the consent the governed that whenever any form government becomes destructive these ends the right the people alter abolish and institute new government laying its foundation such principles and organizing its powers such form them shall seem most likely effect their safety and happiness pr

### Creating Sequences
We frame the problem as follows: the model takes 30 characters as context and predicts the next character. I arrived at 30 by trial and error, and you can experiment with it too. You essentially need enough characters in the input sequence for the model to pick up the context.

In [14]:
def create_seq(text):
    length = 30
    sequences = list()
    for i in range(length, len(text)):
        # select sequence of tokens
        seq = text[i-length:i+1]
        # store
        sequences.append(seq)
    print('Total Sequences: %d' % len(sequences))
    return sequences

# create sequences   
sequences = create_seq(data_new)

Total Sequences: 7052
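To see what these sequences look like, here is a small sketch of the same sliding-window idea on a toy string with a context of 5 characters instead of 30. The toy text and the extra `length` parameter are assumptions made for illustration only.

```python
def create_seq(text, length):
    """Slide a window of length + 1 characters over the text."""
    sequences = []
    for i in range(length, len(text)):
        # `length` context characters followed by 1 target character
        sequences.append(text[i - length:i + 1])
    return sequences

toy_text = "the laws of nature"  # hypothetical toy input
toy_sequences = create_seq(toy_text, length=5)

for seq in toy_sequences[:3]:
    context, target = seq[:-1], seq[-1]
    print(repr(context), "->", repr(target))

print(len(toy_sequences))  # len(toy_text) - length = 18 - 5 = 13
```

The same arithmetic explains the count above: 7052 sequences of 31 characters each means the cleaned text is 7052 + 30 = 7082 characters long.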


### Encoding Sequences
Once the sequences are generated, the next step is to encode each character. This would give us a sequence of numbers.

In [15]:
chars = sorted(list(set(data_new)))
mapping = dict((c, i) for i, c in enumerate(chars))

def encode_seq(seq):
    sequences = list()
    for line in seq:
        # integer encode line
        encoded_seq = [mapping[char] for char in line]
        # store
        sequences.append(encoded_seq)
    return sequences

# encode the sequences
sequences = encode_seq(sequences)
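The character-to-integer mapping is easy to inspect on a toy string. This is only a sketch: the toy text and the `inverse_mapping` helper are assumptions added here so that we can also decode integers back into characters.

```python
text = "the laws of nature"  # hypothetical toy input

# every distinct character gets an integer id, in sorted order
chars = sorted(set(text))
mapping = {c: i for i, c in enumerate(chars)}
# reverse lookup, handy for turning predictions back into text
inverse_mapping = {i: c for c, i in mapping.items()}

encoded = [mapping[c] for c in "the law"]
decoded = "".join(inverse_mapping[i] for i in encoded)

print(mapping)
print(encoded)  # [10, 4, 2, 0, 5, 1, 12]
print(decoded)  # the law
```

Because the characters are sorted before numbering, the space character always ends up with id 0 when the text contains only spaces and lowercase letters.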

### Create Training and Validation Set
Once our sequences are ready, we split the data into training and validation sets. This lets us keep track of how well the language model performs on unseen data during training.

In [16]:
from sklearn.model_selection import train_test_split

# vocabulary size
vocab = len(mapping)
sequences = np.array(sequences)
# create X and y
X, y = sequences[:,:-1], sequences[:,-1]
# one hot encode y
y = to_categorical(y, num_classes=vocab)
# create train and validation sets
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1, random_state=42)

print('Train shape:', X_tr.shape, 'Val shape:', X_val.shape)

Train shape: (6346, 30) Val shape: (706, 30)


In [24]:
X[1]

array([ 8,  5,  0, 21, 14,  1, 14,  9, 13, 15, 21, 19,  0,  4,  5,  3, 12,
        1, 18,  1, 20,  9, 15, 14,  0, 20,  8,  5,  0, 20])

In [25]:
y[1]

array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
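A quick sanity check is to decode `X[1]` and `y[1]` back into characters. The sketch below assumes the vocabulary is a space plus the 26 lowercase letters in sorted order, which is consistent with the cleaning step and with the 27 entries in `y[1]`; the values are copied from the outputs above.

```python
import string

# Assumption: vocabulary = space + a-z, sorted (27 characters in total)
chars = sorted(" " + string.ascii_lowercase)
inverse_mapping = dict(enumerate(chars))

# values copied from the X[1] output above
x1 = [8, 5, 0, 21, 14, 1, 14, 9, 13, 15, 21, 19, 0, 4, 5, 3, 12,
      1, 18, 1, 20, 9, 15, 14, 0, 20, 8, 5, 0, 20]
# one-hot target copied from the y[1] output above (the 1 sits at index 8)
y1 = [0.0] * 27
y1[8] = 1.0

context = "".join(inverse_mapping[i] for i in x1)
target = inverse_mapping[y1.index(1.0)]

print(repr(context), "->", repr(target))
# 'he unanimous declaration the t' -> 'h'
```

So the second training example asks the model to continue "he unanimous declaration the t" with "h", the next character of "thirteen".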

### Model Building
Time to build our language model!

I have used the Embedding layer of Keras to learn a 50-dimensional embedding for each character. This helps the model capture relationships between characters. I have also used a GRU layer with 150 units as the base model; it reads the 30 timesteps of each input sequence. Finally, a Dense layer with a softmax activation predicts the next character.

In [31]:
model = Sequential()
model.add(Embedding(vocab, 50, input_length=30, trainable=True))
model.add(GRU(150, recurrent_dropout=0.1, dropout=0.1))
model.add(Dense(vocab, activation='softmax'))
print(model.summary())

# compile the model
model.compile(loss='categorical_crossentropy', metrics=['acc'], optimizer='adam')
# fit the model
model.fit(X_tr, y_tr, epochs=100, verbose=2, validation_data=(X_val, y_val))

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 30, 50)            1350      
_________________________________________________________________
gru_1 (GRU)                  (None, 150)               90450     
_________________________________________________________________
dense_1 (Dense)              (None, 27)                4077      
Total params: 95,877
Trainable params: 95,877
Non-trainable params: 0
_________________________________________________________________
None
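The parameter counts in this summary can be reproduced by hand. The sketch below assumes the GRU formulation with a single bias vector per weight set, which is what the 90,450 figure implies; newer Keras versions may use a variant with an extra bias and report a slightly larger number.

```python
vocab, embed_dim, units = 27, 50, 150

# one 50-dimensional vector per character
embedding_params = vocab * embed_dim

# three weight sets (update gate, reset gate, candidate state), each with
# an input kernel, a recurrent kernel and one bias vector
gru_params = 3 * (units * (embed_dim + units) + units)

# one weight per (unit, character) pair plus one bias per character
dense_params = units * vocab + vocab

total = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total)
# 1350 90450 4077 95877
```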

Train on 6346 samples, validate on 706 samples
Epoch 1/100
 - 7s - loss: 2.7812 - acc: 0.1929 - val_loss: 2.4556 - val_acc: 0.2946
Epoch 2/100
 - 9s - loss: 2.3033 - acc: 0.3227 - val_loss: 2.2486 - val_acc: 0.3258
Epoch 3/100
 - 7s - loss: 2.1710 - acc: 0.3520 - val_loss: 2.1506 - val_acc: 0.3782
Epoch 4/100
 - 6s - loss: 2.0807 - acc: 0.3779 - val_loss: 2.0950 - val_acc

Epoch 86/100
 - 6s - loss: 0.4148 - acc: 0.8613 - val_loss: 2.8718 - val_acc: 0.5028
Epoch 87/100
 - 6s - loss: 0.3844 - acc: 0.8716 - val_loss: 2.8764 - val_acc: 0.5000
Epoch 88/100
 - 6s - loss: 0.3846 - acc: 0.8695 - val_loss: 2.8812 - val_acc: 0.4887
Epoch 89/100
 - 6s - loss: 0.3857 - acc: 0.8739 - val_loss: 2.8959 - val_acc: 0.4929
Epoch 90/100
 - 6s - loss: 0.3862 - acc: 0.8744 - val_loss: 2.9326 - val_acc: 0.4873
Epoch 91/100
 - 6s - loss: 0.3860 - acc: 0.8749 - val_loss: 2.9254 - val_acc: 0.4972
Epoch 92/100
 - 5s - loss: 0.3782 - acc: 0.8755 - val_loss: 2.9484 - val_acc: 0.4858
Epoch 93/100
 - 5s - loss: 0.3951 - acc: 0.8673 - val_loss: 2.9967 - val_acc: 0.4858
Epoch 94/100
 - 5s - loss: 0.3822 - acc: 0.8728 - val_loss: 2.9493 - val_acc: 0.5057
Epoch 95/100
 - 6s - loss: 0.3806 - acc: 0.8713 - val_loss: 2.9799 - val_acc: 0.5028
Epoch 96/100
 - 6s - loss: 0.3807 - acc: 0.8702 - val_loss: 2.9688 - val_acc: 0.4858
Epoch 97/100
 - 6s - loss: 0.3687 - acc: 0.8782 - val_loss: 2.977

<keras.callbacks.callbacks.History at 0x165e5460188>
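Notice in the log above that the training loss keeps falling (to about 0.38) while the validation loss climbs from roughly 2.1 in the early epochs to almost 3.0 near the end, which is a typical sign of overfitting. The `EarlyStopping` callback imported earlier is one common remedy, although the training call above does not use it. The pure-Python sketch below only illustrates the patience rule behind such a callback; the function name, the patience value and the loss values are all made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # new best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# hypothetical validation losses, not taken from the log
hypothetical_val_losses = [2.46, 2.25, 2.15, 2.10, 2.12, 2.18, 2.25, 2.31]
print(early_stopping_epoch(hypothetical_val_losses, patience=3))  # 7
```

With a patience of 3, training stops at epoch 7, three epochs after the best validation loss at epoch 4.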

### Inference
Once the model has finished training, we can generate text from it, one character at a time, given an input sequence. The code below always picks the most likely next character:

In [32]:
# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
    # reverse lookup: integer index -> character
    inverse_mapping = {index: char for char, index in mapping.items()}
    in_text = seed_text
    # generate a fixed number of characters
    for _ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in in_text]
        # pad or truncate (from the front) to a fixed length
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # pick the most likely next character
        # (predict_classes was removed in newer Keras releases, so take the argmax ourselves)
        yhat = int(np.argmax(model.predict(encoded, verbose=0), axis=-1)[0])
        # append the predicted character to the input
        in_text += inverse_mapping[yhat]
    return in_text

In [38]:
generate_seq(model, mapping, 30, 'will future', 20)

'will future sufferable than ref'
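One detail worth knowing: the seed 'will future' has only 11 characters, but the model expects 30. By default `pad_sequences` fills the missing positions at the front with zeros, and in our mapping 0 is the space character, so the model effectively sees the seed preceded by spaces. Here is a pure-Python sketch of that behaviour; `pad_pre` is a hypothetical stand-in written for illustration, and the vocabulary (space plus a-z) is an assumption.

```python
import string

def pad_pre(encoded, maxlen, value=0):
    """Keep the last maxlen items; left-pad shorter inputs with value."""
    truncated = encoded[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated

# Assumption: vocabulary = space + a-z, sorted, so that ' ' maps to 0
chars = sorted(" " + string.ascii_lowercase)
mapping = {c: i for i, c in enumerate(chars)}

seed = "will future"
encoded = [mapping[c] for c in seed]
padded = pad_pre(encoded, maxlen=30)

print(len(padded))  # 30
print(padded[:19])  # 19 zeros, i.e. 19 leading "spaces"
print(padded[19:])  # the 11 encoded seed characters
```

Once the generated text grows beyond 30 characters, the same step keeps only the last 30, so the model always sees the most recent context.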

### Natural Language Generation using OpenAI’s GPT-2
We have so far trained our own models to generate text, be it predicting the next word or generating some text with starting words. But that is just scratching the surface of what language models are capable of!

Leading research labs have trained much more complex language models on humongous datasets that have led to some of the biggest breakthroughs in the field of Natural Language Processing.

### About PyTorch-Transformers
Before we start using GPT-2, let’s learn a bit about the PyTorch-Transformers library. We will use this library to load the pre-trained models.

PyTorch-Transformers provides state-of-the-art pre-trained models for Natural Language Processing (NLP).

Most of the State-of-the-Art models require tons of training data and days of training on expensive GPU hardware which is something only the big technology companies and research labs can afford. But by using PyTorch-Transformers, now anyone can utilize the power of State-of-the-Art models!

### Sentence completion using GPT-2

Let’s build our own sentence completion model using GPT-2. We’ll try to predict the next word in the sentence:

“what is the best book  _________”

I chose this example because this is the first suggestion that Google’s text completion gives. Here is the code for doing the same:

In [41]:
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained model tokenizer (vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Encode the text input
text = "What is the best "
indexed_tokens = tokenizer.encode(text)

# Convert the indexed tokens into a PyTorch tensor
tokens_tensor = torch.tensor([indexed_tokens])

# Load pre-trained model (weights)
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Set the model in evaluation mode to deactivate the DropOut modules
model.eval()

# Use the GPU if one is available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokens_tensor = tokens_tensor.to(device)
model.to(device)

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]

# Get the predicted next sub-word
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])

# Print the predicted word
print(predicted_text)
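The indexing in `predictions[0, -1, :]` can look cryptic at first. The model returns one row of scores per input position, with one score per vocabulary entry, and we only care about the row for the last position because it scores the token that comes next. Here is a toy sketch of that step using plain lists; the four-word vocabulary and the scores are invented for illustration (the real GPT-2 vocabulary has roughly 50,000 entries).

```python
# Hypothetical toy vocabulary and scores, for illustration only
toy_vocab = ["book", "way", "movie", "thing"]

# Shape (batch=1, sequence_length=3, vocab_size=4):
# one row of scores per input position
toy_predictions = [[
    [0.1, 0.3, 0.2, 0.4],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 1.7, 0.4, 0.9],
]]

# scores for the token that follows the last input position
last_position_scores = toy_predictions[0][-1]

# argmax: index of the highest score
predicted_index = max(range(len(last_position_scores)),
                      key=last_position_scores.__getitem__)

print(predicted_index, toy_vocab[predicted_index])  # 1 way
```

Taking the argmax like this is greedy decoding: it always returns the single most likely next token.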


 74%|███████████████████████████████████████████████▏                | 403887104/548118077 [01:22<00:18, 7891697.91B/s][A
 74%|███████████████████████████████████████████████▎                | 404684800/548118077 [01:22<00:18, 7626988.52B/s][A
 74%|███████████

 81%|███████████████████████████████████████████████████▋            | 442351616/548118077 [01:29<00:21, 4892205.74B/s][A
 81%|███████████████████████████████████████████████████▋            | 442892288/548118077 [01:29<00:20, 5030246.57B/s][A
 81%|███████████████████████████████████████████████████▊            | 443404288/548118077 [01:29<00:22, 4753968.32B/s][A
 81%|███████████████████████████████████████████████████▊            | 443924480/548118077 [01:29<00:21, 4853954.45B/s][A
 81%|███████████████████████████████████████████████████▉            | 444530688/548118077 [01:29<00:20, 5128996.70B/s][A
 81%|███████████████████████████████████████████████████▉            | 445052928/548118077 [01:30<00:20, 4982709.11B/s][A
 81%|████████████████████████████████████████████████████            | 445558784/548118077 [01:30<00:21, 4801268.66B/s][A
 81%|████████████████████████████████████████████████████            | 446102528/548118077 [01:30<00:20, 4966780.50B/s][A
 81%|███████████

 87%|███████████████████████████████████████████████████████▋        | 477283328/548118077 [01:36<00:15, 4678659.12B/s][A
 87%|███████████████████████████████████████████████████████▊        | 477760512/548118077 [01:37<00:15, 4579263.00B/s][A
 87%|███████████████████████████████████████████████████████▊        | 478232576/548118077 [01:37<00:15, 4513077.96B/s][A
 87%|███████████████████████████████████████████████████████▉        | 478887936/548118077 [01:37<00:13, 4945823.85B/s][A
 87%|███████████████████████████████████████████████████████▉        | 479398912/548118077 [01:37<00:14, 4751177.46B/s][A
 88%|████████████████████████████████████████████████████████        | 479887360/548118077 [01:37<00:14, 4703366.70B/s][A
 88%|████████████████████████████████████████████████████████        | 480367616/548118077 [01:37<00:14, 4663522.96B/s][A
 88%|████████████████████████████████████████████████████████▏       | 480968704/548118077 [01:37<00:13, 4989124.52B/s][A
 88%|███████████

 93%|███████████████████████████████████████████████████████████▊    | 512393216/548118077 [01:44<00:08, 4425278.69B/s][A
 94%|███████████████████████████████████████████████████████████▉    | 512837632/548118077 [01:44<00:07, 4416304.90B/s][A
 94%|███████████████████████████████████████████████████████████▉    | 513343488/548118077 [01:44<00:07, 4578201.59B/s][A
 94%|███████████████████████████████████████████████████████████▉    | 513804288/548118077 [01:44<00:07, 4523859.93B/s][A
 94%|████████████████████████████████████████████████████████████    | 514258944/548118077 [01:44<00:07, 4407019.45B/s][A
 94%|████████████████████████████████████████████████████████████    | 514801664/548118077 [01:44<00:07, 4668367.90B/s][A
 94%|████████████████████████████████████████████████████████████▏   | 515274752/548118077 [01:44<00:07, 4662450.46B/s][A
 94%|████████████████████████████████████████████████████████████▏   | 515768320/548118077 [01:45<00:06, 4712709.48B/s][A
 94%|███████████

 99%|███████████████████████████████████████████████████████████████▍| 543737856/548118077 [01:51<00:01, 4311248.05B/s][A
 99%|███████████████████████████████████████████████████████████████▌| 544177152/548118077 [01:51<00:00, 4208199.29B/s][A
 99%|███████████████████████████████████████████████████████████████▌| 544604160/548118077 [01:51<00:00, 4057771.33B/s][A
 99%|███████████████████████████████████████████████████████████████▋| 545144832/548118077 [01:51<00:00, 4365632.90B/s][A
100%|███████████████████████████████████████████████████████████████▋| 545592320/548118077 [01:51<00:00, 4333033.95B/s][A
100%|███████████████████████████████████████████████████████████████▊| 546033664/548118077 [01:52<00:00, 4068797.84B/s][A
100%|███████████████████████████████████████████████████████████████▊| 546488320/548118077 [01:52<00:00, 4104148.21B/s][A
100%|███████████████████████████████████████████████████████████████▊| 547028992/548118077 [01:52<00:00, 4407447.73B/s][A
100%|███████████

 What is the best way


In [46]:
# Number of scores at the last position: one per token in GPT-2's vocabulary
print(len(predictions[0, -1, :]))

50257


In [54]:
# Index (token id) of the highest-scoring next token; .item() returns a plain Python int
torch.argmax(predictions[0, -1, :]).item()

In [48]:
# Highest raw score (logit) assigned to any next token
torch.max(predictions[0, -1, :])

tensor(-85.5222, device='cuda:0')

In [70]:
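# Token ids of the 10 highest-scoring next tokens, best first
# (negating the scores makes the ascending argsort return the largest ones first)
ll = [j.item() for j in (-predictions[0, -1, :]).argsort()[:10]]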

In [62]:
# Raw score of the second-best candidate token
predictions[0, -1, :][ll[1]]

tensor(-88.4837, device='cuda:0')
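
The two values printed above are raw scores (logits), not probabilities, which is why they are large negative numbers. As a minimal sketch using only the standard library, the snippet below applies a softmax to just those two scores to show how the gap between them translates into relative likelihood. The helper name `softmax` is illustrative, and the resulting probabilities are renormalized over two candidates only, so they are not the model's probabilities over the full vocabulary.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The best and second-best scores printed in the cells above.
top_two = [-85.5222, -88.4837]

probs = softmax(top_two)        # renormalized over these two candidates only
ratio = probs[0] / probs[1]     # equals exp(-85.5222 - (-88.4837))

print(probs)
print(ratio)
```

A gap of about 3 in logit space means the top token is roughly 19 times more likely than the runner-up. On the real tensor, `torch.softmax(predictions[0, -1, :], dim=-1)` should give the full distribution over all 50257 tokens.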

In [71]:
# Append each candidate token id to the prompt tokens and decode the result
for item in ll:
    predicted_text = tokenizer.decode(indexed_tokens + [item])
    print(predicted_text)

 What is the best way
 What is the best place
 What is the best method
 What is the best thing
 What is the best time
 What is the best and
 What is the best part
 What is the best approach
 What is the best strategy
 What is the best advice
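
The negate-and-argsort idiom used above sorts all 50257 scores just to keep ten of them. A top-k selection does the same job without a full sort. The sketch below shows the idea in plain Python on a toy example: the vocabulary, the scores, and the helper name `top_k_indices` are all made up for illustration and are not GPT-2 output.

```python
import heapq

def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)

# Hypothetical toy vocabulary and scores (invented for illustration).
vocab = [" time", " way", " method", " place", " thing"]
scores = [-90.0, -85.5, -89.1, -88.5, -89.4]

prompt = "What is the best"
best = top_k_indices(scores, 3)

# Append each of the top candidates to the prompt, best first.
for i in best:
    print(prompt + vocab[i])
```

With the real tensor, `torch.topk(predictions[0, -1, :], k=10)` returns both the top scores and their indices in one call, and should select the same ten token ids as the argsort version.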
