In [1]:
import os
os.environ['TRANSFORMERS_CACHE'] = "D:/transformer_cache/"
os.environ['HF_DATASETS_CACHE'] = "D:/transformer_cache/"

# Greedy Search Decoding

#### Importing GPT-2 model

In [2]:
# hide_output
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel
#device = "cuda" if torch.cuda.is_available() else "cpu"
device="cpu"
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
model = GPT2LMHeadModel.from_pretrained('gpt2-xl').to(device)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)



#### Implementing greedy search decoding: at each time step we select the token with the highest probability as the next token. The output table lists the top choices and their probabilities at each time step.

In [4]:
# hide_output
import pandas as pd

input_txt = "What is the opposite of front?"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
iterations = []
n_steps = 8
choices_per_step = 5

with torch.no_grad():
    for _ in range(n_steps):
        iteration = dict()
        iteration["Input"] = tokenizer.decode(input_ids[0])
        output = model(input_ids=input_ids)
        # Select logits of the first batch and the last token and apply softmax
        #print(output)
        next_token_logits = output.logits[0, -1, :]
        next_token_probs = torch.softmax(next_token_logits, dim=-1)
        sorted_ids = torch.argsort(next_token_probs, dim=-1, descending=True)
        # Store tokens with highest probabilities
        for choice_idx in range(choices_per_step):
            token_id = sorted_ids[choice_idx]
            token_prob = next_token_probs[token_id].cpu().numpy()
            token_choice = (
                f"{tokenizer.decode(token_id)} ({100 * token_prob:.2f}%)"
            )
            iteration[f"Choice {choice_idx+1}"] = token_choice
        # Append predicted next token to input
        input_ids = torch.cat([input_ids, sorted_ids[None, 0, None]], dim=-1)
        iterations.append(iteration)
        
pd.DataFrame(iterations)

| | Input | Choice 1 | Choice 2 | Choice 3 | Choice 4 | Choice 5 |
|---|---|---|---|---|---|---|
| 0 | What is the opposite of front? | \n (40.87%) | Back (4.89%) | Front (3.82%) | The (3.43%) | What (3.36%) |
| 1 | What is the opposite of front?\n | \n (99.05%) | The (0.09%) | I (0.05%) | " (0.03%) | In (0.03%) |
| 2 | What is the opposite of front?\n\n | The (13.72%) | Front (7.09%) | A (6.80%) | It (3.29%) | " (2.96%) |
| 3 | What is the opposite of front?\n\nThe | opposite (75.98%) | inverse (1.45%) | reverse (1.29%) | answer (0.72%) | other (0.72%) |
| 4 | What is the opposite of front?\n\nThe opposite | of (97.09%) | is (1.73%) | to (0.27%) | side (0.10%) | , (0.07%) |
| 5 | What is the opposite of front?\n\nThe opposite of | front (85.67%) | a (2.85%) | the (2.05%) | back (1.62%) | what (0.46%) |
| 6 | What is the opposite of front?\n\nThe opposite... | is (85.00%) | means (2.69%) | , (2.55%) | ? (0.98%) | ( (0.93%) |
| 7 | What is the opposite of front?\n\nThe opposite... | back (48.72%) | behind (4.37%) | the (3.87%) | rear (3.16%) | a (2.89%) |


#### Use the cell below to set input_txt and generate a continuation of it. Each output line shows the input at that step, which is one token longer than the previous line.

In [46]:
# hide_output
import pandas as pd

input_txt = "who is the prime minister of india?"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
iterations = []
n_steps = 30
choices_per_step = 5

with torch.no_grad():
    for _ in range(n_steps):
        iteration = dict()
        iteration["Input"] = tokenizer.decode(input_ids[0])
        print(iteration)
        output = model(input_ids=input_ids)
        # Select logits of the first batch and the last token and apply softmax
        #print(output)
        next_token_logits = output.logits[0, -1, :]
        next_token_probs = torch.softmax(next_token_logits, dim=-1)
        #print(next_token_probs.shape)
        sorted_ids = torch.argsort(next_token_probs, dim=-1, descending=True)
        
        # Append predicted next token to input
        input_ids = torch.cat([input_ids, sorted_ids[None, 0, None]], dim=-1)
        iterations.append(iteration)

#print(iteration)
#pd.DataFrame(iterations)

{'Input': 'who is the prime minister of india?'}
{'Input': 'who is the prime minister of india?\n'}
{'Input': 'who is the prime minister of india?\n\n'}
{'Input': 'who is the prime minister of india?\n\nThe'}
{'Input': 'who is the prime minister of india?\n\nThe answer'}
{'Input': 'who is the prime minister of india?\n\nThe answer is'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is the'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is the prime'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is the prime minister'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is the prime minister of'}
{'Input': 'who is the prime minister of india?\n\nThe answer is that he is th

#### We specify the number of tokens to generate with max_new_tokens and perform greedy decoding again, this time using the built-in model.generate()

In [5]:
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output = model.generate(input_ids, max_new_tokens=n_steps, do_sample=False)
print(tokenizer.decode(output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


What is the opposite of front?

The opposite of front is back
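
The warnings above appear because `generate()` received only `input_ids`. The following is a minimal sketch of one way to avoid them, assuming the `model` and `tokenizer` objects loaded earlier in this notebook: pass the tokenizer's full output, which includes `attention_mask`, and set `pad_token_id` explicitly. The helper name `generate_greedy` is hypothetical and not part of the original notebook.

```python
def generate_greedy(model, tokenizer, prompt, max_new_tokens=8):
    """Greedy decoding with an explicit attention mask and pad token id."""
    # The tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,                             # input_ids and attention_mask
        max_new_tokens=max_new_tokens,
        do_sample=False,                      # greedy decoding
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token of its own
    )
    return tokenizer.decode(output[0])
```

A call such as `print(generate_greedy(model, tokenizer, input_txt, n_steps))` should produce the same kind of continuation as the cell above.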


#### Specifying a larger value for max_length and a longer input_txt

In [6]:
max_length = 128
input_txt = """Large language models (LLMs) are machine learning models that can comprehend \
and generate human language text. They work by analyzing massive data sets of language.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length, 
                               do_sample=False)
print(tokenizer.decode(output_greedy[0]))



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


The most popular language models are:

Deep Learning

Deep Neural Networks

Deep Reinforcement Learning

Deep Learning for Natural Language Processing

Deep Learning for Speech Recognition

Deep Learning for Image Recognition

Deep Learning for Computer Vision

Deep Learning for Natural Language Processing

Deep Learning for Speech Recognition

Deep Learning for Image Recognition

Deep Learning for Computer Vision

Deep Learning for Natural Language Processing

Deep Learning


#### The generated text repeats the same lines over and over. This tendency toward repetition is a known drawback of greedy search decoding.

# Beam Search Decoding

#### We compare the decoding strategies using the log probability of the generated sequence, which is the sum of the log probabilities of its tokens given the input. A higher log probability means the model rates the generated text as more likely given the input, so we use it as the score for comparing the outputs of the decoding strategies.
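
The score is a sum of logs rather than a product of probabilities because a product of many small numbers underflows in floating point. A small sketch with made-up numbers (every token is assumed to have probability 0.5, which is not a value from the model):

```python
import math

# Hypothetical sequence: 2000 tokens, each with conditional probability 0.5
token_probs = [0.5] * 2000

# Multiplying the probabilities underflows to zero in floating point
product = 1.0
for p in token_probs:
    product *= p

# Summing the log probabilities stays well within range
log_prob = sum(math.log(p) for p in token_probs)

print(product)             # 0.0
print(round(log_prob, 2))  # -1386.29
```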

### Greedy Search

In [7]:
import torch.nn.functional as F

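def log_probs_from_logits(logits, labels):
    # Normalize the logits to log probabilities over the vocabulary
    logp = F.log_softmax(logits, dim=-1)
    # Pick out the log probability assigned to each actual token
    logp_label = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
    return logp_label
     

def sequence_logprob(model, labels, input_len=0):
    with torch.no_grad():
        output = model(labels)
        # Logits at position i predict token i+1, so shift logits and labels by one
        log_probs = log_probs_from_logits(
            output.logits[:, :-1, :], labels[:, 1:])
        # Sum the log probabilities after the prompt. Because of the shift,
        # log_probs[:, i] scores token i+1, so this slice starts at token input_len+1
        seq_log_prob = torch.sum(log_probs[:, input_len:])
    return seq_log_prob.cpu().numpy()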
     

logp = sequence_logprob(model, output_greedy, input_len=len(input_ids[0]))
print(tokenizer.decode(output_greedy[0]))
print(f"\nlog-prob: {logp:.2f}")

Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


The most popular language models are:

Deep Learning

Deep Neural Networks

Deep Reinforcement Learning

Deep Learning for Natural Language Processing

Deep Learning for Speech Recognition

Deep Learning for Image Recognition

Deep Learning for Computer Vision

Deep Learning for Natural Language Processing

Deep Learning for Speech Recognition

Deep Learning for Image Recognition

Deep Learning for Computer Vision

Deep Learning for Natural Language Processing

Deep Learning

log-prob: -56.19


### Beam search with 5 beams

In [8]:
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5, 
                             do_sample=False)
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog-prob: {logp:.2f}")



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


In this tutorial, you will learn how to create a language model using Python and Keras. You will also learn how to train a language model using Python and Keras.


You will learn:

- How to create a language model using Python and Keras

- How to train a language model using Python and Keras

- How to visualize the results of training a language model using Python and Keras

- How to visualize the results of

log-prob: -57.01


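#### Beam search usually finds a sequence with a higher log probability than greedy search. In this run, however, its score (-57.01) is slightly lower than the greedy score (-56.19). Beam search is also more time consuming, because it tracks several candidate sequences at every step.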
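
To make the difference between the two strategies concrete, here is a minimal sketch of beam search on a hypothetical toy model (a lookup table of next-token probabilities, not GPT-2). The names `TOY_MODEL` and `beam_search` are illustrative assumptions. With one beam the procedure reduces to greedy search; with two beams it finds a sequence with a higher total log probability.

```python
import math

# Hypothetical toy model: maps a prefix to next-token probabilities
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}

def beam_search(model, num_beams, n_steps):
    """Return the best (sequence, log probability) found with num_beams beams."""
    beams = [((), 0.0)]  # each beam is (prefix, cumulative log probability)
    for _ in range(n_steps):
        candidates = []
        for prefix, score in beams:
            # Extend every beam with every possible next token
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the num_beams highest-scoring candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]

greedy_seq, greedy_logp = beam_search(TOY_MODEL, num_beams=1, n_steps=2)
beam_seq, beam_logp = beam_search(TOY_MODEL, num_beams=2, n_steps=2)
print(greedy_seq, round(greedy_logp, 3))
print(beam_seq, round(beam_logp, 3))
```

Greedy search commits to `A` because it is the most likely first token, while the second beam keeps `B` alive long enough to discover that `B` followed by `x` is more likely overall.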

### Beam search with num_beams=5 and no_repeat_ngram_size=2
no_repeat_ngram_size sets the probability of a candidate next token to 0 if choosing it would produce an n-gram of that size that already appears in the sequence. It is used to avoid repetitions.
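
The idea can be sketched in plain Python. This is an illustration of the rule, not the library's implementation, and the function name `banned_next_tokens` is hypothetical: given the tokens generated so far, it returns the tokens that would complete an n-gram that has already occurred.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would repeat an n-gram already present in tokens."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens form the start of the n-gram the next token would complete
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        # If an earlier n-gram starts with the same n-1 tokens, ban its last token
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

history = ["Deep", "Learning", "for", "Deep"]
print(banned_next_tokens(history, 2))  # {'Learning'}
```

With `n=2`, the bigram "Deep Learning" has already occurred, so "Learning" may not follow the second "Deep".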

In [9]:
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5, 
                             do_sample=False, no_repeat_ngram_size=2)
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog-prob: {logp:.2f}")



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


In this tutorial, you will learn how to create a simple language model using Python and Keras. You will be able to use this model to generate text in a variety of languages, such as English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, and more.

This tutorial is part of our Machine Learning for Beginners series. Check out the other tutorials in the series:<|endoftext|>

log-prob: -79.36


#### The log probability score drops (from -57.01 to -79.36) when we use no_repeat_ngram_size, but the repetitions are gone. no_repeat_ngram_size therefore lets us trade off high-probability tokens against repetition.

# Sampling Methods

#### We set temperature T=2.0 and top_k=0 (which disables top-k filtering)
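
Temperature rescales the logits before the softmax: each logit is divided by T, so T > 1 flattens the distribution and T < 1 sharpens it. A minimal sketch in plain Python, using made-up logits for three tokens (the function name `softmax_with_temperature` is illustrative):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for three tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At T=0.5 most of the probability mass sits on the first token, while at T=2.0 the three tokens are much closer to equally likely.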

In [10]:
output_temp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             temperature=2.0, top_k=0)
print(tokenizer.decode(output_temp[0]))



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


special design asteroid Red Earth indicating night bursts Library - consoneness SeedModel inserting unnoticed Candidate Atomic tit motivites International target carriage Landing photon headphonepill discrimination sample confidentELL lie computing prose Exhibition preliminary original opaque linition letters Cl upset time tongue accident meter diagnosesVectorues intelligenceв broken exhibiting Ce drama None CthulhuGate NASL Tesla's Calhar Cycle 389 relying UNC Predict lampatel NVIDIA differential waste ConsentDet Yuk range affirmed paperwork2020 alleged Costco782 CalculOil instead diamond DjȆ 30 Items es_


#### T=2.0 did not produce any meaningful text: a high temperature flattens the distribution, so rare tokens are sampled often

In [11]:
output_temp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             temperature=0.5, top_k=0)
print(tokenizer.decode(output_temp[0]))



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


The use of a Linguistic Model is a technique to automatically learn a language. It is a very powerful technique for teaching machine learning techniques to a large number of people.


The use of a Linguistic Model is a technique to automatically learn a language. It is a very powerful technique for teaching machine learning techniques to a large number of people. Language Models are used to:

Define a language model.

Define a grammar.

Test


#### T=0.5 produced more readable text, but it contains repetitions: a low temperature sharpens the distribution toward the most likely tokens

### Top k and Nucleus Sampling (Top p)

#### We set top_k = 50
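
Top-k sampling restricts the choice to the k most probable tokens and renormalizes their probabilities before sampling. A minimal sketch with a made-up four-token distribution (the function name `top_k_filter` is illustrative, not a library call):

```python
def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
print(top_k_filter(probs, k=2))  # {0: 0.625, 1: 0.375}
```

The cutoff is fixed at k tokens regardless of how the probability mass is distributed among them.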

In [12]:
output_topk = model.generate(input_ids, max_length=max_length, do_sample=True,
                             top_k=50)
print(tokenizer.decode(output_topk[0]))



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


Why I'm not involved in the current AI efforts (yet).


I think these advances may be in their early stages. For example, the AI community is a great place to be. There's a community of highly passionate, creative people, working around the clock on solving really hard problems. It's a great way to understand the complex systems found in the natural world, while being entertained. There's lots of great technical and non-technical knowledge in the community. And


#### The top_k text is readable too, but it drifts off topic: the unrelated line "Why I'm not involved in the current AI efforts (yet)." appears right after the prompt

#### We set top_p=0.90
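
Nucleus (top-p) sampling keeps the smallest set of most probable tokens whose cumulative probability reaches p, so the number of candidates adapts to the shape of the distribution. A minimal sketch with a made-up four-token distribution (the function name `top_p_filter` is illustrative, not a library call):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens with cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop once the nucleus covers probability p
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
nucleus = top_p_filter(probs, p=0.9)
print(sorted(nucleus))  # [0, 1, 2]
```

Here the first two tokens cover only 0.8 of the probability mass, so a third token is needed to reach 0.9, and the least likely token is dropped.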

In [15]:
output_topp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             top_p=0.90)
print(tokenizer.decode(output_topp[0]))



Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.


The research on the language models was led by researchers from Stanford, Cornell, and the University of North Carolina at Chapel Hill.


A major challenge for speech recognition software, in particular, is that the data to train the model on is massive, said co-author Michaela Hartung of the University of North Carolina. "When a person says 'I have an apple', that's a lot of words, and the amount of data for a simple sentence is very large,"


#### The top_p text is the most coherent and on-topic of the samples generated so far