## Step 1: Implementation of decoding algorithms

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [2]:
# Seed the RNG so the sampling runs below are reproducible
torch.manual_seed(26)

<torch._C.Generator at 0x27782a830f0>

In [3]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", return_dict_in_generate=True)

In [4]:
prompt = "Today I believe we can finally"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
print("Shape of input ids: {}".format(input_ids.shape))

Shape of input ids: torch.Size([1, 6])


#### The likelihood of each output sequence is also computed, as the sum of the log-probabilities of its generated tokens.

In [5]:
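def compute_log_probability(outputs):
    # Drop the prompt tokens; this relies on the global `input_ids` defined above.
    generated_sequences = outputs.sequences[:, input_ids.shape[-1]:]
    # outputs.scores is a tuple with one (batch, vocab) tensor per generated step.
    log_probs = torch.stack(outputs.scores, dim=1).log_softmax(-1)
    # Pick the log-probability of the token that was actually generated at each step.
    token_log_probs = torch.gather(log_probs, 2, generated_sequences[:, :, None]).squeeze(-1)
    # Summing log-probabilities avoids the underflow of multiplying many small probabilities.
    log_prob = token_log_probs.sum(-1)
    prob = log_prob.exp()
    return generated_sequences, log_prob.item(), prob.item()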
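
The reason for working in log space can be shown without a model. The following is a minimal, framework-free sketch: the logits, the token ids, and the helper names `softmax` and `sequence_log_likelihood` are made up for illustration and are not part of the notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_likelihood(step_logits, token_ids):
    """Sum of log p(token_t) over all generation steps."""
    total = 0.0
    for logits, tok in zip(step_logits, token_ids):
        total += math.log(softmax(logits)[tok])
    return total

# Toy example: 3 generation steps over a vocabulary of 4 tokens (values are invented).
step_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 1.5, 0.3, 0.2],
    [1.0, 1.0, 1.0, 1.0],
]
token_ids = [0, 1, 3]

log_prob = sequence_log_likelihood(step_logits, token_ids)
prob = math.exp(log_prob)  # safe here because the sequence is short

# For long sequences the plain product underflows, while the log-sum stays finite.
naive_product = 1.0
for _ in range(2000):
    naive_product *= 0.5
long_log_prob = 2000 * math.log(0.5)

print(log_prob, prob)
print(naive_product, long_log_prob)
```

With 2000 steps at probability 0.5 each, the product collapses to zero in floating point, whereas the summed log-probability is still an ordinary finite number.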

### Greedy Decoding

#### Greedy decoding is the default behaviour of `generate`; keep only the newly generated tokens, not the prompt

In [6]:
outputs = model.generate(input_ids, max_length=30, output_scores=True, return_dict_in_generate=True)
gen_sequences, log_prob, prob = compute_log_probability(outputs)
print("log-likelihood of each output sequence with greedy search: {}".format(log_prob))
print("likelihood of each output sequence with greedy search: {}".format(prob))
print("Decoded Output: {}".format(tokenizer.batch_decode(gen_sequences, skip_special_tokens=True)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


log-likelihood of each output sequence with greedy search: -33.093475341796875
likelihood of each output sequence with greedy search: 4.243123457843903e-15
Decoded Output:  get to the point where we can make a difference in the lives of the people of the United States of America.
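
To make explicit what greedy decoding does at each step, here is a minimal sketch on a hypothetical toy model. The `NEXT` table (next-token probabilities that depend only on the previous token) and the function `greedy_decode` are assumptions for illustration; GPT-2 conditions on the whole prefix, not just the last token.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the previous token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"</s>": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
EOS = "</s>"

def greedy_decode(start="<s>", max_new_tokens=5):
    """At every step pick the single most probable next token."""
    tokens, log_prob = [], 0.0
    prev = start
    for _ in range(max_new_tokens):
        dist = NEXT[prev]
        tok = max(dist, key=dist.get)   # argmax over the next-token distribution
        tokens.append(tok)
        log_prob += math.log(dist[tok])  # accumulate the sequence log-likelihood
        if tok == EOS:
            break
        prev = tok
    return tokens, log_prob

tokens, log_prob = greedy_decode()
print(tokens, log_prob, math.exp(log_prob))
```

In this toy table the greedy path is `the cat </s>` with probability 0.6 × 0.4 × 1.0 = 0.24, even though `a </s>` has probability 0.36. Choosing the locally best token does not guarantee the most likely sequence overall.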



### Beam Search

In [7]:
outputs = model.generate(input_ids, max_length=30, num_beams=3, early_stopping=True, output_scores=True, return_dict_in_generate=True)
gen_sequences, log_prob, prob = compute_log_probability(outputs)
print("log-likelihood of each output sequence with beam search: {}".format(log_prob))
print("likelihood of each output sequence with beam search: {}".format(prob))
print("Decoded Output: {}".format(tokenizer.batch_decode(gen_sequences, skip_special_tokens=True)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


log-likelihood of each output sequence with beam search: -82.76826477050781
likelihood of each output sequence with beam search: 1.1329176570146491e-36
Decoded Output:  get to the point where we can make a difference in the lives of all of our children.

I believe that
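
Beam search keeps several partial hypotheses alive instead of one. The sketch below is a simplified illustration on a hypothetical toy model: the `NEXT` table and `beam_search` function are assumptions, and the sketch omits the length penalty and other details that `generate` applies.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the previous token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"</s>": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
EOS = "</s>"

def beam_search(num_beams, start="<s>", max_new_tokens=5):
    """Keep the `num_beams` best partial sequences by summed log-probability."""
    beams = [(0.0, [])]   # (log-probability, tokens so far)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for log_prob, tokens in beams:
            prev = tokens[-1] if tokens else start
            for tok, p in NEXT[prev].items():
                candidates.append((log_prob + math.log(p), tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:num_beams]:
            # Hypotheses that emitted EOS are set aside; the rest keep expanding.
            (finished if cand[1][-1] == EOS else beams).append(cand)
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_new_tokens
    return max(finished, key=lambda c: c[0])

greedy_lp, greedy_tokens = beam_search(num_beams=1)  # width 1 reduces to greedy decoding
beam_lp, beam_tokens = beam_search(num_beams=2)
print(greedy_tokens, math.exp(greedy_lp))
print(beam_tokens, math.exp(beam_lp))
```

With a width of 2 the search finds `a </s>` (probability 0.36), which the width-1 run misses (0.24). In the run recorded above, by contrast, the beam-search log-likelihood is far lower than the greedy one. A possible explanation is that with `num_beams > 1` each entry of `outputs.scores` has one row per beam, and `compute_log_probability` gathers from the first row rather than following the selected beam. Recent versions of transformers provide `model.compute_transition_scores`, which accepts `beam_indices` for this purpose.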


### Top-K Sampling

In [8]:
outputs = model.generate(input_ids, do_sample=True, max_length=30, top_k=20, output_scores=True, return_dict_in_generate=True)
gen_sequences, log_prob, prob = compute_log_probability(outputs)
print("log-likelihood of each output sequence with top-k sampling: {}".format(log_prob))
print("likelihood of each output sequence with top-k sampling: {}".format(prob))
print("Decoded Output: {}".format(tokenizer.batch_decode(gen_sequences, skip_special_tokens=True)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


log-likelihood of each output sequence with top-k sampling: -40.48450469970703
likelihood of each output sequence with top-k sampling: 2.6169932833501007e-18
Decoded Output:  make good on our promise, and that we will continue to build on our progress, as the rest of the world does
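
Top-k sampling restricts each step to the `k` most probable tokens and samples among them. A minimal sketch of that filtering step follows; the distribution `probs` and the helper `top_k_filter` are invented for illustration and are not how `generate` is implemented internally.

```python
import random

def top_k_filter(probs, k):
    """Zero out everything except the k most probable tokens, then renormalize."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = set(top)
    total = sum(probs[i] for i in top)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]

# Toy next-token distribution over a 5-token vocabulary (values are invented).
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
filtered = top_k_filter(probs, k=2)

# Sample one token id from the truncated distribution.
rng = random.Random(26)
token = rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
print(filtered, token)
```

Only tokens 0 and 1 can be drawn here, with their probabilities rescaled to sum to 1. The transformers documentation describes `outputs.scores` as processed scores, so if filtering has already been applied to them, the likelihood reported above is measured under this truncated distribution rather than under the full model distribution.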


### Top-p (Nucleus) Sampling

In [9]:
outputs = model.generate(input_ids, do_sample=True, max_length=30, top_p=0.7, top_k=0, output_scores=True, return_dict_in_generate=True)
gen_sequences, log_prob, prob = compute_log_probability(outputs)
print("log-likelihood of each output sequence with nucleus sampling: {}".format(log_prob))
print("likelihood of each output sequence with nucleus sampling: {}".format(prob))
print("Decoded Output: {}".format(tokenizer.batch_decode(gen_sequences, skip_special_tokens=True)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


log-likelihood of each output sequence with nucleus sampling: -48.56922912597656
likelihood of each output sequence with nucleus sampling: 8.065893486863909e-22
Decoded Output:  bring the Bush administration back from the brink of chaos," former Bush White House chief of staff Cheryl Mills said. "And
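
Nucleus sampling keeps the smallest set of most probable tokens whose cumulative probability reaches `top_p`, so the number of candidates varies from step to step. The sketch below illustrates the idea; the helper `top_p_filter` and both toy distributions are assumptions for illustration, not the library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the smallest prefix of tokens (sorted by probability) whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept_set = set(kept)
    total = sum(probs[i] for i in kept)
    return [q / total if i in kept_set else 0.0 for i, q in enumerate(probs)]

# A fairly flat toy distribution: three tokens are needed to reach 0.7.
flat = [0.4, 0.25, 0.2, 0.1, 0.05]
flat_filtered = top_p_filter(flat, p=0.7)

# A peaked toy distribution: the first token alone already exceeds 0.7.
peaked = [0.9, 0.05, 0.03, 0.01, 0.01]
peaked_filtered = top_p_filter(peaked, p=0.7)

print(flat_filtered)
print(peaked_filtered)
```

With the same `p`, the flat distribution keeps three candidates while the peaked one keeps a single token. This adaptivity is the main difference from top-k, which always keeps a fixed number. Setting `top_k=0` in the call above disables top-k filtering so that only the nucleus cutoff applies.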
