
In [None]:
!pip install transformers[torch]
!pip install transformers[sentencepiece]

### Generate
The class exposes generate(), which can be used for:

* greedy decoding by calling greedy_search() if num_beams=1 and do_sample=False.
* multinomial sampling by calling sample() if num_beams=1 and do_sample=True.
* beam-search decoding by calling beam_search() if num_beams>1 and do_sample=False.
* beam-search multinomial sampling by calling beam_sample() if num_beams>1 and do_sample=True.
* diverse beam-search decoding by calling group_beam_search(), if num_beams>1 and num_beam_groups>1.
* constrained beam-search decoding by calling constrained_beam_search(), if constraints!=None or force_words_ids!=None.
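
The selection rules in the list above can be sketched as a small function. This is only an illustration: `pick_decoding_method` is a hypothetical helper, not part of transformers, and the order in which the conditions are checked is an assumption.

```python
def pick_decoding_method(num_beams=1, do_sample=False, num_beam_groups=1,
                         constraints=None, force_words_ids=None):
    """Return the name of the decoding method described in the list above.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    # Constrained decoding is checked first (assumed precedence).
    if constraints is not None or force_words_ids is not None:
        return "constrained_beam_search"
    # Several beams split into several groups: diverse beam search.
    if num_beams > 1 and num_beam_groups > 1:
        return "group_beam_search"
    # A single beam: greedy decoding or plain multinomial sampling.
    if num_beams == 1:
        return "sample" if do_sample else "greedy_search"
    # Several beams in one group: beam search, with or without sampling.
    return "beam_sample" if do_sample else "beam_search"


print(pick_decoding_method())                             # greedy_search
print(pick_decoding_method(num_beams=5))                  # beam_search
print(pick_decoding_method(num_beams=5, do_sample=True))  # beam_sample
```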


In [46]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = 'VanessaSchenkel/pt-unicamp-news-t5'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentence = "The doctor was tired, she had been busy all morning."
input_ids = tokenizer(sentence, return_tensors="pt").input_ids

outputs = model.generate(input_ids, num_beams=5)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, estava ocupada de manhã.']

In [47]:
outputs = model.generate(input_ids)
tokenizer.batch_decode(outputs)

['<pad> O médico estava cansado, estava ocupado de manhã.</s>']

**max_length** (int, optional, defaults to model.config.max_length) — The maximum length the generated tokens can have. Corresponds to the length of the input prompt + max_new_tokens. In general, prefer the use of max_new_tokens, which ignores the number of tokens in the prompt.

In [48]:
outputs = model.generate(input_ids, num_beams=5, max_length=5)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['O médico estava cansa']

**max_new_tokens** (int, optional) — The maximum number of tokens to generate, ignoring the number of tokens in the prompt.

In [49]:
outputs = model.generate(input_ids, num_beams=5, max_new_tokens=3)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['O médico estava']

**num_beams** is the number of candidate sequences (beams) kept at each generation step (see beam search for more details). Larger values increase computation time and often, though not always, improve the quality of the generated text.
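
To make the idea concrete, here is a toy beam search in plain Python. The `NEXT` table of next-token probabilities is made up for illustration and has nothing to do with the real model; the point is only that keeping two beams can recover a sequence that greedy decoding (one beam) discards after the first token.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<s>": {"O": 0.6, "A": 0.4},
    "O": {"médico": 0.55, "doutor": 0.45},
    "A": {"médica": 0.9, "doutora": 0.1},
}


def beam_search(num_beams, steps=2):
    """Keep the num_beams best partial sequences at every step."""
    beams = [(0.0, ["<s>"])]  # (sum of log-probabilities, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        # Sort by score, best first, and keep only num_beams candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    # Report each sequence with its probability rounded to two decimals.
    return [(" ".join(t[1:]), round(math.exp(s), 2)) for s, t in beams]


print(beam_search(num_beams=1))  # [('O médico', 0.33)]
print(beam_search(num_beams=2))  # [('A médica', 0.36), ('O médico', 0.33)]
```

With one beam the search commits to "O" because it is the most probable first token, although the best two-token sequence starts with "A".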

In [50]:
outputs = model.generate(input_ids, num_beams=50)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, estava ocupada de manhã.']

In [53]:
outputs = model.generate(input_ids, num_beams=2)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, estava ocupada de manhã.']

In [54]:
outputs = model.generate(input_ids, num_beams=100, num_return_sequences=3)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [56]:
outputs = model.generate(input_ids, num_beams=3, num_return_sequences=3)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


**min_length** is the minimum number of tokens that the output can have. Punctuation counts as a token, and some words are split into more than one token, so this should be slightly larger than the number of words you want.
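
A minimal sketch of how such a constraint can be enforced, assuming it works by making the end-of-sequence token impossible until the output is long enough. `enforce_min_length` and the token ids are hypothetical, and a plain list stands in for the logits tensor.

```python
def enforce_min_length(logits, cur_len, min_length, eos_token_id):
    """Forbid the end-of-sequence token while the output is too short."""
    out = list(logits)
    if cur_len < min_length:
        out[eos_token_id] = float("-inf")
    return out


logits = [0.5, 2.0, 1.0]  # made-up scores; index 1 plays the role of EOS
print(enforce_min_length(logits, cur_len=3, min_length=20, eos_token_id=1))
# [0.5, -inf, 1.0]
print(enforce_min_length(logits, cur_len=20, min_length=20, eos_token_id=1))
# [0.5, 2.0, 1.0]
```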


In [57]:
outputs = model.generate(input_ids, num_beams=5, min_length=20)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, ela estava ocupada de manhã em todas as manhãs do dia.']

In its most basic form, sampling means randomly picking the next word according to its conditional probability distribution.
**Top-p** (nucleus) sampling chooses from the smallest possible set of words whose cumulative probability exceeds the probability p. The probability mass is then redistributed among this set of words. This way, the size of the set (i.e. the number of words in it) can dynamically grow and shrink according to the next word's probability distribution.

**top_k** restricts sampling to the top_k most probable words at each generation step. This avoids having very improbable words pop up during text generation. In other words, it sets how many candidate words are considered when sampling; top_k=0 deactivates the filter.

**do_sample**, when True, makes generation sample the next word from its conditional probability distribution instead of always picking the most probable one.
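
The two filters and the sampling step can be sketched in plain Python. The probabilities below are made up, and `top_k_filter` and `top_p_filter` are hypothetical helpers that operate on a dict rather than on the model's logits.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}


# Made-up distribution over the next word.
probs = {"ocupada": 0.5, "cansada": 0.3, "coberta": 0.15, "atendia": 0.05}

print(sorted(top_k_filter(probs, k=2)))     # ['cansada', 'ocupada']
print(sorted(top_p_filter(probs, p=0.92)))  # ['cansada', 'coberta', 'ocupada']

# Sampling (what do_sample=True enables) then draws from the filtered set.
rng = random.Random(0)  # fixed seed so the draw is reproducible
filtered = top_p_filter(probs, p=0.92)
print(rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0])
```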

In [58]:
# deactivate top_k sampling and sample only from 92% most likely words
outputs = model.generate(input_ids, 
    do_sample=True, 
    max_length=50, 
    top_p=0.92, 
    top_k=0)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, estava coberta toda a manhã.']

In [75]:
# set top_k = 50 and set top_p = 0.95 and num_return_sequences = 3
output = model.generate(
    input_ids,
    do_sample=True, 
    max_length=50, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(output):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, estava ocupada por toda a manhã.
1: O médico estava cansado, estava preocupada toda manhã.
2: A médica estava cansada, ela tinha ficado ocupada de manhã.


In [60]:
outputs = model.generate(input_ids, num_beams=5, early_stopping=True)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

['A médica estava cansada, estava ocupada de manhã.']

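**num_beams** is the number of beams for beam search; 1 means no beam search. Together with num_return_sequences=n, the n highest-scoring beams are returned, so num_return_sequences cannot be larger than num_beams.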


In [64]:
outputs = model.generate(input_ids, num_beams=3, num_return_sequences=3)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [65]:
outputs = model.generate(input_ids, num_beams=100, num_return_sequences=3)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


**Temperature** is a hyper-parameter used to control the randomness of predictions by scaling the logits before applying softmax.
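
As a small illustration of that scaling, the sketch below divides made-up logits by the temperature before the softmax. Low temperatures concentrate the probability on the top token, which matches the near-identical samples obtained with temperature=0.1; `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```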

In [76]:
# use temperature to decrease the sensitivity to low probability candidates
outputs = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.1,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, estava ocupado de manhã.
1: O médico estava cansado, estava ocupado de manhã.
2: O médico estava cansado, estava ocupado de manhã.


In [77]:
outputs = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.7,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, está todas as manhãs ocupadas.
1: A médica estava cansada, estava ocupada de manhã.
2: O médico estava cansado, ela esteve ocupada durante toda a manhã.


In [78]:
outputs = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.9,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, inteirava-se de manhã.
1: O médico estava cansado, estava habituado pela manhã.
2: A médica estava cansada, estava atendia de manhã.


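The **repetition_penalty** is meant to avoid sentences that repeat themselves: it penalizes words that were already generated or that belong to the context. 1.0 means no penalty, values above 1.0 discourage repetition, and values below 1.0 (as in the first two examples below) encourage it.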
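
A minimal sketch of the penalty, assuming the rule from the CTRL paper (which, to my understanding, is the one the library applies): the score of every token already seen is divided by the penalty if it is positive and multiplied by it if it is negative. The logits and token ids are made up.

```python
def apply_repetition_penalty(logits, previous_ids, penalty):
    """Lower (penalty > 1) or raise (penalty < 1) the scores of seen tokens."""
    out = list(logits)
    for tok in set(previous_ids):
        # Dividing a positive score or multiplying a negative one by a
        # penalty above 1 always makes the token less likely.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


logits = [2.0, -1.0, 0.5]  # made-up scores; tokens 0 and 1 were already used
print(apply_repetition_penalty(logits, previous_ids=[0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
print(apply_repetition_penalty(logits, previous_ids=[0, 1], penalty=0.5))
# [4.0, -0.5, 0.5]
```

The second call shows why a penalty below 1.0 can produce loops such as "de manhã, de manhã, de manhã": tokens that were already generated become more attractive.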

In [79]:
outputs = model.generate(
    input_ids, 
    repetition_penalty=0.1,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: 


In [83]:
outputs = model.generate(
    input_ids, 
    repetition_penalty=0.5,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, estava ocupado, de manhã, de manhã, de manhã, de


In [82]:
outputs = model.generate(
    input_ids, 
    repetition_penalty=5.0,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, tinha sido ocupado de manhã.


In [81]:
outputs = model.generate(
    input_ids, 
    repetition_penalty=10.0,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, tinha sido ocupado de manhã.


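**length_penalty** (float, optional, defaults to model.config.length_penalty or 1.0 if the config does not set any value) — Exponential penalty to the length, used with beam-based generation. It is applied as an exponent to the sequence length, which in turn divides the score of the sequence. Since the score is the log-likelihood of the sequence (i.e. negative), length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter ones; 0.0 means no length normalization. The calls below do not pass num_beams, so unless the model config sets it they run greedy search, where length_penalty has no effect and only max_length shapes the output.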
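
A worked toy example may help with the sign convention. It assumes that a finished beam hypothesis is scored as the sum of its token log-probabilities divided by length ** length_penalty, which is my understanding of the library's beam scorer; the two hypotheses and their numbers are made up. Because log-probabilities are negative, a larger divisor moves the score toward zero, so a positive exponent helps the longer hypothesis.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score of a finished beam hypothesis (assumed form)."""
    return sum_logprobs / (length ** length_penalty)


short = (-2.0, 4)   # (sum of token log-probabilities, number of tokens)
long_ = (-3.0, 10)

for lp in (-1.0, 0.0, 1.0):
    winner = "short" if beam_score(*short, lp) > beam_score(*long_, lp) else "long"
    print(lp, winner)
# -1.0 short
# 0.0 short
# 1.0 long
```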

In [91]:
outputs = model.generate(
    input_ids, 
    length_penalty=-10.0,
    max_length=2
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O


In [90]:
outputs = model.generate(
    input_ids, 
    length_penalty=10.0,
    max_length=5
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansa


In [89]:
outputs = model.generate(
    input_ids, 
    length_penalty=10.0,
    max_length=2
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O


**no_repeat_ngram_size** prevents n-grams (sequences of n consecutive tokens) of that size from appearing more than once in the output. This is useful when producing longer texts, as models sometimes repeat themselves; in that case I suggest using a value of 3 or 4 to ensure diversity without hurting performance. Note that a value of 1 forbids repeating any token at all.
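
The check behind this option can be sketched as follows: before each step, find every token that would complete an n-gram already present in the output and forbid it. `banned_next_tokens` is a hypothetical helper, and words stand in for token ids.

```python
def banned_next_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens are the start of the n-gram about to be formed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


tokens = ["de", "manhã", ",", "de"]
print(banned_next_tokens(tokens, n=2))          # {'manhã'}
print(banned_next_tokens(tokens, n=3))          # set()
print(sorted(banned_next_tokens(tokens, n=1)))  # [',', 'de', 'manhã']
```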

In [94]:
outputs = model.generate(
    input_ids, 
    no_repeat_ngram_size=1,
    num_beams=5,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, tinha sido ocupada de manhã.
1: O médico estava cansado, tinha sido ocupado de manhã.
2: A médica estava cansada, ela tinha sido ocupada de manhã.


In [96]:
outputs = model.generate(
    input_ids, 
    no_repeat_ngram_size=4,
    num_beams=5,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [95]:
outputs = model.generate(
    input_ids, 
    no_repeat_ngram_size=100,
    num_beams=5,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [103]:
outputs = model.generate(
    input_ids, 
    no_repeat_ngram_size=5,
    num_beams=10,
    num_return_sequences=3,
    do_sample=True
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: O médico estava cansado, estava ocupado de manhã.
2: A médica estava cansada, estava ocupada de manhã.


**encoder_no_repeat_ngram_size**: if set to an int > 0, all n-grams of that size that occur in the encoder_input_ids cannot occur in the decoder_input_ids.

In [104]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    encoder_no_repeat_ngram_size=1
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada e ela estava ocupada de manhã em todas as suas atividades de dia para
1: O médico estava cansado e ela estava ocupada de manhã em todas as suas atividades de dia para
2: A médica estava cansada e ela estava ocupada de manhã em todas as suas atividades de dia-


In [105]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    encoder_no_repeat_ngram_size=10
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [107]:
outputs = model.generate(
    input_ids, 
    num_beams=100,
    num_return_sequences=3,
    encoder_no_repeat_ngram_size=100
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


**bad_words_ids** is a list of token ids that are not allowed to be generated. In order to get the token ids of the words that should not appear in the generated text, use tokenizer(bad_words, add_prefix_space=True, add_special_tokens=False).input_ids.
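
As a rough sketch of what happens with those ids, assume a banned word is a list of token ids and that its last token is forbidden whenever the tokens generated so far end with the rest of the word. `ban_bad_words` and all ids below are hypothetical, and a plain list stands in for the logits tensor.

```python
def ban_bad_words(logits, generated, bad_words_ids):
    """Set to -inf the score of any token that would complete a banned word."""
    out = list(logits)
    for word in bad_words_ids:
        *prefix, last = word
        # Single-token words are always banned; longer words only when the
        # generated tokens already end with the word's first tokens.
        if not prefix or generated[-len(prefix):] == prefix:
            out[last] = float("-inf")
    return out


logits = [0.1, 0.2, 0.3, 0.4]
bad_words_ids = [[1], [2, 3]]  # one single-token word, one two-token word
print(ban_bad_words(logits, generated=[0], bad_words_ids=bad_words_ids))
# [0.1, -inf, 0.3, 0.4]
print(ban_bad_words(logits, generated=[2], bad_words_ids=bad_words_ids))
# [0.1, -inf, 0.3, -inf]
```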

In [125]:
bad_words = ["cansado"]

bad_words_ids = tokenizer(bad_words, add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids, 
    bad_words_ids=bad_words_ids,
    num_beams=10,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: A médica estava cansada, estava ocupada toda a manhã.


In [126]:
bad_words = ["o", "médico"]

bad_words_ids = tokenizer(bad_words, add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids, 
    bad_words_ids=bad_words_ids,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O doutor estava cansado, estava ocupado de manhã.


**force_words_ids** is a list of token ids that must be generated. If given a List[List[int]], this is treated as a simple list of words that must be included, the opposite to bad_words_ids. If given List[List[List[int]]], this triggers a disjunctive constraint, where one can allow different forms of each word.

In [129]:
force_words = ["a", "médica"]

force_words_ids = tokenizer(force_words, add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    force_words_ids=force_words_ids,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava toda a manhã ocupada.
1: A médica estava cansada, estava ocupada a toda manhã.
2: A médica estava cansada, ela estava ocupada de manhã em toda a manhã.


**max_time** is the maximum amount of time, in seconds, that you allow the computation to run. Generation will still finish the current pass after the allocated time has passed.


In [130]:
force_words = ["a", "médica"]

force_words_ids = tokenizer(force_words, add_special_tokens=False).input_ids

outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    force_words_ids=force_words_ids,
    max_time=1.0
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: Tinha a médica
1: A médica a
2: O médica a


**num_beam_groups** is the number of groups to divide num_beams into in order to ensure diversity among different groups of beams.

In [131]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    num_beam_groups=2
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect. "


Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava ocupada de manhã.
2: A médica estava cansada, estava toda manhã ocupada.


In [133]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    num_beam_groups=5
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava ocupada de manhã.
2: A médica estava cansada, estava ocupada de manhã.


In [132]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    num_beam_groups=10
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, estava ocupado de manhã.
1: O médico estava cansado, estava ocupado de manhã.
2: O médico estava cansado, estava ocupado de manhã.


**diversity_penalty** is a value that is subtracted from a beam’s score if it generates the same token as any beam from another group at a particular time step. Note that diversity_penalty is only effective if group beam search is enabled.
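
A minimal sketch of that subtraction, assuming the Hamming-diversity rule in which the penalty is multiplied by the number of earlier groups that already chose a token at the current step. `hamming_diversity` is a hypothetical helper and the scores are made up.

```python
from collections import Counter


def hamming_diversity(scores, tokens_from_earlier_groups, diversity_penalty):
    """Subtract the penalty once for every earlier group that picked a token."""
    counts = Counter(tokens_from_earlier_groups)
    return [s - diversity_penalty * counts[i] for i, s in enumerate(scores)]


scores = [-0.25, -0.5, -3.0]  # made-up log-probabilities for three tokens
# Two earlier groups both picked token 0 at this step.
print(hamming_diversity(scores, [0, 0], diversity_penalty=1.5))
# [-3.25, -0.5, -3.0]
```

After the penalty, token 1 has the best score, so the current group is pushed toward a different continuation.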

In [139]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    num_beam_groups=2,
    diversity_penalty=1.5
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [141]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    num_beam_groups=2,
    diversity_penalty=50.0
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect. "


Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [143]:
outputs = model.generate(
    input_ids, 
    num_beams=5,
    num_beam_groups = 5,
    num_return_sequences=5,
    diversity_penalty = 0.70
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: O médico estava cansado, estava ocupado de manhã.
1: A médica estava cansada, estava ocupada de manhã.
2: O médico estava cansado, tinha sido ocupado de manhã.
3: O médico estava cansado, estava ocupado de manhã...
4: A médica estava cansada, ela estava ocupada de manhã.


**prefix_allowed_tokens_fn**: if provided, this function constrains the beam search to allowed tokens only at each step. If not provided, no constraint is applied. This function takes 2 arguments: the batch ID batch_id and input_ids. It has to return a list with the allowed tokens for the next generation step, conditioned on the batch ID batch_id and the previously generated tokens input_ids. This argument is useful for constrained generation conditioned on the prefix, as described in Autoregressive Entity Retrieval.
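
A sketch of what such a function could look like. All token ids are hypothetical placeholders, `ANY_IDS` stands in for the whole vocabulary, and a plain list stands in for the tensor of previously generated tokens (which, for this kind of model, starts with the decoder start token).

```python
# Hypothetical token ids, for illustration only.
ARTICLE_IDS = [10, 11]     # e.g. the ids of "A" and "O"
NOUN_IDS = [20, 21]        # e.g. the ids of "médica" and "médico"
ANY_IDS = list(range(30))  # stand-in for the whole vocabulary


def prefix_allowed_tokens_fn(batch_id, input_ids):
    """Force an article, then a noun, then allow any token."""
    generated = list(input_ids)
    if len(generated) == 1:  # only the start token so far
        return ARTICLE_IDS
    if len(generated) == 2:  # start token + article
        return NOUN_IDS
    return ANY_IDS


print(prefix_allowed_tokens_fn(0, [0]))      # [10, 11]
print(prefix_allowed_tokens_fn(0, [0, 10]))  # [20, 21]
```

With a real model the function would be passed as model.generate(input_ids, num_beams=5, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn), and the id lists would come from the tokenizer.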

**logits_processor** (LogitsProcessorList, optional) — Custom logits processors that complement the default logits processors built from arguments and a model’s config. If a logit processor is passed that is already created with the arguments or a model’s config an error is thrown. This feature is intended for advanced users.

**renormalize_logits** (bool, optional, defaults to False) — Whether to renormalize the logits after applying all the logits processors or warpers (including the custom ones). It’s highly recommended to set this flag to True as the search algorithms suppose the score logits are normalized but some logit processors or warpers break the normalization.

In [146]:
outputs = model.generate(
    input_ids, 
    num_beams=10,
    num_return_sequences=3,
    renormalize_logits=True
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


**constraints** is a list of custom constraints that can be added to the generation to ensure that the output contains certain tokens, as defined by Constraint objects, in the most sensible way possible.

In [149]:
from transformers import PhrasalConstraint

constraints = [
    PhrasalConstraint(
        tokenizer("a médica", add_special_tokens=False).input_ids
    )
]

outputs = model.generate(
    input_ids,
    constraints=constraints,
    num_beams=10,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã a toda a manhã, e a médica estava
1: A médica estava cansada, estava ocupada de manhã a toda a manhã... a médica
2: A médica estava cansada, estava ocupada de manhã a toda a manhã. A médica a médica


In [150]:
from transformers import PhrasalConstraint

constraints = [
    PhrasalConstraint(
        tokenizer("médica", add_special_tokens=False).input_ids
    )
]

outputs = model.generate(
    input_ids,
    constraints=constraints,
    num_beams=10,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: A médica estava cansada, estava ocupada toda a manhã.


In [158]:
from transformers import DisjunctiveConstraint

flexible_phrases = tokenizer(
            ["médica", "médico"], add_special_tokens=False
        ).input_ids

constraints = [DisjunctiveConstraint(flexible_phrases)]

outputs = model.generate(
    input_ids,
    constraints=constraints,
    num_beams=10,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


In [159]:
from transformers import DisjunctiveConstraint

flexible_phrases = tokenizer(
            ["médica", "médico", "doutora"], add_special_tokens=False
        ).input_ids

constraints = [DisjunctiveConstraint(flexible_phrases)]

outputs = model.generate(
    input_ids,
    constraints=constraints,
    num_beams=10,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))



Output:
----------------------------------------------------------------------------------------------------
0: A médica estava cansada, estava ocupada de manhã.
1: A médica estava cansada, estava toda manhã ocupada.
2: O médico estava cansado, estava ocupado de manhã.


**output_scores** is whether or not to return the prediction scores. 

In [170]:
outputs = model.generate(
    input_ids,
    num_beams=10,
    num_return_sequences=3,
    output_scores=True,
    return_dict_in_generate=True
)

for m in outputs:
  print(m)




sequences
sequences_scores
scores
beam_indices


In [172]:
# Beam transition scores for each vocabulary token at each generation step, i.e. the log probabilities
# of tokens conditioned on the log softmax of previously generated tokens in this beam. Tuple of torch.FloatTensor
# with up to max_new_tokens elements (one per generated token), each tensor of shape (batch_size*num_beams, config.vocab_size).
outputs.scores 

(tensor([[-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106],
         [-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106],
         [-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106],
         ...,
         [-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106],
         [-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106],
         [-12.5388, -11.3426, -15.5187,  ..., -26.4714, -26.4331, -26.4106]]),
 tensor([[-13.7348, -11.6968, -18.6643,  ..., -32.3291, -32.2599, -32.2011],
         [-13.7489, -11.4951, -18.1527,  ..., -30.9017, -30.8568, -30.7965],
         [-11.3940,  -8.6737, -14.6180,  ..., -28.1367, -28.0599, -28.1081],
         ...,
         [ -5.9505,  -8.7642, -14.5758,  ..., -26.3377, -26.2707, -26.2382],
         [-13.4717,  -9.3911, -15.2133,  ..., -27.8848, -27.8123, -27.8238],
         [-12.6452, -10.9745, -16.1720,  ..., -29.2740, -29.2042, -29.2129]]),
 tensor([[-11.1814, -11.7901, -16.8562,  ...

In [173]:
outputs.sequences_scores #final beam scores of the generated sequences

tensor([-0.3082, -0.3631, -0.3802])

In [174]:
outputs = model.generate(
    input_ids,
    num_beams=10,
    num_return_sequences=5,
    output_scores=True,
    return_dict_in_generate=True
)

outputs.sequences_scores



tensor([-0.3082, -0.3631, -0.3802, -0.3894, -0.4312])

**return_dict_in_generate** is whether or not to return a ModelOutput instead of a plain tuple.

In [None]:
outputs = model.generate(
    input_ids,
    num_beams=10,
    num_return_sequences=5,
    return_dict_in_generate=True
)


In [181]:
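# values is a method, so without parentheses the notebook only shows the function object;
# call outputs.values() to get the stored tensors.
outputs.values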

<function BeamSearchEncoderDecoderOutput.values>

### Diverse beam search decoding
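generate() performs diverse beam search by calling group_beam_search() when num_beams>1 and num_beam_groups>1. The method can also be called directly, as in the cell below.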

In [185]:
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    HammingDiversityLogitsProcessor,
    BeamSearchScorer,
)
import torch

encoder_input_str = "She is a great doctor, but he is a bad patient."
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids


# let's run diverse beam search using 6 beams
num_beams = 6
# define decoder start token ids
input_ids = torch.ones((num_beams, 1), device=model.device, dtype=torch.long)
input_ids = input_ids * model.config.decoder_start_token_id

# add encoder_outputs to model keyword arguments
model_kwargs = {
    "encoder_outputs": model.get_encoder()(
        encoder_input_ids.repeat_interleave(num_beams, dim=0), return_dict=True
    )
}

# instantiate beam scorer
beam_scorer = BeamSearchScorer(
    batch_size=1, # Batch Size of input_ids for which standard beam search decoding is run in parallel.
    max_length=model.config.max_length, # The maximum length of the sequence to be generated
    num_beams=num_beams, # Number of beams for beam search.
    device=model.device, # Defines the device type (e.g., "cpu" or "cuda") on which this instance of BeamSearchScorer will be allocated.
    num_beam_groups=3, # Number of groups to divide num_beams into in order to ensure diversity among different groups of beams.
    num_beam_hyps_to_keep=3, # The number of beam hypotheses that shall be returned upon calling finalize.
)

# instantiate logits processors
logits_processor = LogitsProcessorList( # A LogitsProcessor can be used to modify the prediction scores of a language model head for generation.
    [
        HammingDiversityLogitsProcessor(diversity_penalty=5.5, num_beams=6, num_beam_groups=3), # LogitsProcessor that enforces diverse beam search. 
        MinLengthLogitsProcessor(min_length=5, eos_token_id=model.config.eos_token_id), # LogitsProcessor enforcing a min-length by setting EOS probability to 0.
    ]
)

outputs = model.group_beam_search(
    input_ids, beam_scorer, logits_processor=logits_processor, **model_kwargs
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)

['Ela é um grande médico, mas ele é um doente ruim.',
 'Ela é um grande médico, mas ele é um mau paciente.',
 'Ela é um grande médico, mas ele é um mau paciente...']
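The idea behind the diversity penalty used in the cell above can be sketched in plain Python. This is a simplified illustration, not the library implementation: it assumes that, at each step, every token already chosen by an earlier beam group has `diversity_penalty` subtracted from its score once per occurrence. The function name `hamming_diversity_penalty` and the toy scores are hypothetical.

```python
from collections import Counter

def hamming_diversity_penalty(scores, previous_group_tokens, diversity_penalty):
    """Lower the score of every token already picked by earlier groups at this step."""
    frequency = Counter(previous_group_tokens)
    return [s - diversity_penalty * frequency[token_id] for token_id, s in enumerate(scores)]

# toy vocabulary of 5 tokens; log-prob-like scores for one beam of the current group
scores = [-0.5, -1.0, -2.0, -3.0, -4.0]
# tokens chosen at this step by the beams of the groups processed earlier
previous_group_tokens = [0, 0, 1]

penalized = hamming_diversity_penalty(scores, previous_group_tokens, diversity_penalty=5.5)
print(penalized)  # [-11.5, -6.5, -2.0, -3.0, -4.0]
```

With a penalty as large as 5.5, the tokens picked by earlier groups drop far below the others, so the current group is pushed toward a different continuation (token 2 in this toy case).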

### Constrained beam search

constrained_beam_search() generates sequences of token ids for models with a language modeling head using constrained beam search decoding. It can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.



In [1]:
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    ConstrainedBeamSearchScorer,
    PhrasalConstraint,
)
import torch

tokenizer = AutoTokenizer.from_pretrained("VanessaSchenkel/pt-unicamp-news-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("VanessaSchenkel/pt-unicamp-news-t5")

encoder_input_str = "She is a great doctor, but he is a bad patient."
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids


# let's run beam search using 3 beams
num_beams = 3
# define decoder start token ids
input_ids = torch.ones((num_beams, 1), device=model.device, dtype=torch.long)
input_ids = input_ids * model.config.decoder_start_token_id

# add encoder_outputs to model keyword arguments
model_kwargs = {
    "encoder_outputs": model.get_encoder()(
        encoder_input_ids.repeat_interleave(num_beams, dim=0), return_dict=True
    )
}

constraint_str = "paciente"
constraint_token_ids = tokenizer.encode(constraint_str)[:-1]  # slice to remove eos token
constraints = [PhrasalConstraint(token_ids=constraint_token_ids)]


# instantiate beam scorer
beam_scorer = ConstrainedBeamSearchScorer(
    batch_size=1, num_beams=num_beams, device=model.device, constraints=constraints, num_beam_hyps_to_keep=3
)

# instantiate logits processors
logits_processor = LogitsProcessorList(
    [
        MinLengthLogitsProcessor(5, eos_token_id=model.config.eos_token_id),
    ]
)

outputs = model.constrained_beam_search(
    input_ids, beam_scorer, constraints=constraints, logits_processor=logits_processor, **model_kwargs
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)



['Ela é um grande médico, mas ele é um mau paciente.',
 'Ela é um grande médico, mas ele é um paciente ruim.',
 'Ela é um grande médico, mas ele é um mau paciente...']
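The MinLengthLogitsProcessor passed in the cell above prevents the model from ending a sequence too early. A minimal sketch of that behaviour, assuming it works by setting the EOS score to negative infinity while the sequence is shorter than `min_length` (the function name `enforce_min_length` and the toy scores are hypothetical):

```python
import math

def enforce_min_length(scores, current_length, min_length, eos_token_id):
    """Return a copy of scores with EOS masked out while the sequence is too short."""
    masked = list(scores)
    if current_length < min_length:
        masked[eos_token_id] = -math.inf
    return masked

# toy vocabulary of 4 tokens; token 1 plays the role of EOS and is currently the best choice
scores = [-2.0, -0.1, -3.0, -1.5]

too_short = enforce_min_length(scores, current_length=3, min_length=5, eos_token_id=1)
long_enough = enforce_min_length(scores, current_length=5, min_length=5, eos_token_id=1)

print(too_short)    # [-2.0, -inf, -3.0, -1.5]
print(long_enough)  # [-2.0, -0.1, -3.0, -1.5]
```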

#### PhrasalConstraint
Constraint enforcing that an ordered sequence of tokens is included in the output.

In [5]:
constraints = [
    PhrasalConstraint(
        tokenizer("grande médica", add_special_tokens=False).input_ids
    )
]

input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    constraints=constraints,
    num_beams=5,
    num_return_sequences=3,
    remove_invalid_values=True,
)


print("Output:\n" + 100 * '-')

for sentence in outputs:
    print(tokenizer.decode(sentence, skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Ela é um grande médico, mas ele é um doente ruim.......
Ela é um grande médico, mas ele é um doente ruim...... e
Ela é um grande médico, mas ele é um doente ruim...... grande
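Whether a generated sequence actually fulfils a phrasal constraint can be checked after decoding. The sketch below is an illustration under the assumption that "fulfilled" means the constraint's token ids appear as one contiguous, ordered run in the output. The helper `contains_phrase` and the token ids are made up and do not come from the tokenizer above.

```python
def contains_phrase(sequence, phrase):
    """True if phrase occurs in sequence as a contiguous, ordered run of token ids."""
    n = len(phrase)
    return any(sequence[i:i + n] == phrase for i in range(len(sequence) - n + 1))

# made-up token ids standing in for a generated sentence and a two-token constraint
generated = [11, 42, 7, 93, 15, 8]
constraint = [7, 93]

print(contains_phrase(generated, constraint))  # True
print(contains_phrase(generated, [93, 7]))     # False: same tokens, wrong order
```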


#### DisjunctiveConstraint
A special Constraint that is fulfilled as soon as any one of several alternative constraints is fulfilled.
It lets the user pass a list of words to guide generation so that the final output contains at least one word from the list.

In [12]:
from transformers import DisjunctiveConstraint

encoder_input_str = "She is a great doctor, but he is a bad patient."
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids


# let's run beam search using 3 beams
num_beams = 3
# define decoder start token ids
input_ids = torch.ones((num_beams, 1), device=model.device, dtype=torch.long)
input_ids = input_ids * model.config.decoder_start_token_id

# add encoder_outputs to model keyword arguments
model_kwargs = {
    "encoder_outputs": model.get_encoder()(
        encoder_input_ids.repeat_interleave(num_beams, dim=0), return_dict=True
    )
}

constraint_str = ["médico", "médica"]

constraints = [
    DisjunctiveConstraint(
        tokenizer(constraint_str, add_special_tokens=False).input_ids
    )
]

# instantiate beam scorer
beam_scorer = ConstrainedBeamSearchScorer(
    batch_size=1, num_beams=num_beams, device=model.device, constraints=constraints, num_beam_hyps_to_keep=3
)

# instantiate logits processors
logits_processor = LogitsProcessorList(
    [
        MinLengthLogitsProcessor(5, eos_token_id=model.config.eos_token_id),
    ]
)

outputs = model.constrained_beam_search(
    input_ids, beam_scorer, constraints=constraints, logits_processor=logits_processor, **model_kwargs
)

tokenizer.batch_decode(outputs, skip_special_tokens=True)



['Ela é um grande médico, mas ele é um doente ruim.',
 'Ela é um grande médico, mas ele é um mau paciente.',
 'Ela é um grande médico, mas é um doente ruim.']
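The "at least one of" semantics of a disjunctive constraint can be sketched as a post-hoc check. This is only an illustration: the helper names and token ids are hypothetical, and the real constraint steers the beam search itself rather than filtering finished outputs.

```python
def contains_phrase(sequence, phrase):
    """True if phrase occurs in sequence as a contiguous, ordered run of token ids."""
    n = len(phrase)
    return any(sequence[i:i + n] == phrase for i in range(len(sequence) - n + 1))

def satisfies_disjunction(sequence, alternatives):
    """True if at least one of the alternative phrases occurs in sequence."""
    return any(contains_phrase(sequence, alt) for alt in alternatives)

# made-up token ids: two alternative words, one of them split into two tokens
alternatives = [[50], [51, 52]]

print(satisfies_disjunction([3, 50, 9], alternatives))      # True (first alternative)
print(satisfies_disjunction([3, 51, 52, 9], alternatives))  # True (second alternative)
print(satisfies_disjunction([3, 51, 9], alternatives))      # False (second alternative incomplete)
```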

In [13]:

outputs = model.constrained_beam_search(
    input_ids, beam_scorer, constraints=constraints, logits_processor=logits_processor, return_dict_in_generate=True, output_scores=True, **model_kwargs
)



In [15]:
outputs.sequences_scores

tensor([-0.2165, -0.2186, -0.2516])

In [9]:
force_flexible = ["médica", "doutora"]

force_flexible_ids = [
    tokenizer(force_flexible, add_special_tokens=False).input_ids,
]


outputs = model.generate(
    input_ids,
    force_words_ids=force_flexible_ids,
    num_beams=10,
    num_return_sequences=3,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
)

print("Output:\n" + 100 * '-')

for sentence in outputs:
    print(tokenizer.decode(sentence, skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Ela é um grande médico, mas ele tem uma má paciente. A sua maioria não está médica
É um grande médico, mas ele é uma doente ruim. O seu tratamento está a ser médica
Ela é um grande médico, mas ele tem uma má paciente. A sua parte de risco médica
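The effect of no_repeat_ngram_size can be sketched as follows. The assumption is that any n-gram of the given size may occur only once, so at each step the tokens that would complete an already seen n-gram are banned. The function name `banned_next_tokens` and the token ids are hypothetical.

```python
def banned_next_tokens(generated, ngram_size):
    """Token ids that would complete an n-gram already present in generated."""
    if ngram_size < 1 or len(generated) + 1 < ngram_size:
        return set()
    # the last (ngram_size - 1) tokens form the start of the next n-gram
    prefix = tuple(generated[len(generated) - (ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

generated = [5, 7, 5]

print(banned_next_tokens(generated, 1))   # {5, 7}: no token may ever repeat
print(banned_next_tokens(generated, 2))   # {7}: "5 7" already occurred
print(banned_next_tokens(generated, 10))  # set(): sequence too short to repeat a 10-gram
```

With no_repeat_ngram_size=1 every token can be used only once, which is a very strong restriction for natural language and may contribute to the unnatural sentence endings in the output above.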


In [17]:
force_flexible = ["médica", "doutora"]

force_flexible_ids = [
    tokenizer(force_flexible, add_special_tokens=False).input_ids,
]


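# input_ids must hold the tokenized source sentence here. If it still holds the
# (num_beams, 1) decoder start tokens built for constrained_beam_search() above,
# generate() treats them as a batch of 3 inputs and returns 3 x 3 sequences.
outputs = model.generate(
    input_ids,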
    force_words_ids=force_flexible_ids,
    num_beams=5,
    num_return_sequences=3,
    no_repeat_ngram_size=10,
    remove_invalid_values=True,
)

print("Output:\n" + 100 * '-')

for sentence in outputs:
    print(tokenizer.decode(sentence, skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
No No No, a experiência com o
No No No, a experiência com a
No No No, a experiência com doutor
No No No, a experiência com o
No No No, a experiência com a
No No No, a experiência com doutor
No No No, a experiência com o
No No No, a experiência com a
No No No, a experiência com doutor


In [5]:

from numpy import array
import math
import heapq
 
# beam search
def beam_search_decoder(data, k):
    """
    data: array of shape (n, m), where n is the number of steps (words) in the sequence
        and m is the number of classes (words in the target vocab).
    k: beam width, i.e. the number of candidate sequences kept after each step.
    Returns the k best [sequence, score] pairs; the score is the negative
    log probability of the sequence, so lower is better.
    """
    sequences = [[[], 0.0]]
    # walk over each step in the sequence
    for row in data: # ----> n steps
        all_candidates = []
        # find the indexes of the k largest probabilities in the row
        k_largest = heapq.nlargest(k, range(len(row)), row.take) # -----> scans m entries
        # expand each current candidate
        for seq, score in sequences: # ----> at most k candidates
            for j in k_largest: # -----> k tokens
                s = score - math.log(row[j])
                candidate = [seq + [j], s]
                all_candidates.append(candidate)
        # sort all candidates by score (ascending: lowest negative log probability first)
        ordered = sorted(all_candidates, key=lambda tup:tup[1]) # -----> k*k candidates, O(k^2 log k)
        # select best k
        sequences = ordered[:k]
    return sequences
 
# define a sequence of 10 words over a vocab of 5 words
data = [[0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1]]
data = array(data)
# decode sequence
result = beam_search_decoder(data, 3)
# print result
for seq in result:
    print(seq)

[[4, 0, 4, 0, 4, 0, 4, 0, 4, 0], 6.931471805599453]
[[4, 0, 4, 0, 4, 0, 4, 0, 4, 1], 7.154615356913663]
[[4, 0, 4, 0, 4, 0, 4, 0, 3, 0], 7.154615356913663]
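The scores printed above are negative log probabilities, so the lowest score marks the most likely sequence. As a small cross-check, here is a sketch that uses the same toy probabilities in plain lists instead of a NumPy array. Because each row is independent of the previously chosen tokens, picking the most probable token at every step should reproduce the top beam, and exponentiating the negated score should give the probability of the whole sequence.

```python
import math

# same toy data as above: 10 steps over a vocab of 5 words
data = [[0.1, 0.2, 0.3, 0.4, 0.5],
        [0.5, 0.4, 0.3, 0.2, 0.1]] * 5

# most probable token at every step (greedy choice)
best_seq = [max(range(len(row)), key=row.__getitem__) for row in data]
# negative log probability of that sequence
best_score = -sum(math.log(max(row)) for row in data)

print(best_seq)               # [4, 0, 4, 0, 4, 0, 4, 0, 4, 0]
print(best_score)             # about 6.93, matching the first beam above
print(math.exp(-best_score))  # about 0.5 ** 10, the probability of the sequence
```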
