In [1]:
from nhelper.generator import Generator


## `kwargs`-based generation

In [2]:
kwargs_generator = Generator()

In [3]:
template = "I am flying to {location} next week."
locations = ["NYC", "Copenhagen", "Miami"]

for i, prediction in enumerate(kwargs_generator.generate(template, location=locations)):
    print(f"Sentence {i}: {prediction}")

Sentence 0: I am flying to NYC next week.
Sentence 1: I am flying to Copenhagen next week.
Sentence 2: I am flying to Miami next week.


You can also combine multiple `kwargs`. Their value lists can have the same length, as in the first example below, or different lengths, as in the second.

In [4]:
template = "I am flying to {location} next week with my {who}."

# same number of values
locations = ["NYC", "Copenhagen", "Miami"]
whos = ["girlfriend", "sister", "dog"]

for i, prediction in enumerate(kwargs_generator.generate(template, location=locations, who=whos)):
    print(f"Sentence {i}: {prediction}")

Sentence 0: I am flying to NYC next week with my girlfriend.
Sentence 1: I am flying to Copenhagen next week with my sister.
Sentence 2: I am flying to Miami next week with my dog.


In [5]:
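# different number of values: judging by the output, the positional True
# makes generate() combine every location with every who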
locations = ["London", "Paris"]
whos = ["family", "friend", "cat"]

for i, prediction in enumerate(kwargs_generator.generate(template, True, location=locations, who=whos)):
    print(f"Sentence {i}: {prediction}")

Sentence 0: I am flying to London next week with my family.
Sentence 1: I am flying to London next week with my friend.
Sentence 2: I am flying to London next week with my cat.
Sentence 3: I am flying to Paris next week with my family.
Sentence 4: I am flying to Paris next week with my friend.
Sentence 5: I am flying to Paris next week with my cat.
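
The two behaviours shown above (element-wise pairing and all combinations) can be reproduced with the standard library. The following is a minimal sketch for illustration, not `nhelper`'s actual implementation: `expand_template` and its `combine` flag are hypothetical names, and pairing is assumed to behave like `zip` while combining is assumed to behave like a Cartesian product.

```python
from itertools import product
from typing import Iterator, List


def expand_template(template: str, combine: bool = False, **values: List[str]) -> Iterator[str]:
    """Yield one filled-in sentence per value combination.

    Hypothetical helper for illustration only; not part of nhelper.
    """
    names = list(values)
    # combine=False pairs values element by element (zip);
    # combine=True builds every combination (Cartesian product).
    rows = product(*values.values()) if combine else zip(*values.values())
    for row in rows:
        yield template.format(**dict(zip(names, row)))


template = "I am flying to {location} next week with my {who}."

paired = list(
    expand_template(
        template,
        location=["NYC", "Copenhagen", "Miami"],
        who=["girlfriend", "sister", "dog"],
    )
)
combined = list(
    expand_template(
        template,
        combine=True,
        location=["London", "Paris"],
        who=["family", "friend", "cat"],
    )
)

print(len(paired), len(combined))
```

Note that `zip` silently stops at the shortest list, so in this sketch pairing lists of different lengths would drop the extra values; the Cartesian product avoids that at the cost of generating `len(location) * len(who)` sentences.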


## Fill mask generation

You can use any fill-mask model available on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=fill-mask&sort=downloads) to generate samples by predicting masked words.

In [6]:
fill_mask_generator = Generator(
    fill_mask_model_name="bert-base-cased"
)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
masked_sample0 = "I am flying to [MASK] next week."
masked_sample1 = "I am going with my [MASK]."

for i, predictions in enumerate(fill_mask_generator.fill_mask([masked_sample0, masked_sample1], top_k=5)):
    print(f"Samples {i}:")
    for pred in predictions:
        print(f"- {pred}")

Samples 0:
- I am flying to London next week.
- I am flying to Paris next week.
- I am flying to Italy next week.
- I am flying to California next week.
- I am flying to England next week.
Samples 1:
- I am going with my life.
- I am going with my heart.
- I am going with my plan.
- I am going with my head.
- I am going with my instincts.
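
Conceptually, each prediction above is the masked sentence with one candidate word substituted for `[MASK]`. The sketch below shows only that substitution step, using the standard library; `fill_candidates` is a hypothetical helper, and the candidate list is copied from the output above rather than produced by a model.

```python
from typing import List

MASK_TOKEN = "[MASK]"  # mask token used by BERT-style models


def fill_candidates(masked_sample: str, candidates: List[str]) -> List[str]:
    """Return one sentence per candidate, replacing the first mask token.

    Hypothetical helper; a real fill-mask model would supply the candidates.
    """
    if MASK_TOKEN not in masked_sample:
        raise ValueError(f"sample has no {MASK_TOKEN} token")
    return [masked_sample.replace(MASK_TOKEN, word, 1) for word in candidates]


# Candidates copied from the predictions printed above, in the same order.
candidates = ["London", "Paris", "Italy", "California", "England"]
sentences = fill_candidates("I am flying to [MASK] next week.", candidates)

for sentence in sentences:
    print(f"- {sentence}")
```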


## Translation generation

You can use any translation model available on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=translation&sort=downloads) to generate samples by translating existing sentences.

In [8]:
translation_generator = Generator(
    translator_model_name="Helsinki-NLP/opus-mt-en-fr"
)



In [9]:
template1 = "My name is John and I live in London."
template2 = "I am going to spend my holidays in New York."

for translation in translation_generator.translate([template1, template2]):
    print(f"Translated sentence: {translation}")

Translated sentence: Je m'appelle John et je vis à Londres.
Translated sentence: Je vais passer mes vacances à New York.


Note: all generation methods accept either a single `str` or a `List[str]`.
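
An API that accepts both a string and a list of strings typically normalizes its input first. The sketch below shows one common way to do that; `as_list` is a hypothetical helper, and `nhelper` may handle this differently internally.

```python
from typing import List, Union


def as_list(samples: Union[str, List[str]]) -> List[str]:
    """Wrap a single string in a list; copy any other sequence of strings.

    Hypothetical helper; nhelper may handle this differently internally.
    """
    if isinstance(samples, str):
        return [samples]
    return list(samples)


single = as_list("I am flying to [MASK] next week.")
several = as_list(["I am flying to [MASK] next week.", "I am going with my [MASK]."])

print(len(single), len(several))
```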

## Chaining generation methods

To generate even more synthetic samples, you can chain generation methods. Here is an example of how to combine mask filling and translation.

In [10]:
generator = Generator(
    translator_model_name="Helsinki-NLP/opus-mt-en-fr",
    fill_mask_model_name="bert-base-cased"
)



In [11]:
masked_sample = "I am flying to [MASK] next week."
filled_samples = generator.fill_mask(masked_sample, top_k=5)[0]
translated_filled_samples = generator.translate(filled_samples)

for i, sample in enumerate(translated_filled_samples):
    print(f"Sequence {i}: {sample}")

Sequence 0: Je m'envole pour Londres la semaine prochaine.
Sequence 1: Je m'envole pour Paris la semaine prochaine.
Sequence 2: Je m'envole pour l'Italie la semaine prochaine.
Sequence 3: Je vais en Californie la semaine prochaine.
Sequence 4: Je m'envole pour l'Angleterre la semaine prochaine.
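
The example above fills a single masked sample and takes element `[0]` of the result. When several masked samples are filled at once, the result is one list of predictions per input (as in the fill-mask section), so it would need to be flattened before translation. A minimal sketch of that flattening step, using the standard library; the nested list is copied from earlier output and stands in for a real `fill_mask` call.

```python
from itertools import chain
from typing import List

# Nested result in the shape printed in the fill-mask section:
# one inner list of predictions per masked input (two predictions kept each).
filled_per_sample: List[List[str]] = [
    ["I am flying to London next week.", "I am flying to Paris next week."],
    ["I am going with my life.", "I am going with my heart."],
]

# Flatten to a single list of sentences, preserving order.
flat_samples = list(chain.from_iterable(filled_per_sample))

for sample in flat_samples:
    print(sample)
```

The flat list could then be passed to `generator.translate(flat_samples)`, assuming `translate` accepts a list of strings as noted earlier for all generation methods.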
