In [2]:
# TEXT GENERATION USING GPT-2 WITH GREEDY DECODING

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define the maximum length of the generated text
max_length = 128

# Define the prompt; each trailing backslash joins the next line, so the
# only line breaks are the ones after the final sentence
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""

# Tokenize the input text; keep the attention mask alongside the input IDs
inputs = tokenizer(input_txt, return_tensors="pt")

# Generate text with greedy decoding (no sampling, single beam).
# Passing the attention mask and an explicit pad token avoids the
# "attention mask and the pad token id were not set" warnings.
output_greedy = model.generate(
    **inputs,
    max_length=max_length,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(output_greedy[0]))


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


"The unicorns were very intelligent, and they were very intelligent," said Dr. David S. Siegel, a professor of anthropology at the University of California, Berkeley. "They were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very

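The repeated "they were very intelligent" clause above is a common failure mode of greedy decoding: always taking the single most likely next token can lock generation into a cycle. The following is a minimal sketch of that effect, assuming a hypothetical toy table of next-token probabilities (`TOY_MODEL` and its numbers are made up for illustration and are not GPT-2's).

```python
# Hypothetical next-token probabilities, conditioned on the last token only.
# These values are invented for illustration; they do not come from GPT-2.
TOY_MODEL = {
    "they": {"were": 0.9, "said": 0.1},
    "were": {"very": 0.6, "able": 0.4},
    "very": {"intelligent": 0.7, "small": 0.3},
    "intelligent": {",": 0.8, ".": 0.2},
    ",": {"and": 0.6, "said": 0.4},
    "and": {"they": 0.7, "the": 0.3},
}


def greedy_decode(start, steps):
    """Append the most probable next token at every step."""
    tokens = [start]
    for _ in range(steps):
        dist = TOY_MODEL.get(tokens[-1])
        if dist is None:  # no known continuation: stop early
            break
        tokens.append(max(dist, key=dist.get))
    return tokens


print(" ".join(greedy_decode("they", 12)))
```

Because the most likely path through the table leads back to "they", the sketch loops through the same six tokens indefinitely, which is roughly what the GPT-2 output above does at a larger scale.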

In [3]:
# TEXT GENERATION USING GPT-2 WITH BEAM SEARCH

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define the maximum length of the generated text
max_length = 128

# Define the prompt; each trailing backslash joins the next line, so the
# only line breaks are the ones after the final sentence
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""

# Tokenize the input text; keep the attention mask alongside the input IDs
inputs = tokenizer(input_txt, return_tensors="pt")

# Generate text using beam search
# `num_beams=5` keeps the five highest-scoring partial sequences at each step
output_beam = model.generate(
    **inputs,
    max_length=max_length,
    num_beams=5,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(output_beam[0], skip_special_tokens=True))


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, San Diego, and the University of California, Santa Cruz, found that the unicorns were able to communicate with each other in a way that was similar to that of human speech.


"The unicorns were able to communicate with each other in a way that was similar to that of human speech," said study co-lead author Dr. David J.

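To make the difference from greedy decoding concrete, here is a minimal beam search sketch over a hypothetical toy model (`TOY_MODEL` and its probabilities are invented for illustration). It only shows the core idea of keeping the `num_beams` best partial sequences by cumulative log-probability; the library's implementation adds details such as length penalties and end-of-sequence handling that are left out here.

```python
import math

# Hypothetical next-token probabilities, conditioned on the last token only.
TOY_MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "an": 0.1},
    "the": {"cat": 0.4, "dog": 0.3, "end": 0.3},
    "a": {"unicorn": 0.9, "end": 0.1},
    "an": {"end": 1.0},
}


def beam_search(start, steps, num_beams):
    """Return the best (tokens, log_probability) pair found with num_beams beams."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            dist = TOY_MODEL.get(tokens[-1], {})
            if not dist:  # nothing to expand: carry the beam over unchanged
                candidates.append((tokens, score))
                continue
            for token, prob in dist.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the num_beams highest-scoring candidates
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]


greedy_tokens, greedy_score = beam_search("<s>", steps=2, num_beams=1)
beam_tokens, beam_score = beam_search("<s>", steps=2, num_beams=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

With a single beam the search is greedy and commits to "the" because it is the best first token. With two beams, "a" stays alive long enough for its strong continuation to give a higher overall probability.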

In [4]:
# TEXT GENERATION USING GPT-2 WITH BEAM SEARCH AND NO REPEAT N-GRAMS

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define the maximum length of the generated text
max_length = 128

# Define the prompt; each trailing backslash joins the next line, so the
# only line breaks are the ones after the final sentence
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""

# Tokenize the input text; keep the attention mask alongside the input IDs
inputs = tokenizer(input_txt, return_tensors="pt")

# Generate text using beam search; `no_repeat_ngram_size=2` prevents any
# two-token sequence from appearing twice in the sequence
output_beam = model.generate(
    **inputs,
    max_length=max_length,
    num_beams=5,
    do_sample=False,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(output_beam[0], skip_special_tokens=True))


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, San Diego, and the National Science Foundation (NSF) in Boulder, Colorado, were able to translate the words of the unicorn into English, which they then translated into Spanish.

"This is the first time that we have translated a language into an English language," said study co-author and NSF professor of linguistics and evolutionary biology Dr.

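The plain beam search output repeats a whole clause ("able to communicate with each other in a way that was similar to that of human speech"), while the run with `no_repeat_ngram_size=2` does not. The idea behind that option can be sketched as follows: before each step, ban every token that would complete an n-gram already present in the sequence. The helper name `banned_next_tokens` is hypothetical and this is an illustration of the idea, not the library's code.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would repeat an existing n-gram if generated next."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        ngram = tokens[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


history = ["the", "unicorns", "were", "very", "intelligent",
           "and", "the", "unicorns"]
print(banned_next_tokens(history, 2))
```

Here the sequence ends in "unicorns", and the bigram "unicorns were" has already occurred, so "were" is banned as the next token. In a real decoder the banned tokens would have their scores set to negative infinity before the next token is chosen.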

In [5]:
# TEXT GENERATION USING GPT-2 WITH TEMPERATURE AND TOP-K SAMPLING

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define the maximum length of the generated text
max_length = 128

# Define the prompt; each trailing backslash joins the next line, so the
# only line breaks are the ones after the final sentence
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""

# Tokenize the input text; keep the attention mask alongside the input IDs
inputs = tokenizer(input_txt, return_tensors="pt")

# Generate text using sampling with temperature and top-k parameters.
# `temperature=2.0` flattens the distribution (more random output);
# `top_k=50` restricts sampling to the 50 most likely tokens at each step.
output_temp = model.generate(
    **inputs,
    max_length=max_length,
    do_sample=True,
    temperature=2.0,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(output_temp[0], skip_special_tokens=True))


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


Although there was no unicorns, the tiny white unicorns — a result of a breeding, by scientists scientists think— were only thought by prehistoric settlers as 'unusual' rather than in our "supernaturally complex cultural" universe, like some living or 'born 'in order to learn faster' or eat better.'"We cannot help the theory that our prehistoric people knew well some sort of language in

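The incoherent text above is what a temperature of 2.0 tends to produce: dividing the logits by a temperature above 1 flattens the distribution, so unlikely tokens are drawn more often. The sketch below illustrates temperature scaling and top-k filtering on a hypothetical four-token vocabulary; the logits and the helper names are assumptions made for illustration.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and mask the rest (ties at the cutoff are kept)."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, k, rng):
    """Draw one token index from the temperature-scaled top-k distribution."""
    probs = softmax(top_k_filter(logits, k), temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [4.0, 2.0, 1.0, 0.0]  # hypothetical scores for four tokens
for temperature in (0.5, 1.0, 2.0):
    print(temperature, [round(p, 3) for p in softmax(logits, temperature)])

rng = random.Random(0)
print(sample_token(logits, temperature=2.0, k=2, rng=rng))
```

Raising the temperature moves probability mass away from the top token, while the top-k filter guarantees that tokens outside the k best can never be drawn, whatever the temperature.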

In [6]:
# TEXT GENERATION USING GPT-2 WITH NUCLEUS SAMPLING (TOP-P)

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define the maximum length of the generated text
max_length = 128

# Define the prompt; each trailing backslash joins the next line, so the
# only line breaks are the ones after the final sentence
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""

# Tokenize the input text; keep the attention mask alongside the input IDs
inputs = tokenizer(input_txt, return_tensors="pt")

# Generate text using nucleus sampling: `top_p=0.90` samples from the
# smallest set of tokens whose cumulative probability reaches 0.90
output_topp = model.generate(
    **inputs,
    max_length=max_length,
    do_sample=True,
    top_p=0.90,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(output_topp[0], skip_special_tokens=True))


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


Dr. Gwen Jenson said, "We believe the unicorns in the Andes are the first species to possess English."

However, Dr. Jenson added that she and her colleagues wanted to find out more about the animals' dialects before they could go in and study them.


"They're pretty special animals that live under rocks and can communicate like we can," Dr.

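Nucleus sampling differs from top-k in that the number of candidate tokens adapts to the shape of the distribution. A minimal sketch of the filtering step, assuming hypothetical probability lists and a made-up helper name `top_p_filter`:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


flat = [0.5, 0.3, 0.15, 0.05]   # hypothetical, fairly spread-out distribution
peaked = [0.95, 0.03, 0.02]     # hypothetical, very confident distribution
print(sorted(top_p_filter(flat, 0.9)))
print(sorted(top_p_filter(peaked, 0.9)))
```

With `top_p=0.9`, the spread-out distribution keeps three candidates while the peaked one keeps only the top token. That adaptivity is the usual argument for nucleus sampling over a fixed `top_k`.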