# Fine-Tuning GPT-2 for Story Generation from Text Prompts

### Data Loading

In [1]:
import torch

if torch.cuda.is_available():
    print("CUDA is available. You can use GPU.")
else:
    print("CUDA is not available. Check your setup.")

CUDA is available. You can use GPU.


In [2]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content"

In [4]:
!kaggle datasets download -d ratthachat/writing-prompts

Downloading writing-prompts.zip to /content
 99% 367M/370M [00:04<00:00, 106MB/s] 
100% 370M/370M [00:04<00:00, 84.0MB/s]


In [5]:
!unzip '/content/writing-prompts.zip'

Archive:  /content/writing-prompts.zip
  inflating: writingPrompts/README   
  inflating: writingPrompts/test.wp_source  
  inflating: writingPrompts/test.wp_target  
  inflating: writingPrompts/train.wp_source  
  inflating: writingPrompts/train.wp_target  
  inflating: writingPrompts/valid.wp_source  
  inflating: writingPrompts/valid.wp_target  


### Setting Up Model

Install Hugging Face Transformers from source

In [6]:
!git clone https://github.com/huggingface/transformers

Cloning into 'transformers'...
remote: Enumerating objects: 171766, done.
remote: Counting objects: 100% (1241/1241), done.
remote: Compressing objects: 100% (726/726), done.
remote: Total 171766 (delta 727), reused 796 (delta 420), pack-reused 170525
Receiving objects: 100% (171766/171766), 171.77 MiB | 9.14 MiB/s, done.
Resolving deltas: 100% (129831/129831), done.


In [7]:
!pip install transformers/

Processing ./transformers
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.36.0.dev0-py3-none-any.whl size=8076097 sha256=78fd5cfd227cb8f6a1da92c879ac75c2567be84dc542122ee7ebe5d33f2d9351
  Stored in directory: /tmp/pip-ephem-wheel-cache-tmua9htg/wheels/7c/35/80/e946b22a081210c6642e607ed65b2a5b9a4d9259695ee2caf5
Successfully built transformers
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.35.2
    Uninstalling transformers-4.35.2:
      Successfully uninstalled transformers-4.35.2
Successfully installed transformers-4.36.0.dev0


In [8]:
!pip install -r transformers/examples/requirements.txt

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'transformers/examples/requirements.txt'

In [9]:
!ls /content/writingPrompts

README		test.wp_target	 train.wp_target  valid.wp_target
test.wp_source	train.wp_source  valid.wp_source


### Data-Preprocessing

Combine each prompt and its (truncated) story into a single line for GPT-2

In [10]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

for dirname, _, filenames in os.walk('/content/writingPrompts'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/content/writingPrompts/valid.wp_target
/content/writingPrompts/test.wp_target
/content/writingPrompts/valid.wp_source
/content/writingPrompts/train.wp_source
/content/writingPrompts/train.wp_target
/content/writingPrompts/README
/content/writingPrompts/test.wp_source


In [11]:
os.makedirs('/content/working', exist_ok=True)  # exist_ok avoids an error when the cell is re-run

In [12]:
DIR = "/content/writingPrompts/"
data = [DIR+"train", DIR+"test", DIR+"valid"]

TARGET_DIR = '/content/working/'
target_data = [TARGET_DIR+"train", TARGET_DIR+"test", TARGET_DIR+"valid"]

In [13]:
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated

NUM_WORDS = 300  # FAIR's original setup used 1000 words; 300 keeps fine-tuning fast

for name_id in tqdm(range(len(data))):
    # Each .wp_source line is a prompt; the matching .wp_target line is its story.
    with open(data[name_id] + ".wp_source", encoding="utf-8") as fp, \
         open(data[name_id] + ".wp_target", encoding="utf-8") as ft:
        prompts = fp.readlines()
        stories = ft.readlines()

    assert len(prompts) == len(stories)

    # One training line per example: "<prompt> <endprompts> <first NUM_WORDS story words>"
    new_stories = [prompts[i].rstrip() + " <endprompts> " + " ".join(stories[i].split()[0:NUM_WORDS])
                   for i in range(len(stories))]

    with open(target_data[name_id] + ".wp_combined", "w", encoding="utf-8") as o:
        for line in new_stories:
            o.write(line.strip() + "\n")
        print('finish writing', target_data[name_id] + ".wp_combined")

finish writing /content/working/train.wp_combined
finish writing /content/working/test.wp_combined
finish writing /content/working/valid.wp_combined
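The combining cell writes one example per line in the form `prompt <endprompts> first NUM_WORDS words of the story`. The sketch below isolates that per-line transformation so the format can be checked on a toy pair; `combine_example` is a hypothetical name, not part of the notebook. Note that `<newline>` markers in the target files are whitespace-separated tokens, so they count toward `NUM_WORDS`.

```python
def combine_example(prompt: str, story: str, num_words: int = 300) -> str:
    """Build one training line: '<prompt> <endprompts> <first num_words story words>'."""
    # Keep only the first `num_words` whitespace-separated tokens of the story,
    # mirroring `" ".join(stories[i].split()[0:NUM_WORDS])` in the cell above.
    truncated_story = " ".join(story.split()[:num_words])
    return prompt.rstrip() + " <endprompts> " + truncated_story


line = combine_example("[ WP ] A dragon opens a bakery .\n", "Smoke rose .\nBread burned .", num_words=3)
print(line)  # [ WP ] A dragon opens a bakery . <endprompts> Smoke rose .
```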


In [14]:
!ls -sh

total 370M
4.0K kaggle.json  4.0K transformers  4.0K writingPrompts
4.0K sample_data  4.0K working	     370M writing-prompts.zip


In [15]:
!head -n 4 '/content/working/train.wp_combined'
!head -n 4 '/content/working/test.wp_combined'

[ WP ] You 've finally managed to discover the secret to immortality . Suddenly , Death appears before you , hands you a business card , and says , `` When you realize living forever sucks , call this number , I 've got a job offer for you . '' <endprompts> So many times have I walked on ruins , the remainings of places that I loved and got used to.. At first I was scared , each time I could feel my city , my current generation collapse , break into the black hole that thrives within it , I could feel humanity , the way I 'm able to feel my body.. After a few hundred years , the pattern became obvious , no longer the war and damage that would devastate me over and over again in the far past was effecting me so dominantly . <newline> It 's funny , but I felt as if after gaining what I desired so long , what I have lived for my entire life , only then , when I achieved immortality I started truly aging . <newline> <newline> 5 world wars have passed , and now they feel like a simple sicke
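The sample lines show that the WritingPrompts text is pre-tokenized: contractions and punctuation are split off by spaces (`You 've`, `sucks ,`), and quotes appear as two backticks and two apostrophes. The same spacing shows up in the generated stories further down (`do n't`, `did n've`). If more readable text is wanted, a rough regex detokenizer can undo the most common patterns. This is a hypothetical helper, not part of the original notebook, and it only handles the ASCII cases listed in its comments.

```python
import re


def detokenize(text: str) -> str:
    """Undo the most common spacing conventions of the WritingPrompts files."""
    # "do n't" -> "don't", "ca n't" -> "can't"
    text = text.replace(" n't", "n't")
    # "You 've" -> "You've", "it 's" -> "it's", "I 'd" -> "I'd", ...
    text = re.sub(r" '(s|m|re|ve|ll|d)\b", r"'\1", text)
    # Opening `` and closing '' quotes -> plain double quotes
    text = text.replace("`` ", '"').replace(" ''", '"')
    # Drop the space before common punctuation marks
    text = re.sub(r" ([.,!?;:])", r"\1", text)
    return text


print(detokenize("I do n't know , `` he said . ''"))  # I don't know, "he said."
```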

### Fine-Tuning the model

In [16]:
import numpy as np
import torch

In [17]:
DATA_PATH = '/content/working/'

In [18]:
!head -n 4 {DATA_PATH}train.wp_combined

[ WP ] You 've finally managed to discover the secret to immortality . Suddenly , Death appears before you , hands you a business card , and says , `` When you realize living forever sucks , call this number , I 've got a job offer for you . '' <endprompts> So many times have I walked on ruins , the remainings of places that I loved and got used to.. At first I was scared , each time I could feel my city , my current generation collapse , break into the black hole that thrives within it , I could feel humanity , the way I 'm able to feel my body.. After a few hundred years , the pattern became obvious , no longer the war and damage that would devastate me over and over again in the far past was effecting me so dominantly . <newline> It 's funny , but I felt as if after gaining what I desired so long , what I have lived for my entire life , only then , when I achieved immortality I started truly aging . <newline> <newline> 5 world wars have passed , and now they feel like a simple sicke

In [19]:
TRAIN_FILE=DATA_PATH+'valid.wp_combined' # Use valid as train to minimize training time first
TEST_FILE=DATA_PATH+'test.wp_combined'
print(TRAIN_FILE)

/content/working/valid.wp_combined


In [20]:
!mkdir output

In [24]:
!pip install torch torchvision transformers



In [25]:
import torch
from transformers import GPT2LMHeadModel, GPT2Config, GPT2Tokenizer
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

# Load GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)


Pre-process the data: strip the `[ WP ]` and `<endprompts>` tags and turn `<newline>` markers into real line breaks.

In [26]:
import re

def preprocess_text(text):
    # Remove [ WP ] and <endprompts> tags
    text = re.sub(r'\[ WP \]', '', text)
    text = re.sub(r'<endprompts>', '', text)

    # Replace <newline> with newline character
    text = re.sub(r'<newline>', '\n', text)

    return text.strip()


In [28]:
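# Load and preprocess the training file
with open("/content/working/valid.wp_combined", "r", encoding="utf-8") as file:
    raw_text = file.read()
preprocessed_text = preprocess_text(raw_text)

# Tokenize the preprocessed text (truncation keeps only the first 512 tokens).
# Note: `tokenized_text` is not used later; the TextDataset cell re-reads the raw file,
# so the tags are still present in the data the model is trained on.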
tokenized_text = tokenizer(preprocessed_text, return_tensors="pt", max_length=512, truncation=True)
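`preprocessed_text` is never used after this cell: `TextDataset` is built from the raw `valid.wp_combined` file, so the `[ WP ]`, `<endprompts>` and `<newline>` tags stay in the training data, which is consistent with the tags that reappear in the generated stories. One way to actually train on the cleaned text is to write it to a new file and point `TextDataset` at that file. A minimal sketch, assuming a hypothetical output name `valid.wp_clean`; the substitutions are the same as in `preprocess_text`. Keep in mind that stripping `<endprompts>` removes the explicit boundary between prompt and story.

```python
import os
import re
import tempfile


def write_preprocessed(src_path: str, dst_path: str) -> str:
    """Apply the same substitutions as `preprocess_text` and save the result to dst_path."""
    with open(src_path, "r", encoding="utf-8") as f:
        text = f.read()
    text = re.sub(r"\[ WP \]", "", text)       # drop the [ WP ] tag
    text = re.sub(r"<endprompts>", "", text)   # drop the prompt/story separator
    text = re.sub(r"<newline>", "\n", text)    # real line breaks instead of <newline>
    text = text.strip()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)
    return dst_path


# Self-check on a temporary file instead of the real /content/working data.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "valid.wp_combined")
    dst = os.path.join(tmp, "valid.wp_clean")
    with open(src, "w", encoding="utf-8") as f:
        f.write("[ WP ] A prompt . <endprompts> Line one . <newline> Line two .\n")
    write_preprocessed(src, dst)
    with open(dst, encoding="utf-8") as f:
        cleaned = f.read()
print(repr(cleaned))
```

In the notebook this would be followed by something like `TextDataset(tokenizer=tokenizer, file_path="/content/working/valid.wp_clean", block_size=128)`.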

In [32]:
# Create TextDataset
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="/content/working/valid.wp_combined",
    block_size=128,  # length in tokens of each training chunk; adjust for your dataset
    overwrite_cache=True,
)
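`TextDataset` does not keep one story per example. As far as I understand its implementation, it tokenizes the whole file as a single sequence and cuts it into consecutive, non-overlapping blocks of `block_size` tokens, dropping the incomplete tail (GPT-2's tokenizer adds no special tokens to each block by default). A pure-Python sketch of that chunking on dummy token ids; `chunk_token_ids` is a hypothetical name.

```python
def chunk_token_ids(token_ids, block_size=128):
    """Split one long token sequence into consecutive full blocks; drop the remainder."""
    blocks = []
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        blocks.append(token_ids[start:start + block_size])
    return blocks


blocks = chunk_token_ids(list(range(10)), block_size=4)
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]; ids 8 and 9 are dropped
```

A consequence is that a block can start mid-story and run from the end of one example into the next prompt, which is one plausible reason some generations continue into a new `[ WP ]` prompt.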

In [33]:
# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # We are not doing masked language modeling in this case
)
# Training arguments
training_args = TrainingArguments(
    output_dir="./gpt2-training",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    save_steps=10_000,
    save_total_limit=2,
)
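With `mlm=False` the collator builds causal-LM batches: the labels are a copy of `input_ids` (label positions equal to the tokenizer's pad token id would be set to -100, but GPT-2 has no pad token by default), and `GPT2LMHeadModel` shifts them internally so that the token at position t+1 is predicted from positions 0..t. A pure-Python sketch of that alignment, with no PyTorch involved; `next_token_pairs` is a hypothetical helper and the ids are arbitrary.

```python
def next_token_pairs(input_ids):
    """Pair each context prefix with the token the model must predict next.

    Labels equal the inputs; the model compares the logits at position t with
    label t + 1, so the first token is never a target and the last position
    has nothing left to predict.
    """
    labels = list(input_ids)        # what the collator produces for mlm=False
    targets = labels[1:]            # the shift applied inside the model's loss
    contexts = [list(input_ids[: t + 1]) for t in range(len(input_ids) - 1)]
    return list(zip(contexts, targets))


for context, target in next_token_pairs([10, 20, 30]):
    print(context, "->", target)
# [10] -> 20
# [10, 20] -> 30
```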

In [34]:
# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

# Train the model
trainer.train()

Step,Training Loss
500,3.4011
1000,3.3489
1500,3.288
2000,3.2897
2500,3.2902
3000,3.2403
3500,3.2724
4000,3.2864
4500,3.2596
5000,3.2364


TrainOutput(global_step=23022, training_loss=3.1813347148456166, metrics={'train_runtime': 2722.6787, 'train_samples_per_second': 16.911, 'train_steps_per_second': 8.456, 'total_flos': 3007732580352000.0, 'train_loss': 3.1813347148456166, 'epoch': 1.0})
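The reported numbers are internally consistent. `Trainer` runs ceil(examples / (per-device batch size × devices)) batches per epoch, divided by the gradient-accumulation steps, so with a batch size of 2 on one GPU, 23,022 steps correspond to about 46,043 blocks of 128 tokens (roughly 5.9 million tokens), which matches `train_samples_per_second × train_runtime`. A small sketch of that arithmetic using the values printed above; `steps_per_epoch` is a hypothetical name, and one device with no gradient accumulation is assumed.

```python
import math


def steps_per_epoch(num_examples, per_device_batch_size, num_devices=1, grad_accum_steps=1):
    """Optimizer steps in one epoch when the last partial batch is kept."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    return batches // grad_accum_steps


# Values copied from the TrainOutput above.
runtime_s = 2722.6787
samples_per_s = 16.911
estimated_examples = round(runtime_s * samples_per_s)  # number of 128-token blocks
print(estimated_examples, steps_per_epoch(estimated_examples, per_device_batch_size=2))  # 46043 23022
```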

In [36]:
# Save the fine-tuned model
output_dir = "/content/output"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

('/content/output/tokenizer_config.json',
 '/content/output/special_tokens_map.json',
 '/content/output/vocab.json',
 '/content/output/merges.txt',
 '/content/output/added_tokens.json')

In [37]:
!ls /content/gpt2-training/checkpoint-20000

config.json             model.safetensors  rng_state.pth  trainer_state.json
generation_config.json  optimizer.pt       scheduler.pt   training_args.bin
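With `save_steps=10_000` and `save_total_limit=2`, a 23,022-step run should leave `checkpoint-10000` and `checkpoint-20000` in `./gpt2-training`. To continue training from the newest one, `trainer.train(resume_from_checkpoint=...)` accepts a checkpoint path. Picking the newest folder needs a numeric sort, because a plain string sort would rank `checkpoint-9000` above `checkpoint-20000`. A stdlib-only sketch; `latest_checkpoint` is a hypothetical helper (`transformers.trainer_utils.get_last_checkpoint` offers similar functionality).

```python
import os
import re
import tempfile


def latest_checkpoint(output_dir):
    """Return the path of the checkpoint-<step> folder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    _, newest_name = max(candidates)  # compare by integer step, not by string
    return os.path.join(output_dir, newest_name)


with tempfile.TemporaryDirectory() as tmp:
    for step in (9000, 10000, 20000):
        os.mkdir(os.path.join(tmp, f"checkpoint-{step}"))
    newest = os.path.basename(latest_checkpoint(tmp))
print(newest)  # checkpoint-20000
```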


### Evaluating the model

In [48]:
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load your fine-tuned model and tokenizer
model_name = "/content/output"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Load your test dataset
test_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="/content/working/test.wp_combined",
    block_size=128  # Adjust block_size based on your dataset
)

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # We are not doing masked language modeling in this case
)

# Training arguments for evaluation
evaluation_args = TrainingArguments(
    output_dir="./gpt2-evaluation",
    overwrite_output_dir=True,
    per_device_eval_batch_size=2,
)

# Trainer for evaluation
trainer = Trainer(
    model=model,
    args=evaluation_args,
    data_collator=data_collator,
    compute_metrics=None,  # no extra metrics; evaluate() still reports eval_loss
)

# Evaluate on the test set; eval_loss is the mean cross-entropy used for perplexity
results = trainer.evaluate(test_dataset)




In [47]:
from math import exp
# Calculate perplexity manually
cross_entropy = results["eval_loss"]
perplexity = exp(cross_entropy)
print(f"Perplexity on the test set: {perplexity}")

Perplexity on the test set: 22.80039561498663
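Perplexity is the exponential of the mean per-token negative log-likelihood. For a causal language model, `eval_loss` is (up to batching details) exactly that mean, since every test block has the same length of 128 tokens. Working backwards from the printed value gives the test cross-entropy:

```latex
\mathrm{PPL}
  = \exp\!\Big(-\frac{1}{N}\sum_{t=1}^{N}\log p_\theta(x_t \mid x_{<t})\Big)
  = \exp(\text{eval\_loss}),
\qquad
\text{eval\_loss} = \ln(22.80) \approx 3.127 \ \text{nats per token}.
```

This is slightly below the `training_loss` of 3.18 in the TrainOutput, which is an average over the whole epoch, including the early high-loss steps.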


### Story Generation

Story 1

In [40]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned GPT-2 model
model_name = "/content/output"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Set the model to evaluation mode
model.eval()

# Generate text based on a prompt
prompt = "Once upon a time, in a land far, far away"
# Tokenize the prompt and build an all-ones attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text.
# Note: attention_mask is not passed to generate(), hence the warning below.
# Without do_sample=True, top_k, top_p and temperature are ignored and decoding is greedy.
output = model.generate(input_ids, max_length=100, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time, in a land far, far away, there was a man who had been born to a family of gods. He was born with a gift, a power, and a purpose. <newline> <endprompts> `` I'm sorry, '' I said, `` but I don't know how to explain it. '' <promp> I was naught but a child. I had no idea what I would be born into. My parents had
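Two details of these `generate` calls are worth knowing. The `attention_mask` is built but never passed, which is what the warning refers to; passing `attention_mask=attention_mask` and `pad_token_id=tokenizer.eos_token_id` should silence it. More importantly, `top_k`, `top_p` and `temperature` only take effect with `do_sample=True`; without it `generate` uses greedy decoding (here constrained by `no_repeat_ngram_size=2`), which is a plausible reason several stories open with the same "I'm sorry" reply. The sketch below shows, on a made-up toy distribution, what those three parameters would do once sampling is enabled: temperature scaling, then top-k, then top-p (nucleus) filtering, which I believe is the order Hugging Face applies them in. It is pure Python and illustrative only; the tokens and logits are invented.

```python
import math
import random


def sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return the renormalised distribution that sampling would draw from."""
    # 1. Temperature: divide the logits, then softmax (shifted by the max for stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = sorted(((tok, w / total) for tok, w in weights.items()),
                   key=lambda item: item[1], reverse=True)
    # 2. Top-k: keep the k most likely tokens (k <= 0 disables it) and renormalise.
    if top_k > 0:
        probs = probs[:top_k]
        total = sum(p for _, p in probs)
        probs = [(tok, p / total) for tok, p in probs]
    # 3. Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


toy_logits = {"the": 2.0, "a": 1.0, "one": 0.0, "zebra": -5.0}
greedy_choice = max(toy_logits, key=toy_logits.get)  # what do_sample=False picks
dist = sampling_distribution(toy_logits, top_k=2)
sampled = random.choices(list(dist), weights=list(dist.values()))[0]
print(greedy_choice, dist, sampled)
```

Greedy decoding always returns the same token for the same context, whereas sampling draws from `dist`, so repeated runs with `do_sample=True` would produce different stories.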


Story 2

In [43]:
# Generate text based on a prompt
prompt = "[ WP ] Aliens have arrived , and ask for a single human to plead humanity case and save them from extinction <endprompts>"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[ WP ] Aliens have arrived, and ask for a single human to plead humanity case and save them from extinction <endprompts> `` I'm sorry, but I don't want to hear you say anything. '' <newline> < newline > `` You're not going to say that. I just want you to know that I am not here to kill you. You are here for the sake of humanity. And I will not kill anyone else. If you do, I can not do it. But I want your help. Please, do not hurt me. Do not harm me, please do. We are not alone. The only thing that can save us is you, the only one who can help us. < Newline < <promp> I was born in the year 2064. My parents were killed in a car crash. They were the first to die. It was my fault. That was the last time I saw them. When I grew up, my parents died. Their bodies were burned to the ground. Then I heard the screams of the people who had been killed. People screaming. `` Please do what you will. This is not the time to hurt anyone. There is no one left alive. No one to save. Not one. Just one m
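Story 2 is the only prompt written in the training format (`[ WP ] ... <endprompts>`). The other prompts are plain sentences, so the model first has to finish an imagined prompt and emit the `<endprompts>` separator itself, which it does in most of the other stories before the story proper begins. A hypothetical helper that wraps free text in the same format as the `.wp_combined` training lines (it does not reproduce the dataset's spaces around punctuation, an assumption that keeps the sketch short):

```python
def to_training_format(prompt: str) -> str:
    """Wrap a free-text prompt like the lines of the .wp_combined files."""
    text = prompt.strip()
    if not text.startswith("[ WP ]"):
        text = "[ WP ] " + text
    if not text.endswith("<endprompts>"):
        text = text + " <endprompts>"
    return text


print(to_training_format("A cat was sleeping in her cozy blanket"))
# [ WP ] A cat was sleeping in her cozy blanket <endprompts>
```

The wrapped string can then be passed to `tokenizer.encode(..., return_tensors="pt")` exactly as in the cells above.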

Story 3

In [50]:
# Load the fine-tuned GPT-2 model
model_name = "/content/output"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Set the model to evaluation mode
model.eval()
# Generate text based on a prompt
prompt = " A little girl lived in a small village near river"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 A little girl lived in a small village near riverbank. She was a little too young to be a princess, but she was beautiful. <newline> <endprompts> `` I'm sorry, '' I said, `` but I don't want to go to school. '' <promp> I was nodding to the girl, who was sitting on the edge of the river. I could nudge her to sit, and she would nuzzle her nose. `` You're not going to get a job, are you? '' she asked. The girl looked at me with a confused expression. Her eyes were wide, her lips were dry, she looked like she had been crying. My heart was pounding. It was so hard to believe that I had to do this. But I did n'T want her. This was my fault. And I knew it. So I went to her, to my room, where I found her and told her I loved her so much. We sat there, together, in silence. Then I saw her face. A smile spread across her cheeks. There was no emotion. No sadness. Just a smile. That smile, that smile that was hers. Tears. They were so beautiful, so sweet. When I looked up, I felt her eyes, they w

Story 4

In [51]:
# Generate text based on a prompt
prompt = " An astronaut was travelling in a space ship to the moon"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 An astronaut was travelling in a space ship to the moon. <endprompts> `` I'm sorry, sir. I don't know what you're talking about. '' <newline> < newline > < < Newline : < ] <> I was sitting in the back of the ship, staring at the stars. The stars were so beautiful, so bright, and so full of life. It was a beautiful day, but I could n ’ t see the sun. My eyes were closed, my mind was frozen. There was no light, no sound, nothing. No light. Nothing. And then I saw the light of a thousand stars, the lights of thousands of stars that were all shining in different colors. They were shining, shining. But I did n've seen them before. Not in my lifetime, not in this life, nor in any of my dreams. This was the first time I had seen a light in that lightless, empty space. A light that was so empty, that it was blinding. That light was like a blinding light on a night. So bright that I felt like I would nuke the world. Or maybe I “ would ”. Maybe I thought I might be the only one who could see it

Story 5

In [52]:
# Generate text based on a prompt
prompt = "A cat was sleeping in her cozy blanket"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A cat was sleeping in her cozy blanket. She wasn't sure if she was dreaming or not. <newline> <endprompts> I was sitting in the corner of the living room, staring at the ceiling. I had nagged at my phone, trying to figure out what was going on. It was a strange thing to do, but I knew it was happening. The cat had been sleeping for a while now, and I could nudge her awake. But I did nag at her. My phone was ringing. A voice. `` Hello? '' < newline > < oldline < Newline : `` I'm sorry, I just wanted to say hello. '' I said. Her eyes widened. What was she saying? <oldline ] < Oldline
[ WP ] You are a teenager with the ability to measure how `` Dangerous '' people are on a scale from 1 to 10 just by looking at them. Bands you play with have a 1, a 7, 8, 9, 10, or even a 10. Make up your mind about the person you're with that will give you a 100. Then you notice the unassuming new kid at school measures a ten. 1 is a normal child, theres a 9's a genius, an 8'' s a freak, even an 11'is an E
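Stories 5, 6 and 8 run past the end of a story into a fresh `[ WP ]` prompt, and raw `<newline>` markers (sometimes mangled as `< newline >` or `< Newline`) leak into every output. A light post-processing sketch: keep only the text after the first `<endprompts>` if there is one, cut at the next `[ WP ]`, and turn newline markers into line breaks. The helper name and its regexes are illustrative assumptions; they will not catch every variant (for example `<oldline`), and dropping everything before `<endprompts>` also discards the model's continuation of the prompt.

```python
import re


def clean_generation(text: str) -> str:
    """Strip the echoed prompt, stop at the next [ WP ] prompt, and restore line breaks."""
    # Keep the story part that follows the first separator, if the model produced one.
    if "<endprompts>" in text:
        text = text.split("<endprompts>", 1)[1]
    # A new "[ WP ]" marks the start of an unrelated training example.
    text = text.split("[ WP ]", 1)[0]
    # "<newline>", "< newline >", "< Newline" ... -> real line breaks.
    text = re.sub(r"<\s*newline\s*>?", "\n", text, flags=re.IGNORECASE)
    # Remove the spaces left around the inserted line breaks.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    return text.strip()


sample = ("A cat slept. <newline> <endprompts> I sat down. <newline> < newline > "
          "Hello. [ WP ] You are a teenager")
print(clean_generation(sample))
```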

Story 6

In [53]:
# Generate text based on a prompt
prompt = "John met Fiona in a park while playing"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


John met Fiona in a park while playing with her dog. She was a little bit nervous, but she wasn't afraid of anything. <newline> <endprompts> `` I'm sorry, '' she said, `` but I do n've got to go. '' <promp> I looked at her, and she smiled. `` You're not going to be able to see me. I ca n ’ t see you. You ‘ ll be fine. But I ” m not sure if I can see her. Maybe I should just go to the park. It “ s not like I have to worry about you, ‪ she thought.   I know you ‴ re not. And I don ‛ t want to hurt you either. So I just want you to know that I love you. ‬ <Prompt 1>
[ WP ] You are a teenager with the ability to measure how `` Dangerous '' people are on a scale from 1 to 10 just by looking at them. A normal child would be a 1, while a trained man with an assault rifle might be an a 7. Today, you notice the unassuming new kid at school measures a 10. ( read more ) < endprompromps> The man in the white lab coat was wearing a labcoat measures an A, the trained woman in black lab coats measure

Story 7

In [54]:
# Generate text based on a prompt
prompt = "It was the first day of school"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


It was the first day of school, and I was a little nervous. I hadn't been in class for a week, but I knew I would be in the class. <newline> <endprompts> I'm not sure if I should have been more nervous, or if it was just me. Maybe I just wanted to be alone. Or maybe I wanted a break from the world. Whatever it is, I did n've to do it. It wasn't like I could just go to the bathroom and go back to sleep. The only thing I remember was waking up in a room with a door open. There was no door. No door to go in. Just a window. And I thought I saw a man. He was wearing a white shirt and a blue tie. A white tie with red stripes. His hair was long and he wore a red tie that was too short. But I never saw him. Not in school. In the classroom. At least, not in my class, anyway. My teacher was always there. She was there, too. Her hair had been long, her eyes were wide and her face was covered in red. That was odd. So I guess I decided to try and look around. What was I looking for? I looked for so

Story 8

In [55]:
# Generate text based on a prompt
prompt = "It was a fine sunny day, perfect for a picnic"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


It was a fine sunny day, perfect for a picnic. I wasn't sure if I would be able to get out of bed, but I did n've to. <newline> < newline > I woke up to a loud bang. My eyes were closed, and I could n ’ t see anything. The sound was so loud, I almost felt like I had been punched in the face. It was like a car crash. A loud crash, like the sound of a gun. But I couldn ‘ t hear anything, so I just stood there, staring at the sky. There was no light, no sound, just a dull, dull dull light. And then I heard a knock on the door. “ Hey, hey, ” I said, my voice was trembling.
[ WP ] You are a teenager with the ability to measure how `` Dangerous '' people are on a scale from 1 to 10 just by looking at them. Bands you play with have a 1, a 7, 8, 9, 10, or even a 10. Make up your mind about the person you're with that will give you a perfect score. 1 is a normal child, while a trained man with an assault rifle might be a man who just shot and killed an innocent bystander. 10 is an advanced trai

Story 9

In [56]:
# Generate text based on a prompt
prompt = "I have decided to go to Italy for holidays"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I have decided to go to Italy for holidays. I have been told that I will be able to travel to the US for a few days. <newline> <endprompts> I've been here for over a year now. It's been a long time since I was able, but I still have a lot of work to do. My wife and I are going to be moving to a new house in a small town in the middle of nowhere. We're going out to dinner, and we 'll be back soon. The only thing I can do is wait until I get home. That 'd be a good thing. But I can't wait to get out of here. There's no way I could be here when I do n'm going. So I just go. And I go, I walk, my legs are shaking, the door is shaking. Then I see the light. A light, a light that 'S it''. Suddenly, there 'D be light in my room. Something is moving. Like a flash. Or something. Maybe it is a flashlight. No, it was n'light. Just a bright light! I look around, looking for something, something that is nigh impossible. Nothing. What is it? I am naught but a shadow. Not a thing, not a person. This i

Story 10

In [57]:
# Generate text based on a prompt
prompt = "All children in the class are excited for the trip"
# Tokenize the prompt and obtain the attention mask
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones(input_ids.shape, device=input_ids.device)

# Generate text
output = model.generate(input_ids, max_length=300, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


All children in the class are excited for the trip. <endprompts> `` I'm going to be the first to go! '' <newline> < newline > `` Oh, I know! I just want to get out of here!! Oh my god! What the hell!? '' I yelled. I wasn't sure what to say. `` What's going on?! How did you get here?? I mean, you're not going anywhere! You've got to come with me! And I want you to stay with us! We 'll be fine! But I ca n ’ t go anywhere without you! It 'd be so much fun! So much! < Newline ] < < NEWline [ ] I looked around the room. The room was empty. There was no one else in there. No one in sight. It was just me and my friend. We were all alone. And we were alone, and I could n´t see anyone else. But we all looked at each other. Everyone was staring at me. They all were staring back at us. All of them. Every single one of us was looking at the same thing. Looking at my face. My face was blank. Everything was blurry. Nothing. Just me, my friends, all of the other kids. None of me was there, but I knew