In [2]:
import json
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
from tqdm import tqdm

# Load GPT-2
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def clean_text(text):
    # Collapse line breaks to spaces and swap double quotes for single quotes
    return text.replace('\n', ' ').replace('\r', ' ').replace('"', "'").strip()

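def generate_output(prompt):
    # Calling the tokenizer directly also returns an attention mask, so generate() does not have to guess it
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=30,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
        )
    # Decode only the newly generated tokens instead of slicing the decoded string by len(prompt)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()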

# Load jokes from JSON
with open("reddit_jokes.json", "r", encoding="utf-8") as f:
    jokes_data = json.load(f)

# Output list
joke_pairs = []

# Process each joke
for joke in tqdm(jokes_data[26:46]):
    # `or ""` also guards against fields that are present but null
    title = (joke.get("title") or "").strip()
    body = (joke.get("body") or "").strip()
    full_joke = clean_text(f"{title} {body}")
    if not full_joke or len(full_joke) > 140:
        continue  # skip empty jokes and jokes longer than 140 characters

    prompt = (
        "Convert the following joke into a neutral, non-humorous statement. Remove any humor, exaggeration, or playful language. "
        "The output should be a serious version of the statement.\n\n"
        "Example 1:\n"
        "Joke: Why don't skeletons fight each other? They don't have the guts.\n"
        "Non-humorous statement: Skeletons lack internal organs, so they cannot fight.\n\n"
        "Example 2:\n"
        "Joke: Why was the math book sad? It had too many problems.\n"
        "Non-humorous statement: The math book contains a lot of problems.\n\n"
        "Now, please convert the following joke into a non-humorous statement:\n"
        f"Joke: {full_joke}\nNon-humorous statement:"
    )
    
    try:
        unfunny = generate_output(prompt)
    except Exception as e:
        print(f"Error with joke: {full_joke[:60]}... | {str(e)}")
        continue

    joke_pairs.append({
        "joke": full_joke,
        "unfunny": unfunny
    })
    print(f"full_joke = {full_joke}")
    print(f"nonjoke = {unfunny}")


# Save to new JSON
with open("joke_unfunny_pairs.json", "w", encoding="utf-8") as f_out:
    json.dump(joke_pairs, f_out, indent=2, ensure_ascii=False)

print(f"Saved {len(joke_pairs)} joke-unfunny pairs to joke_unfunny_pairs.json")
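GPT-2 is a base language model, not an instruction-tuned one, so after producing an answer it tends to keep continuing the few-shot pattern; several of the outputs below run on into "Example 3:" or a new "Joke:" line. One minimal post-processing sketch, assuming the answer is whatever precedes the first line break or the first prompt-scaffolding marker, is shown here. `truncate_completion` and its marker list are hypothetical names chosen for illustration, not part of `transformers`.

```python
def truncate_completion(
    text: str,
    stop_markers: tuple[str, ...] = ("Example", "Joke:", "Non-humorous statement:", "Now, please"),
) -> str:
    """Keep only the first answer line of a raw few-shot completion (hypothetical helper)."""
    # Assumption: the answer ends at the first line break
    first_line = text.strip().split("\n", 1)[0]
    # Also cut at any prompt-scaffolding marker that shows up on that same line
    cut = len(first_line)
    for marker in stop_markers:
        idx = first_line.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return first_line[:cut].strip()


# Raw completions copied from the run below
raw_outputs = [
    "The math book is not a good book.\n\nExample 3:\nJoke: Why do zombies have eyes? Because they're ugly and dumb",
    "She can do that.\n\nJoke: She wants to eat the new book, but her husband doesn't want her to.\n\nNon",
]
for raw in raw_outputs:
    print(repr(truncate_completion(raw)))
```

In the loop above this would be called as `truncate_completion(generate_output(prompt))`. Cutting at the first line break is a blunt heuristic: it discards any answer that legitimately spans more than one line.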


  0%|                                                                                           | 0/20 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


full_joke = What are minorities? Lesser people.
nonjoke = The math book is not a good book.

Example 3:
Joke: Why do zombies have eyes? Because they're ugly and dumb




full_joke = Did you hear that Donald Trump is technically a plant? Because all of his cells have built a wall.
nonjoke = A plant is a plant and cannot fight.

Example 3:

Joke: Why can't he make his own food? He can




full_joke = i had trouble swallowing a viagra last night my neck was stiff for 4 hours
nonjoke = Viagra does not kill you. If you have a neck that is in pain, it will cause you to die in the hospital. And if you




full_joke = What is the king of all school supplies? The Ruler
nonjoke = The teacher is actually a teacher.

Example 3:

Joke: Why is the "dungeon" in my home on the




full_joke = Why did the producers of 007 films use government debt to fund their newest film? Because interest in the Bond is so low.
nonjoke = The producers of 007 decided to use government debt to fund their latest film because interest in the Bond is so low.

Examples:




full_joke = Pocket empty day ! Happy pocket empty day.
nonjoke = If it was a year ago and you had never heard of the words "a year ago" you would have known it was bad.

Example




full_joke = I want to see that new movie coming out with Scarlett Johannson… …but she probably isn't available.
nonjoke = I don't want to see that movie.

Example 3:

Joke: Why don't you put a lot of money into this




full_joke = Why is it so hard to break up with a Japanese Girl? You have to drop the Bomb twice before she gets the Message.
nonjoke = The next time she's on the train, just come up with a funny way to tell her to "shut up".

Example 3:




full_joke = My wife wants to eat somewhere shes never eaten before for V-Day I told her she should try the kitchen
nonjoke = She can do that.

Joke: She wants to eat the new book, but her husband doesn't want her to.

Non




full_joke = There are two types of people The ones who bang on the wall, And the ones who bang on the wall because I'm banging my girlfriend on the wall
nonjoke = There are two types of people The one who bang on the wall, And the one who bang on the wall, Because it's so annoying, because




full_joke = Why did the computer squeak? Someone stepped on its mouse.
nonjoke = People with computers do not sound like humans.

Now, please convert the following joke into a non-humorous statement:

Joke


100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00,  1.72it/s]

full_joke = i found a place where the recycling rate is 98%. Your moms bed.
nonjoke = Your mom likes water! You're not kidding. You can use this one.

Example 3:

Joke: if you're an
Saved 12 joke-unfunny pairs to joke_unfunny_pairs.json
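Not every saved pair is a usable (joke, non-joke) example: the 007 output mostly restates the joke word for word. A minimal filtering sketch, assuming near-copies can be caught by character-level similarity from the standard library's `difflib.SequenceMatcher`, drops pairs whose "unfunny" text is empty or too similar to the joke. The function names and the 0.8 threshold are hypothetical and would need tuning; off-topic outputs (such as the math-book sentence returned for an unrelated joke) are not caught by this check.

```python
from difflib import SequenceMatcher


def is_near_copy(joke: str, unfunny: str, threshold: float = 0.8) -> bool:
    """Return True if `unfunny` is mostly a copy of `joke` (threshold is an assumption)."""
    ratio = SequenceMatcher(None, joke.lower(), unfunny.lower()).ratio()
    return ratio >= threshold


def filter_pairs(pairs: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep only pairs whose output is non-empty and not a near-copy of the joke."""
    kept = []
    for pair in pairs:
        unfunny = pair["unfunny"].strip()
        if not unfunny or is_near_copy(pair["joke"], unfunny, threshold):
            continue
        kept.append(pair)
    return kept


# Two pairs from the run above (first line of each output only)
sample = [
    {
        "joke": "Why did the producers of 007 films use government debt to fund their newest film? Because interest in the Bond is so low.",
        "unfunny": "The producers of 007 decided to use government debt to fund their latest film because interest in the Bond is so low.",
    },
    {
        "joke": "Why did the computer squeak? Someone stepped on its mouse.",
        "unfunny": "People with computers do not sound like humans.",
    },
]

for pair in sample:
    ratio = SequenceMatcher(None, pair["joke"].lower(), pair["unfunny"].lower()).ratio()
    print(f"{ratio:.2f}  {pair['joke'][:40]}")
print(f"kept {len(filter_pairs(sample))} of {len(sample)} pairs")
```

To apply it to the saved file, load `joke_unfunny_pairs.json` with `json.load` and pass the resulting list to `filter_pairs`.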



