Rameesha - MSDS - 24F-8014

**Task 2: Decoder Model (GPT-2) — Recipe Generation**

**Problem:**
Generate cooking recipes given a list of ingredients or a recipe title.

---


**Dataset:**
https://www.kaggle.com/datasets/nazmussakibrupol/3a2mext/data

---


**Objective:**
Fine-tune GPT-2 to generate coherent and creative recipes. Input may be a list of ingredients or a dish name, and the output should be a recipe with steps.

---

**Deliverables:**
* Tokenization and dataset formatting script
* Training loop for GPT-2
* Example generations and quality evaluation (e.g., ROUGE, BLEU, human evaluation)
* Streamlit/Gradio app for interactive recipe generation

In [2]:
import torch
import pandas as pd
import os
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from torch.optim import AdamW
from google.colab import drive

In [3]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**dataset**

In [5]:
path = '/content/drive/MyDrive/ANLP/project_2/task_2/'
files = [f for f in os.listdir(path) if f.endswith('.csv')]
df = pd.read_csv(os.path.join(path, files[0]))

print(f"loaded dataset with {len(df)} recipes")
print(f"columns found: {df.columns.tolist()}")

loaded dataset with 2231143 recipes
columns found: ['title', 'NER', 'Extended_NER', 'genre', 'label', 'directions']


**data cleaning and prep**

In [6]:
# find correct column names automatically
cols = df.columns
title_col = [c for c in cols if 'title' in c.lower()][0]
ingr_col = [c for c in cols if 'ingredient' in c.lower() or 'ner' in c.lower()][0]
dir_col = [c for c in cols if 'direction' in c.lower() or 'step' in c.lower()][0]

# drop rows with missing values
df_clean = df[[title_col, ingr_col, dir_col]].dropna()

# take a smaller sample for faster training; copy so the new column is added to an independent frame
df_sample = df_clean.sample(n=min(3000, len(df_clean)), random_state=42).copy()

# combine title, ingredients, and directions into one text
df_sample['text'] = df_sample.apply(
    lambda x: f"Title: {x[title_col]} | Ingredients: {x[ingr_col]} | Directions: {x[dir_col]}<|endoftext|>",
    axis=1
)

# split into train and validation sets
train_texts, val_texts = train_test_split(df_sample['text'].tolist(), test_size=0.2, random_state=42)

print(f"train samples: {len(train_texts)}, validation samples: {len(val_texts)}")


train samples: 2400, validation samples: 600


**load GPT-2**

In [7]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

model = GPT2LMHeadModel.from_pretrained('gpt2')

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

print(f"model ready on {device}")


model ready on cuda


**dataset and dataloader**

In [8]:
# custom dataset class
class RecipeDataset(torch.utils.data.Dataset):
    def __init__(self, texts, tokenizer, max_length=256):
        self.encodings = tokenizer(texts, truncation=True, max_length=max_length, padding='max_length')

    def __len__(self):
        return len(self.encodings['input_ids'])

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = item['input_ids'].clone()
        return item

# create datasets and loaders
train_dataset = RecipeDataset(train_texts, tokenizer)
val_dataset = RecipeDataset(val_texts, tokenizer)

train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4)

print("dataset dataloader creation done")


dataset dataloader creation done
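
One detail worth checking in `RecipeDataset`: `pad_token` is set to `eos_token` and `labels` is a plain copy of `input_ids`, so the padded positions are scored by the loss too. The attention mask does not exclude them from the loss; only the label value `-100` (the ignore index used by the loss in `transformers`) does. Below is a minimal sketch of masking them, written on plain lists so it runs without the model. The helper name `mask_pad_labels` is hypothetical and not part of the notebook, and the token ids are arbitrary apart from 50256, GPT-2's `<|endoftext|>` id.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss in transformers

def mask_pad_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing padded positions with ignore_index."""
    return [tok if keep == 1 else ignore_index
            for tok, keep in zip(input_ids, attention_mask)]

# toy example: the first 50256 is the real end-of-text token, the last two are padding
ids = [19160, 25, 50256, 50256, 50256]
mask = [1, 1, 1, 0, 0]
print(mask_pad_labels(ids, mask))  # [19160, 25, 50256, -100, -100]
```

Inside `__getitem__` the tensor equivalent would be `item['labels'][item['attention_mask'] == 0] = -100`. Loss values would then no longer be comparable with the ones logged in this notebook, since those were averaged over padded positions as well.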


**model training**

In [9]:
optimizer = AdamW(model.parameters(), lr=5e-5)
epochs = 2

for epoch in range(epochs):
    print(f"\nepoch {epoch+1}/{epochs} started")
    model.train()
    total_train_loss = 0

    # training loop
    for batch in tqdm(train_loader, desc="training"):
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        total_train_loss += loss.item()

    avg_train_loss = total_train_loss / len(train_loader)
    print(f"average training loss: {avg_train_loss:.4f}")

    # validation loop
    model.eval()
    total_val_loss = 0
    with torch.no_grad():
        for batch in tqdm(val_loader, desc="validation"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            total_val_loss += outputs.loss.item()

    avg_val_loss = total_val_loss / len(val_loader)
    print(f"average validation loss: {avg_val_loss:.4f}")

print("\ntraining done")



epoch 1/2 started


training: 100%|██████████| 600/600 [03:03<00:00,  3.26it/s]


average training loss: 1.4412


validation: 100%|██████████| 150/150 [00:12<00:00, 11.68it/s]


average validation loss: 1.2908

epoch 2/2 started


training: 100%|██████████| 600/600 [03:04<00:00,  3.26it/s]


average training loss: 1.2781


validation: 100%|██████████| 150/150 [00:12<00:00, 11.64it/s]

average validation loss: 1.2599

training done
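
The validation losses printed above are mean cross-entropy values, so they can be turned into perplexity by exponentiating. This is a small sketch; the function name `perplexity` is an illustrative choice. The result is only a rough indicator here, because the loss is averaged per batch and, as the dataset is built, also covers padded positions.

```python
import math

def perplexity(avg_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(avg_loss)

# validation losses printed above for epochs 1 and 2
for epoch, loss in enumerate([1.2908, 1.2599], start=1):
    print(f"epoch {epoch}: val loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

A lower value after epoch 2 matches the drop in validation loss; it says nothing on its own about whether the generated recipes are sensible.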





**save fine-tuned model**

In [11]:
model.save_pretrained('/content/drive/MyDrive/ANLP/project_2/task_2/gpt2_recipe_model')
tokenizer.save_pretrained('/content/drive/MyDrive/ANLP/project_2/task_2/gpt2_recipe_model')

print("model and tokenizer saved to /content/drive/MyDrive/ANLP/project_2/task_2/gpt2_recipe_model")


model and tokenizer saved to /content/drive/MyDrive/ANLP/project_2/task_2/gpt2_recipe_model


**test with my inputs**

In [12]:
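def generate(prompt, max_length=200):
    # tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    model.eval()
    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],  # pass the mask explicitly, since pad and eos share one token id
            max_length=max_length,  # total length in tokens, prompt included
            temperature=0.8,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)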

# few test examples
prompts = ["Title: Chocolate Cake |", "Title: Chicken Soup |"]

for p in prompts:
    print(f"\nprompt: {p}")
    print(f"generated recipe: {generate(p)[:200]}...")



prompt: Title: Chocolate Cake |
generated recipe: Title: Chocolate Cake | Ingredients: ["cocoa", "vanilla pudding", "eggs", "sugar", "vanilla pudding", "chocolate chips"] | Directions: ["Preheat oven to 350 degrees F (175 degrees C). Grease and flour...

prompt: Title: Chicken Soup |
generated recipe: Title: Chicken Soup | Ingredients: ["chicken broth", "onion", "garlic", "onions", "curry powder", "salt", "pepper", "thyme", "sour cream", "black pepper", "pepper"] | Directions: ["Combine all ingredi...
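
Both samples repeat items in the ingredient list ("vanilla pudding" appears twice, "pepper" twice). One light post-processing option is to drop repeats while keeping the original order. This is a sketch only: `dedupe_ingredients` is a hypothetical helper, and it assumes the list has already been parsed into Python strings.

```python
def dedupe_ingredients(items):
    """Remove repeated ingredients (case-insensitive), keeping first occurrences in order."""
    seen = set()
    unique = []
    for item in items:
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(item.strip())
    return unique

generated = ["cocoa", "vanilla pudding", "eggs", "sugar", "vanilla pudding", "chocolate chips"]
print(dedupe_ingredients(generated))
# ['cocoa', 'vanilla pudding', 'eggs', 'sugar', 'chocolate chips']
```

Near-duplicates such as "onion" and "onions" still pass through. On the decoding side, `model.generate` also accepts `repetition_penalty` and `no_repeat_ngram_size`, which may reduce repeats at the source; neither was tried in this notebook.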


In [None]:
!pip install -q nltk rouge-score
import nltk
from nltk.translate.bleu_score import corpus_bleu
from rouge_score import rouge_scorer
nltk.download('punkt', quiet=True)

In [16]:
# take a few samples from validation set for quick testing
test_samples = val_texts[:10]
references = []
candidates = []

# generate predictions and store references
for recipe in test_samples:
    title_part = recipe.split('|')[0] + '|'
    generated = generate(title_part, max_length=200)
    references.append([recipe.split()])
    candidates.append(generated.split())

# calculate BLEU score
bleu_score = corpus_bleu(references, candidates)
print(f"bleu score: {bleu_score:.4f}")

# calculate ROUGE scores
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
rouge_scores = {'rouge1': 0, 'rouge2': 0, 'rougeL': 0}

for ref, cand in zip([r[0] for r in references], [' '.join(c) for c in candidates]):
    ref_text = ' '.join(ref)
    scores = scorer.score(ref_text, cand)
    for key in rouge_scores:
        rouge_scores[key] += scores[key].fmeasure

# average scores
for key in rouge_scores:
    rouge_scores[key] /= len(test_samples)

print(f"rouge-1: {rouge_scores['rouge1']:.4f}")
print(f"rouge-2: {rouge_scores['rouge2']:.4f}")
print(f"rouge-L: {rouge_scores['rougeL']:.4f}")

print("\nevaluation done")

bleu score: 0.0781
rouge-1: 0.3119
rouge-2: 0.0878
rouge-L: 0.2201

evaluation done
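
The references above come straight from `val_texts`, so their last whitespace token still carries the literal `<|endoftext|>` marker, while the candidates are decoded with `skip_special_tokens=True`. That mismatch costs a little n-gram overlap on every sample. Below is a sketch of normalizing both sides before scoring; `to_tokens` is a hypothetical helper, and the scores reported above were computed without it.

```python
EOS = "<|endoftext|>"

def to_tokens(text, eos=EOS):
    """Strip the end-of-text marker, then split on whitespace for BLEU/ROUGE."""
    return text.replace(eos, "").split()

reference = "Title: Tea | Ingredients: water | Directions: Boil the water.<|endoftext|>"
print(to_tokens(reference)[-1])  # last token no longer ends with the marker
```

In the evaluation loop this would replace the two `.split()` calls, i.e. `references.append([to_tokens(recipe)])` and `candidates.append(to_tokens(generated))`.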


**save report**

In [17]:
import json
from datetime import datetime

# create a summary dictionary
results_summary = {
    "model_name": "gpt2_recipe_model",
    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "bleu_score": round(bleu_score, 4),
    "rouge1": round(rouge_scores["rouge1"], 4),
    "rouge2": round(rouge_scores["rouge2"], 4),
    "rougeL": round(rouge_scores["rougeL"], 4),
    "total_test_samples": len(test_samples)
}

# save path in google drive
save_path = '/content/drive/MyDrive/ANLP/project_2/task_2/evaluation_summary.json'

# write to json file
with open(save_path, "w") as f:
    json.dump(results_summary, f, indent=4)

print(f"evaluation summary saved to: {save_path}")


evaluation summary saved to: /content/drive/MyDrive/ANLP/project_2/task_2/evaluation_summary.json


**gradio app**

In [25]:
import gradio as gr
import re
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# load fine-tuned model
model_path = '/content/drive/MyDrive/ANLP/project_2/task_2/gpt2_recipe_model'
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path).to(device)
model.eval()

# clean and format recipe text
def clean_recipe_text(text):
    text = text.replace("<|endoftext|>", "").encode("utf-8").decode("unicode_escape")

    # extract sections
    title_match = re.search(r"Title:\s*(.*?)\s*\|", text)
    ingr_match = re.search(r"Ingredients:\s*(.*?)\s*\|", text)
    dir_match = re.search(r"Directions:\s*(.*)", text)

    title = title_match.group(1).strip() if title_match else "Untitled"
    # sampled text can omit a section, so guard each match before calling .group()
    ingredients = ingr_match.group(1).replace("[", "").replace("]", "").replace('"', '').replace("'", '') if ingr_match else ""
    directions = dir_match.group(1).replace("[", "").replace("]", "").replace('"', '').replace("'", '') if dir_match else ""

    # split items neatly
    ingr_list = [i.strip() for i in re.split(r",\s*", ingredients) if i.strip()]
    dir_list = [d.strip() for d in re.split(r"\.\s*", directions) if d.strip()]

    # build formatted output
    formatted = f"**Title:** {title}\n\n**Ingredients:**\n"
    formatted += "\n".join(f"- {i}" for i in ingr_list)
    formatted += "\n\n**Directions:**\n"
    formatted += "\n".join(f"{idx+1}. {step}" for idx, step in enumerate(dir_list))
    return formatted.strip()


# generate recipe text
def generate_recipe(prompt):
    if not prompt.strip():
        return "please enter a dish name or ingredients."

    input_text = f"Title: {prompt} |"
    inputs = tokenizer(input_text, return_tensors='pt').to(device)

    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            max_length=300,
            temperature=0.8,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    raw = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return clean_recipe_text(raw)


# color and style settings
main_color = "#FF6F61"  # warm coral (food tone)
accent_color = "#FFA94D"  # soft orange accent

# gradio interface
app = gr.Interface(
    fn=generate_recipe,
    inputs=gr.Textbox(
        lines=2,
        label="Enter dish name or ingredients",
        placeholder="e.g. Chocolate Cake or Chicken, Rice, Garlic"
    ),
    outputs=gr.Markdown(label="Generated Recipe"),
    title="🍽️ GPT-2 Recipe Generator",
    description="Type a dish name or a few ingredients to create a full recipe.",
    examples=[
        ["Spicy Chicken Curry"],
        ["Chocolate Cake"],
        ["Vegetable Fried Rice"],
        ["Mango Smoothie"],
        ["Pasta with Garlic and Cheese"]
    ],
    allow_flagging="never",  # flag fully removed
    theme=gr.themes.Default(primary_hue=gr.themes.colors.orange),
    css=f"""
        body {{
            background: linear-gradient(135deg, {main_color}33, {accent_color}33);
        }}
        .gradio-container {{
            border: 2px solid {main_color};
            border-radius: 12px;
            padding: 15px;
        }}
        h1, h2, h3 {{
            color: {main_color};
        }}
        textarea {{
            font-size: 15px !important;
            height: 400px !important;
        }}
    """
)

app.launch(share=True)




Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7fec1d07c0ed5ce4ed.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


