# End-to-End Food Taste Profiling and Evaluation

This notebook implements the complete workflow for generating and evaluating taste profiles for a list of unlabeled foods. It combines the knowledge base built in the previous steps with an LLM, then evaluates the resulting labels with an LLM-as-a-Judge approach, in which the same model grades each assigned label.

## Setup

In [None]:
%pip install transformers torch accelerate bitsandbytes


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
import pandas as pd
import io
import csv
import time
from huggingface_hub import login

from getpass import getpass
import ast
import json
import re

In [None]:
HUGGINGFACE_API_KEY = getpass("Enter your Hugging Face API key: ")
login(token=HUGGINGFACE_API_KEY)

## Model Loading

In [None]:
MODEL = "mistralai/Mistral-Nemo-Instruct-2407"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
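Loading in 4-bit NF4 stores the weights at roughly a quarter of their bfloat16 size, while `bnb_4bit_compute_dtype=torch.bfloat16` keeps the matrix multiplications in bf16; double quantization additionally quantizes the quantization constants. A back-of-the-envelope sketch of the weight footprint, assuming Mistral-Nemo has roughly 12 billion parameters (the helper name is hypothetical, and activations, KV cache, and quantization metadata are ignored):

```python
# Rough estimate of weight memory only: ignores activations, the KV cache,
# and the small overhead of quantization constants.
def approx_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: roughly 12 billion parameters for Mistral-Nemo.
NUM_PARAMS = 12e9

bf16_gb = approx_weight_memory_gb(NUM_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = approx_weight_memory_gb(NUM_PARAMS, 4)    # 4-bit NF4 weights

print(f"bf16: ~{bf16_gb:.0f} GB, 4-bit NF4: ~{nf4_gb:.0f} GB")
```

Real usage will be somewhat higher than the NF4 figure, but the estimate shows why 4-bit loading lets a model of this size fit on a single consumer or Kaggle GPU.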

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
print("Model and tokenizer loaded successfully.")

## 1. Taste Profile Generation

This section contains all the logic for converting an unlabeled food item into a labeled one with a five-dimension taste profile (salty, umami, sweet, sour, bitter) and a dominant `best_label`.

### Model Prompts - Ingredient Extraction

In [None]:
SYSTEM_PROMPT = """You are a culinary expert and food scientist. Your task is to analyze a food item's name and description to identify its key ingredients and their contribution to the overall taste profile.

CONTEXT:
- You will be given the name and description of a food item.
- Some ingredients will be explicitly mentioned, while others are implicit and require your internal knowledge (e.g., "pad thai" implies rice noodles, peanuts, egg, a protein, salt, etc.).
- You must determine the "taste proportion" of each ingredient, which represents its impact on the final taste, not its volume or weight. For instance, a small amount of a potent ingredient like wasabi will have a high taste proportion.
- The sum of all taste proportions for a given food item should ideally add up to 1.0.

RULES:
1.  **Identify Ingredients:** Extract all key ingredients, both explicit and implicit.
2.  **Use Processed Names:** If an ingredient is processed in a way that significantly alters its taste, use the format "ingredient_processed" (e.g., "cucumber_pickled", "pork_cured", "cabbage_fermented"). If it's a basic ingredient, use its normal name (e.g. "bread", "tofu").
3.  **Estimate Taste Proportion:** Assign a decimal value from 0.0 to 1.0 for each ingredient's taste proportion.
4.  **Format Output Correctly:** The output MUST be only a JSON list. Do not include headers, explanations, markdown, or any other text.

**OUTPUT FORMAT** STRICTLY ONLY RETURN a valid JSON list of dictionaries, enclosed in square brackets [ ... ]:
[{"ingredient": "ingredient_name", "taste_proportion": taste_proportion}, {...}, ...]

EXAMPLES:
Input:
Name: Sriracha
Description: A type of hot sauce or chili sauce made from a paste of chili peppers, distilled vinegar, garlic, sugar, and salt.

Output:
[{"ingredient": "chili_peppers_fermented", "taste_proportion": 0.5},
{"ingredient": "vinegar_distilled", "taste_proportion": 0.2},
{"ingredient": "sugar", "taste_proportion": 0.15},
{"ingredient": "garlic", "taste_proportion": 0.1},
{"ingredient": "salt", "taste_proportion": 0.05}]

Input:
Name: Guacamole
Description: An avocado-based dip, spread, or salad first developed in Mexico, often containing lime juice, cilantro, and onions.

Output:
[{"ingredient": "avocado", "taste_proportion": 0.5},
{"ingredient": "lime", "taste_proportion": 0.2},
{"ingredient": "onion", "taste_proportion": 0.15},
{"ingredient": "cilantro", "taste_proportion": 0.1},
{"ingredient": "salt", "taste_proportion": 0.05}]
"""

In [None]:
USER_PROMPT_TEMPLATE = """
Generate the ingredient and taste proportion list for this food item:
Name: {food_name}
Description: {food_description}

**IMPORTANT**: Your response must contain ONLY the raw JSON list and nothing else.
"""

In [None]:
# Prompts for when the knowledge base does not have the requisite ingredient
SYSTEM_PROMPT_FALLBACK = """You are a food science expert. Your task is to estimate the five-dimension taste profile (salty, umami, sweet, sour, bitter) for a given ingredient.

RULES:
1.  Analyze the ingredient name.
2.  Estimate the scores for salty, umami, sweet, sour, and bitter on a scale from 0.0 to 1.0.
3.  Your response MUST BE ONLY a valid JSON object with the five taste keys. Do not include any other text, explanations, or markdown.

**OUTPUT FORMAT** STRICTLY ONLY RETURN a valid JSON object:
{"salty": 0.0, "umami": 0.0, "sweet": 0.0, "sour": 0.0, "bitter": 0.0}

EXAMPLES:
Input: lime
Output:
{"salty": 0.0, "umami": 0.1, "sweet": 0.1, "sour": 0.8, "bitter": 0.1}

Input: soy_sauce
Output:
{"salty": 0.9, "umami": 0.8, "sweet": 0.2, "sour": 0.1, "bitter": 0.2}
"""

USER_PROMPT_FALLBACK_TEMPLATE = "Ingredient: {ingredient_name}"
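The fallback prompt asks for five scores between 0.0 and 1.0, but a parsed reply can still contain out-of-range numbers, numeric strings, or missing keys. A minimal validation sketch, assuming a hypothetical `validate_taste_profile` helper applied to whatever `json.loads` returns:

```python
TASTE_KEYS = ("salty", "umami", "sweet", "sour", "bitter")

def validate_taste_profile(profile):
    """Return a cleaned {taste: score} dict with scores clamped to [0.0, 1.0], or None if invalid."""
    if not isinstance(profile, dict):
        return None
    try:
        return {key: min(max(float(profile[key]), 0.0), 1.0) for key in TASTE_KEYS}
    except (KeyError, TypeError, ValueError):
        # Missing key, or a value that cannot be converted to float
        return None

print(validate_taste_profile({"salty": 1.2, "umami": "0.5", "sweet": 0, "sour": 0.1, "bitter": -0.1}))
print(validate_taste_profile({"salty": 0.3}))
```

Clamping keeps a single out-of-range reply from dominating the weighted average, while a reply with missing keys is rejected outright.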

### Generation Functions

In [None]:
# Load one sample row for quick manual testing of the generation functions
df = pd.read_csv("/kaggle/input/food-dataset/data/food_items_unlabeled.csv") # replace with the unlabeled dataset path
food_name = df.iloc[1]['name']
food_description = df.iloc[1]['description']

In [None]:
def extract_ingredients(food_name: str, food_description: str, model, tokenizer) -> str:
    # Generates the ingredient list and taste proportions for a given food item using the LLM.

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT_TEMPLATE.format(
            food_name=food_name,
            food_description=food_description
        )}
    ]

    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False
    )

    output_text = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
    return output_text.strip()
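`extract_ingredients` returns the model's raw text, and downstream parsing expects bare JSON. Chat models sometimes wrap the answer in a markdown code fence even when told not to. A minimal sketch of a hypothetical `strip_code_fences` helper that unwraps the first fenced block (optionally tagged `json`) and otherwise returns the trimmed text:

```python
import json
import re

FENCE = "`" * 3  # three backticks, i.e. a markdown code fence

def strip_code_fences(text: str) -> str:
    """Return the content of the first fenced block in text, or the stripped text if there is none."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else text.strip()

raw_reply = FENCE + 'json\n[{"ingredient": "salt", "taste_proportion": 1.0}]\n' + FENCE
ingredients = json.loads(strip_code_fences(raw_reply))
print(ingredients[0]["ingredient"])  # salt
```

The lazy `(.*?)` with `re.DOTALL` stops at the first closing fence, so trailing commentary after the block is dropped as well.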

In [None]:
def get_taste_profile_from_llm(ingredient_name: str, model, tokenizer):
    # Uses the LLM to generate a taste profile for a single ingredient not found in the knowledge base.
    
    print(f"Ingredient '{ingredient_name}' not in KB. Querying LLM")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT_FALLBACK},
        {"role": "user", "content": USER_PROMPT_FALLBACK_TEMPLATE.format(ingredient_name=ingredient_name)}
    ]

    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    output_text = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True).strip()

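    try:
        # extract the {...} object in case the model wraps it in markdown fences or adds extra text
        match = re.search(r'\{.*?\}', output_text, re.DOTALL)
        profile = json.loads(match.group(0) if match else output_text)
        if isinstance(profile, dict) and all(key in profile for key in ['salty', 'umami', 'sweet', 'sour', 'bitter']):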
            print(f"  -> LLM generated profile for '{ingredient_name}'.")
            return profile
        else:
            raise ValueError("Missing keys in LLM JSON output.")
    except (json.JSONDecodeError, ValueError) as e:
        print(f"Failed to parse LLM fallback response for '{ingredient_name}': {e}")
        return None

In [None]:
def calculate_weighted_taste_profile(llm_output_str: str, knowledge_base_path: str, model, tokenizer):
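    # Parses the LLM ingredient output, looks each ingredient up in the knowledge base, and uses an
    # LLM fallback for missing ingredients to calculate a weighted average taste profile.

    try:
        kb_df = pd.read_csv(knowledge_base_path).set_index('entity_name')
    except FileNotFoundError:
        print(f"Error: Knowledge base file not found at '{knowledge_base_path}'")
        return None

    # The model may wrap the list in markdown fences or use Python-style single quotes,
    # so extract the outermost [...] span and fall back to ast.literal_eval if JSON parsing fails.
    match = re.search(r'\[.*\]', llm_output_str, re.DOTALL)
    list_str = match.group(0) if match else llm_output_str
    try:
        try:
            ingredients = json.loads(list_str)
        except json.JSONDecodeError:
            ingredients = ast.literal_eval(list_str)
        if not isinstance(ingredients, list):
            raise ValueError("LLM output is not a list.")
    except (ValueError, SyntaxError) as e:
        print(f"Error parsing LLM output: {e}")
        return None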

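    taste_cols = ['salty', 'umami', 'sweet', 'sour', 'bitter']
    final_profile = {taste: 0.0 for taste in taste_cols}
    total_weight = 0.0

    for item in ingredients:
        # skip malformed entries (non-dicts or missing ingredient names)
        if not isinstance(item, dict) or not item.get('ingredient'):
            continue
        name = item['ingredient']
        proportion = float(item.get('taste_proportion', 0.0))

        if name in kb_df.index:
            # 1. get the ingredient profile from the KB
            ingredient_profile = kb_df.loc[name, taste_cols].to_dict()
        else:
            # 2. fall back to the LLM if it does not exist in the KB
            ingredient_profile = get_taste_profile_from_llm(name, model, tokenizer)

        # 3. accumulate the proportion-weighted scores
        if ingredient_profile:
            for taste in taste_cols:
                final_profile[taste] += float(ingredient_profile[taste]) * proportion
            total_weight += proportion

    # Divide by the weight actually used: proportions may not sum to 1.0, and ingredients
    # with no available profile are skipped, so this keeps the result a true weighted average.
    if total_weight <= 0:
        return None
    final_profile = {taste: score / total_weight for taste, score in final_profile.items()}
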
    return final_profile
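For each taste $t$, the weighted average is $\text{score}_t = \sum_i p_i\, s_{i,t} / \sum_i p_i$, where $p_i$ is ingredient $i$'s taste proportion and $s_{i,t}$ its score for taste $t$; when the proportions sum to 1.0 the denominator is 1 and this reduces to a plain weighted sum. A self-contained worked example on a toy knowledge base with made-up scores (not values from the real KB file), using a hypothetical `weighted_profile` function that skips unknown ingredients instead of querying an LLM:

```python
from typing import Optional

TASTES = ("salty", "umami", "sweet", "sour", "bitter")

# Toy knowledge base with made-up scores (not taken from the real KB file)
toy_kb = {
    "avocado": {"salty": 0.0, "umami": 0.2, "sweet": 0.2, "sour": 0.0, "bitter": 0.1},
    "lime":    {"salty": 0.0, "umami": 0.0, "sweet": 0.1, "sour": 0.9, "bitter": 0.1},
    "salt":    {"salty": 1.0, "umami": 0.0, "sweet": 0.0, "sour": 0.0, "bitter": 0.0},
}

def weighted_profile(ingredients: list, kb: dict) -> Optional[dict]:
    """Proportion-weighted average of per-ingredient taste scores."""
    totals = dict.fromkeys(TASTES, 0.0)
    used_weight = 0.0
    for item in ingredients:
        profile = kb.get(item["ingredient"])
        if profile is None:
            continue  # unknown ingredient: skipped in this sketch
        p = float(item["taste_proportion"])
        for taste in TASTES:
            totals[taste] += p * profile[taste]
        used_weight += p
    if used_weight == 0:
        return None
    return {taste: score / used_weight for taste, score in totals.items()}

profile = weighted_profile(
    [
        {"ingredient": "avocado", "taste_proportion": 0.5},
        {"ingredient": "lime", "taste_proportion": 0.3},
        {"ingredient": "salt", "taste_proportion": 0.2},
    ],
    toy_kb,
)
print({taste: round(score, 3) for taste, score in profile.items()})
print(max(profile, key=profile.get))  # sour
```

By hand: sour = 0.3 × 0.9 = 0.27, salty = 0.2 × 1.0 = 0.2, sweet = 0.5 × 0.2 + 0.3 × 0.1 = 0.13, so sour is the dominant taste even though avocado carries the largest proportion.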

In [None]:
def process_food_data(input_csv: str, output_csv: str, knowledge_base_path: str, model, tokenizer):
    # reads unlabeled food data, processes each item to get a taste profile, and saves the labeled data

    print(f"Starting food data processing for '{input_csv}'")
    df = pd.read_csv(input_csv)
    results = []

    for index, row in df.iterrows():
        start_time = time.time()
        food_name = row['name']
        
        print(f"[{index + 1}/{len(df)}] Processing: '{food_name}'...")

        # 1. Ingredient proportions from LLM
        ingredient_json_str = extract_ingredients(food_name, row['description'], model, tokenizer)
        print(f"  -> Got ingredients from LLM.")

        # 2. Taste profile from KB
        taste_profile = calculate_weighted_taste_profile(ingredient_json_str, knowledge_base_path, model, tokenizer)

        if taste_profile:
            # 3. Best Label
            best_label = max(taste_profile, key=taste_profile.get)
            taste_profile['best_label'] = best_label
            print(f"Best Label: '{best_label}'.")
        else:
            print(f"Failed to calculate profile for '{food_name}'. Writing an empty profile labeled 'unknown'.")
            # zeroed profile for failed rows so the row still appears in the output
            taste_profile = {
                'salty': 0.0, 'umami': 0.0, 'sweet': 0.0, 'sour': 0.0, 'bitter': 0.0, 'best_label': 'unknown'
            }

        new_row = {**row.to_dict(), **taste_profile}
        results.append(new_row)

        end_time = time.time()
        print(f"Done in {end_time - start_time:.2f} seconds.\n")


    # save to new csv
    output_df = pd.DataFrame(results)
    cols = ['id', 'name', 'description', 'best_label', 'salty', 'umami', 'sweet', 'sour', 'bitter']
    output_df = output_df[cols]

    output_df.to_csv(output_csv, index=False)
    print(f"Processing complete. Labeled data saved to '{output_csv}'.")
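`max(taste_profile, key=taste_profile.get)` returns the first key among equal maxima, in dictionary order, so a profile whose scores are all zero is silently labeled `salty`. A minimal guard, sketched as a hypothetical `pick_best_label` helper that reuses the notebook's `'unknown'` label for that case:

```python
def pick_best_label(profile: dict) -> str:
    """Return the highest-scoring taste, or 'unknown' if every score is zero."""
    best = max(profile, key=profile.get)
    return best if profile[best] > 0.0 else "unknown"

zeros = dict.fromkeys(["salty", "umami", "sweet", "sour", "bitter"], 0.0)
print(max(zeros, key=zeros.get))  # salty  (first key wins on ties)
print(pick_best_label(zeros))     # unknown
print(pick_best_label({"salty": 0.2, "umami": 0.1, "sweet": 0.13, "sour": 0.27, "bitter": 0.08}))  # sour
```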


### Workflow Execution

In [None]:
# replace these with your actual file paths
INPUT_CSV_PATH = "/kaggle/input/food-dataset/data/food_items_unlabeled.csv"
KNOWLEDGE_BASE_PATH = "/kaggle/input/food-dataset/data/knowledge_base_average_processed.csv"
OUTPUT_CSV_PATH = "food_items_labeled.csv"

In [None]:
process_food_data(INPUT_CSV_PATH, OUTPUT_CSV_PATH, KNOWLEDGE_BASE_PATH, model, tokenizer)


## 2. Evaluation Workflow

### Model Prompts - Evaluation

In [None]:
SYSTEM_PROMPT_JUDGE = """You are an impartial food science expert acting as an automated judge. Your sole task is to evaluate the plausibility of a dominant taste label assigned to a food item by another system.

You will be given the food's name, its description, the system's chosen dominant taste ('best_label'), and its runner-up choice ('second_best_label').

Based on this information, you must categorize the system's choice by responding with a single letter (A, B, C, or D) based on the following definitions:

A) **Accurate:** The provided 'best_label' is clearly the most accurate and logical dominant taste for this food item.
B) **Acceptable:** The 'best_label' is a reasonable choice, but the 'second_best_label' (or another unlisted taste) is equally or more dominant. The choice is defensible but not definitively the best.
C) **Inaccurate:** The 'best_label' is clearly incorrect. A different taste is obviously dominant.
D) **No Single Dominant Taste:** The food item is complex and does not have one single dominant taste that stands out significantly above others.

**IMPORTANT RULE:** Your entire response MUST be only the single capital letter (A, B, C, or D) that best fits the evaluation. Do not provide any explanation, context, or additional text.
"""

USER_PROMPT_JUDGE_TEMPLATE = """Food Name: {name}
Description: {description}
System's best_label: {best_label}
System's second_best_label: {second_best_label}
"""

### Evaluation Functions

In [None]:
def get_evaluation_grade(row: pd.Series, model, tokenizer) -> str:
    # Evaluates a single labeled food item using the LLM as a Judge
    
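    taste_cols = ['salty', 'umami', 'sweet', 'sour', 'bitter']
    best_label = row['best_label']

    # rows that failed upstream are labeled 'unknown'; there is nothing meaningful to judge
    if best_label not in taste_cols:
        print(f"  -> Skipping '{row['name']}': no valid best_label ('{best_label}').")
        return "Error"

    # get the second-best label
    scores = row[taste_cols].to_dict()
    scores.pop(best_label)
    second_best_label = max(scores, key=scores.get)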

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT_JUDGE},
        {"role": "user", "content": USER_PROMPT_JUDGE_TEMPLATE.format(
            name=row['name'],
            description=row['description'],
            best_label=best_label,
            second_best_label=second_best_label
        )}
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    output_text = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True).strip()

    # parse the letter grade, tolerating leading punctuation or markdown such as "**A**" or "(A)"
    match = re.match(r'\W*([A-D])', output_text)
    if match:
        return match.group(1)
    else:
        print(f"  -> Warning: Could not parse grade from LLM output: '{output_text}'")
        return "Error"


In [None]:
def augment_with_evaluations(input_csv: str, output_csv: str, model, tokenizer, sample_size: int = None):
    # reads a labeled dataset and gets an evaluation grade for each item, saving to a new CSV

    print(f"Starting evaluation process for '{input_csv}'")
    try:
        df = pd.read_csv(input_csv)
    except FileNotFoundError:
        print(f"Error: Labeled data file not found at '{input_csv}'")
        return

    if sample_size and sample_size < len(df):
        print(f"Processing a random sample of {sample_size} items.")
        eval_df = df.sample(n=sample_size, random_state=42)
    else:
        print(f"Processing all {len(df)} items.")
        eval_df = df

    grades = []
    total_rows = len(eval_df)
    for i, (index, row) in enumerate(eval_df.iterrows()):
        print(f"Processing item {i + 1}/{total_rows}: '{row['name']}'...")
        grade = get_evaluation_grade(row, model, tokenizer)
        grades.append(grade)

    eval_df['eval_grade'] = grades
    eval_df.to_csv(output_csv, index=False)
    print(f"\nProcessing complete. Augmented data saved to '{output_csv}'.")

In [None]:
def report_metrics_from_csv(evaluation_csv: str):
    # prints final evaluation report

    print(f"\n--- Reading Evaluation Report from '{evaluation_csv}' ---")
    try:
        df = pd.read_csv(evaluation_csv)
    except FileNotFoundError:
        print(f"Error: Evaluation file not found at '{evaluation_csv}'")
        return

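    grades = df['eval_grade'].astype(str).tolist()
    # only count parseable letter grades; 'Error' (and any other value) is excluded
    total_evaluated = len([g for g in grades if g in ('A', 'B', 'C', 'D')])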
    if total_evaluated == 0:
        print("No valid evaluations found in the file.")
        return

    count_a = grades.count('A')
    count_b = grades.count('B')
    count_c = grades.count('C')
    count_d = grades.count('D')

    strict_accuracy = count_a / total_evaluated
    acceptable_accuracy = (count_a + count_b) / total_evaluated

    print(f"Total Items Evaluated: {total_evaluated}")
    print("-" * 25)
    print(f"Grade 'A' (Accurate):         {count_a}")
    print(f"Grade 'B' (Acceptable):       {count_b}")
    print(f"Grade 'C' (Inaccurate):       {count_c}")
    print(f"Grade 'D' (No Dominant):      {count_d}")
    print("-" * 25)
    print(f"Strict Accuracy (A only):       {strict_accuracy:.2%}")
    print(f"Acceptable Accuracy (A + B):    {acceptable_accuracy:.2%}")
    print("---------------------------\n")

### Run Evaluation Workflow

In [None]:
# replace these with your actual file paths
LABELED_CSV_PATH = "food_items_labeled.csv"
EVALUATION_CSV_PATH = "food_items_evaluation.csv"

In [None]:
augment_with_evaluations(
    input_csv=LABELED_CSV_PATH,
    output_csv=EVALUATION_CSV_PATH,
    model=model,
    tokenizer=tokenizer,
)

In [None]:
report_metrics_from_csv(EVALUATION_CSV_PATH)