<a href="https://colab.research.google.com/github/cwattsnogueira/llm-restaurant-assistant/blob/main/LLM_PoweredRestaurantAssistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  LLM-Powered Restaurant Assistant  
**Prompt Engineering + NLP + Evaluation Frameworks**  
*By Carllos Watts-Nogueira*

##  Project Overview

This project demonstrates how to build and evaluate a prompt-based AI assistant for restaurant operations. It includes:

- Prompt engineering for classification and Q&A
- LLM integration with OpenAI GPT-3.5
- Static evaluation logic and reproducibility
- Gradio interface for real-time interaction
- Dashboard for prompt performance and training simulation
- Ready for deployment on Hugging Face Spaces or Streamlit

# Setup

In [4]:
# Install required packages
!pip install openai gradio pandas



In [5]:
# Import libraries
import openai
import pandas as pd
import gradio as gr

In [32]:
from google.colab import userdata
import os

# Load your API key securely from Colab's userdata
api_key = userdata.get("OPENAI_API_KEY")
os.environ["OPENAI_API_KEY"] = api_key
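The cell above only works inside Colab, because `google.colab` is not importable elsewhere. Since the project is meant to run on other hosts too, a loader with an environment-variable fallback could look roughly like the sketch below. The helper name `load_api_key` is hypothetical and not part of the notebook.

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab secrets when available, else from the environment.

    `load_api_key` is an illustrative helper, not part of the original notebook.
    """
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab (local machine, hosted Space): read the environment.
        key = os.environ.get(name)
    else:
        key = userdata.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it as a secret or environment variable.")
    return key
```

Calling `os.environ["OPENAI_API_KEY"] = load_api_key()` would then replace the two assignment lines in the cell above.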

##  Use Case 1: Order Classification

###  Goal  
Classify incoming customer messages into predefined order categories (e.g., dine-in, takeout, delivery, reservation).

###  Prompt Template  

In [33]:
def classify_order_prompt(message):
    return f"""
You are a restaurant assistant. Classify the following message into one of the categories:
[Dine-In, Takeout, Delivery, Reservation, Other]

Message: "{message}"

Category:
"""

def training_qa_prompt(question):
    return f"""
You are a restaurant training assistant. Answer the following question clearly and concisely.

Question: "{question}"

Answer:
"""

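The category list is written directly into the prompt string, so any later code that checks labels has to repeat it. One possible variation, sketched here under the assumption that a shared constant is acceptable, builds the prompt from a single list. `CATEGORIES` and `build_classification_prompt` are hypothetical names, and the rendered text differs from `classify_order_prompt` only in having no leading or trailing newline.

```python
# Hypothetical names: neither CATEGORIES nor build_classification_prompt is in the notebook.
CATEGORIES = ["Dine-In", "Takeout", "Delivery", "Reservation", "Other"]

def build_classification_prompt(message, categories=CATEGORIES):
    """Render the classification prompt with the label list taken from `categories`."""
    label_list = ", ".join(categories)
    return (
        "You are a restaurant assistant. "
        "Classify the following message into one of the categories:\n"
        f"[{label_list}]\n\n"
        f'Message: "{message}"\n\n'
        "Category:"
    )

# Print the rendered prompt to inspect exactly what the model would receive.
prompt = build_classification_prompt("I'll pick up my order in 20 minutes.")
print(prompt)
```

Printing the prompt before sending it is a cheap way to catch quoting or whitespace problems in a template.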
###   LLM Response Function



In [34]:
def get_llm_response(prompt):
    try:
        # Create OpenAI client instance
        client = openai.OpenAI()

        # Send prompt to GPT-3.5
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )

        # Return the model's response
        return response.choices[0].message.content.strip()

    except Exception as e:
        # Handle and return any errors
        return f"Error: {str(e)}"
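Chat models do not always reply with the bare label: a reply such as `Category: Delivery` or `Dine-In.` is plausible, and `get_llm_response` also returns strings starting with `Error:` on failure. A minimal sketch of a post-processing step that maps a raw reply onto the allowed labels follows. `normalize_category` and `CATEGORIES` are hypothetical names, and returning `None` for unrecognized replies is an assumption about the desired behavior.

```python
import re

# Hypothetical constant mirroring the label list in the classification prompt.
CATEGORIES = ["Dine-In", "Takeout", "Delivery", "Reservation", "Other"]

def _squash(text):
    """Lowercase and keep letters only, so 'Dine-In.' and 'dine in' compare equal."""
    return re.sub(r"[^a-z]", "", text.lower())

def normalize_category(raw, categories=CATEGORIES):
    """Return the matching label from `categories`, or None if the reply is unclear."""
    cleaned = _squash(raw)
    # An exact match after squashing is the common case.
    for label in categories:
        if cleaned == _squash(label):
            return label
    # Otherwise accept the reply only if it mentions exactly one label.
    hits = [label for label in categories if _squash(label) in cleaned]
    return hits[0] if len(hits) == 1 else None

print(normalize_category("Category: Delivery"))
print(normalize_category("Error: request timed out"))
```

Substring matching is deliberately crude (a reply containing "another" would match "Other"), so ambiguous replies are better treated as failures than guessed at.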

### Static Evaluation Logic

In [35]:
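def evaluate_classification(predicted, expected):
    # Compare case-insensitively, ignoring surrounding whitespace and a trailing period
    return predicted.strip().rstrip(".").lower() == expected.strip().lower()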

test_cases = [
    {"message": "I'd like to book a table for four at 7pm tonight.", "expected": "Reservation"},
    {"message": "Can I get two burgers delivered to 5th Avenue?", "expected": "Delivery"},
    {"message": "I'll pick up my order in 20 minutes.", "expected": "Takeout"},
    {"message": "We’re dining in tonight, please prepare a table.", "expected": "Dine-In"},
    {"message": "Do you have vegan options?", "expected": "Other"},
]

def run_evaluation():
    results = []
    for case in test_cases:
        prompt = classify_order_prompt(case["message"])
        prediction = get_llm_response(prompt)
        match = evaluate_classification(prediction, case["expected"])
        results.append({
            "Message": case["message"],
            "Prediction": prediction,
            "Expected": case["expected"],
            "Match": match
        })
    return pd.DataFrame(results)
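`run_evaluation` returns one row per test case but no aggregate figure. A small summary step, sketched below on plain dictionaries so that it runs without an API key, could compute accuracy and list the misses. `summarize_results` is a hypothetical helper, and the sample records are made up for illustration; they are not results from the notebook.

```python
def summarize_results(records):
    """Aggregate rows shaped like the ones built in run_evaluation()."""
    total = len(records)
    correct = sum(1 for row in records if row["Match"])
    misses = [row for row in records if not row["Match"]]
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy, "misses": misses}

# Made-up records for illustration only (not output from the model).
sample_records = [
    {"Message": "Table for two at 8?", "Prediction": "Reservation",
     "Expected": "Reservation", "Match": True},
    {"Message": "Is the patio open?", "Prediction": "Dine-In",
     "Expected": "Other", "Match": False},
]

summary = summarize_results(sample_records)
print(f"{summary['correct']}/{summary['total']} correct ({summary['accuracy']:.0%})")
for row in summary["misses"]:
    print("MISS:", row["Message"], "->", row["Prediction"], "(expected", row["Expected"] + ")")
```

In the notebook, the same function could be fed with `run_evaluation().to_dict("records")`.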

## Use Case 2: Staff Training Q&A

### Sample Questions

In [36]:
sample_questions = [
    "How do I handle a customer complaint?",
    "What’s the protocol for food safety in the kitchen?",
    "How do I process a refund?"
]

def run_training_samples():
    responses = []
    for q in sample_questions:
        prompt = training_qa_prompt(q)
        answer = get_llm_response(prompt)
        responses.append({"Question": q, "Answer": answer})
    return pd.DataFrame(responses)

## Gradio Interface

In [37]:
def classify_order(message):
    prompt = classify_order_prompt(message)
    return get_llm_response(prompt)

def training_answer(question):
    prompt = training_qa_prompt(question)
    return get_llm_response(prompt)

def show_classification_dashboard():
    return run_evaluation()

def show_training_dashboard():
    return run_training_samples()

with gr.Blocks() as demo:
    gr.Markdown("#  Restaurant Assistant – Powered by GPT-3.5")

    with gr.Tab("Order Classification"):
        gr.Markdown("Classify customer messages into order types.")
        msg_input = gr.Textbox(label="Customer Message")
        msg_output = gr.Textbox(label="Predicted Category")
        msg_button = gr.Button("Classify")
        msg_button.click(fn=classify_order, inputs=msg_input, outputs=msg_output)

    with gr.Tab("Staff Training Q&A"):
        gr.Markdown("Ask training-related questions for restaurant staff.")
        qa_input = gr.Textbox(label="Training Question")
        qa_output = gr.Textbox(label="Answer")
        qa_button = gr.Button("Ask")
        qa_button.click(fn=training_answer, inputs=qa_input, outputs=qa_output)

    with gr.Tab("Evaluation Dashboard"):
        gr.Markdown("Static evaluation of prompt accuracy on test cases.")
        dashboard_output = gr.Dataframe()
        dashboard_button = gr.Button("Run Classification Evaluation")
        dashboard_button.click(fn=show_classification_dashboard, inputs=[], outputs=dashboard_output)

    with gr.Tab("Training Q&A Samples"):
        gr.Markdown("Sample answers to common staff training questions.")
        training_output = gr.Dataframe()
        training_button = gr.Button("Run Sample Q&A")
        training_button.click(fn=show_training_dashboard, inputs=[], outputs=training_output)

demo.launch()





# Final Report

**Author:** Carllos Watts-Nogueira  
**Date:** September 2025  
**Platform:** Google Colab + Gradio + OpenAI GPT-3.5  
**Deployment Ready:** Hugging Face Spaces / Streamlit

---

##  Project Objective

Design and evaluate a modular AI assistant for restaurant operations using LLMs. The assistant supports two core use cases:

1. **Order Classification** – Categorize customer messages into predefined order types  
2. **Staff Training Q&A** – Answer operational questions for restaurant staff clearly and concisely

---

##  Technologies Used

| Component         | Stack / Tool                  |
|------------------|-------------------------------|
| LLM Backend       | OpenAI GPT-3.5 (via API)       |
| Interface         | Gradio (interactive UI)        |
| Evaluation        | Pandas (static test cases)     |
| Hosting (optional)| Hugging Face Spaces / Colab    |
| Prompt Design     | Modular Python functions        |

---

##  System Architecture

- **Prompt Templates**: Modular functions for each use case  
- **LLM Integration**: OpenAI client with secure API key handling via `google.colab.userdata.get()`  
- **Gradio UI**: Four tabs for interaction and evaluation  
- **Evaluation Logic**: Static test cases for classification accuracy  
- **Training Samples**: Predefined questions to simulate onboarding scenarios

---

## Evaluation Results

| Message | Expected | Predicted | Match |
|---------|----------|-----------|-------|
| "I'd like to book a table for four at 7pm tonight." | Reservation | Reservation | True |
| "Can I get two burgers delivered to 5th Avenue?" | Delivery | Delivery | True |
| "Do you have vegan options?" | Other | Other | True |
| ... | ... | ... | ... |

**Accuracy:** 100% on static test set (5 cases)  
**Model Behavior:** Consistent on this test set. A low temperature (0.3) reduces variation between runs but does not make the output strictly deterministic.
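
How stable the labels are at a given temperature can be measured rather than assumed, by sending the same message several times and counting how often the most common label appears. The sketch below uses a stand-in classifier with canned replies so that it runs offline. `agreement_rate` and `fake_classify` are hypothetical names, and the canned replies are invented for illustration.

```python
from collections import Counter
from itertools import cycle

def agreement_rate(classify, message, runs=5):
    """Call `classify` on one message `runs` times; return (top label, its share of runs)."""
    labels = [classify(message) for _ in range(runs)]
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / runs

# Stand-in classifier with canned replies (invented for illustration, not model output).
_canned_replies = cycle(["Delivery", "Delivery", "Takeout", "Delivery"])

def fake_classify(message):
    return next(_canned_replies)

label, share = agreement_rate(fake_classify, "Can I get two burgers delivered?", runs=4)
print(label, share)
```

In the notebook, passing `classify_order` instead of `fake_classify` would run the check against the real model, at the cost of one API call per run.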

---

##  Sample Training Q&A

| Question | Answer |
|---------|--------|
| How do I handle a customer complaint? | Apologize, listen actively, offer a resolution, and escalate if needed. |
| What’s the protocol for food safety in the kitchen? | Wash hands, sanitize surfaces, store food at correct temperatures, and follow hygiene guidelines. |
| How do I process a refund? | Use the POS system, confirm the transaction, and follow manager approval protocols. |

---

##  Key Achievements

- Built a fully interactive LLM assistant with real-time classification and Q&A  
- Designed reproducible evaluation logic for prompt testing  
- Integrated secure API key handling via Colab secrets  
- Ready for deployment on Hugging Face Spaces or Streamlit  
- Demonstrated prompt engineering, UI design, and model evaluation in one cohesive project

---

##  Next Steps

- Add LangChain or Hugging Face model comparison  
- Integrate logging and analytics (LangFuse-style)  
- Expand training Q&A with dynamic feedback or quiz scoring  
- Deploy permanently and track usage metrics