# Solve Business Problems with AI

## Objective
Develop a proof-of-concept application to intelligently process email order requests and customer inquiries for a fashion store. The system should accurately categorize emails as either product inquiries or order requests and generate appropriate responses using the product catalog information and current stock status.

You are encouraged to use AI assistants (like ChatGPT or Claude) and any IDE of your choice to develop your solution. Many modern IDEs (such as PyCharm or Cursor) can work with Jupyter files directly.

## Task Description

### Inputs

Google Spreadsheet **[Document](https://docs.google.com/spreadsheets/d/14fKHsblfqZfWj3iAaM2oA51TlYfQlFT4WKo52fVaQ9U)** containing:

- **Products**: List of products with fields including product ID, name, category, stock amount, detailed description, and season.

- **Emails**: Sequential list of emails with fields such as email ID, subject, and body.

### Instructions

- Implement all requirements using advanced Large Language Models (LLMs) to handle complex tasks, process extensive data, and generate accurate outputs effectively.
- Use Retrieval-Augmented Generation (RAG) and vector store techniques where applicable to retrieve relevant information and generate responses.
- You are provided with a temporary OpenAI API key granting access to GPT-4o, which has a token quota. Use it wisely or use your own key if preferred.
- Address the requirements in the order listed. Review them in advance to develop a general implementation plan before starting.
- Your deliverables should include:
   - Code developed within this notebook.
   - A single spreadsheet containing results, organized across separate sheets.
   - Comments detailing your thought process.
- You may use additional libraries (e.g., langchain) to streamline the solution. Use libraries appropriately to align with best practices for AI and LLM tools.
- Use the most suitable AI techniques for each task. Note that solving tasks with traditional programming methods will not earn points, as this assessment evaluates your knowledge of LLM tools and best practices.

### Requirements

#### 1. Classify emails
    
Classify each email as either a _**"product inquiry"**_ or an _**"order request"**_. Ensure that the classification accurately reflects the intent of the email.

**Output**: Populate the **email-classification** sheet with columns: email ID, category.

#### 2. Process order requests
1.   Process orders
  - For each order request, verify product availability in stock.
  - If the order can be fulfilled, create a new order line with the status “created”.
  - If the order cannot be fulfilled due to insufficient stock, create a line with the status “out of stock” and include the requested quantity.
  - Update stock levels after processing each order.
  - Record each product request from the email.
  - **Output**: Populate the **order-status** sheet with columns: email ID, product ID, quantity, status (**_"created"_**, **_"out of stock"_**).

2.   Generate responses
  - Create response emails based on the order processing results:
      - If the order is fully processed, inform the customer and provide product details.
      - If the order cannot be fulfilled or is only partially fulfilled, explain the situation, specify the out-of-stock items, and suggest alternatives or options (e.g., waiting for restock).
  - Ensure the email tone is professional and production-ready.
  - **Output**: Populate the **order-response** sheet with columns: email ID, response.

#### 3. Handle product inquiry

Customers may ask general open questions.
  - Respond to product inquiries using relevant information from the product catalog.
  - Ensure your solution scales to handle a full catalog of over 100,000 products without exceeding token limits. Avoid including the entire catalog in the prompt.
  - **Output**: Populate the **inquiry-response** sheet with columns: email ID, response.

## Evaluation Criteria
- **Advanced AI Techniques**: The system should use Retrieval-Augmented Generation (RAG) and vector store techniques to retrieve relevant information from data sources and use it to respond to customer inquiries.
- **Tone Adaptation**: The AI should adapt its tone appropriately based on the context of the customer's inquiry. Responses should be informative and enhance the customer experience.
- **Code Completeness**: All functionalities outlined in the requirements must be fully implemented and operational as described.
- **Code Quality and Clarity**: The code should be well-organized, with clear logic and a structured approach. It should be easy to understand and maintain.
- **Presence of Expected Outputs**: All specified outputs must be correctly generated and saved in the appropriate sheets of the output spreadsheet. Ensure the format of each output matches the requirements—do not add extra columns or sheets.
- **Accuracy of Outputs**: The accuracy of the generated outputs is crucial and will significantly impact the evaluation of your submission.

We look forward to seeing your solution and your approach to solving real-world problems with AI technologies.

# Prerequisites

### Configure OpenAI API Key.

In [None]:
# Install the OpenAI Python package.
!pip install openai httpx==0.27.2

# Install the Llama Index + OpenAI Python packages.
!pip install llama-index-core
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai

Collecting httpx==0.27.2
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.4/76.4 kB 6.2 MB/s eta 0:00:00
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.28.1
    Uninstalling httpx-0.28.1:
      Successfully uninstalled httpx-0.28.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-genai 1.10.0 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.2 which is incompatible.
Successfully installed httpx-0.27.2
Collecting llama-index-core
  Downloading llama_index_core-0.12.32-py3-none-any.whl.metadata (2.6 kB)
Collecting banks<3.0.0,>=2.0.0 (from llama-index-core)
  Downloading banks-2.1.2-py3-none-any.whl.metadata (12 kB)
Colle

**IMPORTANT: If you use our custom API key, make sure you also use the custom base URL, as in the example below. Otherwise it will not work.**

In [None]:
import pandas as pd

### ===== 1. Read data from Google Sheets

def read_data_frame(document_id, sheet_name):
    export_link = f"https://docs.google.com/spreadsheets/d/{document_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
    return pd.read_csv(export_link)

document_id = "14fKHsblfqZfWj3iAaM2oA51TlYfQlFT4WKo52fVaQ9U"
products_df = read_data_frame(document_id, "products")
emails_df = read_data_frame(document_id, "emails")

# Copy all emails for processing; can filter specific ones during testing
test_emails_df = emails_df.copy()

display(test_emails_df)

Unnamed: 0,email_id,subject,message
0,E001,Leather Wallets,"Hi there, I want to order all the remaining LT..."
1,E002,Buy Vibrant Tote with noise,"Good morning, I'm looking to buy the VBT2345 V..."
2,E003,Need your help,"Hello, I need a new bag to carry my laptop and..."
3,E004,Buy Infinity Scarves Order,"Hi, I'd like to order three to four SFT1098 In..."
4,E005,Inquiry on Cozy Shawl Details,"Good day, For the CSH1098 Cozy Shawl, the desc..."
5,E006,,"Hey there, I was thinking of ordering a pair o..."
6,E007,"Order for Beanies, Slippers","Hi, this is Liz. Please send me 5 CLF2109 Cabl..."
7,E008,Ordering a Versatile Scarf-like item,"Hello, I'd want to order one of your Versatile..."
8,E009,Pregunta Sobre Gorro de Punto Grueso,"Hola, tengo una pregunta sobre el DHN0987 Gorr..."
9,E010,Purchase Retro Sunglasses,"Hello, I would like to order 1 pair of RSG8901..."


In [None]:
### ===== 2. Google Auth and creating spreadsheet

from google.colab import auth
import gspread
from google.auth import default
from gspread_dataframe import set_with_dataframe
import openai
from openai import OpenAI
# ServiceContext is deprecated in recent llama-index releases and is not used in this notebook
from llama_index.core import VectorStoreIndex, Document
from llama_index.embeddings.openai import OpenAIEmbedding
import os

auth.authenticate_user()
creds, _ = default()

gc = gspread.authorize(creds)

# This code goes after creating google client
output_document = gc.create('Solving Business Problems with AI - Output')

# Create 'email-classification' sheet
email_classification_sheet = output_document.add_worksheet(title="email-classification", rows=50, cols=2)
email_classification_sheet.update([['email ID', 'category']], 'A1:B1')

# Create 'order-status' sheet
# This sheet has four columns, so the grid must be at least four columns wide for the A1:D1 header
order_status_sheet = output_document.add_worksheet(title="order-status", rows=50, cols=4)
order_status_sheet.update([['email ID', 'product ID', 'quantity', 'status']], 'A1:D1')

# Create 'order-response' sheet
order_response_sheet = output_document.add_worksheet(title="order-response", rows=50, cols=2)
order_response_sheet.update([['email ID', 'response']], 'A1:B1')

# Create 'inquiry-response' sheet
inquiry_response_sheet = output_document.add_worksheet(title="inquiry-response", rows=50, cols=2)
inquiry_response_sheet.update([['email ID', 'response']], 'A1:B1')



os.environ["OPENAI_BASE_URL"] = 'https://47v4us7kyypinfb5lcligtc3x40ygqbs.lambda-url.us-east-1.on.aws/v1/'
os.environ["OPENAI_API_KEY"] = 'a0BIj000002aPOIMA2'
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.base_url = os.environ["OPENAI_BASE_URL"]

client = OpenAI()



# Important Note :

1. There is a lot of batching logic that was not strictly necessary to complete the task, but I felt it was better to call the API once per batch and save tokens.
2. The data flow is explained in each section.
3. Some code slows down execution but exists only to print output, such as unnecessary DB get calls. It can be removed from a production version.
4. Product inquiry responses sometimes include similar product recommendations when the LLM encounters an unknown or unrecognized product. Otherwise, it refers to a specific, known product. Due to the variability in the LLM's responses, the final Excel file in this run doesn't include those recommendation entries, although they do appear in some other runs.
---



# Task 1. Classify emails

### Steps:

1. Create batches of emails (50 per batch) to reduce the number of API calls and tokens used.
2. Append the emails of each batch to the bottom of the prompt.
3. Split the prompt output back into one category per email.
4. Add these categories to the existing emails dataframe.
5. Extract the two fields required in the Google Sheet.

#### Note: In rare cases the LLM might not return categories in the desired format. To reduce that risk, the prompt is deliberately strict and verbose.
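
Because the categories are matched back to emails purely by position, a skipped or malformed line in the model reply would silently shift every later label. The steps above can be sketched with an explicit check, as below. This is a minimal illustration, not the code used in this notebook: `chunked`, `parse_classifications`, and the sample reply are hypothetical names and data.

```python
import re
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parse_classifications(llm_output: str, expected: int) -> List[str]:
    """Parse lines such as '1. order request' and verify none is missing."""
    pattern = re.compile(
        r"^[ \t]*(\d+)\.[ \t]*(order request|product inquiry)[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    # Key each label by the number the model printed, not by line order.
    by_position = {int(num): label.lower() for num, label in pattern.findall(llm_output)}
    missing = [i for i in range(1, expected + 1) if i not in by_position]
    if missing:
        raise ValueError(f"No valid category for email positions: {missing}")
    return [by_position[i] for i in range(1, expected + 1)]


# Hypothetical model reply for a batch of three emails.
sample_output = "1. order request\n2. Product Inquiry\n3. order request"
labels = parse_classifications(sample_output, expected=3)

# Five placeholder emails split into batches of two.
batches = list(chunked(["a", "b", "c", "d", "e"], size=2))
```

Raising on a missing position makes a formatting slip visible immediately, so the affected batch can be retried instead of writing misaligned categories to the sheet.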

In [None]:
### ===== 3. Extracting 'Type' and 'Gender' info from products for using in indexes later for better semantic search

import json

product_list = []
for _,product in products_df.iterrows():
    product_list.append({'id':product['product_id'],'name':product['name']})

test_prompt = f"""You are an expert product categorizer.

Your ONLY job is to return the type for each product using the format below.
Do NOT include any introduction, explanation, or markdown formatting like ```.

ONLY output the following format per line:

product_id | <type> | <gender_type>

Use only the product name to determine the type.
Also guess the gender type for each product male | female | both

Here is the list of product names with IDs:

{json.dumps(product_list)}

Remember:
- NO markdown
- NO extra text
- NO explanations
- Just plain lines like: ABC123 | hat | male
"""

response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": test_prompt}],
        temperature=0
    )

output = response.choices[0].message.content

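# Split on '|' and trim whitespace so spacing variations in the LLM output do not break parsing
rows = [[part.strip() for part in line.split('|')] for line in output.strip().split('\n') if line.strip()]
# Keep only well-formed rows (product_id, type, gender)
rows = [row for row in rows if len(row) == 3]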

# Create DataFrame
test_df = pd.DataFrame(rows, columns=['product_id', 'type', 'gender'])

merged_product_df = pd.merge(products_df, test_df, on='product_id')
display(merged_product_df)

Unnamed: 0,product_id,name,category,description,stock,seasons,price,type,gender
0,RSG8901,Retro Sunglasses,Accessories,Transport yourself back in time with our retro...,1,"Spring, Summer",26.99,sunglasses,both
1,SWL2345,Sleek Wallet,Accessories,Keep your essentials organized and secure with...,5,All seasons,30.00,wallet,both
2,VSC6789,Versatile Scarf,Accessories,Add a touch of versatility to your wardrobe wi...,6,"Spring, Fall",23.00,scarf,both
3,CSH1098,Cozy Shawl,Accessories,Wrap yourself in comfort with our cozy shawl. ...,3,"Fall, Winter",22.00,shawl,female
4,CHN0987,Chunky Knit Beanie,Accessories,Keep your head toasty with our chunky knit bea...,2,"Fall, Winter",22.00,beanie,both
...,...,...,...,...,...,...,...,...,...
94,SND7654,Strappy Sandals,Women's Shoes,Step into summer with our strappy sandals. The...,8,"Spring, Summer",27.00,sandals,female
95,CHK8901,Chunky Sneakers,Women's Shoes,Step into trendy style with our chunky sneaker...,1,"Spring, Fall",42.00,sneakers,both
96,MLR0123,Mule Loafers,Women's Shoes,Slip into effortless style with our mule loafe...,2,"Spring, Fall",47.00,loafers,both
97,CLG4567,Clog Sandals,Women's Shoes,Step into retro-inspired style with our clog s...,3,"Spring, Summer",48.00,sandals,female


In [None]:
import re

### ===== 4. Classification of Emails and adding to spreadsheet

# Create batches of 50 emails (list) - only "message" part of email is taken for classification

batch_size = 50
batches = [test_emails_df['message'][i:i + batch_size] for i in range(0, len(test_emails_df), batch_size)]

print('=== Batches === \n ')
display(batches)

classification_list = []

# function to concat prompt of 1 batch together

def build_batch_prompt(email_list):
    prompt = """
You are a classification assistant for a retail customer support system. Your task is to analyze each of the following customer emails and classify them into one of two categories:

1. "order request" – The customer clearly wants to place an order (mentions product names, IDs, quantities, phrases like "please send", "I'd like to order", "ship to", etc.).

2. "product inquiry" – The customer is only asking for product info, availability, options, or is unsure about ordering.

IMPORTANT:
- If an email shows any clear intent to make a purchase, even informally, classify it as **order request**.
- If it’s vague or exploratory, classify as **product inquiry**.

Output the results as a list of classifications, one per email, in the format:
1. order request
2. product inquiry
...

Here are the emails:
"""
    for i, email in enumerate(email_list, start=1):
        prompt += f"\n{i}. {email.strip()}\n"
    prompt += "\nClassifications:"
    return prompt

# Go through each batch and get classifications from Open AI completion API

for batch in batches:
    prompt = build_batch_prompt(batch.tolist())
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    output = response.choices[0].message.content

    # Extract lines like "1. order request", "2. product inquiry"
    classifications = re.findall(r'\d+\.\s*(order request|product inquiry)', output, re.IGNORECASE)

    classification_list.extend([c.lower() for c in classifications])



print('\n=== Categories Obtained === \n ')
display(classification_list)


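# Attach back to DataFrame (fail early if the LLM skipped or malformed a line)
if len(classification_list) != len(test_emails_df):
    raise ValueError(
        f"Expected {len(test_emails_df)} classifications, got {len(classification_list)}"
    )
test_emails_df['category'] = classification_list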

# Create new Dataframe with only 2 columns
email_category_list_df = test_emails_df[["email_id", "category"]]


print('\n === Email, Category Dataframe === \n')
print(email_category_list_df)


# Rename columns to match the required output sheet headers: email ID, category
email_category_list_df = email_category_list_df.rename(
    columns={"email_id": "email ID", "category": "category"}
)

#### Add to Google Sheet of 'email-classification'
set_with_dataframe(email_classification_sheet, email_category_list_df)


=== Batches === 
 


[0     Hi there, I want to order all the remaining LT...
 1     Good morning, I'm looking to buy the VBT2345 V...
 2     Hello, I need a new bag to carry my laptop and...
 3     Hi, I'd like to order three to four SFT1098 In...
 4     Good day, For the CSH1098 Cozy Shawl, the desc...
 5     Hey there, I was thinking of ordering a pair o...
 6     Hi, this is Liz. Please send me 5 CLF2109 Cabl...
 7     Hello, I'd want to order one of your Versatile...
 8     Hola, tengo una pregunta sobre el DHN0987 Gorr...
 9     Hello, I would like to order 1 pair of RSG8901...
 10    Hi there, The description for the RSG8901 Retr...
 11    Hey, hope you're doing well. Last year for my ...
 12    Hi, my name is Marco and I need to buy a pair ...
 13         Please send me 1 Sleek Wallet. Thanks, Johny
 14    Good morning, I'm looking for a nice bag for m...
 15    Hello, I'm looking for a dress for a summer we...
 16    Hi there I want to place an order for that pop...
 17    Hello I'd like to buy 2 


=== Categories Obtained === 
 


['order request',
 'order request',
 'product inquiry',
 'order request',
 'product inquiry',
 'product inquiry',
 'order request',
 'order request',
 'product inquiry',
 'order request',
 'product inquiry',
 'product inquiry',
 'order request',
 'order request',
 'product inquiry',
 'product inquiry',
 'order request',
 'order request',
 'order request',
 'product inquiry',
 'product inquiry',
 'order request',
 'order request']


 === Email, Category Dataframe === 

   email_id         category
0      E001    order request
1      E002    order request
2      E003  product inquiry
3      E004    order request
4      E005  product inquiry
5      E006  product inquiry
6      E007    order request
7      E008    order request
8      E009  product inquiry
9      E010    order request
10     E011  product inquiry
11     E012  product inquiry
12     E013    order request
13     E014    order request
14     E015  product inquiry
15     E016  product inquiry
16     E017    order request
17     E018    order request
18     E019    order request
19     E020  product inquiry
20     E021  product inquiry
21     E022    order request
22     E023    order request



# Task 2.1 Process Order Requests - Process Orders

### Steps:

1. Prepare **LlamaIndex** for `emails` and `products` data using text and metadata (using persistent storage for better performance).  
   _[Section #5, #6, #7]_

2. Create **query engines** with different filters for `order-request` and `product-inquiry`, so that each query engine only works with a specific category of emails.  
   _[Section #8]_

3. Extract `product ID` and `quantity` from each email where possible using the query engine + LLM.  
   _[Section #9]_

4. Categorize orders into `correct` and `incorrect` by checking whether a valid `Product ID` was extracted.  
   _[Section #10]_

5. Initialize the **SQL DB (SQLite)** for DB-related operations.  
   _[Section #11]_

6. Set up **SQLAlchemy** — define the Product model and queries for various DB operations using the **NLSQL query retriever**.  
   _[Section #12]_

7. Perform a **fuzzy search** for Product IDs in `incorrect` orders and attempt to reclassify them as `correct`. Final valid orders are kept as `correct`, and the remaining ones are marked as `confusing_orders` for later processing.  
   _[Section #13]_

8. **Fulfill orders** by checking available `stock` and updating it. Then take the `final_order_df` and add it to the Google Sheet `order-status`.   _[Section #14]_

---

### Note:

- `confusing_orders` are **not** included in the `order-status` sheet, but they **will** be used in the `order-response` sheet, as we still email the customer regarding the products we couldn't identify or interpret from their message.
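
Steps 5 to 8 above depend on one invariant: stock must never go negative, and every order line ends up as either "created" or "out of stock". A minimal sketch of that check-and-decrement step follows. It uses plain `sqlite3` rather than the NLSQL retriever used in this notebook. The helper name `fulfil_order_line`, the in-memory table, and the stock figures are assumptions made for illustration.

```python
import sqlite3


def fulfil_order_line(conn, email_id, product_id, quantity):
    """Reserve stock in one statement and return an order-status row."""
    # The stock check lives in the WHERE clause, so the row is only
    # updated when enough units are available.
    cur = conn.execute(
        "UPDATE products SET stock = stock - ? WHERE product_id = ? AND stock >= ?",
        (quantity, product_id, quantity),
    )
    conn.commit()
    status = "created" if cur.rowcount == 1 else "out of stock"
    return {
        "email ID": email_id,
        "product ID": product_id,
        "quantity": quantity,
        "status": status,
    }


# Illustrative in-memory catalog; the stock values are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, stock INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [("RSG8901", 1), ("CLF2109", 5)])

rows = [
    fulfil_order_line(conn, "E018", "RSG8901", 2),  # asks for more than is in stock
    fulfil_order_line(conn, "E010", "RSG8901", 1),  # takes the last unit
    fulfil_order_line(conn, "E007", "CLF2109", 5),  # exactly empties the stock
]
remaining = dict(conn.execute("SELECT product_id, stock FROM products"))
```

Putting the stock comparison inside the `UPDATE ... WHERE` clause makes the check and the decrement a single statement, so two order lines cannot both claim the last unit.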


In [None]:
import os

### ===== 5. Prepare Text, Metadata (document data) for creating a LlamaIndex => emailIndex
email_texts = []
email_metadata_list = []

for _, row in test_emails_df.iterrows():
    text = f"Email Id: {row['email_id']}\nSubject: {row['subject']}\nMessage: {row['message']}\nCategory: {row['category']}"
    email_texts.append(text)
    email_metadata_list.append(
        {"email_id": row["email_id"], "category": row["category"]}
    )


### ===== 6. Prepare Text, Metadata (document data) for creating a LlamaIndex => productIndex

product_texts = []
product_metadata_list = []
for _, row in merged_product_df.iterrows():
     name, category, description, seasons, price, product_id, typeof,gender = row['name'], row['category'], row['description'], row['seasons'], row['price'], row['product_id'], row['type'],row['gender']
     product_texts.append(f"type: {typeof}\ngender: {gender}\ncategory: {category}\ndescription: {description}\nseasons: {seasons}")
     product_metadata_list.append({
        "product_id": product_id,
        "name": name,
        "description": description,
        "category": category,
        "seasons": seasons,
        "price": price,
        "type":typeof,
        "gender":gender
    })

### ===== 7. Create LlamaIndex => emailIndex and productIndex [for simplicity we use a persistent storage context for the vector stores; no third-party vector DB is used]

from llama_index.core import (
    StorageContext,
    load_index_from_storage,
    Settings,
    VectorStoreIndex,
    Document,
)
from llama_index.llms.openai import OpenAI
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
)
from llama_index.embeddings.openai import OpenAIEmbedding

# Setting for Embedding and LLM model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o")

email_docs = [
    Document(text=t, metadata=m) for t, m in zip(email_texts, email_metadata_list)
]

product_docs = [
    Document(text=t, metadata=m, id_=m["product_id"])  # the Document ID field is named id_
    for t, m in zip(product_texts, product_metadata_list)
]

email_index_id = "email_index"
product_index_id = "product_index"
email_index_dir = "storage_email"
product_index_dir = "storage_product"

if not os.path.exists(email_index_dir):
    # email index
    emailIndex = VectorStoreIndex.from_documents(email_docs)
    emailIndex.set_index_id(email_index_id)
    emailIndex.storage_context.persist(f"./{email_index_dir}")
else:
    # rebuild storage context
    email_storage_context = StorageContext.from_defaults(persist_dir=email_index_dir)
    # load index
    emailIndex = load_index_from_storage(email_storage_context, index_id=email_index_id)

if not os.path.exists(product_index_dir):
    # product index
    productIndex = VectorStoreIndex.from_documents(product_docs)
    productIndex.set_index_id(product_index_id)
    productIndex.storage_context.persist(f"./{product_index_dir}")
else:
    # rebuild storage context
    product_storage_context = StorageContext.from_defaults(persist_dir=product_index_dir)
    # load index
    productIndex = load_index_from_storage(product_storage_context, index_id=product_index_id)



In [None]:
### ===== 8. Query Engine with filters for 'order request' and 'product inquiry'

ORDER_REQUEST = "order request"
PRODUCT_INQUIRY = "product inquiry"

filter_order_request = MetadataFilters(
    filters=[
        MetadataFilter(key="category", value=ORDER_REQUEST),
    ],
)

filter_product_inquiry = MetadataFilters(
    filters=[
        MetadataFilter(key="category", value=PRODUCT_INQUIRY),
    ],
)

email_query_engine_order_request = emailIndex.as_query_engine(filters=filter_order_request, similarity_top_k=50)

email_query_engine_product_inquiry = emailIndex.as_query_engine(filters=filter_product_inquiry, similarity_top_k=50)


### ===== 9. Prompt for Getting Product ID and Quantity from Emails which are of type 'order request'.

## Prompt is again batched and combines all emails ( to save multiple API requests)
## for simple POC - we only considered one batch [ otherwise we have to add FOR loop of dividing in batches]

product_id_quantity_prompt = """**Task**: For each document:
1. Product ID (format: 3 letters + 4 digits, e.g., ABC1234)
2. Quantity requested (numeric value)

**Instructions**:
- For each document:
  a. Identify product IDs matching the pattern 3 letters 4 digits, might have spaces etc.
  b. Extract quantities (convert written numbers to digits)
  c. If quantity unspecified, make it 1
  d. only return single number in quantity, choose higher one
  e. There can be multiple product IDs in text
  f. format the Product ID (if not in proper format) to have 3 upper case letters and 4 digits e.g. ABC1234
**Output Format**:


For each document and for each Product ID reference in it, return EXACTLY :
Email ID: [ID]
Product ID: [ID, if product ID not available return product name exactly as name mentioned in email - don't add new text]
Quantity: [number]

example :
Email ID: E003 | Product ID: SFT1098 | Quantity: 3

Note: There can be multiple output lines for email which have more than 1 product IDs
"""

extracted_products_from_email = email_query_engine_order_request.query(product_id_quantity_prompt)
print('=== LLM extracted Product IDs (some are not IDs but will be replaced in further steps) ===\n')
print(extracted_products_from_email)

=== LLM extracted Product IDs (some are not IDs but will be replaced in further steps) ===

Email ID: E017 | Product ID: popular item | Quantity: 1  
Email ID: E007 | Product ID: CLF2109 | Quantity: 5  
Email ID: E007 | Product ID: FZZ1098 | Quantity: 2  
Email ID: E023 | Product ID: CGN2345 | Quantity: 5  
Email ID: E008 | Product ID: Versatile Scarf | Quantity: 1  
Email ID: E004 | Product ID: SFT1098 | Quantity: 4  
Email ID: E001 | Product ID: LTH0976 | Quantity: 1  
Email ID: E014 | Product ID: Sleek Wallet | Quantity: 1  
Email ID: E019 | Product ID: CBT8901 | Quantity: 1  
Email ID: E022 | Product ID: amazing bags | Quantity: 3  
Email ID: E018 | Product ID: RSG8901 | Quantity: 2  
Email ID: E010 | Product ID: RSG8901 | Quantity: 1  
Email ID: E002 | Product ID: VBT2345 | Quantity: 1  
Email ID: E013 | Product ID: slide sandals | Quantity: 1  


In [None]:
### =====  10. Convert Response to Order Dataframe [ Separate for correct format of ID and incorrect format]
import re

pattern = r"Email ID:\s*(?P<Email_ID>\w+)\s*\|\s*Product ID:\s*(?P<Product_ID>[A-Z]{3}[0-9]{4})\s*\|\s*Quantity:\s*(?P<Quantity>\d+)"
product_name_pattern = r"Email ID:\s*(?P<Email_ID>\w+)\s*\|\s*Product ID:\s*(?P<Product_ID>[a-zA-Z0-9\s-]+)\s\|\s*Quantity:\s*(?P<Quantity>\d+)"

correct_orders = []
incorrect_orders = []

lines = extracted_products_from_email.response.splitlines()
for text in lines:
    match = re.search(pattern, text)
    if match:
        correct_orders.append(
            {
                "Email_ID": match.group("Email_ID"),
                "Product_ID": match.group("Product_ID"),
                "Quantity": int(match.group("Quantity")),
            }
        )
    else:
      # In case Product ID is not found capture the whole text as Product ID (but in a different list : incorrect orders)
        m = re.search(product_name_pattern,text)
        if m :
            incorrect_orders.append(
                    {
                        "Email_ID": m.group("Email_ID"),
                        "Product_ID": m.group("Product_ID"),
                        "Quantity": int(m.group("Quantity")),
                    }
                )

order_df = pd.DataFrame(correct_orders)
print('=== Correct formatted Orders === \n')
print(order_df)

incorrect_order_df = pd.DataFrame(incorrect_orders)
print('\n === Incorrect formatted Orders ===\n')
print(incorrect_order_df)

# For entries where the LLM returned a product name instead of a valid product ID, we will find the closest matching products using LlamaIndex.
# For this, the next sections set up a SQL database with an ORM and a natural-language SQL retriever.
# Using SQL ensures consistency for inventory operations like stock decrement.
# Note: While we're using SQLite for this POC, the solution can be easily migrated to any SQL database or cloud platform.

=== Correct formatted Orders === 

  Email_ID Product_ID  Quantity
0     E007    CLF2109         5
1     E007    FZZ1098         2
2     E023    CGN2345         5
3     E004    SFT1098         4
4     E001    LTH0976         1
5     E019    CBT8901         1
6     E018    RSG8901         2
7     E010    RSG8901         1
8     E002    VBT2345         1

 === Incorrect formatted Orders ===

  Email_ID       Product_ID  Quantity
0     E017     popular item         1
1     E008  Versatile Scarf         1
2     E014     Sleek Wallet         1
3     E022     amazing bags         3
4     E013    slide sandals         1
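
Several of the rows above carry a product name rather than an ID. As a deterministic cross-check alongside the retriever-based lookup, a name can be matched against the catalog with the standard library's `difflib`. This is a minimal sketch and not the approach used in this notebook: `resolve_product_id`, the `cutoff` value, and the four-entry `catalog` dict (names and IDs copied from the product table shown earlier) are illustrative assumptions.

```python
from difflib import get_close_matches
from typing import Dict, Optional

# Small excerpt of the catalog, keyed by product name.
catalog = {
    "Versatile Scarf": "VSC6789",
    "Sleek Wallet": "SWL2345",
    "Retro Sunglasses": "RSG8901",
    "Cozy Shawl": "CSH1098",
}


def resolve_product_id(mention: str, name_to_id: Dict[str, str], cutoff: float = 0.8) -> Optional[str]:
    """Return the ID of the closest catalog name, or None if nothing is close enough."""
    # Compare in lowercase so capitalisation does not affect the score.
    lowered = {name.lower(): pid for name, pid in name_to_id.items()}
    hits = get_close_matches(mention.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


exact = resolve_product_id("Versatile Scarf", catalog)
typo = resolve_product_id("sleek walet", catalog)
vague = resolve_product_id("popular item", catalog)
```

A string matcher of this kind only helps with near-exact names and typos. Vague mentions such as "popular item" return `None` and still need the semantic search or a follow-up question to the customer.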


In [None]:
### =====  11. SQLite database for Products

import sqlite3

database = "products.db"

conn = sqlite3.connect(database)
# Insert DataFrame to SQLite
merged_product_df.to_sql('products', conn, if_exists='replace', index=False)

print("\nData inserted successfully!")

# Verify the data was inserted correctly
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print("\nTables in database:", cursor.fetchall())

cursor.execute("SELECT COUNT(*) FROM products;")
print("\nNumber of rows in products table:", cursor.fetchone()[0])

conn.close()


Data inserted successfully!

Tables in database: [('products',)]

Number of rows in products table: 99


In [None]:
### ===== 12. DB SQL alchemy setup and Query functions

from operator import and_
from sqlalchemy import create_engine, select, text, update
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session
from sqlalchemy import String, Integer, Float, Text
from typing import Optional
from sqlalchemy.exc import SQLAlchemyError
from llama_index.core.retrievers import NLSQLRetriever
from llama_index.core import SQLDatabase
from llama_index.core.response.notebook_utils import display_source_node

# Base class for declarative models
class Base(DeclarativeBase):
    pass

# Product model
class Product(Base):
    __tablename__ = "products"

    product_id: Mapped[str] = mapped_column(String, primary_key=True)
    name: Mapped[str] = mapped_column(String)
    category: Mapped[Optional[str]] = mapped_column(String)
    description: Mapped[Optional[str]] = mapped_column(Text)
    stock: Mapped[Optional[int]] = mapped_column(Integer)
    seasons: Mapped[Optional[str]] = mapped_column(String)
    price: Mapped[Optional[float]] = mapped_column(Float)
    type: Mapped[Optional[str]] = mapped_column(String)
    gender: Mapped[Optional[str]] = mapped_column(String)
    def __repr__(self):
        return f"Product(id={self.product_id!r}, name={self.name!r}, price={self.price}, stock={self.stock})"


# Database engine setup
engine = create_engine("sqlite:///products.db")

# SQLDatabase for NLSQL retriever
sql_database = SQLDatabase(engine, include_tables=["products"])

# SQL based retriever on products table
nl_sql_retriever = NLSQLRetriever(
    sql_database, tables=["products"], return_raw=False
)

# DB Queries

# Product by Product ID - direct retrieval

def get_product_by_id(product_id):
    result = nl_sql_retriever.retrieve(
        f"Return product complete details with product_id = {product_id}"
    )
    # Guard against an empty result so an unknown ID returns None instead of raising IndexError
    if not result:
        return None
    return result[0].node.metadata

# Product by Product ID - only return if stock is available in required quantity

def get_product_by_id_with_stock_available(product_id, quantity):
    result = nl_sql_retriever.retrieve(
    f"Return product complete details with product_id = {product_id} if stock >= {quantity}"
    )
    if(len(result)>0):
        metadata = result[0].node.metadata
        return metadata
    else:
        return None

# Product name based semantic search

def fuzzy_search_product(product_name):
    result = nl_sql_retriever.retrieve(
    f"Return product details of product whose name is most similar to : {product_name}"
    )
    if(len(result)>0):
        metadata = result[0].node.metadata
        return metadata
    else:
        return None


# Decremet prdocut stock if possible

def update_product_stock(product_id,quantity):
    return nl_sql_retriever.retrieve(
    f"decrement the product stock by {quantity} where product_id : {product_id} if stock >= {quantity}"
    )


# Given some product find its alternatives
def get_similar_products(target_product_id: str, target_product_name: str, target_product_cat:str, retriever) -> list:

    retrieved_nodes = retriever.retrieve(f"find product similar to :{target_product_name}, usually it should match the type of item, which is suffix in product name. Also match product category: {target_product_cat}" )

    similar_products = []
    for node in retrieved_nodes:
        product_data = node.metadata
        # Discard the target product itself if it comes back as a similar product
        if product_data['product_id'] != target_product_id:
            similar_products.append({
                'product': product_data,
                'score': node.score
            })

    return similar_products[:2]  # Return top 2 similar products

def extract_type(text):
    # Pull the value of a "Type: <value>" field out of the text; None if absent
    match = re.search(r'\btype\s*:\s*([^\s,]+)', text, re.IGNORECASE)
    return match.group(1) if match else None

# Given some text context, find relevant products [needs an 'index as retriever', which is declared later]
def search_for_product(text: str, retriever) -> list:

    retrieved_nodes = retriever.retrieve(
        f"""Find retail products of the same type as: {text}.
Match based on the 'type' field first (e.g., hat, wallet, scarf) then 'gender' information.
Then, product name.
"""
    )


    similar_products = []
    for node in retrieved_nodes:
        product_data = node.metadata
        similar_products.append({
            'product': product_data,
            'score': node.score
        })

    return similar_products[:3]


In [None]:
# Example
print('=== product with ID ===')
display(get_product_by_id('CLF2109'))
print('\n=== product with ID and check stock for 2 ===\n')
display(get_product_by_id_with_stock_available('CLF2109',2))
print('\n=== product with ID and check stock for 5 ===\n')
display(get_product_by_id_with_stock_available('CLF2109',5))


=== product with ID ===


{'product_id': 'CLF2109',
 'name': 'Cable Knit Beanie',
 'category': 'Accessories',
 'description': 'Bundle up in our cable knit beanie. Knitted from premium wool, this classic beanie features a timeless cable knit pattern and a soft, stretchy fit. A versatile accessory for adding a touch of warmth and texture to your cold-weather looks.',
 'stock': 2,
 'seasons': 'Winter',
 'price': 16.0,
 'type': 'beanie',
 'gender': 'both'}


=== product with ID and check stock for 2 ===



{'product_id': 'CLF2109',
 'name': 'Cable Knit Beanie',
 'category': 'Accessories',
 'description': 'Bundle up in our cable knit beanie. Knitted from premium wool, this classic beanie features a timeless cable knit pattern and a soft, stretchy fit. A versatile accessory for adding a touch of warmth and texture to your cold-weather looks.',
 'stock': 2,
 'seasons': 'Winter',
 'price': 16.0,
 'type': 'beanie',
 'gender': 'both'}


=== product with ID and check stock for 5 ===



None
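
The lookups above go through the NL-to-SQL retriever, so each call depends on the LLM producing the intended query. For comparison only, here is a minimal sketch of the same stock check and conditional decrement written as parameterized SQL with Python's built-in `sqlite3`. It uses an in-memory table that mirrors a few columns of `products`. The helper names (`make_demo_db`, `get_stock`, `decrement_stock_if_available`) are assumptions made for this illustration and are not part of the notebook. The two sample rows reuse stock values shown in the notebook output.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the products table (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product_id TEXT PRIMARY KEY, name TEXT, stock INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [("CLF2109", "Cable Knit Beanie", 2), ("LTH0976", "Leather Bifold Wallet", 4)],
    )
    conn.commit()
    return conn


def get_stock(conn: sqlite3.Connection, product_id: str):
    """Return the current stock, or None when the product ID is unknown."""
    row = conn.execute(
        "SELECT stock FROM products WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else None


def decrement_stock_if_available(conn: sqlite3.Connection, product_id: str, quantity: int) -> bool:
    """Decrement stock only when enough units exist; report whether a row changed."""
    cur = conn.execute(
        "UPDATE products SET stock = stock - ? WHERE product_id = ? AND stock >= ?",
        (quantity, product_id, quantity),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_demo_db()
rejected = decrement_stock_if_available(conn, "CLF2109", 5)   # asks for more than is in stock
fulfilled = decrement_stock_if_available(conn, "LTH0976", 1)  # enough stock, so it is decremented
print(rejected, get_stock(conn, "CLF2109"))
print(fulfilled, get_stock(conn, "LTH0976"))
```

Putting the `stock >= ?` condition inside the `UPDATE` makes the check and the decrement a single statement, so a second order cannot slip in between them.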

In [None]:
### ===== 13. Fuzzy search to resolve product names to product IDs and replace the data.

## final_order_df = collects all orders that have product IDs after the fuzzy search; only these order requests are fulfilled [used in Section #14]
## confusing_order_df = collects all orders that still have no product ID; these customers get a different email [used in Section #19]

final_order_df = order_df.copy()

confusing_order_df = pd.DataFrame(columns=["Email_ID", "Product_ID", "Quantity"])

resolved_rows = []
unresolved_rows = []

for _, row in incorrect_order_df.iterrows():
    result = fuzzy_search_product(row['Product_ID']) # fuzzy search
    new_row = row.copy()
    if result:
        new_row['Product_ID'] = result['product_id']
        resolved_rows.append(new_row)
    else:
        unresolved_rows.append(new_row)

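# Append resolved rows to final_order_df
final_order_df = pd.concat([final_order_df, pd.DataFrame(resolved_rows)], ignore_index=True)
# Passing the columns keeps the expected schema even when no row is left unresolved
confusing_order_df = pd.DataFrame(unresolved_rows, columns=confusing_order_df.columns)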

print('=== FINAL ORDERs ===\n')
display(final_order_df)
print('\n === CONFUSING ORDERS ===\n')
display(confusing_order_df)


=== FINAL ORDERs ===



Unnamed: 0,Email_ID,Product_ID,Quantity
0,E007,CLF2109,5
1,E007,FZZ1098,2
2,E023,CGN2345,5
3,E004,SFT1098,4
4,E001,LTH0976,1
5,E019,CBT8901,1
6,E018,RSG8901,2
7,E010,RSG8901,1
8,E002,VBT2345,1
9,E008,VSC6789,1



 === CONFUSING ORDERS ===



Unnamed: 0,Email_ID,Product_ID,Quantity
0,E017,popular item,1
3,E022,amazing bags,3
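
Because the fuzzy search is LLM-driven, it can be worth validating what it returns before treating a row as resolved. The sketch below partitions rows the same way as the loop above, but only accepts a match whose `product_id` has the catalog's ID shape (three letters followed by four digits). `resolve_orders`, `PRODUCT_ID_RE`, and the `fake_search` stand-in are hypothetical names for illustration. The `E900` row is made-up demo data, while the two unresolved rows come from the output above.

```python
import re

# Assumed ID shape: 3 uppercase letters + 4 digits (e.g. CLF2109)
PRODUCT_ID_RE = re.compile(r"^[A-Z]{3}\d{4}$")


def resolve_orders(rows, search_fn):
    """Split rows into (resolved, unresolved) using any name-to-product search function."""
    resolved, unresolved = [], []
    for row in rows:
        match = search_fn(row["Product_ID"])
        product_id = (match or {}).get("product_id", "")
        if PRODUCT_ID_RE.match(product_id):
            # Replace the free-text name with the validated product ID
            resolved.append({**row, "Product_ID": product_id})
        else:
            unresolved.append(row)
    return resolved, unresolved


def fake_search(name):
    """Stand-in for fuzzy_search_product so the sketch runs without an LLM."""
    catalog = {"cable knit beanie": {"product_id": "CLF2109"}}
    return catalog.get(name.lower())


rows = [
    {"Email_ID": "E900", "Product_ID": "Cable Knit Beanie", "Quantity": 1},  # made-up row
    {"Email_ID": "E017", "Product_ID": "popular item", "Quantity": 1},
    {"Email_ID": "E022", "Product_ID": "amazing bags", "Quantity": 3},
]
resolved, unresolved = resolve_orders(rows, fake_search)
print(resolved)
print([row["Email_ID"] for row in unresolved])
```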


In [None]:
### ===== 14. Fulfill orders: check whether stock covers the requested quantity and mark each order line 'created' or 'out of stock' accordingly



status = []

# Loop through all "final orders"

for index, order in final_order_df.iterrows():

    email_id = order["Email_ID"]
    product_id = order["Product_ID"]
    quantity = order["Quantity"]

    current_product = get_product_by_id(product_id)

    print(f"\nStock Available at START {product_id}:", current_product['stock'])

     # Check if stock is sufficient for each product

    res = get_product_by_id_with_stock_available(product_id, quantity)
    if res:
        update_product_stock(product_id, quantity) # decrement quantity of stock
        print(f"This order item is fulfilled", email_id, product_id, quantity)
         # Mark orders as "created"
        status.append("created")
    else:
        print(
            f"This order item is out of stock ",
            email_id,
            product_id,
            f"item reqd quantity : {quantity} but only {current_product['stock']} is/are available",
        )
         # Mark orders as "out of stock"
        status.append("out of stock")

    print(f"Stock Remaining after processing {product_id}:", get_product_by_id(product_id)['stock'])

## 'final_order_df' now contains all fields required for the output of Task 2.1


final_order_df["status"] = status
final_order_df = final_order_df.rename(
    columns={"Email_ID": "email ID", "Product_ID": "product Id", "Quantity": "quantity"}
)

print('\n === Processed Order requests with status ===\n')
display(final_order_df)

# Add to Google Sheet 'order-status'
set_with_dataframe(order_status_sheet, final_order_df)



Stock Available at START CLF2109: 2
This order item is out of stock  E007 CLF2109 item reqd quantity : 5 but only 2 is/are available
Stock Remaining after processing CLF2109: 2

Stock Available at START FZZ1098: 2
This order item is fulfilled E007 FZZ1098 2
Stock Remaining after processing FZZ1098: 0

Stock Available at START CGN2345: 2
This order item is out of stock  E023 CGN2345 item reqd quantity : 5 but only 2 is/are available
Stock Remaining after processing CGN2345: 2

Stock Available at START SFT1098: 8
This order item is fulfilled E004 SFT1098 4
Stock Remaining after processing SFT1098: 4

Stock Available at START LTH0976: 4
This order item is fulfilled E001 LTH0976 1
Stock Remaining after processing LTH0976: 3

Stock Available at START CBT8901: 2
This order item is fulfilled E019 CBT8901 1
Stock Remaining after processing CBT8901: 1

Stock Available at START RSG8901: 1
This order item is out of stock  E018 RSG8901 item reqd quantity : 2 but only 1 is/are available
Stock Rema

Unnamed: 0,email ID,product Id,quantity,status
0,E007,CLF2109,5,out of stock
1,E007,FZZ1098,2,created
2,E023,CGN2345,5,out of stock
3,E004,SFT1098,4,created
4,E001,LTH0976,1,created
5,E019,CBT8901,1,created
6,E018,RSG8901,2,out of stock
7,E010,RSG8901,1,created
8,E002,VBT2345,1,created
9,E008,VSC6789,1,created
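
The log above shows that order lines are processed sequentially, and that a rejected line leaves the stock untouched for later lines (E018 asks for 2 units of RSG8901 and is rejected, then E010 gets the single remaining unit). A minimal in-memory sketch of that rule follows. `process_orders` is a hypothetical helper and the stock dictionary stands in for the database. The quantities are taken from the output above.

```python
def process_orders(orders, stock):
    """Apply order lines in sequence; return (statuses, remaining_stock)."""
    remaining = dict(stock)  # copy so the caller's dict is not mutated
    statuses = []
    for email_id, product_id, quantity in orders:
        if remaining.get(product_id, 0) >= quantity:
            remaining[product_id] -= quantity
            statuses.append((email_id, product_id, quantity, "created"))
        else:
            # Not enough units: leave the stock as it is
            statuses.append((email_id, product_id, quantity, "out of stock"))
    return statuses, remaining


orders = [
    ("E007", "CLF2109", 5),
    ("E007", "FZZ1098", 2),
    ("E018", "RSG8901", 2),
    ("E010", "RSG8901", 1),
]
stock = {"CLF2109": 2, "FZZ1098": 2, "RSG8901": 1}

statuses, remaining = process_orders(orders, stock)
for line in statuses:
    print(line)
print(remaining)
```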



# Task 2.2 Process Order Requests - Email Responses


### Steps :

1. Group order items by email ID — combining multiple order requests from the same email ID.  
   _[Section #15]_

2. Use a **similar product suggestion** function with the product index as a retriever for semantic search.  
   _[Section #16]_

3. Create a **prompt** for email responses for `order requests` that were successfully processed, and call the LLM.  
   _[Section #17]_

4. Extract email responses in proper format into a DataFrame for Google Sheets.  
   _[Section #18]_

5. Create a **prompt** for email responses for `confusing orders`, and call the LLM.  
   _[Section #19]_

6. Extract email responses and **combine** them with the previous DataFrame (Step 4) — to be added to the Google Sheet `order-response`.  
   _[Section #20]_


In [None]:
### ===== 15. Group order items by email ID, combining multiple order requests from the same email ID
# This product grouping enables our email response system to include information
# about all relevant products mentioned in the customer's inquiry.

grouped_by_email = final_order_df.groupby('email ID')

all_orders = []
for email, group in grouped_by_email:

    # extract full email record using email_id
    email_info = emails_df[emails_df['email_id']==email].to_dict('records')[0]

    complete_order_info = {'email':email_info, 'products':[]}

    for index, row in group.iterrows():
        product = get_product_by_id(row['product Id'])
        status = row['status']
        quantity = row['quantity']
        complete_order_info['products'].append({'info':product,'status':status,'quantity':quantity})
    all_orders.append(complete_order_info)

print('\n === All Processed Orders in List(dict) format Collected for Email response === \n')
display(all_orders)

# Example of all_orders formatted data

# [
#     {
#         email: {email_id: .., subject: .., message: ....},
#         products: [
#             {info: <product_detail_dictionary>, status: 'created', quantity: 5},
#             {info: <product_detail_dictionary>, status: 'out of stock', quantity: 5}
#         ]
#     },
#     {
#         email: {email_id: .., subject: .., message: ....},
#         products: [
#             {info: <product_detail_dictionary>, status: 'created', quantity: 5}
#         ]
#     }
# ]



 === All Processed Orders in List(dict) format Collected for Email response === 



[{'email': {'email_id': 'E001',
   'subject': 'Leather Wallets',
   'message': "Hi there, I want to order all the remaining LTH0976 Leather Bifold Wallets you have in stock. I'm opening up a small boutique shop and these would be perfect for my inventory. Thank you!"},
  'products': [{'info': {'product_id': 'LTH0976',
     'name': 'Leather Bifold Wallet',
     'category': 'Accessories',
     'description': 'Upgrade your everyday carry with our leather bifold wallet. Crafted from premium, full-grain leather, this sleek wallet features multiple card slots, a billfold compartment, and a timeless, minimalist design. A sophisticated choice for any occasion.',
     'stock': 3,
     'seasons': 'All seasons',
     'price': 21.0,
     'type': 'wallet',
     'gender': 'male'},
    'status': 'created',
    'quantity': 1}]},
 {'email': {'email_id': 'E002',
   'subject': 'Buy Vibrant Tote with noise',
   'message': "Good morning, I'm looking to buy the VBT2345 Vibrant Tote bag. My name is Jessica a

In [None]:
### ===== 16. A function for finding alternatives to products that are 'OUT OF STOCK', based on similarity and stock availability

## The 'retriever' param is the LlamaIndex product index used as a retriever
def get_alternatives(product_id, product_name, product_cat, quantity, retriever):

    # Similar product recommendations
    recommendations = []
    similar_items = get_similar_products(
        product_id, product_name, product_cat, retriever
    )

    for item in similar_items:
        product_info = item["product"]

        # Keep only alternatives that have the requested quantity in stock
        if get_product_by_id_with_stock_available(product_info["product_id"], quantity):
            recommendations.append(product_info)

    return recommendations



In [None]:
### ===== 17. Creating a PROMPT for Email Response for 'order requests' processed

## prepare JSON like Dump which has info of all emails
## this contains both 'out of stock' and 'created' order-requests grouped by email ID
## we will add info about Alternative product recommendation for 'out of stock' ones


import json

product_ret = productIndex.as_retriever(similarity_top_k=3)

all_order_json_dump = []

for order in all_orders:
    products = order['products']

    shipped = [product for product in products if product["status"] == "created"]
    out_of_stock = [product for product in products if product["status"] == "out of stock"]

    ## Adding alternative product suggestions to 'out of stock' products
    for product in out_of_stock:
        product['info']['alternative'] = get_alternatives(product['info']["product_id"], product['info']["name"], product['info']["category"],product['quantity'],product_ret)

    ## This will be format of data for 1 email : Email content + Shipped Item List + Out of Stock item list
    order_json =  {
          "customer_message": order['email'],
          "shipped_items": shipped,
          "out_of_stock_items": out_of_stock
        }
    ## Complete list of data for all emails
    all_order_json_dump.append(order_json)



## A single prompt carries the info for all emails as one JSON dump, so the API is called only once
## A full application would send the JSON in batches (e.g., 50 objects each); here everything goes in one call for simplicity

order_success_prompt = f"""You are an email assistant for an online retail store. Your task is to generate personalized order confirmation emails for multiple customers based on their order data, provided as a list of JSON objects.

Each JSON object contains:
- The original customer message
- A list of items that have been shipped
- A list of items that are out of stock, including availability and possible alternatives

---

For each JSON input, generate a personalized email that:

1. Greets the customer using their name if available in the message; otherwise, use "Dear Customer."
2. Clearly states whether the order is fully shipped, partially shipped, or not shipped at all, based on the number of items shipped and unavailable.
3. Lists each shipped item with the following details:
   - Product name
   - Product ID
   - Price per unit
   - Quantity - this should come from quantity mentioned
   - Subtotal (price × quantity)
4. Mentions each out-of-stock item. If stock > 0 but < requested quantity, indicate how many units are still available and offer to ship those.
5. Suggests alternatives for out-of-stock items, including:
   - Product name
   - Product ID
   - Price (if available)
   - A short description (if relevant)
6. If the order is not shipped at all, provide more detailed descriptions of suggested alternatives, and invite the customer to explore related items from our catalog in the same category.
7. If the order is not fulfilled or only partially fulfilled, explain the situation, list the unavailable items, and offer choices such as waiting for restock or selecting alternatives.
8. Reflect a warm, enthusiastic tone and acknowledge any context from the original customer message (e.g., gifts, holidays, shops).
9. Calculate and display the total cost of shipped items.
10. Keep the email friendly, informative, and concise, ending with a thank-you note.

---

Format your response as a list of email responses using the structure below:

---
Email ID: <email_id>
Subject: Your Order is < Shipped | Not Shipped | Partially Shipped >
<personalized email content>
---

Here is the list of JSON inputs:
{json.dumps(all_order_json_dump)}

"""

print('\n === PROMPT with JSON data for Email Response [Order Requests] ====\n')
print(order_success_prompt)

email_response_all = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": order_success_prompt}],
    temperature=0
)
print(email_response_all.choices[0].message.content)



 === PROMPT with JSON data for Email Response [Order Requests] ====

You are an email assistant for an online retail store. Your task is to generate personalized order confirmation emails for multiple customers based on their order data, provided as a list of JSON objects.

Each JSON object contains:
- The original customer message
- A list of items that have been shipped
- A list of items that are out of stock, including availability and possible alternatives

---

For each JSON input, generate a personalized email that:

1. Greets the customer using their name if available in the message; otherwise, use "Dear Customer."
2. Clearly states whether the order is fully shipped, partially shipped, or not shipped at all, based on the number of items shipped and unavailable.
3. Lists each shipped item with the following details:
   - Product name
   - Product ID
   - Price per unit
   - Quantity - this should come from quantity mentioned
   - Subtotal (price × quantity)
4. Mentions each out
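
The code comment in Section 17 notes that a full application would send the JSON in batches rather than in one prompt. A minimal sketch of that batching follows, assuming a batch size of 50 as mentioned there. `chunked` and `build_batch_prompts` are hypothetical helpers, the instruction text is a placeholder, and the 120 demo orders are synthetic.

```python
import json


def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_batch_prompts(orders, instructions, batch_size=50):
    """Build one prompt per batch, each carrying its own JSON dump."""
    return [
        f"{instructions}\n\nHere is the list of JSON inputs:\n{json.dumps(batch)}"
        for batch in chunked(orders, batch_size)
    ]


# Synthetic orders shaped like the entries of all_order_json_dump
demo_orders = [
    {"customer_message": {"email_id": f"E{i:03d}"}, "shipped_items": [], "out_of_stock_items": []}
    for i in range(1, 121)
]

batch_sizes = [len(batch) for batch in chunked(demo_orders, 50)]
prompts = build_batch_prompts(demo_orders, "<instructions go here>", batch_size=50)
print(batch_sizes, len(prompts))
```

Each prompt would then be sent in its own `client.chat.completions.create` call and the responses concatenated before parsing.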

In [None]:
### ===== 18. Splitting it into a Dataframe to be used in final Google Sheet

import re
import pandas as pd

def extract_email_blocks_to_df(response_text: str) -> pd.DataFrame:
    """
    Extracts email responses from a structured GPT output and converts them to a DataFrame.

    Args:
        response_text (str): Raw GPT response containing blocks like:
            ---
            Email ID: EXXX
            <response body>
            ---

    Returns:
        pd.DataFrame: DataFrame with columns ['email ID', 'response']
    """
    email_blocks = re.findall(
        r"---\s*Email ID: (E\d+)\s+(.*?)(?=---|$)",
        response_text,
        re.DOTALL
    )

    emails_list = [
        {"email ID": email_id.strip(), "response": response.strip()}
        for email_id, response in email_blocks
    ]

    emails_df = pd.DataFrame(emails_list)
    return emails_df



order_request_emails_responses_df = extract_email_blocks_to_df(email_response_all.choices[0].message.content)

display(order_request_emails_responses_df)


Unnamed: 0,email ID,response
0,E001,Subject: Your Order is Partially Shipped \n\n...
1,E002,Subject: Your Order is Shipped \n\nGood morni...
2,E004,"Subject: Your Order is Shipped \n\nHi,\n\nTha..."
3,E007,Subject: Your Order is Partially Shipped \n\n...
4,E008,"Subject: Your Order is Shipped \n\nHello,\n\n..."
5,E010,"Subject: Your Order is Shipped \n\nHello,\n\n..."
6,E013,"Subject: Your Order is Shipped \n\nHi Marco,\..."
7,E014,"Subject: Your Order is Shipped \n\nHi Johny,\..."
8,E018,Subject: Your Order is Not Shipped \n\nDear C...
9,E019,"Subject: Your Order is Shipped \n\nHey there,..."
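
The regular expression in `extract_email_blocks_to_df` ends a block at the first `---` it meets, so a `---` inside an email body would cut that email short, and the `Subject:` line stays inside the `response` column. The sketch below is one possible variant, not the notebook's implementation. It only treats `---` as a delimiter when it sits alone on a line, and it splits the subject from the body. `parse_email_blocks`, `BLOCK_RE`, and `SUBJECT_RE` are hypothetical names, and `SAMPLE` is made-up text in the response format requested by the prompt.

```python
import re

# A delimiter is a line containing only '---' (optionally followed by spaces)
BLOCK_RE = re.compile(
    r"^---[ \t]*\n\s*Email ID:[ \t]*(E\d+)[ \t]*\n(.*?)(?=^---[ \t]*$|\Z)",
    re.DOTALL | re.MULTILINE,
)
SUBJECT_RE = re.compile(r"Subject:[ \t]*(.*)")


def parse_email_blocks(response_text):
    """Return one dict per email with 'email ID', 'subject' and 'response' keys."""
    rows = []
    for email_id, block in BLOCK_RE.findall(response_text):
        block = block.strip()
        subject_match = SUBJECT_RE.match(block)
        if subject_match:
            subject = subject_match.group(1).strip()
            body = block[subject_match.end():].strip()
        else:
            subject, body = "", block
        rows.append({"email ID": email_id, "subject": subject, "response": body})
    return rows


SAMPLE = (
    "---\n"
    "Email ID: E001\n"
    "Subject: Your Order is Shipped\n"
    "\n"
    "Hi there,\n"
    "Total: $21.00 --- thank you!\n"
    "---\n"
    "Email ID: E002  \n"
    "Subject: Your Order is Not Shipped\n"
    "\n"
    "Dear Customer,\n"
    "Sorry about that.\n"
    "---\n"
)

rows = parse_email_blocks(SAMPLE)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a separate subject column is wanted in the sheet.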


In [None]:
### ===== 19. Finally, respond to the few emails [confusing emails] that are 'order requests' but from which no product name could be extracted

grouped_by_email_confusing = confusing_order_df.groupby('Email_ID')

confusing_order_requests = []
for email, group in grouped_by_email_confusing:
    email_info = emails_df[emails_df['email_id']==email].to_dict('records')[0]
    order_info = {'email':email_info, 'confusing_item':[]}
    for index, row in group.iterrows():
        product = row['Product_ID']
        order_info['confusing_item'].append(product)
    confusing_order_requests.append(order_info)


confusing_emails_response_prompt = f"""
You are a customer service email assistant for an online retail store.

Your task is to write polite, friendly, and helpful emails to customers when their original inquiry is unclear or contains confusing product references.

You will be given a JSON object that contains a list of emails. Each email is in the following format:

{{
  "email_id": "E012",
  "subject": "Looking for a replacement bag",
  "message": "Hi, I bought a leather briefcase earlier and the strap broke. I'm looking for something similar but slightly smaller."
}}

You will also be provided with a list `confusing_item`, which flags items or parts of the message that were confusing or couldn’t be matched to products.

---

**Your task:**
For each email:
- Gently let the customer know that parts of their message were unclear or the product references could not be understood.
- Highlight the confusing part(s) using information from the `confusing_item` list.
- Be polite and appreciative of their interest.
- Encourage them to clarify their request so you can assist them better.
- Maintain a warm and professional tone.
- If the customer's name can be inferred from the message, use it; otherwise, use a friendly generic greeting like "Hi there".

---

Format your response as a list of email responses using the structure below:

---
Email ID: <email_id>
Subject: <>
<personalized email content>
---

**Output Format Example:**

---
Email ID: E012
Subject: Clarification needed about your recent product inquiry

Hi there,

Thank you for reaching out!

We reviewed your message and are eager to help, but we had some difficulty understanding a few details:

Original message:
"Hi, I bought a leather briefcase earlier and the strap broke. I'm looking for something similar but slightly smaller."

However, we were unsure about the following part(s):
- Confusing item: "something similar but slightly smaller" – it's unclear which specific product or style you're referring to.

Could you please clarify <something inferred from their message or subject>

We’re here and happy to help with recommendations as soon as we have more details!

Warm regards,
[Your Store Name]

---

Please format your responses using the same structure for each email provided.

Here is the info provided:
{json.dumps(confusing_order_requests)}
"""

email_response_confusing = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": confusing_emails_response_prompt}],
    temperature=0
)
print(email_response_confusing.choices[0].message.content)



---
Email ID: E017  
Subject: Clarification needed about your recent product inquiry  

Hi there,

Thank you for reaching out!

We reviewed your message and are eager to help, but we had some difficulty understanding a few details:

Original message:  
"Hi there I want to place an order for that popular item you sell. The one that's been selling like hotcakes lately. You know what I mean right?"

However, we were unsure about the following part(s):  
- Confusing item: "popular item" – it's unclear which specific product you're referring to.

Could you please clarify which product you have in mind? We’re here and happy to help with your order as soon as we have more details!

Warm regards,  
[Your Store Name]

---

Email ID: E022  
Subject: Clarification needed about your recent product inquiry  

Hi Monica,

Thank you for your enthusiasm and interest in our collection!

We reviewed your message and are eager to assist, but we had some difficulty understanding a few details:

Original m

In [None]:
### ===== 20. Split the response into individual email  AND combine with earlier email responses

confusing_emails_df = extract_email_blocks_to_df(email_response_confusing.choices[0].message.content)

display(confusing_emails_df)

# Combine these emails with previous ones
combined_df = pd.concat([order_request_emails_responses_df, confusing_emails_df], ignore_index=True)

display(combined_df)

## Google sheet 'order-response' updated
set_with_dataframe(order_response_sheet,combined_df)



Unnamed: 0,email ID,response
0,E017,Subject: Clarification needed about your recen...
1,E022,Subject: Clarification needed about your recen...


Unnamed: 0,email ID,response
0,E001,Subject: Your Order is Partially Shipped \n\n...
1,E002,Subject: Your Order is Shipped \n\nGood morni...
2,E004,"Subject: Your Order is Shipped \n\nHi,\n\nTha..."
3,E007,Subject: Your Order is Partially Shipped \n\n...
4,E008,"Subject: Your Order is Shipped \n\nHello,\n\n..."
5,E010,"Subject: Your Order is Shipped \n\nHello,\n\n..."
6,E013,"Subject: Your Order is Shipped \n\nHi Marco,\..."
7,E014,"Subject: Your Order is Shipped \n\nHi Johny,\..."
8,E018,Subject: Your Order is Not Shipped \n\nDear C...
9,E019,"Subject: Your Order is Shipped \n\nHey there,..."


In [None]:
print(f"GOOGLESHEET LINK --> https://docs.google.com/spreadsheets/d/{output_document.id}")


GOOGLESHEET LINK --> https://docs.google.com/spreadsheets/d/1fE04rak5mA5FCA4nryzB5cy2Unr6TdnQ5HlzanakFR4



# Task 3. Handle Product Inquiry

### Steps:

1. Create a **prompt** for the email query engine (based on `product-inquiry`) to extract useful information from emails in parts like info, intent to buy, product name, product ID, etc.  
   _[Section #21]_

2. Filter `product inquiries` that have a clear intent to buy using the product index-based query engine to identify and enrich with product IDs.  
   _[Section #22]_

3. Separate the `remaining inquiries` that are not clearly product inquiries but can be categorized later under multiple types.  
   _[Section #23]_

4. Categorize the `remaining inquiries` based on various criteria.  
   _[Section #24]_

5. Group all `inquiries`, combining them into a common list of dictionaries to be passed to the LLM as a prompt.  
   _[Section #25]_

6. Create a **prompt** that takes all inquiry data and generates email responses based on it. All email data is batched in one input to reduce API calls.  
   _[Section #26]_

7. Extract emails in the proper format for a DataFrame, and add them to the Google Sheet `product-inquiry`.  
   _[Section #27]_



In [None]:
### ===== 21. Prompt for the email query engine (with LLM): it takes emails of type 'product inquiry' and extracts product info, query, intent to buy, etc. from them

# Each email can produce multiple output lines: one for each product mentioned

product_inquiry_info_prompt = """**Task**: For each customer email, extract any product-related information mentioned. Each product may contain any combination of the following fields (all are optional):

**Product Info Fields**:
1. Product ID (format: 3 letters followed by 4 digits, e.g., ABC1234)
2. Product Name (e.g., "Leather Wallet", "Infinity Scarf")
3. Quantity (e.g., "3", "two", "a pair", etc.)
4. Description or other useful context (e.g., reason for purchase, season, usage)
5. Previously Bought: yes or no
6. Intent to buy  : yes or no
7. Gender : male | female | both (who can use this product)
8. Type : 'hat', 'glasses', 'coat', etc

**Note**: Previously Bought and Intent to buy can both be yes on a single line.

A single email may contain references to one or more products. Return **one line per product mentioned**, using the following format:

**Output Format**:
Email ID: <email_id> | Product Info: <structured or natural language info with any available fields>

**Examples**:
Email ID: E003 | Product Info: Product Name: Infinity Scarf, Quantity: 3, Description: Wife loves colorful scarves, Previously Bought: no, Intent to buy: yes, Gender : female, Type : scarf
Email ID: E005 | Product Info: Product Name: Retro Sunglasses, Previously Bought: yes, Intent to buy: yes, Gender : both, Type : sunglass
Email ID: E005 | Product Info: Description: Looking for sunglasses for beach vacation, Previously Bought:no, Intent to buy: yes, Gender : both, Type : sunglass

**Note**:
- If some fields are not mentioned, include only what is available.
- Interpret and normalize quantities where possible (e.g., "a couple of", "five" → 2, 5).
- Capture each product mention even if incomplete
"""

product_inquiry_all_lines =  email_query_engine_product_inquiry.query(product_inquiry_info_prompt)

In [None]:
print(product_inquiry_all_lines)


Email ID: E021 | Product Info: Product ID: SDE2345, Previously Bought: yes, Gender: both, Type: vintage item  
Email ID: E021 | Product Info: Product ID: DJN8901, Previously Bought: yes, Gender: both, Type: vintage item  
Email ID: E021 | Product Info: Product ID: RGD7654, Previously Bought: yes, Gender: both, Type: vintage item  
Email ID: E021 | Product Info: Product ID: CRD3210, Previously Bought: yes, Gender: both, Type: vintage item  
Email ID: E021 | Product Info: Intent to buy: yes, Type: hat, Description: Looking for winter hats  

Email ID: E012 | Product Info: Product Name: leather briefcase, Previously Bought: yes, Gender: male, Type: bag  
Email ID: E012 | Product Info: Product Name: messenger bag or briefcase, Intent to buy: yes, Description: Looking for a slightly smaller work bag  

Email ID: E016 | Product Info: Product Name: dress, Intent to buy: yes, Gender: female, Type: dress, Description: For a summer wedding, flattering and comfortable  

Email ID: E006 | Product 
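
Later sections work on these lines as plain strings. If structured access is needed, the lines can be parsed into dictionaries. The sketch below is one way to do it, assuming the LLM keeps to the field names listed in the prompt. It splits on the known field labels rather than on commas, because a description may itself contain commas. `FIELD_NAMES`, `FIELD_RE`, and `parse_inquiry_line` are hypothetical names, and the sample lines are copied from the output above.

```python
import re

# Field labels taken from the extraction prompt
FIELD_NAMES = [
    "Product ID", "Product Name", "Quantity", "Description",
    "Previously Bought", "Intent to buy", "Gender", "Type",
]
FIELD_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in FIELD_NAMES) + r")\s*:\s*",
    re.IGNORECASE,
)


def parse_inquiry_line(line):
    """Parse 'Email ID: X | Product Info: k: v, ...' into a dict; None if malformed."""
    if "Email ID:" not in line or "Product Info:" not in line:
        return None
    email_id = line.split("Email ID:")[1].split("|")[0].strip()
    info = line.split("Product Info:", 1)[1].strip()
    record = {"email_id": email_id}
    matches = list(FIELD_RE.finditer(info))
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label
        end = matches[i + 1].start() if i + 1 < len(matches) else len(info)
        record[match.group(1).lower()] = info[match.end():end].strip().rstrip(",").strip()
    return record


sample_lines = [
    "Email ID: E021 | Product Info: Product ID: SDE2345, Previously Bought: yes, Gender: both, Type: vintage item  ",
    "Email ID: E021 | Product Info: Intent to buy: yes, Type: hat, Description: Looking for winter hats  ",
    "Email ID: E016 | Product Info: Product Name: dress, Intent to buy: yes, Gender: female, Type: dress, Description: For a summer wedding, flattering and comfortable  ",
]
records = [parse_inquiry_line(line) for line in sample_lines]
with_intent = [r["email_id"] for r in records if r.get("intent to buy", "").lower() == "yes"]
print(records)
print(with_intent)
```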

In [None]:
### ===== 22. Keep only the lines that have Intent to buy: yes [these are product inquiries that may carry a product ID, so we can respond with proper product info]

product_query_engine = productIndex.as_query_engine(similarity_top_k=50)

product_inquiry_with_product_info = product_query_engine.query(f"""
**Task**: From the list of extracted email lines below, identify only the entries where **Intent to buy: yes** is mentioned.

For each matching Email line:
- Extract the **Product ID** or **Product Name** (whichever is available).
- Use this identifier to look up and return the corresponding product details from the index, which include description, season, price, etc.
- If both Product ID and Product Name are available, prefer **Product ID** for the lookup.
- If the Product ID is not available and only the Product Name is present, try to find the matching Product ID.
- Also keep the original Product Info text in the output, as shown in the output format.
- Format the response exactly as follows:

**Output Format**:
Email ID: <email_id> | Product ID: <product_id> | Product Details: <product_info_stored_in_index_nodes> | Product Info: <original_input>
Email ID: <email_id> | Product ID: <product_id> | Product Details: <product_info_stored_in_index_nodes> | Product Info: <original_input>
...

---

**Emails to process**:
 {product_inquiry_all_lines}

""")


In [None]:
print(product_inquiry_with_product_info)

Email ID: E021 | Product ID: CBY6789 | Product Details: name: Corduroy Bucket Hat, description: Keep it casual and cool with our corduroy bucket hat. This trendy hat features a soft corduroy construction and a relaxed, bucket silhouette. A must-have accessory for achieving a laidback, streetwear-inspired look., category: Accessories, seasons: Fall, Winter, price: 28.0, type: hat, gender: both | Product Info: Intent to buy: yes, Type: hat, Description: Looking for winter hats

Email ID: E012 | Product ID: LTH2109 | Product Details: name: Leather Messenger Bag, description: Carry your essentials in style with our leather messenger bag. Crafted from premium, full-grain leather, this bag features a spacious main compartment, multiple pockets, and an adjustable strap for a comfortable fit. A timeless choice for work, travel, or everyday use., category: Bags, seasons: All seasons, price: 37.99, type: messenger bag, gender: male | Product Info: Product Name: messenger bag or briefcase, Intent

In [None]:
### ===== 23. Find the remaining inquiries, which have no product ID match or no intent to purchase

  ## other inquiries = [ all inquiry lines - inquiry lines with product info ]
  ## This is a simple difference between the two sets of output lines above

def parse_email_info_pair(line: str):
    """Return (email_id, product_info) for a well-formed line, else None."""
    if "Email ID:" not in line or "Product Info:" not in line:
        return None
    return (
        line.split("Email ID:")[1].split("|")[0].strip(),
        line.split("Product Info:")[1].strip(),
    )

def extract_email_info_pairs(lines: str) -> set:
    pairs = (parse_email_info_pair(line) for line in lines.strip().splitlines())
    return {pair for pair in pairs if pair is not None}

# Get (email_id, product_info) from subset
info_pairs = extract_email_info_pairs(str(product_inquiry_with_product_info))

# Keep only the lines not in info_pairs [the remaining inquiries]
other_product_inquiries = [
    line.strip()
    for line in str(product_inquiry_all_lines).strip().splitlines()
    if (pair := parse_email_info_pair(line)) is not None and pair not in info_pairs
]

display(other_product_inquiries)



['Email ID: E021 | Product Info: Product ID: SDE2345, Previously Bought: yes, Gender: both, Type: vintage item',
 'Email ID: E021 | Product Info: Product ID: DJN8901, Previously Bought: yes, Gender: both, Type: vintage item',
 'Email ID: E021 | Product Info: Product ID: RGD7654, Previously Bought: yes, Gender: both, Type: vintage item',
 'Email ID: E021 | Product Info: Product ID: CRD3210, Previously Bought: yes, Gender: both, Type: vintage item',
 'Email ID: E012 | Product Info: Product Name: leather briefcase, Previously Bought: yes, Gender: male, Type: bag',
 'Email ID: E011 | Product Info: Product ID: RSG8901, Gender: both, Type: sunglasses, Description: Inspired by a nostalgic era',
 'Email ID: E009 | Product Info: Product ID: DHN0987, Gender: both, Type: hat, Description: Material inquiry, suitable for winter']
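
The chained `split(...)[1]` calls above raise an `IndexError` on any line that lacks an `Email ID:` or `Product Info:` field. A minimal sketch of a more tolerant variant follows, assuming every line uses the `Key: value | Key: value` layout shown in the outputs; `parse_inquiry_line`, `inquiry_key`, and `remaining_inquiries` are hypothetical helper names, not part of the notebook.

```python
def parse_inquiry_line(line):
    """Split 'Key: value | Key: value' into a dict (assumed line layout)."""
    fields = {}
    for segment in line.strip().split("|"):
        # Only the first colon separates key from value, so values such as
        # 'name: Hat, price: 5' keep their inner colons and commas.
        key, sep, value = segment.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def inquiry_key(line):
    """Identify an inquiry by its (email_id, product_info) pair."""
    fields = parse_inquiry_line(line)
    return fields.get("Email ID"), fields.get("Product Info")


def remaining_inquiries(all_lines, matched_lines):
    """Return the lines of all_lines whose key is absent from matched_lines."""
    matched = {inquiry_key(line) for line in matched_lines if line.strip()}
    return [
        line.strip()
        for line in all_lines
        if line.strip() and inquiry_key(line) not in matched
    ]
```

Lines without the expected fields get the key `(None, None)` and are therefore kept rather than crashing the cell; whether that is the desired behaviour depends on how clean the upstream LLM output is.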

In [None]:
### ===== 24. Prompt data JSON structure for inquiry responses [used in the next section]
#
# Direct product inquiries have only 1 category:
# 1. 'product_found' - The inquiry has a product ID that is also found in the database
#
# Other inquiries have 3 categories:
# 2. 'query' - If there is NO purchase intent, it's just a "query"
# 3. 'product_id_not_found' - If purchase intent exists and a product ID is given,
#     the ID wasn't found in our database (otherwise it would be in category 'product_found')
# 4. 'search_similar_products' - If purchase intent exists but there is no clear product identifier,
#     we try semantic search for suggestions

import re

def process_product_inquiry_data(lines):
    # Skip blank or whitespace-only lines so the splits below cannot fail on them
    data_list = [line for line in str(lines).split('\n') if line.strip()]
    return [{
        'email_id': item.split('Email ID:')[1].split('|')[0].strip(),
        'product_id': item.split('Product ID:')[1].split('|')[0].strip(),
        'product_details': item.split('Product Details:')[1].split('|')[0].strip(),
        'product_info': item.split('Product Info:')[1].strip(),
        'category': 'product_found'
    } for item in data_list]

def process_other_inquiry_data(lines):
    result = []

    for line in lines:
        # Extract email_id and product_info
        email_part, product_part = line.split('| Product Info:', 1)
        email_id = email_part.split('Email ID:')[1].strip()
        product_info = product_part.strip()

        # Initialize dictionary
        entry = {
            'email_id': email_id,
            'product_info': product_info,
            'category': None
        }

        # Check conditions
        has_intent = 'Intent to buy: yes' in product_info
        has_product_id = re.search(r'Product ID:\s*[A-Z0-9]+', product_info)
        has_product_name = 'Product Name:' in product_info
        has_description = 'Description:' in product_info

        if not has_intent:
            entry['category'] = 'query'
        elif has_product_id:
            entry['category'] = 'product_id_not_found'
        elif has_product_name or has_description:
            entry['category'] = 'search_similar_products'
        else:
            entry['category'] = 'query'  # Fallback: intent but nothing to search on

        result.append(entry)

    return result


## Complete list of email-related data
complete_list_of_data_for_emails = (
    process_other_inquiry_data(other_product_inquiries)
    + process_product_inquiry_data(product_inquiry_with_product_info)
)

### Add alternatives for inquiries in category 'search_similar_products'

for email in complete_list_of_data_for_emails:
    if email['category'] == 'search_similar_products':
        suggestions = search_for_product(email['product_info'], product_ret)
        email['suggestions'] = suggestions

display(complete_list_of_data_for_emails)

[{'email_id': 'E021',
  'product_info': 'Product ID: SDE2345, Previously Bought: yes, Gender: both, Type: vintage item',
  'category': 'query'},
 {'email_id': 'E021',
  'product_info': 'Product ID: DJN8901, Previously Bought: yes, Gender: both, Type: vintage item',
  'category': 'query'},
 {'email_id': 'E021',
  'product_info': 'Product ID: RGD7654, Previously Bought: yes, Gender: both, Type: vintage item',
  'category': 'query'},
 {'email_id': 'E021',
  'product_info': 'Product ID: CRD3210, Previously Bought: yes, Gender: both, Type: vintage item',
  'category': 'query'},
 {'email_id': 'E012',
  'product_info': 'Product Name: leather briefcase, Previously Bought: yes, Gender: male, Type: bag',
  'category': 'query'},
 {'email_id': 'E011',
  'product_info': 'Product ID: RSG8901, Gender: both, Type: sunglasses, Description: Inspired by a nostalgic era',
  'category': 'query'},
 {'email_id': 'E009',
  'product_info': 'Product ID: DHN0987, Gender: both, Type: hat, Description: Material in
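
Every "other" inquiry in the output above lands in `'query'` because none of them contains `Intent to buy: yes`. To make the decision rules easier to check in isolation, here is a small sketch that restates them as a pure function with a few hand-written inputs; `classify_inquiry` is a hypothetical name, and the sample strings are illustrative rather than taken from the dataset.

```python
import re

# Same pattern the notebook uses to detect an explicit product ID
PRODUCT_ID_PATTERN = re.compile(r"Product ID:\s*[A-Z0-9]+")


def classify_inquiry(product_info):
    """Map an extracted product_info string to one of the 'other inquiry' categories."""
    if "Intent to buy: yes" not in product_info:
        return "query"
    if PRODUCT_ID_PATTERN.search(product_info):
        # Intent plus an ID that was not matched against the catalog earlier
        return "product_id_not_found"
    if "Product Name:" in product_info or "Description:" in product_info:
        return "search_similar_products"
    return "query"  # intent, but nothing to search on


examples = [
    "Product ID: SDE2345, Previously Bought: yes, Type: vintage item",
    "Intent to buy: yes, Product ID: ABC1234",
    "Intent to buy: yes, Description: warm winter hat",
    "Intent to buy: yes, Type: hat",
]
for text in examples:
    print(classify_inquiry(text), "<-", text)
```

Because the rules rely on exact substrings such as `Intent to buy: yes`, any change in the wording of the upstream extraction prompt would silently push inquiries into `'query'`.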

In [None]:
### ===== 25. Grouping by email ID: combining all inquiries for an email under one dict object

from collections import defaultdict

grouped = defaultdict(list)
for item in complete_list_of_data_for_emails:
    email = item['email_id']
    # Create a copy without email_id
    inquiry = {k: v for k, v in item.items() if k != 'email_id'}
    grouped[email].append(inquiry)

# Convert to desired output format
grouped_list_of_data_for_emails = [{'email_id': email, 'inquiries': inquiries} for email, inquiries in grouped.items()]

for e in grouped_list_of_data_for_emails:
    email_info = emails_df[emails_df['email_id']==e['email_id']].to_dict('records')[0]
    e['email'] = email_info

display(grouped_list_of_data_for_emails)

[{'email_id': 'E021',
  'inquiries': [{'product_info': 'Product ID: SDE2345, Previously Bought: yes, Gender: both, Type: vintage item',
    'category': 'query'},
   {'product_info': 'Product ID: DJN8901, Previously Bought: yes, Gender: both, Type: vintage item',
    'category': 'query'},
   {'product_info': 'Product ID: RGD7654, Previously Bought: yes, Gender: both, Type: vintage item',
    'category': 'query'},
   {'product_info': 'Product ID: CRD3210, Previously Bought: yes, Gender: both, Type: vintage item',
    'category': 'query'},
   {'product_id': 'CBY6789',
    'product_details': 'name: Corduroy Bucket Hat, description: Keep it casual and cool with our corduroy bucket hat. This trendy hat features a soft corduroy construction and a relaxed, bucket silhouette. A must-have accessory for achieving a laidback, streetwear-inspired look., category: Accessories, seasons: Fall, Winter, price: 28.0, type: hat, gender: both',
    'product_info': 'Intent to buy: yes, Type: hat, Descriptio

In [None]:
### ===== 26. Product Inquiry prompt creation and passing to Query engine

import json  # needed for json.dumps in the prompt below

product_inquiry_email_prompt = f"""
You are a helpful and friendly email assistant for an online retail store. Your goal is to generate warm, informative, and relevant replies to customers based on their inquiries.

You will receive input in the form of a list of email threads, where each email ID is associated with one or more inquiries. Each inquiry may include:

- product_info: the customer's product-related message
- category: the type of inquiry
- suggestions: product suggestions for the customer (only for search_similar_products)
- Optional: product_id, product_details (if matched from our catalog)

You will also be given an email field for each email ID in this format:


  "email_id": "E012",
  "subject": "Looking for a replacement bag",
  "message": "Hi, I bought a leather briefcase earlier and the strap broke. I'm looking for something similar but slightly smaller."


---

Your task:

For each email_id, analyze the list of inquiries and generate one cohesive and human-like response based on the inquiry categories present. Use the guidance below for each category:

---

Response Guidelines by Category:

1. category: product_found
- The inquired product was successfully found.
- List the key details (product_id, name, price, seasons, etc.) as bullet points.
- Use the provided product_details to highlight the product's key features.
- Personalize the reply by incorporating the customer's original message and any preferences (e.g., size, use case).
- If multiple product_found entries exist, mention each one briefly and suggest choosing among them.

2. category: product_id_not_found
- We couldn't find a match for the provided product ID.
- Respond gently and ask the customer to double-check the product ID or provide more details.
- Reassure the customer you're happy to help once clarified.

3. category: query
- This is a general inquiry or a follow-up from a previous purchase.
- If it includes "Previously Bought: yes" and product IDs, thank the customer for their past purchases.
- Address the nature of the query (e.g., style, fit, material, vintage appeal) based on the message and product description.
- Be friendly and encourage the customer to check out our products or contact us again for more information.
- If the query is unrelated to retail purchases, give a witty but warm reply.

4. category: search_similar_products
- The customer is interested in finding similar items.
- Using the suggestions field, recommend products, listing product_id, name, price, seasons, and a short description for each.
- Be friendly and encourage the customer to ask about any other product in our catalog.

---

Example Output Format:

For each email, your response should look like:

---

Email ID: E012
Subject: Your Inquiry regarding......

Hi there [Or use customer name extracted from email],

Thank you for reaching out! We're really sorry to hear about the issue with your leather briefcase. Based on your message....

Leather Messenger Bag [LFT1234] – Crafted from premium full-grain leather, this stylish messenger bag features a roomy main compartment, handy pockets, and a comfy adjustable strap. It's slightly smaller than a briefcase and perfect for both work and daily errands.

Let us know if you'd like more options—we’re happy to help!

Warm regards,
Your Retail Store Team

---

Repeat this format for each email_id in the list.
Here is the JSON data:
{json.dumps(grouped_list_of_data_for_emails)}

"""

inquiry_response_emails = product_query_engine.query(product_inquiry_email_prompt)
print(inquiry_response_emails)



---

Email ID: E021  
Subject: Your Inquiry about Winter Hats

Hi there,

Thank you for reaching out and for being such a valued customer with your collection of vintage items from our store! We're thrilled to hear that they fit your style perfectly. Regarding your inquiry about winter hats, we have a great option for you:

- **Corduroy Bucket Hat**  
  - **Product ID**: CBY6789  
  - **Price**: $28.00  
  - **Seasons**: Fall, Winter  
  - **Description**: Keep it casual and cool with our corduroy bucket hat. This trendy hat features a soft corduroy construction and a relaxed, bucket silhouette. A must-have accessory for achieving a laidback, streetwear-inspired look.

Feel free to let us know if you have any more questions or need further assistance. We're here to help!

Warm regards,  
Your Retail Store Team

---

Email ID: E012  
Subject: Your Inquiry about a New Work Bag

Hi there,

Thank you for reaching out! We're really sorry to hear about the issue with your leather briefcase. 
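
The cell above sends every grouped email in a single prompt. With a limited token quota, one option is to split the list into smaller batches and query once per batch. The sketch below is a rough illustration under stated assumptions: `batch_by_size` is a hypothetical helper, and the character budget is only a crude stand-in for a real token count.

```python
import json


def batch_by_size(records, max_chars=6000):
    """Greedily group records so each batch's serialized size stays under max_chars.

    A single record larger than max_chars still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for record in records:
        # ensure_ascii=False keeps non-English emails readable and compact
        size = len(json.dumps(record, ensure_ascii=False))
        if current and current_size + size > max_chars:
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Illustrative data only; in the notebook this would be grouped_list_of_data_for_emails
sample = [{"email_id": f"E{i:03d}", "inquiries": ["x" * 50]} for i in range(5)]
for batch in batch_by_size(sample, max_chars=200):
    print([record["email_id"] for record in batch])
```

Each batch would then be serialized with `json.dumps` into its own copy of the prompt, and the per-batch responses concatenated before the extraction step.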

In [None]:
### ===== 27. Extracting emails from the response and adding them to the Google Sheet

inquiry_emails_df = extract_email_blocks_to_df(str(inquiry_response_emails))

display(inquiry_emails_df)

## Write the responses to the 'product-inquiry' Google Sheet
set_with_dataframe(inquiry_response_sheet, inquiry_emails_df)

   email ID  response
0  E021      Subject: Your Inquiry about Winter Hats\n\nHi ...
1  E012      Subject: Your Inquiry about a New Work Bag\n\n...
2  E011      Subject: Your Inquiry about Retro Sunglasses\n...
3  E009      Subject: Pregunta Sobre Gorro de Punto Grueso\...
4  E016      Subject: Your Inquiry about a Summer Wedding G...
5  E006      Subject: Your Inquiry about Chelsea Boots\n\nH...
6  E003      Subject: Your Inquiry about Bags for Work\n\nH...
7  E015      Subject: Your Inquiry about a Stylish and Prac...
8  E005      Subject: Your Inquiry about the Cozy Shawl\n\n...
9  E020      Subject: Your Inquiry about the Saddle Bag\n\n...
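
`extract_email_blocks_to_df` is defined elsewhere in the notebook and is not shown in this section. For readers who want to see the idea, here is a minimal sketch of how such a parser could work, assuming the `---`-separated layout of the response printed earlier; `split_email_blocks` is a hypothetical name, and it returns plain dicts rather than a DataFrame.

```python
import re


def split_email_blocks(response_text):
    """Split an LLM response into one {'email ID', 'response'} row per email block."""
    rows = []
    # Blocks are assumed to be separated by lines containing only '---'
    for block in re.split(r"^\s*---\s*$", response_text, flags=re.MULTILINE):
        match = re.search(r"Email ID:\s*(\S+)", block)
        if not match:
            continue  # skip empty fragments and any preamble text
        rows.append({
            "email ID": match.group(1),
            # Everything after the 'Email ID' marker, starting with the subject
            "response": block[match.end():].strip(),
        })
    return rows


sample = (
    "---\n\nEmail ID: E021  \nSubject: Hats\n\nHi there,\n\n"
    "---\n\nEmail ID: E012\nSubject: Bags\n\nHello\n"
)
for row in split_email_blocks(sample):
    print(row)
```

The rows could be passed to `pandas.DataFrame(rows)` to obtain a table like the one above. A reply whose body contains its own `---` line would be split incorrectly, so a stricter delimiter in the prompt may be worth considering.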


In [None]:
print(f"GOOGLESHEET LINK --> https://docs.google.com/spreadsheets/d/{output_document.id}")


GOOGLESHEET LINK --> https://docs.google.com/spreadsheets/d/1fE04rak5mA5FCA4nryzB5cy2Unr6TdnQ5HlzanakFR4
