# Solving Business Problems with AI

## Objective
Develop a proof-of-concept application to intelligently process email order requests and customer inquiries for a fashion store. The system should accurately categorize emails as either product inquiries or order requests and generate appropriate responses using the product catalog information and current stock status.

## Task Description

### Inputs

Google Spreadsheet **[Document](https://docs.google.com/spreadsheets/d/14fKHsblfqZfWj3iAaM2oA51TlYfQlFT4WKo52fVaQ9U)** containing:

- **Products**: List of products with fields including product ID, name, category, stock amount, detailed description, and season.

- **Emails**: Sequential list of emails with fields such as email ID, subject, and body.

### Instructions

- Implement all requirements using advanced Large Language Models (LLMs) to handle complex tasks, process extensive data, and generate accurate outputs effectively.
- Use Retrieval-Augmented Generation (RAG) and vector store techniques where applicable to retrieve relevant information and generate responses.
- You are provided with a temporary OpenAI API key granting access to GPT-4o, which has a token quota. Use it wisely or use your own key if preferred.
- Address the requirements in the order listed. Review them in advance to develop a general implementation plan before starting.
- Your deliverables should include:
   - Code developed within this notebook.
   - A single spreadsheet containing results, organized across separate sheets.
   - Comments detailing your thought process.
- You may use additional libraries (e.g., langchain) to streamline the solution. Use libraries appropriately to align with best practices for AI and LLM tools.
- Use the most suitable AI techniques for each task. Note that solving tasks with traditional programming methods will not earn points, as this assessment evaluates your knowledge of LLM tools and best practices.

### Requirements

#### 1. Classify emails
    
Classify each email as either a _**"product inquiry"**_ or an _**"order request"**_. Ensure that the classification accurately reflects the intent of the email.

**Output**: Populate the **email-classification** sheet with columns: email ID, category.

#### 2. Process order requests
1.   Process orders
  - For each order request, verify product availability in stock.
  - If the order can be fulfilled, create a new order line with the status “created”.
  - If the order cannot be fulfilled due to insufficient stock, create a line with the status “out of stock” and include the requested quantity.
  - Update stock levels after processing each order.
  - Record each product request from the email.
  - **Output**: Populate the **order-status** sheet with columns: email ID, product ID, quantity, status (**_"created"_**, **_"out of stock"_**).

2.   Generate responses
  - Create response emails based on the order processing results:
      - If the order is fully processed, inform the customer and provide product details.
      - If the order cannot be fulfilled or is only partially fulfilled, explain the situation, specify the out-of-stock items, and suggest alternatives or options (e.g., waiting for restock).
  - Ensure the email tone is professional and production-ready.
  - **Output**: Populate the **order-response** sheet with columns: email ID, response.

#### 3. Handle product inquiry

Customers may ask general open questions.
  - Respond to product inquiries using relevant information from the product catalog.
  - Ensure your solution scales to handle a full catalog of over 100,000 products without exceeding token limits. Avoid including the entire catalog in the prompt.
  - **Output**: Populate the **inquiry-response** sheet with columns: email ID, response.

## Evaluation Criteria
- **Advanced AI Techniques**: The system should use Retrieval-Augmented Generation (RAG) and vector store techniques to retrieve relevant information from data sources and use it to respond to customer inquiries.
- **Tone Adaptation**: The AI should adapt its tone appropriately based on the context of the customer's inquiry. Responses should be informative and enhance the customer experience.
- **Code Completeness**: All functionalities outlined in the requirements must be fully implemented and operational as described.
- **Code Quality and Clarity**: The code should be well-organized, with clear logic and a structured approach. It should be easy to understand and maintain.
- **Presence of Expected Outputs**: All specified outputs must be correctly generated and saved in the appropriate sheets of the output spreadsheet. Ensure the format of each output matches the requirements—do not add extra columns or sheets.
- **Accuracy of Outputs**: The accuracy of the generated outputs is crucial and will significantly impact the evaluation of your submission.

We look forward to seeing your solution and your approach to solving real-world problems with AI technologies.

# Prerequisites

### Configure OpenAI API Key.

In [3]:
# Install the packages used in this notebook.
%pip install openai
%pip install langchain
%pip install -U langchain-community
%pip install faiss-cpu
%pip install tiktoken
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# With langchain 0.3 these integrations are provided by the langchain-community package.
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS



**IMPORTANT: If you use the provided API key, you must also use the custom base URL shown in the example below. Otherwise requests will fail.**

In [4]:
# Code example of OpenAI communication

import openai
from openai import OpenAI
from langchain_community.embeddings import OpenAIEmbeddings

openai_api_key ='a0BIj000001cSt3MAE'
openai.api_key = openai_api_key
client = OpenAI(
    # In order to use provided API key, make sure that models you create point to this custom base URL.
    base_url='https://47v4us7kyypinfb5lcligtc3x40ygqbs.lambda-url.us-east-1.on.aws/v1/',
    # The temporary API key giving access to ChatGPT 4o model. Quotas apply: you have 500'000 input and 500'000 output tokens, use them wisely ;)
    api_key=openai_api_key
)

completion = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message)

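# Point the embeddings wrapper at the same custom base URL as the chat client.
embeddings_model = OpenAIEmbeddings(
    openai_api_key=openai_api_key,
    openai_api_base=str(client.base_url),
)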

ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)




In [5]:
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    # Print the response from this request, not the earlier `completion` object
    print("API key is working:", response.choices[0].message)
except Exception as e:
    print("Error:", e)

API key is working: ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)


In [6]:
# Code example of reading input data

import pandas as pd
from IPython.display import display

def read_data_frame(document_id, sheet_name):
    export_link = f"https://docs.google.com/spreadsheets/d/{document_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
    return pd.read_csv(export_link)

document_id = '14fKHsblfqZfWj3iAaM2oA51TlYfQlFT4WKo52fVaQ9U'
products_df = read_data_frame(document_id, 'products')
emails_df = read_data_frame(document_id, 'emails')

# Display first 3 rows of each DataFrame
display(products_df.head(3))
display(emails_df.head(3))

Unnamed: 0,product_id,name,category,description,stock,seasons,price
0,RSG8901,Retro Sunglasses,Accessories,Transport yourself back in time with our retro...,1,"Spring, Summer",26.99
1,SWL2345,Sleek Wallet,Accessories,Keep your essentials organized and secure with...,5,All seasons,30.0
2,VSC6789,Versatile Scarf,Accessories,Add a touch of versatility to your wardrobe wi...,6,"Spring, Fall",23.0


Unnamed: 0,email_id,subject,message
0,E001,Leather Wallets,"Hi there, I want to order all the remaining LT..."
1,E002,Buy Vibrant Tote with noise,"Good morning, I'm looking to buy the VBT2345 V..."
2,E003,Need your help,"Hello, I need a new bag to carry my laptop and..."
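
The export link above interpolates the sheet name directly into the URL, which works for names such as `products` and `emails`. A minimal sketch of a variant that percent-encodes the sheet name, in case a sheet name contains spaces or other reserved characters; `build_export_link` is a hypothetical helper, not part of the original notebook:

```python
from urllib.parse import quote


def build_export_link(document_id: str, sheet_name: str) -> str:
    """Build the gviz CSV export URL, percent-encoding the sheet name."""
    return (
        f"https://docs.google.com/spreadsheets/d/{document_id}"
        f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name)}"
    )


# "DOC_ID" and "order status" are placeholder values for illustration only.
print(build_export_link("DOC_ID", "order status"))
```

The resulting string can be passed to `pd.read_csv` exactly as in `read_data_frame`.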


# Task 1. Classify emails

In [7]:
from google.colab import files
uploaded = files.upload()

Saving ai-test-436218-bb896f943c5b.json to ai-test-436218-bb896f943c5b.json


In [8]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Define the scope for accessing Google Sheets
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']

# Load credentials from the JSON keyfile you downloaded
# Ensure the path is correct and the file exists
creds = ServiceAccountCredentials.from_json_keyfile_name('ai-test-436218-bb896f943c5b.json', scope)

# Authenticate the client
goog = gspread.authorize(creds)

# Open the Google Sheet by name
spreadsheet = goog.open('AI Test')


In [9]:
try:
    classification_sheet = spreadsheet.worksheet('email-classification')
except gspread.exceptions.WorksheetNotFound:
    classification_sheet = spreadsheet.add_worksheet(title='email-classification', rows="1000", cols="2")

# Reset the sheet and write the header row
classification_sheet.clear()  # Remove any rows left over from a previous run
classification_sheet.append_row(["email ID", "category"])

{'spreadsheetId': '1kjwdDzFTizffzqISjlvKkWX1Xbhl-6VipMLYjDjU_xo',
 'updates': {'spreadsheetId': '1kjwdDzFTizffzqISjlvKkWX1Xbhl-6VipMLYjDjU_xo',
  'updatedRange': "'email-classification'!A1:B1",
  'updatedRows': 1,
  'updatedColumns': 2,
  'updatedCells': 2}}

In [10]:
def classify_email_gpt(subject, message):
    prompt = f"""
    You are an AI that classifies emails. Here is an email:
    Subject: {subject}
    Body: {message}

    Classify this email as either a "product inquiry" or an "order request". Reply with only the classification.
    """

    # Sending a request to the GPT-4o API
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )


    # Get the classification result from the response
    # Use dot notation (.) to access the content attribute
    classification = response.choices[0].message.content.strip()
    return classification

# Classify every email in the dataset and write each result to the sheet
for index, email in emails_df.iterrows():
    subject = email['subject']
    message = email['message']
    email_id = email['email_id']

    # Classify the email
    category = classify_email_gpt(subject, message)

    # Print classification for review
    print(f"Email ID: {email_id}, Category: {category}")

    classification_sheet.append_row([email_id, category])

Email ID: E001, Category: order request
Email ID: E002, Category: product inquiry
Email ID: E003, Category: product inquiry
Email ID: E004, Category: order request
Email ID: E005, Category: product inquiry
Email ID: E006, Category: product inquiry
Email ID: E007, Category: order request
Email ID: E008, Category: order request
Email ID: E009, Category: product inquiry
Email ID: E010, Category: order request
Email ID: E011, Category: product inquiry
Email ID: E012, Category: product inquiry
Email ID: E013, Category: product inquiry
Email ID: E014, Category: order request
Email ID: E015, Category: Product inquiry
Email ID: E016, Category: product inquiry
Email ID: E017, Category: order request
Email ID: E018, Category: order request
Email ID: E019, Category: product inquiry
Email ID: E020, Category: product inquiry
Email ID: E021, Category: product inquiry
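
In the run above, E015 came back as `Product inquiry` while every other label is lowercase, so the sheet ends up with two spellings of the same category. A minimal sketch of a normalization step that could be applied to the model's reply before it is written to the sheet; `normalize_category` and `ALLOWED_CATEGORIES` are hypothetical names introduced here for illustration:

```python
ALLOWED_CATEGORIES = ("product inquiry", "order request")


def normalize_category(raw: str) -> str:
    """Lowercase the model's reply and strip surrounding quotes or periods."""
    cleaned = raw.strip().strip("\"'.").lower()
    if cleaned not in ALLOWED_CATEGORIES:
        # Fail loudly rather than writing an unknown label to the sheet.
        raise ValueError(f"Unexpected category from model: {raw!r}")
    return cleaned


print(normalize_category("Product inquiry"))
```

Raising on unknown labels is one possible design choice; retrying the request or flagging the email for manual review would also work.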


# Task 2. Process order requests
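
Requirement 2 asks for an availability check per requested product, a status of "created" or "out of stock", and a stock update after each order. A minimal sketch of only that bookkeeping step, assuming the product IDs and quantities have already been extracted from each email (for example by an LLM call, not shown here) and that each line is fulfilled all-or-nothing. `process_order_lines` and the email IDs below are hypothetical:

```python
from typing import Dict, List, Tuple

OrderRequest = Tuple[str, str, int]    # (email ID, product ID, quantity)
OrderRow = Tuple[str, str, int, str]   # (email ID, product ID, quantity, status)


def process_order_lines(stock: Dict[str, int], requests: List[OrderRequest]) -> List[OrderRow]:
    """Process requests in order, decrementing `stock` in place when a line is created."""
    rows: List[OrderRow] = []
    for email_id, product_id, quantity in requests:
        available = stock.get(product_id, 0)  # unknown products count as zero stock
        if quantity <= available:
            stock[product_id] = available - quantity
            status = "created"
        else:
            status = "out of stock"
        rows.append((email_id, product_id, quantity, status))
    return rows


# Stock values taken from the first rows of the products sheet; emails are made up.
demo_stock = {"RSG8901": 1, "SWL2345": 5}
demo_rows = process_order_lines(
    demo_stock,
    [("E100", "SWL2345", 3), ("E101", "SWL2345", 3), ("E102", "RSG8901", 1)],
)
for row in demo_rows:
    print(row)
```

Because emails are processed sequentially, the second request for `SWL2345` sees the stock already reduced by the first one.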

In [11]:


# Initialize the embeddings model. Note that this key literal carries an `sk-` prefix,
# unlike `openai_api_key` defined earlier.
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002", openai_api_key='sk-a0BIj000001cSt3MAE')



In [12]:
# Prepare product data for vectorization
product_texts = products_df.apply(
    lambda row: f"Product ID: {row['product_id']}, Name: {row['name']}, Category: {row['category']}, Description: {row['description']}", axis=1).to_list()



In [16]:
try:
    product_vectors = client.embeddings.create(
        model="text-embedding-ada-002",  # Replace with your desired model
        input=product_texts
    )
    print("Embeddings computed successfully.")
except Exception as e:
    print("Error:", e)

Embeddings computed successfully.
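
The cell above sends all product texts in one request, which is fine for the 99 products here. For the 100,000-product catalog mentioned in the requirements, a single request is unlikely to be accepted, so the texts would need to be embedded in batches. A minimal sketch, assuming a batch size of 256 and a hypothetical `embed_fn` callable that wraps `client.embeddings.create` and returns one vector per input text:

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    batch_size: int = 256,  # assumed value, tune to the API's limits
) -> List[Vector]:
    """Embed `texts` in fixed-size batches, preserving the input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))
    return vectors


# Stand-in embedder for illustration: one number per text (its length).
def fake_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
```

With the real client, `embed_fn` could be something like `lambda batch: [d.embedding for d in client.embeddings.create(model="text-embedding-ada-002", input=batch).data]`.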


In [19]:
# Extract embeddings from the response object
product_embeddings = [embedding.embedding for embedding in product_vectors.data]

# Convert the list of embeddings into a numpy array
product_vectors_np = np.array(product_embeddings).astype('float32')

print(f"Converted {len(product_embeddings)} product embeddings to a numpy array.")

Converted 99 product embeddings to a numpy array.


In [21]:
import faiss

# Get the dimension of the embeddings (length of each vector)
dimension = product_vectors_np.shape[1]  # Assuming embeddings are 2D (num_products, embedding_dimension)

# Create a FAISS index using the L2 (Euclidean) distance metric
index = faiss.IndexFlatL2(dimension)

# Add the embeddings to the FAISS index
index.add(product_vectors_np)

print(f"Added {index.ntotal} products to the FAISS index.")

Added 99 products to the FAISS index.


In [22]:
def retrieve_similar_products(query_text, k=5):
    # Step 1: Convert the query text into an embedding using the same model
    query_embedding = client.embeddings.create(
        model="text-embedding-ada-002",
        input=[query_text]
    )

    # Extract the embedding from the response
    query_vector = np.array([query_embedding.data[0].embedding]).astype('float32')

    # Step 2: Perform FAISS search (find the k nearest neighbors)
    distances, indices = index.search(query_vector, k)

    # Step 3: Retrieve the product IDs and distances
    similar_products = products_df.iloc[indices[0]]['product_id'].tolist()
    return similar_products, distances[0].tolist()

# Example query
query = "Looking for a product similar to Product 1 in Category 1"
similar_products, distances = retrieve_similar_products(query, k=3)

print(f"Similar products to '{query}': {similar_products} with distances {distances}")

Similar products to 'Looking for a product similar to Product 1 in Category 1': ['TLR5432', 'SKR3210', 'CPL0123'] with distances [0.462467759847641, 0.4657413959503174, 0.4677693545818329]
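
The distances printed above are easier to interpret as cosine similarities. FAISS documents `IndexFlatL2` as returning squared Euclidean distances, and OpenAI describes its embeddings as normalized to unit length. If both hold here, then for unit vectors the squared distance equals 2 - 2·cos, so the similarity can be recovered directly. A minimal sketch under those assumptions; `cosine_from_squared_l2` is a hypothetical helper:

```python
import math
from typing import Sequence


def cosine_from_squared_l2(squared_distance: float) -> float:
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b)."""
    return 1.0 - squared_distance / 2.0


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Check the identity on two small unit vectors.
a, b = (1.0, 0.0), (0.6, 0.8)
print(cosine(a, b), cosine_from_squared_l2(squared_l2(a, b)))
```

Under the same assumptions, a reported distance of about 0.46 would correspond to a cosine similarity of about 0.77.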


# Task 3. Handle product inquiry
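
Requirement 3 asks that the full catalog never be placed in the prompt. One way to respect that is to pass only the products returned by the vector search, capped by a size budget. A minimal sketch, assuming the retrieved products are available as dictionaries with the catalog's column names and using a character budget as a rough stand-in for a token budget. `build_inquiry_prompt`, `max_chars`, and the prompt wording are illustrative assumptions:

```python
from typing import Dict, List


def build_inquiry_prompt(question: str, products: List[Dict[str, str]], max_chars: int = 4000) -> str:
    """Format only the retrieved products into the prompt, stopping at the budget."""
    lines: List[str] = []
    used = 0
    for product in products:
        line = (
            f"- {product['product_id']} | {product['name']} | "
            f"{product['category']} | {product['description']}"
        )
        if used + len(line) > max_chars:
            break  # drop lower-ranked products once the budget is spent
        lines.append(line)
        used += len(line)
    context = "\n".join(lines)
    return (
        "Answer the customer's question using only the catalog entries below.\n"
        f"Catalog entries:\n{context}\n\n"
        f"Customer email:\n{question}"
    )


# Made-up products for illustration only.
demo_products = [
    {"product_id": "AAA111", "name": "Red Scarf", "category": "Accessories", "description": "Warm wool scarf."},
    {"product_id": "BBB222", "name": "Blue Hat", "category": "Accessories", "description": "Soft knitted hat."},
]
print(build_inquiry_prompt("Do you have a warm scarf?", demo_products))
```

The returned string can be sent as the user message in a `client.chat.completions.create` call like the ones used earlier in the notebook.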