# **Product Recommender System: OpenAI Text Embedding**

A renowned online shopping platform named GlimmerGate needs help improving its product recommendation system. **The aim is to provide personalized recommendations to users based on their recent product views.**

They have provided a dataset of about 2,000 products, with the following information for each:
- title
- description
- ID

They have also supplied a list of products recently viewed by a user.

**They want you to develop a Python prototype that uses OpenAI's text embedding models to recommend products the user has not yet viewed, based on the products they viewed recently.**

### Task 1 - Set up project environment

Install the needed modules

In [1]:
!pip install openai==1.16.2 python-dotenv

Collecting openai==1.16.2
  Downloading openai-1.16.2-py3-none-any.whl (267 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Collecting httpx<1,>=0.23.0 (from openai==1.16.2)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai==1.16.2)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai==1.16.2)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)

Import the needed modules and set up the OpenAI API client

In [2]:
import pandas as pd
import numpy as np
import os
from openai import OpenAI
from dotenv import load_dotenv
from matplotlib import pyplot as plt
import plotly.express as px

from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# Load the API key (and, optionally, the organization ID) from a dotenv file
load_dotenv(dotenv_path='apikey.env.txt')

# Retrieve the API key from the environment; the organization ID is optional and left disabled here
APIKEY = os.getenv("APIKEY")
#ORGID = os.getenv("ORGID")

# Create the OpenAI client with the API key
client = OpenAI(
  #organization= ORGID,
  api_key=APIKEY
)

client

<openai.OpenAI at 0x7e8e8549bd60>
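`os.getenv` returns `None` when a variable is missing, so a typo in the dotenv file may only surface later as a less obvious authentication error. A minimal sketch of a fail-fast check, assuming the key is stored under the name `APIKEY` as in the cell above; the helper `require_env` is hypothetical and not part of any library.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "check the dotenv file passed to load_dotenv."
        )
    return value
```

With this helper, the key could be read as `APIKEY = require_env("APIKEY")` right after `load_dotenv(...)`.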

Load the dataset

In [4]:
data = pd.read_csv('products_dataset.csv')
data.head()

Unnamed: 0,product_id,title,description
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat..."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...


List of last 8 products recently viewed by the user.

In [5]:
searched_products_id = [
    'P1938',
    'P1970',
    'P1044',
    'P1838',
    'P1048',
    'P1017',
    'P1310',
    'P1444',
]

### Task 2 - Prepare the dataset

Let's label the data points that were recently viewed.

In [11]:
data['product_status'] = 'not_viewed'

# Label every product whose ID appears in 'searched_products_id' as 'recently viewed'
data.loc[data.product_id.isin(searched_products_id), 'product_status'] = 'recently viewed'
data[data.product_status == 'recently viewed']

Unnamed: 0,product_id,title,description,product_status
1017,P1017,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,Love your space like never before with the hig...,recently viewed
1044,P1044,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,Introducing the best of BEHR Paint. Featuring ...,recently viewed
1048,P1048,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed
1310,P1310,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed
1444,P1444,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed
1838,P1838,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed
1938,P1938,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,Introducing the best of BEHR Paint. Featuring ...,recently viewed
1970,P1970,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,Introducing the best of BEHR Paint. Featuring ...,recently viewed


Now let's combine the product `title` and `description` and store it into a column called `combined`.

In [12]:
# Concatenate title and description into a single text field to embed
data['combined'] = data.title + data.description
data

Unnamed: 0,product_id,title,description,product_status,combined
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster BedPolyester cover rese...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...
...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...
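Note that `data.title + data.description` joins the two fields with no separator, which is why row 2 starts with `Large Tapestry Bolster BedPolyester cover...`. A minimal sketch of a variant that inserts a separator and tolerates missing values; the helper `combine_fields` and the separator `". "` are assumptions for illustration, and the description string is shortened.

```python
def combine_fields(title, description, sep=". "):
    """Join title and description with a separator, skipping missing values."""
    parts = []
    for value in (title, description):
        # pandas represents missing text as NaN (a float), so keep strings only
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return sep.join(parts)

print(combine_fields("Large Tapestry Bolster Bed", "Polyester cover"))
# Large Tapestry Bolster Bed. Polyester cover
```

Applied to the DataFrame, this could look like `data['combined'] = [combine_fields(t, d) for t, d in zip(data.title, data.description)]`. Changing the combined text changes the embeddings, so the outputs shown in the rest of this notebook would no longer match exactly.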


### Task 3 - Text embedding and visualization


Creating the text embedding vectors

In [14]:
response = client.embeddings.create(
    input = data.combined.to_list(), #List of text we want to use
    model = 'text-embedding-3-small', #Model for embedding from OpenAI
    dimensions = 512 #shorten the embedding from the model's default of 1536 dimensions to 512
)

vectors = [d.embedding for d in response.data]

# Add a new column
data['text_embeddings'] = vectors
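The cell above sends the whole catalog in a single request, which worked for this dataset. Embedding endpoints generally limit how many inputs and tokens one request may carry, so a larger catalog would likely need batching. The following is a sketch under that assumption: `embed_in_batches`, its `embed_fn` parameter, and the batch size of 500 are hypothetical, and the stand-in embedder only exists so the example runs without an API key.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 500,
) -> List[List[float]]:
    """Embed texts in fixed-size batches and return vectors in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_fn(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors")
        vectors.extend(batch_vectors)
    return vectors

# Stand-in embedder for demonstration only: maps each text to [len(text)]
def fake_embed(batch):
    return [[float(len(text))] for text in batch]

print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
# [[1.0], [2.0], [3.0]]
```

With the notebook's client, `embed_fn` could wrap the same call used above, for example a function that returns `[d.embedding for d in client.embeddings.create(input=batch, model='text-embedding-3-small', dimensions=512).data]`.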

In [15]:
data

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.03744583949446678, 0.03042474389076233, -0...."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.03523961082100868, 0.013278326019644737, 0...."
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster BedPolyester cover rese...,"[0.035860564559698105, -0.05905349925160408, 0..."
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.05834035575389862, -0.007969953119754791, ..."
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.01998496614396572, 0.05075598508119583, -0...."
...,...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...,"[0.08823681622743607, -0.05279356613755226, -0..."
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,"[0.010978340171277523, -0.04043574631214142, 0..."
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"[-0.034120045602321625, -0.009548034518957138,..."
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,"[-0.010861445218324661, -0.014231621287763119,..."


> We know that each vector has 512 dimensions. In order to be able to visualize the vectors in a scatter plot, we need to use Principal Component Analysis (PCA) to reduce the dimension from 512 to 2.

In [17]:
pca = PCA(2)
vector_2d = pca.fit_transform(data.text_embeddings.to_list())
data['pc1'] = vector_2d[:, 0] #first column
data['pc2'] = vector_2d[:, 1] #second column

data

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.03744583949446678, 0.03042474389076233, -0....",-0.013349,-0.071634
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.03523961082100868, 0.013278326019644737, 0....",-0.357218,-0.236111
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster BedPolyester cover rese...,"[0.035860564559698105, -0.05905349925160408, 0...",-0.201098,0.206637
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.05834035575389862, -0.007969953119754791, ...",-0.181798,-0.043370
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.01998496614396572, 0.05075598508119583, -0....",-0.214504,-0.141164
...,...,...,...,...,...,...,...,...
1995,P1995,Dotty Black and White Black and White Wallpape...,"With a stylish monochrome look, this dotty wal...",not_viewed,Dotty Black and White Black and White Wallpape...,"[0.08823681622743607, -0.05279356613755226, -0...",-0.042122,0.196937
1996,P1996,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,The Abrielle collection features a stunning as...,not_viewed,Abrielle Brown/Light Gray 8 ft. x 10 ft. Orien...,"[0.010978340171277523, -0.04043574631214142, 0...",-0.246727,0.481593
1997,P1997,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"With Fypon balustrade systems, you can transfo...",not_viewed,20 in. x 2-1/2 in. x 2-1/2 in. Polyurethane As...,"[-0.034120045602321625, -0.009548034518957138,...",-0.082867,-0.103888
1998,P1998,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,BEHR PREMIUM PLUS Exterior Paint & Primer is a...,not_viewed,1 gal. #P120-6 Diva Glam Flat Exterior Paint &...,"[-0.010861445218324661, -0.014231621287763119,...",0.506965,-0.005666
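Two components can only capture part of the structure in 512 dimensions, so it helps to check how much variance the 2D projection retains before reading too much into the plot. A fitted scikit-learn `PCA` object reports this as `pca.explained_variance_ratio_`. The sketch below computes the same quantity by hand with NumPy on synthetic data; the function name and the random data are illustrative assumptions, not values from this dataset.

```python
import numpy as np

def explained_variance_ratio(vectors, n_components=2):
    """Share of total variance captured by the leading principal components."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data; their squares are proportional
    # to the variance along each principal direction
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()

rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 16))  # synthetic stand-in for the embeddings
ratios = explained_variance_ratio(demo)
print(ratios, ratios.sum())
```

A low total means that points that look close in the scatter plot may still be far apart in the full embedding space, which is why the similarity search later uses the original vectors rather than `pc1` and `pc2`.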


Now that we have the text embedding vectors in two dimensions, we can use them to create a 2D plot.

In [18]:
px.scatter(data, x = 'pc1', y= 'pc2', color = 'product_status')

In [38]:
# Zoomed visualization
px.scatter(data, x = 'pc1', y= 'pc2', color = 'product_status')
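The two cells above run the same code, so the zoomed view was presumably produced by zooming interactively in the figure. A sketch of how the zoom could be made reproducible: compute padded axis ranges around the recently viewed points and pass them to `px.scatter` through its `range_x` and `range_y` arguments. The helper `padded_range`, the padding of 0.05, and the sample coordinates are illustrative assumptions.

```python
def padded_range(values, pad=0.05):
    """Return [min - pad, max + pad] for a sequence of coordinates."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return [min(values) - pad, max(values) + pad]

# Illustrative pc1 coordinates for a handful of points
pc1_sample = [0.46, 0.47, 0.48]
print(padded_range(pc1_sample))

# Possible use with the notebook's DataFrame (not executed here):
# viewed = data[data.product_status == 'recently viewed']
# px.scatter(data, x='pc1', y='pc2', color='product_status',
#            range_x=padded_range(viewed.pc1),
#            range_y=padded_range(viewed.pc2))
```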

### Task 4 - Find similar products

In [29]:
data.head()

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.03744583949446678, 0.03042474389076233, -0....",-0.013349,-0.071634
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.03523961082100868, 0.013278326019644737, 0....",-0.357218,-0.236111
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster BedPolyester cover rese...,"[0.035860564559698105, -0.05905349925160408, 0...",-0.201098,0.206637
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.05834035575389862, -0.007969953119754791, ...",-0.181798,-0.04337
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[0.01998496614396572, 0.05075598508119583, -0....",-0.214504,-0.141164


Get the data related to `recently viewed` and `not_viewed` products

In [30]:
df_recently_viewed = data[data.product_status == 'recently viewed']
df_not_viewed = data[data.product_status == 'not_viewed']

df_recently_viewed

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
1017,P1017,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,Love your space like never before with the hig...,recently viewed,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,"[0.05400046706199646, -0.026193415746092796, 0...",0.469569,0.056976
1044,P1044,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,Introducing the best of BEHR Paint. Featuring ...,recently viewed,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,"[0.028097040951251984, -0.02537091076374054, 0...",0.465344,0.047156
1048,P1048,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,"[0.0015978449955582619, -0.027278482913970947,...",0.465323,0.034618
1310,P1310,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,"[0.0022077560424804688, -0.008812451735138893,...",0.473427,0.052158
1444,P1444,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,"[0.05109540745615959, -0.016649870201945305, 0...",0.472328,0.056748
1838,P1838,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently viewed,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,"[0.006433435715734959, -0.01584387570619583, 0...",0.466746,0.056686
1938,P1938,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,Introducing the best of BEHR Paint. Featuring ...,recently viewed,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,"[0.003495427081361413, -0.060138072818517685, ...",0.464684,0.053802
1970,P1970,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,Introducing the best of BEHR Paint. Featuring ...,recently viewed,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,"[0.030109742656350136, -0.0616336390376091, 0....",0.464459,0.049911


Convert the embedding vectors to NumPy arrays

In [31]:
vectors_recently_viewed = [np.array(vector) for vector in df_recently_viewed.text_embeddings]
vectors_not_viewed = [np.array(vector) for vector in df_not_viewed.text_embeddings]

vectors_recently_viewed

[array([ 0.05400047, -0.02619342,  0.05523884,  0.02583691, -0.057603  ,
        -0.02634352,  0.04300524,  0.02272223, -0.06976154,  0.04837151,
         0.09208974,  0.00064205, -0.00980377,  0.01416621,  0.00778203,
         0.07002423,  0.03748886, -0.00290829,  0.0073927 ,  0.04593229,
         0.02540536,  0.00571339,  0.00096161,  0.03144711,  0.04304276,
         0.02199046, -0.00903917, -0.06458291, -0.01081698, -0.02199046,
        -0.0006356 , -0.03454304, -0.00992573, -0.05129857,  0.00931123,
        -0.07355171,  0.05681495, -0.07171292, -0.00760378, -0.02334141,
        -0.02191541,  0.05054804,  0.05066062, -0.03056524,  0.03161598,
         0.06233132, -0.0361942 , -0.05719021,  0.04867172,  0.05602689,
        -0.02032054,  0.00511766, -0.02407318,  0.00157611,  0.04833398,
        -0.03570635, -0.06233132, -0.00135799,  0.03388632,  0.01063873,
         0.05880385, -0.01267454, -0.02825737,  0.0273755 , -0.07388945,
        -0.05899148, -0.03122195,  0.03525604,  0.0

Find the similarity between each viewed product and all the unviewed products.

In [34]:
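# Rows correspond to recently viewed products, columns to unviewed products
similarity_matrix = cosine_similarity(vectors_recently_viewed, vectors_not_viewed)

# For each viewed product, keep the position of its most similar unviewed product
top_ids = []
for row in similarity_matrix:
    top_id = np.argmax(row)
    top_ids.append(top_id)

top_ids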

[854, 1058, 1700, 733, 1323, 1700, 1056, 314]
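To make the step above concrete, here is a small NumPy sketch of what a pairwise cosine similarity computes: each row is scaled to unit length, and the dot products between the two sets of rows give the similarity matrix. The function name and the toy vectors are assumptions for illustration, and the sketch assumes no all-zero vectors.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Scale every row to unit length (assumes no all-zero rows)
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T

viewed = [[1.0, 0.0], [1.0, 1.0]]
unviewed = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
sims = cosine_similarity_matrix(viewed, unviewed)
print(sims.shape)           # (2, 3)
print(sims.argmax(axis=1))  # [0 2]
```

Each row of the result belongs to one viewed item, and `argmax` along that row gives the position of its closest unviewed item, which is what the loop above collects in `top_ids`.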

In [35]:
# top_ids are positions within df_not_viewed (not original row labels), so use iloc to map them to product IDs
most_similar = list(df_not_viewed.iloc[top_ids].product_id)
most_similar

['P854', 'P1061', 'P1705', 'P733', 'P1327', 'P1705', 'P1059', 'P314']
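Because each viewed product picks its own best match, the same product can be chosen more than once (`P1705` appears twice above), so eight viewed products yield only seven distinct recommendations. One possible alternative, sketched below, ranks every unviewed product by its best similarity to any viewed product and keeps the top `n` distinct positions. This is a different ranking rule from the one used in this notebook; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def top_n_unviewed(similarity_matrix, n=5):
    """Positions of the n unviewed items with the highest best-match score."""
    sims = np.asarray(similarity_matrix, dtype=float)
    # Best similarity of each unviewed item to any recently viewed item
    best_per_item = sims.max(axis=0)
    # Sort descending; a stable sort keeps the lower position first on ties
    order = np.argsort(-best_per_item, kind="stable")
    return order[:n].tolist()

toy_sims = np.array([
    [0.9, 0.2, 0.8, 0.1],
    [0.7, 0.3, 0.95, 0.1],
])
print(top_n_unviewed(toy_sims, n=3))  # [2, 0, 1]
```

With the notebook's variables, the product IDs could then be read with `df_not_viewed.iloc[top_n_unviewed(similarity_matrix, n=8)].product_id`.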

### Task 5 - Recommend products based on the searched products

Let's update the status of the top similar products to `recommended`.

In [36]:
data.loc[data.product_id.isin(most_similar), 'product_status'] = 'recommended'

# Let's see the recommended products
data[data.product_status == 'recommended']

Unnamed: 0,product_id,title,description,product_status,combined,text_embeddings,pc1,pc2
314,P314,8 oz. #230F-7 Florence Brown Semi-Gloss Enamel...,Introducing the best of BEHR Paint. Featuring ...,recommended,8 oz. #230F-7 Florence Brown Semi-Gloss Enamel...,"[-0.003975710831582546, -0.0579850934445858, 0...",0.486859,0.06001
733,P733,5 gal. #N440-1 Streetwise Extra Durable Semi-G...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #N440-1 Streetwise Extra Durable Semi-G...,"[0.010829787701368332, -0.01894751563668251, 0...",0.468495,0.00942
854,P854,1 qt. #N460-1 Evening White Satin Enamel Inter...,Love your space like never before with the hig...,recommended,1 qt. #N460-1 Evening White Satin Enamel Inter...,"[0.03936273232102394, -0.017813587561249733, 0...",0.493101,0.05217
1059,P1059,1 gal. Home Decorators Collection #HDC-SP14-6 ...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 gal. Home Decorators Collection #HDC-SP14-6 ...,"[-0.0021748889703303576, -0.05987035483121872,...",0.448105,0.054046
1061,P1061,1 gal. #MQ1-28 Orange Flambe One-Coat Hide Egg...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 gal. #MQ1-28 Orange Flambe One-Coat Hide Egg...,"[0.014321798458695412, -0.024152863770723343, ...",0.491518,0.066628
1327,P1327,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,"[0.04926867410540581, -0.019617563113570213, 0...",0.473394,0.065309
1705,P1705,5 gal. #310D-4 Gold Buff Extra Durable Satin E...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #310D-4 Gold Buff Extra Durable Satin E...,"[-0.002173374406993389, -0.013826336711645126,...",0.467005,0.037501


Let's visualize the recommended products.

In [39]:
px.scatter(data, x = 'pc1', y= 'pc2', color = 'product_status', hover_data = 'title')

- We started from a list of recently viewed products and searched the rest of the catalog for the most similar product to each of them.

- In this visualization, the **green dots** are the **recently viewed products** and the **red dots** are the **recommended products**.

- We can also hover over the points to check whether the titles in the two categories are comparable.

**By embedding the text of the viewed products and comparing it with the embeddings of the products the user has not viewed, this system generates recommendations that align with the user's recent interests.**

This prototype could improve the platform's user experience by surfacing relevant product suggestions on the GlimmerGate website's main page, which in turn may help customer satisfaction and retention.