# **Welcome to the notebook**

### Task 1 - Set up project environment

Installing the needed modules

In [1]:
!pip install -q -U google-generativeai python-dotenv

Importing the needed modules and setting up the Google Generative AI (Gemini) API

In [22]:
import pandas as pd
import numpy as np
import os
from dotenv import load_dotenv
from matplotlib import pyplot as plt
import plotly.express as px
import google.generativeai as genai
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# Load the API key from a dotenv file
load_dotenv(dotenv_path='apikey.env.txt')

# Retrieve the API key from the environment variables
APIKEY = os.getenv("APIKEY")

# Set the API key for Google's Generative AI client library
genai.configure(api_key=APIKEY)

client = genai.GenerativeModel('gemini-1.5-flash')

client

genai.GenerativeModel(
    model_name='models/gemini-1.5-flash',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
    cached_content=None
)
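If the dotenv file is missing or the variable name is misspelled, `os.getenv` returns `None` and the failure only shows up later, at the first API call. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of `python-dotenv` or the Gemini SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of any library used in this notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the dotenv file."
        )
    return value


# Possible usage in this notebook (after load_dotenv has run):
# APIKEY = require_env("APIKEY")
```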

Import our dataset

In [23]:
df = pd.read_csv('/content/products_dataset.csv')
df.head()

Unnamed: 0,product_id,title,description
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat..."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...


The eight products most recently viewed by the user:

In [24]:
searched_products_id = [
    'P1938',
    'P1970',
    'P1044',
    'P1838',
    'P1048',
    'P1017',
    'P1310',
    'P1444',
]

### Task 2 - Prepare the dataset

Let's label the data points that were recently viewed.

In [25]:
df['product_status'] = 'not_viewed'
df.loc[df['product_id'].isin(searched_products_id), 'product_status'] = 'recently_viewed'
df[df.product_status == 'recently_viewed']

Unnamed: 0,product_id,title,description,product_status
1017,P1017,1 qt. #660D-7 Blackberry Farm Satin Enamel Int...,Love your space like never before with the hig...,recently_viewed
1044,P1044,1 qt. #M360-4 Marjoram One-Coat Hide Eggshell ...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed
1048,P1048,5 gal. #640C-1 Hosta Flower Extra Durable Sati...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1310,P1310,5 gal. #180A-2 Romantic Morn Extra Durable Sem...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1444,P1444,5 gal. #PPU12-17 Cameroon Green Extra Durable ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1838,P1838,5 gal. #N340-2 Dune Grass Extra Durable Satin ...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recently_viewed
1938,P1938,1 gal. #HDC-SP16-10 Japanese Rose Garden Semi-...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed
1970,P1970,8 oz. #510C-3 Rivers Edge Semi-Gloss Enamel St...,Introducing the best of BEHR Paint. Featuring ...,recently_viewed
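`isin` silently ignores IDs that do not exist in the dataset, so a typo in the viewed list would just produce fewer labeled rows. A small sketch of a sanity check, run here on a toy frame; `missing_ids` is a hypothetical helper name.

```python
import pandas as pd


def missing_ids(df: pd.DataFrame, ids: list[str], id_col: str = "product_id") -> list[str]:
    """Return the requested IDs that are absent from df[id_col], in their original order."""
    known = set(df[id_col])
    return [i for i in ids if i not in known]


# Toy stand-in for the products table
demo = pd.DataFrame({"product_id": ["P0", "P1", "P2"]})
print(missing_ids(demo, ["P1", "P9"]))  # ['P9']
```

In the notebook, `missing_ids(df, searched_products_id)` returning an empty list would confirm that all eight viewed products were found.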


Now let's combine the product `title` and `description` and store the result in a column called `combined`.

In [26]:
df['combined'] = df['title'] + ' ' + df['description']
df.head()

Unnamed: 0,product_id,title,description,product_status,combined
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...
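String concatenation with `+` yields a missing value whenever either side is missing, which would later send an empty entry to the embedding model. Whether this dataset contains missing descriptions is not shown here, so the following is only a defensive sketch on toy data.

```python
import pandas as pd

# Toy stand-in: the second product has no description
demo = pd.DataFrame(
    {
        "title": ["Bolster Bed", "Vessel Sink"],
        "description": ["Polyester cover", None],
    }
)

# Plain concatenation propagates the missing value
plain = demo["title"] + " " + demo["description"]

# Filling missing values first keeps whatever text is available
safe = (demo["title"].fillna("") + " " + demo["description"].fillna("")).str.strip()

print(plain.isna().tolist())  # [False, True]
print(safe.tolist())          # ['Bolster Bed Polyester cover', 'Vessel Sink']
```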


### Task 3 - Text embedding and visualization


In [7]:
text = "Hello World!"
result = genai.embed_content(
    model="models/text-embedding-004", content=text, output_dimensionality=1
)
print(result["embedding"])

[0.002266675]


Creating the text embedding vectors

In [8]:
from tqdm import tqdm

results = []
for item in tqdm(df['combined'], desc="Embedding content"):
    results.append(genai.embed_content(
        model="models/text-embedding-004",
        content=[item],  # Embed one item at a time
        output_dimensionality=128,
    ))

Embedding content: 100%|██████████| 2000/2000 [32:17<00:00,  1.03it/s]
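Embedding one item per request took about 32 minutes for 2000 products. Because `embed_content` also accepts a list of texts, sending several texts per request may cut that time considerably. The sketch below shows only the batching logic with an injected embedding function, so it runs without network access; `chunks` and `embed_in_batches` are hypothetical helpers, and the batch size of 100 is an assumption that should be checked against the current API limits.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,  # assumed limit, not confirmed by this notebook
) -> List[List[float]]:
    """Embed `texts` batch by batch and return one vector per text, in input order."""
    vectors: List[List[float]] = []
    for batch in chunks(texts, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors


# Hypothetical wiring to the API used in this notebook:
# def embed_batch(batch):
#     response = genai.embed_content(
#         model="models/text-embedding-004",
#         content=batch,
#         output_dimensionality=128,
#     )
#     return response["embedding"]  # one vector per input text
#
# df["text_embedding"] = embed_in_batches(df["combined"].tolist(), embed_batch)
```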


In [30]:
# Print the first component of the first product's embedding vector
print(results[0]['embedding'][0][0])

0.02661769


In [31]:
df['text_embedding'] = [result['embedding'][0] for result in results]
df.head()

Unnamed: 0,product_id,title,description,product_status,combined,text_embedding
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.02661769, 0.021672012, -0.0073499964, -0.00..."
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.0054305517, -0.03324972, -0.039408334, 0.02..."
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[-0.024200058, 0.04525282, 0.02500831, 0.00932..."
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.012281764, -0.023819033, 0.013256148, -0.0..."
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[-0.001543015, 0.026881523, -0.029850263, -0.0..."


In [32]:
df['text_embedding'].shape

(2000,)
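The shape above only counts rows, because each cell of the column holds a whole Python list. To see the dimensionality of the vectors themselves, the lists can be stacked into a 2D array. A minimal sketch on toy data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['text_embedding']: two vectors with three components each
demo = pd.Series([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

matrix = np.array(demo.tolist())

print(demo.shape)    # (2,)   -> number of rows only
print(matrix.shape)  # (2, 3) -> rows x embedding dimensions
```

Applied to the notebook's column, the second number of `np.array(df['text_embedding'].tolist()).shape` should match the `output_dimensionality` requested from the API.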

> Each vector has 128 dimensions, as set by `output_dimensionality=128` in the embedding call. To visualize the vectors in a scatter plot, we use Principal Component Analysis (PCA) to reduce the dimension from 128 to 2.

In [33]:
pca=PCA(n_components=2)
vector_2d = pca.fit_transform(np.array(df['text_embedding'].tolist()) )
df['pca1'] = vector_2d[:,0]
df['pca2'] = vector_2d[:,1]
df.head()


Unnamed: 0,product_id,title,description,product_status,combined,text_embedding,pca1,pca2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.02661769, 0.021672012, -0.0073499964, -0.00...",-0.085661,0.012239
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.0054305517, -0.03324972, -0.039408334, 0.02...",-0.129703,-0.063566
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[-0.024200058, 0.04525282, 0.02500831, 0.00932...",-0.099862,0.114211
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.012281764, -0.023819033, 0.013256148, -0.0...",-0.068318,-0.105377
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[-0.001543015, 0.026881523, -0.029850263, -0.0...",-0.134826,0.032858
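Two components can keep only part of the information in 128 dimensions, so the scatter plot is an approximation. The sketch below reproduces the core of PCA with NumPy's SVD on random stand-in data (not the product embeddings) and reports the share of variance each component retains; `pca_project` is a hypothetical helper. With scikit-learn, the same figure is available on the fitted object as `pca.explained_variance_ratio_`.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 2):
    """Project X onto its top principal components.

    Returns the projected points and the fraction of total variance
    explained by each kept component.
    """
    centered = X - X.mean(axis=0)                      # PCA works on mean-centered data
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ Vt[:n_components].T         # coordinates along the top directions
    explained = (S ** 2) / np.sum(S ** 2)              # variance share per component
    return projected, explained[:n_components]


# Random stand-in data with the same width as the embeddings in this notebook
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))

points_2d, ratio = pca_project(X)
print(points_2d.shape)  # (200, 2)
print(ratio.sum())      # fraction of variance kept by the two components
```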


Now that we have the text embedding vectors in two dimensions, we can use them to create a 2D plot.

In [34]:
px.scatter(df, x='pca1', y='pca2', color='product_status', hover_data=['title', 'description'])

### Task 4 - Find similar products

In [35]:
df.head()

Unnamed: 0,product_id,title,description,product_status,combined,text_embedding,pca1,pca2
0,P0,Men's 3X Large Carbon Heather Cotton/Polyester...,"This heavyweight, water-repellent hooded sweat...",not_viewed,Men's 3X Large Carbon Heather Cotton/Polyester...,"[0.02661769, 0.021672012, -0.0073499964, -0.00...",-0.085661,0.012239
1,P1,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,If you need more length between your existing ...,not_viewed,Turmode 30 ft. RP TNC Female to RP TNC Male Ad...,"[0.0054305517, -0.03324972, -0.039408334, 0.02...",-0.129703,-0.063566
2,P2,Large Tapestry Bolster Bed,Polyester cover resembling rich Italian tapest...,not_viewed,Large Tapestry Bolster Bed Polyester cover res...,"[-0.024200058, 0.04525282, 0.02500831, 0.00932...",-0.099862,0.114211
3,P3,16-Gauge-Sinks Vessel Sink in White with Faucet,It features a rectangle shape. This vessel set...,not_viewed,16-Gauge-Sinks Vessel Sink in White with Fauce...,"[-0.012281764, -0.023819033, 0.013256148, -0.0...",-0.068318,-0.105377
4,P4,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,This 9 in. black full grain leather logger boo...,not_viewed,Men's Crazy Horse 9'' Logger Boot - Steel Toe ...,"[-0.001543015, 0.026881523, -0.029850263, -0.0...",-0.134826,0.032858


Get the data related to `recently_viewed` and `not_viewed` products

In [36]:
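# Take explicit copies so that later column assignments do not trigger SettingWithCopyWarning
df_recently_viewed = df[df.product_status == 'recently_viewed'].copy()
df_not_viewed = df[df.product_status == 'not_viewed'].copy()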

Convert the embedding vectors to NumPy arrays

In [None]:
df_recently_viewed['text_embedding'] = df_recently_viewed['text_embedding'].apply(np.array)
df_not_viewed['text_embedding'] = df_not_viewed['text_embedding'].apply(np.array)

Find the similarity between each viewed product and all the unviewed products.

In [40]:
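# Rows: recently viewed products; columns: not-viewed products
similarity_matrix = cosine_similarity(df_recently_viewed['text_embedding'].tolist(), df_not_viewed['text_embedding'].tolist())
# For each viewed product, take the position (within df_not_viewed) of its most similar not-viewed product
top_ids = np.argmax(similarity_matrix, axis=1).tolist()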
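For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes the same kind of matrix by hand with NumPy on toy vectors and picks the best match per row; `cosine_similarity_matrix` is a hypothetical name, and the code assumes no vector is all zeros.

```python
import numpy as np


def cosine_similarity_matrix(A, B) -> np.ndarray:
    """Return the matrix of cosine similarities between rows of A and rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)  # scale each row to length 1
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T                               # dot products of unit vectors


viewed = [[1.0, 0.0], [0.0, 2.0]]
candidates = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

sims = cosine_similarity_matrix(viewed, candidates)
best = sims.argmax(axis=1)  # best candidate position for each viewed row

print(best.tolist())  # [0, 1]
```

Vector length does not matter: `[1, 0]` and `[2, 0]` point in the same direction, so their similarity is 1.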



In [42]:
most_similar_products = df_not_viewed.iloc[top_ids].product_id.tolist()
most_similar_products

['P1480', 'P1183', 'P300', 'P575', 'P1327', 'P1816', 'P976', 'P469']
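Taking one best match per viewed product can return the same candidate more than once when two viewed products are close to each other, and it never returns more than one suggestion per viewed product. A hedged sketch of a variant that takes the top `k` matches per row and merges them without duplicates; `top_k_unique` is a hypothetical helper, shown on a toy similarity matrix.

```python
import numpy as np


def top_k_unique(similarity: np.ndarray, k: int) -> list[int]:
    """Return column positions of the k best candidates per row, merged without duplicates.

    Positions are kept in order of first appearance, row by row.
    """
    order = np.argsort(-similarity, axis=1)[:, :k]  # best-first column positions per row
    seen = set()
    merged = []
    for row in order:
        for idx in row:
            idx = int(idx)
            if idx not in seen:
                seen.add(idx)
                merged.append(idx)
    return merged


# Toy matrix: 2 viewed products (rows) x 3 candidates (columns)
sims = np.array([[0.9, 0.1, 0.8],
                 [0.7, 0.2, 0.95]])

print(top_k_unique(sims, 2))  # [0, 2]
```

As in the cell above, the returned positions refer to rows of `df_not_viewed`, so they would be used with `.iloc`, not `.loc`.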

### Task 5 - Recommend products based on the searched products

Let's update the status of the top similar products to `recommended`.

In [43]:
df.loc[df['product_id'].isin(most_similar_products), 'product_status'] = 'recommended'
df[df.product_status == 'recommended']

Unnamed: 0,product_id,title,description,product_status,combined,text_embedding,pca1,pca2
300,P300,1 qt. #MQ3-54 Dayflower Extra Durable Satin En...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,1 qt. #MQ3-54 Dayflower Extra Durable Satin En...,"[0.011734788, -0.02116309, -0.020741303, -0.02...",0.199416,0.024334
469,P469,8 oz. 560C-3 Holiday Road Matte Stain-Blocking...,Introducing the best of BEHR Paint. Featuring ...,recommended,8 oz. 560C-3 Holiday Road Matte Stain-Blocking...,"[-0.004898785, -0.014543502, 0.022011619, -0.0...",0.171787,0.02644
575,P575,5 gal. #S480-1 Rain Dance Extra Durable Semi-G...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #S480-1 Rain Dance Extra Durable Semi-G...,"[-0.002327398, -0.032607585, -0.009485828, -0....",0.170865,-0.000843
976,P976,1 qt. #240C-2 Heavenly Song Eggshell Enamel In...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 qt. #240C-2 Heavenly Song Eggshell Enamel In...,"[0.0056962622, -0.044923574, -0.015496335, -0....",0.20729,0.011227
1183,P1183,1 qt. #PPU3-02 Marmalade Glaze Eggshell Enamel...,Introducing the best of BEHR Paint. Featuring ...,recommended,1 qt. #PPU3-02 Marmalade Glaze Eggshell Enamel...,"[-0.009784818, -0.04942396, -0.012847489, -0.0...",0.22441,0.029056
1327,P1327,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,5 gal. #MQ4-44 Green Dynasty Extra Durable Egg...,"[-0.002912318, -0.01188779, -0.04778934, -0.02...",0.209698,0.022958
1480,P1480,1 gal. #S440-7 Thermal One-Coat Hide Eggshell ...,Love your space like never before with the hig...,recommended,1 gal. #S440-7 Thermal One-Coat Hide Eggshell ...,"[0.010610258, -0.05519484, -0.010768119, -0.04...",0.207471,0.018006
1816,P1816,1 qt. #N370-5 Incognito Extra Durable Semi-Glo...,BEHR ULTRA SCUFF DEFENSE Stain-Blocking Paint ...,recommended,1 qt. #N370-5 Incognito Extra Durable Semi-Glo...,"[-0.016892718, -0.037415806, -0.020209111, -0....",0.219438,0.001347


Let's visualize the recommended products.

In [44]:
px.scatter(df, x='pca1', y='pca2', color='product_status', hover_data=['title', 'description'])