## Text Embedding

### Imports

In [1]:
from sentence_transformers import SentenceTransformer
import pandas as pd
import numpy as np

### Setup data and models

`all-MiniLM-L6-v2` is a small, fast model that works well for many text-embedding use cases.

In [2]:
model = SentenceTransformer('all-MiniLM-L6-v2')

`reviews` and `items` are loaded from CSV files.

In [3]:
reviews = pd.read_csv('../datasets/slimmed/reviews.csv')
items = pd.read_csv('../datasets/slimmed/items.csv')

We want to attach each item's reviews to the item itself. The relevant review columns are `parent_asin` and `text`.

In [32]:
reviews[['parent_asin', 'text']].head()

Unnamed: 0,parent_asin,text
0,B07DK1H3H5,I’m playing on ps5 and it’s interesting. It’s...
1,B07SRWRH5D,Nostalgic fun. A bit slow. I hope they don’t...
2,B07MFMFW34,This was an order for my kids & they have real...
3,B0BCHWZX95,"These work great, They use batteries which is ..."
4,B00HUWA45W,I would recommend to anyone looking to add jus...


Group reviews by `parent_asin` (product ID) and join each group's texts into a single string, giving a pandas Series indexed by `parent_asin`.

In [7]:
grouped_texts = (
    reviews
    .groupby('parent_asin')['text']
    .agg(lambda x: ' '.join(x.dropna()))
)

asin_to_text = pd.Series(grouped_texts)
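As a minimal sketch of what this aggregation does, the same pattern can be run on a toy frame. The ASINs and review texts here are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy data (hypothetical): two products, one review with missing text.
toy_reviews = pd.DataFrame({
    "parent_asin": ["A1", "A1", "B2", "B2"],
    "text": ["Great game.", "Runs well.", None, "Too short."],
})

# Same pattern as the cell above: drop missing texts, then join the
# remaining reviews of each product with a single space.
toy_grouped = (
    toy_reviews
    .groupby("parent_asin")["text"]
    .agg(lambda x: " ".join(x.dropna()))
)

print(toy_grouped.to_dict())
```

The result of `.agg(...)` is already a Series named `text` and indexed by `parent_asin`, so wrapping it in `pd.Series(...)` does not change its contents.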

A combined DataFrame of items and their review texts is created by merging on `parent_asin`, which is unique for each item.

In [None]:
items_with_reviews = items.merge(right=asin_to_text, on='parent_asin')
items_with_reviews.head()

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price,text
0,Phantasmagoria: A Puzzle of Flesh,['Windows 95'],[],[],"{'Best Sellers Rank': {'Video Games': 137612, ...",[{'thumb': 'https://m.media-amazon.com/images/...,B00069EVOG,"['Video Games', 'PC', 'Games']",4.1,18,Video Games,Sierra,,Came missing one disk which makes the game tot...
1,NBA 2K17 - Early Tip Off Edition - PlayStation 4,['The #1 rated NBA video game simulation serie...,['Following the record-breaking launch of NBA ...,[{'title': 'NBA 2K17 - Kobe: Haters vs Players...,"{'Release date': 'September 16, 2016', 'Best S...",[{'thumb': 'https://m.media-amazon.com/images/...,B00Z9TLVK0,"['Video Games', 'PlayStation 4', 'Games']",4.3,223,Video Games,2K,58.0,My child love getting this for his birthday. G...
2,Nintendo Selects: The Legend of Zelda Ocarina ...,['Authentic Nintendo Selects: The Legend of Ze...,[],[],"{'Best Sellers Rank': {'Video Games': 51019, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07SZJZV88,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.9,22,Video Games,Amazon Renewed,37.42,Great game for 7 year olds. My son loves it No...
3,"Spongebob Squarepants, Vol. 1",['Bubblestand: SpongeBob shows Patrick and Squ...,['Now you can watch the wild underwater antics...,[],"{'Release date': 'August 15, 2004', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B0001ZNU56,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.4,32,Video Games,Majesco,33.98,"Great condition and very clean. Once again, i'..."
4,eXtremeRate Soft Touch Top Shell Front Housing...,['Compatibility Models: Ultra fits for Xbox On...,[],[],"{'Best Sellers Rank': {'Video Games': 48130, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07H93H878,"['Video Games', 'Xbox One', 'Accessories', 'Fa...",4.5,3061,Video Games,eXtremeRate,17.59,Awesome it was everything I thought it would b...
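One point worth checking here is that `merge` defaults to an inner join, so any item without reviews would be dropped from the result. The sketch below uses made-up toy data to compare the default with `how="left"`. Whether any items in the real dataset lack reviews is not shown in this notebook.

```python
import pandas as pd

# Toy data (hypothetical): item C3 has no reviews.
toy_items = pd.DataFrame({
    "parent_asin": ["A1", "B2", "C3"],
    "title": ["Game A", "Game B", "Game C"],
})
toy_texts = pd.Series(
    ["Great game. Runs well.", "Too short."],
    index=pd.Index(["A1", "B2"], name="parent_asin"),
    name="text",
)

# Default how="inner": only items that have review text survive.
inner = toy_items.merge(toy_texts, on="parent_asin")
# how="left": every item is kept, missing review text becomes NaN.
left = toy_items.merge(toy_texts, on="parent_asin", how="left")

print(len(inner), len(left))
print(left["text"].isna().sum())
```

If rows are dropped by the inner join, row positions in the merged frame no longer line up with row labels in `items`. Later cells look up a row label in `items` and use it as a position in the embedding array, which assumes the two stay aligned.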


The `text` column is renamed to `review_texts` in the merged DataFrame

In [9]:
items_with_reviews.rename(columns={'text': 'review_texts'}, inplace=True)

The fields are concatenated into one labelled string per item, so that the model sees each piece of information under a clear prefix.

In [33]:
texts = (
    "TITLE: " + items_with_reviews['title'].fillna('') + '\n' +
    "FEATURES: " + items_with_reviews['features'].fillna('') + '\n' +
    "DESCRIPTION: " + items_with_reviews['description'].fillna('') + '\n' +
    "DETAILS: " + items_with_reviews['details'].fillna('') + '\n' +
    "CATEGORIES: " + items_with_reviews['categories'].fillna('') + '\n' +
    "REVIEWS: " + items_with_reviews['review_texts'].fillna('') + '\n' +
    "STORE: " + items_with_reviews['store'].fillna('')
)
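To make the resulting string format concrete, here is a small sketch on toy data. The helper `build_text` and the `FIELDS` list are hypothetical names introduced for illustration, and only three of the seven fields are included.

```python
import pandas as pd

# Toy data (hypothetical); list-like columns are strings, as they are after read_csv.
toy = pd.DataFrame({
    "title": ["Game A", None],
    "features": ["['Windows 95']", "[]"],
    "store": ["Sierra", None],
})

# (label, column) pairs in the order they should appear in the text.
FIELDS = [("TITLE", "title"), ("FEATURES", "features"), ("STORE", "store")]

def build_text(df, fields):
    """Join 'LABEL: value' pieces with newlines, treating missing values as ''."""
    parts = [label + ": " + df[col].fillna("").astype(str) for label, col in fields]
    out = parts[0]
    for part in parts[1:]:
        out = out + "\n" + part
    return out

toy_texts = build_text(toy, FIELDS)
print(toy_texts.iloc[0])
```

Field order may matter in practice: according to its model card, `all-MiniLM-L6-v2` truncates input longer than 256 word pieces by default, so fields placed late in the string (here `REVIEWS` and `STORE`) may be cut off for items with long features or descriptions.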

Embeddings are now created for the items using the model defined above.

In [13]:
item_embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)

Batches:   0%|          | 0/3807 [00:00<?, ?it/s]

A test run with item 0 (Phantasmagoria) returns mostly similarly named titles, including another Phantasmagoria game.

In [51]:
ITEM_ID = 0

sim_item_id = np.dot(item_embeddings, item_embeddings[ITEM_ID]) \
       / (np.linalg.norm(item_embeddings, axis=1) * np.linalg.norm(item_embeddings[ITEM_ID]))

top5_idx = np.argsort(-sim_item_id)[1:5+1]
print(f"Top 5 similar items to item {ITEM_ID}:", top5_idx)

texts.loc[top5_idx]

Top 5 similar items to item 0: [ 75404   8193  57705 100350  25053]


75404     TITLE: Phantasy Star Universe: Ambition of the...
8193      TITLE: Phantasmat [Download]\nFEATURES: ['Unco...
57705     TITLE: Roberta Williams' Phantasmagoria: Pray ...
100350    TITLE: Tales of Phantasia: Narikiri Dungeon X ...
25053     TITLE: Hidden Expedition The Crown of Solomon ...
dtype: object
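The cosine-similarity lookup above can be wrapped in a small helper. This is a sketch under stated assumptions, not the notebook's own code: `top_k_similar` is a hypothetical name, and it assumes no embedding row is all zeros (which would cause a division by zero).

```python
import numpy as np

def top_k_similar(embeddings, item_id, k=5):
    """Return indices and cosine similarities of the k rows closest to item_id."""
    # L2-normalise every row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit[item_id]
    # Exclude the query item explicitly instead of assuming it ranks first.
    sims[item_id] = -np.inf
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Toy embeddings (hypothetical) in two dimensions.
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
idx, scores = top_k_similar(toy_emb, item_id=0, k=2)
print(idx, scores)
```

Masking the query item differs slightly from slicing `[1:k+1]`: if another item has an identical embedding, the query is not guaranteed to sort into position 0, and slicing could then return the query itself.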

In [57]:
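# Literal substring match; regex=False because the pattern contains an unbalanced "("
items[items['title'].str.contains('Assassins Creed Unity (PS4', regex=False, na=False)]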

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
41457,Assassins Creed Unity (PS4),"['A RUTHLESS NEW HERO FOR A BRUTAL WORLD', 'BR...","[""Paris, 1789. The French Revolution turns a o...",[],"{'Release date': 'May 26, 2015', 'Best Sellers...",[{'thumb': 'https://m.media-amazon.com/images/...,B071FN3797,"['Video Games', 'PlayStation 4', 'Games']",4.6,2714,Video Games,Ubisoft,19.5


In [63]:
ITEM_ID = 41457
N_SIMILAR = 10

sim_item_id = np.dot(item_embeddings, item_embeddings[ITEM_ID]) \
       / (np.linalg.norm(item_embeddings, axis=1) * np.linalg.norm(item_embeddings[ITEM_ID]))

top_idx = np.argsort(-sim_item_id)[1:N_SIMILAR+1]
print(f"Top {N_SIMILAR} similar items to item {ITEM_ID}:", top_idx)

texts.loc[top_idx]

Top 10 similar items to item 41457: [104971  62760  97258  78933 109664  14382  55419  51207  43650  88347]


104971    TITLE: Assassin's Creed Unity Gold Edition [On...
62760     TITLE: Assassin's Creed Unity\nFEATURES: ['Bra...
97258     TITLE: Assassin's Creed Unity Limited Edition ...
78933     TITLE: Assassin's Creed Unity - Xbox One\nFEAT...
109664    TITLE: Assassin's Creed Triple Pack: Black Fla...
14382     TITLE: Assassin's Creed Unity Limited Edition ...
55419     TITLE: Assassins Creed III GameStop Edition\nF...
51207     TITLE: Assassin's Creed III Xbox 360\nFEATURES...
43650     TITLE: Assassin's Creed Origins - PS4 [Digital...
88347     TITLE: Assassins Creed Chronicles (PS4)\nFEATU...
dtype: object

The size of the embeddings is ~0.19 GB

In [36]:
print("Memory size of numpy array in GB:", item_embeddings.nbytes / 1e9)

Memory size of numpy array in GB: 0.18711552
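The reported size can be cross-checked against the batch count shown earlier. The sketch below assumes 384-dimensional `float32` vectors, which is the usual output of `all-MiniLM-L6-v2` but is not printed anywhere in this notebook. The item count is inferred from the byte size, not read from the data.

```python
import numpy as np

reported_bytes = 187_115_520                 # 0.18711552 GB from the output above
dim = 384                                    # assumption: embedding size of all-MiniLM-L6-v2
itemsize = np.dtype(np.float32).itemsize     # assumption: float32, i.e. 4 bytes per value

# Number of embedded items implied by the reported size.
n_items = reported_bytes // (dim * itemsize)
# Number of batches of 32 needed to encode that many items (ceiling division).
n_batches = -(-n_items // 32)

print(n_items, n_batches)
```

Under these assumptions the size implies 121,820 items, which needs 3,807 batches of 32 and so agrees with the `0/3807` progress bar.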


Embeddings are saved so that they can be loaded for comparisons later

In [26]:
np.save('../data_structures/item_text_embeddings.npy', item_embeddings)
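A quick round trip shows how the saved file can be read back later. This sketch writes a small random array to a temporary directory rather than to the notebook's real path. The file name `toy_embeddings.npy` is made up.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for item_embeddings (hypothetical values).
toy_emb = np.random.default_rng(0).random((4, 3), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_embeddings.npy")
    np.save(path, toy_emb)      # writes shape, dtype and data in .npy format
    loaded = np.load(path)      # reads the full array back into memory

print(loaded.shape, loaded.dtype)
```

For arrays that are too large to load comfortably, `np.load` also accepts `mmap_mode="r"`, which maps the file instead of reading it all at once.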

In [1]:
import pandas as pd
import numpy as np

items = pd.read_csv('../datasets/slimmed/items.csv')
item_map = pd.read_csv('../datasets/mappings/item_map.csv')

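Index 440 in `item_map` was expected to map to a PS4 or PS5 item; the lookup shows that it maps to a Nintendo Switch accessory instead.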

In [9]:
item_map.loc[440].values[0]

'B08FX1Q339'

In [8]:
items[items['parent_asin'] == 'B08FX1Q339']

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
1426,"OIVO Dockable Grip Case for Nintendo Switch, H...",['Special Design for Nintendo Switch: Providin...,[],"[{'title': 'Great Switch Case', 'url': 'https:...",{'Package Dimensions': '9.69 x 4.96 x 2.56 inc...,[{'thumb': 'https://m.media-amazon.com/images/...,B08FX1Q339,"['Video Games', 'Nintendo Switch', 'Accessorie...",3.8,86,All Electronics,OIVO,


In [4]:
# Row labels in item_map to translate into product IDs
item_idx = [76266, 440, 77189, 30410, 63588, 2234, 56219, 49786, 13360, 87542]
parent_asins = item_map.loc[item_idx, 'parent_asin'].tolist()
parent_asins

['B0B6CFX35R',
 'B08FX1Q339',
 'B01FKBAVG6',
 'B000SSQPU8',
 'B077BVHVVB',
 'B00JA81WQE',
 'B0B7DQBCLT',
 'B00O9W2YT0',
 'B06WVK3L1B',
 'B007L3NDJE']

In [5]:
items[items['parent_asin'].isin(parent_asins)]['title']

1426      OIVO Dockable Grip Case for Nintendo Switch, H...
1779      SCUF Infinity1 Smurf Controller for Xbox One a...
26020     WraptorSkinz Decal Vinyl Skin Wrap compatible ...
34572       Final Fantasy XII: Revenant Wings - Nintendo DS
53680              Buyee 128MB Memory Card Game Memory Card
63453     DLseego Gengar Game Case For Switch Lite / Swi...
72877     Chronicle Keepers The Dreaming Garden Collecto...
72905        The Jackbox Party Trilogy - PS4 [Digital Code]
101818                    SONY BEJEWELED 3 PS3 [video game]
106584    FIFA 23 Standard Edition Playstation 5 (PS5)| ...
Name: title, dtype: object
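Note that filtering with `isin` returns rows in the order they appear in `items`, not in the order of `parent_asins`. In the output above, the first ASIN in the list (`B0B6CFX35R`, FIFA 23) appears last. If the list order carries meaning, such as a ranking, a lookup through an index keeps it. The sketch below uses made-up toy data.

```python
import pandas as pd

# Toy data (hypothetical).
toy_items = pd.DataFrame({
    "parent_asin": ["A1", "B2", "C3", "D4"],
    "title": ["Game A", "Game B", "Game C", "Game D"],
})
ranked_asins = ["C3", "A1"]  # e.g. best match first

# isin keeps the row order of toy_items.
by_isin = toy_items[toy_items["parent_asin"].isin(ranked_asins)]["title"].tolist()
# Indexing by parent_asin and selecting with .loc keeps the order of ranked_asins.
by_rank = toy_items.set_index("parent_asin").loc[ranked_asins, "title"].tolist()

print(by_isin)
print(by_rank)
```

The `.loc` variant raises a `KeyError` if any ASIN is missing, and it relies on `parent_asin` being unique per item, as stated earlier.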

In [6]:
items[items['parent_asin'].isin(parent_asins)]

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
1426,"OIVO Dockable Grip Case for Nintendo Switch, H...",['Special Design for Nintendo Switch: Providin...,[],"[{'title': 'Great Switch Case', 'url': 'https:...",{'Package Dimensions': '9.69 x 4.96 x 2.56 inc...,[{'thumb': 'https://m.media-amazon.com/images/...,B08FX1Q339,"['Video Games', 'Nintendo Switch', 'Accessorie...",3.8,86,All Electronics,OIVO,
1779,SCUF Infinity1 Smurf Controller for Xbox One a...,[],[],[],{'Pricing': 'The strikethrough price is the Li...,[{'thumb': 'https://m.media-amazon.com/images/...,B01FKBAVG6,[],4.0,7,Video Games,Unknown,
26020,WraptorSkinz Decal Vinyl Skin Wrap compatible ...,[],['WraptorSkinz skins are superb photo quality ...,[{'title': 'How to apply WraptorSkinz XBOX gam...,"{'Best Sellers Rank': {'Video Games': 142490, ...",[{'thumb': 'https://m.media-amazon.com/images/...,B077BVHVVB,"['Video Games', 'Xbox One', 'Accessories', 'Fa...",4.0,6,Video Games,WraptorSkinz,15.95
34572,Final Fantasy XII: Revenant Wings - Nintendo DS,['Join FFXII characters in a new story'],['Final Fantasy XII: Revenant Wings - Nintendo...,[],"{'Release date': 'November 21, 2007', 'Best Se...",[{'thumb': 'https://m.media-amazon.com/images/...,B000SSQPU8,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.6,297,Video Games,Square Enix,
53680,Buyee 128MB Memory Card Game Memory Card,"['Durable, compact design 128MB, 2043 Blocks o...",['Buyee 128MB Memory Card for Sony Playstation...,[],"{'Release date': 'March 1, 2017', 'Best Seller...",[{'thumb': 'https://m.media-amazon.com/images/...,B00JA81WQE,"['Video Games', 'Legacy Systems', 'PlayStation...",4.6,1933,Video Games,Buyee,
63453,DLseego Gengar Game Case For Switch Lite / Swi...,['👻LARGE-CAPACITY & PORTABLE -- Maximize the u...,[],[],"{'Brand Name': 'DLseego', 'Item Weight': '1.76...",[{'thumb': 'https://m.media-amazon.com/images/...,B0B7DQBCLT,"['Video Games', 'Nintendo Switch', 'Accessorie...",4.9,96,Video Games,DLseego,11.99
72877,Chronicle Keepers The Dreaming Garden Collecto...,"['o 8 new music tracks', 'o 6 new wallpapers',...","['Have you ever had this weird feeling, that t...",[],"{'Release date': 'September 29, 2014', 'Pricin...",[{'thumb': 'https://m.media-amazon.com/images/...,B00O9W2YT0,"['Video Games', 'PC', 'Games']",3.0,1,Video Games,Chronicle Keepers The Dreaming Garden Collecto...,
72905,The Jackbox Party Trilogy - PS4 [Digital Code],[],['NOTE: All games are in English and are local...,[],"{'Release date': 'February 16, 2017', 'Pricing...",[{'thumb': 'https://m.media-amazon.com/images/...,B06WVK3L1B,"['Video Games', 'PlayStation Digital Content',...",3.8,9,Video Games,"Jackbox Games, Inc.",
101818,SONY BEJEWELED 3 PS3 [video game],[],[],[],"{'Best Sellers Rank': {'Video Games': 18642, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B007L3NDJE,"['Video Games', 'Legacy Systems', 'PlayStation...",3.7,17,Other,Electronic Arts,
106584,FIFA 23 Standard Edition Playstation 5 (PS5)| ...,['EA SPORTS FIFA 23 brings even more of the ac...,['FIFA 23 Standard Edition Playstation 5 (PS5)...,[],"{'Best Sellers Rank': {'Video Games': 482, 'Pl...",[{'thumb': 'https://m.media-amazon.com/images/...,B0B6CFX35R,"['Video Games', 'PlayStation 5', 'Games']",4.7,1861,Video Games,Electronic Arts,39.99
