# Amazon electronics dataset exploration

## 2018 Amazon Review Data

The Electronics category is a subset of the Amazon Review Data (2018) and contains roughly 21M ratings from Amazon users.

*Source*: Justifying recommendations using distantly-labeled reviews and fine-grained aspects
Jianmo Ni, Jiacheng Li, Julian McAuley
Empirical Methods in Natural Language Processing (EMNLP), 2019, https://nijianmo.github.io/amazon/index.html

In [3]:
!ls ../data

Electronics.csv


In [6]:
import pandas as pd
df = pd.read_csv('../data/Electronics.csv')

Ratings only: this file includes no metadata or review text, only (item, user, rating, timestamp) tuples, so it is suitable for use with MyMediaLite or similar packages.

In [13]:
df.columns = ["item", "user", "rating", "timestamp"]
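One thing worth checking here: if the CSV really is bare tuples with no header line (an assumption, I haven't confirmed it for this file), then `pd.read_csv` with its defaults treats the first record as column names, and renaming the columns afterwards silently drops that record. A minimal sketch on an in-memory sample built from the rows shown in the next cell; `load_ratings` and `COLUMNS` are hypothetical names, not part of the notebook:

```python
import io

import pandas as pd

COLUMNS = ["item", "user", "rating", "timestamp"]


def load_ratings(source):
    """Read a header-less ratings file, naming the columns up front."""
    return pd.read_csv(
        source,
        header=None,                         # the first line is data, not a header
        names=COLUMNS,
        dtype={"item": str, "user": str},    # keep IDs as strings
    )


# Three records in the (item, user, rating, timestamp) layout, no header line.
sample_csv = io.StringIO(
    "60009810,A3P0KRKOBQK1KN,5.0,1025913600\n"
    "60009810,A192HO2ICJ75VU,5.0,1025654400\n"
    "60009810,A2T278FKFL3BLT,4.0,1025395200\n"
)

ratings = load_ratings(sample_csv)
print(len(ratings))  # all three records survive
```

Reading the IDs as strings is a design choice: identifiers that happen to look numeric would otherwise be parsed as integers.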

In [14]:
df.head()

Unnamed: 0,item,user,rating,timestamp
0,60009810,A3P0KRKOBQK1KN,5.0,1025913600
1,60009810,A192HO2ICJ75VU,5.0,1025654400
2,60009810,A2T278FKFL3BLT,4.0,1025395200
3,60009810,A2ZUXVTW8RXBXW,5.0,1025222400
4,60009810,A21JDG4HA6OLPF,4.0,1024963200


In [10]:
len(df)

20994352

Hmm... are the four columns sufficient for our system? Can we infer a purchase from the presence of a rating? Do we assume that a user with no rating for a product did not purchase it? Yeesh... that doesn't seem supportable. I guess the prediction here is not whether they bought the item but whether they were motivated to leave a review. Here the review becomes the reward, not the sale ... go off and read the paper: https://cseweb.ucsd.edu/~jmcauley/pdfs/emnlp19a.pdf

In [17]:
df.describe()

Unnamed: 0,rating,timestamp
count,20994350.0,20994350.0
mean,4.073685,1425967000.0
std,1.385792,77692010.0
min,1.0,881193600.0
25%,4.0,1394064000.0
50%,5.0,1440634000.0
75%,5.0,1478736000.0
max,5.0,1538698000.0
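The `timestamp` summary is hard to read as raw numbers. Assuming the values are Unix epoch seconds (which their magnitude suggests), a small sketch converts the minimum from `describe()` and the first timestamp from `head()` into dates:

```python
import pandas as pd

# Minimum from describe() and the first timestamp from head(), taken as epoch seconds.
timestamps = pd.Series([881193600, 1025913600], name="timestamp")

dates = pd.to_datetime(timestamps, unit="s")
print(dates.dt.date.tolist())  # [datetime.date(1997, 12, 4), datetime.date(2002, 7, 6)]
```

On the full frame the same call would be `pd.to_datetime(df["timestamp"], unit="s")`.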


In [16]:
df.item.value_counts()

item
B010OYASRG    28539
B00L0YLRUW    20873
B00DIF2BO2    17045
B006GWO5WK    16130
B003L1ZYYW    16056
              ...  
B00GXO0W5K        1
B00GXOIMF2        1
B00A8ZGSOE        1
B003ZTYGMG        1
B01HJF4DUG        1
Name: count, Length: 756489, dtype: int64
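The counts above hint at a long tail: a few items with tens of thousands of ratings and many with a single one. A rough sketch of how that could be quantified, shown on toy data; `popularity_summary` is a hypothetical helper, and the toy numbers say nothing about the real distribution:

```python
import pandas as pd


def popularity_summary(items: pd.Series, top_n: int) -> dict:
    """Summarize how concentrated the ratings are across items."""
    counts = items.value_counts()  # sorted by count, descending
    return {
        "n_items": int(counts.size),
        # share of items that received exactly one rating
        "single_rating_share": float((counts == 1).mean()),
        # share of all ratings that go to the top_n most-rated items
        "top_n_interaction_share": float(counts.head(top_n).sum() / counts.sum()),
    }


# Toy item column: "a" rated 5 times, "b" 3 times, "c" and "d" once each.
toy_items = pd.Series(["a"] * 5 + ["b"] * 3 + ["c", "d"])
summary = popularity_summary(toy_items, top_n=1)
print(summary)
```

On the real data this would be called as `popularity_summary(df["item"], top_n=1000)` or similar.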

## 2023 Amazon Reviews Data

In [19]:
!ls ../data

Electronics.csv        meta_Electronics.jsonl


Item metadata from the 2023 release; see https://amazon-reviews-2023.github.io/

In [4]:
import pandas as pd

# Read only the first 10,000 metadata records (one JSON object per line).
df = pd.read_json("../data/meta_Electronics.jsonl", lines=True, nrows=10000)
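Reading only the first 10,000 records is fine for a first look. To scan the whole file without loading it at once, `pd.read_json` also accepts `chunksize` together with `lines=True` and then yields one DataFrame per chunk. A sketch under that approach, run against a small temporary file whose records are cut down to two fields from the table shown in the next cell; `count_main_categories` is a hypothetical helper:

```python
import json
import os
import tempfile
from collections import Counter

import pandas as pd


def count_main_categories(path: str, chunksize: int = 1000) -> Counter:
    """Stream a JSON Lines file and count main_category values chunk by chunk."""
    counts = Counter()
    for chunk in pd.read_json(path, lines=True, chunksize=chunksize):
        # Records without a main_category are skipped.
        counts.update(chunk["main_category"].dropna().tolist())
    return counts


# Tiny stand-in for meta_Electronics.jsonl (only two of the real fields).
records = [
    {"main_category": "Computers", "parent_asin": "B08HLS97QX"},
    {"main_category": None, "parent_asin": "B008U9OCVG"},
    {"main_category": "All Electronics", "parent_asin": "B07LCJ4SXG"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as handle:
    for record in records:
        handle.write(json.dumps(record) + "\n")
    tmp_path = handle.name

counts = count_main_categories(tmp_path, chunksize=2)
os.remove(tmp_path)
print(counts)
```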

In [7]:
df.tail(10)

Unnamed: 0,main_category,title,average_rating,rating_number,features,description,price,images,videos,store,categories,details,parent_asin,bought_together,subtitle,author
9990,Computers,Micro USB to Type C OTG Cable 90 Degree Video ...,4.6,3,[],[],,[{'thumb': 'https://m.media-amazon.com/images/...,[],AKindle,"[Electronics, Computers & Accessories, Compute...","{'Brand': 'AKindle', 'Connector Type': 'Micro ...",B08HLS97QX,,,
9991,,Nutone IS69WH Intercom Door Speaker,3.0,1,[],[Brand new in box This door speaker is designe...,,[{'thumb': 'https://m.media-amazon.com/images/...,[],Nutone,"[Electronics, Security & Surveillance, Home Se...","{'Manufacturer': 'Nutone', 'Item model number'...",B008U9OCVG,,,
9992,All Electronics,"CableCreation 3.5mm Aux Cable, 3.5mm Male to M...",4.6,1185,[►[Incredibly Quality]: Premium zinc alloy cas...,[],9.99,[{'thumb': 'https://m.media-amazon.com/images/...,"[{'title': '3.5mm Nylon Braided Cord', 'url': ...",CableCreation,"[Electronics, Home Audio, Home Audio Accessori...","{'Brand': 'CableCreation', 'Connector Type': '...",B07LCJ4SXG,,,
9993,Automotive,Red Hound Auto Screen Saver 2pc Compatible wit...,3.6,9,"[Easy, Wet Installation! Deluxe kit includes p...",[Deluxe Red Hound Auto Clear Screen Protector:...,19.98,[{'thumb': 'https://m.media-amazon.com/images/...,[],Red Hound Auto,"[Electronics, GPS, Finders & Accessories, GPS ...","{'Manufacturer': 'Red Hound Auto', 'Brand': 'R...",B07YMZ1S51,,,
9994,All Electronics,SZJELEN SP21 2Pin-12Pin Aviation Cable Connect...,4.3,67,[Product Name : 4Pin Waterproof Connector; Mod...,"[Aviation Cable Connector + Socket, IP67 Water...",10.59,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'dc power jack Stripping actual mea...,SZJELEN,"[Electronics, Home Audio, Home Audio Accessori...",{'Package Dimensions': '4.84 x 3.94 x 1.14 inc...,B07C681YVZ,,,
9995,Cell Phones & Accessories,"Artyond Case For 6"" Kindle Paperwhite (10th Ge...",4.6,1095,[[Compatible Model] Designed exclusively for A...,"[Feature:, [Auto Sleep/Wake Feature] [Full Pro...",18.99,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'Fintie Slimshell Case for All-New ...,Artyond,"[Electronics, eBook Readers & Accessories, Cov...",{'Product Dimensions': '6.3 x 5.12 x 0.79 inch...,B0B3TK7C48,,,
9996,All Electronics,UpBright New 6V AC/DC Adapter Compatible with ...,4.5,2,[World Wide Input Voltage 100-240VAC 50/60Hz.O...,[UpBright NEW 6V AC / DC Adapter Compatible wi...,9.99,[{'thumb': 'https://m.media-amazon.com/images/...,[],UPBRIGHT,"[Electronics, Camera & Photo, Accessories, Bat...",{'Item model number': 'Summer Infant 29040 292...,B00P5T4O58,,,
9997,All Electronics,HappyZone Silicone Skin Case Cover for SanDisk...,4.4,405,[Custom made designed for Sandisk Clip Sport 2...,[This item compatible with SanDisk Clip Sport ...,,[{'thumb': 'https://m.media-amazon.com/images/...,[],HappyZone,"[Electronics, Portable Audio & Video, MP3 & MP...","{'Package Dimensions': '6 x 5.4 x 0.6 inches',...",B00LOD3UZ4,,,
9998,All Electronics,2 Pack Apple Earbuds [Apple MFi Certified] Hea...,4.2,56,[],[Headphones],,[{'thumb': 'https://m.media-amazon.com/images/...,[],Hyperian,"[Electronics, Headphones, Earbuds & Accessorie...","{'Product Dimensions': '2 x 0.8 x 0.4 inches',...",B09BZ616M1,,,
9999,All Electronics,Gladiator Joe Monitor Arm/Mount - VESA Adapter...,4.5,110,"[Part of our Dark Series, this elegant Monitor...","[Part of our Dark Series, this elegant Monitor...",29.99,[{'thumb': 'https://m.media-amazon.com/images/...,[{'title': 'Intel NUC Monitor Mount with Human...,Gladiator Joe,"[Electronics, Computers & Accessories, Compute...","{'Brand Name': 'Gladiator Joe', 'Item Weight':...",B07QC94ZZ3,,,


In [8]:
df.iloc[-1]

main_category                                        All Electronics
title              Gladiator Joe Monitor Arm/Mount - VESA Adapter...
average_rating                                                   4.5
rating_number                                                    110
features           [Part of our Dark Series, this elegant Monitor...
description        [Part of our Dark Series, this elegant Monitor...
price                                                          29.99
images             [{'thumb': 'https://m.media-amazon.com/images/...
videos             [{'title': 'Intel NUC Monitor Mount with Human...
store                                                  Gladiator Joe
categories         [Electronics, Computers & Accessories, Compute...
details            {'Brand Name': 'Gladiator Joe', 'Item Weight':...
parent_asin                                               B07QC94ZZ3
bought_together                                                  NaN
subtitle                          

In [11]:
df.iloc[-1].images

[{'thumb': 'https://m.media-amazon.com/images/I/21xceOa15HL._AC_US40_.jpg',
  'large': 'https://m.media-amazon.com/images/I/21xceOa15HL._AC_.jpg',
  'variant': 'MAIN',
  'hi_res': 'https://m.media-amazon.com/images/I/51RmmrYG1iL._AC_SL1500_.jpg'},
 {'thumb': 'https://m.media-amazon.com/images/I/41THPF43FsL._AC_US40_.jpg',
  'large': 'https://m.media-amazon.com/images/I/41THPF43FsL._AC_.jpg',
  'variant': 'PT01',
  'hi_res': 'https://m.media-amazon.com/images/I/512bSEnzQGL._AC_SL1000_.jpg'},
 {'thumb': 'https://m.media-amazon.com/images/I/21ddI8GKwFL._AC_US40_.jpg',
  'large': 'https://m.media-amazon.com/images/I/21ddI8GKwFL._AC_.jpg',
  'variant': 'PT02',
  'hi_res': 'https://m.media-amazon.com/images/I/41klRPJHHWL._AC_SL1500_.jpg'},
 {'thumb': 'https://m.media-amazon.com/images/I/31szAi45SzL._AC_US40_.jpg',
  'large': 'https://m.media-amazon.com/images/I/31szAi45SzL._AC_.jpg',
  'variant': 'PT03',
  'hi_res': 'https://m.media-amazon.com/images/I/71ZIQF2ObSL._AC_SL1500_.jpg'},
 {'thumb
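Each `images` value is a list of dicts keyed by `thumb`, `large`, `variant`, and `hi_res`. For a shopping interface we would probably want one URL per item. A small sketch, assuming the `MAIN` variant is the preferred image and that missing values may show up as an empty list or NaN; `main_image_url` is a hypothetical helper:

```python
from typing import Optional


def main_image_url(images, size: str = "large") -> Optional[str]:
    """Return the URL of the MAIN variant, else the first image that has `size`."""
    if not isinstance(images, list):
        return None  # e.g. NaN for a missing value
    for image in images:
        if image.get("variant") == "MAIN" and image.get(size):
            return image[size]
    for image in images:
        if image.get(size):
            return image[size]
    return None


# First two entries from the output above.
sample_images = [
    {"thumb": "https://m.media-amazon.com/images/I/21xceOa15HL._AC_US40_.jpg",
     "large": "https://m.media-amazon.com/images/I/21xceOa15HL._AC_.jpg",
     "variant": "MAIN",
     "hi_res": "https://m.media-amazon.com/images/I/51RmmrYG1iL._AC_SL1500_.jpg"},
    {"thumb": "https://m.media-amazon.com/images/I/41THPF43FsL._AC_US40_.jpg",
     "large": "https://m.media-amazon.com/images/I/41THPF43FsL._AC_.jpg",
     "variant": "PT01",
     "hi_res": "https://m.media-amazon.com/images/I/512bSEnzQGL._AC_SL1000_.jpg"},
]

url = main_image_url(sample_images)
print(url)  # https://m.media-amazon.com/images/I/21xceOa15HL._AC_.jpg
```

Applied to the frame, this would be `df["images"].apply(main_image_url)`.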

## Model Design 

Managing text-based reviews at this scale could be a challenge, and I'd like to steer clear of LLMs for this effort. We could embed each review and use the embeddings for similarity, but we already have pretty rich item data. Perhaps we ignore the collaborative aspect here and build a shopping interface that:
- surfaces the most popular items and encourages you to add items to your shopping cart for a big discount/promo
- uses clicks and cart items to improve the recommendations and surface new products

This suggests a dynamic aspect to modeling that I haven't encountered before. Is this an opportunity for online learning? What would the inputs and outputs be? How would we train a model to accept a list of items and then emit candidate items?
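One possible starting point, before any learned model: a content-based baseline that takes the cart as input and emits ranked candidates. It falls back to popularity (`rating_number`) when the cart is empty and otherwise ranks by category overlap with the cart. This is only a sketch of the idea, not a trained or online-learning model; the catalog entries, their keys, and the `recommend` function are all hypothetical.

```python
from typing import Dict, List, Set

Catalog = Dict[str, dict]  # item id -> {"rating_number": int, "categories": list of str}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two category sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(cart: List[str], catalog: Catalog, k: int = 3) -> List[str]:
    """Rank items not in the cart: popularity if the cart is empty, else similarity."""
    candidates = [item for item in catalog if item not in cart]

    def popularity(item: str) -> int:
        return catalog[item]["rating_number"]

    if not cart:
        return sorted(candidates, key=popularity, reverse=True)[:k]

    cart_categories = [set(catalog[item]["categories"]) for item in cart if item in catalog]

    def score(item: str):
        categories = set(catalog[item]["categories"])
        similarity = sum(jaccard(categories, c) for c in cart_categories)
        similarity /= max(len(cart_categories), 1)
        return (similarity, popularity(item))  # popularity breaks ties

    return sorted(candidates, key=score, reverse=True)[:k]


# Hypothetical four-item catalog (made-up ids and categories).
catalog = {
    "usb_cable": {"rating_number": 3, "categories": ["Electronics", "Computers & Accessories", "Cables"]},
    "aux_cable": {"rating_number": 1185, "categories": ["Electronics", "Home Audio", "Cables"]},
    "monitor_mount": {"rating_number": 110, "categories": ["Electronics", "Computers & Accessories", "Mounts"]},
    "kindle_case": {"rating_number": 1095, "categories": ["Electronics", "eBook Readers & Accessories", "Covers"]},
}

cold_start = recommend([], catalog, k=2)
with_cart = recommend(["usb_cable"], catalog, k=2)
print(cold_start)  # ['aux_cable', 'kindle_case']
print(with_cart)   # ['aux_cable', 'monitor_mount']
```

Here the input is the list of cart item ids and the output is a ranked list of ids. Clicks could be folded in later by treating them as weaker cart signals, which is where an online-learning update might fit.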