**Amazon Online Review Classification: Identifying Reviews of Electronic/Computer Products and Clothing/Fashion Products**

This notebook assigns each review to one of three categories:

1. Electronic/Computer Products: Reviews discussing electronic or computer-related products available on Amazon.
2. Clothing/Fashion-Related Products: Reviews discussing clothing or fashionwear products available on Amazon.
3. Neutral: Reviews that are not specifically related to either of the above categories.

Dataset: The dataset used for this analysis will consist of user reviews from Amazon's online platform. It will contain text data along with corresponding labels indicating the topic category (electronic/computer, clothing/fashionwear, or neutral).

Expected Outcomes: By the end of this notebook, we aim to have a trained and validated model capable of accurately classifying user reviews into the specified categories. This information will provide valuable insights into the customer preferences and sentiments towards electronic/computer products and clothing/fashion products on Amazon's platform.

Note: The code and methodologies presented in this notebook can be adapted to analyze other kinds of bias in a given dataset.

In [1]:
import pandas as pd
import numpy as np

# data_link = https://www.kaggle.com/datasets/kritanjalijain/amazon-reviews?select=test.csv
# download and place the csv file in the data folder
data = pd.read_csv("data/amazon_test.csv")
# data.rename(columns={"review" : "text"}, inplace=True)
data = data[["text"]]
data = data.sample(n = 35_000)
data.index = np.arange(len(data))
# Strip @mentions and the symbols #, &, ! (raw string avoids an invalid-escape warning)
data['text'] = data['text'].replace(regex=r'(@\w+)|#|&|!', value='')
# Strip URLs
data['text'] = data['text'].replace(regex=r'http\S+', value='')

# Keep only reviews with at least three words
data = data[data['text'].str.split().str.len() >= 3]
data

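The cleaning steps in the cell above can be tried without downloading the Kaggle file. The following is a minimal sketch on a hand-made DataFrame; the helper name `clean_reviews`, its parameters, and the toy reviews are illustrative assumptions and are not part of the notebook's `src` package.

```python
import pandas as pd

def clean_reviews(df: pd.DataFrame, column: str = "text", min_words: int = 3) -> pd.DataFrame:
    """Strip mentions, symbols, and URLs, then drop very short reviews.

    Hypothetical helper that mirrors the preprocessing cell above.
    """
    out = df[[column]].copy()
    # Remove @mentions and the symbols #, &, !
    out[column] = out[column].str.replace(r"@\w+|[#&!]", "", regex=True)
    # Remove URLs
    out[column] = out[column].str.replace(r"http\S+", "", regex=True)
    # Keep reviews with at least `min_words` whitespace-separated tokens
    keep = out[column].str.split().str.len() >= min_words
    return out[keep].reset_index(drop=True)

# Toy data (made up for illustration)
toy = pd.DataFrame({"text": [
    "Great phone! See http://example.com/a&b for details",
    "Love it",
    "@seller the #camera works well",
]})
cleaned = clean_reviews(toy)
print(cleaned["text"].tolist())
```

The two-word review is dropped, and the remaining rows no longer contain mentions, URLs, or the stripped symbols.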

In [10]:
from src.wordbiases import CalculateWordBias
from src.model import LikelihoodModelForNormalDist

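# Seed words that define the two topics
target_set_1 = ["electronic", "tech", "computer", "mobile"]
target_set_2 = ["cloth", "shirt", "jeans", "fabric"]
# Universal POS tag names (adjective, noun, pronoun) passed to CalculateWordBias
F = ["ADJ", "NOUN", "PRON"]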
wbcalc = CalculateWordBias(target_set_1, target_set_2, F, computing_device="cuda")
wbcalc.process_documents(data, "text")
c1, c2 = wbcalc.calculate_target_embeddings()

wbiases, _ , biased_words = wbcalc.calculate_biases()
total_pop = [b for _, b in biased_words]

mu = np.mean(total_pop)
sigma = np.std(total_pop)

# TODO : Find a way to compute or estimate t1 or t2
likelihood_clf = LikelihoodModelForNormalDist(0.05, 0.95, threshold_limit=0)
likelihood_clf.fit_total_pop(total_pop)

preds = likelihood_clf.predict(wbiases)[0]
data["prclass"] = preds

100%|██████████| 25150/25150 [03:38<00:00, 115.16it/s]
100%|██████████| 25150/25150 [00:11<00:00, 2235.20it/s]
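The internals of `CalculateWordBias` and `LikelihoodModelForNormalDist` are not shown in this notebook, so the following is only a sketch of one plausible reading of the cell above, not the actual implementation. It assumes that a bias score is the difference in cosine similarity to the two target embeddings, and that the values `0.05` and `0.95` act as quantiles of a normal distribution fitted to the score population. The function names, the mapping of tails to class ids, and the toy vectors are all assumptions.

```python
import numpy as np
from statistics import NormalDist

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_score(vec, c1, c2):
    """Positive when `vec` is closer to c1, negative when closer to c2 (assumed definition)."""
    return cosine(vec, c1) - cosine(vec, c2)

def classify(scores, population, lower_q=0.05, upper_q=0.95):
    """Label scores as 0 (upper tail), 1 (lower tail), or 2 (in between).

    The cut-offs are quantiles of a normal distribution fitted to `population`.
    Which tail maps to which class id is an assumption.
    """
    dist = NormalDist(mu=float(np.mean(population)), sigma=float(np.std(population)))
    lo, hi = dist.inv_cdf(lower_q), dist.inv_cdf(upper_q)
    scores = np.asarray(scores, dtype=float)
    return np.where(scores >= hi, 0, np.where(scores <= lo, 1, 2))

# Toy 2-D "embeddings" (made up): c1 stands in for electronics, c2 for clothing
c1, c2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
docs = [np.array([1.0, 0.05]), np.array([0.05, 1.0]), np.array([1.0, 1.0])]
scores = [bias_score(d, c1, c2) for d in docs]

# Toy score population centred on zero
population = np.linspace(-0.3, 0.3, 101)
labels = classify(scores, population)
print(labels.tolist())
```

Under these assumptions, a document that sits close to one target embedding falls in a tail of the fitted distribution and receives that topic's label, while a document equally close to both stays in the neutral class.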


Reviews for Electronic products

In [11]:
for s in data[data.prclass == 0].text:
    print(s)

VHF Hand held Radio
The Flybar 800
300 meets Gladiator
The Ruination of Sleep
Great New Band
Must Have for NOMEANSNO Fans
THE VIDEO is not working for Skype for my Windows....
Great Book on Islam  End Times
Low Price, Great Game
Not so Universal
Good Relief for Old Folks
Buy Tissues with the CD
Fun pop experience
Does not fit Aprilaire 760
DVD Box set
Bionic Women's Rose Gloves
Canon Powershot S2 IS 5MP Digital Camera
Bait and switch
Ana Voog Needs To Put Out A New Album
Will not cut
Talk about dissapointing
do not buy
great phone, but wireless network killer
Cry, Laugh and Educate
The Nrsv Bible Cross Reference Edition with Apocrypha
A Travel Guide to Israel?
TIME IS SLIPPING AWAY
Great Movie, Bad Unbox...
Sample Contract is Worth the Price of the Book
Love my Kindle
Amazing Jimi Hendrix
Definitely a MUST
The Heart of A Chief
Deliberate and Majestic
Un mal disco
Lisa Pavelka is Amazing
A Great Guide and Resource
The Alternative to Ye Olde Spanking...
Review for Frank Lloyd Wright's Fa

Reviews for Clothing products

In [12]:
for s in data[data.prclass == 1].text:
    print(s)


Just use a piece of rope
Best shoes ever
Hymns with style
Thin bottom does not heat evenly
The Wind Done Gone
Love the color....
Ugly, Ugly, Ugly
how now brown cow
okay...but a lot of unnecessary material
The boy in the striped pyjamas
Poor Quality Collars
not really waterproof, not really winter plus
Wonderfully touching and inspiring
GET THESE HAMSTERS
Great Stuff...a must to keep wound dry
Sizing is way off
Good...until the rivets snap
Fasten Your Seatbelts
Covers don't quite measure up
Fits like a glove
attractive bag...poor workmanship
Arthrosoothe does soothe
Pretty thin Fatboy
Cute comfy shoes in PINK
Best blanket I've found
Amulents and Talismans - By Migene Gonzales-Wippler
Beware of mold
Make any bag a diaper bag
Ineffective sealing, rubber seal slips
The chimes just play for curtain hours???
Perfect Bridesmaid dress at a great cost
Slaves to the Rhythm DVD
Shower curtain hooks
Quarter inch mesh
Great Running Sunglasses
Makes a mess
This isn't Star Trek material
Nice cover - 

Reviews for other, unrelated products

In [13]:
for s in data[data.prclass == 2].text:
    print(s)

Not what I was looking for
The truth and nothing but the truth.....
Not worth it - taped @ extended/super-long play  inaudible.
The Best Murder Mystery Ever
Great toy for active kids
A Great Case for a cheap Phone, or its backwards?
Right up there with a Time to Kill
A five-star compilation of stories to rank among Bass' best.
Wonderful For Family and Friends
Super photos and descriptions
Not happy with this bug zapper
Thoughtful Alternate History
A Mainstay in My Kitchen
ASTONISHINGLY GORY AND INCREDIBLY FUNNY A MUST
Decent design, terrible materials
Don't buy from this business.
The place to start for spiritual disciplines
Didn't work for me
Brides head revisited
Boring and slow
Absolutely the worst actors in a kung fu movie
Was broken by 9 months
finally we sleep
Have loved this stuff for more than 25 years
baby must have
Very Good Entertainment
Great lift the flap book
Pricey good eatin'
Coogan = Funny
Knob stuck, uneven cooking.
Perfect for any mother
SO SORRY I GOT THIS
Wonderful
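Printing every review per class produces long listings that are hard to compare. A compact alternative is to count the predictions per class and show a few examples of each. The sketch below assumes the class ids used in the cells above (0 for electronic/computer, 1 for clothing/fashion, 2 for neutral); the helper `summarize_predictions` and the `CLASS_NAMES` mapping are hypothetical names, and the toy rows are taken from the printed outputs.

```python
import pandas as pd

# Class ids as used in the cells above
CLASS_NAMES = {0: "electronic/computer", 1: "clothing/fashion", 2: "neutral"}

def summarize_predictions(df, class_col="prclass", text_col="text", n_examples=2):
    """Return per-class counts and the first few example reviews of each class."""
    counts = df[class_col].map(CLASS_NAMES).value_counts()
    examples = {
        CLASS_NAMES[c]: grp[text_col].head(n_examples).tolist()
        for c, grp in df.groupby(class_col)
    }
    return counts, examples

# A handful of rows copied from the outputs above
toy = pd.DataFrame({
    "text": [
        "Love my Kindle",
        "Best shoes ever",
        "Boring and slow",
        "Sizing is way off",
        "Canon Powershot S2 IS 5MP Digital Camera",
    ],
    "prclass": [0, 1, 2, 1, 0],
})
counts, examples = summarize_predictions(toy)
print(counts)
print(examples)
```

On the full notebook data, the same call would be `summarize_predictions(data)`.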