<a href="https://colab.research.google.com/github/BTT-Cadence-Design-Systems-2A/AI-Studio-Project/blob/absa/Cadence_2A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Install libraries**

In [7]:
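# Upgrading huggingface_hub past 1.0 conflicts with transformers 4.57.1 (see the resolver error in the output).
# Pin "huggingface_hub<1.0" instead if transformers is needed later in the notebook.
!pip install -U datasets huggingface_hub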

Collecting huggingface_hub
  Using cached huggingface_hub-1.1.2-py3-none-any.whl.metadata (13 kB)
Using cached huggingface_hub-1.1.2-py3-none-any.whl (514 kB)
Installing collected packages: huggingface_hub
  Attempting uninstall: huggingface_hub
    Found existing installation: huggingface-hub 0.36.0
    Uninstalling huggingface-hub-0.36.0:
      Successfully uninstalled huggingface-hub-0.36.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.57.1 requires huggingface-hub<1.0,>=0.34.0, but you have huggingface-hub 1.1.2 which is incompatible.
Successfully installed huggingface_hub-1.1.2


**Imports & config**

In [8]:
import json
import fsspec
from itertools import islice
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("wordnet")
nltk.download("omw-1.4")
nltk.download("punkt_tab")

REPO = "McAuley-Lab/Amazon-Reviews-2023"


CATEGORIES = ["Electronics"]
ALL_CATEGORIES = ["All_Beauty", "Amazon_Fashion", "Appliances", "Arts_Crafts_and_Sewing", "Automotive", "Baby_Products", "Beauty_and_Personal_Care", "Books",
              "CDs_and_Vinyl", "Cell_Phones_and_Accessories", "Clothing_Shoes_and_Jewelry", "Digital_Music", "Electronics", "Gift_Cards", "Grocery_and_Gourmet_Food",
              "Handmade_Products", "Health_and_Household", "Health_and_Personal_Care", "Home_and_Kitchen", "Industrial_and_Scientific",
              "Kindle_Store", "Magazine_Subscriptions", "Movies_and_TV", "Musical_Instruments", "Office_Products", "Patio_Lawn_and_Garden", "Pet_Supplies",
              "Software", "Sports_and_Outdoors", "Subscription_Boxes", "Tools_and_Home_Improvement", "Toys_and_Games", "Video_Games",
              "Unknown"]


N_PER_CAT = 10_000
N_META    = 60_000

pd.set_option("display.max_colwidth", 200)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


**Load & sample each category (streaming) and concatenate**

In [9]:
def stream_jsonl(url: str, limit: int | None = None):
    """
    Stream a JSONL file line by line from Hugging Face.
    Casts non-null 'price' values to str so the mixed-type field has a single type.
    """
    with fsspec.open(url, "rt") as f:
        for idx, line in enumerate(f):
            if limit is not None and idx >= limit:
                break
            obj = json.loads(line)

            # Normalize the mixed-type 'price' field to a string.
            if obj.get("price") is not None:
                obj["price"] = str(obj["price"])

            yield obj


def ensure_asin(df: pd.DataFrame) -> pd.DataFrame:
    """
    Ensure there is an 'asin' column, copying it from the first ID-like column found.
    """
    for cand in ["asin", "parent_asin", "product_id", "item_id", "Parent_ASIN", "ParentAsin"]:
        if cand in df.columns:
            if "asin" not in df.columns:
                df["asin"] = df[cand]
            return df
    if len(df) > 0:
        print("No recognizable ASIN-like key found. Example row:\n", df.head(1).to_dict("records")[0])
    return df


def load_category(category: str, n_reviews: int, n_meta: int):
    """
    Load one category's reviews and meta as DataFrames
    """
    reviews_url = f"hf://datasets/{REPO}/raw/review_categories/{category}.jsonl"
    meta_url    = f"hf://datasets/{REPO}/raw/meta_categories/meta_{category}.jsonl"

    reviews_df = pd.DataFrame(islice(stream_jsonl(reviews_url), n_reviews)).assign(category=category)
    meta_df    = pd.DataFrame(islice(stream_jsonl(meta_url),    n_meta)).assign(category=category)
    return reviews_df, meta_df
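
The streaming pattern above can be tried offline. The sketch below is a minimal stand-in, assuming an in-memory file instead of the Hugging Face URL; `iter_jsonl` and the sample records are hypothetical names and values introduced only for illustration. It shows that `islice` pulls just the requested rows from the generator and that `price` comes out as a string or `None`.

```python
import io
import json
from itertools import islice


def iter_jsonl(handle, limit=None):
    """Yield one dict per non-empty JSONL line from an open text handle."""
    for idx, line in enumerate(handle):
        if limit is not None and idx >= limit:
            break
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # Same normalization as stream_jsonl: non-null prices become strings.
        if obj.get("price") is not None:
            obj["price"] = str(obj["price"])
        yield obj


# Hypothetical sample data standing in for a remote JSONL file.
sample = io.StringIO(
    '{"asin": "A1", "price": 9.99}\n'
    '{"asin": "A2", "price": "from 12.00"}\n'
    '{"asin": "A3", "price": null}\n'
)

# islice stops the generator after two records; the third line is never parsed.
first_two = list(islice(iter_jsonl(sample), 2))
print(first_two)
```

Because the generator is lazy, either `limit` or `islice` alone is enough to cap the number of rows read.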

**Inspect schemas and key columns**

In [10]:
all_reviews, all_meta = [], []

for cat in CATEGORIES:
    r_df, m_df = load_category(cat, n_reviews=N_PER_CAT, n_meta=N_META)
    all_reviews.append(r_df)
    all_meta.append(m_df)

reviews_df = pd.concat(all_reviews, ignore_index=True)
meta_df    = pd.concat(all_meta,    ignore_index=True)

reviews_df = ensure_asin(reviews_df)
meta_df    = ensure_asin(meta_df)


if "asin" in reviews_df:
    reviews_df = reviews_df[reviews_df["asin"].notna()]
if "asin" in meta_df:
    meta_df = meta_df[meta_df["asin"].notna()]

print(f"Loaded rows -> reviews: {len(reviews_df):,} | meta: {len(meta_df):,}")
display(reviews_df.head(2))
display(meta_df.head(2))

print(f"Unique products in reviews: {reviews_df['asin'].nunique():,}")
print(f"Unique products in meta: {meta_df['asin'].nunique():,}")


Loaded rows -> reviews: 10,000 | meta: 60,000


Unnamed: 0,rating,title,text,images,asin,parent_asin,user_id,timestamp,helpful_vote,verified_purchase,category
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline so if you are sensitive/allergic to petroleum products like I am you will want to pass on these. Second: the phone adapter is useless as-is. Mine was...,"[{'small_image_url': 'https://m.media-amazon.com/images/I/71YN+Qk3kCL._SL256_.jpg', 'medium_image_url': 'https://m.media-amazon.com/images/I/71YN+Qk3kCL._SL800_.jpg', 'large_image_url': 'https://m...",B083NRGZMM,B083NRGZMM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1658185117948,0,True,Electronics
1,1.0,Didn’t work at all lenses loose/broken.,"These didn’t work. Idk if they were damaged in shipping or what, but the lenses were loose or something. I could see half a lens with its edge in the frame and the rest was missing. It looked like...",[],B07N69T6TM,B07N69T6TM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1592678549731,0,True,Electronics


Unnamed: 0,main_category,title,average_rating,rating_number,features,description,price,images,videos,store,categories,details,parent_asin,bought_together,subtitle,author,category,asin
0,All Electronics,FS-1051 FATSHARK TELEPORTER V3 HEADSET,3.5,6,[],[Teleporter V3 The “Teleporter V3” kit sets a new level of value in the FPV world with Fat Shark renowned performance and quality. The fun of FPV is experienced firsthand through the large screen ...,,"[{'thumb': 'https://m.media-amazon.com/images/I/41qrX56lsYL._AC_US40_.jpg', 'large': 'https://m.media-amazon.com/images/I/41qrX56lsYL._AC_.jpg', 'variant': 'MAIN', 'hi_res': None}]",[],Fat Shark,"[Electronics, Television & Video, Video Glasses]","{'Date First Available': 'August 2, 2014', 'Manufacturer': 'Fatshark'}",B00MCW7G9M,,,,Electronics,B00MCW7G9M
1,All Electronics,Ce-H22B12-S1 4Kx2K Hdmi 4Port,5.0,1,"[UPC: 662774021904, Weight: 0.600 lbs]",[HDMI In - HDMI Out],,"[{'thumb': 'https://m.media-amazon.com/images/I/31OIMoOW70L._AC_US40_.jpg', 'large': 'https://m.media-amazon.com/images/I/31OIMoOW70L._AC_.jpg', 'variant': 'MAIN', 'hi_res': 'https://m.media-amazo...",[],SIIG,"[Electronics, Television & Video, Accessories, Cables, HDMI Cables]","{'Product Dimensions': '0.83 x 4.17 x 2.05 inches', 'Item Weight': '5.3 ounces', 'Item model number': 'CE-H22B12-S1', 'Is Discontinued By Manufacturer': 'No', 'Date First Available': 'June 3, 2015...",B00YT6XQSE,,,,Electronics,B00YT6XQSE


Unique products in reviews: 8,907
Unique products in meta: 60,000


In [11]:
# print(reviews_df.columns)
# print(meta_df.columns)
# merged = reviews_df.merge(meta_df, on="parent_asin", how="left", suffixes=("_review", "_meta"))
# print(merged)
# print(merged.columns)
# merged.shape

**Merge reviews with metadata (on asin, with a parent_asin fallback)**

In [12]:
meta_keys = {"asin", "parent_asin", "category"}
meta_keep = ["asin", "parent_asin"] + [c for c in meta_df.columns if c not in meta_keys]


m1 = reviews_df.merge(meta_df[meta_keep], on="asin", how="left", suffixes=("_review", "_meta"))


m2 = reviews_df.merge(
    meta_df[meta_keep].rename(columns={"asin": "asin_meta2", "parent_asin": "parent_asin_meta2"}),
    left_on="parent_asin",
    right_on="asin_meta2",
    how="left",
)


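# Start from the asin-based merge and fill missing meta values from the parent_asin-based merge.
# Both are left merges on reviews_df, so rows line up as long as the meta keys are unique.
merged = m1.copy()
for col in meta_keep:
    if col in {"asin", "parent_asin"}:
        continue
    # Columns shared with reviews_df (e.g. title, images) are suffixed in m1/m2 and are skipped here.
    if col in merged.columns and col in m2.columns:
        merged[col] = merged[col].where(merged[col].notna(), m2[col])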


if "asin_meta2" in m2.columns:
    merged["asin_meta_fallback"] = m2["asin_meta2"]

print("Merged shape:", merged.shape)


meta_signal = [c for c in merged.columns if c.endswith("_meta") or c in ["average_rating", "rating_number", "price", "store", "categories", "details", "title", "images", "videos", "main_category"]]
coverage = merged[meta_signal].notna().any(axis=1).mean() if meta_signal else 0.0
print(f"Rows with ANY meta fields: {coverage:.2%}")

display(merged.head(5))

Merged shape: (10000, 28)
Rows with ANY meta fields: 10.98%


Unnamed: 0,rating,title_review,text,images_review,asin,parent_asin_review,user_id,timestamp,helpful_vote,verified_purchase,category,parent_asin_meta,main_category,title_meta,average_rating,rating_number,features,description,price,images_meta,videos,store,categories,details,bought_together,subtitle,author,asin_meta_fallback
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline so if you are sensitive/allergic to petroleum products like I am you will want to pass on these. Second: the phone adapter is useless as-is. Mine was...,"[{'small_image_url': 'https://m.media-amazon.com/images/I/71YN+Qk3kCL._SL256_.jpg', 'medium_image_url': 'https://m.media-amazon.com/images/I/71YN+Qk3kCL._SL800_.jpg', 'large_image_url': 'https://m...",B083NRGZMM,B083NRGZMM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1658185117948,0,True,Electronics,,,,,,,,,,,,,,,,,
1,1.0,Didn’t work at all lenses loose/broken.,"These didn’t work. Idk if they were damaged in shipping or what, but the lenses were loose or something. I could see half a lens with its edge in the frame and the rest was missing. It looked like...",[],B07N69T6TM,B07N69T6TM,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1592678549731,0,True,Electronics,,,,,,,,,,,,,,,,,
2,5.0,Excellent!,"I love these. They even come with a carry case and several sizes of ear bud inserts. Thank heaven! I get ear pain from most, but the smallest buds fit great. They also have a charger and all of ...",[],B01G8JO5F2,B01G8JO5F2,AFKZENTNBQ7A7V7UXW5JJI6UGRYQ,1523093017534,0,True,Electronics,,,,,,,,,,,,,,,,,
3,5.0,Great laptop backpack!,"I was searching for a sturdy backpack for school that would allow me to carry my laptop as well as schoolbooks. After reading many of the reviews of this bag, I placed my order and crossed my fing...",[],B001OC5JKY,B001OC5JKY,AGGZ357AO26RQZVRLGU4D4N52DZQ,1290278495000,18,True,Electronics,,,,,,,,,,,,,,,,,
4,5.0,Best Headphones in the Fifties price range!,I've bought these headphones three times because I love them so much and overuse them and find ways to break the plug to where I can't use my warranty. The sound and bass are awesome. Hands down m...,[],B013J7WUGC,B07CJYMRWM,AG2L7H23R5LLKDKLBEF2Q3L2MVDA,1676601581238,0,True,Electronics,,,,,,,,,,,,,,,,,
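
The low share of rows with metadata can be diagnosed before merging by checking how many review keys appear in the metadata sample at all. The sketch below is one way to do that, assuming plain Python iterables of keys; `key_coverage` and the toy IDs are hypothetical and are not taken from the dataset.

```python
def key_coverage(review_keys, meta_keys):
    """Return the fraction of non-null review keys that also occur in meta_keys."""
    review_list = [k for k in review_keys if k is not None]
    if not review_list:
        return 0.0
    meta_set = set(meta_keys)
    return sum(k in meta_set for k in review_list) / len(review_list)


# Toy data (hypothetical): each review has a child asin and a parent_asin.
toy_reviews = [("C1", "P1"), ("P2", "P2"), ("C3", "P3"), ("P4", "P4")]
toy_meta_parent_asins = ["P1", "P2", "P3", "P9"]

by_asin = key_coverage((asin for asin, _ in toy_reviews), toy_meta_parent_asins)
by_parent = key_coverage((parent for _, parent in toy_reviews), toy_meta_parent_asins)
print(f"coverage by asin: {by_asin:.0%} | by parent_asin: {by_parent:.0%}")
```

In the notebook, the same function could be called as `key_coverage(reviews_df["parent_asin"], meta_df["parent_asin"])`. If that figure is also low, the likely cause is that only the first `N_META` metadata rows were sampled, not a problem with the join itself.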


# **Milestone #1: Sentiment Analysis of a Single Review**


Goal: Keep only the rating, title, and text columns of the reviews DataFrame (plus the category label), then train a model that predicts the rating from the review text.


In [13]:
def load_category_into_review(category: str, n_reviews: int):
    """
    Load one category's reviews as a DataFrame, keeping only rating, title, and text.
    """
    reviews_url = f"hf://datasets/{REPO}/raw/review_categories/{category}.jsonl"

    data = (
        {k: row.get(k) for k in ["rating", "title", "text"]}
        for row in islice(stream_jsonl(reviews_url), n_reviews)
    )

    reviews_df = pd.DataFrame(data).assign(category=category)
    return reviews_df

In [14]:
sentiment_reviews = []

for cat in CATEGORIES:
    r_df = load_category_into_review(cat, n_reviews=N_PER_CAT)
    sentiment_reviews.append(r_df)

reviews_df_milestone1 = pd.concat(sentiment_reviews, ignore_index=True)


print(f"Loaded rows -> reviews: {len(reviews_df_milestone1):,}")
display(reviews_df_milestone1.head(2))

Loaded rows -> reviews: 10,000


Unnamed: 0,rating,title,text,category
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline so if you are sensitive/allergic to petroleum products like I am you will want to pass on these. Second: the phone adapter is useless as-is. Mine was...,Electronics
1,1.0,Didn’t work at all lenses loose/broken.,"These didn’t work. Idk if they were damaged in shipping or what, but the lenses were loose or something. I could see half a lens with its edge in the frame and the rest was missing. It looked like...",Electronics


In [15]:
reviews_df_milestone1.info()
reviews_df_milestone1['rating'].value_counts()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   rating    10000 non-null  float64
 1   title     10000 non-null  object 
 2   text      10000 non-null  object 
 3   category  10000 non-null  object 
dtypes: float64(1), object(3)
memory usage: 312.6+ KB


Unnamed: 0_level_0,count
rating,Unnamed: 1_level_1
5.0,6732
4.0,1537
1.0,735
3.0,633
2.0,363


## Milestone #1: Data Cleaning

In [16]:
reviews_df_milestone1.isna().sum()

Unnamed: 0,0
rating,0
title,0
text,0
category,0


### Text Normalization (removing punctuation)

In [17]:
import string


def remove_punctuation(text: str) -> str:
    """
    Function removes all punctuation from a string
    """
    if not isinstance(text, str):
        return ""
    return text.translate(str.maketrans("", "", string.punctuation))

In [18]:
# Create clean_review and clean_title (lowercased, punctuation removed, whitespace collapsed).
# These two columns will be used during model training.
reviews_df_milestone1['clean_review'] = (
    reviews_df_milestone1['text']
    .str.lower()
    .apply(remove_punctuation)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

reviews_df_milestone1['clean_title'] = (
    reviews_df_milestone1['title']
    .str.lower()
    .apply(remove_punctuation)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
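
Two side effects of `string.punctuation` are visible in the cleaned columns: curly apostrophes survive ("didn’t"), and words joined by a slash are fused ("sensitiveallergic", "loosebroken"). The sketch below is one possible alternative, not the notebook's method: `normalize_text` is a hypothetical helper that drops apostrophes and replaces every other Unicode punctuation or symbol character with a space.

```python
import re
import unicodedata


def normalize_text(text: object) -> str:
    """Lowercase, drop apostrophes, turn other punctuation/symbols into spaces."""
    if not isinstance(text, str):
        return ""
    text = text.lower()
    # Remove ASCII and curly apostrophes so contractions stay one token.
    text = re.sub(r"['’]", "", text)
    # Unicode categories starting with P (punctuation) or S (symbol) become spaces,
    # which keeps "loose/broken" as two words instead of fusing them.
    chars = [" " if unicodedata.category(ch)[0] in "PS" else ch for ch in text]
    # Collapse the runs of whitespace introduced above.
    return re.sub(r"\s+", " ", "".join(chars)).strip()


print(normalize_text("Didn’t work at all lenses loose/broken."))
```

Applied to a column, this would replace the `.str.lower().apply(remove_punctuation)...` chain with a single `.apply(normalize_text)`.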

### Lemmatization of Reviews

In [19]:
lemmatizer = WordNetLemmatizer()

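def lemmatize_text(text: str) -> str:
    """
    Lemmatize each token with WordNet.
    lemmatize() defaults to pos="n", so every token is treated as a noun
    (which is why "was" becomes "wa" and "pass" becomes "pas" in the output).
    """
    if not isinstance(text, str):
        return ""
    tokens = word_tokenize(text)
    lemmas = [lemmatizer.lemmatize(token) for token in tokens]
    return " ".join(lemmas)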

In [20]:
reviews_df_milestone1['lemmatized_review'] = reviews_df_milestone1['clean_review'].apply(lemmatize_text)
reviews_df_milestone1['lemmatized_title'] = reviews_df_milestone1['clean_title'].apply(lemmatize_text)

### Creating Sentiment Labels


In [21]:
def create_sentiment_label(rating: float) -> str:
    """Map a star rating to a sentiment label: 4-5 positive, 3 neutral, 1-2 negative."""
    if rating >= 4:
        return 'positive'
    elif rating <= 2:
        return 'negative'
    else:
        return 'neutral'

In [22]:
reviews_df_milestone1['sentiment_labels'] = (
    reviews_df_milestone1['rating']
    .apply(create_sentiment_label)
)
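
The rating counts printed earlier imply a heavily skewed label distribution once ratings are collapsed into three classes. The sketch below recomputes that from the counts shown in the `value_counts()` output; `to_sentiment` mirrors `create_sentiment_label`, and the weight formula is the common "balanced" heuristic `n_samples / (n_classes * class_count)`, offered here as one option and not as the notebook's chosen approach.

```python
from collections import Counter


def to_sentiment(rating: float) -> str:
    """Same thresholds as create_sentiment_label."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


# Rating counts copied from the value_counts() output above.
rating_counts = {5.0: 6732, 4.0: 1537, 1.0: 735, 3.0: 633, 2.0: 363}

label_counts = Counter()
for rating, n in rating_counts.items():
    label_counts[to_sentiment(rating)] += n

total = sum(label_counts.values())
# Accuracy of a model that always predicts the most frequent label.
majority_baseline = max(label_counts.values()) / total
# "Balanced" class weights: rarer classes get proportionally larger weights.
class_weights = {
    label: total / (len(label_counts) * n) for label, n in label_counts.items()
}

print(dict(label_counts))
print(f"majority baseline accuracy: {majority_baseline:.2%}")
print({label: round(w, 2) for label, w in class_weights.items()})
```

Any classifier trained on these labels should be compared against the majority baseline, and per-class metrics such as macro F1 are likely more informative than plain accuracy.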

In [23]:
reviews_df_milestone1.head()

Unnamed: 0,rating,title,text,category,clean_review,clean_title,lemmatized_review,lemmatized_title,sentiment_labels
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline so if you are sensitive/allergic to petroleum products like I am you will want to pass on these. Second: the phone adapter is useless as-is. Mine was...,Electronics,first most offensive they reek of gasoline so if you are sensitiveallergic to petroleum products like i am you will want to pass on these second the phone adapter is useless asis mine was not dril...,smells like gasoline going back,first most offensive they reek of gasoline so if you are sensitiveallergic to petroleum product like i am you will want to pas on these second the phone adapter is useless asis mine wa not drilled...,smell like gasoline going back,neutral
1,1.0,Didn’t work at all lenses loose/broken.,"These didn’t work. Idk if they were damaged in shipping or what, but the lenses were loose or something. I could see half a lens with its edge in the frame and the rest was missing. It looked like...",Electronics,these didn’t work idk if they were damaged in shipping or what but the lenses were loose or something i could see half a lens with its edge in the frame and the rest was missing it looked like it ...,didn’t work at all lenses loosebroken,these didn ’ t work idk if they were damaged in shipping or what but the lens were loose or something i could see half a lens with it edge in the frame and the rest wa missing it looked like it ca...,didn ’ t work at all lens loosebroken,negative
2,5.0,Excellent!,"I love these. They even come with a carry case and several sizes of ear bud inserts. Thank heaven! I get ear pain from most, but the smallest buds fit great. They also have a charger and all of ...",Electronics,i love these they even come with a carry case and several sizes of ear bud inserts thank heaven i get ear pain from most but the smallest buds fit great they also have a charger and all of it fits...,excellent,i love these they even come with a carry case and several size of ear bud insert thank heaven i get ear pain from most but the smallest bud fit great they also have a charger and all of it fit in ...,excellent,positive
3,5.0,Great laptop backpack!,"I was searching for a sturdy backpack for school that would allow me to carry my laptop as well as schoolbooks. After reading many of the reviews of this bag, I placed my order and crossed my fing...",Electronics,i was searching for a sturdy backpack for school that would allow me to carry my laptop as well as schoolbooks after reading many of the reviews of this bag i placed my order and crossed my finger...,great laptop backpack,i wa searching for a sturdy backpack for school that would allow me to carry my laptop a well a schoolbook after reading many of the review of this bag i placed my order and crossed my finger i sh...,great laptop backpack,positive
4,5.0,Best Headphones in the Fifties price range!,I've bought these headphones three times because I love them so much and overuse them and find ways to break the plug to where I can't use my warranty. The sound and bass are awesome. Hands down m...,Electronics,ive bought these headphones three times because i love them so much and overuse them and find ways to break the plug to where i cant use my warranty the sound and bass are awesome hands down my fa...,best headphones in the fifties price range,ive bought these headphone three time because i love them so much and overuse them and find way to break the plug to where i cant use my warranty the sound and bass are awesome hand down my favori...,best headphone in the fifty price range,positive


### Tokenization of Reviews


In [24]:
# documents = reviews_df_milestone1['clean_review'].tolist()

In [25]:
# vectorizer = TfidfVectorizer(
#     stop_words="english",   # remove english stopwords like this, a, the, etc
#     # max_features=5000,      # keep top 5000 words (tune this)
# )
# X = vectorizer.fit_transform(documents)

In [26]:
# print(f"Vocabulary size: {len(vectorizer.vocabulary_)}")

# df_tfidf = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
# df_tfidf.head()

In [27]:
from nltk.tokenize import word_tokenize
reviews_df_milestone1['tokenized_review'] = reviews_df_milestone1['clean_review'].apply(word_tokenize)
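
The token lists can be turned into weighted features without leaving the tokenized representation. The sketch below is a plain-Python illustration of TF-IDF weighting with smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, which is the weighting scikit-learn's `TfidfVectorizer` uses by default. It does not reproduce that class's tokenization or stop-word handling, and `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs_tokens):
    """Return one {token: weight} dict per document (smoothed IDF, L2-normalized)."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in docs_tokens for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1.0 for tok, d in df.items()}

    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        weights = {tok: count * idf[tok] for tok, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents yield an empty vector instead of dividing by zero.
        vectors.append({tok: w / norm for tok, w in weights.items()} if norm else {})
    return vectors


# Toy token lists standing in for reviews_df_milestone1['tokenized_review'].
toy_docs = [["great", "sound"], ["great", "case"], ["broken", "lens"]]
vectors = tfidf_vectors(toy_docs)
print(vectors[0])
```

In the first toy document, "sound" receives a higher weight than "great" because "great" also occurs in another document.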

In [28]:
reviews_df_milestone1.head(5)

Unnamed: 0,rating,title,text,category,clean_review,clean_title,lemmatized_review,lemmatized_title,sentiment_labels,tokenized_review
0,3.0,Smells like gasoline! Going back!,First & most offensive: they reek of gasoline so if you are sensitive/allergic to petroleum products like I am you will want to pass on these. Second: the phone adapter is useless as-is. Mine was...,Electronics,first most offensive they reek of gasoline so if you are sensitiveallergic to petroleum products like i am you will want to pass on these second the phone adapter is useless asis mine was not dril...,smells like gasoline going back,first most offensive they reek of gasoline so if you are sensitiveallergic to petroleum product like i am you will want to pas on these second the phone adapter is useless asis mine wa not drilled...,smell like gasoline going back,neutral,"[first, most, offensive, they, reek, of, gasoline, so, if, you, are, sensitiveallergic, to, petroleum, products, like, i, am, you, will, want, to, pass, on, these, second, the, phone, adapter, is,..."
1,1.0,Didn’t work at all lenses loose/broken.,"These didn’t work. Idk if they were damaged in shipping or what, but the lenses were loose or something. I could see half a lens with its edge in the frame and the rest was missing. It looked like...",Electronics,these didn’t work idk if they were damaged in shipping or what but the lenses were loose or something i could see half a lens with its edge in the frame and the rest was missing it looked like it ...,didn’t work at all lenses loosebroken,these didn ’ t work idk if they were damaged in shipping or what but the lens were loose or something i could see half a lens with it edge in the frame and the rest wa missing it looked like it ca...,didn ’ t work at all lens loosebroken,negative,"[these, didn, ’, t, work, idk, if, they, were, damaged, in, shipping, or, what, but, the, lenses, were, loose, or, something, i, could, see, half, a, lens, with, its, edge, in, the, frame, and, th..."
2,5.0,Excellent!,"I love these. They even come with a carry case and several sizes of ear bud inserts. Thank heaven! I get ear pain from most, but the smallest buds fit great. They also have a charger and all of ...",Electronics,i love these they even come with a carry case and several sizes of ear bud inserts thank heaven i get ear pain from most but the smallest buds fit great they also have a charger and all of it fits...,excellent,i love these they even come with a carry case and several size of ear bud insert thank heaven i get ear pain from most but the smallest bud fit great they also have a charger and all of it fit in ...,excellent,positive,"[i, love, these, they, even, come, with, a, carry, case, and, several, sizes, of, ear, bud, inserts, thank, heaven, i, get, ear, pain, from, most, but, the, smallest, buds, fit, great, they, also,..."
3,5.0,Great laptop backpack!,"I was searching for a sturdy backpack for school that would allow me to carry my laptop as well as schoolbooks. After reading many of the reviews of this bag, I placed my order and crossed my fing...",Electronics,i was searching for a sturdy backpack for school that would allow me to carry my laptop as well as schoolbooks after reading many of the reviews of this bag i placed my order and crossed my finger...,great laptop backpack,i wa searching for a sturdy backpack for school that would allow me to carry my laptop a well a schoolbook after reading many of the review of this bag i placed my order and crossed my finger i sh...,great laptop backpack,positive,"[i, was, searching, for, a, sturdy, backpack, for, school, that, would, allow, me, to, carry, my, laptop, as, well, as, schoolbooks, after, reading, many, of, the, reviews, of, this, bag, i, place..."
4,5.0,Best Headphones in the Fifties price range!,I've bought these headphones three times because I love them so much and overuse them and find ways to break the plug to where I can't use my warranty. The sound and bass are awesome. Hands down m...,Electronics,ive bought these headphones three times because i love them so much and overuse them and find ways to break the plug to where i cant use my warranty the sound and bass are awesome hands down my fa...,best headphones in the fifties price range,ive bought these headphone three time because i love them so much and overuse them and find way to break the plug to where i cant use my warranty the sound and bass are awesome hand down my favori...,best headphone in the fifty price range,positive,"[ive, bought, these, headphones, three, times, because, i, love, them, so, much, and, overuse, them, and, find, ways, to, break, the, plug, to, where, i, cant, use, my, warranty, the, sound, and, ..."


## Aspect-Based Sentiment Analysis

In [29]:
!pip install pyabsa

Collecting huggingface-hub<1.0,>=0.34.0 (from transformers>=4.18.0->pyabsa)
  Using cached huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB)
Using cached huggingface_hub-0.36.0-py3-none-any.whl (566 kB)
Installing collected packages: huggingface-hub
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface_hub 1.1.2
    Uninstalling huggingface_hub-1.1.2:
      Successfully uninstalled huggingface_hub-1.1.2
Successfully installed huggingface-hub-0.36.0


In [38]:
from pyabsa import AspectTermExtraction as ATEPC

In [37]:
aspect_extractor = ATEPC.AspectExtractor('multilingual', auto_device=True)

[2025-11-10 05:32:24] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-11-10 05:32:24] (2.4.2) Downloading checkpoint:multilingual 
[2025-11-10 05:32:24] (2.4.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2025-11-10 05:32:24] (2.4.2) Checkpoint already downloaded, skip
[2025-11-10 05:32:24] (2.4.2) Load aspect extractor from checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT
[2025-11-10 05:32:24] (2.4.2) config: checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.config
[2025-11-10 05:32:24] (2.4.2) state_dict: checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.state_dict
[2025-11-10 05:32:24] (2.4.2) model: None
[2025-11-10 05:32:24] (2.4.2) tokenizer: checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.tokenizer
[2025-11-10 05:32:25] 



In [39]:
all_reviews = reviews_df_milestone1['clean_review'].tolist()

In [40]:
all_results = []

# PyABSA 2.x exposes predict() on AspectExtractor; there is no extract() method.
for review in all_reviews:
    result = aspect_extractor.predict(review, print_result=True, save_result=False)
    all_results.append(result)

In [35]:
all_results[1]
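
Once per-review results are available, they can be rolled up into sentiment counts per aspect. The sketch below assumes each result is a dict with parallel `"aspect"` and `"sentiment"` lists; that shape is an assumption about the extractor's output and should be checked against a real result before use. `summarize_aspects` and the mock results are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_aspects(results):
    """Count sentiment labels per lowercased aspect term across all results."""
    summary = defaultdict(Counter)
    for res in results:
        # Skip failed or empty predictions.
        if not isinstance(res, dict):
            continue
        # Assumed shape: parallel lists under "aspect" and "sentiment".
        for aspect, sentiment in zip(res.get("aspect", []), res.get("sentiment", [])):
            summary[aspect.lower()][sentiment] += 1
    return {aspect: dict(counts) for aspect, counts in summary.items()}


# Mock results in the assumed shape; not real model output.
mock_results = [
    {"aspect": ["sound", "plug"], "sentiment": ["Positive", "Negative"]},
    {"aspect": ["Sound"], "sentiment": ["Positive"]},
    None,
]

aspect_summary = summarize_aspects(mock_results)
print(aspect_summary)
```

Lowercasing the aspect terms merges "Sound" and "sound". A real pipeline would probably also need lemmatization or synonym grouping to merge closely related aspects.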