## ABSA aspect extraction on subset (timestamp >= 2023-05-01)
4892 rows

Aspect-Based Sentiment Analysis (ABSA), also known as fine-grained opinion mining, is the task of determining the sentiment of a text with respect to a specific aspect.

1. Identify core text feature aspects via ABSA
2. Changes in aspects over time
3. Store-wise comparison
4. RQ1: Which text feature aspects most strongly influence star ratings?
5. RQ2: Which text feature aspects most strongly predict review helpfulness (binary, 0/1)?

| Feature Group                | What it captures                     | Why it matters              |
| ---------------------------- | ------------------------------------ | --------------------------- |
| **Aspect Sentiment scores**  | Per-aspect polarity & intensity      | Main predictors for RQs     |
| **Aspect Mentions (binary)** | Which aspects are discussed          | Controls for omission bias  |
| **Review metadata**          | Rating, helpful votes, verified      | Targets + confound controls |
| **Product identifiers**      | Category, brand, ASIN                | Clustering / fixed effects  |
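
The first two feature groups in the table can be derived from a long-format table of extracted aspects. The following is a minimal sketch on toy data, not the pipeline used in this notebook: the column names (`review_id`, `aspect`, `sentiment`, `confidence`) and the scoring rule (polarity sign times model confidence) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format ABSA output: one row per (review, aspect) pair.
long_df = pd.DataFrame({
    "review_id": [0, 0, 1, 2],
    "aspect": ["fit", "material", "size", "fit"],
    "sentiment": ["Positive", "Positive", "Negative", "Negative"],
    "confidence": [0.9991, 0.9993, 0.9961, 0.90],
})

# Assumed scoring rule: signed polarity weighted by model confidence.
POLARITY = {"Positive": 1.0, "Neutral": 0.0, "Negative": -1.0}
long_df["score"] = long_df["sentiment"].map(POLARITY) * long_df["confidence"]

# Aspect sentiment scores: mean signed score per review and aspect.
# Aspects a review does not mention stay NaN.
scores = long_df.pivot_table(
    index="review_id", columns="aspect", values="score", aggfunc="mean"
)

# Aspect mentions (binary): 1 if the review mentions the aspect, else 0.
mentions = scores.notna().astype(int).add_prefix("mentions_")

# One row per review, with both feature groups side by side.
features = scores.add_prefix("score_").join(mentions)
print(features)
```

Keeping the mention indicators separate from the scores lets a downstream model distinguish "aspect not discussed" from "aspect discussed with neutral sentiment".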


#### 1. Identify core text feature aspects via ABSA

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../data/joined_data.csv')

In [3]:
subset = df[df['timestamp'] >= '2023-05-01']

In [4]:
subset.shape

(4892, 22)
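
The filter above compares `timestamp` against a string, which behaves as intended only if the column holds ISO-formatted date strings or datetimes. The notebook does not show the column's dtype, so the sketch below uses a toy frame (an assumption, not the real data) to show the same cutoff applied after explicit parsing.

```python
import pandas as pd

# Toy stand-in for joined_data.csv; the real column format is not shown here.
toy = pd.DataFrame({
    "timestamp": ["2023-04-30 23:59:59", "2023-05-01 00:00:00", "2023-06-15 12:30:00"],
    "text": ["a", "b", "c"],
})

# Parse to datetime64 so the comparison is chronological, not lexicographic.
# Unparseable values become NaT and are excluded by the filter.
toy["timestamp"] = pd.to_datetime(toy["timestamp"], errors="coerce")

cutoff = pd.Timestamp("2023-05-01")
toy_subset = toy[toy["timestamp"] >= cutoff]
print(toy_subset.shape)  # (2, 2)
```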

In [5]:
# subset.to_csv('../data/subset.csv', index=False)

#### Aspect Extraction

pyabsa ATEPC (Aspect Term Extraction and Polarity Classification):
- Extracts aspect terms and classifies the sentiment expressed toward each one
- Documentation: https://pyabsa.readthedocs.io/en/latest/0_intro/introduction.html


In [None]:
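# This can take a while when run locally; running it on Colab is recommended.
# The block is wrapped in a string literal so it does not re-run by accident; remove the quotes to execute it.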

'''from pyabsa import AspectTermExtraction as ATEPC, available_checkpoints

# Load the model directly from Hugging Face Hub
aspect_extractor = ATEPC.AspectExtractor(
    'english',          # Can be replaced with a specific checkpoint name or a local file path
    auto_device=True,        # Use GPU/CPU or Auto
    cal_perplexity=False      # Calculate text perplexity
)
texts = subset['text'].tolist()

# Perform end-to-end aspect-based sentiment analysis
result = aspect_extractor.predict(
    texts,
    print_result=True,       # Console Printing
    save_result=False,       # Save results into a json file
    ignore_error=True,       # Exception handling for error cases
    pred_sentiment=True      # Predict sentiment for extracted aspects
)

# The output automatically identifies aspects and their corresponding sentiments:
# {
#   "text": "The user interface is brilliant, but the documentation is a total mess.",
#   "aspect": ["user interface", "documentation"],
#   "position": [[4, 19], [41, 54]],
#   "sentiment": ["Positive", "Negative"],
#   "probability": [[1e-05, 0.0001, 0.9998], [0.9998, 0.0001, 1e-05]],
#   "confidence": [0.9997, 0.9997]
# }
'''

[2025-11-07 00:59:59] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-11-07 00:59:59] (2.4.2) Downloading checkpoint:english 
[2025-11-07 00:59:59] (2.4.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2025-11-07 00:59:59] (2.4.2) Checkpoint already downloaded, skip
[2025-11-07 00:59:59] (2.4.2) Load aspect extractor from checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43
[2025-11-07 00:59:59] (2.4.2) config: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.config
[2025-11-07 00:59:59] (2.4.2) state_dict: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.state_dict


Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.dense.weight', 'mask_predictions.dense.weight', 'mask_predictions.classifier.bias', 'mask_predictions.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'mask_predictions.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.classifier.weight']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
preparing ate inferen

[2025-11-07 01:29:40] (2.4.2) Example 0: I order this short sleeve shirt in white in my usual size small . at < NUM > < NUM > and < NUM > lbs . , it <fit:Positive Confidence:0.9991> I perfectly . I like the v neck with collar and the scoop hemline . the pleat on the back add a classic look . the shirt seem well make and should hold up to numerous wearing and washing . the <material:Positive Confidence:0.9993> be soft and comfortable and the <fit:Positive Confidence:0.9991> be modest yet stylish . I will enjoy wear this top with jean and short in the warm weather . I also think it will work well as a layer under sweater in cool weather . I m glad I order this versatile top .
[2025-11-07 01:29:40] (2.4.2) Example 1: sloppy fit
[2025-11-07 01:29:40] (2.4.2) Example 2: the <size:Negative Confidence:0.9961> be not the size I want . too small !
[2025-11-07 01:29:40] (2.4.2) Example 3: this be very nice band
[2025-11-07 01:29:40] (2.4.2) Example 4: love they ! !
[2025-11-07 01:29:40] (2.4.2) 

In [21]:
ae = pd.DataFrame(result)

In [None]:
ae.head()

In [None]:
# ae.to_csv('../data/aspect_extraction_results.csv', index=False)
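
Each row of `ae` holds list-valued columns (one list entry per extracted aspect), which is awkward for aggregation. A minimal sketch of flattening to one row per aspect follows. It assumes `result` is a list of dicts shaped like the example in the extraction cell's comment; `sample_result` and `review_idx` are hypothetical names used only here.

```python
import pandas as pd

# Hypothetical stand-in for `result`, mirroring the documented output shape.
sample_result = [
    {
        "text": "The user interface is brilliant, but the documentation is a total mess.",
        "aspect": ["user interface", "documentation"],
        "sentiment": ["Positive", "Negative"],
        "confidence": [0.9997, 0.9997],
    },
    # A review for which no aspect term was extracted.
    {"text": "sloppy fit", "aspect": [], "sentiment": [], "confidence": []},
]

# One output row per extracted aspect; review_idx links back to the input order.
rows = [
    {"review_idx": i, "aspect": a, "sentiment": s, "confidence": c}
    for i, rec in enumerate(sample_result)
    for a, s, c in zip(rec["aspect"], rec["sentiment"], rec["confidence"])
]
long_demo = pd.DataFrame(rows, columns=["review_idx", "aspect", "sentiment", "confidence"])
print(long_demo)
```

Reviews without any extracted aspect contribute no rows, so their count is worth tracking separately before modelling.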