## Experimental Design Document (English): The Impact of Context on Sentiment Perception

### 🎯 Objective

This experiment investigates whether the perceived sentiment of an identical review sentence changes significantly under different contextual conditions. We manipulate the context prepended to the sentence and the sentiment-bearing phrase appended to it, then observe how the models' sentiment predictions shift.

---

### 📊 Dataset Size and Composition

**Data Source**: Amazon Reviews 2023 dataset (Hou et al., 2024) from [McAuley-Lab/Amazon-Reviews-2023](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023)

**Reference**:


@article{hou2024bridging,
  title={Bridging Language and Items for Retrieval and Recommendation},
  author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
  journal={arXiv preprint arXiv:2403.03952},
  year={2024}
}

---



In [1]:
CATEGORY = "Movies_and_TV"

In [3]:
from DataLoader import AmazonReviews

amazon_reviews = AmazonReviews()

movies_tv_meta = amazon_reviews.get_meta_data(CATEGORY)

movies_tv_meta

Dataset({
    features: ['main_category', 'title', 'average_rating', 'rating_number', 'features', 'description', 'price', 'images', 'videos', 'store', 'categories', 'details', 'parent_asin', 'bought_together', 'subtitle', 'author'],
    num_rows: 748224
})

### Sample Filtering / Stratified Sampling / Category Balancing

#### `Movie`

To reduce bias from the inherent sentiment tendencies of the source material, we used stratified sampling to select 20 movies from each of three rating strata: 1.0–2.5 (low-rated), 3.0–4.0 (medium-rated), and 4.5–5.0 (high-rated), for 60 movies in total. We also favored diversity in content categories and included only movies with more than 500 ratings (`rating_number > 500`), so that each movie is backed by sufficient data.

Select 60 movies (20 per rating group) with the following criteria:
1. Each movie has `rating_number` > 500.
2. The selected movies cover as many different categories as possible.
3. `average_rating` falls into one of three groups: 1.0–2.5 / 3.0–4.0 / 4.5–5.0.
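
The criteria above can be sketched as follows. This is a minimal illustration, not the actual `MovieSelector` implementation, whose internals are not shown here. The function names, the greedy "prefer unseen categories" heuristic, and the fixed random seed are all assumptions; only the field names (`average_rating`, `rating_number`, `categories`) come from the metadata schema.

```python
import random
from collections import defaultdict

# Rating strata taken from the selection criteria; the names are illustrative.
STRATA = {"low": (1.0, 2.5), "medium": (3.0, 4.0), "high": (4.5, 5.0)}


def stratum_of(rating):
    """Return the stratum name for a rating, or None if it falls in a gap."""
    for name, (lo, hi) in STRATA.items():
        if lo <= rating <= hi:
            return name
    return None


def select_stratified(movies, per_stratum=20, min_ratings=500, seed=0):
    """Pick up to `per_stratum` movies per stratum, preferring unseen categories."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for movie in movies:
        name = stratum_of(movie["average_rating"])
        if name is not None and movie["rating_number"] > min_ratings:
            buckets[name].append(movie)

    selected = {}
    for name in STRATA:
        pool = buckets[name][:]
        rng.shuffle(pool)
        seen_categories = set()
        picked = []  # indices into pool
        # First pass: take movies that contribute at least one new category.
        for i, movie in enumerate(pool):
            if len(picked) == per_stratum:
                break
            cats = set(movie.get("categories") or [])
            if cats - seen_categories:
                picked.append(i)
                seen_categories |= cats
        # Second pass: fill any remaining slots with the leftover movies.
        for i in range(len(pool)):
            if len(picked) == per_stratum:
                break
            if i not in picked:
                picked.append(i)
        selected[name] = [pool[i] for i in picked]
    return selected
```

Ratings that fall between strata (for example 2.8) are dropped by design, which matches the gaps in the ranges listed above.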

In [3]:
from MoviesSelector import MovieSelector
selector = MovieSelector()
selected_movies = selector.select_movies(
    save_path=f"{CATEGORY}_selected.json", 
    numbers_of_each_rating=20)

selected_movies

Loading Movies_and_TV dataset...
Filtering data...
Number of movies with rating_number > 500: 170279
Low rating group (1.0-2.5): 101 movies
Medium rating group (3.0-4.0): 9204 movies
High rating group (4.5-5.0): 134843 movies
Selected 20 movies from low group
Selected 20 movies from medium group
Selected 20 movies from high group
Saved 60 movies data to Movies_and_TV_selected.json

=== Selection Results Summary ===
Total selected: 60 movies

Low rating group (1.0-2.5): 20 movies
  - Battle Earth (Rating: 2.3, Reviews: 709, Categories: Movies & TV, Independently Distributed, Action & Adventure)
  - The Neon Demon (Rating: 2.5, Reviews: 2963, Categories: Suspense, Horror, Downbeat)
  - Battle Earth (Rating: 2.3, Reviews: 709, Categories: Science Fiction, Action, Military and War)
  - Borat’s American Lockdown & Debunking Borat (Rating: 2.3, Reviews: 837, Categories: Comedy, Talk Show and Variety, Documentary)
  - None (Rating: 2.3, Reviews: 836, Categories: )
  - None (Rating: 2.0, Revie

[{'main_category': 'Movies & TV',
  'title': 'Battle Earth',
  'average_rating': 2.3,
  'rating_number': 709,
  'features': [],
  'description': ["Confirmation of extraterrestrial life appears on television screens across the world as a massive spacecraft breaks through the atmosphere on a crash course into the Atlantic Ocean. A young paramedic, Greg Baker, signs up to fight for his planet against the invaders. Baker joins Special Forces members as the squad medic as they escort a classified package by chopper over enemy territory. When their chopper is shot down they find themselves surrounded and outnumbered. Desperate to return to his wife's side and haunted by nightmares, Baker find that his role in the war has quickly become much larger than he could have ever imagined. The mysterious package may be the key to turning the tide of the war, and possibly to saving all of humanity, but Baker must decide whether to protect it or sacrifice it to ensure his own survival."],
  'price': '4

In [2]:
from MovieStatsReporter import MovieAnalysisReport

movie_stats_reporter = MovieAnalysisReport(
    json_file=f"{CATEGORY}_selected.json",
    output_file="Report/Selected-Movies-StatsReport.txt"
)
movie_stats_reporter.main()

Generating movie selection analysis report...
Report generated and saved to: Report/Selected-Movies-StatsReport.txt

Report Preview:
MOVIE SELECTION ANALYSIS REPORT

📋 SELECTION CRITERIA SUMMARY
--------------------------------------------------
✓ Total Movies Selected: 60
✓ Rating Number Threshold: > 500
✓ Rating Distribution Target:
  • low (1.0-2.5): 20 movies
  • medium (3.0-4.0): 20 movies
  • high (4.5-5.0): 20 movies
✓ Additional Criteria:
  • Maximum category diversity
  • Balanced platform representation
  • Quality data validation

⭐ RATING DISTRIBUTION ANALYSIS
--------------------------------------------------

Low (1.0-2.5): 20 movies
  Average: 2.28
  Range: 2.0 - 2.5
  • Battle Earth (2.3)
  • The Neon Demon (2.5)
  • Battle Earth (2.3)
  • Borat’s American Lockdown & Debunking Borat (2.3)
  • None (2.3)
  • None (2.0)
  • None (2.0)
  • None (2.0)
  • None (2.5)
  • None (2.3)
  • None (2.4)
  • None (2.5)
  • None (2.0)
  • None (2.5)
  • None (2.3)
  • None (2.2)
  • 

#### `Reviews`

In [4]:
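# Earlier export step, kept for reference:
# movies_tv_data = amazon_reviews.get_raw_data(CATEGORY)
# movies_tv_data.save_to_disk("movie_tv_reviews_dataset")
from datasets import load_dataset

# Note: a directory written by save_to_disk is normally reloaded with datasets.load_from_disk.
movies_tv_data = load_dataset("movie_tv_reviews_dataset", split="train")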

movies_tv_data




Dataset({
    features: ['rating', 'title', 'text', 'images', 'asin', 'parent_asin', 'user_id', 'timestamp', 'helpful_vote', 'verified_purchase'],
    num_rows: 17328314
})

In [5]:
from MovieReviewSelector import MovieReviewSelector

selector = MovieReviewSelector(reviews_dataset=movies_tv_data, min_word_count=30, max_word_count=100)

selected_reviews = selector.run_selection_process(
    reviews_per_movie=50,
    output_path=f"{CATEGORY}_reviews_selected.json")


import json
with open(f'{CATEGORY}_reviews_selected.json', 'r', encoding='utf-8') as f:
    selected_reviews = json.load(f)

selected_reviews

=== Start Optimized Movie Review Selection Process ===
Number of target movies: 60
Target reviews per movie: 50
Selection criteria:
- Review word count: 30-100 words
- Rating distribution: Stratified sampling for even distribution
- Using batch processing for improved performance
--------------------------------------------------
=== Starting optimized batch processing ===
Preprocessing dataset in batch...
Preprocessing completed. Found reviews for 51 movies.
Selected 5 reviews for B00BTFK07I
Selected 7 reviews for B01GU8CL5C
Selected 21 reviews for B00DKIH7PU
Selected 34 reviews for B093JXP1G7
Selected 4 reviews for B093K6VPXX
Selected 29 reviews for B01I24FZ1Y
Selected 41 reviews for B00OHCJ0AS
Selected 1 reviews for B018HJYM3G
Selected 6 reviews for B00RHRJKWS
Selected 2 reviews for B09KNV2HCH
Selected 32 reviews for B00QROFP4E
Selected 39 reviews for B00RHSZ2EM
Selected 50 reviews for B01GU87FNA
Selected 5 reviews for B00RHRJR76
Selected 45 reviews for B00DKGWLGM
Selected 20 review

{'selection_criteria': {'word_count_range': '30-100 words',
  'reviews_per_movie': 30,
  'sampling_method': 'stratified by rating',
  'total_movies': 60},
 'selected_reviews': {'B00BTFK07I': [{'rating': 1.0,
    'title': "BAD bad bad bad movie. Don't bother.",
    'text': "Horrible movie. No plot. Just a bunch of guys running around in the woods, mostly at night. The cover of the DVD shows an epic in a bombed city. None of this is in the movie. Like I said, it's just several bad-acting guys reading nonsense unemotional dialog in the woods. Terrible camera work, it made me dizzy at times as the camera man couldn't keep anything steady.",
    'images': [],
    'asin': 'B00BTFK07I',
    'parent_asin': 'B00BTFK07I',
    'user_id': 'AFXECLEYOOZRFCNLTZOLXYFK5USA',
    'timestamp': 1449359496000,
    'helpful_vote': 0,
    'verified_purchase': False,
    'word_count': 68},
   {'rating': 1.0,
    'title': 'You',
    'text': "Don't,just don't.No matter how much you payed for it,it's too much.ev

In [6]:
from MovieReviewReporter import MovieReviewAnalysisReport

review_reporter = MovieReviewAnalysisReport(
    json_file=f"{CATEGORY}_reviews_selected.json",
    output_file="Report/Selected-Movies-Reviews-StatsReport.txt"
)
review_reporter.main()

with open('Report/Selected-Movies-Reviews-StatsReport.txt', 'r', encoding='utf-8') as f:
    print(f.read())

Loading review data...
Generating statistical report...
Report generated and saved to: Report/Selected-Movies-Reviews-StatsReport.txt

Report Preview:
MOVIE REVIEWS DATA ANALYSIS REPORT

📋 DATA SELECTION CRITERIA SUMMARY
--------------------------------------------------
✓ Word Count Range: 30-100 words
✓ Reviews per Movie: 30
✓ Sampling Method: stratified by rating
✓ Total Movies: 60

⭐ RATING DISTRIBUTION ANALYSIS
--------------------------------------------------

Low Rating (1.0-2.5): 334 reviews
  Average: 1.21
  Range: 1.0 - 2.0

High Rating (4.1-5.0): 126 reviews
  Average: 5.00
  Range: 5.0 - 5.0

Medium Rating (2.6-4.0): 164 reviews
  Average: 3.52
  Range: 3.0 - 4.0

📊 OVERALL RATING STATISTICS
--------------------------------------------------
Total Reviews: 624
Average Rating: 2.58
Median Rating: 2.00
Rating Range: 1.0 - 5.0
Standard Deviation: 1.61

Rating Distribution Details:
  1.0 stars: 265 reviews (42.5%)
  2.0 stars: 69 reviews (11.1%)
  3.0 stars: 79 reviews (12.7%)

### 🧱 Sentence Structure Design

Each contextualized sentence is structured as follows:

```
[Pre-context] + [Original review sentence] + [Post-sentiment phrase]
```

#### Example:

Original review:

> `The cinematography and soundtrack of this movie are excellent.`

Contextual version:

> I had a rough day and happened to watch this movie. `The cinematography and soundtrack of this movie are excellent.` It reignited my passion for life.

---

**Contextual variants**

| Variant Type                              | Prepending Context                                                              | Appending Sentiment                   |
|-------------------------------------------|---------------------------------------------------------------------------------|---------------------------------------|
| 😃 Positive context + 😃 Positive closing | This film has received a lot of praise, and I was really looking forward to it. | It reignited my passion for life.     |
| 😃 Positive context + 🤢 Negative closing | This film has received a lot of praise, and I was really looking forward to it. | But the plot was quite disappointing. |
| 🤢 Negative context + 😃 Positive closing | This movie was overhyped online.                                                | It reignited my passion for life.     |
| 🤢 Negative context + 🤢 Negative closing | This movie was overhyped online.                                                | But the plot was quite disappointing. |
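
The 2×2 design in the table can be expressed as a small template function. This is a sketch under the assumption that the pre-context and closing phrases are fixed strings taken directly from the table. The notebook's `ContextVariantsProcessor` calls an LLM instead, so its output may differ. The names `PRE_CONTEXTS`, `POST_PHRASES`, and `build_variants` are hypothetical.

```python
from itertools import product

# Fixed phrases copied from the contextual variants table.
PRE_CONTEXTS = {
    "positive": "This film has received a lot of praise, and I was really looking forward to it.",
    "negative": "This movie was overhyped online.",
}
POST_PHRASES = {
    "positive": "It reignited my passion for life.",
    "negative": "But the plot was quite disappointing.",
}


def build_variants(review):
    """Return the four [pre-context] + [review] + [closing] combinations.

    Keys are (pre_polarity, post_polarity) tuples.
    """
    variants = {}
    for pre, post in product(PRE_CONTEXTS, POST_PHRASES):
        parts = [PRE_CONTEXTS[pre], review.strip(), POST_PHRASES[post]]
        variants[(pre, post)] = " ".join(parts)
    return variants


if __name__ == "__main__":
    sample = "The cinematography and soundtrack of this movie are excellent."
    for key, text in build_variants(sample).items():
        print(key, "->", text)
```

Keeping the original review as a separate, unmodified baseline (rather than a fifth template) makes the later pre/post comparison straightforward.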



---


In [None]:
from ContextVariantsProcessor import ContextVariantsProcessor

processor = ContextVariantsProcessor(
    llm_model="gpt-4o"  # You can change this to other models
)
    
results = processor.process_all_reviews(
    json_file_path="Movies_and_TV_reviews_selected.json",
    output_file_path="context_variants_results.json",
    max_reviews_per_movie=10,  # Remove this limit for full processing
    use_async=True
)

print(f"Processing completed: {results}")



INFO:ContextVariantsProcessor:Loading reviews from Movies_and_TV_reviews_selected.json
INFO:ContextVariantsProcessor:Processing movie B00BTFK07I
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:ContextVariantsProcessor:Completed movie B00BTFK07I: 1 reviews, 5 variants
INFO:ContextVariantsProcessor:Processing movie B01GU8CL5C
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1

Processing completed: {'total_movies': 60, 'total_reviews': 51, 'total_variants': 255, 'output_file': 'context_variants_results.json'}



### 🔍 Evaluation Methodology

#### 1. Baseline Sentiment Score

* Original reviews, without any prepended context or closing sentence, are rated by a zero-shot model.
* The scoring standard is a 1–5 star scale (1 = very negative, 5 = very positive), emulating typical review ratings.
* Models such as `BART-Large-MNLI` or GPT-based APIs may be used, with prompts that ask for an overall sentiment score.
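
One way to turn zero-shot classifier output into a 1–5 baseline is to take the probability-weighted average of star values. The sketch below assumes the classifier returns parallel lists of candidate labels and scores, which is the shape produced by the Hugging Face `zero-shot-classification` pipeline under its `labels` and `scores` keys. The five label strings and the function name are illustrative assumptions, not part of the original design.

```python
# Hypothetical candidate labels mapped to star values.
STAR_LABELS = {
    "very negative": 1,
    "negative": 2,
    "neutral": 3,
    "positive": 4,
    "very positive": 5,
}


def expected_stars(labels, scores):
    """Probability-weighted star rating from zero-shot label scores.

    `labels` and `scores` are parallel lists; scores are renormalized
    so they do not have to sum to exactly 1.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain positive mass")
    weighted = sum(STAR_LABELS[label] * s for label, s in zip(labels, scores))
    return weighted / total
```

A weighted average keeps the baseline continuous, which makes small context-induced shifts visible that a hard argmax over labels would hide.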

#### 2. Model-Based Scoring

* All contextual variants are input into a sentiment analysis model.
* Models can include fine-tuned Chinese BERT or GPT-4 via API.
* Output formats may include:

  * Continuous score (0–1): indicating sentiment polarity
  * Transformed star rating (1–5): comparable to the baseline
  * JSON example: `{"text": ..., "score": 0.83}`
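
To compare a continuous 0–1 score with the 1–5 baseline, a linear rescaling is the simplest option. The mapping `stars = 1 + 4 * score` below is an assumption (the document does not specify the transform), and `score_to_stars` is a hypothetical helper name.

```python
import math


def score_to_stars(score, as_integer=False):
    """Linearly map a polarity score in [0, 1] onto the 1-5 star scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    stars = 1.0 + 4.0 * score
    if as_integer:
        # Round half up; Python's built-in round() rounds halves to even.
        return int(math.floor(stars + 0.5))
    return stars


if __name__ == "__main__":
    record = {"text": "...", "score": 0.83}
    print(score_to_stars(record["score"]))
```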

#### 3. Difference Analysis

* Compare sentiment score changes across variants for the same original review
* Measure pre/post context sentiment shifts (mean, standard deviation)
* Analyze which context types (e.g., negative endings) have the strongest influence
* Evaluate whether the model understands contextual sarcasm or sentiment contradiction
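
The shift statistics listed above could be computed as follows. This is a minimal sketch. The record layout (`variant`, `baseline`, `score` keys) is an assumption about how results might be stored, not the format of `context_variants_results.json`.

```python
from collections import defaultdict
from statistics import mean, stdev


def shift_summary(records):
    """Summarize (variant score - baseline score) per variant type.

    Each record is a dict with keys 'variant', 'baseline', and 'score',
    where both scores are on the same 1-5 scale.
    """
    shifts = defaultdict(list)
    for rec in records:
        shifts[rec["variant"]].append(rec["score"] - rec["baseline"])
    summary = {}
    for variant, deltas in shifts.items():
        summary[variant] = {
            "n": len(deltas),
            "mean": mean(deltas),
            # Sample standard deviation needs at least two observations.
            "std": stdev(deltas) if len(deltas) > 1 else 0.0,
        }
    return summary
```

A negative mean shift for a variant indicates that the added context pulled predictions below the baseline for the same review.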

---




### 📈 Analytical Metrics and Statistics

* Sentiment score variation across different contextual forms of the same review
* Mean score comparisons between groups (ANOVA)
* Most influential context types (t-test)
* Ability of LLMs to detect sarcasm or contextual contradiction (e.g., positive sentence + negative context)
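
For reference, the two test statistics named above can be computed with the standard library alone. This sketch returns only the F and t statistics. In practice `scipy.stats.f_oneway` and `scipy.stats.ttest_rel` would also supply p-values. Using a paired t-test is an assumption, chosen because each variant is derived from the same original review.

```python
import math
from statistics import mean, stdev


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over lists of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def paired_t(scores_a, scores_b):
    """t statistic for paired samples (same reviews, two context types)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

With four variant types, running ANOVA first and then pairwise t-tests means several comparisons on the same data, so a multiple-comparison correction is worth considering.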

---

### 🧪 Summary of Experimental Goals

* Verify whether contextual framing significantly impacts perceived sentiment
* Create a context-aware Chinese sentiment analysis dataset
* Analyze how positiv