NLP Opinion Search Project

Project Overview

This project builds a small opinion-based search engine using Amazon reviews. The goal is to retrieve reviews that mention a product aspect together with a positive or negative opinion. I implemented a Boolean baseline, a Boolean + lexicon polarity filter (M1), and an SBERT + lexicon semantic method (M2) to compare retrieval performance.

The following items are included as part of this project submission:

Evaluation work: A completed precision table with every retrieved review manually checked and marked as relevant or not.
Code files: Baseline.py, m1.py, and m2.py
README.md: Full documentation explaining setup, methods, files, and execution steps.
Outputs folder: Contains results from Baseline (tests 1–3), Method 1 (test 4), and Method 2 (test 4).
Data and lexicon files:
- positive-words.txt
- negative-words.txt
- reviews_segment.pkl (Main dataset)
- data.pkl (SBERT embeddings for Method 2)
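
The precision values in the evaluation table follow directly from the manual relevance marks. Below is a minimal sketch of that calculation, assuming each retrieved review is recorded as True (relevant) or False (not relevant). The function name and the sample marks are hypothetical and are not taken from the project code.

```python
def precision(relevance_marks):
    """Fraction of retrieved reviews marked relevant (0.0 if nothing was retrieved)."""
    marks = list(relevance_marks)
    if not marks:
        return 0.0
    return sum(1 for m in marks if m) / len(marks)


# Hypothetical marks for one query: True = relevant, False = not relevant.
example_marks = [True, True, False, True]
print(precision(example_marks))  # 0.75
```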

Setup

You need Python installed to run this project.

Baseline and M1 work with Python 3.9 or higher.
Method 2 (M2) requires Python 3.10.
Do NOT use Python 3.11 for M2 because the pickle file will not load.

Download Python here: https://www.python.org

Install Packages

Run this:

pip install pandas
pip install numpy
pip install scikit-learn
pip install nltk
pip install sentence-transformers
pip install torch

Files Needed for This Project

Make sure the following files are inside the same folder as the Python scripts (Baseline.py, m1.py, m2.py).
If any of these are missing or placed in the wrong folder, the code will NOT run CORRECTLY.
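
Before running the scripts, a quick check like the one below can confirm the files are in place. This is only a sketch: the helper name `missing_files` is hypothetical, and it assumes the check is run from the folder that holds the scripts.

```python
from pathlib import Path

REQUIRED_FILES = [
    "positive-words.txt",
    "negative-words.txt",
    "reviews_segment.pkl",
    "data.pkl",  # only needed for m2.py
]


def missing_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path.cwd())
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files found.")
```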

1. Lexicon Files (For M1 and M2)

positive-words.txt
negative-words.txt

These come from the Hu & Liu (2004) opinion lexicon and are used to detect positive or negative words when filtering reviews by polarity.
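
As an illustration of lexicon-based polarity detection, here is a minimal sketch. It assumes the lexicon files list one word per line with `;` comment lines at the top and use latin-1 encoding, as in the commonly distributed copies. The function names and the simple count rule are hypothetical and may differ from what m1.py and m2.py actually do.

```python
import re


def load_lexicon(path):
    """Read one word per line, skipping blank lines and ';' comment lines."""
    words = set()
    # latin-1 is an assumption about the file encoding; adjust if needed.
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            word = line.strip().lower()
            if word and not word.startswith(";"):
                words.add(word)
    return words


def polarity(text, positive_words, negative_words):
    """Label text by comparing counts of positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z][a-z'\-]*", text.lower())
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Tiny in-memory lexicons so the example runs without the real files.
demo_positive = {"great", "clear"}
demo_negative = {"poor", "weak"}
print(polarity("The audio quality is great and clear.", demo_positive, demo_negative))
print(polarity("Poor wifi signal, very weak.", demo_positive, demo_negative))
```

With the real files, the sets would come from `load_lexicon("positive-words.txt")` and `load_lexicon("negative-words.txt")`.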

2. reviews_segment.pkl (Main Dataset)

This is the main Amazon review dataset for the project.
It contains:

review text
review titles
star ratings
product/user metadata

The Baseline, M1, and M2 scripts load this file using:

df = pd.read_pickle("reviews_segment.pkl")
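
Once the DataFrame is loaded, a Boolean retrieval step can be sketched as an AND match over the aspect terms. This is an assumption-laden example: the column name `review_text` and the function `boolean_and_search` are hypothetical, and a small in-memory DataFrame stands in for reviews_segment.pkl.

```python
import pandas as pd


def boolean_and_search(df, terms, text_column="review_text"):
    """Return rows whose text contains every term (case-insensitive substring match)."""
    text = df[text_column].fillna("").str.lower()
    mask = pd.Series(True, index=df.index)
    for term in terms:
        mask &= text.str.contains(term.lower(), regex=False)
    return df[mask]


# Stand-in data; the real project would use pd.read_pickle("reviews_segment.pkl").
demo = pd.DataFrame({
    "review_text": [
        "The audio quality is excellent.",
        "Audio cuts out constantly.",
        "Build quality feels cheap.",
    ]
})
hits = boolean_and_search(demo, ["audio", "quality"])
print(hits["review_text"].tolist())  # ['The audio quality is excellent.']
```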

3. data.pkl (SBERT Embeddings for M2)

This file contains precomputed SBERT sentence embeddings for every sentence in the review corpus. It is required by Method 2 (M2) for semantic similarity search.
Credit: data.pkl was generated and provided by TA Navid Ayoobi.

M2 uses this file to compare the query embedding with all stored sentence embeddings.

NOTE: data.pkl is not included in the submitted zip file because it is too large.
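
The comparison between a query embedding and the stored sentence embeddings is typically a cosine-similarity ranking. The sketch below shows that idea with toy vectors. The function name `top_k_cosine` is hypothetical, the internal layout of data.pkl is not assumed, and in practice the query vector would come from an SBERT model rather than being written by hand.

```python
import numpy as np


def top_k_cosine(query_vec, sentence_vecs, k=3):
    """Return (indices, scores) of the k stored vectors most similar to the query."""
    query = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(sentence_vecs, dtype=float)
    # Normalise to unit length so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = matrix @ query
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy 3-dimensional vectors standing in for real sentence embeddings.
stored = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
indices, scores = top_k_cosine([1.0, 0.0, 0.0], stored, k=2)
print(indices.tolist())  # [0, 2]
```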

How to Run Each Method

All scripts must be run from inside the Codes folder so the output paths resolve correctly. Open a terminal, move into the Codes directory, and run each script:

cd Codes
python3 Baseline.py
python3 m1.py
python3 m2.py

Output Files

All retrieved results for this project are saved into the Outputs directory.
Each method (Baseline, M1, M2) writes its own output files for the five required queries:

audio_quality
wifi_signal
mouse_button
gps_map
image_quality

Congratulations!!!

If you made it this far, then you have successfully set up the project and run all three methods: Baseline, Method 1, and Method 2. All outputs should now be saved in the Outputs folder for each of the five required queries.
