Jonardzz/NLP-Project

NLP Opinion Search Project

Project Overview

This project builds a small opinion-based search engine using Amazon reviews. The goal is to retrieve reviews that mention a product aspect together with a positive or negative opinion. I implemented a Boolean baseline, a Boolean + lexicon polarity filter (M1), and an SBERT + lexicon semantic method (M2) to compare retrieval performance.
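To make the baseline idea concrete, here is a minimal sketch of Boolean AND retrieval over review text. It is not the code in Baseline.py: the function names `tokenize` and `boolean_and_search`, the AND-only matching, and the toy reviews are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase a review and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def boolean_and_search(reviews, terms):
    """Return indices of reviews that contain every query term (Boolean AND)."""
    wanted = {t.lower() for t in terms}
    return [i for i, text in enumerate(reviews) if wanted <= tokenize(text)]

# Toy reviews for illustration only; they are not taken from the dataset.
reviews = [
    "The audio quality is great for the price.",
    "Poor wifi signal in the basement.",
    "Audio is fine but build quality feels cheap.",
]
hits = boolean_and_search(reviews, ["audio", "quality"])
print(hits)
```

Note that the third toy review matches both terms even though "quality" there refers to the build, not the audio. Pure Boolean matching cannot tell these apart, which is the kind of gap the polarity filter (M1) and the semantic method (M2) are meant to address.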

The following items are included as part of this project submission:

  • Evaluation work: A completed precision table with every retrieved review manually checked and marked as relevant or not.
  • Code files: Baseline.py, m1.py, and m2.py
  • README.md: Full documentation explaining setup, methods, files, and execution steps.
  • Outputs folder: Contains results from Baseline (tests 1–3), Method 1 (test4), and Method 2 (test4).
  • Data and lexicon files:
    • positive-words.txt
    • negative-words.txt
    • reviews_segment.pkl (Main dataset)
    • data.pkl (SBERT embeddings for Method 2)

Setup

You need Python installed to run this project.

  • Baseline and M1 work with Python 3.9 or higher
  • Method 2 (M2) requires Python 3.10
  • Do NOT use Python 3.11 for M2 because the pickle file will not load.

Download Python here: https://www.python.org

Install Packages

Run these commands:

  • pip install pandas
  • pip install numpy
  • pip install scikit-learn
  • pip install nltk
  • pip install sentence-transformers
  • pip install torch

Files Needed for This Project

Make sure the following files are inside the same folder as the Python scripts (Baseline.py, m1.py, m2.py).
If any of these are missing or placed in the wrong folder, the code will NOT run CORRECTLY.

1. Lexicon Files (For M1 and M2)

  • positive-words.txt
  • negative-words.txt

These come from the Hu & Liu (2004) opinion lexicon and are used to detect positive or negative words when filtering reviews by polarity.
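A polarity filter of this kind can be sketched as follows. This is an illustration under assumptions, not the logic in m1.py: the helper names are hypothetical, the rule "more positive than negative words" is one possible choice, and the sketch assumes the lexicon files hold one word per line with header lines starting with `;`.

```python
import re

def parse_lexicon(lines):
    """Collect one word per line, skipping blank lines and ';' comment lines."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words

def polarity_counts(text, positive, negative):
    """Count how many tokens of the text appear in each lexicon."""
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    pos = sum(tok in positive for tok in tokens)
    neg = sum(tok in negative for tok in tokens)
    return pos, neg

def matches_polarity(text, wanted, positive, negative):
    """Assumed rule: keep a review if the wanted polarity has more hits."""
    pos, neg = polarity_counts(text, positive, negative)
    return pos > neg if wanted == "positive" else neg > pos

# Tiny in-memory lexicons for illustration. With the real files one might use:
#   with open("positive-words.txt", encoding="latin-1") as f:  # encoding is an assumption
#       positive = parse_lexicon(f)
positive = parse_lexicon(["; header comment", "", "great", "good"])
negative = parse_lexicon(["; header comment", "terrible", "weak"])

review = "Great sound, but the battery is terrible and weak."
print(polarity_counts(review, positive, negative))
print(matches_polarity(review, "negative", positive, negative))
```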

2. reviews_segment.pkl (Main Dataset)

This is the main Amazon review dataset for the project.
It contains:

  • review text
  • review titles
  • star ratings
  • product/user metadata

Baseline, M1, and M2 all load this file using:

df = pd.read_pickle("reviews_segment.pkl")

3. data.pkl (SBERT Embeddings for Method 2)

  • Contains precomputed SBERT sentence embeddings for every sentence in the review corpus
  • Required by Method 2 (M2) for semantic similarity search
  • Credit: data.pkl was generated and provided by TA Navid Ayoobi

M2 uses this file to compare the query embedding with all stored sentence embeddings.
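The comparison step can be illustrated with cosine similarity, a common choice for sentence embeddings. The sketch below is an assumption about how such a comparison might look, not the code in m2.py: the internal layout of data.pkl is not described here, so the embeddings are tiny hand-written vectors, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, sentence_vecs, k=3):
    """Return (sentence_index, score) pairs for the k most similar sentences."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(sentence_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]

# Toy 3-dimensional vectors standing in for real sentence embeddings.
sentence_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.0, 0.0]
ranked = top_k(query_vec, sentence_vecs, k=2)
print(ranked)
```

Pure Python keeps the sketch dependency-free; over a full corpus of embeddings, a vectorized library such as NumPy would be the more practical choice.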

NOTE: data.pkl is not included in my zip file because it is too large.
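Because a missing file is an easy mistake to make, a quick check before running the scripts can help. This is an optional sketch and not part of the submitted code; `missing_files` is a hypothetical helper, and the file list simply repeats the names given in this README.

```python
from pathlib import Path

# File names as listed in this README.
REQUIRED_FILES = [
    "positive-words.txt",
    "negative-words.txt",
    "reviews_segment.pkl",
    "data.pkl",
]

def missing_files(folder, names=REQUIRED_FILES):
    """Return the required file names that are not present in the folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]

# Check the current directory, i.e. the folder the scripts are run from.
missing = missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```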

How to Run Each Method

All scripts must be run from inside the Codes folder so the output paths resolve correctly. Open a terminal, move into the Codes directory, and run:

python3 Baseline.py
python3 m1.py
python3 m2.py

Output Files

All retrieved results for this project are saved into the Outputs directory.
Each method (Baseline, M1, M2) writes its own output files for the five required queries:

  • audio_quality
  • wifi_signal
  • mouse_button
  • gps_map
  • image_quality

Congratulations!!!

If you made it this far, then you have successfully set up the project and run all three methods: Baseline, Method 1, and Method 2. All outputs should now be saved in the Outputs folder for each of the five required queries.
