Skip to content

danieljunhee/Amazon-Review-Text-Mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Amazon-Review-Text-Mining

In this task, our goal is to predict the rating score (an integer out of 1, 2, 3, 4, 5) given only the Amazon review text. In other words, we will try to capture the customer's overall satisfaction about the product from the review text that he/she wrote. We will explore diverse types of preprocessing raw text data into a format that can be provided to machine learning models. More specifically, we will try several bag-of-words models to represent the Amazon review text and examine which type of representation of the text yields the best predictive performance.

For the data, we will be using the Amazon reviews on movies and TV ranging from May 1996 to July 2014. Here is the dataset source: http://jmcauley.ucsd.edu/data/amazon/. The original dataset contains information about 1697533 reviews.

Note: The data-related files (original dataset json file, pickle files and scipy sparse matrix npz files that are loaded during the work) are not uploaded here due to their large sizes. But the Jupyter Notebook file contains all the code (commented) used for creating (and saving) those files.

[Dataset citations]

  • Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering
    R. He, J. McAuley
    WWW, 2016

  • Image-based recommendations on styles and substitutes
    J. McAuley, C. Targett, J. Shi, A. van den Hengel
    SIGIR, 2015

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors