Amazon-Review-Text-Mining

In this task, our goal is to predict the rating score (an integer out of 1, 2, 3, 4, 5) given only the Amazon review text. In other words, we will try to capture the customer's overall satisfaction about the product from the review text that he/she wrote. We will explore diverse types of preprocessing raw text data into a format that can be provided to machine learning models. More specifically, we will try several bag-of-words models to represent the Amazon review text and examine which type of representation of the text yields the best predictive performance.

For the data, we will be using the Amazon reviews on movies and TV ranging from May 1996 to July 2014. Here is the dataset source: http://jmcauley.ucsd.edu/data/amazon/. The original dataset contains information about 1697533 reviews.

Note: The data-related files (original dataset json file, pickle files and scipy sparse matrix npz files that are loaded during the work) are not uploaded here due to their large sizes. But the Jupyter Notebook file contains all the code (commented) used for creating (and saving) those files.

[Dataset citations]

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering
R. He, J. McAuley
WWW, 2016
Image-based recommendations on styles and substitutes
J. McAuley, C. Targett, J. Shi, A. van den Hengel
SIGIR, 2015

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Amazon Reviews Text Mining.ipynb		Amazon Reviews Text Mining.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Amazon-Review-Text-Mining

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Amazon-Review-Text-Mining

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages