NLP Sentiment Analysis with Yelp Review Data Set

As one of the most popular local business information apps in North America, Yelp is widely used for reviews and ratings. User reviews give business owners insight for improving their service, and help potential customers choose dining, shopping, and other local services.

The objective of this project is therefore to identify the polarity (positive or negative) of Yelp reviews and to extract the keywords that contribute to positive or negative reviews. The project focuses mainly on enhancing customers’ understanding of restaurants, educating new business owners about the market, and improving existing merchants’ awareness of their performance.

Tasks

Identify the polarity (positive or negative) of Yelp reviews using machine learning techniques, specifically NLP sentiment analysis with Keras.
Extract keywords and key features from positive and negative reviews using word clouds and SVM regression.

Data

Yelp reviews polarity dataset

The Yelp reviews dataset consists of reviews from the Yelp Dataset Challenge 2015 data. The Yelp reviews polarity dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) and retrieved from https://course.fast.ai/datasets. It treats stars 1 and 2 as negative and stars 3 and 4 as positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly, for a total of 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive polarity is class 2.
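The class convention can be made concrete with a short sketch. It assumes the polarity files are headerless CSVs with two columns (class, review text), which is how the archive is commonly distributed but is not stated in this README; the function name `load_polarity_rows` and the in-memory sample rows are hypothetical.

```python
import csv
import io

# Assumption: each row is `class,text` with no header row.
# Class "1" is negative and class "2" is positive, as described above.
LABELS = {"1": "negative", "2": "positive"}

def load_polarity_rows(handle):
    """Yield (label, text) pairs from an open CSV file handle."""
    for row in csv.reader(handle):
        class_id, text = row[0], row[1]
        yield LABELS[class_id], text

# Hypothetical two-row sample standing in for the real training file.
sample = io.StringIO('"1","Cold food and slow service."\n"2","Best tacos in town!"\n')
rows = list(load_polarity_rows(sample))
```

With the real data, the `io.StringIO` sample would be replaced by an open file handle on the training or testing CSV.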

Yelp reviews sample dataset

The original Yelp reviews dataset is retrieved directly from Yelp: https://www.yelp.com/dataset/download. The JSON file "yelp_academic_dataset_review.json" is very large, containing 6,990,280 lines, so we read only the first 50,000 rows as a sample for this tentative analysis.
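Reading only the first rows of such a large file might look like the sketch below. It assumes the file holds one JSON object per line, which is consistent with the line count quoted above; the helper name `read_top_reviews` is hypothetical and this is not necessarily how the notebook loads the data.

```python
import json
from itertools import islice

def read_top_reviews(path, n_rows=50_000):
    """Parse at most n_rows JSON objects from a line-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        # islice stops after n_rows lines, so the rest of the file is never read.
        return [json.loads(line) for line in islice(handle, n_rows)]
```

A call such as `read_top_reviews("yelp_academic_dataset_review.json")` would then return a list of 50,000 dictionaries, one per review, without loading the remaining lines into memory.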

Algorithms

Convolutional Neural Networks (CNN)
Recurrent Neural Networks (RNN)
Support Vector Machines (SVM)

Packages & docs

The code was written in Python in Jupyter Notebook, using pandas for data manipulation, TensorFlow and scikit-learn for machine learning, NLTK for NLP, and wordcloud and matplotlib for visualization.

Yelp_sentiment_analysis.ipynb: data cleaning, processing, model building, training, testing, evaluation, and model deployment on the original Yelp sample dataset
Yelp_word_cloud.ipynb: keyword extraction and data visualization
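A word cloud typically sizes each word by how often it occurs, so the counting step behind keyword extraction can be sketched with the standard library alone. The stopword set here is a hypothetical minimal stand-in (NLTK's English list would be the fuller choice), and `keyword_counts` is an illustrative name rather than a function from the notebook.

```python
import re
from collections import Counter

# Hypothetical minimal stopword set, used only for this illustration.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of", "in", "but"}

def keyword_counts(reviews, top_n=3):
    """Return the top_n most frequent non-stopword tokens across reviews."""
    counts = Counter()
    for review in reviews:
        # Lowercase and keep alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

reviews = ["The pizza was great", "Great pizza and great service", "Service was slow"]
top = keyword_counts(reviews)
```

Running the same count separately on the positive and negative reviews of one business gives the two frequency tables that a positive and a negative word cloud would be drawn from.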

Result example

For a sample business with 503 reviews:

Keywords in positive reviews

Keywords in negative reviews

Keywords contributing to positive & negative reviews
