## Jayden Mathew
## 4/23/2024
## Yelp API Sentiment Analysis on Reviews

### Question: How good are different sushi restaurants/businesses in New Brunswick, New Jersey?

#### Sentiment analysis of Yelp reviews can show how much customers enjoy the sushi restaurants in the New Brunswick area.

#### Import statements

In [13]:
import requests

import yelpkeys  # local file that defines client_id and api_key with the user's private app credentials
import nltk
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from collections import Counter

In [14]:
APIKEY = yelpkeys.api_key
headers = {'Authorization': 'Bearer %s' % yelpkeys.api_key,}

#### Yelp Fusion API code

In [16]:
from __future__ import print_function

import argparse
import json
import pprint
import requests
import sys
import urllib


# This client code can run on Python 2.x or 3.x.  Your imports can be
# simpler if you only need one of those.
try:
    # For Python 3.0 and later
    from urllib.error import HTTPError
    from urllib.parse import quote
    from urllib.parse import urlencode
except ImportError:
    # Fall back to Python 2's urllib2 and urllib
    from urllib2 import HTTPError
    from urllib import quote
    from urllib import urlencode


# Yelp Fusion no longer uses OAuth as of December 7, 2017.
# You no longer need to provide Client ID to fetch Data
# It now uses private keys to authenticate requests (API Key)
# You can find it on
# https://www.yelp.com/developers/v3/manage_app
API_KEY= yelpkeys.api_key


# API constants, you shouldn't have to change these.
API_HOST = 'https://api.yelp.com'
SEARCH_PATH = '/v3/businesses/search'
BUSINESS_PATH = '/v3/businesses/'  # Business ID will come after slash.


# Defaults for our simple example.
DEFAULT_TERM = 'Sushi'
DEFAULT_LOCATION = 'New Brunswick, NJ'
SEARCH_LIMIT = 20


def request(host, path, api_key, url_params=None):
    """Given your API_KEY, send a GET request to the API.
    Args:
        host (str): The domain host of the API.
        path (str): The path of the API after the domain.
        api_key (str): Your API Key.
        url_params (dict): An optional set of query parameters in the request.
    Returns:
        dict: The JSON response from the request.
    Note:
        HTTP error statuses are not raised here; the decoded JSON body
        (including any error payload) is returned as-is.
    """
    url_params = url_params or {}
    url = '{0}{1}'.format(host, quote(path.encode('utf8')))
    headers = {
        'Authorization': 'Bearer %s' % api_key,
    }

    print(u'Querying {0} ...'.format(url))

    response = requests.request('GET', url, headers=headers, params=url_params)

    return response.json()


def search(api_key, term, location):
    """Query the Search API by a search term and location.
    Args:
        api_key (str): Your API Key.
        term (str): The search term passed to the API.
        location (str): The search location passed to the API.
    Returns:
        dict: The JSON response from the request.
    """

    url_params = {
        'term': term.replace(' ', '+'),
        'location': location.replace(' ', '+'),
        'limit': SEARCH_LIMIT
    }
    return request(API_HOST, SEARCH_PATH, api_key, url_params=url_params)


def get_business(api_key, business_id):
    """Query the Business API by a business ID.
    Args:
        api_key (str): Your API Key.
        business_id (str): The ID of the business to query.
    Returns:
        dict: The JSON response from the request.
    """
    business_path = BUSINESS_PATH + business_id

    return request(API_HOST, business_path, api_key)


def query_api(term, location):
    """Queries the API by the input values from the user.
    Args:
        term (str): The search term to query.
        location (str): The location of the business to query.
    """
    response = search(API_KEY, term, location)

    businesses = response.get('businesses')

    if not businesses:
        print(u'No businesses for {0} in {1} found.'.format(term, location))
        return

    business_id = businesses[0]['id']

    print(u'{0} businesses found, querying business info ' \
        'for the top result "{1}" ...'.format(
            len(businesses), business_id))
    response = get_business(API_KEY, business_id)

    print(u'Result for business "{0}" found:'.format(business_id))
    pprint.pprint(response, indent=2)
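
The `request` function above returns whatever JSON the server sends, so an error response looks like a normal result to the caller. A minimal sketch of a status check follows. It assumes a response object that behaves like `requests.Response` (with `status_code` and `json()`), and it assumes the error body has the shape `{'error': {'code': ..., 'description': ...}}`. `ApiError`, `parse_response`, and `FakeResponse` are hypothetical names introduced only for illustration.

```python
class ApiError(Exception):
    """Hypothetical exception type for non-success API responses."""


def parse_response(response):
    """Return the decoded JSON body, or raise ApiError on an HTTP error status.

    `response` is assumed to behave like requests.Response: it exposes
    `status_code` and `json()`.
    """
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON; fall back to an empty payload.
        body = {}
    if response.status_code >= 400:
        # Assumed error shape: {'error': {'code': ..., 'description': ...}}
        error = body.get('error', {}) if isinstance(body, dict) else {}
        raise ApiError('HTTP {0}: {1} ({2})'.format(
            response.status_code,
            error.get('description', 'no description'),
            error.get('code', 'UNKNOWN')))
    return body


class FakeResponse(object):
    """Stand-in for requests.Response, used only to exercise the sketch."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload
```

Under these assumptions, `request` could end with `return parse_response(response)` instead of `return response.json()`, so that a failed call stops with a readable message rather than surfacing later as a missing key.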

### Prints `response`, the variable holding the businesses (names, ratings, review counts, etc.) returned by the Yelp API search.

In [17]:
response = search(API_KEY, DEFAULT_TERM, DEFAULT_LOCATION)
print(response)

Querying https://api.yelp.com/v3/businesses/search ...
{'businesses': [{'id': 'LdWYqYi-EdMayOMMjbNWjA', 'alias': 'sakana-sushi-and-japanese-cuisine-new-brunswick', 'name': 'Sakana Sushi & Japanese Cuisine', 'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/SKnlWogarSHsQgmNeGo-Fg/o.jpg', 'is_closed': False, 'url': 'https://www.yelp.com/biz/sakana-sushi-and-japanese-cuisine-new-brunswick?adjust_creative=Xr01iyL_bx8Dr_7vMcc3Hw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=Xr01iyL_bx8Dr_7vMcc3Hw', 'review_count': 120, 'categories': [{'alias': 'sushi', 'title': 'Sushi Bars'}, {'alias': 'japanese', 'title': 'Japanese'}], 'rating': 4.0, 'coordinates': {'latitude': 40.493972, 'longitude': -74.444027}, 'transactions': ['restaurant_reservation', 'delivery', 'pickup'], 'price': '$$', 'location': {'address1': '338 George St', 'address2': None, 'address3': '', 'city': 'New Brunswick', 'zip_code': '08901', 'country': 'US', 'state': 'NJ', 'display_address': ['338 George St

### Prints the name, rating, and review count of every business returned by the search, followed by three reviews and the sentiment of each review.

In [19]:
for business in response['businesses']:
    print('Business Name: ' + business['name'])
    print('Rating: ' + str(business['rating']))
    print('Review Count: ' + str(business['review_count']))
    print('Reviews: ')
    BUS_REVIEW = '/v3/businesses/{}/reviews'.format(business['id'])
    #print('https://api.yelp.com/v3/businesses/{}/reviews?limit=20&sort_by=yelp_sort'.format(business['id']))
    reviews = request(API_HOST,BUS_REVIEW, API_KEY)
    for review in reviews['reviews']:
        print(' - ' + review['text'])
        # Perform sentiment analysis on the review using TextBlob
        blob = TextBlob(review['text'])
        sentiment = blob.sentiment.polarity
        if sentiment > 0:
            print('    Sentiment: Positive')
        elif sentiment < 0:
            print('    Sentiment: Negative')
        else:
            print('    Sentiment: Neutral')
        print('\n================\n\n\n')

Business Name: Sakana Sushi & Japanese Cuisine
Rating: 4.0
Review Count: 120
Reviews: 
Querying https://api.yelp.com/v3/businesses/LdWYqYi-EdMayOMMjbNWjA/reviews ...


KeyError: 'reviews'
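
The `KeyError` above shows that the parsed response had no `'reviews'` key. A small helper can return an empty list in that case and print the keys that were received, which makes the cause visible. This is only a sketch: `reviews_from` is a hypothetical name, and it assumes the payload is a plain dict, as returned by `request`.

```python
def reviews_from(payload):
    """Return the list of reviews in `payload`, or [] when the key is absent."""
    reviews = payload.get('reviews')
    if reviews is None:
        # Show what the API sent instead, so the cause can be inspected.
        print('No reviews in response; keys were: {0}'.format(sorted(payload)))
        return []
    return reviews
```

With this helper, the inner loop could read `for review in reviews_from(reviews):`, so a business without reviews is skipped instead of stopping the whole cell.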

### Downloads the NLTK stop words

In [6]:
# Download NLTK resources
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to C:\Users\Jayden
[nltk_data]     Mathew\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

### Saves and prints only the English stop words found in the NLTK download.

In [7]:
# Build a set of English stop words
from nltk.corpus import stopwords

stops = set(stopwords.words('english'))
stops

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r
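
To illustrate how a stop word set like the one above is used, here is a minimal sketch of stop word removal. It uses a regular expression as a simple stand-in tokenizer (the notebook itself uses `TextBlob(...).words`), and `SAMPLE_STOPS` is a small illustrative subset, not the NLTK list.

```python
import re

# Illustrative subset only; the notebook uses the full NLTK English list.
SAMPLE_STOPS = {'the', 'was', 'and', 'a', 'is', 'it', 'of'}


def remove_stopwords(text, stops):
    """Split `text` into words and drop those found in `stops` (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text)
    return [word for word in words if word.lower() not in stops]


print(remove_stopwords('The sushi was fresh and the service is fast', SAMPLE_STOPS))
# → ['sushi', 'fresh', 'service', 'fast']
```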

### Processes all the reviews and performs sentiment analysis using TextBlob and NaiveBayesAnalyzer.

In [8]:
# Perform sentiment analysis with TextBlob and NaiveBayesAnalyzer
positive_textblob, negative_textblob, neutral_textblob = 0, 0, 0
positive_naive_bayes, negative_naive_bayes, neutral_naive_bayes = 0, 0, 0

wordsc = []
# Loop through each business in the response
for business in response['businesses']:
    reviews = request(API_HOST, '/v3/businesses/{}/reviews'.format(business['id']), API_KEY)
    for review in reviews['reviews']:
        # Perform stopword removal
        processed_review = [word for word in TextBlob(review['text']).words if word.lower() not in stops]
        wordsc.append(' '.join(processed_review))
        # Sentiment analysis with TextBlob
        sentiment_tb = TextBlob(' '.join(processed_review)).sentiment.polarity
        if sentiment_tb > 0:
            positive_textblob += 1
        elif sentiment_tb < 0:
            negative_textblob += 1
        else:
            neutral_textblob += 1
        
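        # Sentiment analysis with NaiveBayesAnalyzer
        # (a new NaiveBayesAnalyzer() per review retrains the classifier each time;
        # creating one analyzer before the loop and reusing it would be faster)
        sentiment_nb = TextBlob(' '.join(processed_review), analyzer=NaiveBayesAnalyzer()).sentiment.classification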
        if sentiment_nb == 'pos':
            positive_naive_bayes += 1
        elif sentiment_nb == 'neg':
            negative_naive_bayes += 1
        else:  # not expected: NaiveBayesAnalyzer only classifies as 'pos' or 'neg'
            neutral_naive_bayes += 1
            
print(positive_textblob)

Querying https://api.yelp.com/v3/businesses/IOyhF0TRcIFm5dwdIsUjVA/reviews ...


KeyError: 'reviews'
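
The three-way tally in the cell above can also be written with a small labelling function and a `Counter`. This sketch applies the same thresholds as the notebook (polarity above zero is positive, below zero is negative, exactly zero is neutral). `label_polarity` and `tally` are hypothetical helper names, and the polarity values in the usage line are made up for illustration.

```python
from collections import Counter


def label_polarity(polarity):
    """Map a polarity score to 'positive', 'negative', or 'neutral'."""
    if polarity > 0:
        return 'positive'
    if polarity < 0:
        return 'negative'
    return 'neutral'


def tally(polarities):
    """Count how many scores fall under each label, including zero counts."""
    counts = Counter(label_polarity(p) for p in polarities)
    return {label: counts.get(label, 0)
            for label in ('positive', 'negative', 'neutral')}


print(tally([0.5, -0.2, 0.0, 0.1]))
# → {'positive': 2, 'negative': 1, 'neutral': 1}
```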

### Prints the positive, negative, and neutral counts from the TextBlob and NaiveBayesAnalyzer results.

In [9]:
print(positive_textblob)
print(negative_textblob)
print(neutral_textblob)
print(positive_naive_bayes)
print(negative_naive_bayes)
print(neutral_naive_bayes)

0
0
0
0
0
0


### Displays TextBlob Donut Chart (representing positive, neutral, and negative sentiment analyses).

In [10]:
# Plot donut chart for TextBlob sentiment analysis
labels = ["positive", "negative", "neutral"]
sizes = [positive_textblob, negative_textblob, neutral_textblob]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['lightgreen', 'lightcoral', 'lightskyblue'], wedgeprops={'width': 0.4})
ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Add a circle at the center to transform the pie chart into a donut chart
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig.gca().add_artist(centre_circle)

ax.set_title('TextBlob Sentiment Analysis')

plt.show()


  x = x / sx


ValueError: cannot convert float NaN to integer

posx and posy should be finite values
posx and posy should be finite values
posx and posy should be finite values
posx and posy should be finite values


ValueError: need at least one array to concatenate

<Figure size 640x480 with 1 Axes>
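
The errors above come from plotting three counts that are all zero, which leaves the pie with nothing to divide by. A guard before plotting avoids them. The sketch below wraps the notebook's `ax.pie` call in a hypothetical `plot_donut` function and imports matplotlib only when there is something to draw. The same guard would apply to the NaiveBayes chart.

```python
def plot_donut(sizes, labels, title):
    """Draw a donut chart; return False without plotting if every count is zero."""
    if sum(sizes) == 0:
        print('Nothing to plot for "{0}": all counts are zero.'.format(title))
        return False

    import matplotlib.pyplot as plt  # imported lazily; only needed when plotting

    fig, ax = plt.subplots()
    # wedgeprops width < 1 leaves a hole in the middle, giving the donut shape
    ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90,
           wedgeprops={'width': 0.4})
    ax.axis('equal')  # keep the chart circular
    ax.set_title(title)
    plt.show()
    return True


plot_donut([0, 0, 0], ['positive', 'negative', 'neutral'],
           'TextBlob Sentiment Analysis')
```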

### Displays NaiveBayesAnalyzer Donut Chart (representing positive, neutral, and negative sentiment analyses).

In [11]:
# Plot donut chart for NaiveBayes sentiment analysis
labels = ["positive", "negative", "neutral"]
sizes = [positive_naive_bayes, negative_naive_bayes, neutral_naive_bayes]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['lightgreen', 'lightcoral', 'lightskyblue'], wedgeprops={'width': 0.4})
ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Add a circle at the center to transform the pie chart into a donut chart
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig.gca().add_artist(centre_circle)

ax.set_title('NaiveBayes Sentiment Analysis')

plt.show()


ValueError: cannot convert float NaN to integer

posx and posy should be finite values
posx and posy should be finite values
posx and posy should be finite values
posx and posy should be finite values


ValueError: need at least one array to concatenate

<Figure size 640x480 with 1 Axes>

### Unfortunately, no conclusions could be drawn because of an error in the project code. The response for each business's reviews did not contain the key 'reviews', so the loops raised a KeyError before any review was processed, and no sentiment analysis was performed. The sentiment analysis and word cloud code is written so that, once the 'reviews' key is present in the response, it should process the reviews and display the information needed to draw conclusions.

### Word Cloud of the top 20 words

In [12]:
def generate_wordcloud(review_texts):
    # Count how often each word occurs across all processed reviews
    words = ' '.join(review_texts)
    word_freq = Counter(words.split())
    # Local name `cloud` avoids confusion with the global `wordsc` list of review texts
    cloud = WordCloud(width=800, height=400, background_color='white', colormap='prism', max_words=20)
    cloud.generate_from_frequencies(word_freq)

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

# Generate the word cloud
print(wordsc)
generate_wordcloud(wordsc)

[]


ValueError: We need at least 1 word to plot a word cloud, got 0.
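
The word cloud fails here because the list of review texts is empty. As a sketch of the frequency step on its own, the hypothetical `top_words` helper below counts words and keeps the most frequent ones, returning an empty dict when there is no text. The sample reviews in the usage line are made up for illustration.

```python
from collections import Counter


def top_words(review_texts, limit=20):
    """Return the `limit` most frequent words across `review_texts` as a dict."""
    counts = Counter(' '.join(review_texts).split())
    return dict(counts.most_common(limit))


frequencies = top_words(['fresh sushi', 'fresh rolls'], limit=1)
print(frequencies)
# → {'fresh': 2}

if not top_words([]):
    print('No review text available; skipping the word cloud.')
```

Checking that the dict is non-empty before calling `generate_from_frequencies` would avoid the `ValueError` shown above.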