# Yelp.ca Scraping Project

In this project, we scrape restaurant information from Yelp's Toronto site (https://www.yelp.ca/toronto), draw inferences from the collected data, and finally run a sentiment analysis on the collected reviews using Natural Language Processing (NLP) techniques.

## Table of Contents
- Imports
- Fetching the URL
- Creating Scraper

### Imports

In [1]:
import os
import requests
import re
from bs4 import BeautifulSoup
from csv import writer

### Fetching the URL

In [5]:
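# Example: "https://www.yelp.ca/biz/khao-san-road-toronto?page_src=related_bizes"
url = input("Enter url of a yelp restaurant: ")

# Validate the URL first so that no request is sent to an unexpected host.
# The dots are escaped so they only match a literal '.'.
if not re.match(r'https?://www\.yelp\.ca/', url):
    print('Please enter a valid yelp.ca url')
    raise SystemExit(1)

page = requests.get(url)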

soup = BeautifulSoup(page.content, 'html.parser')

for div in soup.find_all("div", class_ = 'block-quote__09f24__nMk2G padding-l3__09f24__IOjKY border-color--default__09f24__NPAKY'): 
    div.decompose()

Enter url of a yelp restaurant: https://www.yelp.ca/biz/jollibee-toronto-16


Using the Jollibee restaurant URL on yelp.ca.
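The URL check in the cell above can also be written with the standard library's URL parser instead of a regular expression. The following is a minimal sketch, not part of the original notebook: the helper name `is_yelp_ca_url` is hypothetical, and the requirement that the path starts with `/biz/` is an assumption based on the two example URLs shown above.

```python
from urllib.parse import urlparse

def is_yelp_ca_url(url):
    """Return True if url looks like an http(s) link to a yelp.ca business page.

    Hypothetical helper; the '/biz/' path rule is an assumption taken
    from the example URLs in this notebook.
    """
    parts = urlparse(url.strip())
    return (
        parts.scheme in ("http", "https")
        and parts.netloc.lower() in ("www.yelp.ca", "yelp.ca")
        and parts.path.startswith("/biz/")
    )

print(is_yelp_ca_url("https://www.yelp.ca/biz/jollibee-toronto-16"))  # True
print(is_yelp_ca_url("https://wwwxyelp.ca/biz/jollibee-toronto-16"))  # False
```

Comparing the parsed host name exactly avoids accepting look-alike domains, which a pattern with unescaped dots would let through.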

### Creating Scraper

Extracting Restaurant Name

In [9]:
title = soup.find('h1', class_ = 'css-1se8maq')
title_text = title.text
title_text

'Jollibee'

Extracting Total Reviews

In [11]:
no_reviews = soup.find('span', class_ = 'css-1fdy0l5')
no_reviews.text
review_count=no_reviews.text.split()
tot_reviews=review_count[0]
tot_reviews

'8'
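The count above is still a string. For later analysis it is usually more convenient as an integer. A small sketch, assuming the label has the form `'8 reviews'` (possibly with a thousands separator on busier pages); `parse_review_count` is a hypothetical helper name, not something the notebook defines.

```python
import re

def parse_review_count(label):
    """Return the first integer found in a label such as '8 reviews', or None."""
    match = re.search(r"\d[\d,]*", label)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))

print(parse_review_count("8 reviews"))      # 8
print(parse_review_count("1,234 reviews"))  # 1234
```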

Extracting Review Text

In [13]:
review_text = soup.find_all('p', class_ = 'comment__09f24__gu0rG css-qgunke')
review_text

tot_review_text = []

for i in review_text:
    tot_review_text.append(i.text)
    
tot_review_text

["The food is good here!!!! The friendly quick and efficient staff big thanks to them. Can't go wrong with chicken and spaghetti so goood. The peach mango pie is a must have. The pineapple juice was also very good. Enjoyed food here and take out. Would come back again A++++. Also this location doesn't have crazy kuku line ups like the other ones",
 'I love coming here when I visit Toronto :) fast and friendly service and the food is always fresh and yummy. I love the jolly spaghetti, the chicken and peach mango pie \xa0also the fresh pineapple juice is super refreshing. I love JOLLIBEE',
 "I'm Filipino and biased, but I firmly believe with all my heart and tastebuds that Jollibee's spicy fried chicken is the best fastfood fried chicken out there. (I am objective enough to also know this isn't the case for the original fried chicken).Other go-to's:- Jolly Spaghetti: Sweet spaghetti. Sounds weird? Yes it is. But it pairs sooooo well with the spicy fried chicken! And at the very least, it

Extracting Reviewer Names

In [15]:
reviewer = soup.find_all('span', class_="fs-block css-ux5mu6")
reviewer_name = []

for i in reviewer:
    reviewer_name.append(i.text)
    
reviewer_name = reviewer_name[1:]
reviewer_name

['Jonny T.',
 'T S.',
 'David Y.',
 'Lisa B.',
 'Amelia J.',
 '佳琛',
 'David N.',
 'Daisy W.']

Extracting the Ratings given by each User

In [17]:
rating = soup.find_all('div', class_ = 'five-stars__09f24__mBKym five-stars--regular__09f24__DgBNj display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY')

tot_rating =[]

for i in rating:
    p=i.get('aria-label')
    tot_rating.append(p)
tot_rating = tot_rating[:10]
tot_rating

['5 star rating',
 '5 star rating',
 '4 star rating',
 '5 star rating',
 '4 star rating',
 '4 star rating',
 '3 star rating',
 '5 star rating']
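The `aria-label` strings above are text, so they cannot be averaged or plotted directly. The sketch below converts them to numbers. It assumes every label follows the `'<number> star rating'` pattern seen in this output; `parse_star_rating` is a hypothetical helper name.

```python
import re

def parse_star_rating(label):
    """Convert a label like '4.5 star rating' to a float, or None if it does not match."""
    # `label or ""` guards against a missing aria-label attribute (None).
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+star rating", label or "")
    return float(match.group(1)) if match else None

# The eight labels printed above.
labels = ['5 star rating', '5 star rating', '4 star rating', '5 star rating',
          '4 star rating', '4 star rating', '3 star rating', '5 star rating']

stars = [parse_star_rating(s) for s in labels]
print(stars)
print(sum(stars) / len(stars))  # 4.375
```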

Extracting User Location

In [20]:
loc = soup.find_all('span', class_ ='css-qgunke')
loc_text = []

for i in loc:
    loc_text.append(i.text)
loc_text=loc_text[6:-1]
loc_text

['Toronto, ON',
 'Grosse Pointe, United States',
 'Toronto, ON',
 'Toronto, ON',
 'Toronto, ON',
 'Toronto, ON',
 'Toronto, ON',
 'Old Toronto, Toronto, ON']

Overall Rating

In [26]:
o_r = soup.find_all('div', class_ = 'five-stars__09f24__mBKym five-stars--large__09f24__Waiqf display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY')
overall_rating = []
for i in o_r:    
    g=i.get('aria-label')
    overall_rating.append(g)
overall_rating = overall_rating[0]
overall_rating

'4.5 star rating'

Opening/closing hours, restaurant website URL and contact details

In [27]:
info = soup.find_all('p', class_ = 'css-1p9ibgf')
info_list = []
for i in info:
    info_list.append(i.text)
info_list

['Mon',
 '10:00 AM - 10:00 PM',
 '10:00 AM - 10:00 PM',
 'Wed',
 '10:00 AM - 10:00 PM',
 'Thu',
 '10:00 AM - 10:00 PM',
 'Fri',
 '10:00 AM - 10:00 PM',
 'Sat',
 '10:00 AM - 10:00 PM',
 'Sun',
 '10:00 AM - 10:00 PM',
 'http://jollibeecanada.com',
 '(647) 424-4772',
 'Get Directions']

In [29]:
phone = info_list[-2]
website = info_list[-3]
print(phone, website)

(647) 424-4772 http://jollibeecanada.com
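Picking the phone number and website by position (`info_list[-2]`, `info_list[-3]`) only works while the page lists them in this exact order. As a hedged alternative, the entries can be recognised by their shape instead. The sketch below assumes phone numbers use the `(NNN) NNN-NNNN` format shown above and that the website entry starts with `http://` or `https://`; `extract_contact` is a hypothetical helper name.

```python
import re

# Assumed phone format, based on the single example in the output above.
PHONE_RE = re.compile(r"^\(\d{3}\) \d{3}-\d{4}$")

def extract_contact(info_list):
    """Return (phone, website) found by pattern; either may be None."""
    phone = next((s for s in info_list if PHONE_RE.match(s)), None)
    website = next(
        (s for s in info_list if s.startswith(("http://", "https://"))), None
    )
    return phone, website

sample = ['Sun', '10:00 AM - 10:00 PM', 'http://jollibeecanada.com',
          '(647) 424-4772', 'Get Directions']
print(extract_contact(sample))
```

If a restaurant has no website or phone listed, the corresponding value is `None` rather than an unrelated entry from the list.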


Writing a CSV file to store the information acquired

In [32]:
try:
    with open(f'{title_text}.csv', 'w', encoding='utf8', newline='') as f:
        thewriter = writer(f)
        header = ['User_Review', 'Name', 'Given_Rating', 'User_Location']
        thewriter.writerow(header)
        # zip pairs the four lists row by row and stops at the shortest one,
        # so a missing field cannot raise an IndexError.
        for row in zip(tot_review_text, reviewer_name, tot_rating, loc_text):
            thewriter.writerow(row)
    print(f"The file {title_text}.csv created.")
except OSError as e:
    # Covers FileNotFoundError, PermissionError and other file system errors.
    print(f"An error occurred while writing to {title_text}.csv: {e}")

The file Jollibee.csv created.
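The CSV rows combine four lists that were each trimmed by position earlier (`[1:]`, `[:10]`, `[6:-1]`), so a layout change on the page could silently misalign them. A quick length check before writing can catch this. This is a sketch only: `aligned_length` is a hypothetical helper, and the sample values are placeholders rather than scraped data.

```python
def aligned_length(**columns):
    """Return the shared length of all columns, or raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)

# Placeholder values standing in for the scraped lists.
n = aligned_length(
    reviews=["Great food", "Fast service"],
    names=["A.", "B."],
    ratings=["5 star rating", "4 star rating"],
    locations=["Toronto, ON", "Toronto, ON"],
)
print(n)  # 2
```

In the notebook this would be called with `tot_review_text`, `reviewer_name`, `tot_rating` and `loc_text` just before the file is opened.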
