### Web Scraping Lab

Welcome! In today's lab we'll build a web scraper that assembles a dataset from the restaurant listings on a Yelp search results page.

You can find the web page here: https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1

The lab questions build on the material discussed in class.

In [1]:
import pandas as pd
import requests
import bs4

In [2]:
url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1'
req = requests.get(url)

In [3]:
# Name the parser explicitly so bs4 does not pick one based on what is installed
scraper = bs4.BeautifulSoup(req.text, 'html.parser')

In [4]:
type(scraper)

bs4.BeautifulSoup

#### Step 1:  Scrape the number of reviews for each restaurant

Using a method similar to the one we used to find each restaurant's title, find the number of reviews for each of the 30 restaurants listed on this web page.

**Hint:** the `isdigit()` string method returns `True` only when every character in a string is a digit, which makes it useful for detecting strings that hold a number.
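A minimal sketch of how the hint might be applied, using made-up span texts rather than the live page. The name `span_texts` and its values are assumptions for illustration; on the real page they would come from something like `[span.text for span in page_links]`.

```python
# Hypothetical span texts standing in for the text of each <span> on the page.
span_texts = ['Italian', '269', 'Soho', '1842 reviews', '', '108']

# str.isdigit() is True only for non-empty strings made up entirely of digits,
# so '1842 reviews' and '' are both rejected.
digit_texts = [text for text in span_texts if text.isdigit()]

# Convert to integers so the counts can be sorted or summed later.
review_counts = [int(text) for text in digit_texts]

print(digit_texts)
print(review_counts)
```

Note that a string such as `'1842 reviews'` fails the check, so on its own `isdigit()` only finds spans whose entire text is a number.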

In [14]:

page_links = scraper.find_all('span')



In [15]:
page_links[0]

<span class="pseudo-input-text__09f24__EWsQz display--inline__09f24__3iACj border-color--default__09f24__R1nRO"><div class="pseudo-input-icon-container__09f24__31kP9 margin-r1-5__09f24__3fQB8 border-color--default__09f24__R1nRO"><style data-emotion-css="1mpk29p">.css-1mpk29p{width:24px;height:24px;display:inline-block;vertical-align:middle;position:relative;overflow:hidden;top:-0.1em;fill:rgba(43,39,60,1);}.css-1mpk29p::before{position:absolute;display:block;left:0;}.css-1mpk29p::after{content:'';display:block;position:absolute;left:0;right:0;top:0;bottom:0;}.css-1mpk29p svg{position:absolute;width:100%;height:100%;fill:inherit;display:block;left:0;top:0;right:0;bottom:0;}</style><span aria-hidden="false" aria-label="Find" class="icon--24-search-v2 pseudo-input-icon__09f24__3Pmw2 css-1mpk29p"><svg class="icon_svg" height="24" width="24"><path d="M21.853 20.355l-3.444-3.443a9.428 9.428 0 10-16.761-6.171 9.428 9.428 0 0015.348 7.586l3.443 3.442a1 1 0 101.414-1.414zM5.82 16.245a7.429 7.42

In [16]:
len(page_links)

248

In [26]:
# Keep the text of every span whose markup mentions 'reviewCount'
review_links = [link.text for link in page_links if 'reviewCount' in str(link)]

In [27]:
len(review_links)

10

In [28]:
review_links

['269', '279', '703', '1842', '108', '205', '468', '205', '380', '118']

#### Step 2:  Find the price range for each restaurant

Let's also collect the price range of each restaurant, as denoted by the £ symbols shown on each listing.

**Hint 1:** The information you need here can be selected using the same criteria you used to find the number of reviews.

**Hint 2:** In Python you can write the £ symbol inside a string with the escape sequence `\xA3`.
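As a small illustration of Hint 2, the sketch below shows that the escape and the literal character are the same string, and that counting the symbol turns a price label into a number. The variable `listing_text` and its contents are made up for the example.

```python
pound = '\xA3'

# The escape sequence and the literal character are the same one-character string.
print(pound == '£')
print(len(pound))

# Hypothetical listing text; counting the symbol gives a numeric price level.
listing_text = 'Italian, £££, Soho'
price_level = listing_text.count(pound)
print(price_level)
```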

In [None]:
# your code here

In [32]:
page_links2 = scraper.find_all('div')

In [45]:
# Class that appears on the outer <div> of each restaurant card (see page_links2[0])
class_selector_container = {'class': 'container__09f24__21w3G'}

In [46]:
page_links2 = scraper.find_all('div',class_selector_container)

In [39]:
print('\xA3')

£


In [48]:
type(page_links2[0])

bs4.element.Tag

In [49]:
pricing = [link.text for link in page_links2 if '\xA3' in str(link)]

In [55]:
pricing_level = [item.count('\xA3') for item in pricing]


In [56]:
pricing_level

[2, 2, 3, 2, 2, 4, 2, 2, 2, 2]

In [57]:
pricing_level_pounds = ['\xA3'*i for i in pricing_level]

In [58]:
pricing_level_pounds

['££', '££', '£££', '££', '££', '££££', '££', '££', '££', '££']

#### Step 3:  Build a DataFrame for the Title, # of Ratings and Price Range of Each Restaurant

You will need to create a dictionary structured in the following way:

` {
    'Title': [list with the titles of each restaurant],
    'NumRating': [list with the number of ratings of each restaurant],
    'Price Range': [list with the price range of each restaurant]
}`
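Because `pd.DataFrame` raises a `ValueError` when the lists in such a dictionary have different lengths, it can help to compare the lengths first. The sketch below does that in plain Python; the helper names `column_lengths` and `has_equal_lengths` and the sample values are hypothetical, not part of the lab.

```python
def column_lengths(data):
    """Return a dict mapping each column name to the length of its list."""
    return {name: len(values) for name, values in data.items()}


def has_equal_lengths(data):
    """True when every column has the same number of entries."""
    return len(set(column_lengths(data).values())) <= 1


# Placeholder values, not real scraped data.
sample = {
    'Title': ['Restaurant A', 'Restaurant B'],
    'NumRating': ['269', '279'],
    'Price Range': ['\xA3\xA3', '\xA3\xA3\xA3'],
}

print(column_lengths(sample))
print(has_equal_lengths(sample))

# Dropping one entry makes the columns uneven, which the check reports.
uneven = dict(sample, Title=['Restaurant A'])
print(has_equal_lengths(uneven))
```

If the check fails, the per-column lengths show which scraping step returned too few or too many items.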

In [None]:
# Step to get names of restaurants

In [66]:
names_of_restaurants = ['a','a','a','a','a','a','a','a','a','a']

In [69]:
page_links2[0]

<div class="container__09f24__21w3G hoverable__09f24__2nTf3 margin-t3__09f24__5bM2Z margin-b3__09f24__1DQ9x padding-t3__09f24__-R_5x padding-r3__09f24__1pBFG padding-b3__09f24__1vW6j padding-l3__09f24__1yCJf border--top__09f24__1H_WE border--right__09f24__28idl border--bottom__09f24__2FjZW border--left__09f24__33iol border-color--default__09f24__R1nRO"><div class="border-color--default__09f24__R1nRO"><div class="arrange__09f24__AiSIM border-color--default__09f24__R1nRO"><div class="arrange-unit__09f24__1gZC1 border-color--default__09f24__R1nRO"><div class="border-color--default__09f24__R1nRO"><div class="display--inline-block__09f24__FsgS4 margin-r3__09f24__2CcgQ border-color--default__09f24__R1nRO"><div class="border-color--default__09f24__R1nRO"><div class="container__09f24__2BlDc dontTriggerCardClick__09f24__2GWZF container--responsive__09f24__1BXoY container__09f24__1SkkV border-color--default__09f24__R1nRO" data-slidenum="0"><div aria-label="Slideshow" class="container__09f24__15L

In [None]:
# your code here
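One possible way to approach the names step is sketched below on a tiny hand-written HTML snippet. The markup, the `container` class, and the assumption that the first link in each card holds the restaurant name are all made up for illustration; Yelp's real markup differs and changes often, so the selectors would need to be adapted after inspecting `page_links2[0]`.

```python
import bs4

# Hypothetical markup: tag layout and class names are invented for this sketch
# and will not match Yelp's real HTML.
sample_html = """
<div class="container"><h4><a href="/biz/one">Restaurant One</a></h4><span>££</span></div>
<div class="container"><h4><a href="/biz/two">Restaurant Two</a></h4><span>£££</span></div>
"""

soup = bs4.BeautifulSoup(sample_html, 'html.parser')

names = []
for card in soup.find_all('div', {'class': 'container'}):
    link = card.find('a')  # first link inside the card
    # Skip cards without a link instead of raising AttributeError on None.
    if link is not None:
        names.append(link.get_text(strip=True))

print(names)
```

Looping over the cards, rather than over every link on the page, keeps each name aligned with the review count and price range taken from the same card.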

In [67]:
my_data = {
    'Title': names_of_restaurants,
    'NumRating': review_links,
    'Price Range':pricing_level_pounds,
}

In [68]:
pd.DataFrame(my_data)

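| | Title | NumRating | Price Range |
|---|---|---|---|
| 0 | a | 269 | ££ |
| 1 | a | 279 | ££ |
| 2 | a | 703 | £££ |
| 3 | a | 1842 | ££ |
| 4 | a | 108 | ££ |
| 5 | a | 205 | ££££ |
| 6 | a | 468 | ££ |
| 7 | a | 205 | ££ |
| 8 | a | 380 | ££ |
| 9 | a | 118 | ££ |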
