### Web Scraping Lab

Welcome! In today's lab we're going to build a web scraper that collects restaurant listings from a Yelp search page into a dataset.

You can find the web page here: https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1

The lab questions build directly on the material discussed in class.

In [1]:
import pandas as pd
import requests
import bs4

In [2]:
url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1'
req = requests.get(url)

In [3]:
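# Name the parser explicitly; if it is omitted, BeautifulSoup guesses one and emits a warning
scraper = bs4.BeautifulSoup(req.text, 'html.parser')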

In [4]:
type(scraper)

bs4.BeautifulSoup

#### Step 1:  Scrape the number of reviews for each restaurant

Using the same approach we used to find each restaurant's title, find the number of reviews for every restaurant listed on this page.

**Hint:** the `isdigit()` string method is helpful for detecting whether a string consists entirely of digits, as a bare review count does.
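As a quick illustration of the hint: `str.isdigit()` returns `True` only when a string is non-empty and every character is a digit, so it separates bare review counts from other span text. The list of span texts below is made up for illustration and stands in for `[s.text for s in page_links]`.

```python
# Hypothetical span texts, standing in for [s.text for s in page_links]
span_texts = ['Restaurants', '279', '4.5 star rating', '1842', '', '££', '1,842 reviews']

# isdigit() is True only for non-empty strings made entirely of digits
review_counts = [int(t) for t in span_texts if t.isdigit()]

print(review_counts)      # [279, 1842]
print('1,842'.isdigit())  # False: commas and spaces are not digits
```

If a count were ever rendered with a thousands separator, it would need cleaning (for example `t.replace(',', '')`) before the `isdigit()` test.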

In [5]:
page_links = scraper.find_all('span')

In [6]:
page_links[0]

<span class="pseudo-input-text__09f24__EWsQz display--inline__09f24__3iACj border-color--default__09f24__R1nRO"><div class="pseudo-input-icon-container__09f24__31kP9 margin-r1-5__09f24__3fQB8 border-color--default__09f24__R1nRO"><style data-emotion-css="1mpk29p">.css-1mpk29p{width:24px;height:24px;display:inline-block;vertical-align:middle;position:relative;overflow:hidden;top:-0.1em;fill:rgba(43,39,60,1);}.css-1mpk29p::before{position:absolute;display:block;left:0;}.css-1mpk29p::after{content:'';display:block;position:absolute;left:0;right:0;top:0;bottom:0;}.css-1mpk29p svg{position:absolute;width:100%;height:100%;fill:inherit;display:block;left:0;top:0;right:0;bottom:0;}</style><span aria-hidden="false" aria-label="Find" class="icon--24-search-v2 pseudo-input-icon__09f24__3Pmw2 css-1mpk29p"><svg class="icon_svg" height="24" width="24"><path d="M21.853 20.355l-3.444-3.443a9.428 9.428 0 10-16.761-6.171 9.428 9.428 0 0015.348 7.586l3.443 3.442a1 1 0 101.414-1.414zM5.82 16.245a7.429 7.42

In [13]:
len(page_links)

250

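In [38]:
# 'reviewCount' appears in each span's markup, not in its visible text, so test str(link);
# isdigit() then keeps only spans whose text is a bare number
reviews_of_each = [link.text for link in page_links if 'reviewCount' in str(link) and link.text.isdigit()]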

In [17]:
review_links = [link.text for link in page_links if 'reviewCount' in str(link)]

In [18]:
len(review_links)

10

In [19]:
review_links

['279', '270', '1842', '205', '380', '118', '205', '108', '703', '241']
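Searching `str(link)` matches `'reviewCount'` anywhere in the serialized tag, including nested children. A slightly more targeted variant inspects the `class` attribute directly. This is a sketch on hypothetical markup: the class name `reviewCount__hypothetical` is invented, since Yelp's real class names are hashed and change over time.

```python
import bs4

# Hypothetical markup; real Yelp class names differ and change over time
html = """
<div>
  <span class="reviewCount__hypothetical">279</span>
  <span class="other">Open now</span>
  <span class="reviewCount__hypothetical">1842</span>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')

def has_review_class(tag):
    # BeautifulSoup returns 'class' as a list of strings (it is multi-valued)
    return any(c.startswith('reviewCount') for c in tag.get('class', []))

review_counts = [int(s.text) for s in soup.find_all('span') if has_review_class(s)]
print(review_counts)  # [279, 1842]
```

Converting with `int()` at this stage also gives a numeric column later, instead of the strings shown in the output above.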

#### Step 2:  Find the price range for each restaurant

Let's also collect the price range of each restaurant, shown by the number of $ \unicode{x00A3} $ symbols in each listing.

**Hint 1:** The information you need here was selected using the same criteria you used to find the number of reviews.

**Hint 2:** You can type the $ \unicode{x00A3} $ symbol in Python with the escape sequence `'\xA3'`.
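A short check of Hint 2: `'\xA3'` is an escape for code point U+00A3, the pound sign, so it compares equal to a literal `'£'`, and `str.count()` turns a run of pound signs into a numeric tier.

```python
# '\xA3' is an escape for code point U+00A3, the pound sign
pound = '\xA3'
print(pound)               # £
print(pound == '£')        # True
print(ord(pound) == 0xA3)  # True

# Counting the symbol gives a numeric price tier
print('£££'.count(pound))  # 3
```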

In [None]:
# your code here

In [20]:
page_links2 = scraper.find_all('div')

In [21]:
class_selector_container = {'class': 'container__09f24__21w3G'}  # Class of the div that appears to wrap each search result; Yelp's hashed class names change over time

In [22]:
page_links2 = scraper.find_all('div',class_selector_container)

In [23]:
print('\xA3')

£


In [24]:
type(page_links2[0])

bs4.element.Tag

In [25]:
pricing = [link.text for link in page_links2 if '\xA3' in str(link)]

In [26]:
pricing_level = [item.count('\xA3') for item in pricing]


In [27]:
pricing_level

[2, 2, 2, 4, 2, 2, 2, 2, 3, 2]

In [28]:
pricing_level_pounds = ['\xA3'*i for i in pricing_level]

In [29]:
pricing_level_pounds

['££', '££', '££', '££££', '££', '££', '££', '££', '£££', '££']
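Counting every `£` in a container's full text assumes the symbol appears nowhere else in that result; a review snippet such as "mains from £12" would inflate the count. One hedged alternative is to accept only a text fragment made entirely of one to four pound signs. The card texts below are hypothetical.

```python
import re

# Matches a string made only of 1-4 pound signs, e.g. '££'
PRICE_TIER = re.compile(r'^\xA3{1,4}$')

# Hypothetical texts from the tags inside one result card
card_texts = ['Dishoom', '££', 'Great breakfast, mains from £12']

naive = sum(t.count('\xA3') for t in card_texts)  # also counts the £ in the snippet
tiers = [t for t in card_texts if PRICE_TIER.match(t)]
strict = len(tiers[0]) if tiers else None           # None if no price tag was found

print(naive, strict)  # 3 2
```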

#### Step 3:  Build a DataFrame for the Title, Number of Ratings, and Price Range of Each Restaurant

You will need to create a dictionary structured as follows:

```python
{
    'Title': [list with the titles of each restaurant],
    'NumRatings': [list with the number of ratings of each restaurant],
    'Price Range': [list with the price range of each restaurant]
}
```
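`pd.DataFrame` turns a dictionary of equal-length lists into one column per key. This sketch uses the first three rows of this lab's results as toy values and also shows what happens when the lists differ in length, which is easy to hit when one selector returns extra matches.

```python
import pandas as pd

# Toy values taken from the first three rows of this lab's output
data = {
    'Title': ['The Mayfair Chippy', 'Ffiona’s Restaurant', 'Dishoom'],
    'NumRatings': [279, 270, 1842],
    'Price Range': ['££', '££', '££'],
}
df = pd.DataFrame(data)
print(df.shape)  # (3, 3)

# Every list must have the same length; a mismatch raises ValueError
mismatch_raised = False
try:
    pd.DataFrame({'Title': ['a', 'b'], 'NumRatings': [1]})
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```

Checking `len()` of each list before building the DataFrame, as the cells below do, catches this early.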

In [None]:
# Step to get names of restaurants

In [45]:
# A dict holds only one 'class' key: the original three-entry version kept just the last
# value, so this filter matches any <a> that has the link-size--inherit class
targ_class = {'class': 'link-size--inherit__09f24__2Uj95'}
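When a dict literal repeats a key, Python keeps only the last value, so a `{'class': ...}` dict that lists three classes actually filters on the final one alone. That is consistent with the "more" links appearing in the results, since they share the `link-size--inherit__09f24__2Uj95` class. A quick check:

```python
# A dict literal with a repeated key keeps only the last value
targ_class = {
    'class': 'link__09f24__1kwXV',
    'class': 'link-color--inherit__09f24__3PYlA',
    'class': 'link-size--inherit__09f24__2Uj95',
}
print(targ_class)       # {'class': 'link-size--inherit__09f24__2Uj95'}
print(len(targ_class))  # 1
```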

In [47]:
page_links2 = scraper.find_all('a',targ_class)


In [49]:
len(page_links2)

29

In [50]:
page_links2[0]

<a class="link__09f24__1kwXV link-color--inherit__09f24__3PYlA link-size--inherit__09f24__2Uj95" href="/biz/the-mayfair-chippy-london-2?osq=Restaurants" name="The Mayfair Chippy" rel="" target="">The Mayfair Chippy</a>
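To require all three classes on the same tag, which is what the three-key dict seems to intend, BeautifulSoup's `select()` accepts a compound CSS selector. This sketch runs on two anchors simplified from the `page_links2` output in this step (one title link and one "more" link). Whether it returns exactly the ten titles on the live page has not been checked here.

```python
import bs4

# Two anchors simplified from the page_links2 output: a title link and a "more" link
html = '''
<a class="link__09f24__1kwXV link-color--inherit__09f24__3PYlA link-size--inherit__09f24__2Uj95" href="/biz/the-mayfair-chippy-london-2?osq=Restaurants" name="The Mayfair Chippy">The Mayfair Chippy</a>
<a class="link__09f24__1kwXV link-color--blue-dark__09f24__2DRa0 link-size--inherit__09f24__2Uj95" href="/biz/the-mayfair-chippy-london-2?osq=Restaurants" name="">more</a>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# A compound CSS selector matches only tags carrying ALL listed classes
selector = 'a.link__09f24__1kwXV.link-color--inherit__09f24__3PYlA.link-size--inherit__09f24__2Uj95'
titles = [a.text for a in soup.select(selector)]
print(titles)  # ['The Mayfair Chippy']
```

The "more" link is excluded because it carries `link-color--blue-dark__09f24__2DRa0` instead of `link-color--inherit__09f24__3PYlA`.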

In [51]:
page_links2

[<a class="link__09f24__1kwXV link-color--inherit__09f24__3PYlA link-size--inherit__09f24__2Uj95" href="/biz/the-mayfair-chippy-london-2?osq=Restaurants" name="The Mayfair Chippy" rel="" target="">The Mayfair Chippy</a>,
 <a class="link__09f24__1kwXV link-color--blue-dark__09f24__2DRa0 link-size--inherit__09f24__2Uj95" href="/biz/the-mayfair-chippy-london-2?hrid=vnmW77PgAFRDmiN9PqIW9Q&amp;osq=Restaurants" name="" rel="" target="">more</a>,
 <a class="link__09f24__1kwXV link-color--inherit__09f24__3PYlA link-size--inherit__09f24__2Uj95" href="/biz/ffionas-restaurant-london?osq=Restaurants" name="Ffiona’s Restaurant" rel="" target="">Ffiona’s Restaurant</a>,
 <a class="link__09f24__1kwXV link-color--blue-dark__09f24__2DRa0 link-size--inherit__09f24__2Uj95" href="/biz/ffionas-restaurant-london?hrid=IixyLOwPCR5t9TAoJcPFoQ&amp;osq=Restaurants" name="" rel="" target="">more</a>,
 <a class="link__09f24__1kwXV link-color--inherit__09f24__3PYlA link-size--inherit__09f24__2Uj95" href="/biz/disho

In [52]:
restaurant_names = [link.text for link in page_links2 if 'osq=Restaurants' in str(link)]

In [53]:
len(restaurant_names)

20

In [54]:
restaurant_names

['The Mayfair Chippy',
 'more',
 'Ffiona’s Restaurant',
 'more',
 'Dishoom',
 'more',
 'Restaurant Gordon Ramsay',
 'more',
 'Flat Iron',
 'more',
 'The Queens Arms',
 'more',
 'Padella',
 'more',
 'The Golden Chippy',
 'more',
 'Duck & Waffle',
 'more',
 'The Grazing Goat',
 'more']

In [55]:
# Drop the "more" links by exact match; a substring test would also drop names containing "more"
restaurant_names_2 = [r for r in restaurant_names if r != 'more']

In [56]:
restaurant_names_2

['The Mayfair Chippy',
 'Ffiona’s Restaurant',
 'Dishoom',
 'Restaurant Gordon Ramsay',
 'Flat Iron',
 'The Queens Arms',
 'Padella',
 'The Golden Chippy',
 'Duck & Waffle',
 'The Grazing Goat']
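The test `'more' not in r` is a substring check, so it would also discard any restaurant whose name merely contains "more". Comparing with `!=` removes only the exact "more" links. The name `'Kenmore Grill'` below is hypothetical.

```python
restaurant_names = ['The Mayfair Chippy', 'more', 'Kenmore Grill', 'more']  # 'Kenmore Grill' is hypothetical

substring_filter = [r for r in restaurant_names if 'more' not in r]
exact_filter = [r for r in restaurant_names if r != 'more']

print(substring_filter)  # ['The Mayfair Chippy']  (Kenmore Grill dropped by mistake)
print(exact_filter)      # ['The Mayfair Chippy', 'Kenmore Grill']
```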

In [57]:
restaurant_names[0]

'The Mayfair Chippy'

In [None]:
# your code here

In [62]:
my_data = {
    'Title': restaurant_names_2,
    'NumRatings': review_links,  # key matches the 'NumRatings' name in the specification
    'Price Range': pricing_level_pounds,
    'Price Range Level': pricing_level,
}

In [63]:
output_df = pd.DataFrame(my_data)

In [64]:
output_df

| | Title | NumRatings | Price Range | Price Range Level |
|---|---|---|---|---|
| 0 | The Mayfair Chippy | 279 | ££ | 2 |
| 1 | Ffiona’s Restaurant | 270 | ££ | 2 |
| 2 | Dishoom | 1842 | ££ | 2 |
| 3 | Restaurant Gordon Ramsay | 205 | ££££ | 4 |
| 4 | Flat Iron | 380 | ££ | 2 |
| 5 | The Queens Arms | 118 | ££ | 2 |
| 6 | Padella | 205 | ££ | 2 |
| 7 | The Golden Chippy | 108 | ££ | 2 |
| 8 | Duck & Waffle | 703 | £££ | 3 |
| 9 | The Grazing Goat | 241 | ££ | 2 |
