### Web Scraping Lab

Welcome! In today's lab we'll build a web scraper that assembles a dataset from the restaurant listings on a Yelp search page.

You can find the web page here: https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1

The lab questions build on the material discussed in class.

#### Step 1:  Scrape the number of reviews for each restaurant

Using a methodology similar to the one we used to find each restaurant's title, find the number of reviews for the 30 restaurants listed on this web page.

**Hint:** the `isdigit()` string method will be helpful for detecting whether a string consists only of digits.
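As a small illustration of the hint, the sketch below filters a list of strings down to the ones that are whole numbers. The `raw_values` list is made up for the example and simply stands in for text pulled out of the page's `<span>` tags.

```python
# Hypothetical strings standing in for text extracted from <span> tags.
raw_values = ["277", "££", "1842", "", "4.5", "30"]

# str.isdigit() is True only for non-empty strings made up entirely of digits,
# so prices, empty strings, and decimals such as "4.5" are filtered out.
review_counts = [int(value) for value in raw_values if value.isdigit()]

print(review_counts)  # [277, 1842, 30]
```

Note that a decimal rating such as `"4.5"` is rejected because of the dot, which is convenient here since only whole review counts are wanted.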

In [41]:
from bs4 import BeautifulSoup
import requests
i = 0
#for i in [0,30,60,90]:
req = requests.get(f"https://www.yelp.com/search?find_desc=Restaurants&find_loc=London%2C%20United%20Kingdom&ns=1&start={i}")
scraper = BeautifulSoup(req.text, 'html.parser')


In [42]:
# Filter on a single class name: a dict literal cannot hold several 'class' keys
# (only the last one would survive), so the last class is the effective filter.
restaurant_names = scraper.find_all('a', {'class': 'link-size--inherit__373c0__1VFlE'})
# Reduce each tag to its inner text: drop the closing tag, then keep what follows the last '>'.
restaurant_names = [str(restaurant) for restaurant in restaurant_names]
restaurant_names = [restaurant.replace('</a>', '') for restaurant in restaurant_names]
restaurant_names = [restaurant.split('>')[-1] for restaurant in restaurant_names]
# Drop empty or single-character strings and the 'more' links.
restaurant_names = [restaurant for restaurant in restaurant_names if len(restaurant) > 1 and restaurant != 'more']
all_restaurants = restaurant_names[:]
all_restaurants

['Dishoom',
 'Ffiona’s Restaurant',
 'Restaurant Gordon Ramsay',
 'The Golden Chippy',
 'The Mayfair Chippy',
 'Duck &amp; Waffle',
 'Flat Iron',
 'Mother Mash',
 'The Queens Arms',
 'Padella',
 'The Grazing Goat',
 'BAO - Soho',
 'Sketch',
 'Lanzhou Noodle Bar',
 'NOPI',
 'Piebury Corner',
 'Daebak',
 'Ye Olde Cheshire Cheese',
 'Chee Mc',
 'The Pig and Butcher',
 'Dishoom',
 'Korean Bbq and Vegan Restaurant',
 'Bailey’s Fish and Chips',
 'Pho House',
 'Dominique Ansel Treehouse',
 'Naru',
 'Wright Brothers - South Kensington',
 'Ken’s Fish Bar',
 'Savoir Faire',
 'Bibimbap']
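Some names in the output above still contain HTML entities such as `&amp;`, because the text was cut out of the raw tag string. One possible cleanup, sketched here with the standard library's `html.unescape`, decodes those entities after scraping. The `names` list is a short sample taken from the lab's outputs, not a fresh scrape.

```python
from html import unescape

# A few names as they appear in the scraped outputs of this lab.
names = ["Duck &amp; Waffle", "Dishoom", "Burger &amp; Lobster"]

# unescape() converts HTML entities (e.g. &amp;) back to plain characters.
clean_names = [unescape(name) for name in names]

print(clean_names)  # ['Duck & Waffle', 'Dishoom', 'Burger & Lobster']
```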

In [21]:
# Filter on a single class name: a dict literal cannot hold several 'class' keys
# (only the last one would survive), so the last class is the effective filter.
num_reviews = scraper.find_all('span', {'class': 'text-color--black-extra-light__373c0__2OyzO'})
# Reduce each tag to its inner text: drop the closing tag, then keep what follows the last '>'.
num_reviews = [str(review) for review in num_reviews]
num_reviews = [review.replace('</span>', '') for review in num_reviews]
num_reviews = [review.split('>')[-1] for review in num_reviews]
# Keep only the purely numeric strings, which are the review counts.
num_reviews = [int(review) for review in num_reviews if review.isdigit()]
all_reviews = num_reviews[:]
all_reviews

[277,
 1842,
 377,
 204,
 270,
 468,
 119,
 202,
 239,
 108,
 183,
 701,
 353,
 544,
 109,
 30,
 830,
 271,
 358,
 101,
 212,
 78,
 375,
 381,
 26,
 292,
 342,
 194,
 482,
 22]
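A note on the attribute filters used in this lab: a Python dict literal that repeats a key keeps only the last value, so a filter written with several `'class'` entries matches on just one class name. The sketch below shows that behaviour using class names from this page, and builds a CSS selector string that requires all of the classes at once. Passing that string to BeautifulSoup's `select()` method (for example `scraper.select(selector)`) is one way to apply it; that last step is an assumption about how you might use it and is not run here.

```python
# Repeating a key in a dict literal silently keeps only the last value.
attrs = {
    'class': 'lemon--span__373c0__3997G',
    'class': 'text__373c0__2Kxyz',
    'class': 'reviewCount__373c0__2r4xT',
}
print(attrs)  # {'class': 'reviewCount__373c0__2r4xT'}

# To require several classes together, a CSS selector chains them with dots.
classes = [
    'lemon--span__373c0__3997G',
    'text__373c0__2Kxyz',
    'reviewCount__373c0__2r4xT',
]
selector = 'span' + ''.join(f'.{name}' for name in classes)
print(selector)
```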

In [23]:
import pandas as pd

# Each key becomes a column; both lists must have the same length.
dict1 = {"restaurants": all_restaurants, "reviews": all_reviews}
print(dict1)
df = pd.DataFrame(dict1)
df

{'restaurants': ['The Mayfair Chippy', 'Dishoom', 'Flat Iron', 'Restaurant Gordon Ramsay', 'Ffiona’s Restaurant', 'Mother Mash', 'The Queens Arms', 'Padella', 'The Grazing Goat', 'The Golden Chippy', 'BAO - Soho', 'Duck &amp; Waffle', 'Ye Olde Cheshire Cheese', 'Dishoom', 'The Pig and Butcher', 'Gordon Ramsay Street Pizza', 'Sketch', 'NOPI', 'The Churchill Arms', 'Abeno', 'The Victoria', 'The Shed', 'Shoryu Ramen', 'Busaba Soho', 'The Colonel Fawcett', 'Burger &amp; Lobster', 'Hawksmoor Seven Dials', 'Savoir Faire', 'Yauatcha', 'London House by Gordon Ramsay'], 'reviews': [277, 1842, 377, 204, 270, 468, 119, 202, 239, 108, 183, 701, 353, 544, 109, 30, 830, 271, 358, 101, 212, 78, 375, 381, 26, 292, 342, 194, 482, 22]}


|   | restaurants | reviews |
|---|---|---|
| 0 | The Mayfair Chippy | 277 |
| 1 | Dishoom | 1842 |
| 2 | Flat Iron | 377 |
| 3 | Restaurant Gordon Ramsay | 204 |
| 4 | Ffiona’s Restaurant | 270 |
| 5 | Mother Mash | 468 |
| 6 | The Queens Arms | 119 |
| 7 | Padella | 202 |
| 8 | The Grazing Goat | 239 |
| 9 | The Golden Chippy | 108 |


#### Step 2:  Find the price range for each restaurant

Let's also collect the price range of each restaurant, which is denoted by the £ symbols shown on each listing.

**Hint 1:** The elements you need are matched by the same selection criteria used to find the number of reviews.

**Hint 2:** You can write the £ symbol in a Python string with the escape sequence `\xA3`.
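To make Hint 2 concrete, here is a minimal sketch showing that `'\xA3'` and `'£'` are the same character, and how that can separate price strings from review counts. The `candidates` list is invented for the example, and the `price_levels` step is an optional extra that is not required by the lab.

```python
pound = '\xA3'
print(pound == '£')  # True

# Hypothetical mix of strings pulled from the same <span> elements.
candidates = ['277', '££', '££££', '1842']

# Keep only the strings that contain a pound sign.
price_ranges = [text for text in candidates if pound in text]
print(price_ranges)  # ['££', '££££']

# Optionally turn each price range into a number by counting the symbols.
price_levels = [text.count(pound) for text in price_ranges]
print(price_levels)  # [2, 4]
```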

In [26]:
# Same selection as for the review counts. A dict literal cannot hold several
# 'class' keys (only the last one would survive), so one class name is used.
price_ranges = scraper.find_all('span', {'class': 'text-color--black-extra-light__373c0__2OyzO'})

price_ranges = [str(range_) for range_ in price_ranges]
price_ranges = [range_.replace('</span>', '') for range_ in price_ranges]
price_ranges = [range_.split('>')[1] for range_ in price_ranges]
price_ranges = [range_ for range_ in price_ranges if '\xA3' in range_]

#### Step 3:  Build a DataFrame for the Title, Number of Ratings, and Price Range of Each Restaurant

You will need to create a dictionary structured in the following way:

` {
    'Title': [list with the titles of each restaurant],
    'NumRatings': [list with the number of ratings of each restaurant],
    'Price Range': [list with the price range of each restaurant]
}`
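Building a DataFrame from a dictionary like this only works when every list has the same length; `pd.DataFrame` raises a `ValueError` otherwise, which can happen if a listing is missing a price range. The sketch below is one way to check this before building the table. The helper `check_column_lengths` is hypothetical (not part of pandas), and the three sample rows are taken from the tables in this lab.

```python
def check_column_lengths(columns):
    """Return the shared length of all lists, or raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


columns = {
    'Title': ['The Mayfair Chippy', 'Dishoom', 'Restaurant Gordon Ramsay'],
    'NumRatings': [277, 1842, 204],
    'Price Range': ['££', '££', '££££'],
}

n_rows = check_column_lengths(columns)

# Zip the columns together to preview one record per restaurant.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(n_rows, rows[0])
```

Once the check passes, the same `columns` dictionary can be handed to `pd.DataFrame(columns)`.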

In [29]:
dict2 = {
    'Name': all_restaurants,
    'NumReviews': all_reviews,
    'PriceRange': price_ranges
}

df = pd.DataFrame(dict2)
df

|   | Name | NumReviews | PriceRange |
|---|---|---|---|
| 0 | The Mayfair Chippy | 277 | ££ |
| 1 | Dishoom | 1842 | ££ |
| 2 | Flat Iron | 377 | ££ |
| 3 | Restaurant Gordon Ramsay | 204 | ££££ |
| 4 | Ffiona’s Restaurant | 270 | ££ |
| 5 | Mother Mash | 468 | ££ |
| 6 | The Queens Arms | 119 | ££ |
| 7 | Padella | 202 | ££ |
| 8 | The Grazing Goat | 239 | ££ |
| 9 | The Golden Chippy | 108 | ££ |


### HOMEWORK: 5 columns and 100 rows
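Each search page lists 30 restaurants, so reaching 100 rows means reading several pages. The commented-out loop in the first cell suggests stepping the `start` parameter through 0, 30, 60, and 90. The sketch below only builds the page URLs and does not send any requests. The function name `page_urls` is hypothetical, and the query parameters are copied from the URL used earlier in the lab.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yelp.com/search"


def page_urls(offsets=(0, 30, 60, 90)):
    """Build one search URL per result offset (30 listings per page)."""
    urls = []
    for offset in offsets:
        query = urlencode({
            "find_desc": "Restaurants",
            "find_loc": "London, United Kingdom",
            "ns": 1,
            "start": offset,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = page_urls()
for url in urls:
    print(url)
```

Four pages give up to 120 listings, so the combined results would still need trimming to the first 100 rows, for example with `df.head(100)`.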

In [50]:
#<a class="lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--default__373c0__7tls6" href="/search?cflt=french&amp;find_desc=Restaurants&amp;find_loc=London%2C+United+Kingdom" target="" name="" rel="" role="link">French</a>
#<a class="lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--default__373c0__7tls6" href="/search?cflt=british&amp;find_desc=Restaurants&amp;find_loc=London%2C+United+Kingdom" target="" name="" rel="" role="link">British</a>

# A dict literal cannot hold several 'class' keys (only the last one would
# survive), so filter on one class name plus the role attribute.
cuisine = scraper.find_all('a', {'class': 'link-size--default__373c0__7tls6',
                                 'role': 'link'})
cuisine = [str(cat) for cat in cuisine]
cuisine = [cat.replace('</a>',"") for cat in cuisine]
cuisine = [cat.split('>')[-1] for cat in cuisine]
#cuisine = [cat for cat in cuisine if cat != 'See all' and len(cat)>1]
cuisine

['Yelp',
 'Fish &amp; Chips',
 'Indian',
 'Steakhouses',
 'French',
 'British',
 'British',
 'British',
 'British',
 'Pubs',
 'Gastropubs',
 'Italian',
 'Gastropubs',
 'British',
 'Fish &amp; Chips',
 'Taiwanese',
 'Modern European',
 'Bars',
 'British',
 'Pubs',
 'British',
 'Indian',
 'British',
 'Gastropubs',
 'Pizza',
 'Bars',
 'French',
 'Modern European',
 'Cocktail Bars',
 'Mediterranean',
 'Pubs',
 'Thai',
 'Japanese',
 'Pubs',
 'British',
 'British',
 'Japanese',
 'Thai',
 'Gastropubs',
 'Seafood',
 'Burgers',
 'British',
 'Steakhouses',
 'Cocktail Bars',
 'French',
 'Dim Sum',
 'Seafood',
 'Noodles',
 'Modern European']

In [10]:
# A dict literal cannot hold several 'class' keys (only the last one would
# survive), so filter on one class name.
where = scraper.find_all('p', {'class': 'text-size--small__373c0__3NVWO'})
where = [str(loc) for loc in where]
# Display the raw matches; the location text still has to be extracted from them.
where

[<p class="lemon--p__373c0__3Qnnj text__373c0__2Kxyz text-color--blue-dark__373c0__1jX7S text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--small__373c0__3NVWO"><a class="lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--default__373c0__7tls6" href="" name="" rel="button" role="button" target="">See all</a></p>,
 <p class="lemon--p__373c0__3Qnnj text__373c0__2Kxyz text-color--blue-dark__373c0__1jX7S text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--small__373c0__3NVWO"><a class="lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--default__373c0__7tls6" href="" name="" rel="button" role="button" target="">See all</a></p>,
 <p class="lemon--p__373c0__3Qnnj text__373c0__2Kxyz text-color--blue-dark__373c0__1jX7S text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--small__373c0__3NVWO"><a class="lemon--a__373c0__IEZFH link__373c0__1G70M link