Part One - Data Preparation Module:
Using the website from Yelp: https://www.yelp.com/biz/eleven-madison-park-new-york?osq=Eleven+Madison+Park

For this project, we scraped the Yelp page for a restaurant called Eleven Madison Park. We retrieved the following information: name, address, phone number, website, operating hours, a summary of the restaurant, the most popular dishes and their pictures, nearby bars, the number of reviews, and the top 10 reviews. We stored all the information in a Python dictionary, which we later wrote to a JSON file.

1. Reading the URL with BeautifulSoup and requests

In [1]:
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests 

In [2]:
yelp_link = requests.get('https://www.yelp.com/biz/eleven-madison-park-new-york')
yelp_link_text = yelp_link.text

soup = BeautifulSoup(yelp_link_text,'html.parser')
# print(soup.prettify())

2. Creating an empty dictionary to store the data

In [34]:
restaurant_info = {}

3. Finding the restaurant's name

In [35]:
restaurant_name = soup.find_all('h1')
# Loop over every <h1>; bus_name ends up holding the text of the last one.
for c in restaurant_name:
    bus_name = c.get_text()
print(bus_name)
restaurant_info['Restaurant name'] = bus_name
print(restaurant_info)

Eleven Madison Park
{'Restaurant name': 'Eleven Madison Park'}
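
The loop in step 3 keeps the text of the last `<h1>` it sees. When a page has a single heading, `soup.find('h1')` is a shorter alternative, and checking for `None` avoids an error when no heading exists. A minimal sketch on a made-up HTML snippet (`SAMPLE_HTML` and `first_heading_text` are hypothetical names, not part of the notebook):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real Yelp markup is far larger.
SAMPLE_HTML = "<html><body><h1> Eleven Madison Park </h1><p>Other text</p></body></html>"


def first_heading_text(soup):
    """Return the stripped text of the first <h1>, or None if the page has none."""
    heading = soup.find('h1')  # find() returns None when nothing matches
    if heading is None:
        return None
    return heading.get_text(strip=True)


sample_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
name = first_heading_text(sample_soup)
print(name)
```

Note that `find` returns the first match while the loop keeps the last, so the two only agree when the page has exactly one `<h1>`.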


4. Getting the phone number

In [36]:
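all_p_tags = soup.find_all('p', class_='css-1p9ibgf')
# The phone number is the 17th matching <p>; this index depends on the page layout at scrape time.
phone_num = all_p_tags[16].get_text()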
print(phone_num)

restaurant_info['Phone number'] = phone_num
print(restaurant_info)

(212) 889-0905
{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905'}
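
Picking the phone number by position (`all_p_tags[16]`) breaks as soon as the page gains or loses a paragraph. One possible alternative is to search the paragraph texts for something shaped like a phone number. The sketch below assumes the number is formatted like the one printed above, `(212) 889-0905`; `PHONE_PATTERN`, `find_phone`, and the sample texts are hypothetical.

```python
import re

# Matches numbers written exactly like "(212) 889-0905".
PHONE_PATTERN = re.compile(r'^\(\d{3}\) \d{3}-\d{4}$')


def find_phone(texts):
    """Return the first string that looks like a phone number, else None."""
    for text in texts:
        candidate = text.strip()
        if PHONE_PATTERN.match(candidate):
            return candidate
    return None


# Hypothetical paragraph texts, standing in for [p.get_text() for p in all_p_tags].
sample_texts = ['Get Directions', '11 Madison Ave', ' (212) 889-0905 ', 'Closed']
print(find_phone(sample_texts))
```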


5. Getting the address

In [37]:
get_location = soup.find_all('span',class_='raw__09f24__T4Ezm')

bus_address = get_location[0].text + ', ' + get_location[1].text  # join the street line and the city line into one address
print(bus_address)
restaurant_info['Address'] = bus_address
print(restaurant_info)

11 Madison Ave, New York, NY 10010
{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905', 'Address': '11 Madison Ave, New York, NY 10010'}


6. Getting the operating hours

In [38]:
# Use select() with a CSS selector to locate the rows of the hours table.
OH_list = soup.select(
        '#location-and-hours > section > div.arrange__09f24__LDfbs.gutter-4__09f24__dajdg.border-color--default__09f24__NPAKY > div.arrange-unit__09f24__rqHTg.arrange-unit-fill__09f24__CUubG.border-color--default__09f24__NPAKY > div > div > table > tbody > tr.css-29kerx')

# Create a new dictionary for the operating hours.
OH_dict = {}

# Loop over the rows; enumerate() supplies each row's index.
for index, ele in enumerate(OH_list):
    if index % 2 == 0:  # even-indexed rows are blank spacer rows on the page
        continue
    title = ele.select_one('th > p').get_text()
    value = ele.select_one('td > ul > li > p').get_text()
    OH_dict[title] = value
restaurant_info['Operating Hours'] = OH_dict
print(restaurant_info)

{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905', 'Address': '11 Madison Ave, New York, NY 10010', 'Operating Hours': {'Mon': '5:30 PM - 10:00 PM', 'Tue': '5:30 PM - 10:00 PM', 'Wed': '5:30 PM - 10:00 PM', 'Thu': '5:30 PM - 10:30 PM', 'Fri': '5:30 PM - 10:30 PM', 'Sat': '5:30 PM - 10:30 PM', 'Sun': 'Closed'}}
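
Skipping every even-indexed row relies on the spacer rows always alternating with the data rows. A variant that does not depend on row order is to skip any row that lacks the expected cells. The sketch below runs on a small made-up table; `SAMPLE_TABLE`, `parse_hours`, and the `row` class are hypothetical and much simpler than Yelp's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical table with one empty spacer row between two data rows.
SAMPLE_TABLE = """
<table><tbody>
<tr class="row"><th><p>Mon</p></th><td><ul><li><p>5:30 PM - 10:00 PM</p></li></ul></td></tr>
<tr class="row"></tr>
<tr class="row"><th><p>Sun</p></th><td><ul><li><p>Closed</p></li></ul></td></tr>
</tbody></table>
"""


def parse_hours(rows):
    """Map each weekday label to its hours, ignoring rows without both cells."""
    hours = {}
    for row in rows:
        day = row.select_one('th > p')
        value = row.select_one('td > ul > li > p')
        if day is None or value is None:  # spacer row: nothing to record
            continue
        hours[day.get_text(strip=True)] = value.get_text(strip=True)
    return hours


table_soup = BeautifulSoup(SAMPLE_TABLE, 'html.parser')
hours = parse_hours(table_soup.select('tr.row'))
print(hours)
```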


7. Getting the restaurant's website 

In [39]:
y = soup.find_all('a', class_='css-1um3nx')
webpage_link = y[1].text  # the second matching link holds the website text
# Drop the last character of the link text and append 'om' so that it ends in '.com'.
webpage_link = webpage_link[:-1] + 'om'
print(webpage_link)
restaurant_info['Website'] = webpage_link

http://www.elevenmadisonpark.com
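
Patching the visible link text works only while the text is cut off at the same place. If the anchor's `href` carries the full target as a query parameter, the address can be read from there instead. The sketch below assumes a redirect-style `href` of the form `/biz_redirect?url=<encoded address>&...`; that format is an assumption not confirmed by this notebook, and `target_from_redirect` and `sample_href` are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def target_from_redirect(href):
    """Return the decoded 'url' query parameter of a redirect link, or None."""
    query = parse_qs(urlparse(href).query)  # parse_qs also percent-decodes values
    targets = query.get('url')
    return targets[0] if targets else None


# Hypothetical href value; check the real attribute with y[1].get('href').
sample_href = '/biz_redirect?url=http%3A%2F%2Fwww.elevenmadisonpark.com&website_link_type=website'
print(target_from_redirect(sample_href))
```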


8. Getting the most popular dishes and their pictures

In [42]:
pic = soup.find_all('img',class_='dishImageV2__09f24__VT6Je')
pictures = {}
for img in pic:
    pictures[img.get('alt')] = img.get('src')
restaurant_info['Most Popular Dishes Pictures'] = pictures
print(restaurant_info)

{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905', 'Address': '11 Madison Ave, New York, NY 10010', 'Operating Hours': {'Mon': '5:30 PM - 10:00 PM', 'Tue': '5:30 PM - 10:00 PM', 'Wed': '5:30 PM - 10:00 PM', 'Thu': '5:30 PM - 10:30 PM', 'Fri': '5:30 PM - 10:30 PM', 'Sat': '5:30 PM - 10:30 PM', 'Sun': 'Closed'}, 'Website': 'http://www.elevenmadisonpark.com/', 'Most Popular Dishes Pictures': {'Foie Gras': 'https://s3-media0.fl.yelpcdn.com/bphoto/uFegKwI_fHYPs7e3quXwHQ/258s.jpg', 'Take Home Granola': 'https://s3-media0.fl.yelpcdn.com/bphoto/b3T3Pu4_SVxqbm4qUy2BmA/258s.jpg', 'Black Truffle': 'https://s3-media0.fl.yelpcdn.com/bphoto/X6aHF0ozGRmJ1KjK4BiErQ/258s.jpg', 'Picnic Basket': 'https://s3-media0.fl.yelpcdn.com/bphoto/8N-qcarXdUBDsolCP6ayxQ/258s.jpg', 'Carrot Tartare': 'https://s3-media0.fl.yelpcdn.com/bphoto/gsPhfeFETNHfeF2g4BeAIw/258s.jpg', 'Smoked Sturgeon': 'https://s3-media0.fl.yelpcdn.com/bphoto/qANd9DsqmLoF6Wie30MZqg/258s.jpg', 'Eggs Benedict': 'https://

9. Getting the nearby bars

In [10]:
# Search results page for bars near the restaurant:
bars_near_by = urlopen('https://www.yelp.com/search?cflt=bars&find_near=eleven-madison-park-new-york')
bars_soup = BeautifulSoup(bars_near_by, 'html.parser')
bars_list = bars_soup.find_all('a', class_="css-1m051bw")
bar_name = []
for name in bars_list:
    bar_name.append(name.get_text())
near_bar_name = bar_name[3:]  # skip the first three matching links
print(near_bar_name)
restaurant_info['Nearby Bar'] = near_bar_name

['Chapel Bar', 'Thyme Bar', 'Lobby Bar', 'Undercote', 'The Sentry - Flatiron', 'Swingers NoMad', 'Apotheke Nomad', 'Patent Pending', 'Harding’s', 'Ampersand']


9. Getting the business summary 

In [46]:
rest_about = soup.find_all('section',class_='margin-t4__09f24__G0VVf padding-t4__09f24__Y6aGL border--top__09f24__exYYb border-color--default__09f24__NPAKY')
#find_all('div',class_=' border-color--default__09f24__NPAKY')

about_the_business = rest_about[3].find('span').text
print(about_the_business)
restaurant_info['About restaurant'] = about_the_business[:-1]  # drop the trailing '…' character


Established in 1998. Eleven Madison Park embodies an urbane sophistication serving Chef Daniel Humm's modern, sophisticated French cuisine that emphasizes purity, simplicity and seasonal flavors and ingredients.  Daniel's delicate and precise cooking style is experienced through a constantly evolving menu. The restaurant's dramatically high ceilings and magnificent art deco dining room offer guests lush views of historic Madison Square Park and the Flatiron building. In addition to the main dining room, guests may also enjoy wine, beer, and cocktails, as well as an extensive bar menu in the restaurant's bar and Flatiron Lounge.
In November 2008, Eleven Madison Park was designated Grand Chef Relais & Châteaux, joining the ranks of one of the world's most exclusive associations of hotels and gourmet restaurants. In 2009, Eleven Madison Park received a Four Star Review from The New York Times. The restaurant was also awarded one Michelin star.…
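
Slicing with `[:-1]` always removes one character, so it would cut a real letter if the summary were ever shown in full. A small guard that removes the ellipsis only when it is present could look like this (`strip_trailing_ellipsis` is a hypothetical helper, not part of the notebook):

```python
def strip_trailing_ellipsis(text):
    """Remove a trailing '…' or '...' if present; otherwise return the text unchanged."""
    text = text.rstrip()
    for marker in ('…', '...'):
        if text.endswith(marker):
            return text[:-len(marker)].rstrip()
    return text


print(strip_trailing_ellipsis('The restaurant was also awarded one Michelin star.…'))
print(strip_trailing_ellipsis('A summary that was not cut off.'))
```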


10. Getting the number of reviews

In [48]:
rev = soup.find_all('a',class_='css-1m051bw')

reviews = rev[0].text  # text of the first matching link; its first token is the review count
reviews = reviews.split()
reviews = int(reviews[0])
restaurant_info['Number of Reviews'] = reviews
print(restaurant_info)

{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905', 'Address': '11 Madison Ave, New York, NY 10010', 'Operating Hours': {'Mon': '5:30 PM - 10:00 PM', 'Tue': '5:30 PM - 10:00 PM', 'Wed': '5:30 PM - 10:00 PM', 'Thu': '5:30 PM - 10:30 PM', 'Fri': '5:30 PM - 10:30 PM', 'Sat': '5:30 PM - 10:30 PM', 'Sun': 'Closed'}, 'Website': 'http://www.elevenmadisonpark.com/', 'Most Popular Dishes Pictures': {'Foie Gras': 'https://s3-media0.fl.yelpcdn.com/bphoto/uFegKwI_fHYPs7e3quXwHQ/258s.jpg', 'Take Home Granola': 'https://s3-media0.fl.yelpcdn.com/bphoto/b3T3Pu4_SVxqbm4qUy2BmA/258s.jpg', 'Black Truffle': 'https://s3-media0.fl.yelpcdn.com/bphoto/X6aHF0ozGRmJ1KjK4BiErQ/258s.jpg', 'Picnic Basket': 'https://s3-media0.fl.yelpcdn.com/bphoto/8N-qcarXdUBDsolCP6ayxQ/258s.jpg', 'Carrot Tartare': 'https://s3-media0.fl.yelpcdn.com/bphoto/gsPhfeFETNHfeF2g4BeAIw/258s.jpg', 'Smoked Sturgeon': 'https://s3-media0.fl.yelpcdn.com/bphoto/qANd9DsqmLoF6Wie30MZqg/258s.jpg', 'Eggs Benedict': 'https://
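
`int(reviews[0])` raises a `ValueError` if the count is ever displayed with a thousands separator. A slightly more tolerant parser, sketched under the assumption that the link text contains the count as its first run of digits (`parse_review_count` and the sample strings are hypothetical):

```python
import re


def parse_review_count(text):
    """Return the first number in the text as an int, ignoring commas; None if absent."""
    match = re.search(r'\d[\d,]*', text)
    if match is None:
        return None
    return int(match.group().replace(',', ''))


# Hypothetical link texts; the real value comes from rev[0].text.
for sample in ('987 reviews', '1,234 reviews', 'No reviews yet'):
    print(sample, '->', parse_review_count(sample))
```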

11. Getting the top 10 reviews

In [51]:
comm = soup.find_all('p',class_='comment__09f24__gu0rG css-qgunke')
comments = []
for review in comm:
    comments.append(review.text)

print(comments)
restaurant_info['Top 10 reviews'] = comments

["I was most excited for this experience during my visit to NYC and I was not disappointed. \xa0Completely plant-based menu in very creative ways.We started out with a few cocktails from their seasonal menu. \xa0I had the pear, and pumpkin seed cocktail. \xa0Very unique and I appreciate them using seasonal ingredients.The hot pot broth was so flavor possibly one of my favorites. \xa0Other dishes were Rice porridge, Tonburi with chia seed and potato, and Bok Choy with white truffle. \xa0The mushroom 'steak' was charred and was very well seasoned. \xa0We were also served pumpkin seed bread. \xa0It was so flaky and warm. \xa0We were also able to get a second serving.For dessert, we had apple and apple cider donuts and chocolate pretzels. \xa0Nothing too impressive but I was also very full at that point.The dinner took about 3 hours and included a kitchen tour! \xa0Service was outstanding and very well trained on the food. \xa0All in all, the dinner cost a pretty penny but was a once in a 
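
The printed reviews contain `\xa0` (non-breaking space) characters, and the list holds however many paragraphs matched the class rather than exactly ten. A possible clean-up step, assuming the reviews are already collected as plain strings (`top_reviews` and the sample list are hypothetical):

```python
def top_reviews(raw_reviews, limit=10):
    """Keep at most `limit` reviews and normalise their whitespace."""
    cleaned = []
    for text in raw_reviews[:limit]:
        # str.split() with no arguments treats '\xa0' as whitespace,
        # so joining the pieces replaces it with a single normal space.
        cleaned.append(' '.join(text.split()))
    return cleaned


# Hypothetical review texts standing in for the `comments` list.
sample_reviews = ['Great service. \xa0Loved the bread.'] + ['Filler review'] * 11
result = top_reviews(sample_reviews)
print(len(result), result[0])
```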

12. Printing the information

In [52]:
print(restaurant_info)

{'Restaurant name': 'Eleven Madison Park', 'Phone number': '(212) 889-0905', 'Address': '11 Madison Ave, New York, NY 10010', 'Operating Hours': {'Mon': '5:30 PM - 10:00 PM', 'Tue': '5:30 PM - 10:00 PM', 'Wed': '5:30 PM - 10:00 PM', 'Thu': '5:30 PM - 10:30 PM', 'Fri': '5:30 PM - 10:30 PM', 'Sat': '5:30 PM - 10:30 PM', 'Sun': 'Closed'}, 'Website': 'http://www.elevenmadisonpark.com/', 'Most Popular Dishes Pictures': {'Foie Gras': 'https://s3-media0.fl.yelpcdn.com/bphoto/uFegKwI_fHYPs7e3quXwHQ/258s.jpg', 'Take Home Granola': 'https://s3-media0.fl.yelpcdn.com/bphoto/b3T3Pu4_SVxqbm4qUy2BmA/258s.jpg', 'Black Truffle': 'https://s3-media0.fl.yelpcdn.com/bphoto/X6aHF0ozGRmJ1KjK4BiErQ/258s.jpg', 'Picnic Basket': 'https://s3-media0.fl.yelpcdn.com/bphoto/8N-qcarXdUBDsolCP6ayxQ/258s.jpg', 'Carrot Tartare': 'https://s3-media0.fl.yelpcdn.com/bphoto/gsPhfeFETNHfeF2g4BeAIw/258s.jpg', 'Smoked Sturgeon': 'https://s3-media0.fl.yelpcdn.com/bphoto/qANd9DsqmLoF6Wie30MZqg/258s.jpg', 'Eggs Benedict': 'https://

13. Converting the dictionary into a JSON file

In [53]:
import json
with open("RestaurantReview.json",'w') as writeJSON:
    json.dump(restaurant_info,writeJSON)
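
By default `json.dump` escapes non-ASCII characters (for example the `’` in 'Harding’s') and writes everything on one line. The sketch below shows an optional variant that keeps such characters readable, indents the output, and reads the file back to confirm the round trip. It writes to a temporary directory and uses a small stand-in dictionary (`sample_info`) rather than the full `restaurant_info`.

```python
import json
import os
import tempfile

# Small stand-in for restaurant_info, including one non-ASCII character.
sample_info = {
    'Restaurant name': 'Eleven Madison Park',
    'Nearby Bar': ['Harding’s'],
}

path = os.path.join(tempfile.mkdtemp(), 'RestaurantReview.json')

# ensure_ascii=False keeps '’' as-is; indent=2 makes the file human-readable.
with open(path, 'w', encoding='utf-8') as write_json:
    json.dump(sample_info, write_json, ensure_ascii=False, indent=2)

# Read the file back to check that nothing was lost.
with open(path, encoding='utf-8') as read_json:
    loaded = json.load(read_json)

print(loaded == sample_info)
```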