# __Scraping Restaurants in Delhi From EazyDiner__

***

![](https://i.imgur.com/ZAl12g3.png)

***

[EazyDiner](https://www.eazydiner.com/static/about-us) is a guide to eating out that offers insider tips, discounts, and exclusive expert reviews by top critics. The platform aims to provide an enjoyable, authentic, and friction-free table booking experience, and lists over 10,000 restaurants in more than 150 cities in India and Dubai.

***

![Imgur](https://i.imgur.com/CC2FfCF.png)

It also hosts a list of the top restaurants in each city. We will use this list to scrape the top restaurants in the [Delhi-NCR](https://www.eazydiner.com/restaurants?location=delhi-ncr&pax=2&total=281&page=1) region using [_web scraping_](https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/), an automated method of obtaining large amounts of data from websites using programming languages and runtimes such as `Python`, `C++`, `Node.js`, `Ruby`, or `PHP`.

For this project, we will use the Python libraries [requests](https://pypi.org/project/requests/) and [beautifulsoup4](https://pypi.org/project/beautifulsoup4/) to scrape data from this page.

***

#### Here is an outline of the steps we will follow

##### 1. Setting up the environment
##### 2.  Downloading the page using `requests` & parsing it with `BeautifulSoup` 
##### 3. Extracting restaurant information & appending to a dictionary
##### 4. Compiling the data from multiple pages into a single file using lists and dictionaries
##### 5. Exporting the data to a .CSV file

***

![Imgur](https://i.imgur.com/KwwD5KA.png)

__By the end of the project we will have created a .CSV file in this format__

***



## Setting up the environment

We will be using the `requests`, `BeautifulSoup`, and `pandas` libraries in this project.

In [1]:
# Requests will help us with fetching the HTML page from a website.

import requests

# Next, we will use BeautifulSoup to parse the HTML-formatted text for data extraction.

from bs4 import BeautifulSoup

# After the data has been parsed and stored, we will use pandas to extract it into a '.csv' file

import pandas as pd

# jovian is used to save the notebook and its output files to Jovian.
import jovian

__The environment is all set up now and we can call any function from these libraries.__
***

## Downloading the page using `requests` & parsing it with `BeautifulSoup` 

Here we define a function `get_page()` that takes a URL as input and, with the help of `requests` and `BeautifulSoup`, returns a parsed BeautifulSoup document.

In [2]:
def get_page(url):
    
    # requests.get returns a response object containing the data from the web page.
    response = requests.get(url)
    
    # status_code is used to check if the request was successful and if it's not then we will raise an exception.
    if response.status_code != 200:
        
        # Exception will be raised if the status code is not 200
        raise Exception ("Unable to fetch page " + url)
    
    # At the end of function it will return a beautifulsoup doc
    return BeautifulSoup(response.text,'html.parser')

__The function can now be called as `get_page(url)`, where `url` is a string such as `'https://www.eazydiner.com'`, and it will return a BeautifulSoup document.__
***
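
As a possible variation on `get_page()`, the sketch below adds a request timeout and lets `requests` raise the error itself through `response.raise_for_status()`. The function name `get_page_safe`, the 10-second timeout, and the `User-Agent` value are assumptions made for illustration and are not part of the original notebook. Note that `raise_for_status()` only raises for 4xx and 5xx responses, which is slightly more lenient than checking for exactly 200.

```python
import requests
from bs4 import BeautifulSoup


def get_page_safe(url, timeout=10):
    """Hypothetical variant of get_page(): fetch a URL and return a BeautifulSoup doc."""
    # Assumed header value; some sites reject requests without a User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (tutorial scraper)"}

    # timeout stops the call from waiting indefinitely on a slow server.
    response = requests.get(url, headers=headers, timeout=timeout)

    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()

    return BeautifulSoup(response.text, "html.parser")
```

It is called the same way as `get_page()`, for example `doc = get_page_safe(page_url)`.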

## Extracting restaurant information & Writing it in a dictionary

##### Here we will extract the following data from the restaurant listings and append it to a dictionary:
Restaurant Name, Restaurant Location, Cost for two, Cuisines, Restaurant Rating, Restaurant Image, Link to restaurant page


***
__Inspect Element Page__
![Imgur](https://i.imgur.com/mNCLTg9.png)

__To parse data from an HTML page using Beautiful Soup, we need the [CSS selectors](https://developer.mozilla.org/en-US/docs/Learn/CSS/Building_blocks/Selectors) of the elements. To find these selectors, we will use the [inspect element](https://devmountain.com/blog/how-to-use-inspect-element-jump-into-what-makes-a-web-page-tick/) tool available in the web browser.__

***
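
To illustrate how a selector found through inspect element maps onto Beautiful Soup, here is a minimal sketch that uses `select_one()` with CSS selectors. The HTML fragment is hypothetical and only loosely modelled on a listing. The notebook itself uses `find()` and `find_all()` with class names, so this shows an alternative way to express the same lookups.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, loosely modelled on a restaurant listing.
html = """
<div class="restaurant">
  <h3 class="res_name">Sample Diner</h3>
  <a class="btn" href="/delhi-ncr/sample-diner">Book</a>
</div>
"""

doc = BeautifulSoup(html, "html.parser")

# 'div.restaurant h3.res_name' reads as: an h3 with class res_name
# located somewhere inside a div with class restaurant.
name_tag = doc.select_one("div.restaurant h3.res_name")
link_tag = doc.select_one("div.restaurant a.btn")

print(name_tag.text.strip())  # text content of the tag
print(link_tag["href"])       # value of the href attribute
```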




### Restaurant Listings

![Imgur](https://i.imgur.com/5zO4cT6.png)

The restaurant listings are inside `div` tags with the class `padding-10 radius-4 bg-white restaurant margin-b-10`.

In [3]:
def get_restaurant_listings(doc):
    '''Return all restaurant listing tags found in a BeautifulSoup doc.'''
    # The selector variable holds the class string of the listing div.
    selector = 'padding-10 radius-4 bg-white restaurant margin-b-10'
    
    # Returning the restaurant listing tags
    return  doc.find_all('div',class_=selector)
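
One detail worth knowing about the call above: when `class_` is given a string containing several class names, Beautiful Soup compares it with the whole `class` attribute, so the order of the names matters. The sketch below demonstrates this on hypothetical markup and shows two more tolerant alternatives. Whether EazyDiner's markup ever reorders its classes is not something the notebook establishes, so treat this as a precaution rather than a required change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same four classes, written in a different order.
html = """
<div class="padding-10 radius-4 bg-white restaurant">A</div>
<div class="restaurant bg-white radius-4 padding-10">B</div>
"""
doc = BeautifulSoup(html, "html.parser")

# A multi-word class_ string must equal the class attribute, order included.
exact = doc.find_all("div", class_="padding-10 radius-4 bg-white restaurant")

# A single class name matches any tag that carries that class.
by_one_class = doc.find_all("div", class_="restaurant")

# A CSS selector matches on the listed classes regardless of their order.
by_css = doc.select("div.restaurant.bg-white")

print(len(exact), len(by_one_class), len(by_css))
```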

### Extract Restaurant Details

Similarly, by using inspect element we can find the tags for all the other fields:


    
`restaurant_name = listing.find('h3',class_='grey res_name font-20 bold inline-block')`

`restaurant_location = listing.find('h3',class_='margin-t-5 res_loc')`

`cost_for_2 = listing.find('span',class_='padding-l-10 grey cost_for_two')`

`cusisine = listing.find('div',class_='grey padding-l-10 res_cuisine')`

`rating = listing.find('span',class_='critic')`

`restaurant_img = listing.find('img',class_='radius-4 res_name lazy')`

`restaurant_href = listing.find('a',class_='btn btn-primary height-40 block bold padding-10 font-14 apxor_click')`

In [4]:
def get_restaurant_data(listing,info,base_url):
    
    # Parsing the information from the listing
    restaurant_name = listing.find('h3',class_='grey res_name font-20 bold inline-block')
    restaurant_location = listing.find('h3',class_='margin-t-5 res_loc')
    cost_for_2 = listing.find('span',class_='padding-l-10 grey cost_for_two')
    cusisine = listing.find('div',class_='grey padding-l-10 res_cuisine')
    rating = listing.find('span',class_='critic')
    restaurant_img = listing.find('img',class_='radius-4 res_name lazy')
    restaurant_href = listing.find('a',class_='btn btn-primary height-40 block bold padding-10 font-14 apxor_click')

    # Appending the extracted info to dictionary
    info['restaurant_names'].append(restaurant_name.text.strip() if restaurant_name else 'N/A')
    info['restaurant_locations'].append(restaurant_location.text.strip() if restaurant_location else 'N/A')
    info['costs_for_2'].append(cost_for_2.text[:-7] if cost_for_2 else 'N/A')
    info['cusisines'].append(cusisine.text.strip() if cusisine else 'N/A')
    info['ratings'].append(rating.text.strip() if rating else 'N/A')
    info['restaurant_imgs'].append(restaurant_img['data-src'] if restaurant_img else 'N/A')
    info['restaurant_pages'].append(base_url+restaurant_href['href'] if restaurant_href else 'N/A')
    
    return info
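
The `... if tag else 'N/A'` pattern used above protects against listings that are missing a field, because `find()` returns `None` when nothing matches. As a small sketch, the same idea can be wrapped in a helper. The name `text_or_default` and the sample HTML are hypothetical and not part of the original notebook.

```python
from bs4 import BeautifulSoup


def text_or_default(tag, default="N/A"):
    """Return the stripped text of a tag, or a default if the tag was not found."""
    return tag.text.strip() if tag is not None else default


# Hypothetical listing with a name but no rating element.
listing = BeautifulSoup(
    '<div><h3 class="res_name"> Sample Diner </h3></div>', "html.parser"
)

name = text_or_default(listing.find("h3", class_="res_name"))
rating = text_or_default(listing.find("span", class_="critic"))

print(name, rating)
```

Without the `None` check, `rating.text` would raise an `AttributeError` for this listing and stop the whole scrape.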

__We have defined two functions, `get_restaurant_listings` and `get_restaurant_data`: the first gets the listing tags from the page, and the second extracts data from a listing and appends it to a dictionary.__
***

## Compiling the data from multiple pages into a single file using lists and dictionaries.

`intialize_dictionary()` creates an empty dictionary each time the data is scraped.

`page_parser()` gets the data from every listing on a page, then moves on to the next page until a stop condition is met.

`website_scraper()` sets the starting values and runs the scraper across the pages of the website.

In [5]:
def intialize_dictionary():
    
    # Initializing a new dictionary for storing the values
    info = {
        'restaurant_names':[],
        'restaurant_locations':[],
        'costs_for_2':[],
        'cusisines':[],
        'ratings':[],
        'restaurant_imgs': [],
        'restaurant_pages':[]
    }
              
    return info
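
The dictionary holds one list per column, and the item at position `i` of every list belongs to the same restaurant. The sketch below, using hand-written hypothetical values and only two of the keys, shows how those parallel lists line up into rows and why every list must have the same length.

```python
# Hypothetical, hand-written values in the same shape as the info dictionary.
info = {
    "restaurant_names": ["Sample Diner", "Example Cafe"],
    "ratings": ["4.2", "N/A"],
}

# Every list must have the same length, otherwise rows would be misaligned.
lengths = {key: len(values) for key, values in info.items()}
assert len(set(lengths.values())) == 1, lengths

# zip(*info.values()) pairs up the i-th item of every list into one row.
rows = [dict(zip(info.keys(), row)) for row in zip(*info.values())]

print(rows[1])
```

This is the reason `get_restaurant_data` appends `'N/A'` for a missing field instead of skipping it: skipping would shift every later value in that column up by one row.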

In [6]:
def page_parser(x,max_page,website_page_base_url,info,base_url):
    
    while True: # Keep the loop running till a break condition is met
        
        # Adding if condition to stop the function if the requested number of pages have been scraped
        if x > int(max_page):         
            print("Process completed!, No more data to scrape after page {}".format(x-1)) # printing a confirmation message
            break
        
        page_url = website_page_base_url + str(x) # Creating page url
        print("Scraping Page {}.".format(x))  # printing a confirmation message
        doc = get_page(page_url) # Calling function to get bs4 doc
        doc_listings = get_restaurant_listings(doc) # getting all the listings on the page
        
        if len(doc_listings) < 1: # Stop the loop if no data to scrape
            print("No more listings left to scrape, pages scraped successfully {}".format(x-1)) # printing a confirmation message
            break
        
        # Starting a for loop to get data from page
        for listing in doc_listings:           
            info = get_restaurant_data(listing,info,base_url) # Extracting data from all listing using a for loop

        print("{} listings scraped".format(len(info['restaurant_names']))) # printing a confirmation 
        print("Page {} completed \n".format(x)) # printing a confirmation 
        
        # Increasing the page number
        x = x + 1
        
    return info
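
The two stop conditions in `page_parser()` (the requested page limit and an empty page) can be tried out without any network access. The sketch below is a simplified stand-in, not the notebook's function: `scrape_pages` and `fetch_listings` are hypothetical names, and a small dictionary plays the role of the website.

```python
def scrape_pages(fetch_listings, start_page, max_page):
    """Collect listings page by page until max_page is passed or a page is empty.

    fetch_listings is a hypothetical callable: page number -> list of listings.
    """
    collected = []
    page = start_page
    while page <= max_page:
        listings = fetch_listings(page)
        if not listings:  # an empty page means there is nothing left to scrape
            break
        collected.extend(listings)
        page += 1
    return collected, page - 1  # page - 1 is the last page that returned data


# Stand-in for the real site: only pages 1 and 2 contain listings.
fake_site = {1: ["a", "b"], 2: ["c"]}
listings, last_page = scrape_pages(lambda n: fake_site.get(n, []), 1, 5)
print(listings, last_page)
```

Here the loop stops at page 3 because it is empty, even though up to 5 pages were requested.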

In [7]:
def website_scraper():
    
    # Assigning values to variables that will be used in the function
    x = 1 # x is the starting page number
    base_url = "https://www.eazydiner.com"  
    website_page_base_url = base_url + '/restaurants?location=delhi-ncr&pax=2&total=281&page='
    
    # Initializing a new dictionary for storing the values
    info = intialize_dictionary()
    print("Initializing a new database \n")
        
    # Asking user for input
    max_page = input("Please enter the number of pages you want to scrape: ")
    
    # Calling previously defined function to get the data from the page
    info = page_parser(x,max_page,website_page_base_url,info,base_url)
    
    return info
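
Two small pieces of `website_scraper()` can be made more explicit: how the page URL is assembled, and how the user's input becomes a number. The following sketch uses only the standard library. `build_page_url` and `parse_page_count` are hypothetical helper names, and the query values simply mirror the URL used in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.eazydiner.com"


def build_page_url(page, location="delhi-ncr"):
    """Build a listing URL for one page; the values mirror the notebook's URL."""
    query = urlencode({"location": location, "pax": 2, "total": 281, "page": page})
    return BASE_URL + "/restaurants?" + query


def parse_page_count(raw):
    """Convert user input to a positive int, raising ValueError otherwise."""
    count = int(raw.strip())  # int() raises ValueError for non-numeric text
    if count < 1:
        raise ValueError("The number of pages must be at least 1.")
    return count


print(build_page_url(parse_page_count(" 3 ")))
```

Validating the input before the first request means a typo such as `three` fails immediately instead of partway through `page_parser()`.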

## Exporting the data to a .CSV file

After parsing the data from the web page and storing it in a dictionary, we will use `pandas` to export it to a .csv file.

In [8]:
def get_restaurant_csv():
    
    # Call website_scraper() and store the returned dictionary in output.
    output = website_scraper()
    
    # Converting the dictionary output to a pandas DataFrame
    df = pd.DataFrame(output)
    
    # Writing the DataFrame to restaurants.csv without the index column
    df.to_csv('restaurants.csv', index=False)
    print("Task Completed")
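
To see what the export step produces without running the scraper, the sketch below converts a small hypothetical dictionary to a DataFrame and writes the CSV into an in-memory buffer instead of a file. The sample values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row dictionary in the same shape as the scraper's output.
output = {
    "restaurant_names": ["Sample Diner", "Example Cafe"],
    "ratings": ["4.2", "N/A"],
}

df = pd.DataFrame(output)

# Writing to an in-memory buffer shows the CSV text without touching the disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the 0, 1, 2... row labels
csv_text = buffer.getvalue()
print(csv_text)
```

Each dictionary key becomes a column header and each list position becomes one row.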

In [9]:
Run_This_Function = get_restaurant_csv()

Initializing a new database 

Please enter the number of pages you want to scrape: 3
Scraping Page 1.
18 listings scraped
Page 1 completed 

Scraping Page 2.
36 listings scraped
Page 2 completed 

Scraping Page 3.
52 listings scraped
Page 3 completed 

Process completed!, No more data to scrape after page 3
Task Completed


***
![Final Result](https://i.imgur.com/Fh3D2Co.png)

You can find the output file in your Jupyter working directory.

## Final Result

Loading restaurants.csv into a DataFrame so it can be analyzed further.

In [10]:
# Reading the csv file and storing it in df dataframe.
df = pd.read_csv('restaurants.csv')

In [11]:
# Print the top 5 rows of the data.
df.head()

| | restaurant_names | restaurant_locations | costs_for_2 | cusisines | ratings | restaurant_imgs | restaurant_pages |
|---|---|---|---|---|---|---|---|
| 0 | Lord of the Drinks | Connaught Place (CP), Central Delhi | ₹ 1800 | Chinese,European,Finger Food,Italian,North Ind... | 4.2 | https://d4t7t8y8xqo0t.cloudfront.net/resized/1... | https://www.eazydiner.com/delhi-ncr/lord-of-th... |
| 1 | Fifty9 | Radisson Blu Marina, New Delhi | ₹ 2500 | Multicuisine | 4.2 | https://d4t7t8y8xqo0t.cloudfront.net/resized/1... | https://www.eazydiner.com/delhi-ncr/fifty9-rad... |
| 2 | Kinbuck 2 Cafe & Bar | Connaught Place (CP), Central Delhi | ₹ 1200 | Chinese,Italian,Lebanese,Mexican,North Indian | 4.2 | https://d4t7t8y8xqo0t.cloudfront.net/resized/1... | https://www.eazydiner.com/delhi-ncr/kinbuck-2-... |
| 3 | Mist | The Park, New Delhi | ₹ 2100 | Casual Eclectic | 4.6 | https://d4t7t8y8xqo0t.cloudfront.net/resized/1... | https://www.eazydiner.com/delhi-ncr/mist-the-p... |
| 4 | Pind Balluchi | Netaji Subhash Place, North Delhi | ₹ 1000 | North Indian | 4.1 | https://d4t7t8y8xqo0t.cloudfront.net/resized/1... | https://www.eazydiner.com/delhi-ncr/pind-ballu... |
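
As one possible first step towards further analysis, the `costs_for_2` values are stored as text such as `₹ 1800`, so they need to be converted before they can be sorted or averaged. The sketch below does this with the standard library. `cost_to_int` is a hypothetical helper name, and the sample list reuses two values from the output above plus an `'N/A'` placeholder.

```python
import re


def cost_to_int(cost_text):
    """Extract the digits from a string such as '₹ 1800'; return None if there are none."""
    digits = re.sub(r"[^\d]", "", cost_text)
    return int(digits) if digits else None


costs = ["₹ 1800", "₹ 2500", "N/A"]
print([cost_to_int(c) for c in costs])
```

The same function could be applied to the DataFrame column with `df['costs_for_2'].astype(str).map(cost_to_int)`, assuming the column name used in this notebook.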


## Summary

- The scraping was done using the Python libraries Requests and BeautifulSoup for extracting the data, and Pandas for exporting it.

- Scraped Restaurant Name, Restaurant Location, Cost for two, Cuisines, Restaurant Rating, Restaurant Image, and Link to restaurant page from any number of available pages.

- Exported all the scraped data to a .csv file containing a total of 284 rows, with 7 columns for each restaurant.


## Future Updates

- Add more options, such as selecting a city.
- Capture the reviews of these restaurants and perform analysis.
- Code optimization.
- Further documentation.

## References

- [https://www.eazydiner.com/](https://www.eazydiner.com/)

In [None]:
jovian.commit(files=['restaurants.csv'])
