# **WEB SCRAPING**

<p><h3>Web scraping is a technique for extracting data from websites. This notebook uses popular scraping libraries such as <code>requests</code>, <code>BeautifulSoup</code>, and <code>selenium</code>.</h3></p>

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

import datetime

import smtplib  # Used to send an email

<p><h3>The scraping targets the Address Guru website, whose address is stored in <code>baseurl</code>.</h3></p>

In [2]:
baseurl = 'https://www.addressguru.in/'

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/117.0"
}
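
The requests in this notebook set no timeout and do not check the HTTP status, so a slow or failing page can stall or silently corrupt a run. A minimal sketch of a helper that handles both follows; the name `fetch_html` and the 10-second default are assumptions for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, headers, session=None, timeout=10):
    """Return the response body as bytes, raising on HTTP error codes.

    `session` is optional: pass a requests.Session to reuse connections,
    otherwise the module-level requests.get is used.
    """
    http = session if session is not None else requests
    response = http.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed straight to `BeautifulSoup(..., 'lxml')` in place of `r.content`.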

## *Scraping the Cafe & Restaurant Category*

In [3]:
r = requests.get('https://www.addressguru.in/Cafe-&-Restaurants/Dehradun/MTM=', headers = headers)

soupy = BeautifulSoup(r.content, 'lxml')
soup = BeautifulSoup(soupy.prettify(), 'lxml')

product_list = soup.find_all('div', class_ = 'search-top')

print(len(product_list))

23


In [4]:
product_link = []

# range(1, 3) requests pages 1 and 2; use range(1, 4) to include page 3
for x in range(1,3):
    # Build the page URL with an f-string
    r = requests.get(f"https://www.addressguru.in/Cafe-&-Restaurants/Dehradun/MTM=?page={x}", headers = headers)
    
    soupy = BeautifulSoup(r.content, 'lxml')
    soup = BeautifulSoup(soupy.prettify(), 'lxml')
    
    product_list = soup.find_all('div', class_ = 'search-top')
    
    for item in product_list:
        for link in item.find_all('a', href = True):
            product_link.append(baseurl + link['href'])

print(len(product_link))

200
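
The loop above collects many more links than there are listing cards, so the list may contain repeated URLs, and joining `baseurl` and `href` by plain concatenation can produce malformed addresses if an `href` is already absolute or starts with a slash. A possible way to handle both cases, using only the standard library, is sketched here; `absolute_unique_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def absolute_unique_links(base_url, hrefs):
    """Resolve each href against base_url and drop repeats, keeping order."""
    seen = set()
    links = []
    for href in hrefs:
        # urljoin handles relative, root-relative and absolute hrefs
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Inside the scraping loop, the `href` values of each card could be gathered into a list and passed through this helper before extending `product_link`.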


### **Test Page**

In [5]:
testlink = 'https://www.addressguru.in/my-wife-s-place'

r = requests.get(testlink, headers = headers)

soup1 = BeautifulSoup(r.content, 'lxml')
soup2 = BeautifulSoup(soup1.prettify(), 'lxml')

name = soup2.find('h1', style = "margin-top:10px;font-size:25px;").get_text().strip()

rating = soup2.find('span', style = "font-size:16px!important;").get_text().strip()[:1]

review = soup2.find('span', style = "font-size:16px!important;").get_text().strip()[-13:]

day = datetime.date.today()

CafeAndRestaurant = {
    'name': name,
    'rating': rating,
    'review': review,
    'date': day
}

print(name, rating, review, day)

My Wife's Place 5 ( 2 Reviews ) 2023-09-27
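
The slices `[:1]` and `[-13:]` only work when the rating is a single digit and the review count has exactly one digit: a rating such as `4.5` would be cut to `4`, and `( 12 Reviews )` would lose its opening bracket. A sketch of a more tolerant parser follows, assuming the span text always has the shape seen in the output above (`5 ( 2 Reviews )`); `parse_rating_text` is a hypothetical name.

```python
import re

# Matches e.g. "5 ( 2 Reviews )" or "4.5 ( 12 Reviews )", ignoring extra whitespace
RATING_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*\(\s*(\d+)\s*Reviews?\s*\)", re.IGNORECASE
)


def parse_rating_text(text):
    """Return (rating, review_count), or (None, None) if the text does not match."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None, None
    return float(match.group(1)), int(match.group(2))
```

Returning numbers rather than strings also makes the values easier to sort and aggregate once they are in a `DataFrame`.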


## **Finalise Category Scraping**

<p>Collect the data from every product link, convert it into a <code>DataFrame</code>, and save it as a spreadsheet file.</p>

In [None]:
# One record per listing; the list is filled inside the loop
CafeAndRestaurant = []

for link in product_link:
    r = requests.get(link, headers = headers)

    soup1 = BeautifulSoup(r.content, 'lxml')
    soup2 = BeautifulSoup(soup1.prettify(), 'lxml')

    name = soup2.find('h1', style = "margin-top:10px;font-size:25px;").get_text().strip()

    rating = soup2.find('span', style = "font-size:16px!important;").get_text().strip()[:1]

    review = soup2.find('span', style = "font-size:16px!important;").get_text().strip()[-13:]
    
    day = datetime.date.today()

    cafe = {
        'name': name,
        'rating': rating,
        'review': review,
        'date': day
    }

    # Keep every record instead of overwriting the previous one
    CafeAndRestaurant.append(cafe)
    
    print(cafe)
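
The conversion to a `DataFrame` and the export step could look roughly like the sketch below. It assumes the records are held in a list of dictionaries with the four keys used above, and it writes CSV as one possible spreadsheet format; `records_to_csv` and the second sample row are hypothetical and only there for illustration.

```python
import datetime
import io

import pandas as pd

COLUMNS = ["name", "rating", "review", "date"]


def records_to_csv(records, target):
    """Build a DataFrame from a list of dicts and write it as CSV.

    `target` may be a file path or any writable text buffer.
    """
    df = pd.DataFrame(records, columns=COLUMNS)
    df.to_csv(target, index=False)
    return df


# First row mirrors the test page output; second row is made up
sample_records = [
    {"name": "My Wife's Place", "rating": "5", "review": "( 2 Reviews )",
     "date": datetime.date(2023, 9, 27)},
    {"name": "Hypothetical Cafe", "rating": "4", "review": "( 7 Reviews )",
     "date": datetime.date(2023, 9, 27)},
]

buffer = io.StringIO()
frame = records_to_csv(sample_records, buffer)
```

Passing a file name such as `'CafeAndRestaurant.csv'` instead of the buffer would write the file to disk.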