There are several ways to extract data from the web. Some websites provide an API that returns data in a structured format, but many do not. When no API is available, we can still extract the information directly from the pages themselves. This technique is known as Web Scraping. The idea behind Web Scraping is to transform unstructured data on the web (HTML) into structured data.

Scraping data from the web involves the following steps:
<li>Retrieving HTML data from a domain name</li> 
<li>Parsing that data for target information</li>
<li>Storing the target information</li>
<li>Optionally, moving to another page to repeat the process</li>
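The parsing and storing steps can be tried without any network access. The following is only a minimal sketch: it uses Python's built-in `html.parser` in place of Beautiful Soup so that it runs with no extra packages, and the HTML fragment, the class name `price` and the `ClassTextExtractor` helper are all made up for illustration. It also assumes there are no void tags such as `<br>` inside the target element.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; the class name "price" is invented for this sketch.
SAMPLE_HTML = '<div class="product"><span class="price">₹89,000</span></div>'


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element that carries the target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # greater than 0 while inside a matching element
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1              # nested tag inside a match
        elif self.target_class in classes:
            self.depth = 1               # entering a matching element
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data


# Step 2: parse the HTML for the target information
extractor = ClassTextExtractor("price")
extractor.feed(SAMPLE_HTML)
extractor.close()

# Step 3: store the target information (an in-memory CSV stands in for a file)
buffer = io.StringIO()
writer = csv.writer(buffer)
for text in extractor.matches:
    writer.writerow([text.strip()])

print(extractor.matches)
```

The CSV writer quotes the value because it contains a comma, so the price stays in a single column.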

In [2]:
import requests               # makes the GET request that downloads the page's HTML
from bs4 import BeautifulSoup # parses HTML documents
import sched
import time
import datetime
import csv
import pandas as pd

In [3]:
scheduler = sched.scheduler(time.time, time.sleep)
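To see how `sched` behaves before attaching it to a real request, here is a small sketch of a job that reschedules itself a fixed number of times. The `FakeClock` class, the `tick` function and the names `demo_scheduler` and `calls` are hypothetical and exist only so the example finishes instantly. With `time.time` and `time.sleep`, as in the cell above, the same calls would wait in real time.

```python
import sched


class FakeClock:
    """Hypothetical stand-in for time.time / time.sleep: sleeping just advances a counter."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
demo_scheduler = sched.scheduler(clock.time, clock.sleep)
calls = []  # records the (fake) time of every run


def tick(remaining):
    calls.append(clock.time())
    if remaining > 1:
        # enter(delay, priority, action, argument): run tick again 10 time units from now
        demo_scheduler.enter(10, 1, tick, (remaining - 1,))


# Queue the first run with no delay, then process events until the queue is empty
demo_scheduler.enter(0, 1, tick, (3,))
demo_scheduler.run()

print(calls)
```

Note that `run()` is called once, outside the job. The job itself only adds the next event to the queue.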

Function Name : scrap_chron_job<br>
Functionality : Extracts the target data (here, a price) from the page, appends it to price.csv with a timestamp, and schedules the next extraction<br>
Parameter : URL of the page whose data we want to extract

In [5]:
def scrap_chron_job(url):

    # Make a GET request to the web server and fail early on an HTTP error
    requested = requests.get(url, timeout=10)
    requested.raise_for_status()

    # The response body is the HTML of the page
    data = requested.text

    # Parse the HTML using Beautiful Soup and store it in the variable soup
    soup = BeautifulSoup(data, 'html.parser')

    # Locate the price element; the class names were found with the browser's Inspect Element tool
    container = soup.find(class_='_3iZgFn')
    price = container.find(class_='_2i1QSc') if container else None

    if price is None:
        # The page layout or class names have probably changed
        print('Price element not found')
    else:
        # The target information we extract here is final_price
        final_price = price.text
        print(final_price)

        # Append the price and a timestamp to price.csv
        with open('price.csv', 'a', newline='', encoding='utf-8') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow([final_price, datetime.datetime.now()])

    # Schedule the next extraction 10 seconds from now
    scheduler.enter(10, 1, scrap_chron_job, (url,))



In [6]:
url = 'https://www.flipkart.com/apple-iphone-x-silver-64-gb/p/itmexrgvze5as67e?pid=MOBEXRGVF8NHMGXJ&srno=b_1_1&otracker=browse&lid=LSTMOBEXRGVF8NHMGXJSD2QYW&fm=organic&iid=e700e0d0-b9e1-4d83-be85-c7af8927c2fe.MOBEXRGVF8NHMGXJ.SEARCH'

# Queue the first extraction immediately, then run scheduled events until interrupted
scheduler.enter(0, 1, scrap_chron_job, (url,))
scheduler.run()

₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000
₹89,000


KeyboardInterrupt: 
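The scraped value is a display string such as ₹89,000. If the prices are to be compared or plotted later, it may help to convert them to numbers before storing them. A minimal sketch, assuming the string contains only a currency symbol, digits, comma separators and an optional decimal point. `parse_price` is a hypothetical helper, not part of the code above.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a display price such as '₹89,000' to a Decimal."""
    # Drop everything except digits and the decimal point
    cleaned = re.sub(r"[^0-9.]", "", text)
    if not cleaned:
        raise ValueError(f"no digits found in {text!r}")
    return Decimal(cleaned)


print(parse_price("₹89,000"))
```

`Decimal` is used instead of `float` so that prices with fractional amounts are stored exactly.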

A word of caution:
<li>The layout of a website may change from time to time, so revisit the site periodically and update your code as needed</li>
<li>Check the terms and conditions of the website regarding the use of its data. Usually, the data you scrape should not be used for commercial purposes</li>
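Alongside the terms and conditions, many sites publish a robots.txt file that states which paths automated clients may fetch. It is not a substitute for reading the terms, but it is easy to check from code. The sketch below uses the standard library's `urllib.robotparser` on a made-up set of rules. The rules, the `example.com` URLs and the variable names are assumptions for illustration only. For a real site you would point the parser at the site's own robots.txt with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site
rules = """\
User-agent: *
Disallow: /checkout/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse rules supplied as a list of lines

# Ask whether a generic client ("*") may fetch each URL
product_allowed = parser.can_fetch("*", "https://example.com/product/123")
checkout_allowed = parser.can_fetch("*", "https://example.com/checkout/cart")

print(product_allowed, checkout_allowed)
```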