### Web Scraping  

requests - Used to download the HTML code of a given URL.

BeautifulSoup - Parses the HTML and extracts data from it.

### Steps
1. Identify the URL.

2. Inspect the HTML code.

3. Find the HTML tag for the element you want to extract.

4. Write code to scrape that data.
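The four steps can be sketched as two small functions: one that downloads a page (steps 1 and 2) and one that returns the text of every element matching a tag and class (steps 3 and 4). This is a minimal sketch rather than the notebook's own code; the function names `fetch_html` and `extract_texts`, the browser-like `User-Agent` header and the 10-second timeout are assumptions for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Steps 1-2: download the page at `url` and return its HTML."""
    # Assumption: a browser-like User-Agent, since some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_texts(html: str, tag: str, css_class: str) -> list:
    """Steps 3-4: return the stripped text of every <tag> whose classes include css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag, class_=css_class)]


# Offline usage with a hypothetical HTML snippet (no network needed).
sample_html = '<div class="price">₹100</div><p class="price">x</p><div class="price">₹200</div>'
print(extract_texts(sample_html, "div", "price"))  # ['₹100', '₹200']
```

Keeping the download and the parsing apart means the parsing step can be tested on saved HTML without hitting the site again.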

In [1]:
# Loading required libraries

import numpy as np
import pandas as pd

import requests
from bs4 import BeautifulSoup

In [2]:
# Identify the URL

URL = 'https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'

In [3]:
# Load the web page into memory using the requests library

page = requests.get(URL)


In [4]:
# Check the Status Code of the Page

page.status_code

200
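A status code of 200 means the request succeeded. Anything outside the 2xx range means the downloaded HTML is probably an error page rather than search results, so it is worth checking before parsing (requests also offers `page.raise_for_status()`, which raises an exception for 4xx and 5xx responses). The standard library can translate a code into its reason phrase; the helper names below are hypothetical.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a code such as 200 together with its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for codes HTTPStatus does not know
    return f"{status.value} {status.phrase}"


def is_success(code: int) -> bool:
    """2xx codes mean the request succeeded."""
    return 200 <= code < 300


for code in (200, 403, 404, 503):
    print(describe_status(code), is_success(code))
```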

In [5]:
# Extracting the HTML Code of the WebPage

htmlCode = page.text
htmlCode

'<!doctype html><html lang="en"><head><link href="https://rukminim1.flixcart.com" rel="preconnect"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.104e9a.css"/><meta http-equiv="Content-type" content="text/html; charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta property="fb:page_id" content="102988293558"/><meta property="fb:admins" content="658873552,624500995,100000233612389"/><meta name="robots" content="noodp"/><link rel="shortcut icon" href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico"/><link type="application/opensearchdescription+xml" rel="search" href="/osdd.xml?v=2"/><meta property="og:type" content="website"/><meta name="og_site_name" property="og:site_name" content="Flipkart.com"/><link rel="apple-touch-icon" sizes="57x57" h

Let's identify the features listed below; based on them, we will scrape the relevant data from the Flipkart website.

URL = '?'

Price = '?'

Rating = '?'

Title = '?'

Feature = '?'

URL = https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off

In [6]:
# Parse the HTML code using the bs4 library
# ('lxml' must be installed; 'html.parser' is a built-in alternative)

soup = BeautifulSoup(htmlCode, 'lxml')
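`BeautifulSoup` turns the raw HTML string into a tree of `Tag` objects that can be navigated by tag name. The second argument selects the parser: `'html.parser'` ships with Python, while `'lxml'` is faster but must be installed separately; if it is omitted, bs4 picks one itself and emits a warning. A minimal sketch on a hypothetical snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<html><head><title>Mobiles</title></head>
<body><div class="card"><span>Phone A</span></div></body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install

print(soup.title.string)         # text of the first <title> tag: Mobiles
print(soup.div["class"])         # class is multi-valued, so bs4 returns a list: ['card']
print(soup.div.span.get_text())  # Phone A
```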

In [7]:
help(soup)

Help on BeautifulSoup in module bs4 object:

class BeautifulSoup(bs4.element.Tag)
 |  BeautifulSoup(markup='', features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, element_classes=None, **kwargs)
 |  
 |  A data structure representing a parsed HTML or XML document.
 |  
 |  Most of the methods you'll call on a BeautifulSoup object are inherited from
 |  PageElement or Tag.
 |  
 |  Internally, this class defines the basic interface called by the
 |  tree builders when converting an HTML/XML document into a data
 |  structure. The interface abstracts away the differences between
 |  parsers. To write a new tree builder, you'll need to understand
 |  these methods as a whole.
 |  
 |  These methods will be called by the BeautifulSoup constructor:
 |    * reset()
 |    * feed(markup)
 |  
 |  The tree builder may call these methods from its feed() implementation:
 |    * handle_starttag(name, attrs) # See note about return value
 |    * handle_endtag(n

In [8]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <link href="https://rukminim1.flixcart.com" rel="preconnect"/>
  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css" rel="stylesheet"/>
  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.104e9a.css" rel="stylesheet"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
  <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
  <meta content="102988293558" property="fb:page_id"/>
  <meta content="658873552,624500995,100000233612389" property="fb:admins"/>
  <meta content="noodp" name="robots"/>
  <link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/>
  <link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/>
  <meta content="website" property="og:type"/>
  <meta content="Flipkart.com" name="og_site_name" property="og:site_name"/>
  <li

After inspecting the page, the elements to extract are:

Price -> div, class = "_30jeq3 _1_WHN1"

Rating -> div, class = "_3LWZlK"

Title -> div, class = "_4rR01T"

Feature list -> ul, class = "_1xgFaf"
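These class names look machine-generated, so they may change whenever the site is updated; treat them as a snapshot. Note also how bs4 handles a class value that contains a space: `attrs={'class': '_30jeq3 _1_WHN1'}` is compared against the whole class attribute, so the classes must appear in exactly that order, whereas a CSS selector only requires both classes to be present. A small sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the same two classes, written in two different orders.
html = """
<div class="_30jeq3 _1_WHN1">₹9,499</div>
<div class="_1_WHN1 _30jeq3">₹8,999</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A string with a space is matched against the full class attribute, so order matters.
exact = soup.find_all("div", attrs={"class": "_30jeq3 _1_WHN1"})

# A CSS selector requires both classes but ignores their order.
either_order = soup.select("div._30jeq3._1_WHN1")

print(len(exact), len(either_order))  # 1 2
```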

## find()

In [9]:
# Price

price = soup.find('div', attrs={'class' : '_30jeq3 _1_WHN1'})

print(price.text)

₹9,499
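`price.text` is a display string, so it has to be cleaned before prices can be compared, sorted or averaged. A minimal sketch, assuming prices are always whole rupees written with a `₹` sign and comma separators as in the output above (the helper name `parse_price` is hypothetical):

```python
import re


def parse_price(text: str) -> int:
    """Turn a display price such as '₹9,499' into the integer 9499."""
    # Keep only ASCII digits; assumes whole-rupee prices with no decimal part.
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        raise ValueError(f"no digits in price text: {text!r}")
    return int(digits)


print(parse_price("₹9,499"))   # 9499
print(parse_price("₹41,990"))  # 41990
```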


In [10]:
# Title

title = soup.find('div', attrs={'class' : '_4rR01T'})

print(title.text)

SAMSUNG Galaxy F13 (Nightsky Green, 64 GB)
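The title bundles several attributes into one string. If the model name, color or storage is needed on its own, a regular expression can split it. This sketch assumes every title follows the `NAME (COLOR, STORAGE)` shape of the example above; the pattern and the helper `split_title` are hypothetical, and titles that don't match fall back to a name-only record.

```python
import re

# Assumption: titles follow the "NAME (COLOR, STORAGE)" shape seen above.
TITLE_PATTERN = re.compile(r"^(?P<name>.+?) \((?P<color>[^,]+), (?P<storage>[^)]+)\)$")


def split_title(title: str) -> dict:
    """Split a title into name, color and storage; non-matching titles keep only a name."""
    cleaned = title.strip()
    match = TITLE_PATTERN.match(cleaned)
    if match is None:
        return {"name": cleaned, "color": None, "storage": None}
    return match.groupdict()


print(split_title("SAMSUNG Galaxy F13 (Nightsky Green, 64 GB)"))
# {'name': 'SAMSUNG Galaxy F13', 'color': 'Nightsky Green', 'storage': '64 GB'}
```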


In [11]:
# Rating

rating = soup.find('div', attrs={'class' : '_3LWZlK'})

print(rating.text)

4.4
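`find()` returns `None` when nothing matches, so `rating.text` raises `AttributeError` for a product that has no rating yet. A small guard avoids that; the helper name `text_or_none` and the sample markup are hypothetical. The returned string can then be converted with `float()` when it is not `None`.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, tag: str, css_class: str):
    """Return the stripped text of the first matching element, or None if it is missing."""
    element = parent.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical product with a title but no rating element.
card = BeautifulSoup('<div><div class="_4rR01T">Phone A</div></div>', "html.parser")

print(text_or_none(card, "div", "_4rR01T"))  # Phone A
print(text_or_none(card, "div", "_3LWZlK"))  # None
```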


In [12]:
# Feature List

feature_list = soup.find('ul', attrs = {'class' : '_1xgFaf'})

print(feature_list.text)

4 GB RAM | 64 GB ROM | Expandable Upto 1 TB16.76 cm (6.6 inch) Full HD+ Display50MP + 5MP + 2MP | 8MP Front Camera6000 mAh Lithium Ion BatteryExynos 850 Processor1 Year Warranty Provided By the Manufacturer from Date of Purchase
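`.text` concatenates every string inside the `<ul>` with nothing in between, which is why the output above runs together (for example `1 TB16.76 cm`). Assuming each feature sits in its own `<li>`, which the run-together output suggests but does not prove, the items can be kept apart either as a list or with an explicit separator. The markup below is a hypothetical stand-in:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the feature list above (one <li> per feature).
html = """
<ul class="_1xgFaf">
  <li>4 GB RAM | 64 GB ROM | Expandable Upto 1 TB</li>
  <li>16.76 cm (6.6 inch) Full HD+ Display</li>
  <li>6000 mAh Lithium Ion Battery</li>
</ul>
"""
feature_list = BeautifulSoup(html, "html.parser").find("ul", class_="_1xgFaf")

# Option 1: one string per <li>.
features = [li.get_text(strip=True) for li in feature_list.find_all("li")]

# Option 2: one string, with a separator between the pieces (whitespace-only strings are skipped).
joined = feature_list.get_text(separator=" ; ", strip=True)

print(features)
print(joined)
```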


## find_all()

In [13]:
# Find All Prices

soup.find_all('div', attrs={'class' : '_30jeq3 _1_WHN1'})

[<div class="_30jeq3 _1_WHN1">₹9,499</div>,
 <div class="_30jeq3 _1_WHN1">₹9,499</div>,
 <div class="_30jeq3 _1_WHN1">₹41,990</div>,
 <div class="_30jeq3 _1_WHN1">₹9,499</div>,
 <div class="_30jeq3 _1_WHN1">₹42,990</div>,
 <div class="_30jeq3 _1_WHN1">₹57,990</div>,
 <div class="_30jeq3 _1_WHN1">₹35,990</div>,
 <div class="_30jeq3 _1_WHN1">₹36,990</div>,
 <div class="_30jeq3 _1_WHN1">₹6,666</div>,
 <div class="_30jeq3 _1_WHN1">₹8,999</div>,
 <div class="_30jeq3 _1_WHN1">₹41,990</div>,
 <div class="_30jeq3 _1_WHN1">₹7,777</div>,
 <div class="_30jeq3 _1_WHN1">₹8,249</div>,
 <div class="_30jeq3 _1_WHN1">₹8,249</div>,
 <div class="_30jeq3 _1_WHN1">₹8,249</div>,
 <div class="_30jeq3 _1_WHN1">₹8,999</div>,
 <div class="_30jeq3 _1_WHN1">₹8,999</div>,
 <div class="_30jeq3 _1_WHN1">₹6,666</div>,
 <div class="_30jeq3 _1_WHN1">₹8,999</div>,
 <div class="_30jeq3 _1_WHN1">₹8,999</div>,
 <div class="_30jeq3 _1_WHN1">₹35,990</div>,
 <div class="_30jeq3 _1_WHN1">₹6,110</div>,
 <div class="_30jeq3 _1_W

In [14]:
# Find All Ratings

soup.find_all('div', attrs={'class' : '_3LWZlK'})

[<div class="_3LWZlK">4.4</div>,
 <div class="_3LWZlK">4.4</div>,
 <div class="_3LWZlK">4.6<img class="_1wB99o" src="

In [15]:
price = soup.find('div', attrs = {'class' : '_30jeq3 _1_WHN1'})

print(price)

print(type(price))

print(price.text)

<div class="_30jeq3 _1_WHN1">₹9,499</div>
<class 'bs4.element.Tag'>
₹9,499
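The difference between the two methods matters most when nothing matches: `find()` returns `None`, while `find_all()` returns an empty `ResultSet` (a `list` subclass), so looping over its result is always safe. A small sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

soup = BeautifulSoup('<div class="a">1</div><div class="a">2</div>', "html.parser")

first = soup.find("div", class_="a")            # a single Tag (the first match)
every = soup.find_all("div", class_="a")        # a ResultSet, i.e. a list of Tags
missing_one = soup.find("div", class_="b")      # None when nothing matches
missing_all = soup.find_all("div", class_="b")  # an empty ResultSet, never None

print(type(first).__name__, type(every).__name__, len(every))  # Tag ResultSet 2
print(missing_one, len(missing_all))                           # None 0
```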


In [16]:
prices = soup.find_all('div', attrs = {'class' : '_30jeq3 _1_WHN1'})

print(prices)

print(type(prices))

print(type(prices[1]))

for tag in prices:
    print(tag.text)

[<div class="_30jeq3 _1_WHN1">₹9,499</div>, <div class="_30jeq3 _1_WHN1">₹9,499</div>, <div class="_30jeq3 _1_WHN1">₹41,990</div>, <div class="_30jeq3 _1_WHN1">₹9,499</div>, <div class="_30jeq3 _1_WHN1">₹42,990</div>, <div class="_30jeq3 _1_WHN1">₹57,990</div>, <div class="_30jeq3 _1_WHN1">₹35,990</div>, <div class="_30jeq3 _1_WHN1">₹36,990</div>, <div class="_30jeq3 _1_WHN1">₹6,666</div>, <div class="_30jeq3 _1_WHN1">₹8,999</div>, <div class="_30jeq3 _1_WHN1">₹41,990</div>, <div class="_30jeq3 _1_WHN1">₹7,777</div>, <div class="_30jeq3 _1_WHN1">₹8,249</div>, <div class="_30jeq3 _1_WHN1">₹8,249</div>, <div class="_30jeq3 _1_WHN1">₹8,249</div>, <div class="_30jeq3 _1_WHN1">₹8,999</div>, <div class="_30jeq3 _1_WHN1">₹8,999</div>, <div class="_30jeq3 _1_WHN1">₹6,666</div>, <div class="_30jeq3 _1_WHN1">₹8,999</div>, <div class="_30jeq3 _1_WHN1">₹8,999</div>, <div class="_30jeq3 _1_WHN1">₹35,990</div>, <div class="_30jeq3 _1_WHN1">₹6,110</div>, <div class="_30jeq3 _1_WHN1">₹6,110</div>, <di

In [17]:
ratings = soup.find_all('div', attrs={'class' : '_3LWZlK'})

# print(ratings)

for tag in ratings:
    print(tag.text)

4.4
4.4
4.6
4.4
4.6
4.7
4.6
4.6
4.3
4.3
4.6
4.3
4.5
4.5
4.5
4.6
4.6
4.3
4.3
4.3
4.6
4.4
4.4
4.4
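Collecting titles, prices and ratings in separate lists and pairing them by position only works if every product has every field; one product without a rating would shift all later ratings onto the wrong phones. A safer pattern is to loop over each product container and read its fields together, then load the rows into pandas. In this sketch the container class `product-card` and the sample markup are hypothetical (the real container class would have to be found by inspecting the page); the inner class names are the ones identified above.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: two product cards, the second one without a rating.
html = """
<div class="product-card">
  <div class="_4rR01T">Phone A</div><div class="_30jeq3 _1_WHN1">₹9,499</div>
  <div class="_3LWZlK">4.4</div>
</div>
<div class="product-card">
  <div class="_4rR01T">Phone B</div><div class="_30jeq3 _1_WHN1">₹8,999</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for card in soup.find_all("div", class_="product-card"):
    # Search only inside this card so every field belongs to the same product.
    title = card.find("div", class_="_4rR01T")
    price = card.select_one("div._30jeq3._1_WHN1")
    rating = card.find("div", class_="_3LWZlK")
    rows.append({
        "Title": title.get_text(strip=True) if title is not None else None,
        "Price": price.get_text(strip=True) if price is not None else None,
        # A missing rating stays empty instead of borrowing another product's value.
        "Rating": float(rating.get_text(strip=True)) if rating is not None else None,
    })

df = pd.DataFrame(rows)
print(df)
```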


In [18]:
rating = soup.find('div', attrs={'class' : '_3LWZlK'})

print(rating.text)

4.4


In [19]:
# Print the search URL for result pages 1 to 30
for i in range(1, 31):
    print('https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page={}'.format(i))

https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=1
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=2
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=3
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=4
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=5
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=6
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=7
https://www.flipkart.com/search?q=mobiles&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=8
https://www.flipkart.com/search?