## How Do You Scrape Data From A Website?

When you run web-scraping code, it sends a request to the URL you specify. The server responds with the page's HTML or XML. The code then parses that document, finds the data you want, and extracts it.
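The parse, find, and extract stages can be sketched on a small inline HTML string, so nothing is fetched over the network. The markup and the class names (`product`, `price`) below are made up for illustration; a real page will use different ones.

```python
from bs4 import BeautifulSoup

# Hypothetical response body; a real run would receive this from the server.
RESPONSE_HTML = """
<html><body>
  <div class="product"><h2>Laptop A</h2><span class="price">100</span></div>
  <div class="product"><h2>Laptop B</h2><span class="price">200</span></div>
</body></html>
"""

# Parse: build a tree from the raw HTML.
soup = BeautifulSoup(RESPONSE_HTML, "html.parser")

# Find: locate every element that wraps one product.
products = soup.find_all("div", class_="product")

# Extract: read the text of the tags inside each product.
extracted = [(p.h2.text, p.find("span", class_="price").text) for p in products]
print(extracted)
```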

To extract data using web scraping with Python, follow these basic steps:

_**1) Find the URL that you want to scrape.**_

_**2) Inspect the page.**_

_**3) Find the data you want to extract.**_

_**4) Write the code.**_

_**5) Run the code and extract the data.**_

_**6) Store the data in the required format.**_

Now let us see how to extract data from the Flipkart website using Python.

## Libraries used for Web Scraping:

Python has many applications, and different libraries serve different purposes. In the demonstration that follows, we will use the following libraries:

#### 1) Selenium:
Selenium is a web testing library. It is used to automate browser activities.

#### 2) BeautifulSoup:
Beautiful Soup is a Python package for parsing HTML and XML documents. It creates parse trees that make it easy to extract data.

#### 3) Pandas:
Pandas is a library for data manipulation and analysis. It is used to organize the extracted data and store it in the desired format.

With that, let us start writing our own code to scrape the **Laptops for gaming under 80000/-** search results on the Flipkart website ✌️

"https://www.flipkart.com/search?q=best%20laptops%20under%2080000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"

### Install Beautiful Soup (bs4):

In [1]:
%pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


### Let's Import all the libraries:

In [2]:
from urllib.request import urlopen  # opens the URL and returns the response

from bs4 import BeautifulSoup  # parses the downloaded HTML

### Input the URL to be scraped:

In [3]:
url = "https://www.flipkart.com/search?q=best%20laptops%20under%2080000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"

Use urlopen to open the URL and read its contents, then parse them with BeautifulSoup.

In [4]:
page = urlopen(url)
page_html = page.read()
page.close()
page_soup = BeautifulSoup(page_html, "html.parser")
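If the plain urlopen call comes back with an HTTP error, one common cause is that the server rejects the default urllib User-Agent. The sketch below shows one way to send browser-like headers and to close the connection reliably. The helper names `build_request` and `fetch_html`, the header value, and the timeout are assumptions for illustration, not part of the original notebook.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumption: a generic browser-like User-Agent string chosen for illustration.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url, headers=None):
    """Return a Request carrying browser-like headers (hypothetical helper)."""
    return Request(url, headers=headers or DEFAULT_HEADERS)


def fetch_html(url, timeout=10):
    """Return the page bytes, or None on an HTTP or URL error."""
    try:
        # The with-block closes the connection even if read() raises.
        with urlopen(build_request(url), timeout=timeout) as page:
            return page.read()
    except HTTPError as err:  # must come first: HTTPError subclasses URLError
        print(f"Server returned HTTP {err.code} for {url}")
    except URLError as err:
        print(f"Could not reach {url}: {err.reason}")
    return None
```

With these helpers, `page_html = fetch_html(url)` could stand in for the three urlopen lines, followed by a check that the result is not None before parsing.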

Find all the product containers on the page and count them.

In [5]:
containers = page_soup.findAll("div", { "class": "_2kHMtA"})
print(len(containers))

24


Use prettify to turn one container's parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string.

In [6]:
print(containers[3].prettify())

<div class="_2kHMtA">
 <a class="_1fQZEK" href="/hp-pavilion-x360-2023-intel-core-i5-13th-gen-16-gb-1-tb-ssd-windows-11-home-14-ek1010tu-thin-light-laptop/p/itm9f3a53215418f?pid=COMGNJMCCHSS98VZ&amp;lid=LSTCOMGNJMCCHSS98VZVUTRZQ&amp;marketplace=FLIPKART&amp;q=best+laptops+under+80000&amp;store=search.flipkart.com&amp;srno=s_1_4&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=a46bda4a-ca36-4ad6-b471-b51fd1253b0b.COMGNJMCCHSS98VZ.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=pwfb0i1xy80000001686398536938&amp;qH=cdff68aec3053e4d" rel="noopener noreferrer" target="_blank">
  <div class="MIXNux">
   <div class="_2QcLo-">
    <div>
     <div class="CXW8mj" style="height:200px;width:200px">
      <img alt="HP Pavilion x360 (2023) Intel Core i5 13th Gen - (16 GB/1 TB SSD/Windows 11 Home) 14-ek1010TU Thin and..." class="_396cs4" loading="eager" src="https://rukminim1.flixcart.com/image/312/312/xif0q/computer/d/9/g/-original-imagp7pgh3rbnwbh.jpeg?q=70"/>
     </div>
    </div>
   </d

### Get the Name of the product:

In [7]:
container = containers[4]
prod_name = container.div.img["alt"]
print(prod_name)

HP Pavilion Plus Creator OLED Eyesafe (2023) Intel H-Series Core i5 12th Gen - (16 GB/512 GB SSD/Windo...


### Get the Original Price of the product:

In [8]:
original_price = container.findAll("div", {"class": "_3I9_wc _27UcVY"})
print(original_price[0].text)

₹99,533


### Get the Discount Percentage on the product:

In [9]:
discount_percent = container.findAll("div", {"class": "_3Ay6Sb"})
print(discount_percent[0].text)

19% off


### Get the Discounted Price of the product:

In [10]:
discounted_price = container.findAll("div", {"class": "_30jeq3 _1_WHN1"})
print(discounted_price[0].text)

₹79,990


### Get the Product Ratings:

In [11]:
prod_ratings = container.findAll("span", {"class": "_1lRcqv"})
print(prod_ratings[0].text)

4.6


### Get the Number of Product Reviews:

In [12]:
reviews = container.findAll("span", {"class" : "_2_R_DZ"})
print(reviews[0].text)

18 Ratings & 3 Reviews
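The values extracted so far are still strings such as ₹79,990 or 19% off. For sorting or arithmetic they need to be numbers. The following is a minimal sketch of that conversion, assuming whole-rupee prices and the text shapes shown in the outputs above. The function names are hypothetical.

```python
import re


def parse_rupees(text):
    """Convert a price string like '₹79,990' to an int, or None if no digits."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_percent(text):
    """Convert a discount string like '19% off' to an int, or None."""
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None


def parse_counts(text):
    """Convert '18 Ratings & 3 Reviews' to (ratings, reviews), or None."""
    match = re.search(r"([\d,]+)\s+Ratings?\s*&\s*([\d,]+)\s+Reviews?", text)
    if not match:
        return None
    ratings, reviews = (int(group.replace(",", "")) for group in match.groups())
    return ratings, reviews


print(parse_rupees("₹79,990"), parse_percent("19% off"),
      parse_counts("18 Ratings & 3 Reviews"))
```

Each function returns None for text it cannot parse, so a missing field does not crash the conversion.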


### Make a CSV file and write the details to it:
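One possible way to do this is sketched below. It uses the standard-library csv module rather than Pandas. The column names, the function name `write_products_csv`, and the file name are assumptions. The sample row copies values from the summary output in this notebook.

```python
import csv

# Hypothetical column names; pick whatever suits the analysis.
FIELDNAMES = ["name", "original_price", "discount", "discounted_price",
              "rating", "reviews"]


def write_products_csv(rows, path):
    """Write a list of product dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings;
    # utf-8 keeps the rupee sign intact.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# One row taken from the scraped output, used here as sample data.
sample_rows = [
    {
        "name": "APPLE 2020 Macbook Air M1 - (8 GB/256 GB SSD/Mac OS Big Sur) MGN63HN/A",
        "original_price": "₹99,900",
        "discount": "21% off",
        "discounted_price": "₹77,990",
        "rating": "4.7",
        "reviews": "10,018 Ratings & 869 Reviews",
    }
]
```

Calling `write_products_csv(sample_rows, "laptops.csv")` would create the file in the current directory. In the notebook, the rows would come from the scraping loop instead of a hard-coded list.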

### Getting the Details in Summary Form:

In [13]:
for container in containers[:5]:
    product_name = container.findAll("div", {"class": "_4rR01T"})
    if product_name:
        prod_name = product_name[0].text.strip()
    else:
        prod_name = "N/A"

    original_price = container.findAll("div", {"class": "_3I9_wc _27UcVY"})
    if original_price:
        original = original_price[0].text.strip()
    else:
        original = "N/A"

    discount_percent = container.findAll("div", {"class": "_3Ay6Sb"})
    if discount_percent:
        percent = discount_percent[0].text.strip()
    else:
        percent = "N/A"

    discounted_price = container.findAll("div", {"class": "_30jeq3 _1_WHN1"})
    if discounted_price:
        discount = discounted_price[0].text.strip()
    else:
        discount = "N/A"

    rating_container = container.findAll("span", {"class": "_1lRcqv"})
    if rating_container:
        prod_rating = rating_container[0].text.strip()
    else:
        prod_rating = "N/A"

    reviews_container = container.findAll("span", {"class": "_2_R_DZ"})
    if reviews_container:
        reviews_rating = reviews_container[0].text
    else:
        reviews_rating = "N/A"

    print("\033[1mProduct Name:\n" + '\033[0m' + str(prod_name), "\n")
    print("\033[1mOriginal Price:\n" + '\033[0m' + str(original), "\n")
    print("\033[1mDiscount Percentage:\n" + '\033[0m' + str(percent), "\n")
    print("\033[1mDiscounted Price:\n" + '\033[0m' + str(discount), "\n")
    print("\033[1mRatings:\n" + '\033[0m' + prod_rating, "\n")
    print("\033[1mNumber of Reviews:\n" + '\033[0m' + reviews_rating, "\n")
    print("------------------------------------------------------------------------------------------------------------------")

Product Name:
APPLE 2020 Macbook Air M1 - (8 GB/256 GB SSD/Mac OS Big Sur) MGN63HN/A

Original Price:
₹99,900

Discount Percentage:
21% off

Discounted Price:
₹77,990

Ratings:
4.7

Number of Reviews:
10,018 Ratings & 869 Reviews

------------------------------------------------------------------------------------------------------------------
Product Name:
APPLE 2020 Macbook Air M1 - (8 GB/256 GB SSD/Mac OS Big Sur) MGN93HN/A

Original Price:
₹99,900

Discount Percentage:
21% off

Discounted Price:
₹77,990

Ratings:
4.7

Number of Reviews:
10,018 Ratings & 869 Reviews

------------------------------------------------------------------------------------------------------------------
Product Name:
MSI Core i5 12th Gen - (16 GB/512 GB SSD/Windows 11 Home/4 GB Graphics/NVIDIA GeForce RTX 3050) Katana...

Original Price:
₹95,990

Discount Percentage:
20% off

### Insight:

These are the first 5 laptops scraped from the URL above.
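The summary loop repeats the same "look up a tag, fall back to N/A" pattern six times. The sketch below shows how that pattern could be factored into one helper and driven by a field table. The helper name `first_text`, the field table, and the sample markup with its class names are hypothetical. With the real page, the Flipkart class names from the loop would go in the table instead.

```python
from bs4 import BeautifulSoup


def first_text(container, tag, class_name, default="N/A"):
    """Return the stripped text of the first matching tag, or a default."""
    match = container.find(tag, class_=class_name)
    return match.text.strip() if match else default


# Hypothetical markup standing in for one product container.
SAMPLE = ('<div class="card"><div class="title"> Laptop A </div>'
          '<span class="score">4.6</span></div>')
card = BeautifulSoup(SAMPLE, "html.parser").find("div", class_="card")

# Field name -> (tag, class); "price" is deliberately absent from SAMPLE.
FIELDS = {
    "name": ("div", "title"),
    "rating": ("span", "score"),
    "price": ("div", "price"),
}

row = {key: first_text(card, tag, cls) for key, (tag, cls) in FIELDS.items()}
print(row)
```

Collecting one such dict per container gives a list of rows that is easy to print, convert, or save.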