# <center> Flipkart Website Scraping

## Project targets
- Scrape the names and prices of laptops from the Flipkart website (first page only).
- Create a function to extract multiple pages of the same website.
- Scrape the **Flipkart** website to extract the **Name**, **Price**, and **Rating** of laptops.
- Page link: https://www.flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&as-pos=1&as-type=HISTORY

### Import libraries

In [1]:
```python
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests as req
```

### Request the page URL

In [2]:
```python
# Flipkart search results for "laptops" (first page)
url = "https://www.flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&as-pos=1&as-type=HISTORY"
```
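The search URL is a base path plus query parameters. A minimal sketch, assuming only the standard library's `urllib.parse`, that rebuilds the same URL from a dictionary and optionally adds the `page` parameter used later for multi-page scraping. `build_search_url` is a hypothetical helper name; the parameter values are copied from the link above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "https://www.flipkart.com/search"

def build_search_url(query, page=None):
    """Hypothetical helper: build a Flipkart search URL for `query`."""
    # Same parameters, in the same order, as the hand-written URL
    params = {
        "q": query,
        "otracker": "search",
        "otracker1": "search",
        "marketplace": "FLIPKART",
        "as-show": "on",
        "as": "off",
        "as-pos": "1",
        "as-type": "HISTORY",
    }
    if page is not None:
        params["page"] = str(page)  # result page number, starting at 1
    return f"{BASE}?{urlencode(params)}"

# Reading the query string back shows which page a URL points to
page_param = parse_qs(urlparse(build_search_url("laptops", page=2)).query)["page"]
print(page_param)  # ['2']
```

Building the URL from a dictionary makes it easier to change the search term or page number without editing one long string.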

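In [3]:
```python
# Send an HTTP GET request; timeout (in seconds) keeps the call from hanging forever
page = req.get(url, timeout=10)
page
```
```
<Response [200]>
```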

## Extracting HTML content

In [4]:
```python
# Raw HTML of the page as bytes (unparsed)
html = page.content
# Parse the HTML into a navigable tree using Python's built-in parser
soup = bs(html, "html.parser")
```

The data is nested inside HTML tags, so we inspect the page to find which tag holds the data we want to scrape:
- Choose the element to inspect (for example, a laptop name).
- Right-click the element and click **Inspect**.
- The browser's inspector panel opens with that element's HTML highlighted.
- Note the **tag** and **class** in which the **data** we want to extract is nested.
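The tag and class read from the inspector map directly onto the arguments of `find_all`. A minimal offline sketch on a hand-written HTML snippet; the snippet imitates the structure seen in the inspector but is an assumption, not Flipkart's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on what the inspector shows for two listings
html = """
<div class="card">
  <div class="_4rR01T">Example Laptop A</div>
  <div class="_30jeq3 _1_WHN1">₹31,990</div>
</div>
<div class="card">
  <div class="_4rR01T">Example Laptop B</div>
  <div class="_30jeq3 _1_WHN1">₹35,990</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag "div" and class "_4rR01T" from the inspector become the find_all arguments
names = [tag.get_text(strip=True) for tag in soup.find_all("div", {"class": "_4rR01T"})]
print(names)  # ['Example Laptop A', 'Example Laptop B']
```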


### Names

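In [5]:
```python
# Every laptop name sits in a <div> with class "_4rR01T" (found with the inspector);
# generated class names like this may change when Flipkart updates its site
names_scrap = soup.find_all("div", {"class": "_4rR01T"})
names_scrap[0].text, type(names_scrap)
```
```
('Lenovo IdeaPad Gaming 3 Core i5 11th Gen - (8 GB/512 GB SSD/Windows 10 Home/4 GB Graphics/NVIDIA GeFor...',
 bs4.element.ResultSet)
```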

In [6]:
```python
# Keep only the text of each matched tag
Names = [tag.text for tag in names_scrap]
```

### Price

In [7]:
```python
# Each price sits in a <div> whose class attribute is "_30jeq3 _1_WHN1"
Price_scrap = soup.find_all("div", {"class": "_30jeq3 _1_WHN1"})
Price_scrap[0].text, len(Price_scrap)
```
```
('₹57,990', 24)
```

In [8]:
```python
# Keep only the text of each matched tag
Price = [tag.text for tag in Price_scrap]
```
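The scraped prices are strings such as `'₹57,990'`, so they cannot be sorted or averaged as numbers. A minimal sketch of converting them, assuming prices are whole rupees with no decimal part; `parse_price` is a hypothetical helper, not part of the notebook.

```python
def parse_price(text):
    """Convert a price string such as '₹57,990' into an int number of rupees.

    Assumes whole-rupee prices: every non-digit character (currency sign,
    thousands separators) is dropped, so a decimal point would be dropped too.
    """
    digits = "".join(ch for ch in text if ch.isdecimal())
    if not digits:
        raise ValueError(f"no digits in price string: {text!r}")
    return int(digits)

print(parse_price("₹57,990"))  # 57990
```

Applied to the notebook's list, `[parse_price(p) for p in Price]` would give a list of integers. Dropping every non-digit also handles Indian digit grouping such as `₹1,23,456`.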

### Dataframe

In [9]:
```python
# Build a two-column table; both lists must have the same length
df = pd.DataFrame({"Names": Names, "Price": Price})
print(df.shape)
df.head()
```
```
(24, 2)
```


|   | Names | Price |
|---|---|---|
| 0 | Lenovo IdeaPad Gaming 3 Core i5 11th Gen - (8 ... | ₹57,990 |
| 1 | Lenovo IdeaPad Gaming 3 Core i5 10th Gen - (8 ... | ₹54,990 |
| 2 | ASUS VivoBook 15 Core i3 10th Gen - (8 GB/1 TB... | ₹29,990 |
| 3 | ASUS VivoBook 15 (2022) Core i5 11th Gen - (8 ... | ₹48,990 |
| 4 | Lenovo IdeaPad 3 Core i3 10th Gen - (8 GB/256 ... | ₹35,490 |
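Collecting names and prices with two separate `find_all` calls relies on both lists having the same length and order; if one listing lacks a price, building the `DataFrame` fails or rows misalign. One alternative, sketched here, is to loop over each product card and read both fields from inside it. The card class `card` is hypothetical; the real container class would have to be found with the inspector, like the others.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card has no price element
html = """
<div class="card"><div class="_4rR01T">Laptop A</div><div class="_30jeq3 _1_WHN1">₹31,990</div></div>
<div class="card"><div class="_4rR01T">Laptop B</div></div>
"""

def extract_rows(soup, card_class="card"):
    """Return one dict per product card; a missing field becomes None."""
    rows = []
    for card in soup.find_all("div", class_=card_class):
        # Search only inside this card, so name and price always belong together
        name = card.find("div", class_="_4rR01T")
        price = card.select_one("div._30jeq3._1_WHN1")
        rows.append({
            "Names": name.get_text(strip=True) if name else None,
            "Price": price.get_text(strip=True) if price else None,
        })
    return rows

rows = extract_rows(BeautifulSoup(html, "html.parser"))
print(rows)
```

`pd.DataFrame(rows)` then gives one row per card, with `None` where a field was absent instead of a shifted column.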


## <center> Scraping Multiple Pages

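In [15]:
```python
def tot_url(page_limit):
    """Request result pages 1..page_limit and return their parsed soups."""
    soup_list = []
    for i in range(1, page_limit + 1):
        # Same search URL as before, with the page number appended
        url = f"https://www.flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&as-pos=1&as-type=HISTORY&page={i}"
        page = req.get(url, timeout=10)
        html = page.content
        # Built-in parser, as on the first page; "lxml" also works if lxml is installed
        soup = bs(html, "html.parser")
        soup_list.append(soup)
    print(len(soup_list))  # number of pages fetched
    return soup_list
```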
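`tot_url` always requests exactly `page_limit` pages back to back. A hedged sketch of a more defensive variant: it pauses between requests and stops once a page contains no listings. Whether Flipkart serves such an empty page past the last result page is an assumption; `fetch_pages` and the `fetch_html` callable are hypothetical names, and passing the fetch function in keeps the sketch testable without network access.

```python
import time
from bs4 import BeautifulSoup

def fetch_pages(fetch_html, page_limit, item_class="_4rR01T", delay=1.0):
    """Parse up to `page_limit` pages, stopping at the first page with no items.

    `fetch_html(page_number) -> str` is a hypothetical callable that returns
    the HTML of one result page.
    """
    soups = []
    for page in range(1, page_limit + 1):
        soup = BeautifulSoup(fetch_html(page), "html.parser")
        if not soup.find("div", class_=item_class):
            break  # assumed: a page without listings means we are past the last page
        soups.append(soup)
        if page < page_limit:
            time.sleep(delay)  # pause between requests to avoid hammering the server
    return soups
```

With the notebook's names, a call might look like `fetch_pages(lambda p: req.get(f"{url}&page={p}", timeout=10).text, page_limit=10)`, where `url` is the first-page URL from cell [2].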

In [18]:
```python
def scrap(soup_list, tag_name, class_name):
    """Collect the text of every `tag_name` element with `class_name` across all pages."""
    results = []
    for soup in soup_list:
        for tag in soup.find_all(tag_name, {"class": class_name}):
            results.append(tag.text)
    return results
```
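Matching `"_30jeq3 _1_WHN1"` in `scrap` works only while the page writes the two classes in exactly that order, because Beautiful Soup compares a class string containing a space with the whole `class` attribute. A CSS selector passed to `select()` matches elements that carry both classes in any order. The reversed markup below is a made-up example to show the difference.

```python
from bs4 import BeautifulSoup

# Same two classes as the price div, but written in the opposite order
html = '<div class="_1_WHN1 _30jeq3">₹31,990</div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared with the whole attribute value, so nothing matches
exact = soup.find_all("div", class_="_30jeq3 _1_WHN1")

# A CSS selector requires both classes, in any order
css = soup.select("div._30jeq3._1_WHN1")

print(len(exact), len(css))  # 0 1
```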

In [19]:
```python
# Fetch and parse the first 10 result pages, then pull names and prices from all of them
soup_list = tot_url(10)
Name_scrap = scrap(soup_list, "div", "_4rR01T")
Price_scrap = scrap(soup_list, "div", "_30jeq3 _1_WHN1")
Name_scrap[0], Price_scrap[0]
```
```
10

('acer Extensa 15 Core i3 11th Gen - (4 GB/256 GB SSD/Windows 11 Home) EX215-54 Thin and Light Laptop',
 '₹31,990')
```

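In [20]:
```python
# Pair each name with the price at the same position. zip() stops at the
# shorter list instead of raising IndexError, but the pairing is only correct
# if every listing on every page has both a name and a price.
Names, Price = [], []
for name, price in zip(Name_scrap, Price_scrap):
    Names.append(name)
    Price.append(price)
```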

In [21]:
```python
# Build a two-column table; both lists must have the same length
df = pd.DataFrame({"Names": Names, "Price": Price})
print(df.shape)
df.head()
```
```
(240, 2)
```


|   | Names | Price |
|---|---|---|
| 0 | acer Extensa 15 Core i3 11th Gen - (4 GB/256 G... | ₹31,990 |
| 1 | acer Aspire 3 Ryzen 3 Dual Core 3250U - (8 GB/... | ₹35,990 |
| 2 | ASUS VivoBook 15 Core i3 10th Gen - (8 GB/1 TB... | ₹29,990 |
| 3 | ASUS VivoBook 15 (2022) Core i5 11th Gen - (8 ... | ₹48,990 |
| 4 | Lenovo IdeaPad 3 Core i3 10th Gen - (8 GB/256 ... | ₹35,490 |
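When ten pages are combined, the same listing may appear more than once (for example, if results shift between requests; this is an assumption worth checking on the real data). A minimal pandas sketch of dropping exact repeats before analysis, on made-up rows rather than scraped data:

```python
import pandas as pd

# Hypothetical rows: the same listing scraped from two different pages
sample_df = pd.DataFrame({
    "Names": ["Laptop A", "Laptop B", "Laptop A"],
    "Price": ["₹31,990", "₹35,990", "₹31,990"],
})

# Keep the first occurrence of each (name, price) pair and renumber the index
deduped = sample_df.drop_duplicates(subset=["Names", "Price"]).reset_index(drop=True)
print(deduped.shape)  # (2, 2)
```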


## Sources
- Tutorial on scraping multiple pages: https://data36.com/scrape-multiple-web-pages-beautiful-soup-tutorial/
- A guide to web scraping in Python using Beautiful Soup: https://opensource.com/article/21/9/web-scraping-python-beautiful-soup
- Web scraping tutorial: https://getpocket.com/read/3581062331

### Summary of important steps
1. Request the page URL: `page = req.get(url)`
2. Get the HTML of the page (raw bytes, unparsed): `html = page.content`
3. Parse the HTML into a soup (a navigable tree): `soup = bs(html, "html.parser")`
4. Inspect the page to find which tag and class hold your data.
5. Extract your data: `names_scrap = soup.find_all("div", {"class": "_4rR01T"})`, then read one item with `names_scrap[0].text`.
6. Loop over all matched elements to collect their text: `Names = [tag.text for tag in names_scrap]`