# Flipkart Web Scraping
### Project Type - Web Scraping Data Analysis
### Contribution - Individual
### Name - Nikhil Dasgupta

__- Project Overview:__
In this web scraping project, I collected product data from Flipkart, an e-commerce platform. Working with a live retail site let me explore the techniques, challenges, and ethical considerations involved in web scraping.

__- Tools and Technologies:__
Python, Requests, BeautifulSoup, tqdm, Pandas, Matplotlib, and Seaborn.

__- Project Workflow:__

1. Planning: I started by defining the objectives of the project and identifying the websites that would provide the most relevant and interesting data.

2. Data Collection: Using Python, I created custom web crawlers to navigate through the websites and extract the required data, including text, images, and metadata.

3. Data Cleaning: After scraping the data, I processed it using BeautifulSoup and Pandas to clean and preprocess the information. This step involved handling missing values, removing duplicates, and converting data into a consistent format.

4. Data Analysis: With the cleaned data, I performed exploratory data analysis (EDA) to gain initial insights into the data and identify any notable patterns or correlations.

5. Data Visualization: Using Matplotlib and Seaborn, I created visualizations to present the findings in a clear and understandable manner.

6. Ethical Considerations: Throughout the project, I made sure the web scraping was conducted ethically and respectfully, adhering to the website's terms of service and not causing any harm or disruption.
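
One concrete way to act on the ethical considerations in step 6 is to consult a site's `robots.txt` before fetching and to pause between requests. The sketch below uses Python's standard `urllib.robotparser`. The rules, the user-agent name `my-scraper`, and the one-second default delay are made-up examples for illustration, not Flipkart's actual policy.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
# For a live site, call rp.set_url(".../robots.txt") followed by rp.read().
EXAMPLE_RULES = """\
User-agent: *
Disallow: /checkout
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(EXAMPLE_RULES)

def allowed(url, user_agent="my-scraper"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)

def polite_pause(seconds=1.0):
    """Sleep between requests so the server is not flooded."""
    time.sleep(seconds)

print(allowed("https://www.example.com/products/page-1"))
print(allowed("https://www.example.com/checkout/cart"))
```

In a scraping loop, `allowed(link)` could gate each request and `polite_pause()` could run after it.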

# Let's Begin

![Thumbnail.PNG](attachment:Thumbnail.PNG)

__Website link:__
https://www.flipkart.com/all/~cs-89a4af89beda2604cb8435b235dee6f1/pr?sid=clo%2Cash%2Cank%2Cedy&marketplace=FLIPKART&restrictLocale=true&fm=personalisedRecommendation%2FC5&iid=R%3Ag%3Bpt%3Ahp%3Buid%3Aaa42a81f-2f9d-11ee-a724-2950022bf357%3B.cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B&ppt=clp&ppn=mobile-phones-store&ssid=n98k6w0zwg0000001690806450162&otracker=hp_reco_End%2Bof%2BSeason%2BBestsellers_3_8.dealCard.OMU_cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B_6&otracker1=hp_reco_WHITELISTED_personalisedRecommendation%2FC5_End%2Bof%2BSeason%2BBestsellers_DESKTOP_HORIZONTAL_dealCard_cc_3_NA_view-all_6&cid=cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B&sort=popularity=

In [1]:
# Importing Libraries
import pandas as pd 
from bs4 import BeautifulSoup 
import requests
from tqdm import tqdm

In [2]:
data = []

for i in tqdm(range(0,26)):
    link='https://www.flipkart.com/all/~cs-89a4af89beda2604cb8435b235dee6f1/pr?sid=clo%2Cash%2Cank%2Cedy&marketplace=FLIPKART&restrictLocale=true&fm=personalisedRecommendation%2FC5&iid=R%3Ag%3Bpt%3Ahp%3Buid%3Aaa42a81f-2f9d-11ee-a724-2950022bf357%3B.cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B&ppt=clp&ppn=mobile-phones-store&ssid=n98k6w0zwg0000001690806450162&otracker=hp_reco_End%2Bof%2BSeason%2BBestsellers_3_8.dealCard.OMU_cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B_6&otracker1=hp_reco_WHITELISTED_personalisedRecommendation%2FC5_End%2Bof%2BSeason%2BBestsellers_DESKTOP_HORIZONTAL_dealCard_cc_3_NA_view-all_6&cid=cid%3AS_F_N_clo_ash_ank_edy__d_50-100__NONE_ALL%3Bnid%3Aclo_ash_ank_edy_%3Bet%3AS%3Beid%3Aclo_ash_ank_edy_%3Bmp%3AF%3Bct%3Ad%3Bat%3ADEFAULT%3B&sort=popularity='+str(i)
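    res=requests.get(link,timeout=30)  # timeout so a stalled request cannot hang the loop
    soup=BeautifulSoup(res.text,'html.parser')

    # Each product card on the listing page sits in a div with class "_2B099V"
    for sp in soup.find_all("div",class_="_2B099V"):
        Product_names  = sp.find("div",class_="_2WkVRV").text   # brand name
        Product_price  = sp.find("div",class_="_30jeq3").text   # displayed price
        Percentage_off = sp.find("span").text                   # first <span> in the card, e.g. "85% off"
        # The last matching anchor in the card holds the relative product URL
        Product_link = 'https://www.flipkart.com' + sp.find_all('a',class_='_3bPFwb')[-1].get('href')

        data.append([Product_names,Product_price,Percentage_off,Product_link])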

100%|██████████| 26/26 [00:33<00:00,  1.27s/it]
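
The loop above appends the counter directly after `sort=popularity=`. Listing pages often expose the page number as its own query parameter instead, so here is a small sketch of building page URLs that way. The parameter name `page` and the shortened `example.com` URL are assumptions for illustration and would need to be checked against the live site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_url(base_url, page, param="page"):
    """Return base_url with the query parameter `param` set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page)  # adds the parameter, or overwrites an existing one
    return urlunsplit(parts._replace(query=urlencode(query)))

# Shortened, hypothetical listing URL (the real one carries many tracking parameters)
BASE = "https://www.example.com/listing?sid=clo&sort=popularity"

urls = [page_url(BASE, n) for n in range(1, 4)]
for u in urls:
    print(u)
```

Building the query with `urllib.parse` keeps the existing parameters intact and avoids hand-editing a very long URL string.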


In [3]:
df_1 = pd.DataFrame(data,columns=['Product Names','Product Price','Percentage Off','Product Link'])

In [4]:
df_1.head()

|   | Product Names | Product Price | Percentage Off | Product Link |
|---|---|---|---|---|
| 0 | METRONAUT | ₹299 | 85% off | https://www.flipkart.com/metronaut-solid-men-r... |
| 1 | METRONAUT By Flipkart | ₹319 | 84% off | https://www.flipkart.com/metronaut-solid-men-p... |
| 2 | AUSK | ₹199 | 86% off | https://www.flipkart.com/ausk-striped-men-roun... |
| 3 | VeBNoR | ₹279 | 79% off | https://www.flipkart.com/vebnor-checkered-men-... |
| 4 | AUSK | ₹199 | 86% off | https://www.flipkart.com/ausk-striped-men-roun... |


In [5]:
data = []
for link in tqdm(df_1['Product Link']):
    res  = requests.get(link)
    soup = BeautifulSoup(res.text,'html.parser')
    
    # Product Type (full product title)
    prod_type = soup.find('div',class_='_2NKhZn')
    # Special Price label
    prod_specialPrice = soup.find('div',class_='_1V_ZGU pZkvcx')
    # Product Rating
    prod_rating = soup.find('div',class_='_3LWZlK _3uSWvT')
    # Total Customer Rated
    prod_CustomerRating = soup.find('span',class_='_2_R_DZ')
    # Product Size: join all available sizes into one comma-separated string
    prod_Size = ','.join(a.text for a in soup.find_all('a',class_='_1fGeJ5 _2UVyXR _31hAvz'))
    # Checking Null Values: convert each tag on its own, so one missing field does not blank the others
    prod_type = prod_type.text if prod_type is not None else None
    prod_specialPrice = prod_specialPrice.text if prod_specialPrice is not None else None
    prod_rating = prod_rating.text if prod_rating is not None else None
    prod_CustomerRating = prod_CustomerRating.text if prod_CustomerRating is not None else None
    
    data.append([prod_type,prod_specialPrice,prod_rating,prod_CustomerRating,prod_Size])

100%|██████████| 1040/1040 [17:36<00:00,  1.02s/it]
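
A loop over more than a thousand product pages runs for many minutes, and a single failed request would stop it partway. A minimal sketch of a retry wrapper follows. The function names `fetch_with_retries` and `flaky_fetch` are hypothetical, and the stand-in fetcher replaces a real network call so the example runs offline.

```python
import time

def fetch_with_retries(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url) up to `attempts` times; return None if every try fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # in real use, catch requests.RequestException instead
            print(f"attempt {attempt} failed for {url}: {exc}")
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer after each failure
    return None

# Stand-in for a network call: fails twice, then succeeds (hypothetical helper)
calls = {"n": 0}

def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>page for {url}</html>"

html = fetch_with_retries(flaky_fetch, "https://www.example.com/item/1", delay=0.0)
print(html)
```

In the notebook loop, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, with a `None` result recorded as a row of missing values rather than raising.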


In [6]:
df_2=pd.DataFrame(data,columns = ['Product Type','Special Product','Product Rating','Customer Rated','Sizes'])
df_2.head()

|   | Product Type | Special Product | Product Rating | Customer Rated | Sizes |
|---|---|---|---|---|---|
| 0 | METRONAUT Solid Men Round Neck Dark Green T-Sh... | Special price | 3.8 | 11,386 ratings and 841 reviews | S,M,L,XL,XXL |
| 1 | METRONAUT Solid Men Polo Neck Black T-Shirt\n |  |  |  | S,M,L,XL,XXL |
| 2 | AUSK Striped Men Round Neck Blue, Black T-Shirt\n | Special price | 4.0 | 11,254 ratings and 709 reviews | S,M,L,XXL |
| 3 | VeBNoR Checkered Men Round Neck Grey T-Shirt\n | Special price | 4.0 | 13,375 ratings and 989 reviews | S,XL |
| 4 | AUSK Striped Men Round Neck Light Blue, White ... | Special price | 4.0 | 11,254 ratings and 686 reviews | S,M,L,XL,XXL |


In [7]:
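df=pd.concat([df_1,df_2],axis=1)  # axis=1 joins the two frames column-wise, row by row
df.head()

|   | Product Names | Product Price | Percentage Off | Product Link | Product Type | Special Product | Product Rating | Customer Rated | Sizes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | METRONAUT | ₹299 | 85% off | https://www.flipkart.com/metronaut-solid-men-r... | METRONAUT Solid Men Round Neck Dark Green T-Sh... | Special price | 3.8 | 11,386 ratings and 841 reviews | S,M,L,XL,XXL |
| 1 | METRONAUT By Flipkart | ₹319 | 84% off | https://www.flipkart.com/metronaut-solid-men-p... | METRONAUT Solid Men Polo Neck Black T-Shirt\n |  |  |  | S,M,L,XL,XXL |
| 2 | AUSK | ₹199 | 86% off | https://www.flipkart.com/ausk-striped-men-roun... | AUSK Striped Men Round Neck Blue, Black T-Shirt\n | Special price | 4.0 | 11,254 ratings and 709 reviews | S,M,L,XXL |
| 3 | VeBNoR | ₹279 | 79% off | https://www.flipkart.com/vebnor-checkered-men-... | VeBNoR Checkered Men Round Neck Grey T-Shirt\n | Special price | 4.0 | 13,375 ratings and 989 reviews | S,XL |
| 4 | AUSK | ₹199 | 86% off | https://www.flipkart.com/ausk-striped-men-roun... | AUSK Striped Men Round Neck Light Blue, White ... | Special price | 4.0 | 11,254 ratings and 686 reviews | S,M,L,XL,XXL |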
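
`pd.concat` stacks frames vertically unless told otherwise, so joining the listing data with the product-page data depends on the `axis` argument. A small sketch with made-up toy frames standing in for `df_1` and `df_2` shows the difference:

```python
import pandas as pd

# Toy stand-ins for df_1 (listing data) and df_2 (product-page data)
left = pd.DataFrame({"Product Names": ["A", "B"], "Product Price": ["₹299", "₹319"]})
right = pd.DataFrame({"Product Rating": ["3.8", None], "Sizes": ["S,M", "L"]})

# axis=0 (the default): rows are appended and unmatched columns are filled with NaN
stacked = pd.concat([left, right])

# axis=1: columns are joined on the index, one combined row per product
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

The column-wise form relies on both frames sharing the same index order, which holds here because the second frame was built by visiting the links of the first one in sequence.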


In [8]:
df.to_csv('final_data.csv',index=False)
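
The scraped columns are still raw strings such as `₹299`, `85% off`, and `11,386 ratings and 841 reviews`. The cleaning step from the workflow could start with helpers like the ones below. This is only a sketch: the function names are hypothetical, and the patterns assume the text formats seen in the sample rows of this notebook.

```python
import re

def parse_price(text):
    """Turn a price string such as '₹299' into an integer; None if not parseable."""
    if not isinstance(text, str):
        return None
    digits = re.sub(r"[^\d]", "", text)  # drop the currency sign and thousands commas
    return int(digits) if digits else None

def parse_percent_off(text):
    """Turn a discount string such as '85% off' into an integer; None if not parseable."""
    if not isinstance(text, str):
        return None
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None

def parse_counts(text):
    """Split 'N ratings and M reviews' into a (ratings, reviews) pair of integers."""
    if not isinstance(text, str):
        return (None, None)
    match = re.search(r"([\d,]+)\s+ratings?\s+and\s+([\d,]+)\s+reviews?", text)
    if not match:
        return (None, None)
    return tuple(int(group.replace(",", "")) for group in match.groups())

print(parse_price("₹299"))
print(parse_percent_off("85% off"))
print(parse_counts("11,386 ratings and 841 reviews"))
```

With pandas, these could be applied per column, for example `df['Product Price'].map(parse_price)`; missing values pass through as `None`.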