# Web Scraping Using Beautiful Soup and the Requests Module

### Website scraped: Flipkart
##### Link: https://www.flipkart.com/search?q=casual%20shirts&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off

### Let's import all the necessary packages

In [1]:
import pandas as pd               # tabular storage and CSV export
import requests                   # HTTP client used to download the page
from bs4 import BeautifulSoup     # HTML parsing

### Now let's request the HTML from the URL

In [66]:
# Send a browser-like User-Agent so the site serves the regular HTML page
headers_std = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Content-Type': 'text/html',
}
url = "https://www.flipkart.com/search?q=casual%20shirts&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
html = requests.get(url, headers=headers_std).text
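
The request above sets no timeout and does not check the status code, so a blocked or failed request would hand an error page to the parser. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html`, the `session` parameter, and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, headers=None, timeout=10, session=None):
    """Return the response body as text, raising on HTTP errors.

    `fetch_html` is a hypothetical helper. `session` may be any object with
    a requests-style `get` method; it defaults to the `requests` module.
    """
    client = session or requests
    response = client.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text
```

It could be called as `html = fetch_html(url, headers=headers_std)`.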

### Now we will create a BeautifulSoup object and pass the HTML data to it

In [68]:
soup = BeautifulSoup(html,'lxml')
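
The `'lxml'` parser is a separate package, and Beautiful Soup raises `FeatureNotFound` when it is not installed. The sketch below falls back to Python's built-in `html.parser` in that case. The helper name `make_soup` is an assumption introduced here for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when available, otherwise with the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python
        return BeautifulSoup(html, "html.parser")
```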

#### After inspecting the structure of the web page, we store the class names of the elements we want to scrape in variables

In [69]:
product_name_class = "_2mylT6"
product_brand_class = "_2B_pmu"
selling_price_class = "_1vC4OE" 
actual_price_class = "_3auQ3N"
product_link_class = "_2mylT6"

#### Now we will scrape all the required data from the web page using the BeautifulSoup object

In [70]:
# Each call returns a list of every matching tag on the page
product_names = soup.find_all("a", {"class": product_name_class})
product_brands = soup.find_all("div", {"class": product_brand_class})
selling_prices = soup.find_all("div", {"class": selling_price_class})
actual_prices = soup.find_all("div", {"class": actual_price_class})
product_links = soup.find_all("a", {"class": product_link_class})

#### Let us check one product in our scraped data

In [75]:
print(product_names[5].text.strip())
print(product_brands[5].text.strip())
print(selling_prices[5].text.strip())
print(actual_prices[5].text.strip())
# Note: this line uses index 0, so the link belongs to the first product, not the sixth
print('https://www.flipkart.com' + product_links[0].get('href').strip())

Men Military Camouflage Casual Spread Shirt
Highlander
₹1,399
₹999
https://www.flipkart.com/unistreet-men-striped-casual-blue-shirt/p/itmcc38face427ab?pid=SHTFPFHPQAXQTJQF&lid=LSTSHTFPFHPQAXQTJQFFMZWW0&marketplace=FLIPKART&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=en_mq1pJDbWU3mGq%2BmyTrKDNBmBazTrHS20zrr3sza50GA1AfP2FO%2Bf%2Bh4NP27hGAsLkBt441QFcj3iDPaU9cnWdA%3D%3D&ssid=0n3tstrqj40000001591428987283&qH=c898b17e35b9f66c
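
The scraped `href` values are relative paths, so the site prefix has to be added by hand. A small sketch using the standard library's `urljoin` is shown below. The names `BASE_URL` and `absolute_link` are assumptions introduced here, and the sample path in the usage note is made up.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.flipkart.com"  # assumed site root for relative links


def absolute_link(href, base=BASE_URL):
    """Turn a relative href into an absolute URL; return None if it is missing."""
    if not href:
        return None
    return urljoin(base, href.strip())
```

For example, `absolute_link("/some-shirt/p/itm123")` yields a URL that starts with `https://www.flipkart.com/`, and an `href` that is already absolute is returned unchanged.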


#### Great! Before creating the DataFrame, let's first check the length of each column we have scraped

In [49]:
print(len(product_names))
print(len(product_brands))
print(len(selling_prices))
print(len(actual_prices))
print(len(product_links))

40
40
45
38
40
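
The counts differ (40, 40, 45, 38, 40), which means that pairing elements by position can place one product's price next to another product's name. One way to avoid this, sketched below, is to find each product's container first and then look up the fields inside it. The container class `product-card` and the sample HTML are made up for illustration; the real container class would have to be read from the page. The inner class names are the ones defined earlier.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; "product-card" is a hypothetical container class
SAMPLE_HTML = """
<div class="product-card">
  <div class="_2B_pmu">BrandA</div>
  <a class="_2mylT6" href="/shirt-a/p/1">Shirt A</a>
  <div class="_1vC4OE">₹649</div>
  <div class="_3auQ3N">₹999</div>
</div>
<div class="product-card">
  <div class="_2B_pmu">BrandB</div>
  <a class="_2mylT6" href="/shirt-b/p/2">Shirt B</a>
  <div class="_1vC4OE">₹519</div>
</div>
"""


def text_or_none(parent, tag, class_name):
    """Return the stripped text of the first matching child, or None."""
    node = parent.find(tag, {"class": class_name})
    return node.text.strip() if node else None


def parse_cards(html, card_class="product-card"):
    """Build one dict per product card so that fields stay together."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", {"class": card_class}):
        link = card.find("a", {"class": "_2mylT6"})
        rows.append({
            "product_name": text_or_none(card, "a", "_2mylT6"),
            "product_brand": text_or_none(card, "div", "_2B_pmu"),
            "selling_price": text_or_none(card, "div", "_1vC4OE"),
            "actual_price": text_or_none(card, "div", "_3auQ3N"),
            "product_link": link.get("href") if link else None,
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame(rows)`. A card without a given field gets `None` in that column instead of shifting every later row.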


#### Now let's create one list per column and fill it with our data; for missing entries we will just append None

In [72]:
product_names_df = []
product_brands_df = []
selling_prices_df = []
actual_prices_df = []
product_links_df = []

In [73]:
# selling_prices is the longest list in this run, so iterate over its length
# and pad the shorter columns with None
for i in range(len(selling_prices)):
    selling_prices_df.append(selling_prices[i].text.strip())
    try:
        product_names_df.append(product_names[i].text.strip())
    except IndexError:
        product_names_df.append(None)
    try:
        product_brands_df.append(product_brands[i].text.strip())
    except IndexError:
        product_brands_df.append(None)
    try:
        actual_prices_df.append(actual_prices[i].text.strip())
    except IndexError:
        actual_prices_df.append(None)
    try:
        product_links_df.append('flipkart.com' + product_links[i].get('href').strip())
    except (IndexError, AttributeError):
        # AttributeError covers a tag that has no href attribute
        product_links_df.append(None)
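
The same padding can be written without repeated `try`/`except` blocks. The sketch below pads every column with `None` up to the length of the longest one. The helper name `pad_columns` is an assumption introduced here. Padding only equalizes the lengths; it does not guarantee that values in the same row belong to the same product.

```python
def pad_columns(*columns):
    """Pad each list with None so that all lists match the longest one."""
    longest = max((len(col) for col in columns), default=0)
    return [list(col) + [None] * (longest - len(col)) for col in columns]


# Made-up values for illustration
names, prices = pad_columns(["Shirt A", "Shirt B", "Shirt C"], ["₹649"])
```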
        



#### Let's combine these lists into a single DataFrame

In [61]:
df = pd.DataFrame({
    'product_name': product_names_df,
    'product_brand': product_brands_df,
    'actual_price': actual_prices_df,
    'selling_price': selling_prices_df,
    'product_link': product_links_df,
})

In [62]:
len(df)

45

In [63]:
df.head()

|   | product_name | product_brand | actual_price | selling_price | product_link |
|---|---|---|---|---|---|
| 0 | Men Dyed Casual Mandarin Shirt | Pepe Jeans | ₹2,199 | ₹1,319 | flipkart.com/pepe-jeans-men-dyed-casual-white-... |
| 1 | Men Self Design Casual Button Down Shirt | IndoPrimo | ₹999 | ₹649 | flipkart.com/indoprimo-men-self-design-casual-... |
| 2 | Men Checkered Casual Hood Shirt | Highlander | ₹999 | ₹1,449 | flipkart.com/highlander-men-checkered-casual-m... |
| 3 | Men Solid Casual Mandarin Shirt | Highlander | ₹1,399 | ₹519 | flipkart.com/highlander-men-solid-casual-blue-... |
| 4 | Men Color Block Casual Shirt | Tripr | ₹1,499 | ₹589 | flipkart.com/tripr-men-color-block-casual-whit... |
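
The prices are stored as strings such as `₹1,399`, which cannot be sorted or compared numerically. A minimal sketch of a converter follows. The function name `parse_price` is an assumption introduced here, and it assumes whole-rupee prices as in the output shown.

```python
def parse_price(text):
    """Convert a string like '₹1,399' to the integer 1399; None if not parseable."""
    if text is None:
        return None
    digits = text.replace("₹", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None
```

It could be applied to a column with `df['selling_price'].map(parse_price)`.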


#### Writing the scraped data to a CSV file

In [74]:
df.to_csv('flipkart_web_scraping.csv',index=False)
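
To confirm that a file was written as intended, it can be read back and compared with the original frame. The sketch below does this with a small made-up DataFrame and a temporary directory, so it does not touch `flipkart_web_scraping.csv`. The explicit `encoding="utf-8"` is an assumption added here to keep the `₹` symbol intact across platforms.

```python
import os
import tempfile

import pandas as pd

# Made-up sample rows for illustration
sample = pd.DataFrame({
    "product_name": ["Shirt A", "Shirt B"],
    "selling_price": ["₹649", None],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    sample.to_csv(path, index=False, encoding="utf-8")
    restored = pd.read_csv(path, encoding="utf-8")  # missing values come back as NaN
```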

##### Created By - Maninder Singh Bakshi