# Web Scraping Using "Beautiful Soup" and the "Requests" Module

### Website scraped: Gofresh Enterprises
##### Link: https://www.gofresh-anupam-enterprises.com

### Let's import all the necessary packages

In [18]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen 
import requests

### Now let's request the HTML from the URL

In [19]:
headers_std = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
'Content-Type': 'text/html',
}

url = "https://www.gofresh-anupam-enterprises.com/"
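response = requests.get(url, headers=headers_std, timeout=30)
response.raise_for_status()  # stop early if the server returns an HTTP error
html = response.text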

### Next, we create a BeautifulSoup object and pass it the HTML

In [20]:
soup = BeautifulSoup(html,'lxml')

#### After inspecting the structure of the webpage, we set variables for the class names and `data-hook` values of the data we want to scrape

In [4]:
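product_class = "_2BULo"  # CSS class of the <h3> product title
# The two price values below are matched against the data-hook attribute, not the class
discount_price_class = "product-item-price-to-pay"
actual_price_class = "product-item-price-before-discount"
#image_class = "false"
product_link_class = "_34sIs"  # CSS class of the product <a> link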

#### Now we will scrape all the required data from the webpage using the BeautifulSoup object

In [5]:
product_names = soup.find_all("h3", {"class":product_class})
actual_product_prices = soup.find_all("span",{"data-hook":actual_price_class})
discounted_product_prices = soup.find_all("span",{"data-hook":discount_price_class})
#images = soup.find_all("img",{"aria-hidden":image_class})
product_links = soup.find_all('a',{'class':product_link_class})
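The lookups above match some tags by `class` and others by `data-hook`. Here is a minimal sketch of both kinds of match on a hand-written HTML fragment. The fragment and the `example.com` link are assumptions modeled on the names used above, not the site's real markup, and the sketch uses Python's built-in `html.parser` so that it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one product using the same attribute names as above.
sample_html = """
<div>
  <a class="_34sIs" href="https://example.com/product-page/sample">
    <h3 class="_2BULo"> Sample Product </h3>
  </a>
  <span data-hook="product-item-price-before-discount">₹120.00</span>
  <span data-hook="product-item-price-to-pay">₹105.00</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Match tags by CSS class.
name_tags = sample_soup.find_all("h3", {"class": "_2BULo"})
link_tags = sample_soup.find_all("a", {"class": "_34sIs"})
# Match tags by the value of their data-hook attribute.
price_tags = sample_soup.find_all("span", {"data-hook": "product-item-price-to-pay"})

sample_name = name_tags[0].text.strip()
sample_price = price_tags[0].text.strip()
sample_link = link_tags[0].get("href")
print(sample_name, sample_price, sample_link)
```

Any attribute can be used as a filter this way; the dictionary key is the attribute name and the value is what it must equal.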

#### Let's check the first product in our scraped data

In [6]:
print(product_names[0].text.strip())
print(actual_product_prices[0].text.strip())
print(discounted_product_prices[0].text.strip())
#print(images[0].get('src').strip())
print(product_links[0].get('href').strip())

Go Cheese Plain Spread, 200g (soft and creamy)
₹120.00
₹105.00
https://www.gofresh-anupam-enterprises.com/product-page/go-cheese-plain-spread-200g-soft-and-creamy


#### Great! 
#### Before creating the DataFrame, let's check the length of each column we have scraped

In [7]:
print(len(product_names))
print(len(actual_product_prices))
print(len(discounted_product_prices))
print(len(product_links))

36
33
36
36


#### Now let's create a list for each column and fill it with our data; for missing entries we will just append None

In [10]:
product_names_df = []
actual_prices_df = []
discounted_prices_df = []
product_links_df = []

In [11]:
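for i in range(len(product_names)):
    product_names_df.append(product_names[i].text.strip())
    discounted_prices_df.append(discounted_product_prices[i].text.strip())
    product_links_df.append(product_links[i].get('href').strip())
    try:
        actual_prices_df.append(actual_product_prices[i].text.strip())
    except IndexError:
        # Only 33 original prices were found for 36 products, so the last rows get None.
        # This pads the end of the list; it does not identify which products lack a price.
        actual_prices_df.append(None)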
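Because the counts differ (33 original prices against 36 names), pairing by index can attach an original price to the wrong product whenever the products without one are not the last three. One way to avoid this is to search inside each product's own container, so a missing price becomes `None` for that product only. The sketch below assumes each product sits in an `li` element with the class `product-card`. That container, the helper names, and the sample HTML are all hypothetical, so check the real page's markup before adapting it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first product has no original (pre-discount) price.
sample_html = """
<ul>
  <li class="product-card">
    <a class="_34sIs" href="/product-page/a"><h3 class="_2BULo">Product A</h3></a>
    <span data-hook="product-item-price-to-pay">₹85.00</span>
  </li>
  <li class="product-card">
    <a class="_34sIs" href="/product-page/b"><h3 class="_2BULo">Product B</h3></a>
    <span data-hook="product-item-price-before-discount">₹120.00</span>
    <span data-hook="product-item-price-to-pay">₹105.00</span>
  </li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.text.strip() if tag is not None else None


def extract_rows(markup, card_class="product-card"):
    """Build one dict per product card so that every field stays on its own row."""
    card_soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for card in card_soup.find_all("li", {"class": card_class}):
        link = card.find("a", {"class": "_34sIs"})
        rows.append({
            "product_name": text_or_none(card.find("h3", {"class": "_2BULo"})),
            "actual_price": text_or_none(
                card.find("span", {"data-hook": "product-item-price-before-discount"})
            ),
            "discounted_price": text_or_none(
                card.find("span", {"data-hook": "product-item-price-to-pay"})
            ),
            "product_link": link.get("href") if link is not None else None,
        })
    return rows


rows = extract_rows(sample_html)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`.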

#### Let's combine these lists into a single DataFrame

In [21]:
df = pd.DataFrame({
    'product_name': product_names_df,
    'actual_price': actual_prices_df,
    'discounted_price': discounted_prices_df,
    'product_link': product_links_df,
})

In [22]:
df.head()

| | product_name | actual_price | discounted_price | product_link |
|---|---|---|---|---|
| 0 | Go Cheese Plain Spread, 200g (soft and creamy) | ₹120.00 | ₹105.00 | https://www.gofresh-anupam-enterprises.com/pro... |
| 1 | GO CHEESE CUBES 200GM | ₹125.00 | ₹112.50 | https://www.gofresh-anupam-enterprises.com/pro... |
| 2 | GO CHEESE SLICE 200GM | ₹145.00 | ₹130.50 | https://www.gofresh-anupam-enterprises.com/pro... |
| 3 | FROZEN VEG SOY CHAP, 500gm | ₹150.00 | ₹85.00 | https://www.gofresh-anupam-enterprises.com/pro... |
| 4 | Go Cheese Plain Spread, 200g (soft and creamy) | ₹120.00 | ₹105.00 | https://www.gofresh-anupam-enterprises.com/pro... |
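The prices are stored as strings such as `₹120.00`, so they cannot be summed or compared numerically as they stand. A minimal sketch of one way to convert them follows; the helper name `parse_rupees` is hypothetical, and it assumes prices use the `₹` prefix with optional thousands separators.

```python
def parse_rupees(price_text):
    """Convert a string such as '₹1,250.00' to a float; return None for missing values."""
    if not isinstance(price_text, str):
        return None  # covers None and NaN placeholders
    cleaned = price_text.replace("₹", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


prices = ["₹120.00", "₹1,250.00", None]
parsed = [parse_rupees(p) for p in prices]
print(parsed)  # [120.0, 1250.0, None]
```

The same function could be applied to a column, for example with `df['actual_price'].map(parse_rupees)`.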


#### Writing the scraped data to a CSV file

In [24]:
df.to_csv('web_scraping.csv',index=False)

##### Created By - Maninder Singh Bakshi