# Web Scraper To Generate Product Database From E-Commerce Site 
<p align = "justified"> This project implements basic web scraping with Python's BeautifulSoup library to build an informative dataset of available products. The e-commerce website targeted in the notebook is laptopsdirect.co.uk, with the primary focus on the laptops it sells. The project is divided into two parts: the first part scrapes only the first result page for the product category, while the second part fetches all the available results from multiple pages. The scraped data is organized into a table with the pandas library and exported in .xlsx format. </p>

The following data points were fetched in this project:
* Product Name
* Price
* Rating
* Review Count
* Product Details
* Relative URL
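One way to keep these six fields together is a small record type, so each product is built in one place instead of being appended to six parallel lists that must stay the same length. This is a sketch of an alternative, not part of the original notebook; the class name `Product` and its field names are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Product:
    """One scraped laptop listing (hypothetical record type)."""
    name: Optional[str] = None
    price: Optional[str] = None          # raw display text such as '£579.97'
    rating: Optional[str] = None         # raw 'rating-value' attribute
    review_count: Optional[str] = None   # raw 'ratings-count' attribute
    details: Optional[str] = None
    relative_url: Optional[str] = None


# Fields that were not found simply keep their default of None.
example = Product(name="Acer Aspire 5 Core i5-1035G1 8GB 512GB SSD 15.6 Inch Windows 10 Home Laptop - NX.HSLEK.007",
                  price="£579.97", rating="8.5", review_count="2")
row = asdict(example)
```

A list of such records can be turned into a table with `pd.DataFrame([asdict(p) for p in products])`, which keeps every column aligned even when a field is missing.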

In [None]:
from bs4 import BeautifulSoup             #Importing necessary Libraries
import requests
import pandas as pd
import urllib.parse

## Part 1 - Scraping the First Result Page


In [77]:
website = "https://www.laptopsdirect.co.uk/st/new-laptops"   #Target URL: first page of new-laptop listings

In [3]:
response = requests.get(website)

In [78]:
response.status_code   #Check HTTP Status

200
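Checking `status_code` by eye works interactively; in a script it is safer to let `requests` raise on an error status and to set a timeout so a stalled connection cannot hang the notebook. A minimal sketch follows; the function name `fetch_html` and the `User-Agent` value are assumptions, not taken from the site or the original code.

```python
import requests

# Hypothetical User-Agent value: identify your scraper honestly.
HEADERS = {"User-Agent": "product-db-scraper/0.1 (learning project)"}


def fetch_html(url, session=None, timeout=10):
    """Return the response body for `url` as bytes.

    Raises requests.HTTPError for 4xx/5xx status codes and requests.Timeout
    if connecting or reading stalls for longer than `timeout` seconds.
    """
    # A requests.Session reuses connections across calls; the plain
    # requests module offers the same .get() interface for one-off calls.
    http = session if session is not None else requests
    response = http.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.content
```

With this helper, `soup = BeautifulSoup(fetch_html(website), 'html.parser')` would cover both the request and the status check.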

In [6]:
soup = BeautifulSoup(response.content, 'html.parser')           #Creating a Soup object for scraping

In [21]:
results = soup.find_all('div', {'class': 'OfferBox'})                  #storing the first page's target results
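Passing `{'class': 'OfferBox'}` to `find_all` matches every `div` whose class list contains `OfferBox`, even when the element carries other classes too. The CSS-selector method `select` expresses the same query. The HTML string below is invented and only imitates the structure of the real listing page.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup imitating two product tiles and one non-product block.
html = """
<div class="OfferBox"><a class="offerboxtitle" href="/laptop-a/version.asp">Laptop A</a></div>
<div class="OfferBox featured"><a class="offerboxtitle" href="/laptop-b/version.asp">Laptop B</a></div>
<div class="AdBox">Not a product</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_find_all = soup.find_all("div", {"class": "OfferBox"})  # also matches "OfferBox featured"
by_select = soup.select("div.OfferBox")                    # same query as a CSS selector

titles = [box.find("a", {"class": "offerboxtitle"}).get_text(strip=True) for box in by_select]
```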

In [None]:
soup

In [22]:
len(results)


23

In [23]:
results[0]                           #Checking the first result's HTML

<div class="OfferBox">
<div class="merchTopofCentreColumn top5icon" onload="dataLayer.push({'event':'top5IconShown'});" style="display:none; z-index: 20; position: absolute; top: 50px; left: 10px; cursor: pointer;  pointer-events: none; width: 70px;">
<img alt="Top 5" border="0" src="https://www.appliancesdirect.co.uk/images/top5-apd.png">
</img></div>
<div class="sr_image">
<div class="sr_compare">
<div class="sli_compare" id="1573430_compare">
<input class="compare" onclick="AddRemoveFromProductCompareCookie(this, event);" type="checkbox" value="1573430"/>
<input id="comphid_1573430" type="hidden" value="Laptops">
<span class="compareText">Compare</span>
</input></div>
<div class="comparedialog" id="msg_1573430" style="display:none;" title="Compare products">
<p>Sorry, you can only compare a maximum of 4 items per category.</p>
</div>
<div class="compBtn" id="compBtn_1573430">
<a href="/Compare" type="button">Compare products »</a>
</div>
</div>
<a href="/acer-aspire-5-core-i5-1035g1

In [26]:
results[0].find('a', {'class':'offerboxtitle'}).get_text()         #finding the product name

'Acer Aspire 5 Core i5-1035G1 8GB 512GB SSD 15.6 Inch Windows 10 Home Laptop - NX.HSLEK.007'

In [27]:
results[0].find('span', {'class':'offerprice'}).get_text()         #finding the product price

'£579.97'
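The price comes back as display text such as `'£579.97'`. Before sorting or comparing prices it helps to turn it into a number, and `Decimal` avoids binary floating-point rounding for currency amounts. A minimal sketch, assuming prices are always in pounds with an optional thousands separator (`'£1,049.97'` below is a made-up value); the helper name `parse_price` is hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Convert display text like '£579.97' to Decimal('579.97'); return None if unparseable."""
    if not text:
        return None
    cleaned = re.sub(r"[^\d.]", "", text)   # drop '£', ',' and any whitespace
    try:
        return Decimal(cleaned)
    except InvalidOperation:                # e.g. the 'NA' placeholder leaves an empty string
        return None
```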

In [28]:
results[0].find('star-rating').get('rating-value')                 #finding the rating value

'8.5'

In [29]:
results[0].find('star-rating').get('ratings-count')                #finding the review count

'2'
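`Tag.get` returns the attribute value as a string (or `None` if the attribute is absent), so `'8.5'` and `'2'` still need converting before they can be averaged or sorted. When a listing has no `star-rating` element at all, `find` returns `None` and the chained `.get` raises `AttributeError` instead. The sketch below shows one way to convert the values; the helper names `to_float` and `to_int` are hypothetical.

```python
from typing import Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """'8.5' -> 8.5; None or non-numeric text such as 'NA' -> None."""
    try:
        return float(value)
    except (TypeError, ValueError):   # TypeError for None, ValueError for 'NA'
        return None


def to_int(value: Optional[str]) -> Optional[int]:
    """'903' -> 903; None or non-numeric text -> None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```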

In [32]:
relative_url = results[0].find('a', {'class':'offerboxtitle'}).get('href')    #fetching relative URL

In [42]:
root_url = 'https://www.laptopsdirect.co.uk'
url_combine = root_url + relative_url

In [43]:
url_combine

'https://www.laptopsdirect.co.uk/acer-aspire-5-core-i5-1035g1-8gb-512gb-ssd-15.6-inch-windows-10-laptop-nx.hslek.007/version.asp'
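Plain string concatenation works here only because the `href` starts with `/` and `root_url` has no trailing slash. `urllib.parse.urljoin`, which the notebook uses later for the full result list, resolves a link the way a browser would and also leaves already-absolute hrefs untouched. The shortened path below is illustrative.

```python
from urllib.parse import urljoin

root_url = "https://www.laptopsdirect.co.uk"
root_with_slash = root_url + "/"
href = "/acer-aspire-5/version.asp"          # shortened example path

naive = root_with_slash + href               # produces a doubled slash
joined = urljoin(root_with_slash, href)      # single slash, as a browser would resolve it

# An href that is already absolute is returned unchanged by urljoin.
absolute = urljoin(root_url, "https://www.appliancesdirect.co.uk/images/top5-apd.png")
```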

In [41]:
results[0].find('div', {'class': 'productInfo'}).get_text().strip().replace('\n', ', ')    #finding the product details as one comma-separated string

'Intel Core i5 1035G1 Processor, UHD Graphics 620 Graphics card, 15.6 Inch Full HD Screen, 8GB RAM, 512GB SSD'
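`.strip().replace('\n', ', ')` gives clean output only when the items are separated by exactly one newline with no indentation. Splitting into lines and discarding empty ones is more forgiving if the markup ever contains blank lines or indented text. The sample string is invented, and `join_detail_lines` is a hypothetical helper name.

```python
def join_detail_lines(text: str, sep: str = ", ") -> str:
    """Collapse a multi-line details block into one comma-separated string,
    ignoring blank lines and whitespace around each line."""
    return sep.join(line.strip() for line in text.splitlines() if line.strip())


# Invented sample with a blank line and indentation, as raw HTML text often has.
raw = "\n  Intel Core i5 1035G1 Processor\n\n  8GB RAM\n  512GB SSD\n"

naive = raw.strip().replace("\n", ", ")   # leaves an empty item and extra spaces
clean = join_detail_lines(raw)
```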

In [72]:
product_name = []
product_price = []
review_rating = []
review_count = []
relative_url = []
product_details = []

# find() returns None when an element is missing, so the chained call raises
# AttributeError; in that case 'NA' is recorded for the field.
for result in results:

    #Product Name
    try:
        product_name.append(result.find('a', {'class':'offerboxtitle'}).get_text())
    except AttributeError:
        product_name.append('NA')

    #Product Price
    try:
        product_price.append(result.find('span', {'class':'offerprice'}).get_text())
    except AttributeError:
        product_price.append('NA')

    #Product Rating
    try:
        review_rating.append(result.find('star-rating').get('rating-value'))
    except AttributeError:
        review_rating.append('NA')

    #Review Count
    try:
        review_count.append(result.find('star-rating').get('ratings-count'))
    except AttributeError:
        review_count.append('NA')

    #Relative URL
    try:
        relative_url.append(result.find('a', {'class':'offerboxtitle'}).get('href'))
    except AttributeError:
        relative_url.append('NA')

    #Product Details
    try:
        product_details.append(result.find('div', {'class': 'productInfo'}).get_text().strip().replace('\n', ', '))
    except AttributeError:
        product_details.append('NA')
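The six try/except blocks repeat one pattern: find a child element, read its text or an attribute, and fall back to a placeholder. A pair of small helpers can factor that out, and returning one dict per product keeps the fields aligned. This is a sketch; `first_text`, `first_attr` and `parse_offer` are hypothetical names, and the HTML is a simplified imitation of a product tile, not real page markup.

```python
from bs4 import BeautifulSoup


def first_text(parent, name, attrs=None, default="NA"):
    """Text of the first matching descendant (outer whitespace removed), or `default`."""
    tag = parent.find(name, attrs or {})
    return tag.get_text().strip() if tag is not None else default


def first_attr(parent, name, attr, attrs=None, default="NA"):
    """Value of attribute `attr` on the first matching descendant, or `default`."""
    tag = parent.find(name, attrs or {})
    value = tag.get(attr) if tag is not None else None
    return value if value is not None else default


def parse_offer(result):
    """Extract the six fields used in this notebook from one OfferBox div."""
    return {
        "Name": first_text(result, "a", {"class": "offerboxtitle"}),
        "Price": first_text(result, "span", {"class": "offerprice"}),
        "Rating": first_attr(result, "star-rating", "rating-value"),
        "Review Count": first_attr(result, "star-rating", "ratings-count"),
        "Relative URL": first_attr(result, "a", "href", {"class": "offerboxtitle"}),
        "Details": first_text(result, "div", {"class": "productInfo"}).replace("\n", ", "),
    }


# Invented markup: the second tile has no rating and no details.
html = """
<div class="OfferBox">
  <a class="offerboxtitle" href="/laptop-a/version.asp">Laptop A</a>
  <span class="offerprice">£579.97</span>
  <star-rating rating-value="8.5" ratings-count="2"></star-rating>
  <div class="productInfo">8GB RAM\n512GB SSD</div>
</div>
<div class="OfferBox">
  <a class="offerboxtitle" href="/laptop-b/version.asp">Laptop B</a>
  <span class="offerprice">£179.97</span>
</div>
"""
boxes = BeautifulSoup(html, "html.parser").find_all("div", {"class": "OfferBox"})
rows = [parse_offer(box) for box in boxes]
```

`pd.DataFrame(rows)` would then give one column per dictionary key. One small difference from the notebook's loop: a `star-rating` element without the attribute yields `'NA'` here rather than `None`.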

In [74]:
url_combined = []
for link in relative_url:                                     #Convert each relative link to an absolute URL
    url_combined.append(urllib.parse.urljoin(root_url, link))
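Because failed lookups are stored as the placeholder `'NA'`, the loop above turns them into `https://www.laptopsdirect.co.uk/NA`, which looks like a real link. A list comprehension that passes the placeholder through unchanged avoids that. This is a suggested variation, not the notebook's original behaviour, and the input values are illustrative.

```python
from urllib.parse import urljoin

root_url = "https://www.laptopsdirect.co.uk"
relative_url = ["/laptop-a/version.asp", "NA"]   # illustrative values, not scraped data

# Join real links; keep the 'NA' placeholder so missing data stays recognisable.
url_combined = [link if link == "NA" else urljoin(root_url, link) for link in relative_url]
```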

In [62]:
url_combined

['https://www.laptopsdirect.co.uk/acer-aspire-5-core-i5-1035g1-8gb-512gb-ssd-15.6-inch-windows-10-laptop-nx.hslek.007/version.asp',
 'https://www.laptopsdirect.co.uk/-core-i5-1035g1-8gb-256gb-ssd-15.6-inch-windows-10-laptop-30030444/version.asp',
 'https://www.laptopsdirect.co.uk/asus-c223na-intel-celeron-n3350-4gb-32gb-emmc-11.6-inch-windows-10-chromebo-c223na-gj0040/version.asp',
 'https://www.laptopsdirect.co.uk/coda-1.2-intel-celeron-n4020-4gb-64gb-emmc-12.5-inch-windows-10-laptop-inc-coda037/version.asp',
 'https://www.laptopsdirect.co.uk/-c523na-br0067/version.asp',
 'https://www.laptopsdirect.co.uk/lenovo-v14-ada-amd-ryzen-3-3250u-8-gb-ram-256-gb-ssd-14-inch-windows-home-1-82c6006cuk/version.asp',
 'https://www.laptopsdirect.co.uk/dell-latitude-3520-core-i5-1135g7-8gb-256gb-ssd-15.6-inch-fhd-windows-10-pr-thy6r/version.asp',
 'https://www.laptopsdirect.co.uk/lenovo-v15-iil-core-i5-1035u-8gb-256gb-ssd-15.6-inch-fhd-windows-10-laptop-82c50075uk/version.asp',
 'https://www.laptopsd

In [75]:
product_overview = pd.DataFrame({'Name': product_name,
                                 'Price': product_price,
                                 'Rating': review_rating,
                                 'Review Count': review_count,
                                 'Link': url_combined,
                                 'Details': product_details})

In [79]:
product_overview

| | Name | Price | Rating | Review Count | Link | Details |
|---|---|---|---|---|---|---|
| 0 | Acer Aspire 5 Core i5-1035G1 8GB 512GB SSD 15.... | £579.97 | 8.5 | 2.0 | https://www.laptopsdirect.co.uk/acer-aspire-5-... | Intel Core i5 1035G1 Processor, UHD Graphics 6... |
| 1 | Medion Akoya E15407 Core i5-1035G1 8GB 256GB S... | £459.97 | 8.0 | 42.0 | https://www.laptopsdirect.co.uk/-core-i5-1035g... | Intel Core i5 1035G1 Processor, Iris Xe Graphi... |
| 2 | Asus C223NA Intel Celeron N3350 4GB 32GB eMMC ... | £179.97 | | | https://www.laptopsdirect.co.uk/asus-c223na-in... | Intel Celeron N3350 Processor, 11.6 Inch 1366 ... |
| 3 | CODA 1.2 Intel Celeron N4020 4GB 64GB eMMC 12.... | £199.97 | | | https://www.laptopsdirect.co.uk/coda-1.2-intel... | Intel Celeron N4020 Processor, UHD Graphics 62... |
| 4 | Asus C523 Intel Celeron N3350 4GB 64GB eMMC 15... | £227.97 | 9.0 | 11.0 | https://www.laptopsdirect.co.uk/-c523na-br0067... | Intel Celeron N3350 Processor, 15.6 Inch 1366 ... |
| 5 | Lenovo V14-ADA AMD Ryzen 3 3250U 8GB 256GB SSD... | £399.97 | 8.8 | 25.0 | https://www.laptopsdirect.co.uk/lenovo-v14-ada... | AMD Ryzen 3 3250U Processor, Radeon Graphics G... |
| 6 | Dell Latitude 3520 Core i5-1135G7 8GB 256GB SS... | £719.97 | | | https://www.laptopsdirect.co.uk/dell-latitude-... | Intel Core i5 1135G7 Processor, UHD Graphics 6... |
| 7 | Lenovo V15-IIL Core i5-1035G1 8GB 256GB SSD 15... | £549.97 | 8.7 | 903.0 | https://www.laptopsdirect.co.uk/lenovo-v15-iil... | Intel Core i5 1035G1 Processor, UHD Graphics 6... |
| 8 | Asus Vivobook14 Core i3-1005G1 8GB 256GB SSD 1... | £399.97 | 10.0 | 2.0 | https://www.laptopsdirect.co.uk/asus-core-i3-1... | Intel Core i3 1005G1 Processor, UHD Graphics 6... |
| 9 | Acer Swift 3 SF313-53 Core i7-1165G7 8GB 512GB... | £882.97 | 9.8 | 4.0 | https://www.laptopsdirect.co.uk/acer-swift-3-s... | Intel Core i7 1165G7 Processor, Iris Xe Graphi... |


In [80]:
product_overview.to_excel("ResultSingle.xlsx", index = False)

---

## Part 2 - Scraping Multiple Result Pages

In [86]:
product_name = []
product_price = []
review_rating = []
review_count = []
relative_url = []
product_details = []

for i in range(1, 21):                  #Result pages 1 to 20
    website = "https://www.laptopsdirect.co.uk/st/new-laptops?pageNumber=" + str(i)
    response = requests.get(website)
    soup = BeautifulSoup(response.content, 'html.parser')
    results = soup.find_all('div', {'class': 'OfferBox'})
    for result in results:

        #Product Name
        try:
            product_name.append(result.find('a', {'class':'offerboxtitle'}).get_text())
        except AttributeError:
            product_name.append('NA')

        #Product Price
        try:
            product_price.append(result.find('span', {'class':'offerprice'}).get_text())
        except AttributeError:
            product_price.append('NA')

        #Product Rating
        try:
            review_rating.append(result.find('star-rating').get('rating-value'))
        except AttributeError:
            review_rating.append('NA')

        #Review Count
        try:
            review_count.append(result.find('star-rating').get('ratings-count'))
        except AttributeError:
            review_count.append('NA')

        #Relative URL
        try:
            relative_url.append(result.find('a', {'class':'offerboxtitle'}).get('href'))
        except AttributeError:
            relative_url.append('NA')

        #Product Details
        try:
            product_details.append(result.find('div', {'class': 'productInfo'}).get_text().strip().replace('\n', ', '))
        except AttributeError:
            product_details.append('NA')
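The loop requests a fixed 20 pages. To collect all available results without guessing the page count, one option is to keep requesting pages until one comes back with no `OfferBox` entries, pausing between requests to reduce load on the server. The sketch below assumes that an out-of-range `pageNumber` returns a page with no products, which has not been verified against the site; `max_pages` is a safety cap in case it does not. The names `page_url`, `scrape_all_pages` and `fetch_page` are hypothetical, and the page fetcher is passed in so the logic can be tried without network access.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://www.laptopsdirect.co.uk/st/new-laptops"


def page_url(page_number):
    """Build the listing URL for one result page, e.g. ...?pageNumber=2."""
    return BASE_URL + "?" + urlencode({"pageNumber": page_number})


def scrape_all_pages(fetch_page, max_pages=100, delay=1.0):
    """Collect results page by page until a page has none or max_pages is reached.

    fetch_page(url) must return the list of OfferBox results for that URL.
    """
    collected = []
    for page_number in range(1, max_pages + 1):
        results = fetch_page(page_url(page_number))
        if not results:          # assumed signal that we ran past the last page
            break
        collected.extend(results)
        time.sleep(delay)        # pause between requests to go easy on the server
    return collected
```

With the notebook's own calls, `fetch_page` could be `lambda url: BeautifulSoup(requests.get(url).content, 'html.parser').find_all('div', {'class': 'OfferBox'})`, and each returned result would then go through the same field extraction as above.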

In [87]:
url_combined = []
for link in relative_url:                                     #Convert each relative link to an absolute URL
    url_combined.append(urllib.parse.urljoin(root_url, link))

In [88]:
product_overview = pd.DataFrame({'Name': product_name,
                                 'Price': product_price,
                                 'Rating': review_rating,
                                 'Review Count': review_count,
                                 'Link': url_combined,
                                 'Details': product_details})

In [90]:
product_overview

| | Name | Price | Rating | Review Count | Link | Details |
|---|---|---|---|---|---|---|
| 0 | Acer Aspire 5 Core i5-1035G1 8GB 512GB SSD 15.... | £579.97 | 8.5 | 2 | https://www.laptopsdirect.co.uk/acer-aspire-5-... | Intel Core i5 1035G1 Processor, UHD Graphics 6... |
| 1 | Dell Latitude 3520 Core i5-1135G7 8GB 256GB SS... | £719.97 | | | https://www.laptopsdirect.co.uk/dell-latitude-... | Intel Core i5 1135G7 Processor, UHD Graphics 6... |
| 2 | Medion Akoya E15407 Core i5-1035G1 8GB 256GB S... | £459.97 | 8 | 42 | https://www.laptopsdirect.co.uk/-core-i5-1035g... | Intel Core i5 1035G1 Processor, Iris Xe Graphi... |
| 3 | Asus C223NA Intel Celeron N3350 4GB 32GB eMMC ... | £179.97 | | | https://www.laptopsdirect.co.uk/asus-c223na-in... | Intel Celeron N3350 Processor, 11.6 Inch 1366 ... |
| 4 | CODA 1.2 Intel Celeron N4020 4GB 64GB eMMC 12.... | £199.97 | | | https://www.laptopsdirect.co.uk/coda-1.2-intel... | Intel Celeron N4020 Processor, UHD Graphics 62... |
| 5 | Asus C523 Intel Celeron N3350 4GB 64GB eMMC 15... | £227.97 | 9 | 11 | https://www.laptopsdirect.co.uk/-c523na-br0067... | Intel Celeron N3350 Processor, 15.6 Inch 1366 ... |
| 6 | Lenovo V14-ADA AMD Ryzen 3 3250U 8GB 256GB SSD... | £399.97 | 8.8 | 25 | https://www.laptopsdirect.co.uk/lenovo-v14-ada... | AMD Ryzen 3 3250U Processor, Radeon Graphics G... |
| 7 | Lenovo V15-IIL Core i5-1035G1 8GB 256GB SSD 15... | £549.97 | 8.7 | 903 | https://www.laptopsdirect.co.uk/lenovo-v15-iil... | Intel Core i5 1035G1 Processor, UHD Graphics 6... |
| 8 | Asus Vivobook14 Core i3-1005G1 8GB 256GB SSD 1... | £399.97 | 10 | 2 | https://www.laptopsdirect.co.uk/asus-core-i3-1... | Intel Core i3 1005G1 Processor, UHD Graphics 6... |
| 9 | Lenovo V15 Althlon Silver 3050U 4GB 128GB SSD ... | £299.97 | 8.7 | 903 | https://www.laptopsdirect.co.uk/lenovo-v15-alt... | Athlon Silver 3050U Processor, Radeon Graphics... |
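Comparing the Part 1 and Part 2 outputs, the listing order changed between runs (the Dell Latitude 3520 sits at index 6 in the first table and index 1 in the second). If the order shifts while pages are being requested, the same product can land on two pages or be skipped. Dropping duplicate links is a cheap safeguard; with pandas this is `product_overview.drop_duplicates(subset='Link')`. The stdlib sketch below shows the same idea on plain rows, treating `'NA'` links as never duplicate so products with a missing link are not collapsed into one. The helper name `dedupe_by_key` and the sample rows are hypothetical.

```python
def dedupe_by_key(rows, key="Link"):
    """Keep the first row for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value == "NA":            # missing links are never treated as duplicates
            unique.append(row)
        elif value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


rows = [
    {"Name": "Laptop A", "Link": "https://www.laptopsdirect.co.uk/laptop-a/version.asp"},
    {"Name": "Laptop B", "Link": "https://www.laptopsdirect.co.uk/laptop-b/version.asp"},
    {"Name": "Laptop A", "Link": "https://www.laptopsdirect.co.uk/laptop-a/version.asp"},
    {"Name": "Unknown 1", "Link": "NA"},
    {"Name": "Unknown 2", "Link": "NA"},
]
unique_rows = dedupe_by_key(rows)
```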


In [None]:
product_overview.to_excel("ResultAll.xlsx", index = False)  #Exporting results to MS Excel file (.xlsx format)