

**Description of Project**

I built this project with two Python libraries, Selenium and BeautifulSoup, which handle browser automation and web scraping, respectively. My references were the Udemy course *"Master Python Web Scraping and Automation using bs4 and Selenium"* by Hussain Mustafa, ChatGPT, and the official documentation.

The main objective of the project was to scrape product information from [https://www.stylecentrewholesale.co.uk/](https://www.stylecentrewholesale.co.uk/). Conveniently, the website has a paginated collection (`/collections/all`) that lists every product, which reduced the need for extensive automation.

Due to the large amount of data and the limitations of my laptop and internet connection, I divided my project into two separate files:

1. `next_page`: In this file, I used Selenium to click through each page of the collection and record its URL. There were 123 pages in total.

2. `scrapping`: In this file, I used BeautifulSoup to scrape the necessary product information and saved it as a CSV file.
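
The scraping step in `scrapping` is not shown in this section, so the following is only a minimal sketch of the general approach: parse a page with BeautifulSoup, collect one record per product, and write the records as CSV. The sample HTML, the class names (`product-card`, `product-title`, `product-price`), and the helper names `parse_products` and `write_products_csv` are all assumptions made for illustration, not the site's actual markup or the project's actual code.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real site's tags and class names will differ.
SAMPLE_HTML = """
<div class="product-card">
  <a class="product-title" href="/products/example-dress">Example Dress</a>
  <span class="product-price">£12.50</span>
</div>
<div class="product-card">
  <a class="product-title" href="/products/example-top">Example Top</a>
  <span class="product-price">£8.00</span>
</div>
"""


def parse_products(html):
    """Return one dict per (hypothetical) product card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product-card"):
        title = card.select_one("a.product-title")
        price = card.select_one("span.product-price")
        products.append({
            "name": title.get_text(strip=True),
            "link": title["href"],
            "price": price.get_text(strip=True),
        })
    return products


def write_products_csv(products, handle):
    """Write the product dicts to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["name", "link", "price"])
    writer.writeheader()
    writer.writerows(products)


products = parse_products(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for a real file opened with open(..., "w", newline="")
write_products_csv(products, buffer)
print(buffer.getvalue())
```

In a real run, the HTML would come from each collected page link rather than from an inline string, and the selectors would need to match the markup found when inspecting the site.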



To keep the workload manageable, I divided the page links into four parts and ran the scraping code on each part separately. In the end, I successfully scraped data from 123 pages, with each page containing around 24 products, resulting in a single CSV file containing information on 2,256 products.
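
Splitting the links into four parts can be sketched as follows. This is an illustrative helper, not the code used in the project: the function name `split_into_parts` and the placeholder links are assumptions.

```python
def split_into_parts(items, parts=4):
    """Split a list into `parts` consecutive chunks of near-equal size."""
    size, remainder = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `remainder` chunks take one extra item each
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Placeholder strings standing in for the 123 collected page URLs
links = [f"page-{n}" for n in range(1, 124)]
batches = split_into_parts(links)
print([len(batch) for batch in batches])  # [31, 31, 31, 30]
```

Each batch can then be scraped in its own run, which limits how much work is lost if the connection drops partway through.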

---



In [22]:
#importing necessary modules
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

import time
from datetime import date
import os
import json


In [30]:
# Initialize the Selenium Chrome driver
DRIVER = webdriver.Chrome()

# main() opens the first page, calls next_page(), and returns the collected URLs
def main():
    # Start with an empty list so there is always something to return,
    # even if an error occurs before next_page() finishes
    page_list = []
    try:
        DRIVER.get("https://www.stylecentrewholesale.co.uk/collections/all")
        page_list = next_page()
        input("Bot Operation Completed. Press any key...")
    except Exception as e:
        print(e)
    finally:
        # Shut the browser down whether or not an error occurred
        DRIVER.quit()
    return page_list
# next_page() clicks through the pagination and collects the URL of each page it reaches
def next_page():
    page_list = []
    while True:
        try:
            # Find the "next page" arrow link and click it
            next_link = DRIVER.find_element(By.XPATH, '//ul[@class="pagination-custom"]/child::li/child::a[contains(text(),"→")]')
            next_link.click()
            # Give the new page time to load before reading its URL
            time.sleep(3)
            # Save the current URL to the list
            page_list.append(DRIVER.current_url)
        except NoSuchElementException:
            # If the "→" element is not found, no more pages are left, so break the loop
            break
    return page_list
# Run the crawler when this file is executed as a script
if __name__ == "__main__":
    page_list_result = main()
    print(page_list_result)



Bot Operation Completed. Press any key...
['https://www.stylecentrewholesale.co.uk/collections/all?page=2', 'https://www.stylecentrewholesale.co.uk/collections/all?page=3', 'https://www.stylecentrewholesale.co.uk/collections/all?page=4', 'https://www.stylecentrewholesale.co.uk/collections/all?page=5', 'https://www.stylecentrewholesale.co.uk/collections/all?page=6', 'https://www.stylecentrewholesale.co.uk/collections/all?page=7', 'https://www.stylecentrewholesale.co.uk/collections/all?page=8', 'https://www.stylecentrewholesale.co.uk/collections/all?page=9', 'https://www.stylecentrewholesale.co.uk/collections/all?page=10', 'https://www.stylecentrewholesale.co.uk/collections/all?page=11', 'https://www.stylecentrewholesale.co.uk/collections/all?page=12', 'https://www.stylecentrewholesale.co.uk/collections/all?page=13', 'https://www.stylecentrewholesale.co.uk/collections/all?page=14', 'https://www.stylecentrewholesale.co.uk/collections/all?page=15', 'https://www.stylecentrewholesale.co.uk/c
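
The URLs printed above all follow the same `?page=N` pattern, so the list could in principle be built without clicking through the site. The sketch below assumes that every page follows this pattern and that the total page count (123, per the description) is known in advance; `build_page_urls` is a hypothetical helper, not part of the original notebook.

```python
BASE_URL = "https://www.stylecentrewholesale.co.uk/collections/all"


def build_page_urls(base_url, total_pages):
    """Return the first-page URL followed by ?page=2 .. ?page=total_pages."""
    urls = [base_url]
    urls.extend(f"{base_url}?page={n}" for n in range(2, total_pages + 1))
    return urls


page_urls = build_page_urls(BASE_URL, 123)
print(len(page_urls))
print(page_urls[1])
```

The Selenium approach remains the safer choice when the page count is unknown or may change, since it stops only when the "→" link disappears.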

In [41]:
import pandas as pd
# Insert the link to the first page, which the crawl loop never records
page_list_result.insert(0, 'https://www.stylecentrewholesale.co.uk/collections/all')
# Save the list as a pandas DataFrame
df_dict = {'page_link': page_list_result}
df = pd.DataFrame(df_dict)
df

| | page_link |
|---|---|
| 0 | https://www.stylecentrewholesale.co.uk/collect... |
| 1 | https://www.stylecentrewholesale.co.uk/collect... |
| 2 | https://www.stylecentrewholesale.co.uk/collect... |
| 3 | https://www.stylecentrewholesale.co.uk/collect... |
| 4 | https://www.stylecentrewholesale.co.uk/collect... |
| ... | ... |
| 119 | https://www.stylecentrewholesale.co.uk/collect... |
| 120 | https://www.stylecentrewholesale.co.uk/collect... |
| 121 | https://www.stylecentrewholesale.co.uk/collect... |
| 122 | https://www.stylecentrewholesale.co.uk/collect... |


In [42]:
df.to_csv('pages_link.csv',index=False)