Web Scraping With Selenium WebDriver

Introduction

This program takes the URL of the customer reviews page for a particular product on a consumer website and automatically scrapes the reviews. I (the author) do not own any of the gathered data, so I will not upload any of the collected datasets. To further prevent data sharing, I have also removed some lines of code so that the program cannot be executed directly to gather data. This project is purely for educational purposes and self-entertainment. I strongly encourage anyone interested in data scraping to write their own program that adheres to the target website's Terms of Service and robots.txt, and that follows other ethical scraping practices.
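As a rough illustration of checking robots.txt rules before scraping, here is a minimal sketch that uses Python's standard urllib.robotparser module. The function name is_allowed, the user agent string, the example.com URLs, and the example rules are all hypothetical and are not taken from this project. The rules are parsed from in-memory lines so the sketch runs without network access; a real check would load the target site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url.

    robots_lines is an iterable of robots.txt lines (hypothetical input here).
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used for illustration only.
EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]

if __name__ == "__main__":
    for page in ("https://example.com/reviews/1", "https://example.com/private/x"):
        print(page, is_allowed(EXAMPLE_ROBOTS, "example-bot", page))
```

Passing a robots.txt check does not by itself mean scraping is permitted; the website's Terms of Service still need to be read separately.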

Requirements

For best compatibility, the following versions are recommended:

  • Python v3.x
  • Selenium v4.3.x
  • Chrome and ChromeDriver v101.0.4951.41
  • NumPy v1.19.5
  • pandas v1.2.4

Usage

Use the source code only for reference and educational purposes. This project contains two scrapers, each a .py file. The scrapers write their results to a txt file of uncleaned data. One viable way to clean the gathered data is with NumPy and pandas; an example of data cleaning can be found in the Jupyter notebook (cleaner.ipynb).
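To give a sense of what such cleaning might look like, here is a minimal sketch using pandas. It is not the code from cleaner.ipynb. It assumes, hypothetically, that the raw txt file holds one review per line; the function name clean_reviews, the column name review, and the sample strings are made up for illustration.

```python
import pandas as pd


def clean_reviews(raw_lines):
    """Normalize whitespace, drop empty lines, and remove duplicate reviews.

    Assumes one review per line (a hypothetical layout for the raw txt file).
    """
    df = pd.DataFrame({"review": list(raw_lines)})
    # Collapse runs of whitespace (tabs, newlines, repeated spaces) into one space.
    df["review"] = df["review"].str.replace(r"\s+", " ", regex=True).str.strip()
    # Drop lines that are empty after trimming.
    df = df[df["review"] != ""]
    # Remove exact duplicates and renumber the rows.
    return df.drop_duplicates().reset_index(drop=True)


if __name__ == "__main__":
    sample = ["  Great   product \n", "", "Great product", "Bad\tbattery"]
    print(clean_reviews(sample))
```

With a real file, the lines could be read with open(path, encoding="utf-8") and passed to the function; the actual layout of the scraper output may call for different steps.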

Contributing

Do not push changes directly to the repo. Instead, create an issue on GitHub to suggest changes or additions.