This program takes the URL of the customer reviews section for a particular product on a specific consumer website and automatically scrapes the reviews. I (the author) do not own any of the gathered data, so I will not upload any of it. To further prevent sharing of data, I have also removed some lines of code so that others cannot directly execute the program to gather data. This project is purely for educational purposes and self-entertainment. I strongly encourage anyone interested in data scraping to write their own program that adheres to the target website's terms of service and robots.txt, and that follows other ethical scraping practices.
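As an illustration of the robots.txt point above, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not part of this project's scrapers. The helper name `is_allowed`, the user agent `my-scraper`, the `example.com` URLs, and the sample rules are all hypothetical and only there to keep the example offline.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, made up for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /reviews/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/reviews/123"))  # False
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/about"))        # True
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would load the real file instead of a hard-coded string. Passing this check does not replace reading the site's terms of service.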
For best compatibility, the following versions are recommended:
- Python v3.x
- Selenium v4.3.x
- Chrome and ChromeDriver v101.0.4951.41
- NumPy v1.19.5
- pandas v1.2.4
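
One possible way to compare an environment against the list above is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are published under the distribution names `selenium`, `numpy`, and `pandas`. The helper names are hypothetical, and Chrome and ChromeDriver are not checked because they are not Python packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Recommended versions taken from the list above ("4.3" stands for 4.3.x).
RECOMMENDED = {"selenium": "4.3", "numpy": "1.19.5", "pandas": "1.2.4"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches(installed, recommended):
    """True if installed equals recommended or is a patch release of it."""
    return installed == recommended or installed.startswith(recommended + ".")


for package, wanted in RECOMMENDED.items():
    found = installed_version(package)
    if found is None:
        status = "not installed"
    elif matches(found, wanted):
        status = "ok"
    else:
        status = f"differs (recommended {wanted})"
    print(f"{package}: {found} -> {status}")
```
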
Use the source code only for reference and educational purposes. This project contains two scrapers, both Python (.py) files. Each scraper writes its results to a .txt file as uncleaned data. One viable way to clean the gathered data is with NumPy and pandas. An example of data cleaning can be found in the Jupyter notebook (cleaner.ipynb).
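For readers without the notebook at hand, the following is a rough sketch of what a pandas cleaning step could look like. It is not the code from cleaner.ipynb. It assumes, hypothetically, that the .txt file holds one review per line, and it uses made-up sample lines in place of real scraped data. The function name `clean_reviews` and the column name `review` are also illustrative.

```python
import io

import pandas as pd


def clean_reviews(lines):
    """Normalise whitespace, drop empty lines, and remove duplicate reviews."""
    s = pd.Series(list(lines), dtype="object")
    # Collapse runs of whitespace (tabs, newlines, repeated spaces) into one space.
    s = s.str.replace(r"\s+", " ", regex=True).str.strip()
    # Drop lines that are empty after stripping.
    s = s[s != ""]
    # Keep the first occurrence of each review text.
    s = s.drop_duplicates().reset_index(drop=True)
    return pd.DataFrame({"review": s})


# Made-up stand-in for the scraper's output file (assumed: one review per line).
raw = io.StringIO("  Great product!  \n\nGreat product!\nArrived   late,\tbut works.\n")
cleaned = clean_reviews(raw)
print(cleaned)
```

With a real file, `open("some_output.txt", encoding="utf-8")` (a placeholder file name) could be passed in place of the `StringIO` object, since both yield lines when iterated.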
Do not push any changes to the repo. Instead, create an issue on GitHub to suggest changes or additions.