Install the following packages:
- selenium: Automates the browser for scraping. Install it with pip from the command line:
pip install selenium
- webdriver-manager: Installs the Chrome driver automatically, so there is no need to download it manually.
Install it from the command line with:
pip install webdriver_manager
- pandas: Handles the scraped data and saves it to CSV files. Install it with:
pip install pandas
- word2number: Converts number words (e.g. "two hundred") into numeric values. Install it with:
pip install word2number
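To confirm that the packages above are installed before running the scripts, a small check like the following could help. It is a sketch, not part of this repository: the name `missing_packages` is hypothetical, and it relies on the import names `selenium`, `webdriver_manager`, `pandas`, and `word2number` (note that the pip name webdriver-manager is imported as `webdriver_manager`).

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages listed above (hypothetical helper, not in the repo).
REQUIRED_MODULES = ["selenium", "webdriver_manager", "pandas", "word2number"]


def missing_packages(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names from `modules` that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any name it prints can be installed with the matching pip command above.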
Currently, the scripts support only the Chrome browser.
File structure:
--AuthorProfileConfigConfig.py: Contains user-defined functions to retrieve data.
--DriverSetup.py: Defines and initializes the Selenium webdriver object.
--main.py: Run this file to scrape author profile data.
--ProductMain.py: Run this file to scrape data for all the subproducts related to each author.
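For orientation, a Chrome driver setup using webdriver-manager might look roughly like the sketch below. It is not the actual contents of DriverSetup.py: it assumes the Selenium 4 API (`Service` plus `options=`), and the function names `chrome_arguments` and `create_driver` and the chosen Chrome switches are hypothetical.

```python
from typing import List


def chrome_arguments(headless: bool = False) -> List[str]:
    """Build the Chrome command-line switches passed to the driver (hypothetical defaults)."""
    args = ["--window-size=1920,1080"]
    if headless:
        # Run Chrome without a visible window.
        args.append("--headless=new")
    return args


def create_driver(headless: bool = False):
    """Create a Chrome webdriver, letting webdriver-manager provide chromedriver."""
    # Imported inside the function so chrome_arguments works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # install() downloads a matching chromedriver (or reuses a cached one) and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

A caller would then do `driver = create_driver()`, load pages with `driver.get(url)`, and finish with `driver.quit()` to close the browser.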
To run:
- Run main.py. It reads the main product data from the main_product folder and stores the scraped author profile data in the reviewers folder.
- Then run ProductMain.py. It reads the author profile data from the reviewers folder and stores the scraped data in the reviews folder.
For example: \data_scraping_v2\
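The two steps could also be chained in one script. The sketch below is hypothetical (`run_pipeline`, `has_input`, and `STAGES` are not part of this repository) and assumes that both scripts run from the project directory, that the main_product, reviewers, and reviews folders sit alongside them, and that a non-empty input folder is a good enough sign that a stage is ready.

```python
import subprocess
import sys
from pathlib import Path

# Stage order from the steps above: (script, input folder, output folder).
STAGES = [
    ("main.py", "main_product", "reviewers"),
    ("ProductMain.py", "reviewers", "reviews"),
]


def has_input(folder: Path) -> bool:
    """Return True if the folder exists and contains at least one file."""
    return folder.is_dir() and any(p.is_file() for p in folder.iterdir())


def run_pipeline(project_dir: Path) -> None:
    """Run each stage in order, stopping if a stage has no input data."""
    for script, src, dst in STAGES:
        src_dir = project_dir / src
        if not has_input(src_dir):
            raise FileNotFoundError(f"{script} needs input files in {src_dir}")
        # Assumption: creating the output folder up front is harmless
        # even if the script also creates it.
        (project_dir / dst).mkdir(exist_ok=True)
        # check=True raises CalledProcessError if the script fails.
        subprocess.run([sys.executable, script], cwd=project_dir, check=True)
```

Calling `run_pipeline(Path("."))` from the project directory would run main.py and then ProductMain.py, and would stop early if a stage's input folder is empty.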