Books scraper for Google Scholar and Goodreads
- Install Python for your operating system. You can download Python 3.8.2 from python.org.
- This program uses Selenium WebDriver to fetch Goodreads bookshelf data, so you need the WebDriver for your browser installed (for example, geckodriver for Firefox or chromedriver for Chrome). Currently supported browsers are Chrome, Firefox, Edge and Safari. We have tested with Firefox and Safari (on macOS 10.14.6).
- Open a shell and run the following commands:
cd some_folder_where_you_want_this_code
git clone https://github.com/bsodhi/books_scraper.git
cd books_scraper
python3 -m venv give_some_name
source give_some_name/bin/activate
pip install -r requirements.txt
python3 books_scraper/scraper.py -h
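A missing browser driver is a common reason for Selenium failing to start, so a quick pre-flight check can save some debugging. The sketch below is not part of this repository. It assumes the standard driver executable names (chromedriver, geckodriver, msedgedriver, safaridriver), and the function name `find_installed_drivers` is hypothetical. It only looks for the executables on PATH and does not launch a browser.

```python
import shutil

# Assumed executable names of the WebDriver binaries for each browser
# the scraper supports. This mapping is illustrative, not from the repo.
DRIVER_EXECUTABLES = {
    "Chrome": "chromedriver",
    "Firefox": "geckodriver",
    "Edge": "msedgedriver",
    "Safari": "safaridriver",
}


def find_installed_drivers(which=shutil.which):
    """Return {browser: path} for every driver executable found on PATH.

    `which` is injectable (defaults to shutil.which) so the lookup can be
    replaced in tests without touching the real PATH.
    """
    found = {}
    for browser, exe in DRIVER_EXECUTABLES.items():
        path = which(exe)
        if path is not None:
            found[browser] = path
    return found


if __name__ == "__main__":
    drivers = find_installed_drivers()
    if not drivers:
        print("No WebDriver executable found on PATH.")
    for browser, path in drivers.items():
        print(f"{browser}: {path}")
```

On macOS, safaridriver ships with the system but typically has to be enabled once with `safaridriver --enable`. Recent Selenium releases (4.6 and later) include Selenium Manager, which may download a matching driver automatically, so an empty result here is not necessarily fatal.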
The output is written as a CSV file. For Goodreads data, the following columns are written:
["author", "title", "isbn", "language", "avg_rating", "ratings", "pub_year", "book_format", "pages", "genre"]
For Google Scholar data, the columns are:
["author", "title", "citedby", "url", "abstract"]
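Because the two outputs have different column sets, a downstream script can tell them apart by their header. The sketch below assumes the CSV starts with a header row in exactly the column order listed above (an assumption, not confirmed here); the function name `load_rows` and the sample row are made up for illustration.

```python
import csv
import io

# Column orders as documented in this README.
GOODREADS_COLUMNS = ["author", "title", "isbn", "language", "avg_rating",
                     "ratings", "pub_year", "book_format", "pages", "genre"]
SCHOLAR_COLUMNS = ["author", "title", "citedby", "url", "abstract"]


def load_rows(f):
    """Read scraper output from a text file object.

    Returns (source, rows), where source is "goodreads" or "scholar" and
    rows is a list of dicts keyed by column name. Assumes a header row.
    """
    reader = csv.DictReader(f)
    header = reader.fieldnames or []
    if header == GOODREADS_COLUMNS:
        source = "goodreads"
    elif header == SCHOLAR_COLUMNS:
        source = "scholar"
    else:
        raise ValueError(f"unrecognized header: {header}")
    return source, list(reader)


if __name__ == "__main__":
    # Made-up sample in the Google Scholar layout; a real file would be
    # opened with open(path, newline="", encoding="utf-8").
    sample = io.StringIO(
        "author,title,citedby,url,abstract\n"
        "Some Author,Some Title,12,https://example.org,\"Short, quoted abstract\"\n"
    )
    source, rows = load_rows(sample)
    print(source, rows[0]["citedby"])
```

`csv.DictReader` returns every value as a string, so numeric columns such as `avg_rating`, `ratings` or `citedby` need an explicit `float()` or `int()` conversion before any arithmetic.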
This code was written with a lot of help from the Stack Overflow community and the Python API documentation. Greatly appreciated!