This repository has been archived by the owner on Jun 17, 2022. It is now read-only.

dimitrismistriotis/goodreads_categories_scrapping


I wanted to investigate the numbers behind Goodreads' categories and play a little with Python's HTML parsing libraries (Beautiful Soup in this case).

To download the book categories HTML from Goodreads:

./download_script
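
For anyone who prefers to stay in Python, the download step could look roughly like the sketch below. It is not the repository's script: the function name `download_page`, the User-Agent header, and the example URL in the comment are all assumptions made for illustration.

```python
import urllib.request
from pathlib import Path


def download_page(url: str, target: Path) -> Path:
    """Fetch one page and save its raw bytes to `target` (hypothetical helper)."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: a browser-like User-Agent, since some sites reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        target.write_bytes(response.read())
    return target


# Example call (the URL is a guess at where a category listing might live):
# download_page("https://www.goodreads.com/genres/list", Path("list_html/page_1.html"))
```

Saving the raw pages first keeps parsing experiments offline and repeatable, which is the same reason the downloaded files are committed in list_html.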

Then to retrieve the data and populate a CSV with it:

./assemble_csv

or to do both:

./download_script && ./assemble_csv
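
To give an idea of what the CSV assembly involves, here is a minimal sketch of parsing category names and counts with Beautiful Soup and writing them out as CSV. The sample markup, the `category` and `count` class names, the column headers, and both function names are invented for illustration; the real Goodreads pages and the actual assemble_csv script may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented sample markup; real Goodreads HTML is structured differently.
SAMPLE_HTML = """
<div class="category"><a href="/genres/fantasy">Fantasy</a>
  <span class="count">1,234 books</span></div>
<div class="category"><a href="/genres/poetry">Poetry</a>
  <span class="count">56 books</span></div>
"""


def extract_categories(html):
    """Return (name, book_count) pairs found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="category"):
        name = block.a.get_text(strip=True)
        count_text = block.find("span", class_="count").get_text(strip=True)
        # "1,234 books" -> 1234
        count = int(count_text.split()[0].replace(",", ""))
        rows.append((name, count))
    return rows


def write_csv(rows, handle):
    """Write a header line followed by one line per category."""
    writer = csv.writer(handle)
    writer.writerow(["category", "books"])
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(extract_categories(SAMPLE_HTML), buffer)
print(buffer.getvalue())
```

Writing to a file instead only requires passing an open file handle (opened with `newline=""`) in place of the `StringIO` buffer.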

Folders

examples: Diagrams of the most and least popular categories (made after placing the generated CSV in a Google Docs spreadsheet).

list_html: Downloaded files. The folder's content is committed in case anyone wants to experiment without retrieving the data.

Notes

I did not explore the Goodreads API, as I was more interested in experimenting with web scraping.

About

First web scraping experiment (shell and Python)
