First web scraping experiment (shell and Python)
I wanted to investigate Goodreads' category counts and play a little with Python's HTML parsing libraries (Beautiful Soup in this case).
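As a minimal sketch of the Beautiful Soup side of this: the snippet below pulls a category name and its book count out of a fragment of HTML. The markup (class names, "N books" text) is a hypothetical stand-in for a Goodreads genre page, not the site's actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment resembling a Goodreads genre listing;
# the real page markup may differ.
html = """
<div class="shelfStat">
  <a class="mediumText actionLinkLite" href="/genres/fiction">Fiction</a>
  <div class="smallText greyText">1,200,000 books</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# class_ matches any one of an element's classes
name = soup.find("a", class_="actionLinkLite").get_text(strip=True)
count_text = soup.find("div", class_="smallText").get_text(strip=True)
# "1,200,000 books" -> 1200000
count = int(count_text.split()[0].replace(",", ""))
print(name, count)
```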

To download the book category HTML pages from Goodreads:

./download_script
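The real download step is a shell script; for illustration, here is a rough Python equivalent of what it could do, assuming urls.txt holds one URL per line and pages land in list_html/. The file layout and the `fetch` hook (injectable so the function can be exercised without the network) are assumptions, not the script's actual behavior.

```python
import pathlib
import urllib.request

def download_all(urls_file: str, out_dir: str, fetch=None) -> int:
    """Fetch each URL in urls_file and save the body under out_dir."""
    if fetch is None:
        # default fetcher; swap in a stub for offline testing
        fetch = lambda url: urllib.request.urlopen(url).read()
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in pathlib.Path(urls_file).read_text().splitlines():
        url = line.strip()
        if not url:
            continue
        # name each file after the last URL path segment
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index"
        (out / f"{name}.html").write_bytes(fetch(url))
        count += 1
    return count
```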

Then to extract the data and populate a CSV with it:

./assemble_csv
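A sketch of what this assembly step might look like in Python: scan the downloaded pages and write one category,count row per file. The directory layout, the "N books" pattern, and the use of the file stem as the category name are assumptions for illustration, not the actual contents of assemble_csv.

```python
import csv
import pathlib
import re

def assemble(html_dir: str, out_csv: str) -> int:
    """Extract a book count from each downloaded page and write a CSV."""
    rows = []
    for path in sorted(pathlib.Path(html_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # assumed pattern: a comma-grouped number followed by "books"
        m = re.search(r"([\d,]+)\s+books", text)
        if m:
            rows.append((path.stem, int(m.group(1).replace(",", ""))))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["category", "count"])
        writer.writerows(rows)
    return len(rows)
```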

or to do both:

./download_script && ./assemble_csv

Folders

examples: Diagrams of the most and least popular categories, generated by importing the CSV into a Google Docs spreadsheet.

list_html: The downloaded HTML files. The folder's contents are committed in case anyone wants to experiment without retrieving the data themselves.

Notes

I did not explore the Goodreads API, as I was more interested in experimenting with web scraping.