
MengtingWan/goodreads


NOTE: Our datasets have been moved! Please see our new webpage about how to download these datasets.

The datasets were collected from Goodreads in late 2017. Details of the datasets are described on the dataset website.

We collected these datasets for academic use only! Please do not redistribute them or use them for commercial purposes.

Citations

If you are using our dataset, please cite the following papers:

Notebooks/Code Samples

We've created several notebooks (in Python 3.7) that show how to download and read these datasets and provide some basic explorations of the data.

  • download.ipynb: For downloading the datasets without a GUI. This notebook shows how to download the files in bash or Python.
  • samples.ipynb: This notebook shows how to read the '.json.gz' files line by line and displays sample records from each file.
  • statistics.ipynb: This notebook calculates some basic statistics of the datasets (except the largest file, the complete interaction file 'goodreads_interactions.csv'). Running this notebook may take a while.
  • distributions.ipynb: This notebook operates on the complete interaction file 'goodreads_interactions.csv' and explores the distributions of these interactions. Note: run this notebook only if you have a LARGE amount of memory (32 GB or more recommended)!
  • reviews.ipynb: This notebook calculates some statistics of the review datasets.
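The line-by-line reading that samples.ipynb demonstrates can be sketched with the Python standard library alone. This is a minimal sketch, not the notebook's actual code: it assumes each line of a '.json.gz' file holds exactly one JSON object (JSON Lines layout), and the demo file name and fields (`demo_sample.json.gz`, `book_id`, `title`) are hypothetical stand-ins for the real data.

```python
import gzip
import json
from itertools import islice
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per line of a gzip-compressed file.

    Assumes one JSON object per line, so the whole file never has to be
    loaded into memory at once.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def head(path: str, n: int = 3) -> list:
    """Return the first n records, e.g. to display a few samples."""
    return list(islice(iter_records(path), n))


if __name__ == "__main__":
    # Build a tiny hypothetical file so the sketch runs without the real data.
    demo_path = "demo_sample.json.gz"
    demo = [{"book_id": str(i), "title": f"Book {i}"} for i in range(5)]
    with gzip.open(demo_path, "wt", encoding="utf-8") as f:
        for rec in demo:
            f.write(json.dumps(rec) + "\n")

    for rec in head(demo_path, 2):
        print(rec)
```

Because `iter_records` is a generator, the same function can also drive a full pass over a file (for example, counting records) without holding every record in memory.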
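On machines with less memory, a simple distribution such as "how many users have k interactions" can be computed by streaming 'goodreads_interactions.csv' row by row rather than loading the whole table. This is a hypothetical sketch, not the method used in distributions.ipynb: it assumes the file has a header row with a `user_id` column (check the real header before relying on it), and the demo file name `demo_interactions.csv` is made up. Only the counters stay in memory, though they still grow with the number of distinct users or books.

```python
import csv
from collections import Counter
from typing import Dict, Tuple


def count_interactions(path: str, key: str = "user_id") -> Tuple[Counter, Dict[int, int]]:
    """Stream a CSV and count interactions per value of the `key` column.

    Returns (per-key counts, distribution), where the distribution maps
    "number of interactions" -> "number of keys with that many".
    Assumes the CSV has a header row containing `key` (an assumption here).
    """
    per_key: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # reads one row at a time
            per_key[row[key]] += 1
    distribution = Counter(per_key.values())
    return per_key, dict(sorted(distribution.items()))


if __name__ == "__main__":
    # Tiny hypothetical file standing in for the real interaction data.
    demo_path = "demo_interactions.csv"
    with open(demo_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "book_id", "rating"])
        writer.writerows([["u1", "b1", "5"], ["u1", "b2", "3"], ["u2", "b1", "4"]])

    per_user, dist = count_interactions(demo_path)
    print(dist)  # interactions per user -> number of users
```

Passing `key="book_id"` (again assuming that column exists) gives the corresponding per-book distribution.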