Fed-Goodreads ✍️

Federated version of the Goodreads spoiler subset. Goodreads is a publicly available dataset that is commonly used for text classification with deep learning (DL) because of its large volume. Fed-Goodreads uses the subset of book reviews with parsed spoiler tags (1.38m reviews) for a binary classification task: predicting whether a review sentence contains a spoiler. The dataset is a natural personalised setting for federated learning because the data is organised by individual users: users hold different quantities of data, and different users have different patterns of writing sentences.
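To make the sentence-level task concrete, here is a minimal sketch of turning one spoiler-tagged review into binary samples. It assumes each review record stores a "review_sentences" field holding [label, sentence] pairs, with label 1 marking a spoiler; that field name and layout are assumptions about the raw file, and review_to_samples is a hypothetical helper, not code from this repository.

```python
from typing import Dict, List, Tuple


def review_to_samples(record: Dict) -> List[Tuple[str, int]]:
    """Split one review into (sentence, label) pairs; label 1 marks a spoiler.

    Assumes record["review_sentences"] is a list of [label, sentence] pairs.
    """
    samples = []
    for label, sentence in record["review_sentences"]:
        samples.append((sentence, int(label)))
    return samples


review = {
    "user_id": "u1",
    "review_sentences": [[0, "A slow start."], [1, "The narrator is the killer."]],
}
print(review_to_samples(review))
# [('A slow start.', 0), ('The narrator is the killer.', 1)]
```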

Fed-Goodreads contains 100 unique clients; the number of samples per client is limited to 2-10 to enforce statistical heterogeneity; and each data sample contains 2517 features.
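The client limits above can be illustrated with a small filter. This is a hypothetical sketch, not the notebook's actual procedure: it reads "limited to 2-10" as keeping only users whose sample count falls in that range, and it picks the first 100 eligible users in sorted order, whereas the real pipeline may truncate or sample clients differently.

```python
from typing import Dict, List


def select_clients(
    samples_by_user: Dict[str, List],
    min_samples: int = 2,
    max_samples: int = 10,
    num_clients: int = 100,
) -> Dict[str, List]:
    """Hypothetical selection: keep users with min..max samples, cap the count."""
    eligible = {
        user: samples
        for user, samples in sorted(samples_by_user.items())
        if min_samples <= len(samples) <= max_samples
    }
    # dicts preserve insertion order, so this keeps the first num_clients users
    chosen = list(eligible)[:num_clients]
    return {user: eligible[user] for user in chosen}


users = {"a": [1], "b": [1, 2], "c": list(range(11)), "d": [1, 2, 3]}
print(sorted(select_clients(users)))  # ['b', 'd']
```

Sorting before slicing only makes the toy output deterministic; a random draw over eligible users would be an equally valid choice.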

Setup

Download goodreads_reviews_spoiler.json.gz from here and extract it into the root directory.
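With 1.38m reviews, streaming the file is gentler on memory than loading it all at once. The sketch below assumes the archive is JSON Lines (one review object per line); iter_reviews is a hypothetical helper. Python's gzip module can also read the compressed file directly, so extraction is optional for this kind of reader; for the extracted file, swap gzip.open for open.

```python
import gzip
import json
from typing import Dict, Iterator


def iter_reviews(path: str) -> Iterator[Dict]:
    """Yield one review dict per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example: `for review in iter_reviews("goodreads_reviews_spoiler.json.gz"): ...`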

To generate the dataset, follow the instructions in the generate.ipynb Jupyter notebook.
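Because the data is organised by individual users, the core of any federated split is partitioning reviews by user. The following is a hypothetical illustration of that step, not a reproduction of the notebook; it assumes each record carries a "user_id" field.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_user(records: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Partition review records into per-user client shards keyed by user_id."""
    clients: Dict[str, List[Dict]] = defaultdict(list)
    for record in records:
        clients[record["user_id"]].append(record)
    return dict(clients)


records = [
    {"user_id": "u1", "review_id": "r1"},
    {"user_id": "u2", "review_id": "r2"},
    {"user_id": "u1", "review_id": "r3"},
]
shards = group_by_user(records)
print({user: len(reviews) for user, reviews in shards.items()})  # {'u1': 2, 'u2': 1}
```

Each shard then becomes one client, and records within a shard keep their original order.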

Usage

This dataset is compatible with the FedProx and FedSim implementations. If you are using the same experimental setup, add the following to main.py:

DATASETS = [....., 'goodreads']

MODEL_PARAMS = { ..... , 
    'goodreads.mclr': (2,),  # num_classes: spoiler vs. non-spoiler
}
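FedProx-style code typically loads per-client data from JSON files in the LEAF layout: a "users" list, a parallel "num_samples" list, and a "user_data" map from user to {"x": features, "y": labels}. The sketch below packs client shards into that layout. Treat both the layout and the directory it would be written to (for example data/goodreads/data/train/) as assumptions to check against the FedProx or FedSim code you use; to_leaf_json is a hypothetical helper.

```python
import json
from typing import Dict, List


def to_leaf_json(clients: Dict[str, Dict[str, List]]) -> Dict:
    """Pack per-client features/labels into an assumed LEAF-style dict."""
    users = sorted(clients)
    return {
        "users": users,
        "num_samples": [len(clients[u]["y"]) for u in users],
        "user_data": {u: {"x": clients[u]["x"], "y": clients[u]["y"]} for u in users},
    }


# Toy 2-feature rows; real Fed-Goodreads rows would hold 2517 features.
clients = {
    "u2": {"x": [[0.0, 1.0]], "y": [1]},
    "u1": {"x": [[1.0, 0.0], [0.5, 0.5]], "y": [0, 1]},
}
payload = to_leaf_json(clients)
text = json.dumps(payload)  # would be written to a train/test JSON file
print(payload["users"], payload["num_samples"])  # ['u1', 'u2'] [2, 1]
```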

Reference models are available in the FedSim implementation here.
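In FedProx-style code, the mclr key conventionally denotes multinomial logistic regression: class probabilities are softmax(Wx + b) over the feature vector. The pure-Python sketch below shows only that forward pass, assuming dense feature vectors; it is not the FedSim reference model, and mclr_predict is a hypothetical name.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mclr_predict(x: Sequence[float], weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Multinomial logistic regression: class probabilities = softmax(W x + b)."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy 3-feature input with 2 classes; Fed-Goodreads samples have 2517 features.
probs = mclr_predict([1.0, 0.0, 2.0],
                     weights=[[0.1, 0.2, 0.3], [0.3, -0.1, 0.0]],
                     bias=[0.0, 0.1])
print([round(p, 3) for p in probs])
```

With two classes this reduces to ordinary logistic regression on the difference of the two logit rows, which is why (2,) suffices as the model argument.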

Other datasets

Fed-MEx

chamathpali/Fed-Goodreads