2018 Reddit Suspicious Accounts Release
To give you more insight into our findings, here is a link to all 944 accounts. We have decided to keep them visible for now, but after a period of time the accounts and their content will be removed from Reddit. We are doing this to allow moderators, investigators, and all of you to see their account histories for yourselves.
This dataset is an archive of all public comments, submissions and user data belonging to these accounts, retrieved on April 11, 2018 at ~17:00 CEST (GMT+2) from the Reddit API and stored as CSV. At the time of extraction, one of the 944 accounts had already been taken down and its profile returned a 404 status. Each CSV file contains a selection of relevant fields from the objects returned by the API.
The data was harvested with the excellent PRAW library. A log.txt file recording the extraction is also made available.
The following files are made available:
seed.csv: the original user list as released by Reddit
data/users.csv: user data related to each of the released accounts
data/comments.csv: comment history from each of the released accounts
data/submissions.csv: submissions from each of the released accounts
data/log.txt: data extraction log
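As a starting point for exploring these files, here is a minimal sketch that reads one of the CSVs with Python's standard library and counts rows per value of a chosen column. It assumes the files are UTF-8 encoded and have a header row. The column name "author" is a hypothetical placeholder, since the exact field names are not listed here, and the helper names load_rows and count_by are illustrative only.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted fields
    # (comment bodies often span several lines). UTF-8 is an assumption.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count rows per distinct value of `column`, skipping empty or missing values."""
    return Counter(row[column] for row in rows if row.get(column))


if __name__ == "__main__":
    comments_path = Path("data/comments.csv")
    if comments_path.exists():
        comments = load_rows(comments_path)
        print(f"{len(comments)} comments loaded")
        if comments:
            # Inspect the real header before relying on any column name.
            print("columns:", list(comments[0].keys()))
        # "author" is a hypothetical column name; substitute one from the header.
        for name, total in count_by(comments, "author").most_common(5):
            print(name, total)
```

Printing the header first is deliberate: it shows which fields were actually kept from the API objects before any analysis depends on them.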
Public releases of Reddit comment data have been common for a long time. All information provided was publicly available and searchable at the time of extraction. None of the data appears to be classifiable as PII (personally identifiable information). Nothing in the Reddit API TOS explicitly prohibits harvesting publicly available data. Legal inquiries should be directed to inbox [/at/] albertocoscia [/dot/] me.
Where applicable, this data is released under a CC0 license and is in the public domain.