A simple Python script that archives all messages from a public Yahoo! Group
Yahoo! Groups will be discontinuing most services as of 14/Dec/2019.
Because content must be archived from Yahoo! Groups urgently, the emphasis is on the script's basic functionality; however, feel free to submit issues.
yahooGroupsArchiver archives all the messages in a Yahoo! Group. Messages are downloaded in JSON format, with one .json file per message. The script can also import cookies from Firefox, which allows private groups to be archived.
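As a rough illustration of the one-file-per-message layout, the sketch below saves each message dict as `<msg_id>.json` inside a folder named after the group, and lists which message IDs are already on disk. The folder layout and the helper names `save_message` and `archived_ids` are assumptions for this example, not taken from the script.

```python
import json
import os


def save_message(group_name: str, msg_id: int, message: dict, base_dir: str = ".") -> str:
    """Write one message to <base_dir>/<group_name>/<msg_id>.json and return the path.

    The folder layout and file naming are assumptions for illustration.
    """
    group_dir = os.path.join(base_dir, group_name)
    os.makedirs(group_dir, exist_ok=True)  # create the group folder on first use
    path = os.path.join(group_dir, f"{msg_id}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(message, fh, ensure_ascii=False, indent=2)
    return path


def archived_ids(group_name: str, base_dir: str = ".") -> set:
    """Return the IDs of messages that already have a .json file on disk."""
    group_dir = os.path.join(base_dir, group_name)
    if not os.path.isdir(group_dir):
        return set()
    return {
        int(name[:-5])  # strip the ".json" suffix
        for name in os.listdir(group_dir)
        if name.endswith(".json") and name[:-5].isdigit()
    }
```

Keeping one file per message means a re-run can work out what is already archived just by listing the folder.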
Requirements: Python 3 and the requests library. The script also uses the standard-library modules json, os, time, sys, shutil, and sqlite3.
Uncomment the appropriate line in the code block following line 36 so that it reflects your OS and Firefox profile path.
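If you are unsure which path to use, the sketch below shows where Firefox usually keeps its profiles on each OS. These are Firefox's common default locations (a custom or portable install may differ), and `default_profiles_dir`, `cookie_db_path`, and the example profile folder name are hypothetical; the script itself expects you to uncomment a line instead.

```python
import os
import sys


def default_profiles_dir(platform: str = sys.platform) -> str:
    """Return the usual parent folder of Firefox profiles for this OS.

    These are common defaults only; custom or portable installs may differ.
    """
    home = os.path.expanduser("~")
    if platform.startswith("win"):
        # Windows: %APPDATA%\Mozilla\Firefox\Profiles
        appdata = os.environ.get("APPDATA", os.path.join(home, "AppData", "Roaming"))
        return os.path.join(appdata, "Mozilla", "Firefox", "Profiles")
    if platform == "darwin":
        # macOS
        return os.path.join(home, "Library", "Application Support", "Firefox", "Profiles")
    # Linux and other Unix-like systems
    return os.path.join(home, ".mozilla", "firefox")


def cookie_db_path(profile_name: str, platform: str = sys.platform) -> str:
    """Build the path to cookies.sqlite for a profile folder such as 'abcd1234.default'."""
    return os.path.join(default_profiles_dir(platform), profile_name, "cookies.sqlite")
```

The profile folder name is random per installation, so you still need to look inside the profiles folder to find yours.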
Before each use, sign into Yahoo! Groups to make sure your cookies are current.
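Cookie import works because Firefox stores cookies in an SQLite database, `cookies.sqlite`, inside the profile folder. A minimal sketch, assuming the `moz_cookies` table with its `host`, `name`, and `value` columns (as in current Firefox versions), might read the Yahoo! cookies as shown here. `load_yahoo_cookies` is a hypothetical name, and copying the file first is one way to avoid problems with the live database, which Firefox may keep locked while it is running.

```python
import os
import shutil
import sqlite3
import tempfile


def load_yahoo_cookies(cookie_db: str, domain: str = "yahoo.com") -> dict:
    """Read name/value pairs for hosts ending in *domain* from a Firefox cookies.sqlite."""
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = os.path.join(tmp, "cookies.sqlite")
        shutil.copyfile(cookie_db, copy_path)
        # Recent changes may still sit in the write-ahead log; copy it too if present.
        if os.path.exists(cookie_db + "-wal"):
            shutil.copyfile(cookie_db + "-wal", copy_path + "-wal")
        conn = sqlite3.connect(copy_path)
        try:
            rows = conn.execute(
                "SELECT name, value FROM moz_cookies WHERE host LIKE ?",
                ("%" + domain,),  # matches .yahoo.com, groups.yahoo.com, ...
            ).fetchall()
        finally:
            conn.close()  # close before the temporary folder is deleted
    return {name: value for name, value in rows}
```

The returned dict can be passed straight to requests, for example `requests.get(url, cookies=cookies)`.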
python3 yahooGroupsArchiver.py <groupName> [options] [nologs]
<groupName> is the name of the group you wish to archive (e.g. hypercard)
Options (choose only one):
- update: (the default) archive all new messages since the last time the script was run
- retry: archive any new messages, and attempt again to archive any messages that could not be downloaded last time
- restart: delete all previously archived messages and archive again from scratch
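One way to picture the three options, assuming the set of already-saved message IDs is known (for example from the .json files on disk) and that any gap below the newest saved ID is a message that failed last time. `messages_to_fetch` and `reset_archive` are hypothetical helpers, not the script's own code.

```python
import os
import shutil


def messages_to_fetch(mode: str, archived: set, newest: int) -> list:
    """Return the message IDs to download for one of the three options.

    archived: IDs that already have a saved .json file
    newest:   highest message ID currently in the group
    """
    last = max(archived, default=0)
    new_ids = list(range(last + 1, newest + 1))
    if mode == "update":
        # only messages posted after the newest saved one
        return new_ids
    if mode == "retry":
        # gaps below the newest saved message are treated as earlier failures
        gaps = [i for i in range(1, last + 1) if i not in archived]
        return gaps + new_ids
    if mode == "restart":
        # start over from the first message
        return list(range(1, newest + 1))
    raise ValueError(f"unknown option: {mode!r}")


def reset_archive(group_dir: str) -> None:
    """For 'restart': remove previously archived messages and recreate the folder."""
    shutil.rmtree(group_dir, ignore_errors=True)
    os.makedirs(group_dir, exist_ok=True)
```

Gaps can also be messages deleted from the group; retrying them is harmless, since they simply fail again.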
By default, a log file called .txt is created to record information such as which messages could not be retrieved. The log is purely for your benefit: the script never reads it on later runs (although each re-run appends new information to it). If you don't want a log file to be created or appended to, add the nologs keyword when you call the script.
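The command line above could be parsed along these lines. This is a hand-rolled sketch (the names `parse_args` and `log` are hypothetical) that accepts at most one of the three options, defaults to `update`, and treats `nologs` as a flag; the log file's name is left to the caller.

```python
OPTIONS = ("update", "retry", "restart")


def parse_args(argv: list) -> tuple:
    """Parse: <groupName> [update|retry|restart] [nologs].

    Returns (group_name, mode, write_logs); mode defaults to 'update'.
    """
    if len(argv) < 2:
        raise SystemExit("usage: python3 yahooGroupsArchiver.py <groupName> [options] [nologs]")
    group_name = argv[1]
    extras = argv[2:]
    modes = [a for a in extras if a in OPTIONS]
    if len(modes) > 1:
        raise SystemExit("choose only one of: " + ", ".join(OPTIONS))
    unknown = [a for a in extras if a not in OPTIONS and a != "nologs"]
    if unknown:
        raise SystemExit("unrecognised argument(s): " + " ".join(unknown))
    mode = modes[0] if modes else "update"
    write_logs = "nologs" not in extras
    return group_name, mode, write_logs


def log(log_path: str, text: str, write_logs: bool) -> None:
    """Append one line to the log file; do nothing when 'nologs' was given."""
    if not write_logs:
        return
    with open(log_path, "a", encoding="utf-8") as fh:  # "a" appends on re-runs
        fh.write(text + "\n")
```

In a script you would call `parse_args(sys.argv)`.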
Yahoo may try to block robots, and may throttle or block sessions that access large numbers of messages. This is temporary and typically lasts less than two hours.
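When Yahoo throttles a session, waiting before retrying usually helps. A minimal sketch of exponential backoff, assuming a caller-supplied `fetch` function that raises on failure (the name `fetch_with_backoff` and all default values are assumptions, not the script's behaviour):

```python
import time


def fetch_with_backoff(fetch, msg_id, max_tries=5, first_wait=300.0, sleep=time.sleep):
    """Call fetch(msg_id), doubling the wait after each failure.

    Returns None if every attempt fails, so the ID can be retried on a later run.
    """
    wait = first_wait
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(msg_id)
        except Exception as exc:  # broad on purpose: network and HTTP errors alike
            if attempt == max_tries:
                return None
            print(f"message {msg_id}: attempt {attempt} failed ({exc}); waiting {wait:.0f}s")
            sleep(wait)
            wait *= 2  # exponential backoff
```

With these defaults it waits at most 300 + 600 + 1200 + 2400 = 4500 seconds (75 minutes) before giving up on a message; anything still missing can be picked up later with the retry option.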
To do: add support for group files and photos.
This code is based almost entirely on the work of Andrew Ferguson and Daniel t. It was created in consultation with Avery Dame-Griff for the Queer Digital History Project. The API documentation comes from the Archive Team.