Description
This is the dream of the "data hoarder" that I am, motivated in part by data sovereignty etc., but primarily by wanting AI agents to be able to mine such human data!
We have already
- con/tinuous for dumping CI logs; we have good collections of those on drogon and smaug for some of the repos (not all)
- con/duct - for archival of std outs
- some solution for archiving our internal Matrix channel (TODO: link)
- some solutions for 'bash history' dumps
I am also working on
- https://github.com/con/annextube , which would also cover one aspect of social data, in particular closed captions and comments
I have also started to archive local Zoom session recordings and to orchestrate their movement via git-annex
There are
- https://github.com/rusq/slackdump (tried and it worked IIRC)
- https://github.com/GeiserX/Telegram-Archive
Unrelated to social/chatter -- with @vmdocua we are also collecting data for reprostim from various sources
What sparked
I have Ferdium with a bunch of social networks: mostly Slack workspaces, some Discord, Matrix, etc. It is impossible to back up everything, BUT it should be feasible to back up all channels on the social networks where e.g. any of us posts. The point is to automate listing/discovery of what to back up, and then to automate interfacing with the tools so that all those sources of data end up nicely centralized (think YODA and DataLad subdatasets) to facilitate reuse and discovery. Extra tools could then be built on top for agentic access, etc.
Some ideas on implementation
To a degree, the design would likely be similar to
so as to allow for different sources of listing for social platforms. In my case I would love to just point it at my Ferdium somehow and have it pick up which services are already integrated there, populate them in YAML, and potentially identify which channels within each social network to back up.
Such a config should then be used to orchestrate the aforementioned "archiving tools" (slackdump, Telegram-Archive, etc.) and to establish a nice hierarchical DataLad dataset.
Most likely it should, again, be config-oriented, but with a decent GUI (web UI) for administration and monitoring!
Ideally it should cover all data streams! I am afraid to say that this would include PACS servers, but oh well!
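To make the config-driven idea a bit more concrete, here is a minimal sketch of how a config could be turned into a list of archiving jobs, one per channel, each with its place in a hierarchical layout. Everything in it is an assumption for illustration only: the config keys (`root`, `sources`, `platform`, `workspace`, `tool`, `channels`), the `ArchiveJob` and `plan_jobs` names, the example workspaces and channels, and the `root/platform/workspace/channel` layout. The config is written as a Python dict to keep the sketch dependency-free; in practice it would presumably live in YAML. It only plans jobs and does not invoke slackdump, Telegram-Archive, or DataLad.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical config; in practice this would likely be loaded from a YAML file.
CONFIG = {
    "root": "archive",
    "sources": [
        {
            "platform": "slack",
            "workspace": "example-team",   # made-up workspace name
            "tool": "slackdump",
            "channels": ["general", "dev"],
        },
        {
            "platform": "telegram",
            "workspace": "personal",       # made-up grouping name
            "tool": "Telegram-Archive",
            "channels": ["lab-chat"],
        },
    ],
}


@dataclass(frozen=True)
class ArchiveJob:
    """One unit of work: archive a single channel with a given tool."""
    tool: str
    platform: str
    workspace: str
    channel: str
    dataset_path: str  # where the corresponding subdataset would live


def plan_jobs(config):
    """Expand the config into one ArchiveJob per configured channel."""
    root = PurePosixPath(config["root"])
    jobs = []
    for source in config["sources"]:
        for channel in source["channels"]:
            # Assumed layout: <root>/<platform>/<workspace>/<channel>
            path = root / source["platform"] / source["workspace"] / channel
            jobs.append(
                ArchiveJob(
                    tool=source["tool"],
                    platform=source["platform"],
                    workspace=source["workspace"],
                    channel=channel,
                    dataset_path=str(path),
                )
            )
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(CONFIG):
        print(f"{job.tool}: {job.platform}/{job.workspace}/{job.channel} -> {job.dataset_path}")
```

Keeping the planning step separate from tool invocation would let the same job list feed both the actual archiving runs and a monitoring view.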
Target user
Any team/project that wants to aggregate all the relevant data they have -- YouTube, social networks, GitHub issues, CI runs, ...
Name?
- con/flux
- con/serve
??