hoarder to bring them all in! #2

@yarikoptic

Description

This is the dream of the "data hoarder" that I am, also in light of data sovereignty etc., but primarily to make it possible for AI agents to mine such human data!

We have already

  • con/tinuous for dumping CI logs; we have good collections of those on drogon and smaug for some of the repos (not all)
  • con/duct for archival of stdout streams
  • some solution for archiving our internal Matrix channel (TODO: link)
  • some solutions for 'bash history' dumps

I am also working on

I also started to archive local Zoom session recordings and to orchestrate their movement via git-annex

There are

Unrelated to social/chatter -- with @vmdocua we are also collecting data for reprostim from various sources

What sparked

I have Ferdium with a bunch of social networks: mostly Slack workspaces, some Discord, Matrix, etc. It is impossible to back up everything, BUT it should be feasible to back up all channels on social networks where e.g. any of us posts. The point is to automate listing/discovery of what to back up, and then to automate interfacing with the tools so we get all those sources of data centralized nicely (think YODA and DataLad subdatasets) to facilitate reuse and discovery. Extra tools could then be crafted on top for agentic access, etc.

Some ideas on implementation

To a degree, the design would likely be similar to

so as to allow for different sources of listings for social platforms. In my case I would love to just point it at my Ferdium somehow and have it pick up which services are already integrated, populate them in YAML, and potentially identify which channels within each platform to back up.
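A rough sketch of how pluggable listing sources might feed such a config. Everything here is hypothetical: the `Service` record, the `ListingSource` protocol, and `StaticSource`, which merely stands in for a real Ferdium reader (nothing is assumed about how Ferdium stores its services).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Service:
    """One account on a social platform (hypothetical record)."""
    platform: str                      # e.g. "slack", "matrix"
    name: str                          # workspace or server name
    channels: tuple[str, ...] = ()     # channels worth backing up


class ListingSource(Protocol):
    """Anything that can enumerate services, e.g. a Ferdium reader."""
    def list_services(self) -> Iterable[Service]: ...


@dataclass
class StaticSource:
    """Placeholder source holding a hand-written list of services."""
    services: list[Service] = field(default_factory=list)

    def list_services(self) -> Iterable[Service]:
        return list(self.services)


def merge_listings(sources: Iterable[ListingSource]) -> dict:
    """Merge all listings into a plain dict that could be dumped as YAML."""
    merged: dict[tuple[str, str], set[str]] = {}
    for source in sources:
        for svc in source.list_services():
            key = (svc.platform, svc.name)
            merged.setdefault(key, set()).update(svc.channels)
    return {
        "services": [
            {"platform": platform, "name": name, "channels": sorted(channels)}
            for (platform, name), channels in sorted(merged.items())
        ]
    }
```

The same service reported by two sources ends up as a single entry with the union of its channels, so a manual list and an auto-discovered one could coexist.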

Such a config should then be used to orchestrate the aforementioned "archiving tools" (slackdump, Telegram-Archive, etc.) and to establish a nice hierarchical DataLad dataset.
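To make the orchestration idea a bit more concrete, here is a minimal sketch that turns a config into a list of planned archiving jobs, one per service, each with the path of the subdataset it would populate. The config layout, the `TOOLS` mapping, `plan_jobs`, and the `archive/<platform>/<name>` hierarchy are all assumptions for illustration; only the tool names come from the text, and no command-line flags of those tools are assumed.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from platform to archiving tool (names only).
TOOLS = {"slack": "slackdump", "telegram": "Telegram-Archive"}


def plan_jobs(config: dict, root: str = "archive") -> list[dict]:
    """Return one planned job per configured service.

    Nothing is executed here; each job only records which tool would run
    and which subdataset path (relative to the superdataset) it targets.
    """
    jobs = []
    for svc in config["services"]:
        dataset = PurePosixPath(root) / svc["platform"] / svc["name"]
        jobs.append(
            {
                "tool": TOOLS.get(svc["platform"]),  # None: no tool known yet
                "dataset": str(dataset),
                "channels": list(svc.get("channels", [])),
            }
        )
    return jobs
```

A job whose `tool` is `None` would be a natural thing to surface in a monitoring view as "configured but not yet archivable".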

Most likely it should be, again, config-oriented, but with a decent GUI (web UI) for its administration and monitoring!

Ideally it should cover all data streams! I am hesitant to say it should include PACS servers too, but oh well!

Target user

Any team/project that wants to aggregate all relevant data they have -- YouTube, social networks, GitHub issues, CI runs, ...

Name?

  • con/flux
  • con/serve

??
