Extend Lexicon / Feeds / Labels with a common filter operator language for transparent algorithmic filtering? #2975

DGaffney · 2024-11-11T13:20:45Z

Hello!

I'm relatively new to Bluesky/ATProto, so please disregard this or move it somewhere else if that makes more sense. I've been thinking through how one might design primitive algorithmic operators so that (a) we can describe the processes by which items are filtered in a feed and/or labeled, and (b) we can eventually provide simple front-end tools for constructing feeds based on those definitions. I've been plugging away the past few days on what I've been calling a feed manifest, which is basically a JSON structure declaring a series of operations to run on the firehose that ultimately result in a generated feed. For example, if I wanted to just construct a feed out of my followers, I'd declare the filter as:

{
    "filter": {
        "and": [
            {
                "social_graph": [
                    "brendannyhan.bsky.social",
                    "is_in",
                    "follows"
                ]
            }
        ]
    }
}
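
To make the idea concrete, here is a minimal sketch of how a manifest like the one above could be evaluated against a single post. Everything outside the manifest itself is an assumption for illustration: the `FOLLOWS` lookup table, the `author` field on the post, the function names, and in particular the reading of `[handle, "is_in", "follows"]` as "the post's author is among the accounts that handle follows". It is not Sky-Feeder's actual implementation.

```python
# Hypothetical in-memory social graph: handle -> set of handles that account follows.
FOLLOWS = {
    "brendannyhan.bsky.social": {"alice.example", "bob.example"},
}


def check_social_graph(args, post):
    """Assumed semantics: [handle, "is_in", "follows"] passes when the
    post's author is among the accounts that `handle` follows."""
    handle, op, relation = args
    if op != "is_in" or relation != "follows":
        raise ValueError(f"unsupported social_graph clause: {args}")
    return post["author"] in FOLLOWS.get(handle, set())


def evaluate(node, post):
    """Recursively evaluate one filter node against one post."""
    (key, value), = node.items()  # each node is a single-key object
    if key == "and":
        return all(evaluate(child, post) for child in value)
    if key == "social_graph":
        return check_social_graph(value, post)
    raise ValueError(f"unknown operator: {key}")


manifest = {
    "filter": {
        "and": [
            {"social_graph": ["brendannyhan.bsky.social", "is_in", "follows"]}
        ]
    }
}

# Posts here are plain dicts with a hypothetical "author" field.
keep = evaluate(manifest["filter"], {"author": "alice.example"})
drop = evaluate(manifest["filter"], {"author": "carol.example"})
```

Treating each node as a single-key object keeps the evaluator a plain recursive walk, so new operators only need a new branch (or a registry entry) rather than changes to the manifest format.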

And if I wanted to chain that with an ML model I built that classifies post text as being written like dril posts or not, I'd add a second clause to the filter and declare the model:

{
    "filter": {
        "and": [
            {
                "social_graph": [
                    "brendannyhan.bsky.social",
                    "is_in",
                    "follows"
                ]
            },
            {
                "model_probability": [
                    {
                        "model_name": "dril_detector"
                    },
                    ">=",
                    0.9
                ]
            }
        ]
    },
    "models": [
        {
            "feature_modules": [
                {
                    "model_name": "all-MiniLM-L6-v2",
                    "type": "vectorizer"
                }
            ],
            "model_name": "dril_detector",
            "training_file": "dril_detector_dataset.json"
        }
    ]
}
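
As a rough illustration of the `model_probability` clause, the sketch below resolves the named model, scores the post text, and applies the comparison. The `MODELS` registry, the `text` field, and the stand-in scorer are all hypothetical; a real `dril_detector` would be trained from `training_file` on top of the `all-MiniLM-L6-v2` vectorizer, which this sketch does not attempt.

```python
import operator

# Map the comparison strings used in the manifest to Python functions.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def fake_dril_detector(text):
    """Stand-in scorer, NOT a real classifier: returns a fixed probability."""
    return 0.95 if "wint" in text.lower() else 0.1


# Hypothetical registry: model_name -> callable returning a probability in [0, 1].
MODELS = {"dril_detector": fake_dril_detector}


def check_model_probability(args, post):
    """Evaluate a [model_ref, comparator, threshold] clause against one post."""
    model_ref, comparator, threshold = args
    score = MODELS[model_ref["model_name"]](post["text"])
    return COMPARATORS[comparator](score, threshold)


clause = [{"model_name": "dril_detector"}, ">=", 0.9]
matches = check_model_probability(clause, {"text": "signed, wint"})
rejects = check_model_probability(clause, {"text": "good morning"})
```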

I've put together a set of primitive operators (regex-based, HuggingFace vector-similarity-based, social-graph/user-list/starter-pack-based, ML-probability-based, attribute filtering such as on the lang field, etc.). I'd love to, first, get more ideas for primitives we could add, and second, start firming up the language around all this, perhaps by breaking the code I've written in Sky-Feeder out into its own Python package so that we can all start working from a shared definition of boolean operations.
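
One possible shape for such a shared package is a small registry of primitives plus boolean combinators. The sketch below is only an assumption about how that could look: the clause shapes for `regex` and `attribute` (and the `or` / `not` combinators) are invented here for illustration and are not taken from Sky-Feeder, and posts are simplified to flat dicts.

```python
import re

PRIMITIVES = {}  # operator name -> function(args, post) -> bool


def primitive(name):
    """Decorator that registers a primitive operator under `name`."""
    def register(fn):
        PRIMITIVES[name] = fn
        return fn
    return register


@primitive("regex")
def regex_matches(args, post):
    # Hypothetical clause shape: [field, "matches", pattern]
    field, op, pattern = args
    if op != "matches":
        raise ValueError(f"unsupported regex operator: {op}")
    return re.search(pattern, post.get(field, "")) is not None


@primitive("attribute")
def attribute_equals(args, post):
    # Hypothetical clause shape: [field, "==", expected]
    field, op, expected = args
    if op != "==":
        raise ValueError(f"unsupported attribute operator: {op}")
    return post.get(field) == expected


def evaluate(node, post):
    """Evaluate boolean combinators, deferring leaf clauses to the registry."""
    (key, value), = node.items()
    if key == "and":
        return all(evaluate(child, post) for child in value)
    if key == "or":
        return any(evaluate(child, post) for child in value)
    if key == "not":
        return not evaluate(value, post)
    return PRIMITIVES[key](value, post)


rule = {
    "and": [
        {"attribute": ["lang", "==", "en"]},
        {"not": {"regex": ["text", "matches", r"(?i)\bgiveaway\b"]}},
    ]
}
ok = evaluate(rule, {"lang": "en", "text": "new paper out today"})
spam = evaluate(rule, {"lang": "en", "text": "Huge GIVEAWAY inside"})
```

With a registry like this, adding a primitive is a matter of registering one function, which may make it easier to agree on a shared vocabulary operator by operator.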

The "long-term" view here is that we come up with a shared meaning for what it means to select, reject, filter, promote, etc. posts, with a shared algorithmic implementation to the fullest extent possible. Over time, we build that out into a full lexicon of operations, and then use it as a basis for tools that let users mix and match filtering techniques and see transparent "cards" that explain the algorithm's logical chain clearly and effectively, to avoid the pitfalls of algorithmic opacity on every other social site.
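
Because the filter is plain data, an explanatory "card" can in principle be generated by walking the same tree. The following is a minimal sketch of that idea; the `explain` function and its wording are hypothetical, and the phrasing of the `social_graph` line assumes the clause means "author is in the follows of the handle".

```python
def explain(node, depth=0):
    """Return a list of indented, human-readable lines describing a filter node."""
    indent = "  " * depth
    (key, value), = node.items()  # each node is a single-key object
    if key in ("and", "or"):
        label = "ALL of:" if key == "and" else "ANY of:"
        lines = [f"{indent}{label}"]
        for child in value:
            lines.extend(explain(child, depth + 1))
        return lines
    if key == "social_graph":
        handle, _op, relation = value
        return [f"{indent}author is in the {relation} of {handle}"]
    if key == "model_probability":
        model_ref, comparator, threshold = value
        return [f"{indent}{model_ref['model_name']} score {comparator} {threshold}"]
    # Fallback for operators this sketch does not know how to describe.
    return [f"{indent}{key}: {value}"]


feed_filter = {
    "and": [
        {"social_graph": ["brendannyhan.bsky.social", "is_in", "follows"]},
        {"model_probability": [{"model_name": "dril_detector"}, ">=", 0.9]},
    ]
}
card = "\n".join(explain(feed_filter))
print(card)
```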

I don't know if this has already been played through in another thread; I couldn't find one immediately. I'm sure this isn't a brand-new idea either, but I'd love to talk through it and see if we can start forming some rough consensus about whether it's worth pursuing!
