
discovery: filter shadow-banned hosts from event listings #14297

Closed
dylanjeffers wants to merge 2 commits into main from claude/filter-shadowbanned-events

@dylanjeffers (Contributor) commented May 12, 2026

Summary

get_events (packages/discovery-provider/src/queries/get_events.py) backs the contests discovery list (and any other event-list consumer). It previously surfaced contests whose host had been shadow-banned, because the existing filter chain only excluded Event.is_deleted == True.

This PR adds a host-shadow-ban filter that combines the two parallel shadow-ban signals the discovery-provider already uses elsewhere — applying both so the filter catches the full shadow-banned population.

The two signals (and why both)

| Signal | Source | Catches | Used by today |
|---|---|---|---|
| aggregate_user.score < 0 | composite account-quality score from compute_user_score.sql | Audius impersonators, low-engagement / bot-like accounts, chat-blocked users | handle_save.sql:129, handle_follow.sql:29, handle_repost.sql:94 (feed-action notification suppression) |
| muted_by_karma subquery (sum of muters' follower_count ≥ COMMENT_KARMA_THRESHOLD) | community-driven karma muting via the muted_users table | users whom high-follower accounts have specifically reported or muted | get_track_comment_count.py:31-38 (per-user comment hiding) |

The two signals catch different (but overlapping) populations. The score signal catches bots and impersonators who may never have been actively reported; the karma-mute signal catches users whom influential accounts have explicitly flagged. Applying both means contest discovery hides anyone who falls into either bucket.
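As a rough model of how the two signals combine, the sketch below computes the hidden-host set in plain Python: a host is hidden if their score is negative, or if the follower counts of their active (non-deleted) muters sum to at least the threshold. The record shapes and the threshold value of 1000 are hypothetical placeholders, not the real COMMENT_KARMA_THRESHOLD or the real models.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-in for COMMENT_KARMA_THRESHOLD; the real value lives in the codebase.
KARMA_THRESHOLD = 1000


@dataclass(frozen=True)
class Mute:
    user_id: int        # the muter
    muted_user_id: int  # the user being muted
    is_delete: bool = False


def muted_by_karma(mutes, follower_count, threshold=KARMA_THRESHOLD):
    """Users whose active muters' follower counts sum to >= threshold."""
    karma = defaultdict(int)
    for m in mutes:
        if m.is_delete:
            continue  # deleted mutes do not count
        if m.user_id not in follower_count:
            continue  # mirrors the inner JOIN: muters without an aggregate row add nothing
        karma[m.muted_user_id] += follower_count[m.user_id]
    return {uid for uid, total in karma.items() if total >= threshold}


def hidden_hosts(score, mutes, follower_count, threshold=KARMA_THRESHOLD):
    """Union of both signals: negative score OR karma-muted."""
    low_score = {uid for uid, s in score.items() if s is not None and s < 0}
    return low_score | muted_by_karma(mutes, follower_count, threshold)


if __name__ == "__main__":
    follower_count = {10: 800, 11: 300, 12: 50}  # muter -> follower count
    score = {1: 5, 2: -1, 3: 0}                  # host 4 has no aggregate row
    mutes = [Mute(10, 3), Mute(11, 3), Mute(12, 1), Mute(10, 1, is_delete=True)]
    print(sorted(hidden_hosts(score, mutes, follower_count)))  # [2, 3]
```

Host 2 is hidden by score alone, host 3 by karma alone (800 + 300 followers), and host 4, with no score row and no mutes, stays visible.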

Implementation

Inside _get_events, after the existing filter_deleted step:

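```python
# Assumes func/or_ from sqlalchemy and the MutedUser, AggregateUser and Event
# models are already imported in get_events.py.

# Hosts muted by users whose combined follower_count reaches
# COMMENT_KARMA_THRESHOLD (same shape as get_track_comment_count.py).
muted_by_karma = (
    session.query(MutedUser.muted_user_id)
    .join(AggregateUser, MutedUser.user_id == AggregateUser.user_id)
    .filter(MutedUser.is_delete == False)  # only active mutes count
    .group_by(MutedUser.muted_user_id)
    .having(func.sum(AggregateUser.follower_count) >= COMMENT_KARMA_THRESHOLD)
    .subquery()
)
# LEFT OUTER JOIN keeps hosts with no aggregate_user row (score IS NULL);
# only a confirmed negative score or a karma mute excludes an event.
base_query = base_query.outerjoin(
    AggregateUser, AggregateUser.user_id == Event.user_id
).filter(
    or_(AggregateUser.score >= 0, AggregateUser.score.is_(None)),
    ~Event.user_id.in_(muted_by_karma),
)
```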
  • OUTER JOIN on aggregate_user so users without an aggregate row yet (brand-new accounts) aren't accidentally filtered — the or_(... is_(None)) lets them through; only confirmed negative scores are excluded.
  • muted_by_karma is lifted from get_track_comment_count.py verbatim (same shape, same COMMENT_KARMA_THRESHOLD constant), so the contest discovery filter applies the exact same per-user check the comment system uses.
  • No row duplication: aggregate_user.user_id is a primary key, so the outer join matches at most one aggregate_user row per Event. The IN-subquery is not joined, so it can't duplicate rows either.
  • add_query_pagination is plain LIMIT/OFFSET with optional include_count; both work correctly with the dual filter.
  • get_events_by_ids (the by-ID lookup) is unchanged on purpose. Deep-link lookups for known events shouldn't be silently broken by a discovery-time filter — comment/action surfaces on those pages have their own enforcement paths.
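The join semantics these bullets rely on can be checked against a throwaway SQLite database. The sketch below uses simplified, hypothetical table shapes and hand-written SQL that approximates the filter; it is not the query SQLAlchemy actually generates. It shows that the score IS NULL arm is what keeps a host with no aggregate_user row, and that a LEFT JOIN on a primary key leaves the event row count unchanged.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aggregate_user (user_id INTEGER PRIMARY KEY, score INTEGER);
        CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        -- host 1 is clean, host 2 is shadow-banned, host 3 has no aggregate row yet
        INSERT INTO aggregate_user VALUES (1, 0), (2, -1);
        INSERT INTO events VALUES (100, 1), (101, 1), (200, 2), (300, 3);
        """
    )
    return conn


# Approximation of the filter: LEFT JOIN plus an explicit NULL allowance.
WITH_NULL_ARM = """
    SELECT e.event_id FROM events e
    LEFT JOIN aggregate_user au ON au.user_id = e.user_id
    WHERE au.score >= 0 OR au.score IS NULL
    ORDER BY e.event_id
"""

# Same query without the NULL arm: NULL >= 0 is not true, so host 3 is dropped.
WITHOUT_NULL_ARM = """
    SELECT e.event_id FROM events e
    LEFT JOIN aggregate_user au ON au.user_id = e.user_id
    WHERE au.score >= 0
    ORDER BY e.event_id
"""

# Joining on a primary key cannot multiply rows: this equals the events count.
JOINED_ROW_COUNT = """
    SELECT COUNT(*) FROM events e
    LEFT JOIN aggregate_user au ON au.user_id = e.user_id
"""


def ids(conn: sqlite3.Connection, sql: str) -> list[int]:
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_db()
    print(ids(conn, WITH_NULL_ARM))                      # [100, 101, 300]
    print(ids(conn, WITHOUT_NULL_ARM))                   # [100, 101]
    print(conn.execute(JOINED_ROW_COUNT).fetchone()[0])  # 4
```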

Risk / blast radius

get_events is the single entry point for event-list queries (confirmed via grep — no other callers in src/). Any consumer that wanted to see shadow-banned hosts' events explicitly would need a new flag like include_shadowbanned: bool — I didn't add one because no caller asks for it today; happy to extend if there's a known admin/internal need.

Test plan

  • Create two contest events; mark one host's aggregate_user.score = -1. Call get_events(...) → only the non-shadow-banned event is returned.
  • Create a contest by a host who has been muted by a set of users whose combined follower counts ≥ COMMENT_KARMA_THRESHOLD (but whose own score >= 0). Confirm that event is also excluded.
  • Create a contest by a brand-new user with no aggregate_user row. Confirm their event still appears (the outer join's NULL is allowed through; they're not in muted_by_karma because no one has muted them yet).
  • Confirm get_events_by_ids(id=[shadowbanned_event_id]) still returns the event (filter not applied on that path).
  • Smoke the contests discovery list in mobile + web after deploy — shadow-banned users' contests should disappear from the grid.

🤖 Generated with Claude Code

`get_events` (the query backing the contests discovery list, among
other event-list consumers) previously surfaced contests whose host
had been shadow-banned. The SQL feed handlers (handle_save.sql,
handle_follow.sql, handle_repost.sql) already gate notifications on
`aggregate_user.score < 0`; this applies the same check to the event
listing so shadow-banned users' contests don't appear in discovery.

Implementation:
- OUTER JOIN `aggregate_user` on `Event.user_id` so users without an
  aggregate row (e.g. brand-new accounts that haven't been rolled up
  yet) aren't accidentally filtered out — only users with a
  confirmed negative score are excluded.
- Filter sits in `_get_events` alongside the existing `is_deleted`
  check. `get_events_by_ids` (the by-ID lookup) is unchanged on
  purpose: direct deep-link lookups for known events shouldn't be
  silently broken by a discovery-time filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
changeset-bot (bot) commented May 12, 2026

⚠️ No Changeset found

Latest commit: f42829b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Per review on #14297: the discovery-provider has two parallel
shadow-ban signals — `aggregate_user.score < 0` (used by the SQL feed
handlers handle_save, handle_follow, handle_repost) and karma-based
muting (used by `get_track_comment_count.py` for per-user comment
hiding). The previous commit applied only the first; this adds the
second so the filter catches the full shadow-banned population:

- Bots / impersonators / low-quality accounts → caught by `score < 0`
- Community-flagged users muted by high-follower-count accounts →
  caught by the `muted_by_karma` subquery

The `muted_by_karma` subquery is lifted from
`get_track_comment_count.py` (same shape, same threshold constant
COMMENT_KARMA_THRESHOLD) so the contest discovery filter applies the
exact same per-user check the comment system uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pull-request-size pull-request-size Bot added size/M and removed size/S labels May 12, 2026
@dylanjeffers (Contributor, Author)

Closing — wrong repo. The discovery-provider's Flask API was removed in #14236 and get_events.py is now dead code on main (zero callers). The actual /v1/events/remix-contests endpoint lives in the Go API repo. Re-opening the dual shadow-ban filter (score < 0 OR muted_by_karma) over there.

dylanjeffers added a commit to AudiusProject/api that referenced this pull request May 12, 2026
…803)

## Summary

`v1EventsRemixContests`
([api/v1_events_remix_contests.go](api/v1_events_remix_contests.go)) —
the endpoint backing the contests discovery page on mobile + web —
previously surfaced contests whose host was shadow-banned.
`v1EventComments` already applies the two-signal shadow-ban filter to
comment authors; this PR mirrors the exact same pair against contest
hosts so the discovery list and comment list stay in lockstep.

This was originally drafted against
`packages/discovery-provider/src/queries/get_events.py` in the apps
monorepo (PR AudiusProject/apps#14297), but that Flask API was removed
in #14236 and the file is dead code on main. Reopening here in the
correct repo.

## The two signals

| Signal | What it catches | Source pattern |
|---|---|---|
| `aggregate_user.score < 0` (`low_abuse_score` CTE) | bots, Audius-impersonators, fast-challenge-runners, low-engagement accounts | Same CTE used at [v1_event_comments.go:74-76](api/v1_event_comments.go#L74) |
| `muted_by_karma` CTE | hosts muted by users whose combined `follower_count` crosses `karmaCommentCountThreshold` | Same CTE used at [v1_event_comments.go:66-73](api/v1_event_comments.go#L66) |

Both CTEs are lifted verbatim from `v1_event_comments.go` so the filter
is byte-for-byte identical to what the comment system applies to comment
authors. Reuses the existing `karmaCommentCountThreshold` constant
(defined at
[v1_track_comment_count.go:8](api/v1_track_comment_count.go#L8)).

## Implementation

Added two CTEs at the top of the SQL, two `NOT IN` filters to the
existing `filters` slice, and bound the threshold constant:

```sql
WITH
muted_by_karma AS (
    SELECT muted_user_id
    FROM muted_users
    JOIN aggregate_user ON muted_users.user_id = aggregate_user.user_id
    WHERE muted_users.is_delete = false
    GROUP BY muted_user_id
    HAVING SUM(aggregate_user.follower_count) >= @karmaCommentCountThreshold
),
low_abuse_score AS (
    SELECT user_id FROM aggregate_user WHERE score < 0
)
SELECT ...
WHERE ...
  AND e.user_id NOT IN (SELECT user_id FROM low_abuse_score)
  AND e.user_id NOT IN (SELECT muted_user_id FROM muted_by_karma)
```

- The existing `u.is_deactivated = false` and `u.is_available = true`
filters stay in place. Shadow-ban filtering layers on top.
- The contest's parent track filter (`e.entity_type != 'track' OR ...`)
is untouched.
- The sort priority, pagination, status filter, and `entry_counts`
LATERAL subquery are untouched.
- The `users` and `tracks` related lookups downstream are unaffected —
they just see fewer rows.

## Tests

New `TestRemixContestsExcludesShadowbannedHosts` test follows the exact
pattern of `TestRemixContestsExcludesUnavailableContent` already in this
file. Seeds three contests:
- clean host (score=0, no mutes)
- low-score host (score=-1)
- karma-muted host (muted by a high-follower user crossing the
threshold)

Three sub-assertions: only the clean contest is returned; low-score
contest absent; karma-muted contest absent.

`go build ./api/...` and `go vet ./api/...` both pass locally.
The integration test couldn't be run end to end without a local Postgres
on port 21300, but it compiles, and CI will run it against a fresh DB.

## Test plan

- [ ] CI green on `go test ./api/...`
- [ ] Manual smoke after deploy: hit `/v1/events/remix-contests` on
staging, confirm a known shadow-banned account's contest no longer
appears in the response
- [ ] Confirm `useAllRemixContests` on mobile + web still returns the
expected (non-shadowbanned) contests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>