Skip to content

sql/delegate: show changefeed jobs apply db-level filters#156873

Open
log-head wants to merge 1 commit intocockroachdb:masterfrom
log-head:log-head/db-level-changefeed-filter-show
Open

sql/delegate: show changefeed jobs apply db-level filters#156873
log-head wants to merge 1 commit intocockroachdb:masterfrom
log-head:log-head/db-level-changefeed-filter-show

Conversation

@log-head
Copy link
Contributor

@log-head log-head commented Nov 4, 2025

SHOW CHANGEFEED JOBS did not filter the tables in full_table_names with INCLUDE/EXCLUDE TABLES (e.g. CREATE CHANGEFEED FOR DATABASE movr INCLUDE TABLES users). Because of this, it did not accurately reflect the tables being watched by the db-level changefeed.

This change filters the table list, either including only specified tables or else excluding specified tables.

Epic: CRDB-1421
Fixes: #156798

Release note (sql change): For DB-level changefeeds with a table filter (e.g. CREATE CHANGEFEED FOR DATABASE movr INCLUDE TABLES users), apply this filter to the results of SHOW CHANGEFEED JOBS full_table_names.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@log-head log-head self-assigned this Nov 4, 2025
@log-head log-head marked this pull request as ready for review November 4, 2025 18:28
@log-head log-head requested review from a team as code owners November 4, 2025 18:28
@log-head log-head requested review from aerfrei and kev-cao and removed request for a team November 4, 2025 18:28
FROM unnest(targets.names) AS x
WHERE NOT (x = ANY(COALESCE(jt.filter_tables, ARRAY[]::STRING[])))
)
END AS full_table_names,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pointing out what we discussed sync. Because our source of watched tables is this sql query, it's possible in the event of a bug with the table watcher or something that this query shows what tables we should be watching, even if we're not actually watching them.

Maybe more concretely: Let's say we have a db level feed that includes table foo and it's lagging. Then we rename that table. We are in some sense continuing to emit events for it until it catches up, but we will immediately stop showing it in the list of watched tables. Maybe that's expected.

Maybe we can put some kind of note somewhere to point out some of this nuance. Seems like the most important thing for us to remember, if we're ever debugging a db-level feed issue with what set of tables is being watched is not to use this query to show us what tables are actually being watched.

) AS sink_uri,
targets.names AS full_table_names,
CASE
WHEN COALESCE((jt.filter_type::string)::INT, 0) = 1 THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have comments or some kind of named expression that tells us that 0 corresponds to include and 1 to exclude?

ft.filter_type,
array_agg(k) AS filter_tables
FROM filter_targets ft
CROSS JOIN LATERAL (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cross join takes a cartesian product which could theoretically be expensive. When we spoke you said you tested the performance of this and it didn't seem to make an impact.

WHEN COALESCE((jt.filter_type::string)::INT, 0) = 1 THEN
(
SELECT COALESCE(array_agg(x), ARRAY[]::STRING[])
FROM unnest(targets.names) AS x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case "targets.names" is actually the list of all tables in the database, is that right? If that's right, we should probably name it to express that.

(sd.filter_list::json)->'tables' AS tables
FROM spec_details sd
),
joined_targets AS (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"joined_targets" is a somewhat vague name that doesn't help so much with interpreting what's going on. It would be helpful to choose a more descriptive name and/or leave a comment.

SHOW CHANGEFEED JOBS did not filter the tables in full_table_names with
INCLUDE/EXCLUDE TABLES (e.g. CREATE CHANGEFEED FOR DATABASE movr INCLUDE
TABLES users). Because of this, it did not accurately reflect the tables
being watched by the db-level changefeed.

This change filters the table list, either including only specified
tables or else excluding specified tables.

Epic: CRDB-1421
Fixes: cockroachdb#156798

Release note (sql change): For DB-level changefeeds with a table filter (e.g.
CREATE CHANGEFEED FOR DATABASE movr INCLUDE TABLES users), apply this filter to
the results of SHOW CHANGEFEED JOBS full_table_names.
@log-head log-head force-pushed the log-head/db-level-changefeed-filter-show branch from 54412a3 to 02a665a Compare November 5, 2025 19:29
@log-head log-head requested a review from aerfrei November 5, 2025 20:07
Copy link
Contributor

@aerfrei aerfrei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending we discuss through the issue of this potentially being out of sync with the actual watched tables.

@log-head log-head requested a review from asg0451 November 6, 2025 16:06
Copy link
Contributor

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is very hairy sql and question whether we shouldnt just make separate queries and reuse the filtering logic by turning this into a non-delegate thing

Comment on lines +85 to +104
filter_targets_map AS (
SELECT
sd.id,
sd.filter_specification::json AS filter_specification_json,
(sd.filter_specification::json)->'filter_type' AS filter_type,
(sd.filter_specification::json)->'tables' AS filter_tables_map
FROM spec_details sd
),
filter_targets_array AS (
SELECT
ftm.id,
ftm.filter_type,
array_agg(k) AS filter_tables
FROM filter_targets_map ftm
CROSS JOIN LATERAL (
SELECT key
FROM json_object_keys(ftm.filter_tables_map) AS key
WHERE json_typeof(ftm.filter_tables_map) = 'object'
) AS j(k)
GROUP BY ftm.id, ftm.filter_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some more comments about what these intermediate sections do? i'm finding it hard to follow

targets.names AS full_table_names,
CASE
-- Include filter is enumerated to 1.
WHEN COALESCE((fta.filter_type::string)::INT, 0) = 1 THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why int->string->int?

targets.names AS full_table_names,
CASE
-- Include filter is enumerated to 1.
WHEN COALESCE((fta.filter_type::string)::INT, 0) = 1 THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not hardcode 0 and 1 but interpolate from the enum values?

Comment on lines +127 to +130
SELECT COALESCE(array_agg(x), ARRAY[]::STRING[])
-- targets.names is the list, before filtering, of fully qualified table names.
FROM unnest(targets.names) AS x
WHERE x = ANY(COALESCE(fta.filter_tables, ARRAY[]::STRING[]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm having a hard time understanding these bits too

@msbutler
Copy link
Collaborator

msbutler commented Nov 7, 2025

i have a higher level UX question: why should SHOW CHANGEFEED JOBS show all the little details related to an individual changefeed? Could SHOW CHANGEFEED JOB, without the s, be a more appropriate place? I see a couple benefits here:

  • With the ladder, you have much more flexibility; for example, you could return one row per table in the changefeed.
  • SHOW cmds should not have a wide output, as they are often used in an ad hoc manner. Returning a full list of tables in a single row could get veeery wide.

@log-head
Copy link
Contributor Author

I agree that this query has become too hard to follow. Tried a few different ways to simplify but the intermediate tables remain hairy. Opened an issue to implement as an opaque statement #157764.

I see the point about not adding too many fields to SHOW CHANGEFEED JOBS. The added WATCHED_TABLES option hopes to prevent this. SHOW CHANGEFEED JOBS shouldn't be too wide by itself. Then, when users want the potentially wide list of tables for a db-level changefeed, they can follow that up with a SHOW CHANGEFEED JOB {job_id} WITH WATCHED_TABLES.

@msbutler
Copy link
Collaborator

msbutler commented Dec 8, 2025

is it worth adding a case to pkg/bench/rttanalysis BenchmarkJobs for show changefeed jobs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SHOW CHANGEFEED JOBS for DB-level feeds doesn't respect filters

5 participants