[FEATURE REQUEST] Configure a function that acts as distribution column resolver during shard selection

This issue is an extension of https://github.com/citusdata/citus/issues/8427.

Consider a Row Level Security (RLS) setup like:

```sql
CREATE OR REPLACE FUNCTION get_tenant()
RETURNS uuid AS $$
  SELECT NULLIF(current_setting('app.tenant_id', true), '')::uuid;
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION set_tenant(tenant_id uuid)
RETURNS void AS $$
BEGIN
  EXECUTE format('SET LOCAL session.tenant_id = %L', tenant_id);
END;
$$ LANGUAGE plpgsql;
```

And let's take a dummy distributed table as an example:

```sql
CREATE TABLE foo_distributed (
    "id" UUID NOT NULL,
    "tenant_id" UUID NOT NULL DEFAULT get_tenant(),

    CONSTRAINT "foo_distributed_pkey" PRIMARY KEY ("id", "tenant_id")
);

SELECT create_distributed_table('foo_distributed', 'tenant_id');

ALTER TABLE foo_distributed ENABLE ROW LEVEL SECURITY;

CREATE POLICY foo_distributed_tenant_rls ON foo_distributed
TO application_rls
USING (tenant_id = get_tenant())
WITH CHECK (tenant_id = get_tenant());
```

Given our setup, the following `SELECT` query is somewhat redundant if we were just dealing with a Postgres database (no citus):

```sql
SELECT * FROM foo_distributed
WHERE tenant_id=get_tenant();
```

Instead, we could have:

```sql
SELECT * FROM foo_distributed;
```

and reap the DX benefits of RLS-based tenant isolation where tenant filtering can be implicit.

Now, enter Citus... of course as-is the latter will result in a multi-shard query as the RLS filtering is completely absent during query planning. My feature request is to allow a user to specify a _distribution column resolver_, i.e. a function, that in shard selection when no explicit distribution column is included in a query filter can be used as a fallback to derive a distribution column value for efficient routing. In my example, I would configure `get_tenant` as the distribution column resolver. It is crucial that the `VOLATILITY` of the function is preserved.

For more context on my use-case, I am migrating a legacy (few millions of lines of nodejs) application onto Citus. There is largely no tenant-awareness throughout the ORM query logic. Tenant boundaries are enforced at the peripheral via RLS. This feature would _massively_ reduce the effort required to migrate onto a Citus cluster, and I'm not the [only one inquiring](https://github.com/citusdata/citus/issues/7415) about this type of RLS-based shard selection.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FEATURE REQUEST] Configure a function that acts as distribution column resolver during shard selection #8428

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[FEATURE REQUEST] Configure a function that acts as distribution column resolver during shard selection #8428

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions