Skip to content

[FEATURE REQUEST] Configure a function that acts as distribution column resolver during shard selection #8428

@Drew-Kimberly

Description

@Drew-Kimberly

This issue is an extension of #8427.

Consider a Row Level Security (RLS) setup like:

CREATE OR REPLACE FUNCTION get_tenant()
RETURNS uuid AS $$
  SELECT NULLIF(current_setting('app.tenant_id', true), '')::uuid;
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION set_tenant(tenant_id uuid)
RETURNS void AS $$
BEGIN
  EXECUTE format('SET LOCAL session.tenant_id = %L', tenant_id);
END;
$$ LANGUAGE plpgsql;

And let's take a dummy distributed table as an example:

CREATE TABLE foo_distributed (
    "id" UUID NOT NULL,
    "tenant_id" UUID NOT NULL DEFAULT get_tenant(),

    CONSTRAINT "foo_distributed_pkey" PRIMARY KEY ("id", "tenant_id")
);

SELECT create_distributed_table('foo_distributed', 'tenant_id');

ALTER TABLE foo_distributed ENABLE ROW LEVEL SECURITY;

CREATE POLICY foo_distributed_tenant_rls ON foo_distributed
TO application_rls
USING (tenant_id = get_tenant())
WITH CHECK (tenant_id = get_tenant());

Given our setup, the following SELECT query is somewhat redundant if we were just dealing with a Postgres database (no citus):

SELECT * FROM foo_distributed
WHERE tenant_id=get_tenant();

Instead, we could have:

SELECT * FROM foo_distributed;

and reap the DX benefits of RLS-based tenant isolation where tenant filtering can be implicit.

Now, enter Citus... of course as-is the latter will result in a multi-shard query as the RLS filtering is completely absent during query planning. My feature request is to allow a user to specify a distribution column resolver, i.e. a function, that in shard selection when no explicit distribution column is included in a query filter can be used as a fallback to derive a distribution column value for efficient routing. In my example, I would configure get_tenant as the distribution column resolver. It is crucial that the VOLATILITY of the function is preserved.

For more context on my use-case, I am migrating a legacy (few millions of lines of nodejs) application onto Citus. There is largely no tenant-awareness throughout the ORM query logic. Tenant boundaries are enforced at the peripheral via RLS. This feature would massively reduce the effort required to migrate onto a Citus cluster, and I'm not the only one inquiring about this type of RLS-based shard selection.

Metadata

Metadata

Assignees

No one assigned

    Labels

    RLSissues related to Row level securitybacklogenhancementto track requests for existing functionality

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions