Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support foreign tables in MX #5461

Merged
merged 62 commits into from Jan 6, 2022
Merged

Support foreign tables in MX #5461

merged 62 commits into from Jan 6, 2022

Conversation

agedemenli
Copy link
Contributor

@agedemenli agedemenli commented Nov 11, 2021

DESCRIPTION: Adds support for foreign tables in MX

This PR allows to add foreign tables to Citus metadata (see #4143). Foreign tables are also propagated to the MX workers and can be accessed on them.

Using foreign reference tables would be more efficient, especially when doing JOINs with distributed tables, as reference tables does not require fetching data, where local tables added to metadata requires the data to be fetched to the workers. However, we preferred to support foreign tables with local tables added to Citus metadata, instead of reference tables, because reference tables would require USER MAPPING propagation, and also it requires a specific logic to handle modifications on the foreign table. In other words, reference tables send the write commands to all it's replicas, that would duplicate all the modifications.

Foreign servers are already gets propagated to the workers: #5468
User mappings are not propagated, since the coordinator (hence the shard) would already have access to it.

It's also allowed to create foreign tables as partitions, and add them to metadata.

We now check if the table_name option is provided with the foreign table or not, and error out if it's not provided and the foreign table depends on postgres_fdw. It's because Postgres uses the name of the shard when querying the fdw, not the name of the table.

The GUC citus.use_citus_managed_tables adds newly created foreign tables to metadata when it's set to on. (See: #5440)
One limitation we have is: the GUC does not work as expected for IMPORT FOREIGN SCHEMA.
Supporting IMPORT FOREIGN SCHEMA with the new GUC is one of our future items to do.
(See #5603)

Truncating foreign tables that are added to metadata is unsupported. We should consider automatically undistribute them & truncate & add to metadata again.
(See #5588)

Another future item to do is that we are considering disallowing creating distributed/reference tables from foreign tables.
(See #5604)

fixes: #5299

@codecov
Copy link

codecov bot commented Nov 11, 2021

Codecov Report

Merging #5461 (20d8b04) into master (5305aa4) will decrease coverage by 0.03%.
The diff coverage is 96.72%.

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
- Coverage   92.69%   92.66%   -0.04%     
==========================================
  Files         219      219              
  Lines       46006    45996      -10     
==========================================
- Hits        42646    42621      -25     
- Misses       3360     3375      +15     

@agedemenli
Copy link
Contributor Author

agedemenli commented Nov 11, 2021

SELECT citus_add_node('localhost',9700,groupid=>0);

CREATE EXTENSION postgres_fdw;

CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '9700', dbname 'ahmet');

CREATE FOREIGN TABLE foreign_table (
        id integer NOT NULL,
        data text
)
        SERVER foreign_server
        OPTIONS (schema_name 'some_schema', table_name 'some_table');

SELECT citus_add_local_table_to_metadata('foreign_table');

SELECT undistribute_table('foreign_table');

CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '192.83.123.89', port '5432', dbname 'foreign_db');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we connect back to self/localhost and add some basic queries/DDLs?

@agedemenli
Copy link
Contributor Author

Fixed test failure issue.
Now we can successfully create foreign tables with servers, and use them on the worker node.
We create shard foreign tables and shard foreign servers on the coordinator for that.
Like:

ahmet@ahmet:9700-6435=# \d
                      List of relations
┌────────┬───────────────────────────┬───────────────┬───────┐
│ Schema │           Name            │     Type      │ Owner │
├────────┼───────────────────────────┼───────────────┼───────┤
│ public │ foreign_table             │ foreign table │ ahmet │
│ public │ foreign_table_102009      │ foreign table │ ahmet │
└────────┴───────────────────────────┴───────────────┴───────┘

ahmet@ahmet:9700-6435=# \des
                List of foreign servers
┌───────────────────────┬───────┬──────────────────────┐
│         Name          │ Owner │ Foreign-data wrapper │
├───────────────────────┼───────┼──────────────────────┤
│ foreign_server        │ ahmet │ postgres_fdw         │
│ foreign_server_102009 │ ahmet │ postgres_fdw         │
└───────────────────────┴───────┴──────────────────────┘

The problem is, when we do undistribute_table, we still have shard foreign server. Plus, we have an undesired intermediate server, like:

ahmet@ahmet:9700-6435=# \des
                  List of foreign servers
┌───────────────────────────┬───────┬──────────────────────┐
│           Name            │ Owner │ Foreign-data wrapper │
├───────────────────────────┼───────┼──────────────────────┤
│ foreign_server            │ ahmet │ postgres_fdw         │
│ foreign_server_102009     │ ahmet │ postgres_fdw         │
│ foreign_server_2537643067 │ ahmet │ postgres_fdw         │
└───────────────────────────┴───────┴──────────────────────┘

I'm currently trying to get rid of that.
But the main question is: Should we undistribute the foreign server as well?

@marcocitus
Copy link
Member

marcocitus commented Nov 15, 2021

But the main question is: Should we undistribute the foreign server as well?

I think ideally we would treat the server as an object that gets propagated to worker nodes, similar to types. The renaming should not be necessary then. On the coordinator it would just be a noop.

StringInfo renameCommand = makeStringInfo();

appendStringInfo(renameCommand, "ALTER SERVER %s RENAME TO %s;",
quotedServerName, serverNameWithShardId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serverNameWithShardId should probably be wrapped in quote_identifier

@agedemenli agedemenli force-pushed the foreign-tables-in-mx branch 2 times, most recently from 0c09359 to 8156ec6 Compare December 24, 2021 11:33
@agedemenli agedemenli marked this pull request as ready for review December 24, 2021 12:22
@@ -114,7 +114,7 @@ worker_drop_distributed_table(PG_FUNCTION_ARGS)
add_exact_object_address(&foreignServerObject, objects);

/* drop both the table and the server */
performMultipleDeletions(objects, DROP_RESTRICT,
performMultipleDeletions(objects, DROP_CASCADE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already drop the server and the table by adding two objects. Why do we need to change DROP_CASCADE?

@@ -724,5 +724,72 @@ $$);
(localhost,57638,t,0)
(2 rows)

-- test adding foreign table to metadata with the guc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial observation: ALTER FOREIGN TABLE commands does not seem to work, maybe you can work on that while I continue to my review.
In general, lets make sure that all here works with our tables: https://www.postgresql.org/docs/14/sql-alterforeigntable.html

-- !!!!does not send to the command workers!!!!!
-- also the query fails
CREATE SCHEMA sc10;
alter foreign table foreign_table set schema sc10;
SELECT count(*) FROM sc10.foreign_table ;
DETAIL:  on server onderkalaci@localhost:5432 connectionId: 8
ERROR:  relation "sc10.foreign_table_102107" does not exist
CONTEXT:  while executing command on localhost:5432

CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port :'master_port', dbname 'regression');
CREATE USER MAPPING FOR CURRENT_USER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets also show that ALTER USER MAPPING works fine:

alter user MAPPING FOR onderkalaci SERVER foreign_server OPTIONS (SET user 'nonexistiniguser');
-- query the table and fails
alter user MAPPING FOR onderkalaci SERVER foreign_server OPTIONS (SET user 'postgres');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For enterprise, we should propagate the following two, but lets do that later:

 GRANT USAGE ON FOREIGN SERVER foreign_server TO non_superuser;

In general, we should make sure that the flow works fine for non-superusers.

@@ -724,5 +724,72 @@ $$);
(localhost,57638,t,0)
(2 rows)

-- test adding foreign table to metadata with the guc
SET citus.use_citus_managed_tables TO ON;
CREATE TABLE foreign_table_test (id integer NOT NULL, data text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like foreign tables could have serial/bigserial/sequence.nextval columns. Can you make sure undistribute/add node works fine with those?

* Returns false otherwise.
*/
bool
IsForeignTable(Oid relationId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One another question I have is regarding this: https://www.postgresql.org/docs/14/sql-importforeignschema.html

How would IMPORT FOREIGN SCHEMA work with Citus when SET citus.use_citus_managed_tables TO ON;?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping on this, it seems like a common pattern

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided that we'll do this later on with a separate PR

@agedemenli agedemenli force-pushed the foreign-tables-in-mx branch 2 times, most recently from 2adc6f5 to 7338a9c Compare December 29, 2021 13:34
@agedemenli
Copy link
Contributor Author

self-note to fix:
undistribute table drops servers on workers

@agedemenli
Copy link
Contributor Author

ALTER FOREIGN TABLE OPTIONS
ALTER FOREIGN TABLE COLUMN SET STATISTICS

The first one is a must.
We can consider omitting the latter


DDLJob *ddlJob = palloc0(sizeof(DDLJob));
ddlJob->targetRelationId = relationId;
ddlJob->metadataSyncCommand = sql;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we typically set ddlJob->metadataSyncCommand = queryString. But, it is a probably an old habit where we didn't have DeparseTreeNode for all the commands.

Maybe we should even change for all commands to use the return value of DeparseTreeNode.

Anyway, just realized and wanted to note.

{
CreateForeignTableStmt *createForeignTableStmt =
(CreateForeignTableStmt *) parsetree;
CreateStmt *createTableStmt = (CreateStmt *) &(createForeignTableStmt->base);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the following seems slightly easier to read?

CreateStmt *createTableStmt = (CreateStmt *) (&createForeignTableStmt->base);

src/test/regress/sql/citus_local_tables_mx.sql Outdated Show resolved Hide resolved
CREATE FOREIGN TABLE foreign_partition_2 PARTITION OF parent_for_foreign_tables FOR VALUES WITH (modulus 3, remainder 1) SERVER srv2;
SET citus.use_citus_managed_tables TO OFF;
CREATE FOREIGN TABLE foreign_partition_3 PARTITION OF parent_for_foreign_tables FOR VALUES WITH (modulus 3, remainder 2) SERVER srv3;
SELECT citus_add_local_table_to_metadata('foreign_partition_3');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly meant we use citus_add_local_table_to_metadata on the parent table, with probably another set of tables. I want to make sure the "cascade" logic works fine

ALTER FOREIGN TABLE public.foreign_table_newname ALTER dummy_col DROP DEFAULT;
ALTER FOREIGN TABLE public.foreign_table_newname ALTER dummy_col SET DEFAULT 2;
ALTER FOREIGN TABLE public.foreign_table_newname ALTER dummy_col TYPE int;
ALTER TABLE foreign_table_test RENAME COLUMN id TO id_test;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, how does this work? We sometimes use ALTER TABLE and sometime ALTER FOREIGN table.

Is that a PG feature? If so, why we needed to implement ForeignTable_AlterObjectSchema? Could that be a minor issue on alter table .. set schema implementation of Citus?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it comes down to the places where OBJECT_FOREIGN_TABLE and OBJECT_TABLE is used differently in PG code. For example, rename table simply goes through the same code: https://github.com/postgres/postgres/blob/5a2832465fd8984d089e8c44c094e6900d987fcd/src/backend/commands/alter.c#L351-L357

When they go through the same code path, already working commands should work for foreign tables as well on Citus (maybe only adding FOREIGN to the command that Citus generates)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or RELKIND_FOREIGN_TABLE vs RELKIND_RELATION)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are still ALTER TABLE commands here. Can you make sure we convert all to ALTER FOREIGN TABLE?

Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is ready to go.

Few remaining things:

  1. enterprise non-superuser tests
  2. drop support for distributed/reference foreign tables on the community

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-consider the foreign table implementation via Citus Local Tables or Reference tables
3 participants