Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate table type decision: distributed vs reference vs citus local vs pg local #4261

Open
onderkalaci opened this issue Oct 20, 2020 · 3 comments

Comments

@onderkalaci
Copy link
Member

With #4143, we have introduced Citus Local Table. Citus local tables are almost the same with Postgres local tables, with a few differences:

  • The data sits on the coordinator only
  • It is a first class Citus table, having an entry in the metadata. It is a single shard/single placement table
  • When the coordinator is added to the metadata, a foreign key between a citus local table and a reference table is allowed
  • On Citus MX, the citus local can be accessed from the worker nodes as well. (Note that with Citus MX, users cannot access to the postgres local tables from the workers)
  • There are very certain limitations listed Adding local tables to metadata: Limitations & TO-DO's #4145
  • Users need to explicitly call SELECT create_citus_local_table('table_name')

One of the major feedback we got is adding a new table type (a.k.a., citus local tables) would complicate the user experience. It is not a simple decision by the users on decide when to make a table reference or citus local or postgres local. To further discuss this idea, it is even not easy for users to decide on which tables to become distributed and which tables reference.

In the lights of the discussion above, we could consider few alternatives to simplify the user experience. For completeness, we might end-up doing multiple of them. In general, if we make table decision automatic, we have to implement #4190 so that users can change the table types if it is not the most optimal one.

1. Introduce CASCADE option to create_distributed_table():

Keep the user experience as close as possible to what we have today. When a user does create_distributed_table( 'table', 'dist_key', CASCADE), citus automatically converts the tables that has from/to foreign keys to it are converted to reference/distributed/citus local tables.

Pros:

  • Simple enough for starter users
  • Pro-users can still to per table basis create_distributed_table / create_reference_table / create_citus_local_table.

Cons:

  • We may not always end-up with the intended schema/table types. For example, many tables could potentially be either distributed or reference, which one should we favor?
  • We may have to block writes during these operations due to drop/re-create foreign keys on the tables
  • Not easy to implement, we might need to come up with a simple/robust algorithm

2.a Act immediately after foreign key creation: Convert either reference or citus local tables
-If a foreign key from a distributed table to local table is created, convert the local table to reference table
-If a foreign key from a reference table to local table is created, convert the local table to citus local table

Pros:

  • For the tables that are not involved in any foreign key, there will be no limitation compared to Postgres
  • Hide complexity from the user

Cons:

  • Tables involved in the foreign keys will have the limitations listed here

2.b Act immediately after foreign key creation: Convert to reference table
-If a foreign key from a distributed table to local table is created, convert the local table to reference table
-If a foreign key from a reference table to local table is created, convert the local table to reference table
Pros:

  • Eliminate the need for citus local tables

Cons:

  • Significant performance drawback when the cluster is scaled out as writes needs to go to multiple nodes and require 2PC
  • Reference tables has certain SQL limitations such as triggers or some complex transaction blocks are not supported. It means that users would not be able to fully rely on PG features if they need.

3 Hook into CREATE TABLE command, and convert every Postgres local table to Citus local table
Pros:

  • Hide complexity from the user

Cons:

  • Every table will have the limitations listed here compared to plain PG tables
  • Upgrade of the existing cluster might require manual effort / or slightly complicated upgrade script
@metdos
Copy link
Contributor

metdos commented Feb 3, 2021

@onurctirtir can we close this issue or should we keep it open for the remaining items and move to the 10.1 milestone?

@onurctirtir
Copy link
Member

onurctirtir commented Feb 3, 2021

@metdos I think we should keep it open for the remaining items as you suggested

@metdos metdos modified the milestones: 10.0 Release, 10.1 Release Feb 3, 2021
@onderkalaci onderkalaci removed this from the 10.1 Release milestone Mar 5, 2021
@onderkalaci
Copy link
Member Author

I removed 10.1 tag because we mostly resolved this issue. Lets hear some feedback from users before continuing on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants