
CitusDB metadata interoperability #103

Merged
12 commits merged from feature-bidi_md_sync#27 into develop on Jul 2, 2015

Conversation

jasonmp85
Collaborator

OK, so this has been entirely rebased on top of the changes from #110, i.e. we're using SPI everywhere now. This review is now all about the triggers and views needed to adapt CitusDB's metadata tables to be used within pg_shard.

Travis is building this under PostgreSQL 9.3/9.4 and CitusDB 4.0, so there is some assurance that this actually works.

Code Review Tasks

  • Add initialization mode to assign variables based on CitusDB presence
  • Figure out how to test this
  • Transform pg_shard tables into views when CitusDB is present (to prevent them from appearing blank), or just error out when queried.
  • Determine ramifications for upgrading from pg_shard to CitusDB. Users might pg_dump from a PostgreSQL pg_shard install and try to load into CitusDB. This must work seamlessly.
  • Determine interoperability requirements for \STAGE and using an existing CitusDB-distributed table with pg_shard
  • Address interoperability issues

Resolves #11.
Resolves #27.

@marcocitus
Member

I'm aware that this is a work in progress, but here is some early feedback. We were trying this out today to see whether it would help make the CitusDB -> pg_shard migration path easier for CitusDB documentation purposes, since that seems to be the more common direction. We created a table using CitusDB with DISTRIBUTE BY APPEND and used this branch of pg_shard.

One problem we ran into is that an INSERT into a non-existent range gave "ERROR: cannot execute INSERT on a distributed table on master node", because pg_shard's ExecutorStart executor hook falls back to CitusDB for zero-shard queries. This led to some confusion.

The next problem we ran into is that pg_shard caches the metadata. Any change made to the metadata by CitusDB is not visible to pg_shard until a new session is started, which also led to some confusion. For example, performing \STAGE and then an INSERT into that range would give "ERROR: no placements exist for shard with ID".

Finally, we added a new shard using \STAGE whose range overlapped with an existing shard, and then tried an INSERT on the overlapping range. It went to the first shard. We then started a new session to clear the cache, after which we got "ERROR: cannot modify multiple shards during a single query". This was expected, but I guess an INSERT should always go to at most one shard.

It seems the more fundamental issue here is the caching. We might have to make changes to CitusDB to help pg_shard clear its cache or set up a trigger on the catalog table.

@jasonmp85
Collaborator Author

Thanks @marcocitus… there was never any intention of pg_shard honoring DISTRIBUTE BY APPEND for now anyway, as it should only understand hash partitioning. I'll work to ensure it understands that there are tables that are distributed but which it should not touch.
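
(As a hypothetical illustration only: CitusDB records each table's partition method in pg_dist_partition, so a lookup along these lines could distinguish the hash-partitioned tables pg_shard handles from append-partitioned ones it should leave alone; the table name is illustrative.)

```sql
-- Hypothetical sketch: check the partition method before pg_shard touches a table.
-- 'h' = hash (handled by pg_shard), 'a' = append (left entirely to CitusDB).
SELECT partmethod
FROM pg_dist_partition
WHERE logicalrelid = 'my_distributed_table'::regclass;
```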

As far as caching goes… bleh. I forgot we had added caching. pg_shard doesn't expect things to change underneath it. It doesn't cache empty results, so if no shards exist you shouldn't see caching. But once shards exist we currently expect them to be long-lived (placements, on the other hand, can change state, etc.)

Do we anticipate having \STAGE work with pg_shard anytime soon? pg_shard assumes partitions are just that: partitions.

@ozgune
Contributor

ozgune commented Apr 16, 2015

I had a quick clarification question.

With these changes, which migration scenario do we intend to handle? (a) pg_shard -> CitusDB, (b) CitusDB -> pg_shard (shard rebalancer?), or (c) CitusDB <-> pg_shard

CitusDB's metadata currently changes when we \stage to a new shard (append partitioned), append to an existing shard, or rebalance a table. pg_shard's metadata changes when we create shards for a new table, or when a write to a shard placement fails (hash partitioned).

Two other questions -- I don't know, just asking. With these changes, who has the authoritative metadata, CitusDB or pg_shard? And under what conditions does that authority change?

@jasonmp85
Collaborator Author

Because this PR actually causes pg_shard to write through to CitusDB when CitusDB is present, there is only ever one actual metadata store. When using CitusDB, CitusDB's pg_dist* tables store metadata. When using pg_shard with PostgreSQL, it uses its own tables.

That means the only possibility is stale metadata (i.e. from caching). So we'll want to address that.
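
A minimal sketch of what that write-through can look like, assuming the pg_shard partition metadata is exposed as a view over CitusDB's pg_dist_partition with an INSTEAD OF trigger redirecting writes (the view, trigger, and function names here are illustrative, not the PR's actual DDL; the column mapping follows the trigger excerpt reviewed later in this thread):

```sql
-- Sketch only: expose the CitusDB catalog through the pg_shard metadata schema
-- and redirect INSERTs into it, so there is a single metadata store.
CREATE VIEW pgs_distribution_metadata.partition AS
    SELECT logicalrelid AS relation_id,
           partmethod   AS partition_method,
           partkey      AS key
    FROM pg_dist_partition;

CREATE FUNCTION pgs_distribution_metadata.partition_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO pg_dist_partition (logicalrelid, partmethod, partkey)
    VALUES (NEW.relation_id,
            NEW.partition_method,
            column_name_to_column(NEW.relation_id::oid, NEW.key));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER partition_insert
    INSTEAD OF INSERT ON pgs_distribution_metadata.partition
    FOR EACH ROW EXECUTE PROCEDURE pgs_distribution_metadata.partition_insert();
```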

@jasonmp85
Collaborator Author

@ozgune what do you mean by "migration scenario"? I can see:

  • Using pg_shard with PostgreSQL and upgrading your underlying DB to CitusDB
  • Using CitusDB without pg_shard and wanting to add it for INSERTs
  • Using pg_shard to create new distributed tables within CitusDB and have it know how to query them

Am I missing any other possibilities?

@ozgune
Contributor

ozgune commented Apr 17, 2015

@jasonmp85 Yup, I was more curious about the use cases where we need to sync metadata between pg_shard and CitusDB.

  1. Using pg_shard with PostgreSQL and upgrading your underlying DB to CitusDB (pg_shard metadata tables -> CitusDB metadata tables)
  2. Using pg_shard to create new distributed tables within CitusDB and have it know how to query them (pg_shard -> CitusDB)
  3. Using CitusDB without pg_shard and wanting to add it for INSERTs: In this case, is the CitusDB table partitioned by append? Could we overlook this scenario by guiding CitusDB users to start with pg_shard?
  4. Using pg_shard for Inserts, CitusDB for Selects, and shard rebalancer to rebalance the shards (pg_shard -> CitusDB, then CitusDB -> pg_shard): Is adding code to the shard rebalancer an option to sync the CitusDB metadata back to pg_shard?

I wonder if we can simplify the problem down to (pg_shard -> CitusDB). If we could and wanted to, could we then put a trigger on the pg_shard metadata to propagate updates to the CitusDB system catalogs?

@jasonmp85
Collaborator Author

Let's stop using the word sync. There is no syncing here.

@jasonmp85
Collaborator Author

To clarify, everything is always in sync… there is no sync step. pg_shard will directly write to CitusDB metadata tables when installed within CitusDB, and the right things are in place to permit a migration from PostgreSQL + pg_shard to CitusDB using pg_dump to write out a pg_shard master and turn it into a CitusDB master.

@jasonmp85
Collaborator Author

By the way, the migration step is to write out all normal tables except the pg_shard metadata using `pg_dump -N pgs_distribution_metadata`. You apply that to a fresh CitusDB install that will be your master. Then you dump the pg_shard metadata with `pg_dump --inserts -t 'pgs_distribution_metadata.*'` and apply that to the new CitusDB master. From that point onwards all previous pg_shard tables will be seen by CitusDB, and all pg_shard and CitusDB metadata will be identical.

@jasonmp85
Collaborator Author

If you're using pg_shard within CitusDB from square one, it always remains in sync: it sees all CitusDB tables, and CitusDB sees all pg_shard metadata.

@jasonmp85
Collaborator Author

I'm still figuring out how I want to test the CitusDB-specific parts of this, but the main code is complete.

* sequence name.
*/
static uint64
NextSequenceId(char *sequenceName)
Contributor

Should we pass the schema name as an argument to this function?

Collaborator Author

We don't pass it into any other functions, and in the previous implementation of this function (essentially unchanged here) the schema was just a #define we passed into makeRangeVar or whatnot, so I don't see why it needs to be exposed. This function is for getting sequence values related to metadata (see the previous implementation, which was locked to the metadata schema), so it should encapsulate the knowledge of where they live.

Collaborator Author

Function no longer exists; no problem.

@jasonmp85 force-pushed the feature-bidi_md_sync#27 branch 3 times, most recently from c559161 to befe15c on May 25, 2015 22:36
partkey)
VALUES (NEW.relation_id,
NEW.partition_method,
column_name_to_column(NEW.relation_id, NEW.key));
Member

@jasonmp85 We need to add an explicit cast at this line. My tests with pg_dump showed me that we need to do the following:

column_name_to_column(NEW.relation_id::Oid, NEW.key));

The reason is that pg_dump dumps it as an int (at least with no special configuration), so we see an error here.

Collaborator Author

Actually, I think the column type of the view and table should just be regclass. It will solve that.
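
For illustration only (a hypothetical definition, not the PR's actual schema): with relation_id typed as regclass, pg_dump writes the relation out by name, so reloading the dump avoids the int-versus-oid mismatch described above.

```sql
-- Hypothetical sketch: a regclass column round-trips through pg_dump as a
-- relation name (e.g. 'public.customers') rather than a bare integer OID.
CREATE TABLE pgs_distribution_metadata.partition (
    relation_id      regclass NOT NULL,
    partition_method "char"   NOT NULL,
    key              text     NOT NULL
);
```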

Collaborator Author

But I'll add the Oid cast for this code review. (done)

ON ( shardid = shard_placement.shard_id AND
nodename = shard_placement.node_name AND
nodeport = shard_placement.node_port )
WHERE shardid IS NULL AND
Contributor

Could we add aliases (like pg_shard_table.shardid / citusdb_table.shard_id)? Makes it a bit easier to grok.
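
A hypothetical sketch of the aliased form (the full query and its trailing conditions are not shown in this excerpt, so the surrounding clauses and alias names are assumptions):

```sql
-- Sketch only: aliases make it clear which columns come from the CitusDB
-- catalog and which from the pg_shard metadata table.
SELECT pgs_placement.*
FROM pgs_distribution_metadata.shard_placement AS pgs_placement
LEFT JOIN pg_dist_shard_placement AS citus_placement
       ON ( citus_placement.shardid  = pgs_placement.shard_id AND
            citus_placement.nodename = pgs_placement.node_name AND
            citus_placement.nodeport = pgs_placement.node_port )
WHERE citus_placement.shardid IS NULL;
```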

Collaborator Author

Again, since this method does not change (apart from the deprecation warning) in this version, I'd prefer to leave it alone. In fact, there is no reason to change it at all anymore, since it's no longer needed.

@jasonmp85
Collaborator Author

Code rebased on latest develop, tests fixed, and changes from feedback made (denoted by (done) in one of my replies).

@sumedhpathak, can you see what else needs attention on this PR?

Going to need this in the view stuff.
Allows upgrades from PostgreSQL + pg_shard to CitusDB to flow smoothly.
Error out if we see a partition type we don't understand.
More obvious than checking for some GUC.
Copies over existing metadata to CitusDB catalogs, then redefines the
pg_shard relations as views. Postcondition is that an upgraded install
is indistinguishable from a fresh one.
Necessary due to unused parameters in SPI includes.
Clarified a function's contract, added a comment, and inserted an explicit
cast to oid for better `pg_dump` compatibility.
Needed since the key column isn't a direct mapping.
We abandoned pg_dump support in this release regardless, so we might as
well remove the cast, which was only here to accommodate pg_dump.
Because the old tables were config tables, we need to explicitly remove
them from the extension's dependencies before dropping the schema. Then
we can recreate the schema (using views) in a nice compound statement
(see the sketch after these commit notes).
Per IWYU's suggestions.
Exercised an arcane bug wherein we'd begin a PL/pgSQL trigger without
pg_shard loaded, load it partway through, and crash because we have a
PL/pgSQL plugin that expects to be called at both function entry and
exit but ends up being called only on exit.
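
A rough sketch of the upgrade sequence those commit notes describe, shown for the partition metadata only (the statements and object names are illustrative, not the extension's actual upgrade script):

```sql
-- Sketch only: copy existing pg_shard metadata into the CitusDB catalog,
-- detach the old config table from the extension so its schema can be dropped,
-- then recreate the schema as views in one compound CREATE SCHEMA statement.
INSERT INTO pg_dist_partition (logicalrelid, partmethod, partkey)
SELECT relation_id, partition_method,
       column_name_to_column(relation_id::oid, key)
FROM pgs_distribution_metadata.partition;

ALTER EXTENSION pg_shard DROP TABLE pgs_distribution_metadata.partition;
DROP SCHEMA pgs_distribution_metadata CASCADE;

CREATE SCHEMA pgs_distribution_metadata
    CREATE VIEW partition AS
        SELECT logicalrelid AS relation_id,
               partmethod   AS partition_method,
               partkey      AS key
        FROM pg_dist_partition;
```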
jasonmp85 added a commit that referenced this pull request Jul 2, 2015
@jasonmp85 merged commit 63ba619 into develop Jul 2, 2015
@jasonmp85 deleted the feature-bidi_md_sync#27 branch July 2, 2015 18:38

Successfully merging this pull request may close these issues:

  • Plan bidirectional metadata sync
  • Citus treats pg_shard tables as local when missing metadata
5 participants