
CitusDB metadata interoperability #103

Merged
12 commits merged from feature-bidi_md_sync#27 into develop on Jul 2, 2015

Conversation

jasonmp85
Collaborator

OK, so this has been entirely rebased on top of the changes from #110, i.e. we're using SPI everywhere now. This review is now all about the triggers and views needed to adapt CitusDB's metadata tables to be used within pg_shard.

Travis is building this under PostgreSQL 9.3/9.4 and CitusDB 4.0, so there is some assurance that this actually works.

Code Review Tasks

  • Add initialization mode to assign variables based on CitusDB presence
  • Figure out how to test this
  • Transform pg_shard tables into views when CitusDB is present (to prevent them from appearing blank), or just error out when queried.
  • Determine ramifications for upgrading from pg_shard to CitusDB. Users might pg_dump from a PostgreSQL pg_shard install and try to load into CitusDB. This must work seamlessly.
  • Determine interoperability requirements for \STAGE and using an existing CitusDB-distributed table with pg_shard
  • Address interoperability issues

Resolves #11.
Resolves #27.

@marcocitus
Member

I'm aware that this is a work in progress, but here is some early feedback. We were trying this out today to see whether it would help make the CitusDB -> pg_shard migration path easier for CitusDB documentation purposes, since that seems to be the more common direction. We created a table using CitusDB with DISTRIBUTE BY APPEND and used this branch of pg_shard.

One problem we ran into is that an INSERT into a non-existent range gave "ERROR: cannot execute INSERT on a distributed table on master node", because pg_shard's ExecutorStart executor hook falls back to CitusDB for zero-shard queries. This led to some confusion.

The next problem we ran into is that pg_shard caches the metadata. Any change made to the metadata by CitusDB is not visible to pg_shard until a new session is started, which also led to some confusion. For example, performing \STAGE and then an INSERT into that range would give "ERROR: no placements exist for shard with ID".

Finally, we added a new shard using \STAGE whose range overlapped with an existing shard, and then tried an INSERT on the overlapping range. It went to the first shard. We then started a new session to clear the cache, after which we got "ERROR: cannot modify multiple shards during a single query". This was expected, but I guess an INSERT should always go to at most one shard.

It seems the more fundamental issue here is the caching. We might have to make changes to CitusDB to help pg_shard clear its cache or set up a trigger on the catalog table.

@jasonmp85
Collaborator Author

Thanks @marcocitus… there was never any intention of pg_shard honoring DISTRIBUTE BY APPEND for now anyway, as it should only understand hash partitioning. I'll work to ensure it understands that there are tables that are distributed but which it should not touch.
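
(As a hypothetical illustration only: CitusDB records each table's partition method in pg_dist_partition, so a lookup along these lines could distinguish the hash-partitioned tables pg_shard handles from append-partitioned ones it should leave alone; the table name is illustrative.)

```sql
-- Hypothetical sketch: check the partition method before pg_shard touches a table.
-- 'h' = hash (handled by pg_shard), 'a' = append (left entirely to CitusDB).
SELECT partmethod
FROM pg_dist_partition
WHERE logicalrelid = 'my_distributed_table'::regclass;
```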

As far as caching goes… bleh. I forgot we had added caching. pg_shard doesn't expect things to change underneath it. It doesn't cache empty results, so if no shards exist you shouldn't see caching. But once shards exist we currently expect them to be long-lived (placements, on the other hand, can change state, etc.)

Do we anticipate having \STAGE work with pg_shard anytime soon? pg_shard assumes partitions are just that: partitions.

@ozgune
Contributor

ozgune commented Apr 16, 2015

I had a quick clarification question.

With these changes, which migration scenario do we intend to handle? (a) pg_shard -> CitusDB, (b) CitusDB -> pg_shard (shard rebalancer?), or (c) CitusDB <-> pg_shard

CitusDB's metadata currently changes when we \stage to a new shard (append partitioned), append to an existing shard, or rebalance a table. pg_shard's metadata changes when we create shards for a new table, or when a write to a shard placement fails (hash partitioned).

Two other questions -- I don't know, just asking. With these changes, who has the authoritative metadata, CitusDB or pg_shard? And under what conditions does that authority change?

@jasonmp85
Collaborator Author

Because this PR actually causes pg_shard to write through to CitusDB when CitusDB is present, there is only ever one actual metadata store. When using CitusDB, CitusDB's pg_dist* tables store metadata. When using pg_shard with PostgreSQL, it uses its own tables.

That means the only possibility is stale metadata (i.e. from caching). So we'll want to address that.
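
A minimal sketch of what that write-through can look like, assuming the pg_shard partition metadata is exposed as a view over CitusDB's pg_dist_partition with an INSTEAD OF trigger redirecting writes (the view, trigger, and function names here are illustrative, not the PR's actual DDL; the column mapping follows the trigger excerpt reviewed later in this thread):

```sql
-- Sketch only: expose the CitusDB catalog through the pg_shard metadata schema
-- and redirect INSERTs into it, so there is a single metadata store.
CREATE VIEW pgs_distribution_metadata.partition AS
    SELECT logicalrelid AS relation_id,
           partmethod   AS partition_method,
           partkey      AS key
    FROM pg_dist_partition;

CREATE FUNCTION pgs_distribution_metadata.partition_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO pg_dist_partition (logicalrelid, partmethod, partkey)
    VALUES (NEW.relation_id,
            NEW.partition_method,
            column_name_to_column(NEW.relation_id::oid, NEW.key));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER partition_insert
    INSTEAD OF INSERT ON pgs_distribution_metadata.partition
    FOR EACH ROW EXECUTE PROCEDURE pgs_distribution_metadata.partition_insert();
```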

@jasonmp85
Collaborator Author

@ozgune what do you mean by "migration scenario"? I can see:

  • Using pg_shard with PostgreSQL and upgrading your underlying DB to CitusDB
  • Using CitusDB without pg_shard and wanting to add it for INSERTs
  • Using pg_shard to create new distributed tables within CitusDB and have it know how to query them

Am I missing any other possibilities?

@ozgune
Contributor

ozgune commented Apr 17, 2015

@jasonmp85 Yup, I was more curious about the use cases where we need to sync metadata between pg_shard and CitusDB.

  1. Using pg_shard with PostgreSQL and upgrading your underlying DB to CitusDB (pg_shard metadata tables -> CitusDB metadata tables)
  2. Using pg_shard to create new distributed tables within CitusDB and have it know how to query them (pg_shard -> CitusDB)
  3. Using CitusDB without pg_shard and wanting to add it for INSERTs: In this case, is the CitusDB table partitioned by append? Could we overlook this scenario by guiding CitusDB users to start with pg_shard?
  4. Using pg_shard for Inserts, CitusDB for Selects, and shard rebalancer to rebalance the shards (pg_shard -> CitusDB, then CitusDB -> pg_shard): Is adding code to the shard rebalancer an option to sync the CitusDB metadata back to pg_shard?

I wonder if we can simplify the problem down to (pg_shard -> CitusDB). If we could and wanted to, could we then put a trigger on the pg_shard metadata to propagate updates to the CitusDB system catalogs?

@jasonmp85
Collaborator Author

Let's stop using the word sync. There is no syncing here.

@jasonmp85
Collaborator Author

To clarify, everything is always in sync… there is no sync step. pg_shard will directly write to CitusDB metadata tables when installed within CitusDB, and the right things are in place to permit a migration from PostgreSQL + pg_shard to CitusDB using pg_dump to write out a pg_shard master and turn it into a CitusDB master.

@jasonmp85
Collaborator Author

By the way, the migration step is to write out all normal tables except the pg_shard metadata using `pg_dump -N pgs_distribution_metadata`. You apply that to a fresh CitusDB install that will be your master. Then you dump the pg_shard metadata with `pg_dump --inserts -t 'pgs_distribution_metadata.*'` and apply that to the new CitusDB master. From that point onwards all previous pg_shard tables will be seen by CitusDB, and all pg_shard and CitusDB metadata will be identical.

@jasonmp85
Collaborator Author

If you're using pg_shard within CitusDB from square one, it always remains in sync: it sees all CitusDB tables, and CitusDB sees all pg_shard metadata.

@jasonmp85
Collaborator Author

I'm still figuring out how I want to test the CitusDB-specific parts of this, but the main code is complete.

* sequence name.
*/
static uint64
NextSequenceId(char *sequenceName)
Contributor

Should we pass the schema name as an argument to this function?

Collaborator Author

We don't pass it into any other functions, and in the previous implementation of this function (essentially unchanged here) the schema was just a #define we passed into makeRangeVar or whatnot, so I don't see why it needs to be exposed. This function is for getting sequence values related to metadata (see the previous implementation, which was locked to the metadata schema), so it should encapsulate the knowledge of where they live.

Collaborator Author

Function no longer exists; no problem.

@jasonmp85 force-pushed the feature-bidi_md_sync#27 branch 3 times, most recently from c559161 to befe15c on May 25, 2015 22:36
partkey)
VALUES (NEW.relation_id,
NEW.partition_method,
column_name_to_column(NEW.relation_id, NEW.key));
Member

@jasonmp85 We need to add an explicit cast at this line. My tests with pg_dump showed me that we need to do the following:

column_name_to_column(NEW.relation_id::Oid, NEW.key));

The reason is that pg_dump dumps it as an int (at least with no special configuration), so we see an error here.

Collaborator Author

Actually, I think the column type of the view and table should just be regclass. It will solve that.
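
For illustration only (a hypothetical definition, not the PR's actual schema): with relation_id typed as regclass, pg_dump writes the relation out by name, so reloading the dump avoids the int-versus-oid mismatch described above.

```sql
-- Hypothetical sketch: a regclass column round-trips through pg_dump as a
-- relation name (e.g. 'public.customers') rather than a bare integer OID.
CREATE TABLE pgs_distribution_metadata.partition (
    relation_id      regclass NOT NULL,
    partition_method "char"   NOT NULL,
    key              text     NOT NULL
);
```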

Collaborator Author

But I'll add the Oid cast for this code review. (done)

ON ( shardid = shard_placement.shard_id AND
nodename = shard_placement.node_name AND
nodeport = shard_placement.node_port )
WHERE shardid IS NULL AND
Contributor

Could we add aliases (like pg_shard_table.shardid / citusdb_table.shard_id)? Makes it a bit easier to grok.
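
A hypothetical sketch of the aliased form (the full query and its trailing conditions are not shown in this excerpt, so the surrounding clauses and alias names are assumptions):

```sql
-- Sketch only: aliases make it clear which columns come from the CitusDB
-- catalog and which from the pg_shard metadata table.
SELECT pgs_placement.*
FROM pgs_distribution_metadata.shard_placement AS pgs_placement
LEFT JOIN pg_dist_shard_placement AS citus_placement
       ON ( citus_placement.shardid  = pgs_placement.shard_id AND
            citus_placement.nodename = pgs_placement.node_name AND
            citus_placement.nodeport = pgs_placement.node_port )
WHERE citus_placement.shardid IS NULL;
```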

Collaborator Author

Again, since this method does not change (apart from the deprecation warning) in this version, I'd prefer to leave it alone. In fact, there is no reason to change it at all anymore, since it's no longer needed.

@jasonmp85
Collaborator Author

Code rebased on latest develop, tests fixed, and changes from feedback made (denoted by (done) in one of my replies).

@sumedhpathak, can you see what else needs attention on this PR?

Going to need this in the view stuff.
Allows upgrades from PostgreSQL + pg_shard to CitusDB to flow smoothly.
Error out if we see a partition type we don't understand.
More obvious than checking for some GUC.
Copies over existing metadata to CitusDB catalogs, then redefines the
pg_shard relations as views. Postcondition is that an upgraded install
is indistinguishable from a fresh one.
Necessary due to unused parameters in SPI includes.
Clarified a function's contract, added a comment, and inserted an explicit
cast to oid for better `pg_dump` compatibility.
Needed since the key column isn't a direct mapping.
We abandoned pg_dump support in this release regardless, so we might as
well remove the cast, which was only here to accommodate pg_dump.
Because the old tables were config tables, we need to explicitly remove
them from the extension's dependencies before dropping the schema. Then
we can recreate the schema (using views) in a nice compound statement
(see the sketch after these commit notes).
Per IWYU's suggestions.
Exercised an arcane bug wherein we'd begin a PL/pgSQL trigger without
pg_shard loaded, load it partway through, and crash because we have a
PL/pgSQL plugin that expects to be called at both function entry and
exit but ends up being called only on exit.
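
A rough sketch of the upgrade sequence those commit notes describe, shown for the partition metadata only (the statements and object names are illustrative, not the extension's actual upgrade script):

```sql
-- Sketch only: copy existing pg_shard metadata into the CitusDB catalog,
-- detach the old config table from the extension so its schema can be dropped,
-- then recreate the schema as views in one compound CREATE SCHEMA statement.
INSERT INTO pg_dist_partition (logicalrelid, partmethod, partkey)
SELECT relation_id, partition_method,
       column_name_to_column(relation_id::oid, key)
FROM pgs_distribution_metadata.partition;

ALTER EXTENSION pg_shard DROP TABLE pgs_distribution_metadata.partition;
DROP SCHEMA pgs_distribution_metadata CASCADE;

CREATE SCHEMA pgs_distribution_metadata
    CREATE VIEW partition AS
        SELECT logicalrelid AS relation_id,
               partmethod   AS partition_method,
               partkey      AS key
        FROM pg_dist_partition;
```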
jasonmp85 added a commit that referenced this pull request Jul 2, 2015
@jasonmp85 merged commit 63ba619 into develop Jul 2, 2015
@jasonmp85 deleted the feature-bidi_md_sync#27 branch July 2, 2015 18:38

Successfully merging this pull request may close these issues:

  • Plan bidirectional metadata sync
  • Citus treats pg_shard tables as local when missing metadata
5 participants