Switch to SPI for metadata functions #110

jasonmp85 · 2015-05-08T18:45:25Z

In #103, we discovered that using low-level operations is kind of unwieldy when the code needs to switch between different backing stores. By using SPI, we can overlay a VIEW on the CitusDB metadata catalogs and have everything "just work".

This change prepares for that by switching to SPI for our metadata functions. The functions will now respect any VIEW or TRIGGER we need to adapt to CitusDB's underlying representation.

sumedhpathak · 2015-05-12T20:27:33Z

create_shards.c

+		/* insert the shard metadata row along with its min/max values */
+		minHashTokenText = IntegerToText(shardMinHashToken);
+		maxHashTokenText = IntegerToText(shardMaxHashToken);
+		shardId = CreateShardRow(distributedTableId, shardStorageType, minHashTokenText,


We seem to have switched to associating the id column with its sequence, with an auto-increment. Any reason why?
This currently switches the order of creation of shard and shard placement. We initially created all the placements first, and then created the shard.

We also error out if we were unable to create placements. Thus creating the shard row after that was considered safe. Will the shard row we've created get rolled back if we error out on placement creation? (See line 296 for the error)

jasonmp85 · 2015-05-13

> We seem to have switched to associating the id column with its sequence, with an auto-increment. Any reason why?

By using a default value, the C code can remain ignorant of the names of sequences used by CitusDB or pg_shard. This means the only place that knowledge lives will be in the SQL install script: the C code will not have any CitusDB/pg_shard detection code for metadata functionality.

> We also error out if we were unable to create placements. Thus creating the shard row after that was considered safe. Will the shard row we've created get rolled back if we error out on placement creation? (See line 296 for the error)

Both the old and the new approaches are wrapped in a transaction. They aren't actually any different. We never "creat[ed] the shard row after that was considered safe"… we had an atomic transaction that resulted in other readers seeing the shard and placement rows show up all at once.

The practical reason for switching the order is that SPI enforces the foreign key constraint, so the old order is impossible (I even tried using a DEFERRED constraint, but I think each SPI command runs in its own subtransaction, so that failed as well).

FWIW I tested failures at each point of creation, both with the old and new implementation, and they exhibit identical functionality. I also tested visibility of modifications, and they were and are atomic.

jasonmp85 · 2015-05-21

Discussed at length over the past few weeks. Sticking with this new approach. (done)
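
For readers skimming this thread, the ordering argument can be modeled in a few lines of plain C. This is only a toy sketch with no PostgreSQL involved: the `Catalog` struct, `MAX_ROWS`, and every function name are hypothetical stand-ins, the foreign key is modeled as a simple existence check, and a placement failure is modeled by the toy catalog running out of room. It shows a placement being rejected when no shard row exists yet, and an all-or-nothing creation leaving nothing behind when a placement fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROWS 8

/* Toy in-memory catalog; a hypothetical stand-in for the metadata tables. */
typedef struct
{
	long shardIds[MAX_ROWS];
	size_t shardCount;
	long placementShardIds[MAX_ROWS];
	size_t placementCount;
} Catalog;

/* Returns true if a shard row with the given id exists. */
bool ShardExists(const Catalog *catalog, long shardId)
{
	assert(catalog != NULL);

	for (size_t i = 0; i < catalog->shardCount; i++)
	{
		if (catalog->shardIds[i] == shardId)
		{
			return true;
		}
	}

	return false;
}

/* Adds a shard row; fails only when the toy catalog is full. */
bool InsertShard(Catalog *catalog, long shardId)
{
	assert(catalog != NULL);

	if (catalog->shardCount == MAX_ROWS)
	{
		return false;
	}

	catalog->shardIds[catalog->shardCount++] = shardId;
	return true;
}

/* Models the foreign key: a placement must reference an existing shard. */
bool InsertPlacement(Catalog *catalog, long shardId)
{
	assert(catalog != NULL);

	if (catalog->placementCount == MAX_ROWS || !ShardExists(catalog, shardId))
	{
		return false;
	}

	catalog->placementShardIds[catalog->placementCount++] = shardId;
	return true;
}

/*
 * Models the enclosing transaction: all inserts go to a working copy, which
 * is published only if every insert succeeded. The shard is inserted first
 * so that the placements satisfy the foreign key check.
 */
bool CreateShardWithPlacements(Catalog *catalog, long shardId, size_t placementCount)
{
	assert(catalog != NULL);

	Catalog working = *catalog;

	if (!InsertShard(&working, shardId))
	{
		return false;
	}

	for (size_t i = 0; i < placementCount; i++)
	{
		if (!InsertPlacement(&working, shardId))
		{
			return false; /* "rollback": the caller's catalog is untouched */
		}
	}

	*catalog = working; /* "commit": all rows become visible together */
	return true;
}
```

In this model, calling `InsertPlacement` on an empty catalog fails, which mirrors why the old placement-first order stops working once the constraint is enforced. A failed `CreateShardWithPlacements` call leaves the catalog exactly as it was, which mirrors the rollback behavior described above.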

sumedhpathak · 2015-05-15T21:28:19Z

pg_shard--1.1.sql

@@ -41,6 +41,12 @@ CREATE SCHEMA pgs_distribution_metadata
 	CREATE SEQUENCE shard_id_sequence MINVALUE 10000 NO CYCLE


Just so I understand: in the SQL file, if we detect CitusDB, we'll modify the view to insert using CitusDB's sequences (instead of these), right? I wanted to make sure we aren't potentially creating collisions here.

jasonmp85 · 2015-05-15

Yeah; I'm in the process of rebasing just the schema changes from #103 on top of this branch so it will be much more clear what the behavior will be.

There obviously could be collisions when upgrading from PostgreSQL + pg_shard to CitusDB, but that's a pretty obscure edge case, IMO.

Easiest to start with, with one caveat: the SELECT statements needed to implement these functions with SPI themselves trigger pg_shard code! So in IsDistributedTable we need to short-circuit if a target table is in the pgs_distribution_metadata schema, otherwise we end up with infinite recursion.

Another fun thing is that SPI runs everything within a special memory context that it manages. It provides an SPI_palloc function to reach out to the top-level context, but that only helps replace direct palloc calls, not calls to system functions that themselves use palloc. It seems to me that manually switching to another context is the cleanest way around this "feature" of SPI.
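
The recursion caveat in this commit message can be illustrated with a small plain-C model. It is a sketch under stated assumptions, not the code from this PR: no SPI or PostgreSQL headers are involved, `TableRef`, `LookupPartitionRow`, `metadataQueryCount`, and the `customer_reviews` table are hypothetical, and the metadata table is assumed to be named `partition`. The lookup stands in for an SPI SELECT whose planning re-enters `IsDistributedTable` for the metadata table it reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define METADATA_SCHEMA "pgs_distribution_metadata"

/* Hypothetical table reference; real code would work with relation OIDs. */
typedef struct
{
	const char *schemaName;
	const char *tableName;
} TableRef;

/* Counts simulated metadata SELECTs so the guard's effect is observable. */
int metadataQueryCount = 0;

bool IsDistributedTable(TableRef table);

/*
 * Stand-in for the SPI-backed lookup. Planning the SELECT re-enters
 * IsDistributedTable for the metadata table the query itself reads.
 */
bool LookupPartitionRow(const char *tableName)
{
	TableRef partitionTable = { METADATA_SCHEMA, "partition" };

	metadataQueryCount++;
	(void) IsDistributedTable(partitionTable);

	/* pretend exactly one user table has been distributed */
	return strcmp(tableName, "customer_reviews") == 0;
}

/*
 * Without the schema check below, the re-entrant call made by
 * LookupPartitionRow would issue another lookup, and so on without end.
 */
bool IsDistributedTable(TableRef table)
{
	assert(table.schemaName != NULL && table.tableName != NULL);

	/* base case: metadata tables are never themselves distributed */
	if (strcmp(table.schemaName, METADATA_SCHEMA) == 0)
	{
		return false;
	}

	return LookupPartitionRow(table.tableName);
}
```

Checking a user table in this model performs exactly one simulated metadata query, because the nested check on the metadata table returns immediately. Checking a table in the metadata schema performs none.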

These were pretty straightforward. A test had an arbitrary failure due to a change in ordering, but otherwise no problems. I did have to change the shard creation method to create the shard _before_ the placements due to foreign key constraint enforcement, but because the creation is contained within a transaction anyway, there is no difference.

The need to use partition type and column information to interpret min and max values for a shard means we would need to have annoying nesting of the SPI calls. Instead, I kept things simpler by just JOIN-ing in the values I need so they'd be in the HeapTuple.

jasonmp85 added the waffle: needs review label May 8, 2015

sumedhpathak reviewed May 12, 2015

jasonmp85 force-pushed the feature-metadata_uses_spi branch 2 times, most recently from c7ec962 to 128424a May 15, 2015 20:40

sumedhpathak reviewed May 15, 2015

jasonmp85 added 15 commits May 21, 2015 16:52

Remove use of NextSequenceId for Shard

1825e72

Will make Citus integration cleaner.

Remove use of NextSequenceId for ShardPlacement

ef7294d

Needed to add an explicit UPDATE call for this to be feasible, but otherwise identical to the change for Shard.

Remove NextSequenceId and its callers

764669f

No longer needed. Getting away from referencing anything except views in the C code for flexibility's sake.

Documentation fixes

08c6632

Clean up comments.

Add test for placement update function

73d449c

Unit tests FTW.

Add ALTER statement to specify sequence owners

54283da

Used for primary keys, might as well say so.

Clean up style, conditionals

bb55332

Decided an Assert is more appropriate here.

Use SPI_exec for no-arg calls

7b58b23

Simplifies this call.

Test rejection of system column as partition

8d53d73

These lines were unexercised by our tests.

Add tests for metadata delete/update errors

f223f68

Getting coverage up.

Add missing unit tests

ce5f2cb

Several metadata functions were missing unit tests, so I added them.

Add test for range-partitioned shards

4c6e189

We have the code here, might as well test it.

jasonmp85 added 2 commits May 21, 2015 16:57

Simple style fixes

e9a4f9a

Bad whitespace. Bad.

Remove NULL value handling in CreateShardRow

c6149bd

The codebase doesn't use this functionality, so I see little point in maintaining it.

jasonmp85 added this to the v1.2 milestone May 21, 2015

jasonmp85 added 4 commits May 21, 2015 17:20

Add comments about SPI's MemoryContext

4e09d61

Per code-review feedback.

Narrow recursion base-case in IsDistributedTable

54cedf2

Since the recursion arises only when a query references the partition table, it's clearer to check only for that case.

Fix minor comment typo

87580e2

Per code review.

Switch to macros for more complicated queries

b22f13e

Now the target list indexes and queries live side-by-side.

jasonmp85 force-pushed the feature-metadata_uses_spi branch from 128424a to b22f13e May 22, 2015 00:21

jasonmp85 added a commit that referenced this pull request May 22, 2015

Merge pull request #110 from citusdata/feature-metadata_uses_spi

5c08492

Switch to SPI for metadata functions cr: @sumedhpathak

jasonmp85 merged commit 5c08492 into develop May 22, 2015

jasonmp85 deleted the feature-metadata_uses_spi branch May 22, 2015 00:32

jasonmp85 removed the waffle: needs review label May 22, 2015

jasonmp85 added a commit that referenced this pull request May 22, 2015

Bump version to v1.2, add upgrade script

05329ba

Forgot this in #110.

This was referenced May 23, 2015

CitusDB metadata interoperability #103

Merged

Switch to signed integers for all identifiers #114

Merged
