Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature issue#16 #73

Merged
merged 10 commits into from Feb 25, 2015
Merged

Feature issue#16 #73

merged 10 commits into from Feb 25, 2015

Conversation

onderkalaci
Copy link
Member

This pull request aims to prevent tables to be distributed with columns that cannot be used as distribution column.

fixes #16

Code Review Tasks

  • Remove get_attnum section and replace with call to ColumnNameToColumn
  • Rewrite GetSupportRoutine as SupportFunctionForColumn and make it take a Var *
  • Move check for default operator class into SupportFunctionForColumn
  • Update error codes, messages, and codes as specified
  • Reduce variable scope as much as possible
  • Fix SQL formatting issues in test

This commit aims to check the type of distribution column. After this
commit, pg_shard errors out if it cannot work on the distribution column on
table distribution call.
This commit adds regression test for partition column which tests
partition columns which is not eligible for distribution.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) to 93.21% when pulling 20b17d6 on feature-issue#16 into 036e3a8 on develop.

BTGreaterEqualStrategyNumber);

/* if partition column does not support <= and >= operators, error out*/
if (lessEqualOperatorId == InvalidOid || greaterEqualOperatorId == InvalidOid)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any type, which has non-InvalidOid partitionColumnOpClassId, with invalid equality operators.
But, to be on the safe side, I added this extra control.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this check only makes sense for tables that use range partitioning: for hash partitioning, the shards are always defined using 32-bit integers, and we can always find the inequality operator for that.

If using range partitioning, shard boundaries will be defined using the partition column type, so it makes sense in that case to check for inequality operators.

Can you change this section to do something like:

if using hash partitioning
  check for presence of hash operator for partition column type
else if using range partitioning
  check for existence of inequality operators for partition column type
else
  error out that partition type is unexpected

@onderkalaci onderkalaci self-assigned this Feb 6, 2015
@jasonmp85
Copy link
Collaborator

Hey @onderkalaci — I know I said I'd email @sumedhpathak and @ozgune about breaking the error checking out into its own function, but what I hadn't realized is that master_create_distributed_table is all error checking! The GitHub view cut off right after the call to InsertPartitionRow, so I didn't see that we return immediately after.

So there is no argument to support breaking error handling into its own function here: it makes sense that master_create_distributed_table is all error handling code since the logic is so simple at this point (a single function call).

Which just leaves my point about the inequality operator checks. I'll see whether there are any other stylistic things that need addressing and add a checklist at the top of this review. After those changes I think we're good to merge.

This commit add checks to find whether the data type of the partition
column has the required support routines for future select/insert/update
commands.
Add tests with unsupported column types.
This commit aims to improve style of tests.
@onderkalaci
Copy link
Member Author

Hey @jasonmp85, @sumedhpathak

Below I summerize what is included in the updated pull request:

  • A new function "GetSupportRoutine()" is added. Basically, aim of this function is to find the support routine wrt partition method.
  • For instance, this function helps us to check if the partition column has a hash function if hash partitioning is used. Similarly, check if partition column has required comparison operator if range partitioning is used. This function is compatible with citus codes.
  • Two functions changed from "static" to "extern" which are used in the pull request:
    1. distribution_metadata.c @ColumnNameToColumn(Oid relationId, char *columnName);
    2. pruneShardList.c @GetOperatorByType(Oid typeId, Oid accessMethodId, int16 strategyNumber);
  • Two checks are added to prevent table to be distributed:
    1. If partition column data type does not have a default operator class, error out (ie:json data type).
    2. Depending on the partition type, check for the existence of support procedure. For details, see below bullet item.
  • Added two tests to increase the test coverage. One for each item above. In order to add the test, I created a type, function, operator, operator class and operator family. The reason that I had to create them is that I could not find any internal data type that does not have support routine.
  • One controversial issue in the code is that, I added check for support routine for range partitioning.
    However, that part of the code is not possible to be executed because we reject range partitioning before that code is executed. So, I added a comment saying that when we support range partitioning, we should add regression test. Should I delete or handle this case in another way? Or it is OK?
  • One future issue that we need to do is to apply the composite type bug changes to pg_shard, which is on CR in citus currently. This will enable us to distribute on arrays and composite types.

@@ -50,6 +56,7 @@ static List * ParseWorkerNodeFile(char *workerNodeFilename);
static int CompareWorkerNodes(const void *leftElement, const void *rightElement);
static bool ExecuteRemoteCommand(PGconn *connection, const char *sqlCommand);
static text * IntegerToText(int32 value);
static Oid GetSupportRoutine(Oid columnOid, Oid accessMethodId, int16 supportFunctionNumber);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routine is an unfamiliar word in this codebase. I don't see any functions in CitusDB or PostgreSQL that use this term except those related to foreign data wrappers. Most comments refer to this as a support function, and I think we should follow that pattern.

@jasonmp85
Copy link
Collaborator

Ok, so after staring at this for a while, I think I know what I'd like to see happen…

In some code you haven't modified, there's a section like:

partitionColumnId = get_attnum(distributedTableId, partitionColumnName);
if (partitionColumnId == InvalidAttrNumber)
{
    ereport(ERROR, (errmsg("could not find column: %s", partitionColumnName)));
}

This error checking is duplicated by ColumnNameToColumn, which you call later. I think the above section should be removed entirely and replaced by the call to ColumnNameToColumn (step one).

Then, GetSupportRoutine should be renamed SupportFunctionForColumn and it should be modified to accept that Var * (step two).

If the type does not have a default operator class, SupportFunctionForColumn should call ereport itself with a suitable error message (step three).

I think this could help code flow quite a bit while cutting down on verbosity and duplication.

Var* partitionColumnVar = NULL;
Oid partitionColumnTypeId = InvalidOid;
Oid partitionColumnOpClassId = InvalidOid;
Oid btreeSupportRoutine = InvalidOid;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the declaration of btreeSupportRoutine and hashSupportRoutine into the respective if/else if blocks to reduce their scope… no reason to declare them up here. And when you move them, you can combine the declaration and assignment into a one-line definition for each.

This commit aims to rename GetSupportRoutine function to
GetSupportFunction. Also, updated related variable names and comments.
*/
if (partitionColumnOpClassId == InvalidOid)
{
ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERRCODE_WRONG_OBJECT_TYPE isn't appropriate here. Let's stick with what other functions do when they get an InvalidOid here. I think the following should work:

ereport(ERROR,
        (errcode(ERRCODE_UNDEFINED_OBJECT),
         errmsg("data type %s has no default operator class for specified partition method",
                format_type_be(attrType)),
         errdetail("Partition column types must have a default operator class defined.")));

@jasonmp85
Copy link
Collaborator

Ok @onderkalaci — I've left my checklist at the top of this PR. I think the reorganization is enough that I'll want to check this out once more before a merge, but assuming it's mostly in shape by morning (MDT) I'll merge it myself if it looks good.

This commit aims to apply feedback from @jasonmp85. Basically, change
code flow so that code gets shorter and easier to read. Also, apply
style changes.
This commits includes only some whitespace changes.
@onderkalaci
Copy link
Member Author

@jasonmp85 Thanks for the review.
I think that now code looks better than previous implementation. But I couldn't understand why coverage decreased even I added tests. Is it a problem?

@jasonmp85
Copy link
Collaborator

@onderkalaci By default, Coveralls will complain if coverage decreases. We discussed the coverage of your patch (and how it couldn't be complete until we support range partitioning) and I expected coverage to decrease slightly.

If you notice, the merge button is still present, and nothing can ever stop any of us from merging from the command line. So it's just a slight inconvenience designed to make sure you are aware you lowered coverage.

In this case, it's expected, so no biggie.

HASH_AM_OID, HASHPROC);
if (hashSupportFunction == InvalidOid)
{
Oid partitionColumnTypeId = partitionColumnVar->vartype;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point in this variable existing, so I'm going to remove it before merging (it's only referenced once).

Oid columnOid = partitionColumn->vartype;
Oid operatorClassId = GetDefaultOpClass(columnOid, accessMethodId);

/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is actually incorrect, I think… This documentation section shows that an index can be defined using a customer operator class rather than the default. So it's possible we could support types without default operator classes, so long as the user specified which custom operator class to use at the time of distribution. I'll update this comment to clarify that we just don't support anything other than using a default operator class at the moment.

Fixed some alignment issues, clarified some comment language, renamed
a variable, and moved error call for range partitioning.
@jasonmp85
Copy link
Collaborator

I've tested this on the latest CitusDB, PostgreSQL 9.3, and PostgreSQL 9.4 on OS X. Travis'll take care of Linux. After it passes I'm gonna merge.

jasonmp85 added a commit that referenced this pull request Feb 25, 2015
Detect unsupported partition key types during dist. table creation.

cr: @jasonmp85
@jasonmp85 jasonmp85 merged commit 05d1277 into develop Feb 25, 2015
@jasonmp85 jasonmp85 deleted the feature-issue#16 branch February 25, 2015 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect unsupported partition key types during dist. table creation
3 participants