Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading error for incompletely created tables #38

Merged
merged 5 commits into from Jan 9, 2015

Conversation

onderkalaci
Copy link
Member

Before this fix, if you try to execute any query on tables which are distributed with master_create_distributed_table but no shards are created yet for the table (i.e. master_create_worker_shards not called), you get unclear error message. This fix catches that case implicitly and more meaningful message is shown.

fixes #9

Review tasks:

  • Add more specific error code (ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE)
  • Change message, detail, and hint per my suggestions
  • Move NIL check into DistributedQueryShardList itself
  • Update DistributedQueryShardList's function comment to reflect its new "return non-empty list or error" behavior
  • Ensure tests all pass

@onderkalaci
Copy link
Member Author

@jasonmp85 I also believe that the error message should be much more clear.

I created this pull request so that we may chat about the solution that I suggest.

Handling #9 on the planner is a reasonable solution, right? Could you please check my comments on the commit?

Another solution that may solve the issue might be to check the length of queryShardList in the planner phase. However, the solution that I sent makes code more readable I think.

@jasonmp85
Copy link
Collaborator

I chatted with @sumedhpathak about this and we thought this pull request could have drastically fewer lines of code if we put the check for an empty shard in an existing function rather than adding new functions.

The natural place would be LoadShardIntervalList, except that master_create_worker_shards has logic that expects it to return NIL (i.e. we can't make it raise an error instead of returning NIL because master_create_worker_shards expects the NIL when checking whether shards already exist for a table).

That leaves DistributedQueryShardList. Can you just change that function to check the return value of LookupShardIntervalList and error if it's NIL (also update the comment on the function to reflect this new behavior)?

That change would be far fewer lines of code (probably no more than five?) and would accomplish the same end result. You should also be sure to add a test for the new behavior (add a query in queries.sql after the distribution call has been made but before the shards have been created).

@jasonmp85
Copy link
Collaborator

Two things:

  • You commented on the commit itself, which lives outside the pull request. For pull requests we usually converse in two place: here (i.e. the top-level comments one can make in a Pull Request's "Conversation" tab) and on lines in the Files Changed tab of the Pull Request. So in the future, just use the Files Changed tab to make line-specific Pull Request comments
  • I think this change would be much simpler if you simply added a check and error to DistributedQueryShardList. Can you do that?

Before this fix, if you try to execute any query on tables which
are distributed with master_create_distributed_table but no shards
are created yet for the table (ie master_create_worker_shards not
called), you get unclear error message. This fix catches that case
implicitly and more meaningful message is shown.
This commit aims to add a unit test for executing queries on
distributed tables. Test aims to get the error message when
there are no shards created for the distributed tables.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling 6e690cb on feature-issue#9 into 46bcf2c on develop.

@onderkalaci
Copy link
Member Author

With this implementation, we couldn't specify the name of the relation for which shards are not created in the ereport (Well, sure we can specify but code gets longer and become complicated). Thus, I had to use "the distributed table" phrase instead of its name in the ereport. Also, I think we are sure that there is only a single table, because we error out in the "ErrorIfQueryNotSupported" function if there are more than one relations involved in the query.

queryShardList = DistributedQueryShardList(distributedQuery);
if (queryShardList == NIL)
{
ereport(ERROR, (errmsg("cannot plan SELECT query"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message could be generated by an INSERT, UPDATE, or DELETE as well…

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the style guide says this should be in past tense: could not plan query.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE here: we have an object (the distributed table) which hasn't been initialized yet (by creating shards). This code is used in PostgreSQL for other analogous situations, such as not calling nextval for a sequence before asking it what its current value is (grep for it).

@jasonmp85
Copy link
Collaborator

I was envisioning that the check would actually live in DistributedQueryShardList rather than after the call to it. Right now that method always returns a list, which can be the empty list (NIL). I'm proposing we change it to always return a non-empty list or error.

Getting the relation name would also be easy because within that method the distributedTableId is in scope. It would just need a block like this:

List *prunedShardList = PruneShardList(distributedTableId, restrictClauseList, shardIntervalList);

if (prunedShardList == NIL)
{
    char *relname = get_rel_name(distributedTableId);

    ereport(ERROR, (errmsg("... %s ...", relname)));
}

return prunedShardList;

As for the parts of the error message:

  • errcode — we should use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE
  • errmsg — should read could not find any shards for query
  • errdetail — should read No shards exist for distributed table "%s".
  • errhint — Don't need the quotation marks around the method name: Run master_create_worker_shards to create shards and try again.

My suggestions for the above are based on my reading of the style guide. I think we should avoid using quotes unless they contain dynamic strings that could contain space-separated words. If we know ahead of time that a function name looks like a function name, the quotes are "unnecessary" in the words of the style guide. Additionally, we should say could not when the user can take an action to fix something and cannot if it will remain impossible forever.

@jasonmp85
Copy link
Collaborator

Alright, I'm going to add some checklist items to get this wrapped up. I'm being strict about error messages, but we've been lax about doing them properly, saying we'd clean them up later. So I just want to make sure we are being better about them going forward.

After the checklist items are complete I can probably merge this Thursday night or Friday (US time).

@jasonmp85
Copy link
Collaborator

Added checklist up top! Push up some changes to address those things and we'll have a :shipit:!

This commit updates the previous solution for error messages for
incompletely created tables. This commit also updates unit tests
related to the tables that are marked as distributed but no shards
created yet.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) when pulling ed8f95c on feature-issue#9 into 4583f31 on develop.

@onderkalaci
Copy link
Member Author

Below are my comments on the changes:

  • In your second comment, you asked me to check prunedShardList, however, I think checking shardIntervalList is more convinient, as you had already suggested before.
  • With this changes DistributedQueryShardList became more complex compared to the previous implementation. But, I think we need to change the code such that we delay calls to QueryRestrictList and PruneShardList until we check existence of shards.
  • I totally agree with you for the style and content of error messages. Even I checked them more than once, I still have mistakes. In the future changes, I'll pay more attention to the messages.
    Especially for the "could not vs cannot" discussion, I missed the senctence in the postgres documentation, which is "perhaps after fixing some problem". Certainly, for our case the user can fix the problem.
  • For the error codes, I think I may need your help in the future. There are quite a lot of different error codes, and it is difficult for me to find and decide the correct one yet.
  • Lastly, since we updated the function comment of DistributedQueryShardList, I need to be sure that PruneShardList never returns NIL when a non-NIL shardIntervalList is provided.
    I think this is the case since we are making hash pruning. Am I missing something on that?

/* error out if no shards exists for the table */
if (shardIntervalList == NIL)
{
char *relName = get_rel_name(distributedTableId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use relationName for relation names in our code (PostgreSQL uses relname, no uppercase, but we prefer the full word).

@jasonmp85
Copy link
Collaborator

Two small issues. Fix them and :shipit:. You can merge it yourself with the Merge pull request button or do it in the git CLI.

Before shipping:

  • Change relName to relationName
  • Rewrap the detail and hint strings to 90 columns

Minor variable naming/error style fixes.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) when pulling acef92f on feature-issue#9 into 4583f31 on develop.

onderkalaci added a commit that referenced this pull request Jan 9, 2015
Misleading error for incompletely created tables
@onderkalaci onderkalaci merged commit f1490b4 into develop Jan 9, 2015
@onderkalaci onderkalaci deleted the feature-issue#9 branch January 9, 2015 09:08
@jasonmp85 jasonmp85 mentioned this pull request Feb 27, 2015
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misleading error for incompletely created tables
3 participants