Misleading error for incompletely created tables #38

onderkalaci · 2014-12-19T15:37:44Z

Before this fix, executing any query on a table that has been distributed with master_create_distributed_table but has no shards yet (i.e. master_create_worker_shards has not been called) produces an unclear error message. This fix catches that case and shows a more meaningful message.

fixes #9

Review tasks:

Add more specific error code (ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE)
Change message, detail, and hint per my suggestions
Move NIL check into DistributedQueryShardList itself
Update DistributedQueryShardList's function comment to reflect its new "return non-empty list or error" behavior
Ensure tests all pass

onderkalaci · 2014-12-19T15:47:50Z

@jasonmp85 I also believe that the error message should be much clearer.

I created this pull request so that we may chat about the solution that I suggest.

Handling #9 on the planner is a reasonable solution, right? Could you please check my comments on the commit?

Another way to solve the issue might be to check the length of queryShardList in the planner phase. However, I think the solution that I sent makes the code more readable.

jasonmp85 · 2014-12-30T08:17:28Z

I chatted with @sumedhpathak about this and we thought this pull request could have drastically fewer lines of code if we put the check for an empty shard list in an existing function rather than adding new functions.

The natural place would be LoadShardIntervalList, except that master_create_worker_shards has logic that expects it to return NIL (i.e. we can't make it raise an error instead of returning NIL because master_create_worker_shards expects the NIL when checking whether shards already exist for a table).

That leaves DistributedQueryShardList. Can you just change that function to check the return value of LookupShardIntervalList and error if it's NIL (also update the comment on the function to reflect this new behavior)?

That change would be far fewer lines of code (probably no more than five?) and would accomplish the same end result. You should also be sure to add a test for the new behavior (add a query in queries.sql after the distribution call has been made but before the shards have been created).

jasonmp85 · 2014-12-30T08:28:35Z

Two things:

You commented on the commit itself, which lives outside the pull request. For pull requests we usually converse in two places: here (i.e. the top-level comments one can make in a Pull Request's "Conversation" tab) and on lines in the Files Changed tab of the Pull Request. So in the future, just use the Files Changed tab to make line-specific Pull Request comments.
I think this change would be much simpler if you simply added a check and error to DistributedQueryShardList. Can you do that?

This commit aims to add a unit test for executing queries on distributed tables. Test aims to get the error message when there are no shards created for the distributed tables.

coveralls · 2015-01-05T11:50:20Z

Coverage increased (+0.01%) when pulling 6e690cb on feature-issue#9 into 46bcf2c on develop.

onderkalaci · 2015-01-05T11:54:36Z

With this implementation, we couldn't specify the name of the relation for which shards have not been created in the ereport (we could, but the code would get longer and more complicated). Thus, I had to use the phrase "the distributed table" instead of its name in the ereport. Also, I think we can be sure that there is only a single table, because we error out in the "ErrorIfQueryNotSupported" function if more than one relation is involved in the query.

jasonmp85 · 2015-01-08T00:37:32Z

pg_shard.c

 		queryShardList = DistributedQueryShardList(distributedQuery);
+		if (queryShardList == NIL)
+		{
+			ereport(ERROR, (errmsg("cannot plan SELECT query"),


This message could be generated by an INSERT, UPDATE, or DELETE as well…

Also, the style guide says this should be in past tense: could not plan query.

I think we can use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE here: we have an object (the distributed table) which hasn't been initialized yet (by creating shards). This code is used in PostgreSQL for other analogous situations, such as not calling nextval for a sequence before asking it what its current value is (grep for it).

jasonmp85 · 2015-01-08T07:06:59Z

I was envisioning that the check would actually live in DistributedQueryShardList rather than after the call to it. Right now that method always returns a list, which can be the empty list (NIL). I'm proposing we change it to always return a non-empty list or error.

Getting the relation name would also be easy because within that method the distributedTableId is in scope. It would just need a block like this:

List *prunedShardList = PruneShardList(distributedTableId, restrictClauseList, shardIntervalList);

if (prunedShardList == NIL)
{
    char *relname = get_rel_name(distributedTableId);

    ereport(ERROR, (errmsg("... %s ...", relname)));
}

return prunedShardList;

As for the parts of the error message:

errcode — we should use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE
errmsg — should read could not find any shards for query
errdetail — should read No shards exist for distributed table "%s".
errhint — Don't need the quotation marks around the method name: Run master_create_worker_shards to create shards and try again.

My suggestions for the above are based on my reading of the style guide. I think we should avoid using quotes unless they contain dynamic strings that could contain space-separated words. If we know ahead of time that a function name looks like a function name, the quotes are "unnecessary" in the words of the style guide. Additionally, we should say could not when the user can take an action to fix something and cannot if it will remain impossible forever.
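
To make the four parts listed above concrete, here is a minimal, self-contained C sketch that assembles them for a given relation name. It does not call PostgreSQL: the NoShardsError struct and the BuildNoShardsError function are hypothetical stand-ins for the values that errcode, errmsg, errdetail, and errhint would receive inside an ereport call. The string "55000" is the SQLSTATE that PostgreSQL assigns to ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SQLSTATE that PostgreSQL assigns to ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE */
#define SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE "55000"

/* Hypothetical stand-in for the fields an ereport call would carry */
typedef struct NoShardsError
{
	const char *sqlstate;
	const char *message;
	char detail[256];
	const char *hint;
} NoShardsError;

/* Fills in the four parts of the error for the given relation name */
static void
BuildNoShardsError(NoShardsError *error, const char *relationName)
{
	assert(error != NULL && relationName != NULL);

	error->sqlstate = SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE;

	/* primary message: lowercase, no trailing period, "could not" */
	error->message = "could not find any shards for query";

	/* detail: full sentence; the dynamic name is quoted because it may contain spaces */
	snprintf(error->detail, sizeof(error->detail),
			 "No shards exist for distributed table \"%s\".", relationName);

	/* hint: full sentence; the function name is known ahead of time, so no quotes */
	error->hint = "Run master_create_worker_shards to create shards and try again.";
}
```

Only the detail takes a format argument, which is why it is the only part where quotation marks appear.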

jasonmp85 · 2015-01-08T07:16:53Z

Alright, I'm going to add some checklist items to get this wrapped up. I'm being strict about error messages, but we've been lax about doing them properly, saying we'd clean them up later. So I just want to make sure we are being better about them going forward.

After the checklist items are complete I can probably merge this Thursday night or Friday (US time).

jasonmp85 · 2015-01-08T07:20:14Z

Added checklist up top! Push up some changes to address those things and we'll be ready to merge!

This commit updates the previous solution for error messages for incompletely created tables. This commit also updates unit tests related to the tables that are marked as distributed but no shards created yet.

coveralls · 2015-01-08T12:01:43Z

Coverage increased (+0.02%) when pulling ed8f95c on feature-issue#9 into 4583f31 on develop.

onderkalaci · 2015-01-08T12:29:37Z

Below are my comments on the changes:

In your second comment, you asked me to check prunedShardList; however, I think checking shardIntervalList is more convenient, as you had already suggested before.
With these changes, DistributedQueryShardList became more complex than the previous implementation. But I think we need to change the code so that we delay the calls to QueryRestrictList and PruneShardList until we have checked that shards exist.
I totally agree with you on the style and content of error messages. Even though I checked them more than once, I still made mistakes. In future changes, I'll pay more attention to the messages.
Especially for the "could not" vs. "cannot" discussion, I missed the phrase in the PostgreSQL documentation, which is "perhaps after fixing some problem". Certainly, in our case the user can fix the problem.
For the error codes, I think I may need your help in the future. There are quite a lot of different error codes, and it is still difficult for me to find and decide on the correct one.
Lastly, since we updated the function comment of DistributedQueryShardList, I need to be sure that PruneShardList never returns NIL when a non-NIL shardIntervalList is provided.
I think this is the case since we are doing hash pruning. Am I missing something?
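
The ordering described above (check for missing shards first, prune afterwards) can be sketched in plain C as follows. This is an illustration under assumptions, not the pg_shard code: ShardInterval, PruneShards, QueryShardList, and the status enum are hypothetical stand-ins, and a returned status replaces the ereport the extension would raise. The sketch does not settle whether PruneShardList can return an empty list for a non-empty input; it only shows that the pruning step is never reached when no shards exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes; the extension would raise an error instead */
typedef enum ShardListStatus
{
	SHARD_LIST_OK,
	SHARD_LIST_NO_SHARDS
} ShardListStatus;

/* Hypothetical shard covering an inclusive range of hashed values */
typedef struct ShardInterval
{
	int minValue;
	int maxValue;
} ShardInterval;

/* Counts how often the pruning step actually runs */
static int pruneCallCount = 0;

/* Stand-in for pruning: keeps the shards whose range contains hashedValue */
static size_t
PruneShards(const ShardInterval *shards, size_t shardCount, int hashedValue,
			ShardInterval *pruned)
{
	size_t prunedCount = 0;

	pruneCallCount++;
	for (size_t i = 0; i < shardCount; i++)
	{
		if (shards[i].minValue <= hashedValue && hashedValue <= shards[i].maxValue)
		{
			pruned[prunedCount++] = shards[i];
		}
	}

	return prunedCount;
}

/* Reports missing shards before any pruning work is attempted */
static ShardListStatus
QueryShardList(const ShardInterval *shards, size_t shardCount, int hashedValue,
			   ShardInterval *pruned, size_t *prunedCount)
{
	assert(pruned != NULL && prunedCount != NULL);
	*prunedCount = 0;

	/* no shards exist for the table: stop here, pruning is skipped */
	if (shardCount == 0)
	{
		return SHARD_LIST_NO_SHARDS;
	}

	*prunedCount = PruneShards(shards, shardCount, hashedValue, pruned);

	return SHARD_LIST_OK;
}
```

The pruned buffer must have room for at least shardCount entries, since in the worst case every shard survives pruning.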

jasonmp85 · 2015-01-09T07:52:01Z

pg_shard.c

+	/* error out if no shards exists for the table */
+	if (shardIntervalList == NIL)
+	{
+		char *relName = get_rel_name(distributedTableId);


We use relationName for relation names in our code (PostgreSQL uses relname, no uppercase, but we prefer the full word).

jasonmp85 · 2015-01-09T07:53:59Z

Two small issues. Fix them and this is ready to ship. You can merge it yourself with the Merge pull request button or do it in the git CLI.

Before shipping:

Change relName to relationName
Rewrap the detail and hint strings to 90 columns

Minor variable naming/error style fixes.

coveralls · 2015-01-09T09:05:01Z

Coverage increased (+0.02%) when pulling acef92f on feature-issue#9 into 4583f31 on develop.

Misleading error for incompletely created tables

onderkalaci added the waffle: needs review label Dec 19, 2014

onderkalaci added 2 commits January 5, 2015 12:06

Add unit test for distributed tables

6e690cb

onderkalaci force-pushed the feature-issue#9 branch from 04ceeee to 6e690cb Compare January 5, 2015 11:48

jasonmp85 reviewed Jan 8, 2015

onderkalaci added 2 commits January 8, 2015 13:57

Update misleading error messages for incompletely created tables

6678cee

Merge remote-tracking branch 'origin/develop' into feature-issue#9

ed8f95c

jasonmp85 reviewed Jan 9, 2015

Final style changes

acef92f

onderkalaci added a commit that referenced this pull request Jan 9, 2015

Merge pull request #38 from citusdata/feature-issue#9

f1490b4

Misleading error for incompletely created tables

onderkalaci merged commit f1490b4 into develop Jan 9, 2015

onderkalaci removed the waffle: needs review label Jan 9, 2015

onderkalaci deleted the feature-issue#9 branch January 9, 2015 09:08

jasonmp85 mentioned this pull request Feb 27, 2015

Handle zero-shard SELECT queries #58

Merged

3 tasks
