Misleading error for incompletely created tables #38
@jasonmp85 I also believe that the error message should be much more clear. I created this pull request so that we may chat about the solution that I suggest. Handling #9 on the planner is a reasonable solution, right? Could you please check my comments on the commit? Another solution that may solve the issue might be to check the length of |
I chatted with @sumedhpathak about this and we thought this pull request could have drastically fewer lines of code if we put the check for an empty shard in an existing function rather than adding new functions. The natural place would be That leaves That change would be far fewer lines of code (probably no more than five?) and would accomplish the same end result. You should also be sure to add a test for the new behavior (add a query in |
Two things:
|
Before this fix, if you try to execute any query on tables which are distributed with master_create_distributed_table but no shards are created yet for the table (i.e. master_create_worker_shards not called), you get an unclear error message. This fix catches that case implicitly and shows a more meaningful message.
This commit adds a unit test for executing queries on distributed tables. The test checks the error message raised when no shards have been created for a distributed table.
04ceeee to 6e690cb
With this implementation, we couldn't specify the name of the relation for which shards are not created in the ereport (well, sure, we could, but the code gets longer and more complicated). Thus, I had to use the phrase "the distributed table" instead of its name in the ereport. Also, I think we are sure that there is only a single table, because we error out in the "ErrorIfQueryNotSupported" function if more than one relation is involved in the query.
queryShardList = DistributedQueryShardList(distributedQuery);
if (queryShardList == NIL)
{
    ereport(ERROR, (errmsg("cannot plan SELECT query"),
This message could be generated by an INSERT, UPDATE, or DELETE as well…
Also, the style guide says this should be in past tense: could not plan query.
I think we can use ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE here: we have an object (the distributed table) which hasn't been initialized yet (by creating shards). This code is used in PostgreSQL for other analogous situations, such as not calling nextval for a sequence before asking it what its current value is (grep for it).
I was envisioning that the check would actually live in Getting the relation name would also be easy because within that method the
List *prunedShardList = PruneShardList(distributedTableId, restrictClauseList, shardIntervalList);
if (prunedShardList == NIL)
{
    char *relname = get_rel_name(distributedTableId);
    ereport(ERROR, (errmsg("... %s ...", relname)));
}
return prunedShardList;
As for the parts of the error message:
My suggestions for the above are based on my reading of the style guide. I think we should avoid using quotes unless they contain dynamic strings that could contain space-separated words. If we know ahead of time that a function name looks like a function name, the quotes are "unnecessary" in the words of the style guide. Additionally, we should say could not when the user can take an action to fix something and cannot if it will remain impossible forever.
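To make the style points above concrete, here is a minimal standalone C sketch of how such an error could be composed. It does not call PostgreSQL's ereport; `ShardError` and `BuildNoShardsError` are hypothetical names invented for illustration, and the message and hint wording are assumptions rather than the text that was merged. The one fixed fact it relies on is that ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE corresponds to SQLSTATE 55000.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SQLSTATE behind ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE in PostgreSQL. */
#define SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE "55000"

/* Hypothetical stand-in for the fields an ereport call would fill in. */
typedef struct ShardError
{
    char sqlState[6];
    char message[128];
    char hint[128];
} ShardError;

/*
 * BuildNoShardsError fills in an error for a distributed table that has no
 * shards yet. The wording is illustrative only:
 *   - "could not" rather than "cannot", because the user can fix the problem;
 *   - the relation name is quoted, because it is a dynamic string that may
 *     contain spaces;
 *   - the function name in the hint is not quoted, because it already looks
 *     like a function name.
 */
void
BuildNoShardsError(ShardError *error, const char *relationName)
{
    assert(error != NULL && relationName != NULL);

    strcpy(error->sqlState, SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE);
    snprintf(error->message, sizeof(error->message),
             "could not find any shards for table \"%s\"", relationName);
    snprintf(error->hint, sizeof(error->hint),
             "Run master_create_worker_shards to create shards and try again.");
}
```

In real extension code these three pieces would presumably be passed to ereport via errcode, errmsg, and errhint instead of being stored in a struct.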
Alright, I'm going to add some checklist items to get this wrapped up. I'm being strict about error messages, but we've been lax about doing them properly, saying we'd clean them up later. So I just want to make sure we are being better about them going forward. After the checklist items are complete I can probably merge this Thursday night or Friday (US time).
Added checklist up top! Push up some changes to address those things and we'll have a ! |
This commit updates the previous solution for error messages for incompletely created tables. It also updates the unit tests related to tables that are marked as distributed but have no shards created yet.
Below are my comments on the changes:
|
/* error out if no shards exists for the table */
if (shardIntervalList == NIL)
{
    char *relName = get_rel_name(distributedTableId);
We use relationName for relation names in our code (PostgreSQL uses relname, no uppercase, but we prefer the full word).
Two small issues. Fix them and . You can merge it yourself with the Merge pull request button or do it in the Before shipping:
|
Minor variable naming/error style fixes.
Misleading error for incompletely created tables
Before this fix, if you try to execute any query on tables which are distributed with master_create_distributed_table but no shards are created yet for the table (i.e. master_create_worker_shards not called), you get an unclear error message. This fix catches that case implicitly and shows a more meaningful message.
fixes #9
Review tasks:
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE)
NIL check into DistributedQueryShardList itself
DistributedQueryShardList's function comment to reflect its new "return non-empty list or error" behavior
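
The "return non-empty list or error" contract in the last task can be modeled outside PostgreSQL. The sketch below is a standalone approximation, not the project's code: it uses setjmp/longjmp as a stand-in for the non-local exit that ereport(ERROR) performs, represents the shard list by its length only, and `RaiseNoShardsError`, `ShardCountOrError`, and `TryPlanQuery` are hypothetical names with assumed message wording.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf errorContext;   /* where control resumes after an error */
char lastErrorMessage[128];    /* last message recorded by RaiseNoShardsError */

/* Stand-in for ereport(ERROR, ...): records a message and never returns. */
void
RaiseNoShardsError(const char *relationName)
{
    snprintf(lastErrorMessage, sizeof(lastErrorMessage),
             "could not find any shards for table \"%s\"", relationName);
    longjmp(errorContext, 1);
}

/*
 * ShardCountOrError returns the number of shards for the given relation.
 * The result is always positive: if the relation has no shards, the
 * function raises an error instead of returning, so callers do not need
 * their own empty-list check.
 */
int
ShardCountOrError(const char *relationName, int shardCount)
{
    assert(relationName != NULL);

    if (shardCount <= 0)
    {
        RaiseNoShardsError(relationName);
    }

    return shardCount;
}

/* Returns the shard count on success, or -1 if an error was raised. */
int
TryPlanQuery(const char *relationName, int shardCount)
{
    int plannedShardCount = 0;

    if (setjmp(errorContext) != 0)
    {
        return -1;   /* reached via longjmp from RaiseNoShardsError */
    }

    /* no empty check here: the callee guarantees a non-empty result */
    plannedShardCount = ShardCountOrError(relationName, shardCount);
    return plannedShardCount;
}
```

Putting the check inside the callee keeps every caller free of its own empty check, which is the reason the function comment has to state the new guarantee.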