Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdmain: better logging when user forget to set initial flags #3556

Merged
merged 1 commit into from
Sep 21, 2015

Conversation

xiang90
Copy link
Contributor

@xiang90 xiang90 commented Sep 18, 2015

Fix #3552

Now it will log as if you forgot to set both --initial-cluster and --discovery and the configuration does not match.

2015-09-18 15:19:16.721741 I | etcdmain: advertise URLs of "default" do not match in --initial-advertise-peer-urls [http://example.com:1234] and --initial-cluster [http://localhost:2380 http://localhost:7001]
2015-09-18 15:19:16.721746 I | etcdmain: forgot to set --initial-cluster flag?
2015-09-18 15:19:16.721754 I | etcdmain: if you want to use discovery service, please set --discovery flag.

@@ -146,6 +147,19 @@ func Main() {
plog.Errorf("please check (cURL) the discovery token for more information.")
plog.Errorf("please do not reuse the discovery token and generate a new one to bootstrap the cluster.")
default:
if strings.Contains(err.Error(), "match") && strings.Contains(err.Error(), "--initial-cluster") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this deserve to be a specific error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have variables in this error. i would say no.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no better way to handle it. Let us do it in this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless when we decide to construct ALL specific errors in etcdmain.

@yichengq
Copy link
Contributor

LGTM. defer to @crawford

@crawford
Copy link
Contributor

I still don't understand that original error. Can we fix that?

@yichengq
Copy link
Contributor

@crawford xiang reinvents the log for the original error to be:

2015-09-18 15:19:16.721741 I | etcdmain: advertise URLs of "default" do not match in --initial-advertise-peer-urls [http://example.com:1234] and --initial-cluster [http://localhost:2380 http://localhost:7001]
2015-09-18 15:19:16.721746 I | etcdmain: forgot to set --initial-cluster flag?
2015-09-18 15:19:16.721754 I | etcdmain: if you want to use discovery service, please set --discovery flag.

Which part makes you confused?

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 18, 2015

@crawford I think the original error logging is clear enough, but did not give you a hint of how to fix it. It simply says the two flags you give to etcd does not match with each other.

@crawford
Copy link
Contributor

As a user, I'm telling you that the error is not clear enough.

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@crawford What do you expect to see? Or why do not think it is not clear enough after providing you all the hints?

@crawford
Copy link
Contributor

I would rather us fix the original message instead of adding more details. Maybe something like "no initial cluster or discovery URL provided".

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@crawford

First, the error message should tell you what the error is, rather than finding the cause of the error. The fact is that the two flags does not match with each other. We can make the wording better or providing details based on other context, but not change the error. That is why we add the details.

Second, your suggestion is not good. We should not try to hide an error message by providing one possible cause. There are other cases that can cause the error.

Also, can you tell me why adding the details is not good enough? It seems that the detailed message after the error message provides what you want based on your reply.

@crawford
Copy link
Contributor

First, the error message should tell you what the error is, rather than finding the cause of the error.

Agreed, but I do not understand the error message. I don't need my hand held, I just need to be able to comprehend the message.

We can make the wording better or providing details based on other context, but not change the error.

You can rephrase the original error so that it gives some hint as to how the user got there in the first place. As it exists, one must understand the etcd codebase in order to interpret this error. That's not good.

Second, your suggestion is not good. We should not try to hide an error message by providing one possible cause. There are other cases that can cause the error.

Well, this is good to know. You know all of the cases; you come up with a better message.

Also, can you tell me why adding the details is not good enough? It seems that the detailed message after the error message provides what you want based on your reply.

Because instead of fixing the root cause, you've thrown a band-aid over it. I still cannot parse that original error message.

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

Because instead of fixing the root cause, you've thrown a band-aid over it.

OK. Let's try to fix this. I would love to fix root cause.

Let me explain the error in a verbose english. You can help to make it shorter. As someone who is very familiar with etcd, I think I am not the best person to shorten the error message.

To configure a local etcd, you need to set a flag to advertise its own peer urls to others. So other peers can use it to reach the local member. That is called --advertise-peer-urls.

To configure a static cluster, you need to set a to list the cluster members and their advertise-peer-urls including itself's. This flag is called --initial-cluster

So you can know that you will repeat the peer-url of the local member twice. One in the --advertise-peer-url flag and the other one in the --initial-cluster flag. If they do not match with each other, then you got an error

 advertise URLs of "default" do not match in --initial-advertise-peer-urls [http://example.com:1234] and --initial-cluster [http://localhost:2380 http://localhost:7001]

You might think that: you can simply remove the duplication. However, we will not. The reason is that we do want the users to repeat this important cluster-wide setting. We do not want one of them to overwrite the other one and we want each member to have exact same initial-cluster.

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@crawford

With all these explanation, I still cannot think a better error message than simply telling the fact that the two flags do not match.

Well, this is good to know. You know all of the cases; you come up with a better message.

Again, I will not change the error message to explain the causes and you agreed on that. If we cannot come up with a better error message. I would rather keep the explanation here.

@polvi
Copy link
Contributor

polvi commented Sep 21, 2015

working:

$ etcd --name=foo --initial-advertise-peer-urls=http://localhost:2381 --initial-cluster='foo=http://localhost:2381'
...

Failure 1 (changing name=foo to name=default):

$ etcd --name=default --initial-advertise-peer-urls=http://localhost:2381 --initial-cluster='foo=http://localhost:2381'
....
2015-09-20 20:43:32.857996 C | etcdmain: couldn't find local name "default" in the initial cluster configuration

Failure 2 (changing advertise-peer-urls to be one that is not part of initial-cluster):

$ etcd --name=foo --initial-advertise-peer-urls=http://localhost:9000 --initial-cluster='foo=http://localhost:2381'
...
2015-09-20 20:44:56.765504 C | etcdmain: advertise URLs of "foo" do not match in --initial-advertise-peer-urls [http://localhost:9000] and --initial-cluster [http://localhost:2381]

@polvi
Copy link
Contributor

polvi commented Sep 21, 2015

With that it feels like the right error for "failure 2" above is:

"--initial-cluster must include foo=http://localhost:9000"

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@polvi Right. That is another way to explain the mismatch. However, we are not sure if the information provided in --initial-advertise-peer-urls is correct or --initial-cluster is correct.

For example, when user forget to set --initial-advertise-peer-urls

etcd --name=foo --initial-cluster='foo=http://localhost:9999'

We will print out that --initial-cluster must include foo=http://localhost:2380, which is equally confusing in my opinion. They do not even know where 2380 comes from.

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@polvi Or we can give them a priority when returning the error. If mismatch is caused by unset, complaints the unset. If it is a really mismatch (both set), complaints the initial-cluster.

/cc @crawford What do you think?

@polvi
Copy link
Contributor

polvi commented Sep 21, 2015

@crawford does this make sense to you at all?

$ etcd --name=foo --initial-cluster='foo=http://localhost:9000,foo=http://localhost:1000'
...
--initial-cluster must include "foo=http://localhost:2380,foo=http://localhost:7001" given --initial-advertise-peer-urls=http://localhost:2380,http://localhost:7001

@crawford
Copy link
Contributor

@polvi yeah, that makes much more sense.

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@crawford OK. We rephrased the original mismatch error as @polvi suggested.

xiang90 added a commit that referenced this pull request Sep 21, 2015
etcdmain: better logging when user forget to set initial flags
@xiang90 xiang90 merged commit 6188933 into etcd-io:master Sep 21, 2015
@xiang90 xiang90 deleted the better_error_logging branch September 21, 2015 17:52
@polvi
Copy link
Contributor

polvi commented Sep 21, 2015

Based on that commit it looks like it does not do the right thing. Could you run the code path with multiple advertise-urls and print the resulting message?

@xiang90
Copy link
Contributor Author

xiang90 commented Sep 21, 2015

@polvi
Here is the result with multiple advertise-urls

etcdmain: --initial-cluster must include default=[http://example.com:1234 http://example.com:2345] given --initial-advertise-peer-urls=[http://example.com:1234 http://example.com:2345]

You want the error message to suggest the format that you can paste into the flag?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants