Add Helix rest Zookeeper delete API to allow removing ephemeral ZNode#1190
Add Helix rest Zookeeper delete API to allow removing ephemeral ZNode#1190jiajunwang merged 4 commits intoapache:masterfrom
Conversation
...x-rest/src/main/java/org/apache/helix/rest/server/resources/zookeeper/ZooKeeperAccessor.java
Outdated
Show resolved
Hide resolved
...x-rest/src/main/java/org/apache/helix/rest/server/resources/zookeeper/ZooKeeperAccessor.java
Outdated
Show resolved
Hide resolved
| getChildren, | ||
| getStat | ||
| getStat, | ||
| delete |
There was a problem hiding this comment.
I think it would be smarter to use deleteEphemeral and rename your methods accordingly because it seems that it's not the general delete you're trying to support.
There was a problem hiding this comment.
I don't think we want "deleteEphemeral" eventually. As mentioned in the description, this is a premature feature that we add now for unblocking our users.
Alternatively, I tried to allow deleting live instance only. But that will pollute the ZookeeperAccessor API with Helix logics. So I discarded that idea.
There was a problem hiding this comment.
It seems that you are suggesting adding and using delete, expecting its behavior to change in the future. Adding an endpoint and changing its behavior will bring about backward-compatibility issues and make the meaning of "delete" murky. Moreover, there's no harm in having deleteEphemeral - it does what it does, and if the user no longer wishes to use it, then there's no harm in having it.
A good API design I believe is something that is 1) easy to use and 2) doing exactly what it's advertising to do. Do you see why I think it might be less desirable to add hidden assumptions to delete?
There was a problem hiding this comment.
I can see where you are coming from. Could you check the latest change that I have modified the method to use DELETE verb according to what Huizhi suggested? I think it is cleaner. However, in this case, we need some more parameters to separate the cases. And I think it might be overcomplicated.
@dasahcc and @pkuwm please also share your opinion since you also contributed to the Helix rest.
There was a problem hiding this comment.
My opinion is,
- use
DELETEverb that is designed for the REST delete operation. - if we only want to support deleting ephemeral, document it well and return a clear response like:
HTTP/1.1 404
Content-Type: application/json
{
"message": "Deleting a non-ephemeral node is not supported/allowed",
"path": "/a/b/c"
}
And it is extensible if we want to support deleting persistent node in the future.
There was a problem hiding this comment.
This information is carried in the response entity as a string for now. I don't think we need to make it too structural (complicated) given it is a temporary restriction. And eventually, we do not have a clear standard for the response format now. So I would prefer holding on any more complex idea.
...x-rest/src/main/java/org/apache/helix/rest/server/resources/zookeeper/ZooKeeperAccessor.java
Outdated
Show resolved
Hide resolved
helix-rest/src/test/java/org/apache/helix/rest/server/TestZooKeeperAccessor.java
Show resolved
Hide resolved
|
@jiajunwang By "zombie" participant, you meant the ephemeral node doesn't have any active zk connection/session, but it is not deleted by ZK? |
Please read the description of PR, "the instance has an active zk connection, but refuse to do any work" |
Oh I read "an active zkconnection" as inactive zkconnection... |
...x-rest/src/main/java/org/apache/helix/rest/server/resources/zookeeper/ZooKeeperAccessor.java
Outdated
Show resolved
Hide resolved
huizhilu
left a comment
There was a problem hiding this comment.
LGTM. Minor comments for better msg in response.
| private Response delete(BaseDataAccessor zkBaseDataAccessor, String path) { | ||
| Stat stat = zkBaseDataAccessor.getStat(path, AccessOption.PERSISTENT); | ||
| if (stat == null) { | ||
| return notFound(); |
There was a problem hiding this comment.
Could we add a msg to this as well: ("Path %s does not exist", path)? I think it gives a user a better idea. Otherwise the msg returned is unfriendly if we use curl endpoint in terminal.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /admin/v2/zookeeper/aa. Reason:
<pre> Not Found</pre></p>
<hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.12.v20180830</a><hr/>
</body>
</html>
VS
{
"message" : "Path /aa does not exist",
"status": 404
}
| } | ||
|
|
||
| if (zkBaseDataAccessor.remove(path, AccessOption.PERSISTENT)) { | ||
| return OK(); |
There was a problem hiding this comment.
At least add a message OK("Success")?
| * @param path | ||
| * @return The delete result and the operated path. | ||
| */ | ||
| private Response delete(BaseDataAccessor zkBaseDataAccessor, String path) { |
There was a problem hiding this comment.
I'm not seeing how this is any different from using delete. This is no better than using delete for two different types of delete's - delete and deleteEphemeral.
Perhaps you could add a commandStr here to differentiate two different types of deletes, and when you want to add an endpoint for regular delete backed by ACL checks, then just implement that if that becomes necessary? I don't think this adds any more work/difficulty for the purposes of this PR? (If any, it saves you the work of adding a TODO)
My point was not about what kind of REST verb we should use - it's pretty clear we should use DELETE in this case. But it's more about following a good API design which, again, is something that is hard to misuse by not embedding hidden assumptions or TODOs that may cause a behavior change down the road. Also, seen from another angle, supporting it as deleteEphemeral gives the user a clear meaning to the command string as opposed to just calling it a HTTP verb DELETE, which might leave the user confused and question the meaning of the API when it fails to delete regular ZNodes.
You could add two commands, delete and deleteEphemeral, and make the default commandStr delete, and throw a not authorized or not supported, and only let deleteEphemeral go through. This way, when we do decide to support delete operation with ACL, there's no confusion or change in behavior.
There was a problem hiding this comment.
Discussed with Junkai in slack, his point is that we don't need the additional cmd layer for now.
|
This PR is ready to be merged, approved by @pkuwm |
…apache#1190) Add a new Helix rest API in the ZookeeperAccessor for deleting an ephemeral ZNode. Note that before we have ACL/audit support in the Helix rest, allowing raw ZK write operation is dangerous. This API is introduced prematurely for resolving the issue of "zombie" participant (the instance has an active zk connection, but refuse to do any work). Currently, the existence of such a node may block the normal state transitions and then impact the cluster's availability. This PR restricts that only an ephemeral node can be deleted to minimize the risk.
…apache#1190) Add a new Helix rest API in the ZookeeperAccessor for deleting an ephemeral ZNode. Note that before we have ACL/audit support in the Helix rest, allowing raw ZK write operation is dangerous. This API is introduced prematurely for resolving the issue of "zombie" participant (the instance has an active zk connection, but refuse to do any work). Currently, the existence of such a node may block the normal state transitions and then impact the cluster's availability. This PR restricts that only an ephemeral node can be deleted to minimize the risk.
Issues
#1189
Description
Add a new Helix rest API in the ZookeeperAccessor for deleting an ephemeral ZNode.
Note that before we have ACL/audit support in the Helix rest, allowing raw ZK write operation is dangerous.
This API is introduced prematurely for resolving the issue of "zombie" participant (the instance has an active zk connection, but refuse to do any work). Currently, the existence of such a node may block the normal state transitions and then impact the cluster's availability. This PR restricts that only an ephemeral node can be deleted to minimize the risk.
Tests
TestZooKeeperAccessor.testDelete()
helix-rest
[INFO] Tests run: 164, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.55 s - in TestSuite
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 164, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 55.836 s
[INFO] Finished at: 2020-07-30T15:40:28-07:00
[INFO] ------------------------------------------------------------------------
Commits
Documentation (Optional)
(Link the GitHub wiki you added)
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)