Add Helix rest Zookeeper delete API to allow removing ephemeral ZNode #1190

Merged
jiajunwang merged 4 commits into apache:master from jiajunwang:zkRest
Aug 4, 2020

@jiajunwang jiajunwang commented Jul 30, 2020

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

#1189

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Add a new Helix rest API in the ZookeeperAccessor for deleting an ephemeral ZNode.

Note that until Helix REST has ACL/audit support, allowing raw ZK write operations is dangerous.
This API is introduced prematurely to resolve the "zombie" participant issue (the instance has an active zk connection but refuses to do any work). Currently, the existence of such a node may block normal state transitions and thus impact the cluster's availability. To minimize the risk, this PR restricts deletion to ephemeral nodes only.
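
A minimal sketch of the ephemeral-only restriction described above, assuming an in-memory map as a stand-in for ZooKeeper. The class and method names are hypothetical and are not the code in this PR; the only borrowed convention is that an ephemeral owner of 0 marks a persistent node, as with ZooKeeper's Stat.getEphemeralOwner().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a ZooKeeper tree; not Helix or ZooKeeper code.
class EphemeralOnlyDeleteSketch {
  // Maps a path to the session id that owns it. 0 means the node is
  // persistent, mirroring the convention of Stat.getEphemeralOwner().
  private final Map<String, Long> ephemeralOwnerByPath = new HashMap<>();

  void create(String path, long ephemeralOwner) {
    ephemeralOwnerByPath.put(path, ephemeralOwner);
  }

  boolean exists(String path) {
    return ephemeralOwnerByPath.containsKey(path);
  }

  // Returns true only if the node existed, was ephemeral, and was removed.
  // Missing nodes and persistent nodes are left untouched.
  boolean deleteIfEphemeral(String path) {
    Long owner = ephemeralOwnerByPath.get(path);
    if (owner == null || owner == 0L) {
      return false;
    }
    ephemeralOwnerByPath.remove(path);
    return true;
  }
}
```

In this sketch a live-instance node created with a non-zero owner can be removed, while a persistent config node survives the same call.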

Tests

  • The following tests are written for this issue:

TestZooKeeperAccessor.testDelete()

  • The following is the result of the "mvn test" command on the appropriate module:

helix-rest

[INFO] Tests run: 164, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.55 s - in TestSuite
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 164, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 55.836 s
[INFO] Finished at: 2020-07-30T15:40:28-07:00
[INFO] ------------------------------------------------------------------------

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

  getChildren,
- getStat
+ getStat,
+ delete
Contributor

I think it would be smarter to use deleteEphemeral and rename your methods accordingly because it seems that it's not the general delete you're trying to support.

Contributor Author

I don't think we want "deleteEphemeral" eventually. As mentioned in the description, this is a premature feature that we add now for unblocking our users.

Alternatively, I tried allowing deletion of live instances only. But that would pollute the ZookeeperAccessor API with Helix logic, so I discarded that idea.

Contributor

@narendly narendly Jul 30, 2020

It seems that you are suggesting adding and using delete, expecting its behavior to change in the future. Adding an endpoint and changing its behavior will bring about backward-compatibility issues and make the meaning of "delete" murky. Moreover, there's no harm in having deleteEphemeral - it does what it does, and if the user no longer wishes to use it, then there's no harm in having it.

A good API design I believe is something that is 1) easy to use and 2) doing exactly what it's advertising to do. Do you see why I think it might be less desirable to add hidden assumptions to delete?

Contributor Author

I can see where you are coming from. Could you check the latest change, in which I modified the method to use the DELETE verb as Huizhi suggested? I think it is cleaner. However, in this case, we would need some more parameters to separate the cases, and I think that might be overcomplicated.

@dasahcc and @pkuwm please also share your opinion since you also contributed to the Helix rest.

Contributor

My opinion is,

  1. use DELETE verb that is designed for the REST delete operation.
  2. if we only want to support deleting ephemeral, document it well and return a clear response like:
    HTTP/1.1  404
    Content-Type: application/json
 
    {
      "message": "Deleting a non-ephemeral node is not supported/allowed",
      "path": "/a/b/c"
    }

And it is extensible if we want to support deleting persistent node in the future.

Contributor Author

This information is carried in the response entity as a string for now. I don't think we need to make it too structured (complicated), given that it is a temporary restriction. Also, we do not have a clear standard for the response format yet. So I would prefer to hold off on any more complex ideas.

@huizhilu
Contributor

@jiajunwang By "zombie" participant, you meant the ephemeral node doesn't have any active zk connection/session, but it is not deleted by ZK?

@jiajunwang
Contributor Author

@jiajunwang By "zombie" participant, you meant the ephemeral node doesn't have any active zk connection/session, but it is not deleted by ZK?

Please read the description of PR, "the instance has an active zk connection, but refuse to do any work"

@huizhilu
Contributor

@jiajunwang By "zombie" participant, you meant the ephemeral node doesn't have any active zk connection/session, but it is not deleted by ZK?

Please read the description of PR, "the instance has an active zk connection, but refuse to do any work"

Oh I read "an active zkconnection" as inactive zkconnection...

Contributor

@huizhilu huizhilu left a comment

LGTM. Minor comments for better msg in response.

private Response delete(BaseDataAccessor zkBaseDataAccessor, String path) {
  Stat stat = zkBaseDataAccessor.getStat(path, AccessOption.PERSISTENT);
  if (stat == null) {
    return notFound();
Contributor

Could we add a msg to this as well: ("Path %s does not exist", path)? I think it gives the user a better idea of what went wrong. Otherwise, the msg returned is unfriendly when calling the endpoint with curl in a terminal.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /admin/v2/zookeeper/aa. Reason:
<pre>    Not Found</pre></p>
<hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.12.v20180830</a><hr/>
</body>
</html>

VS

{
  "message" : "Path /aa does not exist",
  "status": 404
}
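
A minimal sketch of how the friendlier body above could be built, assuming a plain string-based helper. The class and method names are hypothetical and do not exist in helix-rest; a real endpoint would more likely go through the JSON serializer that the REST layer already uses.

```java
// Hypothetical helper; illustrates the suggested message only.
class NotFoundMessageSketch {
  // Escapes backslashes and double quotes so the path stays valid inside a JSON string.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Builds a small JSON body carrying the message and the 404 status.
  static String notFoundBody(String path) {
    String message = String.format("Path %s does not exist", path);
    return String.format("{\"message\": \"%s\", \"status\": %d}", escape(message), 404);
  }
}
```

Calling notFoundBody("/aa") in this sketch produces a body with the same two fields as the JSON shown above.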

  }

  if (zkBaseDataAccessor.remove(path, AccessOption.PERSISTENT)) {
    return OK();
Contributor

At least add a message OK("Success")?

 * @param path
 * @return The delete result and the operated path.
 */
private Response delete(BaseDataAccessor zkBaseDataAccessor, String path) {
Contributor

@narendly narendly Aug 1, 2020

I'm not seeing how this is any different from using delete. This is no better than using delete for two different types of deletes - delete and deleteEphemeral.

Perhaps you could add a commandStr here to differentiate the two types of deletes, and then implement an endpoint for regular delete backed by ACL checks only if that becomes necessary? I don't think this adds any more work/difficulty for the purposes of this PR. (If anything, it saves you the work of adding a TODO.)

My point was not about which REST verb we should use - it's pretty clear we should use DELETE in this case. It's more about following good API design, which, again, means something that is hard to misuse because it does not embed hidden assumptions or TODOs that may cause a behavior change down the road. Also, seen from another angle, supporting it as deleteEphemeral gives the command string a clear meaning for the user, as opposed to just the HTTP verb DELETE, which might leave the user confused and questioning the meaning of the API when it fails to delete regular ZNodes.

You could add two commands, delete and deleteEphemeral, and make the default commandStr delete, and throw a not authorized or not supported, and only let deleteEphemeral go through. This way, when we do decide to support delete operation with ACL, there's no confusion or change in behavior.
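
The two-command idea above can be sketched as follows. This is only an illustration under assumed names: the Command and Outcome enums and the handle method are hypothetical, and the PR as merged does not add this command layer.

```java
// Hypothetical dispatch sketch; not the merged implementation.
class DeleteCommandSketch {
  // Lowercase constants follow the style of the existing command names
  // (getChildren, getStat) shown in the diff.
  enum Command { delete, deleteEphemeral }

  enum Outcome { NOT_SUPPORTED, NOT_EPHEMERAL, DELETED }

  // A missing or empty command string falls back to the general "delete".
  // Unknown strings make valueOf throw IllegalArgumentException.
  static Command parse(String commandStr) {
    if (commandStr == null || commandStr.isEmpty()) {
      return Command.delete;
    }
    return Command.valueOf(commandStr);
  }

  // Only deleteEphemeral is allowed through; general delete stays
  // rejected until ACL-backed support exists.
  static Outcome handle(String commandStr, boolean nodeIsEphemeral) {
    switch (parse(commandStr)) {
      case deleteEphemeral:
        return nodeIsEphemeral ? Outcome.DELETED : Outcome.NOT_EPHEMERAL;
      case delete:
      default:
        return Outcome.NOT_SUPPORTED;
    }
  }
}
```

With this shape, adding a real general delete later would only change the branch for delete, leaving the behavior of deleteEphemeral as it is.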

Contributor Author

Discussed with Junkai on Slack; his point is that we don't need the additional cmd layer for now.

@jiajunwang
Contributor Author

This PR is ready to be merged, approved by @pkuwm

@jiajunwang jiajunwang merged commit b13d872 into apache:master Aug 4, 2020
@jiajunwang jiajunwang deleted the zkRest branch August 4, 2020 00:10
junkaixue pushed a commit to junkaixue/helix that referenced this pull request Aug 11, 2020
…apache#1190)
huizhilu pushed a commit to huizhilu/helix that referenced this pull request Aug 16, 2020
…apache#1190)