Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Proposal] Pull Request to add support for Erlang-only validate_doc_read functions #1724

Open
AlexanderKaraberov opened this issue Nov 9, 2018 · 2 comments

Comments

@AlexanderKaraberov
Copy link
Contributor

AlexanderKaraberov commented Nov 9, 2018

Overview

Context

This is not really a feature request because we do have this functionality implemented in our CouchDB 2.x fork. However I consider this feature to be a good fit for upstream repository and I decided to open a discussional issue first instead. Let me know if it is better to directly submit a PR with this functionality.

Background and strategic fit

CouchDB already has support for validate_doc_update functions which if defined in the ddoc can be used to prevent invalid or unauthorized document update requests from being stored. In other words document-level write security. I'm proposing a patch which adds the same type of security but for all document reads. This functionality have been used in our production cluster and proved to be useful. There are a lot of valid use case scenarios akin to "Some particular user may only read documents he created" and so on. This also may alleviate overhead caused by alternative solutions such as per-user DBs.

Implementation details

First things first. This feature can lay waste to read performance if validate_doc_read functions will be implemented in JS due to notorious external query server and serialisation/deserialisation and standard I/O overhead. Thereby I propose to impose a limitation on these functions namely they have to be implemented only in Erlang and marked as advanced functionality. Perhaps they have to be available solely by means of some additional config such as validate_doc_read = true. Therefore this will not in any case affect default CouchDB setup but might be useful for advanced users. When implemented as Erlang ones potential overhead of these functions is negligible but benefits are perceptible. In a nutshell implementation extends load_validation_funs to also find, load and parse validate_doc_read functions from design documents and then make appropriate calls in the end of make_doc(). Unauthorised exception will be thrown and status code 403 returned from the chttpd handler when some invariants in validate_doc_read logic are violated.

Conclusions

I've decided to create this issue instead of directly sending a PR because at first I would like to gather feedback from core Apache and Cloudant maintainers regarding the usefulness and expediency of this feature. In case of a more or less positive one I would be glad to submit a PR which will shed more light on the technical part (which is pretty straightforward) as well as open a discussion about potential improvements.

@AlexanderKaraberov AlexanderKaraberov changed the title [Feature Proposal] Support Erlang-only validate_doc_read functions [Feature Proposal] Pull Request to support Erlang-only validate_doc_read functions Nov 9, 2018
@AlexanderKaraberov AlexanderKaraberov changed the title [Feature Proposal] Pull Request to support Erlang-only validate_doc_read functions [Feature Proposal] Pull Request to add support for Erlang-only validate_doc_read functions Nov 9, 2018
@wohali
Copy link
Member

wohali commented Nov 9, 2018

I'm -0.5 on the proposal.

Concerns:

  1. Overall throughput decrease if a database doesn't have any of these new functions, just from having to traverse your new code path on every read,
  2. Lack of security and sandboxing in Erlang-based functions, which is why we ship with the Erlang query server disabled by default

I know that point 2 above is why this feature will never land at Cloudant and would likely never land at anyone running a public/SaaS CouchDB offering. A potential alternative would be #1554 's approach.

The only way to handling point 1 above is to actually have proper load testing & metrics showing the impact. This is something Cloudant devs have done with their major functionality changes in PRs over the past couple of years. Do you have any data points to share?

The alternative to this was already proposed by the committer list and is in the roadmap is the per-document permissions: the _access proposal. See https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
This approach already has a bunch of code behind it as well, and would vastly improve on the current db-per-user access model. With _access would we even need validate_doc_read functions? I don't think so.

@ermouth
Copy link
Contributor

ermouth commented Nov 10, 2018

There exist approaches to filter outgoing data streams very fast without intervening CouchDB codebase – see ie https://github.com/ermouth/covercouch – however I doubt it’s useful for 2.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants