[Proposal] - New Building Block: DocumentStore #5146
Comments
I think this is a good direction, as it brings focus to and expands Dapr's state management features:
The Query API allows querying Redis, which is a very useful feature being used today. Will this ability be retained with Redis as a document store?
Redis with RedisJSON is a document store, so I think we could add that particular flavor as a supported DocumentStore, @yaron2.
This is pretty cool! Are there any reasons that would preclude AWS DocumentDB or Azure Cosmos DB from being included in this? Not saying they should be in for day 1, just curious. Also, just to play devil's advocate here, may I ask: are we absolutely sure this isn't a new capability built on top of the existing State Management building block? I'm only asking because we went through a similar discussion about the new blob/streaming API capabilities, where consensus eventually pointed towards delivering the capability via the existing State Management building block rather than introducing a new state management building block.
No reason, these are perfectly valid document stores.
That was my concern as well when first reading this, but unlike blobs, which fit existing k/v semantics in terms of API, the document store interface has a large number of distinct and domain-specific endpoints/methods that justify a new API.
I'm lazy and didn't feel like listing every component. Yes, the components you mentioned should also be supported DocumentStores. I couldn't think of them in the moment.
It's not a good experience when only a small subset of state store components have a certain feature. As @yaron2 mentioned, the feature matrix gets too complex. What is worse, it is incredibly confusing when the ability to query documents depends entirely on the content type chosen when saving state. For example, it is currently possible to save state with MongoDB without actually being able to query that state. DocumentStores should only support content types that are guaranteed to be queryable. Many DocumentStores have additional capabilities to update multiple documents at once (all documents that match a certain query/condition), replacing only certain sub-properties; that capability, for example, would not make sense within the context of the current state stores. The fact that the Query API is currently part of the State Store interface has led to some hacky attempts at implementing Query API support for state stores which do not truly support it. We should provide a more consistent experience for querying documents, and we should not let a user write data to a DocumentStore which subsequently cannot be queried.
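The multi-document partial update described above ("update all documents matching a condition, replacing only certain sub-properties") can be illustrated with a small in-memory sketch. Everything here (`updateMany`, `setPath`, and dotted paths such as `meta.reviewed`) is hypothetical and not a Dapr or database API; a real DocumentStore component would push this down to the database natively, for example MongoDB's `updateMany` with a `$set` update.

```go
package main

import (
	"fmt"
	"strings"
)

// setPath sets a dotted path (e.g. "meta.reviewed") inside doc, creating
// intermediate objects as needed (a non-object value on the path is
// overwritten). All other properties of doc are left untouched.
func setPath(doc map[string]any, path string, value any) {
	keys := strings.Split(path, ".")
	cur := doc
	for _, k := range keys[:len(keys)-1] {
		next, ok := cur[k].(map[string]any)
		if !ok {
			next = map[string]any{}
			cur[k] = next
		}
		cur = next
	}
	cur[keys[len(keys)-1]] = value
}

// updateMany applies the same partial update to every document matching the
// predicate and returns how many documents were modified.
func updateMany(docs []map[string]any, match func(map[string]any) bool, set map[string]any) int {
	n := 0
	for _, d := range docs {
		if !match(d) {
			continue
		}
		for path, v := range set {
			setPath(d, path, v)
		}
		n++
	}
	return n
}

func main() {
	docs := []map[string]any{
		{"id": "1", "status": "pending", "meta": map[string]any{"owner": "a"}},
		{"id": "2", "status": "done", "meta": map[string]any{"owner": "b"}},
		{"id": "3", "status": "pending"},
	}
	n := updateMany(docs,
		func(d map[string]any) bool { return d["status"] == "pending" },
		map[string]any{"status": "archived", "meta.reviewed": true})
	fmt.Println(n)       // 2
	fmt.Println(docs[0]) // map[id:1 meta:map[owner:a reviewed:true] status:archived]
}
```

Note that `meta.owner` survives the update on the first document: only the listed sub-properties change, which is exactly the semantics a plain key/value state store has no natural way to express.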
Also note:
Overall LGTM. You mention
Do you reckon it should be a goal to be compatible with data written by non-Dapr users? (I understand the need to not be compatible with data written by State Stores.)
Yes, data written by non-Dapr users should in general be compatible. We should entirely shy away from specialized data representations, @yaron2. In that sense, state store data can be accessed too, but you would need to understand the internals of the Dapr state store to do that.
I am fine stating that State Store data and DocumentStore data are not compatible. It prevents crossing the streams and, more importantly, allows them to grow independently, whereas otherwise this coupling would limit and entangle them and force continuous compliance checks. I would like to avoid non-deterministic results and thus state upfront that users should not mix State Store data with DocumentStore features. Also, making sure you haven't missed my question about document deletion above.
Added Bulk Save and Bulk Delete to be clearer. I need to look a bit more at common APIs. We could require Save, Get, Delete, etc. to support one or more documents. I'm not sure these really need to be distinct APIs; they could all support one or more items.
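The "one method, one or more items" idea above maps naturally onto variadic parameters in Go, the language Dapr is written in. The following is a minimal sketch under that assumption; `Document`, `MemoryStore`, `Save`, `Get` and `Delete` are hypothetical names for illustration, not the proposed component interface, and the store is a toy in-memory map.

```go
package main

import "fmt"

// Document is a hypothetical document: an ID plus arbitrary nested data.
type Document struct {
	ID   string
	Data map[string]any
}

// MemoryStore is a toy in-memory store that shows variadic operations
// replacing separate BulkSave/BulkDelete methods.
type MemoryStore struct {
	docs map[string]Document
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{docs: make(map[string]Document)}
}

// Save upserts one or more documents.
func (s *MemoryStore) Save(docs ...Document) {
	for _, d := range docs {
		s.docs[d.ID] = d
	}
}

// Get returns the documents that exist for the given IDs, in request order;
// missing IDs are skipped.
func (s *MemoryStore) Get(ids ...string) []Document {
	out := make([]Document, 0, len(ids))
	for _, id := range ids {
		if d, ok := s.docs[id]; ok {
			out = append(out, d)
		}
	}
	return out
}

// Delete removes one or more documents and reports how many existed.
func (s *MemoryStore) Delete(ids ...string) int {
	n := 0
	for _, id := range ids {
		if _, ok := s.docs[id]; ok {
			delete(s.docs, id)
			n++
		}
	}
	return n
}

func main() {
	s := NewMemoryStore()
	s.Save(Document{ID: "a"}, Document{ID: "b"}) // a "bulk" save is just a multi-item Save
	s.Save(Document{ID: "c"})                    // a single save uses the same method
	fmt.Println(len(s.Get("a", "b", "c", "x"))) // 3
	fmt.Println(s.Delete("a", "x"))             // 1
}
```

One trade-off of this design: a single variadic method leaves open whether a multi-item call is transactional or best-effort, which a component specification would still need to state explicitly.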
Open questions:
Some relevant API references:
The MongoDB API looks like this:
Meanwhile, for RethinkDB this looks like:
Meanwhile, Cosmos DB has:
On the document creation and update front, the consensus seems to be:
POST /document/<id> // creates a document
PUT /document/<id> // replaces a document
PATCH /document/<id> // partially updates the document
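To make the POST/PUT/PATCH distinction above concrete, here is a sketch of those semantics against an in-memory map. It is an illustration only: the status codes, the rule that PUT and PATCH on a missing document return 404, and PATCH being a shallow top-level merge are all assumptions, as are the `apply` and `Doc` names; the proposal itself does not specify request details.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Doc is a hypothetical JSON-like document body.
type Doc map[string]any

// apply sketches POST (create), PUT (replace) and PATCH (partial update)
// on /document/<id> and returns an HTTP status code.
func apply(store map[string]Doc, method, path string, body Doc) int {
	const prefix = "/document/"
	if !strings.HasPrefix(path, prefix) {
		return http.StatusNotFound
	}
	id := strings.TrimPrefix(path, prefix)
	if id == "" || strings.Contains(id, "/") {
		return http.StatusBadRequest
	}
	existing, exists := store[id]
	switch method {
	case http.MethodPost: // create; refuse to overwrite an existing document
		if exists {
			return http.StatusConflict
		}
		store[id] = copyDoc(body)
		return http.StatusCreated
	case http.MethodPut: // replace the whole document
		if !exists {
			return http.StatusNotFound
		}
		store[id] = copyDoc(body)
		return http.StatusOK
	case http.MethodPatch: // shallow merge: only the given top-level fields change
		if !exists {
			return http.StatusNotFound
		}
		for k, v := range body {
			existing[k] = v
		}
		return http.StatusOK
	default:
		return http.StatusMethodNotAllowed
	}
}

// copyDoc makes a shallow copy so the store owns its own top-level map.
func copyDoc(d Doc) Doc {
	out := make(Doc, len(d))
	for k, v := range d {
		out[k] = v
	}
	return out
}

func main() {
	store := map[string]Doc{}
	fmt.Println(apply(store, "POST", "/document/42", Doc{"name": "a", "qty": 1})) // 201
	fmt.Println(apply(store, "POST", "/document/42", Doc{"name": "b"}))           // 409
	fmt.Println(apply(store, "PATCH", "/document/42", Doc{"qty": 2}))             // 200
	fmt.Println(apply(store, "PUT", "/document/42", Doc{"name": "c"}))            // 200
}
```

After the PATCH the document still has `name: a`; after the PUT it has only `name: c`. That difference is the whole reason for exposing both verbs.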
@fabistb, this proposal would also make more sense for most of our state store scenarios.
I was just listening to the discussion in community call 71, and I would vote that Dapr keeps its own query language abstraction. If some of the component implementations can then in turn use the MongoDB API to talk to multiple providers, fine, but I would not make it any kind of dependency.
@jjcollinge I do want to stress that we must not be bound by prior discussions and prior art with regard to the query API in state store. After all, we will focus on true document stores, as opposed to relational databases, which previously also had to be supported in state store by such an abstraction. The problem is that the common set of query operations supported by all document stores can be quite limited. For example, in Cosmos DB the full query capabilities are only available when restricting queries to a single partition; cross-partition queries (the current implementation in state store) support only a very limited filter set.
Understood. I was simply linking the two issues because, although this issue is concerned with a specialisation of the problem space, there is value in understanding the lineage of the existing constraints on the query API.
It would be cool if we could build index support into the query adapters for the various document stores. As mentioned above, Postgres GIN indexes require a slightly different WHERE condition syntax; it would be nice if the Postgres document store accepted some metadata to configure it to use the GIN syntax. I believe there is a similar text index/query pattern in MongoDB, so it may be good to handle that as well.
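As a rough illustration of the GIN point above: a GIN index on a whole `jsonb` column can serve the containment operator `@>`, whereas a `->>` text-extraction comparison is not served by that index (it would need a separate expression index). The sketch below assumes a `jsonb` column named `data`, a single top-level equality filter, and a hypothetical `useGinSyntax` metadata flag; none of these come from an existing component.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildEqualityFilter returns a WHERE fragment and its positional args for
// "field == value" against a jsonb column named data. useGinSyntax is a
// hypothetical component metadata flag.
func buildEqualityFilter(field string, value any, useGinSyntax bool) (string, []any, error) {
	if useGinSyntax {
		// Containment (@>) is an operator a GIN index on the jsonb column can serve.
		doc, err := json.Marshal(map[string]any{field: value})
		if err != nil {
			return "", nil, err
		}
		return "data @> $1::jsonb", []any{string(doc)}, nil
	}
	// ->> extracts the field as text and compares strings; a plain GIN index
	// on the whole column does not help here.
	return "data->>$1 = $2", []any{field, fmt.Sprint(value)}, nil
}

func main() {
	for _, gin := range []bool{false, true} {
		where, args, err := buildEqualityFilter("status", "active", gin)
		if err != nil {
			panic(err)
		}
		fmt.Println(where, args)
	}
}
```

Both fragments are fully parameterized, so the field name and value never get spliced into the SQL text. The text-comparison branch also shows a lossy side effect: the value is compared as a string, not as its original JSON type.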
Please note that the official proposal is in the proposals repo instead.
It's too early to consider that. The first components would likely be MongoDB and maybe Cosmos DB (depending on who does the work). There will eventually need to be a separate issue in the contrib repo to add a Postgres document store component (once the building block API exists). That will be the place to discuss the specific requirements for the various component implementations. In all likelihood, such functionality would be added in future iterations of those components and may not be available when a component is first added. It will also depend on who adds the component.
Hey guys! Just wondering what the current thinking is around Document Store? This thread seems to have been dead for a while. Is it still in the works?
+1 for seeing this become a priority. Users asking for a query interface comes up very frequently in the Discord community; there's no doubt in my mind that a Document Store building block would be successful.
@berndverst I noticed the work going on around improving the Cosmos Query API, good stuff. I am, however, concerned about the strong guidance you've expressed a few times to avoid the Query API. Do you know if there has been any discussion around the DocumentStore building block recently in light of the above community vote?
Proposal - New Building Block: DocumentStore
Background:
As one of the maintainers of components-contrib, I find it very apparent that some state stores and their respective use cases are unlike others. MongoDB, RethinkDB and some uses of PostgreSQL are examples here: they store data very differently from other state store components (in the case of PostgreSQL, this is optional).
A DocumentStore allows accessing individual nested properties of a document, and these properties can also be queried. Importantly, many document stores retain the data types of nested properties.
Interface
All DocumentStore components should have the following:
The Query and Get operations should support filtering (projecting) the attributes/properties returned. This is done natively by the DocumentStore where supported; otherwise, it must be done by the component implementation.
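For stores without native projection, the component-side fallback described above could look roughly like the sketch below. It assumes projections are expressed as dotted paths (e.g. `address.city`), that missing paths are silently skipped, and that values are copied by reference rather than deep-copied; the `project` function is hypothetical and not part of any proposed interface.

```go
package main

import (
	"fmt"
	"strings"
)

// project copies only the requested dotted paths (e.g. "address.city") from
// doc into a new document, preserving nesting and the original value types.
// Paths missing from doc are skipped; values are not deep-copied.
func project(doc map[string]any, paths []string) map[string]any {
	out := map[string]any{}
	for _, p := range paths {
		keys := strings.Split(p, ".")
		// Walk down the source document.
		var cur any = doc
		found := true
		for _, k := range keys {
			m, ok := cur.(map[string]any)
			if !ok {
				found = false
				break
			}
			if cur, ok = m[k]; !ok {
				found = false
				break
			}
		}
		if !found {
			continue
		}
		// Recreate the same nesting in the output document.
		dst := out
		for _, k := range keys[:len(keys)-1] {
			next, ok := dst[k].(map[string]any)
			if !ok {
				next = map[string]any{}
				dst[k] = next
			}
			dst = next
		}
		dst[keys[len(keys)-1]] = cur
	}
	return out
}

func main() {
	doc := map[string]any{
		"name":    "Ada",
		"age":     36,
		"address": map[string]any{"city": "London", "zip": "N1"},
	}
	fmt.Println(project(doc, []string{"name", "address.city", "missing.field"}))
	// map[address:map[city:London] name:Ada]
}
```

Because the projected value is copied as-is, an integer such as `age` stays an `int` in the result, in line with the goal of retaining data types on nested properties.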
Note: There is no intention to be compatible with data written / stored via state stores as this can lead to inefficient and complex design / implementation decisions as well as anti-patterns. However, the DocumentStore should be able to read data created by non-Dapr sources.
Content Type support requirements
application/bson: This is the default content type that all document stores must support, as it contains data type information.
application/json: This should be supported, but its use is generally discouraged, as it is a lossy format which, for example, cannot distinguish between integer and float data types.
As a consequence of this proposal:
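The lossiness of `application/json` noted in the content type requirements is easy to observe with Go's standard `encoding/json`: decoding into an untyped value turns every JSON number into `float64`, so an integer and a float are indistinguishable afterwards. BSON, by contrast, has distinct int32, int64 and double element types. The `numberTypes` helper below is purely illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numberTypes decodes a JSON object without a schema and reports the Go type
// each top-level value ends up with.
func numberTypes(raw []byte) (map[string]string, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	types := make(map[string]string, len(doc))
	for k, v := range doc {
		types[k] = fmt.Sprintf("%T", v)
	}
	return types, nil
}

func main() {
	types, err := numberTypes([]byte(`{"count": 3, "ratio": 3.5}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(types) // map[count:float64 ratio:float64]
}
```

`json.Decoder.UseNumber` can preserve the original number text as `json.Number`, but that is still a string representation; the integer/float type itself is not carried by the JSON payload.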
Potential REST API (request parameters and details not included here).