[Proposal] - New Building Block: DocumentStore #5146

Open
berndverst opened this issue Sep 8, 2022 · 27 comments

Comments

@berndverst
Member

berndverst commented Sep 8, 2022

Proposal - New Building Block: DocumentStore

Background:

As one of the maintainers of components-contrib, it is very apparent to me that some state stores and their respective use cases are unlike others. MongoDB, RethinkDB and some uses of PostgreSQL are examples: they store data very differently from other state store components (in the case of PostgreSQL this is optional).

A DocumentStore allows accessing individual nested properties of a document. These can also be queried. Importantly, data types are retained on the nested properties in many document stores.

Interface

All DocumentStore components should have the following:

  • Get Document (by key)
  • Multi Get Documents (by key)
  • Create Document
  • Replace Document
  • Delete Document
  • Multi Delete Documents
  • Query (Find) Documents: Query API support (native support for searching within documents)
  • Update Document: an HTTP PATCH operation which can be used to replace nested document attributes. This should support a query filter, since many document stores allow updating multiple documents matching a query.
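
To make the shape of this interface concrete, here is a minimal in-memory sketch of the key-based operations and Query. All names (`InMemoryDocumentStore`, `DocumentExistsError`, the method names) are hypothetical and not part of the proposal, and the choice that Create fails on an existing key while Replace overwrites unconditionally is an assumption. Update (PATCH) is left out of this sketch.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


class DocumentExistsError(Exception):
    """Raised when creating a document whose key is already taken."""


class InMemoryDocumentStore:
    """Hypothetical, illustration-only store: one dict of documents per collection."""

    def __init__(self) -> None:
        self._collections: Dict[str, Dict[str, Document]] = {}

    def _coll(self, collection: str) -> Dict[str, Document]:
        return self._collections.setdefault(collection, {})

    def create(self, collection: str, key: str, doc: Document) -> None:
        coll = self._coll(collection)
        if key in coll:
            raise DocumentExistsError(key)
        coll[key] = copy.deepcopy(doc)

    def replace(self, collection: str, key: str, doc: Document) -> None:
        # Replace swaps the whole document; it does not merge with the old one.
        self._coll(collection)[key] = copy.deepcopy(doc)

    def get(self, collection: str, key: str) -> Optional[Document]:
        doc = self._coll(collection).get(key)
        return copy.deepcopy(doc) if doc is not None else None

    def multi_get(self, collection: str, keys: List[str]) -> Dict[str, Document]:
        # Keys that do not exist are simply absent from the result.
        coll = self._coll(collection)
        return {k: copy.deepcopy(coll[k]) for k in keys if k in coll}

    def delete(self, collection: str, key: str) -> bool:
        return self._coll(collection).pop(key, None) is not None

    def multi_delete(self, collection: str, keys: List[str]) -> int:
        # Returns how many of the given keys actually existed.
        return sum(1 for k in keys if self.delete(collection, k))

    def query(self, collection: str, predicate: Callable[[Document], bool]) -> List[Document]:
        # A real store would run the query natively; a Python predicate stands in here.
        return [copy.deepcopy(d) for d in self._coll(collection).values() if predicate(d)]
```

Documents are deep-copied on the way in and out so that callers cannot mutate stored state by accident, which mirrors what a network round trip would give you.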

The Query and Get operations should support projection, that is, filtering of the attributes/properties returned. The DocumentStore does this natively where supported; where it is not natively supported, the component implementation has to do it.
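
For stores without native projection, the component-side fallback could look roughly like the sketch below. The function name `project` and the dotted-path notation (`"customer.name"`) are assumptions for illustration; the proposal does not define how projected attributes are addressed.

```python
import copy
from typing import Any, Dict, Iterable


def project(doc: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Return a new document holding only the given dotted paths."""
    result: Dict[str, Any] = {}
    for path in paths:
        parts = path.split(".")
        src: Any = doc
        # Walk down the source document; skip paths that do not exist.
        for part in parts:
            if not isinstance(src, dict) or part not in src:
                break
            src = src[part]
        else:
            # Rebuild the same nesting in the result and copy the value over.
            dst = result
            for part in parts[:-1]:
                dst = dst.setdefault(part, {})
            dst[parts[-1]] = copy.deepcopy(src)
    return result
```

Paths that are missing from a document are silently skipped here; a real component might prefer to return an explicit null instead.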

Note: There is no intention to be compatible with data written / stored via state stores as this can lead to inefficient and complex design / implementation decisions as well as anti-patterns. However, the DocumentStore should be able to read data created by non-Dapr sources.

Content Type support requirements

  • BSON (application/bson): This is the default content-type that all document stores must support as it contains data type information.
  • JSON (application/json): This should be supported, but its use is generally discouraged, as it is a lossy format which, for example, cannot distinguish between integer and float data types.
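
A small illustration of the lossiness mentioned above, using only Python's standard `json` module. The payload and field names are made up. BSON, by contrast, has distinct type tags for 32-bit integers, 64-bit integers, doubles and UTC datetimes.

```python
import json
from datetime import datetime, timezone

# A float with no fractional part is commonly written as "10" by JSON encoders
# in other languages (for example JavaScript's JSON.stringify).
wire = '{"price": 10}'
decoded = json.loads(wire)
# The reader cannot tell whether the writer meant an integer or a float.
price_type = type(decoded["price"]).__name__

# Dates have no JSON representation, so they must be flattened to strings.
created = datetime(2022, 9, 8, tzinfo=timezone.utc)
round_tripped = json.loads(json.dumps({"created": created.isoformat()}))
created_type = type(round_tripped["created"]).__name__

print(price_type, created_type)  # prints: int str
```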

As a consequence of this proposal:

  • The Query API (Alpha) should eventually be deprecated from State Store (it can coexist until DocumentStores are stable).
  • MongoDB, RethinkDB, PostgreSQL, AWS DocumentDB, Azure CosmosDB, and possibly others should also be made available as DocumentStores.
  • SDKs need to implement support for this new building block and support BSON encoding/decoding.

Potential REST API (request parameters and details not included here).

Create document:
POST http://localhost:<daprPort>/v1.0/document/<storename>/<collection>

Replace Document:
PUT http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>

Update Document:
PATCH http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>

Get Document by ID:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>

Get Multiple Documents by ID:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>

Delete Document by ID:
DELETE http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>

Delete Multiple Documents by ID:
DELETE http://localhost:<daprPort>/v1.0/document/<storename>/<collection>

Query Documents:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/query
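
The routes above can be summarized as a small table of method and path suffix. The sketch below only builds the method and URL; the function name `build_request`, the operation names and the example port and store names are assumptions, and request bodies and parameters are left out because the proposal does not specify them.

```python
from typing import Optional, Tuple
from urllib.parse import quote

BASE = "http://localhost:{port}/v1.0/document/{store}/{collection}"

# HTTP method and path suffix for each proposed operation.
OPERATIONS = {
    "create": ("POST", ""),
    "replace": ("PUT", "/{docid}"),
    "update": ("PATCH", "/{docid}"),
    "get": ("GET", "/{docid}"),
    "multi_get": ("GET", ""),
    "delete": ("DELETE", "/{docid}"),
    "multi_delete": ("DELETE", ""),
    "query": ("GET", "/query"),
}


def build_request(op: str, port: int, store: str, collection: str,
                  docid: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, URL) for one of the proposed operations."""
    method, suffix = OPERATIONS[op]
    needs_id = "{docid}" in suffix
    if needs_id and docid is None:
        raise ValueError(f"{op} requires a document id")
    url = (BASE + suffix).format(
        port=port,
        store=quote(store, safe=""),
        collection=quote(collection, safe=""),
        docid=quote(docid, safe="") if needs_id else "",
    )
    return method, url
```

One thing the table makes visible: as written, `GET .../<collection>/query` and `GET .../<collection>/<docid>` share a shape, so a document whose ID is literally `query` would need special handling.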

@yaron2 changed the title from "New Building Block Type: DocumentStore" to "[Proposal] - New Building Block: DocumentStore" on Sep 8, 2022
@yaron2
Member

yaron2 commented Sep 8, 2022

I think this is a good direction as it brings focus and expands Dapr's state management features:

  1. State for K/V - existing API
  2. State for Document Stores - Document API
  3. State for SQL DBs - Database API proposal (in discussion phase)

The Query API (Alpha) should be deprecated from State Store

The Query API allows querying Redis, which is a very useful feature being used today. Will this ability be retained by having Redis as a document store?

@berndverst
Member Author

I think this is a good direction as it brings focus and expands Dapr's state management features:

  1. State for K/V - existing API
  2. State for Document Stores - Document API
  3. State for SQL DBs - Database API proposal (in discussion phase)

The Query API (Alpha) should be deprecated from State Store

The Query API allows querying Redis, which is a very useful feature being used today. Will this ability be retained by having Redis as a document store?

Redis with RedisJSON is a document store -- so I think we could add that particular flavor as a supported DocumentStore @yaron2

@olitomlinson

This is pretty cool!

Are there any reasons that would preclude AWS DocumentDB and Azure Cosmos DB from being included in this? Not saying they should be in for day 1, just curious.


Also, just to be devil's advocate here, may I ask the question:

Are we absolutely sure this isn't a new capability which is built on-top of the existing State Management building block?

I'm only asking because we went through a similar discussion about the new blob/streaming API capabilities, where consensus eventually pointed towards delivering the capability via the existing State Management building block rather than introducing a new building block.

@yaron2
Member

yaron2 commented Sep 9, 2022

Are there any reasons that would preclude AWS DocumentDB and Azure Cosmos DB from being included in this? Not saying they should be in for day 1, just curious.

No reason, these are perfectly valid document stores.

Are we absolutely sure this isn't a new capability which is built on-top of the existing State Management building block?

That was my concern as well on first reading this, but unlike blobs, which fit existing k/v semantics in terms of API, the document store interface has enough distinct, domain-specific endpoints/methods to justify a new API.
As far as users are concerned, there won't be much difference between adding the endpoints to the existing API and creating a new one, save for a new component type - but even that I think goes much in the direction of clarity as users won't need to look at yet another column in the state components table to see which supports document operations or not.

@berndverst
Member Author

berndverst commented Sep 9, 2022

This is pretty cool!

Are there any reasons that would preclude AWS DocumentDB and Azure Cosmos DB from being included in this? Not saying they should be in for day 1, just curious.

I'm lazy and didn't feel like listing every component. Yes those components you mentioned also should be supported DocumentStores. Couldn't think of them in the moment.

Also, just to be devil's advocate here, may I ask the question:

Are we absolutely sure this isn't a new capability which is built on-top of the existing State Management building block?

I'm only asking because we went through a similar discussion about the new blob/streaming API capabilities, where consensus eventually pointed towards delivering the capability via the existing State Management building block rather than introducing a new building block.

It's not a good experience when only a small subset of state store components have a certain feature. As @yaron2 mentioned, the feature matrix gets too complex.

What is worse, it is incredibly confusing when querying of documents is completely dependent on the content-type chosen when saving state. This way it is currently possible to save state with MongoDB without actually being able to query the state.

DocumentStores should only support content types that are guaranteed to be queryable.

Many DocumentStores have additional capabilities to update multiple documents at once (all that match a certain query / condition), only replacing certain sub properties - that capability for example would not make sense within the context of the current state stores.
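
As an illustration of that capability, the sketch below applies a partial update to every document matching a condition, touching only the named sub-properties. It operates on plain Python dicts; `set_path` and `update_many` are hypothetical helpers, and the dotted-path notation is an assumption. The behavior is loosely comparable to MongoDB's `updateMany` with `$set`.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def set_path(doc: Document, path: str, value: Any) -> None:
    """Set a nested property addressed by a dotted path, creating parents as needed."""
    parts = path.split(".")
    target = doc
    for part in parts[:-1]:
        child = target.get(part)
        if not isinstance(child, dict):
            child = target[part] = {}
        target = child
    target[parts[-1]] = value


def update_many(docs: List[Document],
                matches: Callable[[Document], bool],
                changes: Dict[str, Any]) -> int:
    """Apply changes in place to every matching document; return how many were updated."""
    updated = 0
    for doc in docs:
        if matches(doc):
            for path, value in changes.items():
                set_path(doc, path, value)
            updated += 1
    return updated
```

Sibling properties of an updated path are left alone, which is exactly what a whole-value save in a key/value state store cannot express.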

The fact that the Query API is currently part of the State Store interface has led to some hacky attempts at implementing Query API support for state stores which do not truly support it. We should provide a more consistent experience for querying documents, and we should not let a user write data to a DocumentStore which subsequently cannot be queried.

@berndverst
Member Author

Also note:
It is not a goal of DocumentStore to be compatible with data written by State Stores. DocumentStore will decide how to save data in the manner most appropriate for querying, partial updates, possibly bulk updates, etc.

@yaron2 yaron2 added this to the v1.10 milestone Oct 17, 2022
@yaron2
Member

yaron2 commented Oct 17, 2022

Overall LGTM.

You mention Delete documents (as in plural), which I assume is Bulk delete documents? If that's the case, should we add a Delete document operation?

It is not a goal of DocumentStore to be compatible with data written by State Stores. DocumentStore will decide how to save data in the manner most appropriate for querying, partial updates, possibly bulk updates, etc.

Do you reckon it should be a goal to be compatible with data written by non-Dapr users? (I understand the need to not be compatible with data written by State Stores).

@berndverst
Member Author

Yes, data written by non-Dapr users in general should be compatible. We should entirely shy away from specialized data representation @yaron2.

So in that sense state store data can be accessed too but you would need to understand the internals of Dapr state store to do that.

@yaron2
Member

yaron2 commented Oct 18, 2022

So in that sense state store data can be accessed too but you would need to understand the internals of Dapr state store to do that.

I am fine stating that State Store data and DocumentStore data are not compatible. It prevents crossing the streams and, more importantly, allows them to grow independently where otherwise this coupling would limit and entangle them and force compliance checks continuously.

I would like to avoid non-deterministic results and thus state upfront that users should not mix State Store data with DocumentStore features.

Also, making sure you haven't missed my question about document deletion above.

@berndverst
Member Author

Added Bulk Save and Bulk Delete to be clearer.

I need to look a bit more at common APIs. We could require Save, Get, Delete etc to support one or more documents. Not sure that these really need to be distinct APIs. They could all support one or more items.

@berndverst
Member Author

berndverst commented Oct 19, 2022

Open questions:

  • Query API: Should we reuse the generic queries we currently support in state store? Should we also enable the capability to run native queries (for those who need that) with the caveat that such queries cannot be run across different components?
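
To illustrate what reusing a generic query could mean in practice, here is a minimal sketch that translates a filter tree in the style of the state store Query API (`EQ`, `IN`, `AND`, `OR`) into a MongoDB-style filter document. The function names are hypothetical, only these four operators are handled, and this is not how any existing component does it.

```python
from typing import Any, Dict, Tuple


def _single_pair(mapping: Dict[str, Any]) -> Tuple[str, Any]:
    """Return the only (key, value) pair of a one-entry dict."""
    if len(mapping) != 1:
        raise ValueError("expected exactly one entry")
    return next(iter(mapping.items()))


def to_mongo_filter(node: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a generic filter node into a MongoDB-style filter document."""
    op, arg = _single_pair(node)
    if op == "EQ":
        field, value = _single_pair(arg)
        return {field: value}
    if op == "IN":
        field, values = _single_pair(arg)
        return {field: {"$in": list(values)}}
    if op in ("AND", "OR"):
        # AND/OR take a list of child filter nodes.
        return {"$" + op.lower(): [to_mongo_filter(child) for child in arg]}
    raise ValueError(f"unsupported operator: {op}")
```

A native query option would bypass this translation entirely and hand the store's own filter syntax straight through, which is why such queries would not be portable across components.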

@berndverst
Member Author

Some relevant API references:
RethinkDB: https://rethinkdb.com/api/javascript/get
MongoDB (also AWS DocumentDB, Azure Cosmos DB): https://pkg.go.dev/go.mongodb.org/mongo-driver@v1.10.3/mongo
Azure Cosmos DB (SQL): https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/data/azcosmos

@berndverst
Member Author

berndverst commented Oct 19, 2022

Mongo APIs look like this:

db.collection.find() // returns everything in collection
db.collection.find(query) // returns everything in collection matching query
db.collection.find(query, projection) // returns everything in collection matching query and returns only projected attributes

Meanwhile for RethinkDB this looks like:

- get(key) // gets a document by primary key
- getAll([key1, key2,...]) // gets multiple docs by primary key
- filter(query) // queries for all documents matching the query condition

Meanwhile Cosmos DB has:
Get Document (by collection and key)
List Documents (by collection)
Query Documents (note querying across multiple partitions is only supported with very simple queries)

--

On the document creation and update front it seems the consensus is:

POST /document/<id> // creates a document
PUT /document/<id> // replaces a document
PATCH /document/<id> // partially updates the document

@KaiWalter
Contributor

@fabistb this proposal would also make more sense for most of our state store scenarios

@KaiWalter
Contributor

I was just listening to the discussion in the community call 71 and I would vote that Dapr keeps its own query language abstraction. If some of the component manifestations then in turn can use MongoDB API to talk to multiple providers - fine - but I would not make it any kind of a dependency.

@berndverst
Member Author

@jjcollinge I do want to stress that we must not be burned by prior discussions and art with regards to query API in state store. After all we will focus on true document stores as opposed to relational databases which also previously had to be supported in state store by such an abstraction.

The problem is that the common set of query operations supported by all document stores can be quite limited.

For example, in Cosmos DB the full query capabilities are only available when restricting queries to a single partition. Cross-partition queries (this is the current implementation in state store) only support a very limited filter set.

@jjcollinge
Contributor

jjcollinge commented Oct 24, 2022

Understood, I was simply linking the 2 issues because although this issue is concerned with a specialisation of the problem space - there is value in understanding the lineage of the existing constraints on the query API.

@dapr-bot
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Jan 20, 2023
@artursouza artursouza removed the stale Issues and PRs without response label Jan 23, 2023
@joshuadmatthews

Would be cool if we could build index support into the query adapters for the various document stores. As mentioned above, Postgres GIN indexes require a slightly different WHERE condition syntax; it would be nice if the Postgres document store accepted some metadata to configure it to use the GIN syntax. I believe there is a similar text index/query pattern in MongoDB; it may be good to handle that as well.
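
To make the syntax difference concrete: a GIN index on a `jsonb` column (default operator class) can serve the containment operator `@>`, but not an equality comparison on a value extracted with `#>>`. The sketch below generates either form of WHERE fragment. The column name `value`, the `use_containment` flag (standing in for the suggested metadata option) and the `%s` placeholder style are assumptions, and path segments are assumed to be plain identifiers.

```python
import json
from typing import Any, Dict, List, Tuple


def nest(path: str, value: Any) -> Dict[str, Any]:
    """Turn "a.b" and 1 into {"a": {"b": 1}}."""
    doc: Any = value
    for part in reversed(path.split(".")):
        doc = {part: doc}
    return doc


def eq_condition(path: str, value: Any, use_containment: bool) -> Tuple[str, List[str]]:
    """Build a WHERE fragment and its parameters for an equality filter on a jsonb column."""
    if use_containment:
        # jsonb containment, which a GIN index on the column can serve.
        return "value @> %s::jsonb", [json.dumps(nest(path, value))]
    # Path extraction compared as text; #>> returns strings unquoted.
    text = value if isinstance(value, str) else json.dumps(value)
    return "value #>> %s::text[] = %s", ["{" + ",".join(path.split(".")) + "}", text]
```

Both fragments express the same equality test; only the first is shaped so that the planner can use the GIN index.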

@berndverst
Member Author

Please note that the official proposal is in the proposals repo instead.

@berndverst
Member Author

berndverst commented Feb 23, 2023

Would be cool if we could build index support into the query adapters for the various document stores. As mentioned above, Postgres GIN indexes require a slightly different WHERE condition syntax; it would be nice if the Postgres document store accepted some metadata to configure it to use the GIN syntax. I believe there is a similar text index/query pattern in MongoDB; it may be good to handle that as well.

It's too early to consider that. Likely the first components would be MongoDB and maybe CosmosDB (depends on who does the work). There will eventually need to be a separate issue in the contrib repo to add a Postgres document store component (once the building block API exists). That will be the place to talk about the specific requirements for the various component implementations. But in all likelihood such features would be added in future iterations of those components, and may not be available when a component is first added. It will also depend on who adds the component.
@joshuadmatthews

@joshuadmatthews

Hey guys! Just wondering what is the current thinking around Document Store? Seems this thread has been dead for a while. Is it still in the works?

@olitomlinson

olitomlinson commented Oct 5, 2023

+1 for seeing this become a priority.

Users wanting a query interface comes up very frequently on the Discord community; there's no doubt in my mind that a Document Store building block would be successful.

@olitomlinson

Well… the community has spoken! I think this looks like a clear mandate that this proposal needs revisiting and executing!

[Image attachment: IMG_7052]

@joshuadmatthews

joshuadmatthews commented Nov 21, 2023

@berndverst I noticed the work going on around improving the Cosmos Query API, good stuff. I am however concerned about the strong guidance you've expressed a few times to avoid the Query API. Do you know if there has been any discussion around the DocumentStore building block recently in light of the above community vote?
