Flexible Document Metadata #1337
I'm generalizing this a bit. Usages:
**How does custom Document Metadata work?**

I think the main usage we've heard so far has been for traceability, a la something like the OpenTelemetry standard. So tracking correlation identifiers for the document saves, as well as the active user. I think these are the open questions:
This is very closely related to #780 for the event metadata, and I think there's going to be some shared functionality here. So let's talk about the mechanics. This is a brainstorm, with some of my own opinions called out:

**Open-Ended w/ Dictionary<string, object>**

The obvious "open-ended" way would be to persist a JSON serialized `Dictionary<string, object>`. I'm a little concerned about the performance of this approach, because it forces you to do an extra serialization step within the unit of work operations.

**Strong-Typed Metadata Objects**

You could have a user configure a strong-typed metadata class of their own.
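To make the two options above concrete, here is a rough sketch of the shapes being compared. All of the names are hypothetical, not an actual Marten API:

```csharp
// Option 1: an open-ended bag, JSON-serialized per document/event.
// Flexible, but incurs an extra serialization step on every save.
var metadata = new Dictionary<string, object>
{
    ["correlation-id"] = Guid.NewGuid(),
    ["user"] = "some.user"
};

// Option 2: a user-defined, strong-typed metadata class that the
// store would be configured to map to columns or a single JSON field.
public class TracingMetadata
{
    public Guid CorrelationId { get; set; }
    public string CurrentUser { get; set; }
}
```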
And this leads us to some mapping concerns. Is this stored as JSON serialized in each document/event record in one field? Do we split it out into separate fields? More on this later.

**Capturing Metadata on IDocumentSession**

I think in this case you could capture the metadata by one of these methods on `IDocumentSession`:
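A few ways the session could capture metadata, sketched below. These signatures are purely hypothetical, illustrating the alternatives under discussion rather than any existing Marten API:

```csharp
// 1. Set properties on the session itself, once, up front
session.Metadata.CorrelationId = correlationId;
session.Metadata.CurrentUser = "some.user";

// 2. Pass a metadata object when the session is opened
using var session = store.OpenSession(
    new TracingMetadata { CorrelationId = correlationId });

// 3. Pass metadata into SaveChanges() at commit time
session.SaveChanges(
    new TracingMetadata { CorrelationId = correlationId });
```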
I'm not enthusiastic about having folks set the metadata as part of individual `Store()`/`Insert()` calls.

**Storing the Metadata**

I see three options:
**Configuring the DocumentStore**
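Whatever storage option wins, the entry point would presumably be the usual `DocumentStore.For()` configuration. A hypothetical sketch of what registering metadata might look like (the `UseMetadata`/`DisableInformationalMetadata` names are illustrative, not real Marten API):

```csharp
var store = DocumentStore.For(opts =>
{
    opts.Connection("connection string");

    // hypothetical: register a strong-typed metadata class
    opts.UseMetadata<TracingMetadata>();

    // hypothetical: opt a document type out of the built-in
    // informational metadata columns entirely
    opts.Schema.For<User>().DisableInformationalMetadata();
});
```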
My take on this is really from the viewpoint of #1557 - I personally have little use for document metadata, and above all would like the option not to bloat my table storage with it when it isn't needed. Having said that, I can understand the use cases; for a simple example, a "last modified" timestamp is a really common thing to have, and you might want to use Marten's built-in support for that.

This is a hard problem, and I'm not sure you're going to find a method that suits everyone 😆, but I'll weigh in from the perspective of a document store user:

> How does custom Document Metadata work?

> Storing the Metadata

I'm torn between (2) and (3)
Yes! This would make the mapping of CurrentUser / CorrelationId, etc. in a DocumentSessionListener much easier
The mt_lastmodified column currently has no technical reason to exist, so it can be removed; if people need it, they can just define the property on the entity. The mt_version and mt_is_deleted* columns are needed by Marten, so I would leave them as is. To minimize the JSON (de-)serialization, I would suggest not adding a separate column to store the metadata. If people need additional metadata to be stored, just define the properties as part of the entity. The LINQ parser will handle the queries if people need to query on those fields. Fewer features, less work to maintain 😃
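In other words, the timestamp would just live on the document and be filtered like any other member. A sketch of that approach, using Marten's ordinary LINQ querying (the `Order` type and its members are made up for illustration):

```csharp
public class Order
{
    public Guid Id { get; set; }

    // a plain entity property instead of a dedicated metadata column
    public DateTimeOffset LastModified { get; set; }
}

// Marten's LINQ provider handles document members directly, so no
// special metadata plumbing is needed to query on the field:
var recentlyChanged = session.Query<Order>()
    .Where(x => x.LastModified > DateTimeOffset.UtcNow.AddDays(-1))
    .ToList();
```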
@cocowalla We've committed to "opt out" of all the document metadata in this issue too, so you're covered.
From my perspective, metadata should definitely be opt-out functionality (or the user could even decide which set of fields to use).

I understand the query performance perspective, but I also don't see why it should be a key factor - from my perspective, querying based on metadata shouldn't be the common case. It should probably be used more for diagnostics, debugging, etc., where it's usually fine to have higher latency than the regular path. E.g., what business logic would be querying by the correlation id? For sure fields like created, version, etc. have use cases (like time travelling in Event Sourcing).

I'm not a huge fan of joins with a separate table. I think that would make things more complex for the advanced scenarios we plan to add in the future, like using Postgres partitioning. But I agree, it's an option.

I agree with the storage size perspective - we definitely need to keep an eye on that, e.g. by adding a feature toggle. Maybe we could find some middle ground by selecting a set of columns that will be common (like correlation id, created, version, etc.) and putting those into columns, then allowing the user to define their own set of data and store it in a separate JSON column?

I think that giving optional flexibility is a must-have, as we cannot cover all cases (and probably don't want to cover them). E.g., I talked with @jacobpovar and they needed to write their own Aggregator and other internals to support conflict resolution for event streams in a multi-master environment. I'm not sure though if that wouldn't be too complex to implement.

Also, what about the possibility of automatically applying the metadata to the event or document class? That would be extremely useful (e.g. for version and timestamp columns in event sourcing). @jeremydmiller thoughts?
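The "middle ground" proposed above could be expressed as a per-field toggle in configuration. A hypothetical sketch of that shape (none of these names are real Marten API; they only illustrate the idea of common fields as columns plus an open-ended JSON column):

```csharp
var store = DocumentStore.For(opts =>
{
    opts.Connection("connection string");

    // hypothetical per-field metadata toggles
    opts.Events.Metadata(m =>
    {
        m.CorrelationId.Enabled = true; // promoted to its own column
        m.Timestamp.Enabled = true;     // promoted to its own column
        m.Headers.Enabled = true;       // user-defined extras in one JSON column
    });
});
```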
Oh, definitely go all the way on metadata: either they're all in separate columns or all in a JSON serialized field, but not a mixed set. We've already agreed to make the existing metadata fields opt out for storage. Any new metadata fields would be "opt in".

"what about the possibility to automatically apply the metadata into the event or document class?" -- that's relatively easy if it's a public property/field. A tiny bit more work if we have to support private/internal setters.

I'm not very worried about query performance against metadata, because as you said, it's probably going to be rare. Insert time though, that matters, and that's why I don't like the JSON serialized header idea.

The bigger question here was how the metadata is captured. Any thoughts on some of the alternatives?
Fair point about insert performance. It's extremely important - especially for the event store. I think it'd be worth doing a PoC on those approaches to see how big the impact is and what improvements we can make there. What do you think?

Regarding capturing - I believe the minimum we have to do is injecting the metadata at session initialization - and this option should probably always be available even if we select other options, as that's probably the most convenient way to inject telemetry-related params (since they're usually injected per scope). I'm also not a huge fan of injecting them in SaveChanges; I'd prefer to allow passing them in the Insert, Store, AppendEvent methods - as then it'd be more explicit what we're trying to do, and it would still let some custom business logic result be injected. I think we can start with injecting them during session initialization, gather feedback, and then extend it later if needed.

P.S. I know this might be out of scope, but I'd also like to add the possibility of adding a stream type column to the events table - that can be useful for the outbox pattern and routing (e.g. to Kafka topics). We could also consider using the same or a similar mechanism for that.
**Tasking**

Refactorings
Goals
Nice to dos
Dev Tasks
@cocowalla I've got the ability to omit the informational metadata columns working in a local branch. It's document type by document type so far, but we'll add some policy support soon too. |
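A sketch of what that per-document-type opt-out might look like in configuration (the `Metadata()` helper and `Enabled` flags here are illustrative of the described feature, not confirmed API from the branch):

```csharp
var store = DocumentStore.For(opts =>
{
    opts.Connection("connection string");

    // hypothetical: disable informational metadata columns
    // for this one document type
    opts.Schema.For<User>().Metadata(m =>
    {
        m.LastModified.Enabled = false;
        m.Version.Enabled = false;
    });
});
```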
@jeremydmiller 🎉 great to hear, thanks for the update!
Currently it's not possible to automatically map event metadata like Version, Timestamp, StreamId, etc. to fields on the aggregate.
Because of that, it's not easy to implement an elegant Aggregate + Repository implementation without reflection or compromises like a mutable Version field or data duplication.
I think we should give the user the possibility to map those fields automatically from metadata.
See more in discussion on the Event Sourced Aggregate+Repository sample: #1299
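A sketch of what such automatic mapping might look like. The `MapMetadata` registration below is entirely hypothetical; it only illustrates wiring stream metadata to private-settable members so the aggregate itself can stay immutable:

```csharp
public class Invoice
{
    public Guid Id { get; private set; }

    // members the store would populate from event/stream metadata,
    // instead of the aggregate tracking them mutably itself
    public long Version { get; private set; }
    public DateTimeOffset LastChanged { get; private set; }
}

// hypothetical metadata-to-member mapping registration
opts.Schema.For<Invoice>()
    .MapMetadata(x => x.Version, m => m.StreamVersion)
    .MapMetadata(x => x.LastChanged, m => m.Timestamp);
```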