Event Metadata #780

jeremydmiller · 2017-05-30T19:16:35Z

Had a couple folks ask lately if there was any way to embed metadata into the event store. The usual examples are things like customer id or region names.

We could handle this by saying "just use base classes for the common information." From there we'd possibly allow some aggregated querying against the event store by the base type.

Other ideas:

Support an extra HStore column for user supplied key/value header information
Support something like duplicated fields for the mt_events and/or mt_streams table

The question though becomes how do we expose that information and allow users to search on it? Sticking a Dictionary<string, object> property on Event is no big deal.

I guess my question is, what would you use this for? Where would you want to consume this information?

Also really unsure how we'd go about capturing this information in the IEventStore API

wastaz · 2017-06-09T06:37:54Z

For me personally I would like to use event metadata for the following cases:

Store command id of the command responsible for causing the event.
Store event "tags" in order to be able to read up partial streams.

In both of these cases the only thing I would really require is an api similar to
EventStore.GetEventsMatchingMetadata(stream_id, metadata_key, metadata_value)
This API is of course quite ugly and I'm in no way proposing that this should be the API. But if I'm looking at the tiniest thing I would need on the query side in order to solve my current use cases, then this would be it.

I'm always a bit wary of introducing Dictionary<string, object> as an official API. Maybe it could be possible to add a strongly typed event metadata object?

I'm thinking maybe something like

class AccountOpenedMetadata {
    [Duplicate]
    public Guid CustomerId { get; set; }
    public string Tag { get; set; }
    public Guid CommandId { get; set; }
}
class AccountOpened {
    public Guid AccountId { get; set; }
    public DateTime OpenedAt { get; set; }
}

session.Events.AppendEvent(new AccountOpened(), new AccountOpenedMetadata());

At this point maybe we could wrap the AccountOpenedMetadata in a wrapper class that adds a link back to the original event, and store the metadata as a normal Marten document (which could make attributes such as Duplicate work as well). Then we could query the metadata objects like normal Marten docs?

Not at all sure if this is a good idea, or how doable it would be, but I figured I'd throw the idea out there and see what you think.

jeffdoolittle · 2017-06-09T14:18:15Z

@wastaz perhaps I'm misunderstanding but it sounds like everything you described is already doable in Marten. If you want to store changes to events as well as a "metadata" document, you can do that in one DocumentSession and accomplish what you describe. Sure there isn't an official method for this on DocumentSession but it would be pretty easy to spin up your own general use abstraction for this.

Am I understanding you or am I missing something?

wastaz · 2017-06-12T06:25:34Z

@jeffdoolittle Well, yes and no. It's certainly possible today, but there's a bit of heavy lifting involved and it's not really possible to get it done in "one query" without writing direct SQL queries against the tables. Basically my point is that if we are considering event metadata as a feature and we already have the docstore stuff in Marten, then why not try to use it as much as possible? :)

jeffdoolittle · 2017-06-12T18:03:16Z

I don't recall off the top of my head, but can you do a batch query that includes documents and events? Might help if your concern is trips to the database.

To be clear, I'm not opposed to Event Metadata, just trying to understand how people would want to use it so we can come up with a good abstraction and api for it that will stand the test of time.

aprooks · 2018-03-30T15:40:02Z

I'm evaluating Marten and stumbled on this issue almost instantly. Metadata is an important part of a "stream-entry". It's not event data, but infrastructure concerns that are written alongside an event.

Some of the metadata fields I usually use are: user id and scopes, correlation/causation ids (especially useful when debugging process managers). Your aggregate might have zero knowledge of these, but they should be written alongside the business data, i.e. the event.

Seems like it can be achieved by using data envelope like:

class Envelope<T> {
    public T Data { get; set; }
    public Dictionary<string, string> Metadata { get; set; }
}

But it feels like it'll make code, especially aggregates, uglier.
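
To make the concern concrete, here is a minimal sketch of what an aggregate looks like once events arrive wrapped in an envelope. Everything in it is an assumption for illustration: `AccountOpened` is a stand-in event, `Account` and its `Apply` method are hypothetical, and none of it is Marten's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical envelope: business data plus infrastructure metadata.
public class Envelope<T>
{
    public T Data { get; set; }
    public Dictionary<string, string> Metadata { get; set; } = new Dictionary<string, string>();
}

// Stand-in event type used only for this sketch.
public class AccountOpened
{
    public Guid AccountId { get; set; }
}

// Hypothetical aggregate.
public class Account
{
    public Guid Id { get; private set; }

    // The aggregate has to accept the wrapper and unwrap it,
    // even though it never looks at the metadata.
    public void Apply(Envelope<AccountOpened> envelope)
    {
        Id = envelope.Data.AccountId;
    }
}
```

Every `Apply` method ends up mentioning `Envelope<T>` and reaching through `.Data`, which is the extra noise in the aggregate.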

eouw0o83hf · 2018-08-13T15:09:59Z

Also evaluating Marten and lack of metadata capability is a big deal. Other frameworks provide this readily, and an enterprise event store needs to be able to handle metadata/headers.

@aprooks, I actually pursued that route, and there's a deserialization bug lurking within Marten that renders it infeasible for now. See #1069 for details.

jeremydmiller · 2018-08-17T16:32:58Z

@eouw0o83hf This got left out of Marten 2.0 because it's a lot of work and there wasn't much demand for it at the time. Since it is an awful lot of work to support this and there's very little concrete definition about what it means or how it'd be used, can you add some concrete examples of what you want here? And as I commented in #1069, you could easily effect this yourself with a base class

eouw0o83hf · 2018-08-17T18:21:42Z

@jeremydmiller definitely doable with a base class (and absolutely understand that "not enough of a priority" reasoning), but I've used NEventStore in the past and the ability to persist headers gave a lot of extra strength to an event-sourcing framework when called upon as an audit log (for either security/"who did this" purposes or debugging/"what did the user do" purposes).

Top few things I've tossed into headers/metadata that were super useful:

ExecutingUserId: Who performed the action which resulted in this event?
AuthenticatedUserId: In a system which supports impersonation, what administrator took this action in the role of another user?
RequestId: Unique identifier for the web request which resulted in this event, useful for cross-correlating with Splunk/logging for debugging
CorrelationId: For a distributed transaction or chain of business events, what do we need to tie this event to (again useful for debugging logs)

eouw0o83hf · 2018-08-17T18:23:03Z

Oh and also in direct response to the original question, I think a straightforward Dictionary<string, object> property on Event would solve this simply and beautifully.

bklooste · 2018-12-07T00:43:11Z

I would recommend a byte[] outside of the event, like Event Store does. There are many cases where you want to read all the events and get the metadata, but not deserialize all the bodies.

jeremydmiller · 2018-12-07T00:49:54Z

@bklooste You can do that to your heart's content with SQL as is

bklooste · 2018-12-11T23:05:40Z

@jeremydmiller
I had a look, and it's not trivial to add a metadata column to the events table to do this, as all the other parts of the lib are not aware of it, not to mention schema generation. I suppose you could build up a lookup table.

Note the use case of category or aggregate streams, which may be > 100M events, and you don't want to deserialize them all.

wclr · 2019-05-19T19:19:01Z

What is the actual difference between "data" and "metadata" in this case? If by metadata you mean something specific to a particular event type, why not put it in "data" (payload)? Where is the actual boundary between data and metadata? How should one determine which specifics deserve to go to metadata vs. event data?

aprooks · 2019-05-19T21:06:50Z

"If by metadata you mean something specific to a particular event type, why not put it in "data" (payload)?"

Metadata is application/protocol specific. Imagine HTTP headers: you might have a required attribute (Authorization) and some helpers, like x-tracing-id. So anything very general, applicable to all event types, and modifiable during handling pipelines could be stored alongside a message body and be called metadata.

The approach you described will work, but it will be semantically wrong.

bklooste · 2019-05-19T22:28:10Z

@whitecolor the key thing is that putting it in the data requires the consumer to inspect and know the data structure. Metadata tends to be a universal/loose structure which can also be used to make decisions on where data should be routed or partitioned without knowing the data structure. It should also be small, whereas a data message may be a megabyte.

bklooste · 2019-05-19T22:57:01Z

Note there are several types of metadata an organization may employ, and while a message is typically immutable, metadata may not be. It's important to be able to guarantee that bodies are immutable.

Here are the 3 key scenarios in decreasing order of importance. I tend to use some of 1 and 2.

1. Convenience information available without message inspection: partitioning/keys, event create time (not DB create time), routing, archiving. E.g. a decryption key for an immutable message for the EU: when the time has passed or you need to delete that data, throw away the key. There are many cases here for bigger systems.
2. General tracking/audit/correlation/tracing information, e.g. HTTP headers.
3. Event-specific business information. This can be in the event, but mutability can get in the way; it can also be used to enhance the event while keeping a core structure.

wclr · 2019-05-22T22:21:54Z

Ok, thanks for elaboration, let's see the example case:
Say there is some Item aggregate and Users can create/modify it. The id of the User committing an action anywhere in the system can be treated as a kind of general/universal info. But it is also potentially information about the user that created/edited an Item, which, though not critical, may be useful for the business in certain scenarios. Should this User id go to metadata or event data? What should be considered for that decision?

aprooks · 2019-05-23T21:33:11Z

UserId could be stored as part of the metadata for every event, for example if there is a requirement to have an audit log of every change in the system that is fully traceable to a User.

The same UserId could be part of business logic and event body, for example, if you have a requirement like "a user can only toggle an item created by himself".

There is nothing bad if data occasionally duplicates. Those fields will be handled separately by different application layers/parts/components.

oskardudycz · 2019-06-08T11:11:27Z

I was also thinking about adding metadata. For sure I'd see something like correlationId / traceId / requestId there, plus the possibility to inject some other custom data. That could really help with diagnostics and analysis, without polluting the events' business data.

I don't have a specific design yet, but the first guess would be to add it as a new column similar to data, stored as JSON the same way. I'd need to check how EventStore or other similar systems handle that for some inspiration. I see the benefit of having that, but it won't be finished sooner than Marten 4.0.0, and we would also need consensus on it with @jeremydmiller @mysticmind and @jokokko.

bklooste · 2019-06-13T07:11:42Z

This is what we are using. Time and NodeId are the time the event was created and the node where it was created, which can differ from the time it's inserted into the DB, especially if you have multiple DBs.

public class MetaData 
{
    public Guid PartitionId { get; set;}   
    public Guid NodeId { get; set;}

    [JsonProperty("$correlationId")]
    public Guid? CorrelationId { get; set; } 
    [JsonProperty("$causationId")]
    public Guid? CausationId { get; set; }
    public DateTime Time { get; set; }

}

CorrelationId and CausationId come from Event Store. Note the pattern here: it's a JSON document, but the DB uses JObject["$correlationId"], so the consumer is free to use any structure they choose.
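
The "well-known key inside a free-form JSON document" pattern can be sketched as below. This is only an illustration under stated assumptions: it swaps in System.Text.Json's `JsonDocument` rather than the Newtonsoft `JObject` mentioned above, and `MetadataReader` and `ReadCorrelationId` are hypothetical names, not part of Marten or Event Store.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads one well-known key and ignores the rest of the document.
public static class MetadataReader
{
    // Returns the "$correlationId" value if present and a valid GUID, otherwise null.
    public static Guid? ReadCorrelationId(string metadataJson)
    {
        using var doc = JsonDocument.Parse(metadataJson);

        // Check the kinds first: TryGetProperty and TryGetGuid throw on the wrong JSON kind.
        if (doc.RootElement.ValueKind == JsonValueKind.Object &&
            doc.RootElement.TryGetProperty("$correlationId", out var value) &&
            value.ValueKind == JsonValueKind.String &&
            value.TryGetGuid(out var id))
        {
            return id;
        }

        return null;
    }
}
```

The reader never needs a shared metadata class, so the consumer can add whatever other fields they like next to the reserved keys.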

jacobpovar · 2019-07-07T08:19:17Z

@oskardudycz I created a POC implementation of a metadata column not so long ago - https://github.com/JasperFx/marten/compare/master...jacobpovar:metadataColumn?expand=1. Not completely finished, but might be worth taking a look at.

jeremydmiller · 2019-07-07T22:07:00Z

@jacobpovar @oskardudycz Just a couple thoughts on the POC code:

It'd be relatively expensive at performance time. JsonConvert.SerializeObject() isn't a very efficient way to do the serialization
I think I'd maybe vote to make the deserialization on Event be lazy 'cause you don't know when it's necessary or not
I think we need to do a big rethink on the projection support soon where projections can declare much more about what they need (stream id, version, just the data, metadata) as a further optimization of the async daemon
I think the usage of metadata needs to be "opt in" as part of the event store configuration so you don't get the perf hit without anything going on. So if you're not using metadata at all, there's no serialization hit of any kind for the metadata.
Might consider using the same kind of mechanics as the duplicated fields for the metadata storage, but that's going to force you to declare the necessary metadata options upfront. Do that and it's going to be way easier to query on the event table
We might also consider allowing you to use a custom Event<T> type that adds either a Dictionary type or individual fields that could be persisted & loaded. So something like StoreOptions.Events.EventBaseType = typeof(MyCustomBase<>);, and we derive additional fields from that new base type. But that base type would have to extend Event<T> somehow
For capturing events, I'd vote for overloads like Append(Guid stream, IEnumerable<KeyValuePair<string, object>>, params object[] @events) and say that the metadata gets into each posted event. The {{key=value}, {key=value}} literal syntax should be helpful here
Gotta think about metadata on streams vs. events too.
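
As a rough illustration of the proposed Append overload shape, where the same metadata is stamped onto each posted event, here is an in-memory sketch. `RecordedEvent` and `InMemoryEventStore` are hypothetical stand-ins invented for this example; they are not Marten types and say nothing about how the real implementation would persist anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored event; not Marten's actual Event type.
public class RecordedEvent
{
    public Guid StreamId { get; set; }
    public object Data { get; set; }
    public Dictionary<string, object> Metadata { get; set; }
}

// Hypothetical in-memory store used only to illustrate the overload shape.
public class InMemoryEventStore
{
    private readonly List<RecordedEvent> _events = new List<RecordedEvent>();

    public IReadOnlyList<RecordedEvent> Events => _events;

    public void Append(Guid stream, IEnumerable<KeyValuePair<string, object>> metadata, params object[] events)
    {
        // Materialize once so a lazy sequence is not re-enumerated per event.
        var pairs = metadata.ToList();

        foreach (var e in events)
        {
            _events.Add(new RecordedEvent
            {
                StreamId = stream,
                Data = e,
                // Each event gets its own copy so later changes do not leak between events.
                Metadata = pairs.ToDictionary(p => p.Key, p => p.Value)
            });
        }
    }
}
```

A call such as `store.Append(streamId, new Dictionary<string, object> { { "userId", "jdoe" } }, eventA, eventB)` would record two events that each carry their own copy of the `userId` entry.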

bklooste · 2019-07-08T00:22:39Z

Other options around serialization: could also leave it at byte[] and leave it up to the consumer; some people may just put a guid in there.

Note a huge/key difference: metadata is normally the same fixed type for all events, whereas the event type differs per event. This allows more optimal custom serialization/deserialization. Two optional factory Funcs could help people here.

Agree it should be opt in.
Don't agree metadata should be lazy, but the event itself probably should be.
Yes, projections often inspect metadata and then ignore irrelevant messages without deserializing the message.

I'd forget about any mights/shoulds: get the basics in, do it well, enhance later. Focus on the persistence structure/schema.

jeremydmiller · 2019-07-08T01:24:52Z

@bklooste Thanks for the input. The "mights" and "shoulds" above are questions about how to pull it off, and not necessarily about features.

The custom factory Funcs are interesting, but I'm not sure about the usability. I'm also trying to think about how the metadata is going to be consumed from both straight up SQL querying and C# code. Making it a byte[] renders the data almost useless w/o the C# code, and that's something else to consider.

bklooste · 2019-07-08T03:39:56Z

Basically AddMetaData(Func<T, byte[]> metadataWriter = null, Func<byte[], T> metadataReader = null) params on setup, where T is the metadata type.
If null, just use JSON as above, but at least the consumer can do this:

AddMetaData(val => val.ToByteArray(), val => new Guid(val));

or add custom json serializer, convert to new metadata type or anything else.
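
A minimal sketch of the two optional Funcs with a JSON fallback might look like this. `MetadataCodec<T>` is a hypothetical name invented for illustration, and the fallback uses System.Text.Json as an assumption; nothing here is an existing Marten API.

```csharp
using System;
using System.Text.Json;

// Hypothetical codec; the names are illustrative, not part of Marten.
public class MetadataCodec<T>
{
    private readonly Func<T, byte[]> _writer;
    private readonly Func<byte[], T> _reader;

    public MetadataCodec(Func<T, byte[]> writer = null, Func<byte[], T> reader = null)
    {
        // Fall back to JSON when no custom function is supplied.
        _writer = writer ?? (value => JsonSerializer.SerializeToUtf8Bytes(value));
        _reader = reader ?? (bytes => JsonSerializer.Deserialize<T>(bytes));
    }

    public byte[] Write(T metadata) => _writer(metadata);

    public T Read(byte[] bytes) => _reader(bytes);
}
```

With this shape, `new MetadataCodec<Guid>(g => g.ToByteArray(), b => new Guid(b))` stores a bare 16-byte guid, while `new MetadataCodec<MyMetadata>()` would fall back to JSON for a hypothetical `MyMetadata` class.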

oskardudycz · 2019-10-04T12:05:56Z

There is a PR for adding metadata to documents from @barryhagan. Anyone interested is welcome to leave comments there :) #1364

jeremydmiller · 2020-12-20T22:37:39Z

Tasks

Some kind of properties on EventGraph that could be used to enable causation id, correlation id, and the header collection.
Add CausationId, CorrelationId, and Headers to the IEvent interface
Write these properties to IEvent if they're set on the IDocumentSession.
If those properties are enabled, add an extra column to the EventsTable. Make sure that the column objects added to the EventsTable all implement the IEventTableColumn to get them to play well with the code generation. The code generation should be pretty simple. The implementation of IEventTableColumn should look a lot like TenantIdColumn
Look at the global policies to add metadata globally and have the EventGraph properties enabled as well

jeremydmiller added the event store label May 30, 2017

jeremydmiller added this to the 2.0 milestone May 30, 2017

jeremydmiller mentioned this issue May 30, 2017

Event Store Overhaul for 2.0 #781

Closed

jeremydmiller modified the milestone: 2.0 Jun 7, 2017

jeremydmiller added the enhancement label Oct 7, 2017

eouw0o83hf mentioned this issue Aug 13, 2018

Deserialization error with generically-typed events #1069

Closed

oskardudycz added this to the 4.0 milestone Jun 8, 2019

jeremydmiller mentioned this issue Jul 7, 2019

ARCHIVED -- Event Store Improvements for v4 #1307

Closed

MateuszNaKodach mentioned this issue Nov 5, 2019

EventSourcing - implementacja, correlationId itp. MateuszNaKodach/DrogaNowoczesnegoArchitekta#80

Open

12 tasks

jeremydmiller mentioned this issue Sep 24, 2020

Flexible Document Metadata #1337

Closed

jeremydmiller mentioned this issue Nov 18, 2020

Event Store Improvements for V4 #1608

Closed

jeremydmiller self-assigned this Nov 30, 2020

mysticmind self-assigned this Mar 24, 2021

jeremydmiller mentioned this issue Apr 6, 2021

Mechanism to Import Events #1435

Open

mysticmind mentioned this issue Apr 9, 2021

Implement flexible event metadata #1792

Merged

jeremydmiller closed this as completed in #1792 Apr 9, 2021
