Event Metadata #780

Open
jeremydmiller opened this issue May 30, 2017 · 27 comments
@jeremydmiller
Contributor

@jeremydmiller jeremydmiller commented May 30, 2017

Had a couple folks ask lately if there was any way to embed metadata into the event store. The usual examples are things like customer id or region names.

We could handle this by saying "just use base classes for the common information." From there we'd possibly allow some aggregated querying against the event store by the base type.

Other ideas:

  1. Support an extra HStore column for user supplied key/value header information
  2. Support something like duplicated fields for the mt_events and/or mt_streams table

The question though becomes how do we expose that information and allow users to search on it? Sticking a Dictionary<string, object> property on Event is no big deal.

I guess my question is, what would you use this for? Where would you want to consume this information?

Also really unsure how we'd go about capturing this information in the IEventStore API

@jeremydmiller jeremydmiller added this to the 2.0 milestone May 30, 2017
@jeremydmiller jeremydmiller modified the milestone: 2.0 Jun 7, 2017
@wastaz
Contributor

@wastaz wastaz commented Jun 9, 2017

For me personally I would like to use event metadata for the following cases:

  • Store command id of the command responsible for causing the event.
  • Store event "tags" in order to be able to read up partial streams.

In both of these cases the only thing I would really require is an api similar to
EventStore.GetEventsMatchingMetadata(stream_id, metadata_key, metadata_value)
This API is of course quite ugly and I'm in no way proposing that this should be the API. But if I'm looking at the tiniest thing I would need on the query side in order to solve my current use cases, then this would be it.

I'm always a bit wary of introducing Dictionary<string, object> as an official API. Maybe it could be possible to add a strongly typed event metadata object?

I'm thinking maybe something like

class AccountOpenedMetadata {
    [Duplicate]
    public Guid CustomerId { get; set; }
    public string Tag { get; set; }
    public Guid CommandId { get; set; }
}
class AccountOpened {
    public Guid AccountId { get; set; }
    public DateTime OpenedAt { get; set; } // when the account was opened
}

session.Events.AppendEvent(new AccountOpened(), new AccountOpenedMetadata());

At this point maybe we could wrap the AccountOpenedMetadata in a wrapper class that adds a link back to the original event, and store the metadata as a normal marten document (which could make attributes such as Duplicate work as well) and we could query the metadata objects like normal marten docs?

Not at all sure if this is a good idea, or how doable it would be, but I figured I'd throw the idea out there and see what you think.

@jeffdoolittle
Contributor

@jeffdoolittle jeffdoolittle commented Jun 9, 2017

@wastaz perhaps I'm misunderstanding but it sounds like everything you described is already doable in Marten. If you want to store changes to events as well as a "metadata" document, you can do that in one DocumentSession and accomplish what you describe. Sure there isn't an official method for this on DocumentSession but it would be pretty easy to spin up your own general use abstraction for this.

Am I understanding you or am I missing something?

@wastaz
Contributor

@wastaz wastaz commented Jun 12, 2017

@jeffdoolittle Well, yes and no. It's certainly possible today, but there's a bit of heavy lifting involved and it's not really possible to get it done in "one query" without writing direct SQL queries against the tables. Basically my point is that if we are considering event metadata as a feature and we already have the docstore stuff in Marten, then why not try to use it as much as possible? :)

@jeffdoolittle
Contributor

@jeffdoolittle jeffdoolittle commented Jun 12, 2017

I don't recall off the top of my head, but can you do a batch query that includes documents and events? Might help if your concern is trips to the database.

To be clear, I'm not opposed to Event Metadata, just trying to understand how people would want to use it so we can come up with a good abstraction and api for it that will stand the test of time.

@aprooks

@aprooks aprooks commented Mar 30, 2018

I'm evaluating Marten and stumbled on this issue almost instantly. Metadata is an important part of a "stream entry". It's not event data, but infrastructure concerns that are written alongside an event.

Some of the metadata fields I usually use are: user id and scopes, and correlation/causation ids (especially useful when debugging process managers). Your aggregate might have zero knowledge of these, but they should be written alongside the business data, i.e. the event.

Seems like it can be achieved by using a data envelope like:

class Envelope<T> {
    public T Data { get; set; }
    public Dictionary<string, string> Metadata { get; set; }
}

But it feels like it'll make the code, especially aggregates, uglier.

@eouw0o83hf
Contributor

@eouw0o83hf eouw0o83hf commented Aug 13, 2018

Also evaluating Marten and lack of metadata capability is a big deal. Other frameworks provide this readily, and an enterprise event store needs to be able to handle metadata/headers.

@aprooks , I actually pursued that route and there's some deserialization bug lurking within Marten that actually renders that route infeasible for now. See #1069 for details.

@jeremydmiller
Contributor Author

@jeremydmiller jeremydmiller commented Aug 17, 2018

@eouw0o83hf This got left out of Marten 2.0 because it's a lot of work and there wasn't much demand for it at the time. Since it is an awful lot of work to support this and there's very little concrete definition of what it means or how it'd be used, can you add some concrete examples of what you want here? And as I commented in #1069, you could easily achieve this yourself with a base class.

@eouw0o83hf
Contributor

@eouw0o83hf eouw0o83hf commented Aug 17, 2018

@jeremydmiller definitely doable with a base class (and absolutely understand that "not enough of a priority" reasoning), but I've used NEventStore in the past and the ability to persist headers gave a lot of extra strength to an event-sourcing framework when called upon as an audit log (for either security/"who did this" purposes or debugging/"what did the user do" purposes).

Top few things I've tossed into headers/metadata that were super useful:

  • ExecutingUserId: Who performed the action which resulted in this event?
  • AuthenticatedUserId: In a system which supports impersonation, what administrator took this action in the role of another user?
  • RequestId: Unique identifier for the web request which resulted in this event, useful for cross-correlating with Splunk/logging for debugging
  • CorrelationId: For a distributed transaction or chain of business events, what do we need to tie this event to (again useful for debugging logs)
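
A rough sketch of how the header set listed above could be captured as a typed object and flattened into the key/value form discussed earlier in this thread. `AuditHeaders` and `ToDictionary` are hypothetical names used only for illustration; they are not part of Marten or NEventStore, and the property types are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical header set for illustration; not a Marten or NEventStore type.
public sealed class AuditHeaders
{
    public Guid ExecutingUserId { get; set; }

    // Only set when an administrator is impersonating another user.
    public Guid? AuthenticatedUserId { get; set; }

    public string RequestId { get; set; } = "";
    public Guid CorrelationId { get; set; }

    // Flatten into loose key/value pairs so the store does not need to know this type.
    public Dictionary<string, object> ToDictionary()
    {
        var headers = new Dictionary<string, object>
        {
            ["ExecutingUserId"] = ExecutingUserId,
            ["RequestId"] = RequestId,
            ["CorrelationId"] = CorrelationId
        };

        if (AuthenticatedUserId.HasValue)
        {
            headers["AuthenticatedUserId"] = AuthenticatedUserId.Value;
        }

        return headers;
    }
}
```

The impersonation header is left out entirely when it does not apply, so a query for its presence would find only impersonated actions.
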
@eouw0o83hf
Contributor

@eouw0o83hf eouw0o83hf commented Aug 17, 2018

Oh and also in direct response to the original question, I think a straightforward Dictionary<string, object> property on Event would solve this simply and beautifully.

@bklooste

@bklooste bklooste commented Dec 7, 2018

I would recommend a byte[] outside of the event, like Event Store does. There are many cases where you want to read all the events and get the metadata, but not deserialize all the bodies.

@jeremydmiller
Contributor Author

@jeremydmiller jeremydmiller commented Dec 7, 2018

@bklooste You can do that to your heart's content with SQL as is.

@bklooste

@bklooste bklooste commented Dec 11, 2018

@jeremydmiller
I had a look, and adding a metadata column to the events table to do this is not trivial, as none of the other parts of the lib are aware of it, not to mention schema generation. I suppose you could build up a lookup table.

Note the use case of category or aggregate streams, which may hold more than 100M events, and you don't want to deserialize them all.
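
To make the "read the metadata but skip the bodies" idea concrete, here is a minimal sketch. `LazyEvent<T>` is a hypothetical wrapper, not a Marten type, and the `deserializeBody` delegate stands in for whatever serializer is actually in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper, not a Marten type: metadata is available immediately,
// while the body is deserialized only on first access to Data.
public sealed class LazyEvent<T>
{
    private readonly Lazy<T> _body;

    public LazyEvent(IReadOnlyDictionary<string, string> metadata, Func<T> deserializeBody)
    {
        Metadata = metadata ?? throw new ArgumentNullException(nameof(metadata));
        if (deserializeBody == null) throw new ArgumentNullException(nameof(deserializeBody));

        // Lazy<T> runs the delegate at most once and caches the result.
        _body = new Lazy<T>(deserializeBody);
    }

    public IReadOnlyDictionary<string, string> Metadata { get; }

    public T Data => _body.Value;

    // True once the body has actually been deserialized.
    public bool BodyLoaded => _body.IsValueCreated;
}
```

A consumer scanning a large stream could filter on `Metadata` alone and touch `Data` only for the entries it cares about.
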

@wclr

@wclr wclr commented May 19, 2019

What is the actual difference between "data" and "metadata" in this case? If by metadata you mean something specific to a particular event type, why not put it in "data" (payload)? Where is the actual boundary between data and metadata? How should one determine which specifics deserve to go into metadata vs. event data?

@aprooks

@aprooks aprooks commented May 19, 2019

"If by metadata you mean something specific to a particular event type, why not put it in "data" (payload)?"

Metadata is application/protocol specific. Think of HTTP headers: you might have a required header (Authorization) and some helpers, like x-tracing-id. So anything very general, applicable to all event types, and modifiable during handling pipelines could be stored alongside a message body and be called metadata.

The approach you described will work, but it will be semantically wrong.

@bklooste

@bklooste bklooste commented May 19, 2019

@whitecolor the key thing is that putting it in the data requires the consumer to inspect and know the data structure. Metadata tends to be a universal, loose structure which can also be used to decide where data should be routed or partitioned without knowing the data structure. It should also be small, whereas a data message may be a megabyte.

@bklooste

@bklooste bklooste commented May 19, 2019

Note that there are several types of metadata an organization may employ, and that while a message is typically immutable, metadata may not be. It's important to be able to guarantee that bodies are immutable.

Here are the 3 key scenarios in decreasing order of importance. I tend to use some of 1 and 2.

  • Convenience information without message inspection: partitioning/keys, event create time (not DB create time), routing, archiving. E.g. a decryption key for an immutable message for the EU: when the retention time has passed or you need to delete that data, throw away the key. There are many cases here for bigger systems.
  • General tracking/audit/correlation/tracing information, e.g. HTTP headers.
  • Event-specific business information. This can live in the event, but immutability can get in the way; it can also be used to enhance the event while keeping a core structure.
@wclr

@wclr wclr commented May 22, 2019

OK, thanks for the elaboration. Let's look at an example case:
Say there is some Item aggregate, and Users can create/modify it. The id of the User committing an action anywhere in the system can be treated as a kind of general/universal info. But it is also potentially information about the user that created/edited an Item, which is not critical but may be useful for the business in a certain scenario. Should this User id go into metadata or event data? What should be considered for that decision?

@aprooks

@aprooks aprooks commented May 23, 2019

UserId could be stored as part of the metadata for every event, for example if there is a requirement to have an audit log of every change in the system that is fully traceable to a User.

The same UserId could be part of business logic and event body, for example, if you have a requirement like "a user can only toggle an item created by himself".

There is nothing bad if data occasionally duplicates. Those fields will be handled separately by different application layers/parts/components.

@oskardudycz
Collaborator

@oskardudycz oskardudycz commented Jun 8, 2019

I was also thinking about adding metadata. For sure I'd see something in there like correlationId / traceId / requestId, plus the possibility to inject some other custom data. That could really help with diagnostics and analysis, without polluting the events' business data.

I don't have a specific design yet, but my first guess would be to add it as a new column, similar to the data column that holds the current JSON. I'd need to check how EventStore or other similar systems handle that for some inspiration. I see the benefit of having that, but it won't be finished sooner than Marten 4.0.0, and we would also need consensus on it with @jeremydmiller, @mysticmind and @jokokko.

@oskardudycz oskardudycz added this to the 4.0 milestone Jun 8, 2019
@bklooste

@bklooste bklooste commented Jun 13, 2019

This is what we are using. Time and NodeId are the time the event was created and the node where it was created, which can differ from the time it's inserted into the DB, especially if you have multiple DBs.

using System;
using Newtonsoft.Json;

public class MetaData
{
    public Guid PartitionId { get; set; }
    public Guid NodeId { get; set; }

    [JsonProperty("$correlationId")]
    public Guid? CorrelationId { get; set; }
    [JsonProperty("$causationId")]
    public Guid? CausationId { get; set; }
    public DateTime Time { get; set; }
}

CorrelationId and CausationId come from Event Store. Note the pattern here: it's a JSON document, but the DB uses JObject["$correlationId"], so the consumer is free to use any structure they choose.

@jacobpovar
Contributor

@jacobpovar jacobpovar commented Jul 7, 2019

@oskardudycz I created a POC implementation of a metadata column not so long ago - https://github.com/JasperFx/marten/compare/master...jacobpovar:metadataColumn?expand=1. It's not completely finished, but might be worth a look.

@jeremydmiller
Contributor Author

@jeremydmiller jeremydmiller commented Jul 7, 2019

@jacobpovar @oskardudycz Just a couple thoughts on the POC code:

  • It'd be relatively expensive performance-wise. JsonConvert.SerializeObject() isn't a very efficient way to do the serialization
  • I think I'd maybe vote to make the deserialization on Event be lazy 'cause you don't know when it's necessary or not
  • I think we need to do a big rethink on the projection support soon, where projections can declare much more about what they need (stream id, version, just the data, metadata) as a further optimization of the async daemon
  • I think the usage of metadata needs to be "opt in" as part of the event store configuration so you don't get the perf hit without anything going on. So if you're not using metadata at all, there's no serialization hit of any kind for the metadata.
  • Might consider using the same kind of mechanics as the duplicated fields for the metadata storage, but that's going to force you to declare the necessary metadata options upfront. Do that and it's going to be way easier to query on the event table
  • We might also consider allowing you to use a custom Event<T> type that adds either a Dictionary type or individual fields that could be persisted & loaded. So something like StoreOptions.Events.EventBaseType = typeof(MyCustomBase<>);, and we derive additional fields from that new base type. But that base type would have to extend Event<T> somehow
  • For capturing events, I'd vote for overloads like Append(Guid stream, IEnumerable<KeyValuePair<string, object>>, params object[] @events) and say that the metadata gets into each posted event. The {{key=value}, {key=value}} literal syntax should be helpful here
  • Gotta think about metadata on streams vs. events too.
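
The proposed `Append` overload, where one set of metadata is applied to every posted event, might look roughly like the sketch below. `InMemoryEventLog` and `RecordedEvent` are hypothetical stand-ins for illustration, not Marten's event store; only the parameter shape is taken from the comment above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored entry: the event body plus the metadata captured with it.
public sealed class RecordedEvent
{
    public RecordedEvent(Guid streamId, object data, IReadOnlyDictionary<string, object> metadata)
    {
        StreamId = streamId;
        Data = data;
        Metadata = metadata;
    }

    public Guid StreamId { get; }
    public object Data { get; }
    public IReadOnlyDictionary<string, object> Metadata { get; }
}

// Hypothetical in-memory log, used only to show the overload's behavior.
public sealed class InMemoryEventLog
{
    private readonly List<RecordedEvent> _events = new List<RecordedEvent>();

    public IReadOnlyList<RecordedEvent> Events => _events;

    public void Append(Guid stream, IEnumerable<KeyValuePair<string, object>> metadata, params object[] events)
    {
        // Copy once so later changes by the caller do not leak into stored events.
        var snapshot = metadata.ToDictionary(pair => pair.Key, pair => pair.Value);

        foreach (var @event in events)
        {
            _events.Add(new RecordedEvent(stream, @event, snapshot));
        }
    }
}
```

With a collection initializer the call site stays compact, for example `log.Append(streamId, new Dictionary<string, object> { { "CorrelationId", "c-1" }, { "UserId", "u-42" } }, firstEvent, secondEvent);`.
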
@bklooste

@bklooste bklooste commented Jul 8, 2019

Other options around serialization: you could also leave it as byte[] and leave it up to the consumer; some people may just put a Guid in there.

Note that a key difference is that metadata is normally the same fixed type for all events, whereas the event type varies. This allows more optimal custom serialization/deserialization. Two optional factory Funcs could help people here.

Agreed, it should be opt-in.
I don't agree that metadata should be lazy, but the event itself probably should be.
Yes, projections often inspect metadata and then ignore irrelevant messages without deserializing the message.

I'd forget about any mights/shoulds: get the basics in, do it well, enhance later. Focus on the persistence structure/schema.

@jeremydmiller
Contributor Author

@jeremydmiller jeremydmiller commented Jul 8, 2019

@bklooste Thanks for the input. The "mights" and "shoulds" above are questions about how to pull it off, and not necessarily about features.

The custom factory Funcs are interesting, but I'm not sure about the usability. I'm also trying to think about how the metadata is going to be consumed from both straight-up SQL querying and C# code. Making it a byte[] renders the data almost useless without the C# code, and that's something else to consider.

@bklooste

@bklooste bklooste commented Jul 8, 2019

Basically AddMetaData<T>(Func<T, byte[]> metadataCreator = null, Func<byte[], T> metadataWriter = null) params on setup, where T is the metadata type.
If null, just use JSON as above, but at least the consumer can do this:

AddMetaData<Guid>(val => val.ToByteArray(), val => new Guid(val));

or add a custom JSON serializer, convert to a new metadata type, or anything else.
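
As a sketch of the pair-of-Funcs idea under stated assumptions: `MetadataCodec<T>` and `MetadataCodecs` are hypothetical names, not an existing Marten API, and the JSON fallback mentioned above is deliberately left out to keep the example dependency-free.

```csharp
using System;

// Hypothetical holder for the two consumer-supplied functions.
public sealed class MetadataCodec<T>
{
    private readonly Func<T, byte[]> _write;
    private readonly Func<byte[], T> _read;

    public MetadataCodec(Func<T, byte[]> write, Func<byte[], T> read)
    {
        _write = write ?? throw new ArgumentNullException(nameof(write));
        _read = read ?? throw new ArgumentNullException(nameof(read));
    }

    // Metadata object to raw bytes for storage.
    public byte[] Write(T metadata) => _write(metadata);

    // Raw bytes from storage back to the metadata object.
    public T Read(byte[] raw) => _read(raw);
}

public static class MetadataCodecs
{
    // The "just put a Guid in there" case: 16 bytes, no JSON involved.
    public static readonly MetadataCodec<Guid> GuidCodec =
        new MetadataCodec<Guid>(value => value.ToByteArray(), raw => new Guid(raw));
}
```

Because the metadata type is fixed for all events, one codec instance can be configured once at setup and reused for every read and write.
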

@oskardudycz
Collaborator

@oskardudycz oskardudycz commented Oct 4, 2019

There is a PR for adding metadata to documents from @barryhagan. Anyone interested is welcome to leave comments there :) #1364

@jeremydmiller
Contributor Author

@jeremydmiller jeremydmiller commented Dec 20, 2020

Tasks

  • Some kind of properties on EventGraph that could be used to enable causation id, correlation id, and the header collection.
  • Add CausationId, CorrelationId, and Headers to the IEvent interface
  • Write these properties to IEvent if they're set on the IDocumentSession.
  • If those properties are enabled, add an extra column to the EventsTable. Make sure that the column objects added to the EventsTable all implement the IEventTableColumn to get them to play well with the code generation. The code generation should be pretty simple. The implementation of IEventTableColumn should look a lot like TenantIdColumn
  • Look at the global policies to add metadata globally and have the EventGraph properties enabled as well
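
The opt-in behavior described in these tasks could be sketched as follows. All type names here (`MetadataSettings`, `SessionMetadata`, `EventEnvelope`, `MetadataStamper`) are hypothetical and chosen to avoid implying anything about the real shapes of `EventGraph`, `IEvent`, or `IDocumentSession`; the string-typed ids are also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical opt-in switches, standing in for the proposed store-level properties.
public sealed class MetadataSettings
{
    public bool CorrelationIdEnabled { get; set; }
    public bool CausationIdEnabled { get; set; }
    public bool HeadersEnabled { get; set; }
}

// Hypothetical per-session values that get stamped onto appended events.
public sealed class SessionMetadata
{
    public string CorrelationId { get; set; }
    public string CausationId { get; set; }
    public Dictionary<string, object> Headers { get; } = new Dictionary<string, object>();
}

// Hypothetical event wrapper carrying the three proposed metadata members.
public sealed class EventEnvelope
{
    public EventEnvelope(object data) { Data = data; }

    public object Data { get; }
    public string CorrelationId { get; set; }
    public string CausationId { get; set; }
    public Dictionary<string, object> Headers { get; set; }
}

public static class MetadataStamper
{
    // Copies a session value only when its feature is enabled and the value is set.
    public static void Apply(MetadataSettings settings, SessionMetadata session, EventEnvelope envelope)
    {
        if (settings.CorrelationIdEnabled && session.CorrelationId != null)
            envelope.CorrelationId = session.CorrelationId;

        if (settings.CausationIdEnabled && session.CausationId != null)
            envelope.CausationId = session.CausationId;

        if (settings.HeadersEnabled && session.Headers.Count > 0)
            envelope.Headers = new Dictionary<string, object>(session.Headers);
    }
}
```

With every switch off, `Apply` leaves the envelope untouched, which matches the earlier point that unused metadata should cost nothing.
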
9 participants