
Redo Akka.Persistence serialization #3811

Closed
Aaronontheweb opened this issue May 25, 2019 · 27 comments
@Aaronontheweb
Member

Version: 1.3.13 and later.

Without putting too fine a point on it, it's clear to me after working on #3744 that the current implementation of Akka.Persistence serialization in Akka.NET is unsustainable, and that a serious error in judgement was made when we decided to allow individual Akka.Persistence plugins to choose their own "native-friendly" serialization instead of relying on the Protobuf envelopes that Akka.Persistence provides by default. This is especially true of our Akka.Persistence.Sql plugins.

As the new Akka.Persistence.TCK specifications will make perfectly clear in Akka.NET v1.4.0+, it is very difficult for Akka.Persistence plugin implementations to correctly support things like IActorRef serialization, since that requires access to internal fields and properties, such as CurrentTransportInformation, that aren't meant to be exposed outside of Akka.NET for safety reasons.

Most journal and snapshot store plugins will fail the new serialization checks, because the current design of many Akka.Persistence plugins requires them to come up with their own techniques for serialization, virtually none of which are consistent with Akka.NET best practices.

Going forward, we should streamline everything on top of the built-in Akka.Persistence MessageSerializer and SnapshotSerializer implementations so we can correctly support Akka.Persistence features like multi-tenant journals, snapshot stores, and more.

This is too big and drastic a change for v1.4.0, but I'd suggest we put it on the table for 1.5.0.

@Aaronontheweb
Member Author

I should point out: these issues and the bugs I'm fixing in #3744 are one of the reasons why Akka.Cluster.Sharding has been stuck in beta for so long. Coordinator fail-over and recovery are impossible without reliable serialization and deserialization of IActorRefs, which is made much more complicated by having every Akka.Persistence plugin do its own thing.

@Aaronontheweb
Member Author

Example of stuff that should be deleted:

var payloadType = e.Payload.GetType();
var serializer = Serialization.FindSerializerForType(payloadType, Configuration.DefaultSerializer);

// Manifest resolution duplicated by hand inside the plugin:
string manifest = "";
if (serializer is SerializerWithStringManifest stringManifestSerializer)
{
    manifest = stringManifestSerializer.Manifest(e.Payload);
}
else if (serializer.IncludeManifest)
{
    manifest = payloadType.TypeQualifiedName();
}

var binary = serializer.ToBinary(e.Payload);

Allowing the plugin to OVERRIDE the akka.actor.serialization-bindings with some kind of external type hint is crazy - there's no way for the serialization system to know what to do based on its own configuration in this case. That's gone in 1.5.0 and it may not even make it into 1.4.0 since getting Akka.Cluster.Sharding out of beta requires that IActorRef serialization work properly.
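By contrast, a minimal sketch of the streamlined write path being proposed: the plugin stores opaque bytes produced by whatever serializer the system resolves for the `IPersistentRepresentation` envelope (i.e. Akka.Persistence's own MessageSerializer), rather than picking a serializer per payload type. The `EnvelopeWriter` class name is hypothetical; only `FindSerializerFor` and `ToBinary` are real Akka.NET APIs here:

```csharp
using Akka.Actor;
using Akka.Persistence;

// Hypothetical helper: the journal never inspects the payload type.
// It asks the system for an opaque envelope and stores the bytes.
public static class EnvelopeWriter
{
    public static byte[] SerializeForStorage(
        ExtendedActorSystem system, IPersistentRepresentation repr)
    {
        // The system - not the plugin - decides how the envelope is encoded.
        var serializer = system.Serialization.FindSerializerFor(repr);
        return serializer.ToBinary(repr);
    }
}
```

The point is that manifest handling, serializer IDs, and IActorRef resolution all stay inside the envelope serializer instead of leaking into every plugin's schema.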

@Aaronontheweb
Member Author

On top of that, having to directly expose both the manifest data AND the serializer ID directly into the schema of the Akka.Persistence plugins has been demonstrably brittle over the years - none of that would have even been necessary had we stuck with the original Akka.Persistence design in the first place, since that information is all neatly encapsulated internally.

@Horusiath
Contributor

Horusiath commented May 27, 2019

Regarding the code snippet you've shown - I think that nowadays we have a method inside the Serialization class itself that does exactly that.

IMHO:

  1. We shouldn't be too eager to ditch existing mechanisms. Maybe it's possible to simply expose the API used to serialize/deserialize Akka-specific types (like IActorRef)?
  2. I'd take a look at how JVM Akka resolves serialization in akka-typed (which AFAIK is a little different from the classic approach). Also, starting from v2.6 it's changing its default serialization to Jackson (see the issue). Maybe there's something worthwhile to learn from their approach.

@Aaronontheweb
Member Author

Aaronontheweb commented May 27, 2019 via email

@Aaronontheweb
Member Author

Another good example here: akkadotnet/Akka.Persistence.MongoDB#42 and here akkadotnet/Akka.Persistence.MongoDB#26 - Akka.Persistence.MongoDb couldn't support the built-in Cluster.Sharding serializers by default, since it was doing its own thing with serialization rather than using our Protobuf envelopes.

@Aaronontheweb
Member Author

Aaronontheweb commented May 27, 2019

To be clear though, @Horusiath is absolutely right about this:

We shouldn't be too eager to ditch existing mechanisms.

The existing mechanisms are, by any objective standard, bad. But thousands of users have potentially years worth of data stored in them. The only way we could make this situation worse is to strand those users so they can't upgrade onto newer versions of Akka.NET without having to come up with their own hand-rolled ETL tools for Akka.Persistence data. I'd consider that an abdication of responsibility on the part of the Akka.NET project if we allowed that to happen.

Therefore, to the extent that it is feasible, I'm going to lean towards the following upgrade path:

  1. Make it possible to still read and recover messages that were written using the Akka.NET 1.3.0+ serialization mechanisms going forward;
  2. Save all messages by default using the new protobuf envelope;
  3. Don't allow point 2 to be overridden by downstream plugin implementations - message / payload serialization should no longer be a concern of individual Akka.Persistence plugins ever again. We've experimented with it the other way around and it's been a disaster, so someone arguing for the status quo here is going to need to make a very compelling case to the contrary; and
  4. Make certain that the Akka.Persistence.TCK enforces point 3.

@Horusiath
Contributor

This may be a crazy idea, but: what about making IActorRef implement the ISerializable interface?

  1. We could still keep our own serializers around, to perform efficient IActorRef serialization/deserialization in situations we know about.
  2. We could have out-of-the-box serialization for all serializers that support ISerializable that way.

Of course the problem would be to resolve the current ActorSystem, but I think it's doable the same way we deal with the current actor cell, using a ThreadStatic field.
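A rough sketch of what that ThreadStatic pattern could look like (the `SerializationContext` class is entirely hypothetical; Akka.NET's own transport-information handling works along similar lines):

```csharp
using System;
using Akka.Actor;

// Hypothetical sketch: before (de)serialization begins, the ActorSystem is
// stashed in a thread-local slot that an ISerializable implementation on
// IActorRef could consult to resolve references.
public static class SerializationContext
{
    [ThreadStatic]
    private static ActorSystem _current; // one slot per thread

    public static ActorSystem Current => _current;

    public static T WithSystem<T>(ActorSystem system, Func<T> action)
    {
        var previous = _current;
        _current = system;
        try
        {
            return action(); // serializers run with Current populated
        }
        finally
        {
            _current = previous; // restore to support nested calls
        }
    }
}
```

Saving and restoring the previous value makes the trick safe for re-entrant serialization, the same way the current-actor-cell mechanism behaves.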

@Aaronontheweb
Member Author

Related: #2933

Aaronontheweb added a commit to Aaronontheweb/Akka.Persistence.SqlServer that referenced this issue Jul 30, 2019
Aaronontheweb added a commit to akkadotnet/Akka.Persistence.SqlServer that referenced this issue Jul 30, 2019
* Akka.Persistence.SqlServer v1.3.14 upgrade

* disabled serialization specs since akkadotnet/akka.net#3811

* fixed target to .NET 4.5
@Zetanova
Contributor

Zetanova commented Sep 17, 2019

I am currently researching a method to persist the serialization data in a message.

The idea is to have a special payload type that is normally serialized, but needs to be deserialized on request (user space) or specially handled, as in ActorBase.Receive/Unhandled.

Maybe this payload type would solve this Akka.Persistence issue too. Instead of preventing serialization in the persistence plugins, they could be extended to use the pre-serialized content.

It would resolve some other issues:

  • The error handling of RemoteDeploy and Akka.Streams currently has issues with unknown exception types, many of which are not known to the remote site.
  • In a remote-running API, the user can pass a success/failure message as a post-back.
  • A relay / gate node doesn't need to know the payload type, because it does not deserialize it.
  • No double serialization as a feature-bug

Related: object/Akka.Persistence.Reminders#6

@Aaronontheweb
Member Author

No double serialization as a feature-bug

It's not a bug - you have to have a standard payload definition to determine what the user-defined message types are. The problem with Akka.Persistence is that we threw the standard Protobuf payload definition in the trash and forced all of that data to get stuck directly into the Akka.Persistence implementations themselves, which makes them very difficult to upgrade and has caused a lot of problems historically.

This is a return to Keep it Simple(tm) - we're going to do what the JVM has been doing and let the Protobuf envelope, which can be changed easily without introducing these types of problems, do its job.

@Zetanova
Contributor

Can this Protobuf envelope then be made so generic that it can be used in other places?

Example:
NodeA sends a wrap message with a "Protobuf envelope payload" field to NodeB.
NodeB receives the wrap message and sends the "Protobuf envelope payload" to NodeC.
NodeC receives the "Protobuf envelope payload" and the actor tries to handle it serialized.
If the receiving actor is not handling the "Protobuf envelope payload" directly, the AroundReceive method tries to deserialize it and, on success, tries to handle the deserialized message.

@Aaronontheweb
Member Author

We already have it:

message PersistentMessage {
    PersistentPayload payload = 1;
    int64 sequenceNr = 2;
    string persistenceId = 3;
    bool deleted = 4; // not used in new records from 2.4
    string sender = 11; // not stored in journal, needed for remote serialization
    string manifest = 12;
    string writerGuid = 13;
}

It's just not being enforced by the underlying Akka.Persistence implementation.

In Akka.Remote we do something similar:

message RemoteEnvelope {
    ActorRefData recipient = 1;
    Payload message = 2;
    ActorRefData sender = 4;
    fixed64 seq = 5;
}

message ActorRefData {
    string path = 1;
}

// Defines a remote address.
message AddressData {
    string system = 1;
    string hostname = 2;
    uint32 port = 3;
    string protocol = 4;
}

// Defines a payload.
message Payload {
    bytes message = 1;
    int32 serializerId = 2;
    bytes messageManifest = 3;
}

@Aaronontheweb
Member Author

The envelope formats are only supposed to contain:

  1. The binary content of the payload
  2. The serializer ID
  3. Some manifest / type-hint information that the serializer can use to reconstitute the message.

In Akka.Remote we also have to include the sender / receiver information; in Akka.Persistence we have to include the sequence number and entity id.
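Populating such an envelope from any resolved serializer is mechanical. A sketch, assuming `Payload` is the generated class for the proto message above and reusing the manifest logic quoted earlier in the thread (the `PayloadBuilder` wrapper is hypothetical):

```csharp
using Akka.Serialization;
using Akka.Util;
using Google.Protobuf;

// Sketch: fill the three envelope fields - bytes, serializer ID, manifest -
// in one place, so no plugin has to reimplement this logic.
public static class PayloadBuilder
{
    public static Payload ToPayload(Serialization serialization, object message)
    {
        var serializer = serialization.FindSerializerFor(message);
        var payload = new Payload
        {
            Message = ByteString.CopyFrom(serializer.ToBinary(message)),
            SerializerId = serializer.Identifier
        };

        // Same manifest resolution as the plugin snippet earlier in the thread.
        if (serializer is SerializerWithStringManifest withManifest)
            payload.MessageManifest = ByteString.CopyFromUtf8(withManifest.Manifest(message));
        else if (serializer.IncludeManifest)
            payload.MessageManifest = ByteString.CopyFromUtf8(message.GetType().TypeQualifiedName());

        return payload;
    }
}
```

Centralizing this is exactly what keeps the serializer ID and manifest out of each plugin's database schema.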

@Zetanova
Contributor

Thanks, I didn't see the protobuf types.

I made a sample implementation snippet in object/Akka.Persistence.Reminders#6 (comment).
Maybe it's easier to see there what I mean.

My point is that there is currently no pass-through transparent serialization feature,
and every "auto" deserialization of messages happens out of the scope of the user,
including the error processing in the deserialization itself (Node Disassociated).

The same goes for Akka.Persistence:
there is currently no way to persist an already-serialized message as-is.
Every payload gets serialized and wrapped into a PersistentPayload again.

private PersistentPayload GetPersistentPayload(object obj)

Maybe there is just a pass-thru Serializer missing that takes the manifest, data-bytes and serializerId values directly from a serialized-payload message.

```
akka.persistence.journal.eventstore {
  event-adapters {
    passthru = "PassThruSerializer, Akka.Maybe"
  }
  event-adapter-bindings {
    "Akka.Persistence.Serialization.PersistentPayload, Akka.Persistence" = passthru
    "ISerializedPayload, Akka.Maybe" = passthru
  }
}
```

@ismaelhamed
Member

@Aaronontheweb Just to make sure I got it right: the payload (user provided message) inside the PersistentPayload WON'T be serialized anymore using a user configured serializer, if provided? Or are we just talking about enforcing the default serializer (Google Protobuf) only on the envelope (PersistentMessage) but still being able to serialize the payload with whatever the user chooses to (i.e., MessagePack)?

@Aaronontheweb
Member Author

Aaronontheweb commented Oct 8, 2019

So yet another thing I've discovered while working on trying to get Akka.Persistence.Azure to do the right thing: #3965

There isn't a single plugin, as far as I'm aware, that actually passes our own serialization TCK that has been in-place since Akka.NET v1.3.0. This includes the SQL plugins and others. This is because our entire serialization approach isn't standardized across plugins and it puts a high burden onto individual plugin owners to implement them correctly, which our own team can't even do correctly, so it seems.

Edit: and it doesn't help that our TCK is apparently implemented incorrectly and no one has bothered to notice.

@Aaronontheweb
Member Author

Just to make sure I got it right: the payload (user provided message) inside the PersistentPayload WON'T be serialized anymore using a user configured serializer, if provided? Or are we just talking about enforcing the default serializer (Google Protobuf) only on the envelope (PersistentMessage) but still being able to serialize the payload with whatever the user chooses to (i.e., MessagePack)?

All user-defined messages will still be serialized using their configured serializers. It's just that the content being saved into Akka.Persistence itself will all be encoded inside a protobuf envelope that is standardized by Akka.Persistence, very similar to how we do it for Akka.Remote.
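In other words, user serializer configuration like the following keeps working; only the outer envelope format is fixed by Akka.Persistence. The `msgpack` serializer and `MyApp` type names here are purely illustrative, not part of any real plugin:

```hocon
akka.actor {
  serializers {
    # user-chosen serializer for domain events (illustrative name)
    msgpack = "MyApp.Serialization.MsgPackSerializer, MyApp"
  }
  serialization-bindings {
    # domain events go through msgpack; the persistence envelope that wraps
    # the resulting bytes is always Akka.Persistence's protobuf format
    "MyApp.Events.IDomainEvent, MyApp" = msgpack
  }
}
```

The envelope records which serializer produced the inner bytes, so recovery round-trips through the same user-configured serializer.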

@Aaronontheweb
Member Author

Aaronontheweb commented Oct 23, 2019

For a proof of concept of how this will work, see akkadotnet/Akka.Persistence.MongoDB#71 implementing this in Akka.Persistence.MongoDb since the legacy "object" serialization is totally incompatible with AtLeastOnceDelivery actors and Akka.Cluster.Sharding as a result of not using any of our binary serialization infrastructure.

@Zetanova
Contributor

Zetanova commented Oct 24, 2019

How to use Text/Json serialization in debug mode?
I would need to see what gets serialized in the store for PersistentActors.

The main issue is that the journal itself currently needs to serialize/deserialize.

I still think a serialized-message type for serialization would make more sense.
This type would tell the journal that the message is pre-serialized (binary or text):
  • On write, an upgraded journal would take the serialized data as-is from the message.
  • On write, an old journal would serialize the message as in the workaround.
  • On read, an upgraded journal would emit only the serialized-message type.
  • On read, an old journal would still emit the deserialized messages.

A new layer/component would be required that:

  1. Pre-write, serializes every message to the serialized-message type for the journal:
    normal serialization for PersistentActors,
    special serialization for Sharding or AtLeastOnceDelivery,
    pass-through for the serialized-message type.

  2. Post-read, processes every message from the journal:
    on a serialized-message it will deserialize the message,
    on an unknown message type it will pass the message through,
    a no-deserialize option to emit only the serialized-message.

This would help with other issues:

  1. Persistence-proxy of pre-serialized messages:
    a proxy node should not need to bind to every assembly and be up-to-date.
  2. Relaying of serialized messages:
    currently there is no way to tell a remote generic-service-actor a payload as a post-back.
    There is no way to relay a payload from NodeA via NodeB to NodeC
    without the need for NodeB to bind to every assembly of the cluster/roles.
    // A simple RelayService is already impossible
    nodeBService.Tell(new NodeBServiceCommand.RelayTo(Self, new NodeAMessage.Hello("hallo world")));
    
  3. There will be some performance improvement from skipping re-serialization in
    Remote->Remote, Remote->Persistence, Persistence->Persistence, Persistence->Remote, and broadcasts.

@Aaronontheweb
Member Author

how to use Text/Json serialization in debug mode?

Easiest way to do this would be via an EventAdapter, in the event that you needed to debug things at that level, which most users shouldn't need to do.
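A minimal sketch of that approach, using Akka.Persistence's standard `IEventAdapter` contract. The adapter only logs a human-readable view at write time and leaves the persisted representation untouched; the use of Newtonsoft.Json here is an illustrative assumption, not a requirement:

```csharp
using System;
using Akka.Persistence.Journal;

// Debug-only sketch: dump a readable view of each event as it is written,
// without changing what actually gets persisted.
public class DebugLoggingAdapter : IEventAdapter
{
    public string Manifest(object evt) => string.Empty;

    public object ToJournal(object evt)
    {
        Console.WriteLine(
            $"[persist] {evt.GetType().Name}: {Newtonsoft.Json.JsonConvert.SerializeObject(evt)}");
        return evt; // stored representation is untouched
    }

    public IEventSequence FromJournal(object evt, string manifest)
        => EventSequence.Single(evt);
}
```

Bound to a journal via `event-adapters` / `event-adapter-bindings` in HOCON, this gives visibility at debug time without forcing the store itself into a text format.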

The raison d'etre for doing this is twofold:

  1. There's many edge cases that are automatically handled by Akka.NET's built-in serialization, which works with binary.
  2. The protobuf envelopes naturally lend themselves to versioning, introduction of new features, and updates that don't require schema migrations and other "high effort / high risk" practices.

Building an entire second system for doing this with text, solely so people can read the messages that are being serialized in plain text through a third party database tool, has been a total disaster from a maintainability, version-tolerance, and implementation success standpoint.

A small number of users involved in doing the maintenance work, myself among them, have had to bear significant costs fixing these incompatibilities when we could be doing more useful things like performance fixes. I've personally spent on the order of 200-300 hours working on these problems in 2019. We're done doing that - forever.

If having Akka.Persistence data stored in human-readable format is a must-have for you, fork a plugin and bear the costs of maintaining it yourselves. The OSS organization isn't going to do that for free any more. It's an expensive drag on resources and time.


A new layer/component would be required that:

Pre-write serialize every message to the serialized-message type for the journal
Normal serialization for PersistentActors,
Special serialization for Sharding or AtLeastOnceDelivery
Passthru for serialized-message type

Post-read every message from the journal
On serialized-message it will deserialize the message
On an unknown message type it will pass the message through
No-deserialize option to emit only serialized-message

This is what EventAdapters are supposed to do, but the idea is that your serialization format should be stable and reproducible going all the way back to your oldest stored data. This type of event adaptation / change should occur solely with objects, not with bytes. Introducing yet another abstraction to assist with this will make Akka.Persistence even more expensive to implement and maintain.

I'm not trying to discourage innovation or creativity here - simply stating an economic truth: our defaults are bad, usually aren't implemented correctly, and this results in both bad experiences for end-users and a ton of maintenance work on both the part of the OSS community and Petabridge.

The lesser of two evils is something that actually works reliably and correctly, most of the time, even if you can't easily read the output data in something like SSMS or whatever your chosen database platform is. The answer isn't to do the same thing we did before, but with even more abstractions, and hope it works - it's to follow a pattern that works via standardization of the serialization process.

If you want to solve the "data readability" problem, do that on the read side with an independent Akka.Persistence reader tool, or even better - use an EventAdapter or a logger to solve that issue by writing the output into human-readable form elsewhere, instead of trying to do it with the application data itself.

@Zetanova
Contributor

Zetanova commented Oct 24, 2019

Thank you for answering my question.

Yes, text representation is optional; I am using EventStore
and looking over the web interface into the streams for debugging.
It is nice, but of course not required.

For the MongoDb workaround, adding a NotSupportedException
in the Text case would make some sense.
The Binary and Text cases should have the same behavior,
or else only the binary case should work in write mode.

What about the proposal to extract the serialization logic
out of the persistent journal/snapshot-store completely?
The current journal serialization-logic implementation of plugins
can then stay as-is and can later be updated.

It is currently not possible to pre-serialize with a plain EventAdapter;
there is no serialized-message type that a journal knows about.
The simplest form would be to transform into a byte[] array,
but then the metadata would be lost.
The next form would be to transform
into a protobuf PersistentPayload message of Akka.Remoting,
but then it would get double serialized and the metadata would be lost too.

I already tried and spent only 10-20h on it;
it is more or less working for me in Akka.Remoting and Persistence as a Payload-callback.
The fields that a serialized message would need are:

sealed class SerializedPayload {
    int SerializerId;
    string Manifest;
    byte[] Data;
    byte[] HashKey;
}

The HashKey is required for the ConsistentHashRouter,
and it would be better to change it to an integer type,
but for that a pre-hashed return option
inside IConsistentHashable would be required.
I think I made a PR for it some weeks ago.

Having this message type in akka.dll as an opt-in feature for
Akka.Persistence journals and Remoting would bring some value.

An updated journal would then only need to take the SerializedPayload
as-is and write it to the database, and instead of deserializing
the message internally it would return the SerializedPayload,
and the DefaultEventAdapter would then deserialize it.

Maybe I am mistaken, but I don't see that this solution
would require much more effort than updating every persistence plugin up front.

@Aaronontheweb
Member Author

Do you have an example or some code we could look at? That might help me understand what you're trying to suggest better.

@to11mtm
Member

to11mtm commented Apr 16, 2022

So, gonna leave my comments here, since I've been neck-deep in different implementations on the JVM side while looking at tag tables in SQL.

we're going to do what the JVM has been doing and let the Protobuf envelope, which can be changed easily without introducing these types of problems, do its job.

In persistence-jdbc, they're flattening everything down. Now, they absolutely MUST respect the object's defined serializer, but they do not serialize the PersistentRepr. (They also make sure to respect 'manifest' vs 'serializer manifest' now, something I don't think many plugins do well, and I'm not sure they did when I originally ported.) It's similar in akka-persistence-cassandra.

(BTW we probably should do this with tag table changes... yay for more compatibility flags!)

@Aaronontheweb
Member Author

Meaning PersistentRepresentation is no longer part of the payload?

@to11mtm
Member

to11mtm commented Apr 16, 2022

Meaning PersistentRepresentation is no longer part of the payload?

That would seem to be the case. Here's the actual serialize in persistence-cassandra. They grab the payload from the persistent repr, and as with persistence-jdbc they are unrolling tagged payloads.

@Aaronontheweb
Member Author

Truth be told, I think this issue is resolved - over the years we've made a lot of progress on making sure that all Akka.Persistence plugins properly support SerializerWithStringManifest and in most cases we've done that without the use of Persistent envelope types. I like the idea of avoiding "double serialization" for plugins where it's not currently needed - deserializing the PersistentRepresentation envelope and then deserializing the message within.
