Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add Protobuf support #84

Closed
jodydonetti opened this issue Oct 21, 2022 · 12 comments
Closed

[FEATURE] Add Protobuf support #84

jodydonetti opened this issue Oct 21, 2022 · 12 comments
Assignees
Labels
enhancement New feature or request

Comments

@jodydonetti
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
FusionCache provides a way to add custom serialization format support, by implementing the IFusionCacheSerializer interface.

It already provides 2 implementations, both for the JSON format:

It recently added (but not yet released) a third one: MessagePack.

It would be great to add support for the Protobuf serialization format, too.

Describe the solution you'd like
A new package that add supports for the Protobuf format, probably based on protobuf-net by @mgravell which is probably the most used implementation on .NET and the most performant.

Describe alternatives you've considered
Everyone that needs it should otherwise implement their own, which is meh 😐

@jodydonetti jodydonetti self-assigned this Oct 21, 2022
@jodydonetti jodydonetti pinned this issue Oct 21, 2022
@jodydonetti jodydonetti added the enhancement New feature or request label Oct 21, 2022
@mgravell
Copy link

mgravell commented Oct 21, 2022

Probably looks a lot like this:

sealed class ProtobufNetSerializer : IFusionCacheSerializer
{
    private readonly TypeModel _model;
    public ProtobufNetSerializer(TypeModel model = null)
        => _model = model ?? RuntimeTypeModel.Default;

    public byte[] Serialize<T>(T obj)
    {
        using var ms = new MemoryStream();
        _model.Serialize<T>(ms, obj);
        return ms.ToArray();
    }

    public T Deserialize<T>(byte[] data)
        => _model.Deserialize<T>((ReadOnlyMemory<byte>)data);

    ValueTask<T> IFusionCacheSerializer.DeserializeAsync<T>(byte[] data)
        => new(Deserialize<T>(data));

    ValueTask<byte[]> IFusionCacheSerializer.SerializeAsync<T>(T obj) 
        => new(Serialize<T>(obj));
}

There is an API that can avoid having to use the MemoryStream / unknown-length thing, but right now: I'd go with ^^^

@jodydonetti
Copy link
Collaborator Author

jodydonetti commented Oct 21, 2022

Thanks @mgravell !

I was about to push my first impl and ask for your opinion, but you've beaten me to it 😄
From what I can see they are quite similar, at least in principle.

Here it is:

public class FusionCacheProtoBufNetSerializer
	: IFusionCacheSerializer
{
	public FusionCacheProtoBufNetSerializer(RuntimeTypeModel? model = null)
	{
		_model = model ?? RuntimeTypeModel.Default;

		// ENSURE MODEL REGISTRATION FOR FusionCacheEntryMetadata
		if (_model.IsDefined(typeof(FusionCacheEntryMetadata)) == false)
		{
			_model.Add(typeof(FusionCacheEntryMetadata), false)
				.SetSurrogate(typeof(FusionCacheEntryMetadataSurrogate))
			;
		}
	}

	private readonly RuntimeTypeModel _model;
	private readonly object _modelLock = new object();

	private void EnsureDistributedEntryModelIsRegistered<T>()
	{
		// TODO: OPTIMIZE THIS

		var _t = typeof(T);
		if (_t.IsGenericType == false || _t.GetGenericTypeDefinition() != typeof(FusionCacheDistributedEntry<>))
			return;

		if (_model.IsDefined(_t))
			return;

		lock (_modelLock)
		{
			if (_model.IsDefined(_t))
				return;

			// ENSURE MODEL REGISTRATION FOR FusionCacheDistributedEntry<T>
			_model.Add(typeof(T), false)
				.Add(1, nameof(FusionCacheDistributedEntry<T>.Value))
				.Add(2, nameof(FusionCacheDistributedEntry<T>.Metadata))
			;
		}
	}

	public byte[] Serialize<T>(T? obj)
	{
		EnsureDistributedEntryModelIsRegistered<T>();
		using (var stream = new MemoryStream())
		{
			_model.Serialize<T?>(stream, obj);
			return stream.ToArray();
		}
	}

	public T? Deserialize<T>(byte[] data)
	{
		if (data.Length == 0)
			return default(T);

		EnsureDistributedEntryModelIsRegistered<T>();
		using (var stream = new MemoryStream(data))
		{
			return _model.Deserialize<T?>(stream);
		}
	}

	public ValueTask<byte[]> SerializeAsync<T>(T? obj)
	{
		return new ValueTask<byte[]>(Serialize(obj));
	}

	public ValueTask<T?> DeserializeAsync<T>(byte[] data)
	{
		return new ValueTask<T?>(Deserialize<T>(data));
	}
}

A couple of notes:

  • since models must be defined for Protobuf to work best, and since I don't want users of FusionCache to have to define the needed boilerplate types themselves but, at most, their own types, I decided to automatically ensure the needed ones are there. In the end they are the FusionCacheEntryMetadata type (added in the ctor) and each specialized FusionCacheDistributedEntry<T> type (added before each use) for each T that is used, because generics etc. Would you say this is the best approach?
  • in the auto-registration code, after checking if the type is a FusionCacheDistributedEntry<T>, I'm currently using IsDefined(Type). Is this the best approach?
  • I see you're using TypeModel instead of RuntimeTypeModel: would I be correct in assuming that, since I'm auto-adding models, using RuntimeTypeModel instead is the right thing?
  • in the Deserialize() impl I'm checking first if the data has a length of zero and in that case I'm returning default(T). I'm doing this because, strangely enough, I noticed that if I serialize a null string and then I deserialize it, it seems to return an empty string instead. I've already searched online for a behaviour like this, but I only found this SO answer which does not seem to be my case

For the last point this is a minimal repro:

string? value1 = null;
string? value2 = null;
byte[] data;

using (var ms = new MemoryStream())
{
	ProtoBuf.Serializer.Serialize(ms, value1);
	data = ms.ToArray();
}
using (var ms = new MemoryStream(data))
{
	value2 = ProtoBuf.Serializer.Deserialize<string>(ms);
}

Console.WriteLine($"VALUE 1 IS NULL: {value1 is null}");
Console.WriteLine($"VALUE 2 IS NULL: {value2 is null}");

And its output is:

VALUE 1 IS NULL: True
VALUE 2 IS NULL: False

Am I missing something?

Thanks!

@jodydonetti
Copy link
Collaborator Author

jodydonetti commented Oct 21, 2022

Mmmh, upon further investigation it seems that the call to IsDefined() is auto-adding the type to be checked, breaking the logic. I assume it may be related to one of the AutoAdd/AutoCreate options in the ModelType.

I'm trying to understand more, maybe using the CanSerialize() method, and will update later.

@mgravell
Copy link

mgravell commented Oct 21, 2022 via email

@mgravell
Copy link

mgravell commented Oct 21, 2022 via email

@jodydonetti
Copy link
Collaborator Author

Better option might be to hook the events on the model that are invoked when discovering a new type; I'm not at a PC to provide an example, but: Before/After something something!

Thanks for the suggestion, will definitely look into it!

@jodydonetti
Copy link
Collaborator Author

Also, if it won't upset the other serializers: the config can be expressed via DataContract/DataMember(Order=...)

Will definitely check this, too: If I remember correctly the 2 classes are already using those attributes (maybe with a Name, and not yet with an Order, but I can add that) but as the code suggests I'm also using a surrogate class for the metadata class, since that in turn is using a DateTimeOffset prop (and that is how I ended up on that other issue on the protobuf-net repo).

Anyway will play with the DataContract/DataMember approach too, thanks!

@jodydonetti
Copy link
Collaborator Author

Hi @mgravell , FYI I've been able to create a minimal repro about the null string issue I was observing, so I opened an issue in the protobuf-net repo.
Hope this helps.

@jodydonetti
Copy link
Collaborator Author

jodydonetti commented Oct 27, 2022

Better option might be to hook the events on the model that are invoked when discovering a new type; I'm not at a PC to provide an example, but: Before/After something something!

Hi @mgravell , thanks for your help.

As per your suggestion I've looked into the before/after events to hook into registering a type just before serializing, but I've only found BeforeApplyDefaultBehaviour and AfterApplyDefaultBehaviour which don't seem to fire when trying to serialize an unregistered type.

Also as said, calls to both IsDefined() and CanSerialize() to check if a type is already registered seem to auto-add them, defeating the purpose of the check, and trying to add the same type multiple times without having checked throws an exception (which in and on itself makes sense).

The only way I've found is one of these 2:

  1. keep an external list of already registered types, to keep track of the ones that still needs to be registered and avoid the exceptions
  2. simply call Add() every time, but inside a try/catch block just to suppress the exception

Both of these are clearly not good solutions imho, so I'm still trying to come up with something better.

Initially I also played with the idea of using a small internal static class like internal static MyTypeCache<T> where I tried to use the static ctor (guaranteed to only be called once etc) and use that to be sure to register a type only once: the problem with such approach is that it would be in practice a singleton, and since the specific TypeModel to use is passed in my serializer ctor for better flexibility, I cannot do that, too.

Any idea?

@jodydonetti
Copy link
Collaborator Author

Update: I've pushed the branch, the serializer code can be found here.

@jodydonetti
Copy link
Collaborator Author

jodydonetti commented Nov 5, 2022

Update: I went with a mix of both options.

I added an internal cache (static/singleton) to keep track of each model's registered types, of course only the ones related to FusionCache, to auto-register them. I also added some locks (with the usual double check) to ease hypothetical concurrency issues that may arise and finally, when registering the types, I also included a try/catch block to avoid any issue with double registration (it should not happen, but I don't have exclusive control over what happens to a model that may be passed in the ctor.

All seems to be fine now: there are no more simply try/catch blocks that would throw every time, the locking seems fine (to me at least) and the little overhead with the internal models/types cache should be negligible, so I'm feeling good overall about the approach.

Will release it soon, and will notify it here too, just to keep track.

Thanks for your support, and if you happen to have any suggestion it would still be more than welcome.

@jodydonetti
Copy link
Collaborator Author

I've released v0.16.0 which includes support for Protobuf 🎉

Thanks again @mgravell for your support!

@jodydonetti jodydonetti unpinned this issue Nov 12, 2022
@jodydonetti jodydonetti changed the title Add Protobuf support [FEATURE] Add Protobuf support Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants