Serialization in .Net 5.0 #371

Closed
YnnamTenob opened this issue Oct 18, 2021 · 14 comments

@YnnamTenob

YnnamTenob commented Oct 18, 2021

I recently ran into an issue where saving and serializing a model in .Net 5.0 (for example, in .Net Notebooks or in Azure Functions) fails with a binary serialization error. As far as I understand, this error occurs because of the security concerns around binary serialization. The recommended way of dealing with the binary serializer incompatibility is to implement ISafeSerialization. After reviewing the code base in this repo, that would require modifications all the way down to the Automaton to allow safe serialization of models, particularly models that use incremental learning and therefore need to be saved and serialized in the runtime environment. Is there a solution to this problem other than the obvious one I've mentioned above?

@YnnamTenob YnnamTenob mentioned this issue Oct 18, 2021
6 tasks
@YnnamTenob YnnamTenob changed the title from "Serialization in .Net Core" to "Serialization in .Net 5.0" Oct 18, 2021
@tminka
Contributor

tminka commented Oct 19, 2021

Try one of the other forms of serialization described at How to save distributions to disk.

@YnnamTenob
Author

Thank you very much @tminka

@YnnamTenob
Author

Hi @tminka, I tried serializing using the JSON format as specified in the page you linked above. It serializes an object; however, that object cannot be deserialized.

The type that BayesPointMachineClassifier.CreateBinaryClassifier() creates is a:

CompoundBinaryStandardDataFormatBayesPointMachineClassifier<IList,Int32,.IList,Boolean>

However, CompoundBinaryStandardDataFormatBayesPointMachineClassifier is internal to the library and cannot be deserialized, and as far as I know the interface that it implements cannot be deserialized either:

IBayesPointMachineClassifier<TInstanceSource, TInstance, TLabelSource, TStandardLabel, IDictionary<TStandardLabel, double>, TTrainingSettings, TPredictionSettings>

Here is the JSON document that gets serialized. Can you provide further guidance?

{"$id":"1","Capabilities":{"$type":"Microsoft.ML.Probabilistic.Learners.BayesPointMachineClassifierCapabilities, Microsoft.ML.Probabilistic.Learners.Classifier","IsPrecompiled":true,"SupportsMissingData":false,"SupportsSparseData":true,"SupportsStreamedData":false,"SupportsBatchedTraining":true,"SupportsDistributedTraining":false,"SupportsIncrementalTraining":true,"SupportsModelEvidenceComputation":true,"SupportsCustomPredictionLossFunction":true},"Settings":{"$type":"Microsoft.ML.Probabilistic.Learners.BinaryBayesPointMachineClassifierSettings`1[[System.Boolean, System.Private.CoreLib]], Microsoft.ML.Probabilistic.Learners.Classifier","Training":{"ComputeModelEvidence":true,"IterationCount":30,"BatchCount":1},"Prediction":{}},"LogModelEvidence":3.97958442052054E+58,"WeightPosteriorDistributions":[[{"$id":"2","MeanTimesPrecision":626025.5042802754,"Precision":82322.75332875556},{"$id":"3","MeanTimesPrecision":30362.83511852963,"Precision":51130.19043756452},{"$id":"4","MeanTimesPrecision":12859.264686436629,"Precision":29585.013866421086},{"$id":"5","MeanTimesPrecision":32502.261634344814,"Precision":54865.49022045436},{"$id":"6","MeanTimesPrecision":-59793.9994307964,"Precision":51650.08331815232},{"$id":"7","MeanTimesPrecision":-970304.615442721,"Precision":82088.09398443818},{"$id":"8","MeanTimesPrecision":-13736.474440944254,"Precision":1070.108999079218},{"$id":"9","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"10","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"11","MeanTimesPrecision":-1702.6201262384427,"Precision":34903.860626653935},{"$id":"12","MeanTimesPrecision":8658.962632282977,"Precision":37373.63634200442},{"$id":"13","MeanTimesPrecision":-2648.267940949969,"Precision":33252.09404733848},{"$id":"14","MeanTimesPrecision":-1657.4090277754572,"Precision":41051.78020934264},{"$id":"15","MeanTimesPrecision":-8279.877894003368,"Precision":49464.953898461245},
{"$id":"16","MeanTimesPrecision":11817.810324595444,"Precision":38585.50544514171},{"$id":"17","MeanTimesPrecision":-998.9685662433118,"Precision":42659.79079697079},{"$id":"18","MeanTimesPrecision":567.713458343229,"Precision":45060.405379882264},{"$id":"19","MeanTimesPrecision":-4866.908171070479,"Precision":33867.34682744422},{"$id":"20","MeanTimesPrecision":-2212.440868850474,"Precision":38654.43235522826},{"$id":"21","MeanTimesPrecision":2033.5012976530375,"Precision":41472.83605039607},{"$id":"22","MeanTimesPrecision":7276.96940891261,"Precision":39560.217427066855},{"$id":"23","MeanTimesPrecision":-474.6755519876042,"Precision":40996.033982061395},{"$id":"24","MeanTimesPrecision":-496.2688900091968,"Precision":41273.181220511135},{"$id":"25","MeanTimesPrecision":-2619.249074880769,"Precision":37094.117068140185},{"$id":"26","MeanTimesPrecision":10753.914960150307,"Precision":36489.15439825997},{"$id":"27","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"28","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"29","MeanTimesPrecision":-3151.8968041883113,"Precision":45325.11342413276},{"$id":"30","MeanTimesPrecision":-1607.6089215679199,"Precision":41735.45510874064},{"$id":"31","MeanTimesPrecision":7864.585933719726,"Precision":38974.62441225201},{"$id":"32","MeanTimesPrecision":-7983.75322009148,"Precision":42422.11953163474},{"$id":"33","MeanTimesPrecision":-390.77278939691513,"Precision":41665.61608825006},{"$id":"34","MeanTimesPrecision":3631.6023830937147,"Precision":40453.76803177451},{"$id":"35","MeanTimesPrecision":-76.39421668042813,"Precision":40111.4519278756},{"$id":"36","MeanTimesPrecision":2424.0668058261167,"Precision":40473.36147362701},{"$id":"37","MeanTimesPrecision":-4011.2590679123623,"Precision":38564.93567352199},{"$id":"38","MeanTimesPrecision":4212.714325147318,"Precision":42481.44758418159},{"$id":"39","MeanTimesPrecision":-3209.426216690391,"Precision":43270.95146322725},{"$id":"40","MeanTimesPrecision":-6828.664619226283,"Precision":41466.37962146438},
{"$id":"41","MeanTimesPrecision":-6025.0126948113775,"Precision":41146.139203613944},{"$id":"42","MeanTimesPrecision":-630.2583456838215,"Precision":39252.518898957285},{"$id":"43","MeanTimesPrecision":-3217.7944236199496,"Precision":45003.1837823578},{"$id":"44","MeanTimesPrecision":-2795.307850582982,"Precision":38556.98398877265},{"$id":"45","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"46","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"47","MeanTimesPrecision":4885.973157513242,"Precision":48170.67997592764},{"$id":"48","MeanTimesPrecision":146.57635452268178,"Precision":43660.12391360518},{"$id":"49","MeanTimesPrecision":-620.6493272039843,"Precision":39707.65848153476},{"$id":"50","MeanTimesPrecision":-1020.2163019830665,"Precision":38516.18540140339},{"$id":"51","MeanTimesPrecision":1637.335214187853,"Precision":47097.1747684993},{"$id":"52","MeanTimesPrecision":-156.0525226517272,"Precision":43713.92351849498},{"$id":"53","MeanTimesPrecision":47.991145678774004,"Precision":44963.71262570452},{"$id":"54","MeanTimesPrecision":502.57591237162745,"Precision":43158.944587158614},{"$id":"55","MeanTimesPrecision":1077.817943476354,"Precision":37157.60540980723},{"$id":"56","MeanTimesPrecision":2606.6733501357394,"Precision":43503.10519271262},{"$id":"57","MeanTimesPrecision":-6082.686575452986,"Precision":43900.58114542141},{"$id":"58","MeanTimesPrecision":413.52291401773243,"Precision":45070.26501218828},{"$id":"59","MeanTimesPrecision":2531.300641271793,"Precision":42414.85495110738},{"$id":"60","MeanTimesPrecision":-190.69089873335562,"Precision":41727.10464514889},{"$id":"61","MeanTimesPrecision":-2948.5967275258836,"Precision":42778.8320003398},{"$id":"62","MeanTimesPrecision":71.35541670046288,"Precision":36509.54611718066},{"$id":"63","MeanTimesPrecision":8392763.756970031,"Precision":2605902.5064716735},{"$id":"64","MeanTimesPrecision":-1135155.6708183656,"Precision":83157.7942655128}]]}

@YnnamTenob YnnamTenob reopened this Nov 9, 2021
@tminka
Contributor

tminka commented Nov 15, 2021

How were you able to serialize that type using Json? It isn't designed to do that.

@YnnamTenob
Author

YnnamTenob commented Nov 22, 2021

@tminka, sorry for the delay; I was out on PTO. Here is what I did:

`using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Net.Json;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;
using System.Collections.Concurrent;
using Microsoft.ML.Probabilistic.Collections;
using Microsoft.ML.Probabilistic.Learners;
using Microsoft.ML.Probabilistic.Learners.BayesPointMachineClassifierInternal;
using Microsoft.ML.Probabilistic.Math;

var modelFile = "IdentifierSearchScoringModel.json";

// Contract resolver that serializes the listed collection-like types as JSON objects
// instead of JSON arrays.
class CollectionAsObjectResolver : DefaultContractResolver
{
    private static readonly HashSet<Type> SerializeAsObjectTypes = new HashSet<Type>
    {
        typeof(Vector),
        typeof(Matrix),
        typeof(IArray<>),
        typeof(ISparseList<>)
    };

    // Cache of resolved contracts, shared across resolver instances.
    private static readonly ConcurrentDictionary<Type, JsonContract> ResolvedContracts = new ConcurrentDictionary<Type, JsonContract>();

    public override JsonContract ResolveContract(Type type) => ResolvedContracts.GetOrAdd(type, this.ResolveContractInternal);

    private JsonContract ResolveContractInternal(Type type) => IsExcludedType(type) ? this.CreateObjectContract(type) : this.CreateContract(type);

    // True if the type, one of its base types, or one of its interfaces is in the list above.
    private static bool IsExcludedType(Type type)
    {
        if (type == null) return false;
        if (SerializeAsObjectTypes.Contains(type)) return true;
        if (type.IsGenericType && SerializeAsObjectTypes.Contains(type.GetGenericTypeDefinition())) return true;
        return IsExcludedType(type.BaseType) || type.GetInterfaces().Any(IsExcludedType);
    }
}

var serializerSettings = new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.Auto,
    ContractResolver = new CollectionAsObjectResolver(),
    PreserveReferencesHandling = PreserveReferencesHandling.Objects
};
var serializer = JsonSerializer.Create(serializerSettings);

// ClassifierMapping, trainResult, simulationPath and dataVersion are not shown in this snippet.
var mapping = new ClassifierMapping();
var classifier = BayesPointMachineClassifier.CreateBinaryClassifier(mapping);
classifier.Settings.Training.ComputeModelEvidence = true;
// Train the Bayes Point Machine classifier
classifier.Train(trainResult.features, trainResult.labels);

// Write to disk
using (FileStream stream = new FileStream($"{simulationPath}/{dataVersion}/Engineered/{modelFile}", FileMode.Create))
{
    var streamWriter = new StreamWriter(stream);
    var jsonWriter = new JsonTextWriter(streamWriter);
    serializer.Serialize(jsonWriter, classifier);
    jsonWriter.Flush();
}`

@YnnamTenob
Author

So if it isn't designed to do that, how can I serialize a BPM model in .Net 5.0?

@YnnamTenob
Author

YnnamTenob commented Nov 28, 2021

Hi @tminka,

Do you have any advice here? Not that it is your problem, but I am coming up on a hard deadline, and if I am not able to deserialize the model in .Net 5.0, I will have to scrap this approach.

Best,
MB

@tminka
Contributor

tminka commented Nov 29, 2021

The code that you sent doesn't work for me. As far as I can tell, Json.NET cannot serialize these classes at all. That is why I was confused how you got that output. The Learner classes all implement custom binary serialization. They would have to be changed to support any other form of serialization.

@YnnamTenob
Author

Thanks. I ran that code in a .Net notebook on Azure Machine Learning. It serializes all of the child classes but does not serialize the top-level class. One final question: since I have the MeanTimesPrecision and Precision values, can I use the generated algorithm to do inference?

@tminka
Contributor

tminka commented Nov 29, 2021

Yes, if you are just making predictions then you can use the generated algorithm directly.
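
For reference, a minimal hand-computed sketch of what the MeanTimesPrecision and Precision values mean; this is not Infer.NET's generated algorithm. Each entry in WeightPosteriorDistributions is a Gaussian in natural parameters, so mean = MeanTimesPrecision / Precision and variance = 1 / Precision. The prediction step below assumes the usual Bayes Point Machine form (a linear score plus unit-variance Gaussian noise, thresholded at zero); that noise variance is an assumption and is not stated in this thread. `BpmSketch` and its members are hypothetical names, and `NormalCdf` uses the Abramowitz-Stegun approximation to erf rather than a library routine.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Infer.NET.
public static class BpmSketch
{
    // Convert Gaussian natural parameters to (mean, variance).
    public static (double Mean, double Variance) ToMeanVariance(double meanTimesPrecision, double precision)
        => (meanTimesPrecision / precision, 1.0 / precision);

    // Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
    // (absolute error of roughly 1.5e-7).
    public static double NormalCdf(double z)
    {
        double x = Math.Abs(z) / Math.Sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                    + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.Exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Approximate P(label = true | features) for independent Gaussian weight posteriors.
    // noiseVariance = 1.0 is an assumed model constant.
    public static double PredictProbability(
        double[] features, double[] meanTimesPrecision, double[] precision, double noiseVariance = 1.0)
    {
        double scoreMean = 0.0;
        double scoreVariance = noiseVariance;
        for (int i = 0; i < features.Length; i++)
        {
            // Skip zero features so near-uniform weights (tiny precision) cannot produce 0 * infinity.
            if (features[i] == 0.0) continue;
            var (mean, variance) = ToMeanVariance(meanTimesPrecision[i], precision[i]);
            scoreMean += features[i] * mean;
            scoreVariance += features[i] * features[i] * variance;
        }
        return NormalCdf(scoreMean / Math.Sqrt(scoreVariance));
    }
}
```

The feature array must be in the same order as the serialized weights and include any constant bias feature the mapping adds. If Infer.NET is already referenced, `MMath.NormalCdf` from `Microsoft.ML.Probabilistic.Math` should be able to replace the hand-written approximation.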

@tminka
Contributor

tminka commented Nov 29, 2021

I have created PR #373 which adds the ability to serialize BayesPointMachineClassifiers as text.

@YnnamTenob
Author

@tminka Thank You. This is great.

@YnnamTenob
Author

Hi @tminka,

When will PR #373 make it into a released NuGet package?

@tminka
Contributor

tminka commented Dec 9, 2021

I will update the NuGet package this week.
