
What would be the best way to save a machine learning model's parameters? #35

Closed
MasterDimensio opened this issue Sep 9, 2018 · 9 comments


@MasterDimensio

No description provided.

@MasterDimensio
Author

Sorry, I hit Enter too soon.
I'm using rusty-machine and wonder whether it is possible to save the model parameters to a file after training. What would be the best way to do so while respecting all SGX constraints?
Thank you very much for your library, btw!

@gskapka
Contributor

gskapka commented Sep 10, 2018

Would the sealeddata sample code fit your needs? Sealing is how enclaves persist information securely, which sounds like what you're after.

@dingelish
Contributor

@MasterDimensio

I'm not sure my understanding is right, so let's use an example.

    let mut knn = KNNClassifier::new(1);
    let _ = knn.train(&data, &dataset.target()).unwrap();
    let res = knn.predict(&matrix![5.9, 3.6]).unwrap();
    assert_eq!(res, Vector::new(vec![1]));

I guess you want to do something like save_to_disk(knn, "model.bin") and later let knn: KNNClassifier = load_knn_from_disk("model.bin"), or something similar.

In this case, KNNClassifier needs a serialization function, which in turn depends on serializing binary_tree::KDTree. Only once the KDTree can be serialized to a string can KNNClassifier be serialized and dumped to external storage.

Is that correct?
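To make that dependency concrete, here is a minimal std-only sketch of hand-written string serialization for a tree-shaped model. `Tree`, `Classifier`, `to_text`, `from_text` and the whitespace-separated token format are hypothetical stand-ins chosen for illustration, not rusty-machine's `KDTree` or `KNNClassifier`. The point is only that the classifier can emit and parse its text form because the tree knows how to write and read itself recursively.

```rust
/// Toy stand-in for a KD-tree (hypothetical; NOT rusty-machine's binary_tree::KDTree).
#[derive(Debug, PartialEq)]
enum Tree {
    /// Indices of the training points stored in this leaf.
    Leaf(Vec<usize>),
    /// Split on dimension `dim` at training point `pivot` (toy simplification:
    /// a real KD-tree would also store the split value).
    Split { dim: usize, pivot: usize, left: Box<Tree>, right: Box<Tree> },
}

/// Hypothetical classifier: it can only be serialized if its tree can.
#[derive(Debug, PartialEq)]
struct Classifier {
    k: usize,
    tree: Tree,
}

/// Read the next token and parse it as a usize.
fn next_usize<'a, I: Iterator<Item = &'a str>>(it: &mut I) -> Result<usize, String> {
    let tok = it.next().ok_or("unexpected end of input")?;
    tok.parse().map_err(|e| format!("bad integer {:?}: {}", tok, e))
}

impl Tree {
    /// Pre-order encoding: "L <len> <idx>..." for a leaf,
    /// "S <dim> <pivot>" followed by the left and right subtrees for a split.
    fn write_tokens(&self, out: &mut Vec<String>) {
        match self {
            Tree::Leaf(ids) => {
                out.push("L".to_string());
                out.push(ids.len().to_string());
                out.extend(ids.iter().map(|i| i.to_string()));
            }
            Tree::Split { dim, pivot, left, right } => {
                out.push("S".to_string());
                out.push(dim.to_string());
                out.push(pivot.to_string());
                left.write_tokens(out);
                right.write_tokens(out);
            }
        }
    }

    /// Inverse of `write_tokens`: consumes exactly one subtree from the iterator.
    fn read_tokens<'a, I: Iterator<Item = &'a str>>(it: &mut I) -> Result<Tree, String> {
        match it.next() {
            Some("L") => {
                let len = next_usize(it)?;
                let mut ids = Vec::with_capacity(len);
                for _ in 0..len {
                    ids.push(next_usize(it)?);
                }
                Ok(Tree::Leaf(ids))
            }
            Some("S") => {
                let dim = next_usize(it)?;
                let pivot = next_usize(it)?;
                let left = Box::new(Tree::read_tokens(it)?);
                let right = Box::new(Tree::read_tokens(it)?);
                Ok(Tree::Split { dim, pivot, left, right })
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

impl Classifier {
    /// Serialize the classifier by writing its own fields, then delegating to the tree.
    fn to_text(&self) -> String {
        let mut tokens = vec![self.k.to_string()];
        self.tree.write_tokens(&mut tokens);
        tokens.join(" ")
    }

    /// Parse the format produced by `to_text`, rejecting truncated or trailing input.
    fn from_text(s: &str) -> Result<Classifier, String> {
        let mut it = s.split_whitespace();
        let k = next_usize(&mut it)?;
        let tree = Tree::read_tokens(&mut it)?;
        if it.next().is_some() {
            return Err("trailing data".to_string());
        }
        Ok(Classifier { k, tree })
    }
}
```

Writing the resulting string to external storage (for example with `std::fs::write`) is then a separate step; inside an enclave it would presumably be sealed first, as suggested earlier in the thread.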

@MasterDimensio
Author

MasterDimensio commented Sep 11, 2018

@gskapka I hadn't thought about it that way; that's a great idea, thanks!

@dingelish That's exactly what I was looking for. Do I have to write my own serialization function in Rust, then?

@dingelish
Contributor

@MasterDimensio

Personally, I would recommend the serde suite (serde + serde_derive + serde_json). However, the data structure dependencies seem to be complex.

I think you can try writing your own serialization and deserialization routines that convert an object directly to a string and back. Meanwhile, I'm trying to integrate the serde suite into rusty-machine and its dependencies for a more general (but possibly slower) solution.

@dingelish
Contributor

87f663d is a candidate for ser/de support in knn.

@MasterDimensio please check if this is what you want.

Please be aware that this kind of serialization may lose precision on floating-point numbers! It also cannot handle NaN, which causes unwrap failures.

My solution to this kind of problem is to convert floats to their u32 or u64 bit patterns. I've done this in the sgxwasm project: BoundaryValue and the related functions are an additional layer for passing floats between the untrusted and trusted sides without losing a single bit.

I think a similar solution could be adopted here. It needs more work on rulinalg and rusty-machine.
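A minimal std-only sketch of the bit-pattern idea, assuming plain `f64` values. `f64::to_bits` and `f64::from_bits` are standard library methods; `encode_f64`, `decode_f64`, `encode_vec`, `decode_vec` and the comma-separated hex format are hypothetical choices for illustration, not the sgxwasm `BoundaryValue` API. Because the exact IEEE-754 bits are stored rather than a decimal rendering, values such as `0.1`, `-0.0`, infinities and NaN come back unchanged.

```rust
use std::num::ParseIntError;

/// Encode an f64 as the 16-hex-digit form of its IEEE-754 bit pattern.
fn encode_f64(x: f64) -> String {
    format!("{:016x}", x.to_bits())
}

/// Decode the form produced by `encode_f64`; the original bits are restored exactly.
fn decode_f64(s: &str) -> Result<f64, ParseIntError> {
    u64::from_str_radix(s, 16).map(f64::from_bits)
}

/// Encode a slice of floats as comma-separated bit patterns.
fn encode_vec(xs: &[f64]) -> String {
    xs.iter().map(|&x| encode_f64(x)).collect::<Vec<_>>().join(",")
}

/// Inverse of `encode_vec`; an empty string decodes to an empty vector.
fn decode_vec(s: &str) -> Result<Vec<f64>, ParseIntError> {
    if s.is_empty() {
        return Ok(Vec::new());
    }
    s.split(',').map(decode_f64).collect()
}
```

For contrast, formatting `0.1 + 0.2` with six decimal places and parsing it back yields `0.3`, which is not the original value, whereas the bit-pattern encoding always compares equal bit for bit.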

@dingelish
Contributor

7334c30 adds ser/de support to more of rusty-machine.

Now only the NN model is incomplete, because it depends on trait objects, which are not suitable for ser/de. The current solution uses erased_serde on the trait objects, but that only covers serialization, not deserialization.
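The deserialization problem with trait objects is that the serialized data alone does not say which concrete type to rebuild. One common workaround, a different technique from erased_serde and shown here only as a sketch, is to write a type tag and map it back to a constructor on load. `Activation`, `Sigmoid`, `Relu`, `serialize_activation` and `deserialize_activation` are hypothetical names, not rusty-machine's NN types.

```rust
/// Hypothetical trait standing in for a model component stored as `Box<dyn Trait>`.
trait Activation {
    /// Stable tag written to the serialized form.
    fn tag(&self) -> &'static str;
    fn apply(&self, x: f64) -> f64;
}

struct Sigmoid;
struct Relu;

impl Activation for Sigmoid {
    fn tag(&self) -> &'static str {
        "sigmoid"
    }
    fn apply(&self, x: f64) -> f64 {
        1.0 / (1.0 + (-x).exp())
    }
}

impl Activation for Relu {
    fn tag(&self) -> &'static str {
        "relu"
    }
    fn apply(&self, x: f64) -> f64 {
        x.max(0.0)
    }
}

/// Serializing a trait object is easy: ask the object for its tag.
fn serialize_activation(a: &dyn Activation) -> String {
    a.tag().to_string()
}

/// Deserializing needs a concrete type, so map each known tag back to a constructor.
fn deserialize_activation(tag: &str) -> Result<Box<dyn Activation>, String> {
    match tag {
        "sigmoid" => Ok(Box::new(Sigmoid)),
        "relu" => Ok(Box::new(Relu)),
        other => Err(format!("unknown activation {:?}", other)),
    }
}
```

The cost of this design is that the set of types is closed: every new implementation must also be added to the match in `deserialize_activation`.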

Test cases wanted.

@MasterDimensio
Author

Wow, thank you so much for that! I'm currently travelling, but I should be able to test all of this by the end of the week. I'll keep you updated by Monday!
Again, thanks a lot!

@MasterDimensio
Author

This seems to work for me, at least for K-means and logistic regression, which were the ones I needed. Thanks a lot!


3 participants