Is there a better way to save the model to the disk? #8

fMercatili · 2022-06-17T07:56:05Z

Hello everyone. I am a novice in this field and I really would like to thank you all for this wonderful library that is making my work way easier than I expected!
For a project, I have to do few-shot classification with a relatively large number of samples (27 labels with roughly 500 entries each). Training takes about 15 minutes on my computer using the sentence-transformers embeddings. I wanted to save the resulting model to disk so it can be reused in the future. I tried the instructions in the spaCy documentation (https://spacy.io/usage/saving-loading), but I observe a strange behavior: if I use the "to_disk" and "from_disk" methods, the model is saved to disk, but when it's time to load it, it seems to start from scratch, taking 15 minutes again. I also tried pickle, but I get the following error at loading time: "AttributeError: [E047] Can't assign a value to unregistered extension attribute 'cats'. Did you forget to call the set_extension method?".
Does anybody know a way to save the model (or the spacy object) to the disk and retrieve it rapidly afterwards?

davidberenstein1957 · 2022-06-17T08:26:00Z

#4

davidberenstein1957 · 2022-06-17T08:27:06Z

There are ways to save and load these kinds of models, but I have not gotten around to implementing this properly yet, since it will take significant time and I haven't had much spare time over the past few months.

davidberenstein1957 · 2022-06-17T08:28:44Z

As a workaround, you can use the standalone usage and simply save the sklearn classifier as a separate component.

davidberenstein1957 · 2022-06-17T08:43:48Z

Let me know if this makes sense and works for you. If so, it might be an easy hot-fix to include the feature of passing such a classifier within the spacy pipeline along with the embedding model.

fMercatili · 2022-06-17T08:52:07Z

I will give it a try in a couple of hours and let you know. Thanks a lot for your reply for now 😊

fMercatili · 2022-06-17T15:01:22Z

Hi! Thank you again for your help. I finally solved the problem by following your advice to use the standalone usage. I saved the entire classifier object with pickle.dump() and loaded it back afterward with pickle.load(). It only takes a couple of seconds to load the model.
This library is pure gold btw 😍

davidberenstein1957 · 2022-06-17T15:18:57Z

Thanks for the shoutout ✌️. Awesome that it worked! Could you share your code? Then I can formalize it and potentially use it as a base for the spaCy save update.

fMercatili · 2022-06-17T15:51:12Z

Literally four lines of code 😁

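import pickle

# Class name as used in this thread; check the import for your installed version.
from classy_classification import classyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}
classifier = classyClassifier(data=data)

# Save the whole trained classifier object.
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)

and then to load the model and use it:

import pickle

# Load the classifier back; the context manager closes the file afterwards.
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)
print(classifier("I am looking for kitchen appliances."))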
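For anyone who wants the save and load steps in one place, here is a minimal sketch of a "load if present, otherwise train and save" helper built on the same pickle approach. The name load_or_train and the train callable are hypothetical and not part of the library; the sketch only assumes that the trained object can be pickled, as reported above.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_train(path: Path, train: Callable[[], T]) -> T:
    """Return the object pickled at `path`; if the file is missing,
    call `train()`, pickle its result to `path`, and return it.

    Hypothetical helper for illustration. Only unpickle files you trust,
    since unpickling can execute arbitrary code.
    """
    if path.exists():
        # Fast path: reuse the object saved by an earlier run.
        with path.open("rb") as f:
            return pickle.load(f)

    # Slow path: train once, then cache the result on disk.
    model = train()
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

With the snippet from this thread, usage would look roughly like classifier = load_or_train(Path("./classifier.pkl"), lambda: classyClassifier(data=data)). The pickle file is tied to the library versions it was created with, so it may need to be regenerated after an upgrade.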

davidberenstein1957 closed this as completed Sep 25, 2022
