Is there a better way to save the model to the disk? #8

Closed
fMercatili opened this issue Jun 17, 2022 · 8 comments

@fMercatili

Hello everyone. I am a novice in this field and I really would like to thank you all for this wonderful library that is making my work way easier than I expected!
For a project, I have to do few-shot classification with a relatively large number of samples (27 labels with roughly 500 entries each). Building the classifier takes about 15 minutes on my computer using the sentence-transformers embeddings, so I wanted to save the resulting model to disk and reuse it in the future.
I tried the instructions in the spaCy documentation (https://spacy.io/usage/saving-loading), but I observe a strange behavior: with the "to_disk" and "from_disk" methods the model is saved to disk, but when I load it, it seems to start over from scratch and takes 15 minutes again. I also tried pickle, but at loading time I get the following error: "AttributeError: [E047] Can't assign a value to unregistered extension attribute 'cats'. Did you forget to call the set_extension method?".
Does anybody know a way to save the model (or the spacy object) to the disk and retrieve it rapidly afterwards?

@davidberenstein1957
Owner

#4

@davidberenstein1957
Owner

There are ways to save and load these kinds of models, but I have not gotten around to implementing this properly yet, since it will take significant time and I haven't had much spare time in the past few months.

@davidberenstein1957
Owner

As a workaround, you can use the standalone usage and simply save the sklearn classifier as a separate component.
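The idea behind this workaround is to cache only the slow part (fitting a classifier on the embeddings), while the embedding model itself can always be reloaded from its pretrained weights. Below is a minimal, stdlib-only sketch of that split under loud assumptions: `embed`, `fit_head`, and `predict` are hypothetical helpers, not classy-classification's API; a toy letter-frequency `embed()` stands in for the sentence-transformers model, and a nearest-centroid head is swapped in for the sklearn classifier. Only the fitted head is pickled.

```python
import math
import os
import pickle
import tempfile
from collections import Counter

# HYPOTHETICAL stand-in for the embedding model (a sentence-transformers model
# in the issue). It is never pickled; the real one reloads from its weights.
def embed(text):
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    vec = [float(counts[chr(c)]) for c in range(ord("a"), ord("z") + 1)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# HYPOTHETICAL classifier head: nearest-centroid, swapped in for the sklearn
# classifier. Fitting it (embedding every example) is the slow step to cache.
def fit_head(data):
    head = {}
    for label, texts in data.items():
        vecs = [embed(t) for t in texts]
        # Per-dimension mean of this label's example embeddings.
        head[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return head

def predict(head, text):
    vec = embed(text)
    # Dot product against each centroid; the highest score wins.
    scores = {label: sum(a * b for a, b in zip(vec, c)) for label, c in head.items()}
    return max(scores, key=scores.get)

data = {
    "furniture": ["This text is about chairs.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "Do you also have some ovens."],
}
head = fit_head(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "head.pkl")
    # Save only the fitted head, not the embedding model.
    with open(path, "wb") as f:
        pickle.dump(head, f)
    # Later (e.g. in another process): reload the head without refitting.
    with open(path, "rb") as f:
        loaded_head = pickle.load(f)

query = "I am looking for kitchen appliances."
print(predict(loaded_head, query))
```

In the real library the object to persist would be the fitted sklearn classifier itself; the sketch only illustrates why saving that component alone avoids repeating the fit.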

@davidberenstein1957
Owner

Let me know if this makes sense and works for you. If so, it might be an easy hotfix to support passing such a classifier into the spaCy pipeline along with the embedding model.

@fMercatili
Author

I will give it a try in a couple of hours and let you know. Thanks a lot for your reply for now 😊

@fMercatili
Author

Hi! Thank you again for your help. I finally solved the problem by following your advice to use the standalone usage. I saved the entire classifier object with pickle.dump() and retrieved it afterward, also with pickle. Loading the model now takes only a couple of seconds.
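Why the pickle round trip is fast: pickle stores the object's fitted state, and unpickling an ordinary class instance restores that state without calling `__init__` again, so whatever expensive work the constructor did is not repeated. A minimal sketch, assuming a hypothetical `SlowClassifier` whose constructor plays the role of the ~15-minute fit (it is not the library's class):

```python
import pickle
import time

class SlowClassifier:
    """HYPOTHETICAL stand-in for a classifier whose constructor does the
    expensive work (in the issue: embedding ~13,500 texts and fitting)."""

    fit_calls = 0  # class-level counter recording how often fitting runs

    def __init__(self, data):
        SlowClassifier.fit_calls += 1
        time.sleep(0.1)  # placeholder for the slow fitting step
        self.labels = sorted(data)
        # Toy "model state": the set of lowercase words seen for each label.
        self.vocab = {
            label: {w.lower() for text in texts for w in text.split()}
            for label, texts in data.items()
        }

    def __call__(self, text):
        words = {w.lower() for w in text.split()}
        # Pick the label sharing the most words with the input.
        return max(self.labels, key=lambda label: len(words & self.vocab[label]))

data = {
    "furniture": ["This text is about chairs.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "Do you also have some ovens."],
}
clf = SlowClassifier(data)     # runs __init__ once

blob = pickle.dumps(clf)       # in-memory equivalent of pickle.dump() to a file
restored = pickle.loads(blob)  # restores state; __init__ is NOT called again

print(SlowClassifier.fit_calls)  # still 1
print(restored("Do you sell ovens?") == clf("Do you sell ovens?"))
```

Because pickle records the class by module and name rather than storing its code, the defining package (here, classy_classification) has to be importable, in a compatible version, wherever the file is loaded.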
This library is pure gold btw 😍

@davidberenstein1957
Owner

davidberenstein1957 commented Jun 17, 2022 via email

@fMercatili
Author

Literally four lines of code 😁

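```python
import pickle

from classy_classification import classyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}
# Constructing the classifier is the slow step described above.
classifier = classyClassifier(data=data)

# Save the whole fitted classifier object to disk.
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)
```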

And then, to load the model and use it:

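```python
import pickle

# classy_classification must be installed so pickle can locate the class.
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)

print(classifier("I am looking for kitchen appliances."))
```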
