Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support saving/loading models that use OnlineCountVectorizer #1616

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

27pchrisl
Copy link

This PR adds support for serializing the config of the OnlineCountVectorizer by explicitly stating all the constructor properties. Based on the test, it looks like no other changes were needed to support the loading of the vectorizer.

Fixes #1431

@MaartenGr
Copy link
Owner

Thanks for the PR! I am not entirely sure if adding the parameters manually is the right way to go here. What if the parameters of CountVectorizer get updated in a new release of scikit-learn? That would break this functionality. Instead, might it not be possible to update the save_ctfidf_config function to account for the missing parameters by looking at the internal params of the function, either by adding a function to save them or update the get_params function?

@27pchrisl
Copy link
Author

Hi @MaartenGr, no it is certainly not ideal! This was previously mentioned in scikit-learn/scikit-learn#8822 and discussed further in scikit-learn/scikit-learn#13555 - it seems for now this is the way others handle this.

I could add a test case that compared the params of CountVectorizer to a list in the test case, so if the list were changed a test would fail?

@MaartenGr
Copy link
Owner

I could add a test case that compared the params of CountVectorizer to a list in the test case, so if the list were changed a test would fail?

Although that would help detect any issues in newer versions, it will still break the code requiring an immediate update. It is a nice solution if there is not alternative but what about the following I mentioned in the previous message:

Instead, might it not be possible to update the save_ctfidf_config function to account for the missing parameters by looking at the internal params of the function, either by adding a function to save them or update the get_params function?

This could be implemented as shown in the below example: https://stackoverflow.com/a/56491394/10532563

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem with saving the model
2 participants