Add a note about make_serializable argument #484

Merged 1 commit on Aug 16, 2023

JohnGiorgi
Contributor

By default the abbreviation detector pipe is not serializable, so you run into issues when you try to serialize any docs processed with it:

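import spacy
from scispacy.abbreviation import AbbreviationDetector  # importing registers the "abbreviation_detector" factory
from spacy.tokens import DocBin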

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("abbreviation_detector")

doc_bin = DocBin(store_user_data=True)
doc = nlp("Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease caused by the expansion of a polyglutamine tract within the androgen receptor (AR). SBMA can be caused by this easily.")
doc_bin.add(doc)
# Throws an error: TypeError: can not serialize 'spacy.tokens.span.Span' object

It took me a while to figure out that this is easily solved with the make_serializable parameter, but it's not documented anywhere, so I am proposing to add a short note about it to the README.
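For reference, here is a minimal sketch of the fix, assuming scispacy and the en_core_sci_sm model are installed. The helper names `abbreviation_config` and `build_doc_bin` are hypothetical and only there for illustration. `make_serializable` is the pipe parameter mentioned above, passed through the standard `config` argument of `nlp.add_pipe`.

```python
from typing import Any, Dict


def abbreviation_config(make_serializable: bool = True) -> Dict[str, Any]:
    """Build the config dict for the "abbreviation_detector" pipe (hypothetical helper)."""
    return {"make_serializable": make_serializable}


def build_doc_bin(text: str, model: str = "en_core_sci_sm"):
    """Process `text` and store the resulting doc, including user data, in a DocBin."""
    # Imported locally so this module can be imported without spaCy/scispacy installed.
    import spacy
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401 (registers the factory)
    from spacy.tokens import DocBin

    nlp = spacy.load(model)
    # Same pipe as in the failing snippet, but with make_serializable switched on.
    nlp.add_pipe("abbreviation_detector", config=abbreviation_config(True))

    doc_bin = DocBin(store_user_data=True)
    doc_bin.add(nlp(text))
    return doc_bin
```

With the model installed, `build_doc_bin("...").to_bytes()` should then go through without the TypeError shown above.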

@JohnGiorgi
Contributor Author

Also worth asking, is there any reason for make_serializable not to default to True?

@MichalMalyska
Contributor

> Also worth asking, is there any reason for make_serializable not to default to True?

If I remember correctly, it was added while there was a lot of weirdness with multiprocessing in spaCy (#368), so I set it to False by default just in case. This is a very funny place to meet again, @JohnGiorgi, btw.

@dakinggg
Collaborator

dakinggg left a comment

Oops, sorry I never approved this!

@dakinggg merged commit 2081f77 into allenai:main on Aug 16, 2023
1 check passed
@JohnGiorgi deleted the patch-1 branch on August 28, 2023 18:31
@JohnGiorgi
Contributor Author

> > Also worth asking, is there any reason for make_serializable not to default to True?

> If I remember correctly, it was added while there was a lot of weirdness with multiprocessing in spaCy (#368), so I set it to False by default just in case. This is a very funny place to meet again, @JohnGiorgi, btw.

Haha good to hear from you again @MichalMalyska!
