AttributeError: 'NoneType' object has no attribute 'text' when using nlp.pipe() #5

Closed
coltonpeltier-db opened this issue Oct 24, 2022 · 6 comments

@coltonpeltier-db

Hi, when I process multiple text documents as a batch, the run fails with the error message: AttributeError: 'NoneType' object has no attribute 'text'. However, processing each text document by itself produces no such error. Here is an easy-to-reproduce example:

import spacy

docs = ["""String of 126 characters. String of 126 characters. String of 126 characters. String of 126 characters. String of 126 characte""","""Any string which is 93 characters. Any string which is 93 characters. Any string which is 93 """]
nlp = spacy.blank("en")
nlp.add_pipe("opentapioca")
for doc in nlp.pipe(docs):
    print(doc)

Full stack trace below:

AttributeError                            Traceback (most recent call last)
<command-370658210397732> in <module>
      4 nlp = spacy.blank("en")
      5 nlp.add_pipe("opentapioca")
----> 6 for doc in nlp.pipe(docs):
      7     print(doc)

/databricks/python/lib/python3.8/site-packages/spacy/language.py in pipe(self, texts, as_tuples, batch_size, disable, component_cfg, n_process)
   1570         else:
   1571             # if n_process == 1, no processes are forked.
-> 1572             docs = (self._ensure_doc(text) for text in texts)
   1573             for pipe in pipes:
   1574                 docs = pipe(docs)

/databricks/python/lib/python3.8/site-packages/spacy/util.py in _pipe(docs, proc, name, default_error_handler, kwargs)
   1597     if hasattr(proc, "pipe"):
   1598         yield from proc.pipe(docs, **kwargs)
-> 1599     else:
   1600         # We added some args for pipe that __call__ doesn't expect.
   1601         kwargs = dict(kwargs)

/databricks/python/lib/python3.8/site-packages/spacyopentapioca/entity_linker.py in pipe(self, stream, batch_size)
    117                     self.make_request, doc): doc for doc in docs}
    118                 for doc, future in zip(docs, concurrent.futures.as_completed(future_to_url)):
--> 119                     yield self.process_single_doc_after_call(doc, future.result())

/databricks/python/lib/python3.8/site-packages/spacyopentapioca/entity_linker.py in process_single_doc_after_call(self, doc, r)
     66                                      alignment_mode='expand')
     67                 log.warning('The OpenTapioca-entity "%s" %s does not fit the span "%s" %s in spaCy. EXPANDED!',
---> 68                             ent['tags'][0]['label'][0], (start, end), span.text, (span.start_char, span.end_char))
     69             span._.annotations = ent
     70             span._.description = ent['tags'][0]['desc']

AttributeError: 'NoneType' object has no attribute 'text'

I don't know what it is about the lengths of the strings that causes the issue, but they do seem to matter: adding or removing a couple of characters from either string can resolve it.

@davidberenstein1957
Contributor

@coltonpeltier-db I will take a look this week👍

@davidberenstein1957
Contributor

davidberenstein1957 commented Oct 25, 2022

https://stackoverflow.com/questions/69976538/spacy-preparing-training-data-doc-char-span-returning-none
For some reason the returned payload isn't processed correctly (this might be due to the API rate limit). The result is a mismatch between the returned character ranges and the token boundaries of the document, which produces invalid spaCy spans and makes doc.char_span return None.
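A minimal sketch of how character offsets computed for one text can fail to map onto a different text. It does not use spaCy: `token_offsets` and `expand_to_tokens` are hypothetical helpers that only approximate what doc.char_span does with alignment_mode='expand', and whitespace tokenization is a simplifying assumption.

```python
import re
from typing import List, Optional, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Character (start, end) offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def expand_to_tokens(text: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Widen [start, end) to whole tokens, or return None if either edge
    does not fall inside any token of `text`."""
    tokens = token_offsets(text)
    first = next((s for s, e in tokens if s <= start < e), None)
    last = next((e for s, e in tokens if s <= end - 1 < e), None)
    if first is None or last is None:
        return None
    return first, last


text_a = "Christian Drosten works in Germany."
text_b = "Momofuku Ando was born in Japan."

# Offsets (27, 34) cover "Germany" in text_a and widen to the whole token.
own = expand_to_tokens(text_a, 27, 34)
# The same offsets applied to the shorter text_b run past its end.
swapped = expand_to_tokens(text_b, 27, 34)
print(own, swapped)
```

If annotations for one document are applied to another document of a different length, a guard such as `if span is None: continue` before touching `span.text` would avoid the AttributeError, although it would hide rather than fix the underlying mismatch.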

@davidberenstein1957
Contributor

I will take a further look this coming Friday. @coltonpeltier-db

@Hmkhalla
Contributor

Hmkhalla commented Nov 12, 2022

Hello there!
Thanks for your work. Did you manage to solve it?
I have the same issue with the tutorial example:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca')
docs = nlp.pipe(
    [
        "Christian Drosten works in Germany.",
        "Momofuku Ando was born in Japan."
    ]
)
for doc in docs:
    for span in doc.ents:
        print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

I got this output:

WARNING:spacyopentapioca.entity_linker:The OpenTapioca-entity "M" (0, 13) does not fit the span "Christian Drosten" (0, 17) in spaCy. EXPANDED!
WARNING:spacyopentapioca.entity_linker:The OpenTapioca-entity "J" (26, 31) does not fit the span "Germany" (27, 34) in spaCy. EXPANDED!

('Christian Drosten', 'Q317858', 'PERSON', 'Taiwanese-Japanese businessman', 3.6012208212234302)
('Germany', 'Q17', 'LOC', 'sovereign state in East Asia, situated on an archipelago of five main and over 6,800 smaller islands', 2.349944834167907)
AttributeError                            Traceback (most recent call last)

<ipython-input-28-f9e1a4c2eead> in <module>
      8     ]
      9 )
---> 10 for doc in docs:
     11     for span in doc.ents:
     12         print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

/usr/local/lib/python3.7/dist-packages/spacy/language.py in pipe(self, texts, as_tuples, batch_size, disable, component_cfg, n_process)
   1587             for pipe in pipes:
   1588                 docs = pipe(docs)
-> 1589         for doc in docs:
   1590             yield doc
   1591 

/usr/local/lib/python3.7/dist-packages/spacy/util.py in _pipe(docs, proc, name, default_error_handler, kwargs)
   1649 ) -> Iterator["Doc"]:
   1650     if hasattr(proc, "pipe"):
-> 1651         yield from proc.pipe(docs, **kwargs)
   1652     else:
   1653         # We added some args for pipe that __call__ doesn't expect.

/usr/local/lib/python3.7/dist-packages/spacyopentapioca/entity_linker.py in pipe(self, stream, batch_size)
    117                     self.make_request, doc): doc for doc in docs}
    118                 for doc, future in zip(docs, concurrent.futures.as_completed(future_to_url)):
--> 119                     yield self.process_single_doc_after_call(doc, future.result())

/usr/local/lib/python3.7/dist-packages/spacyopentapioca/entity_linker.py in process_single_doc_after_call(self, doc, r)
     66                                      alignment_mode='expand')
     67                 log.warning('The OpenTapioca-entity "%s" %s does not fit the span "%s" %s in spaCy. EXPANDED!',
---> 68                             ent['tags'][0]['label'][0], (start, end), span.text, (span.start_char, span.end_char))
     69             span._.annotations = ent
     70             span._.description = ent['tags'][0]['desc']

AttributeError: 'NoneType' object has no attribute 'text'
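
In the output above, 'Christian Drosten' is printed with the description of Momofuku Ando, and the traceback shows docs being zipped with concurrent.futures.as_completed(...). as_completed yields futures in the order they finish, not in the order they were submitted, so that pairing can attach one document's response to a different document. The sketch below reproduces the effect with the standard library only. `fake_request` is a hypothetical stand-in for the HTTP call, and `pair_in_submission_order` is one possible order-preserving variant, not necessarily what the library's fix does.

```python
import concurrent.futures
import time
from typing import List, Tuple


def fake_request(text: str) -> str:
    """Hypothetical stand-in for the API call: longer texts answer later."""
    time.sleep(len(text) / 100)
    return f"annotations for {text!r}"


def pair_with_as_completed(texts: List[str]) -> List[Tuple[str, str]]:
    """Pattern from the traceback: zip inputs with futures in completion order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(fake_request, t): t for t in texts}
        completed = concurrent.futures.as_completed(futures)
        return [(t, f.result()) for t, f in zip(texts, completed)]


def pair_in_submission_order(texts: List[str]) -> List[Tuple[str, str]]:
    """Keep futures in a list and wait on each in turn, so order is preserved."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(fake_request, t) for t in texts]
        return [(t, f.result()) for t, f in zip(texts, futures)]


texts = ["a" * 30, "b" * 3]  # the first request is slow, the second is fast
mispaired = pair_with_as_completed(texts)
paired = pair_in_submission_order(texts)
print(mispaired[0])
print(paired[0])
```

With as_completed, the slow first text ends up next to the fast second text's result. Looking each completed future up in the `futures` dict, or using `executor.map`, which returns results in input order, are other ways to keep documents and responses aligned.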

@shigapov
Collaborator

@coltonpeltier-db, is the issue resolved now after the commit by @Hmkhalla? If yes, I'll close this and update the code to a new version.

@coltonpeltier-db
Author

I just tested with my example above and can't replicate the issue anymore. Thanks, @Hmkhalla!
