AttributeError: 'NoneType' object has no attribute 'text' when using nlp.pipe() #5

coltonpeltier-db · 2022-10-24T17:57:16Z

Hi, when I process multiple text documents as a batch, the run fails with the error message: AttributeError: 'NoneType' object has no attribute 'text'. However, processing each text document by itself produces no such error. Here is an easy-to-reproduce example:

import spacy

docs = ["""String of 126 characters. String of 126 characters. String of 126 characters. String of 126 characters. String of 126 characte""","""Any string which is 93 characters. Any string which is 93 characters. Any string which is 93 """]
nlp = spacy.blank("en")
nlp.add_pipe("opentapioca")
for doc in nlp.pipe(docs):
    print(doc)

Full stack trace below:

AttributeError                            Traceback (most recent call last)
<command-370658210397732> in <module>
      4 nlp = spacy.blank("en")
      5 nlp.add_pipe("opentapioca")
----> 6 for doc in nlp.pipe(docs):
      7     print(doc)

/databricks/python/lib/python3.8/site-packages/spacy/language.py in pipe(self, texts, as_tuples, batch_size, disable, component_cfg, n_process)
   1570         else:
   1571             # if n_process == 1, no processes are forked.
-> 1572             docs = (self._ensure_doc(text) for text in texts)
   1573             for pipe in pipes:
   1574                 docs = pipe(docs)

/databricks/python/lib/python3.8/site-packages/spacy/util.py in _pipe(docs, proc, name, default_error_handler, kwargs)
   1597     if hasattr(proc, "pipe"):
   1598         yield from proc.pipe(docs, **kwargs)
-> 1599     else:
   1600         # We added some args for pipe that __call__ doesn't expect.
   1601         kwargs = dict(kwargs)

/databricks/python/lib/python3.8/site-packages/spacyopentapioca/entity_linker.py in pipe(self, stream, batch_size)
    117                     self.make_request, doc): doc for doc in docs}
    118                 for doc, future in zip(docs, concurrent.futures.as_completed(future_to_url)):
--> 119                     yield self.process_single_doc_after_call(doc, future.result())

/databricks/python/lib/python3.8/site-packages/spacyopentapioca/entity_linker.py in process_single_doc_after_call(self, doc, r)
     66                                      alignment_mode='expand')
     67                 log.warning('The OpenTapioca-entity "%s" %s does not fit the span "%s" %s in spaCy. EXPANDED!',
---> 68                             ent['tags'][0]['label'][0], (start, end), span.text, (span.start_char, span.end_char))
     69             span._.annotations = ent
     70             span._.description = ent['tags'][0]['desc']

AttributeError: 'NoneType' object has no attribute 'text'

I don't know what about the lengths of the strings causes an issue, but they do seem to matter in some way. Adding or removing a couple characters from either string can resolve the issue.


davidberenstein1957 · 2022-10-24T18:18:09Z

@coltonpeltier-db I will take a look this week👍

davidberenstein1957 · 2022-10-25T09:36:20Z

https://stackoverflow.com/questions/69976538/spacy-preparing-training-data-doc-char-span-returning-none
For some reason the returned payload isn't processed correctly (possibly because of the API rate limit). This results in a mismatch between the character ranges and the overlapping tokens, which produces faulty spaCy spans and leads to None values from doc.char_span.
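
To illustrate the doc.char_span behaviour mentioned here: a minimal sketch, assuming only a blank English pipeline, of how character offsets that do not line up with token boundaries give None in the default strict mode. The helper safe_char_span is hypothetical and not part of spaCy or spacyopentapioca. The traceback above suggests that None can still come back even with alignment_mode='expand' (for example when the offsets belong to a different text), so the caller should check for None before touching span.text.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Momofuku Ando was born in Japan.")


def safe_char_span(doc, start, end):
    """Hypothetical helper: try a strict span first, then an expanded one.

    May still return None, so callers have to check the result.
    """
    span = doc.char_span(start, end, alignment_mode="strict")
    if span is None:
        span = doc.char_span(start, end, alignment_mode="expand")
    return span


# Offsets (0, 10) end in the middle of the token "Ando".
strict = doc.char_span(0, 10)           # default alignment_mode is "strict"
expanded = safe_char_span(doc, 0, 10)   # falls back to "expand"

print("strict:", strict)
if expanded is None:
    # Guard instead of accessing .text on None.
    print("offsets could not be mapped to tokens, skipping entity")
else:
    print("expanded:", expanded.text, (expanded.start_char, expanded.end_char))
```

The None check is the important part: logging span.text before confirming that a span exists is what raises the AttributeError reported in this issue.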

davidberenstein1957 · 2022-11-03T07:19:05Z

I will take a further look this coming Friday. @coltonpeltier-db

Hmkhalla · 2022-11-12T21:14:47Z

Hello there!
Thanks for your work. Did you manage to solve it?
I have the same issue with the tutorial example:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca')
docs = nlp.pipe(
    [
        "Christian Drosten works in Germany.",
        "Momofuku Ando was born in Japan."
    ]
)
for doc in docs:
    for span in doc.ents:
        print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

I got this output:

WARNING:spacyopentapioca.entity_linker:The OpenTapioca-entity "M" (0, 13) does not fit the span "Christian Drosten" (0, 17) in spaCy. EXPANDED!
WARNING:spacyopentapioca.entity_linker:The OpenTapioca-entity "J" (26, 31) does not fit the span "Germany" (27, 34) in spaCy. EXPANDED!

('Christian Drosten', 'Q317858', 'PERSON', 'Taiwanese-Japanese businessman', 3.6012208212234302)
('Germany', 'Q17', 'LOC', 'sovereign state in East Asia, situated on an archipelago of five main and over 6,800 smaller islands', 2.349944834167907)

AttributeError                            Traceback (most recent call last)

<ipython-input-28-f9e1a4c2eead> in <module>
      8     ]
      9 )
---> 10 for doc in docs:
     11     for span in doc.ents:
     12         print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

3 frames

/usr/local/lib/python3.7/dist-packages/spacy/language.py in pipe(self, texts, as_tuples, batch_size, disable, component_cfg, n_process)
   1587             for pipe in pipes:
   1588                 docs = pipe(docs)
-> 1589         for doc in docs:
   1590             yield doc
   1591 

/usr/local/lib/python3.7/dist-packages/spacy/util.py in _pipe(docs, proc, name, default_error_handler, kwargs)
   1649 ) -> Iterator["Doc"]:
   1650     if hasattr(proc, "pipe"):
-> 1651         yield from proc.pipe(docs, **kwargs)
   1652     else:
   1653         # We added some args for pipe that __call__ doesn't expect.

/usr/local/lib/python3.7/dist-packages/spacyopentapioca/entity_linker.py in pipe(self, stream, batch_size)
    117                     self.make_request, doc): doc for doc in docs}
    118                 for doc, future in zip(docs, concurrent.futures.as_completed(future_to_url)):
--> 119                     yield self.process_single_doc_after_call(doc, future.result())

/usr/local/lib/python3.7/dist-packages/spacyopentapioca/entity_linker.py in process_single_doc_after_call(self, doc, r)
     66                                      alignment_mode='expand')
     67                 log.warning('The OpenTapioca-entity "%s" %s does not fit the span "%s" %s in spaCy. EXPANDED!',
---> 68                             ent['tags'][0]['label'][0], (start, end), span.text, (span.start_char, span.end_char))
     69             span._.annotations = ent
     70             span._.description = ent['tags'][0]['desc']

AttributeError: 'NoneType' object has no attribute 'text'
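
A possible reading of the output above, not a confirmed description of the fix: the pipe frame in the traceback zips docs with concurrent.futures.as_completed(...). Since as_completed yields futures in the order they finish, not the order they were submitted, a document can end up paired with the response for a different text. That would be consistent with "Christian Drosten" receiving the description of Momofuku Ando. The sketch below reproduces the pattern with the standard library only; fake_request and the delays are hypothetical stand-ins for the HTTP call.

```python
import concurrent.futures
import time

# Hypothetical stand-in for the API request; the delay simulates latency.
DELAYS = {"first text (slow reply)": 0.3, "second text (fast reply)": 0.0}


def fake_request(text):
    time.sleep(DELAYS[text])
    return f"annotations for: {text}"


texts = list(DELAYS)

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future_to_text = {executor.submit(fake_request, t): t for t in texts}

    # Risky: zip() pairs inputs (submission order) with futures that
    # as_completed() yields in completion order.
    mispaired = [
        (text, future.result())
        for text, future in zip(
            texts, concurrent.futures.as_completed(future_to_text)
        )
    ]

    # Safer: look up the input that belongs to each finished future.
    paired = [
        (future_to_text[future], future.result())
        for future in concurrent.futures.as_completed(future_to_text)
    ]

    # Alternative: Executor.map returns results in input order.
    ordered = list(zip(texts, executor.map(fake_request, texts)))

for label, pairs in [("zip + as_completed", mispaired),
                     ("dict lookup", paired),
                     ("executor.map", ordered)]:
    print(label)
    for text, result in pairs:
        print("   ", repr(text), "<-", repr(result))
```

With the zip variant, the slow first text is reported together with the fast second text's result. The dict lookup and executor.map variants keep each text with its own result.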

shigapov · 2022-11-15T15:17:42Z

@coltonpeltier-db, is the issue resolved now after the commit by @Hmkhalla? If yes, I'll close this and update the code to a new version.

coltonpeltier-db · 2022-11-15T15:57:44Z

Just tested with my example above, can't replicate the issue anymore. Thanks @Hmkhalla !

Hmkhalla mentioned this issue Nov 13, 2022

fixing concurrent work and input matching #7

Merged

coltonpeltier-db closed this as completed Nov 15, 2022
