Show nested ents results in error #325

Closed
jkgenser opened this issue Jun 9, 2023 · 2 comments

jkgenser commented Jun 9, 2023

I'm using
spacy == 3.5.3
medcat == 1.7.0

There seems to be an issue when config.general.show_nested_entities = True and the following code path runs. Traceback below.

In particular, the problem seems to be in medcat/cat.py around L1500-1510, reproduced below.

                for _ent in doc._.ents:
                    entity = Span(doc, _ent['start'], _ent['end'], label=_ent['label'])
                    entity._.cui = _ent['cui']
                    entity._.detected_name = _ent['detected_name']
                    entity._.context_similarity = _ent['context_similarity']
                    entity._.id = _ent['id']
                    if 'meta_anns' in _ent:
                        entity._.meta_anns = _ent['meta_anns']
                    _ents.append(entity)

If I replace _ent["start"] with _ent.start (and likewise change the other __getitem__ calls to attribute access), this code doesn't crash. Perhaps the entries in doc._.ents used to be dicts but are now spaCy Span objects, and that mismatch causes the error?
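
To illustrate the mismatch I mean, here is a minimal sketch using only the standard library. It does not touch medcat or spaCy: `span_like` is a stand-in object for a Span, and `read_field` and the sample values are hypothetical names made up for this example, not part of either library.

```python
from types import SimpleNamespace


def read_field(ent, name):
    """Return `name` from a dict-style or an attribute-style entity.

    Hypothetical helper for illustration only; not part of medcat.
    """
    if isinstance(ent, dict):
        return ent[name]
    return getattr(ent, name)


# Dict-style entry, which is what the quoted loop appears to expect.
dict_ent = {"start": 0, "end": 2, "label": "T047"}

# Attribute-style stand-in for a Span (assumption: only these fields matter).
span_like = SimpleNamespace(start=0, end=2, label="T047")

fields = ("start", "end", "label")
from_dict = [read_field(dict_ent, f) for f in fields]
from_obj = [read_field(span_like, f) for f in fields]

# Subscripting the attribute-style object fails, mirroring the reported crash.
try:
    span_like["start"]
    subscript_failed = False
except TypeError:
    subscript_failed = True
```

Both access styles return the same values through the helper, while plain subscripting only works on the dict. With a real Span, the custom fields such as cui would presumably live under the `._.` extension namespace rather than directly on the object.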

Traceback (most recent call last):
  File "/home/j/oler-medcat/src/scripts/basic_eval.py", line 211, in <module>
    main()
  File "/home/j/oler-medcat/src/scripts/basic_eval.py", line 203, in main
    prf_df, global_prf, fpnames_df = score_docs(gold_pages, cat)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/j/oler-medcat/src/scripts/basic_eval.py", line 124, in score_docs
    result = cat.get_entities(gold_doc.text)
             ^^^
  File "/home/j/oler-medcat/src/scripts/basic_eval.py", line 124, in score_docs
    result = cat.get_entities(gold_doc.text)
             ^^^
  File "/usr/lib/python3.11/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit

mart-r commented Jun 15, 2023

I looked into this a little, and it seems that part of the code may have been broken for a long time.

The main problem I ran into was not being able to come up with a test case to reach that part of the code. With the limited model available during automated testing, I couldn't find a way to get a Span that has any entries in doc._.ents. Thus, in the cases I was able to come up with, this part of the code didn't run.

We may not have had many people use this part of the library (i.e. running with show_nested_entities enabled).

With that said, I've come up with a change that should fix the issue within my testing. PR in a minute.
EDIT: PR #326

This was referenced Jun 15, 2023

mart-r commented Jun 26, 2023

This will be included in the next release.

mart-r closed this as completed Jun 26, 2023