Index error when running TokenGazetteer #93

Closed
johann-petrak opened this issue Apr 12, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@johann-petrak
Collaborator

See #92

@johann-petrak johann-petrak added the bug Something isn't working label Apr 12, 2021
@johann-petrak
Collaborator Author

Copied traceback info:

IndexError Traceback (most recent call last)
in
8 doc2 = Annie(doc1)
9 properdoc = ProperDoc(doc1)
---> 10 gazdoc = GazDet(properdoc)
11 for ann in gazdoc.annset("Resume"):
12 doc2.annset("Resume").add_ann(ann)

in GazDet(doc)
5 for typ in details:
6 tgaz = TokenGazetteer("data/" + typ + ".def", fmt="gate-def", annset="", outset="Resume", outtype=typ)
----> 7 gazdoc = tgaz(doc)
8 return gazdoc

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in __call__(self, doc, annset, tokentype, septype, splittype, withintype, all, skip)
697 for segment_start, segment_end in segment_offs:
698 tokens = list(anns.within(segment_start, segment_end))
--> 699 for matches in self.find_all(tokens, doc=doc):
700 for match in matches:
701 starttoken = tokens[match.start]

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find_all(self, tokens, doc, all, skip, fromidx, toidx, endidx, matchfunc)
617 idx = fromidx
618 while idx <= toidx:
--> 619 matches, maxlen, idx = self.find(
620 tokens,
621 doc=doc,

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find(self, tokens, doc, all, fromidx, toidx, endidx, matchfunc)
550 endidx = len(tokens)
551 while idx <= toidx:
--> 552 matches, long = self.match(
553 tokens, idx=idx, doc=doc, all=all, endidx=endidx, matchfunc=matchfunc
554 )

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in match(self, tokens, doc, all, idx, endidx, matchfunc)
454 while j <= endidx:
455 if node.nodes:
--> 456 token = tokens[j]
457 if token.type == self.splittype:
458 break

IndexError: list index out of range
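
Reading the frames above: `find` sets `endidx = len(tokens)`, and `match` then loops with `while j <= endidx` and reads `tokens[j]`, so `j` can reach `len(tokens)`, one position past the last valid index. The following is only a minimal sketch of that off-by-one pattern, not the gatenlp code and not the fix that was actually committed (the thread does not show it). The function names `scan_inclusive` and `scan_exclusive` and the sample tokens are made up for illustration.

```python
def scan_inclusive(tokens, start, endidx):
    """Hypothetical loop with the same shape as the traceback: `while j <= endidx`."""
    seen = []
    j = start
    while j <= endidx:
        # When endidx == len(tokens), the last iteration reads tokens[len(tokens)]
        # and raises IndexError.
        seen.append(tokens[j])
        j += 1
    return seen


def scan_exclusive(tokens, start, endidx):
    """Same loop, but treating endidx as an exclusive bound."""
    seen = []
    j = start
    while j < endidx:
        seen.append(tokens[j])
        j += 1
    return seen


tokens = ["Senior", "Software", "Engineer"]  # illustrative data only
endidx = len(tokens)  # mirrors `endidx = len(tokens)` in the `find` frame

try:
    scan_inclusive(tokens, 0, endidx)
except IndexError as err:
    print("inclusive bound:", err)

print("exclusive bound:", scan_exclusive(tokens, 0, endidx))
```

In the real `match` frame the index is only read when `node.nodes` is truthy, so presumably the error only shows up when a partially matched gazetteer entry runs up to the end of the token list. That would explain why it depends on the data.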

@johann-petrak
Collaborator Author

@mdorkhah would you be able to (privately) share a minimal test case?

@mdorkhah

Sure, I just sent you an email.

johann-petrak added a commit that referenced this issue Apr 12, 2021
@johann-petrak
Collaborator Author

johann-petrak commented Apr 12, 2021

Thanks! I have not been able to get that running yet, but I think I have already found the bug. :)

To test this, would you be able to install gatenlp from the latest version of the GitHub main branch?

One way to do this:

  • optionally create a separate environment for this and activate it
  • install gatenlp from the latest GitHub main branch: pip install -U git+https://github.com/GateNLP/python-gatenlp.git[EXTRAS] where EXTRAS is the list of extras you also need
  • NOTE: this gatenlp version requires the recently released version 3.0.4 of the GATE Python plugin for the GateWorker, which should get used automatically.
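
After installing, it can help to confirm which gatenlp build the active environment actually picks up before re-running the failing code. A small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up here, and the version string alone may not distinguish a main-branch install from the latest release:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if gatenlp is not installed in the environment running this script.
print("gatenlp:", installed_version("gatenlp"))
```

Running this inside the new environment also checks that the environment switch took effect.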

@mdorkhah

Works! Thank you again...

@johann-petrak
Collaborator Author

Thanks for testing!
Closing
