get_horz_ngrams works only when mention is_tabular #425

Closed
HiromuHota opened this issue May 21, 2020 · 1 comment · Fixed by #426
Describe the bug

I have a span mention, mention, that is visual but not tabular.
I tried to get all horizontally aligned n-grams from the same sentence (i.e., get_horz_ngrams(mention, from_sentence=False)), but got none because the mention is not tabular.

To Reproduce
Steps to reproduce the behavior:

>>> mention = session.query(Mention).all()[0]
>>> print(mention.context.get_span())
Obama
>>> print(mention.context.sentence.text)
Obama was born in Honolulu, Hawaii.
>>> print(mention.context.sentence.is_visual())
True
>>> print(mention.context.sentence.is_tabular())
False
>>> from fonduer.utils.data_model_utils.visual import get_horz_ngrams
>>> print(list(get_horz_ngrams(mention, from_sentence=False)))
[]

Expected behavior

Since all the other 1-grams in the same sentence are horizontally aligned with the mention, I expect to get all of them.

>>> print(list(get_horz_ngrams(mention, from_sentence=False)))
['was', 'born', 'in', 'Honolulu', ',', 'Hawaii', '.']
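
To make the expectation concrete, here is a minimal, self-contained sketch of what "horizontally aligned" could mean for a sentence that is visual but not tabular. It does not use Fonduer's API: the `Word` class, its `page`/`top`/`bottom` fields, the coordinates, and the helper functions are all hypothetical, and the overlap test is an assumption about alignment rather than Fonduer's actual rule.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for a word with a visual bounding box."""

    text: str
    page: int
    top: float
    bottom: float


def is_horz_aligned(a: Word, b: Word) -> bool:
    # Assumption: two words are horizontally aligned when they are on the
    # same page and their vertical extents overlap.
    return a.page == b.page and a.top < b.bottom and b.top < a.bottom


def horz_aligned_words(target: Word, words: List[Word]) -> Iterator[str]:
    # Yield every other word aligned with the target; no table is required.
    for w in words:
        if w is not target and is_horz_aligned(target, w):
            yield w.text


# Made-up coordinates: the whole sentence sits on one line of page 1.
tokens = ["Obama", "was", "born", "in", "Honolulu", ",", "Hawaii", "."]
words = [Word(t, page=1, top=100.0, bottom=112.0) for t in tokens]
result = list(horz_aligned_words(words[0], words))
print(result)
```

With these made-up coordinates, every other token in the sentence is returned, which mirrors the output requested above.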

Environment (please complete the following information):

  • OS: N/A
  • PostgreSQL Version: N/A
  • Poppler Utils Version: N/A
  • Fonduer Version: 0.8.2

Additional context
Separately from the issue above, I find the from_sentence parameter a bit confusing.

@HiromuHota

Oh, I just found a TODO comment in the source:

def _get_direction_ngrams(
    direction: str,
    c: Union[Candidate, Mention, TemporarySpanMention],
    attrib: str,
    n_min: int,
    n_max: int,
    lower: bool,
    from_sentence: bool,
) -> Iterator[str]:
    # TODO: this currently looks only in current table;
    #   precompute over the whole document/page instead
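
One way to read that TODO is to index words by page once, then answer alignment queries against the index whether or not a table exists. The sketch below only illustrates that idea under stated assumptions: `Box`, `build_page_index`, and `aligned_on_page` are hypothetical names, not Fonduer functions, and the fix referenced in this issue may work differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Box(NamedTuple):
    """Hypothetical word record with a page number and vertical extent."""

    text: str
    page: int
    top: float
    bottom: float


def build_page_index(boxes: Iterable[Box]) -> Dict[int, List[Box]]:
    # Precompute once per document: page number -> boxes on that page.
    index: Dict[int, List[Box]] = defaultdict(list)
    for box in boxes:
        index[box.page].append(box)
    return index


def aligned_on_page(index: Dict[int, List[Box]], target: Box) -> List[str]:
    # Assumption: "aligned" means overlapping vertical extents on one page.
    return [
        b.text
        for b in index.get(target.page, [])
        if b is not target and b.top < target.bottom and target.top < b.bottom
    ]


# Made-up document: two words on one line, a title above, a footer on page 2.
boxes = [
    Box("Obama", 1, 100.0, 112.0),
    Box("was", 1, 100.0, 112.0),
    Box("Title", 1, 50.0, 70.0),
    Box("Footer", 2, 100.0, 112.0),
]
index = build_page_index(boxes)
aligned = aligned_on_page(index, boxes[0])
print(aligned)
```

Because the lookup is keyed by page rather than by table, a mention outside any table still finds its neighbours on the same line.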

HiromuHota pushed a commit to HiromuHota/fonduer that referenced this issue May 21, 2020
@HiromuHota HiromuHota mentioned this issue May 21, 2020