Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use wordsplit with taggers #1981

Merged
merged 6 commits into from Oct 30, 2018

Conversation

@WrRan
Copy link
Contributor

commented Oct 27, 2018

Fixes #1929.

This PR is same as #1935.

I have deleted the original repo, which makes #1935 from unknown repository.
And I do not how to solve this problem.
So I create this PR.

@WrRan WrRan referenced this pull request Oct 27, 2018
@WrRan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 27, 2018

Though PR passes all test-cases, yet I feel uneasy to modify previous test cases related to this issue.

@matt-gardner
Copy link
Member

left a comment

Thanks, this looks good, but there are some things that could be better.

@@ -23,7 +23,7 @@ class NerTagIndexer(TokenIndexer[int]):
We will use this namespace in the :class:`Vocabulary` to map strings to indices.
"""
# pylint: disable=no-self-use
def __init__(self, namespace: str = 'ner_tags') -> None:
def __init__(self, namespace: str = 'ner_tag') -> None:

This comment has been minimized.

Copy link
@matt-gardner

matt-gardner Oct 27, 2018

Member

What if we make this ner_tokens? That seems like a better name.

@@ -26,7 +26,7 @@ class PosTagIndexer(TokenIndexer[int]):
If ``True``, we will use coarse POS tags instead of the default fine-grained POS tags.
"""
# pylint: disable=no-self-use
def __init__(self, namespace: str = 'pos_tags', coarse_tags: bool = False) -> None:
def __init__(self, namespace: str = 'pos_tag', coarse_tags: bool = False) -> None:

This comment has been minimized.

Copy link
@matt-gardner

matt-gardner Oct 27, 2018

Member

Same here - pos_tokens makes it more clear what's going on here. We're using POS tags as part of our token representation, instead of our tags / labels.

This comment has been minimized.

Copy link
@WrRan

WrRan Oct 29, 2018

Author Contributor

You are right.

indexer.count_vocab_items(token, counter)
vocab = Vocabulary(counter)
# should raise no exception
indexer.tokens_to_indices(tokens, vocab, index_name="ner")

This comment has been minimized.

Copy link
@matt-gardner

matt-gardner Oct 27, 2018

Member

It's not obvious in this test that the NER tags are blank. Can you assert something about that in here? Or you could just construct Token objects yourself that have some NER tags with values, and some with blanks. Also, it'd be good to have an actual assert on the result of this line, that you're getting the value for "NONE" where you expect.

indexer.count_vocab_items(token, counter)
vocab = Vocabulary(counter)
# should raise no exception
indexer.tokens_to_indices(tokens, vocab, index_name="pos")

This comment has been minimized.

Copy link
@matt-gardner

matt-gardner Oct 27, 2018

Member

This one doesn't actually test a blank pos tag, does it? Maybe add a blank pos tag, and check the result?

WrRan added 2 commits Oct 29, 2018
Update pos_tag_indexer_test.py
add a trailing newline
@schmmd

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

@matt-gardner approved?

@matt-gardner
Copy link
Member

left a comment

Thanks @WrRan!

@matt-gardner matt-gardner merged commit 021f8bb into allenai:master Oct 30, 2018

3 checks passed

Pull Requests (AllenNLP Library) TeamCity build finished
Details
codecov/patch 100% of diff hit (target 90%)
Details
codecov/project 92% (+<1%) compared to 5aa1c8f
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.