Enable RegexMatchSpan with concatenates words by sep="(separator)" option #492

Merged
merged 4 commits into from
Aug 5, 2020


YasushiMiyata
Contributor

@YasushiMiyata YasushiMiyata commented Aug 1, 2020

Description of the problems or issues

Is your pull request related to a problem? Please describe.

The sentence "123 456 789" is parsed into three words: "123", "456", and "789".
I'd like to match the whole number with a matcher such as
RegexMatchSpan(rgx=r"\d{9}", sep=" ")

but sep=" " currently has no effect.

Does your pull request fix any issue?
Fix #270

Description of the proposed changes

Enable the sep="(separator)" option in RegexMatchSpan.
It concatenates the words of a mention span into a single string and applies RegexMatch to that string, ignoring the separator.
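As a rough illustration of the intended behavior (not Fonduer's actual implementation), the idea can be sketched with the standard `re` module. The helper name `match_without_sep` is hypothetical, and requiring a full match on the joined string is an assumption made for this sketch.

```python
import re


def match_without_sep(span_text: str, rgx: str, sep: str = "") -> bool:
    """Hypothetical helper: match rgx against span_text with sep removed."""
    # Drop every occurrence of the separator so the words form one string.
    joined = span_text.replace(sep, "") if sep else span_text
    # Assumption for this sketch: the whole joined string must match.
    return re.fullmatch(rgx, joined) is not None


# Without a separator the spaces stay in the text, so nine consecutive
# digits never occur and the pattern does not match.
before = match_without_sep("123 456 789", r"\d{9}")

# With sep=" " the words are concatenated to "123456789" before matching.
after = match_without_sep("123 456 789", r"\d{9}", sep=" ")
```

Here `before` corresponds to the problem described above, and `after` to the behavior this pull request aims for.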

Test plan

Add test code to 'fonduer/tests/candidates/test_matchers.py'.
The sentence "This is apple" is parsed into two 2-grams, "This is" and "is apple".
The 2-gram "is apple" can be matched with the following rgx and the sep="(space)" option:
RegexMatchSpan(rgx=r"isapple", sep=" ")
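The test scenario can be mimicked in plain Python, without Fonduer's parser or matcher classes. This is only a sketch under stated assumptions: the `ngrams` helper is hypothetical, whitespace splitting stands in for the real tokenizer, and a full match on the separator-free text stands in for the matcher.

```python
import re
from typing import List


def ngrams(words: List[str], n: int) -> List[str]:
    """Hypothetical helper: all contiguous n-word spans, joined by a space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Whitespace splitting stands in for the real sentence parser here.
bigrams = ngrams("This is apple".split(), 2)

# Keep only the spans whose text matches once the separator is removed.
matched = [
    gram for gram in bigrams
    if re.fullmatch(r"isapple", gram.replace(" ", "")) is not None
]
```

Only the span "is apple" survives the filter, since "This is" becomes "Thisis" once the space is dropped.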

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have updated the CHANGELOG.rst accordingly.

…tion

Fix #270
Enable RegexMatchSpan with sep="(separator)" option.
It concatenates mention spans to one word and does RegexMatch without consideration of the separator.
@YasushiMiyata
Contributor Author

Some code may have changed while I was creating #492. I'm re-checking now.

@codecov-commenter

codecov-commenter commented Aug 3, 2020

Codecov Report

Merging #492 into master will not change coverage.
The diff coverage is 71.42%.


@@           Coverage Diff           @@
##           master     #492   +/-   ##
=======================================
  Coverage   85.85%   85.85%           
=======================================
  Files          88       88           
  Lines        4568     4568           
  Branches      851      853    +2     
=======================================
  Hits         3922     3922           
  Misses        464      464           
  Partials      182      182           
Flag         Coverage Δ
#unittests   85.85% <71.42%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
...fonduer/candidates/models/implicit_span_mention.py    81.96% <66.66%> (ø)
src/fonduer/candidates/models/span_mention.py            82.24% <66.66%> (ø)
src/fonduer/candidates/matchers.py                       97.31% <100.00%> (ø)

@YasushiMiyata YasushiMiyata marked this pull request as ready for review August 3, 2020 21:24
@YasushiMiyata
Contributor Author

Something failed during installation on Ubuntu. I don't think there is anything more I can do.

@senwu
Collaborator

senwu commented Aug 5, 2020

Thanks for making this clear!

@senwu senwu merged commit 01e0d93 into HazyResearch:master Aug 5, 2020
Successfully merging this pull request may close these issues.

RegexMatchSpan with sep="" concatenates words with sep="(space)"