Fix sentence splitter: sentences ending with acronyms #24

nickto · 2019-05-10T13:51:36Z

A sentence ending with an acronym was not distinguished from an initial
followed by a period in the middle of the sentence. E.g., "Adamson is not from USA.
They are from Europe" was considered as a single sentence because "USA." was
treated as an initial.

Added a requirement for the initial to be only one letter long: "A." in "A.
Adamson" is still treated as an initial, while "USA" in "from USA. They" is not.
Added a test for such a case.
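
The one-letter rule can be sketched as follows. This is only an illustration in Python: the actual pattern in `src/sentences/sentence_splitting.jl` is not shown in this thread, so both regexes and the helper `ends_with_initial` are hypothetical stand-ins.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the regexes
# used in sentence_splitting.jl.
# Loose rule: any run of capital letters followed by a period is an initial.
LOOSE_INITIAL = re.compile(r"\s[A-Z]+\.$")
# Strict rule: an initial must be exactly one capital letter.
STRICT_INITIAL = re.compile(r"\s[A-Z]\.$")


def ends_with_initial(fragment, pattern):
    """Return True if the text before a candidate break ends with an initial."""
    return pattern.search(fragment) is not None


acronym_end = "Adamson is not from USA."
initial_end = "It was written by A."

loose_on_acronym = ends_with_initial(acronym_end, LOOSE_INITIAL)
strict_on_acronym = ends_with_initial(acronym_end, STRICT_INITIAL)
strict_on_initial = ends_with_initial(initial_end, STRICT_INITIAL)
```

Under the loose rule "USA." looks like an initial, so a splitter would not break after it; under the strict rule only "A." qualifies.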

codecov-io · 2019-05-10T13:59:57Z

Codecov Report

Merging #24 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master      #24   +/-   ##
=======================================
  Coverage   80.86%   80.86%           
=======================================
  Files           9        9           
  Lines         277      277           
=======================================
  Hits          224      224           
  Misses         53       53

Impacted Files	Coverage Δ
src/sentences/sentence_splitting.jl	`88.13% <100%> (ø)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d877dc...e3f8d93. Read the comment docs.

oxinabox · 2019-05-10T14:25:19Z

Good catch. Do you think it would be better to use \b rather than \s?
\b is a word boundary, whereas \s is whitespace.
I think \b would work better at the start of the string.

nickto · 2019-05-13T07:53:55Z

> Good catch. Do you think it would be better to use \b rather than \s?
> \b is a word boundary, whereas \s is whitespace.
> I think \b would work better at the start of the string.

Changed it to \b and added a test that fails when using \s, thanks!
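
A small sketch of the difference between the two anchors, again in Python with hypothetical patterns (the real regex lives in `sentence_splitting.jl` and is not reproduced in this thread). It records where each pattern finds a one-letter initial in a string that starts with one.

```python
import re

# Hypothetical patterns for illustration only.
WITH_SPACE = re.compile(r"\s[A-Z]\.")     # needs whitespace before the letter
WITH_BOUNDARY = re.compile(r"\b[A-Z]\.")  # needs only a word boundary

text = "A. A. Adamson"

# Start offsets of every match of each pattern.
space_hits = [m.start() for m in WITH_SPACE.finditer(text)]
boundary_hits = [m.start() for m in WITH_BOUNDARY.finditer(text)]

# The boundary version still rejects the last letter of an acronym,
# because there is no word boundary between "S" and "A" in "USA".
acronym_hit = WITH_BOUNDARY.search("from USA.")
```

The `\s` version misses the initial at the very start of the string, since no whitespace precedes it, while `\b` finds both initials and still does not treat "USA." as one.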

oxinabox · 2019-05-13T07:54:25Z

LGTM, will merge when tests pass

oxinabox · 2019-06-21T10:34:30Z

Thanks!

Fix sentence splitter: sentences ending with acronyms

54d86df

A sentence ending with an acronym was not distinguished from an initial followed by a period in the middle of the sentence. Added a requirement for the initial to be only one letter long, and added a test for such a case.

Fix the double initials sentence splitting

e3f8d93

Incorporated the following suggestion and added tests: JuliaText#24 (comment). "This is A. A. Adamson." should be a single sentence, but the older regex would split it into multiple sentences because it required a space before the second "A.".

oxinabox merged commit c216aa9 into JuliaText:master Jun 21, 2019
