This repository has been archived by the owner on Feb 19, 2020. It is now read-only.

Remove whitespace-only sentences in NewLineSentenceSegmenter #59

Open: wants to merge 1 commit into base: master


hiroshinoji

NewLineSentenceSegmenter did not trim each segmented sentence, so it could pass an empty sentence on to the parser. For example, the following command always printed an error after the parse:

$ echo I live in Osaka . | java -Xmx4g -cp assembly.jar epic.parser.ParseText --model parsers/SpanModel-300.parser --sentences newline --tokens whitespace
(TOP (S (NP (PRP He) ) (VP (VBZ lives)  (PP (IN in)  (NP (NNP Osaka) )))))
### Could not tag Vector(), because No parse for Vector(): infinite partition... epic.parser.projections.ChartProjector$class.project(ChartProjector.scala:36);epic.parser.projections.AnchoredRuleMarginalProjector.project(EnumeratedAnchoring.scala:78)

I added a filter for empty sentences, as in MLSentenceSegmenter, which avoids this problem by trimming every sentence. With this change, the error is no longer printed.
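
A minimal sketch of the idea, not the actual patch: `NewlineSegmentSketch`, `naiveSegments`, and `filteredSegments` are hypothetical names, and the code works on plain strings rather than on epic's own segmenter types. It assumes the empty sentence comes from the trailing newline that `echo` appends to its output.

```scala
object NewlineSegmentSketch {
  // Hypothetical helper, not epic's API: split on newlines and keep
  // trailing empty pieces (limit = -1) so the problem case stays visible.
  def naiveSegments(text: String): Vector[String] =
    text.split("\n", -1).toVector

  // Same split, but trim each piece and drop those that end up empty,
  // so whitespace-only "sentences" never reach the parser.
  def filteredSegments(text: String): Vector[String] =
    naiveSegments(text).map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Assumption: echo appends a trailing newline to the sentence.
    val input = "I live in Osaka .\n"
    println(naiveSegments(input).length)    // sentence plus one empty piece
    println(filteredSegments(input).length) // only the real sentence
  }
}
```

In this sketch the unfiltered split yields two pieces, the second of which is empty and corresponds to the `Vector()` in the error message above. The filtered version yields only the sentence itself.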
