-
Notifications
You must be signed in to change notification settings - Fork 489
OPENNLP-1319: The Training API code is outdated in Manual #386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@Alanscut Thanks for the pull request! |
kinow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloned the repository from GitHub, using the master branch. Then created an extra Maven project, and added the snapshot as dependency. After everything was built, started testing each code changed in this PR.
Found a few lines that didn't compile. Tried setting JVM to 8+, but I think the issues are legit. Will wait for @Alanscut 's feedback, but other than that, looks like a good improvement.
Thanks @Alanscut !
opennlp-docs/src/docbkx/doccat.xml
Outdated
| InputStreamFactory dataIn = null; | ||
| try (dataIn = new FileInputStream("en-sentiment.train")) { | ||
| try (dataIn = new MarkableFileInputStreamFactory(new File("en-sentiment.train"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't compile for me. I tried to move the declaration to a single statement, but that also didn't work. It complained that InputStreamFactory is not a closeable? Did you try this code @Alanscut ? I think we need some sort of input stream here, not the factory being open/closed?
| FileInputStream sampleDataIn = new FileInputStream("en-ner-person.train"); | ||
| ObjectStream<NameSample> sampleStream = new PlainTextByLineStream(sampleDataIn.getChannel(), StandardCharsets.UTF_8); | ||
| InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("en-ner-person.train")); | ||
| ObjectStream<NameSample> sampleStream = new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PlainTextByLineStream is (at least on master) an ObjectStream<String>. So the code with NameSample as generic type doesn't compile?
| POSModel model = null; | ||
| try (InputStream dataIn = new FileInputStream("en-pos.train")){ | ||
| try (InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("en-pos.train"))){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputStreamFactory is not autocloseable?
|
Sorry I didn't notice that is try-with-resources statement insteadof try-catch block, update now |
kinow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alanscut thanks for updating the code! There's only one last thing, but I might be nit-picking. Now the factories are created as expected, but with the variable name dataIn. The constructor of the classes that take a factory are clear with the variable name (e.g. inputStreamFactory). Perhaps we should rename the variable in the examples too, but happy with the changes as-is too.
Thank you for contributing to Apache OpenNLP.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.