
DVector and dependencies #136

Closed
cowchipkid opened this issue Jul 21, 2016 · 29 comments

@cowchipkid
Contributor

The pom.xml for ner specifies a dependency on LBJava 1.2.14, but that version contains DVector, which in fact is now moved to core-utilities. Should that be changed to version 1.2.24? Seems the learners in 1.2.14 will be using the version in that jar rather than the one in core-utilities.
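The fix being proposed is a one-line version bump in the ner pom. A minimal sketch of what that change might look like (the groupId/artifactId coordinates below are illustrative assumptions, not quoted from the repo):

```xml
<!-- Sketch of the proposed bump: move to an LBJava release that no longer
     bundles DVector, so the copy in core-utilities is the one that gets used.
     The coordinates here are assumptions for illustration. -->
<dependency>
  <groupId>edu.illinois.cs.cogcomp</groupId>
  <artifactId>LBJava</artifactId>
  <version>1.2.24</version> <!-- was 1.2.14 -->
</dependency>
```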

@mssammon
Contributor

I assume this will also be true for illinois-pos and illinois-chunker. It makes sense to change the lbjava dependency. Will this mean retraining each component and deploying new models?

@cowchipkid
Contributor Author

I would strongly recommend it; there may be different parameters included over the course of ten revisions, not to mention the serialization issues that might crop up.

@hhuang97 hhuang97 self-assigned this Sep 9, 2016
@b29308188 b29308188 self-assigned this Sep 9, 2016
@qiangning qiangning self-assigned this Sep 9, 2016
@qiangning
Member

Hi @mssammon , I notice that the LBJava versions in pom.xml files are already 1.2.24 (did someone change this from 1.2.14 to 1.2.24?), so we only need to test it without further changes to the pom files, right? Also, since this is my first time doing this, how would I know if the chunker is working properly?

@cogcomp-dev

@qiangning you need to run the benchmark script under chunk/scripts/ and check that the performance is in the ballpark of that reported in the relevant publication. Please record the results in a new page linked from here: https://wiki.illinois.edu/wiki/display/ccg/CCG+Software+Information (and also the results reported in the original publication).

@cogcomp-dev

@b29308188 , please do the same. @hhuang97 , you just need to compare against the existing NER Benchmark table at the link mentioned above.

@cogcomp-dev cogcomp-dev added this to the CCG Borg Bonanza milestone Sep 12, 2016
@qiangning
Member

Hi @mssammon @danyaljj , do you know why, in L56 of ChunkTester.java, testFileURL comes back null even though the test file exists?

@danyaljj
Member

Maybe it's not in the classpath? Where do you put the file?

Side note: http://stackoverflow.com/questions/23821235/how-to-link-to-specific-line-number-on-github

@qiangning
Member

Thanks, Daniel. The test file is here. Does this look correct to you?

@danyaljj
Member

Wait, you pointed to this line first, right?

I don't know how it used to work; as far as my knowledge goes, getClassLoader().getResource(.) can only read from the classpath (not from anywhere on disk). We should double-check this with @nitishgupta though, since he seems to be the author/user of this configurator.

Side note: your link to the line looks great! 😍
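The classpath-only behavior of getResource() described above can be sketched as follows. This is an illustrative helper, not code from the repo (ResourceExample and resolve are made-up names): it tries the classpath first and falls back to the filesystem, which is one way a tester could locate a test file given an absolute disk path.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class ResourceExample {

    /**
     * Resolve a file: try the classpath first (what getResource() does),
     * then fall back to an ordinary filesystem lookup.
     * Returns null if the file is found in neither place.
     */
    public static URL resolve(String path) {
        // getResource() searches only the classpath; an absolute disk path
        // such as /shared/corpora/... will come back null here.
        URL url = ResourceExample.class.getClassLoader().getResource(path);
        if (url != null) {
            return url;
        }
        // Filesystem fallback: succeeds only if the file actually exists.
        File f = new File(path);
        if (f.exists()) {
            try {
                return f.toURI().toURL();
            } catch (MalformedURLException e) {
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Not on the classpath and not on disk, so both lookups fail.
        System.out.println(resolve("/shared/corpora/does-not-exist.txt")); // prints "null"
    }
}
```

This matches the symptom in the thread: a correct absolute path still yields null from getResource() alone, because the classloader never consults the filesystem.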

@qiangning
Member

I first mentioned that this line returned null. The error came from a test file which is specified here.

Yes, I agree with you about getResource; that's also why I am now using my own script for testing instead of ChunkerTester.

Also, it seems that the test scripts in chunker/scripts/ are quite outdated. @nitishgupta, do you think it would be better if I updated them, or do you already have your own plan for doing so?

@qiangning
Member

Hi Daniel @danyaljj , are we going to provide the test file along with our package/jar? I'm not sure that's allowed; if not, I guess the BenchmarkTest.sh script makes less sense for general users, since what a general user wants in the first place is to hit the button and see the results.

@danyaljj
Member

As far as I know, upon packaging, all the files get bundled into a single jar file to be shipped. @mssammon can confirm this.

@mssammon
Contributor

No corpus gets packaged. Benchmark tests generally use licensed data; documentation should indicate the variable/argument that needs to be changed and which corpus is required.

@hhuang97

@cogcomp-dev I've added a row for NER v3.0.72 with benchmark results. The results are in the ballpark of the previously reported ones.

@qiangning
Member

@cogcomp-dev The chunker has also passed; the results are on the test wiki page.

@mssammon
Contributor

@mssammon check and close

@b29308188
Contributor

b29308188 commented Sep 22, 2016

@qiangning I also encountered the null pointer problem at this line.
How did you solve it? It seems that the variable testFileName already stores the right absolute path for the test file ("/shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/test.txt").

@qiangning
Member

Hi Liang-Wei @b29308188 , I didn't fix that problem in ChunkTester. Instead, I created my own tester to do the testing. Fixing ChunkTester deserves a new issue; I will probably discuss it with Mark at the next software meeting.

Are you also testing chunker?

@mssammon
Contributor

@b29308188 did you retrain and evaluate POS with the updated LBJava dependency yet?

@b29308188
Contributor

@mssammon I retrained and evaluated it with LBJava 1.2.24.
The results are there.

@mssammon
Contributor

@b29308188 that link takes me to the Chunker results; you were assigned the POS tagger. Is there a separate table with results for POS?

@b29308188
Contributor

@mssammon I thought your "please do the same" meant to also test the chunker; there was no clear description indicating otherwise. The description in the wiki says "retrain and check performance of POS***". I thought that meant "check the performance on the POS dataset with the chunker"... I will test the POS tagger today or tomorrow.

@mssammon
Contributor

@b29308188 sorry for the confusion. Thanks for the follow-up. Please add comments to the page with the results to clarify what you did; this information may be useful in its own right.

@b29308188
Contributor

b29308188 commented Sep 28, 2016

@mssammon
The evaluation result of the POS tagger is here.
Sorry for the delay.

@qiangning
Member

Hi Mark @mssammon , just curious. Why is the chunker performance table from Liang-Wei different from mine? Which part of the training process gives rise to this randomness?

@mssammon
Contributor

@qiangning @b29308188 what exact command/script did you use to train/evaluate? Some variation is expected if LBJava internally shuffles the training data, but this seems like a significant difference. (Qiang, thanks for pointing this out....)

@qiangning
Member

Mark, I used ChunkerTrain and ChunkTester in my fork. I only modified a bit of the script to fix the path issue (i.e., null pointer to files).

Thanks for clarifying the randomness of LBJ. I thought Liang-Wei's results were in the ballpark of mine (93.862 vs 93.451). Are you saying that this difference is too much?

@b29308188
Contributor

@mssammon I modified the tester from @qiangning by pointing the training and testing data to /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/ and storing the models into my folder.

@mssammon
Contributor

Opened issue #222 to deal with the problems identified here. Closing this issue, as the original task is complete.


7 participants