DVector and dependencies #136
I assume this will also be true for illinois-pos and illinois-chunker. It makes sense to change the lbjava dependency. Will this mean retraining each component and deploying new models?
I would strongly recommend it; different parameters may have been introduced over the course of 10 revisions, not to mention the serialization issues that might crop up.
Hi @mssammon, I notice that the LBJava versions in
@qiangning, you need to run the benchmark script under chunk/scripts/ and check that the performance is in the ballpark of that reported in the relevant publication. Please record the results in a new page linked from here: https://wiki.illinois.edu/wiki/display/ccg/CCG+Software+Information (and also the results reported in the original publication).
@b29308188, please do the same. @hhuang97, you just need to compare against the existing NER Benchmark table at the link mentioned above.
Maybe it's not in the classpath? Where did you put the file? Side note: http://stackoverflow.com/questions/23821235/how-to-link-to-specific-line-number-on-github
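A common way a missing test file turns into a null pointer is that `ClassLoader.getResourceAsStream` returns `null` (rather than throwing) when the resource is not on the classpath. The sketch below, a hypothetical helper not taken from the project, fails early with a message that names the missing file. The default resource name `chunker-test-data.txt` is a placeholder, since the actual test file name is not given in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ClasspathResourceCheck {

    /**
     * Reads a text resource from the classpath, failing with a descriptive
     * IOException instead of a bare NullPointerException when it is missing.
     */
    static List<String> readResourceLines(String resourceName) throws IOException {
        // getResourceAsStream returns null (it does not throw) when the resource
        // is not on the classpath; dereferencing that null is the usual NPE source.
        InputStream in = ClasspathResourceCheck.class.getClassLoader()
                .getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourceName
                    + " (check that the directory holding it is on the classpath)");
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical default name; pass the real test file name as the first argument.
        String name = args.length > 0 ? args[0] : "chunker-test-data.txt";
        try {
            System.out.println("Read " + readResourceLines(name).size() + " lines from " + name);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Loading through the class loader (rather than a relative `File` path) makes the lookup independent of the working directory the tester is launched from.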
Thanks, Daniel. The test file is here. Does this look correct to you?
Wait, did you point out this line first, right? I don't know how used to work; as far as my knowledge goes, Side note: your link to the line looks great! 😍
I first mentioned that this line returned Yes. I agree with you about Also, it seems that the test script in
Hi Daniel @danyaljj, are we going to provide the test file along with our package/jar? I'm not sure if that's allowed, but if not, I guess the BenchmarkTest.sh script makes less sense for general users, since what a general user needs in the first place is to hit the button and see the results.
As far as I know, upon packaging, ALL the files get bundled into a single jar file to be shipped. @mssammon can confirm this.
No corpus gets packaged. Benchmark tests generally use licensed data; documentation should indicate the variable/argument that needs to be changed and which corpus is required.
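One way to follow that advice in code is to resolve the corpus location from an explicit argument and fail with a message that names the required corpus. This is a sketch only: the property name `chunker.corpus.dir` and the environment variable `CHUNKER_CORPUS_DIR` are hypothetical, not options the existing scripts are known to read. The CoNLL-2000 mention comes from the corpus path used later in this thread.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class CorpusLocator {
    // Hypothetical names; the real benchmark scripts may use different ones.
    static final String PROPERTY = "chunker.corpus.dir";
    static final String ENV_VAR = "CHUNKER_CORPUS_DIR";

    /** Resolves the corpus directory: CLI argument, then system property, then environment variable. */
    static Path resolveCorpusDir(String[] args, Map<String, String> env) {
        String dir = null;
        if (args.length > 0) {
            dir = args[0];
        } else if (System.getProperty(PROPERTY) != null) {
            dir = System.getProperty(PROPERTY);
        } else if (env.get(ENV_VAR) != null) {
            dir = env.get(ENV_VAR);
        }
        if (dir == null) {
            // The corpus is not shipped with the jar, so tell the user exactly what to supply.
            throw new IllegalStateException("No corpus directory given. This benchmark needs the corpus "
                    + "named in its documentation (e.g. the CoNLL-2000 chunking data for the chunker); "
                    + "pass its path as the first argument, use -D" + PROPERTY + "=<dir>, or set " + ENV_VAR + ".");
        }
        Path path = Paths.get(dir);
        if (!Files.isDirectory(path)) {
            throw new IllegalStateException("Corpus directory does not exist: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println("Using corpus at " + resolveCorpusDir(args, System.getenv()));
    }
}
```

Passing the environment map in as a parameter keeps the lookup testable without modifying the real environment.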
@cogcomp-dev I've added a row for NER v3.0.72 with benchmark results. The results are in the ballpark of the previously reported ones.
@cogcomp-dev The chunker has also passed the test; see the wiki page.
@mssammon check and close
@qiangning I also encountered the null pointer problem in this line
Hi Liang-Wei @b29308188, I didn't fix that problem in ChunkTester. Instead, I created my own tester to do the testing. I think fixing ChunkTester deserves a new issue, and I will probably discuss this with Mark in the next software meeting. Are you also testing the chunker?
@b29308188 did you retrain and evaluate POS with the updated LBJava dependency yet?
@b29308188 that link takes me to the Chunker results; you were assigned the POS tagger. Is there a separate table with results for POS?
@mssammon I thought your "please do the same" meant to also test the chunker. There's no clear description that indicates it. The description in the wiki says "retrain and check performance of POS***". I thought it meant "check the performance on the POS dataset with the chunker"... I will test the POS tagger today or tomorrow.
@b29308188 sorry for the confusion. Thanks for the follow-up. Please add comments to the page with the results to clarify what you did; this information may be useful in its own right.
Hi Mark @mssammon, just curious: why is the chunker performance table from Liang-Wei different from mine? Which part of the training process gives rise to this randomness?
@qiangning @b29308188 what exact command/script did you use to train/evaluate? Some variation is expected if the LBJ internally shuffles the training data, but this seems like a significant difference. (Qiang, thanks for pointing this out.)
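If the training data is shuffled, fixing the random seed makes the example order repeatable, so two runs on the same data should agree and any remaining gap points to other differences (such as data paths or scripts). This plain-Java sketch only illustrates the idea with `Collections.shuffle(List, Random)`; whether and how LBJava exposes such a seed is not confirmed in this thread, and `shuffledCopy` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffle {

    /** Returns a shuffled copy of the examples; the same seed always yields the same order. */
    static <T> List<T> shuffledCopy(List<T> examples, long seed) {
        List<T> copy = new ArrayList<>(examples); // leave the caller's list untouched
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> examples = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            examples.add(i);
        }
        System.out.println(shuffledCopy(examples, 42L));
        System.out.println(shuffledCopy(examples, 42L)); // same seed, same order
        System.out.println(shuffledCopy(examples, 7L));  // different seed, typically a different order
    }
}
```

Recording the seed alongside the reported scores would let others reproduce a benchmark row exactly.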
Mark, I used ChunkerTrain and ChunkTester in my fork. I only modified the script slightly to fix the path issue (i.e., null pointers to files). Thanks for clarifying the randomness of LBJ. I thought Liang-Wei's results were in the ballpark of mine (93.862 vs 93.451). Are you saying that this difference is too large?
@mssammon I modified the tester from @qiangning by pointing the training and testing data to /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/ and storing the models in my folder.
Opened issue #222 to deal with the problems identified here. Closing this issue as the original task is complete.
The pom.xml for ner specifies a dependency on LBJava 1.2.14, but that version contains DVector, which has since been moved to core-utilities. Should that be changed to version 1.2.24? It seems the learners in 1.2.14 will use the version in that jar rather than the one in core-utilities.
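To check which DVector actually gets used, it can help to list every classpath entry that contains a given class and to ask the JVM where it loaded the class from. The sketch below uses the standard `ClassLoader.getResources` and `ProtectionDomain.getCodeSource` APIs. The fully qualified DVector names are passed as arguments because they are not spelled out in this thread. If both jars define the same fully qualified name, a flat classpath serves whichever copy comes first. If the names differ, code compiled against LBJava 1.2.14 keeps referencing LBJava's own copy, which matches the concern above.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {

    /** Lists every classpath location (jar or directory) that contains the given class file. */
    static List<URL> findAllCopies(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    /** Reports the jar or directory the JVM actually loaded the class from. */
    static String loadedFrom(Class<?> c) {
        CodeSource source = c.getProtectionDomain().getCodeSource();
        // Classes from the JDK itself have no CodeSource.
        return source == null ? "bootstrap (JDK)" : source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass the fully qualified class name(s) to check, e.g. each DVector variant.
        for (String name : args) {
            List<URL> copies = findAllCopies(name, DuplicateClassFinder.class.getClassLoader());
            System.out.println(name + ": " + copies.size() + " copies on classpath");
            for (URL u : copies) {
                System.out.println("  " + u);
            }
            if (!copies.isEmpty()) {
                System.out.println("  loaded from: " + loadedFrom(Class.forName(name)));
            }
        }
    }
}
```

Running it with the test classpath of the ner module would show whether both LBJava 1.2.14 and core-utilities end up supplying a DVector.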