Training Fix #11

kylase · 2018-11-12T16:00:03Z

- Fixed #9 (Function uses old way to load word vectors)
- Populate missing tags for training
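For context on #9: newer gensim releases deprecate loading word2vec-format vectors through `Word2Vec.load_word2vec_format` in favour of `KeyedVectors.load_word2vec_format`. A minimal sketch of the newer call, assuming gensim is installed; the helper name `load_pretrained_vectors` and the file name in the usage note are illustrative and not taken from this repository:

```python
def load_pretrained_vectors(path, binary=False):
    """Load word2vec-format embeddings through gensim's KeyedVectors API.

    `path` and this helper's name are hypothetical; only the gensim call
    itself is the library's real entry point.
    """
    # Imported lazily so this module can be imported without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=binary)
```

A call such as `load_pretrained_vectors("vectors.bin", binary=True)` would return an object that can be indexed by word to get its vector.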

cmkumar87 · 2018-11-12T17:02:38Z

Hi Yuan Chuan,

"Populate missing tags for training": are you saying that the original implementation failed to add some of the ParsCit tags that are present in the training data?

Muthu

On Tue, Nov 13, 2018 at 12:00 AM Yuan Chuan Kee wrote:
> - Fixed #9
> - Populate missingtags for training
>
> Commit Summary
> - #9 Deprecate load_word2vec_format
> - Use inputs instead of input as input is a keyword
> - Use print function
> - import print_function
> - Populate tags
>
> File Changes
> - M loader.py (19)
> - M train.py (18)
> - M utils.py (27)
>
> Patch Links:
> - https://github.com/WING-NUS/Neural-ParsCit/pull/11.patch
> - https://github.com/WING-NUS/Neural-ParsCit/pull/11.diff

-- Cheers! Muthu

kylase · 2018-11-12T17:06:18Z

@cmkumar87, not in that sense. The upstream code has tags as a key for each piece of data; however, that key has somehow gone missing here.

cmkumar87 · 2018-11-12T20:27:14Z

@kylase I saw that the change adds tag to id to the load dataset module. This change seems sensitive to me; perhaps @animeshprasad can weigh in. Was this a bug or just an enhancement to make training easier in some way?

Did you try retraining with the new code on the Cora dataset at least? Do you get the same published results?

kylase · 2018-11-13T00:32:00Z

It is a critical bug. If you compare that specific file and line to the original code, you will find that the labels (tags) are not provided to the training.

I don't know how it managed to run previously, but the git log shows that it did not exist before I took over the code.

cmkumar87 · 2018-11-13T11:59:17Z

@kylase that is weird. The file from the first modifications of the Named Entity Tagger contains the tagging scheme. See this commit: 590de7c

But I don't see it in WING-NUS/Neural-ParsCit. So @animeshprasad may have changed it for some reason. Perhaps the functionality was moved to a different function? Can you trace the commits to check for the modifications to this file?

This is some of the history I see,
Original NER code: https://github.com/glample/tagger/blob/master/loader.py
tags are there.
In this commit where a completely new version of the file is being uploaded it is not there:
e659829

kylase · 2018-11-13T12:04:45Z

@cmkumar87, refer to this blame and look for prepare_dataset. This is the commit that removes it.

Yes, I looked at the original code and then realised that it's missing and hence I put it back. The training has been failing because it becomes an unsupervised dataset.
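To make the missing step concrete, here is a simplified sketch of a `prepare_dataset`-style function, loosely modelled on the upstream tagger's loader. The reduced signature, the `<UNK>` token, and the sample data are assumptions for illustration, not this repository's actual code:

```python
def prepare_dataset(sentences, word_to_id, tag_to_id, unk="<UNK>"):
    """Convert labelled sentences into index-based training examples.

    Each sentence is a list of (word, tag) pairs.
    """
    data = []
    for sentence in sentences:
        str_words = [word for word, _ in sentence]
        # Unknown words fall back to the index of the unknown-word token.
        words = [word_to_id.get(w, word_to_id[unk]) for w in str_words]
        # The step under discussion: without it, examples carry no labels.
        tags = [tag_to_id[tag] for _, tag in sentence]
        data.append({"str_words": str_words, "words": words, "tags": tags})
    return data


# Made-up sample data, only to exercise the function.
sentences = [[("Kan", "author"), ("2018", "date")]]
word_to_id = {"<UNK>": 0, "Kan": 1}
tag_to_id = {"author": 0, "date": 1}
dataset = prepare_dataset(sentences, word_to_id, tag_to_id)
print(dataset[0]["words"], dataset[0]["tags"])
```

If the `tags` line is dropped, each example still has word indices but nothing for a supervised loss to compare against.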

cmkumar87 · 2018-11-13T15:52:57Z

Hi Yuan Chuan,

Yes, I see that in one of the commits the tagids line is removed from the prepare dataset method. I am checking with you again because what you are saying is pretty damning. With an error like that, the training file won't have labels and the parser would not learn anything! More likely, the objective function that calculates the loss would throw an error, since it is not able to see the target/label. Yet we have state-of-the-art results with a lot of experiments run over the dataset! So the functionality was probably moved elsewhere.

Cheers!
Muthu

…

On Tue, 13 Nov 2018 at 20:04, Yuan Chuan Kee wrote:
> @cmkumar87, refer to this blame <https://github.com/WING-NUS/Neural-ParsCit/blame/bb9b9a002582a6619e8fb6f14956e12fddc19608/loader.py> and look for prepare_dataset. This is the commit that removes it.
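One way to make a label-less dataset fail loudly before training starts is a small sanity check. This is a sketch only: `check_labels` is a hypothetical helper, and the `words` and `tags` keys are assumed to follow the upstream tagger's example layout:

```python
def check_labels(dataset):
    """Raise ValueError if any example lacks labels aligned with its words.

    Returns the number of examples checked.
    """
    for i, example in enumerate(dataset):
        if "tags" not in example:
            raise ValueError("example %d has no 'tags' key" % i)
        if len(example["tags"]) != len(example["words"]):
            raise ValueError("example %d has mismatched words and tags" % i)
    return len(dataset)


# Made-up examples: the first is well formed, the second has no labels.
good = [{"words": [1, 2], "tags": [0, 1]}]
bad = [{"words": [1, 2]}]
print(check_labels(good))
```

Calling `check_labels(bad)` raises `ValueError`, which would surface the problem at load time rather than inside the loss computation.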

kylase · 2018-11-14T07:44:04Z

I have no idea what happened between the code that was run for the paper and the commit. I have been looking at the training code, and it seems that every component other than training on the training dataset has been commented out.

I am now working on restoring the testing portion, followed by the cross-validation.

cmkumar87 · 2018-11-14T15:45:47Z

@kylase Normally the training code and testing code are factored out and executed conditionally based on command-line args.
Are you saying that this is not how n-parscit is written? Did the developer remove / comment out the training code to allow the parser to operate under the test mode?
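The usual pattern described above can be sketched as follows. The `--mode` flag and the function names are hypothetical and do not reflect how this repository's `train.py` is actually written:

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical train/test entry point."""
    parser = argparse.ArgumentParser(description="Hypothetical entry point")
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    return parser


def run(argv):
    """Dispatch to the training or the testing path based on --mode."""
    args = build_parser().parse_args(argv)
    if args.mode == "train":
        return "training"  # placeholder for the real training routine
    return "testing"  # placeholder for the real evaluation routine


print(run(["--mode", "test"]))
```

With this layout, neither path needs to be commented out to run the other.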

knmnyn · 2018-12-15T18:40:09Z

Has this issue been resolved? Seems from the comments that it is still unresolved.

cmkumar87 · 2018-12-15T19:52:46Z

Update on this:

This codebase was forked from a version that is different from the one that was used to run the experiments for Prasad, Kaur, Kan, 2018, IJDL.
The correct version was identified to be on a local server. @kylase is trying to run it; he has been facing out-of-memory errors on the WING internal GPU server, possibly due to some Theano misconfiguration. Will let him say more.

Co-Authored-By: nsorros <nsorros@gmail.com>

kylase added 6 commits November 12, 2018 23:38

#9 Deprecate load_word2vec_format

26662dc

Use inputs instead of input as input is a keyword

3440a5c

Use print function

0e52c4e

import print_function

3bbdaeb

Populate tags

3143788

Fixed Travis testing issue

2694281

Python 2/3 compatibility

e395d20

Co-Authored-By: nsorros <nsorros@gmail.com>

kylase mentioned this pull request Dec 16, 2018

Add compatibility for python3 #12

Closed

kylase added 8 commits December 18, 2018 15:30

Add Python 3.6 for tests.

21c4c24

Upgrade sklearn

316d6f2

Compile regex to improve performance

25ee8c0

Use sklearn.metrics to compute model performance

162e4d8

Restore the code for validation and test sets

b778cb7

Upgrade libraries and include sklearn in test

7e340f1

Updated bibilograph information and Python 3 warnings.

f3948b7

Removed Python 3.6

44be70a

kylase added the bug label Dec 20, 2018

kylase self-assigned this Dec 20, 2018

Added scikit-learn requirement for training.

07f3070

kylase merged commit 188721d into master Dec 20, 2018
