Standing PR for TensorFlow learning refactor #518
Conversation
Looking good so far... ping me on Slack when ready for review!
@ajratner Ok, good for a review! cc: @stephenbach

There are two major improvement opportunities, but they can come after the release:

- Sparse tensor computations for LogisticRegression
- Same-session model loading for reLSTM
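For reference, a rough sketch of the sparse-input idea in TF 1.x; the feature count and variable names are illustrative, not from this PR:

```python
import tensorflow as tf

n_features = 10000  # illustrative

# Sparse feature matrix and labels; X is fed with a tf.SparseTensorValue
X = tf.sparse_placeholder(tf.float32, name='X')
y = tf.placeholder(tf.float32, [None, 1], name='y')

w = tf.Variable(tf.zeros([n_features, 1]))
b = tf.Variable(0.0)

# sparse_tensor_dense_matmul avoids ever densifying the feature matrix
logits = tf.sparse_tensor_dense_matmul(X, w) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```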
Ok I'll review a bit later tonight!
This looks awesome. Minor comments, mostly documentation
    DEP_SIMILAR,
    GenerativeModel,
    GenerativeModelWeights,
    NaiveBayes,
@stephenbach Should we still be supporting NaiveBayes?
def __init__(self, name):
    self.name = name
    super(NoiseAwareModel, self).__init__()
What's the point of this line?
Looks cool. Probably none though...
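Assuming NoiseAwareModel subclasses object directly, as the diff suggests, the call resolves to object.__init__() and is effectively a no-op; a minimal sketch of why it's "probably none":

```python
class NoiseAwareModel(object):
    def __init__(self, name):
        self.name = name
        # Resolves to object.__init__() here -- a no-op -- but keeps the
        # MRO chain intact if a mixin or different base is added later
        super(NoiseAwareModel, self).__init__()
```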
def __init__(self, save_file=None, name='reLSTM'):
    """LSTM for relation extraction"""
    # Define metadata
    self.mx_len = None
Could we get definitions for these params (esp. mx_len, n_v)?
👍
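A docstring along these lines would cover it; the parameter meanings are inferred from how mx_len and n_v are used in the diff, so treat them as assumptions:

```python
def __init__(self, save_file=None, name='reLSTM'):
    """LSTM for relation extraction.

    :param save_file: optional checkpoint file to restore a saved model from
    :param name:      name used to identify (and scope) this model

    Metadata set during training:
        self.mx_len  maximum sentence length; longer sentences are trimmed
        self.n_v     vocabulary size of the word-to-index mapping
    """
    self.mx_len = None
```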
# Super constructor
super(reLSTM, self).__init__(save_file=save_file, name=name)
def _mark(self, l, h, idx):
Either make input vars more verbose or explain in doc string (just being more sensitive here given past documentation issues around this module...)
Same with mids below.
👍
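A minimal docstring sketch for _mark; the parameter meanings are inferred from the calling code below, so the wording is hypothetical:

```python
def _mark(self, l, h, idx):
    """Build markers around one argument span of a sentence.

    :param l:   index of the first token in the argument span
    :param h:   index of the last token in the argument span
    :param idx: which relation argument this span is (e.g. 1 or 2)
    """
```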
# Get arguments and lemma sequence
args = [
    (c[0].get_word_start(), c[0].get_word_end(), 1),
    (c[1].get_word_start(), c[1].get_word_end(), 2)
Marker indices should be consistent with doc string above (1,2 vs. 0,1)
🤓
    (c[1].get_word_start(), c[1].get_word_end(), 2)
]
s = self._mark_sentence(
    [w.lower() for w in c.get_parent().lemmas], args
Easy addition to make the sentence token type configurable here?
👍
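One lightweight way to do it; word_attr is a hypothetical argument name, not something in this PR:

```python
def get_tokens(sentence, word_attr='lemmas'):
    """Fetch the requested token sequence ('lemmas', 'words', etc.)
    from a parsed sentence object and lowercase it."""
    return [w.lower() for w in getattr(sentence, word_attr)]

# The call above would then become something like:
#   s = self._mark_sentence(get_tokens(c.get_parent(), self.word_attr), args)
```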
tx = np.zeros((self.mx_len, batch_size), dtype=np.int32)
tlen = np.zeros(batch_size, dtype=np.int32)
# Pad or trim each x
# TODO: fix for arguments outside max length
Either fix or explain what's going on here / what is currently implemented pre-fix
Will add doc
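For reference, a sketch of the pad-or-trim step with the caveat documented; pad_or_trim and seqs are illustrative names:

```python
import numpy as np

def pad_or_trim(seqs, mx_len):
    """Zero-pad (or trim) each token-index sequence to length mx_len.
    Caveat: trimming can drop argument markers that fall past mx_len,
    which is the open TODO above."""
    tx = np.zeros((mx_len, len(seqs)), dtype=np.int32)
    tlen = np.zeros(len(seqs), dtype=np.int32)
    for j, seq in enumerate(seqs):
        t = seq[:mx_len]       # trim anything past the max length
        tx[:len(t), j] = t     # left-aligned; the rest stays zero padding
        tlen[j] = len(t)
    return tx, tlen
```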
    x.name: x for x in tf.get_collection(
        tf.GraphKeys.GLOBAL_VARIABLES, scope=scope.name)
}
# Take mean across sentences
Is it standard to take mean across sentences here?
Yeah, simplest sentence embedding technique. Can improve down the road, but easy enough for now.
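For reference, a length-masked mean over token states in TF 1.x; outputs and lengths are assumed placeholder names, not from the diff:

```python
import tensorflow as tf

outputs = tf.placeholder(tf.float32, [None, None, 64])  # [batch, time, dim]
lengths = tf.placeholder(tf.int32, [None])              # true sentence lengths

# Mask out padding tokens, then average: the simplest sentence embedding
mask = tf.sequence_mask(lengths, maxlen=tf.shape(outputs)[1], dtype=tf.float32)
summed = tf.reduce_sum(outputs * tf.expand_dims(mask, -1), axis=1)
embedding = summed / tf.expand_dims(tf.cast(lengths, tf.float32), -1)
```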
@@ -42,200 +37,85 @@ def score(self, session, X_test, test_labels, gold_candidate_set=None, b=0.5, se
    return s.score(test_marginals, train_marginals, b=b,
Do we ever set self.X_train any more (see lines before; it comes up again below)? If not, we should get rid of this and get the training marginals a different way, or just not display them here?
I'm fine just getting rid of this, whatever is fast & cleans this loose thread up
I'll get rid of this. Never really display them in practice anyway.
"from snorkel.learning import LogReg\n", | ||
"disc_model = LogReg()" | ||
"from snorkel.learning import LogisticRegression\n", | ||
"disc_model = LogisticRegression()" |
Let's add a comment in the tutorial just mentioning the reLSTM?
Let's add it to another tutorial. LSTM will likely perform much worse on the intro tutorial than LR.
Ok, if that's the case it makes sense.
@ajratner Changes made, testing out now
No one even invited you, Coveralls #yolo
- NoiseAwareModel
- TFNoiseAwareModel
- LogisticRegression as a supported model (updates tutorial to reflect)
- reLSTM as a contributor model for relation extraction