
Standing PR for TensorFlow learning refactor #518

Merged: 27 commits merged into dev from lstm on Jan 17, 2017

Conversation

@henryre (Member) commented Jan 14, 2017

  • Refactor of learning primitives (NoiseAwareModel)
  • Wrapper for TensorFlow models with automatic saving and loading utilities (TFNoiseAwareModel); see the sketch after this list
  • Adds LogisticRegression as a supported model (and updates the tutorial to reflect this)
  • Updates to hyperparameter searchers
  • Adds reLSTM as a contributed model for relation extraction
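
As a rough illustration of the TFNoiseAwareModel wrapper pattern (a sketch only; the _build hook and method names here are assumptions, not the actual code in this PR):

import tensorflow as tf

class TFNoiseAwareModel(object):
    """Sketch: TensorFlow model wrapper with generic checkpointing."""
    def __init__(self, save_file=None, name='TFModel'):
        self.name = name
        self._build()                  # subclass constructs its graph here
        self.saver = tf.train.Saver()  # Saver needs variables, so build first
        self.session = tf.Session()
        if save_file is not None:
            self.load(save_file)

    def _build(self):
        # Subclasses define placeholders, variables, loss, and train op
        raise NotImplementedError

    def save(self, save_file=None):
        self.saver.save(self.session, save_file or (self.name + '.ckpt'))

    def load(self, save_file):
        self.saver.restore(self.session, save_file)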

@henryre requested a review from ajratner on January 14, 2017 02:09
@ajratner (Contributor) commented:

Looking good so far... ping me on slack when ready for review!

@henryre (Member, Author) commented Jan 17, 2017

@ajratner Ok good for a review!
cc: @stephenbach

There are two major improvement opportunities, but they can come after the release:

  • Sparse tensor computations for LogisticRegression (a rough sketch follows below)
  • Same-session model loading for reLSTM
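
For reference, a minimal sketch of the sparse variant in TF 1.x style (dimensions and names are illustrative, not from this PR): feed the feature matrix as a tf.SparseTensor and use tf.sparse_tensor_dense_matmul in place of a dense matmul.

import tensorflow as tf

d = 10000  # hypothetical feature dimension

X = tf.sparse_placeholder(tf.float32, name='X')      # [batch, d], sparse features
y = tf.placeholder(tf.float32, [None, 1], name='y')  # noise-aware training marginals
w = tf.Variable(tf.zeros([d, 1]), name='w')
b = tf.Variable(0.0, name='b')

# Sparse-dense matmul avoids materializing the full feature matrix
logits = tf.sparse_tensor_dense_matmul(X, w) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)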

@ajratner (Contributor) commented Jan 17, 2017 via email

@ajratner (Contributor) left a comment:

This looks awesome. Minor comments, mostly documentation

DEP_SIMILAR,
GenerativeModel,
GenerativeModelWeights,
NaiveBayes,
ajratner (Contributor):

@stephenbach Should we still be supporting NaiveBayes?


def __init__(self, name):
    self.name = name
    super(NoiseAwareModel, self).__init__()
ajratner (Contributor):

What's the point of this line?

henryre (Member, Author):

Looks cool. Probably none though...

def __init__(self, save_file=None, name='reLSTM'):
    """LSTM for relation extraction"""
    # Define metadata
    self.mx_len = None
ajratner (Contributor):

Could we get definitions for these params (esp. mx_len, n_v)?

henryre (Member, Author):

👍

    # Super constructor
    super(reLSTM, self).__init__(save_file=save_file, name=name)

def _mark(self, l, h, idx):
ajratner (Contributor):

Either make input vars more verbose or explain in doc string (just being more sensitive here given past documentation issues around this module...)

ajratner (Contributor):

Same with mids below

henryre (Member, Author):

👍

# Get arguments and lemma sequence
args = [
    (c[0].get_word_start(), c[0].get_word_end(), 1),
    (c[1].get_word_start(), c[1].get_word_end(), 2)
ajratner (Contributor):

Marker indices should be consistent with doc string above (1,2 vs. 0,1)

henryre (Member, Author):

🤓

    (c[1].get_word_start(), c[1].get_word_end(), 2)
]
s = self._mark_sentence(
    [w.lower() for w in c.get_parent().lemmas], args
ajratner (Contributor):

Easy addition to make the sentence token type configurable here?

henryre (Member, Author):

👍
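
For reference, roughly how the configurable token type could look (a sketch; the token_type parameter and the Sentence stand-in below are illustrative, not code from this PR):

class Sentence(object):
    """Stand-in for the parent sentence object (illustrative only)."""
    def __init__(self, words, lemmas):
        self.words, self.lemmas = words, lemmas

def get_tokens(sentence, token_type='lemmas'):
    # Select the token sequence by name instead of hard-coding lemmas
    return [w.lower() for w in getattr(sentence, token_type)]

s = Sentence(words=['Cats', 'purr'], lemmas=['cat', 'purr'])
print(get_tokens(s))                      # ['cat', 'purr']
print(get_tokens(s, token_type='words'))  # ['cats', 'purr']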

tx = np.zeros((self.mx_len, batch_size), dtype=np.int32)
tlen = np.zeros(batch_size, dtype=np.int32)
# Pad or trim each x
# TODO: fix for arguments outside max length
ajratner (Contributor):

Either fix or explain what's going on here / what is currently implemented pre-fix

henryre (Member, Author):

Will add doc
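
In the meantime, roughly what the pad-or-trim loop above does (a sketch with assumed inputs; note the TODO: any argument markers past mx_len are silently dropped by the truncation):

import numpy as np

mx_len, batch_size = 8, 2
batch = [[4, 7, 1, 9, 2, 5, 3, 8, 6, 4],  # longer than mx_len: trimmed
         [3, 1, 2]]                       # shorter: zero-padded

tx = np.zeros((mx_len, batch_size), dtype=np.int32)
tlen = np.zeros(batch_size, dtype=np.int32)
for i, x in enumerate(batch):
    t = x[:mx_len]       # trim past max length (markers there are lost)
    tx[:len(t), i] = t   # remaining rows stay zero, i.e. padding
    tlen[i] = len(t)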

    x.name: x for x in tf.get_collection(
        tf.GraphKeys.GLOBAL_VARIABLES, scope=scope.name)
}
# Take mean across sentences
ajratner (Contributor):

Is it standard to take mean across sentences here?

henryre (Member, Author):

Yeah, simplest sentence embedding technique. Can improve down the road, but easy enough for now.
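
For context, mean pooling over the per-token LSTM outputs looks roughly like this (a sketch assuming the usual [batch, time, hidden] output of tf.nn.dynamic_rnn plus true sequence lengths; not the exact code in this PR):

import tensorflow as tf

def mean_pool(outputs, lengths):
    """Average per-token states into one sentence embedding, ignoring padding."""
    # Mask out padded positions before summing over time
    mask = tf.sequence_mask(lengths, tf.shape(outputs)[1], dtype=tf.float32)
    summed = tf.reduce_sum(outputs * tf.expand_dims(mask, -1), axis=1)
    return summed / tf.expand_dims(tf.cast(lengths, tf.float32), -1)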

@@ -42,200 +37,85 @@ def score(self, session, X_test, test_labels, gold_candidate_set=None, b=0.5, se
return s.score(test_marginals, train_marginals, b=b,
ajratner (Contributor):

Do we ever set self.X_train any more (see the lines before; it comes up again below)? If not, we should get rid of this and get the training marginals a different way, or just not display them here.

ajratner (Contributor):

I'm fine just getting rid of this, whatever is fast & cleans this loose thread up

henryre (Member, Author):

I'll get rid of this. Never really display them in practice anyway.

"from snorkel.learning import LogReg\n",
"disc_model = LogReg()"
"from snorkel.learning import LogisticRegression\n",
"disc_model = LogisticRegression()"
ajratner (Contributor):

Let's add a comment in the tutorial just mentioning the reLSTM?

henryre (Member, Author):

Let's add it to another tutorial. LSTM will likely perform much worse on the intro tutorial than LR.

ajratner (Contributor):

Ok, if that's the case it makes sense.

@henryre (Member, Author) commented Jan 17, 2017

@ajratner Changes made, testing out now

@coveralls commented:

Coverage Status

Coverage increased (+5.8%) to 51.943% when pulling 22cfd8f on lstm into c573fd8 on dev.

@coveralls commented:

Coverage Status

Coverage increased (+5.8%) to 51.92% when pulling c28502f on lstm into cb256ca on dev.

@henryre (Member, Author) commented Jan 17, 2017

No one even invited you, Coveralls #yolo

@henryre merged commit 465a3f3 into dev Jan 17, 2017
@henryre deleted the lstm branch January 17, 2017 20:03