[MRG] Poincare Model implementation #1696

jayantj · 2017-11-06T21:39:41Z

Pure Python implementation of the Poincare model from [1].

TODO:

- Unit tests
- API conformity
- More logging

Follow-up PR: #1700

[1] Poincaré Embeddings for Learning Hierarchical Representations


janpom · 2017-11-13T12:57:00Z

gensim/models/poincare.py

+            Whether the input array contains any duplicates.
+
+        """
+        seen = set()


`return len(array) != len(set(array))` is simpler. Probably not even worth adding a method.
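A minimal sketch of the suggested one-liner, assuming the elements are hashable (as node keys are). The function name `has_duplicates` is hypothetical:

```python
def has_duplicates(array):
    """Return True if `array` contains any repeated (hashable) element."""
    # A set keeps only unique elements, so any size difference means a repeat.
    return len(array) != len(set(array))


print(has_duplicates(['cat', 'dog', 'cat']))  # True
print(has_duplicates(['cat', 'dog']))         # False
```

The explicit loop with a `seen` set can stop at the first duplicate, while this version always hashes every element; for input of this kind the simpler form is usually preferable.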

janpom · 2017-11-13T13:01:17Z

gensim/models/poincare.py

+
+        Parameters
+        ----------
+        train_data : iterable of (str, str)


`str` is ambiguous between Python 2 and 3. Better to say `unicode` or `bytes` instead.

Both unicode and bytes are allowed here. Wherever both are accepted, I've used str; wherever a specific type is required or returned, I've used unicode/bytes. Does that sound okay?

Yes, perfect. Are you sure it works correctly with bytes, though? I suppose if we train on bytes, we'll end up with a bytes-based model. I wonder if that's common for other gensim models. Won't that cause unexpected behavior with some KeyedVectors calls?
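The concern can be illustrated with plain dicts standing in for the model vocabulary (an assumption for illustration): under Python 3, `b'cat'` and `'cat'` are different keys, so a model trained on bytes would not find unicode lookups. One option is to decode keys to unicode while loading. The helper `to_text` below is hypothetical; gensim's `gensim.utils.to_unicode` serves a similar purpose.

```python
def to_text(key, encoding='utf8'):
    """Hypothetical helper: decode bytes keys to unicode, pass unicode keys through."""
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


# Under Python 3, bytes and unicode keys never compare equal.
vocab_from_bytes = {b'cat': 0, b'mammal': 1}
print('cat' in vocab_from_bytes)   # False
print(b'cat' in vocab_from_bytes)  # True

# Normalizing at load time yields a unicode-keyed vocabulary either way.
relations = [(b'cat', b'mammal'), ('dog', 'mammal')]
normalized = [(to_text(u), to_text(v)) for u, v in relations]
print(normalized)  # [('cat', 'mammal'), ('dog', 'mammal')]
```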

janpom · 2017-11-13T13:05:19Z

gensim/models/poincare.py

+        node_relations = defaultdict(set)  # Mapping from node index to its related node indices
+
+        logger.info("Loading relations from train data..")
+        for hypernym_pair in self.train_data:


Please rename `hypernym_pair` to something more generic, such as `relation`.
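A minimal sketch of the loading loop from the diff above with the suggested rename, assuming relations arrive as `(node_1, node_2)` pairs and nodes are indexed in order of first appearance (an assumption; the PR's actual indexing may differ). `load_relations` is a hypothetical name:

```python
from collections import defaultdict


def load_relations(train_data):
    """Map node keys to indices and collect each node's related node indices."""
    node_to_index = {}
    index_relations = []               # (u_index, v_index) pairs
    node_relations = defaultdict(set)  # node index -> related node indices
    for relation in train_data:        # generic name instead of `hypernym_pair`
        if len(relation) != 2:
            raise ValueError('Relation %r should have exactly two items' % (relation,))
        u, v = relation
        for node in (u, v):
            if node not in node_to_index:
                node_to_index[node] = len(node_to_index)
        u_index, v_index = node_to_index[u], node_to_index[v]
        index_relations.append((u_index, v_index))
        node_relations[u_index].add(v_index)
    return node_to_index, index_relations, node_relations


keys, pairs, related = load_relations([('cat', 'mammal'), ('dog', 'mammal'), ('mammal', 'animal')])
print(keys)   # {'cat': 0, 'mammal': 1, 'dog': 2, 'animal': 3}
print(pairs)  # [(0, 1), (2, 1), (1, 3)]
```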

janpom · 2017-11-13T13:05:39Z

gensim/models/poincare.py

+            Vectors of all nodes `u` in the batch.
+            Expected shape (batch_size, dim).
+        vectors_v : numpy.array
+            Vectors of all hypernym nodes `v` and negatively sampled nodes `v'`,


just "nodes"
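For context, these vectors feed the Poincaré distance from [1], `d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2)))`. The sketch below computes it for a batch with NumPy broadcasting. The `(batch_size, num_candidates, dim)` layout for the `v` nodes (the related node plus negatives) is an assumption for illustration, not necessarily the shape used in the PR:

```python
import numpy as np


def batch_poincare_distances(vectors_u, vectors_v):
    """Poincare distance between each node u and each of its candidate nodes v.

    vectors_u: shape (batch_size, dim)
    vectors_v: shape (batch_size, num_candidates, dim)  # assumed layout
    Returns an array of shape (batch_size, num_candidates).
    """
    u = vectors_u[:, np.newaxis, :]                  # (batch_size, 1, dim)
    sq_dist = np.sum((u - vectors_v) ** 2, axis=-1)  # ||u - v||^2
    alpha = 1 - np.sum(u ** 2, axis=-1)              # 1 - ||u||^2, shape (batch_size, 1)
    beta = 1 - np.sum(vectors_v ** 2, axis=-1)       # 1 - ||v||^2
    gamma = 1 + 2 * sq_dist / (alpha * beta)
    return np.arccosh(gamma)


u = np.array([[0.0, 0.0]])
v = np.array([[[0.0, 0.0], [0.5, 0.0]]])
distances = batch_poincare_distances(u, v)
# distances[0, 0] is 0; distances[0, 1] is 2 * artanh(0.5) = ln 3 (about 1.0986)
```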

janpom · 2017-11-13T13:06:13Z

gensim/models/poincare.py

+
+
+class PoincareRelations(object):
+    """Class to stream hypernym relations for `PoincareModel` from a tsv-like file."""


just "relations", here and elsewhere

janpom · 2017-11-13T13:06:30Z

gensim/models/poincare.py

+    """Class to stream hypernym relations for `PoincareModel` from a tsv-like file."""
+
+    def __init__(self, file_path, encoding='utf8', delimiter='\t'):
+        """Initialize instance from file containing one hypernym pair per line.


hypernym pair -> relation (here and elsewhere)
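A minimal sketch of streaming relations from a delimited file with the standard library, assuming one `node_1<delimiter>node_2` relation per line and Python 3 (Python 2's `csv` module would need byte-level handling). The class name `RelationsFromFile` is hypothetical and is not the PR's `PoincareRelations`:

```python
import csv
import os
import tempfile


class RelationsFromFile(object):
    """Hypothetical stand-in: stream (node_1, node_2) relations from a delimited file."""

    def __init__(self, file_path, encoding='utf8', delimiter='\t'):
        self.file_path = file_path
        self.encoding = encoding
        self.delimiter = delimiter

    def __iter__(self):
        # Re-open the file on every pass so the model can iterate over it multiple times.
        with open(self.file_path, encoding=self.encoding, newline='') as fin:
            for row in csv.reader(fin, delimiter=self.delimiter):
                if not row:
                    continue  # skip blank lines
                if len(row) != 2:
                    raise ValueError('Relation %r should have exactly two items' % (row,))
                yield tuple(row)


with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False, encoding='utf8') as tmp:
    tmp.write('kangaroo.n.01\tmarsupial.n.01\nmarsupial.n.01\tmammal.n.01\n')
relations = list(RelationsFromFile(tmp.name))
os.remove(tmp.name)
print(relations)  # [('kangaroo.n.01', 'marsupial.n.01'), ('marsupial.n.01', 'mammal.n.01')]
```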

jayantj · 2017-11-13T22:02:34Z

I've added the rst files and fixed some Python 2 bugs. The only failing test now is the one that requires autograd, because autograd is missing from the test dependencies. With autograd added to the test dependencies, the build errors out (due to an MKL error, as you mentioned).

menshikh-iv · 2017-11-14T04:34:03Z

@jayantj Maybe remove this test, since we can't run it correctly in CI?

janpom · 2017-11-14T08:03:47Z

gensim/models/poincare.py

+    def __init__(
+        self, train_data, size=50, alpha=0.1, negative=10, workers=1,
+        epsilon=1e-5, burn_in=10, burn_in_alpha=0.01, init_range=(-0.001, 0.001), seed=0):
+        """Initialize and train a Poincare embedding model from an iterable of transitive closure relations.


Is the transitive closure a requirement? If not, let's just say "iterable of relations".
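To illustrate the distinction the question draws: a transitive closure adds every implied ancestor pair, so `(cat, mammal)` and `(mammal, animal)` also yield `(cat, animal)`. A minimal sketch using depth-first search; `transitive_closure` is a hypothetical helper, not part of the PR:

```python
from collections import defaultdict


def transitive_closure(relations):
    """Return all (u, w) pairs where w is reachable from u via one or more relations."""
    children = defaultdict(set)
    for u, v in relations:
        children[u].add(v)

    closure = set()
    for start in list(children):
        # Depth-first search from each node, recording every reachable node.
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(children.get(node, ()))
    return closure


print(sorted(transitive_closure([('cat', 'mammal'), ('mammal', 'animal')])))
# [('cat', 'animal'), ('cat', 'mammal'), ('mammal', 'animal')]
```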


jayantj · 2017-11-14T09:45:34Z

@menshikh-iv Instead, I've added a skip for the case where autograd is not installed. That way we can still run the test locally, which makes development easier. Does that seem okay?
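The pattern described is roughly the following: a minimal sketch using `unittest.skipIf`, where the test body is a hypothetical gradient check rather than the PR's actual test:

```python
import unittest

try:
    import autograd.numpy as anp  # optional dependency, only needed for this test
    from autograd import grad
    autograd_installed = True
except ImportError:
    autograd_installed = False


class TestGradients(unittest.TestCase):

    @unittest.skipIf(not autograd_installed, 'autograd needs to be installed for this test')
    def test_gradient_matches_autograd(self):
        # Hypothetical check: the analytic gradient of sum(x ** 2) is 2 * x.
        x = anp.array([0.1, -0.2, 0.3])
        autograd_gradient = grad(lambda v: anp.sum(v ** 2))(x)
        for computed, expected in zip(autograd_gradient, 2 * x):
            self.assertAlmostEqual(computed, expected)
```

Without autograd the test is reported as skipped rather than failed, so CI stays green while developers with autograd installed still get the check (for example via `python -m unittest`).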


jayantj added 30 commits October 26, 2017 22:13

Initial classes and loading data for poincare model

6afdd22

Initial implementation of training using autograd

a804006

faster negative sampling, bugfix in vector updates

6bd0d4b

allows poincare dist function to be differentiable by autograd

98f94a7

batched gradient descent initial implementation

b727523

minor changes to batch poincare distance computation

1e6aee1

Adds calculation of gradients for poincare model

e286a0b

Correct implementation of clipping of updated vectors

3e28e8b

Fixes error in gradient computation

99a2270

Better messages while training

2e9e31c

Renames PoincareDistance to PoincareExample for clarity

d72cb10

Compares computed gradients to autograd gradients every few iterations

d439501

Avoids doing some numpy computations twice

e1ed24d

Avoids creating copies of numpy vectors

3b2a383

Only calls nan_to_num when gamma has at least one value equal to 1

7d68aae

Simply sets nan gradients to zero instead of nan_to_num

ba82d42

Adds batch-wise implementation of training and gradient computations

71f61d1

Minor correction in clipping

2a5a7fb

Merge branch 'poincare' into poincare_model

0c57aa1

Fixes typo in clip_vectors

9c51609

Prints average loss every few iterations instead of current loss

f22d9b2

Adds weighted negative sampling

7905c8c

Ensures positive edges are not returned by negative sampling

075df25

Poincare model stores node indices in relations instead of node keys

6060e56

Minor renaming; uses node indices for batch training instead of node keys

8ea8f23

Changes shapes of vectors passed to PoincareBatch

b8d77e3

Minor bugfixes related to batch size

0011b93

Corrects implementation of negative sampling for batch training

b52ee2e

Adds option to check gradients in batchwise training

d247384

Checks gradients only every few iterations

8c4f5a3
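Several of these commits concern the update and clipping step. Following [1], the Riemannian SGD update rescales the Euclidean gradient by `(1 - ||theta||^2)^2 / 4` and then projects any vector that leaves the unit ball back to norm `1 - epsilon`. A minimal sketch of that step; the function names are hypothetical and the PR's implementation may differ:

```python
import numpy as np


def clip_vectors(vectors, epsilon):
    """Rescale rows with norm >= 1 so they lie just inside the unit ball."""
    norms = np.linalg.norm(vectors, axis=1)
    outside = norms >= 1
    clipped = vectors.copy()
    clipped[outside] *= ((1 - epsilon) / norms[outside])[:, np.newaxis]
    return clipped


def rsgd_update(vectors, euclidean_grads, alpha, epsilon=1e-5):
    """One Riemannian SGD step in the Poincare ball, followed by clipping."""
    sq_norms = np.sum(vectors ** 2, axis=1, keepdims=True)
    # Inverse of the Poincare metric tensor: rescales the Euclidean gradient.
    riemannian_grads = ((1 - sq_norms) ** 2 / 4) * euclidean_grads
    return clip_vectors(vectors - alpha * riemannian_grads, epsilon)


vectors = np.array([[0.5, 0.0], [3.0, 4.0]])
print(clip_vectors(vectors, epsilon=1e-5))  # second row is rescaled to norm 1 - 1e-5
```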

jayantj added 4 commits November 13, 2017 16:45

Imports mock module for tests correctly in python2

f75491f

Cleaner implementation of __iter__ for PoincareRelations

59fcf8b

Adds rst file and updates apiref.rst for poincare module

dcbe7aa

Adds clarifying comment to PoincareRelations.__iter__

b69f51f

janpom reviewed Nov 13, 2017


jayantj added 3 commits November 13, 2017 21:46

Updates rst file for poincare

001ec76

Renames hypernym pair to relations everywhere

9446a05

Simpler way of detecting duplicates

930dfd4

jayantj force-pushed the poincare_model branch from 72d6fc4 to 930dfd4 on November 13, 2017 21:58

janpom reviewed Nov 14, 2017


jayantj added 2 commits November 14, 2017 14:55

Minor documentation updates in poincare.py

355e521

Skips gradients test if autograd not installed, adds test for bytes input data

0d5175c

menshikh-iv and others added 5 commits November 14, 2017 22:11

Fix flake8 (noqa + remove unused var)

00ca7ab

Fix missing mock dependency for win

8ff23ae

Fix links in docstrings

30ac3e6

Changes error message for negative sampling failing

a928ca1

Adds option to specify dtype for PoincareModel and corresponding unit test

dfc19cb

jayantj force-pushed the poincare_model branch from 169fbfc to dfc19cb on November 15, 2017 10:35

Extends test for dtype to check after training, updates docstring

e967c54
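Related to the negative-sampling commits above ("Adds weighted negative sampling", "Ensures positive edges are not returned by negative sampling", and the changed error message for when sampling fails), here is a minimal sketch of drawing negatives for a node `u` with probability proportional to a per-node weight, rejecting `u` itself and nodes positively related to it, and raising an error when not enough can be found. The choice of weights, the retry cap, the oversampling factor and the function name are all assumptions for illustration:

```python
import numpy as np


def sample_negatives(u_index, node_relations, node_weights, num_negative, rng, max_tries=10):
    """Draw `num_negative` node indices that are neither `u_index` nor related to it."""
    probabilities = np.asarray(node_weights, dtype=np.float64)
    probabilities = probabilities / probabilities.sum()
    excluded = node_relations[u_index] | {u_index}
    valid = []
    for _ in range(max_tries):
        # Oversample, then drop candidates that are positive edges of u (or u itself).
        candidates = rng.choice(len(probabilities), size=2 * num_negative, p=probabilities)
        valid.extend(int(c) for c in candidates if c not in excluded)
        if len(valid) >= num_negative:
            return valid[:num_negative]
    raise ValueError(
        'Could not sample %d negatives for node %d; it may be related to too many other nodes'
        % (num_negative, u_index))


node_relations = {0: {1}, 1: {3}, 2: {1}, 3: set()}
node_weights = [1, 3, 1, 1]  # hypothetical weights, e.g. how often each node occurs in relations
negatives = sample_negatives(0, node_relations, node_weights, num_negative=2, rng=np.random.RandomState(0))
print(negatives)  # two indices drawn from {2, 3}
```

Sampling is with replacement, so the same negative can appear more than once in a batch.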

menshikh-iv merged commit 0ae0f96 into poincare Nov 15, 2017

jayantj mentioned this pull request Dec 4, 2017

[MRG] Add Poincare model #1757

Merged

menshikh-iv deleted the poincare_model branch July 5, 2018 17:03
