Article's score #1
Thanks for your interest in our work. A simple script may not be enough to fully cover the experimental procedure. Here are some instructions to reproduce our results:
Your suggestions for improvements are very welcome.
Hey @bcol23, I'm trying to create label and word embeddings using Poincaré embeddings and Poincaré GloVe respectively. I'm facing issues with training word embeddings using the Poincaré GloVe model. I've created the vocab (7 MB) and co-occurrence file (7.5 GB) from GloVe's code for the RCV1 dataset. When I train the word embeddings, the process is very slow and utilises only a single CPU core (no GPUs).
Thanks
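(For context, the vocab and co-occurrence files came from the standard Stanford GloVe preprocessing tools, roughly along these lines; the corpus path, count threshold, and window size here are illustrative, not the exact settings used:)

```shell
# Illustrative sketch of the GloVe preprocessing pipeline;
# corpus.txt stands in for the tokenized RCV1 text.
./vocab_count -min-count 5 -verbose 2 < corpus.txt > vocab.txt
./cooccur -memory 8.0 -vocab-file vocab.txt -verbose 2 -window-size 10 < corpus.txt > cooccurrence.bin
./shuffle -memory 8.0 -verbose 2 < cooccurrence.bin > cooccurrence.shuf.bin
```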
I didn't keep copies of the pre-trained files and logs. We adopted the word-embedding setup detailed in the experiments section of the Poincaré GloVe paper. The default setup of gensim worked quite well for the label embeddings.
Thanks @bcol23 for the response.
Perhaps it is necessary to check that the workers parameter works properly, since only a single CPU core is used. The functionality of the vanilla Euclidean GloVe should also be checked.
I changed the workers parameter to a very high value; still no impact.
I do not keep logs of the exact file sizes. As described in the
should work for a Cartesian product of Poincaré balls. The initialization trick should also be applied by setting the
Thanks @bcol23 for the reply. These details are not mentioned in the paper, which is why I'm asking here. Thanks
As described in Section 9 of the Poincaré GloVe paper, the initialization trick is used to improve the embeddings when initialized with pretrained parameters. This is done by first training the model on a restricted vocabulary, and then using that model as the initialization for the full vocabulary. The CPU is an Intel Xeon E5-2683, and training the embeddings with poincare_glove should finish within an hour.
Hi @bcol23
Thank you very much for the inspiring work and for publishing your code. I found it very interesting.
I played around with the repo and was a little discouraged: I cannot achieve the score you published in the article. Could you please provide a script to reproduce the published results for the RCV1 dataset?
Also, if you don't mind, I would suggest a few improvements to the project's structure.