corrected dependencies
mkhodak committed Oct 1, 2018
1 parent 291f380 commit 1d28e15
Showing 2 changed files with 7 additions and 9 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -19,17 +19,17 @@ On a 32-core computer, 25 epochs of AdaGrad run in 3.8 hours on Wikipedia cooccu

Note that our code takes as input an upper-triangular, zero-indexed cooccurrence matrix rather than the full, one-indexed cooccurrence matrix used by the original GloVe code. To convert to our (more disk-space-efficient) format you can use the method <tt>reformat_coocfile</tt> in <tt>solvers.py</tt>. We also allow direct, parallel computation of the vocab and cooccurrence files.
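As an illustration of the format difference, here is a hedged sketch (not the repo's <tt>reformat_coocfile</tt>, whose exact signature may differ): GloVe's <tt>cooccur</tt> tool writes binary records of (int32 word1, int32 word2, float64 count) with one-indexed words and both symmetric entries, so dropping the lower triangle and shifting the indices yields the format expected here.

```python
import struct

def to_upper_triangular(infile, outfile):
    # GloVe cooccurrence record: (int32 word1, int32 word2, float64 count)
    record = struct.Struct('=iid')
    with open(infile, 'rb') as fin, open(outfile, 'wb') as fout:
        while True:
            chunk = fin.read(record.size)
            if not chunk:
                break
            i, j, x = record.unpack(chunk)
            if i <= j:
                # keep one copy of each symmetric pair, shift to zero-indexing
                fout.write(record.pack(i - 1, j - 1, x))
```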

-Dependencies: NumPy, [SharedArray](https://pypi.org/project/SharedArray/)
+Dependencies: numpy, numba, [SharedArray](https://pypi.org/project/SharedArray/)

-Optional: h5py, mpi4py*, scikit-learn
+Optional: h5py, mpi4py*, scipy, scikit-learn

\* required for parallelism; [MPI](http://www.mpich.org/downloads/) can be easily installed on Linux, Mac, and Windows Subsystem for Linux

# DisC embeddings

Scripts to recreate the results in the paper are provided in the directory <tt>scripts-AKSV2018</tt>. 1600-dimensional GloVe embeddings trained on the Amazon Product Corpus [3] are provided [here](http://nlp.cs.princeton.edu/DisC/amazon_glove1600.txt.bz2).

-Dependencies: NLTK, NumPy, SciPy, scikit-learn
+Dependencies: nltk, numpy, scipy, scikit-learn

Optional: tensorflow

10 changes: 4 additions & 6 deletions solvers.py
@@ -8,10 +8,9 @@
 from collections import deque
 from operator import itemgetter
 from tempfile import NamedTemporaryFile as NTF
-import h5py
+import SharedArray as sa
 import numpy as np
-from sklearn.linear_model import LinearRegression as LR
+from numba import jit
 from text_embedding.documents import *
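The new top-level <tt>from numba import jit</tt> is what promotes numba to a required dependency, presumably to JIT-compile inner loops in the solvers. A minimal sketch of the decorator pattern (the function below is illustrative, not taken from <tt>solvers.py</tt>):

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def dot(x, y):
    # compiled to machine code on first call
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * y[i]
    return total

print(dot(np.ones(3), np.arange(3.0)))  # 3.0
```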


@@ -254,8 +253,6 @@ def __enter__(self):

     def __exit__(self, *args):

-        import SharedArray as sa
-
         for array in self._shared:
             try:
                 sa.delete(array)
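For context on the two-line deletion above: <tt>SharedArray</tt> is now imported once at module level, and <tt>__exit__</tt> just unlinks the named segments. A minimal sketch of the API involved (the array name here is hypothetical):

```python
import numpy as np
import SharedArray as sa

arr = sa.create("shm://demo", 5)  # named float64 array in shared memory
arr[:] = np.arange(5.0)           # writes are visible to attached processes
other = sa.attach("shm://demo")   # e.g. from another MPI rank
assert (other == arr).all()
sa.delete("shm://demo")           # unlink the segment, as in __exit__ above
```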
@@ -264,8 +261,6 @@ def __exit__(self, *args):

     def create(self, array=None, dtype=None):

-        import SharedArray as sa
-
         comm, rank = self._comm, self._rank

         if rank:
@@ -410,6 +405,8 @@ def save(self, fid):
         None
         '''

+        import h5py
+
         if not self._rank:
             f = h5py.File(fid)
             for name, param in zip(self._pnames, self._params[:self._numpar]):
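This deferred import is the point of the commit: optional packages are now imported inside the methods that need them, so <tt>import solvers</tt> succeeds with only the required dependencies installed. A generic sketch of the pattern (hypothetical helper, not the actual <tt>save</tt>):

```python
def save_hdf5(data, fid):
    import h5py  # deferred: h5py is only required if this is called
    with h5py.File(fid, 'w') as f:
        f.create_dataset('data', data=data)
```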
@@ -832,6 +829,7 @@ def align_params(params, srcvocab, tgtvocab, mean_fill=True):
 def induce_embeddings(srcvocab, srccooc, srcvecs, tgtvocab, tgtcooc, comm=None):

     from scipy import sparse as sp
+    from sklearn.linear_model import LinearRegression as LR

     rank, size = ranksize(comm)
     Vsrc, d = srcvecs.shape
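Likewise, scipy and scikit-learn are now only needed if <tt>induce_embeddings</tt> is called. As a hedged sketch of the kind of computation these two imports enable (not the actual induction logic): <tt>LinearRegression</tt> accepts scipy sparse inputs, so one can fit a linear map from sparse cooccurrence rows to known vectors and apply it to a new vocabulary.

```python
import numpy as np
from scipy import sparse as sp
from sklearn.linear_model import LinearRegression as LR

Xsrc = sp.random(100, 50, density=0.1, format='csr')  # source cooc rows
vecs = np.random.randn(100, 8)                        # known source vectors
model = LR().fit(Xsrc, vecs)                          # sparse X is supported
Xtgt = sp.random(20, 50, density=0.1, format='csr')   # target cooc rows
induced = model.predict(Xtgt)                         # (20, 8) induced vectors
```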
