
Code is broken #2

Closed
albertoandreottiATgmail opened this issue Nov 29, 2021 · 11 comments


@albertoandreottiATgmail

Hi,

I'm trying to reproduce your results; I can generate the cache without issues.
But when I run the zeste.py script, it seems you're trying to match the labels in the dataset (I'm using 20 Newsgroups) against the Numberbatch embeddings, and they just don't match.
The keys in the embeddings are prefixed with the language, as in /c/en/potato, but the labels are not, so they never match and you cannot retrieve any neighbours.
Has this code ever worked?
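
For illustration, a minimal sketch of the mismatch (the toy embedding dict and label below are made-up stand-ins, not actual ZeSTE data):

# Multilingual Numberbatch keys carry a language prefix such as "/c/en/potato",
# while dataset labels are bare words, so a direct dictionary lookup fails.
embeddings = {"/c/en/potato": [0.1, 0.2, 0.3]}  # toy stand-in for Numberbatch
label = "potato"                                # e.g. a 20 Newsgroups label
print(label in embeddings)             # False: no neighbours can be retrieved
print("/c/en/" + label in embeddings)  # True once the prefix is accounted for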

@Siliam
Collaborator

Siliam commented Nov 30, 2021

Hello,
The keys of the embeddings depend on which version of Numberbatch you're using and whether you're using the multilingual embeddings or the English-only ones (the English-only ones don't have the language prefix in their keys). Can you please tell me which one you're using?

@albertoandreottiATgmail
Author

Hi @Siliam,

I'm filtering the embeddings like this to keep only the English ones:

cat numberbatch-19.08.txt | grep "/en/" > numberbatch-19.08_en.txt

I see that the English-only version is available as well; I will try with that.
Do I need an equivalent English-only version of the ConceptNet assertions?
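
As a side note, the grep above keeps the /c/en/ prefix in the keys. A Python sketch of a filter that also strips the prefix so the keys match bare labels (file names taken from the command above; skipping the header line is an assumption):

# Keep only English entries and drop the "/c/en/" prefix so that the
# resulting keys match bare dataset labels. Lines that don't start with
# the prefix (e.g. the header or other languages) are skipped.
prefix = "/c/en/"
with open("numberbatch-19.08.txt") as src, open("numberbatch-19.08_en.txt", "w") as dst:
    for line in src:
        if line.startswith(prefix):
            dst.write(line[len(prefix):])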

@albertoandreottiATgmail
Author

albertoandreottiATgmail commented Nov 30, 2021

Now, using the English embeddings, this is what I get:
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
[Previous line repeated 986 more times]
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 47, in get_word_neighborhood
if current_node in stopwords.words('english'):
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/wordlist.py", line 21, in words
for line in line_tokenize(self.raw(fileids))
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/api.py", line 217, in raw
with self.open(f) as fp:
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/api.py", line 230, in open
stream = self._root.join(file).open(encoding)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/data.py", line 334, in join
return FileSystemPathPointer(_path)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/data.py", line 310, in init
_path = os.path.abspath(_path)
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 370, in abspath
if not isabs(path):
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 65, in isabs
sep = _get_sep(s)
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 39, in _get_sep
def _get_sep(path):
RecursionError: maximum recursion depth exceeded while calling a Python object

@albertoandreottiATgmail
Author

albertoandreottiATgmail commented Nov 30, 2021

It seems you have an endless recursion while looking for neighbours between 'dog' and 'woof'.

I fixed that in utils.py by subtracting 1 from the depth in the recursive call:

cnn = get_word_neighborhood(current_node, depth-1, numberbatch, cache_path, prefetch_path)

But then there's a problem with the folders:

pickle.dump(neighborhood, open(prefetch_path, 'wb'))

FileNotFoundError: [Errno 2] No such file or directory: 'cache_en/2/car.pickle'

It expects the '/2/' part of the path to already exist.
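
One straightforward fix for the FileNotFoundError is to create the missing directories before dumping (a sketch, reusing the prefetch_path and neighborhood names from the pickle.dump call above):

import os
import pickle

# Create intermediate folders such as "cache_en/2/" if they don't exist yet,
# then dump the neighborhood as before.
os.makedirs(os.path.dirname(prefetch_path), exist_ok=True)
with open(prefetch_path, "wb") as fp:
    pickle.dump(neighborhood, fp)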

@Siliam
Collaborator

Siliam commented Dec 1, 2021

I see where the problem is coming from. To make iterating on the experiments faster, I precomputed and cached all the neighborhoods at a given depth, so that the similarities between nodes don't have to be recomputed every time. I used this as a shortcut in the code, but I guess I did not fully document it :)

I will try to circumvent this and get back to you ASAP.
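
The prefetch pattern being described looks roughly like this (a minimal sketch; the function and variable names are illustrative, not the actual ZeSTE code):

import os
import pickle

def load_or_compute(prefetch_path, compute_fn):
    # Reuse a neighborhood that was already cached at this depth...
    if os.path.exists(prefetch_path):
        with open(prefetch_path, "rb") as fp:
            return pickle.load(fp)
    # ...otherwise compute it once and cache it for the next run.
    result = compute_fn()
    os.makedirs(os.path.dirname(prefetch_path), exist_ok=True)
    with open(prefetch_path, "wb") as fp:
        pickle.dump(result, fp)
    return result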

@Siliam
Collaborator

Siliam commented Jan 30, 2022

@albertoandreottiATgmail Sorry for my belated reply.
There were several minor bugs that were addressed in c96a449. Could you please try again and let me know if the code works now?

@creatorrr

This is still a problem. There are a few different issues in the caching script, as described here. But even after fixing those, the api/autocomplete endpoint returns an empty array [] for all queries. I'm not sure what's happening; are there any more up-to-date instructions for running the codebase?

cc/ @Siliam @ehrhart

@ehrhart
Contributor

ehrhart commented Dec 5, 2022

Autocomplete requires these two files to be put inside the zeste_cache folder; they contain a list of terms used for autocompletion. Unfortunately, the method used to generate them has not been documented yet.

@ehrhart
Contributor

ehrhart commented Jan 3, 2023

I've added instructions to download the vocabulary files with f46450f.

I will close this issue, but if you encounter another problem, please open a new issue.

@ehrhart ehrhart closed this as completed Jan 3, 2023
@Mjuve360

After downloading the required files and setting the paths, I encountered the message below:
XXX:~# python3 /root/ZeSTE/generate_cache.py
Reading ConceptNet assertions..
Killed
Do you have any idea what might be causing this error?

@Siliam
Collaborator

Siliam commented Feb 14, 2024

Hello @Mjuve360,
It seems the process crashes while loading the assertions CSV, which is likely due to insufficient memory on your system (the kernel's out-of-memory killer prints "Killed"). Can you test on a different system?
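
If a machine with more memory is not available, one workaround is to stream the assertions file row by row instead of loading it whole (a sketch, assuming the standard tab-separated ConceptNet assertions dump with the edge URI, relation, start node and end node in the first four columns):

def iter_english_assertions(path):
    # Stream the file line by line so memory use stays flat
    # regardless of the file's size.
    with open(path) as fp:
        for line in fp:
            # Columns: edge URI, relation, start node, end node, JSON metadata.
            uri, relation, start, end = line.split("\t")[:4]
            # Keep only edges between two English concepts.
            if start.startswith("/c/en/") and end.startswith("/c/en/"):
                yield relation, start, end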
