
Code is broken #2

Closed
albertoandreottiATgmail opened this issue Nov 29, 2021 · 11 comments


@albertoandreottiATgmail

Hi,

I'm trying to reproduce your results; I can generate the cache without issues.
But when I run the zeste.py script, it seems you're trying to match the labels in the dataset (I'm using 20 Newsgroups) against the Numberbatch embeddings, and they just don't match.
The keys in the embeddings are prefixed with the language, as in /c/en/potato, but the labels are not, so they never match and you cannot retrieve any neighbours.
Has this code ever worked?
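
For illustration, a minimal sketch of the mismatch (the toy embedding dict and label below are made-up stand-ins, not actual ZeSTE data):

# Multilingual Numberbatch keys carry a language prefix such as "/c/en/potato",
# while dataset labels are bare words, so a direct dictionary lookup fails.
embeddings = {"/c/en/potato": [0.1, 0.2, 0.3]}  # toy stand-in for Numberbatch
label = "potato"                                # e.g. a 20 Newsgroups label
print(label in embeddings)             # False: no neighbours can be retrieved
print("/c/en/" + label in embeddings)  # True once the prefix is accounted for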

@Siliam
Collaborator

Siliam commented Nov 30, 2021

Hello,
The keys of the embeddings depend on which version of Numberbatch you're using and whether you're using the multilingual embeddings or the English-only ones (the English-only ones don't have the language prefix in their keys). Can you please tell me which one you're using?

@albertoandreottiATgmail
Author

Hi @Siliam,

I'm filtering the embeddings like this to keep only the English ones:

cat numberbatch-19.08.txt | grep "/en/" > numberbatch-19.08_en.txt

I see that the English-only version is available as well; I will try with that.
Do I need an equivalent English-only version of the ConceptNet assertions?
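
As a side note, the grep above keeps the /c/en/ prefix in the keys. A Python sketch of a filter that also strips the prefix so the keys match bare labels (file names taken from the command above; skipping the header line is an assumption):

# Keep only English entries and drop the "/c/en/" prefix so that the
# resulting keys match bare dataset labels. Lines that don't start with
# the prefix (e.g. the header or other languages) are skipped.
prefix = "/c/en/"
with open("numberbatch-19.08.txt") as src, open("numberbatch-19.08_en.txt", "w") as dst:
    for line in src:
        if line.startswith(prefix):
            dst.write(line[len(prefix):])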

@albertoandreottiATgmail
Author

albertoandreottiATgmail commented Nov 30, 2021

Now, using the English embeddings, this is what I get:
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 51, in get_word_neighborhood
cnn = get_word_neighborhood(current_node, depth, numberbatch, cache_path, prefetch_path)
[Previous line repeated 986 more times]
File "/home/pepito/zeste123/zeste/ZeSTE/utils.py", line 47, in get_word_neighborhood
if current_node in stopwords.words('english'):
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/wordlist.py", line 21, in words
for line in line_tokenize(self.raw(fileids))
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/api.py", line 217, in raw
with self.open(f) as fp:
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/corpus/reader/api.py", line 230, in open
stream = self._root.join(file).open(encoding)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/data.py", line 334, in join
return FileSystemPathPointer(_path)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/home/pepito/.local/lib/python3.6/site-packages/nltk/data.py", line 310, in init
_path = os.path.abspath(_path)
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 370, in abspath
if not isabs(path):
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 65, in isabs
sep = _get_sep(s)
File "/home/pepito/miniconda3/lib/python3.6/posixpath.py", line 39, in _get_sep
def _get_sep(path):
RecursionError: maximum recursion depth exceeded while calling a Python object

@albertoandreottiATgmail
Author

albertoandreottiATgmail commented Nov 30, 2021

It seems you have an endless recursion while looking for neighbours between 'dog' and 'woof'.

I fixed that in utils.py by subtracting 1 from the depth in the recursive call:

cnn = get_word_neighborhood(current_node, depth-1, numberbatch, cache_path, prefetch_path)

But then there's a problem with the folders:

pickle.dump(neighborhood, open(prefetch_path, 'wb'))

FileNotFoundError: [Errno 2] No such file or directory: 'cache_en/2/car.pickle'

It expects the '/2/' part of the path to already exist.
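
One straightforward fix for the FileNotFoundError is to create the missing directories before dumping (a sketch, reusing the prefetch_path and neighborhood names from the pickle.dump call above):

import os
import pickle

# Create intermediate folders such as "cache_en/2/" if they don't exist yet,
# then dump the neighborhood as before.
os.makedirs(os.path.dirname(prefetch_path), exist_ok=True)
with open(prefetch_path, "wb") as fp:
    pickle.dump(neighborhood, fp)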

@Siliam
Collaborator

Siliam commented Dec 1, 2021

I see where the problem is coming from. To make iterating on the experiments faster, I precomputed and cached all the neighborhoods at a given depth, so that the similarities between nodes don't have to be recomputed every time. I used this as a shortcut in the code, but I guess I did not fully document it :)

I will try to circumvent this and get back to you ASAP.
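
The prefetch pattern being described looks roughly like this (a minimal sketch; the function and variable names are illustrative, not the actual ZeSTE code):

import os
import pickle

def load_or_compute(prefetch_path, compute_fn):
    # Reuse a neighborhood that was already cached at this depth...
    if os.path.exists(prefetch_path):
        with open(prefetch_path, "rb") as fp:
            return pickle.load(fp)
    # ...otherwise compute it once and cache it for the next run.
    result = compute_fn()
    os.makedirs(os.path.dirname(prefetch_path), exist_ok=True)
    with open(prefetch_path, "wb") as fp:
        pickle.dump(result, fp)
    return result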

@Siliam
Collaborator

Siliam commented Jan 30, 2022

@albertoandreottiATgmail Sorry for my belated reply.
There were several minor bugs that were addressed in c96a449. Could you please try again and let me know if the code works now?

@creatorrr

This is still a problem. There are a few different issues in the caching script, as described here. But even after fixing those, the api/autocomplete endpoint returns an empty array [] for all queries. I'm not sure what's happening; are there any more up-to-date instructions for running the codebase?

cc/ @Siliam @ehrhart

@ehrhart
Contributor

ehrhart commented Dec 5, 2022

Autocomplete requires these two files to be put inside the zeste_cache folder; they contain a list of terms used for autocompletion. Unfortunately, the method used to generate them has not been documented yet.

@ehrhart
Contributor

ehrhart commented Jan 3, 2023

I've added instructions to download the vocabulary files with f46450f.

I will close this issue, but if you encounter another problem, please open a new issue.

@ehrhart ehrhart closed this as completed Jan 3, 2023
@Mjuve360

After downloading the required files and setting the paths, I encountered the message below:
XXX:~# python3 /root/ZeSTE/generate_cache.py
Reading ConceptNet assertions..
Killed
Do you have any idea what might be causing this error?

@Siliam
Collaborator

Siliam commented Feb 14, 2024

Hello @Mjuve360,
It seems the process crashes while loading the assertions CSV, which is likely due to insufficient memory on your system (the kernel's out-of-memory killer prints "Killed"). Can you test on a different system?
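
If a machine with more memory is not available, one workaround is to stream the assertions file row by row instead of loading it whole (a sketch, assuming the standard tab-separated ConceptNet assertions dump with the edge URI, relation, start node and end node in the first four columns):

def iter_english_assertions(path):
    # Stream the file line by line so memory use stays flat
    # regardless of the file's size.
    with open(path) as fp:
        for line in fp:
            # Columns: edge URI, relation, start node, end node, JSON metadata.
            uri, relation, start, end = line.split("\t")[:4]
            # Keep only edges between two English concepts.
            if start.startswith("/c/en/") and end.startswith("/c/en/"):
                yield relation, start, end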
