Code is broken #2
Hello,
Hi @Siliam, I'm filtering the embeddings like this to keep only the English ones:

cat numberbatch-19.08.txt | grep "/en/" > numberbatch-19.08_en.txt

I see that an English-only version is available as well; I will try that one.
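For reference, the same filtering step can be sketched in Python. This assumes the multilingual file stores one term per line, starting with a ConceptNet URI such as /c/en/dog followed by the vector values. Checking the /c/en/ prefix of the key is slightly stricter than grep "/en/", which matches that substring anywhere on the line. The function name filter_english is hypothetical, not part of the project.

```python
def filter_english(src_path: str, dst_path: str, prefix: str = "/c/en/") -> int:
    """Copy only the lines whose key starts with `prefix` (hypothetical helper).

    Assumes the multilingual Numberbatch text layout: the ConceptNet URI first,
    then space-separated vector values. Returns the number of lines written.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Match on the start of the line (the key), not anywhere in it.
            if line.startswith(prefix):
                dst.write(line)
                kept += 1
    return kept
```

Note that the kept keys still carry the /c/en/ prefix, so anything looked up in the resulting file has to use the same form.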
Now, using the English embeddings, this is what I get:
It seems there is an infinite recursion when looking for neighbours between 'dog' and 'woof'. I fixed that in utils by subtracting 1 from the depth in the recursive call:

cnn = get_word_neighborhood(current_node, depth - 1, numberbatch, cache_path, prefetch_path)

But then you run into problems with folders:

FileNotFoundError: [Errno 2] No such file or directory: 'cache_en/2/car.pickle'

It expects the '/2/' part of the path to already exist.
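A minimal sketch of why passing depth - 1 ends the recursion: each call goes one level shallower, so a cycle such as dog -> woof -> dog stops once depth reaches 0. Here get_neighborhood and the toy graph dict are hypothetical stand-ins for get_word_neighborhood and the Numberbatch similarity lookup, not the project's actual code.

```python
from typing import Dict, List, Set


def get_neighborhood(word: str, depth: int, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable from `word` in at most `depth` hops.

    `graph` is a hypothetical stand-in for the nearest-neighbour lookup
    (e.g. the top-k most similar Numberbatch terms for each word).
    """
    result = {word}
    if depth <= 0:
        # Base case: without a decreasing depth this is never reached on a cycle.
        return result
    for neighbour in graph.get(word, []):
        # Recurse one level shallower than the current call.
        result |= get_neighborhood(neighbour, depth - 1, graph)
    return result


# Toy graph with a dog <-> woof cycle.
toy_graph = {
    "dog": ["woof", "puppy"],
    "woof": ["dog", "bark"],
    "puppy": ["dog"],
    "bark": ["woof", "tree"],
}
print(sorted(get_neighborhood("dog", 2, toy_graph)))
```

This version revisits nodes it has already seen, which is fine for small depths; a visited set would avoid the repeated work on larger graphs.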
I see where the problem is coming from. To make iterating on the experiments faster, I precomputed and cached all the neighborhoods at a given depth, so that we don't have to recompute the similarities between nodes every time. I used this as a shortcut in the code, but I guess I did not fully document it :) I will try to work around this and get back to you as soon as possible.
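A per-depth disk cache of this kind can be sketched as follows, assuming the cache_en/2/car.pickle layout from the traceback (one folder per depth, one pickle per word). cached_neighborhood and its compute callback are hypothetical names, not the project's API; the point is that the depth folder is created on a cache miss instead of being assumed to exist.

```python
import os
import pickle
from typing import Callable, Set


def cached_neighborhood(word: str, depth: int, cache_dir: str,
                        compute: Callable[[str, int], Set[str]]) -> Set[str]:
    """Return the neighborhood of `word` at `depth`, memoised on disk.

    Files are laid out as <cache_dir>/<depth>/<word>.pickle. `compute` is a
    hypothetical callback doing the expensive similarity search on a miss.
    Assumes `word` is a bare term such as "car" (no "/" characters), since
    a key like "/c/en/car" would be split into extra path components.
    """
    depth_dir = os.path.join(cache_dir, str(depth))
    path = os.path.join(depth_dir, f"{word}.pickle")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute(word, depth)
    # Create the per-depth folder on demand instead of assuming it exists.
    os.makedirs(depth_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

On the second call for the same word and depth, the pickle is read back and compute is not called again.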
@albertoandreottiATgmail Sorry for my belated reply.
This is still a problem. There are a few different issues in the caching script, as described here. But even after fixing those, the
Autocomplete requires these two files to be placed inside the … They contain the list of terms used for autocompletion. Unfortunately, the method used to generate them has not been documented yet.
I've added instructions for downloading the vocabulary files in f46450f. I will close this issue, but if you encounter another problem, please open a new one.
After downloading the required files and setting the paths, I encountered the following message:
Hello @Mjuve360,
Hi,
I'm trying to reproduce your results, and I can generate the cache without issues.
Then, when I run the zeste.py script, it seems you're trying to match the labels in the dataset (I'm using 20 Newsgroups) against Numberbatch, and they just don't match.
The keys in the embeddings are prefixed with the language, as in /c/en/potato, but the labels are not. So they never match, and you cannot retrieve any neighbours.
Has this code ever worked?
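One way to bridge the mismatch described above is to normalize each label into the same key form before looking it up. This is a minimal sketch, assuming the embeddings were loaded from the grep-filtered multilingual file, whose keys keep the /c/en/ prefix. The lowercase-and-underscore rule is a simplified version of ConceptNet's term normalization, and to_conceptnet_uri and resolve_key are hypothetical helpers, not functions from zeste.py.

```python
import re
from typing import Mapping, Optional


def to_conceptnet_uri(label: str, lang: str = "en") -> str:
    """Turn a label into a ConceptNet-style URI, e.g. "Potato" -> "/c/en/potato".

    Simplified normalization: trim, lowercase, and join words with "_".
    """
    term = re.sub(r"\s+", "_", label.strip().lower())
    return f"/c/{lang}/{term}"


def resolve_key(label: str, embeddings: Mapping[str, object],
                lang: str = "en") -> Optional[str]:
    """Return the key under which `label` appears in `embeddings`, if any.

    Tries the prefixed URI first, then the bare normalized term, so the same
    code works whether or not the embedding keys carry the /c/en/ prefix.
    """
    uri = to_conceptnet_uri(label, lang)
    bare = uri[len(f"/c/{lang}/"):]
    for key in (uri, bare):
        if key in embeddings:
            return key
    return None
```

Mapping multi-part 20 Newsgroups labels (for example comp.graphics) to individual words is a separate step that this sketch does not attempt.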