cant change vocab vector #13

Open
DanielBeck93 opened this issue Mar 23, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@DanielBeck93

When I try to change the word embeddings, nothing happens.

```python
import spacy_universal_sentence_encoder
nlp_use = spacy_universal_sentence_encoder.load_model('en_use_lg')

vector = nlp_use('her').vector
nlp_use.vocab.set_vector("Ertha", vector)

print(vector)
print(nlp_use("Ertha"))
```
I would expect the two print statements to produce the same output.

@MartinoMensio
Owner

MartinoMensio commented Mar 30, 2021

Hi @DanielBeck93,
Until now, this wrapper library did not allow setting vectors; it only provided a read-only copy of the values from TensorFlow Hub.
Looking at your example, I realized that the feature is quite easy to provide.

With the new v0.4.2, you can set arbitrary vectors for tokens, spans, and even entire docs.
The lookup is done on the string that you pass to the vocab.set_vector function. Only exact matches count.

```python
import spacy_universal_sentence_encoder
nlp_use = spacy_universal_sentence_encoder.load_model('en_use_lg')

vector = nlp_use('her').vector
nlp_use.vocab.set_vector("Ertha", vector)

# compare the vectors, now they are equal
print((vector == nlp_use("Ertha").vector).all())
```
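
To picture what "only exact matches count" means, here is a minimal sketch of an exact-string override table. It is not the wrapper's actual implementation: `VectorOverrides` and `toy_encode` are hypothetical names made up for illustration, and `toy_encode` is a stand-in that has nothing to do with the real USE model.

```python
class VectorOverrides:
    """Hypothetical helper: exact-string overrides on top of an encoder."""

    def __init__(self, encode):
        self._encode = encode    # fallback: any callable mapping str -> vector
        self._overrides = {}     # exact text -> user-supplied vector

    def set_vector(self, text, vector):
        self._overrides[text] = tuple(vector)

    def get_vector(self, text):
        # Only an exact string match returns the override.
        if text in self._overrides:
            return self._overrides[text]
        return self._encode(text)


def toy_encode(text):
    # Stand-in encoder for illustration only (NOT the Universal Sentence Encoder).
    return (float(len(text)), float(sum(map(ord, text)) % 97))


vectors = VectorOverrides(toy_encode)
vectors.set_vector("Ertha", vectors.get_vector("her"))

# "Ertha" now maps to the vector of "her" ...
same = vectors.get_vector("Ertha") == vectors.get_vector("her")
# ... but a different casing is a different string, so it is not overridden.
lowercase_differs = vectors.get_vector("ertha") != vectors.get_vector("her")
```

In this sketch a longer text such as "good for Ertha" is also a different string, so it falls through to the encoder untouched.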

Please note that changing the vector of a single word does not change the vectors of the spans, sentences, or documents that contain it.
If you compare the sentences "good for her" and "good for Ertha", their vectors will still differ, even though you have set the vector of "Ertha" equal to that of "her". This happens because the USE model isn't a simple average of the word embeddings.
If you want to replace words in the document, my suggestion is to replace the string "Ertha" with "her" in the text before creating the document with nlp_use(text).
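
A minimal sketch of that preprocessing step, assuming whole-word replacement is what you want. `substitute_words` is a hypothetical helper written for this example, not part of the library.

```python
import re


def substitute_words(text, replacements):
    """Replace whole words in `text` according to the `replacements` dict."""
    if not replacements:
        return text
    # Longest keys first, so longer names win over their own prefixes.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(1)], text)


text = substitute_words("good for Ertha", {"Ertha": "her"})
# Then build the document from the rewritten text, with nlp_use loaded as above:
# doc = nlp_use(text)
```

The word boundaries (`\b`) keep the replacement from touching longer words that merely contain "Ertha".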

Let me know if this works for you.
Best,
Martino

MartinoMensio added the enhancement label Mar 30, 2021
@MartinoMensio
Owner

Version 0.4.2 has been yanked due to a problem found in #14.
