Doc2VecKeyedVectors doesn't effectively support __setitem__()/add() #2683

gojomo · 2019-11-21T16:53:13Z

Per user report on SO, neither assignment to a bracketed-access (as would be implemented by __setitem__()) nor use of the add() method will successfully mutate a Doc2VecKeyedVectors object.

Looking closer, it seems the superclass __setitem__() passes through to the superclass add(), which was only ever implemented for word-centric sets of vectors. It consults and updates properties like .vocab that only exist as empty values in Doc2VecKeyedVectors because of the currently confused inheritance created by #1777.
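To make the failure mode concrete, here is a toy sketch of the pattern described above. The classes WordStyleVectors and DocStyleVectors are hypothetical stand-ins invented for this illustration; they are not gensim's actual code and only mimic the shape of the problem, where an inherited add() writes to attributes the subclass never reads.

```python
# Toy classes only: they mimic the shape of the problem, not gensim's real code.


class WordStyleVectors:
    """Stand-in for a word-centric base class."""

    def __init__(self):
        self.vocab = {}    # key -> row index, used by the base class
        self.vectors = []  # rows of vector data

    def add(self, key, vector):
        # The base add() only knows about .vocab and .vectors.
        self.vocab[key] = len(self.vectors)
        self.vectors.append(vector)

    def __setitem__(self, key, vector):
        # Bracket assignment simply delegates to add().
        self.add(key, vector)


class DocStyleVectors(WordStyleVectors):
    """Stand-in for a subclass that keeps its data in different attributes."""

    def __init__(self):
        super().__init__()
        self.doctags = {}       # key -> row index, used for lookups here
        self.vectors_docs = []  # the rows this subclass actually reads

    def __getitem__(self, key):
        # Lookups consult doctags/vectors_docs, never vocab/vectors.
        return self.vectors_docs[self.doctags[key]]


docs = DocStyleVectors()
docs["doc_0"] = [0.1, 0.2]      # goes through the inherited add()

print("doc_0" in docs.vocab)    # True: the base-class dict was updated
print("doc_0" in docs.doctags)  # False: the subclass never sees the entry
```

In this toy, the assignment raises no error, yet a later docs["doc_0"] lookup fails with a KeyError, which matches the "no effective mutation" symptom reported.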


ThijsKranenburg · 2020-01-13T10:03:19Z

As an addition to the SO post, I want to add new documents to the model.

It seems this should be done with the add() method, but since that is not working I came up with the following workaround:

import numpy as np
from gensim.models.doc2vec import Doc2Vec

# PATH_to_model, new_vec and new_identifier are assumed to be defined elsewhere
model = Doc2Vec.load(PATH_to_model)

# Add vector and identifier to original values
model.docvecs.vectors_docs = np.vstack([model.docvecs.vectors_docs, new_vec])
model.docvecs.index2entity.append(new_identifier)

# Test if new document is included
model.docvecs.most_similar(positive=[new_vec])

Calling the most_similar() method returns results including this new document, also after saving and loading the model. So it seems to work.

My question is whether this is a 'correct' way of working around this bug, or if I am missing something.

gojomo · 2020-01-14T21:14:59Z

@ThijsKranenburg - If it works for your purposes, it's good enough! Note though that you've not yet done enough to look up the new vectors by identifier: that would also require adding entries to the model.docvecs.doctags dict. And the possible effects of such a workaround on any further training are unclear.
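As a rough sketch of what keeping all three structures in sync might look like, the hypothetical helper below (append_docvec is not a gensim function) appends the vector, the identifier, and a doctags entry together. It makes several assumptions: the local Doctag namedtuple only mirrors the (offset, word_count, doc_count) shape of gensim 3.x's entries, only string tags are in use so a tag's offset equals its row, and any cached or derived state the real class keeps is not handled. It runs against a stand-in object rather than a trained model, so treat it as an illustration, not a verified fix.

```python
from collections import namedtuple
from types import SimpleNamespace

import numpy as np

# Assumption: mirrors the shape of gensim 3.x's Doctag entries.
Doctag = namedtuple("Doctag", "offset word_count doc_count")


def append_docvec(docvecs, tag, vector):
    """Append one vector and register its tag in all three structures.

    Hypothetical helper; assumes string tags only, so offset == row index.
    """
    if tag in docvecs.doctags:
        raise KeyError(f"tag {tag!r} already present")
    vector = np.asarray(vector, dtype=docvecs.vectors_docs.dtype).reshape(1, -1)
    if vector.shape[1] != docvecs.vectors_docs.shape[1]:
        raise ValueError("vector has the wrong dimensionality")
    offset = len(docvecs.index2entity)
    docvecs.vectors_docs = np.vstack([docvecs.vectors_docs, vector])
    docvecs.index2entity.append(tag)
    docvecs.doctags[tag] = Doctag(offset=offset, word_count=0, doc_count=1)
    return offset


# Stand-in object so the sketch runs without a trained model.
fake = SimpleNamespace(
    vectors_docs=np.zeros((2, 3), dtype=np.float32),
    index2entity=["a", "b"],
    doctags={"a": Doctag(0, 0, 1), "b": Doctag(1, 0, 1)},
)
row = append_docvec(fake, "c", [1.0, 2.0, 3.0])
```

With a real model one would pass model.docvecs instead of the stand-in. Whether the result behaves sensibly under further training remains as unclear as noted above.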

mpenkov added the "feature" label · Dec 21, 2019

gojomo added the "bug" label · Dec 22, 2019
