Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix train error of ConcatenatedDoc2Vec in the notebook of doc2vec-IMDB #1377

Merged
merged 25 commits into from
Jul 7, 2017

Conversation

robotcator
Copy link
Contributor

No description provided.

" if not isinstance(train_model, ConcatenatedDoc2Vec):\n",
" train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n",
" else:\n",
" train_model.train(doc_list)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler and more robust fix would be to change the ConcatenatedDoc2Vec class, in test_doc2vec.py, to make its (no-op) train() match the new train() parameters-signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. If the train() method is modified, the total_examples and epochs should be provided. But the ConcatenatedDoc2Vec class has no attribute 'corpus_count'.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call doesn't have to use train_model.corpus_count from inside the model - it can just use len(doc_list). And since the outside loop is handling the multiple passes, the epochs argument should be 1.

@menshikh-iv
Copy link
Contributor

@robotcator Please merge develop to your branch (missing tensorflow in Travis config).

def train(self, sentences, total_examples=None, total_words=None,
epochs=None, start_alpha=None, end_alpha=None,
word_count=0,
queue_factor=2, report_delay=1.0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a no-op implementation that ignores any arguments, it can be specified even more compactly and generically, using Python's syntax for arbitrary positional/keyword arguments. EG:

def train(self, *ignored, **kwignored):

@menshikh-iv
Copy link
Contributor

@robotcator I open this notebook and get error Notebook validation failed: u'execution_count' is a required property: ..., can you please re-run all cells and commit it (this should help)

@robotcator
Copy link
Contributor Author

@menshikh-iv all the cells were re-ran and the error was fixed.

"*0.193200 : 1 passes : Doc2Vec(dbow,d100,n5,mc2,s0.001,t4)_inferred 35.3s 48.3s\n",
"*0.268640 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4) 48.6s 48.5s\n",
"*0.208000 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4)_inferred 48.6s 47.4s\n",
"*0.216160 : 1 passes : dbow+dmm 0.0s 168.9s\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in reported elapsed-time for these concatenated-model results, here and below, is so large it deserves a closer look. I'm no aware of any change, in this PR or prior related work, that could account for such changes – from <2.0 seconds to >170 seconds. It might be a spurious report, or something amiss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the concatenated-model train method didn't do anything actually. I will check what' going on this.

@menshikh-iv
Copy link
Contributor

@robotcator What's a status of this PR?

@menshikh-iv
Copy link
Contributor

Ping @robotcator

@robotcator
Copy link
Contributor Author

@menshikh-iv sorry for late reply, I didn't receive last notification. This PR seems works fine and all the cell have been rerun. The warning of notebook is gone. It's able to be merged .

@menshikh-iv menshikh-iv merged commit 3e38e33 into piskvorky:develop Jul 7, 2017
@gojomo
Copy link
Collaborator

gojomo commented Jul 7, 2017

There's still the suspiciously-different reported-runtimes on some of the log lines.

@menshikh-iv
Copy link
Contributor

@gojomo As I see, we have only two code change here, and here. Another change from notebook re-run. I don't see any suspicious things in code changes, although elapsed time looks strange.

@gojomo
Copy link
Collaborator

gojomo commented Jul 8, 2017

In fact since the last time the notebook was run and its output committed, there have been many changes (outside this PR). So many things could cause the discrepancy; I just suggest the big runtime difference be understood alongside the updated notebook cells.

@piskvorky
Copy link
Owner

That's indeed very worrying. @menshikh-iv can you find which commit / PR caused this slowdown?

@robotcator
Copy link
Contributor Author

is there any possible that the performance of computer which results in the difference of output. Because I rerun that notebook twice and the second seems alright.

@menshikh-iv
Copy link
Contributor

@piskvorky accepted. I will search and fix if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants