This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

"Invalid topic assignment from word proposal" error #15

Closed
zxvix opened this issue Dec 5, 2015 · 4 comments

Comments

@zxvix

zxvix commented Dec 5, 2015

Hi, when I run lightlda on my own dataset with the following command,

lightlda -num_vocabs 64253 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 10 -num_blocks 1 -max_num_document 4000000 -input_dir my_dataset_dir -data_capacity 1600

I encounter this error immediately after training starts:

[INFO] [2015-12-05 19:30:24] INFO: block = 0, the number of slice = 1
[INFO] [2015-12-05 19:30:24] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2015-12-05 19:30:24] Server 0: Worker registratrion completed: workers=1 trainers=10 servers=1
[INFO] [2015-12-05 19:30:24] Rank 0/1: Multiverso initialized successfully.
[INFO] [2015-12-05 19:30:26] Rank 0/1: Begin of configuration and initialization.
[INFO] [2015-12-05 19:31:00] Rank 0/1: End of configration and initialization.
[INFO] [2015-12-05 19:31:00] Rank 0/1: Begin of training.
[DEBUG] [2015-12-05 19:31:00] Request params. start = 1, end = 64252
[INFO] [2015-12-05 19:31:01] Rank = 0, Iter = 0, Block = 0, Slice = 0
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 1737313747 from word proposal
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 1866155390 from word proposal
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 725731578 from word proposal
Segmentation fault (core dumped)

How can I find out what is going wrong?

@zxvix
Author

zxvix commented Dec 5, 2015

Additional information: while looking through the program's options, I tried warm_start and found that with warm_start on, it could train for 2 iterations before hitting the error.

@feiga
Contributor

feiga commented Dec 7, 2015

This is not related to warm start or cold start. Unexpected crashes like this are usually caused by invalid input data. One possible reason is that the TF (term frequency) information is incorrect. Please check your word_id file and make sure the TF recorded for each word is at least as large as its real TF in your dataset.
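The TF check described in this reply can be sketched in Python. This is a minimal sketch, not part of LightLDA: it assumes the libsvm input has one document per line in the form `label word_id:count word_id:count ...`, and that the word_id file has one tab-separated `word_id<TAB>word<TAB>tf` entry per line. Both layouts are assumptions, and the function names are hypothetical. It sums the real TF of every word over all documents and reports each word whose recorded TF is smaller, including words that are missing from the dictionary entirely.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_real_tf(libsvm_lines: Iterable[str]) -> Counter:
    """Sum the count of every word_id over all documents (assumed libsvm layout)."""
    tf = Counter()
    for line in libsvm_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # fields[0] is assumed to be the label / doc id; the rest are word_id:count pairs
        for pair in fields[1:]:
            word_id, count = pair.split(":")
            tf[int(word_id)] += int(count)
    return tf


def read_recorded_tf(word_id_lines: Iterable[str]) -> Dict[int, int]:
    """Parse the assumed `word_id<TAB>word<TAB>tf` dictionary format."""
    recorded = {}
    for line in word_id_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        recorded[int(parts[0])] = int(parts[2])
    return recorded


def find_tf_problems(libsvm_lines: Iterable[str],
                     word_id_lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (word_id, recorded_tf, real_tf) for every word whose recorded TF is too small.

    A word that appears in the data but not in the dictionary is reported
    with recorded_tf = -1.
    """
    real = count_real_tf(libsvm_lines)
    recorded = read_recorded_tf(word_id_lines)
    problems = []
    for word_id in sorted(real):
        rec = recorded.get(word_id, -1)
        if rec < real[word_id]:
            problems.append((word_id, rec, real[word_id]))
    return problems
```

Passing two open file handles, e.g. `find_tf_problems(open(libsvm_path), open(dict_path))`, returns an empty list when every recorded TF is large enough; any tuple it returns points at a dictionary entry worth fixing before rerunning.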

@zxvix
Author

zxvix commented Dec 7, 2015

Thanks for your reply! The TF field in my data was indeed wrong. Now the error is gone.

@koustuvsinha

I have been stuck on this problem for quite a while. I checked and rechecked my word_id file and libsvm file, and even made sure the TF of each word is at least as large as the real TF in my dataset. Can you check what is wrong with this small dataset I am trying to use? It has 4 documents and I am requesting 2 topics. This is driving me nuts!

issue.zip
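One way to make the TF column consistent by construction is to regenerate it from the libsvm data instead of editing it by hand. The sketch below is hypothetical and assumes libsvm lines of the form `label word_id:count ...` and word_id dictionary lines of the form `word_id<TAB>word<TAB>tf`; neither layout nor the function name comes from this thread. It keeps each dictionary entry's id and word, replaces the TF with the count actually observed in the data, and refuses to continue if the data uses ids the dictionary does not contain.

```python
from collections import Counter
from typing import Iterable, List


def rebuild_word_id_dict(libsvm_lines: Iterable[str],
                         word_id_lines: Iterable[str]) -> List[str]:
    """Return dictionary lines whose TF column is recomputed from the data.

    Assumed (hypothetical) layouts:
      libsvm:  `label word_id:count word_id:count ...`
      word_id: `word_id<TAB>word<TAB>tf`
    """
    real_tf = Counter()
    for line in libsvm_lines:
        for pair in line.split()[1:]:  # skip the leading label / doc id
            word_id, count = pair.split(":")
            real_tf[int(word_id)] += int(count)

    rebuilt = []
    seen = set()
    for line in word_id_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        word_id = int(parts[0])
        seen.add(word_id)
        # Keep the id and the word; replace the TF with the count seen in the data.
        rebuilt.append(f"{word_id}\t{parts[1]}\t{real_tf[word_id]}")

    missing = set(real_tf) - seen
    if missing:
        # Ids used in the data but absent from the dictionary cannot be repaired here.
        raise ValueError(f"word ids missing from dictionary: {sorted(missing)}")
    return rebuilt
```

Writing `"\n".join(rebuilt)` back to the word_id file produces a dictionary whose TFs equal the real counts, which satisfies the "at least as large as the real TF" condition exactly; the design choice here is to fail loudly on unknown ids rather than invent words for them.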
