RuntimeError("you must first build vocabulary before training the model") #1

Closed
asmundur opened this issue Nov 3, 2018 · 1 comment


asmundur commented Nov 3, 2018

Running the simple tutorial code, I get this error:

Traceback (most recent call last):
  File "get_embeddings.py", line 25, in <module>
    model = Word2Vec(corpus, size=250, window=5, min_count=3)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/word2vec.py", line 767, in __init__
    fast_version=FAST_VERSION)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 763, in __init__
    end_alpha=self.min_alpha, compute_loss=compute_loss)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/word2vec.py", line 892, in train
    queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 1081, in train
    **kwargs)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 536, in train
    total_words=total_words, **kwargs)
  File "/home/asmundur/.local/lib/python3.6/site-packages/gensim/models/base_any2vec.py", line 1187, in _check_training_sanity
    raise RuntimeError("you must first build vocabulary before training the model")
RuntimeError: you must first build vocabulary before training the model

Owner

Alxmrphi commented Nov 14, 2018

Hi Ásmundur,

Thanks for letting me know about this. Yes, I see that with the old code the script had to be run from inside the same folder ("MIM"), but that should not matter. I have fixed the class in the text. Sorry for the late reply.

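import os
# Assumed import: the DOM calls below (getElementsByTagName, getAttribute)
# match the xml.dom.minidom API.
from xml.dom.minidom import parse


class MIM_Parser(object):
    """Iterate over the MIM corpus, yielding one sentence (list of lemmas) at a time."""

    def __init__(self, mim_folder):
        self.mim_folder = mim_folder

    def __iter__(self):
        for folder in os.listdir(self.mim_folder):
            # Join with mim_folder so this works from any working directory.
            current_folder = os.path.join(self.mim_folder, folder)
            if not os.path.isdir(current_folder):
                continue
            for file in os.listdir(current_folder):
                if not file.endswith('.xml'):
                    continue
                root = parse(os.path.join(current_folder, file))
                for sentence in root.getElementsByTagName('s'):
                    words = sentence.getElementsByTagName('w')
                    # Current sentence: the lemma of every word element.
                    yield [word.getAttribute('lemma') for word in words]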
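
As far as I can tell, gensim raises this RuntimeError when the vocabulary is still empty after the vocabulary-building pass, which happens if the corpus iterable yields no sentences or if no token occurs at least min_count times. A quick way to check this before calling Word2Vec is to count what the iterator actually produces. The sketch below is only an illustration: the helper name surviving_vocab_size and the toy corpus are hypothetical and not part of the tutorial or of gensim.

```python
from collections import Counter


def surviving_vocab_size(corpus, min_count=3):
    """Return (number of sentences, number of distinct tokens seen >= min_count times).

    `corpus` is any iterable of token lists, e.g. an MIM_Parser instance.
    This is a hypothetical helper, not a gensim function.
    """
    counts = Counter()
    n_sentences = 0
    for sentence in corpus:
        n_sentences += 1
        counts.update(sentence)
    kept = sum(1 for c in counts.values() if c >= min_count)
    return n_sentences, kept


# Toy stand-in for MIM_Parser(...): three sentences of lemmas.
toy = [["hestur", "hlaupa"], ["hestur", "sofa"], ["hestur", "éta"]]
print(surviving_vocab_size(toy, min_count=3))  # (3, 1)
print(surviving_vocab_size([], min_count=3))   # (0, 0)
```

If the first number is 0, the parser found no XML files, which would point to a wrong path or working directory. If only the second number is 0, the corpus is too small for the chosen min_count.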
