about the parameters in "config.ini" #2

Closed
hit-joseph opened this issue Dec 26, 2018 · 5 comments

@hit-joseph

Hi @bzhangGo,

  1. In the file "config.ini" there is a parameter named oov num. Can you explain exactly what it means? Is it the number of Chinese OOV words, the number of English OOV words, or the total of both? Should I count it and update the value before I build the project?
  2. I found a vector for "" at the beginning of both the ch and en vector files. Is it necessary?
  3. In the demo data set, nearly all of the Chinese phrases start with % or $. What does that mean?

PS: If you have any training tricks, please share them.

@bzhangGo
Contributor

  1. OOV num indicates the minimum frequency of a word to be included in the vocabulary.
  2. I don't understand the "" vector problem you pointed out. Can you give more details?
  3. The demo data are bilingual phrases extracted by our SMT system. The symbols keep their ordinary meaning: % means percent and $ means US dollar. They are not special symbols required by our model.
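
To illustrate item 1, here is a minimal Python sketch of a minimum-frequency vocabulary cutoff. It is not code from this project: the function names, the `<unk>` token, and the toy corpus are all assumptions made for the example. Words seen fewer than `min_freq` times are left out of the vocabulary and map to the unknown token.

```python
from collections import Counter


def build_vocab(sentences, min_freq, unk="<unk>"):
    """Keep words seen at least min_freq times (hypothetical helper)."""
    counts = Counter(word for sent in sentences for word in sent.split())
    # Sort by descending count, then alphabetically, so ids are deterministic.
    kept = sorted(
        (w for w, c in counts.items() if c >= min_freq and w != unk),
        key=lambda w: (-counts[w], w),
    )
    vocab = {unk: 0}  # id 0 is reserved for out-of-vocabulary words
    for w in kept:
        vocab[w] = len(vocab)
    return vocab


def encode(sentence, vocab, unk="<unk>"):
    """Map each word to its id, falling back to the unknown-word id."""
    return [vocab.get(w, vocab[unk]) for w in sentence.split()]


corpus = ["the cat sat", "the cat ran", "a dog ran"]
vocab = build_vocab(corpus, min_freq=2)
print(vocab)                          # {'<unk>': 0, 'cat': 1, 'ran': 2, 'the': 3}
print(encode("the dog sat", vocab))   # [3, 0, 0]
```

Under this reading the threshold applies per language, so a Chinese and an English vocabulary would each be built with their own counts. Whether the project shares one value for both sides is not stated in this thread.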

@hit-joseph
Author

hit-joseph commented Dec 28, 2018

Thank you so much. Question 2 is about the vector of "< / s >"; sorry, the token gets lost if I type it directly. I have since learned that it is just a special token created by the C word2vec tool, and that the Python word2vec tools have dropped it. By the way, could you add documentation to this project for all the parameters in Config.ini, to help others understand them? That would be great!
Thank you again, and best wishes!
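
For anyone who hits the same question, here is a minimal Python sketch of reading a vector file and skipping the `</s>` entry. It assumes the word2vec text format (a header line with the vocabulary size and dimension, then one word and its values per line); the function name and the sample data are made up for the example, and this thread does not say whether the project expects the `</s>` entry to be present, so check before removing it from real files.

```python
import io


def load_vectors(handle, skip=("</s>",)):
    """Read word2vec text-format vectors, skipping the tokens in `skip`."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # blank or malformed line
        word, values = parts[0], parts[1:]
        if word in skip:
            continue  # drop the special token written by the C tool
        vectors[word] = [float(v) for v in values]
    return vectors, dim


# In-memory stand-in for a real vector file.
sample = io.StringIO("3 2\n</s> 0.0 0.0\nhello 0.1 0.2\nworld 0.3 0.4\n")
vectors, dim = load_vectors(sample)
print(sorted(vectors), dim)  # ['hello', 'world'] 2
```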

@bzhangGo
Contributor

Thanks for your suggestions. I will add more explanation of the parameters in Config.ini.

@hit-joseph
Author

Sorry to ask more questions. After the code runs for a few iterations, it fails with the error "Segmentation fault (core dumped)". Have you ever run into this problem? Can you give me some advice? Thanks.

@bzhangGo
Contributor

bzhangGo commented Jan 1, 2019

Sorry, but I don't remember seeing this kind of error. Perhaps an array index is out of bounds.
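
One way to test the out-of-bounds guess before training is to check that every word id in the input data falls inside the embedding table. The Python sketch below is only an illustration of that check: `find_out_of_range`, the toy id sequences, and the table size are hypothetical and not taken from this project.

```python
def find_out_of_range(id_sequences, table_size):
    """Return (sequence_index, position, bad_id) for ids outside [0, table_size)."""
    problems = []
    for i, seq in enumerate(id_sequences):
        for j, idx in enumerate(seq):
            if not 0 <= idx < table_size:
                problems.append((i, j, idx))
    return problems


# Toy data: a table with 4 rows accepts ids 0..3 only.
phrases = [[0, 3, 2], [1, 4], [2, -1]]
print(find_out_of_range(phrases, table_size=4))  # [(1, 1, 4), (2, 1, -1)]
```

An empty result does not rule out other causes of the crash; it only excludes word ids that point past the end of the table.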
