What's the log base of NgramLanguageModel.getLogProb() ? #12

Open
GoogleCodeExporter opened this issue Jul 16, 2015 · 5 comments

Comments

@GoogleCodeExporter

The method documentation doesn't say, and it's not at all apparent from the code.

Original issue reported on code.google.com by denis.fi...@gmail.com on 6 May 2013 at 9:38

@GoogleCodeExporter
Author

Answering my own question, it looks like it's 10.
It should probably be in the method description. Thanks!

Original comment by denis.fi...@gmail.com on 7 May 2013 at 1:38

@GoogleCodeExporter
Author

Sorry, I thought I had replied.

That particular method doesn't actually know what base it is using, since the LM was
probably constructed from an ARPA file, and those files can be in whatever base
they want (the values are stored as logarithms). Building an LM with BerkeleyLM is
done in base 10 to mimic the behaviour of SRILM. So the answer is almost certainly
"10", unless you constructed your LM in a non-standard way.

Original comment by adpa...@google.com on 7 May 2013 at 2:19
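
For readers who need plain probabilities or natural logs, a minimal sketch of the conversion follows. It assumes the score really is a base-10 logarithm, as described above. The class `LogBase` and its method names are hypothetical helpers for illustration and are not part of BerkeleyLM.

```java
/** Hypothetical helper, not part of BerkeleyLM: converts base-10 log scores. */
final class LogBase {
    private LogBase() {}

    /** Converts a base-10 log probability to a natural-log probability. */
    static double log10ToLn(double log10Prob) {
        // log_e(p) = log_10(p) * ln(10)
        return log10Prob * Math.log(10.0);
    }

    /** Converts a base-10 log probability to a plain probability. */
    static double log10ToProb(double log10Prob) {
        // p = 10^(log_10(p))
        return Math.pow(10.0, log10Prob);
    }
}
```

For example, a score of -2.0 corresponds to a probability of about 0.01 under this assumption. If the LM was built from an ARPA file that uses a different base, the constant 10.0 would have to change accordingly.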

@GoogleCodeExporter
Author

Thanks! 
I still think it should be a part of the specification because the contract of
an n-gram LM is a proper distribution p(w_i|...), and we can't recover it without
knowing the log base. It's true that many applications don't care about the log
base, but some do (e.g., perplexity, text generation).

Original comment by denis.fi...@gmail.com on 7 May 2013 at 9:49
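
To illustrate why perplexity depends on the base, here is a rough sketch that computes perplexity from per-token log probabilities. The class `Perplexity`, its method, and the input values are hypothetical and only stand in for scores returned by a language model. The base passed in must match the base the scores were produced in.

```java
/** Hypothetical helper, not part of BerkeleyLM: perplexity from log scores. */
final class Perplexity {
    private Perplexity() {}

    /**
     * Computes perplexity as base^(-average log probability).
     *
     * @param logProbs per-token log probabilities, all in the same base
     * @param base     the logarithm base the scores were produced in
     */
    static double perplexity(double[] logProbs, double base) {
        if (logProbs.length == 0) {
            throw new IllegalArgumentException("need at least one token");
        }
        double sum = 0.0;
        for (double lp : logProbs) {
            sum += lp;
        }
        double average = sum / logProbs.length;
        return Math.pow(base, -average);
    }
}
```

With three tokens each scored at -1.0, treating the scores as base 10 gives a perplexity of 10, while treating the same numbers as natural logs gives roughly 2.718. The two readings disagree, so the base has to be known.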

@GoogleCodeExporter
Author

I glanced at the code, and it looks as though StupidBackoff is using log base 
e, while the Kneser-Ney models are using log base 10.  I've been using this 
package for my research, and it'd be nice to know what exactly the values are 
supposed to be.

Original comment by acgris...@gmail.com on 15 Nov 2013 at 7:56

@GoogleCodeExporter
Author

Sorry, I missed this somehow.

I've added some comments in the latest SVN to hopefully clear this up. I'm not
going to change the bases, because I want to mimic SRILM in constructing
Kneser-Ney LMs, and I also don't want to change the logarithm base on
StupidBackoffLms because that would change the models for people who are
currently using them. Hope that clarifies things!

Original comment by adpa...@gmail.com on 6 Dec 2013 at 6:30
