Getting NAN on last trigram when using google binary #20

Open
GoogleCodeExporter opened this issue Jul 16, 2015 · 1 comment

Comments

@GoogleCodeExporter

Hi,
Adding to my previous posts in issue 19, I am trying to use the Google binary 
(from Google Books) to get log probabilities of trigrams from some text. I am 
getting NaN for the last trigram. Attached is the code for what I am trying to 
do. I slightly modified these files and added some System.out.println calls to 
see the outputs.

The text I am testing with is "Hello how are you". So essentially it gives me 
a sent array of [7380255 15474 152 26 45 7380256], where 7380255 is the start 
symbol and 7380256 is the stop symbol.

I first get the log probability of the bigram 7380255 15474 by passing 
startpos as 0 and endpos as 2. After that, I get the log probabilities of the 
trigrams, starting from startpos 0, as in the code below:

for (int i = 0; i <= sent.length - 3; i++) {
    System.out.println("Getting score from " + sent[i] + " to " + sent[i+2]);
    score = lm_.getLogProb(sent, i, i+3);
    System.out.println("score " + score);
    if (Float.isNaN(score)) {
        System.out.println("Returned NaN");
    } else {
        sentScore += score;
    }
}
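
To make the positions concrete, here is a minimal, self-contained sketch of the trigram windows that a loop of this shape visits for the six-element array above. It does not call the language model at all; the class name TrigramWindows and the helper windows are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TrigramWindows {
    // Returns {startpos, endpos} pairs (endpos exclusive) for every
    // trigram window in a sentence of the given length.
    static List<int[]> windows(int sentLength) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i + 3 <= sentLength; i++) {
            out.add(new int[] {i, i + 3});
        }
        return out;
    }

    public static void main(String[] args) {
        // Word ids taken from the example sentence in this report.
        int[] sent = {7380255, 15474, 152, 26, 45, 7380256};
        for (int[] w : windows(sent.length)) {
            // The last word of each window is at index endpos - 1.
            System.out.println("startpos=" + w[0] + " endpos=" + w[1]
                    + " last word id=" + sent[w[1] - 1]);
        }
    }
}
```

For a sentence of length 6 this yields four windows, and the final one (startpos 3, endpos 6) is the only one whose last word is the stop symbol 7380256.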

The problem happens within StupidBackoffLm, in the following line:
probContext = localMap.getValueAndOffset(probContext, probContextOrder, 
ngram[i], scratch);
It occurs only with the last trigram, when startpos is 3 and endpos is 6.
scratch.value comes back as -1 when ngram[i] is the end symbol, 7380256. 
This results in a NaN log probability. 
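
As an illustration of how a -1 value could end up as NaN, the sketch below assumes the score is computed as the log of a count ratio. That is an assumption about the computation, not a copy of the StupidBackoffLm code; the class NanFromMissingCount and the method logRatio are hypothetical. The only library behavior relied on is that Math.log returns NaN for a negative argument.

```java
class NanFromMissingCount {
    // Hypothetical scoring step: log(ngramCount / contextCount).
    // A count of -1 stands in for "n-gram not found".
    static float logRatio(long ngramCount, long contextCount) {
        return (float) Math.log(ngramCount / (double) contextCount);
    }

    public static void main(String[] args) {
        // A normal, positive count gives a finite log probability.
        System.out.println("found:   " + logRatio(10, 100));
        // A -1 sentinel makes the ratio negative, and Math.log of a
        // negative number is NaN.
        System.out.println("missing: " + logRatio(-1, 100));
    }
}
```

If the real computation has this shape, checking for a negative lookup value before taking the log would separate "n-gram not found" from a genuine score.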

I tried the same with scoreSentence, and it has the same problem.


Can you please help me understand what mistake I am making?

Thanks
Regards
Debanjan

Original issue reported on code.google.com by b.deban...@gmail.com on 24 Mar 2014 at 11:36

Attachments:

@GoogleCodeExporter

Any chance you can give me a command line and data set that reproduces the 
crash?

Original comment by adpa...@gmail.com on 7 Sep 2014 at 7:06
