ValueError: Cannot convert non-finite values (NA or inf) to integer #20
Thanks for the report. Could you please post the categories as well?
Also, what does your "clean_text" function look like?
Ah. I was able to recreate the error when all the documents were labeled with the category "a". This should probably raise a more descriptive error, ideally within the TermDocMatrixFactory base class.
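One way to get a clearer message is to check the labels before any term-frequency math runs. Below is a minimal sketch, assuming the per-document category labels are available as a plain list of strings. `validate_categories` is a hypothetical helper, not part of scattertext's API or the TermDocMatrixFactory code.

```python
from collections import Counter
from typing import Sequence


def validate_categories(categories: Sequence[str], category: str) -> None:
    """Raise a descriptive ValueError before the term-document matrix is built.

    Hypothetical helper (not scattertext API): checks that the chosen category
    exists and that at least one document falls outside it, so the
    "not category" side of the comparison is non-empty.
    """
    counts = Counter(categories)
    if category not in counts:
        raise ValueError(
            f"Category {category!r} not found; available categories: {sorted(counts)}"
        )
    if len(counts) < 2:
        # Every document carries the same label, so there is nothing to contrast.
        raise ValueError(
            f"All {len(categories)} documents are labeled {category!r}; "
            "at least two distinct categories are required."
        )
```

With five documents all labeled "a", `validate_categories(['a'] * 5, 'a')` fails immediately with "All 5 documents are labeled 'a'..." instead of the downstream non-finite-value error.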
And this will be in 0.0.2.13, which should be coming soon!
Sorry for not getting back to you sooner. OK, I am glad you were able to replicate the error. Another case that needs a better error message is a bad clean_text, i.e., one that returns u'' for every document. The current error message makes that case hard to debug.
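A similar guard could cover the empty-clean_text case. This is a minimal sketch, assuming the raw texts and the cleaning function are both at hand when the corpus is built. `validate_cleaned_texts` is a hypothetical helper, and `clean_text` stands in for whatever user-supplied function the corpus uses.

```python
from typing import Callable, List, Sequence


def validate_cleaned_texts(
    texts: Sequence[str], clean_text: Callable[[str], str]
) -> List[str]:
    """Apply clean_text and fail loudly if every document comes back empty.

    Hypothetical helper (not scattertext API).
    """
    cleaned = [clean_text(t) for t in texts]
    # Whitespace-only output counts as empty: no terms would be extracted.
    if texts and not any(c.strip() for c in cleaned):
        raise ValueError(
            f"clean_text returned empty text for all {len(texts)} documents; "
            "check the cleaning function before building the term-document matrix."
        )
    return cleaned
```

For example, `validate_cleaned_texts(['abc', 'def'], lambda t: u'')` raises immediately and names the cleaning function as the likely culprit.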
Hello,
I am experimenting with the internals of Scattertext to use Bokeh as the front-end visualization. For larger corpora, I found the JavaScript load time to be excessive. With Bokeh I can serve up the text on the fly and dynamically re-parse the documents based on filtering and similar interactions. Right now I am using several of Scattertext's internal functions to generate the term-document matrix that populates the graph data.
In my case I am working with patent text documents. The results look very nice so far, but I have run into the error shown below with a set of five documents (I get the same error with a larger set of documents as well). I reproduced the problem in a Jupyter notebook using a dump of the problematic document set, which is also included below.
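```python
# ScatterChartBokeh is the reporter's own Bokeh-based chart class built on scattertext internals
sc = ScatterChartBokeh(corpus)
chart_dict = sc.to_dict(category='a', category_name='a', not_category_name='b')

# Dump the texts of the problematic document set
corpus.get_texts().tolist()
```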