Error in LDAResults #42
Comments
The fact that text_processors is not seen points to an installation issue. I see you're on Windows. I'm not sure to what extent Windows + rosetta has been tested. |
Other functions up to this point appear to work, following https://github.com/columbia-applied-data-science/rosetta/blob/master/examples/vw_helpers.md. |
I am probably missing something, but the bit that seems to cause the error is this: But when I look on GitHub at class SFileFilter under text_processors, I don't see a load function. There is a load_sfile function, though. |
The load() method comes from the SaveLoad class, which SFileFilter inherits. Perhaps you can trace through and point out exactly where the error occurs? Also, as Ian has pointed out, we really don't develop or test in a Windows environment, so it's a bit hard to see what might or might not work. |
Any suggestions on how to trace the error? I have never seen an instance where a pure Python lib failed on Windows, but I am sure it must happen. |
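To illustrate how a class can have a load() that does not appear in its own body, here is a minimal sketch of the mixin pattern. It is not rosetta's actual code: the SaveLoad and SFileFilter classes below are toy stand-ins, and plain open() and pickle are assumed in place of the smart_open and cPickle calls visible in the traceback.

```python
import os
import pickle
import tempfile


class SaveLoad(object):
    """Hypothetical stand-in for a save/load mixin; not rosetta's code."""

    def save(self, savefile):
        # Serialize the whole instance to disk.
        with open(savefile, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, loadfile):
        # Unpickling rebuilds whatever object was stored in the file.
        with open(loadfile, "rb") as f:
            return pickle.load(f)


class SFileFilter(SaveLoad):
    """Toy subclass: it defines no load() of its own, yet inherits one."""

    def __init__(self, tokens):
        self.tokens = tokens


# Round trip through a temporary file to show the inherited classmethod.
path = os.path.join(tempfile.mkdtemp(), "sff_demo.pkl")
SFileFilter(["a", "b"]).save(path)
restored = SFileFilter.load(path)
print(type(restored).__name__, restored.tokens)
```

Because load() only calls pickle.load, any failure while unpickling (such as a module that cannot be imported) surfaces inside the mixin rather than inside the subclass.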
Since it's an import error, can you import text processors? I.e. does the following work?
Could you post a (ideally minimal) gist that causes this error? |
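The snippet originally posted with this question did not survive on the page. As a rough sketch of that kind of check (not the maintainer's actual code), the helper below tries to import a module by name and reports either where it was found or why it failed. The try_import name is made up for illustration, and the dotted path rosetta.text.text_processors is an assumption inferred from the file paths in the traceback.

```python
import importlib


def try_import(module_name):
    """Return (True, file path) if the module imports, else (False, error)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "<built-in>")


# The first name is the assumed package path; the second is the bare
# module name that the ImportError in this issue complains about.
for name in ("rosetta.text.text_processors", "text_processors"):
    print(name, try_import(name))
```

If the first name imports but the second does not, the package itself is installed and the failure is specific to code that asks for the bare module name.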
Yes, I can import that, which is what is so strange. I don't have a gist set up, but here is what I was attempting. Note: I wasn't sure how to process a single file that has documents as rows, versus representing each document as a file in a folder, so I broke up such a document into multiple documents (is this required?).
# imports
# GENERATE DATA
def clean(s):
dat = dat.apply(clean)
# WRITE OUT THE DOCS IN A FOLDER
# CREATE VW FILE
# create the VW format file
# load the file again
# Remove extremes
# Create filtered file for VW
# THEN RUN THIS
# THIS IS WHAT THROWS THE ERROR: |
The markup is removing one of the backslashes in the code above; the paths do have double '\'. All the steps before the LDAResults call appear to work fine. |
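As background on why the doubled backslashes matter, here is a small sketch of how Python reads backslashes in path strings. The shortened path LDA\topics.dat is used only for illustration; it is not claimed to be the cause of the error in this issue.

```python
import ntpath

plain = "LDA\topics.dat"     # "\t" is read as a single tab character
raw = r"LDA\topics.dat"      # raw string keeps the backslash as written
doubled = "LDA\\topics.dat"  # escaped backslash, same text as the raw string
forward = "LDA/topics.dat"   # forward slashes are also accepted on Windows

print("\t" in plain)                       # True
print(raw == doubled)                      # True
print(ntpath.normpath(forward) == raw)     # True
```

Raw strings or forward slashes avoid having to double every separator by hand.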
See here for a way to make your code show up as more readable: https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown I'm not totally sure I understand what you're doing. How about you do the following:
If you do all that, I can maybe figure out what's going wrong. |
I can try that. I am not as good a Python programmer as I should be. I am trying to follow the example : On 1/26/2015 4:24 PM, Thomas Nyberg wrote:
|
I gave up and installed rosetta on Ubuntu via VirtualBox. The install worked fine (except one failed test which is already noted as an issue). The examples from the above all ran w/o issue. So the issue does indeed appear to be Windows-specific (the install, or the use of relative imports?). |
Firstly, can you post information about which test fails? I thought the tests were all passing now and would like to know if they're not... And sorry I didn't reply to your previous message.

As a general rule when submitting error reports (which you're doing in an informal manner), it's good to (1) provide a script and the necessary data to reproduce the error and (2) try to make the script and data as "minimal" as possible. So really think about whether any lines of code can be removed while still keeping the error, and whether the data file can be minimized as well (it might be the case that you only need one line in your data file to produce the error... in that case don't upload a file with 10000 lines). That reduces noise for others to look at.

Once you have that, you should write it in a way where you can just run 'python test.py' and have a failure. That failure should produce a stack trace, which you should post in full. Everything that is code or error output should be posted either with the markdown formatting that I linked earlier or simply uploaded as a gist. That helps keep the formatting from getting screwed up.

Regardless, it might still be hard to help, because I think almost all users of the library have either Linux or Macs, and so testing/bug fixing has focused on those. However, if you post a clean script with a full stack trace, we might be able to see the error just by reading it. It's good to hear that it's working in Ubuntu/VirtualBox (I use that myself on the one Windows laptop I have), but I think these pointers will help you communicate errors to this or other projects more effectively in the future. |
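A plain script that crashes already prints its full stack trace, which is all the advice above requires. When the failing call sits inside a longer session, one option is to capture the trace as text so it can be pasted whole. The sketch below assumes a made-up failing_step placeholder in place of the real failing call; run_and_capture is likewise a hypothetical helper name.

```python
import traceback


def run_and_capture(func):
    """Call func(); return the full traceback text if it raises, else None."""
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return None


def failing_step():
    # Placeholder for the single call that fails in the real script.
    raise ImportError("No module named text_processors")


report = run_and_capture(failing_step)
print(report)
```

The returned string includes every frame between the call and the exception, which is the part that usually lets a reader locate the problem without running anything.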
Following the example in https://github.com/columbia-applied-data-science/rosetta/blob/master/examples/vw_helpers.md
When I run LDAResults(), the following error prints:
ImportError Traceback (most recent call last)
in ()
3 lda = LDAResults('C:\Users\Desktop\DATA\LDA\topics.dat',
4 'C:\Users\Desktop\DATA\LDA\predictions.dat', 'C:/Users/Desktop/DATA/LDA' + '/sff_basic.pkl',
----> 5 num_topics=num_topics)
6 lda.print_topics()
C:\Anaconda\lib\site-packages\rosetta\text\vw_helpers.pyc in __init__(self, topics_file, predictions_file, sfile_filter, num_topics, alpha, verbose)
230
231 if not isinstance(sfile_filter, text_processors.SFileFilter):
--> 232 sfile_filter = text_processors.SFileFilter.load(sfile_filter)
233
234 self.sfile_frame = sfile_filter.to_frame()
C:\Anaconda\lib\site-packages\rosetta\common_abc.pyc in load(cls, loadfile)
40 """
41 with smart_open(loadfile, 'rb') as f:
---> 42 return cPickle.load(f)
ImportError: No module named text_processors
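One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: a pickle stores the module name under which an object's class was defined, and loading fails with an ImportError when that name cannot be imported later. The sketch below reproduces that behavior with a throwaway module; demo_text_processors and Filter are made-up names and have nothing to do with rosetta itself.

```python
import pickle
import sys
import types

MODULE_NAME = "demo_text_processors"  # made-up name, for this demo only

# Build a throwaway module containing one class and register it.
module = types.ModuleType(MODULE_NAME)
exec("class Filter(object):\n    pass\n", module.__dict__)
sys.modules[MODULE_NAME] = module

# The pickle payload records the module name alongside the class name.
payload = pickle.dumps(module.Filter())
print(MODULE_NAME.encode() in payload)

# Make the module unavailable, then try to load the pickle again.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc
print(type(load_error).__name__, load_error)
```

Under this reading, a pickle written while the class was visible as a top-level text_processors module would only load in an environment where that bare name is importable.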