Extract text for paper machines not working #22

Open
konini22 opened this issue Nov 6, 2012 · 11 comments

@konini22

konini22 commented Nov 6, 2012

Hi Chris,

I'm using the latest version of Zotero for Firefox (Firefox version 16.0.2). Once it finishes extracting text, Firefox opens a blank new tab, but all visualization options remain greyed out. I looked into past issues #5 and #12 and tried to replicate the solutions, but it still doesn't work. I extracted text of My Library and also a collection, but both attempts failed.

I got numerous error messages saying "Error in parsing value for ... Declaration dropped."

Can you help me with this, please?

Sorry, Chris,

The problem's been solved. I just had to relocate the Python file. Cheers.

@corajr
Contributor

corajr commented Nov 7, 2012

Glad to hear it's working now! Out of curiosity, where was Python before?

@mebrett

mebrett commented Nov 7, 2012

konini22, can you explain where you relocated to/from? I'm having the same problem. Thanks!

@konini22
Author

konini22 commented Nov 8, 2012

Hi Chris and mebrett,
I stored Python in C:\Users\inoueko\AppData\Roaming\Mozilla\Firefox\Profiles\1nc1d3r.default\zotero, along with the Zotero files (I work in a library and don't have much say in where things get stored). I pasted the path into Paper Machines' preferences, but it didn't work for some reason. So I moved the file to its default save destination, C:\Python27 (the C drive is the OS disk, if that matters), and reinstalled Paper Machines. Classification is still greyed out, though. I'm most interested in that feature for collection development purposes.

@corajr
Contributor

corajr commented Nov 9, 2012

It ought to have been able to find Python there if you gave it the path explicitly; I'll have to do some testing on Windows to sort that one out.
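For anyone troubleshooting a custom Python location in the meantime, one way to confirm that a path really points at a working interpreter is to ask it for its version. This is only a sketch in Python 3, not what Paper Machines does internally; the function name `check_interpreter` is made up for illustration.

```python
import os
import subprocess
import sys


def check_interpreter(path):
    """Return the version string reported by the interpreter at `path`, or None.

    Hypothetical helper: runs `<path> --version` and reports what comes back.
    """
    if not os.path.isfile(path):
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Python 2 prints its version to stderr, Python 3 to stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    # Check the interpreter running this script; substitute the path
    # you pasted into the preferences to check that one instead.
    print(check_interpreter(sys.executable))
```

If the function returns None for the path you configured, the file is either missing, not executable, or not a Python interpreter.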

Classification is currently marked as an "experimental feature"; it's not quite ready for use yet. Currently, it takes documents that are manually organized into subcollections as training data for a Maximum Entropy classifier. This classifier can then be used on another, larger set of documents to give the probability that each one belongs to a given category from the training data.

I've been meaning to have it create a new hierarchy of subcollections with the documents reorganized accordingly, but I haven't gotten to it yet. If you're curious what the raw data looks like, you can enable it from "Paper Machines Preferences..." by checking "Enable Experimental Features"; running the trainer and then testing it will give you the raw probabilities.
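To make the idea concrete, here is a minimal sketch of a maximum entropy classifier (multinomial logistic regression) over word counts. It is not the Paper Machines code: the function names, the tiny training set, and the category labels are all invented for illustration, and it uses plain gradient ascent with no regularization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def predict_proba(weights, features):
    """Softmax over per-label scores; returns {label: probability}."""
    scores = {
        label: sum(w[word] * count for word, count in features.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: value / total for label, value in exp_scores.items()}


def train_maxent(labeled_docs, epochs=200, learning_rate=0.5):
    """Fit weights[label][word] by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({label for label, _ in labeled_docs})
    weights = {label: Counter() for label in labels}
    data = [(label, Counter(tokenize(text))) for label, text in labeled_docs]
    for _ in range(epochs):
        for true_label, features in data:
            probs = predict_proba(weights, features)
            for label in labels:
                # Gradient: observed indicator minus predicted probability.
                error = (1.0 if label == true_label else 0.0) - probs[label]
                for word, count in features.items():
                    weights[label][word] += learning_rate * error * count
    return weights


def classify(weights, text):
    """Probability of each training category for a new document."""
    return predict_proba(weights, Counter(tokenize(text)))


# Made-up "subcollections" standing in for manually sorted training documents.
training = [
    ("poetry", "romantic poetry verse lyric"),
    ("poetry", "sonnet verse stanza poetry"),
    ("history", "empire war treaty archive"),
    ("history", "war archive revolution history"),
]
weights = train_maxent(training)
probabilities = classify(weights, "a lyric sonnet in verse")
```

Here `probabilities` maps each category to a value between 0 and 1, which corresponds to the "raw probabilities" mentioned above; words that never appeared in training simply contribute nothing to the score.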

What did you specifically want to do with it? Were you hoping to do unsupervised clustering, or to do some other kind of classification than the one I describe? I'd be happy to expand Paper Machines to be useful for your application if it seems doable.

P.S. mebrett, can you please tell me:

  • what OS you are running (e.g. Mac OS X 10.6.8, Windows 7 SP1, etc.)
  • what version of Python you have
  • whether you've installed Paper Machines into Zotero for Firefox or Zotero Standalone
  • whether you see any errors in the logs that mention Paper Machines (in Firefox, go to the Tools menu -> Web Developer -> Error Console; in Zotero Standalone you can go to Preferences -> Advanced -> Debug Output -> Enable, then View).
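If it helps, the first two items can be collected by running a short script with the Python you have installed. This is just a convenience sketch, and the function name `environment_report` is made up; it sticks to calls that should behave the same on Python 2.7 and Python 3.

```python
import platform
import sys


def environment_report():
    """Collect the OS description, Python version, and interpreter location."""
    return {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
    }


if __name__ == "__main__":
    report = environment_report()
    for key in sorted(report):
        print("%s: %s" % (key, report[key]))
```

If the script will not run at all, that in itself suggests Python is not installed or is not on the system path.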

@mebrett

mebrett commented Nov 9, 2012

I'm running OS X 10.7.5 and Python 2.7.3 (I was running 2.7.1 but recently updated). I'm running Zotero Standalone (version 3.0.8).

I found that once I updated Zotero yesterday, I no longer had problems with text extraction. I did not check what version I was running before, but I think it must have been a very early 3.0.x.

@corajr
Contributor

corajr commented Nov 9, 2012

Glad to hear it! Zotero 3.0.9 is soon to come out, as is Firefox 17, so I'll be sure to test with those versions. Please let me know if you run into any other problems.

@konini22
Author

Thank you, Chris. I'm searching for a collection analysis/development tool that is compatible with Zotero. I'm keeping book requests from academics (sorted into collections by academics and departments). Each entry contains the LC classification as its call number and LC subject headings as tags (headings and subheadings are tagged separately, e.g. Romanticism, 19th century, History and Criticism, Great Britain, etc.).

I was hoping that Paper Machines could create a word cloud with phrases from tags, but I haven't figured out how to do it (if it can at all). That would help me see the collection at a glance. I haven't tried SEARS because of the warning.

If I could group LC classification numbers in some ways, that would help me target specific classification ranges to build library collections on, but at the moment I have to manually analyse classification numbers.
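As a rough illustration of the kind of grouping meant here, the sketch below counts call numbers by their leading LC class letters. It assumes the call numbers have already been pulled out of Zotero into a plain list (how to do that is not covered), the sample call numbers are made up, and the names `lc_class` and `count_by_class` are hypothetical.

```python
import re
from collections import Counter

# One to three leading letters followed (after optional space) by a digit,
# e.g. "PR5398 .B7 2004" -> "PR". Anything else is treated as not an LC number.
LC_CLASS = re.compile(r"^\s*([A-Z]{1,3})\s*\d")


def lc_class(call_number):
    """Return the class letters of an LC call number, or None if it does not parse."""
    match = LC_CLASS.match(call_number.upper())
    return match.group(1) if match else None


def count_by_class(call_numbers):
    """Tally how many call numbers fall under each LC class."""
    classes = (lc_class(number) for number in call_numbers)
    return Counter(c for c in classes if c is not None)


# Made-up sample data for illustration only.
sample = ["PR5398 .B7 2004", "PR4034 .P7", "DA530 .T48", "Z675.U5 L53", "Romanticism"]
counts = count_by_class(sample)
```

Calling `counts.most_common()` lists the classes from most to least requested, which is one way to spot the ranges worth targeting; grouping by numeric range within a class would need a further step.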

If I could get my Zotero collections to talk to WorldCat or Open Library through tags and call number and get suggestions (like Amazon's recommendations) on what else to buy based on my collections, that would streamline my collection development tasks hugely.

@brekhusr

I'm running Zotero 3.0.13 for Firefox (I have Firefox 18.0.2) on a Windows machine. I have a collection with about 900 attached PDFs, but when I right-click on my collection folder and click Extract Text for Paper Machines, nothing happens. The Word Cloud, Phrase Net, etc. menus remain grayed out. Restarting Firefox does not help.

Question: where would I find out what version of Python I am running? If I am not running Python, how do I get Python?

I'm a librarian and excited about teaching a graduate student workshop on Paper Machines in mid-March, but I need to make it work on my own collection first!

@brekhusr

So I got help installing Python 2.7.3. Then I removed and reinstalled Paper Machines. This time, I was able to extract the texts, or at least I got a progress bar saying "Searching for files to extract" for a few minutes; then a Firefox tab opened and I got "Extracting [my collection's name]" and a new progress bar for another few minutes. Then a message appeared in the tab saying "Extracted 878 out of 878 new texts. This window can now be closed." I closed the window. But the Word Cloud, Phrase Net, etc. are still grayed out in the right-click menu when I right-click my collection folder.

@corajr corajr closed this as completed Feb 13, 2013
@corajr corajr reopened this Feb 13, 2013
@corajr
Contributor

corajr commented Feb 13, 2013

Hi there (sorry for accidentally closing the issue),

In general, clicking on another collection (or the trash, etc.) in the left-hand pane, then back on the original collection will make the visualization options appear. Failing this, a restart of the application will usually enable them; sometimes the database doesn't update properly on the first extraction.

Please let me know if this doesn't work -- for troubleshooting purposes, it'd be great to see any output in the Error Console that mentions Paper Machines (it is located under the Tools menu -> Web Developer).

Also, given the difficulties of getting the correct version of Python installed, I have been working on a version that requires only Java to run. The release is pending, but you are welcome to try it by installing the following:

https://www.dropbox.com/s/rwooxwwlls991w0/papermachines-0.4.0pre2.xpi

There are several other improvements in this version that clarify the user interface for topic modeling in particular. It will receive an "official" release once our new website is launched.

@brekhusr

Thanks! The tools were no longer grayed out when I restarted Firefox this morning. I am now happily applying them to the extracted text. I will mention the Java-only download to the grad students during my workshop March 15-16 - do you know how soon the "official" release and new website will launch? Are you thinking days, weeks, or months? Thanks again for your assistance.
