Extract text for paper machines not working #22

konini22 · 2012-11-06T23:01:09Z

Hi Chris,

I'm using the latest version of Zotero for Firefox (Firefox version 16.0.2). Once it finishes extracting text, Firefox opens a blank new tab, but all visualization options remain greyed out. I looked into past issues #5 and #12 and tried to replicate the solutions, but it still doesn't work. I extracted text of My Library and also a collection, but both attempts failed.

I got numerous error messages, saying Error in parsing value for ... Declaration dropped.

Can you help me with this, please?

Sorry, Chris,

The problem's been solved. I just had to relocate the Python file. Cheers.

corajr · 2012-11-07T02:05:06Z

Glad to hear it's working now! Out of curiosity, where was Python before?

mebrett · 2012-11-07T03:36:01Z

konini22, can you explain where you relocated to/from? I'm having the same problem. Thanks!

konini22 · 2012-11-08T00:22:19Z

Hi Chris and mebrett,
I stored Python in C:\Users\inoueko\AppData\Roaming\Mozilla\Firefox\Profiles\1nc1d3r.default\zotero, along with the Zotero files (I work in a library and don't have much say in where things get stored). I pasted the path into Paper Machines' preferences, but it didn't work for some reason. So I moved it to its default install location, C:\Python27 (the C drive is the OS disk, if that means anything), and reinstalled Paper Machines. Classification is still greyed out, though. I'm most interested in that feature for collection development purposes.

corajr · 2012-11-09T03:43:41Z

It ought to have been able to find Python there if you gave it the path explicitly; I'll have to do some testing on Windows to sort that one out.

Classification is currently marked as an "experimental feature"; it's not quite ready for use yet. Currently, it takes documents that are manually organized into subcollections as training data for a Maximum Entropy classifier. This classifier can then be used on another, larger set of documents to give the probability that each one belongs to a given category from the training data.

I've been meaning to have it create a new hierarchy of subcollections with the documents reorganized accordingly, but I haven't gotten to it yet. If you're curious what the raw data looks like, you can enable it from "Paper Machines Preferences..." by checking "Enable Experimental Features"; running the trainer and then testing it will give you the raw probabilities.
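To make the train-then-test flow above concrete, here is a minimal sketch of a maximum entropy (multinomial logistic regression) classifier over bag-of-words features. It is not Paper Machines' actual code: the function names, the tiny `TRAINING` set, and the learning settings are all assumptions made for illustration, with the labels standing in for subcollection names.

```python
import math
from collections import Counter

# Hypothetical training data: (subcollection name, document text).
TRAINING = [
    ("poetry", "romanticism verse lyric poetry"),
    ("history", "empire war treaty politics"),
    ("poetry", "sonnet verse meter"),
    ("history", "war revolution parliament"),
]

def featurize(text):
    """Bag-of-words counts for one document."""
    return Counter(text.lower().split())

def predict_proba(weights, labels, feats):
    """Softmax over per-label scores, i.e. P(label | document)."""
    scores = {
        lab: sum(weights.get((lab, word), 0.0) * n for word, n in feats.items())
        for lab in labels
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}

def train(docs, epochs=200, rate=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({lab for lab, _ in docs})
    data = [(lab, featurize(text)) for lab, text in docs]
    weights = {}
    for _ in range(epochs):
        for gold, feats in data:
            probs = predict_proba(weights, labels, feats)
            for lab in labels:
                # Gradient term: observed indicator minus predicted probability.
                error = (1.0 if lab == gold else 0.0) - probs[lab]
                for word, n in feats.items():
                    key = (lab, word)
                    weights[key] = weights.get(key, 0.0) + rate * error * n
    return weights, labels

if __name__ == "__main__":
    weights, labels = train(TRAINING)
    # Raw per-category probabilities for an unseen document.
    print(predict_proba(weights, labels, featurize("verse and meter in the lyric")))
```

The printed dictionary is the kind of "raw probabilities" output described above: one probability per training category, summing to 1, for each document being tested.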

What did you specifically want to do with it? Were you hoping to do unsupervised clustering, or to do some other kind of classification than the one I describe? I'd be happy to expand Paper Machines to be useful for your application if it seems doable.

P.S. mebrett, can you please tell me:

- what OS you are running (e.g. Mac OS X 10.6.8, Windows 7 SP1, etc.)
- what version of Python you have
- whether you've installed Paper Machines into Zotero for Firefox or Zotero Standalone
- whether you see any errors in the logs that mention Paper Machines (in Firefox, go to the Tools menu -> Web Developer -> Error Console; in Zotero Standalone you can go to Preferences -> Advanced -> Debug Output -> Enable, then View).
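
If it helps with the OS and Python version questions above, a small script like the following reports both, along with the path of the interpreter that ran it. This is only a sketch using the standard library; the function name `describe_python` is made up for illustration, and it tells you about whichever Python runs the script, which may not be the one Paper Machines is configured to use.

```python
import platform
import sys

def describe_python():
    """Return a one-line summary of the running interpreter and OS."""
    return "Python {} at {} on {} {}".format(
        platform.python_version(),  # interpreter version string
        sys.executable,             # full path to the interpreter binary
        platform.system(),          # OS name as reported by the platform
        platform.release(),         # OS release string
    )

if __name__ == "__main__":
    print(describe_python())
```

Running `python --version` from a command prompt gives the version alone; if that command is not found, Python is probably not installed or not on the system path.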

mebrett · 2012-11-09T03:51:53Z

I'm running OS X 10.7.5 and Python 2.7.3 (I was running 2.7.1 but recently updated). I'm running Zotero Standalone (version 3.0.8).

I found that once I updated Zotero yesterday, I no longer had problems with Extract Text not working. I did not check what version I was running before, but I think it must have been a very early 3.0.x.

corajr · 2012-11-09T03:53:52Z

Glad to hear it! Zotero 3.0.9 is soon to come out, as is Firefox 17, so I'll be sure to test with those versions. Please let me know if you run into any other problems.

konini22 · 2012-11-13T20:28:36Z

Thank you, Chris. I'm searching for a collection analysis/development tool that is compatible with Zotero. I'm keeping book requests from academics (sorted into collections by academic and department). Each entry contains the LC classification as its call number and LC subject headings in tags (headings and subheadings are tagged separately, e.g. Romanticism, 19th century, History and Criticism, Great Britain, etc.).

I was hoping that Paper Machines would create a word cloud with phrases from tags, but I haven't figured out how to do it (if it does that at all). That would help me see the collection at a glance. I haven't tried SEARS because of the warning.

If I could group LC classification numbers in some way, that would help me target specific classification ranges to build library collections on, but at the moment I have to analyse classification numbers manually.
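
As a rough illustration of that kind of grouping, the sketch below tallies call numbers by their leading LC class letters (the one to three letters before the first digit). It works outside Zotero and Paper Machines: the function names and the sample call numbers are invented for the example, and it assumes the call numbers have already been exported to a plain list.

```python
import re
from collections import Counter

# An LC call number starts with one to three letters naming its class/subclass.
LC_PREFIX = re.compile(r"^\s*([A-Za-z]{1,3})")

def lc_subclass(call_number):
    """Return the leading class letters in upper case, or None if absent."""
    match = LC_PREFIX.match(call_number)
    return match.group(1).upper() if match else None

def count_by_subclass(call_numbers):
    """Tally how many call numbers fall under each LC class/subclass."""
    counts = Counter()
    for call_number in call_numbers:
        subclass = lc_subclass(call_number)
        if subclass is not None:  # skip entries with no recognisable prefix
            counts[subclass] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical call numbers, for illustration only.
    sample = ["PR590 .B7 1993", "PR4837 .A4", "DA533 .H6", "pn56.R7", "Z699 .M3"]
    for subclass, n in count_by_subclass(sample).most_common():
        print(subclass, n)
```

Grouping by numeric range within a subclass would need a second step that parses the digits after the letters; the letter prefix alone is the coarsest useful cut.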

If I could get my Zotero collections to talk to WorldCat or Open Library through tags and call number and get suggestions (like Amazon's recommendations) on what else to buy based on my collections, that would streamline my collection development tasks hugely.

brekhusr · 2013-02-13T14:31:41Z

I'm running Zotero 3.0.13 for Firefox (I have Firefox 18.0.2) on a Windows machine. I have a collection with about 900 attached PDFs, but when I right-click on my collection folder and click Extract Text for Paper Machines, nothing happens. The Word Cloud, Phrase Net, etc. menus remain grayed out. Restarting Firefox does not help.

Question: where would I find out what version of Python I am running? If I am not running Python, how do I get Python?

I'm a librarian and excited about teaching a graduate student workshop on Paper Machines in mid-March, but I need to make it work on my own collection first!

brekhusr · 2013-02-13T19:12:23Z

So I got help installing Python 2.7.3. Then I removed and reinstalled Paper Machines. This time, I was able to extract the texts - or at least, I got a progress bar saying "Searching for files to extract" for a few minutes; then a Firefox tab opened and I got Extracting [my collection's name] and a new progress bar, for another few minutes. Then a message appeared in the tab saying "Extracted 878 out of 878 new texts. This window can now be closed." I closed the window. But the Word Cloud, Phrase Net, etc. are still grayed out in the right-click menu when I right-click my collection folder.

corajr · 2013-02-13T21:44:24Z

Hi there, (sorry for the accidental close of issue)

In general, clicking on another collection (or the trash, etc.) in the left-hand pane, then back on the original collection will make the visualization options appear. Failing this, a restart of the application will usually enable them; sometimes the database doesn't update properly on the first extraction.

Please let me know if this doesn't work -- for troubleshooting purposes, it'd be great to see any output in the Error Console that mentions Paper Machines (it is located under the Tools menu -> Web Developer).

Also, given the difficulties of getting the correct version of Python installed, I have been working on a version that requires only Java to run. The release is pending, but you are welcome to try it by installing the following:

https://www.dropbox.com/s/rwooxwwlls991w0/papermachines-0.4.0pre2.xpi

There are several other improvements in this version that clarify the user interface for topic modeling in particular. It will receive an "official" release once our new website is launched.

brekhusr · 2013-02-15T17:43:48Z

Thanks! The tools were no longer grayed out when I restarted Firefox this morning. I am now happily applying them to the extracted text. I will mention the Java-only download to the grad students during my workshop March 15-16 - do you know how soon the "official" release and new website will launch? Are you thinking days, weeks, or months? Thanks again for your assistance.

corajr closed this as completed Feb 13, 2013

corajr reopened this Feb 13, 2013
