Phrase net is full of stop words #31

Open
brekhusr opened this Issue · 4 comments

2 participants

brekhusr Chris Johnson-Roberson
brekhusr

[Attached images: phrase nets "Phrase-Net-x-a-y" and "Phrase-Net-x-y"]

These two phrase nets did not tell me very much about my texts. Is there a way to avoid this kind of result when working with PDFs that contain a lot of embedded text or metadata?

Chris Johnson-Roberson
Owner

By adding your own stop words (1 per line) to the file "stopwords.txt" in the Paper Machines data folder, you should be able to get a clearer picture of your data. I will shortly add the ability to add stop words through a comma-separated list in the preferences.
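To show what "one per line" buys you, here is a minimal sketch of how a stop-word file in that format could be read and applied to a text. It assumes a plain UTF-8 file with one word per line. The helper names (`parse_stopwords`, `load_stopwords`, `filter_tokens`) are hypothetical and are not Paper Machines' own code.

```python
import re

def parse_stopwords(text):
    r"""Parse stop-word file contents: one word per line.

    str.splitlines() splits on both "\n" (Unix) and "\r\n" (Windows),
    so either line-ending style works. Blank lines are ignored and
    words are lower-cased so matching is case-insensitive.
    """
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def load_stopwords(path):
    """Read a stop-word file (e.g. a hypothetical copy of stopwords.txt) from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_stopwords(f.read())

def filter_tokens(text, stopwords):
    """Lower-case the text, split it into word tokens, and drop the stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

Because each line holds exactly one entry, the newline itself is the delimiter, so no commas or spaces are needed between words.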

brekhusr

When I open the text files (stopwords, stopwords_en, stopwords_pt, search_stopwords) that come up when I search my computer for files called stopwords.txt, and select the results from the Paper Machines data folder, I don't see "lines" that would let me add one stop word per line. I just see an unbroken stream of stop words without even spaces between them. Should I just go to the end and start typing additional stop words? If so, how will it know where I mean to delimit them? Thanks, and sorry for my ignorance!

Chris Johnson-Roberson
Owner

Ah, the line endings are in Unix format rather than Windows format, so the file shows up for you without line breaks. I've already implemented a preference pane that will allow additional entries, one per line, so you won't have to navigate to the file at all. It will probably be released tonight, or as soon as I figure out a bug with geodict (it's about 90% there).
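Until the preference pane is available, the file could also be extended by hand. Unix files end each line with `\n` only, while an editor that recognizes only Windows line endings (`\r\n`) shows such a file as one continuous run of text, even though the line breaks are really there. The sketch below appends new words while keeping the file's existing Unix format, and includes a helper for making a Windows-style copy for viewing. It is illustrative only: both function names are hypothetical, and the location of the Paper Machines data folder is not given in this thread.

```python
import os

def append_stopwords(path, words):
    r"""Append words to a stop-word file, one per line, with Unix "\n" endings."""
    needs_newline = False
    if os.path.exists(path):
        with open(path, "rb") as f:
            data = f.read()
        # If the last existing line has no terminator, start a new line first
        needs_newline = bool(data) and not data.endswith(b"\n")
    # newline="\n" disables newline translation, so "\n" is written as-is
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        if needs_newline:
            f.write("\n")
        for word in words:
            word = word.strip()
            if word:  # skip empty entries
                f.write(word + "\n")

def to_windows_line_endings(data):
    r"""Return a copy of the bytes with every line ending converted to "\r\n"."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Writing `\n` rather than `\r\n` keeps the appended lines consistent with the rest of the file; the Windows-style copy is only for reading the list in an editor that needs it.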

brekhusr

Terrific! Meanwhile, I'll try writing (in German) to the Austrian National Library, which maintains http://europeana-geo.isti.cnr.it/geoparser, and ask them whether and when they plan to bring that back online.
