Phrase net is full of stop words #31

brekhusr opened this Issue · 4 comments

[Image: Phrase-Net-x-y.jpg]

These two phrase nets did not tell me very much about my data. Is there a way to avoid this kind of result when working with PDFs with a lot of embedded text/metadata?

Chris Johnson-Roberson

By adding your own stop words (1 per line) to the file "stopwords.txt" in the Paper Machines data folder, you should be able to get a clearer picture of your data. I will shortly add the ability to add stop words through a comma-separated list in the preferences.
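As a rough illustration of the "1 per line" format, here is a minimal Python sketch that appends extra stop words to such a file. The function name `add_stop_words` and the example path are hypothetical and not part of Paper Machines. The sketch assumes the file is UTF-8 text with one word per line.

```python
from pathlib import Path


def add_stop_words(stopwords_path, new_words):
    """Append stop words to a one-word-per-line file, skipping duplicates.

    Hypothetical helper for illustration; returns the words actually added.
    """
    path = Path(stopwords_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = set(text.split())

    # Keep the caller's order, drop blanks and words already in the file.
    to_add = []
    for word in new_words:
        word = word.strip()
        if word and word not in existing:
            existing.add(word)
            to_add.append(word)
    if not to_add:
        return []

    # newline="" disables newline translation, so "\n" is written as-is
    # and the file keeps Unix line endings on every platform.
    with path.open("a", encoding="utf-8", newline="") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # finish the last line before appending
        handle.write("\n".join(to_add) + "\n")
    return to_add
```

A call might look like `add_stop_words("stopwords.txt", ["doi", "jstor"])`, with the path adjusted to wherever the Paper Machines data folder lives on your machine.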


When I open the text files (stopwords, stopwords_en, stopwords_pt, search_stopwords) that come up when I search my computer for files called stopwords.txt and select results from the Paper Machines data folder, I don't see "lines" that would allow me to add 1 stopword per line. I just see a sort of unbroken stream of stopwords that don't even have spaces between them. Should I just go to the end and start typing additional stopwords? If so, how will it know where I mean to delimit them? Thanks, and sorry to be ignorant!

Chris Johnson-Roberson

Ah, the line endings are in Unix format rather than Windows, so the file shows up for you without line breaks. I've already implemented a preference pane that will allow additional entries, one per line, so you won't have to navigate to the file or anything. That will probably be released tonight, or as soon as I figure out a bug with geodict (it's about 90% there).
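To make the line-ending difference concrete: Unix files end each line with a single LF byte (`\n`), while Windows uses CR LF (`\r\n`), and some Windows editors do not show a bare LF as a line break. A minimal Python sketch of converting a copy of the file for viewing follows. The function names are hypothetical, and whether Paper Machines accepts a file with Windows line endings is not confirmed here, so the sketch writes to a separate copy instead of overwriting the original.

```python
def to_windows_line_endings(data: bytes) -> bytes:
    """Return data with every line ending converted to CR LF."""
    # Normalize existing CR LF to LF first so they are not doubled.
    unix = data.replace(b"\r\n", b"\n")
    return unix.replace(b"\n", b"\r\n")


def convert_copy(src_path, dst_path):
    """Write a CR LF copy of src_path to dst_path, leaving the source untouched."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_windows_line_endings(data))
```

For example, `convert_copy("stopwords.txt", "stopwords_windows.txt")` would produce a copy that displays one stop word per line in an editor that expects Windows line endings.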


Terrific! Meanwhile, I'll try writing to the Austrian National Library, which maintains, in German, and ask them if/when they're planning to bring that back online.
