This is a program in two parts. The clojure section reads json in stdin (see core.clj) and returns json characterizing parts of speech in that text using corenlp. This is used in an emacs extension (here on MELPA) which allows you to highlight text and have it colored and tagged according to its part of speech as shown in the image above.
Emacs and the java
command on your PATH. For development, requires lein to build clojure.
For users: just install the speech-tagger
package from MELPA. Downloads a pretty large jar file on first use, and also on updates, so don't be scared if it takes a while to download and looks like it's frozen the first time; it only happens once.
For devs: load
the el file, or add it to load-path
and (require 'speech-tagger)
. Run lein uberjar
to produce the standalone jar, then move that to emacs/speech-tagger.jar
, or alternatively just customize speech-tagger-jar-path
to point to the compiled jar. (I could make a build system that does this automatically, but eh.) Set speech-tagger-is-development
to non-nil to avoid re-downloading the jar file.
The interactive functions exported by this extension follow a common protocol: if a region is active, then modify the region; otherwise modify the entire buffer. If a prefix argument is provided, they read in a buffer to modify the entirety of. A given region will be expanded to whitespace boundaries (so if region is around the l
characters in he|ll|o
, the entirety of |hello|
will be selected).
(defun speech-tagger-tag-dwim (pfx)
Tag parts of speech in the appropriate region. "dwim" is an abbreviation for "do what I mean;" hopefully what I the developer mean is close enough to what you the user mean. Tagging, as shown in the image above, colors a part of speech and adds a tooltip to it so that if you mouse over or move point over the part of speech, you get a description of the part of speech and example of that part of speech.
(defun speech-tagger-clear-tags-dwim (pfx)
As above, but clears the region of all such tags.
(defcustom speech-tagger-jar-path
Path to folder containing jar file used to tag parts of speech. Normal users won't need to touch this since it's included in the MELPA package, but developers may find it useful.
(defun speech-tagger-clear-state ()
Useful in the case that something screws up and you wish to debug. Should revert all lisp code back to the same as when first loaded. This isn't autoloaded, so you'll have to explicitly (require 'speech-tagger)
or have already used one of the autoloaded functions.
(defun speech-tagger-force-refresh-jar ()
Re-downloads jar file and md5sum. Useful for debugging jar downloads.
Honestly? I was bored. I've taken enough language classes to be able to characterize parts of speech without a computer by my side. Maybe I'll think of a use for nlp integration in text editors at some point in the future.
A friend of mine said it might be cool if I was to tag parts of speech in a separate language, then offer a translation of that text. It might be pretty cool to offer that, in addition to a google translation of the text. Tagging parts of speech in all the languages corenlp supports might be useful for teaching? We'll see.