Extract the comments from DOC or DOCX documents.



Extract the comments/annotations from a Word DOC or DOCX document or a PDF file, and dump them to the console (for now).


I wrote this because I grade student papers by putting the grades in Word and PDF comments, and I wanted to be able to extract them from the command line without running Word or Acrobat themselves, or resorting to a hack like AppleScript.
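Extracting comments without Word is feasible for DOCX files because a DOCX file is a ZIP archive, and Word stores the comments as XML in its word/comments.xml part. Here is a minimal Java sketch of that idea using only the JDK (java.util.zip plus the built-in DOM parser, Java 9 or later). It is not necessarily how this project does it, it handles DOCX only (legacy DOC and PDF need different parsers), and the class name DocxComments is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch: read comment text straight out of a .docx ZIP container.
 * Assumes the standard Office Open XML layout (comments live in word/comments.xml).
 */
public class DocxComments {
    // Namespace bound to the "w:" prefix in WordprocessingML parts.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns each comment's text, in the order the comments appear in word/comments.xml. */
    public static List<String> extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/comments.xml")) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    return parseComments(zip.readAllBytes());
                }
            }
        }
        // No comments part at all: the document has no comments.
        return new ArrayList<>();
    }

    private static List<String> parseComments(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> comments = new ArrayList<>();
        NodeList commentNodes = doc.getElementsByTagNameNS(W_NS, "comment");
        for (int i = 0; i < commentNodes.getLength(); i++) {
            Element comment = (Element) commentNodes.item(i);
            // A comment may hold several paragraphs (w:p), each split into text runs (w:t).
            List<String> paragraphs = new ArrayList<>();
            NodeList paraNodes = comment.getElementsByTagNameNS(W_NS, "p");
            for (int j = 0; j < paraNodes.getLength(); j++) {
                NodeList textNodes =
                    ((Element) paraNodes.item(j)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder text = new StringBuilder();
                for (int k = 0; k < textNodes.getLength(); k++) {
                    text.append(textNodes.item(k).getTextContent());
                }
                paragraphs.add(text.toString());
            }
            // Join a comment's paragraphs with newlines so they stay distinguishable.
            comments.add(String.join("\n", paragraphs));
        }
        return comments;
    }
}
```

For example, DocxComments.extract(new FileInputStream("paper.docx")) would return one string per comment, with each comment's paragraphs joined by newlines.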


You need to have Apache Maven installed. On Mac OS X, this is just brew install maven, and on Ubuntu you're looking for sudo apt-get install maven. To build the JAR file and then run it:

git clone (this repository)
mvn install
java -jar target/get_comments-(VERSION)-jar-with-dependencies.jar

For everyday use, you might want to copy the JAR somewhere memorable (as get_comments.jar, say) and write a little shell script that passes its arguments through:

java -jar (PATH_TO)/get_comments.jar "$@"


Just call java -jar (PATH_TO_JAR_FILE) [OPTIONS] (FILENAME), and the comments from the file will be printed to standard output. There are two command-line options:

  • --quiet or -q: Only print the comments themselves. By default, each comment will be prefixed by "Comment #N: "; setting this option disables that.
  • --limit N or -l N: Only print the first N comments from the document. By default, all comments in the document will be printed.

For example, to print only the value of the document's first comment, you can call java -jar (PATH_TO_JAR_FILE) --quiet --limit 1 (FILENAME).
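The two options only change what gets printed, not what gets extracted. As a rough illustration, here is a hypothetical Java sketch of that option handling. The class name CommentPrinter and its methods are made up for this example, it assumes comments are numbered starting at 1, and the real tool may parse its arguments differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of --quiet/-q and --limit/-l; not the project's actual code.
 * Assumes "Comment #N: " counts from 1 and that anything else on the line is FILENAME.
 */
public class CommentPrinter {
    /** Parsed command line; limit == -1 means "print every comment". */
    public static final class Options {
        public boolean quiet = false;
        public int limit = -1;
        public String filename = null;
    }

    public static Options parse(String[] args) {
        Options opts = new Options();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--quiet") || arg.equals("-q")) {
                opts.quiet = true;
            } else if (arg.equals("--limit") || arg.equals("-l")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(arg + " needs a number N");
                }
                opts.limit = Integer.parseInt(args[++i]); // consume N
                if (opts.limit < 0) {
                    throw new IllegalArgumentException("N must not be negative");
                }
            } else {
                opts.filename = arg; // anything else is taken as the input file
            }
        }
        if (opts.filename == null) {
            throw new IllegalArgumentException("missing FILENAME");
        }
        return opts;
    }

    /** Applies the limit and, unless quiet, the "Comment #N: " prefix. */
    public static List<String> format(List<String> comments, Options opts) {
        int count = opts.limit < 0 ? comments.size() : Math.min(opts.limit, comments.size());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String prefix = opts.quiet ? "" : "Comment #" + (i + 1) + ": ";
            lines.add(prefix + comments.get(i));
        }
        return lines;
    }
}
```

With the arguments from the example above (--quiet --limit 1 FILENAME), parse sets quiet to true and limit to 1, so format keeps only the bare text of the first comment. With no options, every comment is kept and prefixed.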


Copyright (C) 2012 Charles Pence, and released under the MIT license.