OPDS feed retriever and rewriter
org.aulfa.opdsget.api
org.aulfa.opdsget.checkstyle
org.aulfa.opdsget.cmdline
org.aulfa.opdsget.tests
org.aulfa.opdsget.vanilla
.gitignore
.travis.yml
README-CHANGES.xml
README-LICENSE.txt
README.md
pom.xml

opdsget

An OPDS feed retrieval and rewriting tool.

Features

  • Efficient parallel downloading of large feeds
  • Transparent rewriting of feed URIs to make feeds readable offline
  • Byte-for-byte reproducible feed archives (including fixing of time-related fields from feeds)
  • Well-designed modular API for use in Java 9 programs
  • Command line interface

Requirements

  • Java 9+

How To Build

$ mvn clean package

If the above fails, it's a bug. Report it!

Usage

Usage: opdsget [options]
  Options:
    --authentication
      The file containing authentication information
  * --feed
      The URI of the remote feed
    --log-level
      The logging level
      Default: info
      Possible Values: [error, info, debug, trace]
    --output-archive
      The zip archive that will be created for the feed
  * --output-directory
      The directory that will contain the downloaded feed objects
    --uri-rewrite-scheme
      The scheme that will be used for rewritten URIs
      Default: file

To download the feed at http://example.com/feed.atom to the directory /tmp/out, assuming that the feed requires no authentication, run the following:

$ java -jar org.aulfa.opdsget.cmdline-0.0.1-main.jar \
  --feed http://example.com/feed.atom \
  --output-directory /tmp/out

The opdsget package uses JCommander to parse command line arguments, and therefore also supports placing command line options in a file that is referenced with an @ prefix:

$ cat arguments.txt
--feed
http://example.com/feed.atom
--output-directory
/tmp/out

$ java -jar org.aulfa.opdsget.cmdline-0.0.1-main.jar @arguments.txt

Archiving/Rewriting

The opdsget program can produce a reproducible zip archive of any feed that it downloads. A reproducible zip is a zip archive whose entries are stored in alphabetical order and whose per-entry time-related fields are set to fixed values. The opdsget API also sets frequently-changing time-related fields from feeds to fixed values in order to help ensure reproducible results.

To produce a zip file /tmp/out.zip, use the --output-archive option:

$ java -jar org.aulfa.opdsget.cmdline-0.0.1-main.jar \
  --feed http://example.com/feed.atom \
  --output-directory /tmp/out \
  --output-archive /tmp/out.zip
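The two properties described above (sorted entries, fixed timestamps) can be illustrated with the standard java.util.zip classes. This is only a minimal sketch of the general technique, not the opdsget implementation: the class name ReproducibleZipSketch, the method zip, and the fixed timestamp are all hypothetical choices made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: not part of the opdsget API.
final class ReproducibleZipSketch {
  // An arbitrary fixed timestamp (an assumption; any constant value works).
  static final LocalDateTime FIXED_TIME = LocalDateTime.of(2000, 1, 1, 0, 0, 0);

  // Builds a zip from in-memory files (entry name -> content).
  static byte[] zip(Map<String, byte[]> files) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      // A TreeMap iterates in sorted key order, so the entry order does not
      // depend on the order in which the files were added to the input map.
      for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
        ZipEntry entry = new ZipEntry(file.getKey());
        // setTimeLocal (Java 9+) stores the given local date-time directly,
        // so the result does not depend on the JVM's default time zone.
        entry.setTimeLocal(FIXED_TIME);
        out.putNextEntry(entry);
        out.write(file.getValue());
        out.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

With this approach, two calls that receive the same files in a different insertion order should yield identical bytes on the same JDK, which is the property that makes archives comparable by checksum.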

Each downloaded object is stored in the output directory (and therefore, by extension, in the resulting zip file) under a path derived from both the type of the file and the SHA-256 hash of its original URI. Links inside feeds are rewritten so that they point to files within the output directory (using relative paths). For example, a feed at http://www.example.com/feed.atom will be placed in the output directory out at out/feeds/DD1E9BA1ECF8D7B30994CB07D62320DE5F8912D8DF336B874489FD2D9985AEB2.atom. Any reference to http://www.example.com/feed.atom in subsequent feeds will be rewritten to file://feeds/DD1E9BA1ECF8D7B30994CB07D62320DE5F8912D8DF336B874489FD2D9985AEB2.atom by default.

It's possible to specify a custom URI scheme for rewritten links. This is useful for applications that embed OPDS feeds and need a custom URI scheme to distinguish bundled content from non-bundled remote content. The --uri-rewrite-scheme option specifies the scheme:

$ java -jar org.aulfa.opdsget.cmdline-0.0.1-main.jar \
  --feed http://example.com/feed.atom \
  --output-directory /tmp/out \
  --uri-rewrite-scheme bundled-example

This will result in feeds containing links such as bundled-example://feeds/DD1E9BA1ECF8D7B30994CB07D62320DE5F8912D8DF336B874489FD2D9985AEB2.atom.
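The naming and rewriting scheme described above can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than the opdsget code: the class RewriteSketch and its methods are hypothetical, and hashing the UTF-8 bytes of the URI string is an assumption, since the exact bytes that opdsget feeds to SHA-256 are not specified here. The sketch may therefore not reproduce the exact hash shown in the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: not part of the opdsget API.
final class RewriteSketch {
  // Returns the uppercase hexadecimal SHA-256 digest of the URI's UTF-8 bytes.
  static String sha256Hex(String uri) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(uri.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02X", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }

  // Path relative to the output directory, e.g. "feeds/<HASH>.atom".
  static String relativePath(String uri, String directory, String extension) {
    return directory + "/" + sha256Hex(uri) + "." + extension;
  }

  // The link written into rewritten feeds, e.g. "file://feeds/<HASH>.atom".
  static String rewrittenUri(String scheme, String uri, String directory, String extension) {
    return scheme + "://" + relativePath(uri, directory, extension);
  }
}
```

For instance, RewriteSketch.rewrittenUri("bundled-example", "http://www.example.com/feed.atom", "feeds", "atom") yields a link of the form bundled-example://feeds/<64 hex digits>.atom. Because the file name depends only on the original URI, the same remote object always maps to the same local path.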

Authentication

The opdsget command line program supports a flexible pattern-based method to supply authentication data when downloading feeds. Many real-life OPDS feeds are spread across multiple domains and a single feed can require different types of authentication (or sometimes no authentication at all for specific links).

The opdsget command uses a simple line-based file format to specify patterns against which URIs are matched. Patterns are matched against URIs in the order that they are given in the file, stopping at the first pattern that matches. If no pattern matches a URI, no authentication data is used for that URI. Patterns use Java regular expression syntax and are matched against the entire URI, including the scheme (http://, https://, etc.).

An example authentication file:

# URIs ending with "download" refer to books and specifically require no authentication
http[s]?://www\.example\.com/media/.*/download  none

# Otherwise, feeds and images require auth, but no other domains do
http[s]?://www\.example\.com(/.*)?  basic:rblake:SizingFrightfulStiltRemovedMarsupialJukebox

The syntax of the file is given by the following EBNF:

pattern = ? any java.util.regex.Pattern ? ;

line_terminator = ( U+000D U+000A | U+000A ) ;

line = pattern , authentication , line_terminator ;

authentication =
    'none'
  | 'basic' , ':' , user , ':' , password ;

file = { line } ;

The two currently supported authentication types are none (meaning no credentials are sent with any request), and basic (HTTP Basic authentication).

Additionally, lines that contain only whitespace, or that start with #, are ignored.
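The matching rules described in this section (ordered patterns, first match wins, whole-URI matching, ignored comment and blank lines) can be sketched as follows. This is a hypothetical illustration and not the opdsget parser: the class AuthMapSketch, its Rule type, and its methods are invented for the example. It also assumes that the pattern and the authentication value are separated by whitespace, as in the example file, and it is slightly lenient in that it also treats lines with whitespace before # as comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the opdsget API.
final class AuthMapSketch {
  static final class Rule {
    final Pattern pattern;
    final String authentication; // "none" or "basic:user:password"

    Rule(Pattern pattern, String authentication) {
      this.pattern = pattern;
      this.authentication = authentication;
    }
  }

  // Parses the lines of an authentication file, preserving their order.
  static List<Rule> parse(List<String> lines) {
    List<Rule> rules = new ArrayList<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // blank lines and comments are ignored
      }
      // Assumption: whitespace separates the pattern from the authentication.
      String[] parts = line.split("\\s+", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("Expected '<pattern> <authentication>': " + raw);
      }
      rules.add(new Rule(Pattern.compile(parts[0]), parts[1]));
    }
    return rules;
  }

  // Returns the authentication of the first rule whose pattern matches the
  // entire URI, or an empty Optional if no rule matches.
  static Optional<String> authenticationFor(List<Rule> rules, String uri) {
    for (Rule rule : rules) {
      if (rule.pattern.matcher(uri).matches()) {
        return Optional.of(rule.authentication);
      }
    }
    return Optional.empty();
  }
}
```

Because Matcher.matches() requires the whole input to match, a pattern such as http[s]?://www\.example\.com(/.*)? does not match URIs on other hosts, so those requests fall through to the "no pattern matched" case.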

Assuming the above example authentication file in authentication.map, the feed can be fetched using authentication information with:

$ java -jar org.aulfa.opdsget.cmdline-0.0.1-main.jar \
  --feed http://www.example.com/feed.atom \
  --output-directory /tmp/out \
  --output-archive /tmp/out.zip \
  --authentication authentication.map
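For reference, HTTP Basic authentication (the basic type above) sends the user name and password as a Base64-encoded Authorization header. The sketch below shows how a basic:user:password entry could be turned into that header value. The class BasicAuthSketch and its method are hypothetical, and the use of UTF-8 for the credentials is an assumption, so this is not necessarily how opdsget builds its requests.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: not part of the opdsget API.
final class BasicAuthSketch {
  // Converts "basic:user:password" into an Authorization header value.
  static String authorizationHeader(String entry) {
    // Limit the split to three parts so that the password may contain ':'.
    String[] parts = entry.split(":", 3);
    if (parts.length != 3 || !parts[0].equals("basic")) {
      throw new IllegalArgumentException("Expected 'basic:user:password'");
    }
    String credentials = parts[1] + ":" + parts[2];
    String encoded =
      Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    return "Basic " + encoded;
  }
}
```

Note that Base64 is an encoding and not encryption, so credentials supplied this way are only protected in transit when the feed is served over https.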