Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Simplified XPath Library for Clojure
Clojure
branch: master

This branch is 82 commits behind kyleburton:master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
src/clj_xpath
test/clj_xpath/test
.gitignore
README.textile
build-all.sh
project.clj

README.textile

clj-xpath

Description

Simplified XPath Library for Clojure. XML Parsers and an XPath implementation now comes with Java 6, though using the api directly can be verbose and confusing. This library provides a thin layer around basic parsing and XPath interaction for common use cases. I have personally found the ability to interactively tweak my xpath expressions to be a great productivity boost – even using this library only for that has helped me in my learning of and using xpath. I hope you find it useful and would love to hear your feedback and suggestions.

Trying It Out

The build (see ‘Building’ for instructions on building this software) creates a REPL script (bin/repl) that you can use to quickly try the library out:


kyle@indigo64 ~/personal/projects/clj-xpath[master]$ bin/repl
Clojure 1.0.0
user=> (use 'org.clojars.kyleburton.clj-xpath)
nil
user=> (use 'clojure.contrib.duck-streams)
nil
user=> (def xdoc (slurp* "http://github.com/kyleburton/clj-xpath/raw/master/pom.xml"))
#'user/xdoc
user=> ($x:tag "/*" xdoc)
:project
user=> ($x:text* "/project/developers/developer/name" xdoc)
("Kyle Burton")
user=> (doseq [node ($x "/project/dependencies/dependency" xdoc)]
  (prn (format "%s %s %s"
               ($x:text "./groupId"    node)
               ($x:text "./artifactId" node)
               ($x:text "./version"    node))))
"org.clojure clojure 1.0.0"
"org.clojure clojure-contrib 1.0.0"
"commons-math commons-math 1.2"
"log4j log4j 1.2.14"
"commons-logging commons-logging 1.1.1"
"junit junit 3.8.1"
nil
user=> (System/exit 0)
kyle@indigo64 ~/personal/projects/clj-xpath[master]$

Usage

The main functions in the library are $x and those named with a prefix of $x: (eg: $x:text). The rationale for choosing $x as a name was based on the FireBug xpath function and it being a short and uncommon name. These xpath functions all take the xpath expression to be executed and an XML document. They attempt to be flexible with respect to the form of the XML document may represent. If it is a string it is treated as XML, if a byte array it is used directly, if already a Document or Node (from org.w3c.dom) they are used as-is.

There are two forms of many of the functions, with one set ending with a star ‘*’. The non-suffixed forms will perform their operation on a single result node, raising an exception if there isn’t exactly one result from applying the xpath expression to the XML document. The ‘*’ suffixed forms return all of the matched nodes for further processing.

$x function returns a map containing the XML tag (as a symbol), dom Node, the text (as a string), and a map of the attributes where the keys have been converted into symbols and the values remain Strings.


(ns example
  (use [org.clojars.kyleburton.clj-xpath :only [$x $x:tag $x:text $x:attrs $x:attrs* $x:node]]))


(def *some-xml*
     "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<books>
  <book title=\"Some Guide To XML\">
    <author>
      <name>P.T. Xarnum</name>
      <email>pt@x.m.l</email>
    </author>
    <description>
      Simply the most comprehensive XML Book on the market today.
    </description>
  </book>
  <book title=\"Some Guide To Functional Programming\">
    <author>
      <name>S. Hawking</name>
      <email>universe@cambridge.ed.u</email>
    </author>
    <description>
      This book is too smart for you, try 'Head first Quantum Mechanics for Dummies' instead.
    </description>
  </book>
</books>")


;; get the top level tag:
(prn ($x:tag "/*" *some-xml*))
;; :books

;; find all :book nodes, pull the title from the attributes:
(prn (map #(-> % :attrs :title) ($x "//book" *some-xml*)))
;; ("Some Guide To XML" "Some Guide To Functional Programming")

;; same result using the $x:attrs* function:
(prn ($x:attrs* "//book" *some-xml* :title))
;; ("Some Guide To XML" "Some Guide To Functional Programming")

;; first select the :book element who's title has 'XML' in it
;; from that node, get and print the author's name (text content):
(prn ($x:text "./author/name"
              ($x:node "//book[contains(@title,'XML')]" *some-xml*)))
;; "P.T. Xarnum"

API Reference

($x xpexpr doc) => (map [...])

This is the main xpath application function. It takes an xpath expression (as a string) and a document and returns a sequence of matched elements as maps. The map contains the node’s tag, attributes, text content and the dom Node itself.


  { :tag    :tag-name
    :attrs  { :a "map" :of "the attributes" }
    :text   "the body content of the node and it's children"
    :node   org.w3c.dom.Node }

($x:tag* xpexpr doc) => [:tag-name ...]
($x:text* xpexpr doc) => ["the content" ..]
($x:attrs* xpexpr doc) => [{:the "attrs" ...} ...]
($x:node* xpexpr doc) => [org.w3c.dom.Node ...]

These functions apply the xpath expression to the given document and return a sequence of the requested property.

($x:tag? xpexpr doc) => [:tag-name ...]
($x:text? xpexpr doc) => ["the content" ..]
($x:attrs? xpexpr doc) => [{:the "attrs" ...} ...]
($x:node? xpexpr doc) => [org.w3c.dom.Node ...]

These functions apply the xpath expression to the given document and return either nil or the single result found. Unlinke the singleton forms, these do not throw and exception (just returning nil).

($x:tag+ xpexpr doc) => [:tag-name ...]
($x:text+ xpexpr doc) => ["the content" ..]
($x:attrs+ xpexpr doc) => [{:the "attrs" ...} ...]
($x:node+ xpexpr doc) => [org.w3c.dom.Node ...]

These functions apply the xpath expression to the given document and return multiple results – throwing an exception if the xpath expression matches no elements.

($x:tag xpexpr doc) => :tag-name
($x:text xpexpr doc) => "the content"
($x:attrs xpexpr doc) => {:the "attrs" ...}
($x:node xpexpr doc) => org.w3c.dom.Node

These functions return the requested property from the single result of executing the xpath expression. If the xpath expression identifies less than 1 result or more than 1 result in the given document, an exception is raised.

(xml->doc doc) => Document

This function takes xml that is of one of the following types and returns a Document: String, byte array or org.w3c.dom.Document. In cases of repeated usage of the document (eg: executing multiple xpath expressions against the same document) this will improve performance.

($x:compile xpexpr) => javax.xml.xpath.XPathExpression

Pre-compiles the xpath expression. In cases of repeated execution of the xpath expression this will improve performance.

Parsing and XPath Compilation

The $x and related functions support Strings, and in many cases, other convenient types for these arguments. In all cases where it expects an XML Document it can be given a String, a byte array or a Document. Where an xpath expression is expected it will take either a String or a pre-compiled XPathExpression. The act of parsing an XML document or compiling an xpath expression is an expensive activity. With this flexibility, clj-xpath supports the convenience of in-line usage (with String data), as well as pre-parsed and pre-compiled instances for better performance.


  (let [expr (xp:compile "/*")
        doc  (xml->doc "<authors><author><name>P.T. Xarnum</name></author></authors>")]
    ($x:tag expr doc))

Validation

Validation now off by default. Validation is controlled by optional parameters passed to xml-bytes->dom, or by overriding the atom *validation* to false:


  (ns your.namespace
    (:use org.clojars.kyleburton.clj-xpath))

  (binding [*validation* false]
    ($x:text "/this" "<this>foo</this>"))

Building

clj-xpath is built with maven at this time. It requires both clojure (1.0) and clojure-contrib (1.0) to be installed in an accessible maven repository. The software includes a script to install them into your local (~/.m2) repository for you


  kyle@indigo64 ~/personal/projects/clj-xpath[master]$ bash bin/maven-bootstrap.sh
  kyle@indigo64 ~/personal/projects/clj-xpath[master]$ mvn install

After running mvn install, clj-xpath-1.0.jar will be available for you to utilize in the target directory. To build a stand-alone jar file for clj-xpath which will contain its own code as well as clojure and clojure-contrib, run the assembly plugin:


  kyle@indigo64 ~/personal/projects/clj-xpath[master]$ mvn assembly:assembly

This will create the clj-xpath-1.0-jar-with-dependencies.jar jar in the target directory, which will contain all of the required dependencies for using the clj-xpath library (it will include clojure and clojure-contrib).

Maven Coordinates For clj-xpath


    <dependency>
      <groupId>org.clojars.kyleburton</groupId>
      <artifactId>clj-xpath</artifactId>
      <version>1.0.9</version>
    </dependency>

Leiningen Dependency

  [org.clojars.kyleburton/clj-xpath "1.0.9"]

Emacs and Slime

The build creates an Emacs-lisp file in the bin directory which you can pull into your running Emacs to add it to the list of lisp implementations for SLIME.


(add-to-list 'slime-lisp-implementations
             '(clj-xpath ("/Users/kburton/personal/projects/clj-xpath/target/../bin/repl")
                        :init swank-clojure-init
                        :init-function krb-swank-clojure-init) t)

Development Roadmap

  • create well named Java accessible static function names for all of the main functions that contain Java-incompatible names (eg, ‘$’, ‘+’, ‘*’ and so on are not valid in Java method names, create meaningful names that can be used from Java code).
  • provide type hints / decorations on functions to help the compiler and to help document expected argument types.

Releases

  • 1.0.4 – Implemented 1/more operators via ‘+’
  • 1.0.3 – implemented 0/1 operators, matches
  • 1.0.2 – UTF-8 as default encoding (implemented by Paul Santa Clara)
  • 1.0.1 – support for more variety of document formats, including pre-compilation of the document (performance optimization by supporting a cached parsed document), usage guide
  • 1.0.0 – initial functional library and release

Authors

  • Kyle Burton <kyle.burton@gmail.com>
  • Trotter Cashion <cashion@gmail.com>
Something went wrong with that request. Please try again.