Enlive is a Clojure library for extracting data from and transforming HTML and XML documents. It uses CSS-like selectors.
Typical Enlive applications include templating and screen scraping.
The Enlive approach to templating is functional and decouples design and presentation logic.
Each template or template part (snippet) is a plain function, so templates compose easily. This is a kind of inversion of control: in most mainstream templating systems, templates drive the presentation logic, whereas here the presentation logic drives the templates.
Templates are backed by source files which are plain HTML (no special tags or attributes, no code). This allows for easy round-tripping with designers or easy theming of your app.
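As a rough sketch of how a snippet and a template compose as plain functions, the following uses inline markup read through a java.io.Reader in place of an HTML source file, so the example is self-contained. The alias html and the names source, item and page are illustrative assumptions, not part of Enlive.

```clojure
(ns templating-sketch
  (:require [net.cgrand.enlive-html :as html])
  (:import java.io.StringReader))

;; Stand-in for a designer-supplied HTML file: plain markup, no special tags.
(def source
  (html/html-resource
    (StringReader. "<ul><li class=\"item\">placeholder</li></ul>")))

;; A snippet is a function returning nodes: here, one <li> per label.
(def item
  (html/snippet source [:li.item] [label]
    [:li] (html/content label)))

;; A template is a function returning a seq of strings.
;; It composes with the snippet through an ordinary function call.
(def page
  (html/template source [labels]
    [:ul] (html/content (map item labels))))

;; Join the emitted strings to get the final markup.
(def rendered (apply str (page ["a" "b"])))
```

The markup itself contains no logic: the code decides which element is repeated and what goes inside it.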
Namespace declaration, import and dependencies:
(ns screenscraping (:use net.cgrand.enlive-html) (:import java.net.URL))
Retrieve the URL of the latest Penny Arcade comic:
(-> "http://www.penny-arcade.com/comic/" URL. html-resource (select [:.body :img]) first :attrs :src)
For support, use the Enlive Google group, or mail me if you can’t discuss your issues publicly.
If you use Leiningen, add [enlive "1.0.0"] to your project.clj dependencies. (This won’t work with Clojure 1.0.)
If you use Clojure 1.0 (or 1.1 without Leiningen), git clone this repository (or use GitHub’s download feature) and add the src directory and lib/tagsoup-1.2.jar to your classpath. Enlive does not need to be compiled.
Selectors are at the core of Enlive. The file syntax.html at the root of the repository is a comprehensive syntax reference that can also be browsed online.
Enlive selectors are simply CSS selectors written in Clojure. A selector is always surrounded by square brackets: CSS div is written [:div], and span.bar a#foo becomes [:span.bar :a#foo].
A CSS selector that is trickier to translate is a[href], which is [[:a (attr? :href)]] in Enlive. No, it’s not a typo: there are two pairs of square brackets. The outer pair is the mandatory one (see the paragraph above) and the inner pair denotes intersection (logical and).
At this point, you should understand that [:.foo [:a (attr? :href)] :em] is CSS’s .foo a[href] em.
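To make the translation concrete, here is a small sketch that runs this selector against an inline document. The markup, the alias html and the names doc and matches are assumptions made for illustration. Only the first em sits inside a link that carries an href attribute, so it should be the only match.

```clojure
(ns selectors-sketch
  (:require [net.cgrand.enlive-html :as html])
  (:import java.io.StringReader))

;; Two links inside .foo: one with an href attribute, one without.
(def doc
  (html/html-resource
    (StringReader.
      (str "<div class=\"foo\">"
           "<a href=\"/x\"><em>linked</em></a>"
           "<a><em>plain</em></a>"
           "</div>"))))

;; CSS: .foo a[href] em
;; The inner vector intersects the :a step with the (attr? :href) predicate.
(def matches
  (html/select doc [:.foo [:a (html/attr? :href)] :em]))

;; Collect the text children of the matched nodes.
(mapcat :content matches)
```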
html-resource and xml-resource are helper functions that build a tree suitable for processing with Enlive. They take a single argument, which can be: a string (denoting a resource on the classpath), a java.io.File, a java.io.Reader, a java.io.InputStream, a java.net.URI, a java.net.URL, a collection of nodes, or a single node. (Nodes are maps.)
html-resource uses Tagsoup to parse the resource; xml-resource uses the default SAX parser.
Note that both html-resource and xml-resource return their argument when it is already a collection of nodes.
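The sketch below illustrates these points with html-resource: parsing from a java.io.Reader, looking at a node as a map, and passing already-parsed nodes back in. The markup, the alias html and the names nodes, paragraph and reparsed are illustrative assumptions.

```clojure
(ns resources-sketch
  (:require [net.cgrand.enlive-html :as html])
  (:import java.io.StringReader))

;; Parse from a java.io.Reader, one of the accepted argument types.
(def nodes
  (html/html-resource (StringReader. "<p id=\"greeting\">hi</p>")))

;; A node is a map; :tag, :attrs and :content are its usual keys.
(def paragraph (first (html/select nodes [:p])))

(:tag paragraph)            ; the element name as a keyword
(:id (:attrs paragraph))    ; the value of the id attribute

;; Passing a collection of nodes back in yields the same nodes, not a re-parse.
(def reparsed (html/html-resource nodes))
(= (seq nodes) (seq reparsed))
```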
template and snippet implicitly wrap their source argument in an html-resource call. This means that if you specify your resources as strings, they will be looked up on the classpath.
No support for namespaces.