Tanya Nevskaya edited this page Dec 14, 2020 · 44 revisions

What is Enlive?

Enlive is an extraction and transformation library, written in Clojure, for HTML and XML documents. It uses CSS-like selectors.

Typical Enlive applications include templating and screenscraping.

The Enlive approach to templating is functional and decouples design and presentation logic.
Each template or template part (snippet) is a plain function, so templates compose easily. There is a kind of inversion of control here: in most mainstream templating systems, templates drive the presentation logic, whereas with Enlive the presentation logic drives templates.
Templates are backed by source files that are plain HTML (no special tags or attributes, no code). This makes round-tripping with designers and theming your app easy.
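The following minimal sketch makes the "plain function" point concrete. It builds two composable rendering functions by hand with html/at, html/content and html/emit*, rather than with defsnippet. It assumes the namespace is aliased as html (rather than :refer :all). The markup and the names render-item, render-list and render-html are hypothetical.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical designer-owned markup: plain HTML, no template tags.
;; Inlined for the sketch; an app would usually load it with html-resource.
(def page (html/html-snippet "<ul class=\"items\"><li>placeholder</li></ul>"))

;; The <li> in the markup serves as the model for every generated item.
(def item-model (html/select page [:li]))

(defn render-item
  "Returns a copy of the model <li> whose content is label."
  [label]
  (html/at item-model [:li] (html/content label)))

(defn render-list
  "Composes render-item: replaces the <ul> children with one <li> per label."
  [labels]
  (html/at page [:ul.items] (html/content (map render-item labels))))

(defn render-html
  "Serializes the rendered nodes back to an HTML string."
  [labels]
  (apply str (html/emit* (render-list labels))))
```

Because render-item is an ordinary function, it can be mapped over data, tested in isolation, or reused by other templates. The HTML file itself never needs to change.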

Screenscraping examples

Namespace declaration, import and dependencies:

(ns screenscraping
  (:require [net.cgrand.enlive-html :refer :all])
  (:import java.net.URL))

Retrieve the URL of the latest Penny Arcade comic:

;; Fetch and parse the page, then return the src of the first <img> under <body>
(-> "http://www.penny-arcade.com/comic/" URL. html-resource
    (select [:body :img]) first :attrs :src)
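You can try the same select / first / :attrs / :src pipeline offline by parsing an in-memory string instead of the live site. This sketch assumes the html alias. The markup and image paths are made up.

```clojure
(require '[net.cgrand.enlive-html :as html])
(import '(java.io StringReader))

;; Made-up stand-in for the comic page, parsed from memory instead of a URL.
(def page
  (html/html-resource
   (StringReader.
    "<html><body><img src=\"/strip.png\"><img src=\"/ad.png\"></body></html>")))

;; Same pipeline as the Penny Arcade example: first <img> under <body>, then its src.
(def first-img-src
  (-> page (html/select [:body :img]) first :attrs :src))
```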

Need help?

Ask on the Google group, or mail me if you can’t discuss your issue publicly.

Install

If you use Leiningen, add [enlive "1.1.5"] to your project.clj dependencies. (This won’t work with Clojure 1.0.)
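For reference, a minimal project.clj carrying that dependency might look like the following. The project name and the Clojure version are placeholders chosen for illustration. Only the [enlive "1.1.5"] entry comes from the instructions above.

```clojure
;; project.clj: "my-app" and the Clojure version are placeholders (assumptions)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 [enlive "1.1.5"]])
```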

If you use Clojure 1.0 (or 1.1 without Leiningen), git clone this repository (or use GitHub’s download feature) and add the src directory and lib/tagsoup-1.2.jar to your classpath. Enlive does not need to be compiled.

Selectors

Selectors are at the core of Enlive. The file syntax.html at the root of the repository is a comprehensive syntax reference.

Enlive selectors are simply CSS selectors written in Clojure. A selector is always surrounded by square brackets: CSS div is written [:div]. span.bar a#foo becomes [:span.bar :a#foo].
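The translations above can be checked against a small document. This sketch assumes the html alias (so select is written html/select). The markup is made up so that only one link matches span.bar a#foo.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Made-up markup: only the first <a> sits inside span.bar and has id="foo".
(def page
  (html/html-snippet
   (str "<div><span class=\"bar\"><a id=\"foo\">hit</a><a id=\"other\">miss</a></span></div>"
        "<span><a>outside</a></span>")))

(html/select page [:div])              ; CSS: div
(html/select page [:div :a])           ; CSS: div a (descendant)
(html/select page [:span.bar :a#foo])  ; CSS: span.bar a#foo
```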

A CSS selector that is trickier to translate is a[href], which is [[:a (attr? :href)]] in Enlive. No, it’s not a typo: there are two pairs of square brackets. The outer pair is the mandatory one (see the paragraph above), and the inner pair denotes intersection (aka “and”): the element must be an a and must also have an href attribute.

You should now be able to read [:.foo [:a (attr? :href)] :em] as CSS’s .foo a[href] em.
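This sketch shows the intersection step in action. With the html alias assumed, attr? is written html/attr?. The markup is invented so that one anchor lacks an href and one link sits outside .foo.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Made-up markup: two links with href, one named anchor without it.
(def page
  (html/html-snippet
   (str "<p class=\"foo\"><a href=\"/x\"><em>linked</em></a>"
        "<a name=\"anchor\"><em>unlinked</em></a></p>"
        "<p><a href=\"/y\"><em>outside</em></a></p>")))

;; CSS a[href]: the inner vector intersects "is an <a>" with "has an href".
(def links (html/select page [[:a (html/attr? :href)]]))

;; CSS .foo a[href] em
(def emphasized (html/select page [:.foo [:a (html/attr? :href)] :em]))
```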

Resources

html-resource and xml-resource are helper functions that build a tree suitable for processing with Enlive. They take a single argument, which can be: a string (denoting a resource on the classpath), a java.io.File, a java.io.Reader, a java.io.InputStream, a java.net.URI, a java.net.URL, a collection of nodes, or a single node. (Nodes are maps.)

html-resource uses TagSoup to parse the resource; xml-resource uses the default SAX parser.

Note that both html-resource and xml-resource return their argument when it is already a collection of nodes.
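The sketch below feeds both helpers a java.io.Reader and then passes already-parsed nodes back in. It assumes the html alias, and the documents are made up. The HTML input is deliberately sloppy (an unclosed <p>) because TagSoup tolerates it. The XML input is well-formed, which the SAX parser requires.

```clojure
(require '[net.cgrand.enlive-html :as html])
(import '(java.io StringReader))

;; TagSoup repairs the unclosed <p> and builds an html/body tree around it.
(def html-nodes (html/html-resource (StringReader. "<p>Hello<p>World")))

;; The SAX parser needs well-formed XML.
(def xml-nodes
  (html/xml-resource (StringReader. "<feed><entry id=\"1\">first</entry></feed>")))

;; A collection of nodes is passed through rather than parsed again.
(def reparsed (html/html-resource html-nodes))
```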

deftemplate, defsnippet, template and snippet implicitly wrap their source argument in an html-resource call. This means that if you specify your resources as strings, they are looked up on the classpath.

Tutorial

Getting started

XML limitations

Enlive does not support XML namespaces.