
Utilities for working with Pail in Cascalog.


Add pail-cascalog to your project's dependencies. If you're using Leiningen, your project.clj should look something like this:

(defproject ...
  :dependencies [[pail-cascalog VERSION]])

Where VERSION is the latest version on Clojars.

Creating a Tap from a PailStructure

To create a Cascalog tap from a PailStructure, first create a PailSpec from the PailStructure; clj-pail provides a function for this. Given a PailSpec, pail-cascalog can create the tap:

(require '[clj-pail.core :as pail])
(require '[pail-cascalog.core :as pail-cascalog])

; can be any PailStructure
(def structure (com.backtype.hadoop.pail.DefaultPailStructure.))

(def tap (-> structure
           (pail-cascalog/tap-options :field-name "object")
           (pail-cascalog/tap "path/to/data")))

The tap can be customized through the options passed to tap-options. With a vertically partitioned PailStructure, you can read a subset of the data by listing the partition paths in the :attributes option:

; read data only from "foo/bar" and "baz/qux" directories
(pail-cascalog/tap-options structure :attributes [["foo" "bar"] ["baz" "qux"]])
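As the comment above suggests, each vector in :attributes appears to name the nested directory components of one partition, relative to the pail's root. The sketch below only illustrates that assumed mapping in plain Clojure. The helper attributes->paths is hypothetical and is not part of pail-cascalog or clj-pail.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper for illustration only; not part of pail-cascalog.
;; Joins each attribute vector into the relative directory it is assumed
;; to select inside the pail.
(defn attributes->paths
  [attributes]
  (mapv #(str/join "/" %) attributes))

(attributes->paths [["foo" "bar"] ["baz" "qux"]])
;; => ["foo/bar" "baz/qux"]
```

Under the same assumption, a single-element vector such as ["foo"] would select a top-level partition directory.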

Creating a Tap from an Existing Pail

Existing pails can be opened as Cascalog taps using pail-cascalog.core/pail->tap:

(require '[clj-pail.core :as pail])
(require '[pail-cascalog.core :as pail-cascalog])

; open an existing pail
(def pail (pail/pail "path/to/data"))

; convert it to a Cascalog tap
(def tap (pail-cascalog/pail->tap pail))

The pail->tap function accepts the same options as tap-options:

; customize the tap to use a custom field name and read from two partitions: "foo" and "bar"
(def tap (pail-cascalog/pail->tap pail :field-name "object" :attributes [["foo"] ["bar"]]))


Copyright © 2013 David Cuddeback

Distributed under the MIT License.