Utilities for working with Pail in Cascalog.
Add pail-cascalog to your project's dependencies. If you're using Leiningen,
your project.clj should look something like this:
(defproject ...
  :dependencies [[pail-cascalog VERSION]])
Where VERSION is the latest version on Clojars.
In order to create a Cascalog tap from a PailStructure, it is necessary to
first create a PailSpec from the PailStructure. clj-pail provides a method
that does that for us. Once we have a PailSpec, pail-cascalog can be used to
create a tap:
(require '[clj-pail.core :as pail])
(require '[pail-cascalog.core :as pail-cascalog])
; can be any PailStructure
(def structure (com.backtype.hadoop.pail.DefaultPailStructure.))
(def tap (-> structure
             (pail/spec)
             (pail-cascalog/tap-options :field-name "object")
             (pail-cascalog/tap "path/to/data")))
The tap can be customized by the options passed to tap-options. In the
presence of a vertically-partitioned PailStructure, a subset of data can be
consumed by specifying paths with the :attributes option:
; read data only from "foo/bar" and "baz/qux" directories
(pail-cascalog/tap-options structure :attributes [["foo" "bar"] ["baz" "qux"]])
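To make the mapping between :attributes and directories concrete, here is a minimal plain-Clojure sketch. It assumes, as the comment above suggests, that each attribute vector names a relative directory formed by joining its elements with "/". The helper attributes->dirs is hypothetical and is not part of pail-cascalog or clj-pail; it only illustrates the naming convention.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (NOT part of pail-cascalog): shows which relative
;; directory each attribute vector is assumed to correspond to.
(defn attributes->dirs
  [attributes]
  (mapv #(str/join "/" %) attributes))

(attributes->dirs [["foo" "bar"] ["baz" "qux"]])
;; => ["foo/bar" "baz/qux"]
```

A single-element vector such as ["foo"] would therefore name the top-level "foo" partition.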
Existing pails can be opened as Cascalog taps using pail-cascalog.core/pail->tap:
(require '[clj-pail.core :as pail])
(require '[pail-cascalog.core :as pail-cascalog])
; open an existing pail
(def pail (pail/pail "path/to/data"))
; convert it to a Cascalog tap
(def tap (pail-cascalog/pail->tap pail))
The pail->tap function accepts the same options as tap-options:
; customize the tap to have a custom field name and read from two partitions: "foo" and "bar"
(def tap (pail-cascalog/pail->tap pail :field-name "object" :attributes [["foo"] ["bar"]]))
Copyright © 2013 David Cuddeback
Distributed under the MIT License.