by John Cromartie
Wrap a normal input stream with java.util.zip.GZIPInputStream to get uncompressed data:
(with-open [in (java.util.zip.GZIPInputStream.
(clojure.java.io/input-stream
"file.txt.gz"))]
(slurp in))
Wrap a normal output stream with java.util.zip.GZIPOutputStream to compress data as it is written:
(with-open [w (-> "output.gz"
clojure.java.io/output-stream
java.util.zip.GZIPOutputStream.
clojure.java.io/writer)]
(binding [*out* w]
(println "This will be compressed on disk.")))
gzip, based on the DEFLATE algorithm, is a common compression format on Unix-like systems and is used extensively for compression on the Web. It is a good choice for compressing text in particular and can result in huge reductions for source code, or Clojure or JSON data.
Many of Clojure’s I/O functions will accept any type of Java stream. The GZIPInputStream simply wraps any other input stream and attempts to decompress the original stream. The output variant behaves similarly.
By wrapping a normal input stream, as returned by clojure.java.io/input-stream, you can pass it to slurp or line-seq (or any other function that takes an input stream) and easily read the entire decompressed contents.
You can also leverage this technique to read a large compressed file line by line, or to read back Clojure forms written with pr or pr-str. You can also decompress data in a similar way from any other kind of stream; for example, one backed by a network socket or a byte array.
By binding an output stream to *out*, we can use println, pr, etc. to output small amounts of data at a time to the stream, which will be compressed on disk when the stream is closed.
A nearly identical approach can be used for writing data in the ZIP compression format, using the java.util.zip.ZipInputStream and java.util.zip.ZipOutputStream classes.
-
[sec_local_io_clojure_data_to_disk], for information on reading Clojure data from files on disk
-
The GZIPInputStream API documentation