Skip to content

Latest commit

 

History

History
75 lines (59 loc) · 2.71 KB

4-21_read-write-gzip.asciidoc

File metadata and controls

75 lines (59 loc) · 2.71 KB

Reading and Writing Compressed Files

by John Cromartie

Problem

You want to read or write a file compressed with gzip (i.e., a .gz file).

Solution

Wrap a normal input stream with java.util.zip.GZIPInputStream to get uncompressed data:

(with-open [in (java.util.zip.GZIPInputStream.
                (clojure.java.io/input-stream
                 "file.txt.gz"))]
  (slurp in))

Wrap a normal output stream with java.util.zip.GZIPOutputStream to compress data as it is written:

(with-open [w (-> "output.gz"
                  clojure.java.io/output-stream
                  java.util.zip.GZIPOutputStream.
                  clojure.java.io/writer)]
  (binding [*out* w]
    (println "This will be compressed on disk.")))

Discussion

gzip, based on the DEFLATE algorithm, is a common compression format on Unix-like systems and is used extensively for compression on the Web. It is a good choice for compressing text in particular and can result in huge reductions for source code, or Clojure or JSON data.

Many of Clojure’s I/O functions will accept any type of Java stream. The GZIPInputStream simply wraps any other input stream and attempts to decompress the original stream. The output variant behaves similarly.

By wrapping a normal input stream, as returned by clojure.java.io/input-stream, you can pass it to slurp or line-seq (or any other function that takes an input stream) and easily read the entire decompressed contents.

You can also leverage this technique to read a large compressed file line by line, or to read back Clojure forms written with pr or pr-str. You can also decompress data in a similar way from any other kind of stream; for example, one backed by a network socket or a byte array.

By binding an output stream to *out*, we can use println, pr, etc. to output small amounts of data at a time to the stream, which will be compressed on disk when the stream is closed.

A nearly identical approach can be used for writing data in the ZIP compression format, using the java.util.zip.ZipInputStream and java.util.zip.ZipOutputStream classes.

See Also