dix

Call me Ishmael. Dix is a utility for quantifying large amounts of plaintext data using a revolutionary metric: Moby-Dicks.

Motivation

Have you ever found yourself analyzing text data and thinking, "Wow, this data is BIG. This is some BIG DATA"?

Of course you have. And if you're like us, you're frustrated with the current tools and metrics at your disposal. How do you quantify how big your data is? Bits and bytes and word counts just don't cut it in the fast-moving Data Age.

It's time for a new standard. One that's timeless, yet fully capable of expressing bigness. That's why we created dix.

About

dix is a command-line utility that quantifies the size of plaintext data in relation to Herman Melville's classic novel Moby-Dick; or, The Whale, first published in 1851 and considered to be one of the Great American Novels.

Moby-Dick is a sizeable book.
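
The idea behind the metric can be sketched in a few lines of Python. This is an illustration only, not dix's actual source: the function name moby_dicks and the constant MOBY_DICK_WORDS are assumptions (the novel runs to roughly 200,000 words, and the exact count depends on the edition), and counting whitespace-separated words is just one plausible way to measure size.

```python
# Hypothetical sketch; not taken from the dix source code.
# Assumed reference size: roughly 200,000 words, varies by edition.
MOBY_DICK_WORDS = 200000


def moby_dicks(text, reference_words=MOBY_DICK_WORDS):
    """Return the size of `text` as a fraction of the reference word count."""
    words = len(text.split())  # whitespace-separated tokens, similar to `wc -w`
    return words / float(reference_words)


if __name__ == "__main__":
    sample = "Call me Ishmael. " * 1000  # 3,000 words
    print("%.4f Moby-Dicks" % moby_dicks(sample))
```

Passing a different reference_words value swaps in another unit of comparison without changing the rest of the calculation.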

Installation

Prerequisites: Python 2.6+, wc (which is included on most *nix OSs)

Run sudo pip install dix to install dix from PyPI (dix needs sudo access to set permissions so you can run it from anywhere).

More installation options coming soon.

Examples

dix is run from the command line on a plaintext file, as follows.

$> dix text.txt

You can also pipe things into dix if desired:

$> echo "for there is no folly of the beast of the earth..." | dix
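
Accepting either a file name or piped input is a common pattern for command-line tools. The sketch below shows one way to do it in Python. It is an assumption about how such a tool could be written, not dix's implementation, and read_input is a hypothetical helper.

```python
import sys


def read_input(args, stdin=sys.stdin):
    """Return text from the file named in args[0], or from stdin if args is empty."""
    if args:
        # A file name was given on the command line, e.g. `tool text.txt`.
        with open(args[0]) as handle:
            return handle.read()
    # No file name: fall back to piped input, e.g. `echo "..." | tool`.
    return stdin.read()
```

A script would typically call read_input(sys.argv[1:]) and hand the resulting text to whatever does the measuring.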

dix also supports a multitude of options. For example, if you feel bad about the size of your data, choose a smaller unit of comparison:

$> dix --tiny text.txt

You can see all the options and how to use them by calling dix -h.

Advanced usage

You can also redirect the output of dix. For example, pipe dix to cowsay for a more pleasing visual experience:

$> curl -s 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvsection=0&titles=Moby-Dick&rvprop=content&format=json' | python -m json.tool | grep "*" | dix | cowsay

 ____________________________________ 
/ 0.0022 Moby-Dicks                  \
|                                    |
\ You call that BIG data?! Please... /
 ------------------------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Contribution

We welcome issues and pull requests if you find problems with dix or want to enhance it! You can also reach its creators at dix.heads@datascopeanalytics.com.
