#EMD Earth mover's distance on Nvidia GPUs
##CUDA CUDA and relevant NVIDIA drivers have to be installed from their site. This should be done first.
The NVIDIA website https://developer.nvidia.com/cuda-downloads has everything you need.
##Building To build, simply run
./build
in the directory EMD/
.
If you are on a linux machine, this will simply copy some binaries instead of building from source (which takes quite some time due to dependencies)
EMD require devIL
(an image library) and imagemagick
to run. If ./build
fails to install these for you, you will have to install them manually.
If './build' fails completely, jump to the Manual Build section.
After this, the haskemd module can be used from any directory. Test it with:
python3 test.py
##Python The provided python wrapper takes two (one-dimensional) numpy arrays and computes the EMD:
python3
>>>import haskemd
>>>arrs = haskemd.equalize(haskemd.sinksrand(1023),haskemd.sourcesrand(1023))
>>>haskemd.emd(a[0],a[1])
232386
sinksrand(1024)
and sourcesrand(1024)
just generates test data with 1024 bins, while the equalize
function makes the distributions have equal mass each.
To try your own, consider the following example:
>>>import haskemd
>>>import numpy as np
>>>a1 = np.array([1,2,3,4])
>>>a2 = np.array([4,3,2,1])
>>>haskemd.emd(a1, a2)
6
You might be prompted for the sudo password while computing the EMD. This is because of a bug upstream in the code for Haskell Accelerate.
##Ground distance In our case, the "ground distance" is the hamming distance. Moreover, the matrix for the metric is automatically generated. This means that an array like
np.array([0.25,0.5])
must have the correct values in the correct places. dij is based on the indices. Here, d01 would mean the distance between the bin containing 0.25 and the bin containing 0.5. In general, we have dij=hammingDistance i j
##Hamming distance To get the hamming distance from a integer indices, say 3 and 7, we first convert to binary to get 011 and 111. Then we can compute the hamming distance to get 1 (in this case). Hence d37=1
##Stored matrices
After the matrix for a metric is computed, it is stored in the data/
directory. This means that if you compute an EMD between distributions with 8192 bins, every subsequent calculation between distributions with 8192 bins will be faster. As the number of bins gets larger, it becomes necessary to store these matrices in order to guarantee quick runtime.
##Computers without an NVIDIA GPU
If you want to run the code on your CPU, you can can edit the code in src/Edmonds.hs
Open it up and the first lines should look like this:
{-# LANGUAGE FlexibleContexts #-}
module Edmonds
(
exec) where
import Data.Array.Accelerate as A
import Data.Array.Accelerate.IO
import Data.Array.Accelerate.CUDA
--import Data.Array.Accelerate.Interpreter
import System.Environment
import System.IO.Unsafe (unsafePerformIO)
Comment out the line with Data.Array.Accelerate.CUDA
and uncomment the line with Data.Array.Accelerate.Interpreter
Your file should now look like this:
{-# LANGUAGE FlexibleContexts #-}
module Edmonds
(
exec) where
import Data.Array.Accelerate as A
import Data.Array.Accelerate.IO
--import Data.Array.Accelerate.CUDA
import Data.Array.Accelerate.Interpreter
import System.Environment
import System.IO.Unsafe (unsafePerformIO)
Save the file and run stack install
to build and install it. It will be reasonably fast for up to 1000 bins, and after that it will be noticeably slower than the GPU
##Manual Build There are three parts to building manually: downloading dependencies, building the Haskell code, and installing the python module.
###Dependencies The only outside dependencies are: devIL (an image processing library) and imagemagick. The website for devIL is http://openil.sourceforge.net/ and the website for imagemagick is http://www.imagemagick.org/script/binary-releases.php
###Get Stack
Stack is the tool that buils haskell. To install stack
simply type
wget -qO- https://get.haskellstack.org/ | sh
or go to haskellstack.org for help if that fails.
###Building the Haskell To build the Haskell code, type
stack setup && stack install
in the directory EMD/
. This may take a long time.
###Installing the python module
To install the python module, simply cd
into haskemd/
and then type
sudo python3 setup.py install
At this point, everything should be working, so run python3 test.py
to make sure this is the case.