Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining"

branch: master

Fetching latest commit…


Cannot retrieve the latest commit at this time

Octocat-spinner-32 bgsub
Octocat-spinner-32 bof
Octocat-spinner-32 classtrain
Octocat-spinner-32 compose
Octocat-spinner-32 input
Octocat-spinner-32 kmeans
Octocat-spinner-32 normalize
Octocat-spinner-32 performance
Octocat-spinner-32 slidingwindow
Octocat-spinner-32 wordcount
Octocat-spinner-32 .gitignore
Octocat-spinner-32 COPYING
Octocat-spinner-32 MDMKDD10-bwhite.pdf
Octocat-spinner-32 README

Notes about these examples
- Goal: ground the examples in the paper with actual implementations to improve understanding
- The code is kept as similar in function to the paper as possible
- Variable names may differ from the paper where that improves readability (often the 'type' is used because it is more descriptive)
- These are not the implementations used for the performance tests; those mix C and Python, use more complex Python libraries, use the TypedBytes input format, etc. The performance test code is in the "performance" folder and is less documented than the other examples. It shows how to integrate C code, etc.
- Portions of the algorithms may be 'mocked' out when their functionality is not the focus
- Each example is independent, which makes it easier to understand at the expense of duplicated code
- A copy of the paper is provided in the project root
- Any external test data is in the "input" folder in the project root

Requirements
- python (2.6.5)
- hadoopy (0.1)
- numpy (1.3.0)
- PIL (1.1.7)
- nose (0.11.1)

Additional Requirements (if you want to run Hadoop cluster examples)
- cxfreeze (4.0.1)
- hadoop (Cloudera CDH3 0.20.2+228)

Running Tests
If nose is installed, you can run "nosetests" from the project root. Otherwise, each test can be run individually from the project root.
For example (in a Bash shell at the project root):
$ python kmeans/
Ran 2 tests in 0.013s
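
As an illustration of a test file that works both ways (collected by nose, or run directly with python), here is a minimal sketch. It is not taken from this repository: the helper nearest_cluster, the class TestNearestCluster, and the function run_suite are hypothetical names chosen for the example, and the test file layout in the repository may differ.

```python
import unittest


def nearest_cluster(point, clusters):
    """Return the index of the cluster center closest to point.

    Hypothetical helper used only to give the tests something to check.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length sequences.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(clusters)), key=lambda i: sq_dist(point, clusters[i]))


class TestNearestCluster(unittest.TestCase):
    # nose collects unittest.TestCase subclasses, so no nose import is needed.

    def test_picks_closest_center(self):
        clusters = [(0.0, 0.0), (10.0, 10.0)]
        self.assertEqual(nearest_cluster((1.0, 2.0), clusters), 0)
        self.assertEqual(nearest_cluster((9.0, 8.0), clusters), 1)

    def test_single_cluster(self):
        self.assertEqual(nearest_cluster((5.0, 5.0), [(0.0, 0.0)]), 0)


def run_suite():
    """Run the tests above without nose and return the unittest result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestNearestCluster)
    return unittest.TextTestRunner(verbosity=1).run(suite)


if __name__ == "__main__":
    # Lets the file be run on its own with the python interpreter.
    run_suite()
```

The runner function is deliberately not named with "test" in it, so that nose does not pick it up as a test of its own.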


Mapping from paper Algorithm #'s to code
                              Written  PEP8  Tests  Hadoop Example Run/Data
Algorithm 1:  wordcount/      X        X     X
Algorithm 2:  normalize/      X        X     X
Algorithm 3:  classtrain/     X        X     X
Algorithm 4:  slidingwindow/  X        X     X
Algorithm 5:  kmeans/         X        X     X
Algorithm 6:  kmeans/         X        X     X
Algorithm 7:  bof/            X        X     X
Algorithm 8:  bof/            X        X     X
Algorithm 9:  bgsub/          X        X     X
Algorithm 10: compose/        X        X     X
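
To give a feel for the shape of the simplest entry in this mapping (Algorithm 1, word count), here is a minimal sketch of a mapper and reducer written as plain Python generators, with a small in-process driver standing in for Hadoop's shuffle and sort. This is an illustration under stated assumptions, not the code in wordcount/: the names mapper, reducer, and run_local are hypothetical, and no hadoopy calls are used.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Emit (word, 1) for each whitespace-separated word in a line of text."""
    for word in value.split():
        yield word, 1


def reducer(key, values):
    """Sum the counts emitted for one word."""
    yield key, sum(values)


def run_local(records):
    """Simulate map, shuffle/sort, and reduce in a single process.

    records is an iterable of (key, line) pairs. This driver is an
    assumption made for the sketch; on a cluster, Hadoop does this step.
    """
    mapped = [kv for key, value in records for kv in mapper(key, value)]
    mapped.sort(key=itemgetter(0))  # group equal words next to each other
    output = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(word, (count for _, count in group)))
    return output


if __name__ == "__main__":
    lines = [(0, "a b a"), (1, "b c")]
    for word, count in run_local(lines):
        print("%s\t%d" % (word, count))
```

Keeping the mapper and reducer as ordinary generator functions means they can be exercised in unit tests without a Hadoop installation.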

If you use this in your research, please cite it as:
B. White, T. Yeh, J. Lin, and L. Davis, "Web-scale computer vision using MapReduce for multimedia data mining," in MDMKDD, 2010.

author = {Brandyn White and Tom Yeh and Jimmy Lin and Larry Davis},
booktitle = {MDMKDD},
title = {Web-Scale Computer Vision using MapReduce for Multimedia Data Mining},
year = {2010}