fundata1 README

Karmic Social Capital Benchmark

We provide textual documents in Markdown format. Although they are just plain-text documents, they look best when rendered with Markdown-aware tools -- e.g. on GitHub this README.markdown can be seen at http://github.com/alexy/fundata1.

FunData is a functional data shootout. The current shootout, the very first one, is for processing real-world Twitter data.

The results are in!

fundata1-results.markdown

The specs of the server where the timings were obtained are given further down.

Some interesting lessons are being gathered in

fundata1-lessons.markdown

The data format, as distributed, is described in

fundata1-replier-graph-format.markdown

Getting the data is described in the aptly named

fundata1-getting-the-data.markdown

The question we're solving is computing Khrabrov and Cybenko's Karmic Social Capital (KSC) for all users communicating via Twitter as present in the data. The mathematical definition is in the file

khrabrov-mind-economy-eccs2010.pdf

A textual description of KSC is in

fundata1-khrabrov-karmic-social-capital.markdown

This git repository is in fact an umbrella for three submodules, which comprise the three currently available reference functional implementations of the KSC algorithm. Each of them is also hosted on GitHub, listed here in order of appearance of the target language:

Each of those languages' repos contains further notes on the choices made and the possible improvements available in their respective implementations. Since JVM languages lack an obvious efficient general-purpose serialization, we relax the rules for them a bit.

The machine is a SunFire 4240 server with 64 GB of RAM and 8 CPUs.

The purpose of having separate repos by language is to facilitate forking and improvement of their implementations, potentially beating other languages. You're welcome to supply an implementation of the KSC conforming to the rules in other languages, not necessarily functional.

Some observations on these implementations are posted at functional.tv.

Join the Fundata Google Group to discuss the shootout and provide alternative implementations.

NOTE: I am submitting my Ph.D. in data mining to UPenn/Dartmouth and am looking for a cool job in the Valley/Seattle, hence my bandwidth for improving my own implementations will be somewhat limited for a while into 2011. If you have a self-contained implementation installable on CentOS 5 or Gentoo Prefix, or from source with clear steps, I'd be happy to run it, in the Wide Finder spirit. If you want to speed up the existing implementations, see the TODO.
