IMDB database benchmarks

Cell, Java and C# code for a program that manipulates a moderately sized dataset (about 140 MB in CSV format). It is meant to compare the relative performance of the three languages for data-intensive applications. For more information check this link.

Building

You'll need a recent version of the Java SDK to build the Cell and Java versions, and a recent version of the .NET Core SDK for the C# one. To build the Cell version, just type make imdb.jar; for the Java version, make imdb-java.jar; and for the C# one, make imdb.dll.

Running the benchmarks

There are four sets of tests that are common to all languages, and the command line syntax is the same for all of them. The first set loads the data from the CSV files into memory. Here's how you run the Cell version:

    $ java -jar imdb.jar -l 10 dataset/
     2072, 4642, 303, 844, 660, 10474
     1094, 2490, 259, 449, 377,  8405
      382, 2154,  51, 229, 263,  7810
      695, 1847, 104, 225, 226,  8451
      414, 2341,  50, 227, 242,  7846
      414, 2093,  51, 315, 181,  7853
      388, 2202,  51, 225, 245,  8356
      452, 1992,  50, 235, 307,  8390
      447, 1812, 136, 227, 180,  7830
      401, 2012,  50, 224, 275,  8036

This is Java:

    $ java -jar imdb-java.jar -l 10 dataset/
      918, 5588,   73,  312,  240, 7044
      269, 4132,   62,  265,  195, 4302
      740, 2026,   60,  161,   78, 7357
      134, 2105,   26, 1372,   78, 4077
      136, 3100,   26,  143,   78, 4016
      134, 1735,  882,  140,   78, 8285
      133, 3136,   26,  143,   78, 4086
      132, 2850,   26,  147,   77, 4064
      577, 2288,   26,  170,  855, 6349
      133, 3894,   26,  185,   79, 4207

And this is C#:

    $ dotnet imdb.dll -l 10 dataset/
      456, 4028,   36,  216,  168,  9428
     1040, 6636,   36,  212,  168,  8728
      808, 4112,   36,  200,  120,  8736
     1168, 2884,   56,  220,  168,  8984
     1820, 3428,   40,  204,  152,  9977
     1164, 6964,  364,  208,  776,  7620
     1204, 3628,   52, 1064,  164,  8636
      292, 4032,  916,  200,  128, 11352
     1172, 5456,   40, 1916,  568, 11664
     1000, 6536,   36,  212, 1004,  7920

Each column in the output represents a specific subtest (see the link above for details) and each row a single test run.

The first argument, -l, selects which of the four sets of tests to run. The second is the number of times the test is repeated: the JVM needs some time to warm up, so the first few runs are slower, whereas the .NET runtime doesn't seem to need any warm-up. The last argument is the directory where the CSV files are located.
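
Because the first few runs include warm-up, it can help to summarize only the later rows. The following is a minimal Python sketch, not part of this repository, that parses output in the row-per-run format shown above and takes the per-column median after discarding some initial runs. The function names and the choice of how many runs to discard are assumptions made for illustration; the sample rows are the first five from the Cell run above.

```python
from statistics import median

def parse_runs(output):
    """Parse benchmark output: one row per run, one comma-separated timing per subtest."""
    runs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and an echoed command line, if present
        runs.append([int(field) for field in line.split(",")])
    return runs

def steady_state_medians(runs, warmup=3):
    """Median of each column, ignoring the first `warmup` runs (an assumed cutoff)."""
    kept = runs[warmup:]
    if not kept:
        raise ValueError("no runs left after discarding warm-up")
    # zip(*kept) turns the list of rows into a sequence of columns
    return [median(column) for column in zip(*kept)]

# First five rows of the Cell output shown earlier in this README.
SAMPLE = """\
 2072, 4642, 303, 844, 660, 10474
 1094, 2490, 259, 449, 377,  8405
  382, 2154,  51, 229, 263,  7810
  695, 1847, 104, 225, 226,  8451
  414, 2341,  50, 227, 242,  7846
"""

if __name__ == "__main__":
    print(steady_state_medians(parse_runs(SAMPLE), warmup=2))
    # prints [414, 2154, 51, 227, 242, 7846]
```

The median is used rather than the mean so that an occasional outlier run does not dominate the summary; that is a design choice of this sketch, not something the benchmarks prescribe.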

To see the other command line options, just invoke any version of the application without arguments:

    $ java -jar imdb-java.jar
    Usage: java -jar imdb-java.jar [-l|-u|-q|-uq] <repetitions> <input directory>
      -l   load dataset only
      -u   run updates
      -q   run queries
      -uq  run queries on updated dataset

Check the link at the top of the page for details.
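
Since the syntax is the same for every version, the four common test sets can be driven from a small script. The sketch below is a hypothetical helper, not something shipped with this repository: it assembles the command lines from the launch commands and flags shown above and runs them with Python's `subprocess` module. It assumes the three artifacts have already been built and that the script is started from the repository root; the names `LAUNCHERS`, `build_command` and `run_all` are made up for illustration.

```python
import subprocess

# Launch prefixes copied from the examples in this README.
LAUNCHERS = {
    "cell": ["java", "-jar", "imdb.jar"],
    "java": ["java", "-jar", "imdb-java.jar"],
    "csharp": ["dotnet", "imdb.dll"],
}

# The four test sets common to all versions, as listed in the usage message.
TEST_SETS = ["-l", "-u", "-q", "-uq"]

def build_command(version, test_set, repetitions, dataset_dir):
    """Return the argument list for one benchmark invocation."""
    if version not in LAUNCHERS:
        raise ValueError(f"unknown version: {version}")
    if test_set not in TEST_SETS:
        raise ValueError(f"unknown test set: {test_set}")
    return LAUNCHERS[version] + [test_set, str(repetitions), dataset_dir]

def run_all(repetitions=10, dataset_dir="dataset/"):
    """Run every version against every test set and collect the raw output."""
    results = {}
    for version in LAUNCHERS:
        for test_set in TEST_SETS:
            command = build_command(version, test_set, repetitions, dataset_dir)
            completed = subprocess.run(
                command, capture_output=True, text=True, check=True
            )
            results[(version, test_set)] = completed.stdout
    return results

if __name__ == "__main__":
    for key, output in run_all().items():
        print(key)
        print(output)
```

Running the versions one after another, rather than in parallel, keeps them from competing for CPU and memory while being timed.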

Loading and storing the application state

There's also a fourth version of the application that is meant to test the performance of the code generated by the Cell compiler when embedded into a Java application. To build it, type make imdb-embedded.jar. It supports the same four basic sets of tests as the other versions, but it also has a few extra ones:

    $ java -jar imdb-embedded.jar 
    Usage: java -jar imdb-embedded.jar [-l|-u|-q|-uq] <repetitions> <input directory>
      -l   load dataset from csv files
      -u   run updates
      -q   run queries
      -uq  run queries on updated dataset
    
    or: java -jar imdb-embedded.jar [-w|-uw] <repetitions> <input directory> <output file>
      -w   load dataset and write state to specified output file
      -uw  load dataset and write updated state to specified output file
    
    or: java -jar imdb-embedded.jar [-r] <repetitions> <input file>
      -r   read a previously saved (with the -w or -uw options) state

This is how you can test how long it takes to save the entire dataset (in the standard Cell textual format) to a file:

    $ java -jar imdb-embedded.jar -w 1 dataset/ imdb.txt
     12095

Use the -uw option to save the dataset after running the standard set of updates:

    $ java -jar imdb-embedded.jar -uw 1 dataset/ imdb-small.txt
      5792

To test how long it takes to load a previously saved dataset, use the -r option:

    $ taskset -c 1 java -jar imdb-embedded.jar -r 1 imdb.txt
     25742

In all cases, you can run the tests more than once to see what performance is like once the JVM has warmed up:

    $ java -jar imdb-embedded.jar -uw 10 dataset/ imdb-small.txt
      5607
      5578
      2925
      2904
      3004
      2982
      2949
      2904
      3215
      3214
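
To estimate where warm-up ends in a single-column series like the one above, one rough heuristic is to take the median of the second half of the runs as a reference and walk backwards until a run falls outside some tolerance of it. The Python sketch below illustrates that idea; the function name and the 15% tolerance are arbitrary assumptions, and this is not how the benchmarks themselves treat warm-up.

```python
from statistics import median

def first_steady_run(timings, tolerance=0.15):
    """Return the 0-based index of the first run from which all later runs
    stay within `tolerance` (as a fraction) of the steady-state reference."""
    if not timings:
        raise ValueError("timings must not be empty")
    # Reference value: median of the second half, assumed to be past warm-up.
    reference = median(timings[len(timings) // 2:])
    steady_from = len(timings)
    # Walk backwards from the last run until one falls outside the tolerance.
    for i in range(len(timings) - 1, -1, -1):
        if abs(timings[i] - reference) > tolerance * reference:
            break
        steady_from = i
    return steady_from

# Timings from the -uw run with 10 repetitions shown above.
RUNS = [5607, 5578, 2925, 2904, 3004, 2982, 2949, 2904, 3215, 3214]

if __name__ == "__main__":
    print(first_steady_run(RUNS))  # prints 2, i.e. the third run
```

For this particular series the first two runs are classed as warm-up, which matches the visible drop from about 5600 to about 2900 between the second and third rows.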