Cell, Java and C# code for a program that manipulates a moderately sized dataset (about 140 MB in CSV format). It is meant to compare the relative performance of the three languages for data-intensive applications. For more information check this link.
You'll need a recent version of the Java SDK in order to build the Cell and Java versions, and a recent version of the .NET Core SDK for the C# one. To build the Cell version, just type make imdb.jar; for the Java version, make imdb-java.jar; and for the C# one, make imdb.dll.
There are four sets of tests that are common to all languages, and the command-line syntax is the same for all of them. The first one loads the data from the set of CSV files into memory. Here's how you run the Cell version:
$ java -jar imdb.jar -l 10 dataset/
2072, 4642, 303, 844, 660, 10474
1094, 2490, 259, 449, 377, 8405
382, 2154, 51, 229, 263, 7810
695, 1847, 104, 225, 226, 8451
414, 2341, 50, 227, 242, 7846
414, 2093, 51, 315, 181, 7853
388, 2202, 51, 225, 245, 8356
452, 1992, 50, 235, 307, 8390
447, 1812, 136, 227, 180, 7830
401, 2012, 50, 224, 275, 8036
This is the Java version:
$ java -jar imdb-java.jar -l 10 dataset/
918, 5588, 73, 312, 240, 7044
269, 4132, 62, 265, 195, 4302
740, 2026, 60, 161, 78, 7357
134, 2105, 26, 1372, 78, 4077
136, 3100, 26, 143, 78, 4016
134, 1735, 882, 140, 78, 8285
133, 3136, 26, 143, 78, 4086
132, 2850, 26, 147, 77, 4064
577, 2288, 26, 170, 855, 6349
133, 3894, 26, 185, 79, 4207
And this is C#:
$ dotnet imdb.dll -l 10 dataset/
456, 4028, 36, 216, 168, 9428
1040, 6636, 36, 212, 168, 8728
808, 4112, 36, 200, 120, 8736
1168, 2884, 56, 220, 168, 8984
1820, 3428, 40, 204, 152, 9977
1164, 6964, 364, 208, 776, 7620
1204, 3628, 52, 1064, 164, 8636
292, 4032, 916, 200, 128, 11352
1172, 5456, 40, 1916, 568, 11664
1000, 6536, 36, 212, 1004, 7920
Each column in the output represents a specific subtest (see the link above for details) and each row a single test run.
The first argument, -l, selects which of the four sets of tests to run. The second one is the number of times the test has to be repeated (the JVM needs some time to warm up, so the first few runs of the test are slower; the .NET runtime, on the other hand, doesn't seem to need any warm-up). The last argument is the directory where the CSV files are located.
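Since the first few JVM runs are slower, it can help to average each subtest over the later runs only. The following is a minimal sketch, not part of the repository: col_means is a hypothetical helper name, and the number of runs to discard is an assumption you should pick by looking at your own output.

```shell
# Hypothetical helper (not part of this repository).
# Reads comma-separated timings from stdin, one test run per row,
# skips the first $1 rows (assumed warm-up runs) and prints the
# mean of every column over the remaining rows.
col_means() {
  awk -F', *' -v skip="${1:-0}" '
    NR > skip {
      for (i = 1; i <= NF; i++) sum[i] += $i
      if (NF > cols) cols = NF
      rows++
    }
    END {
      if (rows == 0) exit 1          # nothing left after skipping
      for (i = 1; i <= cols; i++)
        printf "%s%.1f", (i > 1 ? ", " : ""), sum[i] / rows
      printf "\n"
    }'
}
```

For example, java -jar imdb.jar -l 10 dataset/ | col_means 3 would print one averaged row for the last seven runs. The result is in whatever unit the program itself prints.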
To see the other command line options, just invoke any version of the application without arguments:
$ java -jar imdb-java.jar
Usage: java -jar imdb-java.jar [-l|-u|-q|-uq] <repetitions> <input directory>
-l load dataset only
-u run updates
-q run queries
-uq run queries on updated dataset
Check the link at the top of the page for details.
There's also a fourth version of the application, which is meant to test the performance of the code generated by the Cell compiler when embedded into a Java application. To build it, type make imdb-embedded.jar. It supports the same four basic sets of tests as the other versions, but it also has a few extra ones:
$ java -jar imdb-embedded.jar
Usage: java -jar imdb-embedded.jar [-l|-u|-q|-uq] <repetitions> <input directory>
-l load dataset from csv files
-u run updates
-q run queries
-uq run queries on updated dataset
or: java -jar imdb-embedded.jar [-w|-uw] <repetitions> <input directory> <output file>
-w load dataset and write state to specified output file
-uw load dataset and write updated state to specified output file
or: java -jar imdb-embedded.jar [-r] <repetitions> <input file>
-r read a previously saved (with the -w or -uw options) state
This is how you can test how long it takes to save the entire dataset (in the standard Cell textual format) to a file:
$ java -jar imdb-embedded.jar -w 1 dataset/ imdb.txt
12095
Use the -uw option to save the dataset after running the standard set of updates:
$ java -jar imdb-embedded.jar -uw 1 dataset/ imdb-small.txt
5792
To test how long it takes to load a previously saved dataset, use the -r option:
$ taskset -c 1 java -jar imdb-embedded.jar -r 1 imdb.txt
25742
In all cases, you can run the tests more than once to see what performance is like once the JVM has warmed up:
$ java -jar imdb-embedded.jar -uw 10 dataset/ imdb-small.txt
5607
5578
2925
2904
3004
2982
2949
2904
3215
3214
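To reduce a run like this to a single warmed-up figure, one option is to take the median of the later runs, which is less sensitive to an occasional slow run than the mean. This is only a sketch: median_after is a hypothetical helper, not something the repository provides, and how many initial runs count as warm-up is an assumption.

```shell
# Hypothetical helper (not part of this repository).
# Reads one timing per line from stdin, drops the first $1 lines
# (assumed warm-up runs) and prints the median of the rest.
median_after() {
  tail -n "+$(( ${1:-0} + 1 ))" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1            # nothing left after skipping
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

It can be used as java -jar imdb-embedded.jar -uw 10 dataset/ imdb-small.txt | median_after 2. Applied to the ten timings listed above with the first two discarded, it gives 2965.5.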