Permalink
Switch branches/tags
Nothing to show
Find file Copy path
7eb65c0 Mar 27, 2018
2 contributors

Users who have contributed to this file

@john5f35 @RuiChen08
330 lines (242 sloc) 12.7 KB
dacapo-9.12-bach-MR1 RELEASE NOTES 2018-01-13
This is a maintenance release of the second major release of the
DaCapo benchmark suite.
These notes are structured as follows:
1. Overview
2. Usage
3. Changes
4. Known problems and limitations
5. Contributions and Acknowledgements
1. Overview
-----------
The DaCapo benchmark suite is slated to be updated every few years.
The 9.12 release is the first major update of the suite, and is
strictly incompatible with previous releases: new benchmarks have been
added, old benchmarks have been removed, all other benchmarks have
been substantially updated and the inputs have changed for every
program. It is for this reason that in any published use of the suite,
the version of the suite must be explicitly stated.
The release sees the retirement of a number of single-threaded
benchmarks (antlr, bloat and chart), the replacement of hsqldb by
h2, the addition of six completely new benchmarks, and the upgrade
of all other benchmarks to reflect the current release state of the
applications from which the benchmarks were derived. These changes are
consistent with the original goals of the DaCapo project, which
include the desire for the suite to remain relevant and reflect the
current state of deployed Java applications.
2. Usage
--------
2.1 Downloading
o Download the binary jar and/or source zip from:
https://sourceforge.net/projects/dacapobench/files/
Note that as of 9.12-MR1, there are two binary releases; the
default, and one for Java 6 JVMs. The default is built on Java 8
and runs on Java 8 (with the exception of tomcat, see note
below). The Java 6 jar is built on Java 6 for Java 6.
o Access the source from git via
https://github.com/dacapobench/dacapobench
2.2 Running
o It is essential that you read and observe the usage guidelines
that appear in README.md
o Run a benchmark:
java -jar <dacapo-jar-name>.jar <benchmark>
o For usage information, run with no arguments.
2.3 Building
o You must have a working, recent version of ant installed. Change
to the benchmarks directory and then run:
ant -p
for instructions on how to build.
3. Changes
----------
3.0. Changes introduced by 9.13
Benchmark updates:
- avrora: 20091224 -> 20131011
- batik: 1.7 -> 1.9
- eclipse: 3.5.1 -> 4.7.2
- fop: 0.95 -> 2.2
- h2: 1.5 -> 1.8
- jython: 2.5.2 -> 2.7.1
- pmd: 4.2.5 -> 6.1.0
- tomcat: 6.0.2 -> 9.0.2
- daytrader: 2.4.1 -> 3.0-SNAPSHOT
- Apache Geronimo: 2.4.1 -> 3.0.1
- xalan: 2.7.1 -> 2.7.2
- lucene: 2.4.1 -> 7.1
Shipped tooling update:
- Maven: 2.1.0 -> 3.5.2
3.1. Changes introduced by 9.12-MR1 (since 9.12)
lusearch-fix is introduced as a new benchmark. The lusearch-fix
and lusearch benchmarks differ by a single line of code. This
is a bug fix to lucene, which dramatically changes the performance of
lusearch, reducing the amount of allocation greatly.
https://issues.apache.org/jira/browse/LUCENE-1800
https://dl.acm.org/citation.cfm?id=2048092
We encourage you to use lusearch-fix in place of lusearch. We
retain the unpatched lusearch in this release for historical consistency.
URLs used by the build system have been systematically updated
so that the source distribution works correctly.
Other issues in the source distribution have been fixed to ensure that
the suite builds with Java 8 VMs.
3.2. Benchmark additions since 2006-10-MR2
avrora: AVRORA is a set of simulation and analysis tools in a
framework for AVR micro-controllers. The benchmark
exhibits a great deal of fine-grained concurrency. The
benchmark is courtesy of Ben Titzer (Sun Microsystems)
and was developed at UCLA.
batik: Batik is an SVG toolkit produced by the Apache foundation.
The benchmark renders a number of svg files.
h2: h2 is an in-memory database benchmark, using the
h2 database produced by h2database.com, and executing an
implementation of the TPC-C workload produced by the
Apache foundation for its derby project. h2 replaces
derby, which in turn replaced hsqldb.
sunflow: Sunflow is a raytracing rendering system for photo-realistic
images.
tomcat: Tomcat uses the Apache Tomcat servelet container to run
some sample web applications.
tradebeans: Tradebeans runs the Apache daytrader workload "directly"
(via EJB) within a Geronimo application server. Daytrader
is derived from the IBM Trade6 benchmark.
tradesoap: Tradesoap is identical to the tradebeans workload, except
that client/server communications is via soap protocols
(and the workloads are reduced in size to compensate the
substantially higher overhead).
Note that tradebeans and tradesoap were intentionally
added as a pair to allow researchers to evaluate and
analyze the overheads and behavior of communicating
through a protocol such as SOAP. Tradesoap's "large"
configuration uses exactly the same workload as
tradebeans' "default" configuration, and tradesoap's
"huge" uses exactly the same workload as tradebeans'
"large", allowing researchers to directly compare the
two systems.
3.3. Benchmark deletions
antlr: Antlr is single threaded and highly repetitive. The
most recent version of jython uses antlr; so antlr
remains represented within the DaCapo suite.
bloat: Bloat is not as widely used as our other workloads
and the code exhibited some pathologies that were
arguably not representative or desirable in a suite that
was to be representative of modern Java applications.
chart: Chart was repetitive and used a framework that appears
not to be as widely used as most of the other DaCapo
benchmarks. The Batik workload has some similarities
with chart (both are render vector graphics), but is
part of a larger heavily used framework from Apache.
derby: Derby has been replaced by h2, which runs a much
richer workload and uses a more widely used and higher
performing database engine (derby was not in any
previous release, but had been slated for inclusion in
this release).
hsqldb: Hsqldb has been replaced by h2, which runs a much
richer workload and uses a more widely used and higher
performing database engine.
3.4. Benchmark updates
All other benchmarks have been updated to reflect the latest release
of the underlying application.
3.5. Other Notable Changes
The packaging of the DaCapo suite has been completely re-worked and
the source code is entirely re-organized.
We've changed the naming scheme for the releases. Rather than
"dacapo-YYYY-MM", we've moved to "dacapo-Y.M-TAG", where TAG is a
nickname for the release. Given the theme for this project, we're
using musical names, and since this release is our second, we've given
this one the nick-name "bach". The release can therefore be referred
to by its nickname, which rolls off the tounge a little more easily
than our old names. Of course we've borrowed this scheme from other
projects (such as Ubuntu) which follow a similar pattern.
The command-line arguments have be rationalized and now follow
posix conventions.
Threading has been rationalized. Benchmarks are now characterized
in terms of their external and internal concurrency. (For example
a benchmark such as eclipse is single-threaded externally, but
internally uses a thread pool). All benchmarks which are externally
multi-threaded now by default run a number of threads scaled to
match the available processors, and the number of externally defined
threads may also be configured via the "-t" and "-k" command line
options which specify, respectively the absolute number of external
threads and a multiplier against the number of available processors.
Some benchmarks are both internally and externally multithreaded,
such as tradebeans and tradesoap, where the number of client threads
may be specified externally, but the number of server threads is
determined within the server, and cannot be directly controlled by
the user.
We have introduced a "huge" size for a number of benchmarks, which
scales the workload to run for much longer and consume significant
memory. We have also retired "large" sizes for some benchmarks where
"large" was not distinctly different from "default". Thus there are
now four sizes: "small", "default", "large", and "huge", and "large"
and "huge" are only available for some benchmarks. If you attempt
to run a benchmark at an unsupported size you will get an error
message.
4. Known Issues
---------------
Please consult the bug tracker for a complete and up-to-date list of
known issues (https://github.com/dacapobench/dacapobench/issues).
DaCapo is an open source community project. We welcome all assistance
in addressing bugs and shortcomings in the suite.
A few notable unresolved high priority issues are listed here:
4.1 Socket use by tradebeans, tradesoap and tomcat
Each of these benchmarks use sockets to communicate between their
clients and server. We have observed that connections are used very
liberally (we have seen more than 64,000 connections in use when
running tradebeans in its "huge" configuration, according to netstat).
We believe that this phenomena can lead to spurious failures,
particularly on tradesoap, where the benchmark fails with an error
message that indicates a garbled bean (stock name seen when userid
expected). At the time of writing, we believe these issues are
platform-sensitive and are due to the underlying systems rather than
our particular use of them. As with all issues, we welcome feedback
and fixes from the community.
4.2 Tomcat
Tomcat remains less interesting than we would have liked. Performance
results show that tomcat currently has a remarkably flat warm-up curve
when compared to other benchmarks.
Furthermore, the version of tomcat (6.0.20) used by dacapo 9.12 does
not work with OpenJDK since 8u77.
https://bugs.openjdk.java.net/browse/JDK-8155588
4.3 Validation
Validation continues to use summarization via a checksum, so we are
unable to provide a diff between expected and actual output in the
case of failure. We hope to update this, and welcome community
contributions.
4.4 Support for whole-program static analysis
Despite significant help from the community, we have had to drop
support for whole-program static analysis that was available in the
last major release. The main reason for this is that the more
systematic and extensive use of reflection and the enormous internal
complexity of workloads such as tradebeans and tradesoap has made it
very difficult to produce a straightforward mechanism that would
facilitate such analyses. While we regret this omission, such an
addition should have no effect on the workloads themselves.
Therefore, if the community is able to contribute enhancements or
extensions to the suite that facilitate such static analysis, we
should be able to include such a contribution in a maintenance
release, rather than having to wait for the next major release of the
DaCapo benchmark suite.
5. Contributions and Acknowledgements
-------------------------------------
The generous financial support of Intel was crucial to the successful
completion of the 9.12 release, and the generous financial support
of Oracle led to the 9.12 MR1 release.
The production of the 9.12 release was jointly led by:
Steve Blackburn, Australian National University
Kathryn S McKinley, University of Texas at Austin
The 9.12 release of the DaCapo suite was developed primarily by:
Steve Blackburn, Australian National University
Daniel Frampton, Australian National University
Robin Garner, Australian National University
John Zigman, Australian National University
The 9.12 MR1 release was developed primarily by:
Rui Chen, Australian National University
John Zhang, Australian National University
We receieved considerable assistance from a number of people,
including:
Jon Bell
Eric Bodden, Technische Universität Darmstadt
Sam Guyer, Tufts
Chris Kulla
Nick Mitchell, IBM
Gary Sevitsky, IBM
Ben Titzer, Sun Microsystems
Many other people provided valuable feedback, bug fixes and advice.