add upstream ramspeed 3.5.0 source code

cruvolo · Jul 11, 2018 · commit 6af3330
Showing 37 changed files with 35,003 additions and 0 deletions.
diff --git a/HISTORY b/HISTORY
@@ -0,0 +1,74 @@
+v3.5.0
+10th of August, 2009
+- MMX and SSE memory arrays were forced to align on 4Kb page size boundary
+- some enhancements and optimisations (base source, i386 and amd64 assembly)
+
+v3.4.1
+1st of November, 2007
+- performance improvements for non-temporal MMXmem and SSEmem
+- several small bugs were eliminated
+
+v3.4.0
+1st of September, 2007
+- non-temporal MMX and SSE benchmarks were written (i386 and amd64 assembly) 
+
+v3.3.1
+19th of May, 2006
+- cosmetic changes
+
+v3.3.0
+26th of October, 2005
+- from now on distributed under the terms of The Alasir Licence
+- new build system was introduced
+- INT*, FLOAT*, MMX* and SSE* benchmarks were written in amd64 assembly
+- i386 assembly benchmarks were tuned a little
+
+v2.3.1 and v3.2.1
+22nd of January, 2005
+- cosmetic changes
+
+v2.3.0 and v3.2.0
+12th of October, 2004
+- INT* and FLOAT* benchmarks were written in alpha assembly
+- most C and all i386 assembly sources were rewritten
+
+v2.2.0 and v3.1.0
+17th of September, 2004
+- SSEmark and SSEmem were written (i386 assembly)
+- minor changes in most benchmarking routines
+
+v2.1.0 and v3.0.0
+29th of August, 2004
+- MMXmark and MMXmem were written (i386 assembly)
+- main() was redesigned and advanced
+
+v2.0.1
+28th of May, 2004
+- a little update
+
+v2.0.0
+25th of March, 2004
+- everything was rewritten and optimised
+- benchmark routines were also coded in i386 assembly
+
+v1.12
+4th of March, 2004
+- unneeded const and volatile declarations were removed
+
+v1.11
+16th of February, 2004
+- ambiguous declarations in FLOATmark were fixed
+
+v1.10
+10th of February, 2004
+- main() was reshaped significantly
+- LongRun mode was implemented
+- general code clean-up
+
+v1.01
+16th of November, 2003
+- output was reformatted
+
+v1.00
+15th of July, 2003
+- initial public release
diff --git a/LICENCE b/LICENCE
@@ -0,0 +1,48 @@
+
+                              The Alasir Licence
+
+
+    This is a free software. It's provided as-is and carries absolutely no
+warranty or responsibility by the author and the contributors, neither in
+general nor in particular. No matter if this software is able or unable to
+cause any damage to your or third party's computer hardware, software, or any
+other asset available, neither the author nor a separate contributor may be
+found liable for any harm or its consequences resulting from either proper or
+improper use of the software, even if advised of the possibility of certain
+injury as such and so forth.
+
+    The software isn't a public domain, it's a copyrighted one. In no event
+shall the author's or a separate contributor's copyright be denied or violated
+otherwise. No copyright may be removed unless together with the code
+contributed to the software by a holder of the respective copyright. A
+copyright itself indicates the rights of ownership over the code contributed.
+Back and forth, the author is defined as the one who holds the oldest
+copyright over the software. Furthermore, the software is defined as either
+source or binary computer code, which is organised in the form of a single
+computer file usually.
+
+    The software (the whole or a part of it) is prohibited from being sold or
+leased in any form or manner with the only possible exceptions:
+
+a) money may be charged for a physical medium used to transfer the software;
+b) money may be charged for optional warranty or support services related to
+   the software.
+
+    Nevertheless, if the software (the whole or a part of it) is desired to
+become an object of sale or lease (the whole or a part of it), then a separate
+non-exclusive licence agreement must be negotiated from the author. Benefits
+accrued should be distributed between the contributors or likewise at the
+author's option.
+
+    Whenever and wherever the software is distributed, in either source or
+binary form, either in whole or in part, it must include the complete
+unchanged text of this licence agreement unless different conditions have been
+negotiated. In case of a binary-only distribution, the names of the copyright
+holders must be mentioned in the documentation supplied with the software.
+This is supposed to protect rights and freedom of those who have contributed
+their time and labour to free software development, because otherwise the
+development itself and this licence agreement are of a very little sense.
+
+    Nothing else but this licence agreement grants you rights to use, modify
+and distribute the software. Any violation of this licence agreement is
+recognised as an action prohibited by an applicable legislation.
diff --git a/README b/README
@@ -0,0 +1,268 @@
+
+RAMspeed/SMP, a cache and memory benchmarking tool
+
+(for multiprocessor machines running UNIX-like operating systems)
+
+v3.5.0
+
+August, 2009
+
+
+This command line utility measures effective bandwidth of both cache and memory
+subsystems. It has been written entirely in C for portability purposes, though
+benchmark routines are also available in several assembly languages for
+performance reasons. So far, it's known to compile and run on the following
+operating systems and hardware platforms with assembly-level optimisations:
+
+* Linux (i386, amd64, alpha)
+* FreeBSD (i386, amd64, alpha)
+* NetBSD (i386, amd64, alpha)
+* Digital UNIX (alpha)
+
+Digital UNIX is also known as Digital OSF/1 and Compaq (HP) Tru64 UNIX.
+
+RAMspeed/SMP v3.x.x is a multiprocessed application utilising System V shared
+memory for IPC (Inter-Process Communication). RAMspeed/SMP v2.x.x was a POSIX
+multithreaded application which is no longer developed for compatibility and
+performance reasons.
+
+
+GENERAL INFORMATION
+
+The software consists of two major components:
+
+1) INTmark and FLOATmark measure the maximum possible cache and memory
+performance while reading and writing blocks of data (starting from 1Kb and
+growing in powers of 2) continuously through the ALU and FPU respectively. All
+data streams are linear (sequential) to achieve the maximal performance. In
+other words, these benchmarks make it possible to determine the real bandwidth
+of cache and memory subsystems regardless of what manufacturers advertise.
+
+2) INTmem and FLOATmem are synthetic simulations, but tied closely to the
+real world of computing. Each consists of four subtests (Copy, Scale, Add,
+Triad) to measure different aspects of memory performance. It's important to
+realise that even if particular hardware offers very good linear read\write
+results, it may (or may not) deliver much worse results while switching
+continuously between read and write operations as real-life software does.
+These benchmarks are highly sensitive to memory latencies of any kind.
+
+Copy is the simplest among them. It just transfers data from one memory
+location to another, i. e. copies it (A = B).
+
+Scale is a little more advanced. It modifies the data before writing by
+multiplying with a certain constant value, i. e. scales it (A = m*B).
+
+Add reads data from the first memory location, then reads from the second, adds
+them up and writes the result to the third place (A = B + C).
+
+Triad is a merge of Add and Scale. It reads data from the first memory
+location, scales it, then adds data from the second one and writes to the third
+place (A = m*B + C).
+
+There are also MMXmark with MMXmem and SSEmark with SSEmem serving the same
+purpose as explained above but utilising the MMX and SSE instruction sets and
+respective registers. In general, they're supposed to be better performers
+than INTmark\INTmem and FLOATmark\FLOATmem. Of course, they're available for
+i386 and amd64 only.
+
+Non-temporal versions of MMXmark\MMXmem and SSEmark\SSEmem have been supported
+since v3.4.0 of this UNIX/SMP port. They minimise cache pollution on memory
+reads and eliminate it completely on writes. In addition, they operate with a
+built-in aggressive data prefetching algorithm. As a result, they offer
+significant performance improvements over regular MMX and SSE benchmarks. In
+some cases, non-temporal MMXmark and SSEmark can deliver almost 100% of
+theoretical bandwidth while reading. However, the non-temporal MMX benchmarks
+require support for the Extended MMX instruction set (MMX+), which has been
+available since Intel Pentium III and AMD K6-2+ processors.
+
+INTmark\INTmem transfer data in either doublewords (32 bits) or quadwords
+(64 bits) which is hardware platform dependent. FLOATmark\FLOATmem and
+MMXmark\MMXmem utilise quadwords, SSEmark and SSEmem -- octawords (128 bits).
+For data calculations, MMXmem benchmarks prefer packed words, SSEmem ones --
+packed doublewords. FLOATmark\FLOATmem require a real floating-point unit or
+mathprocessor installed, though some fast emulator might be an acceptable
+solution as well, but that's a whole different story. Other benchmarks
+utilise floating-point capabilities for result calculations only. SSEmark and
+SSEmem require SSE support by both a processor and an operating system.
+
+There is also the BatchRun mode (*mem benchmarks only) known formerly as the
+LongRun mode but renamed to avoid a possible confusion with the power saving
+technology of Transmeta. This mode is designed for high precision benchmarking
+and hardware stressing. When in this mode, benchmarks are run a defined number
+of times with average results calculated and displayed.
+
+
+RUN-TIME OPTIONS
+
+USAGE: ramsmp -b ID [-g size] [-m size] [-l runs] [-p processes]
+-b  runs a specified benchmark (by an ID number):
+     1 -- INTmark [writing]          4 -- FLOATmark [writing]
+     2 -- INTmark [reading]          5 -- FLOATmark [reading]
+     3 -- INTmem                     6 -- FLOATmem
+-g  specifies a # of Gbytes per pass (default is 8)
+-m  specifies a # of Mbytes per array (default is 32)
+-l  enables the BatchRun mode (for *mem benchmarks only),
+    and specifies a # of runs (suggested is 5)
+-p  specifies a # of processes to fork (default is 2)
+-r  displays speeds in real megabytes per second (default: decimal)
+
+The following ID numbers appear if compiled with either the i386 or amd64
+assembly sources:
+
+     7 -- MMXmark [writing]         10 -- SSEmark [writing]
+     8 -- MMXmark [reading]         11 -- SSEmark [reading]
+     9 -- MMXmem                    12 -- SSEmem
+    13 -- MMXmark (nt) [writing]    16 -- SSEmark (nt) [writing]
+    14 -- MMXmark (nt) [reading]    17 -- SSEmark (nt) [reading]
+    15 -- MMXmem (nt)               18 -- SSEmem (nt)
+
+The -b option is required, others are recommended.
+
+See SOFTWARE PREFETCHING below for information on the -t switch.
+
+The -i switch has no benchmarking meaning. It activates the built-in CPUinfo
+library which collects and displays various information about your processor.
+This option is available on i386 only.
+
+Since the very beginning, RAMspeed has calculated and displayed speeds in
+so-called real megabytes per second, each equal to 2^20 (1,048,576) bytes. It
+was considered that memory performance has something to do with operating
+memory size, which is measured in real megabytes, as are internal pass and
+array sizes. However, it seems to be common these days to advertise the size
+of storage devices, bandwidth of networks and so on in so-called decimal
+megabytes, each equal to 10^6 (1,000,000) bytes. Most cache and memory
+benchmarks report their performance in decimal megabytes too. We feel sick of
+arguing, and that's why default behaviour has changed towards decimal data
+since v3.6.0 of this UNIX/SMP port. It is still possible to display output in
+real megabytes per second by using the -r switch. To avoid possible mistakes,
+real megabytes per second are still referred to as Mb/s while decimal
+megabytes per second are displayed as MB/s.
+
+There are no built in logging capabilities, but you may redirect output to a
+file instead of stdout:
+
+./ramsmp [options] > yourcomp.log
+
+Default values of memory array size and pass size do well for a wide range of
+computer hardware, but you may need to decrease them if torturing something
+pretty old, and vice versa, to increase in case of some fast and furious
+equipment.
+
+Note that the *mark benchmarks require [by default] 32Mb of memory array space
+as mentioned above, but the *mem ones demand two to three times more. The
+same applies to pass size.
+
+Don't forget that every process coming up requires additional memory space. In
+other words, the -m32 -p8 setting requires four times more operating memory
+than -m32 -p2 (a gigabyte at least). The number of processes spawned must be a
+power of 2 and must not exceed 256.
+
+
+SOFTWARE PREFETCHING
+
+As mentioned above, non-temporal versions of the MMX and SSE benchmarks
+benefit from the use of software data prefetching. It should be noted that
+the MMX+ instruction set has introduced several instructions for this purpose:
+PREFETCHNTA (prefetch with minimal cache pollution), PREFETCHT0 (prefetch to
+all cache levels), and PREFETCHT1 with PREFETCHT2 which are of almost no use.
+In theory, there is no reason to use T0 prefetching for our benchmarking needs,
+but it has been observed that some memory controllers behave pretty poorly in
+Add and Triad subtests with NTA prefetching enabled. So, it has been decided to
+set up the default settings with NTA prefetching for Copy and Scale, while
+using T0 prefetching for Add and Triad. However, it has been made possible to
+override this decision with the -t switch and to use either NTA or T0 code for
+all four memory subtests:
+
+-t0 (NTA code for Copy and Scale, T0 code for Add and Triad)
+-t1 (NTA code for Copy, Scale, Add and Triad)
+-t2 (T0  code for Copy, Scale, Add and Triad)
+
+Note that this switch applies to MMXmem (nt) and SSEmem (nt) only on i386 and
+amd64. MMXmark (nt) and SSEmark (nt) ignore it and use NTA code always.
+
+
+COMPILATION
+
+The software is known to have no problems with the GNU C compiler (GCC) and the
+GNU assembler (GAS) as well as with the DEC C compiler & assembler. However,
+there should be no problems with other compilers and assemblers (of AT&T style,
+of course).
+
+A new build system has been introduced starting with v3.3.0. Now it isn't a
+Makefile but a shell script which is supposed to be more flexible. In most
+cases, it's enough to run it and follow the options suggested. Sometimes the
+script cannot guess your operating system and/or hardware platform, and thus
+needs a hint passed on the command line. For example, some Linux
+distributions don't define a hardware platform properly, so this issue should
+be worked around, say, this way:
+
+# ./build.sh Linux amd64
+
+There should be no problem adding support for new operating systems and
+hardware platforms in the future. Your feedback is welcome.
+
+If the script fails to detect your environment, it falls back to generic
+settings which imply the C source code only.
+
+
+RESULTS AND COMPARISONS
+
+Results shown are real and may indeed be compared with those obtained from
+other benchmarking titles. There are many of them, and they measure cache and
+memory performance in different ways using different algorithms. The oldest and
+most notable among them is open source STREAM by John D. McCalpin, though there
+are several well known software suites with memory benchmarking capabilities.
+To name a few: SiSoft Sandra by Catalin-Adrian Silasi, EVEREST by Lavalys Inc.
+and ScienceMark by Alexander Goodrich, Tim Wilkens and Sean Stanek, although
+all three are STREAM derivatives in terms of memory benchmarking.
+
+STREAM itself is a very good benchmark. It was used as a reference for INTmem
+and FLOATmem in the past. Although everything has been coded from scratch,
+the idea remains the same. Nevertheless, STREAM has been written in C only.
+It utilises a low pass size, displays the highest results only, operates
+through FPU only, doesn't accept command line parameters and is much less
+accurate overall.
+
+
+ISSUES
+
+Some compilers may optimise the code in such ways that the benchmarks are no
+longer what they are meant to be. For example, GCC 3.x.x optimises the
+floating-point benchmarks by substituting some of their code with the integer
+one. It seems there is no way to work around this issue but to use the assembly
+code.
+
+Sometimes on i386-compatible CPUs write performance of FLOATmark may be better
+than read. That's not a bug but an issue specific to how i387-compatible FPUs
+work, i.e. a data store requires one instruction, whereas a data load requires
+one instruction for the actual load and one instruction to flush a register.
+
+Some CISC processors (Intel 386 to Pentium, AMD 386 to 5x86, Cyrix 486) deliver
+very strange write performance in *mark benchmarks: it's constant all the way
+with no respect to any cache levels and their write policies. These processors
+don't seem to support write allocation or whatever else forces them to
+perform these direct memory writes.
+
+Not really an issue, but results shown may and will differ when received under
+different operating systems, sometimes significantly.
+
+
+UNIX SPECIFIC NOTES
+
+RAMspeed runs well from any system\serial console, though any virtual terminal
+should be all right as well.
+
+It's strongly suggested to reduce background activity before running. Power
+management (APM or ACPI) may produce undesirable effects too.
+
+
+FINAL NOTES
+
+The latest version can always be downloaded from
+
+  http://www.alasir.com/software/ramspeed
+
+Relax & enjoy!
+
+
+PVB
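
The four *mem subtests described in the README above (Copy, Scale, Add, Triad) and its two megabyte conventions can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not the upstream ramspeed code: the function names, the use of `double` elements and the `megabytes_per_second` helper are all hypothetical, and the real benchmarks also exist as hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: these names and signatures are hypothetical and are
 * not taken from the ramspeed sources. */

/* Copy: A = B */
void copy_kernel(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* Scale: A = m*B */
void scale_kernel(double *a, const double *b, double m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = m * b[i];
}

/* Add: A = B + C */
void add_kernel(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Triad: A = m*B + C */
void triad_kernel(double *a, const double *b, const double *c,
                  double m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = m * b[i] + c[i];
}

/* Convert a byte count and an elapsed time into megabytes per second.
 * A non-zero real_megabytes selects 2^20-byte units (the README's "Mb/s"),
 * otherwise 10^6-byte units (the README's "MB/s"). */
double megabytes_per_second(double bytes, double seconds, int real_megabytes)
{
    assert(seconds > 0.0);
    const double unit = real_megabytes ? 1048576.0 : 1000000.0;
    return bytes / seconds / unit;
}
```

For example, `triad_kernel(a, b, c, 3.0, n)` stores `3.0 * b[i] + c[i]` into `a[i]`. Loops this simple are easy for an optimising compiler to transform, which is the concern the README's ISSUES section raises about C-only builds.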