Commit
add upstream ramspeed 3.5.0 source code
0 parents
commit 6af3330
Showing 37 changed files with 35,003 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
v3.5.0 | ||
10th of August, 2009 | ||
- MMX and SSE memory arrays were forced to align on 4Kb page size boundary | ||
- some enhancements and optimisations (base source, i386 and amd64 assembly) | ||
|
||
v3.4.1 | ||
1st of November, 2007 | ||
- performance improvements for non-temporal MMXmem and SSEmem | ||
- several small bugs were eliminated | ||
|
||
v3.4.0 | ||
1st of September, 2007 | ||
- non-temporal MMX and SSE benchmarks were written (i386 and amd64 assembly) | ||
|
||
v3.3.1 | ||
19th of May, 2006 | ||
- cosmetic changes | ||
|
||
v3.3.0 | ||
26th of October, 2005 | ||
- from now on distributed under the terms of The Alasir Licence | ||
- new build system was introduced | ||
- INT*, FLOAT*, MMX* and SSE* benchmarks were written in amd64 assembly | ||
- i386 assembly benchmarks were tuned a little | ||
|
||
v2.3.1 and v3.2.1 | ||
22nd of January, 2005 | ||
- cosmetic changes | ||
|
||
v2.3.0 and v3.2.0 | ||
12th of October, 2004 | ||
- INT* and FLOAT* benchmarks were written in alpha assembly | ||
- most C and all i386 assembly sources were rewritten | ||
|
||
v2.2.0 and v3.1.0 | ||
17th of September, 2004 | ||
- SSEmark and SSEmem were written (i386 assembly) | ||
- minor changes in most benchmarking routines | ||
|
||
v2.1.0 and v3.0.0 | ||
29th of August, 2004 | ||
- MMXmark and MMXmem were written (i386 assembly) | ||
- main() was redesigned and advanced | ||
|
||
v2.0.1 | ||
28th of May, 2004 | ||
- a little update | ||
|
||
v2.0.0 | ||
25th of March, 2004 | ||
- everything was rewritten and optimised | ||
- benchmark routines were also coded in i386 assembly | ||
|
||
v1.12 | ||
4th of March, 2004 | ||
- unneeded const and volatile declarations were removed | ||
|
||
v1.11 | ||
16th of February, 2004 | ||
- ambiguous declarations in FLOATmark were fixed | ||
|
||
v1.10 | ||
10th of February, 2004 | ||
- main() was reshaped significantly | ||
- LongRun mode was implemented | ||
- general code clean-up | ||
|
||
v1.01 | ||
16th of November, 2003 | ||
- output was reformatted | ||
|
||
v1.00 | ||
15th of July, 2003 | ||
- initial public release |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
|
||
The Alasir Licence | ||
|
||
|
||
This is a free software. It's provided as-is and carries absolutely no | ||
warranty or responsibility by the author and the contributors, neither in | ||
general nor in particular. No matter if this software is able or unable to | ||
cause any damage to your or third party's computer hardware, software, or any | ||
other asset available, neither the author nor a separate contributor may be | ||
found liable for any harm or its consequences resulting from either proper or | ||
improper use of the software, even if advised of the possibility of certain | ||
injury as such and so forth. | ||
|
||
The software isn't a public domain, it's a copyrighted one. In no event | ||
shall the author's or a separate contributor's copyright be denied or violated | ||
otherwise. No copyright may be removed unless together with the code | ||
contributed to the software by a holder of the respective copyright. A | ||
copyright itself indicates the rights of ownership over the code contributed. | ||
Back and forth, the author is defined as the one who holds the oldest | ||
copyright over the software. Furthermore, the software is defined as either | ||
source or binary computer code, which is organised in the form of a single | ||
computer file usually. | ||
|
||
The software (the whole or a part of it) is prohibited from being sold or | ||
leased in any form or manner with the only possible exceptions: | ||
|
||
a) money may be charged for a physical medium used to transfer the software; | ||
b) money may be charged for optional warranty or support services related to | ||
the software. | ||
|
||
Nevertheless, if the software (the whole or a part of it) is desired to | ||
become an object of sale or lease (the whole or a part of it), then a separate | ||
non-exclusive licence agreement must be negotiated from the author. Benefits | ||
accrued should be distributed between the contributors or likewise at the | ||
author's option. | ||
|
||
Whenever and wherever the software is distributed, in either source or | ||
binary form, either in whole or in part, it must include the complete | ||
unchanged text of this licence agreement unless different conditions have been | ||
negotiated. In case of a binary-only distribution, the names of the copyright | ||
holders must be mentioned in the documentation supplied with the software. | ||
This is supposed to protect rights and freedom of those who have contributed | ||
their time and labour to free software development, because otherwise the | ||
development itself and this licence agreement are of a very little sense. | ||
|
||
Nothing else but this licence agreement grants you rights to use, modify | ||
and distribute the software. Any violation of this licence agreement is | ||
recognised as an action prohibited by an applicable legislation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,268 @@ | ||
|
||
RAMspeed/SMP, a cache and memory benchmarking tool | ||
|
||
(for multiprocessor machines running UNIX-like operating systems) | ||
|
||
v3.5.0 | ||
|
||
August, 2009 | ||
|
||
|
||
This command line utility measures effective bandwidth of both cache and memory | ||
subsystems. It has been written entirely in C for portability purposes, though | ||
benchmark routines are also available in several assembly languages for | ||
performance reasons. So far, it's known to compile and run on the following | ||
operating systems and hardware platforms with assembly-level optimisations: | ||
|
||
* Linux (i386, amd64, alpha) | ||
* FreeBSD (i386, amd64, alpha) | ||
* NetBSD (i386, amd64, alpha) | ||
* Digital UNIX (alpha) | ||
|
||
Digital UNIX is also known as Digital OSF/1 and Compaq (HP) Tru64 UNIX. | ||
|
||
RAMspeed/SMP v3.x.x is a multiprocessed application utilising System V shared | ||
memory for IPC (Inter-Process Communication). RAMspeed/SMP v2.x.x was a POSIX | ||
multithreaded application; it is no longer developed for compatibility and | ||
performance reasons. | ||
|
||
|
||
GENERAL INFORMATION | ||
|
||
The software consists of two major components: | ||
|
||
1) INTmark and FLOATmark measure the maximum possible cache and memory | ||
performance while reading and writing blocks of data (starting from 1Kb and | ||
increasing in powers of 2) continuously through the ALU and FPU respectively. | ||
All data streams are linear (sequential) to achieve maximum performance. In | ||
other words, these benchmarks make it possible to determine the real bandwidth | ||
of cache and memory subsystems regardless of what manufacturers advertise. | ||
|
||
2) INTmem and FLOATmem are synthetic simulations, but closely tied to the | ||
real world of computing. Each consists of four subtests (Copy, Scale, Add, | ||
Triad) to measure different aspects of memory performance. It's important to | ||
realise that even if particular hardware offers very good linear read\write | ||
results, it may (or may not) deliver much worse results while switching | ||
continuously between read and write operations, as real-life software does. | ||
These benchmarks are highly sensitive to memory latencies of any kind. | ||
|
||
Copy is the simplest among them. It just transfers data from one memory | ||
location to another, i. e. copies it (A = B). | ||
|
||
Scale is a little more advanced. It modifies the data before writing by | ||
multiplying with a certain constant value, i. e. scales it (A = m*B). | ||
|
||
Add reads data from the first memory location, then reads from the second, adds | ||
them up and writes the result to the third place (A = B + C). | ||
|
||
Triad is a merge of Add and Scale. It reads data from the first memory | ||
location, scales it, then adds data from the second one and writes to the third | ||
place (A = m*B + C). | ||
|
||
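The four subtests described above can be sketched as plain loops over arrays. This is only an illustration of the arithmetic, written in Python; it is not RAMspeed's C or assembly code. The function names, the list-based arrays and the parameter name `m` are assumptions made for this sketch.

```python
# Illustrative only: the arithmetic of the four *mem subtests.
# Names and list-based arrays are assumptions, not RAMspeed's own code.

def copy(a, b):
    """Copy: A = B."""
    for i in range(len(a)):
        a[i] = b[i]


def scale(a, b, m):
    """Scale: A = m*B."""
    for i in range(len(a)):
        a[i] = m * b[i]


def add(a, b, c):
    """Add: A = B + C."""
    for i in range(len(a)):
        a[i] = b[i] + c[i]


def triad(a, b, c, m):
    """Triad: A = m*B + C."""
    for i in range(len(a)):
        a[i] = m * b[i] + c[i]
```

For example, with B = [1, 2, 3], C = [10, 20, 30] and m = 3, `triad` leaves A = [13, 26, 39].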
There are also MMXmark with MMXmem and SSEmark with SSEmem serving the same | ||
purpose as explained above but utilising the MMX and SSE instruction sets and | ||
respective registers. In general, they're supposed to be better performers | ||
than INTmark\INTmem and FLOATmark\FLOATmem. Of course, they're available for | ||
i386 and amd64 only. | ||
|
||
Non-temporal versions of MMXmark\MMXmem and SSEmark\SSEmem have been supported | ||
since v3.4.0 of this UNIX/SMP port. They minimise cache pollution on memory | ||
reads and eliminate it completely on writes. In addition, they operate with a | ||
built-in aggressive data prefetching algorithm. As a result, they offer | ||
significant performance improvements over the regular MMX and SSE benchmarks. | ||
In some cases, non-temporal MMXmark and SSEmark can deliver almost 100% of | ||
theoretical bandwidth while reading. However, the non-temporal MMX benchmarks | ||
require support for the Extended MMX instruction set (MMX+), which has been | ||
available since the Intel Pentium III and AMD K6-2+ processors. | ||
|
||
INTmark\INTmem transfer data in either doublewords (32 bits) or quadwords | ||
(64 bits) which is hardware platform dependent. FLOATmark\FLOATmem and | ||
MMXmark\MMXmem utilise quadwords, SSEmark and SSEmem -- octawords (128 bits). | ||
For data calculations, MMXmem benchmarks prefer packed words, SSEmem ones -- | ||
packed doublewords. FLOATmark\FLOATmem require a real floating-point unit or | ||
mathprocessor installed, though some fast emulator might be an acceptable | ||
solution as well, but that's a whole different story. Other benchmarks | ||
utilise floating-point capabilities for result calculations only. SSEmark and | ||
SSEmem require SSE support by both a processor and an operating system. | ||
|
||
There is also the BatchRun mode (*mem benchmarks only), formerly known as the | ||
LongRun mode but renamed to avoid possible confusion with the power saving | ||
technology of Transmeta. This mode is designed for high precision benchmarking | ||
and hardware stressing. In this mode, benchmarks are run a defined number of | ||
times, and average results are calculated and displayed. | ||
|
||
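The averaging behind the BatchRun mode can be sketched as follows. This is a minimal illustration, assuming each run yields a single speed figure; the `batch_run` function and its `benchmark` callable are hypothetical names, and the default of 5 runs simply mirrors the number suggested for the -l option.

```python
# Illustrative only: run a benchmark several times and average the results.
# `batch_run` and `benchmark` are hypothetical names, not part of RAMspeed.

def batch_run(benchmark, runs=5):
    """Call `benchmark` `runs` times and return the mean of its results."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    results = [benchmark() for _ in range(runs)]
    return sum(results) / len(results)
```

Averaging over several runs smooths out the effect of background activity on any single run, which is the point of a high precision mode.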
|
||
RUN-TIME OPTIONS | ||
|
||
USAGE: ramsmp -b ID [-g size] [-m size] [-l runs] [-p processes] | ||
-b runs a specified benchmark (by an ID number): | ||
1 -- INTmark [writing] 4 -- FLOATmark [writing] | ||
2 -- INTmark [reading] 5 -- FLOATmark [reading] | ||
3 -- INTmem 6 -- FLOATmem | ||
-g specifies a # of Gbytes per pass (default is 8) | ||
-m specifies a # of Mbytes per array (default is 32) | ||
-l enables the BatchRun mode (for *mem benchmarks only), | ||
and specifies a # of runs (suggested is 5) | ||
-p specifies a # of processes to fork (default is 2) | ||
-r displays speeds in real megabytes per second (default: decimal) | ||
|
||
The following ID numbers appear if compiled with either the i386 or amd64 | ||
assembly sources: | ||
|
||
7 -- MMXmark [writing] 10 -- SSEmark [writing] | ||
8 -- MMXmark [reading] 11 -- SSEmark [reading] | ||
9 -- MMXmem 12 -- SSEmem | ||
13 -- MMXmark (nt) [writing] 16 -- SSEmark (nt) [writing] | ||
14 -- MMXmark (nt) [reading] 17 -- SSEmark (nt) [reading] | ||
15 -- MMXmem (nt) 18 -- SSEmem (nt) | ||
|
||
The -b option is required, others are recommended. | ||
|
||
See SOFTWARE PREFETCHING below for information on the -t switch. | ||
|
||
The -i switch has no benchmarking meaning. It activates the built-in CPUinfo | ||
library, which collects and displays various information about your processor. | ||
This option is available on i386 only. | ||
|
||
From the very beginning, RAMspeed calculated and displayed speeds in | ||
so-called real megabytes per second, each equal to 2^20 (1,048,576) bytes. | ||
The reasoning was that memory performance has something to do with operating | ||
memory size, which is measured in real megabytes, as are the internal pass and | ||
array sizes. However, it seems to be common these days to advertise the size | ||
of storage devices, the bandwidth of networks and so on in so-called decimal | ||
megabytes, each equal to 10^6 (1,000,000) bytes. Most cache and memory | ||
benchmarks report their performance in decimal megabytes too. We feel sick of | ||
arguing, and that's why the default behaviour has changed towards decimal data | ||
since v3.6.0 of this UNIX/SMP port. It is still possible to display output in | ||
real megabytes per second by using the -r switch. To avoid possible mistakes, | ||
real megabytes per second are still referred to as Mb/s while decimal | ||
megabytes per second are displayed as MB/s. | ||
|
||
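The relationship between the two units can be sketched as a simple conversion. The constants follow the definitions given above (2^20 and 10^6 bytes); the function names are assumptions made for this illustration and are not part of the tool.

```python
# Illustrative only: converting between "real" (Mb/s) and decimal (MB/s) figures.

REAL_MEGABYTE = 2 ** 20      # 1,048,576 bytes, reported as Mb/s
DECIMAL_MEGABYTE = 10 ** 6   # 1,000,000 bytes, reported as MB/s


def real_to_decimal(real_mb_per_s):
    """Convert a speed in real megabytes per second to decimal ones."""
    return real_mb_per_s * REAL_MEGABYTE / DECIMAL_MEGABYTE


def decimal_to_real(decimal_mb_per_s):
    """Convert a speed in decimal megabytes per second to real ones."""
    return decimal_mb_per_s * DECIMAL_MEGABYTE / REAL_MEGABYTE
```

The same measurement therefore reads about 4.9% higher in MB/s than in Mb/s, which matters when comparing results produced with and without the -r switch.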
There are no built-in logging capabilities, but you may redirect output to a | ||
file instead of stdout: | ||
|
||
./ramsmp [options] > yourcomp.log | ||
|
||
Default values of memory array size and pass size do well for a wide range of | ||
computer hardware, but you may need to decrease them if torturing something | ||
pretty old, and vice versa, to increase in case of some fast and furious | ||
equipment. | ||
|
||
Note that the *mark benchmarks require [by default] 32Mb of memory array space | ||
as mentioned above, but the *mem ones demand two to three times more. The | ||
same applies to pass size. | ||
|
||
Don't forget that every additional process requires additional memory space. | ||
In other words, a -m32 -p8 setting requires four times more operating memory | ||
than -m32 -p2 (a gigabyte at least). The number of processes spawned must be a | ||
power of 2 and must not exceed 256. | ||
|
||
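The constraints above can be sketched as a small pre-flight check. This is a rough estimate under stated assumptions, not the tool's own accounting: the figure of three arrays per process is the upper end of the "two to three times" mentioned for the *mem benchmarks, process and system overhead is ignored, and both function names are hypothetical.

```python
# Illustrative only: sanity checks for the -m and -p options.
# Function names and the arrays_per_process figure are assumptions.

def is_valid_process_count(processes):
    """True if the count is a power of 2 and does not exceed 256."""
    return 1 <= processes <= 256 and (processes & (processes - 1)) == 0


def estimate_memory_mb(array_mb=32, processes=2, arrays_per_process=3):
    """Rough array memory needed, in megabytes, ignoring any overhead."""
    return array_mb * arrays_per_process * processes
```

Under these assumptions, -m32 -p8 comes to 768Mb of array space against 192Mb for -m32 -p2, which reproduces the fourfold difference described above; the real footprint may well be higher.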
|
||
SOFTWARE PREFETCHING | ||
|
||
As mentioned above, non-temporal versions of the MMX and SSE benchmarks | ||
benefit from software data prefetching. Note that the MMX+ instruction set | ||
introduced several instructions for this purpose: PREFETCHNTA (prefetch with | ||
minimal cache pollution), PREFETCHT0 (prefetch to all cache levels), and | ||
PREFETCHT1 with PREFETCHT2, which are of almost no use. | ||
In theory, there is no reason to use T0 prefetching for our benchmarking needs, | ||
but it has been observed that some memory controllers behave pretty poorly in | ||
Add and Triad subtests with NTA prefetching enabled. So, it has been decided to | ||
set up the default settings with NTA prefetching for Copy and Scale, while | ||
using T0 prefetching for Add and Triad. However, it has been made possible to | ||
override this decision with the -t switch and to use either NTA or T0 code for | ||
all four memory subtests: | ||
|
||
-t0 (NTA code for Copy and Scale, T0 code for Add and Triad) | ||
-t1 (NTA code for Copy, Scale, Add and Triad) | ||
-t2 (T0 code for Copy, Scale, Add and Triad) | ||
|
||
Note that this switch applies to MMXmem (nt) and SSEmem (nt) only, on i386 and | ||
amd64. MMXmark (nt) and SSEmark (nt) ignore it and always use NTA code. | ||
|
||
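The effect of the -t switch on each subtest can be summarised as a lookup table. The sketch below only restates the three settings listed above; the `PREFETCH_MODES` table and the `prefetch_for` helper are hypothetical names used for illustration and do not reflect how the tool stores this choice internally.

```python
# Illustrative only: which prefetch code each -t value selects per subtest,
# for MMXmem (nt) and SSEmem (nt). Names here are assumptions.

PREFETCH_MODES = {
    0: {"Copy": "NTA", "Scale": "NTA", "Add": "T0", "Triad": "T0"},    # -t0, default
    1: {"Copy": "NTA", "Scale": "NTA", "Add": "NTA", "Triad": "NTA"},  # -t1
    2: {"Copy": "T0", "Scale": "T0", "Add": "T0", "Triad": "T0"},      # -t2
}


def prefetch_for(subtest, mode=0):
    """Return "NTA" or "T0" for a subtest name under the given -t value."""
    return PREFETCH_MODES[mode][subtest]
```

For instance, `prefetch_for("Triad")` gives "T0" with the default setting, while `prefetch_for("Triad", mode=1)` gives "NTA".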
|
||
COMPILATION | ||
|
||
The software is known to have no problems with the GNU C compiler (GCC) and the | ||
GNU assembler (GAS) as well as with the DEC C compiler & assembler. There | ||
should be no problems with other compilers and assemblers either (of AT&T | ||
style, of course). | ||
|
||
A new build system has been introduced starting with v3.3.0. It is no longer a | ||
Makefile but a shell script, which is supposed to be more flexible. In most | ||
cases, it's enough to run it and follow the options suggested. Sometimes the | ||
script cannot guess your operating system and/or hardware platform and needs a | ||
hint passed on the command line. For example, some Linux distributions don't | ||
define a hardware platform properly, so this issue should be worked around, | ||
say, this way: | ||
|
||
# ./build.sh Linux amd64 | ||
|
||
There should be no problem adding support for new operating systems and | ||
hardware platforms in the future. Your feedback is welcome. | ||
|
||
If the script fails to detect your environment, it falls back to generic | ||
settings which imply the C source code only. | ||
|
||
|
||
RESULTS AND COMPARISONS | ||
|
||
Results shown are real and may be compared with those obtained from other | ||
benchmarking titles. There are many of them, and they measure cache and | ||
memory performance in different ways using different algorithms. The oldest and | ||
most notable among them is the open source STREAM by John D. McCalpin, though | ||
there are several well known software suites with memory benchmarking | ||
capabilities: to name a few, SiSoft Sandra by Catalin-Adrian Silasi, EVEREST by | ||
Lavalys Inc. and ScienceMark by Alexander Goodrich, Tim Wilkens and Sean | ||
Stanek. All three, though, are STREAM derivatives as far as memory | ||
benchmarking is concerned. | ||
|
||
STREAM itself is a very good benchmark. It was used as a reference for INTmem | ||
and FLOATmem in the past. Although everything has been coded from scratch, the | ||
idea remains the same. Nevertheless, STREAM has been written in C only. It | ||
utilises a low pass size, displays the highest results only, operates through | ||
the FPU only, doesn't accept command line parameters and is much less accurate | ||
overall. | ||
|
||
|
||
ISSUES | ||
|
||
Some compilers may optimise the code in such ways that the benchmarks are no | ||
longer what they are meant to be. For example, GCC 3.x.x optimises the | ||
floating-point benchmarks by replacing some of their code with integer code. | ||
There seems to be no way to work around this issue other than to use the | ||
assembly code. | ||
|
||
Sometimes on i386-compatible CPUs the write performance of FLOATmark may be | ||
better than the read performance. That's not a bug but an issue specific to how | ||
i387-compatible FPUs work: a data store requires one instruction, whereas a | ||
data load requires one instruction for the actual load and another to flush a | ||
register. | ||
|
||
Some CISC processors (Intel 386 to Pentium, AMD 386 to 5x86, Cyrix 486) deliver | ||
very strange write performance in the *mark benchmarks: it's constant all the | ||
way, regardless of cache levels and their write policies. These processors | ||
don't seem to support write allocation, or something else forces them to | ||
perform these direct memory writes. | ||
|
||
Not really an issue, but results shown may and will differ when obtained under | ||
different operating systems, sometimes significantly. | ||
|
||
|
||
UNIX SPECIFIC NOTES | ||
|
||
RAMspeed runs well from any system\serial console, though any virtual terminal | ||
should be all right as well. | ||
|
||
It's strongly suggested to reduce background activity before running. Power | ||
management (APM or ACPI) may produce undesirable effects too. | ||
|
||
|
||
FINAL NOTES | ||
|
||
The latest version can always be downloaded from | ||
|
||
http://www.alasir.com/software/ramspeed | ||
|
||
Relax & enjoy! | ||
|
||
|
||
PVB |