Updated benchmark results.

More tidying up.
ckolivas · Nov 5, 2010 · a66dafe
1 parent 2965349

Showing 5 changed files with 73 additions and 66 deletions.
diff --git a/doc/README.benchmarks b/doc/README.benchmarks
@@ -1,28 +1,29 @@
-These are benchmarks performed on a 3GHz quad core Intel Core2 with 8GB ram
-using lrzip v0.42.
 
 The first comparison is that of a linux kernel tarball (2.6.31). In all cases
 the default options were used. 3 other common compression apps were used for
 comparison, 7z which is an excellent all-round lzma based compression app,
 gzip which is the benchmark fast standard that has good compression, and bzip2
 which is the most common linux used compression.
 
-In the following tables, lrzip means lrzip default options, lrzip(lzo) means
-lrzip using the lzo backend, lrzip(gzip) means using the gzip backend,
-lrzip(bzip2) means using the bzip2 backend and lrzip(zpaq) means using the zpaq
+In the following tables, lrzip means lrzip with default options, lrzip -l means
+lrzip using the lzo backend, lrzip -g means using the gzip backend,
+lrzip -b means using the bzip2 backend and lrzip -z means using the zpaq
 backend.
 
 
 linux-2.6.31.tar
 
+These are benchmarks performed on a 3GHz quad core Intel Core2 with 8GB RAM
+using lrzip v0.42.
+
 Compression	Size		Percentage	Compress	Decompress
 None		365711360	100
 7z		53315279	14.6		2m4.770s	0m5.360s
 lrzip		52372722	14.3		2m48.477s	0m8.336s
-lrzip(zpaq)	43455498	11.9		10m11.335	10m14.296
-lrzip(lzo)	112151676	30.7		0m14.913s	0m5.063s
-lrzip(gzip)	73476127	20.1		0m29.628s	0m5.591s
-lrzip(bzip2)	60851152	16.6		0m43.539s	0m12.244s
+lrzip -z	43455498	11.9		10m11.335s	10m14.296s
+lrzip -l	112151676	30.7		0m14.913s	0m5.063s
+lrzip -g	73476127	20.1		0m29.628s	0m5.591s
+lrzip -b	60851152	16.6		0m43.539s	0m12.244s
 bzip2		62416571	17.1		0m44.493s	0m9.819s
 gzip		80563601	22.0		0m14.343s	0m2.781s
 
@@ -37,29 +38,34 @@ What lrzip offers at this end of the spectrum is extreme compression if
 desired.
 
 
-Let's take two kernel trees one version apart as a tarball, linux-2.6.31 and
-linux-2.6.32-rc8. These will show lots of redundant information, but hundreds
+Let's take six kernel trees one version apart as a tarball, linux-2.6.31 to
+linux-2.6.36. These will show lots of redundant information, but hundreds
 of megabytes apart, which lrzip will be very good at compressing. For
 simplicity, only 7z will be compared since that's by far the best general
 purpose compressor at the moment:
 
+These are benchmarks performed on a 2.53GHz dual core Intel Core2 with 4GB RAM
+using lrzip v0.5.1. Note that it was running with a 32-bit userspace, so only
+2GB addressing was possible. However, the benchmark was run with the -U option,
+allowing the whole file to be treated as one large compression window.
 
-Tarball of two kernel trees, one version apart.
+Tarball of 6 consecutive kernel trees.
 
 Compression	Size		Percentage	Compress	Decompress
-None		749066240	100
-7z		108710624	14.5		4m4.260s	0m11.133s
-lrzip		57943094	7.7		3m08.788s	0m10.747s
-lrzip(lzo)	124029899	16.6		0m18.997s	0m7.107s
+None		2373713920	100
+7z		344088002	14.5		17m26s		1m22s
+lrzip -U	73356070	3.1		08m53s		43s
+lrzip -Ul	158851141	6.7		04m31s		35s
 
 Things start getting very interesting now when lrzip is really starting to
-shine. Note how it's not that much larger for 2 kernel trees than it was for
+shine. Note how it's not that much larger for 6 kernel trees than it was for
 one. That's because all the similar data in all the kernel trees is being
 compressed as one copy and only the differences really make up the extra size.
 All compression software does this, but not over such large distances. If you
 copy the same data over multiple times, the resulting lrzip archive doesn't
 get much larger at all.
 
+
 Using the first example (linux-2.6.31.tar) and simply copying the data multiple
 times over gives these results with lrzip -l:
 
@@ -70,7 +76,7 @@ Copies		Size		Compressed	Compress	Decompress
 
 
 I had the amusing thought that this compression software could be used as a
-bullshit detector if you were to compress peoples' speeches because if their
+bullshit detector if you were to compress people's speeches because if their
 talks were full of catchphrases and not much actual content, it would all be
 compressed down. So the larger the final archive, the less bullshit =)
 
@@ -83,31 +89,31 @@ system and some basic working software on it. The default options on the
 
 10GB Virtual image:
 
+These benchmarks were done on the quad core machine with version 0.5.1.
+
 Compression	Size		Percentage	Compress Time	Decompress Time
 None		10737418240	100.0
 gzip		2772899756	 25.8		05m47.35s	2m46.77s
 bzip2		2704781700	 25.2		20m34.269s	7m51.362s
 xz		2272322208	 21.2		58m26.829s	4m46.154s
 7z		2242897134	 20.9		29m28.152s	6m35.952s
-lrzip*		1354237684	 12.6		29m13.402s	6m55.441s
-lrzip M*	1079528708	 10.1		23m44.226s	4m05.461s
-lrzip(lzo)*	1793312108	 16.7		05m13.246s	3m12.886s
-lrzip(lzo)M*	1413268368	 13.2		04m18.338s	2m54.650s
-lrzip(zpaq)*	1299844906	 12.1		04h32m14s	04h33m
-lrzip(zpaq)M*	1066902006	  9.9		04h07m14s	04h08m
+lrzip		1354237684	 12.6		29m13.402s	6m55.441s
+lrzip -M	1079528708	 10.1		23m44.226s	4m05.461s
+lrzip -l	1793312108	 16.7		05m13.246s	3m12.886s
+lrzip -lM	1413268368	 13.2		04m18.338s	2m54.650s
+lrzip -z	1299844906	 12.1		04h32m14s	04h33m
+lrzip -zM	1066902006	  9.9		04h07m14s	04h08m
 
-(The benchmarks with * were done with version 0.5.1)
 
 At this end of the spectrum things really start to heat up. The compression
 advantage is massive, with the lzo backend even giving much better results than
-7z, and over a ridiculously short time. Note that it's not much longer than it
-takes to just *read* a 10GB file. What appears to be a big disappointment is
-actually zpaq here which takes more than 8 times longer than lzma for a measly
-.2% improvement. The reason is that most of the advantage here is achieved by
-the rzip first stage since there's a lot of redundant space over huge distances
-on a virtual image. The -M option which works the memory subsystem rather hard
-making noticeable impact on the rest of the machine also does further wonders
-for the compression and times.
+7z, and over a ridiculously short time. What appears to be a big disappointment
+is actually zpaq here, which takes more than 8 times longer than lzma for a
+measly 0.2-0.5% improvement. The reason is that most of the advantage here is
+achieved by the rzip first stage, since there's a lot of redundant space over
+huge distances on a virtual image. The -M option, which works the memory
+subsystem rather hard and has a noticeable impact on the rest of the machine,
+also does further wonders for the compression and times.
 
 This should help govern what compression you choose. Small files are nicely
 compressed with zpaq. Intermediate files are nicely compressed with lzma.
@@ -117,4 +123,4 @@ Or, to make things easier, just use the default settings all the time and be
 happy as lzma gives good results. :D
 
 Con Kolivas
-Tue, 4th Nov 2010
+Tue, 5th Nov 2010
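
The Percentage column in the benchmark tables above is the compressed size divided by the uncompressed (None) size, times 100, rounded to one decimal place. A minimal C sketch of that arithmetic; `pct_of_original` and `print_row` are hypothetical names for illustration, not part of lrzip:

```c
#include <stdio.h>

/* Hypothetical helper: compressed size as a percentage of the original size,
 * which is how the Percentage column in the benchmark tables is derived. */
double pct_of_original(long long compressed, long long original)
{
	return 100.0 * (double)compressed / (double)original;
}

/* Hypothetical helper: print one row as name, size and percentage,
 * with the percentage rounded to one decimal place like the tables. */
void print_row(const char *name, long long compressed, long long original)
{
	printf("%-12s %12lld %5.1f\n", name, compressed,
	       pct_of_original(compressed, original));
}
```

For example, print_row("lrzip -U", 73356070LL, 2373713920LL) reproduces the 3.1 shown for the six-kernel-tree tarball.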
diff --git a/main.c b/main.c
@@ -312,7 +312,7 @@ static void decompress_file(void)
 		print_output("Output filename is: %s: ", control.outfile);
         print_progress("[OK] - %lld bytes                                \n", expected_size);
 
-	if (unlikely(close(fd_hist) != 0 || close(fd_out) != 0))
+	if (unlikely(close(fd_hist) || close(fd_out)))
 		fatal("Failed to close files\n");
 
 	if (TEST_ONLY | STDOUT) {
@@ -501,7 +501,7 @@ static void compress_file(void)
 	if (STDOUT)
 		dump_tmpoutfile(fd_out);
 
-	if (unlikely(close(fd_in) != 0 || close(fd_out)))
+	if (unlikely(close(fd_in) || close(fd_out)))
 		fatal("Failed to close files\n");
 
 	if (STDOUT) {
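
The main.c hunks drop the explicit `!= 0` comparisons because close() returns 0 on success and -1 on error, and any nonzero value is already true in a C condition, so both forms behave identically. The `||` also short-circuits, both before and after this change: if the first close() fails, the second descriptor is never closed. A minimal sketch of a variant that always attempts both closes, assuming the common GCC/Clang definition of `unlikely()` built on `__builtin_expect` (lrzip's own definition is not shown in this diff); `close_both` is a hypothetical helper:

```c
#include <unistd.h>

/* Common GCC/Clang branch hint; assumed here, lrzip's definition may differ. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical helper: close both descriptors even if the first close fails,
 * so neither is leaked. Returns 0 only if both closes succeeded, else -1. */
int close_both(int fd_a, int fd_b)
{
	int ret_a = close(fd_a);	/* 0 on success, -1 on error */
	int ret_b = close(fd_b);	/* attempted regardless of ret_a */

	return (ret_a || ret_b) ? -1 : 0;
}
```

A caller could then keep a single error path, e.g. `if (unlikely(close_both(fd_hist, fd_out))) fatal("Failed to close files\n");`.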

diff --git a/runzip.c b/runzip.c
@@ -179,7 +179,7 @@ static i64 runzip_chunk(int fd_in, int fd_out, int fd_hist, i64 expected_size, i
 	if (unlikely(ofs == -1))
 		fatal("Failed to seek input file in runzip_fd\n");
 
-	if (fstat(fd_in, &st) != 0 || st.st_size - ofs == 0)
+	if (fstat(fd_in, &st) || st.st_size - ofs == 0)
 		return 0;
 
 	ss = open_stream_in(fd_in, NUM_STREAMS);
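
In runzip_chunk, the check `fstat(fd_in, &st) || st.st_size - ofs == 0` returns early when fstat() fails or when nothing is left between the current offset and the end of the file. A minimal sketch of the same test as a standalone function; `bytes_remaining` is a hypothetical name, and it assumes the offset comes from lseek(fd, 0, SEEK_CUR), which the "Failed to seek input file" message suggests but this diff does not show:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: bytes left between the current file offset and EOF.
 * Returns -1 if either lseek() or fstat() fails. */
long long bytes_remaining(int fd)
{
	struct stat st;
	off_t ofs = lseek(fd, 0, SEEK_CUR);	/* query position, do not move */

	if (ofs == -1)
		return -1;	/* e.g. fd is a pipe or is not open */
	if (fstat(fd, &st))	/* nonzero means failure, as in the diff */
		return -1;
	return (long long)(st.st_size - ofs);
}
```

A return of 0 corresponds to the empty case where runzip_chunk stops; -1 covers the failure cases.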

diff --git a/rzip.c b/rzip.c
@@ -124,10 +124,11 @@ static void remap_low_sb(void)
 
 static inline void remap_high_sb(i64 p)
 {
-	if (unlikely(munmap(sb.buf_high, sb.size_high) != 0))
+	if (unlikely(munmap(sb.buf_high, sb.size_high)))
 		fatal("Failed to munmap in remap_high_sb\n");
 	sb.size_high = sb.high_length; /* In case we shrunk it when we hit the end of the file */
 	sb.offset_high = p;
+	/* Round down so the total file offset is a multiple of the page size */
 	sb.offset_high -= (sb.offset_high + sb.orig_offset) % 4096;
 	if (unlikely(sb.offset_high + sb.size_high > sb.orig_size))
 		sb.size_high = sb.orig_size - sb.offset_high;
@@ -138,10 +139,10 @@ static inline void remap_high_sb(i64 p)
 
 /* We use a "sliding mmap" to effectively read more than we can fit into the
  * compression window. This is done by using a maximally sized lower mmap at
- * the beginning of the block, and a one-page-sized mmap block that slides up
- * and down as is required for any offsets beyond the lower one. This is
- * 100x slower than mmap but makes it possible to have unlimited sized
- * compression windows. */
+ * the beginning of the block which slides up once the hash search moves beyond
+ * it, and a 64k mmap block that slides up and down as is required for any
+ * offsets outside the range of the lower one. This is much slower than one
+ * mmap of the whole window but allows unlimited sized compression windows. */
 static uchar *get_sb(i64 p)
 {
 	i64 low_end = sb.offset_low + sb.size_low;
@@ -152,14 +153,14 @@ static uchar *get_sb(i64 p)
 		return (sb.buf_low + p - sb.offset_low);
 	if (p >= sb.offset_high && p < (sb.offset_high + sb.size_high))
 		return (sb.buf_high + (p - sb.offset_high));
-	/* (p > sb.size_low &&  p < sb.offset_high) */
+	/* p is not within the low or high buffer range */
 	remap_high_sb(p);
 	return (sb.buf_high + (p - sb.offset_high));
 }
 
 static inline void put_u8(void *ss, int stream, uchar b)
 {
-	if (unlikely(write_stream(ss, stream, &b, 1) != 0))
+	if (unlikely(write_stream(ss, stream, &b, 1)))
 		fatal("Failed to put_u8\n");
 }
 
@@ -226,7 +227,7 @@ int write_sbstream(void *ss, int stream, i64 p, i64 len)
 		p += n;
 		len -= n;
 		if (sinfo->s[stream].buflen == sinfo->bufsize) {
-			if (unlikely(flush_buffer(sinfo, stream) != 0))
+			if (unlikely(flush_buffer(sinfo, stream)))
 				return -1;
 		}
 	}
@@ -407,7 +408,7 @@ static inline i64 match_len(struct rzip_state *st, i64 p0, i64 op, i64 end,
 	if (end < st->last_match)
 		end = st->last_match;
 
-	while (p > end && op > 0 && *get_sb(op - 1) == *get_sb(p-1)) {
+	while (p > end && op > 0 && *get_sb(op - 1) == *get_sb(p - 1)) {
 		op--;
 		p--;
 	}
@@ -673,7 +674,7 @@ static void init_sliding_mmap(struct rzip_state *st, int fd_in, i64 offset)
 	i64 size = st->chunk_size;
 
 	if (sizeof(long) == 4 && size > two_gig) {
-		print_verbose("Limiting to 2G due to 32 bit limitations\n");
+		print_verbose("Limiting to 2GB due to 32 bit limitations\n");
 		size = two_gig;
 	}
 	sb.orig_offset = offset;
@@ -689,14 +690,14 @@ static void init_sliding_mmap(struct rzip_state *st, int fd_in, i64 offset)
 		/* Better to shrink the window to the largest size that works than fail */
 		if (sb.buf_low == MAP_FAILED) {
 			size = size / 10 * 9;
-			size -= size % 4096; /* Round to page size */
+			size -= size % 4096;
 			if (unlikely(!size))
 				fatal("Unable to mmap any ram\n");
 			goto retry;
 		}
 		print_maxverbose("Succeeded in preallocating %lld sized mmap\n", size);
 		if (!STDIN) {
-			if (unlikely(munmap(sb.buf_low, size) != 0))
+			if (unlikely(munmap(sb.buf_low, size)))
 				fatal("Failed to munmap\n");
 		} else
 			st->chunk_size = size;
@@ -707,7 +708,7 @@ static void init_sliding_mmap(struct rzip_state *st, int fd_in, i64 offset)
 		sb.buf_low = (uchar *)mmap(sb.buf_low, size, PROT_READ, MAP_SHARED, fd_in, offset);
 		if (sb.buf_low == MAP_FAILED) {
 			size = size / 10 * 9;
-			size -= size % 4096; /* Round to page size */
+			size -= size % 4096;
 			if (unlikely(!size))
 				fatal("Unable to mmap any ram\n");
 			goto retry;
@@ -718,7 +719,7 @@ static void init_sliding_mmap(struct rzip_state *st, int fd_in, i64 offset)
 
 	if (size < st->chunk_size) {
 		if (UNLIMITED && !STDIN)
-			print_verbose("File is beyond window size, will proceed MUCH slower in unlimited mode with a sliding_mmap buffer\n");
+			print_verbose("File is beyond window size, will proceed in unlimited mode with a sliding_mmap buffer but may be much slower\n");
 		else {
 			print_verbose("Needed to shrink window size to %lld\n", size);
 			st->chunk_size = size;
@@ -866,10 +867,10 @@ void rzip_fd(int fd_in, int fd_out)
 			eta_hours = (unsigned int)(finish_time - elapsed_time) / 3600;
 			eta_minutes = (unsigned int)((finish_time - elapsed_time) - eta_hours * 3600) / 60;
 			eta_seconds = (unsigned int)(finish_time - elapsed_time) - eta_hours * 60 - eta_minutes * 60;
-			chunkmbs=(last_chunk / 1024 / 1024) / (double)(current.tv_sec-last.tv_sec);
+			chunkmbs = (last_chunk / 1024 / 1024) / (double)(current.tv_sec-last.tv_sec);
 			print_verbose("\nPass %d / %d -- Elapsed Time: %02d:%02d:%02d. ETA: %02d:%02d:%02d. Compress Speed: %3.3fMB/s.\n",
-					pass, passes, elapsed_hours, elapsed_minutes, elapsed_seconds,
-					eta_hours, eta_minutes, eta_seconds, chunkmbs);
+				       pass, passes, elapsed_hours, elapsed_minutes, elapsed_seconds,
+				       eta_hours, eta_minutes, eta_seconds, chunkmbs);
 		}
 		last.tv_sec = current.tv_sec;
 		last.tv_usec = current.tv_usec;
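
The sliding mmap comment in rzip.c describes a two-window lookup in get_sb(): offsets inside the low or high window are served directly, and any other offset remaps the high window around it. remap_high_sb() rounds the new window start down so the absolute file offset (offset_high + orig_offset) is a multiple of 4096, because mmap() needs a page-aligned file offset. A minimal in-memory sketch of that lookup, assuming a byte array stands in for the file and memcpy() stands in for mmap(), 4096-byte pages, and the 64k high window the comment mentions. The low window is kept fixed for brevity, and `struct sliding`, `sliding_init` and `sliding_get` are hypothetical names:

```c
#include <string.h>

#define PAGE 4096		/* page size assumed, as rzip.c hard-codes 4096 */
#define HIGH_SIZE 65536		/* 64k high window, per the rzip.c comment */

/* Hypothetical stand-in for lrzip's sliding buffer state. A byte array plays
 * the role of the input file and memcpy() plays the role of mmap(). */
struct sliding {
	const unsigned char *file;	/* whole input "file" */
	long long orig_offset;		/* chunk start within the file */
	long long orig_size;		/* chunk length in bytes */
	const unsigned char *buf_low;	/* low window, fixed at chunk start */
	long long offset_low, size_low;
	unsigned char buf_high[HIGH_SIZE];	/* high window contents */
	long long offset_high, size_high;
	int remaps;			/* how many times the high window moved */
};

void sliding_init(struct sliding *sb, const unsigned char *file,
		  long long orig_offset, long long orig_size, long long size_low)
{
	sb->file = file;
	sb->orig_offset = orig_offset;
	sb->orig_size = orig_size;
	sb->buf_low = file + orig_offset;
	sb->offset_low = 0;
	sb->size_low = size_low < orig_size ? size_low : orig_size;
	sb->offset_high = 0;
	sb->size_high = 0;	/* empty until the first miss */
	sb->remaps = 0;
}

/* Move the high window so it covers p (0 <= p < orig_size). */
static void remap_high(struct sliding *sb, long long p)
{
	sb->size_high = HIGH_SIZE;
	sb->offset_high = p;
	/* Round down so the absolute offset is page aligned, as a real mmap() needs */
	sb->offset_high -= (sb->offset_high + sb->orig_offset) % PAGE;
	/* Shrink the window if it would run past the end of the chunk */
	if (sb->offset_high + sb->size_high > sb->orig_size)
		sb->size_high = sb->orig_size - sb->offset_high;
	memcpy(sb->buf_high, sb->file + sb->orig_offset + sb->offset_high,
	       (size_t)sb->size_high);
	sb->remaps++;
}

/* Return a pointer to chunk byte p, remapping the high window on a miss. */
const unsigned char *sliding_get(struct sliding *sb, long long p)
{
	if (p >= sb->offset_low && p < sb->offset_low + sb->size_low)
		return sb->buf_low + (p - sb->offset_low);
	if (p >= sb->offset_high && p < sb->offset_high + sb->size_high)
		return sb->buf_high + (p - sb->offset_high);
	/* p is not within the low or high buffer range */
	remap_high(sb, p);
	return sb->buf_high + (p - sb->offset_high);
}
```

Nearby lookups mostly hit an existing window; each miss costs a remap, which is why the comment warns that this mode is much slower than a single mapping of the whole window.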

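The progress line in rzip_fd splits the remaining time into hours, minutes and seconds and reports the speed of the last chunk in MB/s. A minimal sketch of that arithmetic with hypothetical names; note that the seconds left over after whole hours and minutes are `total - hours * 3600 - minutes * 60`. The sketch also returns 0 for a zero-second interval rather than dividing by zero, which is an assumption about the desired behaviour:

```c
/* Hypothetical helper: split a duration in whole seconds into h/m/s fields. */
void split_hms(unsigned int total, unsigned int *h, unsigned int *m,
	       unsigned int *s)
{
	*h = total / 3600;
	*m = (total - *h * 3600) / 60;
	*s = total - *h * 3600 - *m * 60;	/* always 0..59 */
}

/* Hypothetical helper: MB/s for `bytes` processed in `secs` seconds.
 * Returns 0 when no whole second has elapsed instead of dividing by zero. */
double chunk_mbs(long long bytes, long secs)
{
	if (secs <= 0)
		return 0.0;
	return ((double)bytes / 1024 / 1024) / (double)secs;
}
```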
diff --git a/stream.c b/stream.c
@@ -102,7 +102,7 @@ static void zpaq_compress_buf(struct stream *s, int *c_type, i64 *c_len)
 
 	zpipe_compress(in, out, control.msgout, s->buflen, (int)(SHOW_PROGRESS));
 
-	if (unlikely(memstream_update_buffer(out, &c_buf, &dlen) != 0))
+	if (unlikely(memstream_update_buffer(out, &c_buf, &dlen)))
 	        fatal("Failed to memstream_update_buffer in zpaq_compress_buf");
 
 	fclose(in);
@@ -387,7 +387,7 @@ static int lzma_decompress_buf(struct stream *s, size_t c_len)
 	/* With LZMA SDK 4.63 we pass control.lzma_properties
 	 * which is needed for proper uncompress */
 	lzmaerr = LzmaUncompress(s->buf, &dlen, c_buf, &c_len, control.lzma_properties, 5);
-	if (unlikely(lzmaerr != 0)) {
+	if (unlikely(lzmaerr)) {
 		print_err("Failed to decompress buffer - lzmaerr=%d\n", lzmaerr);
 		return -1;
 	}
@@ -675,11 +675,11 @@ void *open_stream_in(int f, int n)
 		if (control.major_version == 0 && control.minor_version < 4) {
 			u32 v132, v232, last_head32;
 
-			if (read_u32(f, &v132) != 0)
+			if (unlikely(read_u32(f, &v132)))
 				goto failed;
-			if (read_u32(f, &v232) != 0)
+			if (unlikely(read_u32(f, &v232)))
 				goto failed;
-			if (read_u32(f, &last_head32) != 0)
+			if (unlikely(read_u32(f, &last_head32)))
 				goto failed;
 
 			v1 = v132;
@@ -708,11 +708,11 @@ void *open_stream_in(int f, int n)
 			print_err("Unexpected initial tag %d in streams\n", c);
 			goto failed;
 		}
-		if (unlikely(v1 != 0)) {
+		if (unlikely(v1)) {
 			print_err("Unexpected initial c_len %lld in streams %lld\n", v1, v2);
 			goto failed;
 		}
-		if (unlikely(v2 != 0)) {
+		if (unlikely(v2)) {
 			print_err("Unexpected initial u_len %lld in streams\n", v2);
 			goto failed;
 		}
@@ -791,11 +791,11 @@ static int fill_buffer(struct stream_info *sinfo, int stream)
 	if (control.major_version == 0 && control.minor_version < 4) {
 		u32 c_len32, u_len32, last_head32;
 
-		if (read_u32(sinfo->fd, &c_len32) != 0)
+		if (unlikely(read_u32(sinfo->fd, &c_len32)))
 			return -1;
-		if (read_u32(sinfo->fd, &u_len32) != 0)
+		if (unlikely(read_u32(sinfo->fd, &u_len32)))
 			return -1;
-		if (read_u32(sinfo->fd, &last_head32) != 0)
+		if (unlikely(read_u32(sinfo->fd, &last_head32)))
 			return -1;
 		c_len = c_len32;
 		u_len = u_len32;
@@ -911,13 +911,13 @@ int close_stream_out(void *ss)
 
 	/* reallocate buffers to try and save space */
 	for (i = 0; i < sinfo->num_streams; i++) {
-		if (sinfo->s[i].buflen != 0) {
+		if (sinfo->s[i].buflen) {
 			if (unlikely(!realloc(sinfo->s[i].buf, sinfo->s[i].buflen)))
 				fatal("Error Reallocating Output Buffer %d\n", i);
 		}
 	}
 	for (i = 0; i < sinfo->num_streams; i++) {
-		if (unlikely(sinfo->s[i].buflen != 0 && flush_buffer(sinfo, i)))
+		if (unlikely(sinfo->s[i].buflen && flush_buffer(sinfo, i)))
 			return -1;
 		if (sinfo->s[i].buf)
 			free(sinfo->s[i].buf);
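
close_stream_out shrinks each non-empty stream buffer with realloc() before flushing it. realloc() may move the block and free the old one, so the buffer is only safe to keep using if the returned pointer is stored back. A minimal sketch of that pattern; `shrink_buffer` is a hypothetical helper, and zero lengths are skipped as close_stream_out does, since realloc(p, 0) has implementation-defined behaviour:

```c
#include <stdlib.h>

/* Hypothetical helper: shrink *buf to len bytes, keeping the pointer valid.
 * On failure the original block is untouched and still owned by the caller. */
int shrink_buffer(unsigned char **buf, size_t len)
{
	unsigned char *tmp;

	if (len == 0)
		return 0;	/* skip, like the buflen check in close_stream_out */
	tmp = realloc(*buf, len);
	if (!tmp)
		return -1;	/* *buf is still the old, valid block */
	*buf = tmp;		/* may differ from the old pointer */
	return 0;
}
```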