arra1997/parallelized-gzip

This is the file README for the parallelized gzip prototype.

To build parallelized gzip, make sure the zlib library and the other
dependencies are installed, then run the following commands:
1. autoreconf -f -i
2. ./configure
3. make

The latest additions to this project, which remains a work in progress,
are the --processes/-p option and the --independent/-i option. Run
'gzip --help' for more information on how to use them.
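This README does not describe how -p and -i divide the work. One common scheme, used by Mark Adler's pigz, cuts the input into fixed-size blocks and compresses each block in its own worker. Unless blocks are independent, each block's compressor is primed with the last 32 KiB of the preceding block as a preset dictionary. The C sketch below only computes that layout. It assumes that -i here means "no shared dictionary", as --independent does in pigz; `block_plan`, `plan_blocks` and `DICT_MAX` are hypothetical names, not code from this repository.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one compression job. Offsets are in
   bytes from the start of the uncompressed input. */
typedef struct {
    size_t offset;      /* first byte this job compresses */
    size_t length;      /* number of bytes this job compresses */
    size_t dict_offset; /* start of the preset dictionary, if any */
    size_t dict_length; /* 0 when the block is compressed independently */
} block_plan;

/* deflate's history window is 32 KiB, so a longer dictionary is useless. */
#define DICT_MAX ((size_t)32 * 1024)

/* Split `input_size` bytes into blocks of `block_size` bytes (the last
   block may be shorter). Fills at most `max_blocks` entries of `plans`
   and returns the number of blocks needed, so the caller can detect
   that `plans` was too small. */
size_t plan_blocks(size_t input_size, size_t block_size, bool independent,
                   block_plan *plans, size_t max_blocks)
{
    size_t count = 0;
    if (block_size == 0)
        return 0;
    for (size_t off = 0; off < input_size; off += block_size, count++) {
        if (count >= max_blocks)
            continue; /* keep counting without writing past `plans` */
        block_plan *p = &plans[count];
        p->offset = off;
        p->length = input_size - off < block_size ? input_size - off
                                                  : block_size;
        if (independent || off == 0) {
            /* No history: the block can be decompressed on its own. */
            p->dict_offset = off;
            p->dict_length = 0;
        } else {
            /* Tail of the previous (always full-size) block, capped at
               the 32 KiB deflate window. */
            size_t prev = block_size < DICT_MAX ? block_size : DICT_MAX;
            p->dict_offset = off - prev;
            p->dict_length = prev;
        }
    }
    return count;
}
```

Each plan would then go to one worker. In a zlib-based design, that worker could pass the dictionary bytes to `deflateSetDictionary` before compressing its block. Independent blocks give up a little compression ratio in exchange for being decodable, and therefore recoverable, on their own.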

Known bugs and features to complete:
1. Of the original 21 test cases in the Makefile, we currently pass 17.
2. The --rsyncable and --test options do not currently work as intended.
3. We wish to implement a parallelized version of --rsyncable using an
algorithm similar to the one in Mark Adler's pigz.
4. Ideally, we wish to rewrite gzip to improve code quality and readability,
in particular to make it modular, which it currently is not.
5. The data structures and functions in parallel.h and parallel.c need to be
tested more thoroughly, and we want to add tests to the Makefile for any
new functionality.
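As we understand it, pigz's --rsyncable ends a compression block wherever a rolling hash of the most recent input bytes reaches a fixed value. Block boundaries therefore depend on local content rather than on position, and an edit early in a file does not shift every later boundary. The sketch below is a minimal boundary detector in that style. The 12-bit shift-and-XOR hash and its constants are our reading of pigz, not code from this repository, and `find_rsync_cuts` is a hypothetical name.

```c
#include <stddef.h>

/* Constants modelled on pigz's --rsyncable (assumed): a 12-bit hash
   hits roughly once every 4 KiB on random-looking data. */
#define RSYNC_BITS 12
#define RSYNC_MASK ((1u << RSYNC_BITS) - 1)
#define RSYNC_HIT  (RSYNC_MASK >> 1)

/* Scan `len` bytes, carrying the rolling hash in `*hash` so the input
   can be fed in pieces. Each offset (relative to `buf`) just after a
   byte at which the hash equals RSYNC_HIT is a candidate block
   boundary; up to `max_cuts` of them are stored in `cuts`. Returns the
   total number found. Every byte is shifted out of the 12-bit hash
   after 12 steps, so a boundary depends only on the 12 bytes before it. */
size_t find_rsync_cuts(const unsigned char *buf, size_t len, unsigned *hash,
                       size_t *cuts, size_t max_cuts)
{
    size_t found = 0;
    for (size_t i = 0; i < len; i++) {
        *hash = ((*hash << 1) ^ buf[i]) & RSYNC_MASK;
        if (*hash == RSYNC_HIT) {
            if (found < max_cuts)
                cuts[found] = i + 1; /* boundary falls after byte i */
            found++;
        }
    }
    return found;
}
```

In a parallel version, the boundaries could be located in one quick sequential pass before the blocks are handed to workers. The costly part, compression, would still run in parallel.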


New tests that are not part of 'make check' must be run separately.
They are in the tests folder and are run with the command ./newTests.sh

1) Parallel compression with 4 threads: decompressed file integrity test -- Pass
2) Parallel compression with 8 threads: decompressed file integrity test -- Pass
3) Appended decompressed files are the same as the appended original files -- Fail
4) Decompressed files from 2-thread and 4-thread compression are identical -- Pass
5) Decompressed files from 4-thread and 8-thread compression are identical -- Pass
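Each test above reduces to a byte-for-byte comparison of two files, which is roughly what `cmp` does. The sketch below is a minimal C version of that check. `streams_identical` is a hypothetical helper; we have not inspected newTests.sh, so this shows only the idea and not the script's actual implementation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true when both streams contain exactly the same bytes from
   their current positions to EOF. fread keeps reading until it fills
   the buffer or reaches EOF/error, so equal streams always yield equal
   chunk sizes. */
bool streams_identical(FILE *a, FILE *b)
{
    unsigned char buf_a[4096], buf_b[4096];
    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, a);
        size_t nb = fread(buf_b, 1, sizeof buf_b, b);
        if (na != nb || memcmp(buf_a, buf_b, na) != 0)
            return false;
        if (na < sizeof buf_a)
            /* Both streams stopped at the same offset; only a read
               error can still make the result untrustworthy. */
            return !ferror(a) && !ferror(b);
    }
}
```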

--------------------------------------------------------------------

gzip (GNU zip) is a compression utility designed to be a replacement
for 'compress'. Its main advantages over compress are much better
compression and freedom from patented algorithms.  The GNU Project
uses it as the standard compression program for its system.

By default, this gzip compresses with the LZ77-based deflate algorithm,
using zlib to do so, and uses all the cores available on the system.
The gzip format was however designed to accommodate several compression
algorithms. See below for a comparison of zip and gzip.
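The README does not say how the number of cores is determined. On Linux, macOS and the BSDs, one plausible approach is to query `sysconf(_SC_NPROCESSORS_ONLN)`, which is a widely supported extension rather than a strict POSIX requirement. The sketch below shows that approach, with an explicit -p value taking precedence. Both function names are hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: default worker count = online CPU cores,
   falling back to 1 when the count cannot be determined. */
long default_process_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: a positive value requested with -p wins;
   anything else means "use all cores". */
long effective_process_count(long requested)
{
    return requested > 0 ? requested : default_process_count();
}
```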

gunzip can currently decompress files created by gzip, compress or
pack. The detection of the input format is automatic.  For the
gzip format, gunzip checks a 32-bit CRC. For pack, gunzip checks the
uncompressed length.  The 'compress' format was not designed to allow
consistency checks. However gunzip is sometimes able to detect a bad
.Z file because there is some redundancy in the .Z compression format.
If you get an error when uncompressing a .Z file, do not assume that
the .Z file is correct simply because the standard uncompress does not
complain.  This generally means that the standard uncompress does not
check its input, and happily generates garbage output.
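Format detection of this kind works from the leading "magic" bytes. gzip files start with 0x1f 0x8b, files from 'compress' start with 0x1f 0x9d, and packed files start with 0x1f 0x1e. The gzip integrity check relies on the 8-byte member trailer, which stores a CRC-32 of the uncompressed data followed by its length modulo 2^32, both little-endian (RFC 1952). The sketch below illustrates both ideas. All names are hypothetical, and gzip's own code handles more formats than the three shown here.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_COMPRESS, FMT_PACK } input_format;

/* Classify input by its first two bytes. */
input_format detect_format(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 0x1f)
        return FMT_UNKNOWN;
    switch (buf[1]) {
    case 0x8b: return FMT_GZIP;     /* gzip (RFC 1952) */
    case 0x9d: return FMT_COMPRESS; /* .Z files from 'compress' (LZW) */
    case 0x1e: return FMT_PACK;     /* 'pack' Huffman files */
    default:   return FMT_UNKNOWN;
    }
}

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320, as used in
   the gzip trailer. Pass 0 for the first buffer and the previous
   result for each following buffer. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Compare a member's 8-byte trailer with the CRC-32 and length
   (mod 2^32) computed while decompressing. */
int gzip_trailer_matches(const unsigned char trailer[8], uint32_t crc,
                         uint32_t isize)
{
    return read_le32(trailer) == crc && read_le32(trailer + 4) == isize;
}
```

As a quick check, the standard CRC-32 test vector "123456789" yields 0xCBF43926.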

gzip produces files with a .gz extension. Previous versions of gzip
used the .z extension, which was already used by the 'pack'
Huffman encoder. gunzip is able to decompress .z files (packed
or gzip'ed).

Several planned features are not yet supported (see the file TODO).
See the file NEWS for a summary of changes since the last release.
See the file INSTALL for installation instructions.

WARNING: gzip is sensitive to compiler bugs, particularly when
optimizing.  Use "make check" to check that gzip was compiled
correctly.  Try compiling gzip without any optimization if you have a
problem.

Please send all comments and bug reports by electronic mail to
<bug-gzip@gnu.org>.

Bug reports should ideally include:

    * The complete output of "gzip -V" (or the contents of revision.h
      if you can't get gzip to compile)
    * The hardware and operating system (try "uname -a")
    * The compiler used to compile (if it is gcc, use "gcc -v")
    * A description of the bug behavior
    * The input to gzip that triggered the bug

If you send me patches for machines I don't have access to, please test them
very carefully. gzip is used for backups, so it must be extremely reliable.

The znew and gzexe shell scripts provided with gzip benefit from
(but do not require) the (non-GNU) cpmod utility to transfer file attributes.

The sample programs zread.c, sub.c and add.c in subdirectory sample
are provided as examples of useful complements to gzip. Read the
comments inside each source file.  The perl script ztouch is also
provided as an example (not installed by default since it relies on perl).


gzip is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License, a copy of which is
provided under the name COPYING. The latest version of gzip is always
available from https://ftp.gnu.org/gnu/gzip or in any of the GNU
mirror sites.

Many thanks to those who provided me with bug reports and feedback.
See the files THANKS and ChangeLog for more details.


                Note about zip vs. gzip:

The name 'gzip' was a very unfortunate choice, because zip and gzip
are two really different programs, although the actual compression and
decompression sources were written by the same persons. A different
name should have been used for gzip, but it is too late to change now.

zip is an archiver: it compresses several files into a single archive
file. gzip is a simple compressor: each file is compressed separately.
Both share the same compression and decompression code for the
'deflate' method.  unzip can also decompress old zip archives
(implode, shrink and reduce methods). gunzip can also decompress files
created by compress and pack. zip 1.9 and gzip do not support
compression methods other than deflation. (zip 1.0 supports shrink and
implode). Better compression methods may be added in future versions
of gzip. zip will always stick to absolute compatibility with pkzip,
it is thus constrained by PKWare, which is a commercial company.  The
gzip header format is deliberately different from that of pkzip to
avoid such a constraint.
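To make the difference concrete, the sketch below builds the fixed 10-byte gzip member header from RFC 1952 and checks for the pkzip local file header signature "PK\3\4". Both function names are hypothetical. Real gzip normally also sets the FNAME flag and stores the original file name, which this minimal header omits.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with a minimal gzip member header (RFC 1952) that has
   no optional fields. */
void write_gzip_header(unsigned char out[10], uint32_t mtime,
                       unsigned char os)
{
    out[0] = 0x1f; /* ID1 */
    out[1] = 0x8b; /* ID2 */
    out[2] = 8;    /* CM: 8 = deflate */
    out[3] = 0;    /* FLG: no FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT */
    out[4] = (unsigned char)(mtime & 0xff); /* MTIME, little-endian */
    out[5] = (unsigned char)((mtime >> 8) & 0xff);
    out[6] = (unsigned char)((mtime >> 16) & 0xff);
    out[7] = (unsigned char)((mtime >> 24) & 0xff);
    out[8] = 0;    /* XFL: no extra flags */
    out[9] = os;   /* OS: e.g. 3 = Unix, 255 = unknown */
}

/* A pkzip archive member instead begins with the local file header
   signature 0x50 0x4b 0x03 0x04 ("PK\3\4"). */
int is_pkzip_local_header(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == 'P' && buf[1] == 'K' &&
           buf[2] == 3 && buf[3] == 4;
}
```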

On Unix, gzip is mostly useful in combination with tar. GNU tar
1.11.2 and later have a -z option to invoke gzip automatically.  "tar -z"
compresses better than zip, since gzip can then take advantage of
redundancy between distinct files. The drawback is that you must
scan the whole tar.gz file in order to extract a single file near
the end; unzip can directly seek to the end of the zip file. There
is no overhead when you extract the whole archive anyway.
If a member of a .zip archive is damaged, other files can still
be recovered. If a .tar.gz file is damaged, files beyond the failure
point cannot be recovered. (Future versions of gzip will have
error recovery features.)

gzip and gunzip are distributed as a single program. zip and unzip
are, for historical reasons, two separate programs, although the
authors of these two programs work closely together in the Info-ZIP
team. zip and unzip are not associated with the GNU project.
See http://info-zip.org/ for more about zip and unzip.


For any copyright year range specified as YYYY-ZZZZ in this package
note that the range specifies every single year in that closed interval.

========================================================================

Copyright (C) 1999, 2001-2002, 2006-2007, 2009-2018 Free Software Foundation,
Inc.
Copyright (C) 1992, 1993 Jean-loup Gailly

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts.  A copy of the license is included in the ``GNU Free
Documentation License'' file as part of this distribution.