# zipfile - ZIP Archive Access

Purpose: Read and write ZIP archive files.

The zipfile module can be used to manipulate ZIP archive files, the format popularized by the PC program PKZIP.

## Testing ZIP Files

The is_zipfile() function returns a boolean indicating whether or not the filename passed as an argument refers to a valid ZIP archive.

In [1]:
# zipfile_is_zipfile.py
import zipfile

for filename in ['README.txt', 'example.zip',
                 'bad_example.zip', 'notthere.zip']:
    print('{:>15}  {}'.format(
        filename, zipfile.is_zipfile(filename)))

     README.txt  False
    example.zip  True
bad_example.zip  False
   notthere.zip  False


If the file does not exist at all, is_zipfile() returns False.
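
is_zipfile() also accepts an open file or file-like object instead of a filename. The following is a minimal sketch of that usage, built on an in-memory io.BytesIO buffer so it does not depend on any files on disk. The member name hello.txt is made up for illustration.

```python
import io
import zipfile

# Build a small archive entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as zf:
    zf.writestr('hello.txt', 'hello')  # hypothetical member name

# A file-like object can be passed in place of a filename.
valid = zipfile.is_zipfile(buffer)
invalid = zipfile.is_zipfile(io.BytesIO(b'not a zip archive' * 4))
print(valid, invalid)
```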

## Reading Metadata from an Archive

Use the ZipFile class to work directly with a ZIP archive. It supports methods for reading data about existing archives as well as modifying archives by adding files.



In [2]:
# zipfile_namelist.py
import zipfile

with zipfile.ZipFile('example.zip', 'r') as zf:
    print(zf.namelist())

['example.py']


The namelist() method returns the names of the files in an existing archive.

The list of names is only part of the information available from the archive, though. To access all of the metadata about the ZIP contents, use the infolist() or getinfo() methods.

In [3]:
# zipfile_infolist.py
import datetime
import zipfile


def print_info(archive_name):
    with zipfile.ZipFile(archive_name) as zf:
        for info in zf.infolist():
            print(info.filename)
            print('  Comment     :', info.comment)
            mod_date = datetime.datetime(*info.date_time)
            print('  Modified    :', mod_date)
            if info.create_system == 0:
                system = 'Windows'
            elif info.create_system == 3:
                system = 'Unix'
            else:
                system = 'UNKNOWN'
            print('  System      :', system)
            print('  ZIP version :', info.create_version)
            print('  Compressed  :', info.compress_size, 'bytes')
            print('  Uncompressed:', info.file_size, 'bytes')
            print()

if __name__ == '__main__':
    print_info('example.zip')


example.py
  Comment     : b''
  Modified    : 2017-02-04 00:40:30
  System      : Unix
  ZIP version : 30
  Compressed  : 86 bytes
  Uncompressed: 93 bytes



If the name of the archive member is known in advance, its ZipInfo object can be retrieved directly with getinfo().

In [4]:
# zipfile_getinfo.py
import zipfile

with zipfile.ZipFile('example.zip') as zf:
    for filename in ['README.txt', 'notthere.txt', 'example.py']:
        try:
            info = zf.getinfo(filename)
        except KeyError:
            print('ERROR: Did not find {} in zip file'.format(
                filename))
        else:
            print('{} is {} bytes'.format(
                info.filename, info.file_size))


ERROR: Did not find README.txt in zip file
ERROR: Did not find notthere.txt in zip file
example.py is 93 bytes


If the archive member is not present, getinfo() raises a KeyError.
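
When a missing member is an expected case rather than an error, the KeyError can be wrapped in a small helper. The sketch below assumes a hypothetical helper named get_info_or_none() (not part of zipfile) and uses a throwaway in-memory archive with made-up member names.

```python
import io
import zipfile


def get_info_or_none(zf, name):
    """Return the ZipInfo for name, or None if it is not in the archive."""
    try:
        return zf.getinfo(name)
    except KeyError:
        return None


# Build a throwaway in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as zf:
    zf.writestr('present.txt', 'some data')

with zipfile.ZipFile(buffer) as zf:
    found = get_info_or_none(zf, 'present.txt')
    missing = get_info_or_none(zf, 'absent.txt')

print(found.filename, found.file_size)
print(missing)
```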

## Extracting Archived Files From an Archive

To access the data from an archive member, use the read() method, passing the member’s name.

In [5]:
# zipfile_read.py
import zipfile

with zipfile.ZipFile('example.zip') as zf:
    for filename in ['README.txt', 'notthere.txt', 'example.py']:
        try:
            data = zf.read(filename)
        except KeyError:
            print('ERROR: Did not find {} in zip file'.format(
                filename))
        else:
            print(filename, ':')
            print(data)
        print()

ERROR: Did not find README.txt in zip file

ERROR: Did not find notthere.txt in zip file

example.py :
b"#demopkg2/overloaded.py\n\ndef func():\n    print('This is the dev version of shallow func().')\n"
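
read() loads the whole member into memory at once. For larger members, the open() method of ZipFile returns a binary file-like object that can be read incrementally. A minimal sketch, assuming a small in-memory archive with a made-up member named notes.txt:

```python
import io
import zipfile

# Build a throwaway archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as zf:
    zf.writestr('notes.txt', 'first line\nsecond line\n')

lines = []
with zipfile.ZipFile(buffer) as zf:
    # open() returns a binary file-like object for the member.
    with zf.open('notes.txt') as member:
        # Wrap it so the bytes are decoded to text while iterating.
        for line in io.TextIOWrapper(member, encoding='utf-8'):
            lines.append(line.rstrip('\n'))

print(lines)
```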



## Creating New Archives

To create a new archive, instantiate the ZipFile with a mode of 'w'. Any existing file is truncated and a new archive is started. To add files, use the write() method.



In [17]:
# zipfile_write.py
from zipfile_infolist import print_info
import zipfile

print('creating archive')
with zipfile.ZipFile('write.zip', mode='w') as zf:
    print('adding example.py')
    zf.write('example.py')

print()
print_info('write.zip')

creating archive
adding example.py

example.py
  Comment     : b''
  Modified    : 2017-02-04 00:40:28
  System      : Unix
  ZIP version : 20
  Compressed  : 93 bytes
  Uncompressed: 93 bytes



To add compression, the zlib module is required. If zlib is available, the compression mode for individual files or for the archive as a whole can be set using zipfile.ZIP_DEFLATED. The default compression mode is zipfile.ZIP_STORED, which adds the input data to the archive without compressing it.

In [19]:
# zipfile_write_compression.py
from zipfile_infolist import print_info
import zipfile
try:
    import zlib
    compression = zipfile.ZIP_DEFLATED
except ImportError:
    compression = zipfile.ZIP_STORED

modes = {
    zipfile.ZIP_DEFLATED: 'deflated',
    zipfile.ZIP_STORED: 'stored',
}

print('creating archive')
with zipfile.ZipFile('write_compression.zip', mode='w') as zf:
    mode_name = modes[compression]
    print('adding example.py with compression mode', mode_name)
    zf.write('example.py', compress_type=compression)

print()
print_info('write_compression.zip')

creating archive
adding example.py with compression mode deflated

example.py
  Comment     : b''
  Modified    : 2017-02-04 00:40:28
  System      : Unix
  ZIP version : 20
  Compressed  : 86 bytes
  Uncompressed: 93 bytes
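
The difference between the two modes is easier to see with data that compresses well. The sketch below assumes zlib is available and uses a hypothetical helper, compressed_size(), that writes the same repetitive bytes to an in-memory archive under each mode and reports the resulting size.

```python
import io
import zipfile

data = b'abcdefghij' * 1000  # highly repetitive, so it compresses well


def compressed_size(compression):
    """Write data to an in-memory archive and return its size in the archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w', compression=compression) as zf:
        zf.writestr('data.bin', data)
        return zf.getinfo('data.bin').compress_size


stored = compressed_size(zipfile.ZIP_STORED)
deflated = compressed_size(zipfile.ZIP_DEFLATED)
print('stored  :', stored, 'bytes')
print('deflated:', deflated, 'bytes')
```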



## Using Alternate Archive Member Names

Pass an arcname value to write() to add a file to an archive using a name other than the original filename.



In [21]:
# zipfile_write_arcname.py
from zipfile_infolist import print_info
import zipfile

with zipfile.ZipFile('write_arcname.zip', mode='w') as zf:
    zf.write('example.py', arcname='example_arc.py')

print_info('write_arcname.zip')

example_arc.py
  Comment     : b''
  Modified    : 2017-02-04 00:40:28
  System      : Unix
  ZIP version : 20
  Compressed  : 93 bytes
  Uncompressed: 93 bytes



There is no sign of the original filename in the archive.

In [22]:
# zipfile_namelist.py
import zipfile

with zipfile.ZipFile('write_arcname.zip', 'r') as zf:
    print(zf.namelist())

['example_arc.py']


## Writing Data from Sources Other Than Files

Sometimes it is necessary to write to a ZIP archive using data that did not come from an existing file. 

Rather than writing the data to a file, then adding that file to the ZIP archive, use the writestr() method to add a string of bytes to the archive directly.

In [23]:
# zipfile_writestr.py
from zipfile_infolist import print_info
import zipfile

msg = 'This data did not exist in a file.'
with zipfile.ZipFile('writestr.zip',
                     mode='w',
                     compression=zipfile.ZIP_DEFLATED,
                     ) as zf:
    zf.writestr('from_string.txt', msg)

print_info('writestr.zip')

with zipfile.ZipFile('writestr.zip', 'r') as zf:
    print(zf.read('from_string.txt'))

from_string.txt
  Comment     : b''
  Modified    : 2017-02-04 00:53:20
  System      : Unix
  ZIP version : 20
  Compressed  : 36 bytes
  Uncompressed: 34 bytes

b'This data did not exist in a file.'


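In this case, the compression argument to ZipFile was used to compress the data, so every member added to the archive is deflated by default. In current Python 3 releases, writestr() also accepts a compress_type argument that overrides the archive-wide setting for a single member.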
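
Compression can also be chosen per member. The sketch below assumes a Python 3 release in which writestr() accepts a compress_type keyword (worth checking against the documentation for the version in use) and that zlib is available. The member names are made up for illustration.

```python
import io
import zipfile

text = 'This data did not exist in a file. ' * 20

buffer = io.BytesIO()
# The archive-wide default stays ZIP_STORED here.
with zipfile.ZipFile(buffer, mode='w') as zf:
    zf.writestr('stored.txt', text)
    # compress_type overrides the archive default for this member only.
    zf.writestr('deflated.txt', text, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buffer) as zf:
    types = {info.filename: info.compress_type for info in zf.infolist()}
    for info in zf.infolist():
        print(info.filename, info.compress_type, info.compress_size)
```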

## Writing with a ZipInfo Instance

Normally, the modification date is computed when a file or string is added to the archive. A ZipInfo instance can be passed to writestr() to define the modification date and other metadata.

In [28]:
# zipfile_writestr_zipinfo.py
import time
import zipfile
from zipfile_infolist import print_info

msg = b'This data did not exist in a file.'

with zipfile.ZipFile('writestr_zipinfo.zip',
                     mode='w',
                     ) as zf:
    info = zipfile.ZipInfo('from_string.txt',
                           date_time=time.localtime(time.time())[:6],
                           )
    info.compress_type = zipfile.ZIP_DEFLATED
    info.comment = b'Remarks go here'
    info.create_system = 0
    zf.writestr(info, msg)

print_info('writestr_zipinfo.zip')

from_string.txt
  Comment     : b'Remarks go here'
  Modified    : 2017-02-04 01:07:56
  System      : Windows
  ZIP version : 20
  Compressed  : 36 bytes
  Uncompressed: 34 bytes



In this example, the modified time is set to the current time, the data is compressed, and create_system is set to 0, which print_info() reports as Windows. A simple comment is also associated with the new file.
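
Passing a fixed date_time instead of the current time makes the stored metadata predictable, which can help when archives need to be compared. A minimal sketch, using an in-memory archive and an arbitrary timestamp chosen for illustration:

```python
import io
import zipfile

# year, month, day, hour, minute, second (arbitrary value for this sketch)
fixed_time = (2017, 2, 4, 1, 7, 56)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as zf:
    info = zipfile.ZipInfo('from_string.txt', date_time=fixed_time)
    info.comment = b'Remarks go here'
    zf.writestr(info, b'This data did not exist in a file.')

# Read the metadata back to confirm it was stored as given.
with zipfile.ZipFile(buffer) as zf:
    stored = zf.getinfo('from_string.txt')
    print(stored.date_time, stored.comment)
```

ZIP timestamps have a two-second resolution, so an odd number of seconds would be rounded down when read back.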

## Appending to Files

In addition to creating new archives, it is possible to append to an existing archive or add an archive at the end of an existing file (such as a .exe file for a self-extracting archive). To open a file to append to it, use mode 'a'.

In [32]:
# zipfile_append.py
from zipfile_infolist import print_info
import zipfile

print('creating archive')
with zipfile.ZipFile('append.zip', mode='w') as zf:
    zf.write('README.txt')

print('\nprinting info')
print_info('append.zip')

print('appending to the archive')
with zipfile.ZipFile('append.zip', mode='a') as zf:
    zf.write('README.txt', arcname='README2.txt')

print('\nprinting info')
print_info('append.zip')

creating archive

printing info
README.txt
  Comment     : b''
  Modified    : 2017-02-04 01:10:46
  System      : Unix
  ZIP version : 20
  Compressed  : 76 bytes
  Uncompressed: 76 bytes

appending to the archive

printing info
README.txt
  Comment     : b''
  Modified    : 2017-02-04 01:10:46
  System      : Unix
  ZIP version : 20
  Compressed  : 76 bytes
  Uncompressed: 76 bytes

README2.txt
  Comment     : b''
  Modified    : 2017-02-04 01:10:46
  System      : Unix
  ZIP version : 20
  Compressed  : 76 bytes
  Uncompressed: 76 bytes
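
The same behavior can be checked programmatically. This sketch works in a temporary directory, with made-up archive and member names, and confirms that mode 'a' keeps the existing member and adds the new one after it.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'append_demo.zip')  # hypothetical name

    with zipfile.ZipFile(path, mode='w') as zf:
        zf.writestr('first.txt', 'one')

    # Mode 'a' keeps the existing members and adds new ones after them.
    with zipfile.ZipFile(path, mode='a') as zf:
        zf.writestr('second.txt', 'two')

    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        first = zf.read('first.txt')

print(names, first)
```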



## Python ZIP Archives

Python can import modules from inside ZIP archives using zipimport, if those archives appear in sys.path. The PyZipFile class can be used to construct an archive suitable for use in this way.

The extra method writepy() tells PyZipFile to scan a directory for .py files and add the corresponding compiled .pyc file to the archive. If no compiled form exists, a .pyc file is created and added. (Older Python versions could also use .pyo files, which current Python 3 releases no longer produce.)

In [36]:
# zipfile_pyzipfile.py
import sys
import zipfile

if __name__ == '__main__':
    with zipfile.PyZipFile('pyzipfile.zip', mode='w') as zf:
        zf.debug = 3
        print('Adding python files')
        zf.writepy('.')
    for name in zf.namelist():
        print(name)

    print()
    sys.path.insert(0, 'pyzipfile.zip')
    import zipfile_pyzipfile
    print('Imported from:', zipfile_pyzipfile.__file__)
    
    zipfile_pyzipfile.func()

Adding python files
Adding files from directory .
Adding example.pyc
Adding zipfile_infolist.pyc
Adding zipfile_pyzipfile.pyc
example.pyc
zipfile_infolist.pyc
zipfile_pyzipfile.pyc

Imported from: /Users/binyang/GitHub/Py3MOTW/Data_Compression_and_Archiving/zipfile_pyzipfile.py
This is the dev version of shallow func().
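
To confirm that a module is really loaded from the archive, it helps to build the archive somewhere that is not otherwise importable. The sketch below writes a hypothetical module named zipped_demo_mod.py into a temporary directory, packs it with writepy(), and imports it through the archive. All file and module names are invented for this example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical module, written only for this sketch.
    source = os.path.join(tmpdir, 'zipped_demo_mod.py')
    with open(source, 'w') as f:
        f.write('def func():\n    return "hello from the archive"\n')

    archive = os.path.join(tmpdir, 'modules.zip')
    with zipfile.PyZipFile(archive, mode='w') as zf:
        zf.writepy(source)  # compiles the module and adds the .pyc
        names = zf.namelist()

    # Put the archive itself on the import path, then import from it.
    sys.path.insert(0, archive)
    try:
        mod = importlib.import_module('zipped_demo_mod')
        result = mod.func()
    finally:
        sys.path.remove(archive)

print(names)
print(result)
```

The temporary directory is never added to sys.path, so the import can only succeed by way of the archive.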


## Limitations

The zipfile module does not support multi-disk archives. Older releases also could not handle ZIP files with appended comments. It does support ZIP files larger than 4 GB that use the ZIP64 extensions.