# Introduction

### What Is a ZIP File?

ZIP files, also known as **ZIP archives**, are files that use the **ZIP file format**.<br>
**PKWARE** is the company that created and first implemented this file format.<br>
The ZIP file format is a **cross-platform**, interoperable file storage and transfer format. It combines **lossless data compression**, **file management**, and **data encryption**.<br>
**Data compression** isn’t a requirement for an archive to be considered a ZIP file.

### Why Use ZIP Files?

Knowing how to **create**, **read**, **write**, and **extract** ZIP files can be a useful skill for developers and professionals who work with computers and digital information. Among other benefits, ZIP files allow you to:
- **Reduce the size** of files and their storage requirements **without losing information**
- **Improve transfer speed over the network** due to reduced size and single-file transfer
- **Pack several related files** together into a single archive for efficient management
- **Bundle your code** into a single archive for distribution purposes
- **Secure your data** by using encryption, which is a common requirement nowadays
- **Guarantee the integrity** of your information to avoid accidental and malicious changes to your data

### Terminology

Because the terminology around ZIP files can be confusing at times, this tutorial sticks to the following conventions:
<table>
    <thead>
        <tr>
            <th>Term</th>
            <th>Meaning</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>ZIP file, ZIP archive, or archive</td>
            <td>Physical file that uses the ZIP file format</td>
        </tr>
        <tr>
            <td>File</td>
            <td>Regular computer file</td>
        </tr>
        <tr>
            <td>Member file</td>
            <td>File that is part of an existing ZIP file</td>
        </tr>
    </tbody>
</table>

# Manipulating Existing ZIP Files

## Reading ZIP Files

In [2]:
import zipfile

try:
    with zipfile.ZipFile("spering.zip", mode="r") as archive:
        archive.printdir()
except zipfile.BadZipFile as error:
    print(error)

File Name                                             Modified             Size
spering-html/                                  2020-09-16 11:10:10            0
spering-html/about.html                        2020-07-28 14:47:58        10108
spering-html/category.html                     2020-07-28 14:48:10         9824
spering-html/css/                              2020-09-16 11:10:10            0
spering-html/css/bootstrap.css                 2019-02-13 20:17:50       192348
spering-html/css/responsive.css                2020-01-24 15:36:32         3216
spering-html/css/style.css                     2020-01-24 15:56:28        17458
spering-html/css/style.css.map                 2020-01-24 15:56:28        14126
spering-html/css/style.scss                    2020-01-24 15:56:28        12988
spering-html/images/                           2020-09-16 11:10:10            0
spering-html/images/about-img.jpg              2020-01-23 19:18:30        98932
spering-html/images/c1.png              

- **try/except:** catches **zipfile.BadZipFile**, which ZipFile raises when the target isn’t a valid ZIP file.
- **.printdir():** provides a quick way to display the content of the underlying ZIP file on the screen.

#### Check for a valid ZIP file

In [6]:
print(zipfile.is_zipfile("spering.zip"))
print(zipfile.is_zipfile("not_exist.zip"))

True
False
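Both checks can be tried without touching the disk. The sketch below is only an illustration: it assumes an in-memory buffer (`not_a_zip`, a hypothetical name) standing in for a corrupted or non-ZIP file, and shows that `is_zipfile()` reports `False` while `ZipFile` raises `BadZipFile`.

```python
import io
import zipfile

# Hypothetical stand-in for a corrupted or non-ZIP file: plain bytes in memory.
not_a_zip = io.BytesIO(b"this is not a ZIP archive")

# is_zipfile() accepts file-like objects as well as filenames.
looks_like_zip = zipfile.is_zipfile(not_a_zip)
print(looks_like_zip)

try:
    with zipfile.ZipFile(not_a_zip, mode="r") as archive:
        archive.printdir()
    opened = True
except zipfile.BadZipFile as error:
    # Raised because no ZIP end-of-central-directory record was found.
    opened = False
    message = str(error)
    print(message)
```

Checking with `is_zipfile()` first avoids the exception, while the try/except form is handy when you want the error message itself.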


## Writing to ZIP Files

The following code adds **hello.txt** (which must already exist) to a **hello.zip** archive by opening ZipFile in **writing mode** ("w").

In [None]:
with zipfile.ZipFile("hello.zip", mode="w") as archive:
    archive.write("hello.txt")

- **.write()** is a method of the ZipFile object that allows you to **write member files** into your ZIP archives.

#### Note:
If the target ZIP file doesn’t exist, then ZipFile creates it for you. If it already exists, then the "w" mode truncates it, so any previous content is lost.

## Appending to ZIP Files
The append mode ("a") allows you to append new member files to an existing ZIP file. This mode **doesn’t truncate** the archive, so its **original content is safe**. If the target ZIP file doesn’t exist, then the "a" mode **creates a new one** for you.

In [None]:
with zipfile.ZipFile("hello.zip", mode="a") as archive:
    archive.write("new_hello.txt")
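The difference between the "w" and "a" modes can be checked with a small experiment. This is a sketch under stated assumptions: it works in a temporary directory and creates its own `hello.txt` and `new_hello.txt` files, so it doesn't depend on any file from the examples above.

```python
import pathlib
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    hello = tmp_dir / "hello.txt"
    new_hello = tmp_dir / "new_hello.txt"
    hello.write_text("Hello, World!")
    new_hello.write_text("Hello again!")
    zip_path = tmp_dir / "hello.zip"

    # "w" creates the archive (or truncates an existing one).
    with zipfile.ZipFile(zip_path, mode="w") as archive:
        archive.write(hello, arcname="hello.txt")

    # "a" keeps the existing members and adds the new one.
    with zipfile.ZipFile(zip_path, mode="a") as archive:
        archive.write(new_hello, arcname="new_hello.txt")
    with zipfile.ZipFile(zip_path, mode="r") as archive:
        names_after_append = archive.namelist()

    # Reopening in "w" mode discards the previous content.
    with zipfile.ZipFile(zip_path, mode="w") as archive:
        archive.write(new_hello, arcname="new_hello.txt")
    with zipfile.ZipFile(zip_path, mode="r") as archive:
        names_after_rewrite = archive.namelist()

print(names_after_append)   # ['hello.txt', 'new_hello.txt']
print(names_after_rewrite)  # ['new_hello.txt']
```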

## Reading Metadata From ZIP Files
The ZipFile class provides several handy **methods** for extracting metadata from existing ZIP files.


<table>
    <thead>
        <tr>
            <th>Method</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>.getinfo(filename)</td>
            <td>Returns a <code>ZipInfo</code> object with information about the member file provided by <code>filename</code>. Note that <code>filename</code> must hold the path to the target file inside the underlying ZIP file.</td>
        </tr>
        <tr>
            <td>.infolist()</td>
            <td>Returns a list of <code>ZipInfo</code> objects, one per member file.</td>
        </tr>
        <tr>
            <td>.namelist()</td>
            <td>Returns a list holding the names of all the member files in the underlying archive. The names in this list are valid arguments to <code>.getinfo()</code>.</td>
        </tr>
    </tbody>
</table>


### getinfo()

In [9]:
with zipfile.ZipFile("spering.zip", mode="r") as archive:
    info = archive.getinfo("spering-html/index.html")
    
print(info.file_size)
print(info.compress_size)
print(info.filename)
print(info.date_time)

23212
3183
spering-html/index.html
(2020, 7, 28, 14, 47, 48)


#### Note: 
**ZipInfo** isn’t intended to be instantiated directly. The .getinfo() and .infolist() methods return ZipInfo objects automatically when you call them. However, ZipInfo includes a class method called **.from_file()**, which allows you to instantiate the class explicitly if you ever need to do it.
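As a rough illustration of **.from_file()**, the sketch below builds a ZipInfo object from a file on disk without opening any archive. The temporary file and the `docs/hello.txt` archive name are assumptions made for the example.

```python
import pathlib
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "hello.txt"
    source.write_text("Hello, World!")

    # arcname is the name the file would get inside an archive.
    info = zipfile.ZipInfo.from_file(source, arcname="docs/hello.txt")

print(info.filename)   # docs/hello.txt
print(info.file_size)  # 13
print(info.is_dir())   # False
```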

### infolist()

In [10]:
import datetime

with zipfile.ZipFile("spering.zip", mode="r") as archive:
    for info in archive.infolist():
        print(f"Filename: {info.filename}")
        print(f"Modified: {datetime.datetime(*info.date_time)}")
        print(f"Normal size: {info.file_size} bytes")
        print(f"Compressed size: {info.compress_size} bytes")
        print("-" * 20)

Filename: spering-html/
Modified: 2020-09-16 11:10:10
Normal size: 0 bytes
Compressed size: 0 bytes
--------------------
Filename: spering-html/about.html
Modified: 2020-07-28 14:47:58
Normal size: 10108 bytes
Compressed size: 2142 bytes
--------------------
Filename: spering-html/category.html
Modified: 2020-07-28 14:48:10
Normal size: 9824 bytes
Compressed size: 1855 bytes
--------------------
Filename: spering-html/css/
Modified: 2020-09-16 11:10:10
Normal size: 0 bytes
Compressed size: 0 bytes
--------------------
Filename: spering-html/css/bootstrap.css
Modified: 2019-02-13 20:17:50
Normal size: 192348 bytes
Compressed size: 24679 bytes
--------------------
Filename: spering-html/css/responsive.css
Modified: 2020-01-24 15:36:32
Normal size: 3216 bytes
Compressed size: 825 bytes
--------------------
Filename: spering-html/css/style.css
Modified: 2020-01-24 15:56:28
Normal size: 17458 bytes
Compressed size: 2644 bytes
--------------------
Filename: spering-html/css/style.css.map
Mod

### namelist()

If you just need to perform a **quick check** on a ZIP file and list the names of its member files, then you can use **.namelist()**:

In [11]:
with zipfile.ZipFile("spering.zip", mode="r") as archive:
    for filename in archive.namelist():
        print(filename)

spering-html/
spering-html/about.html
spering-html/category.html
spering-html/css/
spering-html/css/bootstrap.css
spering-html/css/responsive.css
spering-html/css/style.css
spering-html/css/style.css.map
spering-html/css/style.scss
spering-html/images/
spering-html/images/about-img.jpg
spering-html/images/c1.png
spering-html/images/c2.png
spering-html/images/c3.png
spering-html/images/c4.png
spering-html/images/c5.png
spering-html/images/c6.png
spering-html/images/call.png
spering-html/images/experience-img.jpg
spering-html/images/f1.png
spering-html/images/f2.png
spering-html/images/f3.png
spering-html/images/f4.png
spering-html/images/fb.png
spering-html/images/freelance-img.jpg
spering-html/images/instagram.png
spering-html/images/linkedin.png
spering-html/images/location.png
spering-html/images/logo.png
spering-html/images/mail.png
spering-html/images/menu.png
spering-html/images/next-angle.png
spering-html/images/next.png
spering-html/images/prev-angle.png
spering-html/images/prev

Because the filenames in this output are valid arguments to .getinfo(), you can combine these two methods to retrieve information about selected member files only.

#### Example:
You may have a ZIP file containing different types of member files (.docx, .xlsx, .txt, and so on).<br>
Instead of getting the complete information with **.infolist()**, you just need to get the information about the .docx files.<br>
Then you can filter the files by their extension and call **.getinfo()** on your .docx files only. Go ahead and give it a try!
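One possible way to do this is sketched below. The archive is built in memory with made-up member names and placeholder content (not real .docx or .xlsx data), purely so that the example is self-contained.

```python
import io
import zipfile

# Hypothetical in-memory archive with mixed member types.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("report.docx", b"docx placeholder")
    archive.writestr("budget.xlsx", b"xlsx placeholder")
    archive.writestr("notes.txt", b"plain text")
    archive.writestr("drafts/letter.docx", b"another docx placeholder")

with zipfile.ZipFile(buffer, mode="r") as archive:
    # Filter the names first, then ask for metadata on the matches only.
    docx_names = [name for name in archive.namelist() if name.endswith(".docx")]
    docx_infos = [archive.getinfo(name) for name in docx_names]

for info in docx_infos:
    print(f"{info.filename}: {info.file_size} bytes")
```

Here `.writestr()` is used only to create members from in-memory data; with a real archive you would skip that step and open your own file for reading.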

## Reading From and Writing to Member Files

**.read()** takes a member file’s name and returns that file’s content as bytes:

In [16]:
with zipfile.ZipFile("spering.zip", mode="r") as archive:
    for line in archive.read("spering-html/js/custom.js").split(b"\n"):
        print(line)

b'// nav menu style\r'
b'var nav = $("#navbarSupportedContent");\r'
b'var btn = $(".custom_menu-btn");\r'
b'btn.click\r'
b'btn.click(function (e) {\r'
b'\r'
b'    e.preventDefault();\r'
b'    nav.toggleClass("lg_nav-toggle");\r'
b'    document.querySelector(".custom_menu-btn").classList.toggle("menu_btn-style")\r'
b'});\r'
b'\r'
b'\r'
b'function getCurrentYear() {\r'
b'    var d = new Date();\r'
b'    var currentYear = d.getFullYear()\r'
b'\r'
b'    $("#displayDate").html(currentYear);\r'
b'}\r'
b'\r'
b'getCurrentYear();'


To use **.read()**, you need to open the ZIP file for reading or appending.<br>
Note that **.read()** returns the whole content of the target file as a bytes object.<br>
In this example, you use **.split()** to split that content into lines, using the line feed character **"\n" as a separator**. Because **.split()** is operating on a bytes object, you need to add a leading **b** to the string used as an argument. The trailing **\r** in each output line is left over from the file’s Windows-style line endings.

- **ZipFile.read()** also accepts a second positional argument called pwd. This argument allows you to provide a password for reading encrypted files.

In [None]:
with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
    for line in archive.read("hello.txt", pwd=b"secret").split(b"\n"):
        print(line)

#### Note
For large encrypted ZIP files, keep in mind that the decryption operation **can be extremely slow** because it’s implemented in pure Python. In such cases, consider using a **specialized program** to handle your archives instead of using zipfile.

- You can use **ZipFile.setpassword()** to set a default password for the whole archive:

In [None]:
with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
    archive.setpassword(b"secret")
    for file in archive.namelist():
        print(file)
        print("-" * 20)
        for line in archive.read(file).split(b"\n"):
            print(line)

#### Note
Keep in mind that when you pass the pwd argument, it overrides whatever archive-level password you may have set with **.setpassword()**.

#### Simpler Approach
If you need a more flexible way to read member files, then **ZipFile.open()** is for you. Like the built-in open() function, it returns a file-like object that implements the context manager protocol, and therefore it supports the with statement:

In [None]:
with zipfile.ZipFile("sample.zip", mode="r") as archive:
    with archive.open("hello.txt", mode="r") as hello:
        for line in hello:
            print(line)

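- You can also use **.open()** with the "w" mode to create a new member file and write to it. The "a" mode isn’t supported by **.open()**. To add a member to an existing archive, open the archive itself in append mode: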

In [None]:
with zipfile.ZipFile("sample.zip", mode="a") as archive:
    with archive.open("new_hello.txt", "w") as new_hello:
        new_hello.write(b"Hello, World!")
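The modes accepted by **.open()** can be verified with a quick sketch. It assumes an in-memory archive rather than sample.zip, and it relies on the documented behavior that the mode must be "r" or "w", so passing "a" is expected to raise a ValueError.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    # Mode "w" creates a new member and returns a writable binary file object.
    with archive.open("new_hello.txt", mode="w") as new_hello:
        new_hello.write(b"Hello, World!")

    # Mode "a" is rejected at the member level.
    try:
        archive.open("new_hello.txt", mode="a")
        append_error = None
    except ValueError as error:
        append_error = str(error)

with zipfile.ZipFile(buffer, mode="r") as archive:
    with archive.open("new_hello.txt") as member:
        content = member.read()

print(content)
print(append_error)
```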

### Reading the Content of Member Files as Text
- bytes.decode()
- io.TextIOWrapper

In [None]:
# First Method
with zipfile.ZipFile("sample.zip", mode="r") as archive:
    text = archive.read("hello.txt").decode(encoding="utf-8")

In [None]:
# Second Method
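A minimal sketch of the second approach, assuming an in-memory archive whose hello.txt holds two lines of UTF-8 text. Wrapping the binary file object returned by **.open()** in **io.TextIOWrapper** decodes the content lazily, line by line, instead of loading it all at once:

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\nWelcome to ZIP files!\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    with archive.open("hello.txt", mode="r") as hello:
        # TextIOWrapper turns the byte stream into a stream of str lines.
        lines = [
            line.rstrip("\n")
            for line in io.TextIOWrapper(hello, encoding="utf-8")
        ]

print(lines)  # ['Hello, World!', 'Welcome to ZIP files!']
```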


## Extracting Member Files From Your ZIP Archives

**ZipFile.extract()** takes the name of a member file and extracts it to the directory given by the path argument. The destination defaults to the current directory:

In [None]:
with zipfile.ZipFile("sample.zip", mode="r") as archive:
    for file in archive.namelist():
        if file.endswith(".md"):
            archive.extract(file, "output_dir/")

#### Note
If the target filename already exists in the output directory, then **.extract()** overwrites it without asking for confirmation.

**ZipFile.extractall()** extracts all the member files from an archive into the given directory:

In [None]:
with zipfile.ZipFile("sample.zip", mode="r") as archive:
    archive.extractall("output_dir/")

- If you only need to **extract some of the member files** from a given archive, then you can use the **members argument**. This argument accepts a list of member files, which should be a subset of the whole list of files in the archive at hand.<br>
- Finally, just like .extract(), the .extractall() method also accepts a **pwd argument to extract encrypted files**.
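The members argument might be used as in the following sketch. It assumes an in-memory archive with three made-up members and a temporary output directory, and it extracts only the .md files:

```python
import io
import pathlib
import tempfile
import zipfile

# Self-contained sample archive (contents are placeholders).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!")
    archive.writestr("lorem.md", "# Lorem")
    archive.writestr("realpython.md", "# Real Python")

with tempfile.TemporaryDirectory() as tmp:
    output_dir = pathlib.Path(tmp) / "output_dir"
    with zipfile.ZipFile(buffer, mode="r") as archive:
        # members must be a subset of archive.namelist().
        md_members = [name for name in archive.namelist() if name.endswith(".md")]
        archive.extractall(path=output_dir, members=md_members)
    extracted = sorted(path.name for path in output_dir.iterdir())

print(extracted)  # ['lorem.md', 'realpython.md']
```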

## Closing ZIP Files After Use

Sometimes it’s convenient to open a given ZIP file without using a with statement. In those cases, you need to manually close the archive after use to complete any writing operations and to free the acquired resources.

In [None]:
archive = zipfile.ZipFile("sample.zip", mode="r")
archive.printdir()

# Close the archive when you're done
archive.close()
archive
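When you manage the archive manually, a try/finally block is one way to make sure **.close()** runs even if an error occurs in between. The sketch below assumes an in-memory archive opened for writing, where closing matters most because the archive's index (the central directory) is written at that point:

```python
import io
import zipfile

buffer = io.BytesIO()
archive = zipfile.ZipFile(buffer, mode="w")
try:
    archive.writestr("hello.txt", "Hello, World!")
finally:
    # Finishes the archive and releases resources, even after an exception.
    archive.close()

# The archive is complete only after close(), so it can now be read back.
with zipfile.ZipFile(buffer, mode="r") as reopened:
    names = reopened.namelist()

print(names)  # ['hello.txt']
```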

# Manipulating Your Own ZIP Files

## Creating a ZIP File From Multiple Regular Files

In [None]:
filenames = ["hello.txt", "lorem.md", "realpython.md"]

with zipfile.ZipFile("multiple_files.zip", mode="w") as archive:
    for filename in filenames:
        archive.write(filename)

## Building a ZIP File From a Directory

#### Example:
source_dir/<br>
│<br>
├── hello.txt<br>
├── lorem.md<br>
└── realpython.md<br>

- Because the directory **doesn’t contain subdirectories**, you can use pathlib.Path.iterdir() to iterate over its content directly.

In [None]:
import pathlib

directory = pathlib.Path("source_dir/")

with zipfile.ZipFile("directory.zip", mode="w") as archive:
    for file_path in directory.iterdir():
        archive.write(file_path, arcname=file_path.name)

In this case, we passed **file_path.name** to the second argument of **.write()**. This argument is called **arcname** and holds the **name of the member file inside the resulting archive**. All the examples that you’ve seen so far rely on the **default value of arcname**, which is the same filename you pass as the first argument to .write().

#### Example
root_dir/<br>
│<br>
├── sub_dir/<br>
│   └── new_hello.txt<br>
│<br>
├── hello.txt<br>
├── lorem.md<br>
└── realpython.md<br>

- You have the **usual files and a subdirectory** with a single file in it. If you want to create a ZIP file with this same internal structure, then you need a tool that recursively iterates through the directory tree under root_dir/.

In [None]:
directory = pathlib.Path("root_dir/")

with zipfile.ZipFile("directory_tree.zip", mode="w") as archive:
    for file_path in directory.rglob("*"):
        archive.write(
            file_path,
            arcname=file_path.relative_to(directory)
        )

In this example, you use **Path.rglob()** to recursively traverse the directory tree under **root_dir/**. Then you write every file and subdirectory to the target ZIP archive.
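To see what the resulting archive looks like, the sketch below recreates a reduced version of the tree in a temporary directory (only two of the files, an assumption made to keep it short) and lists the member names afterward. Sorting the paths is also an addition here, used only to make the order predictable.

```python
import pathlib
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    root_dir = pathlib.Path(tmp) / "root_dir"
    (root_dir / "sub_dir").mkdir(parents=True)
    (root_dir / "hello.txt").write_text("Hello, World!")
    (root_dir / "sub_dir" / "new_hello.txt").write_text("Hello again!")

    # Keep the archive outside root_dir so it doesn't end up inside itself.
    zip_path = pathlib.Path(tmp) / "directory_tree.zip"
    with zipfile.ZipFile(zip_path, mode="w") as archive:
        for file_path in sorted(root_dir.rglob("*")):
            archive.write(file_path, arcname=file_path.relative_to(root_dir))

    with zipfile.ZipFile(zip_path, mode="r") as archive:
        names = archive.namelist()

print(names)  # ['hello.txt', 'sub_dir/', 'sub_dir/new_hello.txt']
```

Note that the subdirectory itself becomes a member whose name ends with a slash.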

# Compressing Files and Directories

The **compression** method is the third argument to the initializer of ZipFile. If you want to compress your files while you write them into a ZIP archive, then you can set this argument to one of the following constants:

<table>
    <thead>
        <tr>
            <th>Constant</th>
            <th>Compression Method</th>
            <th>Required Module</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>zipfile.ZIP_DEFLATED</td>
            <td>Deflate</td>
            <td>zlib</td>
        </tr>
        <tr>
            <td>zipfile.ZIP_BZIP2</td>
            <td>Bzip2</td>
            <td>bz2</td>
        </tr>
        <tr>
            <td>zipfile.ZIP_LZMA</td>
            <td>LZMA</td>
            <td>lzma</td>
        </tr>
    </tbody>
</table>

#### Example:
Now say you want to archive and compress the content of a given directory using the Deflate method, which is the most commonly used method in ZIP files. To do that, you can run the following code:

In [None]:
import pathlib
from zipfile import ZipFile, ZIP_DEFLATED

directory = pathlib.Path("source_dir/")

with ZipFile("comp_dir.zip", "w", ZIP_DEFLATED, compresslevel=9) as archive:
    for file_path in directory.rglob("*"):
        archive.write(file_path, arcname=file_path.relative_to(directory))
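The effect of the compression argument can be observed by comparing sizes. The sketch below is an illustration with made-up, highly repetitive data held in memory. It writes the same payload once with zipfile.ZIP_STORED (the default, no compression) and once with zipfile.ZIP_DEFLATED, which requires the zlib module:

```python
import io
import zipfile

# Repetitive sample data compresses well; real files will vary.
payload = b"ZIP files can store or compress data. " * 200

sizes = {}
methods = [("stored", zipfile.ZIP_STORED), ("deflated", zipfile.ZIP_DEFLATED)]
for label, method in methods:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=method) as archive:
        archive.writestr("sample.txt", payload)
        info = archive.getinfo("sample.txt")
    sizes[label] = (info.file_size, info.compress_size)

for label, (file_size, compress_size) in sizes.items():
    print(f"{label}: {file_size} bytes -> {compress_size} bytes")
```

With ZIP_STORED both sizes are equal, which matches the earlier point that compression isn't required for an archive to be a ZIP file.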

# Additional Classes From zipfile

## Finding Path in a ZIP File

The zipfile.Path class allows you to construct path objects to quickly create and manage paths to member files and directories inside a given ZIP file. The class takes two arguments:

- **root** accepts a ZIP file, either as a ZipFile object or a string-based path to a physical ZIP file.
- **at** holds the location of a specific member file or directory inside the archive. It defaults to the empty string, representing the root of the archive.

In [None]:
import zipfile

hello_txt = zipfile.Path("sample.zip", "hello.txt")

print(hello_txt)

print(hello_txt.name)

print(hello_txt.is_file())

print(hello_txt.exists())

print(hello_txt.read_text())

This code shows that **zipfile.Path** implements several features that are common to a **pathlib.Path** object. You can get the name of the file with **.name**. You can check if the path points to a regular file with **.is_file()**. You can check if a given file exists inside a particular ZIP file, and more.
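A further sketch, assuming an in-memory archive with one top-level file and one file in a subdirectory, shows two more pathlib-like features: iterating over a directory with **.iterdir()** and joining path segments with the **/** operator.

```python
import io
import zipfile

# Self-contained sample archive (member names are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!")
    archive.writestr("sub_dir/new_hello.txt", "Hello again!")

with zipfile.ZipFile(buffer, mode="r") as archive:
    root = zipfile.Path(archive)  # at defaults to the root of the archive

    # List the top-level entries and whether each one is a directory.
    entries = sorted((entry.name, entry.is_dir()) for entry in root.iterdir())

    # Build a path to a nested member and read it as text.
    nested = root / "sub_dir" / "new_hello.txt"
    nested_text = nested.read_text(encoding="utf-8")

print(entries)      # [('hello.txt', False), ('sub_dir', True)]
print(nested_text)  # Hello again!
```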

# Sources
- <a href="https://realpython.com/python-zipfile/">Python's zipfile: Manipulate Your ZIP Files Efficiently by Real Python</a>