# Quick Demos: Zip and Unzip in Python

> A quick demo showing how to __zip/unzip files__ in a Jupyter notebook using Python.

- toc: false
- branch: master
- badges: true
- comments: true
- author: David Cato
- categories: [jupyter, python, quick-demo]

---
### Purpose: 

 - Demo how to __zip/unzip files__ in a Jupyter notebook using Python

### Overview:
1. Select files in a directory
2. Randomly sample `n` files
3. Write the sampled file paths to a csv
4. __Zip__ the sampled files
5. __Unzip__ the sampled files
6. Read the sampled file paths back from the csv


In [1]:
from pathlib import Path
from fastai.vision import get_image_files
import numpy as np
from zipfile import ZipFile

In [2]:
# working directory
path = Path('/home/dc/coronahack/source/nih-chest-xrays')

# source directory containing files to zip
src_dir   = path / 'data'

# csv filepath (to be created/overwritten)
csv_dst   = path / 'nih-chest-xrays_sample-2000.csv'

# zip filepath (to be created/overwritten)
zip_dst   = path / 'nih-chest-xrays_sample-2000.zip'

# unzip directory (to be created/overwritten)
unzip_dst = path / 'sample-2000'

## Create Zip

### 1. Select files in specified directory

(e.g., all image files in the directory and its subdirectories)

In [3]:
files = sorted(get_image_files(src_dir, recurse=True))
len(files), files[:5]

(112120,
 [PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_001/images/00000001_000.png'),
  PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_001/images/00000001_001.png'),
  PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_001/images/00000001_002.png'),
  PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_001/images/00000002_000.png'),
  PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_001/images/00000003_000.png')])
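If fastai is not installed, the same selection can be approximated with `pathlib` alone. The sketch below is not the fastai implementation: `list_image_files` and the `IMAGE_EXTS` set are hypothetical names introduced here, and the extension list is an assumed subset of what `get_image_files` recognises.

```python
from pathlib import Path
import tempfile

# Assumed subset of image extensions (hypothetical; extend as needed)
IMAGE_EXTS = {'.png', '.jpg', '.jpeg'}

def list_image_files(root, exts=IMAGE_EXTS):
    """Return a sorted list of files under `root` (recursively) whose suffix is in `exts`."""
    root = Path(root)
    return sorted(p for p in root.rglob('*')
                  if p.is_file() and p.suffix.lower() in exts)

# Tiny demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'sub').mkdir()
    for name in ['b.png', 'a.PNG', 'notes.txt', 'sub/c.jpg']:
        (tmp / name).write_bytes(b'')
    demo_files = [p.relative_to(tmp).as_posix() for p in list_image_files(tmp)]
```

In the notebook this would be called as `list_image_files(src_dir)` in place of `get_image_files(src_dir, recurse=True)`.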

### 2. Randomly sample `n` files from list

(optional: set a fixed seed to make the sample reproducible)

In [4]:
n = 2000

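# draw a random seed; print or record it to reproduce this exact sample later
seed = np.random.randint(0, 2**32-1)
# seed = 0  # or uncomment to use a fixed seed
np.random.seed(seed)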

sample_paths = np.random.choice(files, n, replace=False)
sample_paths

array([PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00017670_005.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00016410_036.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_006/images/00012300_000.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_002/images/00003864_001.png'), ...,
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_003/images/00004037_001.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_003/images/00006469_009.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_005/images/00009545_002.png'),
       PosixPath('/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00017914_000.png')], dtype=object)
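A possible variation, assuming a NumPy version that provides `np.random.default_rng`: sample indices from a local generator instead of seeding the global state. The helper name `sample_items` is hypothetical. Sampling indices rather than the items themselves keeps the result a plain list of the original objects.

```python
import numpy as np

def sample_items(items, n, seed=None):
    """Return `n` distinct items chosen at random; the same seed gives the same sample."""
    rng = np.random.default_rng(seed)  # local generator, leaves global state untouched
    idx = rng.choice(len(items), size=n, replace=False)
    return [items[i] for i in idx]

# Demo with made-up file names
demo_names = [f'img_{i:03d}.png' for i in range(100)]
first = sample_items(demo_names, 5, seed=0)
second = sample_items(demo_names, 5, seed=0)
```

With the same seed, `first` and `second` contain the same five names in the same order.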

### 3. Write the original file paths to the `csv_dst` file

In [5]:
csv_dst.exists(), csv_dst

(True,
 PosixPath('/home/dc/coronahack/source/nih-chest-xrays/nih-chest-xrays_sample-2000.csv'))

In [6]:
# `np.str` was removed in recent NumPy releases; the builtin `str` works everywhere
np.savetxt(csv_dst, sample_paths.astype(str), fmt='%s', delimiter=',')
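Since the file holds a single column of paths, it can also be written and read back as plain text without NumPy. This is a minimal sketch, assuming no path contains a newline; `write_paths` and `read_paths` are hypothetical helper names.

```python
from pathlib import Path
import tempfile

def write_paths(paths, dst):
    """Write one path per line to `dst`, overwriting any existing file."""
    text = '\n'.join(str(p) for p in paths) + '\n'
    Path(dst).write_text(text, encoding='utf-8')

def read_paths(src):
    """Read the file back into a list of Path objects, skipping blank lines."""
    lines = Path(src).read_text(encoding='utf-8').splitlines()
    return [Path(line) for line in lines if line]

# Round-trip demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / 'sample.csv'
    original = [Path('data/a.png'), Path('data/sub/b.png')]
    write_paths(original, listing)
    restored = read_paths(listing)
```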

### 4. Zip files in list into `zip_dst` file

In [7]:
zip_dst.exists(), zip_dst

(True,
 PosixPath('/home/dc/coronahack/source/nih-chest-xrays/nih-chest-xrays_sample-2000.zip'))

In [8]:
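# 'w' overwrites any existing archive; entries are stored uncompressed by default
with ZipFile(zip_dst, 'w') as zf:
    for fn in sample_paths:
        zf.write(fn)  # archive name is the full path minus the leading '/'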
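Because `zf.write(fn)` receives absolute paths, each archive entry keeps the whole directory chain (`home/dc/...`). One way to get shorter entry names is to pass `arcname` relative to a chosen root. The sketch below assumes every file lives under that root; `zip_relative` is a hypothetical helper. It also opts into `ZIP_DEFLATED`, although PNG files are already compressed and will likely shrink very little.

```python
from pathlib import Path
from zipfile import ZipFile, ZIP_DEFLATED
import tempfile

def zip_relative(paths, root, dst):
    """Zip `paths` into `dst`, naming each entry by its path relative to `root`."""
    root = Path(root)
    with ZipFile(dst, 'w', compression=ZIP_DEFLATED) as zf:
        for p in paths:
            # relative_to raises ValueError if `p` is not under `root`
            zf.write(p, arcname=Path(p).relative_to(root).as_posix())

# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / 'data'
    (src / 'images').mkdir(parents=True)
    demo_file = src / 'images' / 'x.png'
    demo_file.write_bytes(b'abc')
    archive = tmp / 'demo.zip'
    zip_relative([demo_file], src, archive)
    with ZipFile(archive) as zf:
        entry_names = zf.namelist()
```

In the notebook this would correspond to `zip_relative(sample_paths, src_dir, zip_dst)`, which would store entries such as `images_001/images/00000001_000.png`.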

## Unzip files

### 5. Unzip files into `unzip_dst` folder

In [9]:
unzip_dst.mkdir(parents=True, exist_ok=True)
unzip_dst.exists(), unzip_dst

(True, PosixPath('/home/dc/coronahack/source/nih-chest-xrays/sample-2000'))

In [10]:
with ZipFile(zip_dst, 'r') as zf:
    # zf.printdir() # print zip contents
    zf.extractall(unzip_dst)
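For a large archive it can be worth checking the contents before and after extraction. The sketch below uses `ZipFile.testzip()`, which returns the name of the first corrupt member or `None`, and then confirms that each listed file exists on disk. `extract_checked` is a hypothetical helper, not part of `zipfile`.

```python
from pathlib import Path
from zipfile import ZipFile
import tempfile

def extract_checked(zip_path, dst):
    """Extract `zip_path` into `dst`; return (file members, members missing on disk)."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path, 'r') as zf:
        bad = zf.testzip()  # reads every member and checks its CRC
        if bad is not None:
            raise ValueError(f'corrupt member: {bad}')
        zf.extractall(dst)
        members = [n for n in zf.namelist() if not n.endswith('/')]
    missing = [n for n in members if not (dst / n).is_file()]
    return members, missing

# Demo with a small archive built in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    demo_zip = tmp / 'demo.zip'
    with ZipFile(demo_zip, 'w') as zf:
        zf.writestr('a/b.txt', 'hello')
    members, missing = extract_checked(demo_zip, tmp / 'out')
```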

### 6. Load csv of original file paths

In [11]:
csv_dst.exists(), csv_dst

(True,
 PosixPath('/home/dc/coronahack/source/nih-chest-xrays/nih-chest-xrays_sample-2000.csv'))

In [12]:
# `np.str` was removed in recent NumPy releases; the builtin `str` works everywhere
np.loadtxt(csv_dst, dtype=str, delimiter=',')

array(['/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00017670_005.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00016410_036.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_006/images/00012300_000.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_002/images/00003864_001.png', ...,
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_003/images/00004037_001.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_003/images/00006469_009.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_005/images/00009545_002.png',
       '/home/dc/coronahack/source/nih-chest-xrays/data/images_008/images/00017914_000.png'], dtype='<U82')
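The csv lists the original locations, not the extracted ones. Since the archive was written from absolute paths, `zipfile` stores each entry without its leading `/`, so an extracted copy should sit under `unzip_dst` followed by the original path. A minimal sketch of that mapping, assuming POSIX-style paths as in this notebook; `extracted_location` is a hypothetical helper.

```python
from pathlib import Path, PurePosixPath

def extracted_location(original, unzip_dir):
    """Map an original (possibly absolute) POSIX path to its location under `unzip_dir`."""
    original = PurePosixPath(original)
    if original.is_absolute():
        # drop the leading '/', mirroring how the entry was named in the archive
        original = original.relative_to(original.anchor)
    return Path(unzip_dir).joinpath(*original.parts)

# Demo with a shortened, made-up path
demo_location = extracted_location('/home/dc/x.png', 'sample-2000')
```

Applied to the notebook's variables, `[extracted_location(p, unzip_dst) for p in np.loadtxt(csv_dst, dtype=str, delimiter=',')]` would give the paths of the extracted samples.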