# Compression in Python – A Tutorial


### What is Compression?

Compression is a method used to shrink files so that they take up less storage space on a device.  When downloading content from the internet, we often encounter files with the extension .zip or .rar.  These are examples of compressed files.  Opening (decompressing) one recreates the original, usable file.

There are diverse and complex schools of thought on how to approach compression, drawing on fields from information theory to statistics.  In the age of big data, research in this field continues to pursue more efficient compression methods.  In general, the algorithms that make compression possible rely on some rather intuitive mathematical rules.  Lossless text compression essentially removes redundancy from a file: repeated characters (or patterns of characters) are replaced with shorter references, and a dictionary keeps track of what each reference stands for.  Image compression applies the same idea to pixels and their colors instead of characters.
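To make the idea of redundancy concrete, here is a minimal sketch that compresses two inputs of equal length with the standard library's `zlib` module: one highly repetitive, one random.  The variable names and the sample phrase are illustrative assumptions, not part of the original project.

```python
import os
import zlib

# Highly repetitive input: the same short phrase repeated 1,000 times.
repetitive = b"the quick brown fox " * 1000

# Input with no exploitable pattern: random bytes of the same length.
random_bytes = os.urandom(len(repetitive))

repetitive_size = len(zlib.compress(repetitive))
random_size = len(zlib.compress(random_bytes))

print(len(repetitive), "->", repetitive_size)
print(len(random_bytes), "->", random_size)
```

The repetitive input should shrink dramatically, while the random input should stay about the same size, because there is no redundancy to remove.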

### Data Compression Modules from the Python Standard Library (https://docs.python.org/3/library/archiving.html)

Python's standard library offers several modules for compression and archiving.  This tutorial focuses on zlib, gzip, and zipfile.

## — How to use zlib: Functions —

This section explains how to use zlib.  The zlib module wraps the zlib library, which implements the lossless DEFLATE algorithm and can both compress and decompress data.

• We start by importing the zlib module
• For the purpose of this project, we also import sys so we can report sizes and show how successful the compression was


In [None]:
import zlib
import sys

Next, we compress the data of interest with `zlib.compress`.  The optional `level` parameter sets how hard the compressor works.

In [None]:
zlib.compress(data, level=-1)

    #data is the bytes object being compressed
    #level is the amount of compression desired: 0 (none) to 9 (smallest output, slowest); the default of -1 currently means level 6
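As a rough illustration of what `level` does, the sketch below compresses the same sample data at several levels and records the resulting sizes.  The sample text and the `sizes` dictionary are assumptions made for this example.

```python
import zlib

# Sample data with plenty of repetition (an assumption for this demo).
data = b"Compression removes redundancy. " * 500

sizes = {}
for level in (0, 1, 6, 9):
    sizes[level] = len(zlib.compress(data, level))
    print("level", level, "->", sizes[level], "bytes")

print("original:", len(data), "bytes")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input; the other levels trade speed for size.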

To decompress data, we use `zlib.decompress`:

In [None]:
zlib.decompress(data, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)

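    #data is the compressed bytes being decompressed
    #wbits sets the window size and the header format that is expected (the default expects a zlib header)
    #bufsize is the initial size of the output buffer; it grows as needed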
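A short round trip shows that the compression is lossless, and what happens when the input is not valid zlib data.  This is a sketch with made-up sample bytes; `failed` is just a flag introduced for the example.

```python
import zlib

original = b"Lossless means the round trip is exact. " * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "->", len(restored))
print(restored == original)

# Data that was never compressed with zlib raises zlib.error.
failed = False
try:
    zlib.decompress(b"not zlib data")
except zlib.error as exc:
    failed = True
    print("decompression failed:", exc)
```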



`compressobj` returns a compression object, which compresses data streams that will not fit in memory all at once

In [None]:
zlib.compressobj(level=-1, method=DEFLATED, wbits=MAX_WBITS, memLevel=DEF_MEM_LEVEL, strategy=Z_DEFAULT_STRATEGY[, zdict])

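    #level ranges from 0 to 9; a level of 1 is fastest, 9 gives the smallest output
    #'DEFLATED' is the only compression method available
    #wbits controls the window size (9 to 15 for a standard zlib stream)
    #memLevel is the amount of memory used for the internal compression state (1-9)
    #strategy tunes the compression algorithm, e.g. Z_DEFAULT_STRATEGY, Z_FILTERED or Z_HUFFMAN_ONLY
    #zdict is a predefined compression dictionary: bytes expected to occur often in the data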
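The following sketch shows one way a compression object might be used: data arrives in chunks, each chunk is fed to `compress()`, and `flush()` emits whatever is still buffered.  The chunk contents are placeholders chosen for the example.

```python
import zlib

# Pretend these chunks arrive one at a time, e.g. read from a large file.
chunks = [b"first chunk of data, ", b"second chunk of data, ", b"third chunk of data"]

compressor = zlib.compressobj(level=9)
pieces = []
for chunk in chunks:
    # compress() may return an empty bytes object while it buffers input.
    pieces.append(compressor.compress(chunk))
# flush() returns the remaining compressed output and finishes the stream.
pieces.append(compressor.flush())

stream = b"".join(pieces)
print(len(stream), "compressed bytes")
```

The joined `stream` is an ordinary zlib stream, so `zlib.decompress(stream)` recovers the concatenated chunks.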

`decompressobj` returns a decompression object, which decompresses data streams that will not fit in memory all at once

In [None]:
zlib.decompressobj(wbits=MAX_WBITS[, zdict])
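As a sketch of the streaming direction, the example below feeds compressed data to a decompression object 64 bytes at a time.  The chunk size and sample data are arbitrary choices for illustration.

```python
import zlib

original = b"streaming decompression example " * 200
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
restored = bytearray()
# Feed the compressed stream in small slices, as if reading from a socket.
for start in range(0, len(compressed), 64):
    restored += decompressor.decompress(compressed[start:start + 64])
# flush() returns any output that is still buffered.
restored += decompressor.flush()

print(bytes(restored) == original)
```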


## — How to use gzip: Functions —

The gzip module reads and writes files in the gzip format (.gz).  It builds on the zlib module, which performs the actual compression.
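A minimal sketch of typical gzip usage follows: `gzip.open` works much like the built-in `open`, and `gzip.compress` handles data already in memory.  The file name `example.txt.gz` and the temporary directory are assumptions for the demo.

```python
import gzip
import os
import tempfile

text = "gzip wraps zlib-compressed data in the gzip file format.\n" * 50

# Write and read a .gz file in text mode.
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(text)
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = f.read()

# Compress bytes in memory without touching the disk.
in_memory = gzip.compress(text.encode("utf-8"))

print(len(text), "characters ->", os.path.getsize(path), "bytes on disk")
```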

## — How to use zipfile: Functions —

ZIP is one of the most common compression and archive file formats.  The zipfile module offers many functions for more advanced applications, so this section focuses on the basic usage that most users will need.

In [1]:
import zipfile
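The basic workflow might look like the sketch below: create an archive, add members, then list, read, and extract them.  The archive name, member names, and temporary directory are placeholders chosen for this example.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "example.zip")

# Create an archive; ZIP_DEFLATED compresses members (the default only stores them).
with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hello.txt", "hello, zip! " * 100)
    archive.writestr("notes/readme.txt", "a second member in a subfolder")

# Reopen the archive to inspect and extract its contents.
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
    hello = archive.read("hello.txt").decode("utf-8")
    for info in archive.infolist():
        print(info.filename, info.file_size, "->", info.compress_size)
    archive.extractall(os.path.join(workdir, "extracted"))
```

To add files that already exist on disk, `archive.write(path)` can be used in place of `writestr`.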

# Project Scripts



# Exploration and Research Application: Fractal Compression

Fractal compression is interesting because it looks for areas of self-similarity in an image.  Because of this, it suits natural images (such as landscapes) and combines well with machine learning.  Fractal compression is not ideal for real-time applications, because encoding is computationally expensive.  Decoding, however, is not.

In [None]:
dict1 = {'¡':'the','™':'be','£':'to',... }

bigstring = input('Type a sentence or two here')
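The substitution table above suggests a simple dictionary coder: swap common words for single symbols, and reverse the mapping to decode.  The sketch below is one possible reading of that idea, not the project's actual script; `codes`, `shrink`, and `expand` are hypothetical names, and the table holds only the three entries shown above.

```python
# Hypothetical three-entry table; a real one would hold many more words.
codes = {'¡': 'the', '™': 'be', '£': 'to'}


def shrink(text):
    # Replace each word that has a code with its symbol.
    to_code = {word: symbol for symbol, word in codes.items()}
    return ' '.join(to_code.get(word, word) for word in text.split(' '))


def expand(text):
    # Replace each symbol with the word it stands for.
    return ' '.join(codes.get(word, word) for word in text.split(' '))


sentence = 'to be or not to be that is the question'
packed = shrink(sentence)

print(packed)
print(len(sentence), "characters ->", len(packed), "characters")
```

This only counts characters: the symbols take more than one byte each in UTF-8, so the saving in bytes is smaller.  The scheme also assumes the input text never contains the symbols themselves.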

In [None]:
import multiprocessing

import cv2
import numpy as np
from PIL import Image

N = 32              # side length of a range block, in pixels
SIZE = 512          # the source image is assumed to be SIZE x SIZE
HALF = SIZE // 2    # side length of the downsampled domain image
BLOCKS = SIZE // N  # number of range blocks per row and per column


def init_worker(path):
    # Load the images once per worker, so this also works on platforms
    # where multiprocessing starts fresh interpreters (Windows, macOS).
    global img1, img2
    img1 = np.array(Image.open(path).convert("L"))
    img2 = cv2.resize(img1, (HALF, HALF))


def lstsq(img_large, img_small):
    # Least-squares fit img_large ~ m * img_small + c.
    # Returns the coefficients (m, c) and the sum of squared residuals.
    x = np.reshape(img_small, N * N).astype(float)
    y = np.reshape(img_large, N * N).astype(float)
    A = np.array([x, np.ones(len(x))]).T
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return coef, np.sum((A @ coef - y) ** 2)


def calc(s, t):
    # Find the domain block in img2 that best matches the range block
    # of img1 whose top-left corner is at column s, row t.
    target = img1[t:t+N, s:s+N]
    residues = np.zeros((HALF - N, HALF - N))
    for j in range(HALF - N):
        for i in range(HALF - N):
            residues[j][i] = lstsq(target, img2[j:j+N, i:i+N])[1]

    # Position of the first minimum, scanning row by row.
    y, x = np.unravel_index(np.argmin(residues), residues.shape)
    m, c = lstsq(target, img2[y:y+N, x:x+N])[0]
    return np.r_[x, y, m, c]


def encode_row(j):
    # Encode one row of range blocks; rows are spread across the workers.
    return np.array([calc(i * N, j * N) for i in range(BLOCKS)])


if __name__ == '__main__':
    rows = []
    with multiprocessing.Pool(initializer=init_worker,
                              initargs=('Lenna.jpg',)) as pool:
        for j, row in enumerate(pool.imap(encode_row, range(BLOCKS)), 1):
            rows.append(row)
            print('encoded row', j, 'of', BLOCKS)
    d = np.array(rows)  # shape (BLOCKS, BLOCKS, 4): x, y, m, c per block

    print('process finished!')

    img3 = np.zeros((HALF, HALF))
    dst = np.zeros((SIZE, SIZE))

    # Decode: start from a blank image and apply the stored transforms
    # repeatedly, saving the result of every pass.
    for k in range(10):
        for j in range(BLOCKS):
            for i in range(BLOCKS):
                x, y = int(d[j][i][0]), int(d[j][i][1])
                m, c = d[j][i][2], d[j][i][3]
                dst[j*N:j*N+N, i*N:i*N+N] = img3[y:y+N, x:x+N] * m + c
        img3 = cv2.resize(dst, (HALF, HALF))
        out = np.clip(dst, 0, 255).astype(np.uint8)
        cv2.imwrite(str(N) + "_" + str(k + 1) + ".jpg", out)


## Works Cited 





In [None]:
ask_for_num = True
while ask_for_num:
    try:
        my_num = int(input("Please enter a number from 1 to 9: "))
        if my_num > 9 or my_num < 1:
            print("Oops!  That number is out of range. Try again!")
        else:
            ask_for_num = False
    except ValueError:
        print("Oops!  That was not a valid number. Try again!")