# The gzip, zipfile, and tarfile Modules: Working with Compressed Files


## shutil.make_archive

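* **Create an archive (compress a directory)**: supports the `bztar`, `gztar`, `tar`, and `zip` formats
```python
shutil.make_archive(base_name, format, root_dir)
# e.g. archive the contents of test_dir/ into test_archive.zip
shutil.make_archive("test_archive", "zip", "test_dir/")
```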

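A self-contained Python 3 sketch of the call above: it builds a throwaway directory (the names `demo_src`, `hello.txt`, and `demo_archive` are hypothetical), archives it with `shutil.make_archive`, and unpacks the result with `shutil.unpack_archive` to confirm the round trip. Everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source directory containing one file
    src = os.path.join(tmp, "demo_src")
    os.makedirs(src)
    with open(os.path.join(src, "hello.txt"), "w") as fh:
        fh.write("hello archive")

    # base_name has no extension: make_archive appends ".zip" and returns the full path
    archive_path = shutil.make_archive(os.path.join(tmp, "demo_archive"), "zip", root_dir=src)
    print(os.path.basename(archive_path))  # demo_archive.zip

    # Unpack into a fresh directory and check the file came back unchanged
    dest = os.path.join(tmp, "unpacked")
    shutil.unpack_archive(archive_path, dest)
    with open(os.path.join(dest, "hello.txt")) as fh:
        restored = fh.read()

print(restored)  # hello archive
```

`unpack_archive` infers the format from the file extension, so the same call also handles `.tar.gz` or `.tar.bz2` archives.
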
```python
import os, shutil, glob
import zlib, gzip, bz2, zipfile, tarfile
```

## bz2 module

```python
orginal = "this is a test string"

compressed = bz2.compress(orginal)

print compressed
print bz2.decompress(compressed)
```

```
BZh91AY&SY... (binary bz2 data, unprintable bytes omitted)
this is a test string
```

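The cell above is Python 2, where `str` is a byte string. In Python 3, `bz2.compress`/`bz2.decompress` take `bytes`, so text has to be encoded first. This Python 3 sketch also shows `bz2.open` for writing and reading a `.bz2` file in text mode; the file name `demo.txt.bz2` is hypothetical and lives in a temporary directory.

```python
import bz2
import os
import tempfile

original = "this is a test string"

# One-shot, in-memory compression: encode str -> bytes first
compressed = bz2.compress(original.encode("utf-8"))
print(compressed[:3])  # b'BZh' is the bz2 magic header
restored = bz2.decompress(compressed).decode("utf-8")
print(restored)

# bz2.open works like the built-in open(); "wt"/"rt" handle the text encoding
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        fh.write(original)
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        from_file = fh.read()

print(from_file)
```
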

## zlib module

```python
# zlib compresses and decompresses strings in memory:
# zlib.compress / zlib.decompress
orginal = "this is a test string"
compressed = zlib.compress(orginal)

print compressed
print zlib.decompress(compressed)
```

```
x... (binary zlib data, unprintable bytes omitted)
this is a test string
```

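For data that arrives in pieces, zlib also provides incremental objects, `zlib.compressobj()` and `zlib.decompressobj()`. A minimal Python 3 sketch (the chunk contents are made up) that compresses chunk by chunk and checks the result:

```python
import zlib

chunks = [b"this is ", b"a test ", b"string"]

# Compress incrementally; flush() emits whatever is still buffered
comp = zlib.compressobj(level=9)
compressed = b"".join(comp.compress(c) for c in chunks) + comp.flush()

# Decompress incrementally in the same way
decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()

print(restored)  # b'this is a test string'
# The incremental output is an ordinary zlib stream, so the one-shot API reads it too
print(zlib.decompress(compressed) == restored)  # True
```
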

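```python
# Computing checksums: Adler-32 and CRC-32
# In Python 2 these functions may return a signed int;
# "& 0xffffffff" converts the result to the unsigned 32-bit value
print zlib.adler32(orginal) & 0xffffffff
print zlib.crc32(orginal) & 0xffffffff
```

```
1407780813
4236695221
```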

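Both checksum functions accept a running value as an optional second argument, so a checksum can be computed chunk by chunk, for example while reading a large file. In Python 3 they always return an unsigned value, so the mask above only matters for Python 2. A short Python 3 sketch (the 5-byte chunk size is arbitrary):

```python
import zlib

data = b"this is a test string"

# Checksums over the whole buffer at once
whole_crc = zlib.crc32(data)
whole_adler = zlib.adler32(data)

# The same checksums computed incrementally; the running value is passed back in
crc = 0    # crc32's default starting value
adler = 1  # adler32's default starting value
for i in range(0, len(data), 5):
    chunk = data[i:i + 5]
    crc = zlib.crc32(chunk, crc)
    adler = zlib.adler32(chunk, adler)

print(crc == whole_crc, adler == whole_adler)  # True True
```
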

## gzip module

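```python
# Opening the path with the built-in open() also "works", but the result is
# just a plain file named file.txt.gz; it is not actually gzip-compressed
content = "Lots of content here"
with open('file.txt.gz', 'wb') as f:
    f.write(content)

with open('file.txt.gz', 'rb') as f:
    file_content = f.read()
print file_content
```

```
Lots of content here
```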


```python
# The file is not really gzip-compressed, so decompressing it raises an error
try:
    with gzip.open('file.txt.gz', 'rb') as f_in, open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
except IOError as msg:
    print msg
```

```
Not a gzipped file
```

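The `Not a gzipped file` error comes from the header check: every gzip stream starts with the two magic bytes `0x1f 0x8b`. A Python 3 sketch of a hypothetical helper, `looks_gzipped`, that peeks at those bytes before any decompression is attempted (the file names are made up and created in a temporary directory):

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_gzipped(path):
    """Hypothetical helper: True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt.gz")  # plain bytes despite the .gz name
    real = os.path.join(tmp, "real.txt.gz")    # genuinely gzip-compressed
    with open(plain, "wb") as fh:
        fh.write(b"Lots of content here")
    with gzip.open(real, "wb") as fh:
        fh.write(b"Lots of content here")

    plain_is_gz = looks_gzipped(plain)
    real_is_gz = looks_gzipped(real)

print(plain_is_gz, real_is_gz)  # False True
```

This only checks the header; a file with a valid header can still be truncated or corrupt, so catching the exception on read remains necessary.
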

```python
# gzip produces .gz files; the underlying compression is provided by the zlib module
# Use gzip.open to read and write .gz files
content = "Lots of content here"
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content)

with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()
print file_content
```

```
Lots of content here
```


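```python
# Now no error is raised
try:
    # Note: one with statement can manage several context managers, separated by commas
    # shutil.copyfileobj(f_in, f_out): f_in yields the decompressed data,
    # f_out is a newly created file object, and the contents of f_in
    # are copied into f_out
    with gzip.open('file.txt.gz', 'rb') as f_in, open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
except IOError as msg:
    print msg
```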

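The same `shutil.copyfileobj` pattern also works in the other direction, turning an existing file into a `.gz` file. Python 3 (3.2+) additionally offers the one-shot `gzip.compress`/`gzip.decompress` for in-memory data. A Python 3 sketch with hypothetical file names in a temporary directory:

```python
import gzip
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "notes.txt")
    dst = os.path.join(tmp, "notes.txt.gz")
    with open(src, "wb") as fh:
        fh.write(b"Lots of content here\n" * 100)

    # Compress: read the plain file, write through a gzip file object
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    plain_size = os.path.getsize(src)
    gz_size = os.path.getsize(dst)
    with gzip.open(dst, "rb") as fh:
        round_trip = fh.read()

# One-shot, in-memory variant
blob = gzip.compress(b"Lots of content here")
print(gzip.decompress(blob))  # b'Lots of content here'
print(gz_size < plain_size)   # True: repetitive text compresses well
```
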
```python
# List the files in the current directory
!ls
```

```
01-python-overview.ipynb
02-ipython-interpreter.ipynb
03-ipython-terminal.ipynb
04-pprint.ipynb
05-pickle-and-cPickle.ipynb
06-json.ipynb
07-glob.ipynb
08-shutil.ipynb
09-gzip,-zipfile,-tarfile.ipynb
file.txt
file.txt.gz
```


```python
!cat file.txt
!cat file.txt.gz
```

```
Lots of content here
(binary gzip data, not human-readable)
```


```python
os.remove("file.txt.gz")
```

## zipfile module

```python
# First, create some files to archive
for i in xrange(10):
    shutil.copy('file.txt', 'file-' + str(i) + '.txt')
```

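```python
# Compress: add every .txt file to file.zip, then delete the original
# ZIP_DEFLATED is required for actual compression; the default,
# ZIP_STORED, stores the files uncompressed
with zipfile.ZipFile('file.zip', 'w', zipfile.ZIP_DEFLATED) as f:
    for name in glob.glob('*.txt'):
        f.write(name)
        os.remove(name)
```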

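`zipfile.ZipFile(..., 'w')` stores members uncompressed (`ZIP_STORED`) unless a compression method is passed. The Python 3 sketch below writes the same data both ways into in-memory archives (`io.BytesIO`, hypothetical member name `data.txt`) and compares `ZipInfo.file_size` with `ZipInfo.compress_size`:

```python
import io
import zipfile

payload = b"Lots of content here\n" * 100  # 2100 bytes

sizes = {}
for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("data.txt", payload)
    # Reopen the same buffer for reading and inspect the member's metadata
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo("data.txt")
        sizes[method] = (info.file_size, info.compress_size)

print(sizes[zipfile.ZIP_STORED])    # (2100, 2100): stored as-is
print(sizes[zipfile.ZIP_DEFLATED])  # compress_size is far smaller than 2100
```
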
```python
!ls
```

```
01-python-overview.ipynb
02-ipython-interpreter.ipynb
03-ipython-terminal.ipynb
04-pprint.ipynb
05-pickle-and-cPickle.ipynb
06-json.ipynb
07-glob.ipynb
08-shutil.ipynb
09-gzip,-zipfile,-tarfile.ipynb
file.zip
```


```python
# Decompress: open the archive for reading
f = zipfile.ZipFile('file.zip', 'r')
# namelist() returns a list of the file names stored in the archive
print f.namelist()
```

```
['file-0.txt', 'file-1.txt', 'file-2.txt', 'file-3.txt', 'file-4.txt', 'file-5.txt', 'file-6.txt', 'file-7.txt', 'file-8.txt', 'file-9.txt', 'file.txt']
```


```python
# Read the contents of each file in the archive
for name in f.namelist():
    print name, 'content:', f.read(name)
```

```
file-0.txt content: Lots of content here
file-1.txt content: Lots of content here
file-2.txt content: Lots of content here
file-3.txt content: Lots of content here
file-4.txt content: Lots of content here
file-5.txt content: Lots of content here
file-6.txt content: Lots of content here
file-7.txt content: Lots of content here
file-8.txt content: Lots of content here
file-9.txt content: Lots of content here
file.txt content: Lots of content here
```

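Besides `read(name)`, `ZipFile.infolist()` returns `ZipInfo` objects with per-member metadata, and `ZipFile.open()` returns a file-like object, so a large member can be streamed instead of loaded into memory at once. A Python 3 sketch using an in-memory archive with hypothetical members `a.txt` and `b.txt`:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "first file\n")   # str data is stored as UTF-8
    zf.writestr("b.txt", "second file\n")

listing = []
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        # open() yields a binary file-like object; iterate over it line by line
        with zf.open(info) as member:
            lines = [line.decode("utf-8") for line in member]
        listing.append((info.filename, info.file_size, lines))

for entry in listing:
    print(entry)
```
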

```python
# extract(name) extracts a single member, extractall() extracts all of them;
# extract() returns the path of the file it created
f.extract(f.namelist()[-1])
```

```
'E:\\Project _Sources\\notebook\\python-tools\\file.txt'
```

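Both `extract` and `extractall` accept a `path` argument (default: the current directory), which keeps extracted files out of the working directory. A Python 3 sketch in a temporary directory (the archive and member names are hypothetical):

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(3):
            zf.writestr("file-%d.txt" % i, "Lots of content here")

    out_dir = os.path.join(tmp, "out")
    with zipfile.ZipFile(archive) as zf:
        # Extract one member; the return value is the path of the new file
        one = zf.extract("file-0.txt", path=out_dir)
        # Extract every member into the same directory
        zf.extractall(path=out_dir)

    extracted = sorted(os.listdir(out_dir))
    first_name = os.path.basename(one)

print(extracted)   # ['file-0.txt', 'file-1.txt', 'file-2.txt']
print(first_name)  # file-0.txt
```

The zipfile documentation warns against extracting archives from untrusted sources without first inspecting the member names.
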
## tarfile module

```python
# tarfile reads and writes .tar archives:
with tarfile.open('file.txt.tar', 'w') as f:
    f.add('file.txt')
```

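The mode `'w'` above writes a plain, uncompressed tar. `tarfile.open` also accepts modes such as `'w:gz'`, `'w:bz2'`, and `'w:xz'` for compressed archives, and `'r:*'` to read with automatic detection of the compression. A Python 3 sketch in a temporary directory (file names hypothetical) that builds a gzip-compressed tar and reads a member back without extracting it to disk:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")
    with open(src, "w") as fh:
        fh.write("Lots of content here")

    archive = os.path.join(tmp, "file.tar.gz")
    # "w:gz" = tar archive compressed with gzip; arcname sets the name stored inside
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="file.txt")

    # "r:*" detects the compression automatically
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        member = tar.extractfile("file.txt")
        content = member.read().decode("utf-8")

print(names)    # ['file.txt']
print(content)  # Lots of content here
```
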
```python
!ls
```

```
01-python-overview.ipynb
02-ipython-interpreter.ipynb
03-ipython-terminal.ipynb
04-pprint.ipynb
05-pickle-and-cPickle.ipynb
06-json.ipynb
07-glob.ipynb
08-shutil.ipynb
09-gzip,-zipfile,-tarfile.ipynb
file.txt
file.txt.tar
file.zip
```


```python
# Clean up every file whose name starts with "file"
file_name = glob.glob('file*')
for name in file_name:
    print 'Remove ' + name + ' successful.'
    os.remove(name)
```

```
Remove file.txt successful.
Remove file.txt.tar successful.
Remove file.zip successful.
```


```python
!ls
```

```
01-python-overview.ipynb
02-ipython-interpreter.ipynb
03-ipython-terminal.ipynb
04-pprint.ipynb
05-pickle-and-cPickle.ipynb
06-json.ipynb
07-glob.ipynb
08-shutil.ipynb
09-gzip,-zipfile,-tarfile.ipynb
```
