# Table of Contents
* [Purpose](#Purpose)
* [Try out zipfile standard library](#Try-out-zipfile-standard-library)
	* [Open zip file & get list of names of files contained in the zip file](#Open-zip-file-&-get-list-of-names-of-files-contained-in-the-zip-file)
	* [Verify that can read data from each individual file in the zip file](#Verify-that-can-read-data-from-each-individual-file-in-the-zip-file)


In [1]:
# __future__ imports must come before any other import
from __future__ import division
from __future__ import print_function
import zipfile as zf

# Purpose

Avoid working directly with hundreds of megabytes of .csv data files by using the [zipfile standard library](https://docs.python.org/3/library/zipfile.html) to read the data straight from a single .zip archive that holds all of the compressed files.
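
As a rough illustration of the idea, the sketch below builds a tiny zip archive in memory and reads the first line of one member back without extracting anything to disk. The archive contents and the member name `example.csv` are made up for the example, and Python 3 is assumed; only `io` and `zipfile` from the standard library are used.

```python
import io
import zipfile

# Build a small in-memory archive (hypothetical data standing in for the real .zip).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example.csv", "#time_s, Vpp, PMT\n-1.0E-03, 0.0, 0.0\n")

# Read the member straight from the archive; nothing is extracted to disk.
with zipfile.ZipFile(buffer, "r") as archive:
    with archive.open("example.csv") as member:
        first_line = member.readline().decode("utf-8")

print(first_line, end="")
```

Opening the archive in a `with` block closes it automatically once the data has been read.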

# Try out zipfile standard library

## Open zip file & get list of names of files contained in the zip file

In [2]:
myzip = zf.ZipFile('20150316_data.zip','r')

In [3]:
myzip.namelist()

['sample1_100.0us.csv',
 'sample1_200.0us.csv',
 'sample1_300.0us.csv',
 'sample1_400.0us.csv',
 'sample1_500.0us.csv',
 'sample1_600.0us.csv',
 'sample1_700.0us.csv',
 'sample1_800.0us.csv',
 'sample1_900.0us.csv',
 'sample1_1000.0us.csv',
 'sample1_1500.0us.csv',
 'sample1_2000.0us.csv',
 'sample1_2500.0us.csv',
 'sample1_3000.0us.csv',
 'sample1_4000.0us.csv',
 'sample1_5000.0us.csv',
 'sample1_8000.0us.csv',
 'sample1_9000.0us.csv']

In [4]:
filenames = myzip.namelist()
print(filenames[0])

sample1_100.0us.csv
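
Alongside `namelist()`, `ZipFile.infolist()` returns a `ZipInfo` object for each member, which can be used to check how large a file is before reading it. The following is a minimal sketch, assuming Python 3 and using a made-up in-memory archive (the names `a.csv` and `b.csv` are hypothetical) rather than the real data file.

```python
import io
import zipfile

# Hypothetical archive with two small members, built in memory for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.csv", "0.0, 0.0\n" * 1000)
    archive.writestr("b.csv", "1.0, 1.0\n" * 10)

with zipfile.ZipFile(buffer) as archive:
    # Map each member name to its (uncompressed, compressed) size in bytes.
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in archive.infolist()}

for name, (raw, packed) in sorted(sizes.items()):
    print(name, raw, packed)
```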


## Verify that can read data from each individual file in the zip file

In [5]:
nlines = 3
with myzip.open(filenames[0]) as myfile:
    # Members open in binary mode, so decode each line before printing.
    for counter in range(nlines):
        print(counter, filenames[0], myfile.readline().decode('utf-8'), end="")

0 sample1_100.0us.csv #time_s, Vpp, PMT
1 sample1_100.0us.csv -1.000066E-03, 0.000000, 0.000000
2 sample1_100.0us.csv -1.000056E-03, 0.000000, 0.020101
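
Reading raw lines confirms access, but the values still have to be parsed. One possible approach, sketched here under the assumption of Python 3, is to wrap the binary member in `io.TextIOWrapper` and hand it to `csv.reader`. The in-memory archive and the member name `example.csv` are hypothetical; the header and row simply mimic the layout printed above.

```python
import csv
import io
import zipfile

# Hypothetical archive mimicking the header and row layout printed above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example.csv",
        "#time_s, Vpp, PMT\n-1.000066E-03, 0.000000, 0.000000\n",
    )

with zipfile.ZipFile(buffer) as archive:
    with archive.open("example.csv") as raw:
        # Wrap the binary stream so csv.reader receives text lines.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.reader(text, skipinitialspace=True)
        header = next(reader)
        rows = [[float(value) for value in row] for row in reader]

print(header)
print(rows)
```

The header keeps its leading `#` here; stripping it is left to whatever code consumes the column names.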


In [7]:
nlines = 3
for fname in filenames:
    with myzip.open(fname) as f:
        # Print the first nlines lines of each member, decoded from bytes.
        for counter in range(nlines):
            print(counter, fname, ': ', f.readline().decode('utf-8'), end="")
        print('-----------')

0 sample1_100.0us.csv :  #time_s, Vpp, PMT
1 sample1_100.0us.csv :  -1.000066E-03, 0.000000, 0.000000
2 sample1_100.0us.csv :  -1.000056E-03, 0.000000, 0.020101
-----------
0 sample1_200.0us.csv :  #time_s, Vpp, PMT
1 sample1_200.0us.csv :  -1.000066E-03, 0.000000, -0.020101
2 sample1_200.0us.csv :  -1.000056E-03, 0.000000, -0.040201
-----------
0 sample1_300.0us.csv :  #time_s, Vpp, PMT
1 sample1_300.0us.csv :  -1.000066E-03, -0.080402, -0.040201
2 sample1_300.0us.csv :  -1.000056E-03, 0.000000, 0.000000
-----------
0 sample1_400.0us.csv :  #time_s, Vpp, PMT
1 sample1_400.0us.csv :  -1.000066E-03, 0.000000, 0.000000
2 sample1_400.0us.csv :  -1.000056E-03, 0.000000, 0.000000
-----------
0 sample1_500.0us.csv :  #time_s, Vpp, PMT
1 sample1_500.0us.csv :  -1.000066E-03, 0.000000, -0.020101
2 sample1_500.0us.csv :  -1.000056E-03, 0.000000, -0.020101
-----------
0 sample1_600.0us.csv :  #time_s, Vpp, PMT
1 sample1_600.0us.csv :  -1.000066E-03, 0.000000, 0.000000
2 sample1_