# File I/O

In Python, as in other programming languages, data can be printed to the screen in human-readable form or written to a file for later use.

There are three ways of writing values:
1) expression statements and the `print()` function
2) the standard output and error streams, `sys.stdout` and `sys.stderr`
3) the `write()` method of file objects

This lecture focuses on method 3). We have already been using method 1) in the previous classes without naming it.
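
As a rough sketch of the three approaches listed above, the snippet below writes the same value each way. The file name `demo_output.txt` is a made-up example, and the file is created in a temporary directory so nothing in the working directory is touched.

```python
import os
import sys
import tempfile

value = 42

# 1) print() converts its arguments to strings and appends a newline.
print('value is', value)

# 2) sys.stdout and sys.stderr are file objects; write() takes a string
#    and does not add a newline on its own.
sys.stdout.write(f'value is {value}\n')
sys.stderr.write('this line goes to the error stream\n')

# 3) write() on a file object returned by open().
#    'demo_output.txt' is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'demo_output.txt')
with open(path, 'w') as f:
    n_chars = f.write(f'value is {value}\n')  # write() returns the number of characters written

with open(path) as f:
    content = f.read()
```

Here `content` holds the single line that was written, and `n_chars` counts its characters including the newline.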

## Formatting

### Old string formatting

In [1]:
a = 3
b = 4
print (a, b)

3 4


In [2]:
print ('a is %d and b is %d.' % (a, b))

a is 3 and b is 4.


### Formatted string literals

In [3]:
print (f'a is {a} and b is {b}.')

a is 3 and b is 4.


In [8]:
import numpy as np
print(f'The value of pi is approximately {np.pi:0.10f}.')

The value of pi is approximately 3.1415926536.


In [9]:
table = {'Seoul': 2, 'Sejong': 44, 'Daejeon': 42, 'Busan': 55}
for city, areacode in table.items():
    print(f'{city:10} ==> 0{areacode:d}')

Seoul      ==> 02
Sejong     ==> 044
Daejeon    ==> 042
Busan      ==> 055


### String format method

Usage: `str.format()`

In [10]:
print('Today is {}, {} {}'.format('Wednesday', 'October', '20th'))

Today is Wednesday, October 20th


In [11]:
print('Today is {0}, {1} {2}'.format('Wednesday', 'October', '20th'))

Today is Wednesday, October 20th


In [12]:
print('Today is {1}, {2} {0}'.format('20th', 'Wednesday', 'October'))

Today is Wednesday, October 20th


In [13]:
print('Today is {0}, {mon} {day}'.format('Wednesday', mon='October', day='20th'))

Today is Wednesday, October 20th


### Manual string format

In [14]:
table = {'Seoul': 2, 'Sejong': 44, 'Daejeon': 42, 'Busan': 55}
print ("0123456789")
for city, areacode in table.items():
    print(f'0{areacode:d}'.center(10))

0123456789
    02    
   044    
   042    
   055    


In [15]:
print ("0123456789")
for city, areacode in table.items():
    print(f'0{areacode:d}'.rjust(10))

0123456789
        02
       044
       042
       055


In [16]:
print ("0123456789")
for city, areacode in table.items():
    print(f'0{areacode:d}'.ljust(10))

0123456789
02        
044       
042       
055       


## Reading and Writing Files

`open()` returns a file object, and is most commonly used with two arguments: `open(filename, mode)`.

*mode* can be
1) *'r'* for reading only
2) *'w'* for writing only (an existing file with the same name will be erased)
3) *'a'* for appending (any data written to the file is automatically added to the end)
4) *'r+'* for both reading and writing

The mode argument is optional; *'r'* is assumed if it is omitted.
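
The effect of each mode can be checked with a short sketch like the one below. The file name `modes_demo.txt` is a hypothetical example and lives in a temporary directory.

```python
import os
import tempfile

# 'modes_demo.txt' is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:      # 'w' creates the file (or erases an existing one)
    f.write('first\n')

with open(path, 'w') as f:      # opening with 'w' again discards 'first\n'
    f.write('second\n')

with open(path, 'a') as f:      # 'a' keeps the contents and writes at the end
    f.write('third\n')

with open(path, 'r+') as f:     # 'r+' allows reading and writing on one handle
    before = f.read()           # reading moves the position to the end of the file
    f.write('fourth\n')         # so this write lands after the existing text

with open(path) as f:           # mode omitted, so 'r' is assumed
    after = f.read()
```

After running it, `before` contains only the `second` and `third` lines, because the second `'w'` open erased `first`, and `after` additionally ends with `fourth`.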

In [17]:
f = open('emptyfile', 'w')

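Once writing is finished, the file should be closed with `close()`; the `closed` attribute reports whether this has happened. A file opened in a `with` statement is closed automatically when the block ends.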

In [18]:
f.closed

False

In [19]:
f.close()

In [20]:
f.closed

True

In [21]:
with open('emptyfile') as f:
    #something with file
    read_data = f.read()

In [22]:
f.closed

True

In [23]:
read_data

''
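
The cells above close a file explicitly with `close()` and implicitly with `with`. One reason the `with` form is generally preferred is that the file is closed even when an error interrupts the block. A minimal sketch, using a hypothetical file name and a deliberately raised exception:

```python
import os
import tempfile

# 'with_demo.txt' is a hypothetical file name used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('partial result\n')
        raise ValueError('simulated failure')  # something goes wrong mid-write
except ValueError:
    pass

# The with statement closed the file on the way out, despite the exception.
closed_after_error = f.closed

with open(path) as f:
    saved = f.read()  # the text written before the error was flushed to disk
```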

In [24]:
f = open('myfile', 'r')
#f.readline()

In [25]:
for line in f:
    print(line, end='')

1 1
2 4
3 9
4 16
5 25

In [26]:
f = open('myfile', 'r')
for line in f.readlines():
    print(line)

1 1

2 4

3 9

4 16

5 25


In [27]:
f = open('myfile', 'r')
list(f)

['1 1\n', '2 4\n', '3 9\n', '4 16\n', '5 25']

In [28]:
f = open('myfile', 'r')
f.readlines()

['1 1\n', '2 4\n', '3 9\n', '4 16\n', '5 25']

In [29]:
f.close()

In [30]:
f = open('myfile', 'r')
lines = f.readlines()
f.close()

f2 = open('newfile', 'w')
for line in lines:
    f2.write(line)

f2.readlines()  # raises UnsupportedOperation: a file opened with 'w' cannot be read
f2.close()

UnsupportedOperation: not readable

In [31]:
f2 = open('newfile', 'r')
f2.readlines()

['1 1\n', '2 4\n', '3 9\n', '4 16\n', '5 25']

In [32]:
f2.close()

Other types of objects need to be converted – either to a string (in text mode) or a bytes object (in binary mode) – before writing them:

In [33]:
f2 = open('newfile', 'r+')
f2.write(a)  # raises TypeError: a is an int, and write() needs a str
f2.readlines()

TypeError: write() argument must be str, not int

In [37]:
f2.write(str(a))
f2.flush()
f2.close()
f3 = open('newfile', 'r')
f3.readlines()

['1 1\n', '2 4\n', '3 9\n', '4 16\n', '5 253333']

In [38]:
f3.close()

In [47]:
f4 = open('newfile', 'a')
f4.write("append\n")
f4.close()

In [48]:
f5 = open('newfile', 'r')
print(f5.readlines())
f5.close()

['1 1\n', '2 4\n', '3 9\n', '4 16\n', '5 253333append\n', 'append\n', 'append\n', 'append\n']
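
The conversion rule stated earlier (a string in text mode, a bytes object in binary mode) can be sketched as follows. The file names are hypothetical, and `int.to_bytes` is just one of several ways to obtain raw bytes from an integer.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
a = 3

# Text mode: write() accepts only str, so convert the number first.
text_path = os.path.join(tmpdir, 'number.txt')   # hypothetical file name
with open(text_path, 'w') as f:
    f.write(str(a) + '\n')        # explicit str() conversion
    f.write(f'{a * 1.5:.2f}\n')   # an f-string also produces a str

# Binary mode ('wb'): write() accepts only bytes-like objects.
bin_path = os.path.join(tmpdir, 'number.bin')    # hypothetical file name
with open(bin_path, 'wb') as f:
    f.write(str(a).encode('ascii'))              # the character '3' as one byte
    f.write(a.to_bytes(2, byteorder='little'))   # the integer 3 as two raw bytes

with open(text_path) as f:
    text_back = f.read()
with open(bin_path, 'rb') as f:
    bytes_back = f.read()
```

Reading the files back gives a `str` from the text file and a `bytes` object from the binary file, so the reverse conversion (for example `int(...)` or `int.from_bytes(...)`) is needed to recover the number.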


## File I/O with numpy

In [49]:
import numpy as np

### numpy binary files

In [50]:
x = np.arange(10)
print (x)

[0 1 2 3 4 5 6 7 8 9]


In [51]:
fnp = open('npfile.npy', 'wb')
np.save(fnp, x)

In [52]:
fnp.close()

In [53]:
fnp = open('npfile.npy', 'rb')
xl = np.load(fnp)

In [54]:
print (xl)

[0 1 2 3 4 5 6 7 8 9]


In [55]:
with open('npfile.npy', 'wb') as f:
    np.save(f, np.array([[1,2], [3,4]]))

In [56]:
with open ('npfile.npy', 'rb') as f:
    xl = np.load(f)
print (xl)

[[1 2]
 [3 4]]


In [57]:
y = np.exp(x)

In [58]:
f = open('npfile.npz', 'wb')
np.savez(f, x, y)
f.close()

In [59]:
f = open ('npfile.npz', 'rb')
npdata = np.load(f)

In [60]:
npdata.files

['arr_0', 'arr_1']

In [62]:
npdata['arr_1']

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [63]:
f.close()

In [64]:
f = open('npfile.npz', 'wb')
np.savez(f, x=x, z=y)
f.close()

In [65]:
f = open ('npfile.npz', 'rb')
npdata = np.load(f)
npdata.files

['x', 'z']

In [66]:
npdata['z']

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [67]:
f.close()

### numpy text files

In [68]:
z = np.sin(x)

In [69]:
f = open('npfile.txt', 'w')

In [70]:
np.savetxt(f, x, delimiter=',')
f.close()

In [71]:
f = open('npfile.txt', 'r')
np.loadtxt(f)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [81]:
f = open('npfile.txt', 'w')
np.savetxt(f, (x,y,z))
f.close()
f = open('npfile.txt', 'r')
x, y, _= np.loadtxt(f)
f.close()

In [82]:
print (x)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]


In [83]:
print (y)

[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
 5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03
 2.98095799e+03 8.10308393e+03]


In [84]:
print (z)

[ 0.          0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427
 -0.2794155   0.6569866   0.98935825  0.41211849]


## Hierarchical Data Format

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.

HDF5 simplifies the file structure to include only two major types of object:
1) Groups, which are container structures that can hold datasets and other groups
2) Datasets, which are multidimensional arrays of a homogeneous type

Metadata is stored in attributes, which are small named objects attached to groups and datasets that describe the nature and/or intended usage of the data.

This results in a truly hierarchical, filesystem-like data format.

To work with HDF5 files, first install HDF5 and PyTables. Open a terminal and run the following command:

```
$ conda install pytables
```

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. 
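
The group, dataset, and attribute structure described above can be sketched with PyTables as follows. The file name `structure_demo.h5`, the node names, and the `units` attribute are all made up for illustration; the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np
import tables

# 'structure_demo.h5' and all node names below are hypothetical.
path = os.path.join(tempfile.mkdtemp(), 'structure_demo.h5')

with tables.open_file(path, mode='w', title='Structure demo') as h5:
    # Group: a container, comparable to a directory.
    run = h5.create_group('/', 'run1', 'first run')
    # Dataset: a homogeneous multidimensional array stored inside the group.
    data = h5.create_array(run, 'positions', obj=np.zeros((5, 2)))
    # Attribute: a small piece of metadata attached to a node.
    data.attrs.units = 'mm'

with tables.open_file(path, mode='r') as h5:
    node = h5.get_node('/run1/positions')   # nodes are addressed by filesystem-like paths
    shape = node.shape
    units = node.attrs.units
    paths = [n._v_pathname for n in h5.walk_nodes('/')]
```

Listing `paths` shows the root group, the `run1` group, and the `positions` array nested inside it, which is the hierarchy the format is named for.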

In [85]:
import tables

In [86]:
tables.__version__

'3.6.1'

In [87]:
h5file = tables.open_file("test.h5", mode="w", title="Test file")

In [88]:
h5file

File(filename=test.h5, title='Test file', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) 'Test file'

In [89]:
group1 = h5file.create_group("/", 'particles', 'particle data')

In [90]:
h5file.root.particles

/particles (Group) 'particle data'
  children := []

In [91]:
group2 = h5file.create_group("/", "twiss", "twiss data")

In [92]:
h5file.root

/ (RootGroup) 'Test file'
  children := ['particles' (Group), 'twiss' (Group)]

In [93]:
# Twiss parameters
ax = -0.5
bx = 1.3
cx = 2.1
ex = 3.1
twiss = np.array([ax, bx, cx, ex])

# Covariance matrix
s11 = ex * bx
s12 = - ex * ax
s22 = ex * cx
cov = np.zeros((2,2))
cov[0,0] = s11
cov[0,1] = s12
cov[1,0] = s12
cov[1,1] = s22

# beam distribution
mean = [np.sqrt(s11), np.sqrt(s22)]
Np = 10000
coords = np.random.multivariate_normal(mean, cov, Np)

In [94]:
earray1 = h5file.create_earray(group1, 'coords', obj=coords)
earray2 = h5file.create_earray(group2, 'twiss', obj=twiss)
earray3 = h5file.create_earray(group2, 'cov', obj=cov)

In [95]:
charge = 1
group3 = h5file.create_group(group1, "charge", "particle charge")
array1 = h5file.create_array(group3, "charge", obj=charge)

In [96]:
h5file.close()

In [97]:
h5file = tables.open_file('test.h5', 'r')

In [98]:
h5file

File(filename=test.h5, title='Test file', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) 'Test file'
/particles (Group) 'particle data'
/particles/coords (EArray(10000, 2)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (4096, 2)
/twiss (Group) 'twiss data'
/twiss/cov (EArray(2, 2)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (4096, 2)
/twiss/twiss (EArray(4,)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)
/particles/charge (Group) 'particle charge'
/particles/charge/charge (Array()) ''
  atom := Int32Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None

In [99]:
h5file.root

/ (RootGroup) 'Test file'
  children := ['particles' (Group), 'twiss' (Group)]

In [100]:
h5file.root.particles

/particles (Group) 'particle data'
  children := ['charge' (Group), 'coords' (EArray)]

In [103]:
h5file.root.particles.coords

/particles/coords (EArray(10000, 2)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (4096, 2)

In [104]:
h5file.root.particles.coords.read()

array([[ 5.09520953, -0.98146941],
       [-0.01643098,  0.30759883],
       [ 1.74871368,  6.94681421],
       ...,
       [ 5.5415468 ,  3.03397117],
       [ 3.79974199,  5.56172535],
       [ 4.20117932,  0.61000866]])

In [105]:
h5file.close()

In [106]:
class Particle(tables.IsDescription):
    name      = tables.StringCol(16)   # 16-character String
    ADCcount  = tables.UInt16Col()     # Unsigned short integer
    grid_i    = tables.Int32Col()      # 32-bit integer
    grid_j    = tables.Int32Col()      # 32-bit integer
    pressure  = tables.Float32Col()    # float  (single-precision)
    energy    = tables.Float64Col()    # double (double-precision)
    idnumber  = tables.Int64Col()      # Signed 64-bit integer
    pressure2 = tables.Float32Col(shape=(2,3)) # array of floats (single-precision)

h5file = tables.open_file("test2.h5", mode = "w", title = "Test file")
group = h5file.create_group("/", 'detector', 'Detector information')
table = h5file.create_table(group, 'readout', Particle, "Readout example")
particle = table.row
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    particle['pressure2'] = [
        [0.5+float(i),1.5+float(i),2.5+float(i)],
        [-1.5+float(i),-2.5+float(i),-3.5+float(i)]]
    # Insert a new particle record
    particle.append()

h5file.close()

In [107]:
h5file = tables.open_file('test2.h5', 'r')

In [108]:
h5file.root

/ (RootGroup) 'Test file'
  children := ['detector' (Group)]

In [109]:
h5file.root.detector

/detector (Group) 'Detector information'
  children := ['readout' (Table)]

In [110]:
h5file.close()