# Files, Dicts, and HDF5 Files

**Somehow** we have to access data, and it is often stored in files. I'll show a few ways of loading data from files (you can skip most of them, but they are useful and widely used), leading up to HDF5 files. I'll also describe Python dictionaries, because accessing data in an HDF5 file works much like accessing data in a Python dictionary.

# Files (skippable)

```python
# Write the numbers 0 to 9 to a file, one per line, using vanilla Python
with open('trash01.txt', 'w') as _file:
    for i in range(10):
        _file.write(str(i))
        _file.write('\n')

# The `with` statement closes the file automatically when the block exits,
# which also flushes any buffered writes to disk
```
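To see what `with` actually does, the sketch below checks the file object's `closed` attribute after the block, then writes the roughly equivalent `try`/`finally` form by hand. It uses a temporary directory instead of `trash01.txt` so it cleans up after itself; the file name `numbers.txt` is just for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'numbers.txt')  # illustrative file name

    # Version 1: the with statement closes the file for us
    with open(path, 'w') as f:
        for i in range(10):
            f.write(f'{i}\n')
    print(f.closed)  # True: the block has exited, so the file is closed

    # Version 2: roughly what `with` does, written out by hand
    f = open(path, 'a')
    try:
        f.write('10\n')
    finally:
        f.close()  # runs even if write() raised an exception

    with open(path) as f:
        contents = f.read()

print(contents.split())
```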

```python
# I encourage you to look inside these files after they are made.
# Go ahead and delete them when you are done.

# Reading a file using vanilla Python
with open('trash01.txt', 'r') as _file:
    print(_file.read())

with open('trash01.txt', 'r') as _file:
    lines = _file.readlines()

print('readlines\n', lines)

# An example of parsing a file read with vanilla Python
# (int() ignores the trailing '\n' on each line)
print('process the lines as ints\n', list(map(int, lines)))

# Reading and writing files using numpy
import numpy  # third-party: must be installed

print('loadtxt\n', numpy.loadtxt('trash01.txt'))

# Broadcasting a (10, 1) column against a (3,) row gives a (10, 3) array
data = numpy.arange(10)[:, None] + numpy.zeros(3)

numpy.savetxt('trash02.txt', data)  # plain text, human readable
numpy.save('trash03.npy', data)     # numpy's binary .npy format

print('numpy.loadtxt\n', numpy.loadtxt('trash02.txt'))
print('numpy.load\n', numpy.load('trash03.npy'))
```



```
0
1
2
3
4
5
6
7
8
9

readlines
 ['0\n', '1\n', '2\n', '3\n', '4\n', '5\n', '6\n', '7\n', '8\n', '9\n']
process the lines as ints
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
loadtxt
 [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
numpy.loadtxt
 [[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]
 [4. 4. 4.]
 [5. 5. 5.]
 [6. 6. 6.]
 [7. 7. 7.]
 [8. 8. 8.]
 [9. 9. 9.]]
numpy.load
 [[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]
 [4. 4. 4.]
 [5. 5. 5.]
 [6. 6. 6.]
 [7. 7. 7.]
 [8. 8. 8.]
 [9. 9. 9.]]
```
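One practical difference between the two numpy formats: `numpy.save` stores the array's dtype and shape in the binary `.npy` file, while `numpy.savetxt` writes numbers as text (in a float format by default) and `numpy.loadtxt` reads them back as floats by default. A small sketch, using a temporary directory and illustrative file names:

```python
import os
import tempfile

import numpy

ints = numpy.arange(6, dtype=numpy.int64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, 'ints.txt')  # illustrative names
    npy_path = os.path.join(tmpdir, 'ints.npy')

    numpy.savetxt(txt_path, ints)  # text: values written in a float format by default
    numpy.save(npy_path, ints)     # binary: dtype and shape are stored in the file

    from_txt = numpy.loadtxt(txt_path)  # loadtxt defaults to dtype=float
    from_npy = numpy.load(npy_path)

print(from_txt.dtype, from_npy.dtype)  # float64 int64
# The values still compare equal, only the dtype differs
print(numpy.array_equal(from_txt, ints), numpy.array_equal(from_npy, ints))  # True True
```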


# Dicts (skippable)

**Dictionaries** are a very useful Python data structure built on those mystical entities called hash maps (hash tables), which store values under keys so they can be looked up quickly. I'll show some examples of how to use them, which will be useful for HDF5 files.
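Because a dict is a hash map, every key must be hashable: Python computes `hash(key)` to decide where the value is stored. Immutable values such as strings, ints, and tuples work as keys, while mutable lists do not. A quick check (the dict `d` is just an illustration):

```python
# Strings, ints, and tuples are hashable, so they can be dict keys
d = {'a': 1, 2: 'two', (0, 1): 'a tuple key'}
print(d[(0, 1)])  # a tuple key

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    d[[0, 1]] = 'a list key'
except TypeError as err:
    print('TypeError:', err)

# Lookups go through the hash, so equal keys find the same entry:
# 2 == 2.0 and hash(2) == hash(2.0), so d[2.0] finds the key 2
print(d[2.0])  # two
```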

```python
# make an empty dict
_dict = {}

print(_dict)

# make a dict with some stuff in it
_dict = {'a': 1, 'b': 2}

print(_dict)
print(_dict['a'])
print(_dict['b'])

# see if a key is in _dict
print('c' in _dict)
print('b' in _dict)
print('a' in _dict)

# add stuff to a dict
_dict['c'] = 3
_dict[1] = 'apple'
_dict[2] = 'banana'
_dict[3] = 'cherry'

print(_dict)

# modify stuff
_dict[1] = 'alpha'
_dict[2] = 'beta'
_dict[3] = 'gamma'

print(_dict)

print(_dict.keys())
print(_dict.values())
```

```
{}
{'a': 1, 'b': 2}
1
2
False
True
True
{'a': 1, 'b': 2, 'c': 3, 1: 'apple', 2: 'banana', 3: 'cherry'}
{'a': 1, 'b': 2, 'c': 3, 1: 'alpha', 2: 'beta', 3: 'gamma'}
dict_keys(['a', 'b', 'c', 1, 2, 3])
dict_values([1, 2, 3, 'alpha', 'beta', 'gamma'])
```
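HDF5 files can hold groups of datasets nested like folders, and h5py lets you reach a nested dataset with a slash-separated path such as `'group/dataset'`. Nested dicts give the same kind of structure. The helper `get_path` below is a hypothetical illustration of that path-style lookup on plain dicts (it is not part of h5py), and the data is made up.

```python
def get_path(tree, path):
    """Hypothetical helper: follow a slash-separated path through nested dicts."""
    node = tree
    for key in path.strip('/').split('/'):
        node = node[key]  # raises KeyError if a key is missing, like a dict would
    return node


# Made-up nested data, shaped like an HDF5 file with groups
data = {
    'sim': {
        'fields': {'density': [1.0, 2.0, 3.0]},
        'time': 0.5,
    },
    'notes': 'example run',
}

print(get_path(data, 'sim/fields/density'))  # [1.0, 2.0, 3.0]
print(get_path(data, '/sim/time'))           # 0.5

# .get() returns a default instead of raising KeyError
print(data.get('missing', 'no such key'))    # no such key

# .items() iterates over (key, value) pairs
for key, value in data['sim'].items():
    print(key, type(value).__name__)
```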


# HDF5 Files

An HDF5 file is accessed much like a dict: each key names a dataset (or a group of datasets), and indexing a dataset gives you array data.

```python
import numpy    # third-party: must be installed
import h5py     # third-party: must be installed
import make_h5  # helper module included as make_h5.py

make_h5.main()  # creates trash.h5

with h5py.File('trash.h5', 'r') as _file:

    # a Python object representing the file
    print(_file)
    # finding the keys, as you would find the keys of a dict
    print(_file.keys())
    # using the key 'example'
    print(_file['example'])
    # using the key 'radius'
    print(_file['radius'])
    # accessing the [0,0,0] element of 'radius', since it is an array
    print(_file['radius'][0, 0, 0])
    # summing the whole 'radius' dataset; numpy reads it in as an array
    print(numpy.sum(_file['radius']))
```

```
<HDF5 file "trash.h5" (mode r)>
<KeysViewHDF5 ['example', 'radius']>
<HDF5 dataset "example": shape (50, 50, 50), type "<f8">
<HDF5 dataset "radius": shape (50, 50, 50), type "<f8">
42.4352447854375
3001354.051036985
```
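`make_h5` is a helper whose code isn't shown here, so as a sketch of how such a file might be built, the code below writes a small HDF5 file with h5py directly: a top-level dataset, a group containing another dataset, and an attribute. The file name, dataset names, and attribute are made up for illustration; this is not a reconstruction of what `make_h5.main()` does.

```python
import os
import tempfile

import h5py
import numpy

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name

    # Write: datasets are created from numpy arrays, groups nest like folders
    with h5py.File(path, 'w') as f:
        f.create_dataset('grid', data=numpy.arange(24.0).reshape(2, 3, 4))
        group = f.create_group('meta')
        group['spacing'] = numpy.array([0.1, 0.1, 0.1])  # dict-style assignment
        f['grid'].attrs['units'] = 'meters'               # small metadata on a dataset

    # Read: keys, nested paths, slices, and full reads
    with h5py.File(path, 'r') as f:
        print(list(f.keys()))         # ['grid', 'meta']
        print(f['meta/spacing'][()])  # nested path, like get_path on dicts
        corner = f['grid'][0, :, 0]   # reads only this slice from disk
        full = f['grid'][()]          # [()] reads the whole dataset into a numpy array
        units = f['grid'].attrs['units']

print(corner, full.sum(), units)
```

Slicing a dataset (as with `corner`) only reads the requested part of the file, which matters when datasets are larger than memory.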


# Exercise: Print the sum of the middle slice of the 'example' dataset in 'trash.h5' along each of the 3 axes

```python
import make_h5
make_h5.solution()
```


```
38.0
242.0
902.0
```
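`make_h5.solution()` hides its code, so here is one possible approach as a sketch. It assumes "middle slice" means index `n // 2` along each axis (for the length-50 axes of 'example', index 25); the provided solution may use a different convention, so this is not guaranteed to reproduce the numbers above. To stay self-contained it runs on a small made-up file rather than `trash.h5`; with the real file you would call `middle_slice_sums('trash.h5', 'example')`. The function name `middle_slice_sums` is hypothetical.

```python
import os
import tempfile

import h5py
import numpy


def middle_slice_sums(path, key):
    """Hypothetical helper: sum the middle 2D slice of a 3D dataset along each axis."""
    sums = []
    with h5py.File(path, 'r') as f:
        dset = f[key]
        for axis in range(3):
            middle = dset.shape[axis] // 2  # assumed definition of "middle"
            index = [slice(None)] * 3
            index[axis] = middle            # e.g. axis 1 -> dset[:, middle, :]
            sums.append(float(numpy.sum(dset[tuple(index)])))
    return sums


# Made-up test data of shape (2, 3, 4) so each axis gives a different sum
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'exercise_sketch.h5')
    with h5py.File(path, 'w') as f:
        f.create_dataset('example', data=numpy.arange(24.0).reshape(2, 3, 4))
    sums = middle_slice_sums(path, 'example')

for axis, total in enumerate(sums):
    print(axis, total)  # 0 210.0, then 1 92.0, then 2 72.0
```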
