# EPA1333 - Computer Engineering for Scientific Computing
## Week 4 - Sept 28, 2017

**Think Python** -
**How to Think Like a Computer Scientist**

*Allen B. Downey*


## Ch 14: Files

Reading, writing, creating, renaming and deleting files.

Before we can read/write from/to a file, it must be opened.

```python
f = open( filename, mode )

f.readline()     # Read a single line
f.readlines()    # Returns a list of all lines
f.write( "text" )

f.close()
```

`mode` indicates how we want to use the file. Combine one access mode with one content mode (for example 'rb' or 'wt'):
  * 'r' - open the file for reading (default)
  * 'w' - open the file for writing; an existing file is truncated!
  * 'a' - open the file for writing, appending to the end.
  * 'x' - create the file and open it for writing; fails with FileExistsError if the file already exists.

  * 't' - open the file in text mode (newlines are translated) (default)
  * 'b' - open the file in binary mode (bytes are read and written as-is)
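As a minimal sketch of how the mode characters combine, the cell below creates, appends to and reads back a small file. The file name `demo.txt` and the temporary directory are assumptions for illustration only; the `with` statement closes the file automatically at the end of each block.

```python
import os
import tempfile

# Work in a throwaway directory so no existing files are touched.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')   # hypothetical file name

    with open(path, 'x') as f:     # 'x' (same as 'xt'): create, fail if it exists
        f.write('first line\n')

    with open(path, 'a') as f:     # 'a': append to the end
        f.write('second line\n')

    with open(path, 'r') as f:     # 'r' (same as 'rt'): read as text
        lines = f.readlines()

    with open(path, 'rb') as f:    # 'rb': read the raw bytes
        raw = f.read()

    try:
        open(path, 'x')            # the file exists now, so 'x' fails
        created_twice = True
    except FileExistsError:
        created_twice = False

print(lines)           # ['first line\n', 'second line\n']
print(type(raw))       # <class 'bytes'>
print(created_twice)   # False
```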


In [1]:
# List files in directory
import os

os.listdir()

['.anaconda',
 '.cisco',
 '.conda',
 '.condarc',
 '.continuum',
 '.ipynb_checkpoints',
 '.ipython',
 '.jupyter',
 '.LSC',
 '.matplotlib',
 '.Mendeley Desktop',
 '.QtWebEngineProcess',
 '.spyder-py3',
 'AnacondaProjects',
 'AppData',
 'Application Data',
 'Assignment-Week1-solution.ipynb',
 'Assignment-Week1.ipynb',
 'Assignment-Week1_09092017_1959H (2).ipynb',
 'Assignment-Week1_09092017_1959H.ipynb',
 'Assignment-Week3.ipynb',
 'Coding Challenge_21092017_1113H.R',
 'Contacts',
 'Cookies',
 'Desktop',
 'Documents',
 'Downloads',
 'Favorites',
 'Intel',
 'IntelGraphicsProfiles',
 'Links',
 'Local Settings',
 'Music',
 'My Documents',
 'NetHood',
 'NTUSER.DAT',
 'ntuser.dat.LOG1',
 'ntuser.dat.LOG2',
 'NTUSER.DAT{1d5171a9-76d6-11e7-831d-df6cecc777d0}.TM.blf',
 'NTUSER.DAT{1d5171a9-76d6-11e7-831d-df6cecc777d0}.TMContainer00000000000000000001.regtrans-ms',
 'NTUSER.DAT{1d5171a9-76d6-11e7-831d-df6cecc777d0}.TMContainer00000000000000000002.regtrans-ms',
 'ntuser.ini',
 'Pictures',
 'PrintHoo

In [3]:
# Create a file for writing
f = open('NEW_FILE.TXT', 'x')

FileExistsError: [Errno 17] File exists: 'NEW_FILE.TXT'

In [None]:
os.listdir()

In [None]:
text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aenean vel magna scelerisque libero tempor consequat eu 
sit amet tellus. Donec sit amet ipsum id magna semper eleifend. 
Nam vehicula augue eget est pretium, quis accumsan est rutrum. 
Donec elit lorem, pretium ac semper gravida, accumsan sed mauris. 
Quisque quis enim eget tortor maximus fringilla quis sed libero. 
Proin ut gravida lectus. Mauris suscipit tempor lacus sed ultrices. 
Maecenas accumsan pretium posuere. Etiam facilisis ligula diam, 
ac ultrices ligula posuere id."""

In [4]:
# Write something to the file... explicitly write a newline.
f.write( 'This is a line of text.\n' )


f.write( text )
f.flush()         # flush the file, necessary in case of buffering.
f.close()         # close the file, which also forces a flush()

NameError: name 'text' is not defined

In [6]:
# Check the file with the OS 'type' command (replace 'type' with 'cat' on Linux / Unix)
!type NEW_FILE.TXT


In [None]:
# Now open the file again for reading.

f = open( 'NEW_FILE.TXT', 'r')

for line in f.readlines():
    print('Read line:', line, end='' )
    
f.close()

In [None]:
# Append a line to the end
f = open('NEW_FILE.TXT', 'a')

f.write('\nA last line at the end\n')

f.close()

In [None]:
!type NEW_FILE.TXT

<div class="alert alert-success">
<h3>Exercise 1</h3>
Write a Python program that counts the number of lines in a file.
</div>

In [None]:
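f = open('NEW_FILE.TXT', 'r')
a = 0
for line in f:        # iterate over the lines; f.readline() would give the characters of the first line
    a = a + 1
f.close()
print(a)
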

#### Renaming and deleting files

We can rename and delete files with the following commands.

```python
import os

os.rename( file1, file2)  # Rename file1 to file2
os.remove( file )         # Delete a file
```
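A small self-contained sketch of both calls, run inside a temporary directory so that nothing in your own folders is renamed or deleted. The file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    old_path = os.path.join(tmpdir, 'old_name.txt')   # hypothetical names
    new_path = os.path.join(tmpdir, 'new_name.txt')

    with open(old_path, 'w') as f:
        f.write('some content\n')

    os.rename(old_path, new_path)              # the file now has the new name
    after_rename = sorted(os.listdir(tmpdir))

    os.remove(new_path)                        # the file is gone
    after_remove = os.listdir(tmpdir)

print(after_rename)   # ['new_name.txt']
print(after_remove)   # []
```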

In [None]:
os.rename( 'NEW_FILE.TXT', 'RENAMED_FILE.TXT')

os.listdir()

In [None]:
os.remove( 'RENAMED_FILE.TXT' )

os.listdir()

#### Creating, deleting directories / folders

Python can also manipulate directories / folders.

```python
os.getcwd()     # get current directory
os.chdir( dir ) # change current directory ('..' is the parent directory)

os.mkdir( dir ) # create a new directory in the current directory
os.rmdir( dir ) # remove a directory
```
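The following sketch strings these four calls together. It assumes a temporary directory as the working area and a hypothetical directory name `subdir`; note that `os.rmdir()` only removes empty directories.

```python
import os
import tempfile

start_dir = os.getcwd()                 # remember where we started

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)                    # make the temporary directory current
    os.mkdir('subdir')                  # hypothetical directory name
    created = os.path.isdir('subdir')

    os.chdir('subdir')
    inside = os.path.basename(os.getcwd())
    os.chdir('..')                      # back to the parent directory

    os.rmdir('subdir')                  # only works on an empty directory
    removed = not os.path.exists('subdir')

    os.chdir(start_dir)                 # leave before the directory is cleaned up

print(created, inside, removed)         # True subdir True
```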


<div class="alert alert-success">
<h3>Exercise 2</h3>
Write a Python program that takes a directory as input and "walks" through it, printing the file names and calling itself recursively for the inner directories. You might want to check os.path.isfile(). On Windows, write \\ instead of \ in the path string.
</div>

In [38]:
def walk (dirname):
    for name in os.listdir(dirname):
        
        path = os.path.join(dirname,name)
        
        if os.path.isfile(path):
            print(path)
        else:
            walk(path)

walk('C:\\Users\\USER\\')

C:\Users\USER\.anaconda\navigator\anaconda-navigator.ini
C:\Users\USER\.anaconda\navigator\channels\metadata.json
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_free_noarch_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_free_win-64_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_msys2_noarch_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_msys2_win-64_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_pro_noarch_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_pro_win-64_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_r_noarch_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\channels\repo.continuum.io_pkgs_r_win-64_repodata.json.bz2
C:\Users\USER\.anaconda\navigator\content\content.json
C:\Users\USER\.anaconda\navigator\content\events.json
C:\Users\USER\.

PermissionError: [WinError 5] Access is denied: 'C:\\Users\\USER\\AppData\\Local\\Application Data'
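The run above stops at the first directory that cannot be read. One possible variation, sketched here under the assumption that unreadable directories should simply be skipped, catches the `PermissionError` and continues. The function name `walk_safe` and the demo file names are hypothetical; the standard library also offers `os.walk()` for this kind of traversal.

```python
import os
import tempfile

def walk_safe(dirname):
    """Return the paths of all files below dirname, skipping unreadable directories."""
    found = []
    try:
        names = os.listdir(dirname)
    except PermissionError:
        print('Skipping (access denied):', dirname)
        return found
    for name in names:
        path = os.path.join(dirname, name)
        if os.path.isdir(path):
            found.extend(walk_safe(path))   # recurse into the inner directory
        else:
            found.append(path)
    return found

# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmpdir:
    os.mkdir(os.path.join(tmpdir, 'inner'))
    for rel in ['a.txt', os.path.join('inner', 'b.txt')]:
        with open(os.path.join(tmpdir, rel), 'w') as f:
            f.write('x')
    files = sorted(os.path.relpath(p, tmpdir) for p in walk_safe(tmpdir))

print(files)
```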

### Storing data persistently

If you want to store (intermediate) results persistently, you have to save the result on persistent storage, such as a file or a persistent database.

Python data structures held in memory, such as lists, dictionaries, tuples, etc., are all lost as soon as Python quits.

There are two things you have to do:
  1. Choose a persistent storage medium (file, database, etc).
  2. Choose a *suitable format* in which you want to store your data (data representation format)   

#### Suitable format

Not all data structures in Python can be written directly to a file/database.
Binary data is often difficult to represent in a file/database.
  * simple types such as integers, floats, strings are *usually* ok.
  * binary files can usually only be read/edited by the 'same program' and are usually not human readable.

**Solutions:**
  1. Choose a standard representation form, such as CSV or JSON or pickling or ...
  2. Choose your own custom format (document it!), 
  > e.g. a list is represented by a number N (nr of elements in the list) followed by
  N lines each containing a string representation of the elements.
  
  [ 1, 2, 3 ] is represented as
  
        3<br>
        1<br>
        2<br>
        3<br>
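
The custom format above could be implemented roughly as follows. This is only a sketch: the function names `save_list` and `load_int_list`, the file name, and the restriction to integer elements are assumptions made for the example.

```python
import os
import tempfile

def save_list(path, items):
    """Write the number of elements, then one element per line."""
    with open(path, 'w') as f:
        f.write('%d\n' % len(items))
        for item in items:
            f.write('%s\n' % item)

def load_int_list(path):
    """Read a list of integers stored by save_list()."""
    with open(path, 'r') as f:
        n = int(f.readline())                     # first line: number of elements
        return [int(f.readline()) for _ in range(n)]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'list.txt')       # hypothetical file name
    save_list(path, [1, 2, 3])
    with open(path) as f:
        content = f.read()
    restored = load_int_list(path)

print(repr(content))   # '3\n1\n2\n3\n'
print(restored)        # [1, 2, 3]
```

The reader has to know that the elements are integers; that kind of detail is exactly what the documentation of a custom format should state.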
         

         

### Pickling

Files store text (strings) or raw bytes. How do you write a list or dictionary to a file?

First we have to *encode* the list/dictionary into a sequence of bytes. This is
called *pickling* or *serializing*. Then we can write those bytes to a file.

When reading a pickled object from a file, the reverse (*unpickling* or *deserializing*)
must be done to turn it back into a list/dictionary that Python understands.

```python
import pickle
 
l = [1,2,3]
pickled_l = pickle.dumps(l)   # serialize the list (dump it to a bytes object)

new_l = pickle.loads( pickled_l )  # deserialize the list (load it from the bytes object)

``` 
 
 

In [27]:
# Example of serializing a list
import pickle

l=[1,2,3]

pickled_l = pickle.dumps(l)

print(pickled_l)       # Not human readable, but Python can decode it.

b'\x80\x03]q\x00(K\x01K\x02K\x03e.'


In [28]:
# Example of deserializing a serialized list
pickle.loads( pickled_l )

[1, 2, 3]

In [31]:
import os
l = [ 1,2,3 ]

# Open the file in binary mode!!!
f = open('STORAGE.TXT', 'xb')

# This will not work, cannot write a list directly.
#f.write( l )     

f.write( pickle.dumps( l ))
f.close()

FileExistsError: [Errno 17] File exists: 'STORAGE.TXT'

In [32]:
# File is not human readable unfortunately...
!type STORAGE.TXT

€]q (KKKe.


In [33]:
# Open the file in binary mode!
f = open('STORAGE.TXT', 'rb')

s = f.read()      # read all bytes; readline() would stop early at a newline byte (0x0A)
f.close()

print(type(s))
print(s)

l = pickle.loads(s)
l

<class 'bytes'>
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'


[1, 2, 3]

In [None]:
os.remove('STORAGE.TXT')
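As an alternative to calling `dumps()`/`loads()` and writing the bytes yourself, the `pickle` module can also write to and read from an open binary file directly with `pickle.dump()` and `pickle.load()`. A minimal sketch, with a hypothetical file name and example data:

```python
import os
import pickle
import tempfile

data = [1, 2.5, 'three', (4, 5)]                 # hypothetical example data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'storage.pkl')   # hypothetical file name

    with open(path, 'wb') as f:      # binary mode for writing
        pickle.dump(data, f)         # serialize straight into the file

    with open(path, 'rb') as f:      # binary mode for reading
        restored = pickle.load(f)    # reads one pickled object back

print(restored)           # [1, 2.5, 'three', (4, 5)]
print(restored == data)   # True
```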

## Formatting strings / alignment / precision

When you output results, you may want to control the format of the output.
The string format operator (`%`) can be used for that.


    "%d %f %e %s" % ( decimal, float, scientificfloat, string )
    
### How to align fields
  * Use tabs (\t) in your format string (%d\t%d)
  * Use padding and precision in the format string ( %10.2f )
  * Use rjust(x), ljust(x), center(), zfill() methods of a string
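
A short sketch of the format operator combined with the alignment options listed above. The data values are made up for the example.

```python
# The four conversion types from the format string above.
basic = "%d %f %e %s" % (42, 3.14159, 12345.678, 'text')
print(basic)                 # 42 3.141590 1.234568e+04 text

rows = [('apple', 3, 1.5), ('kiwi', 12, 0.25)]   # hypothetical data

lines = []
for name, count, price in rows:
    # ljust() pads the name on the right, zfill() pads the number with zeros,
    # %8.2f right-aligns the float in 8 characters with 2 decimals.
    line = "%s|%s|%8.2f" % (name.ljust(8), str(count).zfill(4), price)
    lines.append(line)
    print(line)
```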
  
  

In [None]:
import random

M = [ [ random.random() * 20 for i in range(3) ] for i in range(3) ] 
M

In [None]:
# Write a table

for r in M:
    print( "%f %f %f" % (r[0],r[1],r[2]))
    

In [None]:
# Write a table, use tabs
for r in M:
    print( "%f\t%f\t%f" % (r[0],r[1],r[2]))
    

In [None]:
# Write a table, using formatting <width>.<precision>
for r in M:
    print( "%10.3f %10.3f %10.3f" % (r[0],r[1],r[2]))
    

In [None]:
# Write a table, using formatting <width>.<precision>, with zero padding
for r in M:
    print( "%015.3f %015.3f %015.3f" % (r[0],r[1],r[2]))
    

In [None]:
# Similar for strings
# Create a matrix of words.

M = [ [w for w in text.split()[i:i+3] ] for i in range(0,9,3)]

M

In [None]:
# Write a table, using size parameters
for r in M:
    print( "%15s %15s %15s" % (r[0], r[1], r[2]) )
    

In [None]:
# Write a table, formatting with rjust() (or ljust())
for r in M:
    print( "%s %s %s" % (r[0].rjust(15), r[1].rjust(15), r[2].rjust(15)))
    

In [None]:
# Write a table, formatting with center()
for r in M:
    print( "%s %s %s" % (r[0].center(15), r[1].center(15), r[2].center(15)))
    

<div class="alert alert-success">
<h3>Exercise 3</h3>
Write a Python program that writes a dictionary to a file. Check correctness by loading the dictionary from the file and comparing it to the initial one.
</div>

In [None]:
# use the following dictionary as input

d = {'Bill Gates': '555-987654',
 'Bill Hewlett': '555-555555',
 'Dave Packard': '555-888444',
 'Michael Dell': '555-101010',
 'Steve Jobs': '555-123456'}


<div class="alert alert-success">
<h3>Exercise 4</h3>
Print a table that contains the numbers from 1 to 10 in the first column, their squares in the second column, their cubes in the third column and their square roots in the fourth. Make sure the columns are vertically aligned.
</div>