# Dealing with files

Standard input and output are not convenient for large volumes of data. Instead, read and write files on disk. Note that disk read/write is much slower than memory access.

# Disk buffers

* Disk data is read/written in large blocks

* “Buffer” is a temporary parking place for disk data

# Reading/writing disk data

* Open a file: create a file handle for the file on disk (like setting up a buffer for the file)

* Read and write operations go through the file handle

* Close a file
    * Write out buffer to disk (flush)
    * Disconnect file handle


# Opening a file

fh = open("gcd.py", "r")

* First argument to open is the file name; it can also be a full path

* Second argument is the mode for opening the file

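1. Read, "r": opens a file for reading only (the file must already exist)
2. Write, "w": creates an empty file to write to (an existing file with the same name is overwritten)
3. Append, "a": appends to the end of an existing file (creating it if it does not exist)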
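A minimal sketch of the three modes in sequence, assuming a hypothetical file notes.txt created inside a temporary directory (via the standard tempfile module) so that no real file is touched. Each handle is closed explicitly, which also flushes its buffer to disk.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

fh = open(path, "w")        # "w": create an empty file
fh.write("first line\n")
fh.close()                  # flush the buffer to disk, release the handle

fh = open(path, "a")        # "a": keep existing contents, write at the end
fh.write("second line\n")
fh.close()

fh = open(path, "r")        # "r": read only
text = fh.read()            # entire file as a single string
fh.close()

print(text, end="")         # first line / second line
```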

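```python
fh = open("lecture1.ipynb", "r")
print("Success")

content = fh.read(10)     # reads the next 10 characters as a single string
# fh.read()      reads the entire (remaining) file as a single string
# fh.readlines() reads the entire file as a list of strings,
#                each string is one line, ending with '\n'

contents = fh.readline()  # reads up to and including the next '\n'
print("Success")

# When reading incrementally, it is important to know when the file has ended:
# read() and readline() return the empty string "" at end of file

fh.close()
```

```
Success
Success
```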
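To read a file incrementally, one common pattern is to call readline() in a loop and stop when it returns the empty string, which signals end of file. A minimal sketch, using a hypothetical three-line file data.txt written to a temporary directory first:

```python
import os
import tempfile

# Hypothetical three-line file created just for this example
path = os.path.join(tempfile.mkdtemp(), "data.txt")
fh = open(path, "w")
fh.write("alpha\nbeta\ngamma\n")
fh.close()

fh = open(path, "r")
lines = []
while True:
    line = fh.readline()   # next line, including its trailing '\n'
    if line == "":         # "" is returned only at end of file
        break
    lines.append(line)
fh.close()
print(lines)               # ['alpha\n', 'beta\n', 'gamma\n']

# readlines() builds the same list in a single call
fh = open(path, "r")
all_lines = fh.readlines()
fh.close()
print(all_lines == lines)  # True
```

A blank line inside the file comes back as "\n", not "", so the loop does not stop early.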


# String processing

* Easy to read and write text files

* String processing functions make it easy to analyse and transform contents
    * Search and replace text
    * Export spreadsheet as text file (csv) and process columns

Strip whitespace:

s.rstrip() removes trailing whitespace

```python
s = "   ArithmeticError   "

print(s.rstrip(), ":removes trailing whitespace")
print(s.lstrip(), ":removes leading whitespace")
print(s.strip(), ":removes leading and trailing whitespace")
```

```
   ArithmeticError :removes trailing whitespace
ArithmeticError    :removes leading whitespace
ArithmeticError :removes leading and trailing whitespace
```


# Searching for text

s.find(pattern)

Returns the first position in s where pattern occurs, or -1 if there is no occurrence of pattern

s.find(pattern,start,end)

Searches for pattern in the slice s[start:end]; the position returned is still an index into s

s.index(pattern), s.index(pattern,l,r)

Like find, but raises ValueError if pattern is not found
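A short illustration of find and index on a sample string (the string "banana" is just an example). With a slice, the position reported is still counted from the start of s.

```python
s = "banana"

print(s.find("an"))        # 1: first occurrence
print(s.find("an", 2, 6))  # 3: searches s[2:6] but reports the position in s
print(s.find("xyz"))       # -1: not found

try:
    s.index("xyz")         # index() raises instead of returning -1
except ValueError:
    print("not found")
```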

# Search and replace

s.replace(fromstr,tostr)

Returns a copy of s with each occurrence of fromstr replaced by tostr

s.replace(fromstr,tostr,n)

Replaces at most the first n occurrences

Note that s itself is unchanged: strings are immutable
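The effect of the optional count, and the fact that the original string is left alone, on an example string:

```python
s = "one fish two fish red fish"

t = s.replace("fish", "cat")      # every occurrence
u = s.replace("fish", "cat", 2)   # only the first 2 occurrences

print(t)  # one cat two cat red cat
print(u)  # one cat two cat red fish
print(s)  # one fish two fish red fish (s is unchanged)
```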

# Splitting a string

Export a spreadsheet as a “comma separated value” (CSV) text file

Want to extract the columns from a line of text

Split the line into chunks between commas

columns = s.split(",")

Can split using any separator string

Split at most n times, giving at most n+1 chunks

columns = s.split(" : ", n)
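A sketch of splitting one line of a CSV file, assuming a hypothetical row "Alice,42,Chennai" as it would come back from readline(), trailing newline included. The newline is stripped first so that it does not end up in the last column. The second call shows that the optional count limits the number of splits, so split(" : ", 1) yields at most two chunks.

```python
line = "Alice,42,Chennai\n"          # hypothetical CSV row, as returned by readline()
columns = line.rstrip().split(",")   # drop the trailing '\n', then split on commas
print(columns)                       # ['Alice', '42', 'Chennai']

record = "key : value : with : colons"
parts = record.split(" : ", 1)       # at most 1 split, so at most 2 chunks
print(parts)                         # ['key', 'value : with : colons']
```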

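```python
s = "hello world"
r1 = s.find('o')          # position of the first 'o'
r2 = s.replace('h', 'H')  # copy of s with 'h' replaced by 'H'
columns = s.split()       # no separator given: split on runs of whitespace

print(r1)
print(r2)
print(columns)
```

```
4
Hello world
['hello', 'world']
```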


# Joining strings

* Recombine a list of strings using a separator

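```python
s = "Alice,42,Chennai"                  # example CSV line
columns = s.split(",")                  # ['Alice', '42', 'Chennai']
joinstring = ","
csvline = joinstring.join(columns)      # "Alice,42,Chennai" again

date = "16"
month = "08"
year = "2016"
today = "-".join([date, month, year])   # "16-08-2016"
```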

# Converting case

* Convert lower case to upper case, ...

* s.capitalize() — return new string with first letter uppercase, rest lower

* s.lower() — convert all uppercase to lowercase

* s.upper() — convert all lowercase to uppercase

* s.title(): first letter of each word uppercase, rest lowercase; s.swapcase(): swap uppercase and lowercase

* s.center(n)
    * Returns a string of length n with s centred, the rest blank

* s.center(n,"*")
    * Fills the rest with * instead of blanks

* s.ljust(n), s.ljust(n,"*"), s.rjust(n), ...
    * Similar, but left/right justify s in the returned string
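The case and justification methods applied to a few sample strings (the square brackets are printed only to make the padding blanks visible):

```python
s = "hello world"

print(s.capitalize())        # Hello world
print(s.upper())             # HELLO WORLD
print(s.title())             # Hello World
print("PyThOn".swapcase())   # pYtHoN
print("PyThOn".lower())      # python

print("[" + "hi".center(6) + "]")   # [  hi  ]
print("hi".center(6, "*"))          # **hi**
print("[" + "hi".ljust(5) + "]")    # [hi   ]
print("hi".rjust(5, "*"))           # ***hi
```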

# Formatted printing

* Recall that we have limited control over how print() displays output
* Optional argument end="..." changes the default newline ("\n") printed at the end
* Optional argument sep="..." changes the default separator (a single space) between items
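A quick demonstration of sep and end:

```python
print("a", "b", "c")            # a b c (default sep is a single space)
print("a", "b", "c", sep=",")   # a,b,c
print("no newline", end="")     # default end is "\n"; here nothing is added
print(" ... continued")         # completes the line: no newline ... continued
print(16, 8, 2016, sep="-")     # 16-8-2016
```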

```python
print("first:{0}, second:{1}".format(21, 31))

print("second:{1}, first:{0}".format(21, 31))

# Can also refer to arguments by name
print("One: {f}, two: {s}".format(f=47, s=11))

print("One: {f}, two: {s}".format(s=11, f=47))

print("Value: {0:3d}".format(4))
# 3d describes how to display the value 4
# d is a code that specifies that 4 should be treated as an integer value
# 3 is the width of the area in which to show 4

print("Value: {0:6.2f}".format(47.523))
```

```
first:21, second:31
second:31, first:21
One: 47, two: 11
One: 47, two: 11
Value:   4
Value:  47.52
```


# Now, real formatting

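```
>>> "Value: {0:6.2f}".format(47.523)
'Value:  47.52'
```

6.2f describes how to display the value 47.523:

* f is a code that specifies that 47.523 should be treated as a floating point value

* 6: width of the area to show 47.523

* 2: number of digits to show after the decimal point

47.52 takes 5 characters, so it is padded with one blank on the left to fill the width of 6, giving "Value:  47.52".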
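Width and precision codes are what make columns line up. A sketch printing a small table from hypothetical (item, quantity, price) data; for strings the default alignment is left, for numbers it is right:

```python
# Hypothetical data: (item, quantity, unit price)
rows = [("pen", 3, 12.5), ("notebook", 10, 45.0), ("eraser", 125, 4.75)]

table = []
for name, qty, price in rows:
    # {0:10s}  string, width 10 (left-justified by default)
    # {1:4d}   integer, width 4 (right-justified by default)
    # {2:8.2f} float, width 8, 2 digits after the decimal point
    line = "{0:10s}{1:4d}{2:8.2f}".format(name, qty, price)
    table.append(line)
    print(line)
```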

# Doing nothing

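Blocks such as except:, else:, ... cannot be empty
Use pass for a null statement

```python
while True:
    try:
        userdata = input("Enter a number: ")
        usernum = int(userdata)   # raises ValueError if userdata is not an integer
    except ValueError:
        pass                      # ignore bad input and ask again
    else:
        break                     # a valid number was entered: leave the loop
```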

# Removing a list entry

* Want to remove l[4]?

del l[4]

* Automatically contracts the list and shifts the elements in l[5:] left

* Also works for dictionaries
    * del d[k] removes the key k and its associated value (raises KeyError if k is not a key)
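del on a sample list and dictionary:

```python
l = [10, 11, 12, 13, 14, 15, 16]
del l[4]      # removes 14; 15 and 16 shift left
print(l)      # [10, 11, 12, 13, 15, 16]

d = {"a": 1, "b": 2}
del d["a"]    # removes key "a" and its value
print(d)      # {'b': 2}
```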

# Checking undefined name

* Assign a value to x only if x is undefined

```python
try:
    x             # raises NameError if x has not been assigned yet
except NameError:
    x = 5
```

# The value None

None is a special value used to denote “nothing”. Use it to initialise a name and later check if it has been assigned a valid value

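```python
x = None

# ... code that may or may not assign a value to x ...

if x is not None:
    y = x
```

There is exactly one value None, so it can be checked with is

x is None gives the same result as x == None for ordinary values, but x is None is the preferred form: == can be redefined by a class, while is cannot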
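A typical use of None is as a "not found" result. A minimal sketch with a hypothetical helper find_first_even; the explicit is not None test matters here because 0 is a valid answer but counts as false in a plain if r: test.

```python
def find_first_even(values):
    """Return the first even number in values, or None if there is none."""
    result = None              # nothing found yet
    for v in values:
        if v % 2 == 0:
            result = v
            break
    return result

r = find_first_even([3, 0, 5])
if r is not None:              # r is 0 here; "if r:" would wrongly skip it
    print("found", r)          # found 0

print(find_first_even([1, 3]) is None)   # True
```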