## 19 Reading and Writing Text Files
* 19.1 Working with file Objects
* 19.1.1 The WITH Statement
* 19.1.2 Writing Text Files
* 19.1.3 Safe File Reading
* 19.1.4 The os Working Directory vs. the arcpy workspace
* 19.1.5 Reading Text Files
* 19.2 Parsing Line Contents
* 19.2.1 Parsing Field Names
* 19.3 Modifying Text Files
* 19.3.1 Pseudocode for Modifying Text Files
* 19.3.2 Working with Tabular Text
* 19.4 Pickling
* 19.5 Discussion
* 19.6 Key Terms
* 19.7 Exercises

## 19.1 Working with file Objects

In [1]:
f = open('data/poem.txt', 'r')

In [2]:
f

<open file 'data/poem.txt', mode 'r' at 0x0313C6A8>

In [3]:
f.read()

'Scripterwocky\n`Twas brillig, and the Python lists\nDid join and pop-le in the loop:\nAll-splitsy were the string literals,\nAnd the boolean values were true.\t \n'

### 19.1.1 The WITH Statement
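
The scripts later in this chapter (parseTable.py, for example) open files with `with open(...) as f:` instead of calling `f.close()` explicitly. A minimal sketch of that pattern follows. The temporary scratch path is an assumption for illustration, not one of the chapter's data files, and the parenthesized `print` calls are used so the sketch runs under both Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not a chapter data file).
path = os.path.join(tempfile.mkdtemp(), 'sneeze.txt')

# The file is closed automatically when the indented block ends,
# even if an exception is raised inside the block.
with open(path, 'w') as f:
    f.write('haa')
    f.write('choo')

# No explicit f.close() is needed; the file object reports itself closed.
print(f.closed)

# Reading works the same way.
with open(path, 'r') as f:
    contents = f.read()

print(contents)
```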

### 19.1.2 Writing Text Files

In [7]:
f = open('scratch/sneeze.txt', 'w')
f.write('haa')
f.write('choo')
f.close()

In [8]:
f = open('scratch/sneeze2.txt', 'w')
f.write('snork\nsniffle\n')
f.write('haaack\n')
f.write('*sigh*')
f.close()

In [10]:
f = open('scratch/sneeze3.txt', 'w')
f.write(5000)

TypeError: expected a character buffer object

In [11]:
f.write(str(5000))
lament = 'I sneezed {0} times today.'.format(5000)
f.write(lament)
f.close()
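
Since `write` accepts only strings, numeric values have to be converted before they are written, as the `TypeError` above shows. The sketch below writes a list of numbers as one tab-delimited line, assuming a hypothetical scratch path in a temporary directory and sample values chosen for illustration.

```python
import os
import tempfile

# Sample numeric values (illustrative only).
values = [2, 3.51, 7.29, 4.2]

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

# Convert each number to a string, then join the strings with tabs.
line = '\t'.join([str(v) for v in values]) + '\n'

with open(path, 'w') as f:
    f.write(line)

# Read the file back to check what was written.
with open(path, 'r') as f:
    written = f.read()

print(repr(written))
```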

### 19.1.3 Safe File Reading

In [None]:
# %load script/safeFileRead.py
import os, sys
infile = sys.argv[1]
try:
  f = open(infile, 'r')
  print f.read()
  f.close()
except IOError:
  print "{0} doesn't exist or can't be opened.".format(infile)

In [21]:
%run script/safeFileRead.py scratch/sneeze3.txt

5000I sneezed 5000 times today.


### 19.1.4 The os Working Directory vs. the arcpy workspace

In [23]:
import arcpy
arcpy.env.workspace = 'data'
f = open('poem.txt', 'r')

IOError: [Errno 2] No such file or directory: 'poem.txt'

In [25]:
import os
os.path.isfile('data/poem.txt')

True

In [27]:
os.getcwd()

'D:\\BOOKS\\GISen\\_PYTHON\\PythonForArcGIS\\SF_PFA\\ch19'

In [30]:
f = open('data/poem.txt', 'r')
f.readline()

'Scripterwocky\n'

In [31]:
f.close()
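
The `IOError` above arises because the built-in `open` resolves a relative path against `os.getcwd()`, not against `arcpy.env.workspace`. One option is to join the workspace onto the file name explicitly. In this sketch the `workspace` variable is a plain string standing in for `arcpy.env.workspace` (an assumption, so that the code runs without arcpy), and the file it creates is a hypothetical sample.

```python
import os
import tempfile

# Stand-in for arcpy.env.workspace (assumption: a temporary directory).
workspace = tempfile.mkdtemp()

# Create a small sample file inside the workspace directory.
with open(os.path.join(workspace, 'poem.txt'), 'w') as f:
    f.write('Scripterwocky\n')

# Build the full path instead of relying on the current working directory.
full_path = os.path.join(workspace, 'poem.txt')
print(os.path.isfile(full_path))

with open(full_path, 'r') as f:
    first_line = f.readline()

print(repr(first_line))
```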

### 19.1.5 Reading Text Files

In [33]:
f = open('data/poem.txt', 'r')
f.readline()

'Scripterwocky\n'

In [34]:
f.readline()

'`Twas brillig, and the Python lists\n'

In [35]:
f.readline()

'Did join and pop-le in the loop:\n'

In [36]:
f.readline()

'All-splitsy were the string literals,\n'

In [37]:
f.readline()

'And the boolean values were true.\t \n'

In [38]:
f.readline()

''

In [40]:
f.close()

---

In [44]:
f = open('data/poem.txt', 'r')

In [45]:
for line in f:
  print line

Scripterwocky

`Twas brillig, and the Python lists

Did join and pop-le in the loop:

All-splitsy were the string literals,

And the boolean values were true.	 



In [46]:
f.readline()

''

In [47]:
f.close()

---

In [48]:
f = open('data/poem.txt', 'r')
f.readline()

'Scripterwocky\n'

In [49]:
for line in f:
  print line

`Twas brillig, and the Python lists

Did join and pop-le in the loop:

All-splitsy were the string literals,

And the boolean values were true.	 



In [50]:
f.close()
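
The blank lines in the loop output above appear because `print` adds its own newline after the one already stored at the end of each line. A minimal sketch of one way to avoid this is to strip trailing whitespace before printing. The sample text reuses two lines of the poem, and the scratch path is a hypothetical temporary file. Note that `rstrip()` with no argument also removes the trailing tab and space on the last line.

```python
import os
import tempfile

# Two lines from the poem, written to a hypothetical scratch file.
poem = ('Scripterwocky\n'
        'And the boolean values were true.\t \n')
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w') as f:
    f.write(poem)

stripped = []
with open(path, 'r') as f:
    for line in f:
        # Remove the trailing newline (and any other trailing whitespace).
        line = line.rstrip()
        stripped.append(line)
        print(line)
```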

## 19.2 Parsing Line Contents

In [51]:
f = open('data/report.txt', 'r')
f.readline()

'1\t2.07\t5.21\t4.05\t3.64\t2.03\t3.74\n'

In [52]:
line = f.readline()
line

'2\t3.51\t7.29\t4.2\t4.44\t3.67\t4.46\n'

In [53]:
print line

2	3.51	7.29	4.2	4.44	3.67	4.46



In [54]:
lineList = line.split()
lineList

['2', '3.51', '7.29', '4.2', '4.44', '3.67', '4.46']

In [55]:
nums = [float(i) for i in lineList]
nums

[2.0, 3.51, 7.29, 4.2, 4.44, 3.67, 4.46]

In [56]:
data = nums[1:]
data

[3.51, 7.29, 4.2, 4.44, 3.67, 4.46]

In [58]:
sum(data)

27.57

In [None]:
# %load script/parseTable.py
# parseTable.py
# Purpose: Parse numeric values in a tabular text file.
# Usage: No arguments required.  Input file hard-coded.
# Output: Printed ID, sum, count, and data value list for each row of a table in the text file.

cap = 5
infile = 'C:/gispy/data/ch19/report.txt'
try:
    with open(infile, 'r') as f:
        for line in f:
            # String to list of strings.
            lineList = line.split()
            # String items to float items.
            nums = [float(i) for i in lineList]
            # First col is ID, rest are data values.
            ID = nums[0]
            data = nums[1:]
            # Cap the data values at 5.
            for index, val in enumerate(data):
                if val > cap:
                    data[index] = cap
            # Count and sum the values and report the results.
            count = len(data)
            total = sum(data)
            print 'ID: {0}   Sum: {1}   Count {2}'.format(ID, total, count)
            print 'Data: {0}'.format(data)
except IOError:
    print "{0} doesn't exist or can't be opened.".format(infile)


In [60]:
%run script2/parseTable.py

ID: 1.0   Sum: 20.53   Count 6
Data: [2.07, 5, 4.05, 3.64, 2.03, 3.74]
ID: 2.0   Sum: 25.28   Count 6
Data: [3.51, 5, 4.2, 4.44, 3.67, 4.46]
ID: 3.0   Sum: 19.72   Count 5
Data: [3.9, 4.24, 4.05, 4.04, 3.49]
ID: 4.0   Sum: 22.64   Count 6
Data: [3.18, 3.5, 4.73, 4.39, 3.28, 3.56]


In [None]:
# %load script/cfactor.py
# cfactor.py
# Purpose: Read the contents of a text file into a dictionary.
# Input: No arguments required.  Input file hard-coded.
# Output: Printed cfactor:label dictionary.

factorDict = {}
infile = 'C:/gispy/data/ch19/cfactors.txt'
try:
    with open(infile, 'r') as f:
        f.readline()
        for row in f:
            row = row.split('=')
            factor = int(row[0])
            label = row[1].rstrip()
            factorDict[factor] = label
    print factorDict
except IOError:
    print "{0} doesn't exist.".format(infile)


In [62]:
%run script2/cfactor.py

{1: 'stable', 2: 'low deposition', 3: 'moderate deposition', 4: 'high deposition', 5: 'severe deposition'}


### 19.2.1 Parsing Field Names

In [63]:
mylist = ['a','b','c','d']
mylist.index('c')

2

In [None]:
# %load script/fieldIndex.py
# fieldIndex.py
# Purpose: Find the index of a field name in a text
#         file with space separated fields in the first row.
# Input: No arguments required.  Input file hard-coded.
infile = 'C:/gispy/data/ch19/cfactors.txt'
fieldName = 'Label'


def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    rowList = delimString.split(delimiter)
    index = rowList.index(name)
    return index

with open(infile, 'r') as f:
    row = f.readline()
    ind = getIndex(row, ' ', fieldName)
    print '{0} has index {1}'.format(fieldName, ind)


In [66]:
%run script2/fieldIndex.py

Label has index 1


## 19.3 Modifying Text Files

In [None]:
# %load script/cfactorModify.py
# cfactorModify.py
# Purpose: Demonstrate reading and writing files.
# Usage: No arguments required.  Input file hard-coded.
# Output: Modified text file *v2.txt

import os

infile = 'C:/gispy/data/ch19/cfactors.txt'
baseN = os.path.basename(infile)
outfile = 'C:/gispy/scratch/' + os.path.splitext(baseN)[0] + 'v2.txt'
try:
    # OPEN the input and output files.
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            # READ/MODIFY/WRITE the first line.
            line = fin.readline()
            line = line.replace(' ', ',')
            fout.write(line)

            # FOR the remaining lines.
            for line in fin:
                # MODIFY the line.
                line = line.replace('=', ',')
                # WRITE to output.
                fout.write(line)
            print '{0} created.'.format(outfile)
except IOError:
    print "{0} doesn't exist.".format(infile)


In [70]:
%run script2/cfactorModify.py

scratch/cfactorsv2.txt created.


### 19.3.1 Pseudocode for Modifying Text Files
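
The capitalized comments in cfactorModify.py (OPEN, READ/MODIFY/WRITE, FOR, MODIFY, WRITE) outline a general read, modify, write pattern. The sketch below expresses that pattern as a reusable function. The function name `modifyFile` and its `modify` argument are hypothetical, and the sample rows are modeled on the dictionary printed by cfactor.py rather than copied from the actual data file.

```python
import os
import tempfile


def modifyFile(infile, outfile, modify):
    '''Copy infile to outfile, applying modify to each line.
    Hypothetical helper, not one of the chapter's scripts.'''
    # OPEN the input and output files.
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            # FOR each line: MODIFY the line, then WRITE it to output.
            for line in fin:
                fout.write(modify(line))


# Sample rows in the style of cfactors.txt (illustrative only).
scratch = tempfile.mkdtemp()
infile = os.path.join(scratch, 'sample.txt')
outfile = os.path.join(scratch, 'samplev2.txt')
with open(infile, 'w') as f:
    f.write('1=stable\n2=low deposition\n')

# Replace the '=' separator with a comma on every line.
modifyFile(infile, outfile, lambda line: line.replace('=', ','))

with open(outfile, 'r') as f:
    result = f.read()
print(result)
```

Passing the modification as a function keeps the file handling in one place. A script that treats the first line differently, as cfactorModify.py does, would still need its own `readline` step before the loop.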

### 19.3.2 Working with Tabular Text

In [None]:
# %load script/removeHeader.py
# removeHeader.py
# Purpose: Remove header rows.
# Usage: No arguments required.  Input file hard-coded.
# Output: Modified text file *v2.txt

import os
headers = 2
infile = 'C:/gispy/data/ch19/eyeTrack.csv'
baseN = os.path.basename(infile)
outfile = 'C:/gispy/scratch/' + os.path.splitext(baseN)[0] \
          + 'v2.txt'
try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            # READ header lines, but don't write them.
            for i in range(headers):
                fin.readline()
            # READ and WRITE the remaining lines.
            for line in fin:
                fout.write(line)
            print '{0} created.'.format(outfile)
except IOError:
    print "{0} doesn't exist.".format(infile)


In [72]:
%run script2/removeHeader.py

scratch/eyeTrackv2.txt created.


In [None]:
# %load script/removeRecords.py
# removeRecords.py
# Purpose: Demonstrate removing rows under specific conditions.
# Input: No arguments required.  Input file hard-coded.
# Output: Modified text file *v2.csv

import os
headers = 2
field1 = 'FPOGX'
field2 = 'FPOGY'
sep = ','


def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    lineList = delimString.split(delimiter)
    index = lineList.index(name)
    return index

infile = 'C:/gispy/data/ch19/eyeTrack.csv'
baseN = os.path.basename(infile)
outfile = 'C:/gispy/scratch/' + os.path.splitext(baseN)[0] \
          + 'v2' + os.path.splitext(baseN)[1]
try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            # READ header lines, but don't write them.
            for i in range(headers):
                line = fin.readline()
            # READ and WRITE field names
            line = fin.readline()
            fout.write(line)

            # FIND field indices
            findex1 = getIndex(line, sep, field1)
            findex2 = getIndex(line, sep, field2)

            # FOR the remaining lines:
            for line in fin:
                lineList = line.split(sep)
                v1 = float(lineList[findex1])
                v2 = float(lineList[findex2])
                # IF condition is TRUE, write line.
                if v1 > 0 and v2 > 0:
                    fout.write(line)
            print '{0} created.'.format(outfile)

except IOError:
    print "{0} doesn't exist.".format(infile)


In [75]:
%run script2/removeRecords.py

scratch/eyeTrackv2.csv created.


In [76]:
fields = ['FireId', 'Org', 'State', 'FireType', 'Protection']
index = 2
fields.pop(index)
fields

['FireId', 'Org', 'FireType', 'Protection']

In [77]:
fields = ['FireId', 'Org', 'State', 'FireType', 'Protection']
indexA = 2
indexB = 4
fields.pop(indexA)

'State'

In [78]:
fields.pop(indexB)

IndexError: pop index out of range

In [79]:
fields = ['FireId', 'Org', 'State', 'FireType', 'Protection']
fields.pop(indexB)

'Protection'

In [80]:
fields

['FireId', 'Org', 'State', 'FireType']

In [81]:
fields.pop(indexA)

'State'

In [82]:
fields

['FireId', 'Org', 'FireType']

---

In [None]:
# %load script/removeColumns.py
# removeColumns.py
# Purpose: Demonstrate removing columns, given the field names.
# Input: No arguments required.  Input file hard-coded.
# Output: Modified text file *v2.csv
import os


def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    lineList = delimString.split(delimiter)
    index = lineList.index(name)
    return index


def removeItems(indexList, delimiter, delimString):
    '''Remove items at given indices in a delimited string'''
    lineList = delimString.split(delimiter)
    indexList.sort(reverse=True)
    for i in indexList:
        lineList.pop(i)
    stringLine = delimiter.join(lineList)
    return stringLine

headers = 2
sep = ','
removeFields = ['LPCX', 'LPCY', 'RPCX', 'RPCY', 'LGX', 'LGY', 'RGX', 'RGY']
infile = 'C:/gispy/data/ch19/eyeTrack.csv'
baseN = os.path.basename(infile)
outfile = 'C:/gispy/scratch/' + os.path.splitext(baseN)[0] \
          + 'v2' + os.path.splitext(baseN)[1]
try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            # READ header lines, but don't write them.
            for i in range(headers):
                fin.readline()
            # READ field names.
            fieldNamesLine = fin.readline()
            # FIND field indices.
            rfIndex = []
            for field in removeFields:
                rfIndex.append(getIndex(fieldNamesLine, sep, field))
            line = removeItems(rfIndex, sep, fieldNamesLine)
            fout.write(line)
            # READ and WRITE the remaining lines.
            for line in fin:
                line = removeItems(rfIndex, sep, line)
                fout.write(line)
            print '{0} created.'.format(outfile)
except IOError:
    print "{0} doesn't exist.".format(infile)


In [85]:
%run script2/removeColumns.py

scratch/eyeTrackv2.csv created.


## 19.4 Pickling

In [87]:
import pickle
f = open('data/gherkin.txt', 'w')
pickle.dump(2.71828,f)
pickle.dump(['FireId', 'Org', 'FireType'],f)
f.close()

In [88]:
f2 = open('data/gherkin.txt', 'r')
thing1 = pickle.load(f2)
thing1

2.71828

In [89]:
type(thing1)

float

In [90]:
thing2 = pickle.load(f2)
thing2

['FireId', 'Org', 'FireType']

In [91]:
type(thing2)

list

In [92]:
f2.close()

In [95]:
f3 = open("data/gherkin.txt", "r")
f3.readline()

'F2.71828\n'

In [94]:
f3.close()
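
The cells above open the pickle file in text mode, which works in Python 2 with pickle's default text protocol. The sketch below performs the same round trip with binary modes (`'wb'` and `'rb'`) and `with` blocks, so it also runs in Python 3, where pickle requires a binary file. The scratch path and the `.pkl` extension are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'gherkin.pkl')

# Dump two objects, one after the other, into the same file.
with open(path, 'wb') as f:
    pickle.dump(2.71828, f)
    pickle.dump(['FireId', 'Org', 'FireType'], f)

# Objects are loaded back in the order they were dumped.
with open(path, 'rb') as f2:
    thing1 = pickle.load(f2)
    thing2 = pickle.load(f2)

print(thing1)
print(thing2)
```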

## 19.5 Discussion

## 19.6 Key Terms

## 19.7 Exercises