# Review: Writing and Reading Files in Python

There are lots of ways to handle reading and writing files in Python, especially for CSV data.

*Note*: In Python 3, if reading a file fails because of characters Python can't decode, you can get around it by passing `errors="ignore"` to the `open()` function. Be aware that this silently drops those characters.

Remember that pandas can also do this for you: `pandas.read_csv()` reads a CSV file, and the `DataFrame.to_csv()` method writes one.
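To see what `errors="ignore"` actually does, here is a small sketch that writes a byte that is never valid UTF-8 and then reads it back in two ways. It assumes a throwaway temporary file (the filename `bad_bytes.txt` is made up for the example), and it passes `encoding="utf-8"` explicitly, because with some default encodings the same byte would decode without any error at all. `errors="replace"` is shown for comparison: it keeps a visible marker instead of dropping the character.

```python
import os
import tempfile

# Hypothetical throwaway file; 0xff can never appear in valid UTF-8.
path = os.path.join(tempfile.mkdtemp(), "bad_bytes.txt")
with open(path, "wb") as handle:
    handle.write(b"caf\xff is open")

# errors="ignore" silently drops the byte it can't decode.
with open(path, "r", encoding="utf-8", errors="ignore") as handle:
    ignored = handle.read()

# errors="replace" puts the replacement character U+FFFD where the bad byte was.
with open(path, "r", encoding="utf-8", errors="replace") as handle:
    replaced = handle.read()

print(ignored)   # caf is open
print(replaced)  # the bad byte shows up as a replacement character
```

If losing characters matters for your data, `errors="replace"` at least lets you see where the problems were.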

In [1]:
csvdata = [[2,4,5,12,"fred"],[34,43,21,43,"annie"], [324,3,43,4,"jean"]]

In [3]:
# Remember how join works:

In [4]:
','.join(["fred", "annie", "howard"])  # creates a string separated by ,

'fred,annie,howard'

In [5]:
# Won't work: join() only accepts strings, and this row contains numbers!
','.join(csvdata[1])

TypeError: sequence item 0: expected str instance, int found

In [6]:
row = csvdata[1]
','.join([ str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]) ])

'34,43,21,43,annie'

In [7]:
# Or convert every item to a string first with a list comprehension:

row = [str(item) for item in row]
row

['34', '43', '21', '43', 'annie']
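A shorter variant of the same idea, sketched here as an alternative rather than the notebook's own approach, uses the built-in `map()` to apply `str()` to every item before joining. The `csvdata` list is copied from the first cell so the block runs on its own.

```python
csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]

# map(str, row) calls str() on each item, so join() only ever sees strings.
row = csvdata[1]
line = ",".join(map(str, row))
print(line)  # 34,43,21,43,annie

# The same idea turns every row of the table into a CSV-style line.
lines = [",".join(map(str, r)) for r in csvdata]
print(lines)  # ['2,4,5,12,fred', '34,43,21,43,annie', '324,3,43,4,jean']
```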

## To Read a Plain Text File

This may not work on Windows, but on a Mac it is a handy way to see the files in a folder:

In [2]:
ls data/books

Austen_Emma.txt*       Melville_MobyDick.txt*
Austen_Pride.txt*      lovecraft.txt*


In [5]:
with open("data/books/lovecraft.txt", "r", errors="ignore") as handle:
    text = handle.read()

In [6]:
text

'\n                        The\n                             Shunned House\n\n                        By H. P. LOVECRAFT\n\n\n\nFrom even the greatest of horrors irony is seldom absent. Sometimes it\nenters directly into the composition of the events, while sometimes it\nrelates only to their fortuitous position among persons and places. The\nlatter sort is splendidly exemplified by a case in the ancient city of\nProvidence, where in the late forties Edgar Allan Poe used to sojourn\noften during his unsuccessful wooing of the gifted poetess, Mrs.\nWhitman. Poe generally stopped at the Mansion House in Benefit\nStreet--the renamed Golden Ball Inn whose roof has sheltered Washington,\nJefferson, and Lafayette--and his favorite walk led northward along the\nsame street to Mrs. Whitman\'s home and the neighboring hillside\nchurchyard of St. John\'s, whose hidden expanse of Eighteenth Century\ngravestones had for him a peculiar fascination.\n\nNow the irony is this. In this walk, so many ti
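`read()` pulls the whole file into one string, as above. For bigger files you can instead loop over the file handle, which yields one line at a time. The sketch below assumes a small stand-in file in a temporary folder (`sample.txt` is a made-up name), since the `data/books` files may not exist where you run it.

```python
import os
import tempfile

# A tiny stand-in for a book file so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("The\nShunned House\n\nBy H. P. LOVECRAFT\n")

# Looping over the handle gives one line per iteration (newline included),
# so the whole file never has to sit in memory at once.
non_blank = []
with open(path, "r", encoding="utf-8", errors="ignore") as handle:
    for line in handle:
        stripped = line.strip()   # remove the trailing "\n" and spaces
        if stripped:              # skip empty lines
            non_blank.append(stripped)

print(non_blank)  # ['The', 'Shunned House', 'By H. P. LOVECRAFT']
```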

**When we open a file for writing, we pass "w" as the mode. Be careful: "w" replaces any existing file with the same name. Read ("r") is the default mode, but you can also pass 'r' explicitly when reading if you prefer.**
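The difference between the write modes is easy to miss, so here is a short sketch using a temporary file (`modes.txt` is a made-up name). It also shows the append mode, "a", which adds to the end of a file instead of starting over.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# "w" creates the file, or empties it first if it already exists.
with open(path, "w") as handle:
    handle.write("first\n")
with open(path, "w") as handle:
    handle.write("second\n")       # "first" is gone now

# "a" appends to the end of the existing contents.
with open(path, "a") as handle:
    handle.write("third\n")

with open(path) as handle:          # no mode given, so "r" is used
    contents = handle.read()

print(contents)  # prints "second" and "third" on separate lines
```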

In [64]:
with open("data/myfile.csv", "w", errors="ignore") as handle:
    # first, write the header row to the file:
    handle.write("Score1,Score2,Score3,Score4,Name\n")
    for row in csvdata:
        # file.write() only accepts strings, so convert each item first;
        # each line has to end in "\n" (a newline).
        row = [str(item) for item in row]
        print(row)
        handle.write(','.join(row) + "\n")

['2', '4', '5', '12', 'fred']
['34', '43', '21', '43', 'annie']
['324', '3', '43', '4', 'jean']


## Wrap It In a Function! 

In [2]:
def write_csv(filename, data, headers):
    """Write data (a list of rows) to filename as comma-separated lines.

    headers is a single string of column names ending in a newline.
    """
    with open(filename, "w", errors="ignore") as handle:
        # first, write the header row to the file:
        handle.write(headers)
        for row in data:
            # file.write() only accepts strings, so convert each item first;
            # each line has to end in "\n" (a newline).
            row = [str(item) for item in row]
            handle.write(','.join(row) + "\n")
    # the with block has closed the file by this point
    print("wrote file %s" % filename)

In [4]:
header = "Score1,Score2,Score3,Score4,Name\n"

# call the function with the arguments:
write_csv("myfile1.csv", csvdata, header)

wrote file myfile1.csv
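A natural companion is a function that reads such a file back. The sketch below is hypothetical: `read_csv_simple` is a name made up for this example, and `write_csv` is repeated (without its print) so the block runs on its own in a temporary folder. Splitting on commas by hand only works while no field contains a comma itself, which is one reason to reach for the csv module next.

```python
import os
import tempfile


def write_csv(filename, data, headers):
    """Write data (a list of rows) to filename, after a header string ending in a newline."""
    with open(filename, "w", errors="ignore") as handle:
        handle.write(headers)
        for row in data:
            handle.write(",".join(str(item) for item in row) + "\n")


def read_csv_simple(filename):
    """Hypothetical companion to write_csv: return (header, rows), all as strings.

    Naive splitting on "," breaks if a field contains a comma.
    """
    with open(filename, "r", errors="ignore") as handle:
        lines = handle.read().splitlines()
    header = lines[0].split(",")
    rows = [line.split(",") for line in lines[1:] if line]  # skip blank lines
    return header, rows


csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]
path = os.path.join(tempfile.mkdtemp(), "myfile1.csv")
write_csv(path, csvdata, "Score1,Score2,Score3,Score4,Name\n")

header, rows = read_csv_simple(path)
print(header)   # ['Score1', 'Score2', 'Score3', 'Score4', 'Name']
print(rows[0])  # ['2', '4', '5', '12', 'fred']
```

Note that everything comes back as strings; the numbers need `int()` if you want to do arithmetic on them.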


## Using the CSV Module

docs: https://docs.python.org/3.6/library/csv.html

In [7]:
# Reading a CSV file with the csv module. 'r' (read) is the default mode,
# so passing it here is optional. The csv docs recommend newline=''.

import csv
with open('data/myfile.csv', 'r', newline='', errors='ignore') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        print("raw row looks like this:", row)
        # make it prettier with:
        print("Prettier", ', '.join(row))

raw row looks like this: ['Score1', 'Score2', 'Score3', 'Score4', 'Name']
Prettier Score1, Score2, Score3, Score4, Name
raw row looks like this: ['2', '4', '5', '12', 'fred']
Prettier 2, 4, 5, 12, fred
raw row looks like this: ['34', '43', '21', '43', 'annie']
Prettier 34, 43, 21, 43, annie
raw row looks like this: ['324', '3', '43', '4', 'jean']
Prettier 324, 3, 43, 4, jean
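Usually you want to treat the header row differently from the data. One way, sketched below, is to pull the header off with `next()` before looping, then convert the score strings to integers. Here `io.StringIO` stands in for the open file and holds the same text as `data/myfile.csv`; with a real file you would use `open()` as above.

```python
import csv
import io

# Stand-in for data/myfile.csv so the example runs on its own.
fake_file = io.StringIO(
    "Score1,Score2,Score3,Score4,Name\n"
    "2,4,5,12,fred\n"
    "34,43,21,43,annie\n"
    "324,3,43,4,jean\n"
)

reader = csv.reader(fake_file)
header = next(reader)          # consume the header row first
totals = {}
for row in reader:
    *scores, name = row        # the last column is the name
    totals[name] = sum(int(s) for s in scores)  # csv gives strings, so convert

print(header)  # ['Score1', 'Score2', 'Score3', 'Score4', 'Name']
print(totals)  # {'fred': 23, 'annie': 141, 'jean': 374}
```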


In [10]:
# Writing with csv.writer. The csv docs recommend opening the file with newline=''.

with open('data/myfile3.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for row in csvdata:
        spamwriter.writerow(row)

In [11]:
# `!cat` runs a Unix shell command, so it probably won't work on Windows; there, open the file in an editor instead.
!cat data/myfile3.csv

2,4,5,12,fred
34,43,21,43,annie
324,3,43,4,jean
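The file above has no header row. A sketch of adding one, and of what the csv module buys you over `','.join()`, follows: `writerows()` writes a whole list of rows at once, and a field that contains a comma gets quoted automatically, so it reads back as one field. The extra "Smith, Jo" row is made-up data for illustration, and `io.StringIO` collects the output in memory instead of a real file.

```python
import csv
import io

csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]

# In-memory stand-in for open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Score1", "Score2", "Score3", "Score4", "Name"])  # header first
writer.writerows(csvdata)                     # then every data row at once
writer.writerow([1, 2, 3, 4, "Smith, Jo"])    # made-up row with a comma in a field

output = buffer.getvalue()
print(output)  # the last line is: 1,2,3,4,"Smith, Jo"

# Reading it back, the quoted field stays in one piece.
rows_back = list(csv.reader(io.StringIO(output)))
print(rows_back[-1])  # ['1', '2', '3', '4', 'Smith, Jo']
```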


## CSV Dict files

If the first row of your file holds column labels, `csv.DictReader` returns each of the remaining rows as a dictionary keyed by those labels.

In [12]:
import csv

mydata = []
with open('data/myfile.csv', newline='', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print("the raw dictionary", row)
        # accessing certain columns:
        print(row['Score1'], row['Score2'], row['Name'])
        mydata.append(row)

the raw dictionary {'Name': 'fred', 'Score2': '4', 'Score4': '12', 'Score1': '2', 'Score3': '5'}
2 4 fred
the raw dictionary {'Name': 'annie', 'Score2': '43', 'Score4': '43', 'Score1': '34', 'Score3': '21'}
34 43 annie
the raw dictionary {'Name': 'jean', 'Score2': '3', 'Score4': '4', 'Score1': '324', 'Score3': '43'}
324 3 jean
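If a file has no header row, like `data/myfile3.csv` written with `csv.writer` earlier, you can still get dictionaries by passing `fieldnames` to `DictReader`; the first line is then treated as data rather than as labels. In this sketch `io.StringIO` holds the same text as that headerless file.

```python
import csv
import io

# Stand-in for the headerless data/myfile3.csv written earlier.
headerless = io.StringIO("2,4,5,12,fred\n34,43,21,43,annie\n324,3,43,4,jean\n")

# fieldnames supplies the keys, so every line (including the first) is data.
reader = csv.DictReader(headerless, fieldnames=["Score1", "Score2", "Score3", "Score4", "Name"])
rows = list(reader)

print(rows[0]["Name"], rows[0]["Score4"])  # fred 12
print(len(rows))                           # 3
```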


Likewise, you can write dictionary data out as a CSV file using `csv.DictWriter`. Above, we collected the rows into a list called `mydata`.

In [13]:
mydata

[{'Name': 'fred', 'Score1': '2', 'Score2': '4', 'Score3': '5', 'Score4': '12'},
 {'Name': 'annie',
  'Score1': '34',
  'Score2': '43',
  'Score3': '21',
  'Score4': '43'},
 {'Name': 'jean',
  'Score1': '324',
  'Score2': '3',
  'Score3': '43',
  'Score4': '4'}]

In [52]:
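# These are the column headers for the data file. In this run the keys came back in no particular order;
# since Python 3.6, DictReader keeps them in the same order as the file's header row.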
mydata[0].keys()

dict_keys(['Score2', 'Score4', 'Score3', 'Score1', 'Name'])

In [14]:
# We can set the delimiter between fields to whatever we want, including tab (\t).
# The fields come out in a seemingly random order here: we took the fieldnames from the keys of
# the first row, and in this run those keys had no fixed order. Listing the fields yourself controls the order.

with open('data/myfile3.csv', 'w', newline='', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=list(mydata[0].keys()))
    writer.writeheader()
    for row in mydata:
        # write each dictionary out as one line:
        writer.writerow(row)

In [15]:
!cat data/myfile3.csv

Name	Score2	Score4	Score1	Score3
fred	4	12	2	5
annie	43	43	34	21
jean	3	4	324	43
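To read a tab-separated file back, the reader needs the same delimiter the writer used; with the default comma, each line comes back as a single field. Because `DictReader` keys values by column name, the shuffled column order in the file doesn't matter. In this sketch `io.StringIO` holds the same text that `!cat` printed above.

```python
import csv
import io

# Stand-in for the tab-separated data/myfile3.csv shown above.
tsv_text = (
    "Name\tScore2\tScore4\tScore1\tScore3\n"
    "fred\t4\t12\t2\t5\n"
    "annie\t43\t43\t34\t21\n"
    "jean\t3\t4\t324\t43\n"
)

# Matching delimiter: values are looked up by name, whatever the column order.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Wrong delimiter (the default ","): each whole line becomes one field.
wrong = list(csv.reader(io.StringIO(tsv_text)))

print(rows[1]["Name"], rows[1]["Score1"])  # annie 34
print(len(wrong[0]))                       # 1
```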


In [62]:
# Here we specify the field order and it controls how it gets written out.

with open('data/myfile4.csv', 'w', newline='', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=['Name', 'Score1', 'Score2', 'Score3', 'Score4'])
    writer.writeheader()
    for row in mydata:
        # write each dictionary out as one line:
        writer.writerow(row)

In [61]:
# `!cat` is a Unix shell command, so this won't work on Windows.
!cat data/myfile4.csv

Name	Score1	Score2	Score3	Score4
fred	2	4	5	12
annie	34	43	21	43
jean	324	3	43	4
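`fieldnames` also lets you write only some of the columns. By default `DictWriter` raises a `ValueError` if a row has keys that aren't listed; `extrasaction="ignore"` drops them instead, and `restval` fills in any listed column a row is missing. The sketch below uses a trimmed, made-up version of `mydata` (the second row deliberately lacks `Score3`) and writes to an in-memory `io.StringIO`.

```python
import csv
import io

# Made-up rows for illustration; annie's row is missing Score3 and Score4.
mydata = [
    {"Name": "fred", "Score1": "2", "Score2": "4", "Score3": "5", "Score4": "12"},
    {"Name": "annie", "Score1": "34", "Score2": "43"},
]

buffer = io.StringIO()
# Only two columns are listed: extra keys are ignored rather than raising,
# and a missing Score3 is written as "NA".
writer = csv.DictWriter(
    buffer,
    fieldnames=["Name", "Score3"],
    delimiter="\t",
    extrasaction="ignore",
    restval="NA",
)
writer.writeheader()
writer.writerows(mydata)

print(buffer.getvalue())  # three tab-separated lines: the header, fred 5, annie NA
```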
