# Writing and Reading Files in Python

Python offers several ways to read and write files, especially CSV data: plain file objects, the standard `csv` module, and the pandas library.

## 0. Join strings

In [11]:
csvdata = [[2,4,5,12,"fred"],[34,43,21,43,"annie"], [324,3,43,4,"jean"]]

In [30]:
# Remember how join works:
','.join(["fred", "annie", "howard"])  # creates a string separated by ,

'fred,annie,howard'

In [14]:
# this fails because the row contains ints, and join() only accepts strings
','.join(csvdata[1])

TypeError: sequence item 0: expected str instance, int found

In [15]:
# change the type of numbers from int to str 
row = csvdata[1]
','.join([ str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]) ])

'34,43,21,43,annie'

In [31]:
# or you can convert every item to a string first (here using the last row):
row = [str(item) for item in csvdata[2]]
print(row)
','.join(row)

['324', '3', '43', '4', 'jean']


'324,3,43,4,jean'
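
The conversion and the join can be folded into one small helper. This is a minimal sketch; `row_to_line` is a hypothetical name used only for illustration, and it assumes no field contains the separator itself.

```python
def row_to_line(row, sep=","):
    """Join a row of mixed values into one delimited string (hypothetical helper)."""
    # map(str, row) converts each item to a string before joining
    return sep.join(map(str, row))


csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]
lines = [row_to_line(row) for row in csvdata]
print(lines)
```

Passing `sep="\t"` would produce tab-separated lines instead.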

## 1. Using basic functions of Python


**When we open a file for writing, we need to pass "w" as the mode. Reading is the default mode (you can also pass "r" explicitly if you prefer).**

*Note*: In Python 3, if you get an error about characters that can't be decoded, you can work around it by passing
`errors="ignore"` to the `open()` function. Be aware that this silently drops the offending characters.
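
To see what `errors="ignore"` actually does, here is a small sketch. It assumes a file written in Latin-1 that is then read as UTF-8; the temporary file name is made up for the example. If you know the real encoding, naming it with `encoding=` is usually a safer choice than ignoring errors.

```python
import os
import tempfile

# Write raw Latin-1 bytes that are not valid UTF-8 (0xE9 is "é" in Latin-1).
path = os.path.join(tempfile.mkdtemp(), "latin1.txt")
with open(path, "wb") as handle:
    handle.write(b"caf\xe9,1\n")

# errors="ignore" silently drops the byte that cannot be decoded.
with open(path, encoding="utf-8", errors="ignore") as handle:
    ignored = handle.read()

# Naming the actual encoding keeps the character.
with open(path, encoding="latin-1") as handle:
    decoded = handle.read()

print(repr(ignored))
print(repr(decoded))
```
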
## 1.1. Writing 

In [18]:
with open("data/myfile.csv", "w", errors="ignore") as handle:
    # up here, print your headers to the file:
    handle.write("Score1,Score2,Score3,Score4,Name\n")
    for row in csvdata:
        # we loop through the data -- but we have to make it a string
        # to write it with the plain file write command.
        # each string has to end in a \n -- new line.
        #handle.write("Some string")
        row = [str(item) for item in row]
        print(row)
        handle.write(','.join(row) + "\n")

['2', '4', '5', '12', 'fred']
['34', '43', '21', '43', 'annie']
['324', '3', '43', '4', 'jean']


**The "w" mode overwrites the file if it already exists.**
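
If you want to keep what is already in the file, the "a" (append) mode adds to the end instead. A minimal sketch, using a temporary file so it does not touch the `data` folder:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.csv")

with open(path, "w") as handle:  # creates the file
    handle.write("Score1,Name\n")

with open(path, "w") as handle:  # "w" again: the previous content is discarded
    handle.write("2,fred\n")

with open(path, "a") as handle:  # "a" keeps existing content and adds to the end
    handle.write("34,annie\n")

with open(path) as handle:
    content = handle.read()
print(content)
```

The header line written by the first `open` is gone, because the second `open` with "w" replaced the file.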

### Wrap It In a Function! 

In [38]:
def write_csv(filepath, data, headers):
    """Takes the path of the file to write to, the data as a list of rows, and the header string."""
    with open(filepath, "w", errors="ignore") as handle:
        # up here, print your headers to the file:
        handle.write(headers)
        for row in data:
            # we loop through the data -- but we have to make it a string
            # to write it with the plain file write command.
            # each string has to end in a \n -- new line.
            #handle.write("Some string")
            row = [str(item) for item in row]
            handle.write(','.join(row) + "\n")
        print("wrote file %s" % filepath)

In [39]:
header = "Score1,Score2,Score3,Score4,Name\n"

# call the function with the arguments:
write_csv("data/myfile1.csv", csvdata, header)

wrote file data/myfile1.csv


## 1.2. Reading

In [47]:
with open("data/myfile.csv","r") as handle:
    data = handle.read()
    print(data)

# read() returns the whole file as one string, which is not convenient for working with the data

Score1,Score2,Score3,Score4,Name
2,4,5,12,fred
34,43,21,43,annie
324,3,43,4,jean
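
To turn that single string back into rows, you can split it by line and then by comma. This is only a sketch: the `text` variable stands in for the string returned by `handle.read()`, and a plain `split(",")` breaks on fields that contain quoted commas, which is one reason to use the csv module in the next section.

```python
# Stand-in for the string returned by handle.read() above.
text = "Score1,Score2,Score3,Score4,Name\n2,4,5,12,fred\n34,43,21,43,annie\n324,3,43,4,jean\n"

lines = text.splitlines()                       # one string per line, newlines removed
headers = lines[0].split(",")                   # first line holds the column names
rows = [line.split(",") for line in lines[1:]]  # remaining lines are the data

print(headers)
print(rows)
```
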



## 2. Using the CSV Module

docs: https://docs.python.org/3.6/library/csv.html

## 2.1. Reading

In [49]:
# reading a csv file with the csv module -- 'r' is the default mode, so passing it is optional:

import csv

with open('data/myfile.csv', 'r', errors='ignore') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        print("raw row looks like this:", row)
        # make it prettier with:
        print("Prettier", ', '.join(row))

raw row looks like this: ['Score1', 'Score2', 'Score3', 'Score4', 'Name']
Prettier Score1, Score2, Score3, Score4, Name
raw row looks like this: ['2', '4', '5', '12', 'fred']
Prettier 2, 4, 5, 12, fred
raw row looks like this: ['34', '43', '21', '43', 'annie']
Prettier 34, 43, 21, 43, annie
raw row looks like this: ['324', '3', '43', '4', 'jean']
Prettier 324, 3, 43, 4, jean
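
Notice that every value comes back as a string, including the header row. A common next step is to skip the header and convert the numbers. The sketch below uses `io.StringIO` as a stand-in for the open file so it runs on its own, and it assumes the first four columns always hold integers.

```python
import csv
import io

# io.StringIO stands in for the open file so the sketch runs without data/myfile.csv.
csvfile = io.StringIO(
    "Score1,Score2,Score3,Score4,Name\n2,4,5,12,fred\n34,43,21,43,annie\n"
)

reader = csv.reader(csvfile)
headers = next(reader)  # consume the header row before looping
records = []
for row in reader:
    scores = [int(value) for value in row[:4]]  # assumes four integer columns
    records.append(scores + [row[4]])

print(headers)
print(records)
```
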


## 2.2. Writing

In [50]:
# Writing with it:

# the csv docs recommend newline='' so the module controls the line endings itself
with open('data/myfile3.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for row in csvdata:
        spamwriter.writerow(row)

In [56]:
# cat is a Unix command and fails on Windows, as below -- you can also just find the file and open it.
!cat data/myfile3.csv

The syntax of the command is incorrect.
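
A way to check the file that works on any operating system is to read it back in Python. A minimal sketch, writing to a temporary path (an assumption made so the example is self-contained) rather than to `data/myfile3.csv`:

```python
import csv
import os
import tempfile

csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]

path = os.path.join(tempfile.mkdtemp(), "myfile3.csv")
with open(path, "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(csvdata)  # writerows writes all rows at once

# Read the file back and print it, instead of shelling out to cat.
with open(path) as csvfile:
    content = csvfile.read()
print(content)
```
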


## 2.3. CSV Dict files

If your first row contains column labels, the CSV DictReader returns each row as a dictionary keyed by those labels.

In [64]:
import csv

mydata = []
with open('data/myfile.csv', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print("the raw dictionary", row)
        # accessing certain columns:
        print(row['Score1'], row['Score2'], row['Name'])
        mydata.append(row)

the raw dictionary OrderedDict([('Score1', '2'), ('Score2', '4'), ('Score3', '5'), ('Score4', '12'), ('Name', 'fred')])
2 4 fred
the raw dictionary OrderedDict([('Score1', '34'), ('Score2', '43'), ('Score3', '21'), ('Score4', '43'), ('Name', 'annie')])
34 43 annie
the raw dictionary OrderedDict([('Score1', '324'), ('Score2', '3'), ('Score3', '43'), ('Score4', '4'), ('Name', 'jean')])
324 3 jean
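
Because the keys are column names, the code stays readable when you compute something per row. The sketch below totals the four scores for each person; `io.StringIO` stands in for the file, and the `totals` dictionary is a name chosen for this example.

```python
import csv
import io

# io.StringIO stands in for the open file so the sketch runs on its own.
csvfile = io.StringIO(
    "Score1,Score2,Score3,Score4,Name\n2,4,5,12,fred\n34,43,21,43,annie\n"
)

totals = {}
for row in csv.DictReader(csvfile):
    # values are strings, so convert before adding them up
    scores = [int(row[key]) for key in ("Score1", "Score2", "Score3", "Score4")]
    totals[row["Name"]] = sum(scores)

print(totals)
```
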


Likewise, you can write dictionary data out as a csv using the DictWriter.  Up above, we collected the rows into a list called mydata.

In [76]:
mydata

[OrderedDict([('Score1', '2'),
              ('Score2', '4'),
              ('Score3', '5'),
              ('Score4', '12'),
              ('Name', 'fred')]),
 OrderedDict([('Score1', '34'),
              ('Score2', '43'),
              ('Score3', '21'),
              ('Score4', '43'),
              ('Name', 'annie')]),
 OrderedDict([('Score1', '324'),
              ('Score2', '3'),
              ('Score3', '43'),
              ('Score4', '4'),
              ('Name', 'jean')])]

In [66]:
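# These are the column headers for the data file. From Python 3.6 on, DictReader rows keep the column order of the file;
# in older versions the rows were plain dicts with no guaranteed order.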
mydata[0].keys()

odict_keys(['Score1', 'Score2', 'Score3', 'Score4', 'Name'])

In [14]:
# We can set the delimiter between fields to whatever we want, including tab (\t).
# If the keys come from a plain dict on an older Python version, the field order can be arbitrary, as in the output below,
# because we took the keys from the first item without specifying an order. If you specify the order, it is controlled.

with open('data/myfile3.csv', 'w', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=list(mydata[0].keys()))
    writer.writeheader()
    for row in mydata:
        # write each dictionary as one row:
        writer.writerow(row)

In [15]:
!cat data/myfile3.csv

Name	Score2	Score4	Score1	Score3
fred	4	12	2	5
annie	43	43	34	21
jean	3	4	324	43


In [62]:
# Here we specify the field order and it controls how it gets written out.

with open('data/myfile4.csv', 'w', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=['Name', 'Score1', 'Score2', 'Score3', 'Score4'])
    writer.writeheader()
    for row in mydata:
        # write each dictionary as one row:
        writer.writerow(row)

In [61]:
!cat data/myfile4.csv

Name	Score1	Score2	Score3	Score4
fred	2	4	5	12
annie	34	43	21	43
jean	324	3	43	4
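
The `fieldnames` list can also select a subset of columns. By default DictWriter raises a `ValueError` when a row has keys that are not in `fieldnames`, so the sketch below passes `extrasaction="ignore"`. It writes to an `io.StringIO` buffer instead of a file, and sets `lineterminator="\n"` only to keep the printed result simple; both are choices made for this example.

```python
import csv
import io

mydata = [
    {"Score1": "2", "Score2": "4", "Score3": "5", "Score4": "12", "Name": "fred"},
    {"Score1": "34", "Score2": "43", "Score3": "21", "Score4": "43", "Name": "annie"},
]

buffer = io.StringIO()  # in-memory stand-in for an open file
writer = csv.DictWriter(
    buffer,
    fieldnames=["Name", "Score1"],  # only these columns, in this order
    extrasaction="ignore",          # skip keys that are not listed in fieldnames
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(mydata)

output = buffer.getvalue()
print(output)
```
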


## 3. Using the pandas library 
### 3.1. Reading

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

In [1]:
#import the library pandas

import pandas as pd 

# Load the CSV
data_path = "./data/goog.csv"
data = pd.read_csv(data_path, delimiter=',',header = 0)

# header=0 means the first row holds the column names; this matches the default behavior (header='infer')
# if the file has no header row, pass header=None
# see the documentation for the other arguments this function accepts

In [2]:
# data is a pandas DataFrame 
data.head()
# .head() returns only the first five rows of the DataFrame

         Date    Open    High     Low   Close  Volume
0  2010-01-04  313.16  314.44  311.81  313.06     NaN
1  2010-01-05  313.28  313.61  310.46  311.68     NaN
2  2010-01-06  312.62  312.62  302.88  303.83     NaN
3  2010-01-07  304.40  304.70  296.03  296.75     NaN
4  2010-01-08  295.70  301.32  294.26  300.71     NaN


In [3]:
# inspect a pandas DataFrame: column names, single cells, rows, and slices

print(data.columns)
print('\n')
print(data.iloc[1,1])
print("\n")
print(data.iloc[0,:])
print("\n")
print(data.iloc[1:5,3])
print("\n")
data['Date'].head()

Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume'], dtype='object')


313.28


Date      2010-01-04
Open          313.16
High          314.44
Low           311.81
Close         313.06
Volume           NaN
Name: 0, dtype: object


1    310.46
2    302.88
3    296.03
4    294.26
Name: Low, dtype: float64




0    2010-01-04
1    2010-01-05
2    2010-01-06
3    2010-01-07
4    2010-01-08
Name: Date, dtype: object

##### **Week 2 covers pandas DataFrames and how to use them.**

### 3.2. Writing 

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

In [5]:
data.to_csv('./data/myfile5.csv', sep = ',')

# in read_csv, delimiter is an alias for sep; to_csv only accepts sep
# see the documentation for the other arguments

**`to_csv` overwrites the file if it already exists.**
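
One thing to watch for: `to_csv` also writes the row index by default, and reading such a file back produces an extra `Unnamed: 0` column. The sketch below illustrates this with a two-row DataFrame built by hand from the first rows shown earlier, and uses in-memory strings instead of files; passing `index=False` is one way to avoid the extra column.

```python
import io

import pandas as pd

# Small hand-built DataFrame standing in for the goog.csv data.
data = pd.DataFrame({"Date": ["2010-01-04", "2010-01-05"], "Open": [313.16, 313.28]})

with_index = data.to_csv()                # no path given: returns the CSV as a string
without_index = data.to_csv(index=False)  # leave the row index out

reread = pd.read_csv(io.StringIO(with_index))
clean = pd.read_csv(io.StringIO(without_index))

print(list(reread.columns))
print(list(clean.columns))
```

Another option, if the file was already written with the index, is to read it back with `index_col=0`.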