# Writing and Reading Files in Python

There are many ways to read and write files in Python, especially for CSV data.

You already know the method that uses Python's built-in functions. Here you will learn two more, using two different libraries: the `csv` module and pandas.

## 0. Join strings

In [None]:
csvdata = [[2,4,5,12,"fred"],[34,43,21,43,"annie"], [324,3,43,4,"jean"]]

In [None]:
# Remember how join works:
','.join(["fred", "annie", "howard"])  # creates a string separated by ,

In [None]:
# won't work: join() only accepts strings, and this row contains ints, so it raises a TypeError!
','.join(csvdata[1])

In [None]:
# change the type of numbers from int to str 
row = csvdata[1]
','.join([ str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]) ])

In [None]:
# or you can convert every item to a string first:
row = [str(item) for item in row]
print(row)
','.join(row)
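
The same conversion can be written more compactly with `map(str, ...)`. The sketch below wraps it in a small helper; the name `row_to_csv_line` is made up for this illustration and is not part of the notebook.

```python
def row_to_csv_line(row, sep=","):
    """Join the items of a row into one delimited string (hypothetical helper)."""
    # map(str, row) converts every item to a string, so ints and strings can be mixed
    return sep.join(map(str, row))


csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]
print(row_to_csv_line(csvdata[1]))
```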

## 1. Using basic functions of Python


**When we open a file for writing, we need to pass "w" for the write operation. Read is the default (but you can also pass 'r' explicitly when reading).**

*Note*: In Python 3, if you get an error about characters that can't be decoded, you can get around it by passing
`errors="ignore"` to the `open()` function.
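
To see what `errors="ignore"` does, here is a minimal sketch. It assumes the file is meant to be UTF-8 (the explicit `encoding="utf-8"` is an addition for this example) and uses a temporary file with a made-up name, `bad_bytes.csv`, that contains one invalid byte.

```python
import os
import tempfile

# Build a small file containing one byte (0xff) that is not valid UTF-8.
bad_path = os.path.join(tempfile.mkdtemp(), "bad_bytes.csv")
with open(bad_path, "wb") as handle:
    handle.write(b"Name,Score\nfr\xffed,12\n")

# Strict decoding (the default) raises UnicodeDecodeError on the bad byte.
try:
    with open(bad_path, encoding="utf-8") as handle:
        handle.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" drops the undecodable byte instead of raising.
with open(bad_path, encoding="utf-8", errors="ignore") as handle:
    cleaned = handle.read()

print(strict_failed)
print(repr(cleaned))
```

Keep in mind that the bad byte is dropped silently, so this is best kept for data where losing a few characters is acceptable.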

### 1.1. Writing

In [None]:
with open("data/myfile.csv", "w", errors="ignore") as handle:
    # up here, print your headers to the file:
    handle.write("Score1,Score2,Score3,Score4,Name\n")
    for row in csvdata:
        # we loop through the data -- but we have to make it a string
        # to write it with the plain file write command.
        # each string has to end in a \n -- new line.
        #handle.write("Some string")
        row = [str(item) for item in row]
        print(row)
        handle.write(','.join(row) + "\n")

**The option "w" overwrites the file.**
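
If you want to keep what is already in the file, the built-in alternative is the append mode, "a". A small sketch of the difference, using a temporary file with a made-up name:

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "modes.csv")

with open(demo_path, "w") as handle:   # creates the file
    handle.write("first\n")
with open(demo_path, "w") as handle:   # "w" truncates: "first" is gone
    handle.write("second\n")
with open(demo_path, "a") as handle:   # "a" appends to what is already there
    handle.write("third\n")

with open(demo_path) as handle:
    contents = handle.read()
print(contents)
```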

### Wrap It In a Function! 

In [None]:
def write_csv(filepath, data, headers):
    """Takes the path of the file to write to, the data as a list of rows, and the header string."""
    with open(filepath, "w", errors="ignore") as handle:
        # up here, print your headers to the file:
        handle.write(headers)
        for row in data:
            # we loop through the data -- but we have to make it a string
            # to write it with the plain file write command.
            # each string has to end in a \n -- new line.
            #handle.write("Some string")
            row = [str(item) for item in row]
            handle.write(','.join(row) + "\n")
        print("wrote file %s" % filepath)

In [None]:
header = "Score1,Score2,Score3,Score4,Name\n"

# call the function with the arguments:
write_csv("data/myfile1.csv", csvdata, header)

### 1.2. Reading

In [None]:
with open("data/myfile.csv","r") as handle:
    data = handle.read()
    print(data)

# read() returns the whole file as one string, which is awkward to work with as tabular data
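
To get rows and columns back out of that string, you have to split it yourself. A minimal sketch, assuming the file looks like the one written above (the sample string below stands in for the file contents):

```python
raw = "Score1,Score2,Score3,Score4,Name\n2,4,5,12,fred\n34,43,21,43,annie\n"

lines = raw.splitlines()          # one string per line, without the trailing \n
header = lines[0].split(",")      # first line holds the column names
rows = []
for line in lines[1:]:
    fields = line.split(",")
    # Convert the score columns back to int; keep the name as text.
    rows.append([int(value) for value in fields[:-1]] + [fields[-1]])

print(header)
print(rows)
```

A plain `split(",")` breaks as soon as a field itself contains a comma, which is one reason to prefer a CSV library.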

## 2. Using the CSV Module

docs: https://docs.python.org/3.6/library/csv.html

### 2.1. Reading

In [None]:
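# reading a csv file using csv -- 'r' is the default mode, so it could be left out:

import csv

# the csv docs recommend newline='' so the module handles line endings itself
with open('data/myfile.csv', 'r', newline='', errors='ignore') as csvfile: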
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        print("raw row looks like this:", row)
        # make it prettier with:
        print("Prettier", ', '.join(row))
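
One thing `csv.reader` handles that a plain `split(",")` does not is a quoted field containing the delimiter. The sketch below compares the two on an in-memory sample (the `Comment` column is invented for the illustration):

```python
import csv
import io

text = 'Name,Comment\nfred,"likes cheese, bread"\n'

# Naive approach: split every line on commas.
naive = [line.split(",") for line in text.splitlines()]
# csv.reader understands the quotes and keeps the field together.
parsed = list(csv.reader(io.StringIO(text)))

print(naive[1])
print(parsed[1])
```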

### 2.2. Writing

In [None]:
# Writing with it:

# newline='' is the value recommended by the csv docs when writing
with open('data/myfile3.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for row in csvdata:
        spamwriter.writerow(row)

In [None]:
# unix, probably won't work on windows - you can go find the file and look at it.
!cat data/myfile3.csv
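
The file written above has no header row. A minimal sketch of adding one, and of what the writer does with a field that contains a comma, is shown below. It writes to an in-memory buffer instead of a file, and `lineterminator="\n"` is set only to keep the output easy to read.

```python
import csv
import io

rows = [[2, 4, "fred"], [34, 43, "annie, jr"]]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Score1", "Score2", "Name"])   # header row first
writer.writerows(rows)                          # then all data rows at once
written = buffer.getvalue()
print(written)
```

The writer converts the numbers to strings for you and quotes the field that contains the delimiter.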

### 2.3. CSV Dict files

If your first row has labels in it, you can read each row as a dictionary keyed by those labels using the CSV DictReader.

In [None]:
import csv

mydata = []
with open('data/myfile.csv', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print("the raw dictionary", row)
        # accessing certain columns:
        print(row['Score1'], row['Score2'], row['Name'])
        mydata.append(row)
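
Note that every value read by `DictReader` is a string, including the scores. A minimal sketch of converting them, assuming the same columns as the file above (the sample text stands in for the file):

```python
import csv
import io

text = "Score1,Score2,Score3,Score4,Name\n2,4,5,12,fred\n34,43,21,43,annie\n"

records = []
for row in csv.DictReader(io.StringIO(text)):
    # Keep the name as text and turn every other column into an int.
    record = {key: (value if key == "Name" else int(value)) for key, value in row.items()}
    records.append(record)

total = sum(record["Score1"] for record in records)
print(records[0])
print(total)
```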

Likewise, you can write dictionary data out as a csv using the DictWriter.  Up above, we collected the rows into a list called mydata.

In [None]:
mydata

In [None]:
# These are the column headers for the data file. In current Python versions they come back in the same order as the file's header row.
mydata[0].keys()

In [None]:
# we can set the delimiter between fields to whatever we want, including tab - \t
# The fields are written in the order given by fieldnames. Here we take the keys of the first row,
# so the order follows the original file. To choose a different order, list the field names yourself.

with open('data/myfile3.csv', 'w', newline='', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=list(mydata[0].keys()))
    writer.writeheader()
    for row in mydata:
        # write one dictionary per line:
        writer.writerow(row)

In [None]:
!cat data/myfile3.csv

In [None]:
# Here we specify the field order and it controls how it gets written out.

with open('data/myfile4.csv', 'w', newline='', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=['Name', 'Score1', 'Score2', 'Score3', 'Score4'])
    writer.writeheader()
    for row in mydata:
        # write one dictionary per line, in the field order given above:
        writer.writerow(row)

In [None]:
!cat data/myfile4.csv
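
`fieldnames` can also be used to write only some of the columns. By default `DictWriter` raises a `ValueError` if a row has keys that are not in `fieldnames`; passing `extrasaction="ignore"` skips them instead. A small sketch with made-up rows, written to an in-memory buffer:

```python
import csv
import io

rows = [
    {"Score1": "2", "Score2": "4", "Name": "fred"},
    {"Score1": "34", "Score2": "43", "Name": "annie"},
]

buffer = io.StringIO()
# extrasaction="ignore" skips dictionary keys that are not listed in fieldnames
writer = csv.DictWriter(buffer, fieldnames=["Name", "Score1"], delimiter="\t",
                        extrasaction="ignore", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
output = buffer.getvalue()
print(output)
```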

## 3. Using the pandas library 
### 3.1. Reading

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

In [None]:
#import the library pandas

import pandas as pd 

# Load the CSV
data_path = "./data/goog.csv"
data = pd.read_csv(data_path, delimiter=',',header = 0)

# header = 0 indicates the first row is headers, it's the default value
# if no headers, put header = None 
# to look at the different args you can pass in this function, read the doc
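
For a file without a header row, `header=None` can be combined with `names` to supply the column labels. A minimal sketch, using an in-memory sample in place of a real file (the column names are the ones used earlier in this notebook):

```python
import io

import pandas as pd

# Two data rows and no header line.
no_header = io.StringIO("2,4,5,12,fred\n34,43,21,43,annie\n")
column_names = ["Score1", "Score2", "Score3", "Score4", "Name"]

scores = pd.read_csv(no_header, header=None, names=column_names)
print(scores)
```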

In [None]:
# data is a pandas DataFrame 
data.head()
# .head() returns only the first five rows of the DataFrame

In [None]:
# inspect a pandas DataFrame: column names, single cells, rows, and slices

print(data.columns)
print('\n')
print(data.iloc[1,1])
print("\n")
print(data.iloc[0,:])
print("\n")
print(data.iloc[1:5,3])
print("\n")
data['Date'].head()

##### **Week 2 covers pandas DataFrames and how to use them**

### 3.2. Writing 

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

In [None]:
data.to_csv('./data/myfile5.csv', sep = ',')

# read_csv accepts delimiter as an alias for sep, but to_csv only takes sep
# see the doc for other arguments
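
One argument worth knowing is `index`. By default `to_csv` also writes the DataFrame's row index as an extra first column, and `index=False` leaves it out. A small sketch with a made-up DataFrame; calling `to_csv` without a path returns the CSV text as a string, which makes the difference easy to see:

```python
import io

import pandas as pd

frame = pd.DataFrame({"Name": ["fred", "annie"], "Score1": [2, 34]})

with_index = frame.to_csv()                # first column holds the row index
without_index = frame.to_csv(index=False)  # only the data columns

print(with_index)
print(without_index)

# Reading the index-free text back gives the same columns as the original.
round_trip = pd.read_csv(io.StringIO(without_index))
```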

**The function overwrites the file.**

In [None]:
!cat ./data/myfile5.csv