# Reading and Writing CSV Files

## Video 1: What are CSV Files
CSV (comma-separated values) files:
+ A common input/output file type for programs  
+ Plain text, with no non-printable characters  
+ Easy to work with programmatically
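Because a CSV file is plain text, the `csv` module can parse it straight from a string. The minimal sketch below uses `io.StringIO` as a stand-in for an open file. The sample rows are assumed to match the `employee_birthday.csv` data used in the next section.

```python
import csv
import io

# A CSV file is just lines of text with fields separated by commas
raw_text = "name,department,birthday month\nJohn Smith,Accounting,November\nErica Meyers,IT,March\n"

# io.StringIO makes the string behave like an open text file
rows = list(csv.reader(io.StringIO(raw_text)))

for row in rows:
    print(row)
# ['name', 'department', 'birthday month']
# ['John Smith', 'Accounting', 'November']
# ['Erica Meyers', 'IT', 'March']
```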

## Video 2: Reading CSVs with Python's `csv` Module

```python
import csv

# First, open the file (the csv docs recommend newline='')
with open('employee_birthday.csv', newline='') as csv_file:

    # Next, create the reader
    csv_reader = csv.reader(csv_file, delimiter=",")
    line_count = 0
    for row in csv_reader:
        # The first line should contain the column names
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            # The remaining lines contain the data
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1

    # Summary of what was read in (the header line is counted too)
    print(f'Processed {line_count} lines')
```

```
Column names are name, department, birthday month
	John Smith works in the Accounting department, and was born in November.
	Erica Meyers works in the IT department, and was born in March.
Processed 3 lines
```
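Instead of checking `line_count == 0` on every row, you can pull the header off with `next()` before the loop and unpack each data row directly. A sketch, assuming the same three-column layout as `employee_birthday.csv`, inlined with `io.StringIO` so it runs without the file:

```python
import csv
import io

# Hypothetical in-memory stand-in for employee_birthday.csv
csv_file = io.StringIO(
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

csv_reader = csv.reader(csv_file, delimiter=",")
header = next(csv_reader)  # consume the header row
print(f'Column names are {", ".join(header)}')

data_rows = 0
for name, department, month in csv_reader:  # unpack each three-field row
    print(f'\t{name} works in the {department} department, and was born in {month}.')
    data_rows += 1

print(f'Processed {data_rows} data rows (plus 1 header line)')
```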


The `csv` module also provides `csv.DictReader`, which returns each row as a dictionary so you can refer to fields by column name. By default, `DictReader` treats the first row as the header. This is easier to read and can be helpful if the file has many columns or if you only need some of them.

```python
# First, open the file
with open('employee_birthday.csv', newline='') as csv_file:

    # DictReader uses the header row as the keys of each row dict
    csv_reader = csv.DictReader(csv_file, delimiter=",")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Joining a dict joins its keys, i.e. the column names
            print(f'Column names are {", ".join(row)}')
            line_count += 1  # count the header line
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
        line_count += 1
    print(f'Processed {line_count} lines')
```

```
Column names are name, department, birthday month
	John Smith works in the Accounting department, and was born in November.
	Erica Meyers works in the IT department, and was born in March.
Processed 3 lines
```
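`DictReader` also exposes the header it consumed as `.fieldnames`, and working with dicts makes it easy to keep only some columns. A minimal sketch, again with the employee data inlined via `io.StringIO` (an assumption standing in for the real file); the `wanted` list is purely illustrative:

```python
import csv
import io

csv_file = io.StringIO(
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

csv_reader = csv.DictReader(csv_file)
# fieldnames holds the header row that DictReader read
print(csv_reader.fieldnames)

# Keep only the columns we care about
wanted = ['name', 'birthday month']
filtered = [{key: row[key] for key in wanted} for row in csv_reader]
print(filtered)
```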


## Video 3: Advanced CSV Reader Parameters
What about non-standard files?

What if a data field contains a comma, as addresses often do? There are a few ways around this:  
1) Use a different delimiter

```python
with open('different_delim.csv', newline='') as csv_file:

    # Fields are separated by "|", so commas inside the address are plain data
    csv_reader = csv.DictReader(csv_file, delimiter="|")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined on {row["date joined"]}.')
        line_count += 1
    print(f'Processed {line_count} lines')
```

```
Column names are name, address, date joined
	john smith lives at 1132 Anywhere Lane Hoboken NJ, 07030 and joined on Jan 4.
	erica meyers lives at 1234 Anywhere Lane Hoboken JN, 07030 and joined on March 2.
Processed 3 lines
```
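To see why the delimiter matters, compare the same record parsed with the default comma delimiter and with `|`. This sketch assumes `different_delim.csv` holds pipe-separated lines like `pipe_line` below (its exact contents are not shown in these notes):

```python
import csv
import io

line = "john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4\n"
# With the default comma delimiter, the address is split in two
comma_fields = next(csv.reader(io.StringIO(line)))
print(len(comma_fields), comma_fields)

# The same record, pipe-delimited: the comma is now ordinary data
pipe_line = "john smith|1132 Anywhere Lane Hoboken NJ, 07030|Jan 4\n"
pipe_fields = next(csv.reader(io.StringIO(pipe_line), delimiter="|"))
print(len(pipe_fields), pipe_fields)
```

Note that the comma split also keeps the leading space in `' 07030'`, since `skipinitialspace` defaults to `False`.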


2) Wrap the data in quotes and use the `quotechar` argument

```python
with open('quote_wrapping.csv', newline='') as csv_file:

    # A field wrapped in the quotechar is read as one value, even if it contains the delimiter
    csv_reader = csv.DictReader(csv_file, delimiter="|", quotechar='"')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined on {row["date joined"]}.')
        line_count += 1
    print(f'Processed {line_count} lines')
```

```
Column names are name, address, date joined
	john smith lives at 1132 Anywhere Lane Hoboken NJ, 07030 and joined on Jan 4.
	erica meyers lives at 1234 Anywhere Lane Hoboken JN, 07030 and joined on March 2.
Processed 3 lines
```
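Quoting also works with the ordinary comma delimiter. A minimal sketch, assuming a comma-delimited line where only the address is quoted (`'"'` is already the default `quotechar`, so passing it is just explicit):

```python
import csv
import io

# The comma inside the quoted field is not treated as a delimiter
line = 'john smith,"1132 Anywhere Lane Hoboken NJ, 07030",Jan 4\n'
fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))
print(fields)  # the surrounding quotes are stripped from the value
```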


3) Use an escape character with the `escapechar` parameter

```python
with open('escape_char.csv', newline='') as csv_file:

    # A "|" before a comma escapes it, so the comma stays inside the field
    csv_reader = csv.DictReader(csv_file, delimiter=",", escapechar="|")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined on {row["date joined"]}.')
        line_count += 1
    print(f'Processed {line_count} lines')
```

```
Column names are name, address, date joined
	john smith lives at 1132 Anywhere Lane Hoboken NJ, 07030 and joined on Jan 4.
	erica meyers lives at 1234 Anywhere Lane Hoboken JN, 07030 and joined on March 2.
Processed 3 lines
```
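Here is the escape character on a single line. The sketch assumes `escape_char.csv` marks literal commas as `|,`, as in `line` below; the reader drops the `|` and keeps the comma:

```python
import csv
import io

# The pipe escapes the comma that follows it, so it stays in the field
line = "john smith,1132 Anywhere Lane Hoboken NJ|, 07030,Jan 4\n"
fields = next(csv.reader(io.StringIO(line), delimiter=",", escapechar="|"))
print(fields)
```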


## Video 4: Writing CSVs with Python's `csv` module
You can write CSVs row by row from lists, or from dictionaries.

```python
# newline='' avoids an extra \r per row on Windows (which shows up as blank lines)
with open('employee_file.csv', mode='w', newline='') as employee_file:
    # Create a writer
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Write individual rows
    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

# This creates employee_file.csv in the current working directory
```

`QUOTE_MINIMAL` only quotes fields that contain special characters, such as the delimiter, the `quotechar`, or a line break. Some other options include:  
+ `QUOTE_ALL`: quotes every field
+ `QUOTE_NONNUMERIC`: quotes all non-numeric fields
+ `QUOTE_NONE`: never adds quotes. If a field contains the delimiter, you must set an `escapechar`; otherwise the writer raises `csv.Error`
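The quoting modes are easiest to compare side by side. This sketch writes one row with each mode to an in-memory buffer; the row values and the `write_with` helper are hypothetical, chosen so that one field contains the delimiter and one is a number:

```python
import csv
import io

row = ['John Smith', 'Accounting, Finance', 7]

def write_with(quoting, **kwargs):
    """Write `row` with the given quoting mode and return the CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, **kwargs).writerow(row)
    return buffer.getvalue()

minimal = write_with(csv.QUOTE_MINIMAL)
all_quoted = write_with(csv.QUOTE_ALL)
nonnumeric = write_with(csv.QUOTE_NONNUMERIC)
no_quotes = write_with(csv.QUOTE_NONE, escapechar='|')

for text in (minimal, all_quoted, nonnumeric, no_quotes):
    print(repr(text))

# QUOTE_NONE without an escapechar cannot write the embedded comma
try:
    write_with(csv.QUOTE_NONE)
except csv.Error as err:
    print('csv.Error:', err)
```

The writer ends each row with `\r\n`, the module's default `lineterminator`.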

```python
with open('employee_file_QUOTE_NONE.csv', mode='w', newline='') as employee_file:
    # With QUOTE_NONE, a delimiter inside the data must be escaped, so escapechar is required
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONE, escapechar='|')

    # The comma in each name is written out as "|,"
    employee_writer.writerow(['John, Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Eric, Meyers', 'IT', 'March'])
```
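A file written with `QUOTE_NONE` and an `escapechar` can be read back by passing the same settings to the reader. A round-trip sketch using an in-memory buffer instead of `employee_file_QUOTE_NONE.csv`:

```python
import csv
import io

rows = [['John, Smith', 'Accounting', 'November'], ['Eric, Meyers', 'IT', 'March']]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar='|')
writer.writerows(rows)
written = buffer.getvalue()
print(written)

# Reading back with the same escapechar restores the original values
buffer.seek(0)
read_back = list(csv.reader(buffer, quoting=csv.QUOTE_NONE, escapechar='|'))
print(read_back)
```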

You can also write a CSV from dictionaries with `csv.DictWriter`:


```python
with open('employee_file_dict.csv', mode='w', newline='') as employee_file:
    # DictWriter needs the field names; they also set the column order
    fieldnames = ['name', 'dept', 'birth_month']
    employee_writer = csv.DictWriter(employee_file, fieldnames=fieldnames)

    # Write the header row first
    employee_writer.writeheader()
    # Then write each row as a dict keyed by field name
    employee_writer.writerow({'name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November'})
    employee_writer.writerow({'name': 'Erica Meyers', 'dept': 'IT', 'birth_month': 'March'})

# This creates employee_file_dict.csv in the current working directory
```
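`DictWriter` also decides what happens when a dict does not match `fieldnames`: `restval` fills in missing keys, and `extrasaction='ignore'` drops unknown keys instead of raising `ValueError`. A sketch writing to an in-memory buffer; the `'unknown'` filler and the extra `office` key are hypothetical:

```python
import csv
import io

fieldnames = ['name', 'dept', 'birth_month']
rows = [
    {'name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November', 'office': '2B'},  # extra key
    {'name': 'Erica Meyers', 'dept': 'IT'},  # birth_month missing
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    restval='unknown',      # value written for keys missing from a row
    extrasaction='ignore',  # silently skip keys that are not in fieldnames
)
writer.writeheader()
writer.writerows(rows)
written = buffer.getvalue()
print(written)
```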

## Reading CSVs With Pandas
The name pandas is derived from "panel data".

To load a CSV, use the `pd.read_csv()` function. By default it treats the first row as the column headers.

```python
import pandas as pd

df = pd.read_csv('hrdata.csv')
print(df)

# Without extra options, the hire dates are read in as plain strings
print(type(df['Hire Date'].iloc[0]))
```

```
             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8
<class 'str'>
```

To use the Name column as the index, pass `index_col='Name'`. Note also that the hire date was read in as a string; the `parse_dates` parameter converts it to a datetime:

```python
df = pd.read_csv('hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
print(df)
```

```
                Hire Date   Salary  Sick Days remaining
Name                                                   
Graham Chapman 2014-03-15  50000.0                   10
John Cleese    2015-06-01  65000.0                    8
Eric Idle      2014-05-12  45000.0                   10
Terry Jones    2013-11-01  70000.0                    3
Terry Gilliam  2014-08-12  48000.0                    7
Michael Palin  2013-05-23  66000.0                    8
```
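To check the effect of `parse_dates` on the column type, the sketch below runs both calls on two rows inlined with `io.StringIO`. It assumes `hrdata.csv` uses the same header and `MM/DD/YY` date layout as shown in the output above:

```python
import io
import pandas as pd

# Two rows in the same layout as hrdata.csv (inlined so the sketch runs on its own)
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)

as_strings = pd.read_csv(io.StringIO(csv_text))
as_dates = pd.read_csv(io.StringIO(csv_text), index_col='Name', parse_dates=['Hire Date'])

print(as_strings.dtypes)
print(as_dates.dtypes)
print(as_dates.loc['John Cleese', 'Hire Date'])
```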


We can customize further. The example below replaces the column names from the file with our own instead of using the defaults:

```python
df = pd.read_csv('hrdata.csv',
                 index_col='Employee',  # notice this references the new column names
                 parse_dates=['Hired'],
                 header=0,  # row 0 is the header; together with names=, it gets replaced
                 names=['Employee', 'Hired', 'Salary', 'Sick Days']  # the new column names
                )

print(df)
```

```
                    Hired   Salary  Sick Days
Employee                                     
Graham Chapman 2014-03-15  50000.0         10
John Cleese    2015-06-01  65000.0          8
Eric Idle      2014-05-12  45000.0         10
Terry Jones    2013-11-01  70000.0          3
Terry Gilliam  2014-08-12  48000.0          7
Michael Palin  2013-05-23  66000.0          8
```
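The `header=0` argument matters here: when `names=` is passed without it, pandas does not treat the file's first line as a header, so the old header becomes a data row. A sketch on a small inlined two-column file (hypothetical data, same names as above):

```python
import io
import pandas as pd

csv_text = "Name,Salary\nGraham Chapman,50000\nJohn Cleese,65000\n"

# names= alone: the file's own header line is read as a data row
without_header = pd.read_csv(io.StringIO(csv_text), names=['Employee', 'Pay'])

# names= with header=0: row 0 is treated as the header and replaced
with_header = pd.read_csv(io.StringIO(csv_text), names=['Employee', 'Pay'], header=0)

print(without_header)
print(with_header)
```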


## Writing CSVs with Pandas
Let's modify the `df` DataFrame and write out another CSV. To add a row, put the new index label (here, an employee name) inside `df.loc[...]` and assign a list with one value per remaining column.

Writing a CSV is as simple as calling the `.to_csv()` method with a new file name.

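```python
# Add a row labelled 'Cookie Cat'; the list gives Hired, Salary and Sick Days in column order.
# '2016-07-04' is a plain string, so the Hired column becomes mixed (object dtype),
# which is why the existing dates below now print with 00:00:00.
df.loc['Cookie Cat'] = ['2016-07-04', 20000.00, 0]
print(df)

# Write the modified DataFrame, including its index, to a new file
df.to_csv('hrdata_modified.csv')
```

```
                              Hired   Salary  Sick Days
Employee                                               
Graham Chapman  2014-03-15 00:00:00  50000.0         10
John Cleese     2015-06-01 00:00:00  65000.0          8
Eric Idle       2014-05-12 00:00:00  45000.0         10
Terry Jones     2013-11-01 00:00:00  70000.0          3
Terry Gilliam   2014-08-12 00:00:00  48000.0          7
Michael Palin   2013-05-23 00:00:00  66000.0          8
Cookie Cat               2016-07-04  20000.0          0
```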
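To keep the `Hired` column as datetimes, the new value can be given as a `pd.Timestamp` instead of a string. The sketch below does this on two inlined rows (an assumed stand-in for `hrdata.csv`). It calls `to_csv()` without a path, which returns the CSV text instead of writing a file, and uses `date_format` to control how the dates are written:

```python
import io
import pandas as pd

csv_text = (
    "Employee,Hired,Salary,Sick Days\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)
df = pd.read_csv(io.StringIO(csv_text), index_col='Employee', parse_dates=['Hired'])

# A Timestamp (not a plain string) keeps the Hired column as datetime64
df.loc['Cookie Cat'] = [pd.Timestamp('2016-07-04'), 20000.00, 0]
print(df.dtypes)

# With no path, to_csv returns the CSV text as a string
csv_out = df.to_csv(date_format='%Y-%m-%d')
print(csv_out)
```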
