# Reading and Writing CSV files

In [22]:
cwd = !cd
print(f'Current working directory is {cwd}\n')

Current working directory is ['C:\\Users\\cfornshell\\Real Python\\Data Science With Python Core Skills']



## Video 1: What are CSV files
We'll focus on two ways to read and write CSV files:

- Python's built-in csv module
- pandas
    
## Video 2: Reading CSVs with Python's "csv" Module

First, import the csv module and open the CSV file.

In [23]:
import csv

with open('Data/employee_birthday.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}')
            line_count += 1
            
    print(f'Processed {line_count} lines.')
                  

Column names are name, department, birthday month
	John Smith works in the Accounting department, and was born in November
	Erica Meyers works in the IT department, and was born in March
Processed 3 lines.
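
The counter in the loop above exists only to single out the header row. As an alternative sketch, next() can pull the header off the reader before the loop starts. The sample string below is a made-up stand-in for Data/employee_birthday.csv so that the snippet runs without the file.

```python
import csv
import io

# Hypothetical in-memory stand-in for Data/employee_birthday.csv
sample = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

csv_reader = csv.reader(io.StringIO(sample), delimiter=",")
header = next(csv_reader)     # the first row holds the column names
data_rows = list(csv_reader)  # everything that is left is data

print(f'Column names are {", ".join(header)}')
for row in data_rows:
    print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}')
```

With the header handled up front, the loop body no longer needs an if/else branch.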


You can also use a DictReader to reference the columns by their header names.

In [28]:
with open('Data/employee_birthday.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=",")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}')
        line_count += 1
            
    print(f'Processed {line_count} lines.')

Column names are name, department, birthday month
	John Smith works in the Accounting department, and was born in November
	Erica Meyers works in the IT department, and was born in March
Processed 3 lines.
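
To see what a DictReader yields without depending on the course data file, here is a minimal sketch that reads the same three lines from an in-memory string. The sample text and the rows variable are made up for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for Data/employee_birthday.csv
sample = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

csv_reader = csv.DictReader(io.StringIO(sample))
rows = list(csv_reader)  # the header line is consumed, not returned as data

print(csv_reader.fieldnames)  # column names taken from the first line
print(rows[0]["department"])  # look up a value by its header name
print(len(rows))              # number of data rows
```

Because DictReader consumes the header line itself, the loop above only ever sees two data rows; the count of 3 comes from the extra increment on the first pass.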


## Video 3: Advanced CSV Reader Parameters
What if you want to read in data that contains commas, such as addresses?

The first option is to change the delimiter. A character that does not appear in the data, such as the pipe (|), avoids the conflict with commas inside fields.

In [31]:
with open('Data/different_delim.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter="|")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined the company on {row["date joined"]}')
        line_count += 1
            
    print(f'Processed {line_count} lines.')

Column names are name, address, date joined
	john smith lives at 1132 Anywhere Lane Hoboken NJ, 07030 and joined the company on Jan 4
	erica meyers lives at 1234 Anywhere Lane Hoboken JN, 07030 and joined the company on March 2
Processed 3 lines.


You can also wrap the comma-containing text in quotes within the CSV file. The reader then treats commas inside the quotes as part of the field rather than as delimiters.

In [32]:
with open('Data/quote_wrapping.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=",", quotechar='"')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined the company on {row["date joined"]}')
        line_count += 1
            
    print(f'Processed {line_count} lines.')

Column names are name, address, date joined
	john smith lives at 1132 Anywhere Lane Hoboken NJ, 07030 and joined the company on Jan 4
	erica meyers lives at 1234 Anywhere Lane Hoboken JN, 07030 and joined the company on March 2
Processed 3 lines.
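
The contents of Data/quote_wrapping.csv are not shown, so the sketch below uses a made-up one-row sample to illustrate how a quoted field keeps its comma. The sample text and the quoted_rows name are assumptions for illustration.

```python
import csv
import io

# Hypothetical stand-in for Data/quote_wrapping.csv: the address is wrapped
# in double quotes because it contains a comma.
sample = (
    'name,address,date joined\n'
    'john smith,"1132 Anywhere Lane Hoboken NJ, 07030",Jan 4\n'
)

quoted_rows = list(csv.DictReader(io.StringIO(sample), delimiter=",", quotechar='"'))

print(quoted_rows[0]["address"])      # the comma stays inside the field
print(quoted_rows[0]["date joined"])
```

The double quote is already the default quotechar, so passing it here only makes the choice explicit.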


Another option is to use an escape character. An escape character tells the csv reader to treat the character that comes directly after it as literal text rather than as a delimiter. In the example below, the pipe is the escape character, so csv_reader treats a comma that directly follows a pipe as part of the field.

In [35]:
with open('Data/escape_char.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=",", escapechar='|')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} lives at {row["address"]} and joined the company on {row["date joined"]}')
        line_count += 1
            
    print(f'Processed {line_count} lines.')

Column names are name, address, date joined
	john smith,1132 Anywhere Lane Hoboken NJ, 07030 lives at Jan 4 and joined the company on None
	erica meyers,1234 Anywhere Lane Hoboken JN, 07030 lives at March 2 and joined the company on None
Processed 3 lines.
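
The recorded output above has the name and address merged into one column and None for the join date. That is consistent with a pipe also sitting in front of a field-separating comma in the data file, although this is an inference since the file itself is not shown. As a minimal sketch of the intended behavior, the made-up sample below escapes only the comma inside the address.

```python
import csv
import io

# Hypothetical file contents: the pipe escapes only the comma inside the address.
sample = (
    "name,address,date joined\n"
    "john smith,1132 Anywhere Lane Hoboken NJ|, 07030,Jan 4\n"
)

escaped_rows = list(csv.DictReader(io.StringIO(sample), delimiter=",", escapechar="|"))

print(escaped_rows[0]["name"])
print(escaped_rows[0]["address"])      # escaped comma kept, pipe removed
print(escaped_rows[0]["date joined"])
```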


## Video 4: Writing CSVs with Python's csv Module

In [44]:
# A forward slash avoids the invalid '\e' escape sequence in the path string
with open('OutData/employee_file.csv', mode='w', newline='') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

If you want quotes around every field, change QUOTE_MINIMAL to QUOTE_ALL.

In [46]:
with open('OutData/employee_file_Quotes.csv', mode='w', newline='') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)

    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])
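
To compare the two quoting modes side by side without creating files, the sketch below writes the same row to an in-memory buffer under each setting. The render helper is a hypothetical convenience function, not part of the csv module.

```python
import csv
import io

def render(rows, quoting):
    """Write rows to an in-memory buffer and return the resulting CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=quoting)
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["Smith, John", "Accounting", "November"]]

minimal_text = render(rows, csv.QUOTE_MINIMAL)  # quotes only fields that need it
all_text = render(rows, csv.QUOTE_ALL)          # quotes every field

print(minimal_text)
print(all_text)
```

With QUOTE_MINIMAL only the field containing a comma is quoted, while QUOTE_ALL wraps all three fields.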

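You can also have the writer escape special characters instead of quoting them. With QUOTE_NONE, the writer puts the escape character in front of any delimiter that appears inside a field.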

In [47]:
with open('OutData/employee_file_escapes.csv', mode='w', newline='') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONE, escapechar="|")

    employee_writer.writerow(['Smith, John', 'Accounting', 'November'])
    employee_writer.writerow(['Meyers, Erica', 'IT', 'March'])
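
As a minimal sketch of what this produces, the snippet below writes one row to an in-memory buffer and then reads it back with the same escape character. The buffer and variable names are made up for illustration.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_NONE, escapechar="|")
writer.writerow(["Smith, John", "Accounting", "November"])

escaped_text = buffer.getvalue()
print(escaped_text)  # the comma inside the name is preceded by a pipe

# Reading the text back with the same escapechar restores the original field.
restored = next(csv.reader(io.StringIO(escaped_text), delimiter=",", escapechar="|"))
print(restored)
```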

You can also write a CSV from dictionaries. To do this, pass a list of field names to DictWriter and use those names as the keys of each row.

In [48]:
with open('OutData/employee_file_dict.csv', mode='w', newline='') as employee_file:
    fieldnames = ['name', 'department', 'birth month']
    employee_writer = csv.DictWriter(employee_file, fieldnames=fieldnames)

    employee_writer.writeheader()  # writes the field names as the header row
    employee_writer.writerow({'name': 'John Smith', 'department': 'Accounting', 'birth month': 'November'})
    employee_writer.writerow({'name': 'Erica Meyers', 'department': 'IT', 'birth month': 'March'})
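
When the rows already live in a list of dictionaries, writerows() can write them in one call. The sketch below uses an in-memory buffer instead of a file, and the employees list is made up for illustration.

```python
import csv
import io

employees = [
    {"name": "John Smith", "department": "Accounting", "birth month": "November"},
    {"name": "Erica Meyers", "department": "IT", "birth month": "March"},
]

buffer = io.StringIO(newline="")
dict_writer = csv.DictWriter(buffer, fieldnames=["name", "department", "birth month"])
dict_writer.writeheader()
dict_writer.writerows(employees)  # one output line per dictionary

dict_text = buffer.getvalue()
print(dict_text)
```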

## Video 5: Reading CSVs with pandas


In [1]:
import pandas as pd

Reading a CSV is as simple as calling the pd.read_csv() function.  
It automatically reads the first row as headers.

In [3]:
df = pd.read_csv('Data/hrdata.csv')
df

Unnamed: 0,Name,Hire Date,Salary,Sick Days remaining
0,Graham Chapman,03/15/14,50000.0,10
1,John Cleese,06/01/15,65000.0,8
2,Eric Idle,05/12/14,45000.0,10
3,Terry Jones,11/01/13,70000.0,3
4,Terry Gilliam,08/12/14,48000.0,7
5,Michael Palin,05/23/13,66000.0,8


There are a couple of changes we can make to improve the structure of the data frame:  
    1) Use the Name field as the index instead of the default 0-based index  
    2) Parse Hire Date as a date rather than a string
    
Both changes can be made with a few parameters of pd.read_csv().

In [6]:
df = pd.read_csv('Data/hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
df


Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,2014-03-15,50000.0,10
John Cleese,2015-06-01,65000.0,8
Eric Idle,2014-05-12,45000.0,10
Terry Jones,2013-11-01,70000.0,3
Terry Gilliam,2014-08-12,48000.0,7
Michael Palin,2013-05-23,66000.0,8


Another helpful feature is custom column names, which you provide through the names parameter.

In [8]:
df = pd.read_csv('Data/hrdata.csv',
                 index_col='Employee',
                 parse_dates=['Hired'],
                 header=0,  # row 0 is the file's own header; it is replaced by the names below
                 names=['Employee', 'Hired', 'Salary', 'Sick Days'])

df


Employee,Hired,Salary,Sick Days
Graham Chapman,2014-03-15,50000.0,10
John Cleese,2015-06-01,65000.0,8
Eric Idle,2014-05-12,45000.0,10
Terry Jones,2013-11-01,70000.0,3
Terry Gilliam,2014-08-12,48000.0,7
Michael Palin,2013-05-23,66000.0,8
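
To show why header=0 matters when names is given, the sketch below reads a made-up two-row sample twice, once with header=0 and once without. The sample text and variable names are assumptions for illustration, and the snippet assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical stand-in for Data/hrdata.csv with two rows.
sample = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,2014-03-15,50000.0,10\n"
    "John Cleese,2015-06-01,65000.0,8\n"
)

new_names = ["Employee", "Hired", "Salary", "Sick Days"]

# header=0: the first line is treated as a header and replaced by new_names.
replaced = pd.read_csv(io.StringIO(sample), header=0, names=new_names)

# No header argument: with names given, pandas assumes the file has no header,
# so the original header line is read as a data row.
kept = pd.read_csv(io.StringIO(sample), names=new_names)

print(len(replaced))
print(len(kept))
print(kept.iloc[0]["Employee"])
```

Without header=0, the old header line ends up as the first data row, which also turns every column into strings.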


## Video 6: Writing CSVs with pandas
To write a pandas DataFrame to a CSV file, call its to_csv() method and specify a save location.

In [10]:
df.loc['Caleb Fornshell'] = ['2021-03-15', 123000, 8] # adding a row to the dataframe
df

Employee,Hired,Salary,Sick Days
Graham Chapman,2014-03-15 00:00:00,50000.0,10
John Cleese,2015-06-01 00:00:00,65000.0,8
Eric Idle,2014-05-12 00:00:00,45000.0,10
Terry Jones,2013-11-01 00:00:00,70000.0,3
Terry Gilliam,2014-08-12 00:00:00,48000.0,7
Michael Palin,2013-05-23 00:00:00,66000.0,8
Caleb Fornshell,2021-03-15,123000.0,8


In [12]:
df.to_csv('OutData/hrdata_modified.csv')
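
As a quick check that the written file can be read back, here is a minimal sketch of a round trip. It builds a small made-up frame instead of using hrdata.csv, and it calls to_csv() without a path so that the CSV text is returned as a string rather than saved. It assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical two-row frame shaped like the one above.
frame = pd.DataFrame(
    {
        "Hired": pd.to_datetime(["2014-03-15", "2015-06-01"]),
        "Salary": [50000.0, 65000.0],
        "Sick Days": [10, 8],
    },
    index=pd.Index(["Graham Chapman", "John Cleese"], name="Employee"),
)

csv_text = frame.to_csv()  # no path given, so the CSV text is returned

# Read the text back, restoring the index and the date column.
round_trip = pd.read_csv(io.StringIO(csv_text), index_col="Employee", parse_dates=["Hired"])

print(csv_text)
print(round_trip.dtypes)
```

The index is written as the first column by default, which is why index_col can restore it when reading the text back.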