## Overview

The Python standard library contains many useful functions, including functions to read and write files. In this lesson we will cover:
* how to open a file and read data from it
* how to write text to a file, and how to append text to an existing one

### Basic Directory Commands

In Jupyter notebooks, you can run a few Unix-style commands, such as `pwd` and `ls`, directly in a code cell.
`pwd` will show the present working directory.

In [3]:
pwd

'C:\\Users\\ng_sm\\Documents\\CodeForasia\\Week 2 Notebooks'

You can use `ls` to **list** the contents of the present working directory.

In [2]:
ls


 Volume in drive C is Windows
 Volume Serial Number is A097-F9F5

 Directory of C:\Users\ng_sm\Documents\CodeForasia\Week 2 Notebooks

08/01/2022  16:26    <DIR>          .
07/01/2022  15:23    <DIR>          ..
08/01/2022  14:21    <DIR>          .ipynb_checkpoints
12/12/2021  22:38                98 13002.txt
08/01/2022  11:20            17,338 Week 2 Notebook 1 Collections.ipynb
08/01/2022  16:26             6,923 Week 2 Notebook 2 Files.ipynb
               3 File(s)         24,359 bytes
               3 Dir(s)  339,150,520,320 bytes free
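Outside a notebook, for example in a plain `.py` script, these commands are not available, but the standard library gives you the same information. Here is a minimal sketch using `os.getcwd()` and `os.listdir()`. The check for `13002.txt` simply looks for the lesson's data file in the current directory, so its result depends on where you run the code.

```python
import os

# Equivalent of `pwd`: the current working directory as a string
cwd = os.getcwd()
print(cwd)

# Equivalent of `ls`: the names of the entries in that directory, sorted for readability
entries = sorted(os.listdir(cwd))
print(entries)

# Is the lesson's data file in this directory?
has_invoice = os.path.exists(os.path.join(cwd, '13002.txt'))
print('13002.txt found:', has_invoice)
```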


In the same directory as this notebook file, you should have downloaded and saved a text file `13002.txt`, and it should appear in the listing of files above.

If it is not there, save it into this directory so that you can open it by name, without typing its full path.

You can view the file in Jupyter with the File->Open command in the menu above. You will find that it contains information about an invoice, written in plain text. Let's try to read in the data using Python.

## Reading Files


To access a file, you can use Python's built-in function `open()`.

By default, `open()` opens a file for reading in text mode (mode `'r'`).

In [13]:
# open a file for reading, with the default mode
myfile = open('13002.txt')


In [9]:
print(myfile)

<_io.TextIOWrapper name='13002.txt' mode='r' encoding='cp1252'>


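You can see that when we print `myfile`, it shows the file name, the mode (`'r'`), and the text encoding used to decode the file. Here the encoding is `cp1252`, the platform default on this Windows machine; on other systems it is often UTF-8.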
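The cells in this lesson call `close()` explicitly. A common alternative is the `with` statement, which closes the file automatically when the block ends, even if an error occurs. The sketch below also passes `encoding='utf-8'` so the result does not depend on the platform default. So that it runs anywhere, it writes the invoice text from this lesson into a temporary directory; `INVOICE_TEXT` and `path` are illustrative names, not part of the lesson.

```python
import os
import tempfile

# Sample text copied from 13002.txt, written to a temporary folder so the sketch runs anywhere
INVOICE_TEXT = ('Invoice No: 13002\n'
                'Customer Name: Lighthouse Entertainment\n'
                'Date: 13 Jan 2020\n'
                'Invoice Amount: $45.60')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '13002.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(INVOICE_TEXT)

    # Name the encoding instead of relying on the platform default
    with open(path, encoding='utf-8') as myfile:
        mode, encoding = myfile.mode, myfile.encoding
        first_line = myfile.readline()

    # Leaving the with block closed the file, although close() was never called
    closed = myfile.closed

print(mode, encoding, closed)  # r utf-8 True
```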

To read one line from the file, call the `readline()` method on `myfile`.

In [154]:
# Execute this cell again to read another line
line = myfile.readline()
print(line)

Invoice No: 13002

As you can see, `readline()` has read one line of the data. If you run the cell above again, it reads the next line. Once there are no more lines to read, `readline()` returns an empty string, so `print()` shows only a blank line.
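To read every line with `readline()`, you can keep calling it until it returns an empty string. A blank line in the middle of a file comes back as `'\n'`, not `''`, so the loop does not stop early. This minimal sketch writes the invoice text from this lesson to a temporary file so that it runs on its own; `INVOICE_TEXT` and `lines_read` are illustrative names.

```python
import os
import tempfile

# Sample text copied from 13002.txt, written to a temporary folder so the sketch runs anywhere
INVOICE_TEXT = ('Invoice No: 13002\n'
                'Customer Name: Lighthouse Entertainment\n'
                'Date: 13 Jan 2020\n'
                'Invoice Amount: $45.60')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '13002.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(INVOICE_TEXT)

    lines_read = []
    with open(path, encoding='utf-8') as myfile:
        while True:
            line = myfile.readline()
            if line == '':      # an empty string means the end of the file
                break
            lines_read.append(line)
        # Once the end is reached, further calls keep returning ''
        after_end = myfile.readline()

print(len(lines_read))  # 4
```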

You can also read a specific number of characters by passing a number to the `read()` method. (In text mode, as here, the number counts characters; in binary mode it counts bytes.)

In [41]:
# close the file before reopening 
myfile.close()

# open the file again
myfile = open('13002.txt', 'r')

# read 4 characters. 
myfile.read(4)

'Invo'

Try changing the argument to read more characters. If the argument is negative (for example, -1) or left out, `read()` reads everything from the current position to the end of the file.
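Each call to `read()` continues from where the previous one stopped. A minimal sketch, using a temporary copy of the invoice text (an assumption so that the code runs anywhere):

```python
import os
import tempfile

# Sample text copied from 13002.txt, written to a temporary folder so the sketch runs anywhere
INVOICE_TEXT = ('Invoice No: 13002\n'
                'Customer Name: Lighthouse Entertainment\n'
                'Date: 13 Jan 2020\n'
                'Invoice Amount: $45.60')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '13002.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(INVOICE_TEXT)

    with open(path, encoding='utf-8') as myfile:
        first = myfile.read(4)    # 'Invo'
        second = myfile.read(4)   # 'ice ': continues where the previous read stopped
        rest = myfile.read()      # no argument: everything up to the end of the file
        empty = myfile.read()     # '' because we are already at the end

print(repr(first), repr(second))  # 'Invo' 'ice '
```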

In [44]:
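# close the file from the previous open
myfile.close()

# open the file again; read() with no argument reads the whole file
myfile = open('13002.txt', 'r')
mydata = myfile.read()

# then close the file
myfile.close()

# a notebook displays the value of the last line of a cell
mydata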

'Invoice No: 13002\nCustomer Name: Lighthouse Entertainment\nDate: 13 Jan 2020\nInvoice Amount: $45.60'

You can see that the whole file is read as a single string, with the newline character `\n` at the end of every line except the last.

You can use the `print()` function to print out the string.

In [56]:
# myfile.read() will read the whole file.
myfile = open('13002.txt', 'r')

# using the print() function will print the string that is read
print(myfile.read())

myfile.close()

Invoice No: 13002
Customer Name: Lighthouse Entertainment
Date: 13 Jan 2020
Invoice Amount: $45.60


You can also read *all* the lines with the `readlines()` method. It returns a list with one element per line.


In [59]:
# open the file for reading
myfile = open('13002.txt', 'r')

# readlines() reads all the (remaining) lines of the file
mydata = myfile.readlines()

# each line, including its trailing '\n', is stored as one element of the list
print(mydata)

myfile.close()

['Invoice No: 13002\n', 'Customer Name: Lighthouse Entertainment\n', 'Date: 13 Jan 2020\n', 'Invoice Amount: $45.60']


This time, the data is stored as a list, where the first element (index 0) is `'Invoice No: 13002\n'`.
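If you want the lines without their trailing `\n`, you can strip it from each element, or split the whole string with `str.splitlines()`. A minimal sketch, using a temporary copy of the invoice text so that it runs on its own (`cleaned` and `same` are illustrative names):

```python
import os
import tempfile

# Sample text copied from 13002.txt, written to a temporary folder so the sketch runs anywhere
INVOICE_TEXT = ('Invoice No: 13002\n'
                'Customer Name: Lighthouse Entertainment\n'
                'Date: 13 Jan 2020\n'
                'Invoice Amount: $45.60')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '13002.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(INVOICE_TEXT)

    with open(path, encoding='utf-8') as myfile:
        mydata = myfile.readlines()

    # Remove only the trailing newline from each element
    cleaned = [line.rstrip('\n') for line in mydata]

    # read().splitlines() gives the same list in one step
    with open(path, encoding='utf-8') as myfile:
        same = myfile.read().splitlines()

print(cleaned[0])  # Invoice No: 13002
```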

You can also process each line of the file using a `for` loop.

In [155]:
# open the file
myfile = open('13002.txt', 'r')

# process each line
for line in myfile:
    # separate the line at the colon
    parts = line.split(":")
    
    # print the split parts 
    print(parts)
    
    # print only the data
    print(parts[1])

myfile.close()
    

['Invoice No', ' 13002\n']
 13002

['Customer Name', ' Lighthouse Entertainment\n']
 Lighthouse Entertainment

['Date', ' 13 Jan 2020\n']
 13 Jan 2020

['Invoice Amount', ' $45.60']
 $45.60
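The blank lines in the output appear because `parts[1]` still ends with `\n`, and `print()` adds another newline. Here is a sketch of a tidier version, assuming every line contains a colon: it splits only at the first colon (`split(':', 1)`), strips the surrounding whitespace, and collects the fields in a dictionary. The dictionary `invoice` and the temporary file are illustrative, not part of the lesson.

```python
import os
import tempfile

# Sample text copied from 13002.txt, written to a temporary folder so the sketch runs anywhere
INVOICE_TEXT = ('Invoice No: 13002\n'
                'Customer Name: Lighthouse Entertainment\n'
                'Date: 13 Jan 2020\n'
                'Invoice Amount: $45.60')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '13002.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(INVOICE_TEXT)

    invoice = {}
    with open(path, encoding='utf-8') as myfile:
        for line in myfile:
            # split at the first colon only, in case a value itself contains ':'
            name, value = line.split(':', 1)
            # strip() removes the leading space and the trailing '\n'
            invoice[name.strip()] = value.strip()

print(invoice['Customer Name'])  # Lighthouse Entertainment
```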


## Writing to Files

You can also open files for writing, using `mode = 'w'`. If the file does not exist, it will be created. 

In [80]:
# open a file for writing
anotherfile = open('sample.txt', 'w')
anotherfile.write('Invoice No: 13003\n')
anotherfile.close()


Let's check the file contents by reading it.  

In [81]:
anotherfile = open('sample.txt','r')
print(anotherfile.readline())
anotherfile.close()

Invoice No: 13003
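Note that `write()` does not add a newline for you, and it returns the number of characters written. `print()` can also write to a file through its `file` argument, and it does add the newline. A minimal sketch, using a temporary folder instead of `sample.txt` in your working directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')

    with open(path, 'w', encoding='utf-8') as anotherfile:
        # write() returns the number of characters written
        count = anotherfile.write('Invoice No: 13003\n')
        # no '\n' here, so the next write continues on the same line
        anotherfile.write('Customer Name: ')
        anotherfile.write('Main Street News\n')
        # print() adds the newline itself
        print('Date: 13 Jan 2020', file=anotherfile)

    with open(path, encoding='utf-8') as anotherfile:
        contents = anotherfile.read()

print(count)  # 18
print(contents)
```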



However, if you open an existing file with mode `'w'` again, its contents are erased as soon as the file is opened, before anything new is written.

In [118]:
# open the file for writing, again
anotherfile = open('sample.txt', 'w')
anotherfile.write('testing')
anotherfile.close()

In [119]:
# check the file contents again
anotherfile = open('sample.txt','r')
print(anotherfile.readline())
anotherfile.close()

testing
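If you want to be sure you never erase an existing file by accident, Python also offers mode `'x'` (exclusive creation): it creates a new file, but raises `FileExistsError` if the file already exists. A minimal sketch in a temporary folder (`overwritten` is an illustrative name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')

    # 'x' creates the file, because it does not exist yet
    with open(path, 'x', encoding='utf-8') as f:
        f.write('Invoice No: 13003\n')

    # A second 'x' open fails instead of silently erasing the file
    try:
        with open(path, 'x', encoding='utf-8') as f:
            f.write('testing')
        overwritten = True
    except FileExistsError:
        overwritten = False

    with open(path, encoding='utf-8') as f:
        contents = f.read()

print(overwritten, repr(contents))  # False 'Invoice No: 13003\n'
```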


### Appending Data

If you do not want to overwrite an existing file, you can *append* lines to it. 

To append data, open the file with mode `'a'`. Anything you write is added to the end of the file, and the existing contents are kept.

Let's say we have an existing file, `invoices.csv`. Let's read it first to check its contents.


In [144]:
# open the file for reading
myfile = open('invoices.csv', 'r')
print(myfile.read())
myfile.close()

Invoice No, Customer Name, Date, Invoice Amount
13002, Lighthouse Entertainment, 13 Jan 2020, $45.60
13003, Main Street News, 13 Jan 2020, $100.20
13003, Lee Enterprise, 14 Jan 2020, $30.00
13004, Raju Store, 14 Jan 2020, $300.20



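Now we open the file for appending by setting the mode to `'a'`. The file pointer is placed at the end of the file, so the new row is written after the existing rows. The row ends with `\n` so that any row added later starts on a new line.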

In [140]:
# Open the file to append data
myfile = open('invoices.csv', 'a')
myfile.write('13005, Additional Row, 14 Jan 2020, $100.00\n')
myfile.close()
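Appending assumes the existing file already ends with a newline; if it does not, the new row is joined onto the end of the last line. One way to guard against that is to check the last character before appending. A minimal sketch, with a small stand-in for `invoices.csv` in a temporary folder (the sample file deliberately has no trailing newline, and `needs_newline` is an illustrative name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'invoices.csv')

    # Stand-in for invoices.csv; the last row deliberately has no '\n'
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Invoice No, Customer Name, Date, Invoice Amount\n')
        f.write('13004, Raju Store, 14 Jan 2020, $300.20')

    # Check whether the existing data ends with a newline
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()
    needs_newline = content != '' and not content.endswith('\n')

    # Mode 'a' adds to the end; start a new line first if needed
    with open(path, 'a', encoding='utf-8') as f:
        if needs_newline:
            f.write('\n')
        f.write('13005, Additional Row, 14 Jan 2020, $100.00\n')

    with open(path, 'r', encoding='utf-8') as f:
        rows = f.read().splitlines()

for row in rows:
    print(row)
```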

You can also open a file for both reading and writing by adding `+` to the mode:
* `mode = 'r+'` opens an existing file for reading and writing, with the file pointer at the beginning of the file, ready to read.
* `mode = 'a+'` opens the file for appending and reading, with the file pointer at the end of the file, ready to add more data.
* `mode = 'w+'` opens the file for writing and reading. As with `'w'`, any existing contents are erased as soon as the file is opened.
* `myfile.seek(0)` moves the file pointer back to the beginning of the file.
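A minimal sketch of `'a+'` and `'w+'` together with `seek(0)`, using a throwaway file in a temporary folder (`modes.txt` is an illustrative name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('line 1\n')

    # 'a+': new data goes to the end; seek(0) rewinds so we can read everything back
    with open(path, 'a+', encoding='utf-8') as f:
        f.write('line 2\n')
        f.seek(0)
        after_append = f.read()

    # 'w+': the old contents are gone as soon as the file is opened
    with open(path, 'w+', encoding='utf-8') as f:
        before_write = f.read()
        f.write('fresh start\n')
        f.seek(0)
        after_truncate = f.read()

print(repr(after_append))    # 'line 1\nline 2\n'
print(repr(before_write))    # ''
print(repr(after_truncate))  # 'fresh start\n'
```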

In [153]:
# open the file again, this time to read and write.
# mode = 'r+' places the file pointer at the beginning of the file,
# so you can read the existing contents first
myfile = open('invoices.csv', 'r+')

# read the file and print its contents
print('Before appending new data')
print(myfile.read())

# read() left the file pointer at the end of the file, so this write adds a new last row
myfile.write('13005, One more row, 15 Jan 2020, $250.30\n')

# go back to the beginning of the file
myfile.seek(0)

# read again
print('After appending new data')
print(myfile.read())

myfile.close()


Before appending new data
Invoice No, Customer Name, Date, Invoice Amount
13002, Lighthouse Entertainment, 13 Jan 2020, $45.60
13003, Main Street News, 13 Jan 2020, $100.20
13003, Lee Enterprise, 14 Jan 2020, $30.00
13004, Raju Store, 14 Jan 2020, $300.20
13005, Additional Row, 14 Jan 2020, $100.00

After appending new data
Invoice No, Customer Name, Date, Invoice Amount
13002, Lighthouse Entertainment, 13 Jan 2020, $45.60
13003, Main Street News, 13 Jan 2020, $100.20
13003, Lee Enterprise, 14 Jan 2020, $30.00
13004, Raju Store, 14 Jan 2020, $300.20
13005, Additional Row, 14 Jan 2020, $100.00
13005, One more row, 15 Jan 2020, $250.30
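One thing to watch with `'r+'`: the file pointer starts at the beginning, so if you write before reading, the new text overwrites the existing characters instead of being added at the end. A minimal sketch with a throwaway file in a temporary folder (`demo.txt` is an illustrative name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('13002\n13003\n')

    with open(path, 'r+', encoding='utf-8') as f:
        # No read() first: the pointer is at the start, so '99' replaces '13'
        f.write('99')
        f.seek(0)
        result = f.read()

print(repr(result))  # '99002\n13003\n'
```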



## Exercise

Write the code to open the `invoices.csv` file for reading, and then print only the `Invoice No` heading followed by the invoice number from each row.

Use the following comments to guide you.

In [152]:
# open the file for reading
myfile = open('invoices.csv', 'r')

# read all the lines
lines = myfile.readlines()

# header is the first element in the list of lines
header = lines[0]

# use a split() function to separate the header names by the comma
headerNames = header.split(',')

# print the first element of the header names after splitting
print(headerNames[0])

# use a for loop (with a slice starting at index 1) for the rest of the lines
for line in lines[1:]:
    # split at the comma
    data = line.split(',')     
    
    # print the first element
    print(data[0])                 
    
# close the file
myfile.close()

Invoice No
13002
13003
13003
13004
13005
13005


Being able to extract just the invoice numbers is useful, because it helps us keep track of the running invoice number.
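The standard library's `csv` module can do the comma splitting for you, and once the invoice numbers are integers, finding the next number in the sequence takes one line. A minimal sketch, assuming invoice numbers are whole numbers that increase by one; it uses a temporary copy of the first rows of `invoices.csv` shown in this lesson, and `next_number` is an illustrative name.

```python
import csv
import os
import tempfile

# The first rows of invoices.csv as printed earlier in this lesson
CSV_TEXT = ('Invoice No, Customer Name, Date, Invoice Amount\n'
            '13002, Lighthouse Entertainment, 13 Jan 2020, $45.60\n'
            '13003, Main Street News, 13 Jan 2020, $100.20\n'
            '13003, Lee Enterprise, 14 Jan 2020, $30.00\n'
            '13004, Raju Store, 14 Jan 2020, $300.20\n')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'invoices.csv')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(CSV_TEXT)

    # The csv documentation recommends newline='' for files read by the csv module
    with open(path, encoding='utf-8', newline='') as f:
        # skipinitialspace=True drops the space that follows each comma in this file
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)
        invoice_numbers = [int(row[0]) for row in reader if row]  # skip blank lines

# Assumes numbers run in sequence, so the next one is the largest plus one
next_number = max(invoice_numbers) + 1
print(header[0], invoice_numbers, next_number)
# Invoice No [13002, 13003, 13003, 13004] 13005
```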

In this lesson, we have learned how to read from and write to files. However, many data science libraries simplify these operations, and we will use the pandas library to read CSV files.

