<center>
  <a href="PP-07-InputsAndOutputs.ipynb" target="_self">Inputs and Outputs</a> | <a href="./">Content Page</a> | <a href="PP-09-Numpy.ipynb">The Numpy Library</a>
</center>

# <center>FILE HANDLING</center>
<center><b>Copyright &copy; 2023 by DR DANNY POO</b><br> e:dannypoo@nus.edu.sg<br> w:drdannypoo.com</center><br>

Up until now, our discussion has been on reading and writing to the standard input (the keyboard) and output (the screen). 

Here, we will extend our discussion on inputs and outputs to the use of data files; in particular, we will examine how to manipulate text and binary data files in Python. 

# 1. What is a File?
A file is a construct that resides on secondary storage. It stores data that persist beyond the execution of a program, i.e. the data are still available after the computer on which the Python program runs is turned off. In contrast, primary storage (main memory) is volatile: its contents are lost when the computer is turned off.

A file is a contiguous set of bytes that store data. The data can be organized as texts or in program executable forms. 

There are two types of data files that we will deal with in this chapter:
<ol>
<li><b>Text files</b>: Data are organized as human-readable strings with an end-of-line (EOL) character to indicate each line’s termination. A new line character (“\n”) is the default EOL terminator. Text files are generally identified by files with extension “.txt”, “.csv”, “.tsv”, “.json”, “.xml”, etc.</li>
<li><b>Binary files</b>: Data are stored as raw bytes that are not meant to be interpreted as characters. They are not human-readable and have no EOL characters. Reading a file in binary mode returns bytes rather than strings. Binary files are used for storing non-text data such as images (with file extension “.png”, “.bmp”, “.jpg”, “.tiff”, “.gif”, etc.) and executable or compiled programs with extension “.exe”, “.pyc”, “.class”, etc.</li>
</ol><br>
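
The difference between the two file types can be seen by reading the same file in text mode and in binary mode. The following is a minimal sketch; the file name `sample.txt` is hypothetical and the file is created in a temporary folder so that nothing else on disk is touched.

```python
import os
import tempfile

# Work in a temporary folder; "sample.txt" is a hypothetical file name.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # Write two lines in text mode (newline="\n" disables newline translation).
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("first line\nsecond line\n")

    # Text mode returns str objects.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode returns the raw bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

The same content is returned either as a string or as bytes, depending only on the mode used to open the file.
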
A text file is generally structured into three parts:
<ol>
<li><b>Header</b>: information on the contents of the file e.g. file name, size, type, etc.</li>
<li><b>Data</b>: the contents of the file.</li>
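<li><b>End of File (EOF)</b>: a marker indicating the end of the file; on most modern systems this is a condition reported by the operating system rather than a character stored in the file.</li>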
</ol><br>
All files stored in computer systems are located and accessed via a file path. The file path is a string that represents the location of the file. It has three parts:
<ol>
<li><b>Folder Path</b>: the directory location of a file in the file system. Folders in a directory are separated by slash (“/”) in Unix and backslash (“\”) in Windows. </li>
<li><b>File Name</b>: the actual name of the file.</li>
<li><b>Extension</b>: the end of the file path, preceded by a period (“.”), used to indicate the file type.</li>
</ol><br>
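
The three parts of a file path can be separated with functions from the standard `os.path` module. This is a small sketch; the path `data/reports/python.txt` is a hypothetical example and the file does not need to exist.

```python
import os

# Hypothetical path used only for illustration.
path = "data/reports/python.txt"

folder, file_name = os.path.split(path)         # folder path and file name
stem, extension = os.path.splitext(file_name)   # name without extension, and the extension

print(folder)      # data/reports
print(file_name)   # python.txt
print(stem)        # python
print(extension)   # .txt
```
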
All data in a file are stored as bytes. When a file is read as text, the bytes are decoded into human-readable characters; when text is written, the characters are encoded into bytes.<br> An encoding assigns a numerical value to each character. <br>Common character sets include ASCII (American Standard Code for Information Interchange) and Unicode (also known as the Universal Coded Character Set). <br>ASCII defines 128 characters while Unicode has room for 1,114,112 code points. <br>ASCII is a subset of Unicode, and the UTF-8 encoding stores ASCII characters with the same byte values; this means that text represented in ASCII has the same numerical values in UTF-8. <br>
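
The relationship between characters, numerical values and bytes can be checked directly in Python. The sketch below uses only built-in functions; the sample string is an arbitrary choice for illustration.

```python
# "A" is an ASCII character; "\u00e9" (e with an acute accent) is not.
text = "A\u00e9"

code_points = [ord(ch) for ch in text]   # numerical value of each character
encoded = text.encode("utf-8")           # characters -> bytes
decoded = encoded.decode("utf-8")        # bytes -> characters

print(code_points)     # [65, 233]
print(list(encoded))   # [65, 195, 169]
print(decoded == text) # True

# An ASCII character has the same byte value in ASCII and in UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))  # True
```

The non-ASCII character needs two bytes in UTF-8, whereas the ASCII character needs one.
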

# 2. Text File
File operations in Python are handled using a `file` object (known as a handle) and they generally take the following three steps:
<ol>
<li>Open a file</li>
<li>Read or write to the file</li>
<li>Close the file</li>
</ol>

## 2.1 Opening and Closing a Text File
Any operation on a file can only proceed after the file is opened. <br>Python’s built-in `open()` function is used to open a file.<br>This function creates a `file` object which is used to call other support methods associated with it. 

To close a file, we call the `close()` method on the `file` object. <br>This method flushes any unwritten information and closes the `file` object, after which no more reading or writing can be done on it. <br>In CPython, a file is also closed automatically when its `file` object is garbage-collected, for example after the variable that referred to it is reassigned to another file, but this behaviour should not be relied upon. <br>In any case, it is good practice to use the `close()` method to close a file.<br>


### The open() Function
The Python ``open()`` function has the following syntax:
```python
file_object = open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, 
                   closefd=True, opener=None)
```
where:<br>
<ul>
<li><b>file</b>: A string value of the name of the file to open.</li><br>
<li><b>mode</b>: Optional. <br>A string that specifies the mode in which the file is opened. <br>Defaults to ‘r’, which means open for reading in <b>text mode</b>. <br>See the table below for the list of possible values for mode.</li><br>
<li><b>buffering</b>: Optional. <br>
* If set to 0, no buffering takes place (allowed in binary mode only). <br>
* If set to 1, line buffering is performed (usable in text mode only). <br>
* If set to an integer greater than 1, a fixed-size buffer of that many bytes is used. <br>
* If negative (the default is -1), the buffer size is chosen by the system.</li>

<li><b>encoding</b>: Optional. <br>Encoding used to decode or encode the file; applies only to text mode. Default is None, in which case a platform-dependent default encoding is used.</li><br>
<li><b>errors</b>: Optional. <br>A string that specifies how encoding and decoding errors are to be handled – this cannot be used in binary mode.</li><br>
<li><b>newline</b>: Optional. <br>It controls how universal newlines mode works – applies only to text mode. Default is None.</li><br>
<li><b>closefd</b>: Optional. <br>
* If a file descriptor rather than a filename is given and if “closefd” is False, then the underlying file descriptor will be kept open when the file is closed. 
<br>* If a filename is given then “closefd” must be True (the default) otherwise an error will be raised.</li>
</ul><br>
Examples:<br>


```python
# Raw strings (r"...") keep the backslashes in Windows paths from being read as escape sequences
file = open(r"D:\python\python.txt")      # open file for reading in text mode (default)
file = open(r"D:\python\python.txt", 'w') # open python.txt file for writing in text mode
file = open(r"D:\python\doc.txt", 'w+')   # open new file for reading and writing in text mode
file = open(r"D:\python\pix.gif", 'rb+')  # open file for reading and writing in binary mode
file = open(r"D:\python\python.txt", mode='r', encoding='utf-8') # open with 'utf-8' encoding
```
<br>

![image.png](attachment:image.png)
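
The effect of the `encoding` and `errors` parameters can be tried out with a small experiment. This is a sketch under stated assumptions: the file name `greeting.txt` is hypothetical, and the file is written to a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "greeting.txt")   # hypothetical file name

    # Write a word containing a non-ASCII character using UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")

    # Decoding the same bytes as ASCII fails under the default error handling ('strict').
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors='replace' substitutes the replacement character U+FFFD for each undecodable byte.
    with open(path, "r", encoding="ascii", errors="replace") as f:
        replaced = f.read()

print(strict_failed)    # True
print(repr(replaced))
```

Opening a file with the same encoding that was used to write it avoids both outcomes.
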



### ``file`` Object Attributes

![image.png](attachment:image.png)

In [None]:
# Open a file
file = open("./data/python.txt", mode='r', encoding='utf-8')
print(file.mode)    # mode in which the file was opened
print(file.name)    # name of the file
print(file.closed)  # False while the file is open
file.close()

### The close() Method
When we are done performing the operations on a file, the file needs to be properly closed using the opened file object’s ``close()`` method. Closing a file frees up the resources that were tied to the file. Therefore, it is good programming practice to close the file when it is no longer required.
```python
fileObject.close()  
```

In [None]:
# Close a file
file = open("./data/python.txt")
print(file.close())   # close() returns None
print(file.closed)    # True once the file is closed
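
Once a file is closed, any further read or write on the `file` object raises a `ValueError`. The sketch below illustrates this; the file name `notes.txt` is hypothetical and the file lives in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")   # hypothetical file name

    file = open(path, "w", encoding="utf-8")
    file.write("some text\n")
    file.close()

    # Writing to a closed file is an error.
    try:
        file.write("more text\n")
        error_message = None
    except ValueError as error:
        error_message = str(error)

print(file.closed)      # True
print(error_message)
```
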

In [None]:
# Use the “try-except-finally” statement to ensure file is always closed when it is no longer required
file = open("./data/python.txt", 'r')
try:
    print(file.readlines())
except Exception:
    print("Error: Exception called")
finally:
    file.close()
    print("File is closed? ", file.closed)

### The "with" Statement
The ``with`` statement achieves the same effect as the ``try-finally`` statement with less code: it opens a file, lets us perform the necessary operations on it, and then automatically closes it upon exit, even if an exception is raised.

In [None]:
# File operation is easier with the use of the "with" statement
with open("./data/python.txt", 'r') as file:
    for line in file:       # read line by line
        line = line.strip() # strip off whitespaces and newline
        print(line)
print("File is closed?", file.closed)

## 2.2 Reading and Writing a Text File
To read data from an opened file, we can use the `read()`, `readline()` and `readlines()` methods of the file, or pass the file to the built-in `next()` function.<br>
To write data to an opened file (a string in text mode, or bytes in binary mode), we use the `write()` method on the file. 

### The read(), readline(), readlines() and next() Methods 
![image.png](attachment:image.png)

In [None]:
# read()
file = open("./data/python.txt", 'r')
print(file.read(10)) # read the first 10 characters
print(file.read(10)) # read the next 10 characters
print(file.read())   # read in the rest till the EOF
print(file.read())   # further read returns empty string
file.close()
print("File is closed?", file.closed)

In [None]:
# readline() and readlines()
file = open("./data/python.txt", 'r')
print(file.readline())  # read next line into a string (newline included)
print(file.readlines()) # read all (next) lines into a list of strings
print(file.readline())  # return an empty string after EOF
file.close()
print("File is closed?", file.closed)

In [None]:
# Read each line using readline() in a while loop
file = open("./data/python.txt", 'r')
line = file.readline()         # read next line into a string (newline included)
while line:                    # while line is not an empty string
    line = line.rstrip()       # strip trailing whitespace and the newline
    print(line) 
    line = file.readline()     # read the next line; returns '' at EOF
file.close()
print("File is closed?", file.closed)

In [None]:
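# next()
file = open("./data/python.txt", 'r')
try:
    line = file.readline()       # read the first line
    while line:
        line = line.rstrip()     # strip trailing whitespace and the newline
        print(line)
        line = next(file)        # raises StopIteration at EOF
except StopIteration:
    pass                         # no more lines to read
finally:
    file.close()                 # always close the file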

### The write() Method
![image.png](attachment:image.png)

In [None]:
# Copy one file to another. 
fileIn  = "./data/python.txt"
fileOut = "./data/doc.txt"
with open(fileIn, 'r') as fIn, open(fileOut, 'w') as fOut:
    for line in fIn:   # read line by line
        fOut.write(line)
print("Files closed?", fIn.closed, fOut.closed)

In [None]:
# To write a sequence into a file using writelines() method
fileOut  = "./data/sequence.txt"
sequence = ["This is line 1.\n", "1020\n", "This is line 2.\n"] # a sequence
with open(fileOut, 'w') as file:
    file.writelines(sequence)   # writes each string as-is; no newlines are added
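
Note that `writelines()` does not add newline characters; each string must already end with one if separate lines are wanted. The following sketch shows the difference. The file name `sequence_demo.txt` is hypothetical and the file is created in a temporary folder.

```python
import os
import tempfile

words = ["alpha", "beta", "gamma"]   # strings without trailing newlines

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sequence_demo.txt")   # hypothetical file name

    # Without newlines, the strings run together on one line.
    with open(path, "w") as f:
        f.writelines(words)
    with open(path) as f:
        joined = f.read()

    # Adding "\n" to each string produces one line per item.
    with open(path, "w") as f:
        f.writelines(word + "\n" for word in words)
    with open(path) as f:
        lines = f.readlines()

print(repr(joined))   # 'alphabetagamma'
print(lines)          # ['alpha\n', 'beta\n', 'gamma\n']
```
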

### Appending Data to a File

In [None]:
# To add a line of text to an existing file
fileIn  = "./data/python.txt"              # the text file
with open(fileIn, 'a+') as file:            # open text file for appending (and reading)
    file.write("Appended line: Dr Danny Poo") # written at the end of the file
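
In `'a+'` mode the file can also be read, but the file position starts at the end of the file, so a `seek(0)` is needed before reading from the beginning. A minimal sketch, assuming a hypothetical file `log.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "log.txt")   # hypothetical file name

    # Create a file with one line.
    with open(path, "w", encoding="utf-8") as f:
        f.write("line 1\n")

    # Append a second line, then read the file back through the same object.
    with open(path, "a+", encoding="utf-8") as f:
        f.write("line 2\n")
        at_end = f.read()      # position is at the end, so nothing is read
        f.seek(0)              # move back to the beginning
        contents = f.read()    # now the whole file is read

print(repr(at_end))    # ''
print(repr(contents))  # 'line 1\nline 2\n'
```
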

### File Position, tell() and seek() Methods
There is a pointer that keeps track of the current position within a file where the next read or write will occur. The current position is the number of bytes from the beginning of the file. 

The ``tell()`` method returns the current position of the pointer in the file and the ``seek()`` method changes the current file position.

The ``tell()`` method has the following syntax; it takes no arguments and returns the current position of the read/write pointer within the file:

```python
fileObject.tell()  
```
The ``seek()`` method has the following syntax; it moves the file’s current position to the given offset and returns the new absolute position:

```python
fileObject.seek(offset[, whence])   
```

where:<br>
<b>offset</b>: The number of bytes to move the read/write pointer, measured from the reference position. <br>
<b>whence</b>: Optional. Specifies the reference position.
* 0: sets the reference position at the beginning of the file.
* 1: sets the reference position at the current file position.
* 2: sets the reference position at the end of the file.
* Default is 0. In text mode, a reference position of 1 or 2 (current position or end of file) can only be used with an offset of 0.

In [None]:
# tell() and seek()
fileIn  = "./data/python.txt"
with open(fileIn, 'r+') as file: # open text file for reading and writing
    line = file.readline()       # read a line
    print("Line: %s" % line)     # print the line together with its newline
    print(file.tell())           # position after the first line (41 for the sample file)
    file.seek(100, 0)            # go to position 100 from the beginning
    print(file.tell())           # tell the current position, which is 100
    file.write("==TEST 1==")     # overwrite 10 characters starting at position 100
    print(file.tell())           # tell the current position, which is 110

In [None]:
# seek() and tell()
fileIn  = "./data/python.txt"
with open(fileIn, 'rb+') as file:                    # open file for reading and writing in binary mode
    file.seek(97, 0)                                 # position pointer to 97 from beginning
    print(file.tell())                               # tell the pointer position which is 97
    byteArray = [69, 83, 83, 69, 78, 84, 73, 65, 76] # byte values of the ASCII string “ESSENTIAL”
    data = bytearray(byteArray)                      # form byte array (avoid shadowing the built-in name bytes)
    file.write(data)                                 # write the 9 bytes starting at position 97
    print(file.tell())                               # tell the current position which is 106
    file.seek(-30, 2)                                # seek to 30 bytes before the end of file
    print(file.readline().decode('utf-8'))           # read a binary line and decode and print bytes as string
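
The positions printed in the cells above depend on the contents of `python.txt`. The following self-contained sketch shows the three reference positions on a small file whose contents are known. The file name `letters.bin` is hypothetical and the file is created in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "letters.bin")   # hypothetical file name

    with open(path, "wb") as f:
        f.write(b"ABCDEFGHIJ")                   # 10 bytes, positions 0 to 9

    with open(path, "rb+") as f:
        f.seek(3, 0)                 # 3 bytes from the beginning
        first = f.read(2)            # b'DE'; the position is now 5
        f.seek(2, 1)                 # 2 bytes forward from the current position
        pos_after_skip = f.tell()    # 7
        f.seek(-3, 2)                # 3 bytes before the end of the file
        tail = f.read()              # b'HIJ'
        new_pos = f.seek(0)          # seek() returns the new position, 0
        f.write(b"xy")               # overwrite the first two bytes
        f.seek(0)
        contents = f.read()          # b'xyCDEFGHIJ'

print(first, pos_after_skip, tail, new_pos, contents)
```

Writing at a given position overwrites the existing bytes; it does not insert new ones.
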

### File Methods
Complete list of methods in text mode. 
![image.png](attachment:image.png)

## 2.3 Renaming and Deleting a File
To rename or delete a file within Python, use the `os` module which provides methods for performing file-processing operations. The `os` module must be imported first before files can be renamed or deleted.

### The rename() Method
The `rename()` method takes two arguments: current file name and the new file name:
```python  
os.rename(current_file_name, new_file_name)  
```

In [None]:
# Renaming doc1.txt to doc2.txt
import os
file = open("doc1.txt", 'w')
file.write("A line for doc1 file.")
file.close()
os.rename("doc1.txt", "doc2.txt")
file = open("doc2.txt")
print(file.readline())
file.close()

### The remove() Method
The `remove()` method deletes the file whose name is supplied as its argument:
```python  
os.remove(file_name)  
```

In [None]:
# Delete doc2.txt
import os
os.remove("doc2.txt")
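
Calling `os.remove()` on a file that does not exist raises a `FileNotFoundError`. The sketch below shows one way to handle this; it works on a temporary copy named `doc2.txt` (an illustrative name) so that no real file is deleted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "doc2.txt")   # illustrative file in a temporary folder

    with open(path, "w") as f:
        f.write("temporary")

    existed_before = os.path.exists(path)     # True
    os.remove(path)                           # delete the file
    existed_after = os.path.exists(path)      # False

    # Removing the same file again fails because it no longer exists.
    try:
        os.remove(path)
        second_removal_failed = False
    except FileNotFoundError:
        second_removal_failed = True

print(existed_before, existed_after, second_removal_failed)
```
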

## 2.4 Directory and File Management
Files are stored in folders or directories. The `os` module has methods that allow for the creation, removal or change of directories.

Directory manipulation methods are given in this table:
![image.png](attachment:image.png)
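
A few of the standard `os` functions for working with directories are sketched below. The selection is an assumption made for illustration and may not match the table exactly; the folder names `reports` and `archive` are hypothetical, and everything happens inside a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    reports = os.path.join(base, "reports")    # hypothetical folder name
    os.mkdir(reports)                          # create a directory
    created = os.path.isdir(reports)           # True

    # Put one file in the new directory and list its contents.
    with open(os.path.join(reports, "summary.txt"), "w") as f:
        f.write("draft\n")
    listing = os.listdir(reports)              # ['summary.txt']

    archive = os.path.join(base, "archive")    # hypothetical folder name
    os.rename(reports, archive)                # rename the directory

    os.remove(os.path.join(archive, "summary.txt"))
    os.rmdir(archive)                          # rmdir() removes empty directories only
    remaining = os.listdir(base)               # []

print(os.getcwd())   # the current working directory
print(created, listing, remaining)
```
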
