## 9.2 Files

Python views a text file as a sequence of characters and a binary file (for images, videos and more) as a sequence of bytes. As in lists and arrays, the first character in a text file and byte in a binary file is located at position 0, so in a file of n characters or bytes, the highest position number is n – 1.
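
As a minimal sketch of zero-based positions, the following writes five bytes to a throwaway file (the file name and contents are illustrative assumptions), then uses the file object's seek and tell methods to move to a position and report the current one:

```python
import os
import tempfile

# Create a small throwaway binary file (name and contents are illustrative).
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:
    f.write(b'ABCDE')  # n = 5 bytes, at positions 0 through 4

with open(path, 'rb') as f:
    first = f.read(1)        # byte at position 0
    f.seek(4)                # jump to the highest position, n - 1
    last = f.read(1)         # byte at position 4
    end_position = f.tell()  # position just past the last byte, i.e. n

print(first, last, end_position)
```

Binary mode is used here because byte positions map directly to seek offsets; in text mode the mapping between characters and offsets depends on the encoding.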

For each file you open, Python creates a file object that you’ll use to interact with the file.

Standard File Objects
When a Python program begins execution, it creates three standard file objects:

sys.stdin—the standard input file object
sys.stdout—the standard output file object
sys.stderr—the standard error file object
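
Here is a small sketch of writing to the standard file objects directly. The report function is a hypothetical helper, not part of the sys module; it simply shows that sys.stdout, sys.stderr and any other object with a write method can be used interchangeably:

```python
import io
import sys

def report(message, stream=sys.stdout):
    """Write message plus a newline to the given file object."""
    stream.write(message + '\n')

report('normal output')                     # goes to standard output
report('something went wrong', sys.stderr)  # goes to standard error

# An in-memory text buffer can stand in for a real file object.
buffer = io.StringIO()
report('captured', buffer)
```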

## 9.3 Text-File Processing

9.3.1 Writing to a Text File: Introducing the with Statement

Let’s create an accounts.txt file and write five client records to the file. Generally, records in text files are stored one per line, so we end each record with a newline character:

In [2]:
with open('accounts.txt', mode='w') as accounts:
    accounts.write('100 Jones 24.98\n')
    accounts.write('200 Doe 345.67\n')
    accounts.write('300 White 0.00\n')
    accounts.write('400 Stone -42.16\n')
    accounts.write('500 Rich 224.62\n')

In [3]:
with open('accounts.txt', mode='r') as accounts:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in accounts:
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       White           0.00
400       Stone         -42.16
500       Rich          224.62


The with Statement

Python’s with statement:

acquires a resource (in this case, the file object for accounts.txt) and assigns its corresponding object to a variable (accounts in this example),
allows the application to use the resource via that variable, and
calls the resource object’s close method to release the resource when program control reaches the end of the with statement’s suite.
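
The three steps above can be sketched by comparing a with statement to a roughly equivalent try…finally written by hand. This is an approximation for file objects only, and the file name is an illustrative assumption:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Using with: the file is closed when the suite ends.
with open(path, mode='w') as demo:
    demo.write('first line\n')
closed_after_with = demo.closed

# Roughly the same thing written by hand with try...finally.
demo = open(path, mode='a')
try:
    demo.write('second line\n')
finally:
    demo.close()  # runs even if write raises an exception
closed_after_finally = demo.closed

print(closed_after_with, closed_after_finally)
```

The with form is shorter and harder to get wrong, which is why the examples in this chapter prefer it.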

Built-In Function open

The built-in open function opens the file accounts.txt and associates it with a file object. The mode argument specifies the file-open mode, indicating whether to open a file for reading from the file, for writing to the file or both. 

Writing to the File

The with statement assigns the object returned by open to the variable accounts in the as clause. In the with statement’s suite, we use the variable accounts to interact with the file. In this case, we call the file object’s write method five times to write five records to the file, each as a separate line of text ending in a newline. At the end of the with statement’s suite, the with statement implicitly calls the file object’s close method to close the file.

Self Check: Create a grades.txt file and write to it the following three records consisting of student IDs, last names and letter grades:

In [4]:
with open('grades.txt', mode='w') as grades:
    grades.write('1 Red A\n')
    grades.write('2 Green B\n')
    grades.write('3 White A\n')

In [23]:
with open('grades.txt', mode='r') as grades:
    print(f'{"ID":<10}{"Last_Name":<10}{"Grade":>10}')
    for record in grades:
        ID, Last_Name, Grade = record.split()
        print(f'{ID:<10}{Last_Name:<10}{Grade:>10}')

ID        Last_Name      Grade
1         Red                A
2         Green              B
3         White              A


9.3.2 Reading Data from a Text File

In [24]:
with open('accounts.txt', mode='r') as accounts:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in accounts:
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       White           0.00
400       Stone         -42.16
500       Rich          224.62


In [25]:
with open('grades.txt', mode='r') as grades:
    print(f'{"ID":<10}{"Last_Name":<10}{"Grade":>10}')
    for record in grades:
        ID, Last_Name, Grade = record.split()
        print(f'{ID:<10}{Last_Name:<10}{Grade:>10}')

ID        Last_Name      Grade
1         Red                A
2         Green              B
3         White              A


## 9.4 Updating Text Files

Formatted data written to a text file cannot be modified without the risk of destroying other data.

Updating accounts.txt: Let’s use a with statement to update the accounts.txt file, changing account 300’s name from 'White' to 'Williams':

In [26]:
accounts = open('accounts.txt', 'r')

In [27]:
temp_file = open('temp_file.txt', 'w')

In [28]:
with accounts, temp_file:
    for record in accounts:
        account, name, balance = record.split()
        if account != '300':
            temp_file.write(record)
        else:
            new_record = ' '.join([account, 'Williams', balance])
            temp_file.write(new_record + '\n')

This with statement manages two resource objects, specified in a comma-separated list after with. The for statement unpacks each record into account, name and balance. If the account is not '300', we write record (which contains a newline) to temp_file. Otherwise, we assemble the new record containing 'Williams' in place of 'White' and write it to the file. 

In [30]:
with open('temp_file.txt', mode='r') as temp_file:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in temp_file:
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       Williams        0.00
400       Stone         -42.16
500       Rich          224.62


os Module File-Processing Functions

To complete the update, let’s delete the old accounts.txt file, then rename temp_file.txt as accounts.txt. The os module provides functions for interacting with the operating system, including several that manipulate your system’s files and directories. Now that we’ve created the temporary file, let’s use the remove function to delete the original file:

In [31]:
import os
os.remove('accounts.txt')

In [32]:
os.rename('temp_file.txt', 'accounts.txt')

In [33]:
with open('accounts.txt', mode='r') as accounts:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in accounts:
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       Williams        0.00
400       Stone         -42.16
500       Rich          224.62
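
The copy, remove and rename steps can be gathered into one function. The sketch below is one possible arrangement, not the only one: update_name and the '.tmp' suffix are hypothetical names, and it uses os.replace, which renames a file and overwrites the destination if it already exists, in place of the separate remove and rename calls. It runs on a throwaway copy so the real accounts.txt is untouched:

```python
import os
import tempfile

def update_name(path, account_number, new_name):
    """Rewrite path, replacing the name in the matching account's record."""
    temp_path = path + '.tmp'  # hypothetical temporary file name
    with open(path, 'r') as source, open(temp_path, 'w') as temp_file:
        for record in source:
            account, name, balance = record.split()
            if account == account_number:
                record = ' '.join([account, new_name, balance]) + '\n'
            temp_file.write(record)
    os.replace(temp_path, path)  # rename, overwriting the original file

# Demonstrate on a throwaway file rather than the real accounts.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'accounts_demo.txt')
with open(demo_path, 'w') as demo:
    demo.write('100 Jones 24.98\n300 White 0.00\n')

update_name(demo_path, '300', 'Williams')
with open(demo_path) as demo:
    updated = demo.read()
print(updated)
```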


Self Check: In the accounts.txt file, update the last name 'Doe' to 'Smith'.

In [35]:
accounts = open('accounts.txt', 'r')

In [36]:
temp_file = open('temp_file.txt', 'w')

In [37]:
with accounts, temp_file:
    for record in accounts:
        account, name, balance = record.split()
        if account != '200':
            temp_file.write(record)
        else:
            new_record = ' '.join([account, 'Smith', balance])
            temp_file.write(new_record + '\n')

In [38]:
import os
os.remove('accounts.txt')

In [39]:
os.rename('temp_file.txt', 'accounts.txt')

In [40]:
with open('accounts.txt', mode='r') as accounts:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in accounts:
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Smith         345.67
300       Williams        0.00
400       Stone         -42.16
500       Rich          224.62


## 9.5 Serialization with JSON

JSON (JavaScript Object Notation) is a text-based, human-and-computer-readable, data-interchange format used to represent objects (such as dictionaries, lists and more) as collections of name–value pairs.

JSON Data Format
JSON objects are similar to Python dictionaries. Each JSON object contains a comma-separated list of property names and values, in curly braces. For example, the following key–value pairs might represent a client record:

{"account": 100, "name": "Jones", "balance": 24.98}

Values in JSON objects and arrays can be:

strings in double quotes (like "Jones"),
numbers (like 100 or 24.98),
JSON Boolean values (represented as true or false in JSON),
null (to represent no value, like None in Python),
arrays (like [100, 200, 300]), and
other JSON objects.
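
To see how each kind of JSON value arrives in Python, here is a short sketch using the json module’s loads function on a string. The "active", "notes", "history" and "address" properties are made up for illustration:

```python
import json

# A JSON object containing one value of each kind listed above.
json_text = '''
{"name": "Jones", "account": 100, "balance": 24.98,
 "active": true, "notes": null, "history": [100, 200, 300],
 "address": {"city": "Boston"}}
'''

record = json.loads(json_text)

# Map each property name to the name of the Python type it was converted to.
types = {key: type(value).__name__ for key, value in record.items()}
print(types)
```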

Python Standard Library Module json

The json module enables you to convert objects to JSON text format. This is known as serializing the data.

In [41]:
accounts_dict = {'accounts': [{'account': 100, 'name': 'Jones', 'balance': 24.98}, {'account': 200, 'name': 'Doe', 'balance': 345.67}]}

Serializing an Object to JSON
Let’s write that object in JSON format to a file:

In [42]:
import json

In [43]:
with open('accounts.json', 'w') as accounts:
    json.dump(accounts_dict, accounts)

This opens the file accounts.json and uses the json module’s dump function to serialize the dictionary accounts_dict into the file.

Deserializing the JSON Text

The json module’s load function reads the entire JSON contents of its file object argument and converts the JSON into a Python object. This is known as deserializing the data. 

In [44]:
with open('accounts.json', 'r') as accounts:
    accounts_json = json.load(accounts)

We can now interact with the loaded object. For example, we can display the dictionary:

In [45]:
accounts_json

{'accounts': [{'account': 100, 'name': 'Jones', 'balance': 24.98},
  {'account': 200, 'name': 'Doe', 'balance': 345.67}]}

As you’d expect, you can access the dictionary’s contents. Let’s get the list of dictionaries associated with the 'accounts' key:

In [47]:
accounts_json['accounts']

[{'account': 100, 'name': 'Jones', 'balance': 24.98},
 {'account': 200, 'name': 'Doe', 'balance': 345.67}]

Now, let’s get the individual account dictionaries:

In [48]:
accounts_json['accounts'][0]

{'account': 100, 'name': 'Jones', 'balance': 24.98}

In [49]:
accounts_json['accounts'][1]

{'account': 200, 'name': 'Doe', 'balance': 345.67}

Displaying the JSON Text

The json module’s dumps function (dumps is short for “dump string”) returns a Python string representation of an object in JSON format. Using dumps with load, you can read the JSON from the file and display it in a nicely indented format, sometimes called “pretty printing” the JSON. When the dumps function call includes the indent keyword argument, the string contains newline characters and indentation for pretty printing. You can also use indent with the dump function when writing to a file:

In [50]:
with open('accounts.json', 'r') as accounts:
    print(json.dumps(json.load(accounts), indent=4))

{
    "accounts": [
        {
            "account": 100,
            "name": "Jones",
            "balance": 24.98
        },
        {
            "account": 200,
            "name": "Doe",
            "balance": 345.67
        }
    ]
}
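
The indent keyword argument works the same way with dump. The following sketch writes the indented JSON straight to a file; the file name accounts_pretty.json and the temporary directory are assumptions made so the example does not overwrite accounts.json:

```python
import json
import os
import tempfile

accounts_dict = {'accounts': [
    {'account': 100, 'name': 'Jones', 'balance': 24.98},
    {'account': 200, 'name': 'Doe', 'balance': 345.67}]}

path = os.path.join(tempfile.mkdtemp(), 'accounts_pretty.json')
with open(path, 'w') as accounts:
    json.dump(accounts_dict, accounts, indent=4)  # pretty-print into the file

with open(path, 'r') as accounts:
    text = accounts.read()
print(text)
```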


Self Check: Create a JSON file named grades.json and write into it the following dictionary:

In [51]:
grades_dict = {'gradebook':
    [{'student_id': 1, 'name': 'Red', 'grade': 'A'},
     {'student_id': 2, 'name': 'Green', 'grade': 'B'},
     {'student_id': 3, 'name': 'White', 'grade': 'A'}]}

In [52]:
import json

In [53]:
with open('grades.json', 'w') as grades:
    json.dump(grades_dict, grades)

In [54]:
with open('grades.json', 'r') as grades:
    print(json.dumps(json.load(grades), indent=4))

{
    "gradebook": [
        {
            "student_id": 1,
            "name": "Red",
            "grade": "A"
        },
        {
            "student_id": 2,
            "name": "Green",
            "grade": "B"
        },
        {
            "student_id": 3,
            "name": "White",
            "grade": "A"
        }
    ]
}


## 9.6 Focus on Security: pickle Serialization and Deserialization

The Python Standard Library’s pickle module can serialize objects into a Python-specific data format. Caution: The Python documentation provides the following warnings about pickle:

Pickle files can be hacked.
Pickle is a protocol which allows the serialization of arbitrarily complex Python objects. As such, it is specific to Python and cannot be used to communicate with applications written in other languages.

We do not recommend using pickle, but it’s been used for many years, so you’re likely to encounter it in legacy code—old code that’s often no longer supported. For this reason, we’ve included an end-of-chapter pickle exercise, which explains how to use it.

## 9.7 Additional Notes Regarding Files

'r' Open a text file for reading.
'w' Open a text file for writing. Existing file contents are deleted.
'a' Open a text file for appending at the end, creating the file if it does not exist. New data is written at the end of the file.
'r+' Open a text file for reading and writing.
'w+' Open a text file for reading and writing. Existing file contents are deleted.
'a+' Open a text file for reading and appending at the end. New data is written at the end of the file. If the file does not exist, it is created.
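
The difference between these modes is easiest to see by running them against the same file. A minimal sketch, using a throwaway file whose name is an illustrative assumption:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as f:   # 'w' creates the file
    f.write('one\n')
with open(path, 'w') as f:   # 'w' again: previous contents are discarded
    f.write('two\n')
with open(path, 'a') as f:   # 'a': new data goes after the existing data
    f.write('three\n')
with open(path, 'r+') as f:  # 'r+': read and write without truncating
    before = f.read()        # reading leaves the position at the end
    f.write('four\n')        # so this lands after the existing data

with open(path, 'r') as f:
    contents = f.read()
print(contents)
```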

Other File Object Methods
Here are a few more useful file-object methods.

For a text file, the read method returns a string containing the number of characters specified by the method’s integer argument. For a binary file, the method returns the specified number of bytes. If no argument is specified, the method returns the entire contents of the file.
The readline method returns one line of text as a string, including the newline character if there is one. This method returns an empty string when it encounters the end of the file.
The writelines method receives a list of strings and writes its contents to a file.
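
Here is a short sketch of these three methods working together on a throwaway file (the file name and contents are illustrative). Note that writelines does not add newline characters, so each string must supply its own:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

with open(path, 'w') as f:
    # writelines writes the strings as given, with no separators added
    f.writelines(['alpha\n', 'beta\n', 'gamma\n'])

with open(path, 'r') as f:
    first_three = f.read(3)      # the first three characters
    rest_of_line = f.readline()  # the remainder of the first line
    remainder = f.read()         # everything that is left
    at_end = f.readline()        # empty string signals end of file

print(repr(first_three), repr(rest_of_line), repr(remainder), repr(at_end))
```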

## 9.8 Handling Exceptions

Various types of exceptions can occur when you work with files, including:

A FileNotFoundError occurs if you attempt to open a non-existent file for reading with the 'r' or 'r+' modes.
A PermissionError occurs if you attempt an operation for which you do not have permission.
A ValueError (with the error message 'I/O operation on closed file.') occurs when you attempt to write to a file that has already been closed.
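
Two of these exceptions are easy to trigger safely. The sketch below uses throwaway paths (assumptions for illustration) and records what happened in a list instead of letting the exceptions terminate the snippet:

```python
import os
import tempfile

messages = []

# Opening a non-existent file for reading raises FileNotFoundError.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')
try:
    with open(missing, 'r') as f:
        f.read()
except FileNotFoundError:
    messages.append('file not found')

# Writing to a file object after its with suite has ended raises ValueError.
existing = os.path.join(tempfile.mkdtemp(), 'closed.txt')
with open(existing, 'w') as f:
    f.write('data\n')
try:
    f.write('more\n')  # f was closed when the with suite ended
except ValueError as error:
    messages.append(str(error))

print(messages)
```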

9.8.1 Division by Zero and Invalid Input

Division By Zero  
Recall that attempting to divide by 0 results in a ZeroDivisionError:

In [1]: 10 / 0
-------------------------------------------------------------------------
ZeroDivisionError                       Traceback (most recent call last)
<ipython-input-1-a243dfbf119d> in <module>()
----> 1 10 / 0

ZeroDivisionError: division by zero

In this case, the interpreter is said to raise an exception of type ZeroDivisionError. When an exception is raised in IPython, it:

terminates the snippet,
displays the exception’s traceback, then
shows the next In [] prompt so you can input the next snippet.

If an exception occurs in a script, IPython terminates the script and displays the exception’s traceback.

Invalid Input 
Recall that the int function raises a ValueError if you attempt to convert to an integer a string (like 'hello') that does not represent a number:

In [2]: value = int(input('Enter an integer: '))
Enter an integer: hello
-------------------------------------------------------------------------
ValueError                               Traceback (most recent call last)
<ipython-input-2-b521605464d6> in <module>()
----> 1 value = int(input('Enter an integer: '))

ValueError: invalid literal for int() with base 10: 'hello'

9.8.2 try Statements

The following script uses exception handling to catch and handle (i.e., deal with) any ZeroDivisionErrors and ValueErrors that arise. In this case, the handlers let the user re-enter the input.

In [None]:
"""Simple exception handling example."""

while True:
    # attempt to convert and divide values
    try:
        number1 = int(input('Enter numerator: '))
        number2 = int(input('Enter denominator: '))
        result = number1 / number2
    except ValueError:  # tried to convert non-numeric value to int
        print('You must enter two integers\n')
    except ZeroDivisionError:  # denominator was 0
        print('Attempted to divide by zero\n')
    else:  # executes only if no exceptions occur
        print(f'{number1:.3f} / {number2:.3f} = {result:.3f}')
        break  # terminate the loop

try Clause
Python uses try statements (like lines 5–15 of the script above) to enable exception handling. The try statement’s try clause (lines 5–8) begins with keyword try, followed by a colon (:) and a suite of statements that might raise exceptions.

except Clause
A try clause may be followed by one or more except clauses (lines 9–10 and 11–12) that immediately follow the try clause’s suite. These also are known as exception handlers. Each except clause specifies the type of exception it handles. In this example, each exception handler just displays a message indicating the problem that occurred.

else Clause
After the last except clause, an optional else clause (lines 13–15) specifies code that should execute only if the code in the try suite did not raise exceptions. If no exceptions occur in this example’s try suite, line 14 displays the division result and line 15 terminates the loop.
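
The same try, except and else structure can be tried without typing input at a prompt. In this sketch, divide_text is a hypothetical function that takes the two inputs as strings and returns a message instead of printing inside a loop:

```python
def divide_text(numerator_text, denominator_text):
    """Return a message describing the division or the problem."""
    try:
        number1 = int(numerator_text)
        number2 = int(denominator_text)
        result = number1 / number2
    except ValueError:  # one of the strings was not an integer
        return 'You must enter two integers'
    except ZeroDivisionError:  # denominator was 0
        return 'Attempted to divide by zero'
    else:  # executes only if no exceptions occurred
        return f'{number1} / {number2} = {result:.3f}'

for pair in [('10', '4'), ('10', '0'), ('ten', '4')]:
    print(divide_text(*pair))
```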

Self Check: 

In [None]:
def try_it(value):
    try:
        x = int(value)
    except ValueError:
        print(f'{value} could not be converted to an integer')
    else:
        print(f'int({value}) is {int(value)}')

In [None]:
try_it(10.7)

In [None]:
try_it('Python')

## 9.9 finally Clause

The following IPython session demonstrates that the finally clause always executes, regardless of whether an exception occurs in the corresponding try suite. First, let’s consider a try statement in which no exceptions occur in the try suite:

In [1]:
try:
    print('try suite with no exceptions raised')
except:
    print('this will not execute')
else:
    print('else executes because no exceptions in the try suite')
finally:
    print('finally always executes')

try suite with no exceptions raised
else executes because no exceptions in the try suite
finally always executes


Now let’s consider a try statement in which an exception occurs in the try suite:

In [2]:
try:
    print('try suite that raises an exception')
    int('hello')
    print('this will not execute')
except ValueError:
    print('a ValueError occurred')
else:
    print('else will not execute because an exception occurred')
finally:
    print('finally always executes')

try suite that raises an exception
a ValueError occurred
finally always executes


Combining with Statements and try…except Statements

In [4]:
try:
    with open('gradez.txt', 'r') as accounts:
        print(f'{"ID":<3}{"Name":<7}{"Grade"}')
        for record in accounts:
            student_id, name, grade = record.split()
            print(f'{student_id:<3}{name:<7}{grade}')
except FileNotFoundError:
    print('The file name you specified does not exist')

The file name you specified does not exist


Self Check: Before executing the IPython session, determine what the following function displays if you call it with the value 10.7, then the value 'Python'.

In [6]:
def try_it(value):
    try:
       x = int(value)
    except ValueError:
       print(f'{value} could not be converted to an integer')
    else:
       print(f'int({value}) is {int(value)}')
    finally:
       print('finally executed')

In [7]:
try_it(10.7)

int(10.7) is 10
finally executed


In [8]:
try_it('Python')

Python could not be converted to an integer
finally executed


## 9.10 Explicitly Raising an Exception

The raise statement explicitly raises an exception. The simplest form of the raise statement is:
raise ExceptionClassName
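
In practice you will often pass a message when raising, as in raise ValueError('...'), which creates an exception object carrying that message. A minimal sketch, in which the withdraw function and its rules are invented for illustration:

```python
def withdraw(balance, amount):
    """Return the new balance, rejecting invalid amounts."""
    if amount <= 0:
        raise ValueError('amount must be positive')
    if amount > balance:
        raise ValueError('insufficient funds')
    return balance - amount

try:
    withdraw(100, 250)
except ValueError as error:
    message = str(error)  # the text passed when the exception was raised
    print(f'Could not withdraw: {message}')
```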

## 9.11 (Optional) Stack Unwinding and Tracebacks

Each exception object stores information indicating the precise series of function calls that led to the exception. This is helpful when debugging your code. Consider the following function definitions—function1 calls function2 and function2 raises an Exception:

In [9]:
def function1():
    function2()

In [10]:
def function2():
    raise Exception('An exception occurred')

In [11]:
function1()

Exception: An exception occurred

Traceback Details

The traceback shows the type of exception that occurred (Exception) followed by the complete function call stack that led to the raise point. The stack’s bottom function call is listed first and the top is last, so the interpreter displays the following text as a reminder: Traceback (most recent call last)

Stack Unwinding

When an exception is not caught in a given function, stack unwinding occurs. Let’s consider stack unwinding in the context of this example:

In function2, the raise statement raises an exception. This is not in a try suite, so function2 terminates, its stack frame is removed from the function-call stack, and control returns to the statement in function1 that called function2.

In function1, the statement that called function2 is not in a try suite, so function1 terminates, its stack frame is removed from the function-call stack, and control returns to the statement that called function1, which is snippet [11] in the IPython session.

The call in snippet [11] is not in a try suite, so that function call terminates. Because the exception was not caught (known as an uncaught exception), IPython displays the traceback, then awaits your next input. If this occurred in a typical script, the script would terminate.
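
Unwinding stops as soon as some caller does have the call in a try suite. The sketch below repeats the two functions and catches the exception at the outermost call, then uses the standard traceback module to list the functions the exception passed through (the variable names are illustrative):

```python
import traceback

def function2():
    raise Exception('An exception occurred')

def function1():
    function2()  # not in a try suite, so the exception passes through

try:
    function1()
except Exception as error:
    caught = str(error)
    # Function names recorded in the traceback, outermost call first.
    frames = traceback.extract_tb(error.__traceback__)
    call_chain = [frame.name for frame in frames]

print(caught)
print(call_chain)
```

The last two names in call_chain are function1 and function2, matching the order in which the traceback lists them (most recent call last).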

Exceptions in finally Suites

Raising an exception in a finally suite can lead to subtle, hard-to-find problems. If an exception occurs and is not processed by the time the finally suite executes, stack unwinding occurs. If the finally suite raises a new exception that the suite does not catch, the new exception replaces the first one as the exception being propagated and is passed to the next enclosing try statement. The first exception is no longer handled in its own right (Python keeps it only as the new exception’s __context__ attribute). For this reason, a finally suite should always enclose in a try statement any code that may raise an exception, so that the exceptions will be processed within that suite.
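
A small sketch makes the problem concrete. Here risky is a hypothetical function whose finally suite raises while a ValueError is still pending. The caller sees only the new RuntimeError; the original survives only as that exception’s __context__ attribute:

```python
def risky():
    try:
        int('hello')  # raises ValueError
    finally:
        # Raised while the ValueError is still pending.
        raise RuntimeError('cleanup failed')

try:
    risky()
except Exception as error:
    propagated = type(error).__name__            # what the caller receives
    original = type(error.__context__).__name__  # the displaced exception

print(propagated, original)
```

An except ValueError handler around the call to risky would not run, because ValueError is no longer the exception being propagated.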

## 9.12 Intro to Data Science: Working with CSV Files

9.12.1 Python Standard Library Module csv

In [14]:
import csv

In [15]:
with open('accounts.csv', mode='w', newline='') as accounts:
    writer = csv.writer(accounts)
    writer.writerow([100, 'Jones', 24.98])
    writer.writerow([200, 'Doe', 345.67])
    writer.writerow([300, 'White', 0.00])
    writer.writerow([400, 'Stone', -42.16])
    writer.writerow([500, 'Rich', 224.62])

The .csv file extension indicates a CSV-format file. The csv module’s writer function returns an object that writes CSV data to the specified file object. Each call to the writer’s writerow method receives an iterable to store in the file. Here we’re using lists. By default, writerow delimits values with commas, but you can specify custom delimiters. After the preceding snippet, accounts.csv contains:

100,Jones,24.98
200,Doe,345.67
300,White,0.0
400,Stone,-42.16
500,Rich,224.62

Reading from a CSV File

In [16]:
with open('accounts.csv', 'r', newline='') as accounts:
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    reader = csv.reader(accounts)
    for record in reader:
        account, name, balance = record
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       White            0.0
400       Stone         -42.16
500       Rich          224.62
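
Two details are worth trying for yourself: the delimiter keyword argument for custom delimiters, and the fact that csv.reader returns every field as a string (which is why 0.00 was stored and displayed as 0.0 above). The sketch below uses an in-memory io.StringIO buffer as a stand-in for a file and a semicolon as an arbitrary example delimiter:

```python
import csv
import io

buffer = io.StringIO(newline='')  # in-memory stand-in for a file
writer = csv.writer(buffer, delimiter=';')
writer.writerow([100, 'Jones', 24.98])
writer.writerow([300, 'White', 0.00])
written = buffer.getvalue()

buffer.seek(0)  # go back to position 0 to read what was written
rows = []
for account, name, balance in csv.reader(buffer, delimiter=';'):
    # Fields arrive as strings, so convert the numeric ones explicitly.
    rows.append((int(account), name, float(balance)))

print(repr(written))
print(rows)
```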


Self Check: Create a text file named grades.csv and write to it the following three records consisting of student IDs, last names and letter grades:

In [17]:
import csv

In [19]:
with open('grades.csv', mode='w', newline='') as grades:
    writer = csv.writer(grades)
    writer.writerow([1, 'Red', 'A'])
    writer.writerow([2, 'Green', 'B'])
    writer.writerow([3, 'White', 'A'])

In [20]:
with open('grades.csv', 'r', newline='') as grades:
    print(f'{"ID":<10}{"Name":<10}{"Grade":>10}')
    reader = csv.reader(grades)
    for record in reader:
        ID, name, Grade = record
        print(f'{ID:<10}{name:<10}{Grade:>10}')

ID        Name           Grade
1         Red                A
2         Green              B
3         White              A


9.12.2 Reading CSV Files into Pandas DataFrames

The popular Rdatasets repository provides links to over 1100 free datasets in comma-separated values (CSV) format. These were originally provided with the R programming language for people learning about and developing statistical software, though they are not specific to R. They are now available on GitHub at:
https://vincentarelbundock.github.io/Rdatasets/datasets.html
This repository is so popular that there’s a pydataset module specifically for accessing Rdatasets.

Working with Locally Stored CSV Files 

You can load a CSV dataset into a DataFrame with the pandas function read_csv. The following loads and displays the CSV file accounts.csv that you created earlier in this chapter:


In [21]:
import pandas as pd

In [22]:
df = pd.read_csv('accounts.csv',
                 names=['account', 'name', 'balance'])

In [23]:
df

   account   name  balance
0      100  Jones    24.98
1      200    Doe   345.67
2      300  White     0.00
3      400  Stone   -42.16
4      500   Rich   224.62


9.12.3 Reading the Titanic Disaster Dataset: see project part 1 notebook

Self Check: Load the grades.csv file you created in Section 9.12.1’s Self Check into a DataFrame, then display it.

In [24]:
pd.read_csv('grades.csv', names=['ID', 'Name', 'Grade'])

   ID   Name Grade
0   1    Red     A
1   2  Green     B
2   3  White     A
