### Reading and Writing Files

Variables are a fine way to store data while your program is running, but if you want your data to persist even after your program has finished, you need to save it to a file. You can think of a file’s contents as a single string value, potentially gigabytes in size. In this chapter, you will learn how to use Python to create, read, and save files on the hard drive.

### Files and File Paths

A file has two key properties: a filename (usually written as one word) and a path. The path specifies the location of a file on the computer. For example, there is a file on my Windows 7 laptop with the filename project.docx in the path C:\Users\asweigart\Documents. The part of the filename after the last period is called the file’s extension and tells you a file’s type. project.docx is a Word document, and Users, asweigart, and Documents all refer to folders (also called directories). Folders can contain files and other folders.

In [1]:
import os

In [2]:
os.path.join('usr', 'bin', 'spam') # on Windows the result uses backslashes, shown doubled because each one is escaped

'usr\\bin\\spam'

On Windows, paths are written using backslashes (\) as the separator between folder names. OS X and Linux, however, use the forward slash (/) as their path separator. If you want your programs to work on all operating systems, you will have to write your Python scripts to handle both cases.
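As a minimal sketch of handling both cases, the snippet below lets the os module pick the separator instead of hard-coding one. The folder names are the same placeholders used above, and the printed separator depends on the machine running the code.

```python
import os

parts = ['usr', 'bin', 'spam']

# os.sep holds the separator for the current operating system:
# a backslash on Windows, a forward slash on OS X and Linux.
print(os.sep)

# os.path.join() inserts that separator between the parts for you.
joined = os.path.join(*parts)
print(joined)

# Joining the parts by hand with os.sep gives the same string here.
same_result = (joined == os.sep.join(parts))
print(same_result)
```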

In [3]:
# Joins names from a list of filenames to the end of a folder’s name
myFiles = ['accounts.txt', 'details.csv', 'invite.docx']
for filename in myFiles:
    print(os.path.join('C:\\Users\\asweigart', filename))

C:\Users\asweigart\accounts.txt
C:\Users\asweigart\details.csv
C:\Users\asweigart\invite.docx


### The Current Working Directory

Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory. You can get the current working directory as a string value with the os.getcwd() function and change it with os.chdir(). 

In [9]:
# Get current working directory
# os.getcwd()
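The following sketch shows os.getcwd() and os.chdir() working together. It assumes a throwaway folder created with tempfile.mkdtemp() (not part of the original example) so that it does not depend on any folder existing on your machine, and it switches back to the starting directory at the end.

```python
import os
import tempfile

# Remember where we started so the change can be undone.
original_cwd = os.getcwd()

# A scratch folder that stands in for any real folder you might switch to.
scratch_dir = tempfile.mkdtemp()

os.chdir(scratch_dir)

# A relative name such as 'notes.txt' is now resolved inside the scratch folder.
resolved = os.path.abspath('notes.txt')
print(resolved)

# Switch back and remove the scratch folder.
os.chdir(original_cwd)
os.rmdir(scratch_dir)
```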

### Absolute vs. Relative Paths

There are two ways to specify a file path.

    An absolute path, which always begins with the root folder

    A relative path, which is relative to the program’s current working directory

There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”
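To see the dot and dot-dot names in action, here is a small sketch using os.path.normpath(), which simplifies a path as text without touching the disk. The folder names (projects, drafts, final) are made up for illustration.

```python
import os

# Build the path with os.path.join() so it works with either separator.
messy = os.path.join('projects', '.', 'drafts', '..', 'final', 'report.txt')

# normpath() drops '.' and lets each '..' cancel the folder before it.
clean = os.path.normpath(messy)
print(clean)

# On its own, '..' refers to the parent of the current working directory.
parent = os.path.abspath('..')
parent_matches = (parent == os.path.dirname(os.getcwd()))
print(parent_matches)
```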

### Creating New Folders with os.makedirs()

Your programs can create new folders (directories) with the os.makedirs() function. Enter the following into the interactive shell:

import os

os.makedirs('C:\\delicious\\walnut\\waffles')
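If you would rather not create folders at the root of your C: drive, the sketch below does the same thing inside a temporary folder, which is an assumption added for this example. It also shows the exist_ok=True argument, which keeps os.makedirs() from raising an error when the folder is already there.

```python
import os
import tempfile

# A temporary folder stands in for C:\ so nothing is left behind afterward.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'delicious', 'walnut', 'waffles')

    # Creates 'delicious' and 'walnut' too, since neither exists yet.
    os.makedirs(target)
    created = os.path.isdir(target)

    # A second plain call would raise FileExistsError;
    # exist_ok=True turns it into a harmless no-op.
    os.makedirs(target, exist_ok=True)

print(created)
```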

In [12]:
os.path.abspath('.')
os.path.isabs(os.path.abspath('.'))

True

In [11]:
# Handling absolute and relative paths
os.path.abspath('.')
os.path.abspath('.\\Scripts')
os.path.isabs(os.path.abspath('.'))

True

In [13]:
# If you need a path’s dir name and base name together, you can just call os.path.split() to get a tuple value 
# with these two strings
calcFilePath = 'C:\\Windows\\System32\\calc.exe'

In [14]:
calcFilePath

'C:\\Windows\\System32\\calc.exe'

In [15]:
os.path.split(calcFilePath)

('C:\\Windows\\System32', 'calc.exe')

In [16]:
# You can create the same tuple by calling os.path.dirname() and os.path.basename() and placing their return values in a tuple
(os.path.dirname(calcFilePath), os.path.basename(calcFilePath))

('C:\\Windows\\System32', 'calc.exe')

In [17]:
# split() is a nice shortcut if you need both values
# os.path.split() = (os.path.dirname(), os.path.basename())

Also, note that os.path.split() does not take a file path and return a list of strings of each folder. For that, use the split() string method and split on the string in os.sep. Recall from earlier that the os.sep variable is set to the correct folder-separating slash for the computer running the program.

In [19]:
# The split() string method returns a list with one string per part of the path
calcFilePath.split(os.path.sep) # os.path.sep holds the same separator as os.sep

['C:', 'Windows', 'System32', 'calc.exe']

### Finding File Sizes and Folder Contents

Once you have ways of handling file paths, you can then start gathering information about specific files and folders. The os.path module provides functions for finding the size of a file in bytes and the files and folders inside a given folder.


    Calling os.path.getsize(path) will return the size in bytes of the file in the path argument.

    Calling os.listdir(path) will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not os.path.)



In [21]:
# Get file size: os.path.getsize()
os.getcwd()

'C:\\Users\\David Ly\\Documents\\Programming\\Python\\Automate_The_Boring_Stuff'

In [22]:
os.path.getsize('Chapter_8.ipynb') # 9028 bytes in size

9028

In [23]:
# Get list of file names in path (current working directory): os.listdir(path)
os.listdir(os.getcwd())

['.ipynb_checkpoints',
 'Chapter_1.ipynb',
 'Chapter_2.ipynb',
 'Chapter_3.ipynb',
 'Chapter_4.ipynb',
 'Chapter_5.ipynb',
 'Chapter_6.ipynb',
 'Chapter_7.ipynb',
 'Chapter_8.ipynb',
 'Projects']

In [33]:
# Find the total size of all files in the directory
totalSize = 0

# Loop through each file to get each size
for filename in os.listdir(os.getcwd()):
    # Step 1 get file names, get file sizes
    print(filename, ': ', os.path.getsize(filename), ' bytes')
    # Step 2 add sizes together
    totalSize = totalSize + os.path.getsize(filename)

# Show total size
print('Total Directory File Size: ', totalSize)

.ipynb_checkpoints :  4096  bytes
Chapter_1.ipynb :  9358  bytes
Chapter_2.ipynb :  12339  bytes
Chapter_3.ipynb :  22547  bytes
Chapter_4.ipynb :  55143  bytes
Chapter_5.ipynb :  41900  bytes
Chapter_6.ipynb :  39535  bytes
Chapter_7.ipynb :  79345  bytes
Chapter_8.ipynb :  11406  bytes
Projects :  4096  bytes
Total Directory File Size:  279765
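The loop above works because it lists the current working directory, so each bare filename is also a valid relative path. For any other folder, the names returned by os.listdir() have to be joined back onto the folder first. Here is one possible sketch; the function name total_file_size and the demo files are made up for this example, and it counts only files, skipping subfolders.

```python
import os
import tempfile

def total_file_size(folder):
    """Return the combined size in bytes of the files directly inside folder."""
    total = 0
    for name in os.listdir(folder):
        # listdir() returns bare names, so rebuild the full path before using it.
        full_path = os.path.join(folder, name)
        # Skip subfolders; the size reported for a folder is not the size of its contents.
        if os.path.isfile(full_path):
            total += os.path.getsize(full_path)
    return total

# Small demonstration in a temporary folder holding one file and one subfolder.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, 'a.txt'), 'w') as demo_file:
        demo_file.write('12345')
    os.makedirs(os.path.join(demo_dir, 'sub'))
    demo_total = total_file_size(demo_dir)

print(demo_total)  # 5
```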


### Checking Path Validity

Many Python functions will crash with an error if you supply them with a path that does not exist. The os.path module provides functions to check whether a given path exists and whether it is a file or folder.

    Calling os.path.exists(path) will return True if the file or folder referred to in the argument exists and will return False if it does not exist.

    Calling os.path.isfile(path) will return True if the path argument exists and is a file and will return False otherwise.

    Calling os.path.isdir(path) will return True if the path argument exists and is a folder and will return False otherwise.
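The three checks can be compared side by side. This sketch assumes a temporary folder and a file named example.txt, both created only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'example.txt')
    missing_path = os.path.join(folder, 'no_such_file.txt')

    # Create an empty file so there is something to test against.
    open(file_path, 'w').close()

    checks = {
        'folder exists': os.path.exists(folder),
        'folder is a folder': os.path.isdir(folder),
        'folder is a file': os.path.isfile(folder),
        'file is a file': os.path.isfile(file_path),
        'missing path exists': os.path.exists(missing_path),
    }

for label, result in checks.items():
    print(label, ':', result)
```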





In [34]:
os.path.exists('C:\\Windows')

True

In [35]:
os.path.exists('C:\\Random')

False

You can determine whether there is a DVD or flash drive currently attached to the computer by checking for it with the os.path.exists() function. For instance, if I wanted to check for a flash drive with the volume named D:\ on my Windows computer, I could do that with the following:

In [36]:
os.path.exists('D:\\')

True

### The File Reading/Writing Process

Once you are comfortable working with folders and relative paths, you’ll be able to specify the location of files to read and write. The functions covered in the next few sections will apply to plaintext files. Plaintext files contain only basic text characters and do not include font, size, or color information. Text files with the .txt extension or Python script files with the .py extension are examples of plaintext files. These can be opened with Windows’s Notepad or OS X’s TextEdit application. Your programs can easily read the contents of plaintext files and treat them as an ordinary string value.

Binary files are all other file types, such as word processing documents, PDFs, images, spreadsheets, and executable programs.

#### There are three steps to reading or writing files in Python.

    Call the open() function to return a File object.

    Call the read() or write() method on the File object.

    Close the file by calling the close() method on the File object.
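The three steps can be sketched as follows, once for writing and once for reading. The temporary folder is an assumption made so the example runs anywhere. The last part uses a with statement, which is not covered in this section but is a common way to have Python call close() for you.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'hello.txt')

# Writing: open, write, close.
out_file = open(path, 'w')
out_file.write('Hello world!')
out_file.close()

# Reading: open, read, close.
in_file = open(path)
text = in_file.read()
in_file.close()
print(text)

# The same read using a with statement; the file is closed when the block ends.
with open(path) as in_file:
    text_again = in_file.read()

# Clean up the demonstration file and folder.
os.remove(path)
os.rmdir(folder)
```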



### Opening Files with the open() Function

To open a file with the open() function, you pass it a string path indicating the file you want to open; it can be either an absolute or relative path. The open() function returns a File object.

Try it by creating a text file named hello.txt using Notepad or TextEdit. Type Hello world! as the content of this text file and save it in your user home folder. 

In [37]:
# Open a file with open(); this path does not exist, so the call raises FileNotFoundError
helloFile = open('C:\\Users\\Folder\\text.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Folder\\text.txt'
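An error like the one above stops the program. If a missing file is something your program should expect, one option is to catch the exception. The helper name read_text_or_none is hypothetical, and the sketch assumes no folder called no_such_folder exists in the current working directory.

```python
import os

def read_text_or_none(path):
    """Return the file's contents, or None if the path does not exist."""
    try:
        with open(path) as text_file:
            return text_file.read()
    except FileNotFoundError:
        return None

# Assumed not to exist; replace with a real path to read an actual file.
missing = os.path.join('no_such_folder', 'text.txt')
result = read_text_or_none(missing)
print(result)
```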

### Writing to Files

Python allows you to write content to a file in a way similar to how the print() function “writes” strings to the screen. You can’t write to a file you’ve opened in read mode, though. Instead, you need to open it in “write plaintext” mode or “append plaintext” mode, or write mode and append mode for short.

Write mode will overwrite the existing file and start from scratch, just like when you overwrite a variable’s value with a new value. Pass 'w' as the second argument to open() to open the file in write mode. Append mode, on the other hand, will append text to the end of the existing file. You can think of this as appending to a list in a variable, rather than overwriting the variable altogether. Pass 'a' as the second argument to open() to open the file in append mode.

If the filename passed to open() does not exist, both write and append mode will create a new, blank file. After reading or writing a file, call the close() method before opening the file again.

In [39]:
# Open the file in write mode
baconFile = open('bacon.txt', 'w')

# write() writes the string to the file and returns the number of characters written
baconFile.write('Hello worlddddddddddddd!\n') # 25 characters

25

In [43]:
# Close the file after opening
baconFile.close()

# Open the file  in append mode
baconFile = open('bacon.txt', 'a')

# After opening it in append mode, write a new sentence
baconFile.write('Bacon is not a vegetable.')

# Close again
baconFile.close()

In [44]:
# Open the file again in read mode, store its contents in a variable, close the file, then print the contents
baconFile = open('bacon.txt')
content = baconFile.read()
baconFile.close()
print(content)

Hello worlddddddddddddd!
Bacon is not a vegetable.Bacon is not a vegetable.
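The sentence appears twice in the output above, which is what you would see if the append cell had been run two times. The sketch below reproduces that behavior on purpose and then shows write mode wiping the file. It uses a temporary folder and a helper named read_all, both introduced only for this example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'bacon.txt')

def read_all(file_path):
    """Return the whole file as one string."""
    with open(file_path) as text_file:
        return text_file.read()

# Write mode creates the file, or empties it if it already exists.
with open(path, 'w') as bacon_file:
    bacon_file.write('Hello world!\n')

# Each append adds to the end, so appending twice repeats the sentence.
for _ in range(2):
    with open(path, 'a') as bacon_file:
        bacon_file.write('Bacon is not a vegetable.')
after_appends = read_all(path)
print(after_appends)

# Opening in write mode again discards everything written so far.
with open(path, 'w') as bacon_file:
    bacon_file.write('Starting over.')
after_rewrite = read_all(path)
print(after_rewrite)

# Clean up the demonstration file and folder.
os.remove(path)
os.rmdir(folder)
```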


### Saving Variables with the shelve Module

You can save variables in your Python programs to binary shelf files using the shelve module. This way, your program can restore data to variables from the hard drive. The shelve module will let you add Save and Open features to your program. For example, if you ran a program and entered some configuration settings, you could save those settings to a shelf file and then have the program load them the next time it is run.

In [45]:
import shelve

# Call shelve.open(), pass it a filename, and store the returned shelf value in a variable
shelfFile = shelve.open('mydata')

# Create a list cats
cats = ['Zophie', 'Pooka', 'Simon']

# Write shelfFile['cats'] = cats to store the list
shelfFile['cats'] = cats

# Close the file
shelfFile.close()
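To restore the data later, open the shelf again and read the value back by its key. The sketch below saves and reloads the same list inside a temporary folder, which is an assumption made so it does not leave shelf files in your working directory. It uses with statements so each shelf is closed automatically.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    shelf_path = os.path.join(folder, 'mydata')

    # Save the list under the key 'cats'.
    with shelve.open(shelf_path) as shelf_file:
        shelf_file['cats'] = ['Zophie', 'Pooka', 'Simon']

    # Reopen the shelf, as a later run of the program would, and read it back.
    with shelve.open(shelf_path) as shelf_file:
        restored_cats = shelf_file['cats']
        stored_keys = list(shelf_file.keys())

print(restored_cats)
print(stored_keys)
```

A shelf behaves much like a dictionary, so keys() lists everything that has been stored in it.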

In [46]:
os.getcwd()

'C:\\Users\\David Ly\\Documents\\Programming\\Python\\Automate_The_Boring_Stuff'

In [48]:
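# Change the current working directory with os.chdir()
# The name path was never assigned a value, so this call raises the NameError shown below
os.chdir(path)

# os.chdir("\\Projects\\Chapter 08")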

NameError: name 'path' is not defined