### Reading and Writing Files

Variables are a fine way to store data while your program is running, but if you want your data to persist even after your program has finished, you need to save it to a file. You can think of a file’s contents as a single string value, potentially gigabytes in size. In this chapter, you will learn how to use Python to create, read, and save files on the hard drive.

### Files and File Paths

A file has two key properties: a filename (usually written as one word) and a path. The path specifies the location of a file on the computer. For example, there is a file on my Windows 7 laptop with the filename project.docx in the path C:\Users\asweigart\Documents. The part of the filename after the last period is called the file’s extension and tells you a file’s type. project.docx is a Word document, and Users, asweigart, and Documents all refer to folders (also called directories). Folders can contain files and other folders.
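As a small sketch of these terms, the standard library can pull a path apart into its folder, filename, and extension. The example uses `ntpath`, the Windows flavor of `os.path`, so that the backslash path from the text is parsed the same way on any operating system; that choice, and the variable names, are assumptions made for illustration.

```python
import ntpath  # Windows implementation of os.path; importable on every OS

# Example path taken from the paragraph above
full_path = 'C:\\Users\\asweigart\\Documents\\project.docx'

folder = ntpath.dirname(full_path)            # the path (location) part
filename = ntpath.basename(full_path)         # the filename part
stem, extension = ntpath.splitext(filename)   # split off the extension

print(folder)      # C:\Users\asweigart\Documents
print(filename)    # project.docx
print(extension)   # .docx
```

On a Windows machine, the same calls through `os.path` should give identical results.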

In [1]:
import os

In [2]:
os.path.join('usr', 'bin', 'spam') # backslashes are doubled in the output because each one is escaped

'usr\\bin\\spam'

On Windows, paths are written using backslashes (\) as the separator between folder names. OS X and Linux, however, use the forward slash (/) as their path separator. If you want your programs to work on all operating systems, you will have to write your Python scripts to handle both cases.

In [3]:
# Joins names from a list of filenames to the end of a folder’s name
myFiles = ['accounts.txt', 'details.csv', 'invite.docx']
for filename in myFiles:
    print(os.path.join('C:\\Users\\asweigart', filename))

C:\Users\asweigart\accounts.txt
C:\Users\asweigart\details.csv
C:\Users\asweigart\invite.docx
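To see how the separator differs between operating systems without switching machines, the sketch below joins the same folder names under both sets of rules. It relies on `ntpath` and `posixpath`, the two standard-library modules that `os.path` is an alias for on Windows and on OS X/Linux respectively; using them directly here is an illustrative choice, and everyday code would normally just call `os.path.join()`.

```python
import os
import ntpath      # Windows path rules
import posixpath   # OS X / Linux path rules

# os.sep holds the separator for the OS running this code
print(repr(os.sep))

# The same names joined under each set of rules
windows_style = ntpath.join('usr', 'bin', 'spam')
posix_style = posixpath.join('usr', 'bin', 'spam')

print(windows_style)   # usr\bin\spam
print(posix_style)     # usr/bin/spam
```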


### The Current Working Directory

Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory. You can get the current working directory as a string value with the os.getcwd() function and change it with os.chdir(). 

In [9]:
# Get current working directory
# os.getcwd()
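The following is a minimal sketch of how os.getcwd() and os.chdir() interact. It switches into a temporary scratch folder (a hypothetical location created only for this example) and shows that a relative filename is then resolved against the new working directory. The filename notes.txt is likewise made up, and no file is actually created.

```python
import os
import tempfile

original_cwd = os.getcwd()            # remember where we started

with tempfile.TemporaryDirectory() as temp_dir:   # hypothetical scratch folder
    os.chdir(temp_dir)                # change the current working directory
    inside = os.getcwd()
    # A relative filename is now resolved against the new cwd
    resolved = os.path.abspath('notes.txt')
    os.chdir(original_cwd)            # switch back before the folder is deleted

print(inside)
print(resolved)
```

Restoring the original directory at the end keeps the rest of the session unaffected by the change.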

### Absolute vs. Relative Paths

There are two ways to specify a file path.

- An absolute path, which always begins with the root folder
- A relative path, which is relative to the program’s current working directory

There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”
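As a short sketch of how these special names behave, the code below resolves the dot and dot-dot folders against the current working directory and then uses os.path.normpath() to collapse them inside a longer path. The folder and file names (eggs, bacon, spam.txt) are hypothetical and do not need to exist.

```python
import os

# '.' resolves to the current working directory
here = os.path.abspath('.')
# '..' resolves to the parent of the current working directory
parent = os.path.abspath('..')

# normpath() collapses '.' and '..' inside a longer path
messy = os.path.join('eggs', '.', 'bacon', '..', 'spam.txt')   # hypothetical names
clean = os.path.normpath(messy)

print(here)
print(parent)
print(clean)   # eggs, then the OS separator, then spam.txt
```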

### Creating New Folders with os.makedirs()

Your programs can create new folders (directories) with the os.makedirs() function. Enter the following into the interactive shell:

import os

os.makedirs('C:\\delicious\\walnut\\waffles')
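To try os.makedirs() without touching the root of a drive, here is a hedged sketch that builds the same folder chain inside a temporary scratch directory (an assumption for this example, not part of the original notes). It also shows the `exist_ok=True` argument, available since Python 3.2, which suppresses the error normally raised when the target folder already exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:       # hypothetical scratch location
    target = os.path.join(base, 'delicious', 'walnut', 'waffles')

    os.makedirs(target)                # creates delicious, walnut, and waffles
    created = os.path.isdir(target)

    # No error when the folder already exists and exist_ok=True is passed
    os.makedirs(target, exist_ok=True)

    # Without exist_ok, a second call raises FileExistsError
    try:
        os.makedirs(target)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)   # True True
```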

In [12]:
os.path.abspath('.')
os.path.isabs(os.path.abspath('.'))

True

In [11]:
# Handling absolute and relative paths
os.path.abspath('.')
os.path.abspath('.\\Scripts')
os.path.isabs(os.path.abspath('.'))

True

In [13]:
# If you need a path’s dir name and base name together, you can just call os.path.split() to get a tuple value 
# with these two strings
calcFilePath = 'C:\\Windows\\System32\\calc.exe'

In [14]:
calcFilePath

'C:\\Windows\\System32\\calc.exe'

In [15]:
os.path.split(calcFilePath)

('C:\\Windows\\System32', 'calc.exe')

In [16]:
# You can create the same tuple by calling os.path.dirname() and os.path.basename() and placing the value in a tuple
(os.path.dirname(calcFilePath), os.path.basename(calcFilePath))

('C:\\Windows\\System32', 'calc.exe')

In [17]:
# split() is a nice shortcut if you need both values
# os.path.split() = (os.path.dirname(), os.path.basename())

Also, note that os.path.split() does not take a file path and return a list of strings of each folder. For that, use the split() string method and split on the string in os.sep. Recall from earlier that the os.sep variable is set to the correct folder-separating slash for the computer running the program.

In [19]:
# The split() string method returns a list with each part of the path
calcFilePath.split(os.path.sep) # os.path.sep holds the same value as os.sep

['C:', 'Windows', 'System32', 'calc.exe']

### Finding File Sizes and Folder Contents

Once you have ways of handling file paths, you can then start gathering information about specific files and folders. The os.path module provides functions for finding the size of a file in bytes and the files and folders inside a given folder.


- Calling os.path.getsize(path) will return the size in bytes of the file in the path argument.
- Calling os.listdir(path) will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not os.path.)



In [21]:
# Confirm the current working directory before calling os.path.getsize()
os.getcwd()

'C:\\Users\\David Ly\\Documents\\Programming\\Python\\Automate_The_Boring_Stuff'

In [22]:
os.path.getsize('Chapter_8.ipynb') # 9028 bytes in size

9028

In [23]:
# Get list of file names in path (current working directory): os.listdir(path)
os.listdir(os.getcwd())

['.ipynb_checkpoints',
 'Chapter_1.ipynb',
 'Chapter_2.ipynb',
 'Chapter_3.ipynb',
 'Chapter_4.ipynb',
 'Chapter_5.ipynb',
 'Chapter_6.ipynb',
 'Chapter_7.ipynb',
 'Chapter_8.ipynb',
 'Projects']

In [33]:
# Find the total size of all entries in the current working directory
totalSize = 0
for filename in os.listdir(os.getcwd()):
    # Step 1: look up the size of this entry once
    size = os.path.getsize(filename)
    print(filename, ': ', size, ' bytes')
    # Step 2: add it to the running total
    totalSize += size

# Show total size
print('Total Directory File Size: ', totalSize)

.ipynb_checkpoints :  4096  bytes
Chapter_1.ipynb :  9358  bytes
Chapter_2.ipynb :  12339  bytes
Chapter_3.ipynb :  22547  bytes
Chapter_4.ipynb :  55143  bytes
Chapter_5.ipynb :  41900  bytes
Chapter_6.ipynb :  39535  bytes
Chapter_7.ipynb :  79345  bytes
Chapter_8.ipynb :  11406  bytes
Projects :  4096  bytes
Total Directory File Size:  279765
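Note that os.path.getsize() on a folder reports a size for the folder entry itself rather than the combined size of the files inside it, which is why .ipynb_checkpoints and Projects each show 4096 bytes above. One possible way to count the contents of subfolders as well is to walk the tree with os.walk(). The sketch below is one such approach, not the method used in these notes; the helper name total_file_size and the small demo tree are hypothetical.

```python
import os
import tempfile

def total_file_size(folder):
    """Return the combined size in bytes of every file under folder."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            # Build the full path; a bare filename only works in the cwd
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

# Build a small hypothetical folder tree to try it on
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'Projects'))
    with open(os.path.join(base, 'a.txt'), 'wb') as f:
        f.write(b'x' * 100)            # 100-byte file at the top level
    with open(os.path.join(base, 'Projects', 'b.txt'), 'wb') as f:
        f.write(b'y' * 250)            # 250-byte file in a subfolder
    demo_total = total_file_size(base)

print(demo_total)   # 350
```

Calling `total_file_size(os.getcwd())` would apply the same idea to the notebook folder listed above.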
