# File Handling
- A file has two parts: the filename and the path.
- **Root directory**--base directory at the top of the folder tree.  `C:\` on Windows; `/` on Linux and Mac.
- **Current working directory (CWD)**--active directory.  Relative paths are resolved from here.  Often this is the folder that contains the script or notebook we are running, but it depends on how the program was started, and we can change the CWD.
- **Absolute path**--path that starts with the root directory
- **Relative path**--path that starts with (written relative to) the current working directory
- When we work with files we normally write relative paths instead of absolute paths.  This allows the code to work on other users' computers, i.e. the code is more "portable".
- **Parent directory**--folder that contains the folder or file we are talking about.
- **`..\`** or **`../`** refers to the parent folder
- **`.\`** or **`./`** refers to the current working directory.  This is often omitted.
- Files can have the same name as long as they are in different folders
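
A minimal sketch of absolute vs. relative paths and `..`, using `pathlib`'s pure path classes (`PureWindowsPath`, `PurePosixPath`) and `posixpath.normpath()`.  Pure paths only manipulate strings and never touch the disk, so the results are the same on any operating system.  The example paths are made up for illustration.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

absolute_win = PureWindowsPath('C:/Users/me/notes/spam.txt')  # starts at the root of drive C:
absolute_posix = PurePosixPath('/home/me/notes/spam.txt')     # starts at the root /
relative = PurePosixPath('output/spam.txt')                   # interpreted relative to the CWD

print(absolute_win.is_absolute())    # True
print(absolute_posix.is_absolute())  # True
print(relative.is_absolute())        # False

# '..' means "go up to the parent folder" and '.' means "this folder"; normpath collapses both
print(posixpath.normpath('notes/../output/./spam.txt'))  # output/spam.txt

# .parent gives the folder that contains the file
print(absolute_posix.parent)  # /home/me/notes
```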

![](images/directories.jpg)

- **Home directory**--term for a user's folder.  Contains personal files.  This is `C:/Users/<USERNAME>` on Windows.  Even on a shared computer, each user has access to their own home directory.  This is often a good place to save files.
- Note that paths on Windows use a `\`, while paths on Linux and Mac use `/`.  This is a problem because:
    1. Paths are normally stored as strings, and in a normal string a `\` followed by certain characters creates an escape sequence (e.g. `\n` is a newline).
    1. We want our code to be portable across operating systems
- There are a couple of solutions if we want our code to work on ALL operating systems
    1. Always use forward slashes when writing Python code.  Python 3 on Windows allows us to use forward slashes.
    1. Use the `pathlib` module.  `pathlib` path objects change the slashes based on the operating system.  There may be some Windows applications that still require paths with backslashes.
- There are a couple of solutions that work ONLY on the WINDOWS operating system
    1. Escape each backslash with a second backslash: `'C:\\Users'`
    1. Use raw strings: `r'C:\Users'`
- Note that folder and file names are not case sensitive by default on Windows and Mac, but they are on Linux
- **Symlinks**--symbolic link.  Contains a text string that is automatically interpreted and followed by the operating system as a path to another file or directory. This other file or directory is called the "target".  A symlink is similar to a desktop shortcut because it points the OS to another folder or file.
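
A short sketch of the backslash problem and the fixes listed above.  `PureWindowsPath` is used so the Windows-style result can be checked on any operating system; the folder name `new_folder` is just an example, chosen because `\n` happens to be an escape sequence.

```python
from pathlib import PureWindowsPath

# In a normal string '\n' is an escape sequence, so this "path" secretly contains a newline
broken = 'C:\new_folder'
print('\n' in broken)  # True
print(len(broken))     # 12, because backslash + 'n' became a single newline character

escaped = 'C:\\new_folder'  # Windows-only fix 1: double the backslash
raw = r'C:\new_folder'      # Windows-only fix 2: raw string
print(escaped == raw)       # True

forward = 'C:/new_folder'   # portable fix: forward slashes
print(PureWindowsPath(forward))                          # C:\new_folder
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```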

- **Reading, writing, and file handling are common sources of error**
    1. Folder names and file names may be changed
    1. Folders or files may be moved
    1. Absolute paths normally break on another user's computer
    1. The CWD may not always be what we expect
    1. Certain functions raise errors if a folder or file already exists
    1. Certain functions raise errors if a folder or file does not exist yet
- For this reason, reading, writing, and file handling are often performed within `try`/`except` statements
- One example of a file path error occurred when the same script was run in an IDE and in the terminal.
    - The script used `./output/<FILENAME.EXT>`, which created a new file at a path relative to the CWD
    - When run in an IDE (like Jupyter Lab), the CWD was the script's parent folder, as expected.
    - When run in the terminal, the CWD was the home folder (the folder the terminal was in), which caused an error because no output folder existed within the home folder.
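
A minimal sketch of wrapping file handling in `try`/`except`, covering the "does not exist yet" and "already exists" cases from the list above.  It works inside a temporary folder (`tempfile.TemporaryDirectory()`) so nothing real is touched; the file and folder names are illustrative.

```python
import tempfile
from pathlib import Path

errors_caught = []

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Reading a file that does not exist raises FileNotFoundError
    try:
        text = (base / 'does_not_exist.txt').read_text()
    except FileNotFoundError as err:
        errors_caught.append(type(err).__name__)
        text = ''  # fall back to empty text instead of crashing

    # Creating a folder that already exists raises FileExistsError
    output = base / 'output'
    output.mkdir()
    try:
        output.mkdir()
    except FileExistsError as err:
        errors_caught.append(type(err).__name__)

print(errors_caught)  # ['FileNotFoundError', 'FileExistsError']
```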

- **It seems to me that BEST PRACTICE would be to:**
    1. Perform read, write, and file handling within `try`/`except` statements
    1. Use `pathlib` to create path objects so paths work on all operating systems
    1. Construct paths relative to the script being run, w/OUT relying on the CWD.  This can be done using `__file__`, which holds the file path of the script being run (wrap it as `Path(__file__).resolve()` to guarantee an absolute path).  From there we can access the parent folder of the script using `.parent` and specify other folders and files if needed.  Using `__file__` is a more reliable way to write a path relative to the script's location.
        - Twist: `__file__` is not defined in Jupyter Lab notebooks.  However, we always run our .ipynb notebook files from the Jupyter Lab IDE (never the terminal), so paths relative to the CWD (the notebook's folder) should work fine.
        - Alternatively, there will be times we want to prompt the user to input paths or filenames.  Inputting desired directories manually is another way to avoid error.
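
A sketch of building paths from the script's own folder, falling back to the CWD when `__file__` is not defined (as in a Jupyter notebook).  The helper name `get_base_dir` is hypothetical; `output/spam.txt` mirrors the paths used in these notes.

```python
from pathlib import Path

def get_base_dir() -> Path:
    """Folder to build paths from (hypothetical helper name)."""
    try:
        # __file__ is defined when running a .py script;
        # resolve() makes it absolute even if it was given as a relative path
        return Path(__file__).resolve().parent
    except NameError:
        # __file__ is not defined in a Jupyter notebook, so fall back to the CWD
        return Path.cwd()

BASE_DIR = get_base_dir()
SPAM_FILE = BASE_DIR / 'output' / 'spam.txt'  # same location no matter where the terminal is
print(SPAM_FILE)
```

Because the path no longer depends on where the terminal was opened, the IDE-vs-terminal problem described above goes away for scripts.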

- We'll cover a few different modules:
    1. `pathlib` module.  Newer module for file handling that should be used in place of most functions in the `os` module.
    1. `os` module.  Contains sub-module `os.path`.  Basic functions.
    1. `shutil` module.  Shell utility. Contains a few functions not found in the `pathlib` module like copy + paste.
    1. `send2trash` library.  Not in Python Standard Library.  Deletes folders and files by sending them to trash (not permanent).
    1. `zipfile` module.  Create and extract from zip files.

---

## Pathlib

- The `pathlib` module creates and works with path objects
- Path objects are like fancy strings that contain the path.  We can call methods on the path objects.
- `Path()` creates a `WindowsPath` (backslashes) on Windows and a `PosixPath` (forward slashes) on Unix-like operating systems (Mac + Linux)
- Path objects can be used in `os` and `shutil` module functions where the input is a path string
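
A quick check that `os` and `shutil` functions accept path objects directly, run in a temporary folder.  The file names are illustrative.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'output'
    os.makedirs(folder)                       # os function, Path argument
    (folder / 'spam.txt').write_text('spam')
    names = os.listdir(folder)                # os function, Path argument
    shutil.copy(folder / 'spam.txt', folder / 'eggs.txt')  # shutil accepts Path objects too
    names_after_copy = sorted(os.listdir(folder))
    size = os.path.getsize(folder / 'spam.txt')

print(names)             # ['spam.txt']
print(names_after_copy)  # ['eggs.txt', 'spam.txt']
print(size)              # 4
```

If a function really needs a plain string, `str(path_object)` (or `os.fspath(path_object)`) converts it.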

![](images/path_object.jpg)

- **Glob pattern**--a simplified form of regular expressions, often used in command-line commands and for matching file names

Glob | RegEx Equivalent | Use
--- | --- | ---
`*` | `.*` | Matches any number of any characters including none
`?` | `.` | Matches any single character
`[abc]` | Same | Matches any single character in brackets
`[a-c]` | Same | Matches any single character in the range (here `a`, `b`, or `c`)
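
A sketch of the glob rules in the table applied to plain strings with `fnmatch.fnmatchcase()` (case-sensitive on every OS), plus the regex equivalent of `*.txt`.  The list of names is made up.

```python
import fnmatch
import re

names = ['spam.txt', 'spam.zip', 'file1.txt', 'file10.txt', 'b.txt', 'd.txt']

txt_files = [n for n in names if fnmatch.fnmatchcase(n, '*.txt')]     # * = any characters
one_char = [n for n in names if fnmatch.fnmatchcase(n, 'file?.txt')]  # ? = exactly one character
in_range = [n for n in names if fnmatch.fnmatchcase(n, '[a-c].txt')]  # [a-c] = one of a, b, c

# The same '*.txt' filter as a regular expression: '*' becomes '.*' and the dot must be escaped
regex_txt = [n for n in names if re.fullmatch(r'.*\.txt', n)]

print(txt_files)               # ['spam.txt', 'file1.txt', 'file10.txt', 'b.txt', 'd.txt']
print(one_char)                # ['file1.txt']
print(in_range)                # ['b.txt']
print(regex_txt == txt_files)  # True
```

Note that in a glob pattern `.` is a literal dot, while in a regex it matches any character.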

Code | Use
--- | ---
`pathlib` | Module
`Path()` | Capital P.  Conventional to use `from pathlib import Path`.  Pass individual folder name(s) and file name, or a single path string.  Returns a path object with the correct slashes for the operating system.
`Path.home()` | Returns path object of home directory
`Path.cwd()` | Returns path object of current working directory
`.exists()` | Path object method.  Returns True or False based on whether the path actually exists on the computer.  Helps prevent errors when creating, copying, moving, and deleting folders and files.
`.mkdir()` | Path object method.  Creates a new directory from the path object.  By default creates only the last folder in the path and raises an error if a parent folder is missing; pass `parents=True` to create intermediate folders like `os.makedirs()`.  Raises an error if the folder already exists, unless `exist_ok=True` is passed.
`.glob()` |  Path object method.  Yields the contents of a folder that match a glob pattern (a generator, so loop over it or wrap it in `list()`).  Can show everything in a directory by using `'*'` as the glob pattern.  Format is `<PATH_OBJECT>.glob('<GLOB_PATTERN>')`.
`.write_text()` | Path object method.  Creates a new text file containing the specified text.  CAUTION! Overwrites the text if the file already exists.  Returns the number of characters written.  Closes the file automatically.  Less functionality than normal `open()`, `w` seen below in section *Reading and Writing Files*.
`.read_text()` | Path object method.  Returns a string of the full contents of a text file.  Closes the file automatically.  Less functionality than normal `open()`, `r` seen below in section *Reading and Writing Files*.  Raises an error if the file does not exist.
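
A sketch of the `.mkdir()` keyword arguments mentioned in the table (`parents=True`, `exist_ok=True`) together with `.write_text()` and `.read_text()`, run inside a temporary folder.  Passing `encoding='utf-8'` is an addition of mine; without it Python uses the platform's default text encoding.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'output' / 'spam' / 'eggs'
    # Plain nested.mkdir() would raise FileNotFoundError here,
    # because 'output' and 'spam' do not exist yet
    nested.mkdir(parents=True, exist_ok=True)  # creates all three folders
    nested.mkdir(parents=True, exist_ok=True)  # no FileExistsError the second time

    text_file = nested / 'beans.txt'
    chars_written = text_file.write_text('first version', encoding='utf-8')
    text_file.write_text('second', encoding='utf-8')  # overwrites, does not append
    contents = text_file.read_text(encoding='utf-8')

print(chars_written)  # 13
print(contents)       # second
```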

---

**EXAMPLES**

In [1]:
from pathlib import Path

**`Path()` and Attributes**

In [2]:
# Create path object with a single string with forward slashes

path_object = Path('folder1/folder2/filename.ext')
print(type(path_object))
print(path_object)

<class 'pathlib.WindowsPath'>
folder1\folder2\filename.ext


In [3]:
# Create path object with a raw string with backslashes

path_object = Path(r'folder1\folder2\filename.ext')
print(path_object)

folder1\folder2\filename.ext


In [4]:
# Create path object with a sequence of strings

path_object = Path('folder1', 'folder2', 'filename.ext')
print(path_object)

folder1\folder2\filename.ext


In [5]:
# Join paths

path_object = Path('folder1', 'folder2', 'filename.ext')

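# This fails with a TypeError.  'C:' / 'folder0' divides two strings;
# in each '/' at least one operand must be a path object.
# path_object = 'C:' / 'folder0' / path_object

# This works.  Include the slash: without it, 'C:' joined to a relative path
# gives the drive-relative path 'C:folder1\...' instead of an absolute path.
path_object = 'C:/' / path_object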
print(path_object)

C:\folder1\folder2\filename.ext
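
A sketch that reproduces the joining rules from the cell above with `PureWindowsPath`, so the Windows behavior can be checked on any operating system.  It also shows `.joinpath()`, the method form of `/`.

```python
from pathlib import PureWindowsPath

relative = PureWindowsPath('folder1', 'folder2', 'filename.ext')

# str / str is not defined, so the leftmost '/' fails before any path object is involved
try:
    bad = 'C:' / 'folder0'
    raised_type_error = False
except TypeError:
    raised_type_error = True

drive_relative = PureWindowsPath('C:') / relative  # no slash after the drive letter
absolute = PureWindowsPath('C:/') / relative       # slash after the drive letter
joined = PureWindowsPath('C:/').joinpath('folder0', 'filename.ext')

print(raised_type_error)             # True
print(drive_relative)                # C:folder1\folder2\filename.ext
print(drive_relative.is_absolute())  # False
print(absolute)                      # C:\folder1\folder2\filename.ext
print(joined)                        # C:\folder0\filename.ext
```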


In [6]:
print(f'Drive: {path_object.drive}')
print(f'Anchor: {path_object.anchor}')
print(f'Parent: {path_object.parent}')
print(f'Name: {path_object.name}')
print(f'Stem: {path_object.stem}')
print(f'Suffix: {path_object.suffix}')

Drive: C:
Anchor: C:\
Parent: C:\folder1\folder2
Name: filename.ext
Stem: filename
Suffix: .ext


**`home()`**

In [7]:
po_home = Path.home()
print(type(po_home))
print(po_home)

<class 'pathlib.WindowsPath'>
C:\Users\ChrisAttias


**`cwd()`**

In [8]:
po_cwd = Path.cwd()
print(po_cwd)

C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes


**`.glob()`**

In [9]:
po_cwd = Path.cwd()
glob_pattern = '*.ipynb'  
glob = po_cwd.glob(glob_pattern)
for thing in glob:
    print(thing)

C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\00_introduction.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\01_python_programming_fundamentals.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\02_references_and_copies.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\03_errors_and_debugging.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\04_input_validation.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\05_regular_expressions.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\06_numeric_data.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\07_data_science.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\08_data_visualization.ipynb
C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes\09_dates_and_times.
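
Because `.glob()` returns a generator, its results are often sorted or turned into a list.  A sketch in a temporary folder that also shows `.rglob()`, which searches subfolders recursively; the file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'sub').mkdir()
    for name in ['a.txt', 'b.txt', 'c.zip', 'sub/d.txt']:
        (base / name).write_text('spam')

    matches = base.glob('*.txt')  # a generator, not a list
    top_level = sorted(p.name for p in matches)
    recursive = sorted(p.relative_to(base).as_posix() for p in base.rglob('*.txt'))
    everything = sorted(p.name for p in base.glob('*'))

print(top_level)   # ['a.txt', 'b.txt']
print(recursive)   # ['a.txt', 'b.txt', 'sub/d.txt']
print(everything)  # ['a.txt', 'b.txt', 'c.zip', 'sub']
```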

**`.exists()`**

In [10]:
po_cwd = Path.cwd()
print(po_cwd.exists())

path_object = Path('folder1', 'folder2', 'filename.ext')
print(path_object.exists())

True
False


**`.mkdir()` or `os.makedirs()`**

In [11]:
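Path('./output/spam').mkdir(parents=True, exist_ok=True)  # Also create 'output' if missing; no error if 'spam' exists
# os.makedirs('./output/spam', exist_ok=True)  # Equivalent using os.makedirs()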

**`.write_text()`**

In [12]:
po_spam_text_file = Path('output/spam/beans.txt')
po_spam_text_file.write_text("This text is overwriting previous text.")

39

**`.read_text()`**

In [13]:
po_spam_text_file.read_text()

'This text is overwriting previous text.'

---

## Os

Code | Use
--- | ---
`os` | Module
`os.listdir()` | Returns a list of the names of the files and folders within the specified folder
`os.chdir()` | Changes the current working directory to the specified path.  Raises an error if the path does not exist.
`os.makedirs()` | Creates a new folder, along with all intermediate folders in the path that do not already exist.  Raises an error if the folder already exists, unless `exist_ok=True` is passed.
`os.rmdir()` | Stands for remove directory.  CAUTION! Permanent deletion. Does NOT send to recycle bin.  Raises an error if the folder does not exist or is not empty.
`os.remove()` | Deletes a file.  CAUTION! Permanent deletion. Does NOT send to recycle bin.  Raises an error if the file does not exist.
`os.unlink()` | Same as `os.remove()`.  "Unlink" is the traditional Unix name for deleting a file.
`os.walk()` | Walks through a folder tree.  For each folder it yields three values: the path of the folder it is currently looking at, a list of subfolders within that folder, and a list of filenames within that folder.  It then "walks" down into each subfolder, makes that the current folder, and repeats the process.  Results are used in a for loop.  A good way to go through each subfolder and file: print names out, or make edits with conditional statements.
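
A sketch of several functions from the table, run in a temporary folder: `os.makedirs()` with `exist_ok=True`, `os.listdir()` returning both files and folders, `os.rmdir()` refusing a non-empty folder, and the tuples `os.walk()` produces.  Folder and file names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    output = os.path.join(tmp, 'output')
    os.makedirs(os.path.join(output, 'spam'))                 # creates 'output' and 'spam'
    os.makedirs(os.path.join(output, 'spam'), exist_ok=True)  # no error the second time
    with open(os.path.join(output, 'notes.txt'), 'w') as f:
        f.write('spam')

    listing = sorted(os.listdir(output))  # names of files AND folders

    rmdir_failed = False
    try:
        os.rmdir(output)  # 'output' is not empty, so this raises OSError
    except OSError:
        rmdir_failed = True

    walked = []
    for folder_name, sub_folders, file_names in os.walk(output):
        walked.append((os.path.basename(folder_name), sorted(sub_folders), sorted(file_names)))

print(listing)       # ['notes.txt', 'spam']
print(rmdir_failed)  # True
print(walked)        # [('output', ['spam'], ['notes.txt']), ('spam', [], [])]
```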

---

**EXAMPLES**

In [14]:
import os

**`chdir()`**

In [15]:
po_cwd = Path.cwd()
os.chdir('./')  # Stay in same folder
os.chdir('./images')  # Change cwd with relative file path. './' optional.
os.chdir('../')  # Change to parent folder
os.chdir('C:/')  # Change to the root of the C drive using an absolute path (Windows only)
os.chdir(po_cwd)  # Change back to original cwd
print(Path.cwd())

C:\Users\ChrisAttias\OneDrive\ChrisOneDrive\Python\python_language_notes
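
Changing the CWD and forgetting to change it back is an easy source of the CWD surprises mentioned earlier.  A sketch of a small context manager that always restores the original CWD, even if an error occurs; the name `working_directory` is hypothetical.  Python 3.11+ also ships `contextlib.chdir()`, which does the same job.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily change the CWD, then always change it back (hypothetical helper)."""
    original = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(original)  # runs even if the code inside the with block raises an error

start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside_ok = Path.cwd().samefile(tmp)  # CWD is now the temporary folder
    after = Path.cwd()

print(inside_ok)       # True
print(after == start)  # True
```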


**`walk()`**

In [16]:
for folder_name, sub_folders, file_names in os.walk(Path('./output')):
    print(f'The folder name is {folder_name}.')
    print(f'The subfolders in {folder_name} are {str(sub_folders)}.')
    print(f'The filenames in {folder_name} are {str(file_names)}.')

The folder name is output.
The subfolders in output are ['spam', 'spam_extra_folder'].
The filenames in output are ['log_records.txt', 'spam.txt', 'spam.zip'].
The folder name is output\spam.
The subfolders in output\spam are [].
The filenames in output\spam are ['beans.txt'].
The folder name is output\spam_extra_folder.
The subfolders in output\spam_extra_folder are [].
The filenames in output\spam_extra_folder are ['spam.txt'].


---

## Shutil

Code | Use
--- | ---
`shutil` | Module
`shutil.copy()` | Copy + paste a file to the desired folder.  Arguments are `shutil.copy(<SOURCE_PATH>/<FILENAME.EXT>, <DESTINATION_PATH>)`.  Optionally, change the copy's file name by including a new file name in the destination path.  Raises an error if the file does not exist.  CAUTION! If there is a file with the same name in the new location, it is overwritten.
`shutil.move()` | Very similar to `shutil.copy()`, but moves the file (w/OUT copying) to the desired folder.  Additional CAUTION! If `move()` can't find the last folder in the destination path, it assumes that last folder is actually a new file name and renames the moved file.  Lastly, there is no "rename" function in `shutil`, but moving a file to the same folder with a new file name "renames" it.
`shutil.copytree()` | Copy + paste a tree (folder and all folders and files within it) to the desired location.  Raises an error if the destination folder already exists (Python 3.8+ allows `dirs_exist_ok=True` to copy into an existing folder).
`shutil.rmtree()` | Deletes an entire tree (folder and all folders and files within it).  CAUTION! Permanent deletion. Does NOT send to recycle bin.  Raises an error if the folder does not exist.
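
A sketch of the four `shutil` functions in one sequence, mirroring the examples below but inside a temporary folder so the permanent `rmtree()` deletion is harmless.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    spam = Path(tmp) / 'spam'
    spam.mkdir()
    (spam / 'beans.txt').write_text('beans')

    shutil.move(spam / 'beans.txt', spam / 'spam.txt')       # "rename" by moving within the same folder
    shutil.copy(spam / 'spam.txt', spam / 'extra_spam.txt')  # copy under a new name
    spam_names = sorted(p.name for p in spam.iterdir())

    backup = Path(tmp) / 'spam_backup'
    shutil.copytree(spam, backup)  # destination must not exist yet
    backup_names = sorted(p.name for p in backup.iterdir())

    shutil.rmtree(backup)  # permanent delete, no recycle bin
    backup_exists = backup.exists()

print(spam_names)     # ['extra_spam.txt', 'spam.txt']
print(backup_names)   # ['extra_spam.txt', 'spam.txt']
print(backup_exists)  # False
```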

---

**EXAMPLES**

In [17]:
import shutil

**`move()` to Rename**

In [18]:
shutil.move('./output/spam/beans.txt', './output/spam/spam.txt')

'./output/spam/spam.txt'

**`copy()`**

In [19]:
shutil.copy('./output/spam/spam.txt', './output/spam/extra_spam.txt')

'./output/spam/extra_spam.txt'

**`copytree()`**

In [20]:
shutil.copytree('./output/spam', './output/spam_backup')

'./output/spam_backup'

---

## Send2trash

Code | Use
--- | ---
`send2trash` | Third-party module (not in the Python Standard Library).  Install with `pip install send2trash`.
`send2trash.send2trash()` | Deletes files, folders, and trees by sending them to the Recycle Bin (Trash on Mac).  These can be restored if needed.  Raises an error if the folder or file does not exist.

---

**EXAMPLES**

In [21]:
from send2trash import send2trash

**`send2trash()`**

In [22]:
send2trash('./output/spam/spam.txt')  # Delete 1 file
send2trash(['./output/spam_backup/spam.txt', './output/spam_backup/extra_spam.txt'])  # Delete list of files
send2trash('./output/spam_backup')  # Delete empty folder
send2trash('./output/spam')  # Delete folder with file inside
print('No more spam : (')

No more spam : (


---

## Zipfile
- Python scripts can create and open (extract) ZIP files using functions in the `zipfile` module

Code | Use
--- | ---
`zipfile` | Module
`zipfile.ZipFile()` | Create zip object, similar to a file handle/file object.  First argument is the zip file path, which accepts a string or a path object, e.g. `<PATH>/<FILENAME>.zip`.  Second argument is the mode (`r`, `w`, `a`, or `x`).
`r` | Read mode.  Default mode.  Allows the contents of an existing zip file to be listed and extracted.
`w` | Write mode.  Creates a new zip file, or overwrites an existing one, so files can be written to it.  CAUTION! Overwrites any current contents of the zip file.
`a` | Append mode.  Allows files to be added to an existing zip file w/OUT overwriting its current contents.
`x` | Exclusive create mode.  Creates a new zip file to write to.  Raises an error if the zip file already exists.
`.namelist()` | Zip object method.  Returns list of compressed folders and files within zip file.
`.getinfo()` | Zip object method.  Returns ZipInfo object.  Argument is folder or file name within zip file.
`.file_size` | ZipInfo attribute.  Returns folder or file size when uncompressed.
`.compress_size` | ZipInfo attribute.  Returns folder or file size when compressed within zip file.
`.extractall()` | Zip object method.  Extract all folders and files from zip file.  Argument is path where the extracted files will be placed.
`.extract()` | Zip object method.  Extract single folder or file from zip file.  First argument is the filename we want to extract.  Second argument is path where the extracted file will be placed.
`.write()` | Zip object method.  Adds the specified file to the zip file, compressing it.  The first argument is the file path.  Any folders in that path become part of the name stored inside the zip, so either pass only the filename (after changing directory to the file's parent folder) or pass the name to store as the `arcname` argument.  Pass the compression method as the keyword argument `compress_type`; if unsure, use `compress_type=zipfile.ZIP_DEFLATED`.
`.close()` | Zip object method.  Close zip object.
`with zipfile.ZipFile() as <ZO>` | *Context Manager* combines open and close.  It automatically closes the file when we are done with it.  It knows we are done when we de-indent.  It also closes a file correctly if there is an exception at some point in the file handling.  Using the context manager is recommended.
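
A sketch combining the context manager with the `arcname` argument of `.write()`, so the CWD never has to change, and also showing `a` append mode.  Everything happens in a temporary folder; the file names and text mirror the examples below, and `eggs.txt` is an extra illustrative file.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / 'output'
    output.mkdir()
    (output / 'spam.txt').write_text('This text is overwriting previous text')
    (output / 'eggs.txt').write_text('eggs')
    zip_path = output / 'spam.zip'

    # 'w' mode: create (or overwrite) the zip; arcname is the name stored inside it
    with zipfile.ZipFile(zip_path, 'w') as zo:
        zo.write(output / 'spam.txt', arcname='spam.txt', compress_type=zipfile.ZIP_DEFLATED)

    # 'a' mode: add another file without erasing what is already there
    with zipfile.ZipFile(zip_path, 'a') as zo:
        zo.write(output / 'eggs.txt', arcname='eggs.txt', compress_type=zipfile.ZIP_DEFLATED)

    # 'r' mode (the default): inspect and extract
    with zipfile.ZipFile(zip_path) as zo:
        names = zo.namelist()
        size = zo.getinfo('spam.txt').file_size
        zo.extractall(output / 'extracted')
    extracted = (output / 'extracted' / 'spam.txt').read_text()

print(names)  # ['spam.txt', 'eggs.txt']
print(size)   # 38
```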

---

**EXAMPLES**

In [23]:
import zipfile

**`ZipFile()` `w` Overwrite Mode**

In [24]:
# Change CWD so only the filename (no folders) is stored in the zip
os.chdir('./output')

# Create text file we want to zip
po_spam_text_file = Path('spam.txt')
po_spam_text_file.write_text("This text is overwriting previous text")

# Create zip object
zip_object = zipfile.ZipFile('spam.zip', 'w')

# Do NOT include a path for the file to be compressed
# Those path folders become part of zip
# Only include filename
zip_object.write('spam.txt', compress_type=zipfile.ZIP_DEFLATED)

# Close zip object
zip_object.close()

# Change CWD back to original location
os.chdir('../')

**`ZipFile()` `r` Read Mode**

In [25]:
# Create zip object
po_zip = Path('./output/spam.zip')
zip_object = zipfile.ZipFile(po_zip, 'r')

# Get info about files within zip
print(zip_object.namelist())
zip_info_object = zip_object.getinfo('spam.txt')
print(zip_info_object.file_size)
print(zip_info_object.compress_size)

# Create new folder and extract compressed files to new folder
zip_object.extractall('./output/spam_extra_folder')  

# Close zip object
zip_object.close()

['spam.txt']
38
34


---