# <center>LECTURE OVERVIEW </center>

---

## By the end of the lecture, you'll be able to:
- interact with your system using the `os` module
- interact with your system using the `shutil` module
- interact with your system using the `pathlib` module

NOTE: there won't be exercises during this lecture

# <center>SYSTEM INTERACTION</center>

---

**<center>Q: How many people know how to use a command line interface?</center>**

Often, you'll be doing tasks that depend on the operating system (OS). For instance, you may need to navigate your file system or create, delete, and modify files and folders. Before we dive into how Python can automate these OS tasks, a quick primer.

# OS Primer

A **shell**, also called a **command line interface**, is a computer program that lets you control your computer using commands typed at the keyboard rather than through a graphical user interface (GUI) with a mouse/keyboard combination. Each shell is a bit different depending on your operating system. For instance, the commands used in Unix-based shells (e.g., on macOS and Linux) differ from the commands used in Windows shells such as PowerShell, with some slight similarities.

At a high level, here are some useful commands:

| Unix | Windows (PowerShell) | Description |
|:------:|:-------:|-------------|
| `pwd` | `Get-Location` | return the current working directory |
| `cd` | `Set-Location` | change directory |
| `ls` | `Get-ChildItem` | list directory contents |
| `mkdir` | `New-Item -ItemType Directory` | make directory |
| `rm` | `Remove-Item` | remove directory entries |
| `cp` | `Copy-Item` | copy files |
| `man` | `Get-Help` | manual pages |

Conveniently, PowerShell has aliases that map to most of the Unix commands.

**<center>DEMO TERMINAL</center>**
    
[Here](https://devhints.io/bash) is a Unix Bash shell cheat sheet.

To learn more about the Unix Bash shell, check [this](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html) out.

[Here](http://ramblingcookiemonster.github.io/images/Cheat-Sheets/powershell-basic-cheat-sheet2.pdf) is a Windows PowerShell cheat sheet.

To learn more about Windows PowerShell, check [this](https://docs.microsoft.com/en-us/powershell) out.

# The `os` Module

## <font color='LIGHTGRAY'>By the end of the lecture, you'll be able to:</font>
- **interact with your system using the `os` module**
- <font color='LIGHTGRAY'>interact with your system using the shutil module</font>
- <font color='LIGHTGRAY'>interact with your system using the pathlib module</font>

Python exposes many OS-dependent features through its built-in `os` module. The module wraps the operations discussed above behind a single interface, so largely the same Python code can run on different platforms.

In [2]:
import os

## Some Basic Functions 

To demonstrate how vast this module is, let's take a look at the names it defines:

In [3]:
print(dir(os))

['CLD_CONTINUED', 'CLD_DUMPED', 'CLD_EXITED', 'CLD_TRAPPED', 'DirEntry', 'EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_LOCK', 'F_OK', 'F_TEST', 'F_TLOCK', 'F_ULOCK', 'MutableMapping', 'NGROUPS_MAX', 'O_ACCMODE', 'O_APPEND', 'O_ASYNC', 'O_CLOEXEC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'POSIX_SPAWN_CLOSE', 'POSIX_SPAWN_DUP2', 'POSIX_SPAWN_OPEN', 'PRIO_PGRP', 'PRIO_PROCESS', 'PRIO_USER', 'P_ALL', 'P_NOWAIT', 'P_NOWAITO', 'P_PGID', 'P_PID', 'P_WAIT', 'PathLike', 'RTLD_GLOBAL', 'RTLD_LAZY', 'RTLD_LOCAL', 'RTLD_NODELETE', 'RTLD_NOLOAD', 'RTLD_NOW', 'R_OK', 'SCHED_FIFO', 'SCHED_OTHER', 'SCHED_RR', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'ST_NOSUID', 'ST_RDONLY', 'TMP_MAX', 'WCONTINUED', 

As you can see, there are tons of methods for interacting with the OS but we will only cover a fraction of them. 

For instance, we can use the 
```python
os.getcwd()
```
method to retrieve the path of the current working directory:

In [4]:
os.getcwd()

'/Users/mslivins/Projects/workshop-python-2021/week_3'

## List Folders and Files

To list the folders and files in the current directory, you can use the 
```python 
os.listdir()
```
method:

In [5]:
os.listdir()

['day_1_lab.ipynb',
 'day_4_assets',
 '.DS_Store',
 'day_3_lecture.ipynb',
 'day_4_lecture.ipynb',
 'day_2_assets',
 'day_3_lab.ipynb',
 'day_1_lecture.ipynb',
 'day_2_lecture.ipynb',
 '.ipynb_checkpoints',
 'day_2_lab.ipynb',
 'day_1_assets',
 'day_3_assets',
 'day_4_lab.ipynb']
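
Note that `os.listdir()` returns bare names in arbitrary order and does not say which entries are directories. As a minimal sketch, the hypothetical helper `split_entries()` below uses `os.scandir()` to separate the two. It runs in a throwaway temporary directory, and the file and folder names are made up for the demo:

```python
import os
import tempfile

def split_entries(directory):
    """Return (dirs, files) as sorted lists of names found directly in `directory`."""
    dirs, files = [], []
    with os.scandir(directory) as entries:   # scandir yields os.DirEntry objects
        for entry in entries:
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)

# Demo in a temporary directory so nothing in the workshop folder is touched
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'day_1_assets'))
    open(os.path.join(tmp, 'day_1_lab.ipynb'), 'w').close()
    demo_dirs, demo_files = split_entries(tmp)

print(demo_dirs)
print(demo_files)
```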

The 
```python
os.walk(start_path, ...)
```
method generates the file names in a directory tree by walking the tree either top-down or bottom-up, from or to the `start_path`. 

For each directory in the tree rooted at `start_path`, it yields a 3-tuple `(dirpath, dirnames, filenames)`. 

For example, if we pass in the current working directory to `os.walk()` we get:

In [6]:
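for dirpath, dirnames, filenames in os.walk(os.getcwd()):
    if ('.ipynb_checkpoints' not in dirpath):
        print(dirpath)      # path of the directory being visited
        print(dirnames)     # list of subdirectory names in dirpath
        print(filenames)    # list of file names in dirpath
        print()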

/Users/mslivins/Projects/workshop-python-2021/week_3
['day_4_assets', 'day_2_assets', '.ipynb_checkpoints', 'day_1_assets', 'day_3_assets']
['day_1_lab.ipynb', '.DS_Store', 'day_3_lecture.ipynb', 'day_4_lecture.ipynb', 'day_3_lab.ipynb', 'day_1_lecture.ipynb', 'day_2_lecture.ipynb', 'day_2_lab.ipynb', 'day_4_lab.ipynb']

/Users/mslivins/Projects/workshop-python-2021/week_3/day_4_assets
['event_tables', 'lab', 'recipes', '.ipynb_checkpoints']
['.DS_Store', 'event_tables.zip', 'def_sqrt.pkl']

/Users/mslivins/Projects/workshop-python-2021/week_3/day_4_assets/event_tables
['.ipynb_checkpoints']
['bbq_event_2.csv', 'bbq_event_1.csv']

/Users/mslivins/Projects/workshop-python-2021/week_3/day_4_assets/lab
[]
['output.pkl', 'bbq_date_str.pkl']

/Users/mslivins/Projects/workshop-python-2021/week_3/day_4_assets/recipes
['.ipynb_checkpoints']
['pulled_pork_recipe.txt', 'cornbread_recipe.txt', 'smoked_mac_and_cheese_recipe.txt']

/Users/mslivins/Projects/workshop-python-2021/week_3/day_2_assets
[
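
Filtering on `dirpath` hides the checkpoint folders from the printout, but `os.walk()` still descends into them. When walking top-down (the default), you can edit the `dirnames` list in place to stop the walk from entering a subdirectory at all. Here is a small sketch of that idea; `walk_visible()` and the folder names are hypothetical and the demo uses a temporary directory:

```python
import os
import tempfile

def walk_visible(start_path, skip=('.ipynb_checkpoints',)):
    """Yield (dirpath, dirnames, filenames), never descending into `skip` dirs."""
    for dirpath, dirnames, filenames in os.walk(start_path):   # top-down by default
        # Editing dirnames in place tells os.walk which subdirectories to visit
        dirnames[:] = [d for d in dirnames if d not in skip]
        yield dirpath, dirnames, filenames

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'recipes', '.ipynb_checkpoints'))
    os.makedirs(os.path.join(tmp, 'lab'))
    visited = sorted(os.path.relpath(dirpath, tmp) for dirpath, _, _ in walk_visible(tmp))

print(visited)
```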

To get the entire tree structure of this folder, let's write a function, `list_files()`, that uses the `os.walk()` method to iterate over all the files in each folder of the current directory:

In [None]:
def list_files(start_path):
    for root, dirs, files in os.walk(start_path):
        if '.ipynb_checkpoints' not in root:                         # skip Jupyter checkpoint directories
            level = root.replace(start_path, '').count(os.sep)       # depth = number of path separators (os.sep, e.g., '/') below start_path
            indent = ' ' * 4 * (level)                               # create indentation based on path level
            print('{}{}/'.format(indent, os.path.basename(root)))    # os.path.basename(path) returns the last component of the path
            subindent = ' ' * 4 * (level + 1)                        # create sub-indentation based on next level
            for file in files:
                print('{}{}'.format(subindent, file))

We will then call this function using the current working directory path:

In [None]:
start_path = os.getcwd()
list_files(start_path)

## Change Working Directory

Let's navigate into the `day_3_assets` directory and see what's in there using the 
```python
os.chdir(path)
```
method:

In [None]:
os.chdir('day_3_assets')

In [None]:
list_files(os.getcwd())

## Creating Directories

Now, let's create a new directory called `vender_flyers` in this directory using the 
```python
os.mkdir(path, ...)
```
method:

In [None]:
os.mkdir('vender_flyers')

In [None]:
list_files(os.getcwd())

## **<font color='ORANGE'>Caution</font>**

Let's try to create a nested directory with `os.mkdir()`:

In [None]:
os.mkdir('past_bbq_events/2020')

We get a `FileNotFoundError`! `os.mkdir()` creates only the last directory in the path, so it expects the parent directory, `past_bbq_events`, to already exist. Since `past_bbq_events` does not exist, it raises a `FileNotFoundError`.

For situations like this, you will want to use the 
```python
os.makedirs(path, ...)
```
method instead, which creates multiple directories recursively:

In [None]:
os.makedirs('past_bbq_events/2020')

In [None]:
list_files(os.getcwd())
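
One thing to watch for: calling `os.makedirs()` a second time on the same path raises a `FileExistsError` unless you pass `exist_ok=True`. The sketch below illustrates this in a temporary directory so it does not touch the workshop folders:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'past_bbq_events', '2020')
    os.makedirs(target)                  # creates both levels
    try:
        os.makedirs(target)              # second call: the leaf already exists
        second_call = 'ok'
    except FileExistsError:
        second_call = 'FileExistsError'
    os.makedirs(target, exist_ok=True)   # no error when the directory is already there
    created = os.path.isdir(target)

print(second_call, created)
```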

## Removing Directories

To remove a directory we created, use the 
```python
os.rmdir(path, ...)
```
function:

In [None]:
os.rmdir('vender_flyers')

In [None]:
list_files(os.getcwd())

## **<font color='ORANGE'>Caution</font>**

Let's try and delete the nested directory structure of `past_bbq_events` and `2020` using `os.rmdir()`:

In [None]:
os.rmdir('past_bbq_events')

Another error! This time it raises an `OSError` saying that `past_bbq_events` is not empty, which is correct because it has `2020` underneath it. With the `os.rmdir()` method, it is not possible to remove a non-empty directory (similar to the Unix command-line version).

Just like the `os.makedirs()` method, let's try 
```python
os.removedirs(path)
```
, which removes the leaf directory and then removes each parent directory that is left empty, working upward:

In [None]:
os.removedirs('past_bbq_events/2020')

In [None]:
list_files(os.getcwd())
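
To make the behavior of `os.removedirs()` concrete, here is a small sketch in a temporary directory (the `2019` folder and `notes.txt` file are made up for the demo). It shows that the upward pruning stops quietly at the first parent that is not empty, while a non-empty leaf still raises an `OSError`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'past_bbq_events', '2020'))
    os.makedirs(os.path.join(tmp, 'past_bbq_events', '2019'))
    open(os.path.join(tmp, 'past_bbq_events', '2019', 'notes.txt'), 'w').close()

    # Removes '2020', then tries 'past_bbq_events', which still holds '2019',
    # so pruning stops there without raising an error.
    os.removedirs(os.path.join(tmp, 'past_bbq_events', '2020'))
    remaining = sorted(os.listdir(os.path.join(tmp, 'past_bbq_events')))

    # The leaf itself must be empty: '2019' contains a file, so this raises OSError
    try:
        os.removedirs(os.path.join(tmp, 'past_bbq_events', '2019'))
        leaf_result = 'removed'
    except OSError:
        leaf_result = 'OSError'

print(remaining, leaf_result)
```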

## Example with Data Processing

So far, we have explored how to view, create, and remove a nested directory structure. Now let's see an example of how the `os` module helps with data processing.

For that, let's dive into the `event_tables` directory:

In [None]:
os.chdir('event_tables')

In [None]:
list_files(os.getcwd())

Let's merge the data from all of the events into a single CSV file:

In [None]:
import csv

# create a list to hold the data for each entry
entry_lst = []

# loop through all CSV files and add their rows to the list
for root, dirs, files in os.walk(os.getcwd()):
    if '.ipynb_checkpoints' not in root:
        for file in files:
            if file.endswith('.csv'):
                # join with root so files in subdirectories open correctly
                with open(os.path.join(root, file), newline='') as f:
                    entry_lst += list(csv.DictReader(f))

# create a new CSV file with merged data
with open('bbq_event_merged.csv', 'w', newline='') as bbq_file:
    fieldnames = list(entry_lst[0].keys())
    bbq_writer = csv.DictWriter(bbq_file, fieldnames=fieldnames)
    
    bbq_writer.writeheader()
    for entry in entry_lst:
        bbq_writer.writerow(dict(entry))

In [None]:
list_files(os.getcwd())

Now, we will reset the current directory to the root week directory along with some misc. cleanup:

In [None]:
os.remove("bbq_event_merged.csv")
os.chdir('../..')
print(os.getcwd())

The `os` module has a lot more to offer. To learn more, check out the [documentation](https://docs.python.org/3/library/os.html).

# The `shutil` Module

## <font color='LIGHTGRAY'>By the end of the lecture, you'll be able to:</font>
- <font color='LIGHTGRAY'>interact with your system using the os module</font>
- **interact with your system using the `shutil` module**
- <font color='LIGHTGRAY'>interact with your system using the pathlib module</font>

The `shutil` module lets developers copy and remove files and directories without working with file objects directly. For everyday file and directory management, it provides a higher-level interface that is easier to use than the `os` module for certain tasks.

In [None]:
import shutil

## Copying a File

Using the 
```python
shutil.copyfile(source, destination, ...)
```
function, you can easily copy an existing file, `source`, to a new file, `destination`, in the current directory:

In [None]:
os.chdir('day_3_assets')

print('BEFORE:')
list_files(os.getcwd())

shutil.copyfile('bbq_notification.txt', 'bbq_notification_copy.txt')

print()
print('AFTER:')
list_files(os.getcwd())

## Copying Files to Another Directory

Using the 
```python
shutil.copy(source, destination, ...)
```
function, we can easily copy a file to another directory:

In [None]:
os.mkdir('backups')

print('BEFORE:')
list_files(os.getcwd())

shutil.copy('bbq_notification_copy.txt', 'backups')

print()
print('AFTER:')
list_files(os.getcwd())
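
The `destination` argument of `shutil.copy()` can be either a directory or a full file path, and the function returns the path of the new file. A minimal sketch, run in a temporary directory with made-up file contents:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'bbq_notification.txt')
    with open(source, 'w') as f:
        f.write('BBQ this Saturday!\n')
    backups = os.path.join(tmp, 'backups')
    os.mkdir(backups)

    # Destination is a directory: the file keeps its name
    same_name = shutil.copy(source, backups)
    # Destination is a file path: the copy gets the new name
    renamed = shutil.copy(source, os.path.join(backups, 'renamed.txt'))

    copied_names = sorted(os.listdir(backups))
    same_name_base = os.path.basename(same_name)

print(copied_names)
```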

## Copying Files with Metadata

If you need to clone a file along with its permissions and as much of its metadata as possible, you can use the
```python
shutil.copy2(source, destination, ...)
```
function.

**<center>NOTE: this might not completely work on all file systems.</center>**

Here, we create a function, `file_metadata()`, to display a file's metadata, then copy a file using `shutil.copy2()`, where **only the mode and modified time of the file are preserved**:

In [None]:
import time

def file_metadata(file_name):
    stat_info = os.stat(file_name)
    print('  Mode    :', oct(stat_info.st_mode))
    print('  Created :', time.ctime(stat_info.st_ctime))   # on Unix, st_ctime is the last metadata change, not creation
    print('  Accessed:', time.ctime(stat_info.st_atime))
    print('  Modified:', time.ctime(stat_info.st_mtime))
    
print("shutil.copy2()")
print("--------------")
print('SOURCE FILE:')
file_metadata('bbq_notification_copy.txt')

shutil.copy2('bbq_notification_copy.txt', 'backups')

print('\nDESTINATION FILE:')
file_metadata('backups/bbq_notification_copy.txt')

print("\nshutil.copy()")
print("--------------")
print('SOURCE FILE:')
file_metadata('bbq_notification_copy.txt')

shutil.copy('bbq_notification_copy.txt', 'backups')

print()
print('DESTINATION FILE:')
file_metadata('backups/bbq_notification_copy.txt')

On certain systems, the Created and Accessed times would match exactly.
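
Because a freshly created file has nearly the same timestamps as its copy, the difference between `shutil.copy()` and `shutil.copy2()` can be hard to see. The sketch below makes it visible by first setting the source file's modified time to an old value with `os.utime()`. The timestamp `OLD_TIME` and the file names are arbitrary choices for the demo, and the result may vary on file systems that do not store these timestamps:

```python
import os
import shutil
import tempfile

OLD_TIME = 1_000_000_000   # an arbitrary timestamp in 2001, chosen for the demo

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'source.txt')
    with open(source, 'w') as f:
        f.write('hello\n')
    os.utime(source, (OLD_TIME, OLD_TIME))        # set (atime, mtime) on the source

    plain = shutil.copy(source, os.path.join(tmp, 'copy.txt'))
    clone = shutil.copy2(source, os.path.join(tmp, 'copy2.txt'))

    plain_mtime = int(os.stat(plain).st_mtime)
    clone_mtime = int(os.stat(clone).st_mtime)

print('copy  kept mtime:', plain_mtime == OLD_TIME)
print('copy2 kept mtime:', clone_mtime == OLD_TIME)
```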

## Replicating a Complete Directory

With the 
```python
shutil.copytree(source, destination, ...)
```
function, you can copy an entire directory tree recursively. In other words, any directories nested inside the source directory are cloned as well:

In [None]:
print("BEFORE:")
list_files(os.getcwd())

shutil.copytree('backups', 'backups_backups')

print("AFTER:")
list_files(os.getcwd())

## **<font color='ORANGE'>Caution</font>**

The new directory **must not exist** before running this command. Otherwise, you will get a `FileExistsError`:

In [None]:
shutil.copytree('backups', 'backups_backups')
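
If you do want to copy into a directory that already exists, Python 3.8 and later accept a `dirs_exist_ok=True` argument, which merges the source tree into the destination instead of raising. A minimal sketch in a temporary directory, with hypothetical file names `a.txt` and `b.txt`:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'backups')
    target = os.path.join(tmp, 'backups_backups')
    os.mkdir(source)
    open(os.path.join(source, 'a.txt'), 'w').close()

    shutil.copytree(source, target)                       # first copy creates target
    open(os.path.join(source, 'b.txt'), 'w').close()      # add a file to the source
    shutil.copytree(source, target, dirs_exist_ok=True)   # Python 3.8+: merge into target

    merged = sorted(os.listdir(target))

print(merged)
```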

## Removing a Directory

You can remove a directory using the
```python
shutil.rmtree(path, ...)
```
function. It deletes the directory together with everything inside it, so there's no need to remove the contents recursively yourself:

In [None]:
print("BEFORE:")
list_files(os.getcwd())

shutil.rmtree('backups_backups')

print()
print("AFTER:")
list_files(os.getcwd())
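
Since `shutil.rmtree()` deletes everything under the path without asking, it is worth knowing how it behaves when the path is missing. The sketch below, run in a temporary directory with made-up names, shows that a second call raises `FileNotFoundError` unless `ignore_errors=True` is passed:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'backups_backups')
    os.makedirs(os.path.join(target, 'nested'))
    open(os.path.join(target, 'nested', 'file.txt'), 'w').close()

    shutil.rmtree(target)                       # removes the directory and all contents
    gone = not os.path.exists(target)

    try:
        shutil.rmtree(target)                   # already deleted
        second = 'ok'
    except FileNotFoundError:
        second = 'FileNotFoundError'

    shutil.rmtree(target, ignore_errors=True)   # missing path is silently ignored

print(gone, second)
```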

## Monitoring Filesystem Space

Lastly, we can get some useful information about the storage of our file system by using the
```python
shutil.disk_usage(path)
```
function:

In [None]:
total_b, used_b, free_b = shutil.disk_usage('.')

gb = 10 ** 9

print('Total: {:6.2f} GB'.format(total_b / gb))
print('Used : {:6.2f} GB'.format(used_b / gb))
print('Free : {:6.2f} GB'.format(free_b / gb))

Some misc. cleanup:

In [None]:
shutil.rmtree('backups')
os.remove("bbq_notification_copy.txt")

For more on the `shutil` module, check out the [documentation](https://docs.python.org/3/library/shutil.html).

# The `pathlib` Module

## <font color='LIGHTGRAY'>By the end of the lecture, you'll be able to:</font>
- <font color='LIGHTGRAY'>interact with your system using the os module</font>
- <font color='LIGHTGRAY'>interact with your system using the shutil module</font>
- **interact with your system using the `pathlib` module**

The `pathlib` module simplifies working with files and folders through a set of classes that represent filesystem paths with semantics appropriate for different operating systems. Specifically, the `Path` class within the module lets developers **manipulate paths without worrying about the semantics of their OS**.

For instance, say you were trying to parse a file path in Windows and retrieve the last directory in `path`. You would do something like this:

In [None]:
def last_dir_parser(path):
    return path.split("\\")[-2]

last_dir_parser("C:\\Users\\Matt\\Documents\\file.txt")

Someone reading your parser might not understand what you are trying to do, and the parser only works for Windows paths. This difference can lead to hard-to-spot errors.
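
For comparison, here is one way the same lookup could be sketched with `pathlib`. `PureWindowsPath` parses Windows-style paths on any operating system, and `last_dir()` is a hypothetical helper name used only for this illustration:

```python
from pathlib import PureWindowsPath

def last_dir(path):
    """Return the name of the directory that directly contains `path`."""
    return PureWindowsPath(path).parent.name

backslash_result = last_dir("C:\\Users\\Matt\\Documents\\file.txt")
slash_result = last_dir("C:/Users/Matt/Documents/file.txt")   # forward slashes work too

print(backslash_result)   # Documents
print(slash_result)       # Documents
```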

But you still might be asking yourself, "why should I use `pathlib` over other Python modules that we've learned such as `os`?"

Here's why.

## Why use the `pathlib` Module?

Let's say we want to make a file called `notifications/parking_notification.txt` within the current directory, using a path that does not depend on the OS path format. First, we have to build the correct path. To do this with the `os` module, we would use the
```python
os.path.join(dirpath, name)
```
function and `os.getcwd()` function:

In [None]:
outpath = os.path.join(os.getcwd(), 'notifications')
outpath_file = os.path.join(outpath, 'parking_notification.txt')
print(outpath_file)

Although this code works, it's hard to read and maintain. Imagine how this code would look if we wanted to create a new file inside multiple nested directories.

The same code can be re-written using the `Path` class from the `pathlib` module along with the 
```python
Path.cwd()
```
method that gets the current working directory:

In [None]:
from pathlib import Path

In [None]:
outpath = Path.cwd() / 'notifications' / 'parking_notification.txt'
print(outpath)

Clearly, this format is much easier to parse. The code above uses the `/` operator instead of `os.path.join()` to combine parts of the path into a compound path object.

Another benefit of using the `pathlib` module is that it gives you a `Path` object rather than a plain string representation of the path (which can be troublesome depending on your filesystem).

## Reading and Writing Files

Traditionally, we use the built-in `open()` function to read or write files in Python. This is still true as the `open()` function can use `Path` objects directly.

For example:

In [None]:
path = Path.cwd() / 'bbq_notification.txt'
with open(path, 'r') as f:
    print(f.readlines())

An equivalent alternative is to call
```python
Path.open(mode='r', ...)
```
on the `Path` object:

In [None]:
path = Path.cwd() / 'bbq_notification.txt'
with path.open() as f:
    print(f.readlines())

For simple reading and writing of files, there are a couple of convenience methods in the `pathlib` library:

- `.read_text()`: open the path in text mode and return the contents as a string.
- `.read_bytes()`: open the path in binary/bytes mode and return the contents as a bytestring.
- `.write_text()`: open the path and write string data to it.
- `.write_bytes()`: open the path in binary/bytes mode and write data to it.

For example:

In [None]:
path = Path.cwd() / 'bbq_notification.txt'
path.read_text()
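
The writing counterparts work the same way. As a small sketch, the round trip below writes a string with `.write_text()` and reads it back with `.read_text()` and `.read_bytes()`. It uses a temporary directory, and the file name and contents are made up for the demo:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / 'parking_notification.txt'
    # write_text creates (or overwrites) the file and returns the number of characters written
    chars_written = note.write_text('Park in lot B.\n', encoding='utf-8')
    text = note.read_text(encoding='utf-8')   # contents as str
    raw = note.read_bytes()                   # contents as bytes

print(chars_written)
print(repr(text))
print(type(raw).__name__)
```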

## Picking Out Components of a Path

The different parts of a path are conveniently available as properties. Basic examples include:

- `.name`: the file name without any directory
- `.parent`: the directory containing the file, or the parent directory if path is a directory
- `.stem`: the file name without the suffix
- `.suffix`: the file extension
- `.anchor`: the part of the path before the directories

For example:

In [None]:
path = Path.cwd() / 'bbq_notification.txt'
path

In [None]:
path.name

In [None]:
path.stem

In [None]:
path.suffix

In [None]:
path.parent

In [None]:
path.parent.parent

In [None]:
path.anchor

Note that `.parent` returns a new `Path` object, whereas the other properties return strings. 

This means for instance that `.parent` can be chained as in the last example or even combined with `/` to create completely new paths:

In [None]:
path.parent.parent / ('new' + path.suffix)
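
For the common case of changing just the extension or just the file name, `pathlib` also provides `.with_suffix()` and `.with_name()`. The sketch below uses `PurePosixPath` with a hypothetical path so that the output is the same on every operating system:

```python
from pathlib import PurePosixPath

path = PurePosixPath('/home/user/day_3_assets/bbq_notification.txt')  # hypothetical path

backup = path.with_suffix('.bak')          # same stem, new extension
renamed = path.with_name('parking.txt')    # same directory, new file name

print(backup)       # /home/user/day_3_assets/bbq_notification.bak
print(renamed)      # /home/user/day_3_assets/parking.txt
print(path.parts)   # ('/', 'home', 'user', 'day_3_assets', 'bbq_notification.txt')
```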

For an excellent `pathlib` cheat sheet that visualizes these representations and methods, check [this](https://github.com/chris1610/pbpython/blob/master/extras/Pathlib-Cheatsheet.pdf) out.

## Examples using `pathlib`

Here we will show some examples on how to use `pathlib` with some simple challenges.

### Counting Files

There are a few different ways to list many files. The simplest is the 
```python
Path.iterdir()
```
method, which iterates over all entries (files and directories) in the given directory. The following example combines `Path.iterdir()` with the `collections.Counter` class to count how many entries there are of each file type in the current directory:

In [1]:
import collections
from pathlib import Path

collections.Counter(p.suffix for p in Path.cwd().iterdir())
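
Keep in mind that `Path.iterdir()` also yields directories, which usually have an empty suffix. As a sketch, the variant below counts only regular files by checking `.is_file()`, and uses `Path.glob()` to pick out a single file type. It runs in a temporary directory with made-up file names:

```python
import collections
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a.csv').touch()
    (root / 'b.csv').touch()
    (root / 'notes.txt').touch()
    (root / 'assets').mkdir()

    # Count only regular files, keyed by extension
    counts = collections.Counter(p.suffix for p in root.iterdir() if p.is_file())
    # glob() matches names against a pattern
    csv_names = sorted(p.name for p in root.glob('*.csv'))

print(counts)
print(csv_names)
```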

### Display a Directory Tree

In this next example, we will define a function, `tree()` (like our earlier function `list_files()`), that prints a visual tree representing the file hierarchy, rooted at a given `directory`.

In [None]:
def tree(directory):
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):             # recursively list all files and subdirectories
        depth = len(path.relative_to(directory).parts)    # use .relative_to() to get how far we are from the root
        spacer = '    ' * depth
        print(f'{spacer}+ {path.name}')

In [None]:
tree(Path.cwd())

To read more about the `pathlib` module, check out the [documentation](https://docs.python.org/3/library/pathlib.html).

# Conclusion

## You are now able to:
- interact with your system using the `os` module
- interact with your system using the `shutil` module
- interact with your system using the `pathlib` module

# References
- https://mathieubuisson.github.io/powershell-linux-bash/
- https://devhints.io/bash
- https://stackabuse.com/introduction-to-python-os-module/
- https://docs.python.org/3/library/os.html
- https://www.journaldev.com/20536/python-shutil-module#download-the-source-code
- https://docs.python.org/3/library/shutil.html
- https://stackabuse.com/introduction-to-the-python-pathlib-module/
- https://realpython.com/python-pathlib/
