<a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/activity_python_os.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Exploring Python's OS Library

Manipulating local files is one of the most common things Python programs do: reading, writing, listing, sizing, checking, removing, and so on. For this, Python provides a Standard Library module named `os` (lowercase, of course), along with a submodule named `os.path`. Roughly speaking, the two divide the work like this:
 - `os.path`: working with pathnames (joining directories, filenames, extensions, basenames, absolute paths)
 - `os`: doing things with pathnames (creating, renaming, removing, listing, etc.)

As the name implies, `os` is Python's interface with your local operating system (macOS, Linux, Windows). Python's `os` module does a little bit of everything; it's sort of a junk drawer for system-related stuff. In this activity we'll explore some of the capabilities of both modules.

## References
- https://docs.python.org/3/library/os.html
- https://docs.python.org/3/library/os.path.html
- https://doughellmann.com/blog/the-python-3-standard-library-by-example/

# Traversing the File System
You might recall that different operating systems represent paths differently:
 - Windows: `C:\Users\name`
 - Linux: `/home/name`
 - macOS: `/Users/name`

Notice that Windows uses the backslash `\` as its path delimiter, while macOS and Linux use the forward slash `/`. The `os` module can navigate the local filesystem regardless of which OS you are using.

Be careful when constructing your own hard-coded pathnames:
```python
import os

folder = "home"
user = "daniel"
pathname = folder + "/" + user  # NO: hard-codes the delimiter
pathname = os.path.join(folder, user)  # YES: Python picks the delimiter
```
The first form is brittle: it hard-codes the delimiter, so the paths it produces will not match the native style on Windows and can break code that compares or splits them. The second form lets `os.path.join` choose the correct delimiter, so it works on every platform.
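
To see what `os.path.join` produces on each platform without switching machines, the sketch below calls the two platform-specific modules that `os.path` is an alias for: `posixpath` (macOS/Linux) and `ntpath` (Windows). Both are part of the standard library; the folder and user names are the same placeholders used above.

```python
import ntpath      # the Windows flavor of os.path
import os
import posixpath   # the macOS/Linux flavor of os.path

folder = "home"
user = "daniel"

posix_style = posixpath.join(folder, user)
windows_style = ntpath.join(folder, user)
native_style = os.path.join(folder, user)  # whichever flavor matches this machine

print(posix_style)     # home/daniel
print(windows_style)   # home\daniel
print(native_style == folder + os.sep + user)  # True
```

In everyday code you would only ever call `os.path.join`; the two explicit modules appear here purely for comparison.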

# Exercises
In each of the exercises, replace any `???` with an appropriate function call from the `os` module. Example:
```python
# list files
os.??? --> os.listdir()
```

In [0]:
# Before we begin the exercises, let's set up a handy function to print data.
import json
def print_data(title, data):
    print(f"{title}\n{json.dumps(data, indent=4)}")

## 1. Listing items: listdir, scandir, walk
Note that the title uses the word "items". A directory may contain files as well as other directories, so we call everything in a directory an 'item'. This notebook may be running on your local machine if you copied it, but chances are it is running on a cloud server somewhere. Let's have a look around:

In [0]:
import os

# Where are we?
cwd = os.getcwd()
print_data("Current directory:", cwd)

### Output from cell below should look like this:
```console
Current Dir Items:
[
    ".config",
    "sample_data"
]
```

In [0]:
# What files and/or directories are here?
file_list = ???  # Gets a list of items in the current directory
print_data("Current Dir Items:", file_list)

If more information is needed than just the names of the files, it is more efficient to use `scandir()` than `listdir()` because more information is collected in one system call when the directory is scanned.
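
As a rough illustration of what each `DirEntry` object carries, here is a minimal sketch that scans a throwaway temporary directory rather than the notebook's working directory. The file and folder names (`notes.txt`, `subfolder`) are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build one file and one sub-directory to scan
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("hello")
    os.mkdir(os.path.join(tmp, "subfolder"))

    summary = {}
    with os.scandir(tmp) as entries:
        for entry in entries:
            # Name, type, and stat() info are all reachable from the DirEntry
            kind = "dir" if entry.is_dir() else "file"
            size = entry.stat().st_size if entry.is_file() else None
            summary[entry.name] = (kind, size)

print(summary["notes.txt"])   # ('file', 5)
print(summary["subfolder"])   # ('dir', None)
```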



In [0]:
# Get some more details about each item
for item in ???:
    # Check for symlinks first: is_dir() and is_file() follow symlinks by default
    if item.is_symlink():
        typ = 'L'
    elif item.is_dir():
        typ = 'D'
    elif item.is_file():
        typ = 'F'
    else:
        typ = '?'
    print(f'{item.name} {typ}')

### A deeper look
We can use a for-loop to probe into each item of the `file_list` that was returned earlier. The output from the cell below should look something like this:
```console
Contents of .config:
[
    ".last_update_check.json",
    ".last_opt_in_prompt.yaml",
    "gce",
    "logs",
    ".last_survey_prompt.yaml",
    "active_config",
    "config_sentinel",
    "configurations",
    ".metricsUUID"
]
Contents of sample_data:
[
    "README.md",
    "anscombe.json",
    "california_housing_train.csv",
    "california_housing_test.csv",
    "mnist_train_small.csv",
    "mnist_test.csv"
]
```


In [0]:
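# Let's go one level deeper: list the contents of each item in the list.
for item in file_list:
    if not os.path.isdir(item):
        continue  # os.listdir() raises NotADirectoryError on a plain file
    content_list = os.listdir(item)  # content_list is NOT sorted.
    print_data(f"Contents of {item}:", content_list)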

### Walking around
The function `os.walk()` traverses a directory tree recursively. For each directory it visits, starting with the one you pass in, it generates a 3-tuple containing the directory path, a list of the immediate sub-directories of that path, and a list of the names of the files in that directory. You have to supply the starting directory as a parameter.

Your output data should look something like this:
```console
Contents of Directory .:
[
    ".config",
    "sample_data"
]
Contents of Directory ./.config:
[
    "logs",
    "configurations",
    ".last_update_check.json",
    ".last_opt_in_prompt.yaml",
    "gce",
    ".last_survey_prompt.yaml",
    "active_config",
...
(and perhaps more)
...
```  
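
The 3-tuples are easier to picture on a tiny tree. The sketch below builds one in a temporary directory (the names `a`, `b`, and `data.txt` are placeholders) and records what `os.walk()` yields at each step, with paths shown relative to the starting directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build: tmp/a/b/ plus one file inside tmp/a
    os.makedirs(os.path.join(tmp, "a", "b"))
    with open(os.path.join(tmp, "a", "data.txt"), "w") as f:
        f.write("x")

    visited = []
    for dir_name, sub_dirs, files in os.walk(tmp):
        # Store paths relative to the starting directory for readability
        rel = os.path.relpath(dir_name, tmp)
        visited.append((rel, sorted(sub_dirs), sorted(files)))

for row in visited:
    print(row)
# ('.', ['a'], [])
# ('a', ['b'], ['data.txt'])
# ('a/b', [], [])   <- the separator is a backslash on Windows
```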



In [0]:
# Traverse (iterate over) the entire current directory,
# print everything found
for dir_name, sub_dirs, files in ???:  # Note the 3-tuple unpacking
    # Mix the directory contents together
    contents = sub_dirs + files
    print_data(f"Contents of Directory {dir_name}:", contents)

## 2. Create a new directory
Let's now create a single new directory in our current working directory.
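
One thing to keep in mind when re-running notebook cells: `os.mkdir()` raises `FileExistsError` if the directory is already there, and it cannot create missing parent directories. The sketch below shows both behaviors in a temporary directory (the names `demo`, `outer`, and `inner` are placeholders), along with `os.makedirs(..., exist_ok=True)`, which tolerates an existing directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    single = os.path.join(tmp, "demo")
    os.mkdir(single)                      # first call succeeds
    try:
        os.mkdir(single)                  # second call fails
        second_call = "ok"
    except FileExistsError:
        second_call = "FileExistsError"

    nested = os.path.join(tmp, "outer", "inner")
    try:
        os.mkdir(nested)                  # parent "outer" does not exist yet
        nested_mkdir = "ok"
    except FileNotFoundError:
        nested_mkdir = "FileNotFoundError"

    os.makedirs(nested, exist_ok=True)    # creates "outer" and "inner"
    os.makedirs(nested, exist_ok=True)    # safe to repeat
    nested_exists = os.path.isdir(nested)

print(second_call, nested_mkdir, nested_exists)
# FileExistsError FileNotFoundError True
```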

In [0]:
import os
# create one new empty directory in current working directory
os.mkdir('new_directory')
# Should print something like ['.config', 'new_directory', 'sample_data']
print(os.listdir())

In [0]:
# create a new directory recursively, with multiple parent directories

# Leaf directory name.  It's at the end of the path "branch"
directory = "leafdir"
# Parent directory hierarchy, built without hard-coding a delimiter
parent_dir = os.path.join("parent", "subdir_1", "subdir_2", "subdir_3")
# Compose the path
path = os.path.join(parent_dir, directory)
# Create the directory
os.makedirs(path)
# Should print something like ['.config', 'parent', 'new_directory', 'sample_data']
print(os.listdir())

## 3. Rename a directory
Change the name of 'new_directory' to 'my_dir'

In [0]:
os.rename('new_directory', 'my_dir')
# Should print something like ['.config', 'parent', 'my_dir', 'sample_data']
print(os.listdir())

## 4. Create a new file in `my_dir`


In [0]:
# compose path + file
filename = os.path.join('my_dir', 'last_night.txt')
# write some text to file
with open(filename, 'w') as f:
    f.write("Tell me, what did we get up to last night?")
# Should print ['last_night.txt']
print(os.listdir('my_dir'))

## 5. What is an absolute path?
On macOS and Linux, *absolute paths* always start with a `/` character, which represents the root of the filesystem tree (on Windows they start with a drive letter such as `C:\`). If a path does not start at the root, it is a *relative path*: to navigate to that resource, you need to know your starting point, which is normally the current working directory. Find the absolute path of the `last_night.txt` file that we just created.
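
The relationship between relative paths, absolute paths, and the current working directory can be sketched with a few `os.path` calls. This is only an illustration; the path is the same `my_dir/last_night.txt` used in this notebook, and none of these calls require the file to exist.

```python
import os

relative = os.path.join("my_dir", "last_night.txt")
absolute = os.path.abspath(relative)   # resolved against os.getcwd()

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
# abspath() is plain string work: it joins the cwd and the relative path
print(absolute == os.path.join(os.getcwd(), relative))   # True
# relpath() goes the other way, back to a path relative to the cwd
print(os.path.relpath(absolute) == relative)   # True
```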

In [0]:
# Print the absolute path of the last_night.txt file
# Should print "/content/my_dir/last_night.txt" when running on Colab
print(os.path.abspath(filename))

## 6. What is a basename?
A file path consists of two components: the directory portion and the final component, which is usually the filename. That final component is the basename. For example, if the absolute path is `/content/my_dir/last_night.txt`, then the basename is `last_night.txt`. From the absolute path, determine the basename.
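
For reference, `os.path` has a few related helpers for taking a path apart. The sketch below applies them to the example path from the paragraph above; it only manipulates strings, so the file does not need to exist.

```python
import os

full_path = "/content/my_dir/last_night.txt"

directory, base = os.path.split(full_path)   # split at the last separator
stem, extension = os.path.splitext(base)     # split off the extension

print(os.path.dirname(full_path))    # /content/my_dir
print(os.path.basename(full_path))   # last_night.txt
print(directory, base)               # /content/my_dir last_night.txt
print(stem, extension)               # last_night .txt
```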

In [0]:
full_path = os.path.abspath(filename)
print("The full (absolute) path is: ", full_path)
base_name = os.path.basename(full_path)
print("The basename is: ", base_name)

## 7. Check if a file exists
Sometimes you will need to check whether a particular file exists or does not exist, and take some action based on that.

In [0]:
# Does the last_night.txt file exist?
# This illustrates the LBYL way
if os.path.exists(filename):
    with open(filename) as f:  # the with-block closes the file for us
        text = f.read()
    size = os.path.getsize(filename)
    print(f"File {filename} has size {size} bytes")
    print(f"Contents of {filename}: {text}")
else:
    print(f"Unable to read {filename}")

### Warning about anti-patterns
The code snippet in exercise 7 contains an anti-pattern. An anti-pattern is a common response to a recurring problem that is usually _ineffective_ and **risks being highly counterproductive**. The problem here is the small but non-zero time gap between checking whether the file exists and reading it. To a very fast computer, that gap is effectively an eternity. Anything can happen during it, including the file being deleted right after you decided it was safe to read. A better way is to just try the read and handle the error if it happens. This illustrates the difference between LBYL and EAFP.

- LBYL = **"Look Before You Leap"**, i.e. check first, then do it.
- EAFP = **"Easier to Ask Forgiveness than Permission"**, i.e. just try it, and handle errors after the fact.
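
To make the time gap concrete, the sketch below simulates it in a single script: the existence check passes, the file is then deleted (standing in for some other process), and the read fails anyway. The filename `fleeting.txt` is a placeholder, and everything happens inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "fleeting.txt")
    with open(target, "w") as f:
        f.write("now you see me")

    looked = os.path.exists(target)   # LBYL: the check passes...
    os.remove(target)                 # ...then "someone else" deletes the file

    try:
        with open(target) as f:       # ...so the leap fails anyway
            outcome = f.read()
    except FileNotFoundError:
        outcome = "file vanished after the check"

print(looked)    # True
print(outcome)   # file vanished after the check
```

The `try`/`except` is what actually protects the read; the earlier check added nothing once the file disappeared.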

In [0]:
# A better approach than LBYL is EAFP in this case
try:
    with open(filename) as f:  # the with-block closes the file for us
        text = f.read()
    size = os.path.getsize(filename)
    print(f"File {filename} has size {size} bytes")
    print(f"Contents of {filename}: {text}")
except OSError:
    print(f"Unable to read {filename}")

## 8. Check if a path refers to a directory or a file
Sometimes it's not obvious whether a path is pointing at a directory or an actual file which can be opened.
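
As a small sketch of how the two checks behave, the example below runs `os.path.isdir()` and `os.path.isfile()` against a directory, a file, and a path that does not exist. The names `report.txt` and `does_not_exist` are placeholders inside a temporary directory. Note that both checks return `False` for a missing path instead of raising an error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "report.txt")
    missing_path = os.path.join(tmp, "does_not_exist")
    with open(file_path, "w") as f:
        f.write("data")

    checks = {
        "directory": (os.path.isdir(tmp), os.path.isfile(tmp)),
        "file": (os.path.isdir(file_path), os.path.isfile(file_path)),
        "missing": (os.path.isdir(missing_path), os.path.isfile(missing_path)),
    }

for label, (is_dir, is_file) in checks.items():
    print(f"{label}: isdir={is_dir} isfile={is_file}")
# directory: isdir=True isfile=False
# file: isdir=False isfile=True
# missing: isdir=False isfile=False
```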

In [0]:
if os.path.isdir(filename):
    print(f"{filename} is a directory")
else:
    print(f"{filename} is NOT a directory")

if os.path.isdir(path):
    print(f"{path} is a directory")
else:
    print(f"{path} is NOT a directory")

## 9. Delete a directory
`os.rmdir()` only removes empty directories. The first cell below is therefore expected to raise an `OSError`, because `my_dir` still contains `last_night.txt`.
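
Here is a minimal sketch of the same rule in a throwaway temporary directory (the names `victim`, `inner`, and `note.txt` are placeholders). It catches the `OSError` from the first attempt, then empties the tree from the bottom up using `os.walk(..., topdown=False)` so that every directory is empty by the time `os.rmdir()` reaches it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    victim = os.path.join(tmp, "victim")
    os.makedirs(os.path.join(victim, "inner"))
    with open(os.path.join(victim, "inner", "note.txt"), "w") as f:
        f.write("bye")

    try:
        os.rmdir(victim)              # not empty, so this raises
        first_try = "removed"
    except OSError:
        first_try = "OSError"

    # Walk bottom-up so each directory is empty by the time we reach it
    for dir_name, sub_dirs, files in os.walk(victim, topdown=False):
        for name in files:
            os.remove(os.path.join(dir_name, name))
        os.rmdir(dir_name)

    still_there = os.path.exists(victim)

print(first_try, still_there)   # OSError False
```

The standard library's `shutil.rmtree()` does the same job in one call; the loop is spelled out here only to stay within the `os` module.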

In [0]:
# Attempt to remove my_dir (expected to fail: it still contains last_night.txt)
os.rmdir('my_dir')
print(os.listdir())

In [0]:
# Delete the file first, then the directory can be removed
print('Removing file: ', filename)
os.remove(filename)
# Now the directory removal will succeed
os.rmdir('my_dir')
print(os.listdir())

## 10. Find current Process ID (PID)
Every process running on a computer has a PID. PIDs are how the operating system keeps track of all the different running programs.
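
To illustrate that each process gets its own PID, the sketch below starts a second Python interpreter with `subprocess` and compares the two numbers. It assumes `sys.executable` points at a usable interpreter, which is normally the case; the actual PID values differ on every run.

```python
import os
import subprocess
import sys

parent_pid = os.getpid()

# Start a second Python process that prints its own PID
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(result.stdout.strip())

print("This process:", parent_pid)
print("Child process:", child_pid)
print(parent_pid != child_pid)   # True
```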

In [0]:
my_pid = os.getpid()
print("My current PID is: ", my_pid)

# Conclusions
The `os` module in Python's Standard Library provides a rich set of functions for interfacing directly with your operating system, whether it is Windows, Linux, or macOS. We have only tried out a handful of the available functions in this notebook.