<a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/activity_python_os.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img height="100px" src="https://drive.google.com/uc?export=view&id=1Wv8etUe0EkGNpFe2n3lDLA3DsIFxEtAf" />

# Exploring Python's OS Library

In Python, one of the most heavily used features of the language is the ability to manipulate local files: reading, writing, listing, sizing, checking, removing, and so on. To do this, Python provides a Standard Library module named `os` (lowercase, of course). There is also a submodule within `os` named `os.path`. The division of labor between these modules can be roughly stated as:
 - os.path : Working with pathnames (joining directories, filenames, extensions, basenames, absolute paths)
 - os : Doing things with pathnames (creating, removing, renaming, listing, etc)
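
As a rough illustration of the `os.path` half of that split, the sketch below takes apart a made-up pathname. The names `home`, `daniel`, and `notes.txt` are hypothetical; nothing is read from disk, so the file does not need to exist.

```python
import os

# Hypothetical pathname, used only for illustration.
pathname = os.path.join("home", "daniel", "notes.txt")

directory = os.path.dirname(pathname)          # everything before the last component
filename = os.path.basename(pathname)          # the last component: "notes.txt"
stem, extension = os.path.splitext(filename)   # ("notes", ".txt")
is_abs = os.path.isabs(pathname)               # False, because the path is relative

print(directory, filename, stem, extension, is_abs)
```

The printed directory uses whichever delimiter your operating system expects, which is why its exact form is not shown here.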

As the name implies, `os` is Python's interface with your local operating system (macOS, Linux, Windows). Python’s `os` module does a little bit of everything; it’s sort of a junk drawer for system-related stuff. In this activity we'll explore some of the capabilities of both `os` and `os.path`.

## References
- https://docs.python.org/3/library/os.html

# Traversing the File System
You might recall that different filesystems have different ways of representing paths.
 - Windows: `C:\Users\name`
 - Mac/Linux: `/home/name`

Notice that Windows uses the backslash `\` as a path delimiter, while macOS and Linux use a forward slash `/`. The `os` module is able to navigate around in the local filesystem, regardless of what OS you are using.

Be careful when constructing your own hard-coded pathnames:
```python
import os

folder = "home"
user = "daniel"
pathname = folder + "/" + user  # NO
pathname = os.path.join(folder, user)  # YES
```
The first form is "brittle": it hard-codes the `/` delimiter, so it can produce inconsistent or broken paths when you run your Python program on a Windows machine. The second form lets Python choose the correct path delimiter through the `os.path.join` function, so it works on all platforms.
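
To see what `os.path.join` would produce on each platform without switching machines, one option is to call the two platform-specific modules that sit behind `os.path`: `ntpath` (Windows) and `posixpath` (macOS/Linux). This is only a sketch for comparison; everyday code should keep calling `os.path.join`.

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the macOS/Linux implementation of os.path

folder = "home"
user = "daniel"

windows_style = ntpath.join(folder, user)
posix_style = posixpath.join(folder, user)

print(windows_style)  # home\daniel
print(posix_style)    # home/daniel
```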

# Exercises
In each of the exercises, replace any `???` with an appropriate function call from the os module.  Example:
```python
# list files
os.??? --> os.listdir()
```

In [0]:
# Before we begin the exercises, let's set up a handy function to print data.
import json
def print_data(title, data):
    print(f"{title}\n{json.dumps(data, indent=4)}")

## 1. Listing items: listdir, scandir, walk
Note the title uses the word "items". This is because a directory may contain files as well as more directories, so we call everything in a directory an "item". This notebook may be running on your local machine if you copied it, but chances are it is running on a cloud server somewhere. Let's have a look around:

In [0]:
import os

# Where are we?
cwd = os.getcwd()
print_data("Current directory:", cwd)

### Output from cell below should look like this:
```console
Current Dir Items:
[
    ".config",
    "sample_data"
]
```

In [0]:
# What files and/or directories are here?
file_list = ???  # Gets a list of items in the current directory
print_data("Current Dir Items:", file_list)

If more information is needed than just the names of the files, it is more efficient to use `scandir()` than `listdir()` because more information is collected in one system call when the directory is scanned.
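
As a small sketch of that extra information, the example below builds a throwaway directory with `tempfile` and reads each entry's type and size from the objects that `scandir()` yields. The names `subdir` and `hello.txt` are made up for the illustration.

```python
import os
import tempfile

# Work in a temporary directory so the example does not depend on this machine.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    with open(os.path.join(tmp, "hello.txt"), "w") as f:
        f.write("hello")  # 5 characters, so the file is 5 bytes long

    details = {}
    with os.scandir(tmp) as entries:
        for entry in entries:
            # Each entry knows its own name, type, and stat information.
            size = entry.stat().st_size if entry.is_file() else None
            kind = "dir" if entry.is_dir() else "file"
            details[entry.name] = (kind, size)

print(details)
```

With `listdir()` you would get only the two names and would need a separate `os.path` or `os.stat` call per item to learn the rest.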



In [0]:
# Get some more details about each item
for item in ???:
    if item.is_symlink():  # check first: is_dir()/is_file() follow symlinks
        typ = 'L'
    elif item.is_dir():
        typ = 'D'
    elif item.is_file():
        typ = 'F'
    else:
        typ = '?'
    print(f'{item.name} {typ}')

### A deeper look
We can use a for-loop to probe into the `file_list` that was returned.  Output from cell below should look like this:
```console
Contents of .config:
[
    ".last_update_check.json",
    ".last_opt_in_prompt.yaml",
    "gce",
    "logs",
    ".last_survey_prompt.yaml",
    "active_config",
    "config_sentinel",
    "configurations",
    ".metricsUUID"
]
Contents of sample_data:
[
    "README.md",
    "anscombe.json",
    "california_housing_train.csv",
    "california_housing_test.csv",
    "mnist_train_small.csv",
    "mnist_test.csv"
]
```


In [0]:
# Let's go one level deeper: List each item in the list.
for item in file_list:
    if not os.path.isdir(item):
        continue  # os.listdir() raises an error when given a plain file
    content_list = os.listdir(item)  # content_list is NOT sorted.
    print_data(f"Contents of {item}:", content_list)

### Walking around
The function `walk()` traverses a directory tree recursively. For each directory it visits, including the starting one, it yields a 3-tuple containing the directory path, a list of the names of any immediate sub-directories of that path, and a list of the names of any files in that directory. You have to supply a starting directory as a parameter.

Your output data should look something like this:
```console
Contents of Directory .:
[
    ".config",
    "sample_data"
]
Contents of Directory ./.config:
[
    "logs",
    "configurations",
    ".last_update_check.json",
    ".last_opt_in_prompt.yaml",
    "gce",
    ".last_survey_prompt.yaml",
    "active_config",
...
(and perhaps more)
...
```  
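
To make the 3-tuple concrete, here is a minimal sketch that walks a tiny throwaway tree created with `tempfile`. The layout (`a.txt` at the top and `sub/b.txt` below it) is hypothetical and exists only for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/a.txt and tmp/sub/b.txt
    os.mkdir(os.path.join(tmp, "sub"))
    for relative in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, relative), "w") as f:
            f.write("x")

    found = {}
    for dir_name, sub_dirs, files in os.walk(tmp):
        # Key each result by its path relative to the starting directory.
        key = os.path.relpath(dir_name, tmp)
        found[key] = (sorted(sub_dirs), sorted(files))

print(found)
# {'.': (['sub'], ['a.txt']), 'sub': ([], ['b.txt'])}
```

The starting directory shows up as `.`, and each later entry corresponds to one sub-directory that `walk()` descended into.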



In [0]:
# Traverse (iterate over) the entire current directory,
# print everything found
for dir_name, sub_dirs, files in ???:  # Note the 3-tuple unpacking
    # Mix the directory contents together
    contents = sub_dirs + files
    print_data(f"Contents of Directory {dir_name}:", contents)

In [0]:
#@title It's hidden {display-mode: "form"}

# This code will be hidden when