# Using the file system

Name:

Date:

Learning Objectives:
By the end of this lesson, you should be able to:
1. Use the `os` module to write platform-independent scripts to access information about the file system
2. Copy files using the `shutil` module
3. Use the `glob` module to write Unix-type file search commands
4. Use the `subprocess` module to run shell-script code from Python

### Import Modules for this Notebook
In the previous notebook introducing modules, we imported modules as we needed them. However, it is good practice to import all of the modules you need in your notebook (or other scripts) in one import block near the top of the file. For this notebook, we will use 4 modules:

In [None]:
# import the os, shutil, glob, and subprocess modules


# Part 1: The `os` module and paths

Python's built-in `os` module is a useful tool to access information about the file system. Many of the `os` functions mimic common shell scripting commands but return Python objects that can be used in Python code. Let's take a look at a few examples:
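As a quick preview, here is a minimal sketch of two `os` functions that mirror shell commands. The variable names `cwd` and `entries` are only illustrative, and the output will differ on every machine.

```python
import os

# The current working directory, returned as a Python string
cwd = os.getcwd()
print(cwd)

# The names of the entries in that directory, returned as a list of strings
entries = os.listdir(cwd)
print(len(entries), "entries found")
```

Because the results are ordinary Python objects, they can be stored, looped over, or passed to other functions.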

### The Current Working Directory

In [None]:
# check your current working directory


Questions:
1. What kind of path is this (absolute or relative)?
2. What is the equivalent command in your terminal?

### Contents of the Current Working Directory

In [None]:
# check the contents of your current working directory


Questions:
1. What files are visible in your file system?
2. What is the equivalent command in your terminal?

### Paths on your machine
Paths on your machine provide the address where certain data is stored. For example, in the above list, we find that there is a directory called `data` in our present working directory. If we would like to provide a path to this folder, we just need to append `data` to our current path. However, different operating systems use different formats for the string representation of paths. The `os` module gives us a convenient way to write platform-independent paths.
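One way to build such a path is `os.path.join`, which inserts the separator appropriate for the operating system. The sketch below assumes a folder named `data` may or may not exist where the code runs, so it checks before counting entries; `data_dir` is an illustrative name.

```python
import os

# Join the current working directory and the folder name "data".
# os.path.join picks the right separator for this operating system.
data_dir = os.path.join(os.getcwd(), "data")
print(data_dir)

# Count the entries only if the folder actually exists on this machine
if os.path.isdir(data_dir):
    print(len(os.listdir(data_dir)), "entries in data")
else:
    print("no data folder found here")
```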

In [None]:
# create an absolute path to the data folder


# print out the data folder path


# print the number of files in the data folder


### &#x1F914; Mini-Exercise
Goal: Get a list of all Jupyter Notebooks we've written in CS 122 so far using the `os` module.
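If you get stuck, the sketch below shows one possible approach on a throwaway directory tree. The helper name `find_notebooks` and the folder and file names (`Lecture_1`, `intro.ipynb`, and so on) are hypothetical; your own course directory will be laid out differently.

```python
import os
import tempfile

def find_notebooks(lectures_dir):
    """Return sorted notebook file names found one level below lectures_dir."""
    notebooks = []
    for lecture in os.listdir(lectures_dir):
        lecture_path = os.path.join(lectures_dir, lecture)
        # Skip plain files and hidden entries at the top level
        if not os.path.isdir(lecture_path) or lecture.startswith("."):
            continue
        for file_name in os.listdir(lecture_path):
            file_path = os.path.join(lecture_path, file_name)
            # endswith(".ipynb") rejects the hidden ".ipynb_checkpoints" folder
            if file_name.endswith(".ipynb") and os.path.isfile(file_path):
                notebooks.append(file_name)
    return sorted(notebooks)

# Demonstration on a temporary directory tree (hypothetical names)
with tempfile.TemporaryDirectory() as root:
    for lecture, notebook in [("Lecture_1", "intro.ipynb"), ("Lecture_2", "files.ipynb")]:
        os.makedirs(os.path.join(root, lecture, ".ipynb_checkpoints"))
        open(os.path.join(root, lecture, notebook), "w").close()
        open(os.path.join(root, lecture, "notes.txt"), "w").close()
    found = find_notebooks(root)

# Print the notebooks, one per line
for name in found:
    print(name)
```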

In [None]:
# set an absolute path to your CS 122 directory


# loop through the contents of your Lecture directory to get a list
# of directories corresponding to each of the lectures we've had so far


# for each lecture directory, loop through the contents and see 
# if any of the files have the extension 'ipynb'. Note also that
# there may be hidden files for ipynb checkpoints, so be careful
# about how your file comparison is checked


# print out the list of notebooks, one per line


### Making new directories
The `os` module gives us the functionality to modify our file system. For example, we can make a new directory given an absolute (or relative) path.
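A minimal sketch of creating a directory safely is shown here. It works inside a temporary directory (an assumption made so that nothing on your machine is changed), and `new_dir` is an illustrative name.

```python
import os
import tempfile

# Work inside a temporary directory so nothing on your machine is changed
with tempfile.TemporaryDirectory() as base:
    new_dir = os.path.join(base, "organized_data")

    # os.mkdir raises FileExistsError if the directory already exists,
    # so check before creating it
    if not os.path.exists(new_dir):
        os.mkdir(new_dir)

    # os.makedirs with exist_ok=True does not complain about an existing directory
    os.makedirs(new_dir, exist_ok=True)

    created = os.path.isdir(new_dir)

print(created)
```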

In [None]:
# define a path for a new organized_data directory


# make a new directory called organized_data in the present working directory


# revise the above line to check whether the directory exists - only make it if it does not exist

Question: What is the equivalent command in the terminal?

### Moving files to new directories
The `os` module also provides a means to move files from one location on your machine to another: the `os.rename` function.

In the following code block, we will practice creating directories and moving files using the 2022 data in the `data` folder. Begin by looping through the `data` directory and generating a folder in the `organized_data` directory for each month (e.g. 2022_01, 2022_02, etc). Then, move each 2022 file from the `data` directory into its corresponding month folder in the `organized_data` directory.
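The steps above can be sketched as follows. This is one possible approach, not the only one: the helper name `organize_by_month` is hypothetical, and it assumes the data files are named like `YYYY_MMDD.txt` (for example `2022_0131.txt`), so the first seven characters give the `YYYY_MM` folder name. The demonstration runs on empty files in a temporary directory.

```python
import os
import tempfile

def organize_by_month(data_dir, organized_dir, year):
    """Move files named like YYYY_MMDD.txt into organized_dir/YYYY_MM."""
    for file_name in sorted(os.listdir(data_dir)):
        # Check that the file is from the requested year
        if not file_name.startswith(year + "_"):
            continue
        # "2022_0131.txt" -> "2022_01"
        year_month = file_name[:7]
        month_dir = os.path.join(organized_dir, year_month)
        # Make the month folder only if it is not there yet
        if not os.path.isdir(month_dir):
            os.makedirs(month_dir)
        src_path = os.path.join(data_dir, file_name)
        dest_path = os.path.join(month_dir, file_name)
        os.rename(src_path, dest_path)

# Demonstration with empty stand-in files
with tempfile.TemporaryDirectory() as base:
    data_dir = os.path.join(base, "data")
    organized_dir = os.path.join(base, "organized_data")
    os.mkdir(data_dir)
    os.mkdir(organized_dir)
    for name in ["2022_0101.txt", "2022_0102.txt", "2022_0201.txt", "2023_0101.txt"]:
        open(os.path.join(data_dir, name), "w").close()

    organize_by_month(data_dir, organized_dir, "2022")

    month_folders = sorted(os.listdir(organized_dir))
    remaining = sorted(os.listdir(data_dir))

print(month_folders)
print(remaining)
```

Note that `os.rename` can fail when the source and destination are on different file systems; `shutil.move` is a common alternative in that case.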

In [None]:
# make a new folder in the organized_data folder for each month in 2022


    # check that the file is from 2022


        # define the name of a new folder in the format YYYY_MM


        # if this year_month is not yet in the organized_data directory, then make it


        # move the file into the year_month folder
        # define the src_path and the dest_path
        # then, move the file


# Part 2: The `shutil` module
The `shutil` module provides utilities for making copies of files on your file system. There are three main functions used for copying files, as follows:

|  | copyfile | copy | copy2 |
| -- | -------- | ---- | ----- |
| Destination can be a directory | N | Y | Y |
| Copies metadata | N | N | Y |
| Copies permissions | N | Y | Y |
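The differences in the table can be seen in a small experiment. The sketch below creates a stand-in file in a temporary directory and gives it an old modification time (an arbitrary value chosen for illustration) so that metadata copying is visible.

```python
import os
import shutil
import tempfile

OLD_TIME = 1000000000  # arbitrary old timestamp, in seconds since the epoch

with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, "2023_0101.txt")
    with open(src, "w") as f:
        f.write("example\n")
    # Set the access and modification times of the source file
    os.utime(src, (OLD_TIME, OLD_TIME))

    dst_dir = os.path.join(base, "copies")
    os.mkdir(dst_dir)

    # copyfile needs a full destination file path
    a = shutil.copyfile(src, os.path.join(dst_dir, "a.txt"))
    # copy accepts a directory and keeps the source file name
    b = shutil.copy(src, dst_dir)
    # copy2 also tries to preserve metadata such as the modification time
    c = shutil.copy2(src, os.path.join(dst_dir, "c.txt"))

    copy_name = os.path.basename(b)
    copy_kept_mtime = int(os.path.getmtime(b)) == OLD_TIME
    copy2_kept_mtime = int(os.path.getmtime(c)) == OLD_TIME

print(copy_name, copy_kept_mtime, copy2_kept_mtime)
```

Each function returns the path of the file it created, which is why the results can be stored in `a`, `b`, and `c`.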

In [None]:
# define a path to the source data file 2023_0101.txt in data


# define a destination path to the current directory with the file name


# try the copyfile method with the dst path
# what happens if you just provide the current directory?


# try the copy method with the dst path


# try the copy2 method with the dst path


### &#x1F914; Mini-Exercise
Modify the code above to make copies of the 2023 data in monthly directories in the `organized_data` directory

In [None]:
# make a new folder in the organized_data folder for each month in 2023


    # check that the file is from 2023


        # define the name of a new folder in the format YYYY_MM


        # if this year_month is not yet in the organized_data directory, then make it


        # make a copy of the file in the year_month folder
        # define the src_path and the dest_path
        # then, copy the file using one of the shutil functions


## Overview: Python Commands vs Unix Shell Commands

| Python | Unix | Purpose |
| ------ | ---- | ------- |
| os.getcwd() | pwd | Determine the current/present working directory |
| os.chdir() | cd | Change directory |
| os.mkdir() | mkdir | Make a directory |
| os.rename() | mv | Rename a file or move to a new location |
| os.listdir() | ls | List the files and folders in a directory |
| shutil.copy() | cp | Copy a file to a new location |

# Part 3: The `glob` module
When using Unix-type shell commands, wildcard symbols are extremely useful for finding and accessing subsets of files. There are 2 main wildcard symbols:

| symbol | use |
| ------ | --- |
| `?`    | Matches exactly one character |
| `*`    | Matches any number of characters (including none) |

Try these in the `data` directory in your shell:
1. How would you determine the names of files that correspond to the first day of each month in 2023?
2. How would you determine the name of all files that correspond to December of 2023?

The `glob` module lets you run Unix-style wildcard searches of your file system from Python.
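A minimal sketch of both wildcards is shown here, assuming files named like `YYYY_MMDD.txt`. The stand-in files are created in a temporary directory, and the variable names are illustrative.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    for name in ["2023_0101.txt", "2023_0102.txt", "2023_0201.txt",
                 "2023_1201.txt", "2023_1225.txt", "2022_1201.txt"]:
        open(os.path.join(data_dir, name), "w").close()

    # "??" matches exactly two characters (the month); "01" fixes the day
    first_days = sorted(glob.glob(os.path.join(data_dir, "2023_??01.txt")))
    # "*" matches any run of characters after the fixed "2023_12" prefix
    december = sorted(glob.glob(os.path.join(data_dir, "2023_12*")))

    # glob returns full paths; keep only the file names
    first_day_names = [os.path.basename(p) for p in first_days]
    december_names = [os.path.basename(p) for p in december]

print(first_day_names)
print(december_names)
```

`glob.glob` does not guarantee any particular order, so the results are sorted before use.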

In [None]:
# find all file names that correspond to the first day of each month in 2023


# find all files in December 2023


### &#x1F914; Mini-Exercise
Goal: Get a list of all Jupyter Notebooks we've written in CS 122 so far using the `glob` module.

In [None]:
# define a search path


# use the glob module to get the list of paths for the notebooks


# make a loop to just get the file name (not the whole path)


# print the notebook files, line by line


# Part 4: The `subprocess` module
The final module we will investigate in this notebook is the `subprocess` module. It's not necessarily related to using the file system, but it's related to accessing the terminal and running shell scripts from Python, so it fits within the theme of this notebook.

The most useful tool in the `subprocess` module is `Popen`.
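Here is a minimal sketch of the typical `Popen` workflow. To keep it portable, it launches the Python interpreter itself through `sys.executable` rather than a shell command such as `ls`; that choice is an assumption made for illustration only.

```python
import subprocess
import sys

# Start a child process and capture its standard output and standard error
process = subprocess.Popen(
    [sys.executable, "-c", "print('first'); print('second')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# communicate() waits for the process to finish and
# returns a tuple (stdout, stderr), each as bytes
stdout, stderr = process.communicate()
print(type(stdout))

# Decode the bytes into a string, then split it into lines
text = stdout.decode("utf-8")
lines = text.splitlines()
for line in lines:
    print(line)
```

Replacing the command list with something like `["ls"]` would run a terminal command instead, on systems where that command exists.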

In [None]:
# write a function to list the files in the current directory


# the first and second values returned by Popen's communicate() method are the standard output and standard error
# if you want to capture the output as a string, then "pipe" the stdout


# print the type of output


# convert the type of the output to a string


# split the output and print line by line
