# 03: Useful standard library modules
(pathlib, shutil, sys, os, subprocess, zipfile, etc.)

These modules are part of the Python standard library and provide very useful functionality for working with your operating system and files.  This notebook introduces these modules and demonstrates some of their functionality.  Online documentation is at https://docs.python.org/3/library/.


## Topics covered:
* **pathlib**:
    * listing files
    * creating, moving and deleting files
    * absolute vs relative paths
    * useful path object attributes
* **shutil**: 
    * copying, moving and deleting files AND folders
* **sys**: 
    * python and platform information
    * command line arguments
    * modifying the python path to import code from other locations
* **os**:
    * changing the working directory
    * recursive iteration through folder structures
    * accessing environment variables
* **subprocess**: 
    * running system commands and checking the results
* **zipfile**:
    * creating and extracting from zip archives

In [1]:
import os
from pathlib import Path
import shutil
import subprocess
import sys
import zipfile

## ``pathlib`` — Object-oriented filesystem paths
Pathlib provides convenient "pathlike" objects for working with file paths across platforms, meaning that paths or operations done with pathlib work the same on Windows or POSIX systems (Linux, macOS, etc.). The main entry point for users is the ``Path()`` class.

further reading:  
https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/  
https://docs.python.org/3/library/pathlib.html

### Listing files

#### Start by making a ``Path()`` object for the current folder

In [2]:
cwd = Path('.')
cwd

PosixPath('.')

In [3]:
for f in cwd.iterdir():
    print(f)

10b_Rasterio_advanced.ipynb
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
05_numpy.ipynb
11_xarray_mt_rainier_precip.ipynb
10a_Rasterio_intro.ipynb
07b_VSCode.md
09_b_Geopandas_ABQ.ipynb
data
06b_matplotlib_animation.ipynb
solutions
09_a_Geopandas.ipynb


#### List just the notebooks using the ``.glob()`` method

In [4]:
for nb in cwd.glob('*.ipynb'):
    print(nb)

10b_Rasterio_advanced.ipynb
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
05_numpy.ipynb
11_xarray_mt_rainier_precip.ipynb
10a_Rasterio_intro.ipynb
09_b_Geopandas_ABQ.ipynb
06b_matplotlib_animation.ipynb
09_a_Geopandas.ipynb


#### Note: ``.glob()`` works across folders too
List all notebooks for both class components

In [5]:
for nb in cwd.glob('../*/*.ipynb'):
    print(nb)

../part0_python_intro/10b_Rasterio_advanced.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/10a_Rasterio_intro.ipynb
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part1_flopy/05_Unstructured_Grid_generation.ipynb
../part1_flopy/10a_prt_particle_tracking-demo.ipynb
../part1_flopy/08_Modflow-setup-demo.ipynb
../part1_flopy/09-gwt-voronoi-demo.ipynb
../part1_flopy/10b_modpath_particle_tracking-demo.ipynb
../part1_flopy/01-Flopy-intro.ipynb


#### But ``glob`` results aren't sorted alphabetically!
(and the sorting is platform-dependent)

https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/?comments=1&post=38113333

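We can easily sort them by passing the results to ``sorted()``, which returns a sorted list (``sorted()`` accepts any iterable, so the inner ``list()`` call is optional).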

In [6]:
sorted(list(cwd.glob('../*/*.ipynb')))

[PosixPath('../part0_python_intro/00_python_basics_review.ipynb'),
 PosixPath('../part0_python_intro/03_useful-std-library-modules.ipynb'),
 PosixPath('../part0_python_intro/05_numpy.ipynb'),
 PosixPath('../part0_python_intro/06b_matplotlib_animation.ipynb'),
 PosixPath('../part0_python_intro/09_a_Geopandas.ipynb'),
 PosixPath('../part0_python_intro/09_b_Geopandas_ABQ.ipynb'),
 PosixPath('../part0_python_intro/10a_Rasterio_intro.ipynb'),
 PosixPath('../part0_python_intro/10b_Rasterio_advanced.ipynb'),
 PosixPath('../part0_python_intro/11_xarray_mt_rainier_precip.ipynb'),
 PosixPath('../part1_flopy/01-Flopy-intro.ipynb'),
 PosixPath('../part1_flopy/05_Unstructured_Grid_generation.ipynb'),
 PosixPath('../part1_flopy/08_Modflow-setup-demo.ipynb'),
 PosixPath('../part1_flopy/09-gwt-voronoi-demo.ipynb'),
 PosixPath('../part1_flopy/10a_prt_particle_tracking-demo.ipynb'),
 PosixPath('../part1_flopy/10b_modpath_particle_tracking-demo.ipynb')]
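
As a minimal sketch of the sorting step, the generator returned by ``.glob()`` can be handed straight to ``sorted()``. The temporary folder and the file names below are made up for illustration; the ``key`` function is an optional extra that makes the order case-insensitive, so that it does not depend on how a given platform compares upper and lower case names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # hypothetical file names, created only for this illustration
    for name in ['b_second.ipynb', 'A_first.ipynb', 'c_third.ipynb', 'notes.txt']:
        (folder / name).touch()

    # sorted() consumes the glob generator directly; no list() call is needed.
    # The key sorts on the lower-cased file name (case-insensitive order).
    notebooks = sorted(folder.glob('*.ipynb'), key=lambda p: p.name.lower())
    names = [p.name for p in notebooks]

print(names)
```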

**Note:** There is also a glob module in the standard python library that works directly with string paths

In [7]:
import glob
sorted(list(glob.glob('../*/*.ipynb')))

['../part0_python_intro/00_python_basics_review.ipynb',
 '../part0_python_intro/03_useful-std-library-modules.ipynb',
 '../part0_python_intro/05_numpy.ipynb',
 '../part0_python_intro/06b_matplotlib_animation.ipynb',
 '../part0_python_intro/09_a_Geopandas.ipynb',
 '../part0_python_intro/09_b_Geopandas_ABQ.ipynb',
 '../part0_python_intro/10a_Rasterio_intro.ipynb',
 '../part0_python_intro/10b_Rasterio_advanced.ipynb',
 '../part0_python_intro/11_xarray_mt_rainier_precip.ipynb',
 '../part1_flopy/01-Flopy-intro.ipynb',
 '../part1_flopy/05_Unstructured_Grid_generation.ipynb',
 '../part1_flopy/08_Modflow-setup-demo.ipynb',
 '../part1_flopy/09-gwt-voronoi-demo.ipynb',
 '../part1_flopy/10a_prt_particle_tracking-demo.ipynb',
 '../part1_flopy/10b_modpath_particle_tracking-demo.ipynb']

#### List just the subfolders

In [8]:
[f for f in cwd.iterdir() if f.is_dir()]

[PosixPath('data'), PosixPath('solutions')]

### Creating files and folders

#### make a ``Path`` object for a new subdirectory

In [9]:
new_folder = cwd / 'more_files'
new_folder

PosixPath('more_files')

#### or an individual file

In [10]:
f = cwd / '00_python_basics_review.ipynb'
f

PosixPath('00_python_basics_review.ipynb')

#### check if it exists, or if it's a directory

In [11]:
f.exists(), f.is_dir()

(True, False)

#### make the actual folder

In [12]:
new_folder.mkdir(); new_folder.exists()

True

Note that if you try to run the above cell twice, you'll get an error that the folder already exists.
Passing ``exist_ok=True`` suppresses this error.

In [13]:
new_folder.mkdir(exist_ok=True)

#### make a new subfolder within a new subfolder
The ``parents=True`` argument creates any missing parent folders along the way, which allows for making subfolders within new subfolders in a single call

In [14]:
(new_folder / 'subfolder').mkdir(exist_ok=True, parents=True)

### absolute vs. relative pathing

Get the absolute location of the current working directory

In [15]:
abs_cwd = Path.cwd()
abs_cwd

PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro')

Go up two levels to the course repository

In [16]:
class_root = (abs_cwd / '../../')
class_root

PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro/../..')

Simplify or resolve the path

In [17]:
class_root = class_root.resolve()
class_root

PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')

Get the cwd relative to the course repository

In [18]:
abs_cwd.relative_to(class_root)

PosixPath('notebooks/part0_python_intro')

check if this is an absolute or relative path

In [19]:
abs_cwd.relative_to(class_root).is_absolute()

False

In [20]:
abs_cwd.is_absolute()

True

**gotcha:** `Path.relative_to()` only works when the first path is a subpath of the second path; otherwise it raises a `ValueError`, even if both paths are absolute

For example, try executing this line: 

```python
Path('../part1_flopy/').relative_to('data')
```

If you need a relative path that will work robustly in a script, `os.path.relpath` might be a better choice

In [21]:
os.path.relpath('../part1_flopy/', 'data')

'../../part1_flopy'

In [22]:
os.path.relpath('data', '../part1_flopy/')

'../part0_python_intro/data'
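
One possible way to combine the two approaches is sketched below: try `Path.relative_to()` first and fall back to `os.path.relpath` when a `ValueError` is raised. The helper name `relative_path` and the `/project/...` paths are hypothetical and only serve the illustration; the paths don't need to exist.

```python
import os
from pathlib import Path


def relative_path(target, start):
    """Hypothetical helper: return the path to `target` relative to `start`."""
    target, start = Path(target), Path(start)
    try:
        # works only if target is located under start
        return target.relative_to(start)
    except ValueError:
        # fall back to os.path.relpath, which can also walk up with '..'
        return Path(os.path.relpath(target, start))


# target is under start: Path.relative_to() is enough
inside = relative_path('/project/notebooks/data', '/project')

# target is in a sibling folder: the os.path.relpath fallback is used
sideways = relative_path('/project/part1_flopy',
                         '/project/part0_python_intro/data')
print(inside, sideways)
```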

### useful attributes

In [23]:
abs_cwd.parent

PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks')

In [24]:
abs_cwd.parent.parent

PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')

In [25]:
f.name

'00_python_basics_review.ipynb'

In [26]:
f.suffix

'.ipynb'

In [27]:
f.with_suffix('.junk')

PosixPath('00_python_basics_review.junk')

In [28]:
f.stem

'00_python_basics_review'
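
These attributes can be combined to build new file names without any string slicing. The following is a small sketch; the ``_backup`` naming scheme is just an assumption for illustration, and no files are touched because these are pure path operations.

```python
from pathlib import Path

f = Path('notebooks/part0_python_intro/00_python_basics_review.ipynb')

# same folder, new file name built from the stem and suffix
backup = f.with_name(f.stem + '_backup' + f.suffix)

# same folder and stem, different extension
as_script = f.with_suffix('.py')

# the individual components of the path, as a tuple
parts = f.parts

print(backup)
print(as_script)
print(parts)
```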

### Moving and deleting files

Make a file

In [29]:
fname = Path('new_file.txt')
with open(fname, 'w') as dest:
    dest.write("A new text file.")

In [30]:
fname.exists()

True

Move the file

In [31]:
fname2 = Path('new_file2.txt')
fname.rename(fname2)

PosixPath('new_file2.txt')

In [32]:
fname.exists()

False

Delete the file

In [33]:
fname2.unlink()

In [34]:
fname2.exists()

False
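
The same create, move and delete workflow can be sketched using only ``Path`` methods. This version works in a temporary folder (an assumption made here so that nothing in the class folder is modified) and uses ``Path.write_text()`` and ``Path.read_text()`` in place of ``open()``. ``unlink(missing_ok=True)`` requires python 3.8 or later.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # make a file
    original = folder / 'new_file.txt'
    original.write_text("A new text file.")

    # move (rename) it; rename() returns the new Path
    moved = original.rename(folder / 'new_file2.txt')
    contents = moved.read_text()
    original_exists = original.exists()

    # delete it; missing_ok=True avoids an error if it is already gone
    moved.unlink()
    moved.unlink(missing_ok=True)
    moved_exists = moved.exists()

print(contents, original_exists, moved_exists)
```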

#### Delete the empty folder we made above
Note: this only works for empty directories (use ``shutil.rmtree()`` very carefully for removing folders and all contents within)

In [35]:
Path('more_files/subfolder/').rmdir()

## ``shutil`` — High-level file operations
module for copying, moving, and deleting files and directories.

https://docs.python.org/3/library/shutil.html

The functions from shutil that you may find useful are:

    shutil.copy()
    shutil.copy2()  # this preserves most metadata (i.e. dates); unlike copy()
    shutil.copytree()
    shutil.move()
    shutil.rmtree()  #obviously, you need to be careful with this one!
    
Give these guys a shot and see what they do.  Remember, you can always get help by typing:

    help(shutil.copy)
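
If you would rather experiment somewhere safe, here is a minimal sketch that exercises these functions inside a temporary folder, so that ``shutil.rmtree()`` can't remove anything important. The folder and file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src_dir = root / 'source'
    src_dir.mkdir()
    (src_dir / 'a.txt').write_text('some text')

    # copy a single file, preserving most metadata
    shutil.copy2(src_dir / 'a.txt', src_dir / 'a_copy.txt')

    # copy a whole folder (the destination must not exist yet)
    shutil.copytree(src_dir, root / 'source_copy')

    # move (rename) the copied folder
    shutil.move(root / 'source_copy', root / 'moved')
    copied_files = sorted(p.name for p in (root / 'moved').iterdir())

    # remove the folder and everything in it
    shutil.rmtree(root / 'moved')
    removed = not (root / 'moved').exists()

print(copied_files, removed)
```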


In [36]:
#try them here.  Be careful!

In [37]:
shutil.rmtree(new_folder)

## ``sys`` — System-specific parameters and functions

### Getting information about python and the os
where python is installed

In [38]:
print(sys.prefix)

/home/runner/micromamba/envs/pyclass-docs


In [39]:
print(sys.version_info)

sys.version_info(major=3, minor=11, micro=13, releaselevel='final', serial=0)


In [40]:
sys.platform

'linux'

### Adding command line arguments to a script
Here the command line arguments reflect that we're running a Jupyter Notebook. 

In a python script, the first item in the list is the name of the script itself; any command line arguments are listed after it.

In [41]:
sys.argv

['/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages/ipykernel_launcher.py',
 '-f',
 '/tmp/tmp0e6h5tq4.json',
 '--HistoryManager.hist_file=:memory:']

### Exercise: Make a script with a command line argument using sys.argv

1) Using a text editor such as VSCode, make a new ``*.py`` file with the following contents:

```python
import sys

if len(sys.argv) > 1:
    for argument in sys.argv[1:]:
        print(argument)
else:
    print("usage is: python <script name>.py argument")
    # exit with a non-zero status to signal incorrect usage
    sys.exit(1)
```

2) Try running the script at the command line
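
If you want to try the logic without leaving the notebook, one option is to put it in a function that takes the argument list as an input. This is only a sketch: the function name ``main`` and the script name ``myscript.py`` are hypothetical, and the lists passed in stand in for what ``sys.argv`` would contain at the command line.

```python
def main(argv):
    """Hypothetical entry point; argv[0] is the script name, as in sys.argv."""
    arguments = argv[1:]
    if not arguments:
        print(f"usage is: python {argv[0]} argument")
        return 1  # non-zero status signals incorrect usage
    for argument in arguments:
        print(argument)
    return 0


# simulate "python myscript.py first second"
status_ok = main(['myscript.py', 'first', 'second'])

# simulate "python myscript.py" (no arguments)
status_usage = main(['myscript.py'])

# in a real script, the last lines would instead be something like:
#     import sys
#     sys.exit(main(sys.argv))
```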

### modifying the python path

If you haven't seen `sys.path` already mentioned in a python script, you will soon.  `sys.path` is a list of directories.  This path list is used by python to search for python modules and packages.  If for some reason, you want to use a python package or  module that is not installed in the main python folder, you can add the directory containing your module to sys.path.

Any packages installed by linking the source code in place (i.e. ``pip install -e .``) will also show up here.

In [42]:
for pth in sys.path:
    print(pth)

/home/runner/micromamba/envs/pyclass-docs/lib/python311.zip
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/lib-dynload

/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages


### Using ``sys.path`` to import code from an arbitrary location

1) Using a text editor such as VSCode (or ``pathlib`` and python) make a new ``*.py`` file in another folder (anything in the same folder as this notebook can already be imported). For example:

In [43]:
subfolder = Path('another_subfolder/scripts')
subfolder.mkdir(exist_ok=True, parents=True)

with open(subfolder / 'mycode.py', 'w') as dest:
    dest.write("stuff = {'this is': 'a dictionary'}")

Now add this folder to the python path

In [44]:
sys.path.append('another_subfolder/scripts')

Code can be imported by calling the containing module

In [45]:
from mycode import stuff

stuff

{'this is': 'a dictionary'}

**Note**: Importing code using ``sys.path`` is often considered bad practice, because 

* it can hide dependencies.    

    * from the information above, we don't know whether ``mycode`` is a package that is installed, a module in the current folder, or anywhere else for that matter.
    * Similarly, we know that any modules from ``'another_subfolder/scripts'`` can be imported, but we don't know which modules in that folder are needed without some additional checking.

* importing code using ``sys.path`` is also sensitive to the location of the script relative to the path. If the script is moved or used on someone else's computer with a different file structure, it'll break.

* this all said, sometimes using ``sys.path`` is expedient in reproducible workflows in that it can allow code to be consolidated and re-used across multiple scripts in various locations
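
When ``sys.path`` is used anyway, one way to make it a little less fragile is to add an absolute path, and only if it isn't already in the list. The sketch below assumes a temporary folder and a made-up module name (``mycode_demo``) so that it can run anywhere; in a real workflow the folder would be your own scripts folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical scripts folder with one small module in it
    scripts = Path(tmp) / 'scripts'
    scripts.mkdir()
    (scripts / 'mycode_demo.py').write_text("stuff = {'this is': 'a dictionary'}\n")

    # add the absolute path, and avoid adding it twice
    scripts_path = str(scripts.resolve())
    if scripts_path not in sys.path:
        sys.path.append(scripts_path)

    # make sure the import system notices the newly written file
    importlib.invalidate_caches()
    import mycode_demo
    result = mycode_demo.stuff

    # clean up, so later imports aren't affected by this demo
    sys.path.remove(scripts_path)

print(result)
```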

For code that is useful across multiple projects, [installing reusable code in a package can be the best way to go](https://learn.scientific-python.org/development/tutorials/). Packages provide a framework for organizing, documenting, testing and sharing code in a way that is easily understood by others.

Whatever you do, avoid importing with an `*` (i.e. ``from mycode import *``) at all costs. This imports everything from the namespace of a module, which can lead to unintended consequences.

## ``os`` — Miscellaneous operating system interfaces
Historically, the ``os.path`` module was the de facto standard for file and path manipulation. Since python 3.4 however, ``pathlib`` is generally cleaner and easier to use for most of these operations. But there are some exceptions.

### Changing the current working directory
``pathlib`` doesn't do this.   
Note: this can obviously lead to trouble in scripts, so should usually be avoided, but sometimes it is necessary. In groundwater modeling workflows, for example, this can help keep flow and transport model files organized in separate folders.

In [46]:
# Example of changing the working directory
old_wd = os.getcwd()

# Go up one directory
os.chdir('..')
cwd = os.getcwd()
print('Now in: ', cwd)

# Change back to original
os.chdir(old_wd)
cwd = os.getcwd()
print('Switched back to: ', cwd)

Now in:  /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks
Switched back to:  /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro
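
Because forgetting to switch back is an easy mistake to make, the pattern above can be wrapped in a context manager that restores the original folder even if an error occurs. This is a sketch under a few assumptions: the name ``working_directory`` is made up, and a temporary folder stands in for a model folder. (Python 3.11 and later also include a similar ``contextlib.chdir``.)

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Hypothetical helper: temporarily change the working directory."""
    old_wd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # always switch back, even if the code inside the block fails
        os.chdir(old_wd)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
after = os.getcwd()

print('Was in: ', inside)
print('Switched back to: ', after)
```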


### os.walk

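os.walk() is a great way to recursively generate all the file names and folders in a directory.  For each folder it visits, it yields a tuple of the folder path, a list of the subfolders, and a list of the files.  The following shows the results for the folder containing both class components.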

In [47]:
pth = Path('..')
results = list(os.walk(pth))
results

[('..', ['part0_python_intro', 'part1_flopy'], []),
 ('../part0_python_intro',
  ['another_subfolder', 'data', 'solutions'],
  ['10b_Rasterio_advanced.ipynb',
   '00_python_basics_review.ipynb',
   '03_useful-std-library-modules.ipynb',
   '05_numpy.ipynb',
   '11_xarray_mt_rainier_precip.ipynb',
   '10a_Rasterio_intro.ipynb',
   '07b_VSCode.md',
   '09_b_Geopandas_ABQ.ipynb',
   '06b_matplotlib_animation.ipynb',
   '09_a_Geopandas.ipynb']),
 ('../part0_python_intro/another_subfolder', ['scripts'], []),
 ('../part0_python_intro/another_subfolder/scripts',
  ['__pycache__'],
  ['mycode.py']),
 ('../part0_python_intro/another_subfolder/scripts/__pycache__',
  [],
  ['mycode.cpython-311.pyc']),
 ('../part0_python_intro/data',
  ['xarray', 'rasterio', 'pandas', 'geopandas', 'numpy', 'fileio'],
  ['dream.txt', 'theis_charles_vernon.jpg', 'netcdf_data.zip']),
 ('../part0_python_intro/data/xarray',
  [],
  ['daymet_prcp_rainier_1980-2018.nc',
   'aligned-19700901_ned1_2003_adj_4269.tif']),
 (
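
One use for this kind of traversal is finding folders that take up a lot of space. The sketch below builds a small temporary folder tree (the folder names, file names and sizes are made up) and totals the size of the files directly inside each folder; sizes of nested subfolders are not added to their parents here.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # hypothetical folder tree with files of known size
    (root / 'small').mkdir()
    (root / 'large').mkdir()
    (root / 'small' / 'a.txt').write_bytes(b'x' * 10)
    (root / 'large' / 'b.bin').write_bytes(b'x' * 1000)
    (root / 'large' / 'c.bin').write_bytes(b'x' * 500)

    sizes = {}
    for folder, dirs, files in os.walk(root):
        # total size (in bytes) of the files directly in this folder
        total = sum(os.path.getsize(os.path.join(folder, f)) for f in files)
        sizes[Path(folder).relative_to(root).as_posix()] = total

    largest = max(sizes, key=sizes.get)

print(sizes)
print('largest folder:', largest)
```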

#### Make a more readable list of just the jupyter notebooks
Note: the key advantage of ``os.walk`` over a simple ``glob`` pattern is the recursion; individual subfolder levels don't need to be known or specified a priori. (``Path.rglob()`` and ``**`` glob patterns also search recursively.)

In [48]:
for root, dirs, files in os.walk(pth):
    for f in files:
        filepath = Path(root, f)
        if filepath.suffix == '.ipynb':
            print(filepath)

../part0_python_intro/10b_Rasterio_advanced.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/10a_Rasterio_intro.ipynb
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part0_python_intro/solutions/01_functions_script__solution.ipynb
../part0_python_intro/solutions/09_Geopandas__solutions.ipynb
../part0_python_intro/solutions/08_pandas.ipynb
../part0_python_intro/solutions/02_Namespace_objects_modules_packages__solution.ipynb
../part0_python_intro/solutions/05_numpy__solutions.ipynb
../part0_python_intro/solutions/04_files_and_strings.ipynb
../part0_python_intro/solutions/03_useful-std-library-modules-solutions.ipynb
../part0_python_intro/solutions/06_matplotlib__solution.ipynb
../part0_python_intro/solutions/07a_
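
For comparison, ``pathlib`` can do a similar recursive search with ``Path.rglob()``. The following is a minimal sketch in a temporary folder with made-up folder and file names, rather than the class folders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # hypothetical folder structure, two levels deep
    (root / 'part0' / 'solutions').mkdir(parents=True)
    (root / 'part0' / 'intro.ipynb').touch()
    (root / 'part0' / 'solutions' / 'intro_solution.ipynb').touch()
    (root / 'part0' / 'notes.md').touch()

    # rglob() searches root and all folders below it
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.ipynb'))

for name in found:
    print(name)
```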

### Accessing environment variables

In [49]:
os.environ

environ{'GITHUB_STATE': '/home/runner/work/_temp/_runner_file_commands/save_state_45f0dbec-0123-4edc-b22c-d9b41056a97e',
        'CONDA_PROMPT_MODIFIER': '(pyclass-docs) ',
        'DOTNET_NOLOGO': '1',
        'USER': 'runner',
        'CI': 'true',
        'GITHUB_ENV': '/home/runner/work/_temp/_runner_file_commands/set_env_45f0dbec-0123-4edc-b22c-d9b41056a97e',
        'PIPX_HOME': '/opt/pipx',
        'RUNNER_ENVIRONMENT': 'github-hosted',
        'JAVA_HOME_8_X64': '/usr/lib/jvm/temurin-8-jdk-amd64',
        'SHLVL': '1',
        'CONDA_SHLVL': '1',
        'HOME': '/home/runner',
        'RUNNER_TEMP': '/home/runner/work/_temp',
        'GITHUB_EVENT_PATH': '/home/runner/work/_temp/_github_workflow/event.json',
        'GITHUB_REPOSITORY_OWNER': 'DOI-USGS',
        'JAVA_HOME_11_X64': '/usr/lib/jvm/temurin-11-jdk-amd64',
        'PIPX_BIN_DIR': '/opt/pipx_bin',
        'ANDROID_NDK_LATEST_HOME': '/usr/local/lib/android/sdk/ndk/28.2.13676358',
        'GRADLE_HOME': '/usr/share/gr

#### Example: get the location of the current python (Conda) environment

In [50]:
os.environ['CONDA_PREFIX']

'/home/runner/micromamba/envs/pyclass-docs'
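
Indexing ``os.environ`` with a variable that isn't set raises a ``KeyError`` (for example, ``CONDA_PREFIX`` won't exist outside of a Conda environment). A sketch of a more forgiving lookup with ``os.environ.get()`` follows; the variable name ``PYCLASS_DEMO_VARIABLE`` is made up for illustration.

```python
import os

name = 'PYCLASS_DEMO_VARIABLE'  # hypothetical variable name
os.environ.pop(name, None)      # make sure it isn't set

# .get() returns None (or a default of your choice) instead of raising KeyError
missing = os.environ.get(name)
fallback = os.environ.get(name, 'not set')

# variables set here are visible to this process and any subprocesses it starts
os.environ[name] = 'some value'
present = os.environ[name]

# clean up
del os.environ[name]

print(missing, fallback, present)
```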

### Running system commands
`os.system` provides a limited way to run system commands. For more flexibility, use `subprocess` (below).

In [51]:
os.system('ls -l')

total 276
-rw-r--r-- 1 runner runner 32326 Sep 24 19:03 00_python_basics_review.ipynb
-rw-r--r-- 1 runner runner 31027 Sep 24 19:03 03_useful-std-library-modules.ipynb
-rw-r--r-- 1 runner runner 45695 Sep 24 19:03 05_numpy.ipynb
-rw-r--r-- 1 runner runner  2460 Sep 24 19:03 06b_matplotlib_animation.ipynb
-rw-r--r-- 1 runner runner 12297 Sep 24 19:03 07b_VSCode.md
-rw-r--r-- 1 runner runner 27811 Sep 24 19:03 09_a_Geopandas.ipynb
-rw-r--r-- 1 runner runner  9730 Sep 24 19:03 09_b_Geopandas_ABQ.ipynb
-rw-r--r-- 1 runner runner 22961 Sep 24 19:03 10a_Rasterio_intro.ipynb
-rw-r--r-- 1 runner runner 29620 Sep 24 19:03 10b_Rasterio_advanced.ipynb
-rw-r--r-- 1 runner runner 35369 Sep 24 19:03 11_xarray_mt_rainier_precip.ipynb
drwxr-xr-x 3 runner runner  4096 Sep 24 19:03 another_subfolder
drwxr-xr-x 8 runner runner  4096 Sep 24 19:01 data
drwxr-xr-x 2 runner runner  4096 Sep 24 19:03 solutions


0

In [52]:
# on Windows
os.system('dir')

00_python_basics_review.ipynb	     10a_Rasterio_intro.ipynb
03_useful-std-library-modules.ipynb  10b_Rasterio_advanced.ipynb
05_numpy.ipynb			     11_xarray_mt_rainier_precip.ipynb
06b_matplotlib_animation.ipynb	     another_subfolder
07b_VSCode.md			     data
09_a_Geopandas.ipynb		     solutions
09_b_Geopandas_ABQ.ipynb


0

## ``subprocess`` — Subprocess management

The subprocess module offers a way to execute system commands, for example MODFLOW, or any operating system command that you can type at the command line.

The recommended approach to invoking subprocesses is to use the ``run()`` function for all use cases it can handle. For more advanced use cases, the underlying ``Popen`` interface can be used directly.

Take a look at the following help descriptions for ``run``.

Note that on Windows, you may have to specify ``shell=True`` in order to access system commands.

In [53]:
help(subprocess.run)

Help on function run in module subprocess:

run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs)
    Run command with arguments and return a CompletedProcess instance.
    
    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
    or pass capture_output=True to capture both.
    
    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.
    
    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.
    
    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    

In [54]:
# if on mac/unix
print(subprocess.run(['ls', '-l']))

total 276
-rw-r--r-- 1 runner runner 32326 Sep 24 19:03 00_python_basics_review.ipynb
-rw-r--r-- 1 runner runner 31027 Sep 24 19:03 03_useful-std-library-modules.ipynb
-rw-r--r-- 1 runner runner 45695 Sep 24 19:03 05_numpy.ipynb
-rw-r--r-- 1 runner runner  2460 Sep 24 19:03 06b_matplotlib_animation.ipynb
-rw-r--r-- 1 runner runner 12297 Sep 24 19:03 07b_VSCode.md
-rw-r--r-- 1 runner runner 27811 Sep 24 19:03 09_a_Geopandas.ipynb
-rw-r--r-- 1 runner runner  9730 Sep 24 19:03 09_b_Geopandas_ABQ.ipynb
-rw-r--r-- 1 runner runner 22961 Sep 24 19:03 10a_Rasterio_intro.ipynb
-rw-r--r-- 1 runner runner 29620 Sep 24 19:03 10b_Rasterio_advanced.ipynb
-rw-r--r-- 1 runner runner 35369 Sep 24 19:03 11_xarray_mt_rainier_precip.ipynb
drwxr-xr-x 3 runner runner  4096 Sep 24 19:03 another_subfolder
drwxr-xr-x 8 runner runner  4096 Sep 24 19:01 data
drwxr-xr-x 2 runner runner  4096 Sep 24 19:03 solutions
CompletedProcess(args=['ls', '-l'], returncode=0)


With the `cwd` argument, we can control the working directory for the command. Here we list the files in the parent directory.

In [55]:
print(subprocess.run(['ls', '-l'], cwd='..'))

total 8
drwxr-xr-x 5 runner runner 4096 Sep 24 19:03 part0_python_intro
drwxr-xr-x 5 runner runner 4096 Sep 24 19:03 part1_flopy
CompletedProcess(args=['ls', '-l'], returncode=0)


In [56]:
# if on windows
print(subprocess.run(['dir'], shell=True))

00_python_basics_review.ipynb	     10a_Rasterio_intro.ipynb
03_useful-std-library-modules.ipynb  10b_Rasterio_advanced.ipynb
05_numpy.ipynb			     11_xarray_mt_rainier_precip.ipynb
06b_matplotlib_animation.ipynb	     another_subfolder
07b_VSCode.md			     data
09_a_Geopandas.ipynb		     solutions
09_b_Geopandas_ABQ.ipynb
CompletedProcess(args=['dir'], returncode=0)
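
To check the results of a command, ``run()`` can capture the output and raise an error on failure. The sketch below runs the python interpreter itself (via ``sys.executable``) as the command, which is an assumption made so that the example works the same on any platform; in practice the command might be MODFLOW or another executable.

```python
import subprocess
import sys

# capture_output=True collects stdout and stderr; text=True decodes them to str
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a subprocess")'],
    capture_output=True, text=True, check=True,
)
output = result.stdout.strip()
print(result.returncode, output)

# with check=True, a non-zero exit code raises CalledProcessError
try:
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print('command failed with return code', failed_code)
```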


## ``zipfile`` — Work with ZIP archives

### zip up one of the files in data/

In [57]:
with zipfile.ZipFile('junk.zip', 'w') as dest:
    dest.write('data/xarray/daymet_prcp_rainier_1980-2018.nc')

### now extract it

In [58]:
with zipfile.ZipFile('junk.zip') as src:
    src.extract('data/xarray/daymet_prcp_rainier_1980-2018.nc', path='extracted_data')
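
A self-contained variation on the two cells above is sketched here, using a temporary folder and a made-up text file in place of the NetCDF file. It also shows two optional arguments: ``arcname`` controls the name stored in the archive, and ``compression=zipfile.ZIP_DEFLATED`` turns on compression (by default, files are stored uncompressed).

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # hypothetical data file to archive
    data_file = root / 'data' / 'values.txt'
    data_file.parent.mkdir()
    data_file.write_text('1 2 3')

    archive = root / 'junk.zip'
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as dest:
        # arcname sets the path of the file inside the archive
        dest.write(data_file, arcname='data/values.txt')

    with zipfile.ZipFile(archive) as src:
        names = src.namelist()
        # extract() returns the path of the extracted file
        extracted = src.extract('data/values.txt', path=root / 'extracted_data')

    extracted_text = Path(extracted).read_text()

print(names, extracted_text)
```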

## Testing Your Skills with a truly awful example:

### the problem:
Pretend that the file `data/netcdf_data.zip` contains some climate data (in the NetCDF format with the ``*.nc`` extension) that we downloaded. If you open `data/netcdf_data.zip`, you'll see that within a subfolder `zipped` are a bunch of additional subfolders, each for a different year. Within each subfolder is another zipfile. Within each of these zipfiles is yet another subfolder, inside of which is the actual data file we want (`prcp.nc`). 

In [59]:
with zipfile.ZipFile('data/netcdf_data.zip') as src:
    for f in src.namelist()[:10]:
        print(f)

netcdf_data/
netcdf_data/zipped/
netcdf_data/zipped/zipped_1991/
netcdf_data/zipped/zipped_1991/12270_1991.zip
netcdf_data/zipped/zipped_1996/
netcdf_data/zipped/zipped_1996/12270_1996.zip
netcdf_data/zipped/zipped_1998/
netcdf_data/zipped/zipped_1998/12270_1998.zip
netcdf_data/zipped/zipped_1999/
netcdf_data/zipped/zipped_1999/12270_1999.zip


### the goal:
To extract all of these `prcp.nc` files into a single folder, after renaming them with their respective years (obtained from their enclosing folders or zip files). e.g.  
```
prcp_1980.nc
prcp_1981.nc
...
```
This will allow us to open them together as a dataset in `xarray` (more on that later). Does this sound awful? I'm not making this up. This is the kind of structure you get when downloading tiles of climate data with the [Daymet Tile Selection Tool](https://daymet.ornl.gov/gridded/)

### hint:
you might find these functions helpful:
```
ZipFile.extractall
ZipFile.extract
Path.glob
Path.mkdir
Path.stem
Path.parent
Path.name
shutil.move
Path.rmdir
```

### hint: start by using ``ZipFile.extractall()`` to extract all of the individual zip files from the main zip archive
This extracts the entire contents of the zip file to a designated folder

In [60]:
output_folder = Path('03-output')
output_folder.mkdir(exist_ok=True)

with zipfile.ZipFile('data/netcdf_data.zip') as src:
    src.extractall(output_folder)

Make a list of the zipfiles

In [61]:
zipfiles = list(output_folder.glob('netcdf_data/zipped/*/*.zip'))
zipfiles[:5]

[PosixPath('03-output/netcdf_data/zipped/zipped_2000/12270_2000.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_1981/12270_1981.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2011/12270_2011.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2004/12270_2004.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_1991/12270_1991.zip')]

### Part 1: extract with a single file

In [62]:
f = zipfiles[0]
f

PosixPath('03-output/netcdf_data/zipped/zipped_2000/12270_2000.zip')
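
As a small nudge for the renaming step, the year can be pulled out of the path with the attributes covered earlier. This sketch assumes that all of the zip files follow the ``<tile>_<year>.zip`` naming pattern shown above; it only manipulates the path and doesn't extract anything.

```python
from pathlib import Path

f = Path('03-output/netcdf_data/zipped/zipped_2000/12270_2000.zip')

# the stem is the file name without the extension: '12270_2000'
year = f.stem.split('_')[-1]

# the enclosing folder name also contains the year
enclosing_folder = f.parent.name

# the name we eventually want for the extracted NetCDF file
new_name = f'prcp_{year}.nc'

print(year, enclosing_folder, new_name)
```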

#### 1a) Use ``ZipFile.namelist()`` (as above) to list the contents

This will yield the name of the ``*.nc`` file that we need to extract

#### 1b) Use ``ZipFile.extract()`` to extract the ``*.nc`` file to the destination folder
(you may need to create the destination folder first)

#### 1c) Move the extracted file out of any enclosing subfolders, and rename to ``prcp_<year>.nc``
(so that if we repeat this for subsequent files, the extracted ``*.nc`` files will end up in the same place)

#### 1d) Remove the extra subfolders that were extracted

### Part 2: put the above steps together into a loop to repeat the workflow for all of the NetCDF files

## Bonus Application -- Using ``os`` to find the location of an executable

There are often times that you run an executable that is nested somewhere deep within your system path.  It can often be a good idea to know exactly where that executable is located.  This might one day keep you from accidentally using an older version of an executable, such as MODFLOW.

In [63]:
# Define two functions to help determine 'which' program you are using
def is_exe(fpath):
    """
    Return True if fpath is an executable, otherwise return False
    """
    return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

def which(program):
    """
    Locate the program and return its full path.  Return
    None if the program cannot be located.
    """
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        # test for exe in current working directory
        if is_exe(program):
            return program
        # test for exe in path statement
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None

In [64]:
which('mf6')

'/home/runner/.local/bin/mf6'
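
The standard library also includes ``shutil.which()``, which performs a similar search of the system path and returns ``None`` if nothing is found. A minimal sketch follows; the second program name is deliberately made up, and whether ``mf6`` is found depends on your own installation.

```python
import shutil

# full path to the executable, or None if it isn't on the system path
found = shutil.which('mf6')

# a made-up program name that shouldn't exist anywhere
missing = shutil.which('a-program-that-should-not-exist-xyz')

print(found)
print(missing)
```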