## Review of NDVI workflow

Below, we review the workflow for calculating the difference in NDVI between two dates (e.g. before and after a fire event).

In [None]:
# Import necessary packages
import os
from glob import glob
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import box
import geopandas as gpd
import rasterio as rio
from rasterio.plot import plotting_extent
from rasterio.mask import mask
import earthpy as et
import earthpy.spatial as es
import earthpy.plot as ep

# Get data and set working directory
data = et.data.get_data('cold-springs-fire')
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

In [None]:
# Open fire boundary
fire_bound_path = os.path.join("data", "cold-springs-fire", "vector_layers",
                               "fire-boundary-geomac", "co_cold_springs_20160711_2200_dd83.shp")
fire_bound = gpd.read_file(fire_bound_path)

naip_2015_path = os.path.join("data", "cold-springs-fire", "naip",
                              "m_3910505_nw_13_1_20150919", "crop",
                              "m_3910505_nw_13_1_20150919_crop.tif")

# Add path for your download of the naip_2017 data


# Open 2015 data


# Open 2017 data and crop to the boundary of the 2015 data



In [None]:
# Calculate ndvi for 2015 and 2017 NAIP data
# Cast both years to int in case the bands are stored as small unsigned integers
naip_ndvi_2015 = es.normalized_diff(
    naip_2015_crop[3].astype(int), naip_2015_crop[0].astype(int))
naip_ndvi_2017 = es.normalized_diff(
    naip_2017_crop[3].astype(int), naip_2017_crop[0].astype(int))

# Calculate NDVI Difference: post minus pre
ndvi_diff = naip_ndvi_2017 - naip_ndvi_2015
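
To make the band arithmetic concrete, here is a minimal NumPy sketch of the normalized difference formula, (b1 - b2) / (b1 + b2). It is not the `earthpy` implementation: the function name `simple_normalized_diff` and the tiny 2 x 2 arrays are made up for illustration. It also shows why bands stored as small unsigned integers are worth casting before you do arithmetic on them.

```python
import numpy as np

# Hypothetical 2 x 2 "bands" stored as unsigned 8-bit integers
nir = np.array([[200, 150], [90, 60]], dtype="uint8")
red = np.array([[100, 150], [30, 120]], dtype="uint8")


def simple_normalized_diff(b1, b2):
    """Return (b1 - b2) / (b1 + b2) after casting both inputs to float."""
    b1 = b1.astype("float64")
    b2 = b2.astype("float64")
    return (b1 - b2) / (b1 + b2)


ndvi = simple_normalized_diff(nir, red)
print(ndvi)

# Without a cast, uint8 addition wraps around at 256 (200 + 100 becomes 44)
wrapped = nir + red
print(wrapped[0, 0])
```

Positive values mean the near-infrared band is brighter than the red band, and negative values mean the opposite.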

In [None]:
# Plot difference NDVI
fig, ax = plt.subplots(figsize=(12,12))

ep.plot_bands(ndvi_diff,
              cmap='PiYG',
              #extent=naip_2015_extent,
              #scale=False,
              ax=ax,
              title="NAIP NDVI Difference -  \n Post minus Pre fire (2017 - 2015)")

fire_bound_utmz13.plot(ax=ax, color='None', edgecolor='black', linewidth=2)

plt.show()

## Review of os and glob

The section below reviews `glob` and `os`, and introduces some new `os` functionality for parsing file names.

Using `glob` to create lists and `os` to parse file names are handy tasks when you are trying to automate workflows!

In [None]:
# Download data
data2 = et.data.get_data("ndvi-automation")

## Create Directories that Work Across Operating Systems - os.path.join

When you are working across different computers and platforms, it is useful to create paths that can be recognized by the Windows, Mac and Linux operating systems. The `join()` function from the `os.path` module creates a path in the format required by the operating system on which the code is running (i.e. whatever your computer is running).

This saves you the time of creating and fixing paths as you work on different machines. This approach becomes very useful when you need to move your workflow from, say, your laptop to a cloud or HPC environment.

`os.path.join` takes as many strings as you provide. It treats each string as a path component (usually a directory name) and joins them into a single path.

`os.path.join("dir1", "dir2", "dir3")`

IMPORTANT: you can create bad paths this way! This function does not actually test to ensure the path exists!

In [None]:
# Create a path
path = os.path.join("data", "ndvi-automation", "sites")
path

In [None]:
# Does the path exist?
os.path.exists(path)

In [None]:
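# This path doesn't exist on a case-sensitive file system (e.g. most Linux machines)
# On case-insensitive file systems (the default on Windows and macOS) it may still resolve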
path2 = os.path.join("Data", "NDVI-automation", "Sites")
os.path.exists(path2)

## Get Lists of Files Using glob and path.join

In a workflow where you are processing many files and directories, you can use `glob` with `path.join` to create a path and get a list of files in that path. 

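Called without a wildcard, `glob()` does not list the contents of a directory. It returns a list containing the path itself if that path exists, or an empty list if it does not.

In [None]:
# Without a wildcard, glob returns the sites directory itself (if it exists), not its contents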
path = os.path.join("data", "ndvi-automation", "sites")
glob(path)

Ending the pattern with `*/` tells `glob` to return only the directories inside that path, rather than files.

In [None]:
# Add a trailing slash to force listing of directories
another_path = os.path.join("data", "ndvi-automation", "sites")
all_sites = glob(os.path.join(another_path, "*/"))
all_sites

You can nest the above steps into one step as well.

In [None]:
# This single line of code is equivalent to the two lines of code above
glob(os.path.join("data", "ndvi-automation", "sites", "*/"))
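
The three pattern styles are easier to compare side by side. The sketch below builds a throwaway directory tree with `tempfile` (the `readme.txt` file and the temporary location are made up for illustration; only the `HARV` and `SJER` names come from this lesson) and runs `glob` with no wildcard, with `*`, and with `*/`.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as tmp:
    # Build a small, hypothetical directory tree: two site folders and one file
    sites = os.path.join(tmp, "sites")
    os.makedirs(os.path.join(sites, "HARV"))
    os.makedirs(os.path.join(sites, "SJER"))
    open(os.path.join(sites, "readme.txt"), "w").close()

    # No wildcard: glob returns the path itself, not its contents
    no_wildcard = glob(sites)

    # "*" matches every file and directory directly inside sites
    everything = sorted(glob(os.path.join(sites, "*")))

    # "*/" matches directories only
    dirs_only = sorted(glob(os.path.join(sites, "*/")))

    everything_names = [os.path.basename(p) for p in everything]
    dir_names = [os.path.basename(os.path.normpath(p)) for p in dirs_only]

print(everything_names)
print(dir_names)
```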

Once you have a list of directories, you could loop through each directory and do something with data within that directory.

In [None]:
# Print out all site directories
for site_files in all_sites:
    print(site_files)

If you want to create a list of all directories within the `landsat-crop` directory of each site subdirectory, there are a few ways to do this!

We'll look at three here. The first is using for loops to go through the directories.

In [None]:
# Define the directory name
landsat_dir = "landsat-crop"

# Loop through each site directory
for site_files in all_sites:

    # Get a list of subdirectories for that site
    new_path = os.path.join(site_files, landsat_dir)
    all_dirs = glob(os.path.join(new_path, "*/"))

    # Loop and print the path for each subdirectory
    for adir in all_dirs:
        print(adir)

The second way to get this list uses the `*` syntax in `glob` to customize the list of folders returned. Remember, you can replace any part of a file path that you want to be variable with a `*`.

Because of this, we can get all of the folders within the `landsat-crop` folders by naming the middle folder (`landsat-crop`) and replacing the folders around it with `*`, as shown below. Notice how it finds everything within the `landsat-crop` folder in both the HARV and SJER folders.

In [None]:
glob(os.path.join("data", "ndvi-automation", "sites", "*", "landsat-crop", "*"))

This approach works well, but there is a third way to get this list using `glob`!

By ending the pattern with a trailing `/`, which limits the matches to directories, we can make `glob` return this same list of directories without specifying the `landsat-crop` folder.

This only works because none of the other directories within the `HARV` and `SJER` directories contain more directories; they all store individual files.

In [None]:
glob(os.path.join("data", "ndvi-automation", "sites", "*", "*", "*/"))

### Sorting `glob` Lists

Notice that these lists aren't sorted. If it's important for a list to be in a certain order (such as satellite bands, for example) then make sure to sort the list after `glob` gives it to you.

`sorted()` orders strings character by character, not numerically, so make sure the sorted order is the one you expect before you move on with your project!

For example, if file names are identical except for a trailing number, the file ending in `10` is placed before the file ending in `2`, because the character `1` sorts before `2`. Always double check!

In [None]:
# Sort the list glob returned
sorted(glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                         'landsat-crop', 'LC080130302017072301T1-SC20181023152048', '*band*')))
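
If you need file names ordered by the numbers they contain, one option is to pass a custom key to `sorted()`. The sketch below is one way to do this, not part of the lesson's workflow: `natural_key` is a hypothetical helper and the file names are made up. It splits each name into text and digit chunks so that the digit chunks are compared as integers.

```python
import re


def natural_key(path):
    """Split a string into text and integer chunks so numbers compare numerically."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", path)]


# Hypothetical file names for illustration
band_files = ["scene_band10.tif", "scene_band2.tif",
              "scene_band1.tif", "scene_band11.tif"]

# Default sort compares character by character: band10 lands before band2
lexicographic = sorted(band_files)

# Natural sort compares the embedded numbers: 1, 2, 10, 11
natural = sorted(band_files, key=natural_key)

print(lexicographic)
print(natural)
```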

### Why Sort `glob` Lists?

The order in which `glob` returns files is not guaranteed. Depending on the operating system or the way the files are stored, different people may get the same `glob` results in different orders, which can lead to data errors when running projects across computers. The example below shows how sorting a `glob` list can change which file you get at a given index: the same index (4) may return two different files.

In [None]:
# Indexes can change once a list is sorted!
unsorted_list = glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                                  'landsat-crop', 'LC080130302017072301T1-SC20181023152048', '*band*'))

# Sort the list you already have rather than calling glob a second time
sorted_list = sorted(unsorted_list)
unsorted_list[4], sorted_list[4]

### Using Ranges

In addition to using `*` to match any part of a file name, you can use `[]` to specify a range of characters to search for. A range matches a single character, not a multi-character string. You can match the digits 2 through 7 with `[2-7]`, but you cannot match the numbers 2 through 14 with `[2-14]`, because `14` is two characters.

This is not limited to numbers. `[d-q]` matches any single character from `d` through `q`.

In [None]:
# Get a range of data
glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                  'landsat-crop', 'LC080130302017072301T1-SC20181023152048', '*band[1-3]*'))

In [None]:
# Get a date range by varying only the last digit of the year
# NOTE: [2017-2018] does not work here since a range matches single characters, not strings.
glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                  'landsat-crop', '*201[7-8]*'))
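
You can experiment with ranges without touching the file system. `glob` relies on the pattern rules of Python's `fnmatch` module, so the sketch below applies `fnmatch.filter` to made-up lists of names (none of these names come from the lesson data). It also shows one way to match two-digit numbers: fix the first digit and put a range on the second.

```python
import fnmatch

# Hypothetical file names for illustration
names = ["band1.tif", "band4.tif", "band7.tif", "band10.tif", "band11.tif"]

# [1-4] matches exactly one character, so band10 and band11 are excluded
single_digit = fnmatch.filter(names, "band[1-4].tif")

# To match 10 and 11, fix the first digit and range over the second
two_digit = fnmatch.filter(names, "band1[0-1].tif")

# The same idea applied to years: only the last digit varies
scenes = ["scene_2016", "scene_2017", "scene_2018", "scene_2019"]
years = fnmatch.filter(scenes, "*201[7-8]*")

print(single_digit)
print(two_digit)
print(years)
```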

### `?` Operator

Like the `*` operator, the `?` operator is a wildcard, but it matches exactly one character.

If one character in the file name can vary but everything else must stay the same, then `?` is a good way to replace just that one character.

`?` is not limited to one use per pattern, and can be repeated to replace more than one character.

In [None]:
# ? operator
glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                  'landsat-crop', 'LC080130302017072301T1-SC20181023152048', '*band?.tif'))

In [None]:
# Multiple ? operators
glob(os.path.join('data', 'ndvi-automation', 'sites', 'HARV',
                  'landsat-crop', 'LC080130302017072301T1-SC20181023152048', '*band?????'))
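
The difference between `?` and `*` is easiest to see on a small list. As a rough sketch, the example below uses `fnmatch.filter` (which follows the same pattern rules as `glob`) on made-up file names; `band_qa.tif` is a hypothetical name chosen to show what each pattern excludes.

```python
import fnmatch

# Hypothetical file names for illustration
names = ["band1.tif", "band7.tif", "band10.tif", "band_qa.tif"]

# One ? stands for exactly one character
one_char = fnmatch.filter(names, "band?.tif")

# Two ? characters stand for exactly two characters
two_chars = fnmatch.filter(names, "band??.tif")

# Five ? characters after "band" match names like band1.tif ("1.tif" is 5 characters)
five_chars = fnmatch.filter(names, "band?????")

# * stands for any number of characters
any_length = fnmatch.filter(names, "band*.tif")

print(one_char)
print(two_chars)
print(five_chars)
print(any_length)
```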

## Grab Parts of a Directory Path

There are several ways that you can grab just a part of a path. Sometimes a file path contains metadata that can help you create meaningful variable names in your script. In your NDVI workflow, you may want to grab the site name from the directory path.

You can use a combination of `normpath()` and `basename()` functions from `os.path` to access the last directory in a path. In your case, this path contains your site name!


In [None]:
# Example of normpath cleaning up path
example_path = "home//user//example_dir"
os.path.normpath(example_path)

In [None]:
# Use normpath and basename together to get the last directory
sitename = os.path.basename(os.path.normpath(site_files))
sitename

There are endless ways to use the sitename as a variable in an automated workflow.

In [None]:
# Create a file name needed to open a file
print(os.path.join(site_files, "vector", sitename + "-crop.shp"))

# Create an output path to an output csv file
print(os.path.join('data', "ndvi-automation", "outputs", "final.csv"))

If you want to grab both the last directory name and the path prior to that directory, you can use `os.path.split` with `normpath()`.

In [None]:
os.path.split(os.path.normpath(site_files))
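
The reason for calling `normpath()` first is the trailing separator that `glob` adds when you end a pattern with `/`. The sketch below uses a hand-built path (it does not need to exist on disk) to show what `basename()` and `split()` return with and without that cleanup step.

```python
import os

# A directory path with a trailing separator, like the ones glob returns for "*/"
site_dir = os.path.join("data", "ndvi-automation", "sites", "HARV", "")

# basename returns whatever follows the last separator, which here is nothing
without_normpath = os.path.basename(site_dir)

# normpath removes the trailing separator so basename can find the last directory
cleaned = os.path.normpath(site_dir)
site_name = os.path.basename(cleaned)

# split returns both pieces at once: (everything before, last component)
parent, last = os.path.split(cleaned)

print(repr(without_normpath))
print(site_name)
print(parent, last)
```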

## Parse Text From Directory Names

There are numerous options to parse text from a file path. In your homework, you need to grab the date when each Landsat scene was collected. To grab just the date from the directory, you will need to:

1. get the full directory path
2. find the date embedded within the path name

If you refer back to the Landsat metadata, you will see that every scene has the same naming convention. 

This means that you can count the characters (i.e. indices) in the directory name to find the collection date (which is the first date in the string) and use the same indices for every scene!

In this case, you can find the date using a string index like this:

`astring[startindex:endindex]`

In [None]:
# View directory name
dir_name = os.path.basename(os.path.normpath(adir))
dir_name

In [None]:
# Get landsat date from directory name
date = dir_name[10:18]
date
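
The slice returns the date as a plain string. If you need a real date object, for example to sort scenes or label a plot axis, one option is `datetime.strptime` from the standard library. This step is not part of the workflow above. The sketch assumes the 8 characters are in year, month, day order, and uses the scene directory name that appears earlier in this lesson.

```python
from datetime import datetime

# Scene directory name used in the glob examples above
dir_name = "LC080130302017072301T1-SC20181023152048"

# Characters 10 through 17 hold the collection date
date_text = dir_name[10:18]

# Assumed format: 4-digit year, 2-digit month, 2-digit day
scene_date = datetime.strptime(date_text, "%Y%m%d")

print(date_text)
print(scene_date.date())
```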

You can also break the entire path apart, if you need to do so, using `string_name.split()`.

`.split()` is a built-in Python string method that splits a string into a list of strings based on a separator character. For file paths, `os.sep` is a system-friendly separator to split on.

In [None]:
# Break paths into components
path = os.path.normpath(adir)
path.split(os.sep)

As you see, `string_name.split()` produces a list that you can query to get a specific component.

In [None]:
# Get the site name from the path
path_components = path.split(os.sep)
path_components[3]
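
Counting from the front, as in `path_components[3]`, depends on where the path starts. If the same directory is given as an absolute path, the site name moves to a different index. As a sketch of one alternative, the example below counts from the end of the list with a negative index; the `/home/user/earth-analytics` prefix is hypothetical.

```python
import os

# Relative path to a scene directory, as used in this lesson
scene_dir = os.path.join("data", "ndvi-automation", "sites", "HARV",
                         "landsat-crop", "LC080130302017072301T1-SC20181023152048")

# The same directory as an absolute path (hypothetical prefix)
abs_scene_dir = os.path.join(os.sep, "home", "user", "earth-analytics", scene_dir)

rel_parts = os.path.normpath(scene_dir).split(os.sep)
abs_parts = os.path.normpath(abs_scene_dir).split(os.sep)

# Index 3 is the site name only for the relative path
print(rel_parts[3], abs_parts[3])

# Counting from the end gives the site name in both cases:
# -1 is the scene directory, -2 is landsat-crop, -3 is the site
print(rel_parts[-3], abs_parts[-3])
```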