# Step 0: Bulk download of Landsat image subsets through AWS

_Last modified 2022-07-01._

This script is run to download Landsat images over the glaciers available through the Amazon Web Services (AWS) s3 bucket. The workflow is streamlined to analyze images for 10s to 100s of glaciers, specifically, the marine-terminating glaciers along the periphery of Greenland. Sections of code that may need to be modified are indicated as below:

    ##########################################################################################

    code to modify

    ##########################################################################################

 
### First, configure your AWS profile to access the Landsat images on the s3 bucket:

Follow instructions at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html to get required __aws command line software__.

Set up your AWS profile with a payment account. Then configure it to your machine following these steps:

    aws configure --profile terminusmapping
    
Enter in your credentials, which will be stored locally on your computer. The Boto3 package will be used to access your credentials without leaking your keys. **Protect your AWS access keys!** DO NOT print anything that involves your ACCESS_KEY and SECRET_KEY. GitGuardian may help track any leaked keys. In order to use these credentials to download subsets of images from AWS using vsi3, **GDAL version 3.2. or newer must be installed**.

In [1]:
# AWS settings
from rasterio.session import AWSSession
import boto3
import boto3.session

cred = boto3.Session(profile_name='terminusmapping').get_credentials()
ACCESS_KEY = cred.access_key
SECRET_KEY = cred.secret_key
SESSION_TOKEN = cred.token  ## optional


s3client = boto3.client('s3', 
                        aws_access_key_id = ACCESS_KEY, 
                        aws_secret_access_key = SECRET_KEY, 
#                         aws_session_token = SESSION_TOKEN
                       )

######################################################################################
# path to the collection on AWS usgs-landsat s3 bucket:
collectionpath = 'collection02/level-1/standard/' # collection 2 level 1 data being used
######################################################################################

# Overview of steps:

    1. Set-up: import packages, set paths, and enter glaciers IDs
    2. Find all the Landsat footprints that overlap the glaciers
    3. Download Landsat metadata (*MTL.txt) files from AWS for all overlapping scenes
    4. Calculate cloud % over terminus box using Landsat quality band (QA_PIXEL)
    5. Create buffer zone around terminus boxes and rasterize terminus boxes
    6. Download non-cloudy Landsat images from AWS
    7. Grab image acquisition dates from metadata files
    8. Delete the *QA_PIXEL.TIF files downloaded in step (4) to save space

# 1) Set-up: import packages, set paths, and enter glaciers IDs


In [2]:
import numpy as np
import pandas as pd
import scipy
import math
import subprocess
import os
import shutil
import datetime
import cv2
from PIL import Image
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import glob

# geospatial packages
import fiona
import geopandas as gpd
from shapely.geometry import Polygon, Point, LineString
import shapely
from matplotlib.pyplot import imshow
import rasterio as rio

# Enable fiona KML file reading driver
fiona.drvsupport.supported_drivers['KML'] = 'rw'

# import necessary functions from automated-glacier-terminus.py
from automated_terminus_functions import distance

###  Enter in the glacier BoxIDs:

The Greenland peripheral glacier terminus boxes were referenced using their 3 digit BoxID: Box###.
For other glaciers, replace this code with a list of IDs corresponding to the glaciers and corresponding shapefiles (e.g. BoxHelheim.shp). 

In [4]:
######################################################################################
# uncomment for NAMED BOXES
BoxIDs = ['Variegated']

#uncomment for NUMBERED BOXES
# boxes = list(map(str, np.arange(1, 101, 1))) #12, 13, 1
# for BoxID in boxes: # convert integers to 3-digit strings with leading zeros
#     BoxID = BoxID.zfill(3)
#     BoxIDs.append(BoxID)


print(BoxIDs) # show the final BoxIDs
######################################################################################

['Variegated']


### Define paths, satellites, geographic projections:

In [23]:
######################################################################################
# ADJUST THESE VARIABLES: must create basepath & downloadpath directories beforehand
# basepath = '/home/jukes/Documents/Sample_glaciers/' # folder containing the all glacier shapefile(s)
# downloadpath ='/media/jukes/jukes1/LS8aws/' # folder to eventually contain downloaded Landsat images
basepath = '/home/jukes/Documents/Alaska-glaciers/' # folder containing the all glacier shapefile(s)
downloadpath ='/home/jukes/Documents/Alaska-glaciers/LS8aws/' # folder to eventually contain downloaded Landsat images

sats = ['L7','L8','L9'] # names of landsats to download images from ('L7' = Landsat 7, 'L8' = Landsat 8, 'L9' = Landsat 9)
L9_yrs = np.arange(2021,2023).astype(str) # set target years for L9: 2021-present
L8_yrs = np.arange(2013,2023).astype(str) # set target years for L8: 2013-present
L7_yrs = np.arange(1999,2004).astype(str) # set target years for L7: 1999-2003
L9_bands = [8] # panchromatic band for L9
L8_bands = [8] # panchromatic band for L8
L7_bands = [8] # panchromatic band for L7

repopath = '/home/jukes/automated-glacier-terminus/' # path to this repository
os.chdir(repopath) # change directories to this repo

# select coordinate reference system: MAKE SURE YOUR SHAPEFILE IS IN THE COORDINATE REFERENCE SYSTEM SPECIFIED HERE
source_srs = '32607' # EPSG code for the native projection of the glacier shapefile(s)
# (3413 = Greenland polar stereographic, 3031=Antarctic polar stereographic, 326## = UTM N. Hemi., 327## UTM S. Hemi.)

# enter a file suffix for the CSV files produced 
# csvext = 'peripheral-glaciers.csv' 
csvext = '_VG.csv' 
# that describes the analysis (e.g., glacier or group of glaciers)

# specify paths for shapefiles (same file for both if not focusing on terminus delineation)
# RGIpath = '/media/jukes/jukes1/RGI_shps/' # path to folder with all individual RGI glacier outline shapefiles
# boxespath = '/media/jukes/jukes1/Boxes_individual/' # folder with all individual glacier terminus box shapes
RGIpath = basepath+'ROIs/' # path to folder with all individual RGI glacier outline shapefiles
RGIfile = 'Variegated_Box_UTM07' #name of specific outline if only grabbing data for one site with a custom name
boxespath = basepath+'ROIs/' # folder with all individual glacier terminus box shapes
boxesfile = 'Variegated_Box_UTM07' #name of specific outline if only grabbing data for one site with a custom name

######################################################################################

In [6]:
# filenames that will be written in this script
# all with common extension
print("CSV files that will be produced:"); print()
PR_FILENAME = 'LS_pathrows'+csvext; print(PR_FILENAME) # glacier Landsat path, row, zone info
BOX_FILENAME = 'Buffdist'+csvext; print(BOX_FILENAME) # buffer distances around glacier terminus boxes
DATES_FILENAME = 'imgdates'+csvext; print(DATES_FILENAME) # acquisition dates for downloaded Landsat images

CSV files that will be produced:

LS_pathrows_VG.csv
Buffdist_VG.csv
imgdates_VG.csv


#### These CSV files will be used in later scripts in the workflow.

### Create new folders corresponding to these glaciers:

In [10]:
# create new BoxID folders 
for BoxID in BoxIDs:
    # create folder to hold glacier shapefiles
    shapefilepath = basepath+'Box'+BoxID+'/' # path to that folder
    if os.path.exists(shapefilepath):
#         shutil.rmtree(shapefilepath) # remove the old folder
        print("Path exists already for Box", BoxID)
    else:
        os.mkdir(basepath+'Box'+BoxID)
            
    # create folder to hold glacier images (inside downloadpath)
    if os.path.exists(downloadpath+'Box'+BoxID):
        print("Path exists already in LS8aws for Box", BoxID)
    else:
        os.mkdir(downloadpath+'Box'+BoxID)
    
    # Now place terminus box shapefile and RGI glacier outline shapefile into the
    # boxespath folder. Done automatically below for the Greenland peripheral glaciers:
    if len(BoxID) == 3:
        ID = int(BoxID) # make into an integer in order to grab the .shp files

        # if the terminus box shapefile is not in this folder, then move it
        if not os.path.exists(shapefilepath+'Box'+BoxID+'.shp'):
            for filename in os.listdir(boxespath):
                if filename.startswith('BoxID_'+str(ID)):
                    shutil.copyfile(boxespath+filename, basepath+'Box'+BoxID+'/Box'+BoxID+filename[-4:])
                    print("Box"+BoxID+filename[-4:], "moved")
        else:
            print("Box"+BoxID+'.shp', "already in folder")

        if not os.path.exists(shapefilepath+'RGI_Box'+BoxID+'.shp'): # if the RGI shapfile is not in this folder
            # move RGI glacier outline into the new folder
            for filename in os.listdir(RGIpath):
                if filename.startswith('BoxID_'+str(ID)):
                    shutil.copyfile(RGIpath+filename, basepath+'Box'+BoxID+'/RGI_Box'+BoxID+filename[-4:])
                    print("RGI_Box"+BoxID+filename[-4:], "moved")
        else:
            print("RGI_Box"+BoxID+'.shp', "already in folder")
    else: #if working with a named site, not a numbered site
        
        # if the terminus box shapefile is not in this folder, then move it
        if not os.path.exists(shapefilepath+'Box'+BoxID+'.shp'):
            for filename in os.listdir(boxespath):
                if filename.startswith('Box'+BoxID):
                    shutil.copyfile(boxespath+filename, basepath+'Box'+BoxID+'/Box'+BoxID+filename[-4:])
                    print("Box"+BoxID+filename[-4:], "moved")
        else:
            print("Box"+BoxID+'.shp', "already in folder")

        if not os.path.exists(shapefilepath+'RGI_Box'+BoxID+'.shp'): # if the RGI shapfile is not in this folder
            # move RGI glacier outline into the new folder
            for filename in os.listdir(RGIpath):
                if filename.startswith('Box'+BoxID):
                    shutil.copyfile(RGIpath+filename, basepath+'Box'+BoxID+'/RGI_Box'+BoxID+filename[-4:])
                    print("RGI_Box"+BoxID+filename[-4:], "moved")
        else:
            print("RGI_Box"+BoxID+'.shp', "already in folder")

Path exists already for Box Variegated



# 2) Find all the Landsat footprints that overlap the glaciers

This step requires the **WRS-2_bound_world_0.kml** file containing the footprints of all the Landsat scene boundaries available through the USGS (https://www.usgs.gov/media/files/landsat-wrs-2-scene-boundaries-kml-file). Place this file in your base directory (basepath). 

To check if they overlap the glacier terminus box shapefiles, the box shapefiles must be in WGS84 coordinates (ESPG: 4326). If they are not yet, we use the following GDAL command to reproject them into WGS84:

        ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:NEW_EPSG_NUMBER -s_srs EPSG:OLD_EPSG_NUMBER out.shp in.shp

In [12]:
######################################################################################
# open the kml file with the Landsat path, row footprints:
WRS = fiona.open(basepath+'WRS-2_bound_world_0.kml', driver='KML') # check the path to the world bounds file
print('Landsat footprint file opened.')
######################################################################################

Landsat footprint file opened.


In [26]:
# Reproject terminus box shapefiles to WGS84 if in a different projection
for BoxID in BoxIDs:
    boxespath = basepath+"Box"+BoxID+"/Box"+BoxID # access the BoxID folders created 
    
    #NOT ELEGANT! set up to check if a single RGI outline is provided as a shapefile with a unique name (Not 'Box'+BoxID+'.shp')
    if os.path.exists(RGIpath+RGIfile+'.shp'): 
        # construct the gdal command for the individual RGI outline
        rp = "ogr2ogr -f 'ESRI Shapefile' -t_srs EPSG:4326 -s_srs EPSG:"+source_srs+" "
        rp +=boxespath+"_WGS.shp "+RGIpath+RGIfile+".shp"
        print("Command:", rp) # check command
    else:
        # construct the gdal command
        rp = "ogr2ogr -f 'ESRI Shapefile' -t_srs EPSG:4326 -s_srs EPSG:"+source_srs+" "
        rp +=boxespath+"_WGS.shp "+boxespath+".shp"
        print("Command:", rp) # check command
        
    # run the command on terminal
    subprocess.run(rp, shell=True, check=True) 
    
    # if an error is produced, check the error output on the terminal window that runs this notebook

Command: ogr2ogr -f 'ESRI Shapefile' -t_srs EPSG:4326 -s_srs EPSG:32607 /home/jukes/Documents/Alaska-glaciers/BoxVariegated/BoxVariegated_WGS.shp /home/jukes/Documents/Alaska-glaciers/ROIs/Variegated_Box_UTM07.shp


In [27]:
# Grab the WGS84 coordinates of the boxes
box_points = {} # dictionary of points
for BoxID in BoxIDs:
    boxpath = basepath+"Box"+BoxID+"/Box"+BoxID # path to the reprojected terminus box
    print(boxpath)
    termbox = fiona.open(boxpath+'_WGS.shp') # open reprojected terminus box
    box = termbox.next(); box_coords=box['geometry']['coordinates'][0] # grab coords
    points = [] # to hold the box vertices
    
    # read coordinates and convert to a shapely object
    for coord_pair in box_coords: 
        lat = coord_pair[0]; lon = coord_pair[1]   
        point = shapely.geometry.Point(lat, lon) # create shapely point 
        points.append(point) # append to points list
        
    box_points.update({BoxID: points}) # update dictionary
    print("Box"+BoxID+" coordinates recorded.") # keep track of progress

/home/jukes/Documents/Alaska-glaciers/BoxVariegated/BoxVariegated
BoxVariegated coordinates recorded.


  import sys


In [28]:
paths = []; rows = []; boxes = [] # create lists to hold the paths and rows and BoxIDs

#loop through all Landsat scenes (path, row footprints)
for feature in WRS:
    # create shapely polygons from the Landsat footprints
    coordinates = feature['geometry']['coordinates'][0]
    coords = [xy[0:2] for xy in coordinates]
    pathrow_poly = Polygon(coords)
    
    # grab the path and row name from the WRS kml file:
    pathrowname = feature['properties']['Name']  
    path = pathrowname.split('_')[0]; row = pathrowname.split('_')[1]
    
    # for each feature, loop through each of the vertices stored in the dictionary
    for BoxID in box_points:  
        box_points_in = 0 # counter for number of box_points in the pathrow_geom:
        points = box_points.get(BoxID) # grab the points corresponding to the ID
        for i in range(0, len(points)):
            point = points[i]
            if point.within(pathrow_poly): # if the pathrow shape contains the point
                box_points_in = box_points_in+1 # append the counter
        
        #set up to currently only focus on Landsat footprints that totally cover the entire polygon, 
        #but if you have a BIG polygon then you should change this to some fraction of len(points) (e.g. 0.5*len(points))
        if box_points_in >= len(points): # if all box vertices are inside the footprint, save the path, row, BoxID
            paths.append('%03d' % int(path))
            rows.append('%03d' % int(row))
            boxes.append(BoxID)

# Store in dataframe
boxes_pr_df = pd.DataFrame(list(zip(boxes, paths, rows)), columns=['BoxID','Path', 'Row'])
boxes_pr_df = boxes_pr_df.sort_values(by='BoxID')
boxes_pr_df # display

Unnamed: 0,BoxID,Path,Row
0,Variegated,62,18
1,Variegated,61,18


In [29]:
# save to file
boxes_pr_df.to_csv(path_or_buf = basepath+PR_FILENAME, sep=',') # write to csv

# 3) Download metadata files from AWS s3 for overlapping Landsat scenes
     
The syntax for listing the Collection 2 Landsat image files AWS s3 bucket is as follows:

    aws s3 ls --profile terminusmapping --request-payer requester s3://usgs-landsat/collection02/level-2/standard/oli-tirs/yyyy/path/row/LC08_LS2R_pathrow_yyyyMMdd_yyyyMMdd_02_T1/ 
    
__NOTE: Including the --request-payer requester as part of this line indicates that the referenced user will be charged for data download.__

We can use the paths and rows in the dataframe to access the full Landsat scene list and the corresponding metdata files. Read https://docs.opendata.aws/landsat-pds/readme.html to learn more.
    
The metadata files will be downloaded into folders corresponding to the Landsat footprint, identified by the Path Row numbers:
    
    aws s3api get-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/yyyy/path/row/LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1/LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1_MTL.txt  --request-payer requester LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1_MTL.txt

In [30]:
# Read in csv file from Step 2
boxes_pr_df = pd.read_csv(basepath+PR_FILENAME, dtype=str)
boxes_pr_df = boxes_pr_df.set_index('BoxID'); boxes_pr_df

Unnamed: 0_level_0,Unnamed: 0,Path,Row
BoxID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Variegated,0,62,18
Variegated,1,61,18


In [33]:
# Loop through the dataframe containing overlapping path, row info:
listofbaddies = []
for index, row in boxes_pr_df.iterrows():
    p = row['Path']; r = row['Row']; folder_name = 'Path'+p+'_Row'+r+'_c2' # folder name
    bp_out = downloadpath+folder_name+'/' # output path for the downloaded files
    print("Downloaded metadata files are stored in:",bp_out)
    
    # create Path_Row folders if they don't exist already
    if os.path.exists(bp_out):
        print(folder_name, " exists already, skip directory creation")
    else:
        os.mkdir(bp_out)
        print(folder_name+" directory made")
    
    for sat in sats: # for each satellite
        if sat == 'L9':
            collectionfolder = 'oli-tirs/'; years = L9_yrs; prefix='LC09' # set folder, years, file prefix
        elif sat == 'L8':
            collectionfolder = 'oli-tirs/'; years = L8_yrs; prefix='LC08' # set folder, years, file prefix
        elif sat == 'L7':
            collectionfolder = 'etm/'; years = L7_yrs; prefix='LE07' # set folder, years, file prefix
        
        # loop through years
        for year in years:
        # grab list of images in each year, path, row folder
            find_imgs = 'aws s3 ls --profile terminusmapping --request-payer requester s3://usgs-landsat/'
            find_imgs += collectionpath+collectionfolder
            find_imgs += year+'/'+p+'/'+r+'/'
            
            # Grab AWS image names from returned results
            if subprocess.run(find_imgs,shell=True).returncode != 0:
                print('No results found for '+collectionpath+collectionfolder+year+'/'+p+'/'+r+'/')
                results = [] # empty results
            else:
                result = subprocess.check_output(find_imgs,shell=True) # grab the available images
                results = result.split() # split string

            imagenames = []
            for line in results: # loop through strings
                line = str(line)
                if prefix in line and ('T1' in line or 'T2' in line): # find just the Tier-1 images
                    imgname = line[2:-2]; imagenames.append(imgname)

            # download the metadata (MTL.txt) file if it doesn't exist
            for imgname in imagenames:
                print(imgname)
                if imgname != 'LC08_L1TP_037004_20160525_20200906_02_T1' and not os.path.exists(bp_out+imgname+'_MTL.txt'): # check in output directory
                    command = 'aws s3api get-object --bucket usgs-landsat --key '+collectionpath+collectionfolder
                    command += year+'/'+p+'/'+r+'/'
                    command += imgname+'/'+imgname+'_MTL.txt'
                    command += ' --profile terminusmapping --request-payer requester '
                    command += bp_out+imgname+'_MTL.txt'
                    print('Downloading', imgname+'_MTL.txt')
                    try:
                        subprocess.run(command,shell=True,check=True)
                    except subprocess.CalledProcessError as e:
                        print(e.output)
                        listofbaddies.append(imgname)
                        pass
                else:
                    print(imgname+'_MTL.txt exists. Skip.')

print('Done downloading metadata files!')
print(listofbaddies)

Downloaded metadata files are stored in: /home/jukes/Documents/Alaska-glaciers/LS8aws/Path062_Row018_c2/
Path062_Row018_c2  exists already, skip directory creation
LE07_L1GT_062018_19990721_20200918_02_T2
LE07_L1GT_062018_19990721_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1GT_062018_19990822_20200918_02_T2
LE07_L1GT_062018_19990822_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1GT_062018_19990923_20200918_02_T2
LE07_L1GT_062018_19990923_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1GT_062018_19991025_20200918_02_T2
LE07_L1GT_062018_19991025_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1GT_062018_19991228_20200918_02_T2
LE07_L1GT_062018_19991228_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1TP_062018_19990705_20200918_02_T1
LE07_L1TP_062018_19990705_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_062018_19990806_20200918_02_T1
LE07_L1TP_062018_19990806_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_062018_19990907_20200918_02_T1
LE07_L1TP_062018_19990907_20200918_02_T1_MTL.txt exists. Skip.
LE07

LC08_L1GT_062018_20130804_20200912_02_T2
LC08_L1GT_062018_20130804_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20130921_20200912_02_T2
LC08_L1GT_062018_20130921_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20131023_20200912_02_T2
LC08_L1GT_062018_20131023_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20131210_20200912_02_T2
LC08_L1GT_062018_20131210_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20131226_20200912_02_T2
LC08_L1GT_062018_20131226_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20130414_20200912_02_T1
LC08_L1TP_062018_20130414_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20130617_20200912_02_T1
LC08_L1TP_062018_20130617_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20130703_20200912_02_T1
LC08_L1TP_062018_20130703_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20130719_20200912_02_T1
LC08_L1TP_062018_20130719_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20130820_20200913_02_T1
LC08_L1TP_062018_201308

LC08_L1GT_062018_20170119_20200905_02_T2
LC08_L1GT_062018_20170119_20200905_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20170612_20200903_02_T2
LC08_L1GT_062018_20170612_20200903_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20170831_20200903_02_T2
LC08_L1GT_062018_20170831_20200903_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20171018_20200902_02_T2
LC08_L1GT_062018_20171018_20200902_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20171205_20200902_02_T2
LC08_L1GT_062018_20171205_20200902_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20170103_20200905_02_T2
LC08_L1TP_062018_20170103_20200905_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20170204_20200905_02_T2
LC08_L1TP_062018_20170204_20200905_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20170220_20200905_02_T1
LC08_L1TP_062018_20170220_20200905_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20170308_20200904_02_T1
LC08_L1TP_062018_20170308_20200904_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20170324_20200904_02_T1
LC08_L1TP_062018_201703

LC08_L1GT_062018_20210114_20210307_02_T2
LC08_L1GT_062018_20210114_20210307_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20210506_20210517_02_T2
LC08_L1GT_062018_20210506_20210517_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20210826_20210901_02_T2
LC08_L1GT_062018_20210826_20210901_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20211130_20211208_02_T2
LC08_L1GT_062018_20211130_20211208_02_T2_MTL.txt exists. Skip.
LC08_L1GT_062018_20211216_20211223_02_T2
LC08_L1GT_062018_20211216_20211223_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20210130_20210303_02_T2
LC08_L1TP_062018_20210130_20210303_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20210215_20210301_02_T2
LC08_L1TP_062018_20210215_20210301_02_T2_MTL.txt exists. Skip.
LC08_L1TP_062018_20210303_20210312_02_T1
LC08_L1TP_062018_20210303_20210312_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20210319_20210328_02_T1
LC08_L1TP_062018_20210319_20210328_02_T1_MTL.txt exists. Skip.
LC08_L1TP_062018_20210404_20210409_02_T1
LC08_L1TP_062018_202104

LE07_L1GT_061018_20001121_20200918_02_T2
LE07_L1GT_061018_20001121_20200918_02_T2_MTL.txt exists. Skip.
LE07_L1GT_061018_20001207_20200917_02_T2
LE07_L1GT_061018_20001207_20200917_02_T2_MTL.txt exists. Skip.
LE07_L1TP_061018_20000106_20200918_02_T1
LE07_L1TP_061018_20000106_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000122_20200918_02_T1
LE07_L1TP_061018_20000122_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000223_20200918_02_T1
LE07_L1TP_061018_20000223_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000310_20200918_02_T1
LE07_L1TP_061018_20000310_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000326_20200918_02_T1
LE07_L1TP_061018_20000326_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000411_20200918_02_T1
LE07_L1TP_061018_20000411_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000427_20200918_02_T1
LE07_L1TP_061018_20000427_20200918_02_T1_MTL.txt exists. Skip.
LE07_L1TP_061018_20000513_20200918_02_T1
LE07_L1TP_061018_200005

LC08_L1GT_061018_20140104_20200912_02_T2
LC08_L1GT_061018_20140104_20200912_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20140613_20200911_02_T2
LC08_L1GT_061018_20140613_20200911_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20140816_20200911_02_T2
LC08_L1GT_061018_20140816_20200911_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20141003_20200910_02_T2
LC08_L1GT_061018_20141003_20200910_02_T2_MTL.txt exists. Skip.
LC08_L1TP_061018_20140120_20200912_02_T1
LC08_L1TP_061018_20140120_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20140205_20200912_02_T1
LC08_L1TP_061018_20140205_20200912_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20140221_20200911_02_T1
LC08_L1TP_061018_20140221_20200911_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20140309_20200911_02_T1
LC08_L1TP_061018_20140309_20200911_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20140325_20200911_02_T1
LC08_L1TP_061018_20140325_20200911_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20140410_20200911_02_T1
LC08_L1TP_061018_201404

LC08_L1GT_061018_20180115_20200902_02_T2
LC08_L1GT_061018_20180115_20200902_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20180320_20200901_02_T2
LC08_L1GT_061018_20180320_20200901_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20181014_20200830_02_T2
LC08_L1GT_061018_20181014_20200830_02_T2_MTL.txt exists. Skip.
LC08_L1TP_061018_20180131_20200902_02_T1
LC08_L1TP_061018_20180131_20200902_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180216_20200902_02_T1
LC08_L1TP_061018_20180216_20200902_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180304_20200902_02_T1
LC08_L1TP_061018_20180304_20200902_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180405_20200901_02_T1
LC08_L1TP_061018_20180405_20200901_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180421_20200901_02_T1
LC08_L1TP_061018_20180421_20200901_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180507_20200901_02_T1
LC08_L1TP_061018_20180507_20200901_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20180523_20200901_02_T1
LC08_L1TP_061018_201805

LC08_L1GT_061018_20220110_20220122_02_T2
LC08_L1GT_061018_20220110_20220122_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20220211_20220222_02_T2
LC08_L1GT_061018_20220211_20220222_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20220315_20220322_02_T2
LC08_L1GT_061018_20220315_20220322_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20220806_20220818_02_T2
LC08_L1GT_061018_20220806_20220818_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20221110_20221121_02_T2
LC08_L1GT_061018_20221110_20221121_02_T2_MTL.txt exists. Skip.
LC08_L1GT_061018_20221212_20230113_02_T2
LC08_L1GT_061018_20221212_20230113_02_T2_MTL.txt exists. Skip.
LC08_L1TP_061018_20220126_20220205_02_T1
LC08_L1TP_061018_20220126_20220205_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20220227_20220309_02_T1
LC08_L1TP_061018_20220227_20220309_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20220331_20220406_02_T1
LC08_L1TP_061018_20220331_20220406_02_T1_MTL.txt exists. Skip.
LC08_L1TP_061018_20220416_20220420_02_T1
LC08_L1TP_061018_202204

In [34]:
imagenames

['LC09_L1GT_061018_20220203_20230429_02_T2',
 'LC09_L1GT_061018_20221102_20230323_02_T2',
 'LC09_L1TP_061018_20220118_20230501_02_T2',
 'LC09_L1TP_061018_20220219_20230427_02_T1',
 'LC09_L1TP_061018_20220307_20230425_02_T1',
 'LC09_L1TP_061018_20220323_20230424_02_T1',
 'LC09_L1TP_061018_20220408_20230422_02_T1',
 'LC09_L1TP_061018_20220424_20230419_02_T1',
 'LC09_L1TP_061018_20220510_20230417_02_T1',
 'LC09_L1TP_061018_20220526_20230415_02_T1',
 'LC09_L1TP_061018_20220611_20230413_02_T1',
 'LC09_L1TP_061018_20220627_20230409_02_T1',
 'LC09_L1TP_061018_20220713_20230407_02_T1',
 'LC09_L1TP_061018_20220729_20230405_02_T1',
 'LC09_L1TP_061018_20220814_20230402_02_T1',
 'LC09_L1TP_061018_20220830_20230331_02_T1',
 'LC09_L1TP_061018_20220915_20230329_02_T1',
 'LC09_L1TP_061018_20221001_20230327_02_T1',
 'LC09_L1TP_061018_20221017_20230325_02_T1',
 'LC09_L1TP_061018_20221118_20230321_02_T1']

# 4) Download QAPIXEL band over terminus box to determine cloud cover

If the terminus box shapefiles were not originally in UTM projection, will need to reproject them into UTM to match the Landsat projection. The code automatically finds the UTM zones from the metadata files and fills in the following syntax to reproject:
    
    ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:326zone output.shp input.shp
    
#### If the terminus box shapefiles are already in UTM projection, skip the following cell and rename the files to end with "\_UTM\_##.shp" where ## corresponds to the zone number (e.g., "\_UTM\_07.shp", "\_UTM\_21.shp").

In [35]:
zones = {} # initialize dictionary to hold UTM zone for each Landsat scene path row
zone_list = [] # list of zones

# Loop through all scenes:
for index, row in boxes_pr_df.iterrows():
    BoxID = str(index)
    p = row['Path']; r = row['Row']; folder_name = 'Path'+p+'_Row'+r+'_c2' # Landsat path and row
    pr_folderpath = downloadpath+folder_name+'/' # path to the downloaded metadata files
    pathtoshp = basepath+"Box"+BoxID+"/Box"+BoxID # path to the terminus box shapefiles (all projections)
#     print(pr_folderpath)
#     print(pathtoshp)
    
    if len(os.listdir(pr_folderpath)) > 0: # if there are files in the folder
        # grab UTM Zone from the first metadata file
        mtl_scene = glob.glob(pr_folderpath+'*_MTL.txt')[0]
        mtl = open(mtl_scene, 'r')
        
        #identify the appropriate projection for the files
        if source_srs == '3031':
            zones.update({folder_name: source_srs}); zone_list.append(source_srs) # add to zone lists
            Lproj = '_PS'+source_srs
            
            # reproject shapefile(s) into Antarctic PS
            rp_shp = 'ogr2ogr -f "ESRI Shapefile" '+pathtoshp+Lproj+'.shp '+pathtoshp+'_WGS.shp'
            rp_shp += ' -t_srs EPSG:'+source_srs
        else:
            # loop through lines in the metadata file to find the UTM ZONE
            for line in mtl:  
                variable = line.split("=")[0] # grab the variable name
                if ("UTM_ZONE" in variable):
                    zone = '%02d' % int(line.split("=")[1][1:-1]) # grab the 2-digit zone number
                    zones.update({folder_name: zone}); zone_list.append(zone) # add to zone lists
                    Lproj = '_UTM_'+zone
                    break
        
            # reproject shapefile(s) into UTM
            zone = zones[folder_name]
            rp_shp = 'ogr2ogr -f "ESRI Shapefile" '+pathtoshp+Lproj+'.shp '+pathtoshp+'_WGS.shp'
            rp_shp += ' -t_srs EPSG:326'+zone
        
        # reproject Box coordinates from WGS to Landsat projection
        subprocess.run(rp_shp, shell=True,check=True)
        
    else: # if no files in folder, zone = nan, must fill in manually
        zone_list.append(np.nan)
        
boxes_pr_df['Zone'] = zone_list # add to the path row dataframe
boxes_pr_df.head()

Unnamed: 0_level_0,Unnamed: 0,Path,Row,Zone
BoxID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Variegated,0,62,18,7
Variegated,1,61,18,7


In [36]:
# overwrite path row csv file with UTM zone information, see above for variable PR_FILENAME
boxes_pr_df.to_csv(path_or_buf = basepath+PR_FILENAME, sep=',')

Use GDAL and __vsi3__ link to download subset of the quality band we will use to determine cloud cover over the terminus:

    gdalwarp -cutline path_to_shp.shp -crop_to_cutline /vsi3/usgs-landsat/collection02/level-1/standard/oli-tirs/yyyy/path/row/scene/scene_QA_PIXEL.TIF path_to_subset_QA_PIXEL.TIF


In [37]:
# Loop through all scenes:
for index, row in boxes_pr_df.iterrows():
    p = row['Path']; r = row['Row']; zone = row['Zone'] # grab path, row, zone
    BoxID = str(index)
    folder_name = 'Path'+p+'_Row'+r+'_c2'
    pr_folderpath = downloadpath+folder_name+'/' # path to the downloaded metadata files
    pathtoshp = basepath+"Box"+BoxID+"/Box"+BoxID # path to the terminus box shapefiles (all projections)
    if str(zone) == '3031':
        pathtoshp_rp = pathtoshp+'_PS'+str(zone) # path to the PS projected box shapefile
    else:
        pathtoshp_rp = pathtoshp+'_UTM_'+str(zone) # path to the UTM projected box shapefile

    files = os.listdir(pr_folderpath) # grab the names of the Landsat scenes
    
    # for all files in the path row folders
    for file in files:
        scene = file[:40] # slice the filename to grab the scene name

        if scene.startswith('L') and ('T1' in scene or 'T2' in scene): # L1TP scenes
            scene_year = scene[17:21] # grab the year from the scene name
            
            if scene.startswith('LC09'):
                collectionfolder='oli-tirs/'
            elif scene.startswith('LC08'):
                collectionfolder='oli-tirs/'
            elif scene.startswith('LE07'):
                collectionfolder='etm/'
                
            # set path to the QA pixel Landsat files
            pathtoQAPIXEL='/vsis3/usgs-landsat/'+collectionpath+collectionfolder
            pathtoQAPIXEL+=scene_year+'/'
            pathtoQAPIXEL+=p+'/'+r+'/'
            pathtoQAPIXEL+=scene+'/'+scene+"_QA_PIXEL.TIF"
            
            # set path to the subset QA pixel files inside the path row folders
            subsetout = pr_folderpath+scene+'_QA_PIXEL_Box'+BoxID+'.TIF' 
            
            # if the file hasn't already been downloaded
            if not os.path.exists(subsetout):
                print('Downloading', scene)
                # construct download command
                QAPIXEL_dwnld_cmd='gdalwarp -overwrite -cutline '+pathtoshp_rp+'.shp -crop_to_cutline '
                QAPIXEL_dwnld_cmd+= pathtoQAPIXEL+' '+subsetout
                QAPIXEL_dwnld_cmd+=' --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2'
                QAPIXEL_dwnld_cmd+=' --config AWS_SECRET_ACCESS_KEY '+SECRET_KEY
                QAPIXEL_dwnld_cmd+=' --config AWS_ACCESS_KEY_ID '+ACCESS_KEY

                try:
                    subprocess.run(QAPIXEL_dwnld_cmd, shell=True, check=True)
                except subprocess.CalledProcessError as e:
                    print(e.output)
                    pass

Downloading LE07_L1TP_062018_20030326_20200915_02_T1
Downloading LC09_L1GT_062018_20221109_20230322_02_T2
Downloading LC08_L1TP_062018_20190226_20200829_02_T1
Downloading LE07_L1GT_062018_20000317_20200918_02_T2
Downloading LC08_L1TP_062018_20151013_20200908_02_T1
Downloading LC08_L1TP_062018_20210130_20210303_02_T2
Downloading LC08_L1GT_062018_20130804_20200912_02_T2
Downloading LC08_L1TP_062018_20180701_20200831_02_T1
Downloading LC08_L1TP_062018_20211013_20211019_02_T1
Downloading LE07_L1TP_062018_19990806_20200918_02_T1
Downloading LC08_L1GT_062018_20200722_20200911_02_T2
Downloading LC08_L1TP_062018_20190415_20200829_02_T1
Downloading LC08_L1TP_062018_20220626_20220706_02_T1
Downloading LC08_L1TP_062018_20180802_20200831_02_T1
Downloading LC08_L1TP_062018_20151216_20200908_02_T2
Downloading LC09_L1TP_062018_20220821_20230401_02_T1
Downloading LC08_L1TP_062018_20170324_20200904_02_T1
Downloading LC09_L1TP_062018_20220805_20230404_02_T1
Downloading LC08_L1TP_062018_20150506_20200909

Downloading LC08_L1TP_062018_20160202_20200907_02_T2
Downloading LC08_L1GT_062018_20181021_20200830_02_T2
Downloading LC09_L1TP_062018_20220210_20230428_02_T1
Downloading LE07_L1TP_062018_20020408_20200916_02_T1
Downloading LE07_L1TP_062018_20020713_20200916_02_T1
Downloading LC08_L1TP_062018_20160117_20200907_02_T2
Downloading LE07_L1TP_062018_20010608_20200917_02_T1
Downloading LC09_L1TP_062018_20211208_20230505_02_T2
Downloading LE07_L1GT_062018_19990923_20200918_02_T2
Downloading LE07_L1TP_062018_20000520_20200918_02_T1
Downloading LE07_L1TP_062018_20031020_20200915_02_T1
Downloading LC08_L1TP_062018_20130905_20200913_02_T1
Downloading LC08_L1GT_062018_20150420_20200909_02_T2
Downloading LE07_L1TP_062018_20030529_20200915_02_T1
Downloading LC08_L1TP_062018_20200924_20201007_02_T1
Downloading LC08_L1TP_062018_20221219_20221227_02_T2
Downloading LC08_L1TP_062018_20220306_20220314_02_T1
Downloading LC08_L1GT_062018_20161218_20200905_02_T2
Downloading LC08_L1TP_062018_20140924_20200910

Downloading LE07_L1TP_062018_20000909_20200917_02_T1
Downloading LE07_L1TP_062018_20030801_20200915_02_T1
Downloading LC08_L1GT_062018_20210826_20210901_02_T2
Downloading LC08_L1TP_062018_20191125_20200825_02_T1
Downloading LC08_L1TP_062018_20160305_20200907_02_T1
Downloading LC08_L1GT_062018_20210506_20210517_02_T2
Downloading LC09_L1GT_062018_20220125_20230430_02_T2
Downloading LC08_L1TP_062018_20140722_20200911_02_T1
Downloading LC08_L1TP_062018_20150215_20200909_02_T1
Downloading LC09_L1TP_062018_20220602_20230414_02_T1
Downloading LC08_L1TP_062018_20140417_20200911_02_T1
Downloading LC08_L1TP_062018_20140228_20200911_02_T1
Downloading LC08_L1GT_062018_20180311_20200901_02_T2
Downloading LC08_L1TP_061018_20160618_20200906_02_T1
Downloading LE07_L1TP_061018_20021026_20200916_02_T1
Downloading LE07_L1TP_061018_20000411_20200918_02_T1
Downloading LE07_L1TP_061018_20010329_20200917_02_T1
Downloading LC08_L1TP_061018_20180827_20200831_02_T1
Downloading LE07_L1TP_061018_20001105_20200918

Downloading LC08_L1TP_061018_20220907_20220914_02_T1
Downloading LC08_L1TP_061018_20190526_20200828_02_T1
Downloading LC08_L1TP_061018_20160415_20200907_02_T1
Downloading LC08_L1TP_061018_20130712_20200912_02_T1
Downloading LC08_L1TP_061018_20131117_20200912_02_T1
Downloading LC08_L1TP_061018_20220126_20220205_02_T1
Downloading LE07_L1TP_061018_20020807_20200916_02_T1
Downloading LC08_L1TP_061018_20160227_20200907_02_T1
Downloading LC08_L1TP_061018_20150515_20200909_02_T1
Downloading LC08_L1TP_061018_20221009_20221013_02_T1
Downloading LE07_L1TP_061018_20030319_20200916_02_T1
Downloading LC08_L1TP_061018_20170925_20200903_02_T1
Downloading LC08_L1TP_061018_20190611_20200828_02_T1
Downloading LC08_L1TP_061018_20150819_20200908_02_T1
Downloading LC08_L1TP_061018_20160314_20200907_02_T1
Downloading LC08_L1TP_061018_20170504_20200904_02_T1
Downloading LC08_L1GT_061018_20150123_20200910_02_T2
Downloading LE07_L1TP_061018_20020127_20200917_02_T1
Downloading LC08_L1TP_061018_20200917_20201005

Downloading LC08_L1GT_061018_20220315_20220322_02_T2
Downloading LC08_L1TP_061018_20180405_20200901_02_T1
Downloading LC08_L1TP_061018_20220416_20220420_02_T1
Downloading LC09_L1TP_061018_20221118_20230321_02_T1
Downloading LE07_L1TP_061018_19991103_20200918_02_T1
Downloading LC08_L1TP_061018_20200901_20200906_02_T1
Downloading LC08_L1TP_061018_20150224_20200909_02_T1
Downloading LC08_L1TP_061018_20140901_20200911_02_T1
Downloading LC08_L1GT_061018_20170520_20200904_02_T2
Downloading LC08_L1GT_061018_20210920_20210925_02_T2
Downloading LE07_L1TP_061018_20030215_20200916_02_T1
Downloading LC08_L1TP_061018_20191102_20200825_02_T1
Downloading LC08_L1GT_061018_20160720_20200906_02_T2
Downloading LC08_L1TP_061018_20140528_20200911_02_T1
Downloading LC08_L1TP_061018_20190424_20200828_02_T1
Downloading LC08_L1TP_061018_20210718_20210729_02_T1
Downloading LE07_L1GT_061018_20030927_20200916_02_T2
Downloading LC08_L1TP_061018_20221126_20221206_02_T1
Downloading LE07_L1TP_061018_20000716_20200917

# 5) Create buffer around terminus boxes

We download subsets of the Landsat scenes surrounding the terminus box. The subset is defined by a buffer width corresponding to the maximum dimension of the image. 

First, we grab the buffer distance, i.e., the maximum dimension of the image (in meters).

In [38]:
buffers = []
# Calculate a buffer distance around the terminus box:
for BoxID in BoxIDs:
    for file in os.listdir(basepath+'Box'+BoxID+'/'):
        if ('UTM' in file or 'PS' in file) and '.shp' in file and "Box" in file: # identify reprojected box
            boxpath = basepath+"Box"+BoxID+"/"+file  
            termbox = fiona.open(boxpath)
            
    # grab the box coordinates:
    box = termbox.next(); box_geom= box.get('geometry'); box_coords = box_geom.get('coordinates')[0]
    points = []
    for coord_pair in box_coords:
        lat = coord_pair[0]; lon = coord_pair[1]; points.append([lat, lon])
    
    # if the shapefile is a rectangle, use its dimensions to create a buffer
    if len(points) == 5:
        # Calculate distance between coord 1 and 2 and between 2 and 3
        coord1 = points[0]; coord2 = points[1]; coord3 = points[2]   
        dist1 = distance(coord1[0], coord1[1], coord2[0], coord2[1]);
        dist2 = distance(coord2[0], coord2[1], coord3[0], coord3[1]) 
        buff_dist = int(np.max([dist1, dist2])) # pick the longer one as the buffer distance
    else: #use the bounding box to define the buffer
        # get the coordinates for the bounding box
        box_bbox = termbox.bounds
        print(box_bbox)
            
        # Calculate distance between coord 1 and 2 and between 2 and 3 
        dist1 = distance(box_bbox[0], box_bbox[1], box_bbox[0], box_bbox[3]);
        dist2 = distance(box_bbox[0], box_bbox[1], box_bbox[2], box_bbox[1]) 
        buff_dist = int(np.max([dist1, dist2])) # pick the longer one as the buffer distance
        print(buff_dist)
    
#     # prescribe a maximum buffer distance for HUGE boxes
#     if buff_dist > 5000:
#         buff_dist = 5000
    buffers.append(buff_dist)

# store as dataframe:
buff_df = pd.DataFrame(list(zip(BoxIDs, buffers)), columns=['BoxID', 'Buff_dist_m'])
buff_df

  # Remove the CWD from sys.path while we load stuff.


Unnamed: 0,BoxID,Buff_dist_m
0,Variegated,14533


In [39]:
# write to csv
buff_df.to_csv(basepath+BOX_FILENAME)

Then, we create the buffer zone shapefile and reproject it to UTM (or Antarctic PS) using GDAL. To create the buffer zone shapefile, we use the GDAL command **ogr2ogr** with the following syntax:

    ogr2ogr Buffer###.shp path_to_terminusbox###.shp  -dialect sqlite -sql "SELECT ST_Buffer(geometry, buffer_distance) AS geometry,*FROM 'Box###'" -f "ESRI Shapefile"

We then use **ogr2ogr** to reproject the the buffer shapefiles to each UTM zone covering the glacier.

In [45]:
#UPDATE TO ACCOUNT FOR POTENTIALLY DIFFERENT BOX LOCATION AND NAME FOR NAMED SITES


# loop through the buffer distance dataframe:
for index, row in buff_df.iterrows():
    BoxID = row['BoxID']
    zones = boxes_pr_df.loc[BoxID, 'Zone'] # grab zone matching BoxID from other dataframe
    buff_dist = str(row['Buff_dist_m'])
    
    # Reprojection needs to happen for each apth-row pair to acount for (potentially) different projections
    for zone in zones:
        # paths
        terminusbox_path = basepath+"Box"+BoxID+"/Box"+BoxID+"_UTM_"+zone+".shp" # path to box shapefile
        outputbuffer_path = basepath+"Box"+BoxID+"/Buffer"+BoxID+".shp" # path and name of new buffer file

        # Set buffer creation command
        buffer_cmd = 'ogr2ogr '+outputbuffer_path+" "+terminusbox_path
        buffer_cmd +=' -dialect sqlite -sql "SELECT ST_Buffer(geometry, '+buff_dist+") AS geometry,*FROM 'Box"
        buffer_cmd +=BoxID+"_UTM_"+zone+"'"+'" -f "ESRI Shapefile"'
        print("Command:", buffer_cmd)
        subprocess.run(buffer_cmd, shell=True, check=True) # run on terminal
        
        if zone == '3031':
            rp_shp = 'ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:'+source_srs+' -s_srs EPSG:'+source_srs+' '
            rp_shp += outputbuffer_path[:-4]+"_PS"+source_srs+".shp "+outputbuffer_path[:-4]+'.shp'
            print(rp_shp)
        else: 
            rp_shp = 'ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:326'+str(zone)+' -s_srs EPSG:'+source_srs+' '
            rp_shp += outputbuffer_path[:-4]+"_UTM_"+str(zone)+".shp "+outputbuffer_path[:-4]+'.shp'
        #subprocess.run(rp_shp, shell=True, check=True) # reproject
    try:
        subprocess.run(rp_shp, shell=True, check=True) # reproject
    except subprocess.CalledProcessError as e:
        print(e.output)
        pass

Command: ogr2ogr /home/jukes/Documents/Alaska-glaciers/BoxVariegated/BufferVariegated.shp /home/jukes/Documents/Alaska-glaciers/BoxVariegated/BoxVariegated_UTM_07.shp -dialect sqlite -sql "SELECT ST_Buffer(geometry, 14533) AS geometry,*FROM 'BoxVariegated_UTM_07'" -f "ESRI Shapefile"
Command: ogr2ogr /home/jukes/Documents/Alaska-glaciers/BoxVariegated/BufferVariegated.shp /home/jukes/Documents/Alaska-glaciers/BoxVariegated/BoxVariegated_UTM_07.shp -dialect sqlite -sql "SELECT ST_Buffer(geometry, 14533) AS geometry,*FROM 'BoxVariegated_UTM_07'" -f "ESRI Shapefile"


# 6) Download non-cloudy Landsat images from AWS

To remove cloudy images, we will find the number of pixels in our terminus box that exceed a threshold value in the QA_PIXEL band corresponding to cloud and cloud shadow likelihood. If the fraction of cloudy pixels with values is above the threshold, we won't download the image. See Landsat Collection 2 Level 2 Science Product Guide for [Landsat 8](https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/media/files/LSDS-1619_Landsat-8-9-C2-L2-ScienceProductGuide-v4.pdf) and [Landsat 7](https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/LSDS-1618_Landsat-4-7_C2-L2-ScienceProductGuide-v3.pdf) more information on how the QA_PIXEL threshold values are chosen.

Additionally, we remove images that are primarily black (fill value of 1 in QA_PIXEL band). This ensures that the scenes that cut off halfway across the glacier are not included in further analysis. The fill percent threshold may need to be adjusted.

In [46]:
######################################################################################
# These are the recommended values based on the Collection 2 Level 2 Science Product Guide.
# Adjust thresholds here:
QAPIXEL_thresh_lower_L7 = 5696.0 # minimum QA pixel value threshold to be considered cloud for L7 images
QAPIXEL_thresh_upper_L7 = 7568.0 # maximum QA pixel value threshold to be considered cloud for L7 images

# Landsat 8 requires two lower thresholds, we select between 22280 and 24472 and above 54596
QAPIXEL_thresh_lower_L8 = 22080.0 # minimum QA pixel value threshold to be considered cloud for L8 images
QAPIXEL_thresh_upper_L8 = 30048.0 # maximum QA pixel value threshold to be considered cloud for L8 images
QAPIXEL_thresh2_lower_L8 = 54596.0 # 2nd minimum QA pixel value threshold to be considered cloud for L8 images

cpercent_thresh = 20.0 # maximum cloud cover % in terminus box
fpercent_thresh = 60.0 # maximum fill % in terminus box
######################################################################################

In [47]:
# Download images that pass these thresholds:
for index, row in boxes_pr_df.iterrows():
    # grab paths
    p = row['Path']; zone = row['Zone']; r = row['Row']; BoxID = index; 
    folder_name = 'Path'+p+'_Row'+r+'_c2'
    pr_folderpath = downloadpath+folder_name+'/'
    bp_out = downloadpath+'Box'+BoxID+'/' # folder name for downloaded images
    if os.path.exists(bp_out): # create folder if it does not exist
        print("Box"+BoxID, " exists already. Skip creation of directory.")
    else:
        os.mkdir(bp_out)
        print("Box"+BoxID+" directory made.")
    
    # path to the shapefile covering the region that will be downloaded
    if zone == '3031':
        pathtobuffer = basepath+'Box'+BoxID+'/Buffer'+BoxID+'_PS'+source_srs+'.shp'
    else:
        pathtobuffer = basepath+'Box'+BoxID+'/Buffer'+BoxID+'_UTM_'+str(zone)+'.shp'  # buffer around box - recommended
#     pathtobuffer = basepath+'Box'+BoxID+'/Box'+BoxID+'_UTM_'+zone+'.shp' # just the box
    
    for scene in os.listdir(pr_folderpath):
        if scene.startswith('L') and scene.endswith(".TIF") and ('T1' in scene or 'T2' in scene): # For Tier-1 images
            scene = scene[:40] # scene name
            year = scene[17:21] # grab acquisition year
            
            QApixelpath = pr_folderpath+scene+'_QA_PIXEL_Box'+BoxID+'.TIF' # path to QA_PIXEL file
            subsetQApixel = mpimg.imread(QApixelpath) # read in QAPIXEL file as numpy array
            totalpixels = subsetQApixel.shape[0]*subsetQApixel.shape[1] # count total number of pixels
            
            if scene.startswith("LC09"): # Landsat 9 (assuming same cloud thresholds at Landsat 8)
                collectionfolder = 'oli-tirs/'; bands = L9_bands; 
                # countcloudy pixels based on thresholds:
                cloudQApixel = subsetQApixel[((subsetQApixel >= QAPIXEL_thresh_lower_L8) &
                                             (subsetQApixel < QAPIXEL_thresh_upper_L8) | 
                                             (subsetQApixel >= QAPIXEL_thresh2_lower_L8))]
            elif scene.startswith("LC08"): # Landsat 8
                collectionfolder = 'oli-tirs/'; bands = L8_bands; 
                # countcloudy pixels based on thresholds:
                cloudQApixel = subsetQApixel[((subsetQApixel >= QAPIXEL_thresh_lower_L8) &
                                             (subsetQApixel < QAPIXEL_thresh_upper_L8) | 
                                             (subsetQApixel >= QAPIXEL_thresh2_lower_L8))]
            elif scene.startswith("LE07"): # Landsat 7
                collectionfolder = 'etm/'; bands = L7_bands
                # countcloudy pixels based on thresholds:
                cloudQApixel = subsetQApixel[((subsetQApixel >= QAPIXEL_thresh_lower_L7) & 
                                             (subsetQApixel < QAPIXEL_thresh_upper_L7))]
 
            # calculate percentages of cloud and fill pixels
            fillQApixel = subsetQApixel[subsetQApixel < 2.0] # fill pixels (value = 0 or 1)
            cloudpixels = len(cloudQApixel); fillpixels = len(fillQApixel) # count the cloudy and fill pixels
            cloudpercent = int(float(cloudpixels)/float(totalpixels)*100) # calculate percent cloudy
            fillpercent = int(float(fillpixels)/float(totalpixels)*100) # calculate percent fill
            
            
            # evaluate thresholds
            if cloudpercent <= cpercent_thresh and fillpercent <= fpercent_thresh:
                # download the bands for that scene into your scene folders:
                for band in bands:
                        band = str(band) # string format
                        
                        # input path to your bands in AWS:
                        pathin = '/vsis3/usgs-landsat/'+collectionpath+collectionfolder+year+'/'+p+"/"+r+"/"+scene+"/"+scene+"_B"+band+".TIF"
                        
                        outfilename = scene+"_B"+band+'_Buffer'+BoxID+'.TIF' # output file name
                        pathout = downloadpath+'Box'+BoxID+'/'+outfilename # full output file path
                        
                        # if the file hasn't already been downloaded
                        if not os.path.exists(pathout):
                            # download
                            download_cmd = 'gdalwarp -overwrite -cutline '+pathtobuffer+' -crop_to_cutline '+pathin+' '+pathout
                            download_cmd+=' --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2'
                            download_cmd+=' --config AWS_SECRET_ACCESS_KEY '+SECRET_KEY
                            download_cmd+=' --config AWS_ACCESS_KEY_ID '+ACCESS_KEY   
                            print('Downloading:', outfilename)
                        #subprocess.run(download_cmd, shell=True, check=True)
                        try:
                            subprocess.run(download_cmd, shell=True, check=True)
                        except subprocess.CalledProcessError as e:
                            print(e.output)
                            pass
                        else:
                            print(outfilename, 'exists')
            else:
                print(scene, 'failed cloud & fill thresholds: Cloud % ', cloudpercent, 'Fill %', fillpercent)
                
print('image downloads complete!') #let the user know when the downloads are done

BoxVariegated  exists already. Skip creation of directory.
LE07_L1GT_062018_19991025_20200918_02_T2 failed cloud & fill thresholds: Cloud %  68 Fill % 0
LE07_L1TP_062018_20020915_20200916_02_T1 failed cloud & fill thresholds: Cloud %  87 Fill % 0
LC08_L1TP_062018_20130719_20200912_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LE07_L1TP_062018_20020729_20200916_02_T1 failed cloud & fill thresholds: Cloud %  85 Fill % 0
LC08_L1GT_062018_20220407_20220412_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20140417_20200911_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20140807_20200911_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LC08_L1GT_062018_20220728_20220805_02_T2 failed cloud & fill thresholds: Cloud %  29 Fill % 0
LC08_L1GT_062018_20200128_20200823_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1GT_062018_20200706_20200913_02_T2 failed cloud & fill thresholds: Cloud %  96 Fill % 0
L

LC09_L1TP_062018_20221024_20230324_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20180802_20200831_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LE07_L1TP_062018_20030529_20200915_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LE07_L1GT_062018_20001230_20200917_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LC08_L1TP_062018_20140722_20200911_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20140722_20200911_02_T1_B8_BufferVariegated.TIF exists
LC08_L1GT_062018_20221016_20221031_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20171119_20200902_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LE07_L1GT_062018_20030902_20200916_02_T2_B8_BufferVariegated.TIF
LE07_L1GT_062018_20030902_20200916_02_T2_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20150215_20200909_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20211114_20211124_02_T1 failed cloud & fill thr

LC08_L1TP_062018_20160117_20200907_02_T2_B8_BufferVariegated.TIF exists
LC08_L1GT_062018_20131023_20200912_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1GT_062018_20200722_20200911_02_T2 failed cloud & fill thresholds: Cloud %  98 Fill % 0
Downloading: LC08_L1TP_062018_20190501_20200829_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20190501_20200829_02_T1_B8_BufferVariegated.TIF exists
Downloading: LC08_L1TP_062018_20190226_20200829_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20190226_20200829_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20201026_20201106_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
Downloading: LC08_L1TP_062018_20131124_20200912_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20131124_20200912_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20151013_20200908_02_T1 failed cloud & fill thresholds: Cloud %  31 Fill % 0
Downloading: LC08_L1GT_062018_20220829_20220910_02_T2_B8_BufferVariegated.TIF
LC08_L1GT_062018_2022082

LC08_L1TP_062018_20151216_20200908_02_T2_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20210709_20210720_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20180106_20200902_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LC08_L1TP_062018_20140503_20200911_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20140503_20200911_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20170714_20200903_02_T1 failed cloud & fill thresholds: Cloud %  44 Fill % 0
Downloading: LC08_L1TP_062018_20150506_20200909_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_062018_20150506_20200909_02_T1_B8_BufferVariegated.TIF exists
LE07_L1GT_062018_19991228_20200918_02_T2 failed cloud & fill thresholds: Cloud %  80 Fill % 0
LC08_L1GT_062018_20170612_20200903_02_T2 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LC08_L1TP_062018_20180701_20200831_02_T1 failed cloud & fill thresholds: Cloud %  36 Fill % 0
LC08_L1TP_062018_20190821_20200827_02_T1 failed cloud & fill thr

LC08_L1TP_062018_20170103_20200905_02_T2_B8_BufferVariegated.TIF exists
LC08_L1TP_062018_20210215_20210301_02_T2 failed cloud & fill thresholds: Cloud %  56 Fill % 0
LC08_L1GT_062018_20211216_20211223_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20200924_20201007_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20180717_20200831_02_T1 failed cloud & fill thresholds: Cloud %  97 Fill % 0
LC08_L1TP_062018_20151029_20200908_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1GT_062018_20160625_20200906_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LE07_L1GT_062018_19990822_20200918_02_T2 failed cloud & fill thresholds: Cloud %  88 Fill % 0
LE07_L1GT_062018_20000418_20200918_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_062018_20130905_20200913_02_T1 failed cloud & fill thresholds: Cloud %  93 Fill % 0
LC08_L1TP_062018_20140401_20200911_02_T1 failed cloud & fill thresholds: Cloud %  

LE07_L1TP_061018_20000411_20200918_02_T1_B8_BufferVariegated.TIF exists
LC09_L1TP_061018_20220830_20230331_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_061018_20131016_20200912_02_T1 failed cloud & fill thresholds: Cloud %  29 Fill % 0
LC08_L1TP_061018_20131101_20200912_02_T1 failed cloud & fill thresholds: Cloud %  96 Fill % 0
LC08_L1TP_061018_20190102_20200829_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LC08_L1TP_061018_20200222_20200822_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_061018_20150819_20200908_02_T1 failed cloud & fill thresholds: Cloud %  96 Fill % 0
LE07_L1TP_061018_20000122_20200918_02_T1 failed cloud & fill thresholds: Cloud %  75 Fill % 0
LC08_L1TP_061018_20150616_20200909_02_T1 failed cloud & fill thresholds: Cloud %  41 Fill % 0
LE07_L1TP_061018_20010820_20200917_02_T1 failed cloud & fill thresholds: Cloud %  89 Fill % 0
Downloading: LE07_L1GT_061018_20010804_20200917_02_T2_B8_BufferVariegated.TIF
LE07

LC08_L1TP_061018_20160110_20200907_02_T1_B8_BufferVariegated.TIF exists
Downloading: LC09_L1TP_061018_20220526_20230415_02_T1_B8_BufferVariegated.TIF
LC09_L1TP_061018_20220526_20230415_02_T1_B8_BufferVariegated.TIF exists
Downloading: LE07_L1TP_061018_20020127_20200917_02_T1_B8_BufferVariegated.TIF
LE07_L1TP_061018_20020127_20200917_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_061018_20141222_20200910_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC09_L1TP_061018_20220627_20230409_02_T1 failed cloud & fill thresholds: Cloud %  39 Fill % 0
LC08_L1TP_061018_20200121_20200823_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LC08_L1TP_061018_20190510_20200829_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_061018_20221228_20230104_02_T1 failed cloud & fill thresholds: Cloud %  38 Fill % 0
LC08_L1GT_061018_20220211_20220222_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LC08_L1TP_061018_20130610_20200912_02_T1_B8_BufferV

LC09_L1TP_061018_20220611_20230413_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_061018_20160501_20200907_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LE07_L1TP_061018_20030810_20200916_02_T1 failed cloud & fill thresholds: Cloud %  21 Fill % 14
LC08_L1TP_061018_20140410_20200911_02_T1 failed cloud & fill thresholds: Cloud %  59 Fill % 0
LC08_L1TP_061018_20220721_20220801_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LE07_L1TP_061018_20021026_20200916_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LE07_L1TP_061018_20000529_20200918_02_T1_B8_BufferVariegated.TIF
LE07_L1TP_061018_20000529_20200918_02_T1_B8_BufferVariegated.TIF exists
LE07_L1TP_061018_20021010_20200916_02_T1 failed cloud & fill thresholds: Cloud %  49 Fill % 0
LC09_L1TP_061018_20221017_20230325_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LE07_L1TP_061018_20001105_20200918_02_T1_B8_BufferVariegated.TIF
LE07_L1TP_061018_20001105_20200918_02_T1_

LE07_L1TP_061018_20020401_20200917_02_T1_B8_BufferVariegated.TIF exists
LC08_L1TP_061018_20180507_20200901_02_T1 failed cloud & fill thresholds: Cloud %  82 Fill % 0
LE07_L1TP_061018_20020604_20200917_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_061018_20191118_20200825_02_T1 failed cloud & fill thresholds: Cloud %  98 Fill % 0
LC08_L1GT_061018_20221110_20221121_02_T2 failed cloud & fill thresholds: Cloud %  99 Fill % 0
Downloading: LC08_L1TP_061018_20151123_20200908_02_T1_B8_BufferVariegated.TIF
LC08_L1TP_061018_20151123_20200908_02_T1_B8_BufferVariegated.TIF exists
LE07_L1TP_061018_20001020_20200917_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC09_L1TP_061018_20220713_20230407_02_T1 failed cloud & fill thresholds: Cloud %  99 Fill % 0
LC08_L1TP_061018_20140426_20200911_02_T1 failed cloud & fill thresholds: Cloud %  76 Fill % 0
Downloading: LE07_L1TP_061018_20011210_20200917_02_T2_B8_BufferVariegated.TIF
LE07_L1TP_061018_20011210_20200917_02_T2_B

# 7) Automatically grab the image acquisition dates from the metadata files

Exports a CSV file containing the Landsat scene names and the acquisition dates for each scene.

In [48]:
datetimes = [] # list of scene acquisition dates
scenes_dated = [] # list of scenes

for BoxID in BoxIDs:
    bp_out = downloadpath+'Box'+BoxID+'/' # path to downloaded images for that glacier
    
    # Grab all path row folder names from boxes_pr_df:
    paths = boxes_pr_df.loc[BoxID,'Path']; rows = boxes_pr_df.loc[BoxID,'Row']
    
    # Grab the downloaded scenes
    downloaded_scenes = os.listdir(bp_out)
    for scene in downloaded_scenes:
        if scene.startswith('L') and ('T1' in scene or 'T2' in scene) and scene.endswith('.TIF'):
            scenename = scene[:40]
            
            # Search for metadata file in each path, row folder:
            found = False # not found yet
            for a in range(0, len(paths)): # look in each path row folder
                folder_name = 'Path'+paths[a]+'_Row'+rows[a]+'_c2'
                folderpath = downloadpath+folder_name+'/'
                
                # if not there
                if not os.path.exists(folderpath+scenename+'_MTL.txt'):
                    continue # skip to the next folder
                else: # if there
                    # open the file
                    mdata = open(folderpath+scenename+"_MTL.txt", "r")
                    # find the acquisition date in the file
                    for line in mdata:
                        variable = line.split("=")[0]
                        if ("DATE_ACQUIRED" in variable):
                            date = line.split("=")[1][1:-1] # find acquisition date
                    # save scenename and date
                    dates = datetime.datetime.strptime(date, '%Y-%m-%d') # save as datetime object
                    print(scenename, dates)
                    datetimes.append(dates); scenes_dated.append(scenename) # store in lists
                    
                    found = True # found the file
                    break # stop search
            
            if found == False: # if the file was not found at all
                # grab acquisition date from the filename
                date = scene[17:25]
                dates = datetime.datetime.strptime(date, '%Y%m%d') # save as datetime object
                print(scenename, 'missing metadata file. Guessing from filename instead:', dates)

# Store in a dataframe
datetime_df = pd.DataFrame(list(zip(scenes_dated, datetimes)), columns=['Scene', 'datetime'])
datetime_df = datetime_df.sort_values(by='datetime', ascending=True); datetime_df = datetime_df.drop_duplicates()
datetime_df

LE07_L1TP_061018_20011124_20200917_02_T1 2001-11-24 00:00:00
LE07_L1TP_061018_20010430_20200917_02_T1 2001-04-30 00:00:00
LE07_L1TP_062018_20000824_20200917_02_T1 2000-08-24 00:00:00
LC08_L1TP_062018_20140722_20200911_02_T1 2014-07-22 00:00:00
LC08_L1TP_062018_20190226_20200829_02_T1 2019-02-26 00:00:00
LC09_L1TP_062018_20220415_20230422_02_T1 2022-04-15 00:00:00
LC08_L1TP_062018_20220306_20220314_02_T1 2022-03-06 00:00:00
LC08_L1TP_062018_20221203_20221212_02_T1 2022-12-03 00:00:00
LC08_L1TP_062018_20160117_20200907_02_T2 2016-01-17 00:00:00
LE07_L1TP_061018_20000411_20200918_02_T1 2000-04-11 00:00:00
LC08_L1TP_061018_20160314_20200907_02_T1 2016-03-14 00:00:00
LE07_L1TP_062018_20000909_20200917_02_T1 2000-09-09 00:00:00
LE07_L1TP_061018_20031013_20200916_02_T1 2003-10-13 00:00:00
LC08_L1TP_062018_20150522_20200909_02_T1 2015-05-22 00:00:00
LC08_L1GT_061018_20131219_20200912_02_T2 2013-12-19 00:00:00
LC08_L1TP_061018_20201120_20210314_02_T1 2020-11-20 00:00:00
LC09_L1TP_061018_2022072

Unnamed: 0,Scene,datetime
113,LE07_L1TP_062018_19990705_20200918_02_T1,1999-07-05
124,LE07_L1TP_061018_19990714_20200918_02_T1,1999-07-14
85,LE07_L1TP_062018_20000113_20200918_02_T2,2000-01-13
95,LE07_L1TP_062018_20000214_20200918_02_T1,2000-02-14
44,LE07_L1TP_061018_20000310_20200918_02_T1,2000-03-10
...,...,...
134,LC08_L1TP_061018_20221025_20221107_02_T1,2022-10-25
64,LC08_L1TP_062018_20221117_20221128_02_T2,2022-11-17
122,LC09_L1TP_061018_20221118_20230321_02_T1,2022-11-18
7,LC08_L1TP_062018_20221203_20221212_02_T1,2022-12-03


In [None]:
# write dates to csv
datetime_df.to_csv(basepath+DATES_FILENAME, sep=',') 

# 8) Delete all quality band files (*QA_PIXEL.TIF) to save space

These files will not be needed after the download step, so they can be removed to save space.

In [None]:
for BoxID in BoxIDs:
    # Grab all path row folder names from boxes_pr_df:
    paths = boxes_pr_df.loc[BoxID,'Path']; rows = boxes_pr_df.loc[BoxID,'Row']
    
    for a in range(0, len(paths)): # look in each path row folder
        folder_name = 'Path'+paths[a]+'_Row'+rows[a]+'_c2'
        folderpath = downloadpath+folder_name+'/'
        
        # remove all files with QA_PIXEL in the name
        for file in os.listdir(folderpath):
            if 'QA_PIXEL' in file:
                os.remove(folderpath+file)