# Code to download LS8 images for Greenland peripheral glaciers using Amazon Web Services (aws) 

### Jukes Liu

The following code automatically downloads Landsat 8 scenes available through Amazon Web Services (AWS) whose cloud cover is below a threshold percentage. The Landsat 8 scenes over each glacier are identified by their pre-determined path and row, stored in a .csv file. The scenes are filtered for cloud cover using their metadata files.

## 1) Set up:

#### Install the AWS CLI using pip or pip3

The AWS Command Line Interface (CLI) must be installed so that aws commands are available in your shell terminal. Follow the instructions at https://docs.aws.amazon.com/cli/latest/userguide/install-linux-al2017.html to install it.
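
Before running the download cells, it can help to confirm that the aws executable is actually visible to Python. The following is a minimal sketch using the standard library; the helper name `cli_available` is hypothetical and not part of the original notebook.

```python
import shutil

def cli_available(name="aws"):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None

if not cli_available():
    print("aws not found on PATH; install the AWS CLI before running the download cells")
```

If the notebook runs inside a conda environment, the check reflects the PATH of that environment only.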

#### Import packages

In [2]:
import pandas as pd
import numpy as np
import os
import subprocess

#### Read in LS path and row for each peripheral glacier by BoxID into a DataFrame

The LS Path and Row information for each peripheral glacier is stored in a .csv file. 

Note: Many glaciers fall within the same Landsat scene, so some path and row combinations are repeated. The subsequent code therefore skips any path and row combination that already exists in the output directory.
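
An alternative way to avoid repeated downloads is to reduce the table to its unique path and row combinations before looping. The sketch below assumes a DataFrame with the same BoxID, Path and Row columns as LS_pathrows.csv; the rows shown are illustrative examples only, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative rows in the same layout as LS_pathrows.csv (example values only)
example_df = pd.DataFrame(
    {"BoxID": ["001", "002", "004", "033"],
     "Path": ["034", "031", "031", "008"],
     "Row": ["005", "005", "005", "014"]}
).set_index("BoxID")

# Keep only the first glacier listed for each (Path, Row) pair
unique_pathrows = example_df.drop_duplicates(subset=["Path", "Row"])
print(unique_pathrows)
```

Looping over the reduced table visits each scene footprint once, regardless of what already exists on disk.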

In [3]:
#set basepath
basepath = '/home/jukes/Documents/Sample_glaciers/'
#basepath = '/home/automated-glacier-terminus/'
outputpath = '/media/jukes/jukes1/'

#read the path row csv file into a dataframe
pathrows_df = pd.read_csv(basepath+'LS_pathrows.csv', sep=',', usecols =[0,1,2], dtype=str, nrows =10)
pathrows_df = pathrows_df.set_index('BoxID')
pathrows_df

BoxID,Path,Row
1,34,5
2,31,5
4,31,5
33,8,14
120,232,17
174,232,17
235,232,15
259,232,15
277,232,15
531,11,2


In [3]:
#check the df dimensions
pathrows_df.shape

(10, 2)

#### Create output directory: LS8aws

In [4]:
#create LS8aws folder
if os.path.exists(outputpath+'LS8aws'):
    print("Path exists already")
else:
    os.mkdir(outputpath+'LS8aws')
    print("LS8aws directory made")

Path exists already
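
As an alternative sketch, os.makedirs with exist_ok=True creates the folder when it is missing and does nothing when it is already there, so no explicit existence check is needed. The example below uses a temporary directory as a stand-in for outputpath so that it can be run anywhere.

```python
import os
import tempfile

# A temporary directory stands in for outputpath in this sketch
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "LS8aws")
    os.makedirs(target, exist_ok=True)   # creates the folder
    os.makedirs(target, exist_ok=True)   # second call is a no-op and raises no error
    created = os.path.isdir(target)

print(created)
```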


## 2) Download B8 (panchromatic band) and MTL.txt (metadata) files for all available images over the path/row of the glaciers

The Landsat 8 scenes stored in AWS can be accessed using the landsat-pds bucket and the path and row information. Each band and the metadata file can be accessed separately.

We are interested in the panchromatic band (B8.TIF) and the metadata file to filter for cloud cover (MTL.txt). The download commands will use the following syntax:


    aws --no-sign-request s3 cp s3://landsat-pds/L8/path/row/LC8pathrowyear001LGN00/LC8pathrowyear001LGN00_MTL.txt /path_to/output/

    aws --no-sign-request s3 cp s3://landsat-pds/L8/path/row/LC8pathrowyear001LGN00/LC8pathrowyear001LGN00_B8.TIF /path_to/output/

Access https://docs.opendata.aws/landsat-pds/readme.html to learn more.
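
The URL pattern in the commands above can be sketched as a small helper that derives the path and row from the scene ID itself. The function name `scene_file_url` is hypothetical; the layout assumed is the one shown above (LC8, then a three-digit path, then a three-digit row).

```python
def scene_file_url(scene_id, suffix, bucket_prefix="s3://landsat-pds/L8/"):
    """Build the S3 URL of one file belonging to a Landsat 8 scene.

    scene_id: for example 'LC80110022014214LGN00'
    suffix:   for example 'MTL.txt' or 'B8.TIF'
    """
    path = scene_id[3:6]   # characters 4-6 hold the path
    row = scene_id[6:9]    # characters 7-9 hold the row
    return f"{bucket_prefix}{path}/{row}/{scene_id}/{scene_id}_{suffix}"

print(scene_file_url("LC80110022014214LGN00", "B8.TIF"))
```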

### 2A) Option 1: For one BoxID (one glacier) at a time

In [61]:
#choose a glacier: Box002 (second row, selected by position)
boxid = pathrows_df.index[1]
path = pathrows_df['Path'].iloc[1]
row = pathrows_df['Row'].iloc[1]
print('BoxID ', boxid, 'path', path, 'row', row)

#set path row folder name
folder_name = 'Path'+path+'_Row'+row
print(folder_name)

#set input path
bp_in = 's3://landsat-pds/L8/'
totalp_in = bp_in+path+'/'+row+'/'
print(totalp_in)

#set output path
bp_out = outputpath+'LS8aws/'+folder_name+'/'
print(bp_out)

BoxID  002 path 031 row 005
Path031_Row005
s3://landsat-pds/L8/031/005/
/media/jukes/jukes1/LS8aws/Path031_Row005/


#### Create the Path_Row folder

In [62]:
#create the Path_Row folder if it does not exist yet
if os.path.exists(bp_out):
    print(folder_name, " EXISTS ALREADY. SKIP.")
else:
    os.mkdir(bp_out)
    print(folder_name+" directory made")

Path031_Row005  EXISTS ALREADY. SKIP.


#### Download all the metadata text files using aws commands called through subprocess

Use the following syntax:

    aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ Output/path/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt" 
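
The command above can also be assembled as an argument list, which avoids shell quoting of the wildcard patterns. This is a minimal sketch, not the notebook's own approach; the helper name `metadata_download_args` is hypothetical, and the actual call is left as a comment so that nothing is downloaded when the example runs.

```python
import subprocess

def metadata_download_args(s3_prefix, out_dir):
    """Return the aws command as an argument list (no shell quoting needed)."""
    return ["aws", "--no-sign-request", "s3", "cp", s3_prefix, out_dir,
            "--recursive", "--exclude", "*", "--include", "*.txt"]

args = metadata_download_args("s3://landsat-pds/L8/031/005/",
                              "/media/jukes/jukes1/LS8aws/Path031_Row005/")
print(" ".join(args))

# To run the download without a shell:
# subprocess.run(args, check=True)
```

With check=True, a non-zero exit status raises an exception instead of being returned silently.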

In [63]:
#Check command syntax:
command = 'aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*.txt"'
print(command)

aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ /media/jukes/jukes1/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt"


In [64]:
#call the command line that downloads the metadata files using aws
subprocess.call(command, shell=True)

0

#### Filter for cloud cover

If the metadata file indicates that land cloud cover is at or below the threshold, download the B8 file; otherwise, delete the image folder. Not all metadata files contain the land cloud cover; some contain only the overall cloud cover. If land cloud cover is not found, use the overall cloud cover value to decide whether the image should be downloaded. Use the following metadata attributes:

  GROUP = IMAGE_ATTRIBUTES
  
    CLOUD_COVER = 23.58
    CLOUD_COVER_LAND = 20.41
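
The decision rule described above can be sketched as two small functions that work on the metadata text directly. The names `parse_cloud_cover` and `keep_scene` are hypothetical, and the default thresholds of 30 and 50 percent are the values used in this notebook.

```python
def parse_cloud_cover(mtl_text):
    """Return a dict with whichever of CLOUD_COVER / CLOUD_COVER_LAND are present."""
    values = {}
    for line in mtl_text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Exact key match, so CLOUD_COVER is not confused with CLOUD_COVER_LAND
        if key in ("CLOUD_COVER", "CLOUD_COVER_LAND"):
            values[key] = float(value)
    return values

def keep_scene(values, ccland_thresh=30.0, cc_thresh=50.0):
    """Prefer the land value; fall back to overall cloud cover if it is missing."""
    if "CLOUD_COVER_LAND" in values:
        return values["CLOUD_COVER_LAND"] <= ccland_thresh
    if "CLOUD_COVER" in values:
        return values["CLOUD_COVER"] <= cc_thresh
    raise ValueError("no cloud cover attribute found in metadata")

sample = """GROUP = IMAGE_ATTRIBUTES
    CLOUD_COVER = 23.58
    CLOUD_COVER_LAND = 20.41
END_GROUP = IMAGE_ATTRIBUTES"""

print(parse_cloud_cover(sample), keep_scene(parse_cloud_cover(sample)))
```

Matching the key exactly means a single pass over the file is enough to collect both attributes.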

In [9]:
#set cloud cover % thresholds
ccland_thresh = 30.0
cc_thresh = 50.0

#set paths:
#set path row folder name
folder_name = 'Path'+path+'_Row'+row
print(folder_name)

#set input path
bp_in = 's3://landsat-pds/L8/'
totalp_in = bp_in+path+'/'+row+'/'
print(totalp_in)

#set output path
bp_out = outputpath+'LS8aws/'+folder_name+'/'
print(bp_out)

Path011_Row002
s3://landsat-pds/L8/011/002/
/media/jukes/jukes1/LS8aws/Path011_Row002/


In [10]:
#loop through all the metadata files in the path_row folder:
for image in os.listdir(bp_out):
    if image.startswith("LC"):
        #list the name of the image folder
        print(image)
        
        #open the metadata file within that folder
        mdata = open(bp_out+image+"/"+image+"_MTL.txt", "r")
        
        #set a detection variable for whether or not the metadata contains land cloud cover
        ccl_detected = False
        
        #loop through each line in metadata to find Land Cloud Cover
        for line in mdata:
            cc_variable = line.split("=")[0]
            
            #if there is land cloud cover:
            if ("CLOUD_COVER_LAND" in cc_variable):
                #save it (np.float was removed from NumPy; use the built-in float):
                ccl = float(line.split("=")[1])

                #switch the ccl_detected variable to True!
                ccl_detected = True

                #if the ccl is greater than the threshold, delete the image folder
                if ccl > ccland_thresh:
                    #remove the image directory
                    subprocess.call('rm -r '+bp_out+image, shell=True)
                    print(ccl, ' > ', ccland_thresh, ", ", image, "removed")
                #otherwise: 
                else:
                    #DOWNLOAD THE B8 FILE for this image only
                    subprocess.call('source activate aws; aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "'+image+'/*B8.TIF"', shell=True)
                    print(image, "B8 downloaded -ccl ")
        
        #Was the ccl detected?
        print("CCL detected = ", ccl_detected)
                        
        #if False,use the overall cloud cover:
        if ccl_detected == False:   
            print("CCL not detected, use CC.")
            
            #open the metadata file again
            mdata = open(bp_out+image+"/"+image+"_MTL.txt", "r")
            for line in mdata:
                variable = line.split("=")[0]
                
                #now there should only be one line starting with cloud_cover
                if ("CLOUD_COVER" in variable):       
                    #save the cloud cover (np.float was removed from NumPy; use the built-in float):
                    cc = float(line.split("=")[1])

                    #if the cc is greater than the threshold, delete the image folder:
                    if cc > cc_thresh:
                        #remove the image directory
                        subprocess.call('rm -r '+bp_out+image, shell=True)
                        print(cc, ' > ', cc_thresh, ", ", image, "removed")

                    #otherwise: 
                    else:
                        #DOWNLOAD THE B8 FILE for this image only
                        subprocess.call('source activate aws; aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "'+image+'/*B8.TIF"', shell=True)
                        print(image, "B8 downloaded -cc")
print('Done.')

LC80110022014214LGN00
LC80110022014214LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022016140LGN00
LC80110022016140LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022015153LGN00
LC80110022015153LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022016268LGN00
LC80110022016268LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022014230LGN00
LC80110022014230LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022017094LGN00
LC80110022017094LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022017110LGN00
LC80110022017110LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022017078LGN00
LC80110022017078LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022016204LGN00
LC80110022016204LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022014246LGN00
93.23  >  30.0 ,  LC80110022014246LGN00 removed
CCL detected =  True
LC80110022016092LGN00
LC80110022016092LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80110022013243LGN00
LC80110022013243LGN00 B8 downloaded -

### 2B) Option 2: For all glaciers, loop through the DataFrame and perform all the Option 1 steps

In [6]:
#SET cloud cover thresholds for filtering
ccland_thresh = 30.0
cc_thresh = 50.0



#LOOP through each of the glaciers in the DataFrame and download for each path and row
for i in range(0, len(pathrows_df.index)):
    #SET path and row variables to the LS path and rows of the box
    path = pathrows_df['Path'].iloc[i]
    row = pathrows_df['Row'].iloc[i]
    #print(path, row)
    
    #1) CREATE path and row folders to download into and set input output paths
    #SET path row folder name
    folder_name = 'Path'+path+'_Row'+row
    print(folder_name)
    
    #SET input path
    bp_in = 's3://landsat-pds/L8/'
    totalp_in = bp_in+path+'/'+row+'/'
    #print(totalp_in)

    #SET output path
    bp_out = outputpath+'LS8aws/'+folder_name+'/'
    
    
    
    #IF the folder exists, it's already been downloaded, do not attempt download.
    if os.path.exists(bp_out):
        print(folder_name, " EXISTS ALREADY. SKIP.")
    #2) OTHERWISE, create the folder and download into it
    else:
        os.mkdir(bp_out)
        print(folder_name+" directory made")

        
        #3) DOWNLOAD metadata files into the new path-row folder
        #CHECK COMMAND SYNTAX
        command = 'source activate aws; aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*.txt"'
        #print(command)
        subprocess.call(command, shell=True)
        
        
        
        #4) LOOP through all the files in the path_row folder to download based on cloud cover
        # in the metadata files
        for image in os.listdir(bp_out):
            if image.startswith("LC"):
                #list the name of the image folder
                print(image)

                #open the metadata file within that folder
                mdata = open(bp_out+image+"/"+image+"_MTL.txt", "r")

                #set a detection variable for whether or not the metadata contains land cloud cover
                ccl_detected = False

                #loop through each line in metadata to find the land cloud cover
                for line in mdata:
                    cc_variable = line.split("=")[0]

                    #if there is land cloud cover:
                    if ("CLOUD_COVER_LAND" in cc_variable):
                        #save it (np.float was removed from NumPy; use the built-in float):
                        ccl = float(line.split("=")[1])

                        #switch the ccl detected variable to True
                        ccl_detected = True

                        #if the ccl is greater than the threshold, delete the image folder
                        if ccl > ccland_thresh:
                            #remove the image directory
                            #subprocess.call('rm -r '+bp_out+image, shell=True)
                            print(ccl, ' > ', ccland_thresh, ", ", image, "removed")
                        #otherwise: 
                        else:
                            #download the B8 file for this image only
                            subprocess.call('source activate aws; aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "'+image+'/*B8.TIF"', shell=True)
                            print(image, "B8 downloaded -ccl ")

                print("CCL detected = ", ccl_detected)

                #if False,use the overall cloud cover:
                if ccl_detected == False:   
                    print("CCL not detected, use CC.")

                    #open the metadata file again
                    mdata = open(bp_out+image+"/"+image+"_MTL.txt", "r")
                    for line in mdata:
                        variable = line.split("=")[0]

                        #now there should only be one line starting with cloud_cover
                        if ("CLOUD_COVER" in variable):       
                            #save the cloud cover (np.float was removed from NumPy; use the built-in float):
                            cc = float(line.split("=")[1])

                            #if the cc is greater than the threshold, delete the image folder:
                            if cc > cc_thresh:
                                #remove the image directory
                                #subprocess.call('rm -r '+bp_out+image, shell=True)
                                print(cc, ' > ', cc_thresh, ", ", image, "removed")

                            #otherwise: 
                            else:
                                #DOWNLOAD THE B8 FILE for this image only
                                subprocess.call('source activate aws; aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "'+image+'/*B8.TIF"', shell=True)
                                print(image, "B8 downloaded -cc")

Path034_Row005
Path034_Row005 directory made
LC80340052014215LGN00
84.84  >  30.0 ,  LC80340052014215LGN00 removed
CCL detected =  True
LC80340052013148LGN00
LC80340052013148LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80340052016125LGN00
LC80340052016125LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80340052014183LGN00
33.55  >  30.0 ,  LC80340052014183LGN00 removed
CCL detected =  True
LC80340052014135LGN00
52.56  >  30.0 ,  LC80340052014135LGN00 removed
CCL detected =  True
LC80340052016157LGN00
LC80340052016157LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80340052015106LGN00
CCL detected =  False
CCL not detected, use CC.
52.51  >  50.0 ,  LC80340052015106LGN00 removed
LC80340052016093LGN00
LC80340052016093LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80340052015170LGN00
67.3  >  30.0 ,  LC80340052015170LGN00 removed
CCL detected =  True
LC80340052014247LGN00
CCL detected =  False
CCL not detected, use CC.
LC80340052014247LGN00 B8 downloaded -cc
LC80340052015202LGN00
LC

LC80080142013302LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80080142015196LGN00
LC80080142015196LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80080142017105LGN00
LC80080142017105LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80080142015244LGN00
76.35  >  30.0 ,  LC80080142015244LGN00 removed
CCL detected =  True
LC80080142015116LGN00
LC80080142015116LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80080142015068LGN00
96.07  >  30.0 ,  LC80080142015068LGN00 removed
CCL detected =  True
LC80080142017089LGN00
36.21  >  30.0 ,  LC80080142017089LGN00 removed
CCL detected =  True
LC80080142015260LGN00
86.37  >  30.0 ,  LC80080142015260LGN00 removed
CCL detected =  True
LC80080142016263LGN00
35.76  >  30.0 ,  LC80080142016263LGN00 removed
CCL detected =  True
LC80080142017057LGN00
LC80080142017057LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80080142013270LGN00
56.65  >  30.0 ,  LC80080142013270LGN00 removed
CCL detected =  True
LC80080142015164LGN00
LC80080142015164LGN00 B8 down

LC82320152013111LGN01
LC82320152013111LGN01 B8 downloaded -ccl 
CCL detected =  True
LC82320152016056LGN00
73.7  >  30.0 ,  LC82320152016056LGN00 removed
CCL detected =  True
LC82320152015309LGN00
LC82320152015309LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152015181LGN00
LC82320152015181LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152015325LGN00
LC82320152015325LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152017074LGN00
LC82320152017074LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152016264LGN00
71.21  >  30.0 ,  LC82320152016264LGN00 removed
CCL detected =  True
LC82320152015085LGN00
CCL detected =  False
CCL not detected, use CC.
62.8  >  50.0 ,  LC82320152015085LGN00 removed
LC82320152016136LGN00
LC82320152016136LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152017058LGN00
99.85  >  30.0 ,  LC82320152017058LGN00 removed
CCL detected =  True
LC82320152013255LGN00
LC82320152013255LGN00 B8 downloaded -ccl 
CCL detected =  True
LC82320152016296

KeyboardInterrupt: 

## 3) Grab image dates from the metadata

In [67]:
import datetime

#create dictionary of datetime objects for the images:
datetime_objs = {}

scenecount = 0

#LOOP through each of the glaciers in the DataFrame and read the image dates for each path and row
for i in range(0, len(pathrows_df.index)):
    #SET path and row variables to the LS path and rows of the box
    path = pathrows_df['Path'].iloc[i]
    row = pathrows_df['Row'].iloc[i]
    #print(path, row)
    
    #SET path row folder name
    folder_name = 'Path'+path+'_Row'+row
#     print(folder_name)
    
    #SET output path
    bp_out = outputpath+'LS8aws/'+folder_name+'/'
        
        
    # LOOP through all the metadata files in the path_row folder to grab the image dates
    # in the metadata files
    for scene in os.listdir(bp_out):
        if scene.startswith("LC"):
            #list the name of the image folder
#             print(scene)
            scenetag = scene[9:19]
#             print(scenetag)
            scenecount = scenecount+1

            #open the metadata file within that folder
            for file in os.listdir(bp_out+scene+"/"):
#                 #if no metadata file is there, skip it
#                 datetime_objs.update( {scene: "Nan"})
                
                if ("MTL.txt" in file):
                    
                    mdata = open(bp_out+scene+"/"+scene+"_MTL.txt", "r")

                    #loop through each line in metadata to find the date and time of acquisition
                    for line in mdata:
                        variable = line.split("=")[0]

                        if ("DATE_ACQUIRED" in variable):
                            #save it:
                            date = line.split("=")[1][1:-1]
#                             print(date)

                        #if ("SCENE_CENTER_TIME" in variable): 
                            #save it:
                            #time = line.split("=")[1][2:-2]
                            #print(time)
                
                    #convert the acquisition date string into a datetime object
                    datetime_obj = datetime.datetime.strptime(date, '%Y-%m-%d')
                    datetime_objs.update( {scene[4:-5]: datetime_obj} )
                         

datetime_df = pd.DataFrame.from_dict(datetime_objs, orient='index')
datetime_df

Unnamed: 0,0
340052014215,2014-08-03
340052013148,2013-05-28
340052016125,2016-05-04
340052014183,2014-07-02
340052014135,2014-05-15
340052016157,2016-06-05
340052015106,2015-04-16
340052016093,2016-04-02
340052015170,2015-06-19
340052014247,2014-09-04
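
As a cross-check on DATE_ACQUIRED, the acquisition date can also be read from the scene ID, where characters 10 to 16 hold the year and the day of year. The sketch below assumes scene IDs in the format used throughout this notebook (for example LC80340052014215LGN00); the helper name `date_from_scene_id` is hypothetical.

```python
import datetime

def date_from_scene_id(scene_id):
    """Parse the acquisition date from the YYYYDDD part of a scene ID."""
    # %Y is the four-digit year, %j is the zero-padded day of the year
    return datetime.datetime.strptime(scene_id[9:16], "%Y%j").date()

print(date_from_scene_id("LC80340052014215LGN00"))  # 2014-08-03
```

The result agrees with the first row of the table above, where scene 340052014215 was acquired on 2014-08-03.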


Export image dates to .csv

In [50]:
datetime_df.to_csv(path_or_buf = basepath+'datetags.csv', sep=',')

## For download using Google instead, follow these instructions:

Use gsutil: https://krstn.eu/landsat-batch-download-from-google/

To access a scene for Path 124, Row 053, use this syntax:

    gsutil cp -n gs://earthengine-public/landsat/L8/124/053/LC81240532013107LGN01.tar.bz /landsat/
