## Identification of interesting VIIRS/MODIS files

This notebook processes a general bash script that retrieves VIIRS and MODIS images and returns a new bash script containing only the images that match ICESat-2 data within a given temporal window.

Prerequisites:
- A bash script with the names of all the VIIRS/MODIS files in a given region (see https://search.earthdata.nasa.gov/search?m=-41.467241310846475!75.05859375!4!1!0!0%2C2)

Input:
- Maximum temporal tolerance between ICESat-2 and VIIRS/MODIS retrievals
- Spatial extent: this should be a subset of the spatial extent used to retrieve the script with the names of all the VIIRS/MODIS files

Output:
- Modified bash file with the names of the VIIRS/MODIS files that temporally match retrievals from ICESat-2
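
As a rough illustration of how the temporal tolerance translates into a search window, the sketch below builds the start and end of the window around one image acquisition time. The function name `search_window` and the `granule_minutes` parameter (an assumed duration of one image granule) are hypothetical and only serve the example.

```python
from datetime import datetime, timedelta


def search_window(image_time, tolerance_minutes, granule_minutes=0):
    """Return (start, end) of the search window around one acquisition time.

    granule_minutes is an assumed granule duration that is added to the end
    of the window so that the whole acquisition is covered.
    """
    tolerance = timedelta(minutes=tolerance_minutes)
    start = image_time - tolerance
    end = image_time + tolerance + timedelta(minutes=granule_minutes)
    return start, end


# An acquisition shortly before midnight with a 5-minute tolerance
start, end = search_window(datetime(2019, 6, 1, 23, 58),
                           tolerance_minutes=5, granule_minutes=6)
print(start.strftime("%Y-%m-%d %H:%M:%S"))
print(end.strftime("%Y-%m-%d %H:%M:%S"))
```

In this example the window crosses midnight, so the start and end fall on different dates. This is why a search needs both the date and the time of each end of the window.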



To download the files, first make the script executable by running 'chmod 777 download.sh' from the command line. The script can then be executed by typing './download.sh'.
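
The permission change can also be made from Python. The sketch below adds only the owner's execute bit, which is usually enough to run a script and is less permissive than mode 777. The helper name `make_executable` is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add the owner's execute permission to the file at path."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

Calling `make_executable("download.sh")` has the same effect as `chmod u+x download.sh`, after which the script can be run with './download.sh'.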

In [1]:
import pandas as pd
import icepyx as ipx
from shapely.ops import unary_union


from utils import drainage_basin
from utils_cloud import CLDMSK_date
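
`CLDMSK_date` comes from the local `utils_cloud` module, which is not shown here. As a hedged sketch of what such a helper might do, the function below extracts the acquisition time from a file name, assuming the name carries an `.AYYYYDDD.HHMM.` field (year, day of year, then UTC hour and minute). The function `parse_cldmsk_time`, the regular expression, and the sample file name are illustrative assumptions, not the actual implementation.

```python
import re
from datetime import datetime

# Assumed pattern: ".A" + 4-digit year + 3-digit day of year, then ".HHMM."
_ACQ_PATTERN = re.compile(r"\.A(\d{4})(\d{3})\.(\d{2})(\d{2})\.")


def parse_cldmsk_time(file_name):
    """Return the acquisition time encoded in a cloud-mask file name."""
    match = _ACQ_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no acquisition time found in {file_name!r}")
    year, day_of_year, hour, minute = match.groups()
    # %j parses the day of the year (001-366)
    return datetime.strptime(f"{year}{day_of_year}{hour}{minute}", "%Y%j%H%M")


# Made-up file name that follows the assumed convention
acq_time = parse_cldmsk_time(
    "CLDMSK_L2_VIIRS_SNPP.A2019152.1430.001.2019153012345.nc"
)
print(acq_time)  # 2019-06-01 14:30:00
```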

In [2]:
# Spatial extent for the ICESat-2 search.
# Note: if the VIIRS script retrieves images over a different spatial extent, there will be no matches,
# regardless of the temporal window.
basins_ids = [1.1, 1.2, 1.3, 1.4,
              2.1, 2.2, 
              3.1, 3.2, 3.3, 
              4.1, 4.2, 4.3,
              5.0,
              6.1, 6.2,
              7.1, 7.2, 
              8.1, 8.2]

my_basins = basins_ids
polygons = [drainage_basin(basin_id) for basin_id in my_basins ]
poly_full = unary_union(polygons)
poly_full_simplify = poly_full.simplify(tolerance=0.5)

spatial_extent = list(poly_full_simplify.exterior.coords)

# Temporal window in minutes
minutes = 5
hr = minutes / 60 
# Maximum number of files we are interested in retrieving
max_filtered = 100000000000


# Length of a line in the bash file that contains a CLDMSK image name
length_bash = 146

# path with bash retrieval files
#path_retrieve = "data/VIIRS_bash/"
path_retrieve = "data/Earthdata_scripts/"
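
Relying on a fixed line length (`length_bash`) stops working as soon as a file name or path changes length. A less brittle alternative, sketched below under the assumption that every download line is a bare URL ending in `.nc`, tests the URL prefix and the file extension instead. `is_granule_url` is a hypothetical helper, and the sample lines are made up for illustration.

```python
LADS_PREFIX = "https://ladsweb.modaps.eosdis.nasa.gov"


def is_granule_url(line, extension=".nc"):
    """Return True if the line is a bare download URL for one granule."""
    text = line.strip()
    return text.startswith(LADS_PREFIX) and text.endswith(extension)


# Made-up lines: a shebang, a download URL, and a command that mentions the host
sample_lines = [
    "#!/bin/bash\n",
    "https://ladsweb.modaps.eosdis.nasa.gov/example/path/granule_a.nc\n",
    "  echo https://ladsweb.modaps.eosdis.nasa.gov is the host\n",
]
flags = [is_granule_url(line) for line in sample_lines]
print(flags)  # [False, True, False]
```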

## Select your satellite data

In [3]:
sensor = "VIIRS"
#sensor = "MODIS-Aqua"

if sensor == "VIIRS":
    
    # name of the original bash file to retrieve the data
    exe_old = "CLDMSK-VIIRS-Greenland-2019.sh"
    # name of the output bash file 
    exe_new = "CLDMSK-VIIRS-Greenland-2019-filtered-5min.sh"
    #length_bash = viirs_length_bash
    
if sensor == "MODIS-Aqua":
    
    exe_old = "CLDMSK_L2_MODIS_Aqua-Greenland-2019.sh"
    exe_new = "CLDMSK_L2_MODIS_Aqua-Greenland-2019-filtered.sh"    
    #length_bash = modis_aqua_length_bash
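
The two `if` blocks leave `exe_old` and `exe_new` undefined when `sensor` holds an unexpected value. One possible alternative, shown as a sketch rather than as the notebook's actual code, keeps the file names in a lookup table and fails early with a clear message. `SENSOR_SCRIPTS` and `script_names` are hypothetical names.

```python
# Input and output script names for each supported sensor
SENSOR_SCRIPTS = {
    "VIIRS": ("CLDMSK-VIIRS-Greenland-2019.sh",
              "CLDMSK-VIIRS-Greenland-2019-filtered-5min.sh"),
    "MODIS-Aqua": ("CLDMSK_L2_MODIS_Aqua-Greenland-2019.sh",
                   "CLDMSK_L2_MODIS_Aqua-Greenland-2019-filtered.sh"),
}


def script_names(sensor):
    """Return (input script, output script) for the given sensor."""
    try:
        return SENSOR_SCRIPTS[sensor]
    except KeyError:
        known = sorted(SENSOR_SCRIPTS)
        raise ValueError(f"unknown sensor {sensor!r}; expected one of {known}") from None


exe_old, exe_new = script_names("VIIRS")
print(exe_old, exe_new)
```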

In [4]:
file = open(path_retrieve + exe_old, 'r')
new_file = open(path_retrieve + exe_new, 'x')

#viirs_names = []

counter = 0
counter_filtered = 0

for line in file:
    
    #print(line)
    
    if ("https://ladsweb.modaps.eosdis.nasa.gov" in line) and (len(line) == length_bash):
                
        if counter_filtered < max_filtered:
 
            cloud_file_name = line[:-1].split("/")[-1]
            cloud_time = CLDMSK_date(cloud_file_name)

                
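            # Temporal search window around the cloud-mask acquisition time.
            # The end is extended by 6 minutes, presumably to cover the duration of one granule.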
            start = cloud_time - pd.DateOffset(hours=hr)
            end   = cloud_time + pd.DateOffset(hours=hr) + pd.DateOffset(minutes=6)
            

            start_date_str = start.strftime('%Y-%m-%d')
            end_date_str   = end.strftime('%Y-%m-%d')
            start_time_str = start.strftime('%H:%M:%S')
            end_time_str   = end.strftime('%H:%M:%S')

            try:
                # Query ICESat-2 ATL06 granules inside the spatial extent and temporal window
                region_a = ipx.Query("ATL06", spatial_extent, [start_date_str, end_date_str], start_time_str, end_time_str)
                avail_granules = region_a.avail_granules(ids=True)

                # Keep the download line only if at least one granule matches
                assert len(avail_granules) > 0
                new_file.write(line)
                counter_filtered += 1

            except AssertionError:
                # No matching ICESat-2 granules: skip this file
                pass
                
            counter += 1
        
        if counter % 100 == 0:
            print(">>> There are a total of", counter_filtered, "files found out of", counter, "file names.")
            
        #break
        
    else:
        
        # Copy every non-download line (shebang, commands, comments) unchanged
        new_file.write(line)
        
    
file.close()
new_file.close()

print("Completed")

>>> There are a total of 2 files found out of 100 file names.
>>> There are a total of 10 files found out of 200 file names.
>>> There are a total of 11 files found out of 300 file names.
>>> There are a total of 15 files found out of 400 file names.
>>> There are a total of 21 files found out of 500 file names.
>>> There are a total of 26 files found out of 600 file names.
>>> There are a total of 30 files found out of 700 file names.
>>> There are a total of 39 files found out of 800 file names.
>>> There are a total of 40 files found out of 900 file names.
>>> There are a total of 45 files found out of 1000 file names.
>>> There are a total of 47 files found out of 1100 file names.
>>> There are a total of 53 files found out of 1200 file names.
>>> There are a total of 64 files found out of 1300 file names.
>>> There are a total of 65 files found out of 1400 file names.
>>> There are a total of 73 files found out of 1500 file names.
>>> There are a tota