# links_iteration() function improvement

* __About the function:__ links_iteration() merges ALL available tiles within an area of interest from ALL AVAILABLE DATES within a month. It is used as a monthly backup when processing by specific dates fails.
* __Current challenge:__ The function currently merges ALL available tiles for ALL available dates, so tiles that appear on two or more dates are duplicated in the mosaic.
* __Objective:__ Make the month processing inside links_iteration() include each unique tile in the mosaic only once, choosing the date on which that tile has the lowest cloud percentage.
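
As a rough illustration of the objective, the sketch below keeps one record per tile, the one with the lowest cloud fraction. The `records` list and the `best_record_per_tile` helper are hypothetical and are not part of `aup`; the cloud values are invented for the example.

```python
from datetime import date

# Hypothetical (date, tile, cloud fraction) records for one month.
# Each tile appears on two dates, which is the duplication described above.
records = [
    (date(2018, 1, 10), "14QKH", 0.035),
    (date(2018, 1, 10), "13QHC", 0.031),
    (date(2018, 1, 15), "14QKH", 0.023),
    (date(2018, 1, 15), "13QHC", 0.036),
]


def best_record_per_tile(records):
    """Keep, for each tile, the record with the lowest cloud fraction."""
    best = {}
    for day, tile, cloud in records:
        if tile not in best or cloud < best[tile][2]:
            best[tile] = (day, tile, cloud)
    return list(best.values())


selected = best_record_per_tile(records)
print(f"{len(records)} tile images before, {len(selected)} after")
for day, tile, cloud in selected:
    print(tile, day, cloud)
```

With these made-up values, each tile ends up with a different best date, which is why the improved process may need links from more than one date in the same month.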

## __Import libraries__

In [1]:
from pathlib import Path
current_path = Path().resolve()
for parent in current_path.parents:
    if parent.name == "accesibilidad-urbana":
        module_path = str(parent)+'/'
        break
print(module_path)

/home/observatorio/Documents/repos/accesibilidad-urbana/


In [2]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

if module_path not in sys.path:
    sys.path.append(module_path)
import aup

## __From Script 19: Config notebook__

In [3]:
city = 'Piedad'

In [4]:
band_name_dict = {'nir':[False], #If GSD(resolution) of band is different, set True.
                  'red':[False], #If GSD(resolution) of band is different, set True.
                  'eq':['(nir-red)/(nir+red)']}
sat_query = {"eo:cloud_cover": {"lt": 10}}
index_analysis = 'ndvi'
tmp_dir = module_path + f'data/processed/tmp_{index_analysis}/'
res = [8,11]
freq = 'MS'
start_date = '2018-01-01'
end_date = '2018-12-31'
satellite = "sentinel-2-l2a"

print(tmp_dir)

/home/observatorio/Documents/repos/accesibilidad-urbana/data/processed/tmp_ndvi/


## __From Script 19: Main function__

### __Main function__ - Create hex_city

In [5]:
###############################
### Create city area of interest with biggest hexs
big_res = min(res)
schema_hex = 'hexgrid'
table_hex = f'hexgrid_{big_res}_city_2020'

# Download hexagons with type=urban
hex_type = 'urban'  # renamed from 'type' to avoid shadowing the Python builtin
query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}' AND \"type\" = '{hex_type}'"
hex_urban = aup.gdf_from_query(query, geometry_col='geometry')

# Download hexagons with type=rural within 500m buffer
poly = hex_urban.to_crs("EPSG:6372").buffer(500).reset_index()
poly = poly.to_crs("EPSG:4326")
poly_wkt = poly.dissolve().geometry.to_wkt()[0]
hex_type = 'rural'
query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}' AND \"type\" = '{hex_type}' AND (ST_Intersects(geometry, 'SRID=4326;{poly_wkt}'))"
hex_rural = aup.gdf_from_query(query, geometry_col='geometry')

# Concatenate urban and rural hex
hex_city = pd.concat([hex_urban, hex_rural])

# Show
print(f'Downloaded {len(hex_city)} hexagon features')
print(hex_city.shape)
print(hex_city.crs)
hex_city.head(2)

Downloaded 173 hexagon features
(173, 2)
EPSG:4326


| | hex_id_8 | geometry |
|---|---|---|
| 0 | 884981192dfffff | POLYGON ((-101.69493 20.39082, -101.69039 20.3... |
| 1 | 88498112b5fffff | POLYGON ((-102.0076 20.33191, -102.00306 20.33... |


In [7]:
#df_len = aup.download_raster_from_pc(hex_city, index_analysis, city, freq,
#                                     start_date, end_date, tmp_dir, band_name_dict, 
#                                     query=sat_query, satellite=satellite,
#                                     compute_unavailable_dates=True)

### __b - download_raster_from_pc() Step by step debug__

In [6]:
# Rename variables for argument compatibility inside download_raster_from_pc function
gdf = hex_city.copy()
query = sat_query.copy()
projection_crs = "EPSG:6372"
compute_unavailable_dates = True

In [8]:
# Create area of interest coordinates from hexagons to download raster data
print('Extracting bounding coordinates from hexagons')
# Create buffer around hexagons
poly = gdf.to_crs(projection_crs).buffer(500)
poly = poly.to_crs("EPSG:4326")
poly = gpd.GeoDataFrame(geometry=poly).dissolve().geometry
# Extract coordinates from polygon as DataFrame
coord_val = poly.bounds
# Get coordinates for bounding box
n = coord_val.maxy.max()
s = coord_val.miny.min()
e = coord_val.maxx.max()
w = coord_val.minx.min()

# Set the coordinates for the area of interest
area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [e, s],
            [w, s],
            [w, n],
            [e, n],
            [e, s],
        ]
    ],
}
area_of_interest

Extracting bounding coordinates from hexagons


{'type': 'Polygon',
 'coordinates': [[[np.float64(-101.66646150156433),
    np.float64(20.27422711146732)],
   [np.float64(-102.08631451513033), np.float64(20.27422711146732)],
   [np.float64(-102.08631451513033), np.float64(20.466257437641488)],
   [np.float64(-101.66646150156433), np.float64(20.466257437641488)],
   [np.float64(-101.66646150156433), np.float64(20.27422711146732)]]]}
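
The printed dictionary carries `np.float64` wrappers because `poly.bounds` returns NumPy scalars. A minimal sketch of building the same ring with plain Python floats follows; `bbox_to_geojson` is a hypothetical helper, the coordinates are rounded from the output above, and the cast is cosmetic rather than something the notebook is known to require.

```python
def bbox_to_geojson(west, south, east, north):
    """Return a closed GeoJSON Polygon ring for a bounding box, using plain floats."""
    w, s, e, n = (float(v) for v in (west, south, east, north))
    return {
        "type": "Polygon",
        # Same vertex order as the notebook cell; first and last points match.
        "coordinates": [[[e, s], [w, s], [w, n], [e, n], [e, s]]],
    }


# Rounded values from the area_of_interest output above.
aoi_example = bbox_to_geojson(-102.0863, 20.2742, -101.6665, 20.4663)
print(aoi_example)
```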

In [9]:
# Create time of interest (Creates a list for all to-be-analysed-months with structure [start_day/end_day,(...)])
print('Defining time of interest')
time_of_interest = aup.create_time_of_interest(start_date, end_date, freq=freq)
time_of_interest

Defining time of interest


['2018-01-01/2018-01-31',
 '2018-02-01/2018-02-28',
 '2018-03-01/2018-03-31',
 '2018-04-01/2018-04-30',
 '2018-05-01/2018-05-31',
 '2018-06-01/2018-06-30',
 '2018-07-01/2018-07-31',
 '2018-08-01/2018-08-31',
 '2018-09-01/2018-09-30',
 '2018-10-01/2018-10-31',
 '2018-11-01/2018-11-30',
 '2018-12-01/2018-12-31']
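
For readers without access to `aup`, a list with the same structure can be produced with the standard library alone. This is a sketch under the assumption that one `start/end` string per calendar month is wanted; `monthly_intervals` is a hypothetical helper, not the implementation of `aup.create_time_of_interest`.

```python
import calendar
from datetime import date


def monthly_intervals(start, end):
    """Build 'YYYY-MM-01/YYYY-MM-DD' strings for each month from start to end (inclusive)."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    intervals = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        intervals.append(f"{year}-{month:02d}-01/{year}-{month:02d}-{last_day:02d}")
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return intervals


intervals = monthly_intervals("2018-01-01", "2018-12-31")
print(intervals[:2])
```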

In [10]:
# Gather items for time and area of interest (Creates a list of available image items)
print('Gathering items for time and area of interest')
items = aup.gather_items(time_of_interest, area_of_interest, query=query, satellite=satellite)
print(f'Fetched {len(items)} items')
items

Gathering items for time and area of interest
Fetched 93 items


[<Item id=S2B_MSIL2A_20180120T171619_R112_T14QKH_20201014T072444>,
 <Item id=S2B_MSIL2A_20180120T171619_R112_T13QHC_20201014T072446>,
 <Item id=S2B_MSIL2A_20180120T171619_R112_T13QGC_20201014T072440>,
 <Item id=S2A_MSIL2A_20180115T171641_R112_T14QKH_20201014T055635>,
 <Item id=S2A_MSIL2A_20180115T171641_R112_T13QHC_20201014T055632>,
 <Item id=S2A_MSIL2A_20180115T171641_R112_T13QGC_20201014T055628>,
 <Item id=S2B_MSIL2A_20180110T171659_R112_T14QKH_20201014T044829>,
 <Item id=S2B_MSIL2A_20180110T171659_R112_T13QHC_20201014T044823>,
 <Item id=S2B_MSIL2A_20180110T171659_R112_T13QGC_20201014T044819>,
 <Item id=S2A_MSIL2A_20180224T171301_R112_T14QKH_20201014T005446>,
 <Item id=S2A_MSIL2A_20180224T171301_R112_T13QHC_20201014T005443>,
 <Item id=S2A_MSIL2A_20180224T171301_R112_T13QGC_20201014T005440>,
 <Item id=S2A_MSIL2A_20180214T171411_R112_T13QGC_20201013T211153>,
 <Item id=S2A_MSIL2A_20180204T171511_R112_T13QHC_20201013T184611>,
 <Item id=S2A_MSIL2A_20180204T171511_R112_T13QGC_20201013T1846

In [12]:
# Count available tiles for area of interest (Creates a list of available tiles, inside create_raster_by_month() logs available tiles per date vs total of area of interest)
aoi_tiles = []
for i in items:
    # Retrieve current tile
    if satellite == "sentinel-2-l2a":
        tile = i.properties['s2:mgrs_tile']
    elif satellite == "landsat-c2-l2":
        tile = i.properties['landsat:wrs_path']+i.properties['landsat:wrs_row']
    # Append if first find
    if tile not in aoi_tiles:
        aoi_tiles.append(tile)
print(f'Area of interest composed of {len(aoi_tiles)} tile: {aoi_tiles}.')

Area of interest composed of 3 tile: ['14QKH', '13QHC', '13QGC'].
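
The membership check plus `append` above is an order-preserving de-duplication. A shorter equivalent, sketched here on a plain list of tile IDs standing in for the values read from the items, relies on `dict.fromkeys`.

```python
# Stand-in for the tile IDs read from item.properties['s2:mgrs_tile'];
# the repetition mimics the same tile appearing on several dates.
tiles_seen = ["14QKH", "13QHC", "13QGC", "14QKH", "13QHC", "13QGC", "13QGC"]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_tiles = list(dict.fromkeys(tiles_seen))
print(unique_tiles)
```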


In [11]:
print('Checking available tiles for area of interest')
# df_clouds, date_list = arrange_items(items, satellite=satellite)
df_tile, date_list = aup.available_datasets(items, satellite, query)
# log(f"{len(date_list)} dates available with avg {round(df_clouds['avg_cloud'].mean(),2)}% clouds.")
date_list

Checking available tiles for area of interest


[datetime.date(2018, 1, 10),
 datetime.date(2018, 1, 15),
 datetime.date(2018, 3, 6),
 datetime.date(2018, 12, 21),
 datetime.date(2018, 3, 31),
 datetime.date(2018, 12, 1),
 datetime.date(2018, 3, 16),
 datetime.date(2018, 5, 30),
 datetime.date(2018, 3, 26),
 datetime.date(2018, 1, 20),
 datetime.date(2018, 11, 21),
 datetime.date(2018, 3, 21),
 datetime.date(2018, 2, 4),
 datetime.date(2018, 11, 16),
 datetime.date(2018, 4, 15),
 datetime.date(2018, 5, 20),
 datetime.date(2018, 11, 26),
 datetime.date(2018, 3, 1),
 datetime.date(2018, 4, 5),
 datetime.date(2018, 2, 14),
 datetime.date(2018, 5, 25),
 datetime.date(2018, 11, 6),
 datetime.date(2018, 4, 25),
 datetime.date(2018, 6, 4),
 datetime.date(2018, 10, 2),
 datetime.date(2018, 5, 15),
 datetime.date(2018, 2, 24),
 datetime.date(2018, 5, 10),
 datetime.date(2018, 7, 29),
 datetime.date(2018, 4, 10),
 datetime.date(2018, 12, 11),
 datetime.date(2018, 4, 20),
 datetime.date(2018, 12, 6),
 datetime.date(2018, 7, 4),
 datetime.date(

In [13]:
df_tile

| | 14QKH_cloud | 13QHC_cloud | 13QGC_cloud | avg_cloud |
|---|---|---|---|---|
| 2018-01-10 | 0.035265 | 0.031168 | 0.054003 | 0.040145 |
| 2018-01-15 | 0.022598 | 0.035504 | 0.102522 | 0.053541 |
| 2018-03-06 | 0.09843 | | 0.023621 | 0.061026 |
| 2018-12-21 | 0.085995 | 0.061961 | 0.05999 | 0.069315 |
| 2018-03-31 | 0.018796 | 0.036098 | 0.206417 | 0.087104 |
| 2018-12-01 | 0.040796 | 0.058593 | 0.30746 | 0.135616 |
| 2018-03-16 | 0.243841 | 0.20922 | 0.057123 | 0.170061 |
| 2018-05-30 | 0.133387 | 0.101708 | 0.304017 | 0.179704 |
| 2018-03-26 | 0.174887 | 0.267418 | 0.157274 | 0.19986 |
| 2018-01-20 | 0.224203 | 0.300531 | 0.131367 | 0.2187 |
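
A table shaped like df_tile already holds what is needed to pick the best date per tile. The sketch below uses the three January rows shown above and `DataFrame.idxmin` to group tiles by their lowest-cloud date. It assumes the index holds `datetime.date` objects, and the names `df_example` and `best_dates_tiles_example` are illustrative only.

```python
from datetime import date

import pandas as pd

# January rows copied from the df_tile output above (avg_cloud left out).
df_example = pd.DataFrame(
    {
        "14QKH_cloud": [0.035265, 0.022598, 0.224203],
        "13QHC_cloud": [0.031168, 0.035504, 0.300531],
        "13QGC_cloud": [0.054003, 0.102522, 0.131367],
    },
    index=[date(2018, 1, 10), date(2018, 1, 15), date(2018, 1, 20)],
)

# idxmin returns, per column, the index label (date) of the minimum value.
best_date_per_tile = df_example.idxmin()

# Group tile columns by their best date: {date: [tile columns]}.
best_dates_tiles_example = {}
for tile, best_date in best_date_per_tile.items():
    best_dates_tiles_example.setdefault(best_date, []).append(tile)

print(best_dates_tiles_example)
```

For January this selects 2018-01-15 for tile 14QKH and 2018-01-10 for tiles 13QHC and 13QGC, so two dates contribute to the month's mosaic instead of three.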


In [15]:
# Create dictionary from links (assets_hrefs is a dict. of dates and links with structure {available_date:{band_n:[link]}})
band_name_list = list(band_name_dict.keys())[:-1]
assets_hrefs = aup.link_dict(band_name_list, items, date_list)
print('Created dictionary from items')
assets_hrefs

Created dictionary from items


{datetime.date(2018, 1, 20): {'nir': ['https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/14/Q/KH/2018/01/20/S2B_MSIL2A_20180120T171619_N0212_R112_T14QKH_20201014T072444.SAFE/GRANULE/L2A_T14QKH_A004571_20180120T172131/IMG_DATA/R10m/T14QKH_20180120T171619_B08_10m.tif?st=2025-10-13T18%3A32%3A28Z&se=2025-10-14T19%3A17%3A28Z&sp=rl&sv=2025-07-05&sr=c&skoid=9c8ff44a-6a2c-4dfb-b298-1c9212f64d9a&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2025-10-14T18%3A26%3A07Z&ske=2025-10-21T18%3A26%3A07Z&sks=b&skv=2025-07-05&sig=I0Y3lBQxtEaLbC8RQKV6sMaP/GBMWyIgefkLPVRqTaA%3D',
   'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/13/Q/HC/2018/01/20/S2B_MSIL2A_20180120T171619_N0212_R112_T13QHC_20201014T072446.SAFE/GRANULE/L2A_T13QHC_A004571_20180120T172131/IMG_DATA/R10m/T13QHC_20180120T171619_B08_10m.tif?st=2025-10-13T18%3A32%3A28Z&se=2025-10-14T19%3A17%3A28Z&sp=rl&sv=2025-07-05&sr=c&skoid=9c8ff44a-6a2c-4dfb-b298-1c9212f64d9a&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2025-10-14T18%3A2
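
The nested structure `{date: {band: [links]}}` is later flattened into one list of links per band before mosaicking. A minimal sketch of that step, using placeholder `example.com` URLs instead of the signed blob links and a hypothetical `links_by_band` helper:

```python
from collections import defaultdict
from datetime import date

# Hypothetical miniature of assets_hrefs; real entries hold signed blob URLs.
assets_hrefs_example = {
    date(2018, 1, 10): {
        "nir": ["https://example.com/a_B08.tif"],
        "red": ["https://example.com/a_B04.tif"],
    },
    date(2018, 1, 15): {
        "nir": ["https://example.com/b_B08.tif"],
        "red": ["https://example.com/b_B04.tif"],
    },
}


def links_by_band(hrefs_by_date):
    """Merge {date: {band: [links]}} into {band: [links from every date]}."""
    merged = defaultdict(list)
    for bands in hrefs_by_date.values():
        for band, links in bands.items():
            merged[band].extend(links)
    return dict(merged)


merged_links = links_by_band(assets_hrefs_example)
print({band: len(links) for band, links in merged_links.items()})
```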

In [16]:
# Analyze available data according to raster properties (Creates df_len for the first time)
df_len, missing_months = aup.df_date_links(assets_hrefs, start_date, end_date,
                                       band_name_list, freq)

In [17]:
df_len

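| | year | month | data_id | able_to_download |
|---|---|---|---|---|
| 0 | 2018 | 1 | 1 | |
| 1 | 2018 | 2 | 1 | |
| 2 | 2018 | 3 | 1 | |
| 3 | 2018 | 4 | 1 | |
| 4 | 2018 | 5 | 1 | |
| 5 | 2018 | 6 | 1 | |
| 6 | 2018 | 7 | 1 | |
| 7 | 2018 | 8 | 1 | |
| 8 | 2018 | 9 | 0 | |
| 9 | 2018 | 10 | 1 | |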


In [18]:
# Test for missing months, raises errors
if compute_unavailable_dates:
    aup.available_data_check(df_len, missing_months)

In [19]:
# Raster cropping with bounding box from earlier
bounding_box = gpd.GeoDataFrame(geometry=poly).envelope
gdf_bb = gpd.GeoDataFrame(gpd.GeoSeries(bounding_box), columns=['geometry'])
print('Created bounding box for raster cropping')

# Create GeoDataFrame to test nan values in raster
gdf_raster_test = gdf.to_crs(projection_crs).buffer(1)
gdf_raster_test = gdf_raster_test.to_crs("EPSG:4326")
gdf_raster_test = gpd.GeoDataFrame(geometry=gdf_raster_test)#.dissolve() #Ignore to tests nans in each hex since ignoring available_datasets() filter

Created bounding box for raster cropping


In [20]:
# Raster creation - Download raster data by month
#print('Starting raster creation for specified time')
#df_len = create_raster_by_month(df_len, index_analysis, city, tmp_dir,
#                                band_name_dict,date_list, gdf_raster_test,
#                                gdf_bb, area_of_interest, satellite, aoi_tiles,
#                                query=query,compute_unavailable_dates=compute_unavailable_dates)

#### __b-01 - create_raster_by_month() Step by step debug__

In [21]:
from tqdm import tqdm
from datetime import datetime
from dateutil.relativedelta import relativedelta

In [22]:
# Rename variables for argument compatibility inside create_raster_by_month() function
aoi = area_of_interest.copy()
sat = satellite
time_exc_limit = 1500

In [23]:
# if df_len doesn't already exist, save dataframe to temporary directory
df_file_dir = tmp_dir+index_analysis+f'_{city}_dataframe.csv'
if not os.path.exists(df_file_dir): # Works for files or folders
    df_len['able_to_download'] = np.nan
    df_len['download_method'] = ''
    df_len.to_csv(df_file_dir, index=False)

# if temporary folder doesn't already exist, create folder to store temporary raster files by iteration
tmp_raster_dir = tmp_dir+'temporary_files/'
if not os.path.exists(tmp_raster_dir): # Works for files or folders
    os.mkdir(tmp_raster_dir)

In [24]:
# Time measurement for both processes
import time
processes = ['original','improved'] # 'original' or 'improved'
df_time_processes = pd.DataFrame()
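
One possible way to keep the timing logic separate from the processing code is a small context manager. The sketch below is an assumption about how this could be organised, not how the notebook does it; `timed` and `timings` are hypothetical names and the workload is a placeholder.

```python
import time
from contextlib import contextmanager

timings = {}  # {(process label, month): elapsed seconds}


@contextmanager
def timed(label, key):
    """Record the wall-clock duration of the enclosed block in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[(label, key)] = round(time.perf_counter() - start, 2)


for process_name in ["original", "improved"]:
    with timed(process_name, 1):
        sum(range(100_000))  # placeholder workload standing in for one month

print(timings)
```

`time.perf_counter()` is a monotonic clock intended for measuring intervals, so it is not affected by system clock adjustments the way `time.time()` can be.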

In [25]:
for process in processes:
    
    # Iteration over df_len rows (months)
    for i in tqdm(range(len(df_len)), position=0, leave=True):
    
        # read dataframe in each iteration in case of code crash
        df_raster = pd.read_csv(df_file_dir, index_col=False)
    
        # binary id - checks if current month could be processed
        checker = 0
    
        # gather month and year from df to save raster
        month_ = df_raster.loc[df_raster.index==i].month.values[0]
        year_ = df_raster.loc[df_raster.index==i].year.values[0]
    
        # check if current month's raster already exists
        if f'{city}_{index_analysis}_{month_}_{year_}.tif' in os.listdir(tmp_dir):
            print(f'\n create_raster_by_month() - {city} - Raster for {month_}/{year_} already downloaded. Skipping to next month.')
            df_raster.loc[i,'data_id'] = 11
            df_raster.to_csv(df_file_dir, index=False)
            continue
    
        # check if current month has available links or could be processed (in case of a crash)
        if df_raster.iloc[i].data_id==0:
            print(f'\n create_raster_by_month() - {city} - Raster for {month_}/{year_} not available. Skipping to next month.')
            # In case of a crash, could be reading month whose links were available but could not be processed (data_id turns to 0)
            # In that case, 'download_method' is updated to 'could_not_process'.
            # If not, it is the first time the month is being processed. Update to 'no_links_available'.
            if df_raster.iloc[i].download_method != 'could_not_process':
                df_raster.loc[i,'download_method'] = 'no_links_available'
                df_raster.to_csv(df_file_dir, index=False)
            continue
    
        print(f'\n create_raster_by_month() - {city} - Starting new analysis for {month_}/{year_}')
    
        # creates time range for a specific month
        sample_date = datetime(year_, month_, 1)
        first_day = sample_date + relativedelta(day=1)
        last_day = sample_date + relativedelta(day=31)
        time_of_interest = [f"{year_}-{month_:02d}-{first_day.day:02d}/{year_}"+
                            f"-{month_:02d}-{last_day.day:02d}"]
    
        # create dataframe
        #df_links = pd.DataFrame.from_dict(assets_hrefs,
        #                                orient='Index').reset_index().rename(columns={'index':'date'})
    
        # dates in current month according to cloud coverage
        date_order = [True if (d.month == month_) and (d.year == year_) else False for d in date_list]
        date_array = np.array(date_list)
        date_filter = np.array(date_order)
        dates_ordered = date_array[date_filter]
        #print(f"All dates: {date_list}.")
        #print(f"Dates ordered: {dates_ordered}.")
        
        # mosaic raster iterations (while loop tries max_iter_count times to process all available rasters (dates) in a month)
        max_iter_count = 1
        iter_count = 1
        # create skip date list used to analyze null values in raster
        skip_date_list = []
        
        while iter_count <= max_iter_count:
        
            # --- Gather updated links - Since links expire after some time, they are gathered at each iteration
            # gather links for the date range from planetary computer
            items = aup.gather_items(time_of_interest, aoi, query=query, satellite=sat)
            # gather links from dates that are within date_list
            assets_hrefs = aup.link_dict(band_name_list, items, date_list)
        
            # --- For current month's gathered links, check the total amount of unique tiles and compare to aoi_tiles (logs)
            month_tiles = []
            for item in items:
                # if item's date is in assets_hrefs keys, check for unique tiles
                if item.datetime.date() in list(assets_hrefs.keys()):
                    # For sentinel-2-l2a, gather unique mgrs_tile values
                    if sat == "sentinel-2-l2a":
                        item_tile = item.properties['s2:mgrs_tile']
                        if item_tile not in month_tiles:
                            month_tiles.append(item_tile)
                    # For landsat-c2-l2, gather unique wrs_path + wrs_row values
                    elif sat == "landsat-c2-l2":
                        item_tile = item.properties['landsat:wrs_path'] + item.properties['landsat:wrs_row']
                        if item_tile not in month_tiles:
                            month_tiles.append(item_tile)
        
            if len(aoi_tiles) > len(month_tiles):
                print(f'NOTE: Insufficient tiles to cover area of interest. Needed: {len(aoi_tiles)}, available: {len(month_tiles)}.')
                print(f'NOTE: Available tiles: {month_tiles}. Missing tiles: {list(set(aoi_tiles) - set(month_tiles))}.')
            else:
                print(f'NOTE: Month has all available tiles within area of interest.')
            
            # --- Analyze links in two ways: ordered by cloud coverage and all available links for the month
            # Explanation: 
            # Since satellites pass over different areas on different dates, sometimes analysis by date results in missing data.
            # To solve this, we gather all available links for the month and use them if the date ordered by cloud coverage does not pass the null test.
            
            # In order to avoid duplicating code, the links_iteration() function receives most of the current function's arguments,
            # while only specific links and dates data are changed.
            common_args_dct = {'skip_date_list':skip_date_list, # List of dates to be skipped because null test failed
                               'iter_count':iter_count, # Current iteration of current month (Used in logs)
                               'time_exc_limit':time_exc_limit, # Specified time limit for downloading a raster
                               'band_name_dict':band_name_dict, # Bands to be used in the raster analysis
                               'gdf_bb':gdf_bb, # Crop the raster to a specific area of interest
                               'tmp_raster_dir':tmp_raster_dir, # Folder to store temporary raster files by iteration
                               'index_analysis':index_analysis, # Current type of analysis
                               'gdf_raster_test':gdf_raster_test, # GeoDataFrame to test nan values in raster
                               'tmp_dir':tmp_dir, # Temporary directory where temporary rasters are saved
                               'city':city, # To save the raster files based on the area of interest's name
                               'month_':month_, # Current month of dates being processed
                               'year_':year_, # Current year of dates being processed
                               'checker':checker, # Checker with value 0 if month has not been processed, 1 when processed
                               }
        
            # --- LINKS ANALYSIS A - ORDERED ACCORDING TO CLOUD COVERAGE [PREFERRED]
            # --- Gather updated links - Since links expire after some time, they are gathered at each iteration
            # Create list of links ordered according to cloud coverage
            
            links_dicts_ordered_lst = []
            for data_position in range(len(dates_ordered)):
                current_link_dct = assets_hrefs[dates_ordered[data_position]]
                links_dicts_ordered_lst.append(current_link_dct)
            
            a="""
            # Processing by ordered dates
            ordered_links_try = 0 #Call the current position in dates_ordered
            for bands_links in links_dicts_ordered_lst:
                print(f"{dates_ordered[ordered_links_try]} - ITERATION {iter_count} - DATE {ordered_links_try+1}/{len(links_dicts_ordered_lst)}.")
                skip_date_list, checker = aup.links_iteration(bands_links = bands_links,
                                                          specific_date = (True, dates_ordered[ordered_links_try]),
                                                          common_args_dct = common_args_dct
                                                         )
                # If succeeded on current date, stop ordered dates iterations
                if checker==1:
                    break
                # Else, try next date
                ordered_links_try += 1
            # If succeeded on any date, stop month's while loop (Doesn't try whole month's available links)
            if checker==1:
                download_method = 'specific_date'
                break
            """

            # Measure processes time consumption
            start = time.time()
            
            if process == 'original':
                # --- LINKS ANALYSIS B - WHOLE MONTH'S AVAILABLE LINKS [BACKUP]
                # Create list of ALL available links for the month
                links_dicts_month = {}
                for current_link_dct in links_dicts_ordered_lst:
                    for band, links in current_link_dct.items():
                        if band not in links_dicts_month:
                            links_dicts_month[band] = []  # Initialize list if band not in dictionary
                        links_dicts_month[band].extend(links) # Append links to the list for the band
                #print(links_dicts_month)
                
                # Processing all available links for the month
                print(f"{month_}/{year_} - MONTH ITERATION {iter_count}.")
                skip_date_list, checker = aup.links_iteration(bands_links = links_dicts_month,
                                                          specific_date = (False, None),
                                                          common_args_dct = common_args_dct
                                                          )
            elif process == 'improved':
                # --- LINKS ANALYSIS B - WHOLE MONTH'S AVAILABLE LINKS [BACKUP]
                # --- Gather updated links - Since links expire after some time, they are gathered at each iteration
                # gather links for the date range from planetary computer
                items = aup.gather_items(time_of_interest, aoi, query=query, satellite=sat)
        
                # --- From month's available links, select only the dates that have min cloud pct for each tile
                # Re-create df_tile (tiles with cloud pct dataframe) for currently explored dates
                df_tile_current, _ = aup.available_datasets(items, satellite, query)
                # Drop 'avg_cloud' column
                df_tile_current.drop(columns=['avg_cloud'],inplace=True)
                # Drop all tile columns with no data (where mean is nan) and list the rest
                df_tile_current = df_tile_current.drop(columns=df_tile_current.columns[df_tile_current.mean(skipna=True).isna()])
                tiles_lst = df_tile_current.columns.to_list()
                # Reset index to place date as a column
                df_tile_current.reset_index(inplace=True)
                df_tile_current.rename(columns={'index':'date'},inplace=True)
                # For each tile, find the date with the lowest cloud percentage and group tiles by that date
                best_dates_tiles = {}
                for tile in tiles_lst:
                    # Lowest cloud percentage recorded for this tile (NaN values are skipped)
                    mincloud_value = df_tile_current[tile].min()
                    # First date on which that minimum occurs
                    mincloud_date = df_tile_current.loc[df_tile_current[tile]==mincloud_value]['date'].unique()[0]
                    # Single quotes inside the f-string keep it valid on Python versions before 3.12
                    print(f"Tile {tile.replace('_cloud', '')} has lowest cloud coverage on date {mincloud_date}.")
                    # Save date and tile in dictionary with structure {date:[tiles]}
                    if mincloud_date in best_dates_tiles:
                        # Append to existing list (without reassigning tiles_lst, which is being iterated)
                        best_dates_tiles[mincloud_date].append(tile)
                    else:
                        # Initialize list
                        best_dates_tiles[mincloud_date] = [tile]

                # --- FILTER ITEMS
                # Filter specific items based on dates-tiles dict
                dates_month_min_cloud = list(best_dates_tiles.keys())
                filtered_items = []
                # Loop variable is named 'item' so it does not overwrite the month index 'i' of the outer loop
                for item in items:
                    # If item's date in filtered dates
                    if item.datetime.date() in dates_month_min_cloud:
                        # Check current item's tile
                        if satellite == "sentinel-2-l2a":
                            tile = item.properties['s2:mgrs_tile']
                        elif satellite == "landsat-c2-l2":
                            tile = item.properties['landsat:wrs_path']+item.properties['landsat:wrs_row']
                        tile = tile + '_cloud'
                        # If tile inside dict, append item to filtered_items
                        if tile in best_dates_tiles[item.datetime.date()]:
                            filtered_items.append(item)
                            print(f"Appended item for tile {tile.replace('_cloud', '')} on date {item.datetime.date()} to month fallback analysis.")

                # --- GATHER FILTERED ITEM'S LINKS IN DICTIONARY
                # gather links from dates that are within dates_month_min_cloud
                assets_hrefs = aup.link_dict(band_name_list, filtered_items, dates_month_min_cloud)

                # --- REORGANIZE LINKS BY BAND
                # Create list of BEST available links for the month
                links_dicts_month = {}
                for data_position in range(len(dates_month_min_cloud)):
                    current_link_dct = assets_hrefs[dates_month_min_cloud[data_position]]
                    for band, links in current_link_dct.items():
                        if band not in links_dicts_month:
                            links_dicts_month[band] = []  # Initialize list if band not in dictionary
                        links_dicts_month[band].extend(links) # Append links to the list for the band

                # --- PROCESS
                # Processing all available links for the month
                print(f"{month_}/{year_} - MONTH ITERATION {iter_count}.")
                skip_date_list, checker = aup.links_iteration(bands_links = links_dicts_month,
                                                          specific_date = (False, None),
                                                          common_args_dct = common_args_dct
                                                          )
            # Measure processes time consumption
            end = time.time()
            print("--"*50)
            print("PRINTING PROCESSING TIME COMPARISON")
            print(f'Iteration {iter_count} Process {process} elapsed: {end - start:.2f} seconds.')
            print("--"*50)
            df_time_processes.loc[month_, process] = round(end-start,2)
            
            # If whole month succeeded, stop while loop
            if checker==1:
                download_method = 'full_month'
                break
            # Else, try next iteration (If not reached max_iter_count)
            iter_count += 1

  0%|                                                                                                                                                                          | 0/12 [00:00<?, ?it/s]


 create_raster_by_month() - Piedad - Starting new analysis for 1/2018
NOTE: Month has all available tiles within area of interest.
1/2018 - MONTH ITERATION 1.


  8%|█████████████▌                                                                                                                                                    | 1/12 [01:38<18:03, 98.53s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 97.59 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 2/2018
NOTE: Month has all available tiles within area of interest.
2/2018 - MONTH ITERATION 1.


 17%|██████████████████████████▊                                                                                                                                      | 2/12 [05:34<29:56, 179.61s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 235.37 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 3/2018
NOTE: Month has all available tiles within area of interest.
3/2018 - MONTH ITERATION 1.


 25%|████████████████████████████████████████▎                                                                                                                        | 3/12 [07:08<21:04, 140.52s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 92.96 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 4/2018
NOTE: Month has all available tiles within area of interest.
4/2018 - MONTH ITERATION 1.


 33%|█████████████████████████████████████████████████████▋                                                                                                           | 4/12 [08:52<16:46, 125.86s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 102.34 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 5/2018
NOTE: Month has all available tiles within area of interest.
5/2018 - MONTH ITERATION 1.


 42%|███████████████████████████████████████████████████████████████████                                                                                              | 5/12 [10:37<13:48, 118.32s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 103.94 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 6/2018
NOTE: Month has all available tiles within area of interest.
6/2018 - MONTH ITERATION 1.


 50%|████████████████████████████████████████████████████████████████████████████████▌                                                                                | 6/12 [12:13<11:04, 110.69s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 94.89 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 7/2018
NOTE: Month has all available tiles within area of interest.
7/2018 - MONTH ITERATION 1.


 58%|█████████████████████████████████████████████████████████████████████████████████████████████▉                                                                   | 7/12 [13:50<08:52, 106.49s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 96.85 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 8/2018
NOTE: Insufficient tiles to cover area of interest. Needed: 3, available: 2.
NOTE: Available tiles: ['14QKH', '13QHC']. Missing tiles: ['13QGC'].
8/2018 - MONTH ITERATION 1.


 67%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                     | 8/12 [15:25<06:51, 102.82s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 93.83 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Raster for 9/2018 not available. Skipping to next month.

 create_raster_by_month() - Piedad - Starting new analysis for 10/2018
NOTE: Insufficient tiles to cover area of interest. Needed: 3, available: 1.
NOTE: Available tiles: ['13QGC']. Missing tiles: ['13QHC', '14QKH'].
10/2018 - MONTH ITERATION 1.


 83%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                          | 10/12 [17:49<02:56, 88.04s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 142.63 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 11/2018
NOTE: Month has all available tiles within area of interest.
11/2018 - MONTH ITERATION 1.


 92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌             | 11/12 [19:27<01:30, 90.68s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 97.35 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 12/2018
NOTE: Month has all available tiles within area of interest.
12/2018 - MONTH ITERATION 1.


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [21:16<00:00, 106.33s/it]


----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process original elapsed: 102.45 seconds.
----------------------------------------------------------------------------------------------------


  0%|                                                                                                                                                                          | 0/12 [00:00<?, ?it/s]


 create_raster_by_month() - Piedad - Starting new analysis for 1/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-01-15.
Tile 13QHC has lowest cloud coverage on date 2018-01-10.
Tile 13QGC has lowest cloud coverage on date 2018-01-10.
Appended item for tile 14QKH on date 2018-01-15 to month fallback analysis.
Appended item for tile 13QHC on date 2018-01-10 to month fallback analysis.
Appended item for tile 13QGC on date 2018-01-10 to month fallback analysis.
1/2018 - MONTH ITERATION 1.


  8%|█████████████▍                                                                                                                                                   | 1/12 [01:42<18:50, 102.74s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 101.75 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 2/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-02-24.
Tile 13QHC has lowest cloud coverage on date 2018-02-04.
Tile 13QGC has lowest cloud coverage on date 2018-02-04.
Appended item for tile 14QKH on date 2018-02-24 to month fallback analysis.
Appended item for tile 13QHC on date 2018-02-04 to month fallback analysis.
Appended item for tile 13QGC on date 2018-02-04 to month fallback analysis.
2/2018 - MONTH ITERATION 1.


 17%|██████████████████████████▊                                                                                                                                      | 2/12 [03:21<16:43, 100.34s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 97.61 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 3/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-03-31.
Tile 13QHC has lowest cloud coverage on date 2018-03-31.
Tile 13QGC has lowest cloud coverage on date 2018-03-06.
Appended item for tile 14QKH on date 2018-03-31 to month fallback analysis.
Appended item for tile 13QHC on date 2018-03-31 to month fallback analysis.
Appended item for tile 13QGC on date 2018-03-06 to month fallback analysis.
3/2018 - MONTH ITERATION 1.


 25%|████████████████████████████████████████▎                                                                                                                        | 3/12 [05:08<15:31, 103.49s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 106.24 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 4/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-04-15.
Tile 13QHC has lowest cloud coverage on date 2018-04-15.
Tile 13QGC has lowest cloud coverage on date 2018-04-05.
Appended item for tile 14QKH on date 2018-04-15 to month fallback analysis.
Appended item for tile 13QHC on date 2018-04-15 to month fallback analysis.
Appended item for tile 13QGC on date 2018-04-05 to month fallback analysis.
4/2018 - MONTH ITERATION 1.


 33%|█████████████████████████████████████████████████████▋                                                                                                           | 4/12 [06:57<14:04, 105.58s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 95.04 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 5/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-05-30.
Tile 13QHC has lowest cloud coverage on date 2018-05-30.
Tile 13QGC has lowest cloud coverage on date 2018-05-20.
Appended item for tile 14QKH on date 2018-05-30 to month fallback analysis.
Appended item for tile 13QHC on date 2018-05-30 to month fallback analysis.
Appended item for tile 13QGC on date 2018-05-20 to month fallback analysis.
5/2018 - MONTH ITERATION 1.


 42%|███████████████████████████████████████████████████████████████████                                                                                              | 5/12 [08:33<11:55, 102.25s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 95.28 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 6/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-06-04.
Tile 13QHC has lowest cloud coverage on date 2018-06-04.
Tile 13QGC has lowest cloud coverage on date 2018-06-04.
Appended item for tile 14QKH on date 2018-06-04 to month fallback analysis.
Appended item for tile 13QHC on date 2018-06-04 to month fallback analysis.
Appended item for tile 13QGC on date 2018-06-04 to month fallback analysis.
6/2018 - MONTH ITERATION 1.


 50%|████████████████████████████████████████████████████████████████████████████████▌                                                                                | 6/12 [10:16<10:13, 102.32s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 101.45 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 7/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-07-29.
Tile 13QHC has lowest cloud coverage on date 2018-07-29.
Tile 13QGC has lowest cloud coverage on date 2018-07-29.
Appended item for tile 14QKH on date 2018-07-29 to month fallback analysis.
Appended item for tile 13QHC on date 2018-07-29 to month fallback analysis.
Appended item for tile 13QGC on date 2018-07-29 to month fallback analysis.
7/2018 - MONTH ITERATION 1.


 58%|█████████████████████████████████████████████████████████████████████████████████████████████▉                                                                   | 7/12 [11:57<08:29, 101.97s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 100.21 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 8/2018
NOTE: Insufficient tiles to cover area of interest. Needed: 3, available: 2.
NOTE: Available tiles: ['14QKH', '13QHC']. Missing tiles: ['13QGC'].
Tile 14QKH has lowest cloud coverage on date 2018-08-18.
Tile 13QHC has lowest cloud coverage on date 2018-08-18.
Appended item for tile 14QKH on date 2018-08-18 to month fallback analysis.
Appended item for tile 13QHC on date 2018-08-18 to month fallback analysis.
8/2018 - MONTH ITERATION 1.


 67%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                     | 8/12 [13:35<06:43, 100.77s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 97.10 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Raster for 9/2018 not available. Skipping to next month.

 create_raster_by_month() - Piedad - Starting new analysis for 10/2018
NOTE: Insufficient tiles to cover area of interest. Needed: 3, available: 1.
NOTE: Available tiles: ['13QGC']. Missing tiles: ['13QHC', '14QKH'].
Tile 13QGC has lowest cloud coverage on date 2018-10-02.
Appended item for tile 13QGC on date 2018-10-02 to month fallback analysis.
10/2018 - MONTH ITERATION 1.


 83%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                          | 10/12 [16:00<02:54, 87.32s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 144.12 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 11/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-11-21.
Tile 13QHC has lowest cloud coverage on date 2018-11-21.
Tile 13QGC has lowest cloud coverage on date 2018-11-26.
Appended item for tile 13QGC on date 2018-11-26 to month fallback analysis.
Appended item for tile 14QKH on date 2018-11-21 to month fallback analysis.
Appended item for tile 13QHC on date 2018-11-21 to month fallback analysis.
11/2018 - MONTH ITERATION 1.


 92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌             | 11/12 [17:38<01:29, 89.90s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 93.92 seconds.
----------------------------------------------------------------------------------------------------

 create_raster_by_month() - Piedad - Starting new analysis for 12/2018
NOTE: Month has all available tiles within area of interest.
Tile 14QKH has lowest cloud coverage on date 2018-12-01.
Tile 13QHC has lowest cloud coverage on date 2018-12-01.
Tile 13QGC has lowest cloud coverage on date 2018-12-21.
Appended item for tile 13QGC on date 2018-12-21 to month fallback analysis.
Appended item for tile 14QKH on date 2018-12-01 to month fallback analysis.
Appended item for tile 13QHC on date 2018-12-01 to month fallback analysis.
12/2018 - MONTH ITERATION 1.


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [19:21<00:00, 96.78s/it]

----------------------------------------------------------------------------------------------------
PRINTING PROCESSING TIME COMPARISON
Iteration 1 Process improved elapsed: 102.04 seconds.
----------------------------------------------------------------------------------------------------





In [26]:
df_time_processes

month,original (s),improved (s)
1,97.59,101.75
2,235.37,97.61
3,92.96,106.24
4,102.34,95.04
5,103.94,95.28
6,94.89,101.45
7,96.85,100.21
8,93.83,97.1
10,142.63,144.12
11,97.35,93.92
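The per-month timings in the table above can be summarised with a few lines of pandas. This is a sketch only: the values are copied from that table, while the variable names (`timings`, `diff`) are introduced here for illustration and are not part of the notebook.

```python
import pandas as pd

# Elapsed seconds per month, copied from the table above (months 9 and 12 are absent).
timings = pd.DataFrame(
    {
        "original": [97.59, 235.37, 92.96, 102.34, 103.94, 94.89, 96.85, 93.83, 142.63, 97.35],
        "improved": [101.75, 97.61, 106.24, 95.04, 95.28, 101.45, 100.21, 97.10, 144.12, 93.92],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6, 7, 8, 10, 11], name="month"),
)

# Positive values mean the improved process was faster for that month.
diff = timings["original"] - timings["improved"]

total_original = timings["original"].sum()
total_improved = timings["improved"].sum()

print(diff.round(2))
print(f"Total original: {total_original:.2f} s, total improved: {total_improved:.2f} s")
print(f"Month with the largest saving: {diff.idxmax()}")
```

Looking at the per-month difference rather than only the totals makes it easier to see whether a saving is spread across months or concentrated in one of them.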


In [32]:
#df_time_processes_original = df_time_processes.copy()
df_time_processes_original

Unnamed: 0,original
1,619.86
2,196.9
3,620.18
4,596.81
5,613.94


In [35]:
df_time_processes

Unnamed: 0,improved
1,272.4
2,252.8
3,245.7
4,245.33
5,256.78
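The two tables above can be compared directly. The sketch below assumes that both hold elapsed seconds for five repeated runs of the same test, which the notebook does not state explicitly; the variable names are hypothetical. The median is used because the second value of the original series is much lower than the other four.

```python
import pandas as pd

# Values copied from the two tables above; units assumed to be seconds.
original = pd.Series([619.86, 196.90, 620.18, 596.81, 613.94], index=range(1, 6), name="original")
improved = pd.Series([272.40, 252.80, 245.70, 245.33, 256.78], index=range(1, 6), name="improved")

median_original = original.median()
median_improved = improved.median()

# Ratio above 1 means the improved process took less time under this assumption.
speedup = median_original / median_improved

print(f"Median original: {median_original:.2f}")
print(f"Median improved: {median_improved:.2f}")
print(f"Median ratio (original / improved): {speedup:.2f}")
```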
