# 20250923 join months test with ndvi

Previous context: notebook 20250903 revealed that, in some cities, a specific-date analysis could not provide enough raster tiles to cover the area of interest. The implemented solution (a backup that processes all rasters found on the various dates within each month together as a single mosaic) was successfully tested in Puebla's temperature analysis.

This notebook tests the process on a city covered by only two raster tiles in order to __analyse whether joining all tiles within a month creates variability that depends on the raster tile footprint__, since the process joins __different dates and hours together.__
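
To make the concern concrete: when a monthly mosaic merges tiles acquired on different dates, neighbouring pixels can come from different acquisitions. The sketch below is hypothetical (`monthly_mosaic` is not part of `aup`). It uses toy grids on a shared pixel grid with `None` as nodata and assumes a first-valid-pixel rule, which may differ from how `aup` actually merges tiles. It also records the source date of every pixel, which is one way to see where the tile footprints meet.

```python
# Hypothetical sketch: merge several same-month acquisitions into one mosaic.
# Assumptions: all rasters share one pixel grid, None marks nodata (outside the
# tile footprint or masked), and the earliest valid acquisition wins.
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]
DateGrid = List[List[Optional[str]]]


def monthly_mosaic(rasters: List[Tuple[str, Grid]]) -> Tuple[Grid, DateGrid]:
    """Return the mosaic and, per pixel, the acquisition date it came from."""
    rows = len(rasters[0][1])
    cols = len(rasters[0][1][0])
    mosaic: Grid = [[None] * cols for _ in range(rows)]
    source: DateGrid = [[None] * cols for _ in range(rows)]
    # Sort by ISO date string so earlier acquisitions take priority
    for acq_date, grid in sorted(rasters, key=lambda item: item[0]):
        for i in range(rows):
            for j in range(cols):
                if mosaic[i][j] is None and grid[i][j] is not None:
                    mosaic[i][j] = grid[i][j]
                    source[i][j] = acq_date
    return mosaic, source


# Two tiles with partially overlapping footprints, acquired on different dates
tile_a = ("2021-02-03", [[0.31, 0.28, None], [0.40, 0.35, None]])
tile_b = ("2021-02-19", [[None, 0.30, 0.22], [None, 0.33, 0.25]])
mosaic, source = monthly_mosaic([tile_b, tile_a])
print(mosaic)  # [[0.31, 0.28, 0.22], [0.4, 0.35, 0.25]]
print(source)  # the last column comes from 2021-02-19, the rest from 2021-02-03
```

The `source` grid makes the seam between acquisitions explicit, so any jump in index values along that seam can be compared against the tile footprints.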

#### __Tried but skipped these cities:__
__Logs 2025-09-24__
* __Tuxtla__ (2 tiles) started with 40% missing and ended up with _Missing more than 50 percent of data points_ when 2/2021 failed.
* __Piedad__ (2 tiles) started and ended with 39% missing, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.
* __Cordoba__ (2 tiles) started with 19% missing and ended with 26%, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.
* __Orizaba__ (2 tiles) started with 25% missing and ended with 26%, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.
* __Morelia__ (3 tiles) started with 18% missing, could not process 5/2018 and ended up with _Multiple missing months together_.
* __Cancun__ (3 tiles) started with 11% missing and ended with 32%, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.
* __Playa__ (3 tiles) started with 18% missing, could not process 10/2019 and ended up with _Multiple missing months together_.
* __Culiacan__ (3 tiles) started with 1% missing and ended with 19%, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.

__Logs 2025-09-25__
* __Guaymas__ (3 tiles) started with 1% missing and ended with 8%, __but all downloaded months used a specific date__ (not a mosaic of all dates in a month). It didn't work previously because of a bug in `available_datasets()`.

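
The two failure messages in these logs can be read as checks on a per-month availability series. This is a hypothetical sketch: the function names are illustrative, the 50% threshold is taken from the log message, and treating two or more consecutive missing months as a failure is an assumption about how the pipeline decides, not confirmed `aup` behaviour.

```python
# Hypothetical sketch of the two failure conditions seen in the logs,
# expressed on a per-month availability series (True = usable raster).
# Assumption: "Multiple missing months together" means a run of 2+ missing months.
from typing import List


def missing_pct(available: List[bool]) -> float:
    """Percentage of months without a usable raster."""
    return 100 * available.count(False) / len(available)


def longest_gap(available: List[bool]) -> int:
    """Length of the longest run of consecutive missing months."""
    longest = current = 0
    for ok in available:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def check_city(available: List[bool], max_pct: float = 50.0, max_gap: int = 1) -> str:
    """Return the failure message, or 'ok' if the series passes both checks."""
    if missing_pct(available) > max_pct:
        return f"Missing more than {max_pct:g} percent of data points"
    if longest_gap(available) > max_gap:
        return "Multiple missing months together"
    return "ok"


# 24 months with a two-month hole in the middle
months = [True] * 10 + [False, False] + [True] * 12
print(round(missing_pct(months), 1), longest_gap(months), check_city(months))
# 8.3 2 Multiple missing months together
```

Note that a city can pass the percentage check and still fail on a single long gap, which matches the Morelia and Playa entries.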
#### __Finally testing with this city:__

## __Import libraries__

In [1]:
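from pathlib import Path
current_path = Path().resolve()
# Walk up from the notebook's folder until the repository root is found
for parent in [current_path, *current_path.parents]:
    if parent.name == "accesibilidad-urbana":
        module_path = str(parent) + '/'
        break
else:
    raise FileNotFoundError(f'"accesibilidad-urbana" not found in {current_path} or its parents')
print(module_path)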

/home/jovyan/accesibilidad-urbana/


In [2]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

if module_path not in sys.path:
    sys.path.append(module_path)
import aup

## __From Script 19: Config notebook__

In [3]:
city = 'Guaymas'
# Saving
save_output_database = False
save_output_locally = True
# In case of error when saving
already_processed_res = []

In [4]:
band_name_dict = {'nir08':[False], #If GSD(resolution) of band is different, set True.
                   'red':[False], #If GSD(resolution) of band is different, set True.
                   'eq':['(nir08-red)/(nir08+red)']}
query_sat = {"eo:cloud_cover": {"lt": 15},
             "platform": {"in": ["landsat-8", "landsat-9"]}}
index_analysis = 'ndvi'
tmp_dir = module_path + f'data/processed/tmp_{index_analysis}/'
res = [8,9]
freq = 'MS'
start_date = '2018-01-01'
end_date = '2023-12-31'
satellite = 'landsat-c2-l2'
save = True  # True
del_data = False # True

print(tmp_dir)

/home/jovyan/accesibilidad-urbana/data/processed/tmp_ndvi/
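
The `'eq'` entry in `band_name_dict` is the standard NDVI formula, (NIR - Red) / (NIR + Red). A minimal per-pixel sketch follows. How `aup` parses and evaluates the string is not shown in this notebook, so the sketch simply hard-codes the same expression. Landsat Collection 2 Level-2 bands are stored as scaled integers with an additive offset, and because of that offset the ratio changes depending on whether values are rescaled first; `to_reflectance` is included to illustrate this, and whether `aup` applies it is an assumption.

```python
# Minimal sketch of the NDVI expression stored in band_name_dict['eq'],
# evaluated for single pixels. Not aup's implementation.
import math

# Landsat Collection 2 Level-2 surface reflectance scale factor and offset.
# Whether aup rescales bands before computing the index is an assumption.
SR_SCALE = 2.75e-05
SR_OFFSET = -0.2


def to_reflectance(dn: int) -> float:
    """Convert a stored Level-2 integer value to surface reflectance."""
    return dn * SR_SCALE + SR_OFFSET


def ndvi(nir08: float, red: float) -> float:
    """(nir08 - red) / (nir08 + red); NaN when the denominator is zero."""
    denom = nir08 + red
    if denom == 0:
        return math.nan
    return (nir08 - red) / denom


# Same pixel with and without rescaling: the offset changes the ratio
nir_dn, red_dn = 25000, 10000
print(ndvi(nir_dn, red_dn))                                  # ~0.43
print(ndvi(to_reflectance(nir_dn), to_reflectance(red_dn)))  # ~0.73
```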


## __From Script 19: Main function__

### __Main function__ - Create hex_city

In [5]:
###############################
### Create city area of interest with biggest hexs
big_res = min(res)
schema_hex = 'hexgrid'
table_hex = f'hexgrid_{big_res}_city_2020'

# Download hexagons with type=urban
# (hex_type avoids shadowing Python's built-in type)
hex_type = 'urban'
query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}' AND \"type\" = '{hex_type}'"
hex_urban = aup.gdf_from_query(query, geometry_col='geometry')

# Download hexagons with type=rural within a 500 m buffer of the urban hexagons
# (EPSG:6372 is a projected CRS in metres, so the buffer distance is in metres)
poly = hex_urban.to_crs("EPSG:6372").buffer(500).reset_index()
poly = poly.to_crs("EPSG:4326")
poly_wkt = poly.dissolve().geometry.to_wkt()[0]
hex_type = 'rural'
query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}' AND \"type\" = '{hex_type}' AND (ST_Intersects(geometry, 'SRID=4326;{poly_wkt}'))"
hex_rural = aup.gdf_from_query(query, geometry_col='geometry')

# Concatenate urban and rural hex
hex_city = pd.concat([hex_urban, hex_rural])

# Show
print(f'Downloaded {len(hex_city)} hexagon features')
print(hex_city.shape)
print(hex_city.crs)
hex_city.head(2)

Downloaded 257 hexagon features
(257, 2)
epsg:4326


|   | hex_id_8 | geometry |
|---|---|---|
| 0 | 88480bb197fffff | POLYGON ((-110.80285 27.96357, -110.79822 27.9... |
| 1 | 88480bb1a1fffff | POLYGON ((-110.75951 27.95357, -110.75487 27.9... |


### __Main function__ - download_raster_from_pc

#### __download_raster_from_pc__ - Try entire function with compute_unavailable_dates=True

In [6]:
df_len = aup.download_raster_from_pc(hex_city, index_analysis, city, freq,
                                     start_date, end_date, tmp_dir, band_name_dict, 
                                     query=query_sat, satellite = satellite,
                                     compute_unavailable_dates=True)

100%|██████████| 72/72 [2:07:25<00:00, 106.19s/it]
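
With `freq = 'MS'` (pandas' month-start alias), the period from `start_date` to `end_date` spans 72 month starts, which matches the 72/72 iterations above at roughly 106 s each. The stdlib sketch below reproduces that count; the same dates come from `pd.date_range(start_date, end_date, freq='MS')`. That `download_raster_from_pc` iterates once per month start is an assumption inferred from the progress bar.

```python
# Stdlib sketch of a month-start ("MS") date range, equivalent in count to
# pd.date_range(start, end, freq='MS'). Assumption: the download loops once
# per month start.
from datetime import date
from typing import List


def month_starts(start: str, end: str) -> List[date]:
    """All first-of-month dates in [start, end], like freq='MS'."""
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    year, month = s.year, s.month
    # Roll forward to the first month start on or after `start`
    if s.day != 1:
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    out = []
    while date(year, month, 1) <= e:
        out.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


months = month_starts("2018-01-01", "2023-12-31")
print(len(months), months[0], months[-1])  # 72 2018-01-01 2023-12-01
```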


In [7]:
class NanValues(Exception):
    """Raised when NaN values remain after assigning raster data to hexagons."""


def raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r, save, local_save=False, i=0):
    print(f'Translating raster to hexagon for res: {r}')

    hex_raster_analysis, df_raster_analysis = aup.raster_to_hex_analysis(hex_gdf_i, df_len, index_analysis,
                                                                         tmp_dir, city, r)
    print('Finished assigning raster data to hexagons')
    n_nan = df_raster_analysis[index_analysis].isna().sum()
    print(f'df nan values: {n_nan}')
    if n_nan > 0:
        raise NanValues('NaN values are still present after processing')
    
    # Local save (test); the write calls below are currently commented out
    if local_save:
        # Create folder to store local save
        localsave_dir = tmp_dir + 'local_save/'
        os.makedirs(localsave_dir, exist_ok=True)

        # Local save
        #hex_raster_analysis.to_file(localsave_dir+f'{city}_{index_analysis}_HexRes{r}_v{i}.geojson')
        #df_raster_analysis.to_csv(localsave_dir+f'{city}_{index_analysis}_HexRes{r}_v{i}.csv')

    # Save - upload to database
    if save:
        upload_chunk = 150000
        print(f'Starting upload for res: {r}')

        if r == 8:
            # df upload
            #aup.df_to_db_slow(df_raster_analysis, f'{index_analysis}_complete_dataset_hex',
            #                'raster_analysis', if_exists='append', chunksize=upload_chunk)
            # gdf upload
            aup.gdf_to_db_slow(hex_raster_analysis, f'{index_analysis}_analysis_hex',
                            'raster_analysis', if_exists='append')

        else:
            # df upload
            #limit_len = 5000000
            #if len(df_raster_analysis)>limit_len:
            #    c_upload = len(df_raster_analysis)/limit_len
            #    for k in range(int(c_upload)+1):
            #        print(f"Starting range k = {k} of {int(c_upload)}")
            #        df_inter_upload = df_raster_analysis.iloc[int(limit_len*k):int(limit_len*(1+k))].copy()
            #        aup.df_to_db(df_inter_upload,f'{index_analysis}_complete_dataset_hex',
            #                        'raster_analysis', if_exists='append')
            #else:
            #    aup.df_to_db(df_raster_analysis,f'{index_analysis}_complete_dataset_hex',
            #                        'raster_analysis', if_exists='append')
            # gdf upload
            aup.gdf_to_db_slow(hex_raster_analysis, f'{index_analysis}_analysis_hex',
                            'raster_analysis', if_exists='append')
        print(f'Finished uploading data for res{r}')
        
    # delete variables
    del df_raster_analysis
    del hex_raster_analysis
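
`aup.raster_to_hex_analysis` is not shown here. As a hypothetical illustration of what "assigning raster data to hexagons" can mean, the sketch below computes a zonal mean: each pixel is assumed to be already tagged with the `hex_id` that contains its centre, and each hexagon receives the mean of its valid (non-NaN) pixels. The NaN check mirrors the `NanValues` guard above; `hex_means` is an illustrative name, and `aup` may work differently.

```python
# Hypothetical zonal-mean sketch: average valid pixel values per hexagon and
# fail if any hexagon ends up without a value. Not aup's implementation.
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class NanValues(Exception):
    """Raised when a hexagon ends up without a value."""


def hex_means(pixels: Iterable[Tuple[str, float]], hex_ids: List[str]) -> Dict[str, float]:
    """pixels: (hex_id, value) pairs; hex_ids: every hexagon that needs a value."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for hex_id, value in pixels:
        if not math.isnan(value):  # skip masked / nodata pixels
            sums[hex_id] += value
            counts[hex_id] += 1
    result = {h: (sums[h] / counts[h] if counts[h] else math.nan) for h in hex_ids}
    missing = [h for h, v in result.items() if math.isnan(v)]
    if missing:
        raise NanValues(f"{len(missing)} hexagons have no valid pixels")
    return result
```

Hexagons smaller than a pixel (more likely at finer resolutions) may receive no pixel centre at all, which is one way NaN values can appear.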

In [8]:
# ------------------------------ RASTERS TO HEX ------------------------------
### hex preprocessing
print('Started loading hexagons at different resolutions')

# Create res_list
res_list = list(range(res[0], res[-1] + 1))

# Load hexgrids
hex_gdf = hex_city.copy()
hex_gdf.rename(columns={f'hex_id_{big_res}':'hex_id'}, inplace=True)
hex_gdf['res'] = big_res

print(f'Loaded hexgrid res {big_res}')

for r in res_list:
    # biggest resolution already loaded
    if r == big_res:
        continue
    
    # Load hexgrid
    table_hex = f'hexgrid_{r}_city_2020'
    query = f"SELECT hex_id_{r},geometry FROM {schema_hex}.{table_hex} WHERE \"city\"=\'{city}\' AND  (ST_Intersects(geometry, \'SRID=4326;{poly_wkt}\'))"
    hex_tmp = aup.gdf_from_query(query, geometry_col='geometry')
    # Format hexgrid
    hex_tmp.rename(columns={f'hex_id_{r}':'hex_id'}, inplace=True)
    hex_tmp['res'] = r
    # Concatenate to hex_gdf
    hex_gdf = pd.concat([hex_gdf, hex_tmp])

    print(f'Loaded hexgrid res {r}')

    del hex_tmp

print('Finished creating hexagons at different resolutions')

# Raster to hex function for each resolution (saves output)
for r in list(hex_gdf.res.unique()):

    if r in already_processed_res:
        continue

    print(f'---------------------------------------')
    print(f'STARTING processing for resolution {r}.')

    processing_chunk = 20000 # Use 20,000 max, crashed on DELL laptop with 50,000

    # filters hexagons at specified resolution
    hex_gdf_res = hex_gdf.loc[hex_gdf.res==r].copy()
    hex_gdf_res = hex_gdf_res.reset_index(drop=True)

    if len(hex_gdf_res) > processing_chunk:
        print(f'hex_gdf_res len: {len(hex_gdf_res)} is bigger than processing chunk: {processing_chunk}')
        # Ceiling division: no empty trailing chunk when len is an exact multiple
        c_processing = -(-len(hex_gdf_res) // processing_chunk)
        print(f'There are {c_processing} processes')
        for i in range(c_processing):
            print(f'Processing from {i*processing_chunk} to {(i+1)*processing_chunk}')
            hex_gdf_i = hex_gdf_res.iloc[processing_chunk*i:processing_chunk*(i+1)].copy()
            raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r,
                               save=save_output_database,
                               local_save=save_output_locally,
                               i=i)

    else:
        print('hex_gdf len smaller than processing chunk')
        hex_gdf_i = hex_gdf_res.copy()
        raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r, 
                           save = save_output_database, 
                           local_save = save_output_locally, 
                          )

Started loading hexagons at different resolutions
Loaded hexgrid res 8
Loaded hexgrid res 9
Finished creating hexagons at different resolutions
---------------------------------------
STARTING processing for resolution 8.
hex_gdf len smaller than processing chunk
Translating raster to hexagon for res: 8


  0%|          | 0/6 [00:00<?, ?it/s]
  0%|          | 0/12 [00:00<?, ?it/s]
Exception during reset or similar
Traceback (most recent call last):
  File "/opt/conda/envs/gds/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 988, in _finalize_fairy
    fairy._reset(
  File "/opt/conda/envs/gds/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 1438, in _reset
    pool._dialect.do_rollback(self)
  File "/opt/conda/envs/gds/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 692, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: SSL error: wrong version number

Exception during reset or similar
Traceback (most recent call last):
  File "/o

Finished assigning raster data to hexagons
df nan values: 0
---------------------------------------
STARTING processing for resolution 9.
hex_gdf len smaller than processing chunk
Translating raster to hexagon for res: 9


  0%|          | 0/6 [00:00<?, ?it/s]
 17%|█▋        | 1/6 [00:03<00:15,  3.04s/it]

  0%|          | 0/12 [00:03<?, ?it/s]
 33%|███▎      | 2/6 [00:06<00:13,  3.31s/it]
  0%|          | 0/12 [00:03<?, ?it/s]
 50%|█████████████████████████████████████

Finished assigning raster data to hexagons
df nan values: 0
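
Both resolutions in this run stayed below `processing_chunk`, so the chunked branch was not exercised. Splitting hexagons into chunks is easiest to get right with ceiling division: a loop of `int(n / size) + 1` iterations yields an extra, empty slice whenever `n` is an exact multiple of `size` (e.g. 40,000 hexagons with a 20,000 chunk). A stdlib sketch of the chunk boundaries, with `chunk_bounds` as an illustrative name:

```python
# Sketch: (start, stop) row bounds for chunks of at most `size` rows,
# using ceiling division so no empty trailing chunk is produced.
from typing import List, Tuple


def chunk_bounds(n: int, size: int) -> List[Tuple[int, int]]:
    n_chunks = -(-n // size)  # ceiling division without importing math
    return [(i * size, min((i + 1) * size, n)) for i in range(n_chunks)]


print(chunk_bounds(40000, 20000))  # [(0, 20000), (20000, 40000)]
print(chunk_bounds(45000, 20000))  # [(0, 20000), (20000, 40000), (40000, 45000)]
```

Each pair can be passed directly to `hex_gdf_res.iloc[start:stop]`.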
