# NDVI regular cities running

This notebook tries to __run all missing cities__ (identified in notebook 02_missing_ndvi_temp_cities) using an ordered city list that prioritizes cities with fewer tiles. Even though download_raster_from_pc() runs for every missing city, __only cities that need no full_month processing (just specific_dates processing) are processed to hexs and uploaded to the database.__

* NOTE: Cities that need no full_month processing could not be processed previously. The cause was not the availability of tiles for specific_dates, but a __(since fixed) bug in available_datasets()__.
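
The ordered city list mentioned above can be sketched as a sort by tile count. This is only an illustration: the tile counts are the ones listed in this notebook, while the helper name `order_by_tiles` and the alphabetical tie-break are assumptions, not necessarily how the list was built in notebook 02_missing_ndvi_temp_cities.

```python
# Tile counts as listed in this notebook; the selection of cities is illustrative
tile_counts = {
    'Campeche': 4,
    'Tampico': 3,
    'Tuxtla': 2,
    'Celaya': 4,
    'Morelia': 3,
    'Piedad': 2,
}

def order_by_tiles(counts: dict[str, int]) -> list[str]:
    """Sort cities by ascending tile count (hypothetical helper).

    Ties are broken alphabetically, which is an assumption made here
    so that the ordering is deterministic.
    """
    return sorted(counts, key=lambda city: (counts[city], city))

city_order = order_by_tiles(tile_counts)
print(city_order)
# ['Piedad', 'Tuxtla', 'Morelia', 'Tampico', 'Campeche', 'Celaya']
```

Running cities with fewer tiles first means the cheapest downloads finish early, so problems surface before the larger cities are attempted.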

## Processing summary
#### __Cities with no full_month processing, just specific_date processing:__
__Logs 2025-09-24__
* __Tuxtla__ (2 tiles) started with 40% missing, could not process 2/2021, and ended with _Missing more than 50 percent of data points_.
* __Piedad__ (2 tiles) started with 39% missing, ended with 39%, [UPLOADED TO DB]
* __Cordoba__ (2 tiles) started with 19% missing, ended with 26%, [UPLOADED TO DB]
* __Orizaba__ (2 tiles) started with 25% missing, ended with 26%, [UPLOADED TO DB]
* __Morelia__ (3 tiles) started with 18% missing, could not process 5/2018, and ended with _Multiple missing months together_.
* __Cancun__ (3 tiles) started with 11% missing, ended with 32%, [UPLOADED TO DB]
* __Playa__ (3 tiles) started with 18% missing, could not process 10/2019, and ended with _Multiple missing months together_.
* __Culiacan__ (3 tiles) started with 1% missing, ended with 19%, [UPLOADED TO DB, BUT ERROR IN RES 11, FROM HEX SOURCE]

__Logs 2025-09-25__
* __Guaymas__ (3 tiles) started with 1% missing, ended with 8%, [UPLOADED TO DB]
#### __Finally testing with this city:__

## Import libraries

In [1]:
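from pathlib import Path
current_path = Path().resolve()
module_path = None
# Walk up from the working directory until the repository root is found
for parent in current_path.parents:
    if parent.name == "accesibilidad-urbana":
        module_path = str(parent)+'/'
        break
if module_path is None:
    raise FileNotFoundError("Could not find an 'accesibilidad-urbana' parent directory")
print(module_path)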

/home/jovyan/accesibilidad-urbana/


In [2]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

if module_path not in sys.path:
    sys.path.append(module_path)
import aup

## __Notebook config__

In [3]:
# Cities still missing processing, grouped by number of tiles
ndvi_3_tiles = ['Tampico']
ndvi_4_tiles = ['Campeche','Celaya','Guanajuato','Leon','Irapuato','SLP','Merida']
city_list = ndvi_3_tiles + ndvi_4_tiles
# Saving
save_output_database = True
save_output_locally = True

## __NDVI config__

In [4]:
band_name_dict = {'nir08':[False], #If GSD(resolution) of band is different, set True.
                   'red':[False], #If GSD(resolution) of band is different, set True.
                   'eq':['(nir08-red)/(nir08+red)']}
query_sat = {"eo:cloud_cover": {"lt": 15},
          "platform": {"in": ["landsat-8", "landsat-9"]}}
index_analysis = 'ndvi'
tmp_dir = module_path + f'data/processed/tmp_{index_analysis}/'
res = [8,11]
freq = 'MS'
start_date = '2018-01-01'
end_date = '2023-12-31'
satellite = 'landsat-c2-l2'

print(tmp_dir)

/home/jovyan/accesibilidad-urbana/data/processed/tmp_ndvi/
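
The `'eq'` entry in `band_name_dict` is the standard NDVI formula, (nir08-red)/(nir08+red). As a minimal sketch of what that expression computes for a single pixel, the following uses made-up reflectance values; the function name `ndvi` and the NaN handling for a zero denominator are assumptions for illustration and do not describe how `aup` evaluates the equation on rasters.

```python
def ndvi(nir08: float, red: float) -> float:
    """Normalized difference of the near-infrared and red bands."""
    denominator = nir08 + red
    if denominator == 0:
        # Undefined when both bands are zero (assumed handling for this sketch)
        return float('nan')
    return (nir08 - red) / denominator

# Hypothetical reflectance values, not taken from any raster in this notebook
print(ndvi(0.75, 0.25))  # 0.5 -> strong near-infrared response, typical of vegetation
print(ndvi(0.2, 0.2))    # 0.0 -> equal response in both bands
```

For non-negative reflectances the result lies between -1 and 1, with higher values indicating denser vegetation.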


## __Secondary functions -__ raster_to_hex_save()

In [5]:
def raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r, save, local_save=False, i=0):
    print(f'Translating raster to hexagon for res: {r}')

    hex_raster_analysis, df_raster_analysis = aup.raster_to_hex_analysis(hex_gdf_i, df_len, index_analysis,
                                                                tmp_dir, city, r)
    print('Finished assigning raster data to hexagons')
    print(f'df nan values: {df_raster_analysis[index_analysis].isna().sum()}')
    if df_raster_analysis[index_analysis].isna().sum() > 0:
        raise ValueError('NaN values are still present after processing')
    
    # local save (test)
    if local_save:
        # Create folder to store local save (including missing parent folders)
        localsave_dir = tmp_dir+'local_save/'
        os.makedirs(localsave_dir, exist_ok=True)

        # Local save
        #hex_raster_analysis.to_file(tmp_dir+'local_save/'+f'{city}_{index_analysis}_HexRes{r}_v{i}.geojson')
        df_raster_analysis.to_csv(localsave_dir+f'{city}_{index_analysis}_HexRes{r}_v{i}.csv')

    # Save - upload to database
    if save:
        upload_chunk = 150000
        print(f'Starting upload for res: {r}')

        if r == 8:
            # df upload
            #aup.df_to_db_slow(df_raster_analysis, f'{index_analysis}_complete_dataset_hex',
            #                'raster_analysis', if_exists='append', chunksize=upload_chunk)
            # gdf upload
            aup.gdf_to_db_slow(hex_raster_analysis, f'{index_analysis}_analysis_hex',
                            'raster_analysis', if_exists='append')

        else:
            # df upload
            #limit_len = 5000000
            #if len(df_raster_analysis)>limit_len:
            #    c_upload = len(df_raster_analysis)/limit_len
            #    for k in range(int(c_upload)+1):
            #        print(f"Starting range k = {k} of {int(c_upload)}")
            #        df_inter_upload = df_raster_analysis.iloc[int(limit_len*k):int(limit_len*(1+k))].copy()
            #        aup.df_to_db(df_inter_upload,f'{index_analysis}_complete_dataset_hex',
            #                        'raster_analysis', if_exists='append')
            #else:
            #    aup.df_to_db(df_raster_analysis,f'{index_analysis}_complete_dataset_hex',
            #                        'raster_analysis', if_exists='append')
            # gdf upload
            aup.gdf_to_db_slow(hex_raster_analysis, f'{index_analysis}_analysis_hex',
                            'raster_analysis', if_exists='append')
        print(f'Finished uploading data for res{r}')
        
    # delete variables
    del df_raster_analysis
    del hex_raster_analysis

## __Main function__

In [6]:
failed_cities = {}

In [7]:
for city in city_list:
    print(f"STARTING {city}.")
    ############################### CREATE AREA OF INTEREST
    ### Create city area of interest with biggest hexs
    big_res = min(res)
    schema_hex = 'hexgrid'
    table_hex = f'hexgrid_{big_res}_city_2020'
    
    # Download hexagons with type=urban
    type = 'urban'
    query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}\' AND \"type\" = '{type}\'"
    hex_urban = aup.gdf_from_query(query, geometry_col='geometry')
    
    # Download hexagons with type=rural within 500m buffer
    poly = hex_urban.to_crs("EPSG:6372").buffer(500).reset_index()
    poly = poly.to_crs("EPSG:4326")
    poly_wkt = poly.dissolve().geometry.to_wkt()[0]
    type = 'rural'
    query = f"SELECT hex_id_{big_res},geometry FROM {schema_hex}.{table_hex} WHERE \"city\" = '{city}\' AND \"type\" = '{type}\' AND (ST_Intersects(geometry, \'SRID=4326;{poly_wkt}\'))"
    hex_rural = aup.gdf_from_query(query, geometry_col='geometry')
    
    # Concatenate urban and rural hex
    hex_city = pd.concat([hex_urban, hex_rural])

    print(f"{city} - Created hex_city.")
    
    ############################### DOWNLOAD AND INTERPOLATE RASTERS
    try:
        df_len = aup.download_raster_from_pc(hex_city, index_analysis, city, freq,
                                     start_date, end_date, tmp_dir, band_name_dict, 
                                     query=query_sat, satellite = satellite,
                                     compute_unavailable_dates=True)
        print(f"{city} - Created df_len.")
    except Exception as e:
        # Keep going with the next city, but record why this one failed
        print(f"{city} - Failed df_len: {e}")
        failed_cities[city] = 'df_len'
        continue

    ############################### RASTERS TO HEX
    # Do NOT process and upload if used full_month processing since it is still a WIP.
    if 'full_month' in df_len.download_method.unique():
        full_months = len(df_len.loc[df_len.download_method == 'full_month'].copy())
        print(f"---------------------------------------")
        print(f"{city} - Has {full_months} months that used 'full_month' processing.")
        print(f"{city} - NOT PROCESSING TO HEXS AND NOT SAVING TO DATABASE.")
        print(f"---------------------------------------")
        continue
    
    ### hex preprocessing
    print(f"{city} - Started loading hexagons at different resolutions.")
    
    # Create res_list (every resolution from the first to the last value in res)
    res_list = list(range(res[0], res[-1]+1))
    
    # Load hexgrids
    hex_gdf = hex_city.copy()
    hex_gdf.rename(columns={f'hex_id_{big_res}':'hex_id'}, inplace=True)
    hex_gdf['res'] = big_res
    
    print(f"{city} - Loaded hexgrid res {big_res}.")
    
    for r in res_list:
        # biggest resolution already loaded
        if r == big_res:
            continue
        
        # Load hexgrid
        table_hex = f'hexgrid_{r}_city_2020'
        query = f"SELECT hex_id_{r},geometry FROM {schema_hex}.{table_hex} WHERE \"city\"=\'{city}\' AND  (ST_Intersects(geometry, \'SRID=4326;{poly_wkt}\'))"
        hex_tmp = aup.gdf_from_query(query, geometry_col='geometry')
        # Format hexgrid
        hex_tmp.rename(columns={f'hex_id_{r}':'hex_id'}, inplace=True)
        hex_tmp['res'] = r
        # Concatenate to hex_gdf
        hex_gdf = pd.concat([hex_gdf, hex_tmp])
    
        print(f"{city} - Loaded hexgrid res {r}.")
    
        del hex_tmp
    
    print(f"{city} - Finished creating hexagons at different resolutions.")
    
    # Raster to hex function for each resolution (saves output)
    for r in list(hex_gdf.res.unique()):
    
        print(f"---------------------------------------")
        print(f"{city} - STARTING processing for resolution {r}.")
    
        processing_chunk = 20000 # Use 20,000 max, crashed on DELL laptop with 50,000
    
        # filters hexagons at specified resolution
        hex_gdf_res = hex_gdf.loc[hex_gdf.res==r].copy()
        hex_gdf_res = hex_gdf_res.reset_index(drop=True)
    
        if len(hex_gdf_res)>processing_chunk:
            print(f'hex_gdf_res len: {len(hex_gdf_res)} is bigger than processing chunk: {processing_chunk}')
            # Number of chunks needed to cover every hexagon (ceil avoids an empty trailing chunk)
            c_processing = int(np.ceil(len(hex_gdf_res)/processing_chunk))
            print(f'There are {c_processing} processes')
            for i in range(c_processing):
                print(f'Processing from {i*processing_chunk} to {(i+1)*processing_chunk}')
                hex_gdf_i = hex_gdf_res.iloc[int(processing_chunk*i):int(processing_chunk*(1+i))].copy()
                raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r, 
                                   save = save_output_database, 
                                   local_save = save_output_locally, 
                                   i = i
                                  )
        else:
            print('hex_gdf len smaller than processing chunk')
            hex_gdf_i = hex_gdf_res.copy()
            raster_to_hex_save(hex_gdf_i, df_len, index_analysis, tmp_dir, city, r, 
                               save = save_output_database, 
                               local_save = save_output_locally, 
                              )

STARTING Guaymas.
Guaymas - Created hex_city.


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 72/72 [00:00<00:00, 433.71it/s]


Guaymas - Created df_len.
Guaymas - Started loading hexagons at different resolutions.
Loaded hexgrid res 8
Loaded hexgrid res 9
Loaded hexgrid res 10
Loaded hexgrid res 11
Guaymas - Finished creating hexagons at different resolutions.
---------------------------------------
STARTING processing for resolution 8.
hex_gdf len smaller than processing chunk
Translating raster to hexagon for res: 8



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 8
Finished uploading data for res8
---------------------------------------
STARTING processing for resolution 9.
hex_gdf len smaller than processing chunk
Translating raster to hexagon for res: 9



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 9
Finished uploading data for res9
---------------------------------------
STARTING processing for resolution 10.
hex_gdf len smaller than processing chunk
Translating raster to hexagon for res: 10



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 10
Finished uploading data for res10
---------------------------------------
STARTING processing for resolution 11.
hex_gdf_res len: 75049 is bigger than processing chunk: 20000
There are 4 processes
Processing from 0 to 20000
Translating raster to hexagon for res: 11



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 11
Finished uploading data for res11
Processing from 20000 to 40000
Translating raster to hexagon for res: 11



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 11
Finished uploading data for res11
Processing from 40000 to 60000
Translating raster to hexagon for res: 11



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 11
Finished uploading data for res11
Processing from 60000 to 80000
Translating raster to hexagon for res: 11



Finished assigning raster data to hexagons
df nan values: 0
Starting upload for res: 11
Finished uploading data for res11
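
The log above shows the 75049 res-11 hexagons being handled in slices of at most `processing_chunk` = 20000 rows. A minimal sketch of how such slice boundaries can be derived is given below; `chunk_bounds` is a hypothetical helper written for this note, not a function from `aup` or from the main loop.

```python
import math

def chunk_bounds(n_rows: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (start, stop) row ranges that cover n_rows.

    math.ceil gives exactly the number of chunks needed, so no empty
    trailing chunk is produced when n_rows is a multiple of chunk_size.
    """
    n_chunks = math.ceil(n_rows / chunk_size)
    return [(i * chunk_size, min((i + 1) * chunk_size, n_rows))
            for i in range(n_chunks)]

# Row count and chunk size taken from the res-11 log above
print(chunk_bounds(75049, 20000))
# [(0, 20000), (20000, 40000), (40000, 60000), (60000, 75049)]

# An exact multiple yields two chunks, not three
print(len(chunk_bounds(40000, 20000)))  # 2
```

Each `(start, stop)` pair can be passed to `DataFrame.iloc[start:stop]` to select the corresponding hexagons.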
