# 15-minute cities - resolution 9

Notebook for testing the 15-minute city methodology. The Observatorio de Ciudades approach groups amenities into 4 axes:
* Education
* Community services
* Commerce
* Entertainment

Each of these axes is subdivided into specific amenities:

* Education
    * Preschool
    * Primary school
    * Secondary school
* Community services:
    * Health center - we interpret this as first-contact health care (which includes pharmacies with doctors)
    * Government - government offices
    * Social assistance - DIF
    * Care - daycare centers
* Commerce:
    * Food - places to buy food
    * Personal commerce - hair salons and clothing stores
    * Pharmacies
    * Home - hardware stores (ferreterías and tlapalerías) and cleaning supplies
    * Complementary - complementary retail such as clothing, footwear, furniture, laundromats, paint and magazines
* Entertainment
    * Physical activity - outdoor recreation spaces such as parks, sports courts, sports complexes or natural parks
    * Social - places for socializing such as restaurants, bars and cafés
    * Cultural - cultural recreation spaces such as museums or cinemas

To decide whether a hexagon meets the 15-minute city criterion, the maximum time over all the amenities (that is, the time to the farthest required amenity) is recorded in the hexagon. If that time is under 15 minutes, the hexagon meets the criterion; otherwise it does not.
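The rule above can be sketched in plain Python for a single hexagon. This is a minimal illustration, not the notebook's implementation: the nested dictionary mirrors the axis → amenity → sources structure of `idx_15_min` in `main` (reduced to a few sources), the times are made-up minutes, and the source-to-amenity step follows the weight rule documented in `main` (if the weight is smaller than the number of sources, take the minimum time; otherwise take the maximum). The weight of 2 for `Social` plays the role of the weight of 4 over 4 sources used in `main`.

```python
# Made-up times (minutes) from one hexagon to each source amenity.
times = {
    "denue_preescolar": 6.0,
    "denue_primaria": 9.5,
    "denue_supermercado": 14.0,
    "denue_abarrotes": 3.0,
    "denue_restaurante_insitu": 5.0,
    "denue_cafe": 12.0,
}

# Reduced, illustrative subset of the axis -> amenity -> sources structure.
index_15_min = {
    "Escuelas": {"Preescolar": ["denue_preescolar"], "Primaria": ["denue_primaria"]},
    "Comercio": {"Alimentos": ["denue_supermercado", "denue_abarrotes"]},
    "Entretenimiento": {"Social": ["denue_restaurante_insitu", "denue_cafe"]},
}

# Weight equal to the number of sources means "all sources are required".
weights = {
    "Escuelas": {"Preescolar": 1, "Primaria": 1},
    "Comercio": {"Alimentos": 1},
    "Entretenimiento": {"Social": 2},
}


def amenity_time(times, sources, weight):
    """Weight rule: min over sources if weight < number of sources, else max."""
    values = [times[s] for s in sources]
    return min(values) if weight < len(values) else max(values)


def hexagon_max_time(times, index, weights):
    """Return (max_time, per-axis max time) for one hexagon."""
    axis_times = {}
    for axis, amenities in index.items():
        axis_times[axis] = max(
            amenity_time(times, sources, weights[axis][amenity])
            for amenity, sources in amenities.items()
        )
    return max(axis_times.values()), axis_times


overall, per_axis = hexagon_max_time(times, index_15_min, weights)
print(per_axis)               # {'Escuelas': 9.5, 'Comercio': 3.0, 'Entretenimiento': 12.0}
print(overall, overall < 15)  # 12.0 True
```

Here `Alimentos` is satisfied by the closest food source (3 minutes), while `Social` requires both sources, so its time is the farther one (12 minutes).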

In [1]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

module_path = os.path.abspath(os.path.join('../../'))
if module_path not in sys.path:
    sys.path.append(module_path)
import aup



In [30]:
def main(city, save=False, save_disk_space = True):
    
    print('STARTING ANALYSIS FOR {}'.format(city))
    
    #--------------- DOWNLOAD DATA
    #Download hexagons with pop data (Based on cvegeo)
    hex_pop = gpd.GeoDataFrame()
    hex_schema = 'censo'
    hex_table = 'hex_censo_mza_2020_res9'
    
    query = f"SELECT * FROM {hex_schema}.{hex_table} WHERE \"metropolis\" LIKE \'{city}\'"
    hex_pop = aup.gdf_from_query(query, geometry_col='geometry')
    
    pob_tot = hex_pop.pobtot.sum()
    print('Downloaded hex data with a total of {} persons'.format(pob_tot))
        
    #Download nodes (Based on city)
    nodes_schema = 'prox_analysis'
    nodes_table = 'nodes_proximity_2020'
    
    query = f"SELECT * FROM {nodes_schema}.{nodes_table} WHERE \"metropolis\" LIKE \'{city}\'"
    nodes = aup.gdf_from_query(query, geometry_col='geometry')
    
    print('Downloaded a total of {} nodes'.format(nodes.shape[0]))
    
    #--------------- PREPARE DATA
    #--------------- PREPARE DATA ---------- DELETE DUPLICATES AND CLEAN NODES
    #This step keeps osmid, geometry and metropolis (without duplicates, keeping only one point for each node) to store times to each amenity source by node in following loop.
    nodes_geom = nodes.drop_duplicates(subset='osmid', keep="last")[['osmid','geometry','metropolis']].copy()
    
    #--------------- PREPARE DATA ---------- REORGANIZE NODES DATA
    #This step organizes data by nodes by changing (time to source amenities) from rows (1 column with source amenity name + 1 column with time data) 
    #to columns (1 column with time data named after its source amenity)
    nodes_analysis = nodes_geom.copy()

    for source_amenity in list(nodes.amenity.unique()):
        nodes_tmp = nodes.loc[nodes.amenity == source_amenity,['osmid','time']]
        nodes_tmp = nodes_tmp.rename(columns={'time':source_amenity})
        # Search for amenities that aren't present in the city (with all values marked as 0) and change them to NaN
        if nodes_tmp[source_amenity].mean() == 0:
            nodes_tmp[source_amenity] = np.nan
        nodes_analysis = nodes_analysis.merge(nodes_tmp, on='osmid')

    if save_disk_space:
        del nodes_geom
        del nodes_tmp
        
    print("Transformed nodes data")
        
    #--------------- PREPARE DATA ---------- SET PARAMETER DEFINITIONS
    #This step sets the axes (ejes), amenities, sources and weights for further analysis
    #{Axis (e):
    #         {Amenity (a):
    #                       [Sources (s)]}}

    idx_15_min = {'Escuelas':{'Preescolar':['denue_preescolar'],
                             'Primaria':['denue_primaria'],
                             'Secundaria':['denue_secundaria']},
                 'Servicios comunitarios':{'Salud':['clues_primer_nivel'],
                                           'Guarderías':['denue_guarderias'],
                                           'Asistencia social':['denue_dif']},
                  'Comercio':{'Alimentos':['denue_supermercado','denue_abarrotes',
                                           'denue_carnicerias','sip_mercado'],
                              'Personal':['denue_peluqueria'],
                              'Farmacias':['denue_farmacias'],
                              'Hogar':['denue_ferreteria_tlapaleria','denue_art_limpieza'],
                              'Complementarios':['denue_ropa','denue_calzado','denue_muebles',
                                                 'denue_lavanderia','denue_revistas_periodicos',
                                                 'denue_pintura']},
                  'Entretenimiento':{'Social':['denue_restaurante_insitu','denue_restaurante_llevar',
                                               'denue_bares','denue_cafe'],
                                    'Actividad física':['sip_cancha','sip_unidad_deportiva',
                                                        'sip_espacio_publico','denue_parque_natural'],
                                    'Cultural':['denue_cines','denue_museos']} 
                 }

    #If the weight of an amenity is less than its number of sources, the algorithm chooses the minimum time to its sources. Otherwise (if equal or greater), it chooses the maximum time.
    wegiht_idx = {'Escuelas':{'Preescolar':1,
                            'Primaria':1,
                            'Secundaria':1},
                'Servicios comunitarios':{'Salud':1,
                                        'Guarderías':1,
                                        'Asistencia social':1},
                'Comercio':{'Alimentos':1,
                            'Personal':1,
                            'Farmacias':1,
                            'Hogar':1,
                            'Complementarios':1},
                'Entretenimiento':{'Social':4,
                                    'Actividad física':1,
                                    'Cultural':1}
                }
    
    #--------------- PREPARE DATA ---------- FILL MISSING COLUMNS (In case there is a source amenity not available in a city)
    sources = []

    # Gather all possible sources
    for eje in idx_15_min.keys():
        for amenity in idx_15_min[eje].values():
            for source in amenity:
                sources.append(source)

    # If a source is not present in the city being analyzed, fill its column with np.nan
    column_list = list(nodes_analysis.columns)
    missing_sourceamenities = []

    for s in sources:
        if s not in column_list:
            nodes_analysis[s] = np.nan
            missing_sourceamenities.append(s)

    print("There are {} non present source amenities in {}".format(len(missing_sourceamenities),city))
    
    #--------------- PROCESS DATA 
    #--------------- PROCESS DATA ---------- Max time calculation
    #This step calculates times by amenity and by axis (eje)

    column_max_all = [] # list with all max index column names
    column_max_ejes = [] # list with axis (eje) index column names

    #Goes through each axis (eje) in the dictionary:
    for e in idx_15_min.keys():

        #Appends the currently examined axis to 2 lists
        column_max_all.append('max_'+ e.lower())
        column_max_ejes.append('max_'+ e.lower())
        column_max_amenities = [] # list with amenities in current axis

        #Goes through each amenity of the current axis:
        for a in idx_15_min[e].keys():

            #Appends the currently examined amenity to 2 lists:
            column_max_all.append('max_'+ a.lower())
            column_max_amenities.append('max_'+ a.lower())

            #Calculates time to currently examined amenity:
            #If weight is less than number of sources of amenity, choose minimum time to sources.
            if wegiht_idx[e][a] < len(idx_15_min[e][a]): 
                nodes_analysis['max_'+ a.lower()] = nodes_analysis[idx_15_min[e][a]].min(axis=1)
            #Else, choose maximum time to sources.
            else:
                nodes_analysis['max_'+ a.lower()] = nodes_analysis[idx_15_min[e][a]].max(axis=1)

        #Calculates time to currently examined axis (max time of its amenities):
        nodes_analysis['max_'+ e.lower()] = nodes_analysis[column_max_amenities].max(axis=1) 

    index_column = 'max_time' # column name for maximum time data

    #Add to column_max_all list the attribute 'max_time'
    column_max_all.append(index_column)

    #Assigns "max_time" the max time for all ejes
    nodes_analysis[index_column] = nodes_analysis[column_max_ejes].max(axis=1)     
    
    #Add to column_max_all list the attributes 'osmid' and 'geometry' to filter nodes_analysis with the column_max_all list.
    column_max_all.append('osmid')
    column_max_all.append('geometry')
    nodes_analysis_filter = nodes_analysis[column_max_all].copy()

    if save_disk_space:
        del nodes_analysis
          
    print('Calculated proximity to amenities data by node')
        
    #--------------- PROCESS DATA ---------- GROUP TIMES BY HEXAGONS
    # group data by hex
    res = 9
    hex_tmp = hex_pop[['hex_id_9','geometry']]
    hex_res_9_idx = aup.group_by_hex_mean(nodes_analysis_filter, hex_tmp, res, index_column)
    hex_res_9_idx = hex_res_9_idx.loc[hex_res_9_idx[index_column]>0].copy()

    if save_disk_space:
        del hex_tmp
        del nodes_analysis_filter
        
    print('Grouped nodes data by hexagons')
          
    #--------------- PROCESS DATA ---------- RE-CALCULATE MAX TIMES BY HEXAGON
    # This step recalculates, for each hexagon, the max time to each axis (eje) from the grouped amenity times, and max_time as the max over axes
    column_max_ejes = [] # list with axis (eje) index column names

    #Goes (again) through each axis in the dictionary:
    for e in idx_15_min.keys():

        column_max_ejes.append('max_'+ e.lower())
        column_max_amenities = [] # list with amenities in current axis

        #Goes (again) through each amenity of the current axis:
        for a in idx_15_min[e].keys():

            column_max_amenities.append('max_'+ a.lower())

        #Re-calculates time to currently examined axis (max time of its amenities):
        hex_res_9_idx['max_'+ e.lower()] = hex_res_9_idx[column_max_amenities].max(axis=1)

    hex_res_9_idx[index_column] = hex_res_9_idx[column_max_ejes].max(axis=1)

    #Add 'max_time' to the column_max_ejes list
    column_max_ejes.append(index_column)
          
    print('Finished recalculating times in hexagons')
    
    #--------------- PROCESS DATA ---------- INDEX, MEDIAN AND MEAN CALCULATION
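    # This step converts each amenity time into a proximity index (sigmoidal function) and adds the mean time, median time and index sum per hexagon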
    
    #Define function
    def apply_sigmoidal(x):
        if x == -1:
            return -1
        elif x > 1000:
            return 0
        else:
            val = aup.sigmoidal_function(0.1464814753435666, x, 30)
            return val
    
    #Apply function
    amenities_col = ['max_preescolar','max_primaria','max_secundaria',
               'max_salud','max_guarderías','max_asistencia social',
               'max_alimentos','max_personal','max_farmacias','max_hogar',
               'max_complementarios','max_social','max_actividad física',
               'max_cultural']
    for ac in amenities_col:
        idx_col = ac.replace('max','idx')
        hex_res_9_idx[idx_col] = hex_res_9_idx[ac].apply(apply_sigmoidal)
    
    #Add data
    idx_colname = []
    for ac in amenities_col:
        idx_col = ac.replace('max','idx')
        idx_colname.append(idx_col)
    
    hex_res_9_idx['mean_time'] = hex_res_9_idx[amenities_col].mean(axis=1)
    hex_res_9_idx['median_time'] = hex_res_9_idx[amenities_col].median(axis=1)
    hex_res_9_idx['idx_sum'] = hex_res_9_idx[idx_colname].sum(axis=1)
          
    print('Finished calculating index, mean and median time')
    
    #--------------- PROCESS DATA ---------- ADD POP AND CITY DATA
    # calculate population density
    hex_pop = hex_pop.to_crs("EPSG:6372")
    hex_pop['dens_pobha'] = hex_pop['pobtot'] / (hex_pop.area/10000)
    hex_pop = hex_pop.to_crs("EPSG:4326")
    
    # Add pop data
    pop_list = ['hex_id_9','pobtot','dens_pobha']
    hex_res_9_idx = pd.merge(hex_res_9_idx, hex_pop[pop_list], on='hex_id_9')

    if save_disk_space:
        del hex_pop
    
    # Add city data
    hex_res_9_idx['city'] = city
          
    print('Finished adding pop and city data')
    
    #--------------- FINAL FORMAT ----------
    #--------------- FINAL FORMAT ---------- REORDER COLUMNS    
    #Final format
    final_column_ordered_list = ['hex_id_9', 'geometry', 
                             'max_escuelas', 'max_preescolar', 'max_primaria', 'max_secundaria',
                             'max_servicios comunitarios', 'max_salud', 'max_guarderías', 'max_asistencia social',
                             'max_comercio', 'max_alimentos', 'max_personal', 'max_farmacias', 'max_hogar', 'max_complementarios',
                             'max_entretenimiento', 'max_social', 'max_actividad física', 'max_cultural', 
                             'idx_preescolar', 'idx_primaria', 'idx_secundaria',
                             'idx_salud', 'idx_guarderías', 'idx_asistencia social',
                             'idx_alimentos', 'idx_personal', 'idx_farmacias', 'idx_hogar', 'idx_complementarios',
                             'idx_social', 'idx_actividad física', 'idx_cultural',
                             'mean_time', 'median_time', 'max_time', 'idx_sum',
                             'pobtot', 'dens_pobha','city']

    hex_res_9_idx_reordered = hex_res_9_idx[final_column_ordered_list]

    if save_disk_space:
        del hex_res_9_idx
          
    print('Finished final format')
        
    #--------------- SAVE TO DB ----------
    if save:
        #Load the data already stored in the database (cities processed in previous runs)
        prox_schema = 'prox_analysis'
        prox_table = 'proximityanalysis_hexres9'
        query = f"SELECT * FROM {prox_schema}.{prox_table}"
        prox_all = aup.gdf_from_query(query, geometry_col='geometry')
        print('Loaded already loaded data')    

        #Concatenate data
        dfs = [hex_res_9_idx_reordered,prox_all]
        prox = pd.concat(dfs)
        
        if save_disk_space:
            del hex_res_9_idx_reordered
            del prox_all
            
        print('Concatented {} data to already loaded data'.format(city))

        #Upload data
        aup.gdf_to_db_slow(prox,"proximityanalysis_hexres9", 'prox_analysis', if_exists='replace')
        print('Uploaded {} data to db'.format(city))
    
    print('FINISHED ANALYSIS FOR {}'.format(city))
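The loop in `main` that reshapes `nodes` from long format (one row per node and source amenity, with `amenity` and `time` columns) to wide format (one column per source amenity) can also be written with pandas' `DataFrame.pivot`. The sketch below uses a small made-up table rather than the database data. One behavioral difference to keep in mind: `merge(..., on='osmid')` defaults to an inner join, so if a node lacks a row for some amenity, the loop drops that node, whereas `pivot` keeps it with `NaN`. As in `main`, a source whose mean time is 0 (treated there as absent from the city) is set to `NaN`.

```python
import numpy as np
import pandas as pd

# Made-up long-format node data: one row per (node, source amenity).
nodes = pd.DataFrame({
    "osmid": [1, 1, 1, 2, 2, 3],
    "amenity": ["denue_primaria", "denue_cines", "denue_cafe",
                "denue_primaria", "denue_cines", "denue_primaria"],
    "time": [4.0, 0.0, 7.5, 11.0, 0.0, 2.0],
})

# One column per source amenity. Assumes a single row per (osmid, amenity) pair;
# otherwise DataFrame.pivot raises a ValueError.
wide = nodes.pivot(index="osmid", columns="amenity", values="time")

# Mirror main(): a source whose mean time is 0 is treated as absent from the city.
absent = [col for col in wide.columns if wide[col].mean() == 0]
for col in absent:
    wide[col] = np.nan

# Back to a flat table with osmid as a regular column.
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

Nodes 2 and 3, which have no `denue_cafe` row, stay in the result with `NaN` instead of being dropped.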

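After `aup.group_by_hex_mean` aggregates node values per hexagon, `main` recomputes the axis maxima and `max_time` from the aggregated amenity columns instead of keeping aggregated node-level maxima. The sketch below illustrates why this matters, assuming (from the function's name only; its code is not shown in this notebook) that the aggregation is an arithmetic mean: with made-up times for two nodes, the mean of the node-level maxima differs from the maximum of the per-amenity means. It then maps times to a proximity index with a hypothetical logistic function `proximity_index`; the real `aup.sigmoidal_function` is not shown here, so its form and the roles of the arguments `0.1464814753435666` and `30` are assumptions.

```python
import math
from statistics import mean

# Made-up amenity times (minutes) for the nodes inside one hexagon.
nodes_in_hex = [
    {"max_preescolar": 4.0, "max_primaria": 20.0},
    {"max_preescolar": 16.0, "max_primaria": 6.0},
]
amenities = ["max_preescolar", "max_primaria"]

# Option A: each node's axis maximum, averaged over the hexagon.
mean_of_max = mean(max(node[a] for a in amenities) for node in nodes_in_hex)

# Option B (the recalculation in main): average each amenity, then take the max.
hex_means = {a: mean(node[a] for node in nodes_in_hex) for a in amenities}
max_of_means = max(hex_means.values())

print(mean_of_max, max_of_means)  # 18.0 13.0


def proximity_index(minutes, k=0.1464814753435666, x0=30.0):
    """Hypothetical logistic index; NOT confirmed to match aup.sigmoidal_function."""
    if minutes == -1:  # sentinel passed through unchanged, as in apply_sigmoidal
        return -1
    if minutes > 1000:  # very large times map to 0, as in apply_sigmoidal
        return 0
    return 1.0 / (1.0 + math.exp(k * (minutes - x0)))


# Analogue of idx_sum for this hexagon: sum of the per-amenity indices.
idx_sum = sum(proximity_index(t) for t in hex_means.values())
print(round(proximity_index(15), 3), proximity_index(30))  # 0.9 0.5
```

With recalculated maxima this hexagon would pass the 15-minute criterion (13 minutes), while averaging node-level maxima would make it fail (18 minutes). Under the assumed logistic form, 30 minutes maps to 0.5, and the constant 0.1464814753435666 (about ln(9)/15) makes 15 minutes map to roughly 0.9.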
In [31]:
#Load mun data
mun_schema = 'metropolis'
mun_table = 'metro_gdf'
query = f"SELECT * FROM {mun_schema}.{mun_table}" 
gdf_mun = aup.gdf_from_query(query, geometry_col='geometry')

#Find already loaded cities
prox_schema = 'prox_analysis'
prox_table = 'proximityanalysis_hexres9'
query = f"SELECT * FROM {prox_schema}.{prox_table}"
prox_all = aup.gdf_from_query(query, geometry_col='geometry')
processed_city_list = list(prox_all.city.unique())

#Run main function
for city in gdf_mun.city.unique():
    if city not in processed_city_list:
        main(city, save=True, save_disk_space = True)

Exception during reset or similar
Traceback (most recent call last):
  File "/usr/local/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 991, in _finalize_fairy
    fairy._reset(
  File "/usr/local/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1440, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/python/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 657, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: SSL SYSCALL error: EOF detected



STARTING ANALYSIS FOR Ensenada
Downloaded hex data with a total of 396709.0 persons


Downloaded a total of 1065960 nodes
Transformed nodes data
There are 0 non present source amenities in Ensenada
Calculated proximity to amenities data by node
Grouped nodes data by hexagons
Finished recalculating times in hexagons
Finished calculating index, mean and median time
Finished adding pop and city data
Finished final format
Loaded already loaded data
Concatented Ensenada data to already loaded data


OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

[SQL: 
DROP TABLE prox_analysis.proximityanalysis_hexres9]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
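The save step in `main` reads the whole `proximityanalysis_hexres9` table, concatenates the new city, and rewrites everything with `if_exists='replace'`; the error above shows the connection dropping during `DROP TABLE`. Depending on how `aup.gdf_to_db_slow` handles transactions (not shown in this notebook), losing the connection after the drop but before the re-upload finishes could leave previously processed cities missing. A different technique from the notebook's, sketched here with the standard-library `sqlite3` module and a made-up, reduced schema instead of the project's PostgreSQL database and `aup` helpers, is to replace only the current city's rows inside a single transaction, so a failure rolls back instead of leaving the table half-written. The table name `proximity` and the city `CityA` are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical, reduced schema (the real table has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proximity (hex_id_9 TEXT, max_time REAL, city TEXT)")
conn.executemany(
    "INSERT INTO proximity VALUES (?, ?, ?)",
    [("h1", 12.0, "CityA"), ("h2", 18.5, "CityA")],
)
conn.commit()


def replace_city_rows(conn, city, rows):
    """Delete and re-insert one city's rows atomically; other cities are untouched."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM proximity WHERE city = ?", (city,))
        conn.executemany(
            "INSERT INTO proximity VALUES (?, ?, ?)",
            [(hex_id, max_time, city) for hex_id, max_time in rows],
        )


replace_city_rows(conn, "Ensenada", [("h3", 9.0), ("h4", 22.0)])

# Simulated failure: None cannot be unpacked, so a TypeError is raised after the DELETE.
try:
    replace_city_rows(conn, "Ensenada", [("h5", 1.0), None])
except TypeError:
    pass  # the transaction was rolled back, so the DELETE is undone

print(conn.execute("SELECT city, COUNT(*) FROM proximity GROUP BY city ORDER BY city").fetchall())
# [('CityA', 2), ('Ensenada', 2)]
```

Because the delete and the inserts share one transaction, the failed second call leaves both cities exactly as they were before it.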

### Check

for city in gdf_mun.city.unique():
    if city not in processed_city_list:
        print(city)
    else:
        print("{} has already been processed".format(city))