# Script 27 projects comparisons

This notebook contains a function that compares proximity on the baseline network (red_buena_calidad) with proximity on a project network (e.g. red_buena_calidad_pza_italia) in order to find the cause of the undesired results in Script 27.
* __Result:__ The undesired results in Script 27a were caused by how the network was created. Dropping duplicates in GIS and in code produced different connections in the network and therefore different proximity results.
* __New approach:__ Instead of trying to use their local network as a replacement for the OSMnx network, it was decided to send the network to Santiago's team and receive that same network back with the pje_ep column for each edge. That development can be found in __Notebook 31__.
* __This notebook is still used to compare results from Script 27b__ (see the Testing site section).

## Import libraries

In [1]:
accesibilidad_urbana = "../../../"

In [2]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

from shapely import Point

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Make the repository root importable, then import the project package
module_path = os.path.abspath(os.path.join(accesibilidad_urbana))
if module_path not in sys.path:
    sys.path.append(module_path)
import aup

## Notebook config

In [3]:
script27_output_dir = accesibilidad_urbana + "data/external/santiago/output/"

## Load data

In [9]:
# red_buena_calidad (project_01)
baseline_hexproximity = gpd.read_file(script27_output_dir + "project_01/santiago_hexproximity_project_01.gpkg")
baseline_hexvariablesanalysis = gpd.read_file(script27_output_dir + "project_01/santiago_hexvariablesanalysis_project_01.gpkg")
baseline_hexanalysis = gpd.read_file(script27_output_dir + "project_01/santiago_hexanalysis_project_01.gpkg")

# red_buena_calidad_pza_italia (project_02)
plazaitalia_hexproximity = gpd.read_file(script27_output_dir + "project_02/santiago_hexproximity_project_02.gpkg")
plazaitalia_hexvariablesanalysis = gpd.read_file(script27_output_dir + "project_02/santiago_hexvariablesanalysis_project_02.gpkg")
plazaitalia_hexanalysis = gpd.read_file(script27_output_dir + "project_02/santiago_hexanalysis_project_02.gpkg")

# red_buena_calidad_norte_sur (project_03)
nortesur_hexproximity = gpd.read_file(script27_output_dir + "project_03/santiago_hexproximity_project_03.gpkg")
nortesur_hexvariablesanalysis = gpd.read_file(script27_output_dir + "project_03/santiago_hexvariablesanalysis_project_03.gpkg")
nortesur_hexanalysis = gpd.read_file(script27_output_dir + "project_03/santiago_hexanalysis_project_03.gpkg")

# red_buena_calidad_parque_bueras (project_04)
parquebueras_hexproximity = gpd.read_file(script27_output_dir + "project_04/santiago_hexproximity_project_04.gpkg")
parquebueras_hexvariablesanalysis = gpd.read_file(script27_output_dir + "project_04/santiago_hexvariablesanalysis_project_04.gpkg")
parquebueras_hexanalysis = gpd.read_file(script27_output_dir + "project_04/santiago_hexanalysis_project_04.gpkg")

In [10]:
print(baseline_hexproximity.crs)
print(baseline_hexvariablesanalysis.crs)
print(baseline_hexanalysis.crs)

EPSG:4326
EPSG:4326
EPSG:4326


## Data comparisons - Proximity

The comparison matches the baseline proximity against any given project's proximity and saves the result to a GeoDataFrame (gdf). The gdf contains:
* The difference for each attribute in its own column (for each hex).
* Whether there was a positive (increase) or negative (decrease) change in ANY time or count column (for each hex).
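
A minimal sketch of this comparison logic on made-up data, assuming pandas is available. It relies on the default `_x`/`_y` suffixes that `pd.merge` gives to overlapping column names and, like compare_proximity() in the Testing site section, treats a baseline time of 0 as "not available". The hex ids and values are invented for illustration only.

```python
import pandas as pd

# Toy data: two hexes, one time column and one count column (values are made up).
baseline = pd.DataFrame({
    "hex_id": ["a", "b"],
    "farmacia_time": [10.0, 0.0],
    "farmacia_count_15min": [2, 0],
})
project = pd.DataFrame({
    "hex_id": ["a", "b"],
    "farmacia_time": [12.0, 8.0],
    "farmacia_count_15min": [1, 3],
})

# Overlapping column names get the default suffixes: _x (left) and _y (right).
both = pd.merge(baseline, project, on="hex_id")

flags = ["time_increase", "time_decrease", "count_increase",
         "count_decrease", "new_availability"]
for flag in flags:
    both[flag] = 0

for attribute in ["farmacia_time", "farmacia_count_15min"]:
    old, new = f"{attribute}_x", f"{attribute}_y"
    diff = both[new] - both[old]
    both[f"{attribute}_diff"] = diff

    kind = "time" if attribute.endswith("_time") else "count"
    both.loc[diff > 0, f"{kind}_increase"] = 1
    both.loc[diff < 0, f"{kind}_decrease"] = 1

    if kind == "time":
        # A time going from 0 to a positive value is a newly available
        # amenity, not a longer trip.
        newly = (both[new] > 0) & (both[old] == 0)
        both.loc[newly, "time_increase"] = 0
        both.loc[newly, "new_availability"] = 1

print(both[["hex_id"] + flags])
```

In this toy case hex "a" is flagged with a time increase and a count decrease, while hex "b" is flagged as new availability with a count increase.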

__Result:__ There IS a problem. Some hexes (not those directly under project interventions or newly created nodes) registered more time before the project than after.

## Finding problem

Project_02 (plaza italia) presents no problem, while Project_03 (norte sur) and Project_04 (parque bueras) do.

### __Testing site__

#### Testing site - __Proximity__ comparison

In [4]:
def compare_proximity(baseline_hexproximity, comparing_project, comparison_id, save):
    # ----------
    attributes_list = ['carniceria_time','carniceria_count_15min','hogar_time','hogar_count_15min','bakeries_time','bakeries_count_15min',
                       'supermercado_time','supermercado_count_15min','banco_time','banco_count_15min','ferias_time','ferias_count_15min',
                       'local_mini_market_time','local_mini_market_count_15min','correos_time','correos_count_15min','centro_recyc_time','centro_recyc_count_15min',
                       'hospital_priv_time','hospital_priv_count_15min','hospital_pub_time','hospital_pub_count_15min','clinica_priv_time','clinica_priv_count_15min',
                       'clinica_pub_time','clinica_pub_count_15min','farmacia_time','farmacia_count_15min','vacunatorio_priv_time','vacunatorio_priv_count_15min',
                       'vacunatorio_pub_time','vacunatorio_pub_count_15min','consult_ado_priv_time','consult_ado_priv_count_15min','consult_ado_pub_time','consult_ado_pub_count_15min',
                       'salud_mental_time','salud_mental_count_15min','labs_priv_time','labs_priv_count_15min','residencia_adumayor_time','residencia_adumayor_count_15min',
                       'eq_deportivo_priv_time','eq_deportivo_priv_count_15min','eq_deportivo_pub_time','eq_deportivo_pub_count_15min','club_deportivo_time','club_deportivo_count_15min',
                       'civic_office_time','civic_office_count_15min','tax_collection_time','tax_collection_count_15min','social_security_time','social_security_count_15min',
                       'police_time','police_count_15min','bomberos_time','bomberos_count_15min','museos_priv_time','museos_priv_count_15min','museos_pub_time','museos_pub_count_15min',
                       'cines_time','cines_count_15min','sitios_historicos_time','sitios_historicos_count_15min','restaurantes_bar_cafe_time','restaurantes_bar_cafe_count_15min',
                       'librerias_time','librerias_count_15min','ep_plaza_small_time','ep_plaza_small_count_15min','ep_plaza_big_time','ep_plaza_big_count_15min',
                       'edu_basica_pub_time','edu_basica_pub_count_15min','edu_media_pub_time','edu_media_pub_count_15min','jardin_inf_pub_time','jardin_inf_pub_count_15min',
                       'universidad_time','universidad_count_15min','edu_tecnica_time','edu_tecnica_count_15min','edu_adultos_pub_time','edu_adultos_pub_count_15min',
                       'edu_especial_pub_time','edu_especial_pub_count_15min','bibliotecas_time','bibliotecas_count_15min','centro_edu_amb_time','centro_edu_amb_count_15min',
                       'paradas_tp_ruta_time','paradas_tp_ruta_count_15min','paradas_tp_metro_time','paradas_tp_metro_count_15min','paradas_tp_tren_time','paradas_tp_tren_count_15min',
                       'ciclovias_time','ciclovias_count_15min','estaciones_bicicletas_time','estaciones_bicicletas_count_15min']
    
    # ---------- Merge baseline and comparing project data
    both_gdfs = pd.merge(baseline_hexproximity,comparing_project[['hex_id']+attributes_list],on='hex_id')
    
    # ---------- Compare baseline (old) and project (new) attributes, saving the difference in a col and
    #            identifying hexs where time or count increased or decreased for any attribute.
    
    # Set to empty/0
    compare_list = []
    both_gdfs['time_increase'] = 0
    both_gdfs['time_decrease'] = 0
    both_gdfs['count_increase'] = 0
    both_gdfs['count_decrease'] = 0
    both_gdfs['new_availability'] = 0
    
    # Iterate over each attribute
    for attribute in attributes_list:

        # Test results skipping restaurants (Before get_seeds solution)
        #if (attribute == 'restaurantes_bar_cafe_time') | (attribute == 'restaurantes_bar_cafe_count_15min'):
        #    continue
    
        # Find attribute difference
        old_attribute = f"{attribute}_x"
        new_attribute = f"{attribute}_y"
        
        both_gdfs[f"{attribute}_diff"] = both_gdfs[new_attribute] - both_gdfs[old_attribute]
        
        # Register positive or negative difference
        if 'time' in attribute:
            idx_1 = both_gdfs[f"{attribute}_diff"]>0
            both_gdfs.loc[idx_1,'time_increase'] = 1
            idx_2 = both_gdfs[f"{attribute}_diff"]<0
            both_gdfs.loc[idx_2,'time_decrease'] = 1
            
        elif 'count' in attribute:
            idx_3 = both_gdfs[f"{attribute}_diff"]>0
            both_gdfs.loc[idx_3,'count_increase'] = 1
            idx_4 = both_gdfs[f"{attribute}_diff"]<0
            both_gdfs.loc[idx_4,'count_decrease'] = 1

        # Overwrite when the apparition of previously unavailable attributes has an effect on increase of time
        if 'time' in attribute:
            idx_0 = (both_gdfs[new_attribute]>0) & (both_gdfs[old_attribute]==0)
            both_gdfs.loc[idx_0,'time_increase'] = 0
            both_gdfs.loc[idx_0,'new_availability'] = 1
    
    # Save result
    if save:
        outputs_analysis_dir = accesibilidad_urbana + "data/external/santiago/outputs_analysis/"
        both_gdfs.to_file(outputs_analysis_dir + f"comparison_{comparison_id}_proximity_changes.gpkg", driver='GPKG')

    return both_gdfs

In [5]:
baseline_hexproximity = gpd.read_file(script27_output_dir + "project_01_osmnx/santiago_hexproximity_project_01.gpkg")

In [15]:
comparing_hexproximity = gpd.read_file(script27_output_dir + "project_04_osmnx/santiago_hexproximity_project_04.gpkg")

In [16]:
compared = compare_proximity(baseline_hexproximity,comparing_hexproximity,comparison_id='osm_p01vsp04_seedsfix',save=True)

# COMPARISON IDS ALREADY EXPLORED:
# FROM SCRIPT 27a THE PROBLEM WAS IDENTIFIED AS THE NETWORK. [OUTDATED]
# p01vsp02 --> Current situation (Works properly)
# p01vsp03 --> Current situation (Has undesired changes in proximity)
# p01vsp04 --> Current situation (Has undesired changes in proximity)

# Tests on p01(baseline) and p02(Plaza Italia, comparing):
# p01vsp02_01rerun --> Rerunning without changing anything test 
#                  --> Result is same as current situation (Both network and script work properly)

# p01vsp02_02redo --> Rerunning after redoing QGIS preprocessing steps 
#                 --> Result has undesired changes in proximity (QGIS preprocessing is the problem)

# p01vsp02_03reorder --> Changing the order with which QGIS preprocessing is done 
#                    --> Result has undesired changes in proximity (QGIS preprocessing may vary, not reliable)

# p01vsp02_04code --> Preprocess within script instead of in QGIS (Required recreating Nearest data)
#                 --> Result has a lot of undesired changes in proximity

# --------------------------------------
# FROM SCRIPT 27b:
# osm_p01vsp02 --> Works properly
# osm_p01vsp03 --> Appears to have problems in count_decrease and count_increase due to restaurantes_bar_count_15min
# osm_p01vsp04 --> Appears to have problems in count_decrease and count_increase due to restaurantes_bar_count_15min

# osm_p01vsp03_norestbar --> Analysis without restaurantes_bar_cafe_count_15min. Fixes problems. Problem located in restaurantes_bar_count_15min.
# osm_p01vsp04_norestbar --> Analysis without restaurantes_bar_cafe_count_15min. Fixes problems. Problem located in restaurantes_bar_count_15min.

# osm_p01vsp03_batchfix --> Changes in aup.pois_time() batch creation --> Did not solve the problem.

# osm_p01vsp02_seedsfix -->
# osm_p01vsp03_seedsfix -->
# osm_p01vsp04_seedsfix -->



#### Testing site - __HQSL__ comparison

In [11]:
def compare_hqsl(baseline_hexanalysis, comparing_project, comparison_id, save):
    # ----------
    attributes_list = ['hqsl']
    
    # ---------- Merge baseline and comparing project data
    both_gdfs = pd.merge(baseline_hexanalysis,comparing_project[['hex_id']+attributes_list],on='hex_id')
    
    # ---------- Compare baseline (old) and project (new) attributes, saving the difference in a col and
    #            identifying hexs where time or count increased or decreased for any attribute.
    
    # Set to empty/0
    compare_list = []
    both_gdfs['hqsl_changes'] = 0
    
    # Iterate over each attribute
    for attribute in attributes_list:
    
        # Find attribute difference
        old_attribute = f"{attribute}_x"
        new_attribute = f"{attribute}_y"
        
        both_gdfs[f"{attribute}_diff"] = both_gdfs[new_attribute] - both_gdfs[old_attribute]
        
        idx_1 = both_gdfs[f"{attribute}_diff"]>0
        both_gdfs.loc[idx_1,'hqsl_changes'] = 1
        idx_2 = both_gdfs[f"{attribute}_diff"]<0
        both_gdfs.loc[idx_2,'hqsl_changes'] = -1
    
    # Save result
    if save:
        outputs_analysis_dir = accesibilidad_urbana + "data/external/santiago/outputs_analysis/"
        both_gdfs.to_file(outputs_analysis_dir + f"comparison_{comparison_id}_hqsl_changes.gpkg", driver='GPKG')

    return both_gdfs

In [12]:
baseline_hexanalysis = gpd.read_file(script27_output_dir + "project_01_osmnx/santiago_hexanalysis_project_01.gpkg")

In [19]:
comparing_hexanalysis = gpd.read_file(script27_output_dir + "project_04_osmnx/santiago_hexanalysis_project_04.gpkg")

In [20]:
compared_analysis = compare_hqsl(baseline_hexanalysis,comparing_hexanalysis,comparison_id='osm_p01vsp04_seedsfix',save=True)

# COMPARISON IDS ALREADY EXPLORED:
# --------------------------------------
# FROM SCRIPT 27b:
# osm_p01vsp02 --> Presents HQSL decreases
# osm_p01vsp03 --> Presents HQSL decreases
# osm_p01vsp04 --> Presents HQSL decreases

# osm_p01vsp02_seedsfix --> 
# osm_p01vsp03_seedsfix --> 
# osm_p01vsp04_seedsfix --> 

### Finding problem - Test 1: Is the network/code the problem? Or is the problem located in project_03 and project_04 specifically?

* __Approach:__ Re-do the network process in QGIS for Project_01 (baseline) and Project_02 (plaza_italia) and re-run __without changing code.__
* __Result:__ Re-doing the network and re-running Plaza Italia produced problems in Plaza Italia. However, the cause could still be the code rather than the network.

### Finding problem - Test 2: Was the algorithm changed between projects?

* __Approach:__ Run Project_01 (baseline) and Project_02 (plaza_italia) again __without changing anything (same old existing network)__ and re-test to see if problems emerge.
* __Result:__ Re-using the old network gives the correct result for Plaza Italia. __The problem lies in how the network is created.__

### Finding problem - Test 3: Invert QGIS process

#### Instead of:
1. __Split lines with lines__
2. __Vector > Geometry Tools > Multipart to singleparts__
3. Extract specific vertices [0,-1]
4. MMQGIS Modify > Drop dups
#### Do:
1. __Vector > Geometry Tools > Multipart to singleparts__
2. __Split lines with lines__
3. Extract specific vertices [0,-1]
4. MMQGIS Modify > Drop dups

__Result:__ "NotImplementedError: Sub-geometries may have coordinate sequences, but multi-part geometries do not"

### Finding problem - Test 4: Compare working (old) and non-working (re-done) networks.

In [71]:
networks_dir = accesibilidad_urbana + "data/external/santiago/calidad_ep/"

# red_buena_calidad_pza_italia (project_02)
project02_original_nodes = gpd.read_file(networks_dir + "02_red_buena_calidad_pza_italia/red_buena_calidad_pza_italia_nodes.shp")
project02_re_do_nodes = gpd.read_file(networks_dir + "02_red_buena_calidad_pza_italia/project_02_nodes.shp")

In [72]:
print(project02_original_nodes.shape)
print(project02_original_nodes.crs)
project02_original_nodes.head(1)

(14948, 10)
EPSG:32719


Unnamed: 0,fid,Nom_Rut,pje_ep,vertex_pos,vertex_ind,vertex_par,vertex_p_1,distance,angle,geometry
0,1.0,Costanera Sur,0.549756,0,0,0,0,0.0,191.098679,POINT (350915.673 6302582.007)


In [73]:
print(project02_re_do_nodes.shape)
print(project02_re_do_nodes.crs)
project02_re_do_nodes.head(1)

(14949, 10)
EPSG:32719


Unnamed: 0,fid,Nom_Rut,pje_ep,vertex_pos,vertex_ind,vertex_par,vertex_p_1,distance,angle,geometry
0,1.0,Costanera Sur,0.549756,0,0,0,0,0.0,191.098679,POINT (350915.673 6302582.007)


In [74]:
original_nodes_lst = list(project02_original_nodes.geometry.unique())
re_do_nodes_lst = list(project02_re_do_nodes.geometry.unique())

# Collect geometries from the original nodes that are missing from the re-done nodes
geoms_list = []
for geom in original_nodes_lst:
    if geom not in re_do_nodes_lst:
        geoms_list.append(geom)

geoms_list

[<POINT (348041.604 6299041.683)>,
 <POINT (348026.727 6299078.686)>,
 <POINT (347981.729 6299031.689)>,
 <POINT (348002.114 6299040.092)>,
 <POINT (348050.963 6299108.82)>,
 <POINT (347995.039 6299056.535)>,
 <POINT (347995.309 6299056.763)>,
 <POINT (348002.115 6299040.092)>,
 <POINT (348002.114 6299040.092)>,
 <POINT (347995.309 6299056.763)>,
 <POINT (348050.963 6299108.82)>,
 <POINT (348093.918 6299108.686)>]

In [None]:
# Find that extra node
project02_re_do_nodes.loc[project02_re_do_nodes.geometry==Point(348041.6037999997, 6299041.6832)]

In [None]:
# Find that extra node
project02_re_do_nodes.loc[project02_re_do_nodes.geometry==Point(348026.7271999996, 6299078.6865)]

In [None]:
# Find nodes with differences between both processes
concat = pd.concat([project02_original_nodes,project02_re_do_nodes])
dropped = concat.drop_duplicates(keep=False)

dropped

In [None]:
# Save resulting nodes
dropped = dropped.drop(columns='fid')
outputs_analysis_dir = accesibilidad_urbana + "data/external/santiago/outputs_analysis/"
dropped.to_file(outputs_analysis_dir+'project_02_different_nodes.gpkg', driver='GPKG')
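
Several points in the geoms_list output above look almost identical (e.g. 348002.114 vs 348002.115), which suggests that some nodes may differ only at the millimetre level between the two processes. The sketch below, in plain Python, shows how an exact comparison and a tolerance-based comparison can disagree about which nodes are "different". The coordinates are illustrative values taken from the rounded output, and `rounded_key` and its 0.1 m tolerance are hypothetical choices, not part of the original workflow.

```python
def rounded_key(x, y, decimals=1):
    """Snap a coordinate pair to a grid (0.1 m when decimals=1, in metres)."""
    return (round(x, decimals), round(y, decimals))

# Illustrative node coordinates (EPSG:32719, metres).
original = [(348002.114, 6299040.092), (347995.039, 6299056.535)]
re_done = [(348002.115, 6299040.092), (347995.039, 6299056.535)]

# Exact comparison: a 1 mm shift makes the first node look missing.
re_done_exact = set(re_done)
exact_missing = [p for p in original if p not in re_done_exact]

# Tolerant comparison: both versions of the node fall in the same grid cell.
re_done_keys = {rounded_key(*p) for p in re_done}
tolerant_missing = [p for p in original if rounded_key(*p) not in re_done_keys]

print(len(exact_missing), len(tolerant_missing))  # prints: 1 0
```

Grid snapping is only a rough tolerance: two points on opposite sides of a cell boundary can still land in different cells, so a distance-based check would be more robust if this comparison were used in practice.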

# Script 27b debugging (Restaurantes bar cafe problem)

* __Problem:__ The restaurantes_bar_cafe_count_15min column decreases after implementing projects.
* __Result:__ The problem uncovered an unnecessary set() in the function aup.get_seeds(). The set removed duplicate nodes, so if one node had 3 restaurants, only 1 was counted, while time remained unaffected. The set() was removed.
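
The effect described above can be illustrated with a small, self-contained sketch. It is not the code of aup.get_seeds(); the node ids and variable names are hypothetical and only show why deduplicating nearest nodes lowers counts while leaving the set of reachable nodes (and therefore times) unchanged.

```python
from collections import Counter

# Hypothetical nearest-node ids for five POIs; three of them snap to node 101.
poi_nearest_nodes = [101, 101, 101, 202, 303]

# Deduplicating keeps one entry per node, so POIs sharing a node collapse.
seeds_with_set = list(set(poi_nearest_nodes))
# Keeping the list as-is preserves one entry per POI.
seeds_without_set = list(poi_nearest_nodes)

count_with_set = Counter(seeds_with_set)[101]
count_without_set = Counter(seeds_without_set)[101]
print(count_with_set, count_without_set)  # prints: 1 3

# The nodes used as sources are the same either way, so travel times
# to the closest POI would not change.
same_sources = set(seeds_with_set) == set(seeds_without_set)
```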

### Script 27b debugging - Exploration 1: batch size creation in function pois_time()
* The problem was believed to be here. It was not, but the code was still changed from __if len(x) % 250:__ to __if (len(x) % 250) == 0:__
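
The two conditions are opposites: `len(x) % 250` is truthy when the length is NOT a multiple of 250, while `(len(x) % 250) == 0` is true only when it is. The sketch below checks this on the list sizes used in the cells that follow, and adds a hypothetical helper, `make_batches`, that slices with a step instead of computing `int(len/size)+1`. That helper is an assumption for illustration, not the implementation inside pois_time().

```python
def old_condition(n):
    # Truthy when n is NOT a multiple of 250.
    return bool(n % 250)

def new_condition(n):
    # True only when n IS a multiple of 250.
    return (n % 250) == 0

def make_batches(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]

for n in (1486, 1500):
    print(n, old_condition(n), new_condition(n))

batches_1486 = make_batches(list(range(1486)), 250)
batches_1500 = make_batches(list(range(1500)), 250)
print(len(batches_1486), len(batches_1486[-1]))
print(len(batches_1500), len(batches_1500[-1]))
```

Stepping through the list with `range(0, len(items), size)` never produces an empty trailing batch, so an exact multiple of the batch size needs no special case.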

In [147]:
attributes_list = ['carniceria_time','carniceria_count_15min','hogar_time','hogar_count_15min','bakeries_time','bakeries_count_15min',
                       'supermercado_time','supermercado_count_15min','banco_time','banco_count_15min','ferias_time','ferias_count_15min',
                       'local_mini_market_time','local_mini_market_count_15min','correos_time','correos_count_15min','centro_recyc_time','centro_recyc_count_15min',
                       'hospital_priv_time','hospital_priv_count_15min','hospital_pub_time','hospital_pub_count_15min','clinica_priv_time','clinica_priv_count_15min',
                       'clinica_pub_time','clinica_pub_count_15min','farmacia_time','farmacia_count_15min','vacunatorio_priv_time','vacunatorio_priv_count_15min',
                       'vacunatorio_pub_time','vacunatorio_pub_count_15min','consult_ado_priv_time','consult_ado_priv_count_15min','consult_ado_pub_time','consult_ado_pub_count_15min',
                       'salud_mental_time','salud_mental_count_15min','labs_priv_time','labs_priv_count_15min','residencia_adumayor_time','residencia_adumayor_count_15min',
                       'eq_deportivo_priv_time','eq_deportivo_priv_count_15min','eq_deportivo_pub_time','eq_deportivo_pub_count_15min','club_deportivo_time','club_deportivo_count_15min',
                       'civic_office_time','civic_office_count_15min','tax_collection_time','tax_collection_count_15min','social_security_time','social_security_count_15min',
                       'police_time','police_count_15min','bomberos_time','bomberos_count_15min','museos_priv_time','museos_priv_count_15min','museos_pub_time','museos_pub_count_15min',
                       'cines_time','cines_count_15min','sitios_historicos_time','sitios_historicos_count_15min','restaurantes_bar_cafe_time','restaurantes_bar_cafe_count_15min',
                       'librerias_time','librerias_count_15min','ep_plaza_small_time','ep_plaza_small_count_15min','ep_plaza_big_time','ep_plaza_big_count_15min',
                       'edu_basica_pub_time','edu_basica_pub_count_15min','edu_media_pub_time','edu_media_pub_count_15min','jardin_inf_pub_time','jardin_inf_pub_count_15min',
                       'universidad_time','universidad_count_15min','edu_tecnica_time','edu_tecnica_count_15min','edu_adultos_pub_time','edu_adultos_pub_count_15min',
                       'edu_especial_pub_time','edu_especial_pub_count_15min','bibliotecas_time','bibliotecas_count_15min','centro_edu_amb_time','centro_edu_amb_count_15min',
                       'paradas_tp_ruta_time','paradas_tp_ruta_count_15min','paradas_tp_metro_time','paradas_tp_metro_count_15min','paradas_tp_tren_time','paradas_tp_tren_count_15min',
                       'ciclovias_time','ciclovias_count_15min']

# 499 items
#short_list = attributes_list.copy()
#short_list.remove('ciclovias_count_15min')
#attributes_list = (attributes_list*4)
#attributes_list.extend(short_list)

# 500 items
#attributes_list = attributes_list*5

# 501 items
#attributes_list = attributes_list*5+['test']
print(len(attributes_list))

100


In [157]:
# 1486 items
extended_list = (attributes_list*15)
short_list = extended_list.copy()
remove_items = ['edu_especial_pub_time','edu_especial_pub_count_15min','bibliotecas_time','bibliotecas_count_15min','centro_edu_amb_time','centro_edu_amb_count_15min',
                'paradas_tp_ruta_time','paradas_tp_ruta_count_15min','paradas_tp_metro_time','paradas_tp_metro_count_15min','paradas_tp_tren_time','paradas_tp_tren_count_15min',
                'ciclovias_time','ciclovias_count_15min']
for item in remove_items:
    short_list.remove(item)
test_list = short_list.copy()
print(len(test_list))

1486


In [156]:
# 1490 items
extended_list = (attributes_list*15)
short_list = extended_list.copy()
remove_items = ['edu_especial_pub_time','edu_especial_pub_count_15min','bibliotecas_time','bibliotecas_count_15min','centro_edu_amb_time','centro_edu_amb_count_15min',
                'paradas_tp_ruta_time','paradas_tp_ruta_count_15min','paradas_tp_metro_time','paradas_tp_metro_count_15min']
for item in remove_items:
    short_list.remove(item)
test_list = short_list.copy()
print(len(test_list))

1490


In [154]:
# New condition: use batches of 200 POIs when the length is a multiple of 250; otherwise use batches of 250.
if (len(test_list) % 250) == 0:
    print("Escenario 1")
    batch_size = len(test_list)/200
    print(f"{int(batch_size)+1} batches")
    for k in range(int(batch_size)+1):
        
        print(f"Iteración {k}")
        source_process = test_list[int(200*k):int(200*(1+k))]
        print(f"Analysing from {int(200*k)} to {int(200*(1+k))}. ({len(source_process)} datas).")
        
else:
    print("Escenario 2")
    batch_size = len(test_list)/250
    print(f"{int(batch_size)+1} batches")
    for k in range(int(batch_size)+1):
        
        print(f"Iteración {k}")
        source_process = test_list[int(250*k):int(250*(1+k))]
        print(f"Analysing from {int(250*k)} to {int(250*(1+k))}. ({len(source_process)} datas).")

Escenario 2
6 batches
Iteración 0
Analysing from 0 to 250. (250 datas).
Iteración 1
Analysing from 250 to 500. (250 datas).
Iteración 2
Analysing from 500 to 750. (250 datas).
Iteración 3
Analysing from 750 to 1000. (250 datas).
Iteración 4
Analysing from 1000 to 1250. (250 datas).
Iteración 5
Analysing from 1250 to 1500. (240 datas).


In [158]:
# Old condition: uses batches of 200 POIs whenever the length is NOT a multiple of 250; otherwise batches of 250.
if len(test_list) % 250:
    batch_size = len(test_list)/201 # <----------- ADDED +1
    print(f"{int(batch_size)+1} batches")
    for k in range(int(batch_size)+1):
        
        print(f"Iteración {k}")
        source_process = test_list[int(200*k):int(200*(1+k))]
        print(f"Analysing from {int(200*k)} to {int(200*(1+k))}. ({len(source_process)} datas).")
        
else:
    batch_size = len(test_list)/251 # <----------- ADDED +1
    print(f"{int(batch_size)+1} batches")
    for k in range(int(batch_size)+1):
        
        print(f"Iteración {k}")
        source_process = test_list[int(250*k):int(250*(1+k))]
        print(f"Analysing from {int(250*k)} to {int(250*(1+k))}. ({len(source_process)} datas).")

8 batches
Iteración 0
Analysing from 0 to 200. (200 datas).
Iteración 1
Analysing from 200 to 400. (200 datas).
Iteración 2
Analysing from 400 to 600. (200 datas).
Iteración 3
Analysing from 600 to 800. (200 datas).
Iteración 4
Analysing from 800 to 1000. (200 datas).
Iteración 5
Analysing from 1000 to 1200. (200 datas).
Iteración 6
Analysing from 1200 to 1400. (200 datas).
Iteración 7
Analysing from 1400 to 1600. (86 datas).


### Script 27b debugging - Exploration 2: Restaurantes bar cafe batches
* The batches created were extracted and analysed until the set() solution was discovered.
* __In order to replicate results__ it would be necessary to save the nearest and nodes_distance_prep (batches) used inside pois_time() for each source.

#### Precalculated nearest

In [216]:
nearest_dir = accesibilidad_urbana + "data/external/santiago/nearest/"
original_nearest = gpd.read_file(nearest_dir + "nearest_restaurantes_bar_cafe.gpkg")
print(original_nearest.shape)

osmid_check_list = list(red_buena_calidad_nearest.reset_index().osmid.unique())
original_nearest = original_nearest.loc[original_nearest.osmid.isin(osmid_check_list)]

print(original_nearest.shape)
original_nearest.tail(2)

(2370, 20)
(1486, 20)


Unnamed: 0,rut,dv,vigenci,fecha,tipo_di,calle,numero,bloque,departa,villa_p,ciudad,comuna,region,sngldrs,empresa,index_right,city,osmid,distance_node,geometry
2367,99533970,6,N,2015-12-24,SUCURSAL,SANCHEZ FONTECILLA,310,,,,SANTIAGO,LAS CONDES,XIII REGION METROPOLITANA,"SANCHEZ FONTECILLA 310, LAS CONDES, Chile",INVERSIONES GRS S A,0,alamedabuffer_4500m,2910188259,55.432913,POINT (-70.59839 -33.41998)
2368,99546720,8,N,2013-05-20,SUCURSAL,COMPANIA DE JESUS,2217,,,,SANTIAGO,SANTIAGO,XIII REGION METROPOLITANA,"COMPANIA DE JESUS 2217, SANTIAGO, Chile",COMERCIAL BARRERA HERMANOS SPA,0,alamedabuffer_4500m,253411470,42.68583,POINT (-70.66729 -33.44010)


In [226]:
test = original_nearest.iloc[0:250]
print(test.shape)
test.tail(1)

(250, 20)


Unnamed: 0,rut,dv,vigenci,fecha,tipo_di,calle,numero,bloque,departa,villa_p,ciudad,comuna,region,sngldrs,empresa,index_right,city,osmid,distance_node,geometry
366,76152699,5,S,2016-02-02,SUCURSAL,AV PAJARITOS,4500,,9,,SANTIAGO,MAIPU,XIII REGION METROPOLITANA,"AV PAJARITOS 4500, MAIPU, Chile",PJ CHILE SPA,0,alamedabuffer_4500m,443979608,37.375551,POINT (-70.74829 -33.48022)


#### Red buena calidad

In [327]:
osmid = 254346788
network = 'testarea' #alamedabuffer or testarea

In [323]:
red_buena_calidad_nearest = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_nearest.gpkg")
red_buena_calidad_batch_0 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch0.gpkg")
print(f"Batch 0: {len(red_buena_calidad_batch_0.loc[red_buena_calidad_batch_0.osmid==osmid])}.")

if network == 'alamedabuffer':
    red_buena_calidad_batch_1 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch1.gpkg")
    print(f"Batch 1: {len(red_buena_calidad_batch_1.loc[red_buena_calidad_batch_1.osmid==osmid])}.")
    red_buena_calidad_batch_2 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch2.gpkg")
    print(f"Batch 2: {len(red_buena_calidad_batch_2.loc[red_buena_calidad_batch_2.osmid==osmid])}.")
    red_buena_calidad_batch_3 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch3.gpkg")
    print(f"Batch 3: {len(red_buena_calidad_batch_3.loc[red_buena_calidad_batch_3.osmid==osmid])}.")
    red_buena_calidad_batch_4 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch4.gpkg")
    print(f"Batch 4: {len(red_buena_calidad_batch_4.loc[red_buena_calidad_batch_4.osmid==osmid])}.")
    red_buena_calidad_batch_5 = gpd.read_file(script27_output_dir + f"red_buena_calidad_batches_{network}/restaurantes_batch5.gpkg")
    print(f"Batch 5: {len(red_buena_calidad_batch_5.loc[red_buena_calidad_batch_5.osmid==osmid])}.")

Batch 0: 1.


In [328]:
red_buena_calidad_batch_0.loc[red_buena_calidad_batch_0.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry
2,254346788,2,-70.674783,-33.434185,4,3.61636,5,POINT (-70.67478 -33.43419)


In [329]:
red_buena_calidad_batch_1.loc[red_buena_calidad_batch_1.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry


In [317]:
red_buena_calidad_batch_2.loc[red_buena_calidad_batch_2.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry
1256,254343802,1522,-70.672945,-33.435041,4,0.0,1,POINT (-70.67295 -33.43504)


In [318]:
red_buena_calidad_batch_3.loc[red_buena_calidad_batch_3.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry
1255,254343802,1522,-70.672945,-33.435041,4,0.0,2,POINT (-70.67295 -33.43504)


In [330]:
red_buena_calidad_batch_4.loc[red_buena_calidad_batch_4.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry


In [331]:
red_buena_calidad_batch_5.loc[red_buena_calidad_batch_5.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry


In [304]:
nearest_in_node = red_buena_calidad_nearest.loc[red_buena_calidad_nearest.osmid==osmid]
print(len(nearest_in_node))
print(nearest_in_node.original_id.unique())

4
[1176 1468 1497 1585]


#### Norte sur

In [187]:
norte_sur_batch_0 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch0.gpkg")
norte_sur_batch_1 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch1.gpkg")
norte_sur_batch_2 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch2.gpkg")
norte_sur_batch_3 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch3.gpkg")
norte_sur_batch_4 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch4.gpkg")
norte_sur_batch_5 = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_batch5.gpkg")
norte_sur_nearest = gpd.read_file(script27_output_dir + "norte_sur_batches/restaurantes_nearest.gpkg")

In [204]:
nearest_in_node = norte_sur_nearest.loc[norte_sur_nearest.osmid==osmid]
print(len(nearest_in_node))
print(nearest_in_node.original_id.unique())

4
[1176 1468 1497 1585]


In [241]:
print(len(norte_sur_batch_0.loc[norte_sur_batch_0.osmid==osmid]))
print(len(norte_sur_batch_1.loc[norte_sur_batch_1.osmid==osmid]))
print(len(norte_sur_batch_2.loc[norte_sur_batch_2.osmid==osmid]))
print(len(norte_sur_batch_3.loc[norte_sur_batch_3.osmid==osmid]))
print(len(norte_sur_batch_4.loc[norte_sur_batch_4.osmid==osmid]))
print(len(norte_sur_batch_5.loc[norte_sur_batch_5.osmid==osmid]))

0
0
0
1
0
0


In [242]:
norte_sur_batch_3.loc[norte_sur_batch_3.osmid==osmid]

Unnamed: 0,osmid,index,x,y,street_count,dist_restaurantes_bar_cafe,restaurantes_bar_cafe_15min,geometry
1290,254346790,1581,-70.673038,-33.434066,4,1.45068,2,POINT (-70.67304 -33.43407)
