# 05c-download_december2020_osmnx_network_metropolis2020

This notebook iterates over the cities in `metropolis.metro_gdf_2020`, creates a network (G, nodes, edges) for a bounding box (bbox) around each city, and saves it to the database.

This notebook downloads OpenStreetMap data through OSMnx as of __December 2020__. The __Mexican 2020 census__ took place between March 2 and 27, 2020, __but the April 2020 network was found to be partially incomplete__, so December 2020 is used instead. The tests for downloading OSMnx data from specific dates can be found in notebook 00b.

#### __From the OSMnx module: the date must be in the form yyyy-mm-ddThh:mm:ssZ__
#### __Example: '[out:json][timeout:90][date:"2019-10-28T19:20:00Z"]'__
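
The `date` setting asks the Overpass API for the map as it was at that moment. A minimal sketch of building the settings string above from a timezone-aware `datetime`, so the `Z` (UTC) suffix is always correct. The helper name `overpass_date_settings` is hypothetical. In recent OSMnx releases such a string is normally assigned to `ox.settings.overpass_settings`; it is an assumption that `aup.create_osmnx_network(..., specific_date=(True, query_date))` does something equivalent internally.

```python
from datetime import datetime, timezone


def overpass_date_settings(moment: datetime, timeout: int = 90) -> str:
    """Build an Overpass QL settings prefix pinned to a historical date.

    Hypothetical helper: the date is rendered as yyyy-mm-ddThh:mm:ssZ,
    so `moment` is converted to UTC before formatting.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'[out:json][timeout:{timeout}][date:"{stamp}"]'


settings = overpass_date_settings(datetime(2020, 12, 31, 23, 59, tzinfo=timezone.utc))
print(settings)  # [out:json][timeout:90][date:"2020-12-31T23:59:00Z"]
```

Rejecting naive datetimes avoids silently shifting the snapshot by the local UTC offset.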

* The first part runs all cities except ZMVM (too heavy).
* The second part runs ZMVM one municipality at a time (the bboxes overlap with each other, so duplicates must be dropped).
* The third part contains tests.

## Import libraries

In [1]:
main_folder_path = '../../'

In [2]:
import os
import sys

import pandas as pd
import geopandas as gpd
import osmnx as ox
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

module_path = os.path.abspath(os.path.join(main_folder_path))
if module_path not in sys.path:
    sys.path.append(module_path)
# Import outside the if-block so aup is available even when the path was already added
import aup

## Load all 2020 cities

In [3]:
# gdf_mun
metro_schema = 'metropolis'
metro_table = 'metro_gdf_2020'

query = f"SELECT * FROM {metro_schema}.{metro_table}"
metro_gdf = aup.gdf_from_query(query, geometry_col='geometry')
metro_gdf = metro_gdf.set_crs("EPSG:4326")

city_list = list(metro_gdf.city.unique())

# Show
print(metro_gdf.shape)
print(len(city_list))
print(city_list)
metro_gdf.head(1)

(367, 6)
71
['Aguascalientes', 'Ensenada', 'Mexicali', 'Tijuana', 'La Paz', 'Los Cabos', 'Campeche', 'Laguna', 'Monclova', 'Piedras Negras', 'Saltillo', 'Colima', 'Tapachula', 'Tuxtla', 'Chihuahua', 'Delicias', 'Juarez', 'CDMX', 'ZMVM', 'Durango', 'Celaya', 'Guanajuato', 'Leon', 'Irapuato', 'Acapulco', 'Chilpancingo', 'Pachuca', 'Tulancingo', 'Guadalajara', 'Vallarta', 'Piedad', 'Toluca', 'Morelia', 'Zamora', 'Uruapan', 'Cuautla', 'Cuernavaca', 'Tepic', 'Monterrey', 'Oaxaca', 'Puebla', 'San Martin', 'Tehuacan', 'Queretaro', 'Cancun', 'Chetumal', 'Playa', 'SLP', 'Culiacan', 'Los Mochis', 'Mazatlan', 'Guaymas', 'Ciudad Obregon', 'Hermosillo', 'Nogales', 'Villahermosa', 'Victoria', 'Matamoros', 'Nuevo Laredo', 'Reynosa', 'Tampico', 'Tlaxcala', 'Coatzacoalcos', 'Cordoba', 'Minatitlan', 'Orizaba', 'Poza Rica', 'Veracruz', 'Xalapa', 'Merida', 'Zacatecas']


|   | CVEGEO | CVE_ENT | CVE_MUN | NOMGEO | geometry | city |
|---|---|---|---|---|---|---|
| 0 | 1001 | 1 | 1 | Aguascalientes | POLYGON ((-102.10641 22.06035, -102.10368 22.0... | Aguascalientes |
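
Each city is the union of its municipalities (`dissolve()` in the first part), and `create_osmnx_network(how='from_bbox')` logs the N/S/E/W extent of that area before downloading. A minimal stdlib sketch of that extent computation, assuming plain (lon, lat) rings in EPSG:4326; with geopandas the same numbers come from `aoi.total_bounds`, which returns `(minx, miny, maxx, maxy)`. The helper `bbox_nsew` and both outlines are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float]  # (lon, lat), as in EPSG:4326 geometries


def bbox_nsew(polygons: Iterable[Iterable[Coord]]) -> Dict[str, float]:
    """Return the N/S/E/W extent covering every ring of one city (hypothetical helper)."""
    lons, lats = [], []
    for ring in polygons:
        for lon, lat in ring:
            lons.append(lon)
            lats.append(lat)
    return {"N": max(lats), "S": min(lats), "E": max(lons), "W": min(lons)}


# Two hypothetical municipality outlines belonging to the same metropolitan area
mun_a = [(-102.40, 21.80), (-102.20, 21.80), (-102.20, 22.00), (-102.40, 22.00)]
mun_b = [(-102.25, 21.95), (-102.05, 21.95), (-102.05, 22.10), (-102.25, 22.10)]
print(bbox_nsew([mun_a, mun_b]))  # {'N': 22.1, 'S': 21.8, 'E': -102.05, 'W': -102.4}
```

Because the bbox is a rectangle, the download also includes streets outside the municipalities themselves.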


## First part - Run create_osmnx_network for each city (except ZMVM) and save

In [9]:
# Date to query
query_date = '[out:json][timeout:90][date:"2020-12-31T23:59:00Z"]'

# Save locally?
local_save = False
local_save_dir = main_folder_path + "data/processed/networks/"
# Save to database?
db_save = False
nodes_table = 'nodes_osmnx_20_point'
edges_table = 'edges_osmnx_20_line'

# Test configuration
# If test, saves locally only (overrides local_save and db_save) and runs test_city_lst only
test = False
test_city_lst = ['Aguascalientes']
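
The next cell resumes after a crash by skipping cities already present in the database, but it only processes a city when it is missing from *both* tables, so a city whose nodes were uploaded but whose edges were not would never get its edges. A minimal sketch, using the hypothetical helper `plan_run`, of classifying cities into done, partial and pending with set membership. Partial cities would need their existing rows deleted before being downloaded again, otherwise the `if_exists='append'` upload stores their nodes (or edges) twice.

```python
from typing import Iterable, List, Tuple


def plan_run(
    city_list: Iterable[str],
    nodes_done: Iterable[str],
    edges_done: Iterable[str],
    skip: Iterable[str] = (),
) -> Tuple[List[str], List[str], List[str]]:
    """Split cities into (done, partial, pending), preserving city_list order.

    done:    present in both the nodes and the edges table
    partial: present in only one table (e.g. a crash between the two uploads)
    pending: present in neither table and not explicitly skipped
    """
    nodes_done, edges_done, skip = set(nodes_done), set(edges_done), set(skip)
    done, partial, pending = [], [], []
    for city in city_list:
        if city in skip:
            continue
        in_nodes, in_edges = city in nodes_done, city in edges_done
        if in_nodes and in_edges:
            done.append(city)
        elif in_nodes or in_edges:
            partial.append(city)
        else:
            pending.append(city)
    return done, partial, pending


cities = ["Aguascalientes", "Ensenada", "Mexicali", "ZMVM"]
done, partial, pending = plan_run(
    cities,
    nodes_done=["Aguascalientes", "Ensenada"],
    edges_done=["Aguascalientes"],
    skip=["ZMVM"],
)
print(done, partial, pending)  # ['Aguascalientes'] ['Ensenada'] ['Mexicali']
```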

In [5]:
# Test run
if test:
    # Configuration for test_city_list
    city_list = test_city_lst
    skip_city_list = []
    nodes_processed_city_list = []
    edges_processed_city_list = []
    i = 0
    k = len(city_list)
    # Save configuration
    local_save = True
    db_save = False
# Complete run
else:
    k = len(city_list)
    
    # prevent cities being analyzed several times in case of a crash
    nodes_processed_city_list = []
    edges_processed_city_list = []
    try:
        query = f"SELECT city FROM osmnx.{nodes_table}"
        nodes_processed = aup.df_from_query(query)
        nodes_processed_city_list = list(nodes_processed.city.unique())
        query = f"SELECT city FROM osmnx.{edges_table}"
        edges_processed = aup.df_from_query(query)
        edges_processed_city_list = list(edges_processed.city.unique())
    except Exception:
        # Tables do not exist yet (first run): nothing has been processed
        nodes_processed_city_list = []
        edges_processed_city_list = []
    
    # PRINT NODES PROGRESS
    missing_cities_nodes = []
    for city in city_list:
        if city not in nodes_processed_city_list:
            missing_cities_nodes.append(city)
    i = len(nodes_processed_city_list)
    print(f'Already processed nodes for ({i}/{k}) cities.')
    print(f'Missing nodes processing for cities: {missing_cities_nodes}')
    # PRINT EDGES PROGRESS
    missing_cities_edges = []
    for city in city_list:
        if city not in edges_processed_city_list:
            missing_cities_edges.append(city)
    j = len(edges_processed_city_list)
    print(f'Already processed edges for ({j}/{k}) cities.')
    print(f'Missing edges processing for cities: {missing_cities_edges}')
    
    # SKIP SPECIFIC CITIES
    skip_city_list = ['ZMVM'] #Skipping 'ZMVM' because of size
    k = k - len(skip_city_list)
    print(f'Removing {len(skip_city_list)} cities from processing. Total cities: {k}. Cities removed:')
    print(skip_city_list)

# RUN ALL OTHER CITIES
for city in city_list:
    if city not in skip_city_list:
        if (city not in nodes_processed_city_list) and (city not in edges_processed_city_list):
            print("--"*40)
            i = i + 1
            print(f"Starting city {i}/{k}: {city}")
        
            # Load area of interest
            city_gdf = metro_gdf.loc[metro_gdf.city == city]
            aoi = city_gdf.dissolve()
            
            # Create Network
            G,nodes,edges = aup.create_osmnx_network(aoi, 
                                                     how='from_bbox', 
                                                     network_type='all_private', 
                                                     specific_date=(True,query_date))
            # Add city data
            nodes['city'] = city
            edges['city'] = city

            # Reset index 
            # Function create_osmnx_network() returns 'osmid' as the nodes index and 'u', 'v' and 'key' as the edges index.
            # These index columns are not uploaded to the database if kept as the index, so they are moved to regular columns.
            nodes.reset_index(inplace=True)
            edges.reset_index(inplace=True)

            # Save network locally
            if local_save:
                print(f"Saving {city} nodes locally.")
                nodes.to_file(local_save_dir + f"{city}_dec2020_nodes.gpkg", driver='GPKG')
                print(f"Saving {city} edges locally.")
                edges.to_file(local_save_dir + f"{city}_dec2020_edges.gpkg", driver='GPKG')
            
            # Save network to database
            if db_save:
                print(f"Uploading {city} nodes to database")
                aup.gdf_to_db_slow(nodes, nodes_table, 'osmnx', if_exists='append')
                print(f"Uploading {city} edges to database")
                aup.gdf_to_db_slow(edges, edges_table, 'osmnx', if_exists='append')

        else:
            print("--"*40)
            print(f"{city} already processed. ({i}/{k})")
            
    else:
        print("--"*40)
        print(f"SKIPPED {city}")

Already processed nodes for (0/71) cities.
Missing nodes procesing for cities: ['Aguascalientes', 'Ensenada', 'Mexicali', 'Tijuana', 'La Paz', 'Los Cabos', 'Campeche', 'Laguna', 'Monclova', 'Piedras Negras', 'Saltillo', 'Colima', 'Tapachula', 'Tuxtla', 'Chihuahua', 'Delicias', 'Juarez', 'CDMX', 'ZMVM', 'Durango', 'Celaya', 'Guanajuato', 'Leon', 'Irapuato', 'Acapulco', 'Chilpancingo', 'Pachuca', 'Tulancingo', 'Guadalajara', 'Vallarta', 'Piedad', 'Toluca', 'Morelia', 'Zamora', 'Uruapan', 'Cuautla', 'Cuernavaca', 'Tepic', 'Monterrey', 'Oaxaca', 'Puebla', 'San Martin', 'Tehuacan', 'Queretaro', 'Cancun', 'Chetumal', 'Playa', 'SLP', 'Culiacan', 'Los Mochis', 'Mazatlan', 'Guaymas', 'Ciudad Obregon', 'Hermosillo', 'Nogales', 'Villahermosa', 'Victoria', 'Matamoros', 'Nuevo Laredo', 'Reynosa', 'Tampico', 'Tlaxcala', 'Coatzacoalcos', 'Cordoba', 'Minatitlan', 'Orizaba', 'Poza Rica', 'Veracruz', 'Xalapa', 'Merida', 'Zacatecas']
Already processed nodes for (0/71) cities.
Missing nodes procesing for 

Created OSMnx graph from bounding box.
Converted OSMnx graph to 40856 nodes and 115794 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Column: width in edges gdf, has a list in it, the column data was converted to string.
Column: service in edges gdf, has a list in it, the column data was converted to string.
Uploading Ensenada nodes to database
Uploading Ensenada edges to database
--------------------------------------------------------------------------------
Starting city 3/70: 

Created OSMnx graph from bounding box.
Converted OSMnx graph to 77887 nodes and 226006 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Column: access in edges gdf, has a list in it, the column data was converted to string.
Column: service in edges gdf, has a list in it, the column data was converted to string.
Uploading Mexicali nodes to database
Uploading Mexicali edges to database
--------------------------------------------------------------------------------
Starting city 4/70:

Created OSMnx graph from bounding box.
Converted OSMnx graph to 22359 nodes and 63944 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Uploading La Paz nodes to database
Uploading La Paz edges to database
--------------------------------------------------------------------------------
Starting city 6/70: Los Cabos
Extracted min and max coordinates from the municipality. Polygon N:23.67148, S:22.87195, E-109.41317, W-110.12041.
Created OSMnx graph from bounding box.
Converted OSMnx g

Created OSMnx graph from bounding box.
Converted OSMnx graph to 36814 nodes and 107472 edges GeoDataFrame.
Added column width for edges.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Uploading Monclova nodes to database
Uploading Monclova edges to database
--------------------------------------------------------------------------------
Starting city 10/70: Piedras Negras
Extracted min and max coordinates from the municipality. Polygon N:28.94075, S:28.3331, E-100.3968, W-100.8393.
Created OSMnx graph from bounding box.
Converted OSMnx graph to 15384 nodes and 44289 edges GeoDataFram

Created OSMnx graph from bounding box.
Converted OSMnx graph to 181819 nodes and 471397 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Column: access in edges gdf, has a list in it, the column data was converted to string.
Column: width in edges gdf, has a list in it, the column data was converted to string.
Column: service in edges gdf, has a list in it, the column data was converted to string.
Uploading Saltillo nodes to database
Uploading Saltillo edges to database
------------

Created OSMnx graph from bounding box.
Converted OSMnx graph to 82415 nodes and 227843 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Column: access in edges gdf, has a list in it, the column data was converted to string.
Uploading Chihuahua nodes to database
Uploading Chihuahua edges to database
--------------------------------------------------------------------------------
Starting city 16/70: Delicias
Extracted min and max coordinates from the municipality. Polygon N:28.60833,

Created OSMnx graph from bounding box.
Converted OSMnx graph to 43542 nodes and 116967 edges GeoDataFrame.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: lanes in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Column: ref in edges gdf, has a list in it, the column data was converted to string.
Column: service in edges gdf, has a list in it, the column data was converted to string.
Uploading Hermosillo nodes to database
Uploading Hermosillo edges to database
--------------------------------------------------------------------------------
Starting city 54/70: Nogales
Extracted min and max coordinates from the municipality. Polygon N:31.3771
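
The log lines above report that list-valued edge columns (`osmid`, `lanes`, `name`, `highway` and so on) were converted to strings. OSMnx produces such lists when graph simplification merges several OSM ways with different tag values into a single edge. A minimal pandas sketch of that conversion; `stringify_list_columns` is a hypothetical stand-in, not the actual `aup` implementation.

```python
import pandas as pd


def stringify_list_columns(df: pd.DataFrame) -> list:
    """Cast every column holding at least one list to str, in place.

    Hypothetical helper. Returns the names of the converted columns.
    """
    converted = []
    for col in df.columns:
        # True if any cell in this column is a Python list
        if df[col].map(lambda v: isinstance(v, list)).any():
            df[col] = df[col].astype(str)
            converted.append(col)
    return converted


# Second edge merges two OSM ways, so its osmid and highway values are lists
edges = pd.DataFrame({
    "osmid": [101, [102, 103]],
    "highway": ["residential", ["primary", "secondary"]],
    "length": [12.5, 40.0],
})
print(stringify_list_columns(edges))  # ['osmid', 'highway']
print(edges.loc[1, "osmid"])          # [102, 103]  (now the string "[102, 103]")
```

Note that `astype(str)` also turns missing values into the strings `'nan'` or `'None'`, so null checks on converted columns will not detect them.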

## Second part - Run create_osmnx_network for ZMVM
#### (Uploads one municipality at a time; the municipal bboxes overlap, so __duplicates must be deleted.__)

In [8]:
# Filter for ZMVM municipalities
city = 'ZMVM'
zmvm_gdf = metro_gdf.loc[metro_gdf.city == city]
k = len(list(zmvm_gdf.NOMGEO.unique()))
i = 1

for nomgeo in list(zmvm_gdf.NOMGEO.unique()):
    
    print(f"Starting mun {i}/{k}: {nomgeo}")

    # Load area of interest
    mun_gdf = zmvm_gdf.loc[zmvm_gdf.NOMGEO == nomgeo]
    aoi = mun_gdf.dissolve()
    
    # Create Network
    G,nodes,edges = aup.create_osmnx_network(aoi, 
                                             how='from_bbox', 
                                             network_type='all_private', 
                                             specific_date=(True,query_date))
    
    # Add city data
    nodes['city'] = city
    edges['city'] = city

    # Reset index 
    # Function create_osmnx_network() returns 'osmid' as the nodes index and 'u', 'v' and 'key' as the edges index.
    # These index columns are not uploaded to the database if kept as the index, so they are moved to regular columns.
    nodes.reset_index(inplace=True)
    edges.reset_index(inplace=True)
    
    # Save network to database
    if db_save:
        print(f"Uploading nodes for mun {nomgeo} of {city}.")
        aup.gdf_to_db_slow(nodes, nodes_table, 'osmnx', if_exists='append')
        print(f"Uploading edges for mun {nomgeo} of {city}.")
        aup.gdf_to_db_slow(edges, edges_table, 'osmnx', if_exists='append')

    i = i+1

Starting mun 1/47: Atotonilco de Tula
Extracted min and max coordinates from the municipality. Polygon N:20.04706, S:19.87349, E-99.14251, W-99.31103.
Created OSMnx graph from bounding box.
Converted OSMnx graph to 10750 nodes and 26270 edges GeoDataFrame.
Added column access for edges.
Filtered columns.
Column: osmid in edges gdf, has a list in it, the column data was converted to string.
Column: name in edges gdf, has a list in it, the column data was converted to string.
Column: highway in edges gdf, has a list in it, the column data was converted to string.
Column: maxspeed in edges gdf, has a list in it, the column data was converted to string.
Uploading nodes for mun Atotonilco de Tula of ZMVM.
Uploading edges for mun Atotonilco de Tula of ZMVM.
Starting mun 2/47: Tizayuca
Extracted min and max coordinates from the municipality. Polygon N:19.90671, S:19.79857, E-98.90298, W-99.02058.
Created OSMnx graph from bounding box.
Converted OSMnx graph to 13204 nodes and 32388 edges GeoDa
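
Because each ZMVM municipality is downloaded with its own bbox and appended to the same tables, nodes and edges inside overlapping bboxes are stored more than once. A minimal pandas sketch of dropping those repeats, assuming nodes are keyed by `osmid` and edges by `(u, v, key)` after `reset_index()`; `drop_overlap_duplicates` and the sample rows are hypothetical.

```python
import pandas as pd


def drop_overlap_duplicates(nodes: pd.DataFrame, edges: pd.DataFrame):
    """Remove rows repeated because neighbouring municipal bboxes overlap.

    Nodes are identified by 'osmid'; edges by the (u, v, key) triple that
    OSMnx uses as the MultiDiGraph edge index.
    """
    nodes_clean = nodes.drop_duplicates(subset=["osmid"], keep="first")
    edges_clean = edges.drop_duplicates(subset=["u", "v", "key"], keep="first")
    return nodes_clean.reset_index(drop=True), edges_clean.reset_index(drop=True)


# Two municipalities whose bboxes both contain node 2 and edge (1, 2, 0)
nodes = pd.DataFrame({"osmid": [1, 2, 2, 3], "city": "ZMVM"})
edges = pd.DataFrame({"u": [1, 1, 2], "v": [2, 2, 3], "key": [0, 0, 0], "city": "ZMVM"})
n, e = drop_overlap_duplicates(nodes, edges)
print(len(n), len(e))  # 3 2
```

This is only a first pass: an edge near a bbox border can be simplified differently in each download and end at different `u`/`v` nodes, so `(u, v, key)` deduplication does not guarantee that every geometric overlap is removed.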

## Third part - Tests

### NaNs in downloaded data [SOLVED]
#### __Problem:__ NaNs were found in the ZMVM 'osmid', 'u', 'v' and other columns. Check all cities.
#### __Result:__ Only ZMVM was affected. The cell that runs ZMVM was missing .reset_index().
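
Judging by the comment in the upload cells, the index is not uploaded to the database, so ZMVM rows appended to tables whose `osmid`, `u`, `v` and `key` columns had been created by other cities received NULLs there. A minimal pandas sketch of what `reset_index()` changes, using hypothetical IDs and coordinates:

```python
import pandas as pd

# Hypothetical nodes as create_osmnx_network() returns them: 'osmid' is the index
nodes = pd.DataFrame(
    {"y": [19.43, 19.44], "x": [-99.13, -99.14]},
    index=pd.Index([111, 222], name="osmid"),
)
print("osmid" in nodes.columns)       # False: the IDs live only in the index

nodes_flat = nodes.reset_index()
print(nodes_flat.columns.tolist())    # ['osmid', 'y', 'x']

# Hypothetical edges: (u, v, key) form a MultiIndex
edges = pd.DataFrame(
    {"length": [10.0, 12.0]},
    index=pd.MultiIndex.from_tuples([(111, 222, 0), (222, 111, 0)], names=["u", "v", "key"]),
)
print(edges.reset_index().columns.tolist())  # ['u', 'v', 'key', 'length']
```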

In [16]:
i = 0
k = len(city_list)
nodes_table = 'nodes_osmnx_20_point'
edges_table = 'edges_osmnx_20_line'

#city_tmp_lst = ['Aguascalientes']

for city in city_list:
    print(f"Reviewing city {i}/{k}: {city}")

    query = f"SELECT osmid,geometry FROM osmnx.{nodes_table} WHERE \"city\" LIKE '{city}'"
    nodes_gdf = aup.gdf_from_query(query, geometry_col='geometry')
    # Use a separate name so the `test` configuration flag from the first part is not overwritten
    has_nans = nodes_gdf.isnull().values.any()
    if has_nans:
        print(f"NANS IN {city} NODES.")

    query = f"SELECT u,v,key,geometry FROM osmnx.{edges_table} WHERE \"city\" LIKE '{city}'"
    edges_gdf = aup.gdf_from_query(query, geometry_col='geometry')
    has_nans = edges_gdf.isnull().values.any()
    if has_nans:
        print(f"NANS IN {city} EDGES.")

    i+=1

Reviewing city 0/71: Aguascalientes
Reviewing city 1/71: Ensenada
Reviewing city 2/71: Mexicali
Reviewing city 3/71: Tijuana
Reviewing city 4/71: La Paz
Reviewing city 5/71: Los Cabos
Reviewing city 6/71: Campeche
Reviewing city 7/71: Laguna
Reviewing city 8/71: Monclova
Reviewing city 9/71: Piedras Negras
Reviewing city 10/71: Saltillo
Reviewing city 11/71: Colima
Reviewing city 12/71: Tapachula
Reviewing city 13/71: Tuxtla
Reviewing city 14/71: Chihuahua
Reviewing city 15/71: Delicias
Reviewing city 16/71: Juarez
Reviewing city 17/71: CDMX
Reviewing city 18/71: ZMVM
NANS IN ZMVM NODES.
NANS IN ZMVM EDGES.
Reviewing city 19/71: Durango
Reviewing city 20/71: Celaya
Reviewing city 21/71: Guanajuato
Reviewing city 22/71: Leon
Reviewing city 23/71: Irapuato
Reviewing city 24/71: Acapulco
Reviewing city 25/71: Chilpancingo
Reviewing city 26/71: Pachuca
Reviewing city 27/71: Tulancingo
Reviewing city 28/71: Guadalajara
Reviewing city 29/71: Vallarta
Reviewing city 30/71: Piedad
Reviewing ci