# OSM to PostGIS
In this notebook, we will work out how to convert OSM data into a format suitable for PostGIS.

In [1]:
import geopandas
from ipypb import track
from sqlalchemy import create_engine

## Database connection
We create an sqlalchemy database engine in order to connect to PostGIS.

In [3]:
engine = create_engine('postgresql://postgres:changeme@localhost:5432/')

## File/table mappings
We want to map filepaths to database tables, since we are uploading shapefiles to PostGIS tables.

In [4]:
table_map = [
    {
        "file_path": "data/OSM/gis_osm_buildings_a_free_1.shp",
        "table_name": "osm_buildings"
    },
    {
        "file_path": "data/OSM/gis_osm_landuse_a_free_1.shp",
        "table_name": "osm_landuse"
    },
    {
        "file_path": "data/OSM/gis_osm_natural_a_free_1.shp",
        "table_name": "osm_natural_polygon"
    },
    {
    
        "file_path": "data/OSM/gis_osm_natural_free_1.shp",
        "table_name": "osm_natural_features"
    },
    {
        "file_path": "data/OSM/gis_osm_places_a_free_1.shp",
        "table_name": "osm_places_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_places_free_1.shp",
        "table_name": "osm_places"
    },
    {
        "file_path": "data/OSM/gis_osm_pofw_a_free_1.shp",
        "table_name": "osm_places_of_worship_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_pofw_free_1.shp",
        "table_name": "osm_places_of_worship"
    },
    {
        "file_path": "data/OSM/gis_osm_pois_a_free_1.shp",
        "table_name": "osm_points_of_interest_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_pois_free_1.shp",
        "table_name": "osm_points_of_interest"
    },
    {
        "file_path": "data/OSM/gis_osm_railways_free_1.shp",
        "table_name": "osm_railways"
    },
    {
        "file_path": "data/OSM/gis_osm_roads_free_1.shp",
        "table_name": "osm_roads"
    },
    {
        "file_path": "data/OSM/gis_osm_traffic_a_free_1.shp",
        "table_name": "osm_traffic_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_traffic_free_1.shp",
        "table_name": "osm_traffic"
    },
    {
        "file_path": "data/OSM/gis_osm_transport_a_free_1.shp",
        "table_name": "osm_transport_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_transport_free_1.shp",
        "table_name": "osm_transport"
    },
    {
        "file_path": "data/OSM/gis_osm_water_a_free_1.shp",
        "table_name": "osm_water_polygon"
    },
    {
        "file_path": "data/OSM/gis_osm_waterways_free_1.shp",
        "table_name": "osm_waterways"
    },
]

## Upload shapefiles to PostGIS
We will parse all shapefiles to GeoDataFrames and upload them to PostGIS. Note: This step is very slow, and can take over ten minutes depending on the size of the OSM data set. For example, Finland data takes around 12 minutes to load fully on a local PostGIS instance (i.e. where network bandwidth is not an issue).

In [5]:
%%time
for item in track(table_map):
    # Open the file
    geodata = geopandas.read_file(item["file_path"])
    
    # Use OSM ID for index
    geodata.set_index("osm_id", inplace=True)
    
    number_of_rows = format(geodata.shape[0], ',d')
    
    print(f"{item['table_name']} step is processing {number_of_rows} rows...")
    
    # Write data to PostGIS
    geodata.to_postgis(
        con=engine,
        name=item["table_name"],
        if_exists="replace",
        chunksize=100
    )

osm_buildings step is processing 1,879,942 rows...
osm_landuse step is processing 712,010 rows...
osm_natural_polygon step is processing 2,481 rows...
osm_natural_features step is processing 72,560 rows...
osm_places_polygon step is processing 9,705 rows...
osm_places step is processing 12,759 rows...
osm_places_of_worship_polygon step is processing 1,941 rows...
osm_places_of_worship step is processing 474 rows...
osm_points_of_interest_polygon step is processing 40,114 rows...
osm_points_of_interest step is processing 71,187 rows...
osm_railways step is processing 21,051 rows...
osm_roads step is processing 1,375,410 rows...
osm_traffic_polygon step is processing 38,116 rows...
osm_traffic step is processing 158,358 rows...
osm_transport_polygon step is processing 150 rows...
osm_transport step is processing 94,639 rows...
osm_water_polygon step is processing 383,529 rows...
osm_waterways step is processing 72,367 rows...
CPU times: user 10min 13s, sys: 9.4 s, total: 10min 23s
Wall t

## Add indexes
We will add indexes to table columns like `geometry` and `fclass`. We specify a column name and corresponding index type. Then we create a SQL statement and execute it.

In [6]:
index_columns = [
    {
        "name": "fclass",
        "index_type": "BTREE"
    },
    {
        "name": "geometry",
        "index_type": "GIST"
    }
]

In [7]:
%%time
connection = engine.connect()

for item in track(table_map):

    for index_column in index_columns:
        index_name = f"{ item['table_name'] }_{ index_column['name'] }_idx"
        print(f"Creating { index_name } index...")
        
        sql_statement = f"""
            DROP INDEX IF EXISTS { index_name };
            CREATE INDEX { index_name }
            ON { item["table_name"] }
            USING { index_column["index_type"] } ({ index_column["name"] });
        """

        connection.execute(sql_statement)

connection.close()

Creating osm_buildings_fclass_idx index...
Creating osm_buildings_geometry_idx index...
Creating osm_landuse_fclass_idx index...
Creating osm_landuse_geometry_idx index...
Creating osm_natural_polygon_fclass_idx index...
Creating osm_natural_polygon_geometry_idx index...
Creating osm_natural_features_fclass_idx index...
Creating osm_natural_features_geometry_idx index...
Creating osm_places_polygon_fclass_idx index...
Creating osm_places_polygon_geometry_idx index...
Creating osm_places_fclass_idx index...
Creating osm_places_geometry_idx index...
Creating osm_places_of_worship_polygon_fclass_idx index...
Creating osm_places_of_worship_polygon_geometry_idx index...
Creating osm_places_of_worship_fclass_idx index...
Creating osm_places_of_worship_geometry_idx index...
Creating osm_points_of_interest_polygon_fclass_idx index...
Creating osm_points_of_interest_polygon_geometry_idx index...
Creating osm_points_of_interest_fclass_idx index...
Creating osm_points_of_interest_geometry_idx ind