# San Jose Sidewalks Data Processing Documentation

## Summary
Use Docker to run PostGIS, run the processing queries (I use Python and GeoPandas to load the data into PostGIS and to document the queries), then validate and export the results using QGIS.

### Get a Docker instance for PostGIS (I'm using a Mac)

I'm using this image:
https://hub.docker.com/r/kartoza/postgis/

Run these commands:

```shell
docker pull kartoza/postgis
docker run --name "postgis" -p 5432:5432 -d -t kartoza/postgis
```

The username and password are both `docker`.
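`docker run -d` returns right away, but Postgres inside the container can take a few seconds before it accepts connections. The sketch below uses only the standard library and polls the published port before you connect. `wait_for_port` is a hypothetical helper, not part of Docker or the kartoza image. A successful TCP connect only shows that something is listening on the port. It does not prove the database is ready to run queries.

```python
import socket
import time


def wait_for_port(host="localhost", port=5432, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper: it only checks that the port accepts connections,
    not that PostgreSQL has finished starting up.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", 5432)` before creating the SQLAlchemy engine in the notebook.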

#### PostGIS Version

```
POSTGIS="2.3.2 r15302" GEOS="3.4.2-CAPI-1.8.2 r3921" SFCGAL="1.2.2"
PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26"
LIBXML="2.9.1" LIBJSON="0.11.99" TOPOLOGY RASTER
```
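The version string above has the layout returned by `SELECT postgis_full_version();`. The sketch below assumes that `key="value"` layout and splits the string apart, so you can confirm that SFCGAL (needed later for `ST_ApproximateMedialAxis`) is compiled in. `parse_postgis_version` is a hypothetical helper.

```python
import re

# Matches KEY="value" pairs such as SFCGAL="1.2.2".
KEY_VALUE = re.compile(r'(\w+)="([^"]*)"')


def parse_postgis_version(text):
    """Split a postgis_full_version()-style string into components and bare flags."""
    components = dict(KEY_VALUE.findall(text))
    # Whatever remains after removing the KEY="value" pairs, e.g. TOPOLOGY RASTER.
    flags = KEY_VALUE.sub(" ", text).split()
    return components, flags


VERSION = (
    'POSTGIS="2.3.2 r15302" GEOS="3.4.2-CAPI-1.8.2 r3921" SFCGAL="1.2.2" '
    'PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26" '
    'LIBXML="2.9.1" LIBJSON="0.11.99" TOPOLOGY RASTER'
)

components, flags = parse_postgis_version(VERSION)
print(components.get("SFCGAL"))  # 1.2.2
print(flags)                     # ['TOPOLOGY', 'RASTER']
```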

```python
import geopandas as gpd
import pandas as pd
import sqlalchemy as sal
```

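```python
# Read in the WGS 84 sidewalks shapefile.
sidewalks = gpd.read_file('84/sj_sidewalks_84.shp')

# Connect to the local Docker PostGIS instance (user/password: docker, database: gis).
# echo=True logs every SQL statement, which produces the output below.
engine = sal.create_engine('postgresql://docker:docker@localhost:5432/gis', echo=True)
conn = engine.connect()
```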

```
2017-09-02 11:09:17,702 INFO sqlalchemy.engine.base.Engine select version()
2017-09-02 11:09:17,703 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 11:09:17,709 INFO sqlalchemy.engine.base.Engine select current_schema()
2017-09-02 11:09:17,710 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 11:09:17,715 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2017-09-02 11:09:17,716 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 11:09:17,720 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2017-09-02 11:09:17,722 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 11:09:17,730 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2017-09-02 11:09:17,731 INFO sqlalchemy.engine.base.Engine {}
```


```python
# Load the data into the PostGIS server
# https://stackoverflow.com/questions/38361336/write-geodataframe-into-sql-database

# Return a shapely geometry as hex-encoded WKB, which PostGIS can cast to a geometry.
def wkb_hexer(geom):
    return geom.wkb_hex


gdf = sidewalks
# Store each shapely geometry from the 'geometry' column as WKB hex in a new 'geom' column.
# Once 'geometry' is dropped below, the GeoDataFrame has no geometry column and behaves
# like a regular DataFrame, so to_sql() writes 'geom' to PostGIS as a plain text column.
gdf['geom'] = gdf['geometry'].apply(wkb_hexer)

# Drop the original shapely geometry column.
del gdf['geometry']

# WIDTH is empty in the source data.
del gdf['WIDTH']

table_name = 'orig_sj_sidewalks'

# Write the attributes plus the WKB hex column. if_exists='append' adds the rows
# again on every run; use 'replace' to start from a fresh table.
gdf.to_sql(table_name, con=conn, if_exists='append', index=False)

# Convert the text 'geom' column to a real geometry column in WGS 84 (SRID 4326).
sql = """ALTER TABLE {}
ALTER COLUMN geom TYPE Geometry(POLYGON, 4326)
                  USING ST_SetSRID(geom::Geometry, 4326)""".format(table_name)
conn.execute(sql)
```
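The string that `wkb_hex` hands to PostGIS is plain Well-Known Binary (WKB) written out as hex. The sketch below shows that encoding for a 2D polygon using only `struct`. It assumes little-endian byte order, which is what Shapely typically produces on x86 and ARM machines. `polygon_wkb_hex` is a hypothetical helper for illustration, not a Shapely function. Plain WKB carries no SRID, which is why the `ALTER TABLE` step wraps the `geom::Geometry` cast in `ST_SetSRID(..., 4326)`.

```python
import struct

WKB_LITTLE_ENDIAN = 1
WKB_POLYGON = 3  # OGC WKB geometry type code for a 2D Polygon


def polygon_wkb_hex(rings):
    """Encode a polygon as little-endian 2D WKB, returned as uppercase hex.

    `rings` is a list of closed rings (exterior first, then any holes),
    each a list of (x, y) tuples.
    """
    parts = [
        struct.pack("<BI", WKB_LITTLE_ENDIAN, WKB_POLYGON),  # byte order + geometry type
        struct.pack("<I", len(rings)),                       # number of rings
    ]
    for ring in rings:
        parts.append(struct.pack("<I", len(ring)))           # number of points in the ring
        for x, y in ring:
            parts.append(struct.pack("<dd", float(x), float(y)))  # two 8-byte doubles
    return b"".join(parts).hex().upper()


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(polygon_wkb_hex([square])[:26])  # 01030000000100000005000000
```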

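You may need to run some of these commands to get PostGIS running with the SFCGAL package:

```sql
-- Install the SFCGAL-backed functions, such as ST_ApproximateMedialAxis.
CREATE EXTENSION postgis_sfcgal;

-- Switch the backend to SFCGAL...
SET postgis.backend = sfcgal;

-- ...or back to the default GEOS backend.
SET postgis.backend = geos;
```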

Run this query to check that SFCGAL is working:

```sql
SELECT ST_ApproximateMedialAxis(ST_GeomFromText('POLYGON (( 190 190, 10 190, 10 10, 190 10, 190 20, 160 30, 60 30, 60 130, 190 140, 190 190 ))'));
```

## PostGIS SQL Processing Commands

```python
# Take the ~44k sidewalk polygons and merge them all into one big geometry.
# Many of the polygons sit next to each other and together describe one connected sidewalk.
# During this step we lose the ADACOMPLY, COVERED, and other attribute fields.
sql = """CREATE TABLE union_sideys AS SELECT ST_Union(geom) AS geom FROM orig_sj_sidewalks;"""
conn.execute(sql)
```

```python
# Break the single merged geometry back out into individual polygons. This produces about 7375 rows.
sql = """CREATE TABLE unjoined_sideys as SELECT (ST_Dump(geom)).geom AS geom FROM union_sideys;"""
conn.execute(sql)
```

```
2017-09-02 18:01:51,900 INFO sqlalchemy.engine.base.Engine CREATE TABLE unjoined_sideys as SELECT (ST_Dump(geom)).geom AS geom FROM union_sideys;
2017-09-02 18:01:51,902 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 18:01:53,375 INFO sqlalchemy.engine.base.Engine COMMIT
```


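`ST_Union` followed by `ST_Dump` groups the original polygons into connected pieces. Shapes that overlap or share an edge end up in one output polygon. Shapes that meet only at a single point stay separate parts, because their union cannot be one valid polygon. The sketch below is a rough, hypothetical illustration of that grouping with union-find. It uses axis-aligned boxes instead of real sidewalk polygons and plain Python instead of PostGIS. The pairwise loop is quadratic, so it only suits small examples.

```python
def share_more_than_a_point(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) overlap or share an edge segment.

    Boxes that meet only at a corner return False; ST_Dump would return
    their union as two separate parts.
    """
    dx = min(a[2], b[2]) - max(a[0], b[0])  # overlap length along x
    dy = min(a[3], b[3]) - max(a[1], b[1])  # overlap length along y
    return dx >= 0 and dy >= 0 and (dx > 0 or dy > 0)


def connected_parts(boxes):
    """Group box indexes the way ST_Union + ST_Dump would group the polygons."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if share_more_than_a_point(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


boxes = [(0, 0, 1, 1), (1, 0, 2, 1), (5, 5, 6, 6), (6, 6, 7, 7)]
print(connected_parts(boxes))  # [[0, 1], [2], [3]]
```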

```python
# Add a serial id column so that each polygon can be processed individually in the next step.
# https://dba.stackexchange.com/questions/20801/most-efficient-way-to-add-a-serial-column-to-a-huge-table
sql = """ALTER TABLE unjoined_sideys ADD column id bigserial;"""
conn.execute(sql)
```

```
2017-09-02 18:46:31,803 INFO sqlalchemy.engine.base.Engine ALTER TABLE unjoined_sideys ADD column id bigserial;
2017-09-02 18:46:31,805 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 18:46:32,895 INFO sqlalchemy.engine.base.Engine COMMIT
```



```python
# Create the empty table that will hold the sidewalk center lines (the medial-axis output).
sql = """CREATE TABLE approx_medial (id bigserial, LINESTRING geometry); """
conn.execute(sql)
```

```
2017-09-02 18:53:15,178 INFO sqlalchemy.engine.base.Engine CREATE TABLE approx_medial (id bigserial, LINESTRING geometry); 
2017-09-02 18:53:15,180 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 18:53:15,230 INFO sqlalchemy.engine.base.Engine COMMIT
```



```python
# Process each of the ~7375 polygons individually. ST_ApproximateMedialAxis mostly produces
# the middle line of each polygon as a set of line segments, which is close to what we want.
# https://postgis.net/docs/manual-2.2/ST_ApproximateMedialAxis.html

# Unfortunately, 67 of the ~7.3k rows fail this algorithm with:
# "ERROR: straight skeleton of Polygon with touching interior rings is not implemented"
# Row 6703 seems to crash the database.
# (The log below is from the last run, which only covered id 7374.)

# Fetch every polygon id instead of hard-coding a range.
ids = [r[0] for r in conn.execute("SELECT id FROM unjoined_sideys ORDER BY id")]

error_rows = []
for row in ids:
    sql = """INSERT INTO approx_medial (SELECT id, ST_ApproximateMedialAxis(geom) AS linestring FROM unjoined_sideys WHERE id = %s);""" % (row)
    try:
        conn.execute(sql)
    except sal.exc.DBAPIError:
        # Record the failing id and carry on with the next polygon.
        error_rows.append(row)
```

```
2017-09-02 20:03:42,000 INFO sqlalchemy.engine.base.Engine INSERT INTO approx_medial (SELECT id, ST_ApproximateMedialAxis(geom) AS linestring FROM unjoined_sideys WHERE id = 7374);
2017-09-02 20:03:42,002 INFO sqlalchemy.engine.base.Engine {}
2017-09-02 20:03:42,507 INFO sqlalchemy.engine.base.Engine COMMIT
```
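The loop above follows a common pattern: run one statement per id, keep the rows that work, and record the ids that raise. The sketch below shows the same pattern with Python's built-in `sqlite3` standing in for PostGIS, so it runs without the Docker container. `fake_medial_axis` is a made-up stand-in for `ST_ApproximateMedialAxis`. It fails on any geometry text containing `TOUCHING`, imitating the SFCGAL error. The sketch also binds the id as a parameter (`?`) rather than formatting it into the SQL string.

```python
import sqlite3


def fake_medial_axis(geom):
    """Hypothetical stand-in for ST_ApproximateMedialAxis."""
    if "TOUCHING" in geom:
        raise ValueError("touching interior rings are not supported")
    return "MEDIAL(" + geom + ")"


def build_demo_db(rows):
    """Create in-memory tables shaped like unjoined_sideys and approx_medial."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("medial", 1, fake_medial_axis)
    conn.execute("CREATE TABLE unjoined_sideys (id INTEGER PRIMARY KEY, geom TEXT)")
    conn.execute("CREATE TABLE approx_medial (id INTEGER, linestring TEXT)")
    conn.executemany("INSERT INTO unjoined_sideys (id, geom) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def process_each_row(conn):
    """Insert one output row per polygon id and return the ids that failed."""
    error_rows = []
    ids = [r[0] for r in conn.execute("SELECT id FROM unjoined_sideys ORDER BY id")]
    for row_id in ids:
        try:
            conn.execute(
                "INSERT INTO approx_medial (id, linestring) "
                "SELECT id, medial(geom) FROM unjoined_sideys WHERE id = ?",
                (row_id,),
            )
            conn.commit()  # commit each success so later failures cannot undo it
        except sqlite3.Error:  # an exception in the SQL function surfaces as OperationalError
            conn.rollback()
            error_rows.append(row_id)
    return error_rows


conn = build_demo_db([(1, "POLY A"), (2, "POLY TOUCHING"), (3, "POLY B")])
print(process_each_row(conn))  # [2]
```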


```python
# Failing ids recorded from the runs of the loop above.
error_rows = [1165, 1166, 1422, 1680, 1696, 2024, 2025, 2247, 2381, 2382, 2874, 2979, 3255, 3256, 3537, 3538, 3539, 3540, 3541, 3542, 3543, 3544, 3545, 3546, 3547, 3548, 3549, 3868, 3869, 3941, 3942, 4120, 4121, 4465, 4630, 4631, 4632, 4633, 4634, 4753, 4754, 4956, 4957, 4958, 4959, 4960, 5182, 5183, 5184, 5185, 5790, 5791, 6043, 6173, 6174, 6175, 6229, 6471, 6472, 6542, 6613, 6644, 6703, 6774, 6866, 6867, 7046]
```

```python
print(error_rows)
len(error_rows)
```



```
67
```

Finally, load the `approx_medial` table in QGIS to validate and save the results!
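One way to inspect the 67 failed polygons (not part of the original workflow) is to add `unjoined_sideys` as a PostGIS layer in QGIS and set a layer filter. For PostGIS layers, that filter is written as an SQL `WHERE` expression. The hypothetical helper below builds the expression from `error_rows`. It collapses runs of consecutive ids (such as 3537 to 3549) into `BETWEEN` ranges so the filter stays short.

```python
def collapse_runs(ids):
    """Turn ids into sorted (start, end) runs of consecutive integers."""
    runs = []
    for i in sorted(set(ids)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    return [tuple(run) for run in runs]


def id_filter(ids, column="id"):
    """Build a WHERE expression such as 'id BETWEEN 1165 AND 1166 OR id = 1422'."""
    parts = []
    for start, end in collapse_runs(ids):
        if start == end:
            parts.append(f"{column} = {start}")
        else:
            parts.append(f"{column} BETWEEN {start} AND {end}")
    return " OR ".join(parts)


print(id_filter([1165, 1166, 1422, 3537, 3538, 3539]))
# id BETWEEN 1165 AND 1166 OR id = 1422 OR id BETWEEN 3537 AND 3539
```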