# Join labels for every building

This notebook builds a single table with one row per building and one column for each city delineation used in the comparison section of the published paper (Section 5): administrative municipalities, AUDES, and our own A-DBSCAN boundaries. The result is written to a single file (`pts_multi_lbls.parquet`).

## Dependencies and data

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, time
from time import gmtime, strftime
import geopandas as gpd
import multiprocessing as mp
from shapely.geometry import Point
import pandas as pd
import numpy as np
import tools

* Read data

In [None]:
%time xys = pd.read_parquet('xys.parquet.gzip')

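# `.cx[:, 3500000:]` keeps only features that intersect y >= 3,500,000 (CRS units)
cities_path = "solution_rep1000_eps2000_mp2000_thr90.gpkg"
cities = gpd.read_file(cities_path)\
            .cx[:, 3500000:]\
            .set_index('lbls')

# Urban areas, reprojected to the CRS of `cities` before slicing
aus_path = 'audes2010-au-pol.gpkg'
aus = gpd.read_file(aus_path)
aus = aus.to_crs(cities.crs)\
         .cx[:, 3500000:]

# Municipalities, reprojected to the CRS of `cities` before slicing
munis_path = 'audes2010-mun.gpkg'
munis = gpd.read_file(munis_path)
munis = munis.to_crs(cities.crs)\
             .cx[:, 3500000:]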

* Create `Point` objects from `xys` coordinates

In [None]:
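%%time
def ptfy(xy):
    # Build a shapely Point from one (x, y) coordinate pair
    return Point(*xy)
def ptfy_partition(xys):
    # Convert one chunk of the coordinate table, preserving its index
    out = pd.Series(map(ptfy, xys.values), 
                    index=xys.index)
    return out

try:
    print('Quick read...')
    pts = tools.read_geoparquet('pts_geoms.parquet')
except Exception:
    # No cached file available: build the points in parallel, one chunk per core
    xys_chunked = np.array_split(xys, mp.cpu_count())
    with mp.Pool(mp.cpu_count()) as pool:
        chunks = pool.map(ptfy_partition, xys_chunked)
    # Concatenate the per-chunk Series first, then wrap them as a GeoSeries
    pts = gpd.GeoSeries(pd.concat(chunks), crs=cities.crs)
    tools.write_geoparquet(gpd.GeoDataFrame({'geometry': pts}), 
                           'pts_geoms.parquet')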
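The split, map, and concatenate pattern used in the cell above can be tried on a toy table. This is only a minimal sketch under stated assumptions: the column names `X` and `Y`, the helper names `to_pairs` and `chunked_apply`, and the toy index values are all hypothetical, plain tuples stand in for `Point` objects, and a thread pool stands in for `mp.Pool` so the snippet runs anywhere. Like `Pool.map`, `Executor.map` returns results in input order, so concatenating the parts restores the original row order and index.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def to_pairs(chunk):
    # Stand-in for `ptfy_partition`: one output per input row, same index
    return pd.Series(list(map(tuple, chunk.values)), index=chunk.index)


def chunked_apply(table, func, n_chunks):
    # Split row positions into nearly equal blocks, then slice with iloc
    blocks = np.array_split(np.arange(len(table)), n_chunks)
    chunks = [table.iloc[block] for block in blocks]
    # Results come back in the same order as the input chunks
    with ThreadPoolExecutor(max_workers=n_chunks) as executor:
        parts = list(executor.map(func, chunks))
    return pd.concat(parts)


# Hypothetical toy coordinates; the real `xys` table is far larger
toy_xys = pd.DataFrame(
    {"X": [0.0, 1.0, 2.0, 3.0, 4.0], "Y": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=[101, 102, 103, 104, 105],
)
toy_pts = chunked_apply(toy_xys, to_pairs, n_chunks=2)
```

Because each chunk carries its own slice of the index, the concatenated result can later be joined back to other tables by point ID.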

In [None]:
pts.crs = munis.crs

## (Multi-core) spatial joins

* Spatial join between `pts` and `munis` to link municipality ID to `pts`

---

Multi-CPU implementation

In [None]:
%%time

pts_w_lbls = tools.p_sjoin(pts, munis[['geometry', 'CODINE']])

* Spatial join between `pts` and `aus` to link Urban Area ID to `pts`

In [None]:
aus.crs

In [None]:
pts.crs

In [None]:
%%time

pts_w_ua = tools.p_sjoin(pts, aus[['geometry', 'AU']])

* Join delineation ID to table

In [None]:
p = 'solution_rep1000_eps2000_mp2000_thr90.parquet'
#p = '../output/revision/' + p
%time votes = pd.read_parquet(p)\
                .set_index('id')

In [None]:
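# Left joins on the point ID: every row of `votes` is kept
# NOTE: `votes` is indexed by 'id', so `reset_index()` should yield a column
# named 'id'; the rename below only takes effect if that column is 'index'
db = votes.join(pts_w_lbls[['CODINE']])\
          .join(pts_w_ua[['AU']])\
          .reset_index()\
          .rename(columns={'index': 'pt_id'})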
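To illustrate what the index-aligned joins in the previous cell do to points that fall outside every polygon, here is a small sketch with made-up values. The frames `votes_toy`, `munis_toy`, and `aus_toy` and all of their contents are hypothetical, and the `lbls` column is assumed to hold the delineation label. `DataFrame.join` defaults to a left join on the index, so every row of the left table survives and unmatched rows get missing values.

```python
import pandas as pd

# Three hypothetical buildings, indexed by point ID
votes_toy = pd.DataFrame(
    {"lbls": ["c1", "c1", "c2"]},
    index=pd.Index([0, 1, 2], name="id"),
)
# Building 1 falls in no municipality; buildings 0 and 2 in no urban area
munis_toy = pd.DataFrame({"CODINE": ["m1", "m2"]}, index=[0, 2])
aus_toy = pd.DataFrame({"AU": ["u1"]}, index=[1])

# Left joins on the index keep all three buildings
db_toy = votes_toy.join(munis_toy).join(aus_toy)

# Count buildings without a label in each delineation
missing = db_toy[["CODINE", "AU"]].isna().sum()
```

If the right-hand table contained the same point ID more than once (for example, a point matched to two overlapping polygons), the join would duplicate that row, so checking for duplicated IDs before writing the output may be worthwhile.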

* Write resulting table to Parquet file (`pts_multi_lbls.parquet`)

In [None]:
db.to_parquet('pts_multi_lbls.parquet')