# `numba` tests

This notebook documents `numba`-based acceleration of areal interpolation and serves as a scratchpad for exploring it.

**NOTE** - To be removed or relocated if and when the functionality is merged.

---

**IMPORTANT**

As of December 17, 2020, the multi-core implementation requires the `main`-branch versions of `pygeos` and `geopandas`. In a working environment with the latest released versions (such as `gds_env:5.0`), these can be installed with:

```shell
pip install --no-deps git+https://github.com/pygeos/pygeos.git
pip install --no-deps git+https://github.com/geopandas/geopandas.git
```

---

In [19]:
from tobler.area_weighted.area_interpolate import _area_tables_binning, area_tables_binning_numba
import geopandas, pandas

def summary(src, tgt):
    # Report how many source polygons are transferred into how many target polygons
    print(f"Transfer {src.shape[0]} polygons into {tgt.shape[0]}")

## Data setup

- Minimal problem

In [2]:
p = ("https://geographicdata.science/book/_downloads/"
     "f2341ee89163afe06b42fc5d5ed38060/sandiego_tracts.gpkg")
src = geopandas.read_file(p)

p = ("https://geographicdata.science/book/_downloads/"
     "d740a1069144baa1302b9561c3d31afe/sd_h3_grid.gpkg")
tgt = geopandas.read_file(p).to_crs(src.crs)

w, s, e, n = tgt.total_bounds
#src = src.cx[w:e, s:n]
summary(src, tgt)

Transfer 628 polygons into 644


- Slightly larger problem

In [26]:
# Tracts
p = "https://ndownloader.figshare.com/files/20460645"
src = geopandas.read_file(p)
src = pandas.concat([src]*50)

# Precincts
p = "https://ndownloader.figshare.com/files/20460549"
tgt = geopandas.read_file(p).to_crs(src.crs)
tgt = pandas.concat([tgt]*20)
summary(src, tgt)

Transfer 41100 polygons into 75600


## Correctness

In [3]:
cross2 = area_tables_binning_numba(src, tgt, n_jobs=1)
cross = _area_tables_binning(src, tgt)
(cross != cross2).sum()

Setup: 0.18500614166259766 secs
Buckets+: 2.0063958168029785 secs
Intersections: 0.36847734451293945 secs
Conversion: 0.0019366741180419922 secs
Setup: 0.0018305778503417969 secs
Buckets: 0.22100353240966797 secs
Intersections: 2.041260242462158 secs


0
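
The exact `!=` comparison above returns zero differences here, but exact equality can be brittle when two implementations accumulate floating-point areas in a different order. A minimal sketch of a tolerance-based check follows, assuming both functions return SciPy sparse matrices of the same shape (the notebook does not state the return type). The helper names `max_abs_difference` and `tables_match`, and the toy matrices, are hypothetical stand-ins rather than part of `tobler`.

```python
import numpy as np
from scipy import sparse


def max_abs_difference(a, b):
    """Largest absolute element-wise difference between two equally shaped tables."""
    if a.shape != b.shape:
        raise ValueError(f"Shape mismatch: {a.shape} vs {b.shape}")
    # abs() and .max() work on both dense arrays and SciPy sparse matrices
    return float(abs(a - b).max())


def tables_match(a, b, atol=1e-9):
    """True when no entry differs by more than `atol` (an assumed tolerance)."""
    return max_abs_difference(a, b) <= atol


# Toy stand-ins for `cross` and `cross2`; real tables come from the functions above
reference = sparse.csr_matrix(np.array([[0.5, 0.0], [0.25, 1.0]]))
candidate = sparse.csr_matrix(np.array([[0.5, 0.0], [0.25, 1.0 + 1e-12]]))
print(tables_match(reference, candidate))
```

With the notebook's objects, the call would read `tables_match(cross, cross2)`; the tolerance should be chosen relative to the magnitude of the areas involved.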

## Performance

Results with all observations in the first dataset:

In [5]:
%timeit cross2 = area_tables_binning_numba(src, tgt)

292 ms ± 8.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [7]:
%timeit cross2 = area_tables_binning_numba(src, tgt, n_jobs=1)

477 ms ± 3.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
%timeit cross = _area_tables_binning(src, tgt)

2.27 s ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
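
To make the comparison easier to read, the three `%timeit` means above can be turned into speedup factors relative to the original implementation. This is only arithmetic on the reported figures; the labels are informal, and it assumes the call without `n_jobs` is the multi-core run.

```python
# Mean timings reported by %timeit above, in milliseconds
timings_ms = {
    "numba, multi-core": 292.0,  # area_tables_binning_numba(src, tgt)
    "numba, n_jobs=1": 477.0,    # area_tables_binning_numba(src, tgt, n_jobs=1)
    "original": 2270.0,          # _area_tables_binning(src, tgt)
}

baseline = timings_ms["original"]
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
# numba, multi-core: 7.8x
# numba, n_jobs=1: 4.8x
# original: 1.0x
```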


---

Results with the second dataset:

In [None]:
%time cross2 = area_tables_binning_numba(src, tgt)

In [None]:
%time cross2 = area_tables_binning_numba(src, tgt, n_jobs=1)

In [None]:
%time cross = _area_tables_binning(src, tgt)

---

Results with the second dataset, flipped (source and target swapped):

In [13]:
%time cross2 = area_tables_binning_numba(tgt, src)

Setup: 0.0039517879486083984 secs
Buckets+: 2.526702880859375 secs
Intersections: 1.6080853939056396 secs
Conversion: 0.012014150619506836 secs
CPU times: user 3.38 s, sys: 86.9 ms, total: 3.47 s
Wall time: 4.15 s


In [14]:
%time cross2 = area_tables_binning_numba(tgt, src, n_jobs=1)

Setup: 0.004775524139404297 secs
Buckets+: 2.515500068664551 secs
Intersections: 4.918679475784302 secs
Conversion: 0.01279139518737793 secs
CPU times: user 7.43 s, sys: 37.2 ms, total: 7.47 s
Wall time: 7.46 s


In [15]:
%time cross = _area_tables_binning(tgt, src)

Setup: 0.003404378890991211 secs
Buckets: 0.7955982685089111 secs
Intersections: 8.69243049621582 secs
CPU times: user 9.49 s, sys: 0 ns, total: 9.49 s
Wall time: 9.49 s
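
The per-stage lines printed by the three flipped runs show where the time goes. The sketch below parses those lines (copied verbatim from the outputs above) and reports the slowest stage of each run. The regular expression and the helper name `parse_stages` are assumptions about the log format, not part of `tobler`.

```python
import re

# Matches lines such as "Buckets+: 2.526702880859375 secs"
STAGE_LINE = re.compile(r"^(?P<stage>[\w+]+):\s+(?P<secs>[\d.]+) secs$")


def parse_stages(log):
    """Return {stage name: seconds} for every stage line found in `log`."""
    stages = {}
    for line in log.strip().splitlines():
        match = STAGE_LINE.match(line.strip())
        if match:
            stages[match["stage"]] = float(match["secs"])
    return stages


logs = {
    "numba, multi-core": """
        Setup: 0.0039517879486083984 secs
        Buckets+: 2.526702880859375 secs
        Intersections: 1.6080853939056396 secs
        Conversion: 0.012014150619506836 secs
    """,
    "numba, n_jobs=1": """
        Setup: 0.004775524139404297 secs
        Buckets+: 2.515500068664551 secs
        Intersections: 4.918679475784302 secs
        Conversion: 0.01279139518737793 secs
    """,
    "original": """
        Setup: 0.003404378890991211 secs
        Buckets: 0.7955982685089111 secs
        Intersections: 8.69243049621582 secs
    """,
}

dominant = {}
for run, log in logs.items():
    stages = parse_stages(log)
    dominant[run] = max(stages, key=stages.get)
    print(f"{run}: slowest stage is {dominant[run]} ({stages[dominant[run]]:.2f} s)")
```

In these runs, intersections dominate the original and single-core timings, while the bucketing stage is the largest share of the multi-core run.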


To do:

- [X] Parallelise `pygeos` operations (`parall_exec`)
- [ ] Type inputs explicitly (as done [here](https://github.com/pysal/esda/blob/master/esda/crand.py#L309))
- [ ] Document
- [ ] Add tests
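
For the parallelisation item, the general pattern is to split the candidate pairs into chunks, process each chunk in a worker, and concatenate the results in order. The sketch below illustrates that pattern with the standard library only. It is not the notebook's `parall_exec`, whose implementation is not shown here; all function names are hypothetical, and a bounding-box overlap stands in for the actual `pygeos` intersection.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            chunks.append(items[start:stop])
        start = stop
    return chunks


def box_overlap_area(pair):
    """Overlap area of two (xmin, ymin, xmax, ymax) boxes; a toy stand-in operation."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = pair
    width = min(ax1, bx1) - max(ax0, bx0)
    height = min(ay1, by1) - max(ay0, by0)
    return max(width, 0.0) * max(height, 0.0)


def overlap_chunk(pairs):
    """Process one chunk of pairs serially."""
    return [box_overlap_area(pair) for pair in pairs]


def parallel_overlaps(pairs, n_jobs=2):
    """Map chunks across n_jobs workers, preserving the input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(overlap_chunk, chunked(pairs, n_jobs)))
    return [area for chunk in results for area in chunk]


pairs = [
    ((0, 0, 2, 2), (1, 1, 3, 3)),  # partial overlap
    ((0, 0, 1, 1), (2, 2, 3, 3)),  # disjoint
    ((0, 0, 4, 4), (1, 1, 2, 2)),  # second box contained in the first
]
print(parallel_overlaps(pairs, n_jobs=2))
```

A check that the parallel result equals the serial one, as in `parallel_overlaps(pairs) == overlap_chunk(pairs)`, is also a natural starting point for the "Add tests" item.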