# `numba` tests

This notebook documents `numba`-based acceleration of areal interpolation and serves as a scratchpad for exploring it.

**NOTE** - To be removed or relocated if and when the functionality is merged.

---

**IMPORTANT**

As of 17 December 2020, the multi-core implementation requires the versions of `pygeos` and `geopandas` in their `main` branches. In a working environment with the latest released versions (such as `gds_env:5.0`), these can be installed with:

```shell
pip install --no-deps git+https://github.com/pygeos/pygeos.git
pip install --no-deps git+https://github.com/geopandas/geopandas.git
```

---

In [1]:
from tobler.area_weighted.area_interpolate import _area_tables_binning, area_tables_binning_numba
from importlib import reload
import geopandas

def summary(src, tgt):
    """Print how many source polygons are transferred into how many targets."""
    print(f"Transfer {src.shape[0]} polygons into {tgt.shape[0]}")

## Data setup

- Minimal problem

In [19]:
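# Source geometries: San Diego tracts
p = ("https://geographicdata.science/book/_downloads/"
     "f2341ee89163afe06b42fc5d5ed38060/sandiego_tracts.gpkg")
src = geopandas.read_file(p)

# Target geometries: H3 grid, reprojected to the CRS of the source
p = ("https://geographicdata.science/book/_downloads/"
     "d740a1069144baa1302b9561c3d31afe/sd_h3_grid.gpkg")
tgt = geopandas.read_file(p).to_crs(src.crs)

# Optional: uncomment the `.cx` line to clip the source to the target's extent
w, s, e, n = tgt.total_bounds
#src = src.cx[w:e, s:n]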
summary(src, tgt)

Transfer 628 polygons into 644


- Slightly larger problem

In [28]:
# Tracts
p = "https://ndownloader.figshare.com/files/20460645"
src = geopandas.read_file(p)

# Precincts
p = "https://ndownloader.figshare.com/files/20460549"
tgt = geopandas.read_file(p).to_crs(src.crs)
summary(src, tgt)

Transfer 822 polygons into 3780


## Correctness

In [27]:
cross2 = area_tables_binning_numba(src, tgt)
cross = _area_tables_binning(src, tgt)
# Count the entries that differ between the two tables; 0 means they match exactly
(cross != cross2).sum()

0
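
The check above requires exact equality. If the two implementations ever differ only by floating-point rounding, a tolerance-based comparison may be more informative. The sketch below assumes both functions return `scipy.sparse` matrices of the same shape (not confirmed here); `max_abs_difference` is a hypothetical helper, not part of `tobler`, and the toy matrices stand in for `cross` and `cross2`.

```python
from scipy import sparse


def max_abs_difference(a, b):
    """Largest absolute element-wise difference between two sparse matrices.

    Hypothetical helper for illustration; assumes `a` and `b` are
    scipy.sparse matrices.
    """
    if a.shape != b.shape:
        raise ValueError(f"Shape mismatch: {a.shape} vs {b.shape}")
    return float(abs(a - b).max())


# Toy stand-ins for the two area tables: identical up to a tiny rounding error
a = sparse.csr_matrix([[0.5, 0.0], [0.0, 1.0]])
b = sparse.csr_matrix([[0.5 + 1e-12, 0.0], [0.0, 1.0]])

n_exact_mismatches = (a != b).sum()        # exact comparison flags the entry
within_tolerance = max_abs_difference(a, b) < 1e-9
print(n_exact_mismatches, within_tolerance)
```

With the real tables, the call would be `max_abs_difference(cross, cross2)`, compared against whatever tolerance is acceptable for the application.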

## Performance

Results with all observations in the first dataset:

In [20]:
%time cross2 = area_tables_binning_numba(src, tgt)

CPU times: user 345 ms, sys: 21.9 ms, total: 367 ms
Wall time: 622 ms


In [21]:
%time cross2 = area_tables_binning_numba(src, tgt, n_jobs=1)

CPU times: user 965 ms, sys: 0 ns, total: 965 ms
Wall time: 962 ms


In [22]:
%time cross = _area_tables_binning(src, tgt)

CPU times: user 4.29 s, sys: 0 ns, total: 4.29 s
Wall time: 4.29 s


---

Results with the second dataset:

In [24]:
%time cross2 = area_tables_binning_numba(src, tgt)

CPU times: user 9.31 s, sys: 502 ms, total: 9.81 s
Wall time: 16.7 s


In [25]:
%time cross2 = area_tables_binning_numba(src, tgt, n_jobs=1)

CPU times: user 12.5 s, sys: 66.9 ms, total: 12.6 s
Wall time: 12.6 s


In [26]:
%time cross = _area_tables_binning(src, tgt)

CPU times: user 17.9 s, sys: 0 ns, total: 17.9 s
Wall time: 17.9 s


---

Results with the second dataset, flipped (source and target swapped):

In [29]:
%time cross2 = area_tables_binning_numba(tgt, src)

CPU times: user 15.2 s, sys: 792 ms, total: 16 s
Wall time: 21.9 s


In [30]:
%time cross2 = area_tables_binning_numba(tgt, src, n_jobs=1)

CPU times: user 15.9 s, sys: 291 ms, total: 16.2 s
Wall time: 16.1 s


In [31]:
%time cross = _area_tables_binning(tgt, src)

CPU times: user 17 s, sys: 19.1 ms, total: 17 s
Wall time: 17 s
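
To make the timings above easier to compare, the following sketch computes each `numba` run's speedup over `_area_tables_binning` from the reported wall times. The numbers are copied from the `%time` outputs; each comes from a single run, so the ratios are only indicative. The dictionary labels and the `speedups` helper are made up for this illustration.

```python
# Wall times in seconds, copied from the %time outputs above
wall_times = {
    "first dataset": {
        "numba (default n_jobs)": 0.622,
        "numba (n_jobs=1)": 0.962,
        "baseline": 4.29,
    },
    "second dataset": {
        "numba (default n_jobs)": 16.7,
        "numba (n_jobs=1)": 12.6,
        "baseline": 17.9,
    },
    "second dataset, flipped": {
        "numba (default n_jobs)": 21.9,
        "numba (n_jobs=1)": 16.1,
        "baseline": 17.0,
    },
}


def speedups(times, baseline_key="baseline"):
    """Return baseline time divided by each other run's time."""
    base = times[baseline_key]
    return {
        name: base / seconds
        for name, seconds in times.items()
        if name != baseline_key
    }


for label, times in wall_times.items():
    for name, ratio in speedups(times).items():
        print(f"{label}: {name} -> {ratio:.2f}x")
```

A ratio above 1 means the `numba` run finished faster than the baseline. On the first dataset both `numba` runs are several times faster. On the second dataset the gains are much smaller, and in the flipped case the run with default `n_jobs` has a ratio below 1, meaning it was slower than the baseline.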


To do:

- [X] Parallelise `pygeos` operations (`parall_exec`)
- [ ] Type inputs explicitly (as done [here](https://github.com/pysal/esda/blob/master/esda/crand.py#L309))
- [ ] Document
- [ ] Add tests