# Spatial Feature Engineering (I)

## Map Matching

## 📖 Ahead of time...

Feature Engineering is a common term in machine learning that refers to the processes and transformations involved in turning data from the state in which the modeller accesses them into what is then fed to a model. This can take several forms, from standardisation of the input data to the derivation of numeric scores that better describe aspects (*features*) of the data we are using.

*Spatial* Feature Engineering refers to operations we can use to derive "views" or summaries of our data that we can use in models, *using space* as the key medium to create them.

There is only one reading to complete for this block, [Chapter 12](https://geographicdata.science/book/notebooks/12_feature_engineering.html) of the GDS Book {cite}`reyABwolf`. The first block of Spatial Feature Engineering in this course loosely follows the first part of the chapter ([Map Matching](https://geographicdata.science/book/notebooks/12_feature_engineering.html#feature-engineering-using-map-matching)), so focus on that first section for this block.

## 💻 Hands-on coding

In [3]:
import geopandas

```{margin} Data
If you want to read more about the data sources behind this dataset, head to the [Datasets](../data/datasets) section.
```



````{tabbed} Local files

Assuming you have the files locally in the folder `../data/`:

```python
regions = geopandas.read_file("../data/cambodia_regional.gpkg")
cities = geopandas.read_file("../data/cambodian_cities.geojson")
```
````

````{tabbed} Online read

If you're online, you can do:

```python
regions = geopandas.read_file(
    "https://darribas.org/gds4ae/_downloads/9366d230310a8a68b2ce6cf2787a2f1c/cambodia_regional.gpkg"
)
cities = geopandas.read_file(
    "https://darribas.org/gds4ae/_downloads/b2bc4ad46ffb5fcec467286c022adf14/cambodian_cities.geojson"
)
```
````


In [7]:
regions = geopandas.read_file("../data/cambodia_regional.gpkg")
cities = geopandas.read_file("../data/cambodian_cities.geojson")

Check both geo-tables are in the same CRS:

In [11]:
regions.crs == cities.crs

True
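
If the check had returned `False`, one of the tables would need to be reprojected before any spatial join. The sketch below illustrates this on toy data rather than on the Cambodia files: the names `toy_regions` and `toy_cities`, their columns, and the choice of EPSG codes are all assumptions made for the example.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical toy tables (not the Cambodia data)
toy_regions = geopandas.GeoDataFrame(
    {"name": ["A"]},
    geometry=[box(0, 0, 1, 1)],
    crs="EPSG:4326",
)
toy_cities = geopandas.GeoDataFrame(
    {"city": ["x"]},
    geometry=[Point(0.5, 0.5)],
    crs="EPSG:4326",
).to_crs(epsg=3857)  # deliberately stored in a different CRS

# Reproject the points only if the two CRSs differ
if toy_regions.crs != toy_cities.crs:
    toy_cities = toy_cities.to_crs(toy_regions.crs)

crs_match = toy_regions.crs == toy_cities.crs
```

Reprojecting the points into the polygons' CRS (rather than the other way round) is an arbitrary choice here; what matters is that both tables end up in the same one.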

### Points to polygons

*In which region is a city?*

In [12]:
sj = geopandas.sjoin(
    cities,
    regions
)

In [21]:
#   City name | Region name
sj[["UC_NM_MN", "adm2_name"]]

|   | UC_NM_MN | adm2_name |
|---|----------|-----------|
| 0 | Sampov Lun | Sampov Lun |
| 1 | Khum Pech Chenda | Phnum Proek |
| 2 | Poipet | Paoy Paet |
| 3 | Sisophon | Serei Saophoan |
| 4 | Battambang | Battambang |
| 5 | Siem Reap | Siem Reap |
| 6 | Sihanoukville | Preah Sihanouk |
| 7 |  | Trapeang Prasat |
| 8 | Kampong Chhnang | Kampong Chhnang |
| 9 | Phnom Penh | Tuol Kouk |
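
The call to `geopandas.sjoin` above relies on its defaults, which perform an inner join: points that fall in no polygon are dropped from the result. The sketch below shows the difference with `how="left"` on toy data; the tables, column names and coordinates are hypothetical and only meant as an illustration.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical toy polygons: two adjacent unit squares
toy_regions = geopandas.GeoDataFrame(
    {"region": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
# Hypothetical toy points: "c" falls outside both squares
toy_cities = geopandas.GeoDataFrame(
    {"city": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Default (inner) join: only points matched to a polygon are kept
inner = geopandas.sjoin(toy_cities, toy_regions)

# Left join: every point is kept, unmatched ones get missing values
left = geopandas.sjoin(toy_cities, toy_regions, how="left")

# City -> region lookup from the inner join
lookup = dict(zip(inner["city"], inner["region"]))
```

Comparing `len(inner)` with `len(left)` is a quick way to spot points that did not land in any polygon.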


If we were after the number of cities per region instead, the approach is similar, with a `groupby` twist at the end:

````{margin}
```{note}

1. We `set_index` to align both tables
1. We `assign` to create a new column

If you want no missing values, you can `fillna(0)`, since you *know* the missing values are zeros.
```
````

In [28]:
regions.set_index(
    "adm2_name"
).assign(
    city_count=sj.groupby("adm2_name").size()
).info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 198 entries, Mongkol Borei to Administrative unit not available
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   adm2_altnm  122 non-null    object  
 1   motor_mean  198 non-null    float64 
 2   walk_mean   198 non-null    float64 
 3   no2_mean    198 non-null    float64 
 4   geometry    198 non-null    geometry
 5   city_count  11 non-null     float64 
dtypes: float64(4), geometry(1), object(1)
memory usage: 10.8+ KB
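
In the output above, `city_count` has only 11 non-null values: regions with no city have no row in the grouped counts, so they receive a missing value when the tables are aligned. A minimal sketch of the `fillna(0)` step, again on hypothetical toy data (the table and column names are assumptions for the example):

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical toy polygons; "North" contains no points
toy_regions = geopandas.GeoDataFrame(
    {"region": ["West", "East", "North"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 2, 2)],
    crs="EPSG:4326",
)
toy_cities = geopandas.GeoDataFrame(
    {"city": ["a", "b", "c"]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Attach the containing region to each city, then count cities per region
toy_sj = geopandas.sjoin(toy_cities, toy_regions)
counts = toy_sj.groupby("region").size()

# Align the counts on the region name; unmatched regions come out missing
toy_counts = toy_regions.set_index("region").assign(city_count=counts)

# Missing here means "no cities", so replace with zero and store as integers
toy_counts["city_count"] = toy_counts["city_count"].fillna(0).astype(int)
```

Casting back to integers is optional; the column becomes `float64` only because missing values were present after alignment.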


### Polygons to points

### Surface to points

### Surface to polygons

`rasterstats`

### Points to surface

Spatial interpolation

### Surface to surface

### Polygons to polygons

## 🐾 Next steps

If you are interested in learning more about spatial feature engineering through map matching, the following pointers might be useful to delve deeper into specific types of "data transfer":

- The [`datashader`](https://datashader.org) library is a great option to transfer geo-tables into surfaces, providing tooling to perform these operations in a highly efficient way.
- When aggregating surfaces into geo-tables, the library [`rasterstats`](https://pythonhosted.org/rasterstats/) contains most if not all of the machinery you will need.
- For transfers from polygon to polygon geographies, [`tobler`](https://pysal.org/tobler/) is your friend. Its official documentation contains examples for different use cases.