<img src="images/Geowrangler.svg" alt="Geowrangler logo" style="max-width: 245px;" />

### Overview

**Geowrangler** is a Python package for geodata wrangling. It helps build data transformation workflows that have no out-of-the-box solutions from other geospatial libraries.

We surveyed our past geospatial projects to extract these solutions, and we hope they will be useful to others as well.

Our audience is researchers, analysts and engineers delivering geospatial projects.

We [welcome your comments, suggestions, bug reports and code contributions](https://github.com/thinkingmachines/geowrangler/discussions) to make **Geowrangler** better.

[![](/images/github.svg "View on github button")](https://github.com/butchtm/geowrangler2)


### Context

**Geowrangler** was born out of our efforts to reduce the amount of boilerplate code in wrangling geospatial data.
It builds on top of existing geospatial libraries such as geopandas, rasterio, rasterstats, morecantile and others.
Our goals are centered on the following tasks:

 * Extracting area of interest zonal statistics from vector and raster data
 * Gridding areas of interest
 * Validating geospatial datasets
 * Downloading publicly available geospatial datasets (e.g. OSM, Ookla, Nightlights)
 * Other geospatial vector and raster data processing tasks
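
To make the gridding and zonal statistics tasks concrete, here is a minimal pure-Python sketch that covers a bounding box with square cells and computes a mean value per cell. It only illustrates the idea and is not Geowrangler's implementation or API: the helper names `square_grid` and `zonal_mean`, the tuple-based cell representation and the sample points are all hypothetical, and coordinates are assumed to be in a projected system with meter units.

```python
from math import ceil
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Point = Tuple[float, float, float]  # (x, y, value)


def square_grid(bounds: BBox, cell_size: float) -> List[BBox]:
    """Cover `bounds` with square cells of side `cell_size` (hypothetical helper)."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    minx, miny, maxx, maxy = bounds
    n_cols = ceil((maxx - minx) / cell_size)
    n_rows = ceil((maxy - miny) / cell_size)
    return [
        (
            minx + col * cell_size,
            miny + row * cell_size,
            minx + (col + 1) * cell_size,
            miny + (row + 1) * cell_size,
        )
        for row in range(n_rows)
        for col in range(n_cols)
    ]


def zonal_mean(cells: List[BBox], points: List[Point]) -> Dict[int, float]:
    """Mean of point values per cell index; cells without points are omitted."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y, value in points:
        for idx, (cx0, cy0, cx1, cy1) in enumerate(cells):
            # Half-open test so a point on a shared edge lands in one cell only.
            if cx0 <= x < cx1 and cy0 <= y < cy1:
                sums[idx] = sums.get(idx, 0.0) + value
                counts[idx] = counts.get(idx, 0) + 1
                break
    return {idx: sums[idx] / counts[idx] for idx in sums}


# Made-up area of interest (2 km x 1 km) and sample observations.
cells = square_grid((0.0, 0.0, 2000.0, 1000.0), cell_size=1000.0)
points = [(100.0, 100.0, 10.0), (900.0, 500.0, 20.0), (1500.0, 250.0, 7.0)]
stats = zonal_mean(cells, points)
print(stats)
```

A real workflow would operate on geometries in a GeoDataFrame rather than bare tuples; the sketch is only meant to show the shape of the problem these modules address.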

To make it easy to document, maintain and extend the package, we opted to maintain the source code, tests and documentation
in Jupyter notebooks. We use [nbdev](https://nbdev.fast.ai) to generate the Python package and documentation
from the notebooks. See this document to learn more about our development workflow.

By doing this, we hope to make it easy for geospatial analysts, scientists and engineers to learn, explore and extend this package 
for their geospatial processing needs. 

Aside from reference documentation for each module, we have included extensive tutorials
and use case examples to make the package easy to learn and use.


### Modules

* Grid Tile Generation
* Geometry Validation 
* Vector Zonal Stats 
* Raster Zonal Stats 
* Area Zonal Stats 
* Distance Zonal Stats 
* Demographic and Health Survey (DHS) Processing Utils 
* Geofabrik (OSM) Data Download
* Ookla Data Download

_See [this page for more details about our roadmap](https://github.com/orgs/thinkingmachines/projects/17)_

### Installation

```
pip install git+https://github.com/butchtm/geowrangler2.git
```
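
After installing, one quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. The sketch below uses only the standard library. The module name `geowrangler2` is taken from the sample code on this page, and the helper name `is_installed` is made up for illustration.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return find_spec(module_name) is not None


if is_installed("geowrangler2"):
    print("geowrangler2 is available")
else:
    print("geowrangler2 not found; run the pip command above")
```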


### Exploring the Documentation

We develop the package modules alongside their documentation in Jupyter notebooks.
Each page, including this one, comes with an _Open in Colab_ button that opens
the underlying Jupyter notebook in Google Colab for exploration.

Click on the _Open in Colab_ button below to open this page as a Google Colab notebook.

[![](https://colab.research.google.com/assets/colab-badge.svg "Open in Colab button")](https://colab.research.google.com/github/butchtm/geowrangler2/blob/main/nbs/index.ipynb)

:::{.callout-note}

All the documentation pages (including the references) are executable Jupyter notebooks.

:::

In [1]:
#| include: false
# no_test
# run this cell in Colab to install the package
! [ -e /content ] && pip install -Uqq git+https://github.com/butchtm/geowrangler2.git

In [2]:
#| include: false
# no_test
%reload_ext autoreload
%autoreload 2
%matplotlib inline

#### Sample Code

In [3]:
#| code-fold: true
#| code-summary: Sample code using grids to display source
#| no_test
import geowrangler2.grids

# view the source of a grid component
# cell_size is the first constructor argument (see the signature in the output below)
grid = geowrangler2.grids.SquareGridGenerator(cell_size=1)
grid??

Type:        SquareGridGenerator
String form: <geowrangler2.grids.SquareGridGenerator object at 0x7fc284380880>
File:        ~/work/unicef-ai4d/geowrangler2-1/geowrangler2/grids.py
Source:
class SquareGridGenerator:
    def __init__(
        self,
        cell_size: float,  # height and width of a square cell in meters
        grid_projection: str = "EPSG:3857",  # projection of grid output
        boundary: Union[SquareGridBoundary, List[float]] = None,  # original boundary


#### Tutorials

* [Grids Generation](tutorial.grids.html)
* [Geometry Validation](tutorial.geometry_validation.html)
* [Vector Zonal Stats](tutorial.vector_zonal_stats.html)
* [Raster Zonal Stats](tutorial.raster_zonal_stats.html)
* [Area Zonal Stats](tutorial.area_zonal_stats.html)
* [Distance Zonal Stats](tutorial.distance_zonal_stats.html)
* [DHS Processing Utils](tutorial.dhs.html)
* [Dataset Downloads](tutorial.datasets.html)

#### Reference

* [Grids Generation](grids.html)
* [Geometry Validation](validation.html)
* [Vector Zonal Stats](vector_zonal_stats.html)
* [Raster Zonal Stats](raster_zonal_stats.html)
* [Area Zonal Stats](area_zonal_stats.html)
* [Distance Zonal Stats](distance_zonal_stats.html)
* [DHS Processing Utils](dhs.html)
* [Dataset Geofabrik (OSM)](datasets_geofabrik.html)
* [Dataset Ookla](datasets_ookla.html)
