## Geohash

Geohashing is a geocoding method that encodes geographic coordinates (latitude and longitude) into a short string of digits and letters. Each string designates a rectangular area on the map, called a cell, whose resolution depends on the length of the string: the more characters in the string, the more precise the location.
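
To make the encoding concrete, here is a minimal pure-Python sketch of the standard geohash scheme: the longitude and latitude intervals are halved alternately (starting with longitude), each decision yields one bit, and every five bits map to one base-32 character. This is only an illustration, not the implementation used by `pyinterp`; the names `BASE32` and `encode_geohash` are hypothetical.

```python
# Illustrative sketch of the standard geohash algorithm (not pyinterp's code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lon, lat, precision=12):
    """Encode a single lon/lat pair (degrees) into a geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0      # bits accumulated for the current character
    nbits = 0      # number of bits accumulated so far (0 to 5)
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[value])
            value = 0
            nbits = 0
    return "".join(chars)


code = encode_geohash(10.40744, 57.64911, precision=5)
print(code)
```

A shorter code is always a prefix of a longer code for the same point, which is why truncating a geohash simply yields a coarser cell containing the original one.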

### Generation of dummy data

In [None]:
import numpy as np
import pandas as pd
from pyinterp import geohash
from pyinterp import geodetic

In [None]:
SIZE = 1000000

In [None]:
lon = np.random.uniform(-180, 180, SIZE)
lat = np.random.uniform(-80, 80, SIZE)
measures = np.random.random_sample(SIZE)

Geohash encoding is fast, which makes it practical for large datasets. The cell below times the encoding of the `SIZE` (one million) random positions generated above.

In [None]:
%timeit geohash.encode(lon, lat)

### Geohash index

In [None]:
store = geohash.storage.MutableMapping()
index = geohash.index.init_geohash(store, precision=3)

In [None]:
%%time
# The index can contain anything, as long as it's possible to serialize the data.
index.update(zip(index.encode(lon, lat), measures))

Number of boxes filled in this index:

In [None]:
len(index)

Let's imagine that we want to retrieve the data in the following polygon:
* `POLYGON((-33.75 39.375,-33.75 45,-22.5 45,-22.5 39.375,-33.75 39.375))`

In [None]:
polygon = geodetic.Polygon.read_wkt(
    "POLYGON((-33.75 39.375,-33.75 45,-22.5 45,-22.5 39.375,-33.75 39.375))")
# Select the keys located in the polygon's bounding box, then fetch their values.
items = index.items(index.keys(polygon.envelope()))
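
The corners of the polygon above fall on multiples of 1.40625 degrees, which matches the cell size of a 3-character geohash. The sketch below works this out, assuming the standard geohash layout (5 bits per character, bits alternating between longitude and latitude, longitude first). The helper `cell_size_degrees` is a hypothetical name introduced for illustration and is not part of `pyinterp`.

```python
def cell_size_degrees(precision):
    """Return (lon_width, lat_height) in degrees of a geohash cell.

    Assumes the standard layout: 5 bits per character, longitude first.
    """
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = bits // 2
    return 360.0 / 2**lon_bits, 180.0 / 2**lat_bits


for precision in (1, 2, 3, 4):
    print(precision, cell_size_degrees(precision))

# Extent of the polygon's envelope, in precision-3 cells.
lon_width, lat_height = cell_size_degrees(3)
n_lon = (-22.5 - -33.75) / lon_width
n_lat = (45 - 39.375) / lat_height
print(n_lon, n_lat)
```

Under these assumptions the envelope spans 8 by 4 cells at precision 3, which is also exactly the footprint of a single 2-character cell.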

### Density calculation

In [None]:
import pandas as pd

df = pd.DataFrame(
    dict(lon=lon,
         lat=lat,
         measures=measures,
         geohash=geohash.encode(lon, lat, precision=3)))
df.set_index("geohash", inplace=True)

In [None]:
# Count the number of measurements falling in each geohash cell.
df = df.groupby("geohash").count()["measures"].rename("count").to_frame()

In [None]:
# Density in points per km², assuming geohash.area returns square metres.
df["density"] = df["count"] / (geohash.area(df.index.values.astype('S')) / 1e6)
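
Normalising the count by the cell area matters because geohash cells span fixed angles, so their surface shrinks toward the poles. The sketch below illustrates this with a spherical approximation of the area of a longitude/latitude box. It is only an illustration: `spherical_box_area` and `EARTH_RADIUS_M` are hypothetical names, and `geohash.area` may rely on a different Earth model, so the values are not expected to match exactly.

```python
import math

# Mean Earth radius in metres (spherical approximation, an assumption here).
EARTH_RADIUS_M = 6_371_008.8


def spherical_box_area(lon_min, lat_min, lon_max, lat_max,
                       radius=EARTH_RADIUS_M):
    """Area in m² of a lon/lat box (degrees) on a sphere."""
    dlon = math.radians(lon_max - lon_min)
    dsin = math.sin(math.radians(lat_max)) - math.sin(math.radians(lat_min))
    return radius**2 * dlon * dsin


# Two boxes with the same angular size (1.40625 degrees on each side).
equator = spherical_box_area(0.0, 0.0, 1.40625, 1.40625)
high_lat = spherical_box_area(0.0, 70.3125, 1.40625, 71.71875)
print(equator / 1e6, high_lat / 1e6)  # areas in km²
```

With a uniform random sampling in longitude and latitude, as in this notebook, cells at high latitude receive about as many points as cells near the equator but cover less surface, so their density is higher.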

In [None]:
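# Convert the per-cell densities into a gridded xarray.DataArray.
array = geohash.to_xarray(df.index.values.astype('S'), df.density)
# Mask empty cells (zero density) so they are not drawn.
array = array.where(array != 0, np.nan)
array.plot()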