[#46] Enrich IndexManager with postalcode
# Scenario Summary

Along with the reverse geocoding response, we should also provide the
postal code information.

# Proposed Solution

Enrich the `ReverseGeocodingResponse` returned by the reverse geocoding
API with the postal code by adding the logic to the `IndexManager`, so
that when the `IndexManager` starts, it also loads all available
postal codes for each country.
giorgioamato committed Mar 4, 2021
1 parent f54b717 commit 7cf062b
Showing 15 changed files with 369 additions and 14 deletions.
45 changes: 45 additions & 0 deletions README.md
@@ -2,6 +2,51 @@
GIS is a library to compute reverse geocoding and map matching, and to check whether a point belongs to an area of interest.
It can use OSM or HERE maps, and you can extend Loader to support other maps.

### Tools used to generate shape files

- **osmconvert**: converts osm.pbf files into other useful formats (osm, o5m, etc.)
- **osmfilter**: filters osm files, keeping only the wanted objects
- **ogr2ogr**: generates a shape file from a given osm file

Run the following to install the tools mentioned above:

```
sudo apt install osmctools
sudo apt install gdal-bin
```

### Generate a shape file with postalCodes

- Download country maps in osm.pbf format from https://download.geofabrik.de/
- Convert the file into .o5m format and count the objects carrying addr: tags
```
osmconvert ../italy-latest.osm.pbf -o=italy.o5m
osmfilter italy.o5m --out-count | grep addr: # prints how many times each addr:* key occurs
```
- Filter the o5m file, keeping only the elements with an addr:city or addr:postcode tag and removing all their other tags.
```
osmfilter italy.o5m --keep="addr:city addr:postcode" --keep-tags="all addr:city= addr:postcode=" --drop-tags="all" --ignore-dependencies --drop-version --drop-author > italy_postalcode.osm
osmfilter italy_postalcode.osm --out-count | grep addr: # should match count above
```
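To make the effect of this filter concrete, here is a minimal Scala sketch that models OSM elements as in-memory tag maps. The `OsmElement` type and the `PostalCodeFilterSketch` object are hypothetical and not part of the library, and the sketch follows the tag semantics described in this step rather than reproducing osmfilter itself. It keeps the elements carrying either key, strips every other tag, and compares the key counts before and after, which is what the two `--out-count` commands check on the real files.

```scala
// Hypothetical model of an OSM element: an id plus its key/value tags.
final case class OsmElement(id: Long, tags: Map[String, String])

object PostalCodeFilterSketch {
  // The two keys named in --keep and --keep-tags.
  val KeptKeys: Set[String] = Set("addr:city", "addr:postcode")

  // Keep the elements that carry at least one of the keys,
  // then drop every other tag from the kept elements.
  def filter(elements: Seq[OsmElement]): Seq[OsmElement] =
    elements
      .filter(e => e.tags.keySet.exists(KeptKeys.contains))
      .map(e => e.copy(tags = e.tags.filter { case (key, _) => KeptKeys.contains(key) }))

  // Rough analogue of `--out-count | grep addr:`: occurrences of each addr:* key.
  def addrKeyCounts(elements: Seq[OsmElement]): Map[String, Int] =
    elements
      .flatMap(_.tags.keys)
      .filter(_.startsWith("addr:"))
      .groupBy(identity)
      .map { case (key, occurrences) => key -> occurrences.size }
}
```

Other addr:* keys such as addr:street disappear after filtering, so only the addr:city and addr:postcode lines of the two counts are expected to match.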

- Create a shape file from it
```
ogr2ogr -f "ESRI Shapefile" -skip shape_postalcode italy_postalcode.osm
```
- Extract the tag values and turn them into columns
```
cd shape_postalcode
ogr2ogr -sql "select hstore_get_value(other_tags,'addr:postcode') as cap, hstore_get_value(other_tags,'addr:city') as city from points" postalcode.shp points.shp
```
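The osm driver of ogr2ogr stores the tags that have no dedicated column in the `other_tags` field as an hstore-formatted string, for example `"addr:city"=>"Roma","addr:postcode"=>"00184"`, and `hstore_get_value` pulls a single value out of it. The Scala sketch below (the `HstoreSketch` object is hypothetical) parses such a string to show what the `cap` and `city` columns end up containing. Unlike a real hstore parser, it does not handle escaped quotes.

```scala
object HstoreSketch {
  // One "key"=>"value" pair; escaped quotes inside keys or values are not supported.
  private val Pair = "\"([^\"]*)\"=>\"([^\"]*)\"".r

  // Parse the whole other_tags string into a Map.
  def parse(otherTags: String): Map[String, String] =
    Pair.findAllMatchIn(otherTags).map(m => m.group(1) -> m.group(2)).toMap

  // Analogue of hstore_get_value(other_tags, key): None when the key is absent.
  def getValue(otherTags: String, key: String): Option[String] =
    parse(otherTags).get(key)
}
```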

- Group by city, keeping for each city a single point and the minimum cap (postal code)
```
ogr2ogr -dialect sqlite -sql "select min(cap), city, max(geometry) from postalcode where cap is not NULL group by city " Italy_gis_postalcode.shp postalcode.shp
```
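The aggregation above can be read as the following Scala sketch over hypothetical `PostalCodePoint` rows (not a library type). `min(cap)` compares postal codes as strings, so each city keeps its lexicographically smallest code, while `max(geometry)` simply yields one point of the city with no geometric meaning. The sketch stands in for it by taking the first point of each group.

```scala
// Hypothetical row of the intermediate postalcode layer (cap, city, point).
final case class PostalCodePoint(cap: Option[String], city: String, lon: Double, lat: Double)

object GroupByCitySketch {
  // One entry per city: the smallest cap in string order plus one point of that city.
  def groupByCity(rows: Seq[PostalCodePoint]): Map[String, PostalCodePoint] =
    rows
      .filter(_.cap.isDefined) // where cap is not NULL
      .groupBy(_.city)         // group by city
      .map { case (city, points) =>
        val minCap = points.flatMap(_.cap).min // min(cap), compared as strings
        // Stand-in for max(geometry): any point of the group, here the first one.
        city -> points.head.copy(cap = Some(minCap))
      }
}
```

A city with several postal codes is therefore reduced to a single one.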

Alternatively, you can run the _create_postalcode_shapefile.sh_ script, available in the scripts folder, locally.

### Test GraphHopper
Download country maps in osm.pbf format from https://download.geofabrik.de/europe.html
[OPTIONAL] Merge all downloaded files with the following command (only if you downloaded multiple countries):
131 changes: 131 additions & 0 deletions scripts/create_postalcode_shapefile.sh
@@ -0,0 +1,131 @@
#!/bin/bash

# This script extracts postal codes from a given osm.pbf file and writes them into a shape file.
# Example usage: ./create_postalcode_shapefile.sh -f italy-latest.osm.pbf -o output -n italy_postalcode.shp

usage() { echo "Usage: $0 -f <osm.pbf file path> -o <shape destination folder> -n <shape file name(must end with .shp)> " 1>&2; exit 1; }

while getopts ":f:o:n:" o; do
case "${o}" in
f)
OSM_PBF_PATH=${OPTARG}
;;
o)
OUTPUT_PATH=${OPTARG}
;;
n)
OUTPUT_FILENAME=${OPTARG}
;;
*)
usage
;;
esac
done
shift $((OPTIND-1))

OSM_PBF_EXT="osm.pbf"

if [ -z "${OSM_PBF_PATH}" ] || [ -z "${OUTPUT_PATH}" ] || [ -z "${OUTPUT_FILENAME}" ]; then
usage
fi

if [[ ! -f "${OSM_PBF_PATH}" ]]
then
echo "Error: ${OSM_PBF_PATH} doesn't exist on your filesystem."
exit 1
fi

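# Match the whole extension; "${OSM_PBF_PATH#*.}" breaks on paths such as ../italy-latest.osm.pbf
if [[ "${OSM_PBF_PATH}" != *."${OSM_PBF_EXT}" ]]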
then
echo "Error: ${OSM_PBF_PATH} is not a valid osm.pbf file"
exit 1
fi

if [[ "${OUTPUT_FILENAME}" != *.shp ]]
then
echo "Error: ${OUTPUT_FILENAME} is not a valid shape filename"
exit 1
fi

OSM_PBF_ABS_PATH=$(readlink -m "$OSM_PBF_PATH")
OUTPUT_PATH_ABS_PATH=$(readlink -m "$OUTPUT_PATH")
OSM_PBF_FILENAME=$(basename "$OSM_PBF_PATH")
O5M_FILENAME_WITHOUT_EXT=$(basename "$OSM_PBF_PATH" .$OSM_PBF_EXT)
O5M_FILENAME=$O5M_FILENAME_WITHOUT_EXT".o5m"
OSM_FILTERED_FILENAME=$O5M_FILENAME_WITHOUT_EXT"-filtered.osm"
TMP_FOLDER="tmp_$(date +%Y%m%d)"
SHAPEFILE_EXT="shp"
TRANSFORMED_SHAPE_FILE="transformed"

rm -rf $TMP_FOLDER
mkdir $TMP_FOLDER
cd $TMP_FOLDER

echo "Converting $OSM_PBF_FILENAME into $O5M_FILENAME"
osmconvert $OSM_PBF_ABS_PATH -o=$O5M_FILENAME

if [ $? -eq 0 ]
then
echo "Successfully created $O5M_FILENAME"
else
echo "Error: osmconvert failed to create $O5M_FILENAME" 1>&2
exit 1
fi

echo "Creating filtered osm"
osmfilter $O5M_FILENAME --keep="addr:city addr:postcode" --keep-tags="all addr:city= addr:postcode=" --drop-tags="all" --ignore-dependencies --drop-version --drop-author > $OSM_FILTERED_FILENAME

if [ $? -eq 0 ]
then
echo "Successfully created $OSM_FILTERED_FILENAME"
else
echo "Error: osmfilter failed to create $OSM_FILTERED_FILENAME" 1>&2
exit 1
fi


echo "Creating shape file from filtered file"
ogr2ogr -f "ESRI Shapefile" -skip shape $OSM_FILTERED_FILENAME

if [ $? -eq 0 ]
then
echo "Successfully created shapefile"
else
echo "Error: ogr2ogr failed to create the shapefile" 1>&2
exit 1
fi

echo "Extracting tags from shape file"
cd shape
ogr2ogr -sql "select hstore_get_value(other_tags,'addr:postcode') as cap, hstore_get_value(other_tags,'addr:city') as city from points" $TRANSFORMED_SHAPE_FILE.$SHAPEFILE_EXT points.shp

if [ $? -eq 0 ]
then
echo "Successfully extracted tags from shapefile"
else
echo "Error: ogr2ogr failed to extract tags from the shapefile" 1>&2
exit 1
fi


echo "Creating final shape file"
mkdir -p final
ogr2ogr -dialect sqlite -sql "select min(cap), city, max(geometry) from $TRANSFORMED_SHAPE_FILE where cap is not NULL group by city " final/$OUTPUT_FILENAME $TRANSFORMED_SHAPE_FILE.$SHAPEFILE_EXT

if [ $? -eq 0 ]
then
echo "Successfully created final shapefile named $OUTPUT_FILENAME"
else
echo "Error: ogr2ogr failed to create $OUTPUT_FILENAME" 1>&2
exit 1
fi

echo "Copying $OUTPUT_FILENAME into $OUTPUT_PATH"
mkdir -p $OUTPUT_PATH_ABS_PATH
cp -R final/* $OUTPUT_PATH_ABS_PATH

echo "Removing tmp dir"
cd ../..
rm -rf $TMP_FOLDER

echo "Terminated with $?"
@@ -6,7 +6,7 @@ object ManagerUtils {

case class Paths(boundary: Array[Path], roads: Array[Path], addresses: Array[Path])

case class BoundaryPathGroup(country: List[Path], region: List[Path], county: List[Path], city: List[Path])
case class BoundaryPathGroup(country: List[Path], region: List[Path], county: List[Path], city: List[Path], postalCode: List[Path])

case class CountryPathSet(boundary: BoundaryPathGroup, roads: Array[Path], addresses: Option[Path])

@@ -0,0 +1,35 @@
package it.agilelab.bigdata.gis.domain.loader

import com.typesafe.config.Config
import com.vividsolutions.jts.geom.Geometry
import it.agilelab.bigdata.gis.core.loader.Loader
import it.agilelab.bigdata.gis.domain.managers.PathManager
import it.agilelab.bigdata.gis.domain.models.OSMPostalCode


case class OSMPostalCodeLoader(config: Config, pathManager: PathManager) extends Loader[OSMPostalCode] {

override def loadFile(source: String): Iterator[(Array[AnyRef], Geometry)] = {

ShapeFileReader.readPointFeatures(source).map { case (point, list) =>
(list.toArray) -> point
}.toIterator

}

protected def objectMapping(fields: Array[AnyRef], line: Geometry): OSMPostalCode = {

val postalCodeValue = fields(1).toString
val cityValue = fields(2).toString

OSMPostalCode(
point = line,
postalCode = postalCodeValue,
city = Some(cityValue)
)
}

protected def parseStringName(string: String): String = {
new String(string.getBytes("ISO-8859-1"), "UTF-8")
}
}
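`objectMapping` reads the postal code from index 1 and the city from index 2 and calls `toString` on both, so a missing or null field raises an exception. A more defensive variant could look like the Scala sketch below. `PostalCodeRecord` is a hypothetical stand-in for `OSMPostalCode` without the JTS point, and applying the `parseStringName` re-decoding to the city is an assumption: the diff defines that helper but does not call it, and it only helps if the DBF strings are UTF-8 bytes that were decoded as ISO-8859-1.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for OSMPostalCode, without the JTS point.
final case class PostalCodeRecord(postalCode: String, city: Option[String])

object PostalCodeMappingSketch {
  // Same re-decoding as parseStringName: UTF-8 bytes that were read as ISO-8859-1.
  def fixEncoding(s: String): String =
    new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8)

  // Field i as a String, or None if it is missing or null.
  private def field(fields: Array[AnyRef], i: Int): Option[String] =
    if (i < fields.length) Option(fields(i)).map(_.toString) else None

  // Index 1 = postal code, index 2 = city, as in objectMapping.
  // Returns None instead of throwing when the postal code is absent.
  def toRecord(fields: Array[AnyRef]): Option[PostalCodeRecord] =
    field(fields, 1).map(cap => PostalCodeRecord(cap, field(fields, 2).map(fixEncoding)))
}
```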
@@ -1,11 +1,11 @@
package it.agilelab.bigdata.gis.domain.managers

import com.typesafe.config.Config
import it.agilelab.bigdata.gis.core.utils.{Configuration, Logger, ObjectPickler}
import it.agilelab.bigdata.gis.core.utils.ManagerUtils.{BoundaryPathGroup, CountryPathSet, Path}
import it.agilelab.bigdata.gis.core.utils.{Configuration, Logger, ObjectPickler}
import it.agilelab.bigdata.gis.domain.configuration.IndexManagerConfiguration
import it.agilelab.bigdata.gis.domain.loader.{OSMAdministrativeBoundariesLoader, OSMGenericStreetLoader}
import it.agilelab.bigdata.gis.domain.models.{OSMBoundary, OSMStreetAndHouseNumber}
import it.agilelab.bigdata.gis.domain.loader.{OSMAdministrativeBoundariesLoader, OSMGenericStreetLoader, OSMPostalCodeLoader}
import it.agilelab.bigdata.gis.domain.models.{OSMBoundary, OSMPostalCode, OSMStreetAndHouseNumber}
import it.agilelab.bigdata.gis.domain.spatialList.GeometryList

import java.io.File
@@ -17,6 +17,8 @@ case class IndexManager(conf: Config) extends Configuration with Logger {
val pathManager: PathManager = PathManager(indexConfig.pathConf)
val boundariesLoader: OSMAdministrativeBoundariesLoader =
OSMAdministrativeBoundariesLoader(indexConfig.boundaryConf, pathManager)
val postalCodeLoader: OSMPostalCodeLoader =
OSMPostalCodeLoader(indexConfig.boundaryConf, pathManager)
val indexSet: IndexSet = createIndexSet(indexConfig.inputPaths)

/**
@@ -42,7 +44,7 @@ case class IndexManager(conf: Config) extends Configuration with Logger {
val indexStuffs: List[IndexStuffs] =
multiCountriesPathSet
.par
.map(countryPathSet => createCountryBoundaries(countryPathSet.boundary, boundariesLoader))
.map(countryPathSet => createCountryBoundaries(countryPathSet.boundary, boundariesLoader, postalCodeLoader))
.toList

val cityIndexStuff: List[OSMBoundary] = indexStuffs.flatMap(_.cityIndex)
@@ -71,10 +73,13 @@ case class IndexManager(conf: Config) extends Configuration with Logger {

//TODO review performances
def createCountryBoundaries(paths: BoundaryPathGroup,
boundariesLoader: OSMAdministrativeBoundariesLoader): IndexStuffs = {
boundariesLoader: OSMAdministrativeBoundariesLoader,
postalCodeLoader: OSMPostalCodeLoader): IndexStuffs = {

val loadPostalCode: Seq[Path] => Seq[OSMPostalCode] = pathList => pathList.flatMap(postalCodeLoader.loadObjects(_))
val loadBoundaries: Seq[Path] => Seq[OSMBoundary] = pathList => pathList.flatMap(boundariesLoader.loadObjects(_))

val postalCodes: Seq[OSMPostalCode] = loadPostalCode(paths.postalCode)
val cities: Seq[OSMBoundary] = loadBoundaries(paths.city)
val counties: Seq[OSMBoundary] = loadBoundaries(paths.county)
val regions: Seq[OSMBoundary] = loadBoundaries(paths.region)
@@ -84,7 +89,8 @@ case class IndexManager(conf: Config) extends Configuration with Logger {

logger.info(s"Start loading boundary of: $countryName...")

val citiesWithCounties: Seq[OSMBoundary] = mergeBoundaries(cities, counties)
val postalCodesWithCities: Seq[OSMBoundary] = enrichCities(cities, postalCodes)
val citiesWithCounties: Seq[OSMBoundary] = mergeBoundaries(postalCodesWithCities, counties)
val countiesWithRegion: Seq[OSMBoundary] = mergeBoundaries(citiesWithCounties, regions)

val primaryIndexBoundaries: Seq[OSMBoundary] = countiesWithRegion.map(_.merge(countryBoundary))
@@ -119,6 +125,18 @@ case class IndexManager(conf: Config) extends Configuration with Logger {
}
}

private def enrichCities(cities: Seq[OSMBoundary], postalCodes: Seq[OSMPostalCode]): Seq[OSMBoundary] = {

cities
.filter(_.city.isDefined)
.map{ city =>
postalCodes.find(_.point.coveredBy(city.multiPolygon)) match {
case Some(found) => city.copy(postalCode = Some(found.postalCode))
case _ => city
}
}
}
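`enrichCities` assigns to each named city the first postal code point that falls inside its polygon. The Scala sketch below reproduces that logic without JTS. `Pt`, `CityArea` and `PostalCodePt` are hypothetical types, and an even-odd ray-casting test on a single ring without holes replaces `coveredBy`, so points lying exactly on a boundary may be classified differently than JTS would.

```scala
// Hypothetical planar point, city and postal code types (stand-ins for the JTS-based ones).
final case class Pt(x: Double, y: Double)
final case class CityArea(name: Option[String], ring: Vector[Pt], postalCode: Option[String] = None)
final case class PostalCodePt(point: Pt, postalCode: String)

object EnrichCitiesSketch {
  // Even-odd ray casting over a single ring without holes.
  // Unlike JTS coveredBy, points exactly on an edge are not guaranteed to count as inside.
  def contains(ring: Vector[Pt], p: Pt): Boolean = {
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val a = ring(i)
      val b = ring(j)
      // Toggle when the horizontal ray starting at p crosses the edge (a, b).
      if ((a.y > p.y) != (b.y > p.y) &&
          p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x)
        inside = !inside
      j = i
    }
    inside
  }

  // Same shape as enrichCities: named cities only, the first contained postal code wins.
  def enrich(cities: Seq[CityArea], postalCodes: Seq[PostalCodePt]): Seq[CityArea] =
    cities
      .filter(_.name.isDefined)
      .map { city =>
        postalCodes.find(pc => contains(city.ring, pc.point)) match {
          case Some(found) => city.copy(postalCode = Some(found.postalCode))
          case None        => city
        }
      }
}
```

Like the diff, the sketch drops cities without a name, so those boundaries no longer reach `mergeBoundaries`, and it tests every postal code against every city. A spatial index over the postal code points is one common way to reduce that cost, in line with the `TODO review performances` note.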

/** Create the addresses index that will be used to decorate the road index leaves by adding
* a sequence of OSMAddress to retrieve the candidate street number
*
@@ -12,6 +12,7 @@ private object Bound {
val COUNTY = "county"
val REGION = "region"
val CITY = "city"
val POSTAL_CODE = "postalCode"
}

case class PathManager(conf: Config) extends Configuration {
@@ -25,7 +26,8 @@ case class PathManager(conf: Config) extends Configuration {
regionSuffixList <- read[List[String]](countryConfig, Bound.REGION)
countySuffixList <- read[List[String]](countryConfig, Bound.COUNTY)
citySuffixList <- read[List[String]](countryConfig, Bound.CITY)
} yield CountrySettings(countrySuffixList, regionSuffixList, countySuffixList, citySuffixList)).get
postalCodeSuffixList <- read[List[String]](countryConfig, Bound.POSTAL_CODE)
} yield CountrySettings(countrySuffixList, regionSuffixList, countySuffixList, citySuffixList, postalCodeSuffixList)).get
}

def getCountryPathSet(countryFolder: File): CountryPathSet = {
@@ -43,27 +45,32 @@ case class PathManager(conf: Config) extends Configuration {
val regionPathList: List[Path] = countrySettings.regionSuffixes.flatMap(validSuffix => paths.filter(_.endsWith(validSuffix)))
val countyPathList: List[Path] = countrySettings.countySuffixes.flatMap(validSuffix => paths.filter(_.endsWith(validSuffix)))
val cityPathList: List[Path] = countrySettings.citySuffixes.flatMap(validSuffix => paths.filter(_.endsWith(validSuffix)))
val postalCodePathList: List[Path] = countrySettings.postalCodeSuffixes.flatMap(validSuffix => paths.filter(_.endsWith(validSuffix)))

BoundaryPathGroup(
country = countryPathList,
region = regionPathList,
county = countyPathList,
city = cityPathList
city = cityPathList,
postalCode = postalCodePathList
)
}
}

case class CountrySettings(countrySuffixes: List[String],
regionSuffixes: List[String],
countySuffixes: List[String],
citySuffixes: List[String]) {
citySuffixes: List[String],
postalCodeSuffixes: List[String]
) {

def clean: CountrySettings = {
CountrySettings(
this.countrySuffixes.map(_.split('.').head),
this.regionSuffixes.map(_.split('.').head),
this.countySuffixes.map(_.split('.').head),
this.citySuffixes.map(_.split('.').head)
this.citySuffixes.map(_.split('.').head),
this.postalCodeSuffixes.map(_.split('.').head)
)
}
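The new `postalCode` setting follows the same path as the other boundary levels: it is read from the country configuration, can be normalized by `clean` (everything from the first `.` is dropped), and is used as a suffix to select files in `PathManager`. The Scala sketch below mimics that flow with a plain `Map` standing in for the Typesafe `Config` and `Option` standing in for whatever `read` returns; `CountrySettingsSketch` and its members are hypothetical. In this model a configuration without a `postalCode` entry yields `None`; since the diff unwraps the result with `.get`, existing country configurations presumably need the new key.

```scala
object CountrySettingsSketch {
  // Hypothetical, trimmed CountrySettings with only the city and postal code suffixes.
  final case class Settings(citySuffixes: List[String], postalCodeSuffixes: List[String]) {
    // Same normalization as CountrySettings.clean: drop everything from the first '.'.
    def clean: Settings =
      Settings(citySuffixes.map(_.split('.').head), postalCodeSuffixes.map(_.split('.').head))
  }

  // Stand-in for read[List[String]](countryConfig, key): a Map lookup returning Option.
  def read(conf: Map[String, List[String]], key: String): Option[List[String]] = conf.get(key)

  // Like the for-comprehension in PathManager: any missing key makes the whole result empty.
  def settings(conf: Map[String, List[String]]): Option[Settings] =
    for {
      city       <- read(conf, "city")
      postalCode <- read(conf, "postalCode")
    } yield Settings(city, postalCode)

  // Like the path selection in PathManager: for every suffix, the paths ending with it.
  def select(paths: List[String], suffixes: List[String]): List[String] =
    suffixes.flatMap(suffix => paths.filter(_.endsWith(suffix)))
}
```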

@@ -10,6 +10,7 @@ case class OSMBoundary(multiPolygon: Geometry,
country: Option[String] = None,
countryCode: Option[String] = None,
countyCode: Option[String] = None,
postalCode: Option[String] = None,
boundaryType: String,
env: Envelope)
extends MultiPolygon(
@@ -28,6 +29,7 @@ case class OSMBoundary(multiPolygon: Geometry,
|County: ${county.map(_.toString)}
|Region: ${region.map(_.toString)}
|Country: ${country.map(_.toString)}
|PostalCode: ${postalCode.map(_.toString)}
""".stripMargin
}
