The OpenEO UDF Python reference implementation and interface description
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
docker
docs
ml
scripts
src/openeo_udf
tests
.coveragerc
.gitignore
.travis.yml
AUTHORS.rst
CHANGELOG.rst
LICENSE.txt
Makefile
README.rst
requirements.txt
setup.cfg
setup.py

README.rst

OpenEO UDF Framework

OpenEO User Defined Functions (UDF) are an approach to run arbitrary code on geographical data that is available in an OpenEO processing backend like R, GRASS GIS or GeoTrellis. This document describes the UDF interface and provides reference implementation for Python3. The reference implementation includes:

  • An OpenEO UDF REST test server that processes user defined data with user defined functions and describes its interface using swagger2.0.
  • A command line tool to apply user defined functions on raster and vector data
  • A Python3 API that specifies how UDF must be implemented in Python3
  • The UDF test server and the command line tool are examples howto implement the UDF approach in an OpenEO processing backend

Backend integration

This UDF implementation contains an abstract swagger description of schemas that must be used when an API for a specific programming language is implemented. They are documented in the swagger 2.0 API description that is provided by the UDF test server. However, the Python file that defines the swagger description is available here:

The basis of the swagger 2.0 API description are basic data-types that are available in many programming languages. These basic data-types are:

  • Strings
  • Floating point values
  • Integer values
  • Date and time data-types
  • Multi-dimensional arrays, lists or vectors of floating point values, integer values and date time data-types
  • Maps or dictionaries

The entry point of an UDF is a single dictionary or map, that can be represented by a class object as well, depending on the programming language.

The schemas SpatialExtent, RasterCollectionTile, FeatureCollectionTile, StructuredData, MachineLearnModel and UdfData are a swagger 2.0 based definitions for the UDF API. These schemas are implemented as classes with additional functionality in the Python prototype.

The schemas UdfCode, UdfRequest and ErrorResponse are used by the UDF test server to provide the POST endpoint. They are not part of the UDF API.

To support UDF's in the backend the following approaches can be used:
  • The backend implements for specific languages the UDF API based on the provided swagger 2.0 API description
  • The backend uses the Python prototype implementation for Python based UDF's
  • The backend uses the UDF test server to run Python UDF's
  • The backend uses the command line tool to execute Python UDF's.

Installation

The installation was tested on ubuntu 16.04 and 18.04. It requires python3.6 and pip3. It will install several python3 libraries like pygdal [1], pytorch [2], scikit-learn [3], tensorflow [4] and other libraries that require additional development files on the host system.

Footnotes

[1]https://github.com/nextgis/pygdal
[2]https://pytorch.org/
[3]http://scikit-learn.org/stable/
[4]https://www.tensorflow.org/

Local installation

  1. Clone the git repository into a specific directory and create the virtual python3 environment:

    mkdir -p ${HOME}/src/openeo
    cd ${HOME}/src/openeo
    
    git clone https://github.com/Open-EO/openeo-udf.git
    virtualenv -p python3 openeo_venv
  2. Install are requirements in the virtual environment:

    source openeo_venv/bin/activate
    cd openeo-udf
    pip3 install -r requirements.txt
  3. Install the openeo-udf software and run the tests:

    python3 setup.py install
    python3 setup.py test
    python3 tests/test_doctests.py
  1. Create the UDF documentation that includes the python3 API description and start firefox to read it:

    cd docs
    make html
    firefox _build/html/index.html &
    cd ..
  2. Run the udf server and download the swagger definition:

    run_udf_server &
    
    # Download the swagger JSON file
    wget http://localhost:5000/api/v0/swagger.json
  3. Run the UDF execution command line tool:

    The following command computes the NDVI on a raster image series of three multi-band tiff files. Two bands are provided with the names RED and NIR for the UDF. The three resulting single-band GeoTiff files are written to the /tmp directory.

    execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                -b RED,NIR \
                -u src/openeo_udf/functions/raster_collections_ndvi.py

    This command computes the sum of the raster series for each band. A single raster image with two bands is written as GeoTiff file to the directory /tmp.

    execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                -b RED,NIR \
                -u src/openeo_udf/functions/raster_collections_reduce_time_sum.py

    This command reads a feature collection stored in a gepackge file and applies the UDF buffer function. The result is a new geopackage that contains the buffers written in directory /tmp:

    execute_udf -v data/sampling_points.gpkg -u src/openeo_udf/functions/feature_collections_buffer.py

    This command reads a series of raster GeoTiff files and a feature collection stored in a gepackge file and applies the UDF sampling function. The result is a new geopackage that contains the sampled attributes written in directory /tmp:

    execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                -b RED,NIR \
                -v data/sampling_points.gpkg \
                -u src/openeo_udf/functions/raster_collections_sampling.py

Docker image

The openeo-udf repository contains the build instruction of an openeo-udf docker image:

  1. Clone the git repository into a specific directory and create the virtual python3 environment:

    mkdir -p ${HOME}/src/openeo
    cd ${HOME}/src/openeo
    
    git clone https://github.com/Open-EO/openeo-udf.git
  2. Build the docker image and run it:

    cd openeo-udf/docker
    docker build -t openeo_udf .
    docker run --name "openeo-udf-server" -p 5000:5000 -p 80:80 -t openeo_udf
  3. Have a look at the documentation that is available in the docker deployment. This includes this document with the python3 API description, that must be used in the UDF's and the swagger documentation of the REST UDF service:

    # This document
    firefox http://localhost/index.html
    
    # The python3 API description that must be used in the python3 UDF
    firefox http://localhost/api/openeo_udf.api.html#module-openeo_udf.api.base
    
    # The swagger API description
    firefox http://localhost/api_docs/index.html
    
    # Download the swagger JSON file
    wget http://localhost:5000/api/v0/swagger.json

Coding an UDF

The python3 reference implementation provides an API to implement UDF conveniently. It makes use of many python3 libraries that provide functionality to access raster and vector geo-data.

The following libraries should be used implementations UDF's:

  • The python3 library numpy [5] should be used to process the raster data.
  • The python3 library geopandas [6] and shapely [7] should be used to process the vector data.
  • The python3 library pandas [8], specifically pandas.DatetimeIndex should be used to process time-series data

Footnotes

[5]http://www.numpy.org/
[6]http://geopandas.org/index.html
[7]https://github.com/Toblerity/Shapely
[8]http://pandas.pydata.org/

The python3 API is well documented and fully tested using doctests. The doctests show the handling of the API with simple examples. This document and the full API description is available when you installed openeo_udf locally or if you use the docker image. However, the original python3 file that implements the OpenEO UDF python3 API is available here:

The UDF's are directly available for download from the repository:

Several UDF were implemented and provide and example howto develop an UDF. A unittest was implemented for each UDF. The tests are available here:

The following classes are part of the UDF Python API and should be used for implementation of UDF's and backend driver:

  • SpatialExtent
  • RasterCollectionTile
  • FeatureCollectionTile
  • StructuredData
  • UdfData

Using the UDF command line tool

The python3 reference implementation provides a command line tool to run an UDF on raster images that are supported by GDAL and/or vector files support by OGR. The command line tool has a help interface with examples that run on test data available in the OpenEO UDF repository:

(openeo_venv) soeren@Knecht:~/src/openeo/openeo-udf$ execute_udf --help
usage: execute_udf [-h] [-r RASTER_FILES] [-v VECTOR_FILES]
                   [-t RASTER_TIME_STAMPS] [-b BAND_NAMES]
                   [-o RASTER_OUTPUT_DIR] -u PATH_TO_UDF

This program reads a list of single- or multi-band GeoTiff files and vector files
and applies a user defined function (UDF) on them.
The GeoTiff files must be provided as comma separated list, as well as the band names.
The vector files must be provides as comma separated list of files as well. The UDF
must be accessible on the file system. The computed results are single- or multi-band GeoTiff files
in case of raster output and geopackage vector files in case of vector output
that are written into a specific output directory. Raster and vector files can be specified together.
However, all provided files must have the same projection.

Examples:

    The following command computes the NDVI on a raster
    image series of three multi-band tiff files. Two bands are provided with the names RED and NIR for
    the UDF. The three resulting single-band GeoTiff files are written to the /tmp directory.

        execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                    -b RED,NIR \
                    -u src/openeo_udf/functions/raster_collections_ndvi.py

    This command computes the sum of the raster series for each band. A single raster image
    with two bands is written as GeoTiff file to the directory /tmp.

        execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                    -b RED,NIR \
                    -u src/openeo_udf/functions/raster_collections_reduce_time_sum.py


    This command reads a feature collection stored in a gepackge file
    and applies the UDF buffer function. The result is a new geopackage
    that contains the buffers written in directory /tmp:

        execute_udf -v data/sampling_points.gpkg -u src/openeo_udf/functions/feature_collections_buffer.py

    This command reads a series of raster GeoTiff files and a feature collection stored in a gepackge file
    and applies the UDF sampling function. The result is a new geopackage
    that contains the sampled attributes written in directory /tmp:

        execute_udf -r data/red_nir_1987.tif,data/red_nir_2000.tif,data/red_nir_2002.tif \
                    -b RED,NIR \
                    -v data/sampling_points.gpkg \
                    -u src/openeo_udf/functions/raster_collections_sampling.py

optional arguments:
  -h, --help            show this help message and exit
  -r RASTER_FILES, --raster_files RASTER_FILES
                        Comma separated list of raster files. If several
                        raster files are provided, then each raster file must
                        have the same number of bands.
  -v VECTOR_FILES, --vector_files VECTOR_FILES
                        Comma separated list of vector files. Each vector file
                        will be converted into a vector collection tile.
  -t RASTER_TIME_STAMPS, --raster_time_stamps RASTER_TIME_STAMPS
                        A comma separated list of time stamps, that must have
                        the same number of entries as the list of raster
                        files.
  -b BAND_NAMES, --band_names BAND_NAMES
                        A comma separated list of band names.
  -o RASTER_OUTPUT_DIR, --raster_output_dir RASTER_OUTPUT_DIR
                        The output directory to store the computed results.
  -u PATH_TO_UDF, --path_to_udf PATH_TO_UDF
                        The UDF file to execute.

Using the UDF server

Raster Example

The first example removes the raster collection tiles from the provided input data. The code is very simple and removes all raster collection tiles in the input object that always has the name data:

data.del_raster_collection_tiles()

The following JSON definition includes the python3 code and a simple raster collection with two 2x2 tiles with two start and end time stamps.

{
  "code": {
    "source": "data.del_raster_collection_tiles()",
    "language": "python"
  },
  "data": {
    "proj": "EPSG:4326",
    "raster_collection_tiles": [
      {
        "data": [
          [
            [
              0,
              1
            ],
            [
              2,
              3
            ]
          ],
          [
            [
              0,
              1
            ],
            [
              2,
              3
            ]
          ]
        ],
        "extent": {
          "north": 53,
          "south": 50,
          "east": 30,
          "nsres": 0.01,
          "ewres": 0.01,
          "west": 24
        },
        "end_times": [
          "2001-01-02T00:00:00",
          "2001-01-03T00:00:00"
        ],
        "start_times": [
          "2001-01-01T00:00:00",
          "2001-01-02T00:00:00"
        ],
        "id": "test_data",
        "wavelength": 420
      }
    ]
  }
}

Running the code, with the assumption that the JSON code was placed in the shell environmental variable "JSON", should look like this:

curl -H "Content-Type: application/json" -X POST -d "${JSON}" http://localhost:5000/udf

The result of the processing should be the elimination of the raster and feature collections, since the provided data object will be used to create the resulting data:

{
  "feature_collection_tiles": [],
  "models": {},
  "proj": "EPSG:4326",
  "raster_collection_tiles": []
}

Hence, a data object that contains the raster and feature collections is provided to the user defined function. The UDF code works on the data and stores the result in the same data object.

Vector Example

The second examples applies a buffer operation on a feature collection. It computes a buffer of size 5 on all features of the first feature collection tile and stores the result in the input data object:

tile = data.get_feature_collection_tiles()[0]
buf = tile.data.buffer(5)
new_data = tile.data.set_geometry(buf)
data.set_feature_collection_tiles([FeatureCollectionTile(id=tile.id + "_buffer", data=new_data, start_times=tile.start_times, end_times=tile.end_times),])

The following JSON definition includes the python3 code that applies the buffer operation and a simple feature collection that contains two points with start and end time stamps.

{
  "code": {
    "source": "tile = data.get_feature_collection_tiles()[0] \nbuf = tile.data.buffer(5) \nnew_data = tile.data.set_geometry(buf) \ndata.set_feature_collection_tiles([FeatureCollectionTile(id=tile.id + \"_buffer\", data=new_data, start_times=tile.start_times, end_times=tile.end_times),])\n",
    "language": "python"
  },
  "data": {
    "proj": "EPSG:4326",
    "feature_collection_tiles": [
      {
        "id": "test_data",
        "data": {
          "features": [
            {
              "geometry": {
                "coordinates": [
                  24,
                  50
                ],
                "type": "Point"
              },
              "id": "0",
              "type": "Feature",
              "properties": {
                "a": 1,
                "b": "a"
              }
            },
            {
              "geometry": {
                "coordinates": [
                  30,
                  53
                ],
                "type": "Point"
              },
              "id": "1",
              "type": "Feature",
              "properties": {
                "a": 2,
                "b": "b"
              }
            }
          ],
          "type": "FeatureCollection"
        },
        "end_times": [
          "2001-01-02T00:00:00",
          "2001-01-03T00:00:00"
        ],
        "start_times": [
          "2001-01-01T00:00:00",
          "2001-01-02T00:00:00"
        ]
      }
    ]
  }
}

Running the code, with the assumption that the JSON code was placed in the shell environmental variable "JSON", should look like this:

curl -H "Content-Type: application/json" -X POST -d "${JSON}" http://localhost:5000/udf

The result of the processing are two polygons (coordinates are truncated):

{
  "feature_collection_tiles": [
    {
      "data": {
        "features": [
          {
            "geometry": {
              "coordinates": [
                [
                  [
                    29.0,
                    50.0
                  ],
                  [
                    "..."
                  ],
                  [
                    29.0,
                    50.0
                  ]
                ]
              ],
              "type": "Polygon"
            },
            "id": "0",
            "properties": {
              "a": 1,
              "b": "a"
            },
            "type": "Feature"
          },
          {
            "geometry": {
              "coordinates": [
                [
                  [
                    35.0,
                    53.0
                  ],
                  [
                    "..."
                  ],
                  [
                    35.0,
                    53.0
                  ]
                ]
              ],
              "type": "Polygon"
            },
            "id": "1",
            "properties": {
              "a": 2,
              "b": "b"
            },
            "type": "Feature"
          }
        ],
        "type": "FeatureCollection"
      },
      "end_times": [
        "2001-01-02T00:00:00",
        "2001-01-03T00:00:00"
      ],
      "id": "test_data_buffer",
      "start_times": [
        "2001-01-01T00:00:00",
        "2001-01-02T00:00:00"
      ]
    }
  ],
  "models": {},
  "proj": "EPSG:4326",
  "raster_collection_tiles": []
}