GNIP 90 New importer module #10474

Closed
2 of 5 tasks
giohappy opened this issue Dec 23, 2022 · 2 comments
Labels
4.1.x gnip A GeoNodeImprovementProcess Issue master

Comments

@giohappy
Contributor

giohappy commented Dec 23, 2022

GNIP 90 - New importer module

Overview

The upload workflow is complex and requires revision, updates, and improvements.
Currently, the critical points of the upload workflow are:

  • The steps are hardcoded and shared between all the resources
  • It's dependent on the GeoServer importer
  • It's not 100% async

With this proposal, we want to address the critical points of the current upload flow and improve the reliability and speed of the import by:

  • relying entirely on Celery and asynchronous execution
  • removing the dependency on GeoServer
  • encapsulating the upload flow for a single resource
  • letting GeoNode be responsible for the data source
  • keeping the supported resource types in a registry
  • relying on ogr2ogr for the data import into the datastore DB

Proposed By

@mattiagiupponi
@giohappy

Assigned to Release

This proposal is for GeoNode.

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

**NOTICE**: The GeoNode master demo is already configured with the new importer, which is activated for ESRI Shapefiles and GeoTIFFs. Feel free to test it!

Motivation

The upload workflow has evolved over the years, but it still needs revision: it relies on a set of predefined steps that are inflexible and difficult to extend or adapt for more advanced flows or different input formats (e.g. GPKG).
GeoNode cannot expose the imported data since the data is not managed by the app, and the integration with the GeoServer importer (which also runs asynchronously) can create inconsistencies in the resource state.

Proposal

The new importer is based on a few strong concepts:

  • orchestrator: the main object which is responsible for starting the import process and evaluating its progression
  • dynamic models: dynamic creation of the Django model for the resource (vector only)
  • handlers: objects responsible for handling the resource

Handlers

The handler is the core component of the new importer.
It represents an object that handles all the aspects of the resource imported in GeoNode.

Each handler has the responsibility to:

  • validate the input data
  • define the upload steps required for importing the resource
  • manage the lifecycle of the data behind a resource (e.g. copy, delete, etc.)
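
The three responsibilities above can be sketched as a small class hierarchy. This is only an illustration under stated assumptions: `BaseHandler`, `GeoJsonHandler`, `can_handle`, the `STEPS` tuple and the step names are hypothetical and are not taken from the geonode-importer code.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseHandler(ABC):
    """Hypothetical base class; names are illustrative, not the geonode-importer API."""

    # Ordered step names that an orchestrator would run for this resource type.
    STEPS: tuple = ()

    @abstractmethod
    def can_handle(self, path: Path) -> bool:
        """Return True if this handler recognises the input file."""

    @abstractmethod
    def validate(self, path: Path) -> None:
        """Raise ValueError if the input data is not acceptable."""

    def delete(self, resource_name: str) -> str:
        """Lifecycle hook; in this sketch it only reports what would be removed."""
        return f"would delete data for {resource_name}"


class GeoJsonHandler(BaseHandler):
    # Assumed step names, for illustration only.
    STEPS = ("validate", "import_data", "publish", "create_resource")

    def can_handle(self, path: Path) -> bool:
        return path.suffix.lower() in {".geojson", ".json"}

    def validate(self, path: Path) -> None:
        if not self.can_handle(path):
            raise ValueError(f"unsupported extension: {path.suffix}")
```

A format-specific handler overrides only what differs from the generic behavior, which is the design the proposal describes for Vector and Raster handlers.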

A few handlers are already available that encapsulate the shared operations for a generic resource type (Vector or Raster file). However, each handler can override this functionality by defining the specific behavior for an input format.

Additional documentation describing how to create a handler will be available soon in the geonode-importer repository.

Examples of handler code can be found here

Handlers registry

A new setting named IMPORTER_HANDLERS is added.
It registers the available handlers in the application scope.

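import os

# Dotted paths of the handler classes enabled for the new importer.
# Note: if the environment variable is set, os.getenv returns a string, not a list.
IMPORTER_HANDLERS = os.getenv('IMPORTER_HANDLERS', [
    'importer.handlers.geojson.handler.GeoJsonFileHandler',
    'importer.handlers.shapefile.handler.ShapeFileHandler'
])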

At the moment, if the new importer module is installed and configured, it operates alongside the old importer. The new importer is triggered only for input formats whose handlers are listed in IMPORTER_HANDLERS.
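
As a rough sketch of how such a registry could be consumed, the helpers below turn an environment-style value into a list of dotted paths and import the class each path points to. The function names `parse_handlers` and `load_class` are hypothetical, and the accepted override formats (a JSON list or a comma-separated string) are an assumption, not documented importer behavior.

```python
import importlib
import json


def parse_handlers(raw, default):
    """Return a list of dotted paths from an env-style value.

    Accepts a JSON list ('["a.B", "c.D"]') or a comma-separated string.
    Falls back to the default list when the value is empty or missing.
    """
    if not raw:
        return list(default)
    raw = raw.strip()
    if raw.startswith("["):
        return json.loads(raw)
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

For example, `load_class("collections.OrderedDict")` returns the standard-library class, standing in here for a real handler path. In a Django project, `django.utils.module_loading.import_string` does the same job as `load_class`.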

Vector handlers

The following vector formats are supported:

  • GeoPackage
  • GeoJson
  • Shapefiles
  • KML & KMZ

Raster handlers

The following raster format is supported:

  • GeoTiff

NB: Since the import no longer relies on GeoServer, but GeoServer must still have access to the raster data, GeoNode and GeoServer must share a long-term storage folder where the raster data is stored.

Non-spatial data

Thanks to the versatility of the handlers, the geonode-importer can also handle non-spatial data such as documents, PDF, CSV, JSON, etc.

Dynamic Models

One of GeoNode's limits is that it is not aware of the underlying data that is imported and cannot navigate it.
The dynamic models introduced with the geonode-importer remove this limit.
During the import process, the handler (for vector file types only) reads the file schema with ogr2ogr, defines a dynamic model based on that schema, and saves its configuration in the datastore database.
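
To make the idea concrete, the sketch below converts a file schema into a serialisable model description. It does not call GDAL or Django: the schema is passed in as a plain dictionary, and both the `OGR_TO_DJANGO` mapping and the `build_model_spec` helper are assumptions made for illustration, not the importer's actual implementation.

```python
# Assumed mapping from OGR field type names to Django model field class names.
OGR_TO_DJANGO = {
    "Integer": "IntegerField",
    "Integer64": "BigIntegerField",
    "Real": "FloatField",
    "String": "CharField",
    "Date": "DateField",
    "DateTime": "DateTimeField",
}


def build_model_spec(layer_name, ogr_fields):
    """Turn {'field name': 'OGR type name'} into a serialisable model description."""
    fields = []
    for name, ogr_type in ogr_fields.items():
        # Fall back to a text column for types this sketch does not know.
        field_class = OGR_TO_DJANGO.get(ogr_type, "TextField")
        fields.append({"name": name, "class": field_class})
    return {"model": layer_name, "fields": fields}
```

A description like this could be stored and later used to build the Django model class at runtime, which is what would let GeoNode query the imported table directly.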

Orchestrator

This object calls the next step defined in the handler. It is also the central point where the upload progress is updated and estimated.
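
A minimal sketch of this behavior, assuming each step is a plain callable that receives a shared context dictionary. The `Orchestrator` class below is hypothetical and runs the steps synchronously, whereas the proposal states that the real flow relies on Celery tasks.

```python
class Orchestrator:
    """Hypothetical orchestrator: runs a handler's steps in order and tracks progress."""

    def __init__(self, steps):
        # steps: list of (name, callable) pairs, as a handler might declare them.
        self.steps = list(steps)
        self.completed = []
        self.status = "pending"

    @property
    def progress(self):
        """Percentage of steps finished so far."""
        if not self.steps:
            return 100.0
        return 100.0 * len(self.completed) / len(self.steps)

    def run(self, context):
        """Call each step in order; stop and record the failure on the first error."""
        self.status = "running"
        for name, func in self.steps:
            try:
                func(context)
            except Exception as exc:
                self.status = f"failed at {name}: {exc}"
                return False
            self.completed.append(name)
        self.status = "finished"
        return True
```

Keeping progress and status in one object means the resource state is updated in a single place, regardless of which handler supplied the steps.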

Backwards Compatibility

The new workflow is shipped as an external third-party library and is not part of the core code at the moment.

Future evolution

Thanks to the dynamic models, GeoNode could expose a CRUD API on the data itself.
The current upload flow will progressively be migrated to the new one.

Feedback

Update this section with relevant feedback, if any.

Voting

Project Steering Committee:

  • Alessio Fabiani: 👍
  • Francesco Bartoli: 0
  • Giovanni Allegri: 👍
  • Toni Schoenbuchner: 👍
  • Florian Hoedt: 👍

Links

Remove unused links below.

@giohappy giohappy added gnip A GeoNodeImprovementProcess Issue master 4.1.x labels Dec 23, 2022
@gannebamm
Contributor

This could be used to implement parts of #8714
We will review the concept and implementation in the first weeks of January. Thanks for the work @giohappy @mattiagiupponi

@giohappy
Contributor Author

giohappy commented May 2, 2023

FYI the new importer has been moved under the GeoNode organization: https://github.com/GeoNode/geonode-importer
We're going to make it the default importer for GeoNode in the master branch in the next few days.

mattiagiupponi added a commit that referenced this issue May 2, 2023
mattiagiupponi added a commit that referenced this issue May 3, 2023
ridoo pushed a commit to Thuenen-GeoNode-Development/geonode that referenced this issue Jun 2, 2023
* [Fixes GeoNode#10474] enable geonode-importer by default

* [Fixes GeoNode#10474] enable geonode-importer by default

* [Fixes GeoNode#10474] fix importlayers

* [Fixes GeoNode#10474] fix importlayers

* [Fixes GeoNode#10474] fix importlayers

* [Fixes GeoNode#10474] fix importlayers

* [Fixes GeoNode#10474] fix importlayers

* [Fixes GeoNode#10474] fix tests

* [Fixes GeoNode#10474] fix tests

* [Fixes GeoNode#10474] fix tests

* [Fixes GeoNode#10474] fix tests

* [Fixes GeoNode#10474] fix tests

* add logger for publishing

* test fix

* test fix

* Test removed since is no longer needed

* fix requirements

* remove unwanted change in the settings file

---------

Co-authored-by: Alessio Fabiani <alessio.fabiani@geosolutionsgroup.com>
Co-authored-by: Giovanni Allegri <giohappy@gmail.com>