GNIP: GeoNode signals/notification refactor. #2889

Closed
afabiani opened this Issue Feb 3, 2017 · 8 comments

afabiani commented Feb 3, 2017

Proposed by

Ariel Nunez (Terranodo)
Alessio Fabiani (GeoSolutions)

Assigned to release

None yet.

GeoNode signals/notification refactor.

1. Background

GeoNode has been increasingly removing expensive operations from the request/response cycle. Before GeoNode 1.1, it made network calls to GeoServer from the HTML templates of the index page. Later on, most of the code that communicates with GeoServer was moved to signals, which do not need to be written in the layers models file but can be referenced from external apps. However, up until now, whenever a file is uploaded or a request to access a layer is sent, the end user is kept waiting for network calls (gsconfig or SMTP) instead of having these processes happen out of band.

2. Proposal

In this GNIP, we propose to change the communication between apps to follow a topic-based approach, similar to how logging works: geonode.layers would send notifications to an exchange saying that a user uploaded a file, that someone wants access to a dataset, and so on. Consumers would then subscribe to topics and act on the messages they receive, allowing, for example, both a QGIS backend and a GeoServer backend to change their internal state (layer configuration) based on user actions in the GeoNode UI. Similarly, a layer edited via the GeoServer admin interface would be able to broadcast messages about that change, and a consumer on the GeoNode side could rebuild the thumbnail or update the bounding box without a full updatelayers run.
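To make the topic approach concrete, here is a minimal in-memory sketch of AMQP-style topic routing, in which `*` matches exactly one dot-separated word and `#` matches zero or more words. It only illustrates the matching semantics: the `TopicExchange` class and routing keys such as `geonode.layers.upload` are hypothetical, and in the proposal this role is played by a RabbitMQ exchange accessed through Kombu.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, dict], None]


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP-style topic pattern."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pat: List[str], key: List[str]) -> bool:
    if not pat:
        return not key
    head, rest = pat[0], pat[1:]
    if head == "#":
        # '#' may swallow zero or more words
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        # '*' swallows exactly one word; a literal must match exactly
        return _match(rest, key[1:])
    return False


class TopicExchange:
    """Toy exchange: producers publish by topic, consumers bind with patterns."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[Callback]] = defaultdict(list)

    def bind(self, pattern: str, callback: Callback) -> None:
        self._bindings[pattern].append(callback)

    def publish(self, routing_key: str, body: dict) -> int:
        """Deliver body to every matching consumer; return the delivery count."""
        delivered = 0
        for pattern, callbacks in self._bindings.items():
            if topic_matches(pattern, routing_key):
                for callback in callbacks:
                    callback(routing_key, body)
                    delivered += 1
        return delivered


exchange = TopicExchange()
exchange.bind("geonode.layers.*", lambda key, body: print("geoserver backend:", key))
exchange.bind("geonode.#", lambda key, body: print("audit log:", key))
exchange.publish("geonode.layers.upload", {"layer_id": 42})
# prints "geoserver backend: geonode.layers.upload" then "audit log: geonode.layers.upload"
```

Because the producer only names the topic, adding a new backend means adding a binding rather than changing the code in geonode.layers.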

2.1 Functional perspective

As stated above, the main focus of this work package is to allow GeoNode to:

  • Generate auditing messages when resources of interest are modified.
  • Perform actions as a consequence of auditing (sending emails, accessing the database, making network calls, ...) without blocking the main request/response cycle.

This is possible by implementing a mechanism which allows GeoNode to schedule actions which will run asynchronously, executing configurable operations.

The paradigm we are going to base our work on in order to implement such a mechanism is a producer/consumer message-passing method based on queues and deferred tasks. Message passing is a method that program components can use to communicate and exchange information. It can be implemented synchronously or asynchronously, and it lets discrete processes communicate without being directly coupled to each other. Message passing is often preferred over traditional databases for this type of usage because message queues often implement additional features, provide increased performance, and can reside completely in memory.
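The producer/consumer idea can be sketched in a single process with Python's standard `queue` and `threading` modules standing in for a real broker. The hypothetical `handle_upload_request` returns as soon as the message is enqueued, while a background worker performs the slow work (a stand-in for gsconfig or SMTP calls) out of band. In the proposal the queue would live in RabbitMQ and the worker would be a separate process.

```python
import queue
import threading
import time

messages = queue.Queue()
processed = []


def handle_upload_request(layer_name):
    """Producer: enqueue a notification and return immediately."""
    messages.put({"event": "layer_uploaded", "layer": layer_name})
    return "202 Accepted"


def slow_backend_call(msg):
    """Stand-in for a network call to GeoServer or an SMTP server."""
    time.sleep(0.05)
    processed.append(msg["layer"])


def worker():
    """Consumer: process messages until a None sentinel arrives."""
    while True:
        msg = messages.get()
        if msg is None:
            messages.task_done()
            break
        try:
            slow_backend_call(msg)
        finally:
            messages.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

start = time.perf_counter()
status = handle_upload_request("states")
# Typically far below the 0.05 s backend delay, because the request does not wait for it.
print(status, f"returned in {time.perf_counter() - start:.3f}s")

messages.join()     # demo only: wait for the worker to drain the queue
messages.put(None)  # stop the worker
consumer.join()
print(processed)    # ['states']
```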

A conceptual view of the infrastructure we are proposing is depicted in Fig. 1. Following the decoupled producer/consumer paradigm, we intend to extend the GeoNode infrastructure (which also comprises GeoServer) in order to serialize audit messages about modifications to internal resources and send them to an external message broker, which is responsible for the guaranteed asynchronous delivery of such messages to the registered consumers.

The consumers will be responsible for collecting and properly processing incoming messages for various purposes. This section of the proposal mainly focuses on delivering notifications to end users in order to inform them about audit events generated in the system, but additional consumers might be configured for different purposes: for instance, logging such messages to persistent storage for security reasons (being able to reconstruct who did what), or redirecting them into systems used to monitor the infrastructure. As a consequence, the consumers should be pluggable as well as finely configurable.

In terms of delivering notifications to end users, we envision the need to implement at least consumers able to:

  1. Deliver emails to registered users containing the audit message.
  2. Create a custom RSS feed filtered on the basis of the currently logged-in user.

It is worth pointing out that the message consumers devoted to sending notifications to users will need to:

  • Allow users to define specific filters when subscribing to notifications, to be applied to incoming messages so as to deliver the right amount of information to end users.
  • Allow defining the frequency of the notifications being sent (e.g. daily summaries, near real-time notifications).
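The two requirements above, per-user filters and a configurable delivery frequency, can be sketched as a subscription record that holds a filter predicate and a mode that either delivers matching messages immediately or batches them into a periodic digest. The `Subscription` and `NotificationDispatcher` names and the `realtime`/`daily` modes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subscription:
    user: str
    accepts: Callable[[dict], bool]  # user-defined filter on incoming messages
    frequency: str = "realtime"      # "realtime" or "daily"
    pending: List[dict] = field(default_factory=list)


class NotificationDispatcher:
    def __init__(self, send: Callable[[str, List[dict]], None]) -> None:
        self.send = send             # e.g. an email or RSS backend
        self.subscriptions: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def on_message(self, msg: dict) -> None:
        for sub in self.subscriptions:
            if not sub.accepts(msg):
                continue             # filtered out for this user
            if sub.frequency == "realtime":
                self.send(sub.user, [msg])
            else:
                sub.pending.append(msg)  # held for the next digest

    def flush_digests(self) -> None:
        """Meant to be called by a scheduler, e.g. once a day."""
        for sub in self.subscriptions:
            if sub.frequency != "realtime" and sub.pending:
                self.send(sub.user, sub.pending)
                sub.pending = []


outbox: Dict[str, List[List[dict]]] = {}
dispatcher = NotificationDispatcher(lambda user, msgs: outbox.setdefault(user, []).append(msgs))
dispatcher.subscribe(Subscription("alice", lambda m: m["layer"] == "states"))
dispatcher.subscribe(Subscription("bob", lambda m: True, frequency="daily"))

dispatcher.on_message({"type": "PostUpdateEvent", "layer": "states"})
dispatcher.on_message({"type": "CatalogAddEvent", "layer": "roads"})
dispatcher.flush_digests()
# alice received one immediate notification; bob received one digest with both messages
```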

Aside from the two notification mechanisms mentioned above, others might be implemented (e.g. social media integration), since we intend to create a pluggable and extensible API for the message consumers.
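One way to make consumers pluggable and configurable is a registry in which each consumer class registers itself under a name, and a settings value selects which consumers are active and how each one is configured. This is a sketch only: the `register_consumer` decorator, the consumer names and the `ENABLED_CONSUMERS` setting are hypothetical, not existing GeoNode APIs.

```python
from typing import Dict, List

CONSUMER_REGISTRY: Dict[str, type] = {}


def register_consumer(name: str):
    """Class decorator that makes a consumer available under `name`."""
    def decorator(cls):
        CONSUMER_REGISTRY[name] = cls
        return cls
    return decorator


class BaseConsumer:
    def __init__(self, **options):
        self.options = options  # per-consumer configuration

    def handle(self, message: dict) -> None:
        raise NotImplementedError


@register_consumer("email")
class EmailConsumer(BaseConsumer):
    def handle(self, message):
        print(f"email to {self.options.get('recipient')}: {message['type']}")


@register_consumer("audit_log")
class AuditLogConsumer(BaseConsumer):
    def __init__(self, **options):
        super().__init__(**options)
        self.records: List[dict] = []

    def handle(self, message):
        self.records.append(message)  # stand-in for persistent "who did what" storage


def load_consumers(enabled: Dict[str, dict]) -> List[BaseConsumer]:
    """Instantiate only the consumers listed in the settings, with their options."""
    return [CONSUMER_REGISTRY[name](**opts) for name, opts in enabled.items()]


# Hypothetical settings entry: which consumers run, and how each is configured.
ENABLED_CONSUMERS = {"audit_log": {}, "email": {"recipient": "admin@example.com"}}

for consumer in load_consumers(ENABLED_CONSUMERS):
    consumer.handle({"type": "CatalogAddEvent", "user": "admin"})
# prints "email to admin@example.com: CatalogAddEvent"
```

New mechanisms such as social media integration would then only need a new registered class and a settings entry.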

Fig. 1: Conceptual view of the proposed producer/consumer infrastructure.

2.2 About celery tasks

Initial work has been done on GeoNode to use Celery to move tasks off the main thread, but a more comprehensive approach is needed: auditing the codebase for these operations and moving them to external tasks. However, Celery tasks have a drawback: they imply that the code that triggers the signal, or the code that sends the notification, needs to know what to do about it, i.e. which task to defer. As GeoNode gains more backends and monitoring frameworks, a model that lets consumers decide what to do with the notifications is a better architectural choice than the tight coupling Celery implies.

2.3 GeoServer as a notifications producer

While the layer metadata is stored and managed by GeoNode, the geospatial resource lives in GeoServer. A geospatial resource can be imported through the GeoNode interface, but it will be served and managed by GeoServer together with some ancillary information such as styling. As such, if we want to be able to send users notifications about geospatial resources, e.g. when the style of a certain layer has been changed or when a certain vector layer has been edited, we must also wire GeoServer and allow it to send notifications. We should also mention that administrative actions are sometimes taken directly on GeoServer (legitimately or not), which is something we should be able to track and notify administrative users about.

Fortunately, GeoServer is a mature application which was built with this use case in mind. It provides a plethora of extension points and listeners that can be implemented in order to collect information about the actions being taken either through the GUI or through the REST interface. In the context of this proposal we intend to:

  • Implement proper transaction listeners to launch notifications whenever someone performs a WFS-T transaction on a certain layer.
  • Implement proper catalog listeners to launch notifications whenever someone makes changes to the internal GeoServer configuration.

Currently GeoNode has no way to capture information about updates performed on a vector layer (feature editing) or on a GeoServer resource (style changes, geospatial layer and publishing settings, deletion, …); therefore a user can only be notified about changes made to the GeoNode representation of a resource, not about changes to the actual resource. In the current core development we also envisage an improvement of the GeoNode notification mechanism that allows a user to also register for resource updates made through GeoServer.

The proposal is to provide GeoServer with a transaction and catalog change listener plugin, as part of the core development, that can be used to asynchronously post information about such events to GeoNode (Django), which will submit them as Celery tasks into RabbitMQ to be handled asynchronously.

3. Technical Details

The implementation will rely on Kombu + RabbitMQ and implement a new Django management command to run the message listener and perform the backend tasks.

3.1 GeoNode producers

The signals will be updated so that, instead of performing work, they only send notifications to the message queue, keeping the request/response cycle as fast and simple as possible.

The code will be updated in the following locations:

App            Trigger     Notes
accounts       emails      emails: invite, admin approve, etc.
notifications  all         all notifications trigger an email depending on user preferences
layers         signals     upload uses GeoServer and notification services
layers         signals     delete uses GeoServer and notification services
maps           creation    uses GeoServer and notification services
documents      creation    uses notification services
geoserver      updates     GeoNode should subscribe to these events

3.2 GeoNode consumers

The proposed implementation for the consumers is a set of listeners running in an out-of-band management command, similar to the Celery daemon, which receive the messages and pass them on to the appropriate function.

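    import logging

    from kombu.mixins import ConsumerMixin

    logger = logging.getLogger(__name__)

    class GeoNodeConsumer(ConsumerMixin):
        # Illustrative class name. queue_layer_viewers, layer_view_counter and
        # send_email_owner_on_view are assumed to be defined in GeoNode's own modules.

        def __init__(self, connection):
            self.connection = connection  # kombu Connection, required by ConsumerMixin

        def on_layer_viewer(self, body, message):
            logger.info("on_layer_viewer: RECEIVED MSG - body: %r", body)
            viewer = body.get("viewer")
            owner_layer = body.get("owner_layer")
            layer_id = body.get("layer_id")
            layer_view_counter(layer_id)
            send_email_owner_on_view(owner_layer, viewer, layer_id)
            # Acknowledge only after the work is done, so the broker can redeliver on failure.
            message.ack()
            logger.info("on_layer_viewer: finished")

        def get_consumers(self, Consumer, channel):
            return [
                Consumer(queue_layer_viewers,
                         callbacks=[self.on_layer_viewer]),
            ]
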
3.3 GeoServer producers

A pluggable and configurable GeoServer extension will be provided, allowing GeoServer to monitor events thrown by resource and catalog modifications and to publish them on external queues and/or message brokers.

The extension will be flexible enough to allow users to define both custom writers, responsible for creating messages in a specific format through the use of templates, and custom publishers, responsible for publishing messages to specific endpoints, which could be loggers as well as message brokers, databases or other systems.

The figure below depicts the architectural details of the GeoServer extension.
Fig.: Architecture of the GeoServer notification extension.

GeoNode Specific Writers and Publishers
Specifically for GeoNode, both a custom writer and a custom publisher will be implemented.

GeoServerNotificationGeoNodeMessageWriter
Notification messages will be read by GeoNode through the Kombu Python messaging library.
Since version 3.0, Kombu only accepts JSON, binary or text messages by default; the messages are therefore encoded as JSON.

Example messages generated by GeoServer:

Adding a new vector feature type to the Catalog

{
    "id":"123e4567-e89b-12d3-a456-426655440000",
    "type":"CatalogAddEvent",
    "generator":"<GEOSERVER_ID>",
    "timestamp": 573828946728,
    "user": "admin",
    "originator": "<IP_OR_HOST>",
    "source": {
        "id":"FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
        "resource":"FeatureTypeInfo",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "title":"USA Population",
        "abstract":"This is some census data on the states."
    }
}

NOTE: the “generator” property contains the ID of the GeoServer instance. GeoServer also makes it possible to uniquely identify a single instance, even one that is part of a cluster, through a specific extension that lets users define its identifier. The message also contains the host name or IP address of the GeoServer host in the “originator” property.

Adding a new vector layer to the Catalog

{
    "id":"123e4567-e89b-12d3-a456-426655440001",
    "type":"CatalogAddEvent",
    "generator":"GeoServer",
    "timestamp": 573828946729,
    "user": "admin",
    "originator": "localhost",
    "source": {
        "id":"LayerInfoImpl--570ae188:124761b8d78:-7fc0",
        "resource":"LayerInfo",
        "type": "VECTOR",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "path":"/",
        "defaultStyle":"polygon",
        "styles": [
            {"style": "line"},
            {"style": "point"}
        ]
    }
}

Adding new features to a Resource

{
    "id":"123e4567-e89b-12d3-a456-426655440006",
    "type":"PostUpdateEvent",
    "generator":"GeoServer",
    "timestamp": 573828946756,
    "user": "admin",
    "originator": "localhost",
    "source": {
        "id":"FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
        "name":"states",
        "nativeName":"states",
        "namespace":"topp",
        "title":"USA Population",
        "abstract":"This is some census data on the states."
    },
    "totalInserted": 56
}
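On the GeoNode side, a consumer would decode such messages and dispatch on the `type` field. The sketch below uses only the standard `json` module and assumes the message format shown above (with quoted `id` values); the handler functions and the strings they return are hypothetical placeholders for real work such as registering a new layer, updating the bounding box or regenerating a thumbnail.

```python
import json
from typing import Callable, Dict


def on_catalog_add(msg: dict) -> str:
    src = msg["source"]
    return f"register {src['resource']} {src['namespace']}:{src['name']}"


def on_post_update(msg: dict) -> str:
    src = msg["source"]
    return f"refresh {src['namespace']}:{src['name']} ({msg.get('totalInserted', 0)} inserted)"


# Map GeoServer event types to GeoNode-side handlers.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "CatalogAddEvent": on_catalog_add,
    "PostUpdateEvent": on_post_update,
}


def handle_geoserver_message(raw: str) -> str:
    msg = json.loads(raw)
    handler = HANDLERS.get(msg["type"])
    if handler is None:
        # "generator" identifies the GeoServer instance, "originator" its host
        return f"ignored {msg['type']} from {msg.get('generator')}@{msg.get('originator')}"
    return handler(msg)


raw = """{
    "id": "123e4567-e89b-12d3-a456-426655440006",
    "type": "PostUpdateEvent",
    "generator": "GeoServer",
    "timestamp": 573828946756,
    "user": "admin",
    "originator": "localhost",
    "source": {"id": "FeatureTypeInfoImpl--570ae188:124761b8d78:-7fc1",
               "name": "states", "nativeName": "states", "namespace": "topp"},
    "totalInserted": 56
}"""
print(handle_geoserver_message(raw))  # refresh topp:states (56 inserted)
```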
3.4 GeoServerNotificationRabbitMQPublisher

A RabbitMQ publisher is an extension able to connect to the RabbitMQ message broker and publish messages to it using the RabbitMQ client APIs.

It is configurable through external properties files and will be aware of:

  • How to connect to the RabbitMQ server.
  • The content type of the messages to serialize.
  • Channel details, such as the name and the type (e.g. “direct”, “fanout”, …).
  • Timeouts and retries.
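The “timeouts and retries” setting can be illustrated by a generic retry loop with exponential backoff around a publish call. The extension itself is Java code inside GeoServer; this Python sketch only shows the kind of policy such a setting could control, and `publish_with_retries`, its parameters and the `flaky_send` stand-in are hypothetical.

```python
import time
from typing import Callable


def publish_with_retries(send: Callable[[bytes], None], payload: bytes,
                         retries: int = 3, base_delay: float = 0.1,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send(payload) up to 1 + retries times; return the number of attempts used."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            send(payload)
            return attempt + 1
        except ConnectionError:
            if attempt == retries:
                raise  # give up and let the caller log or dead-letter the message
            sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
            attempt += 1


calls = []


def flaky_send(payload: bytes) -> None:
    """Fails twice, then succeeds (simulates a briefly unavailable broker)."""
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("broker unavailable")


delays = []
attempts = publish_with_retries(flaky_send, b'{"type": "CatalogAddEvent"}', sleep=delays.append)
print(attempts, delays)  # 3 [0.1, 0.2]
```

Injecting `sleep` keeps the policy testable without real waiting.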

4. Potential problems

  • Race conditions with multiple consumers doing potentially conflicting operations
  • The same message being sent multiple times
  • Some of the consumers missing messages
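For the second problem (the same message delivered more than once), a common mitigation is an idempotent consumer that records the `id` of each processed message and skips repeats. This sketch keeps the seen IDs in memory; with several consumer processes the check-then-record step would need shared, persistent storage with an atomic uniqueness check (for example a database table with a unique constraint) to avoid the race conditions listed above. The class and method names are hypothetical.

```python
from typing import Callable, Set


class IdempotentConsumer:
    def __init__(self, handler: Callable[[dict], None]) -> None:
        self.handler = handler
        self.seen: Set[str] = set()

    def on_message(self, body: dict) -> bool:
        """Run the handler once per message id; return False for duplicates."""
        msg_id = body["id"]
        if msg_id in self.seen:
            return False
        self.handler(body)
        self.seen.add(msg_id)  # record only after success, so a failed message can be retried
        return True


thumbnails = []
consumer = IdempotentConsumer(lambda body: thumbnails.append(body["source"]["name"]))
event = {"id": "123e4567-e89b-12d3-a456-426655440001", "source": {"name": "states"}}
print(consumer.on_message(event), consumer.on_message(event))  # True False
print(thumbnails)  # ['states']
```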
simod commented Feb 3, 2017

Alessio, very interesting proposal. Designed for scaling, thanks. Looking forward to it.

ingenieroariel commented Feb 6, 2017

If there are no initial objections we will start work and update the GNIP as coding progresses.

waybarrios commented Feb 7, 2017

I have identified the following signals and methods in GeoNode to which we need to apply this implementation:

(5 screenshots of the relevant signals and methods)

If you have any feedback or suggestions, please let us know.

afabiani commented Feb 13, 2017

+1

francbartoli commented Feb 16, 2017

@waybarrios very useful change, but I would expect much more extensive testing in this PR.

ingenieroariel self-assigned this Feb 20, 2017

davisc commented Mar 14, 2017

@afabiani and @ingenieroariel nice work. We've been working on a kafka-geoserver plugin which would be an interesting test case. https://github.com/boundlessgeo/kafka-geoserver-plugin

ingenieroariel commented Mar 14, 2017

We also explored Kafka in our initial analysis but found it may be harder for downstream projects to adopt, and preferred to just use the existing RabbitMQ. Very interested to hear more about your experience with Kafka, as I think it is a technically superior alternative (message durability, scalability, etc.). How hard was it to update the GeoNode installers to use Kafka?

afabiani commented Mar 14, 2017

capooti added the celery label May 17, 2017

cezio added a commit to geosolutions-it/geonode that referenced this issue Sep 25, 2017

cezio added a commit to geosolutions-it/geonode that referenced this issue Sep 25, 2017

cezio added a commit to geosolutions-it/geonode that referenced this issue Sep 25, 2017

afabiani added a commit that referenced this issue Sep 25, 2017

Merge pull request #3307 from geosolutions-it/notifications_improved
PR for GNIP: GeoNode signals/notification refactor. #2889

afabiani closed this Dec 7, 2017
