
Text-based Named Entity Recognition data pipeline to annotate and geolocate disaster-related data points.


Social Media Disaster Risk Management


Social Media Disaster Risk Management (SMDRM) is a Python-based data pipeline application for processing social media data points.

The goal of SMDRM is to provide you with an enriched version of your input data that you can further analyse and visualize through a powerful dashboard.

Installation and Usage


Requirements

The SMDRM application is based on Docker Compose. A running Docker daemon and the docker-compose software are required.

The current configuration is intended to run on a single machine. Ensure your machine meets the minimum requirements (a quick check is sketched after this list):

  • 8 CPUs
  • 12 GB free memory
  • 10 GB free disk storage
  • Access to public docker registry
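
A minimal sketch to verify these requirements on a Linux host, assuming standard GNU/Linux utilities and the Docker CLI are installed:

nproc                    # number of CPUs (should be 8 or more)
free -h                  # available memory (at least 12 GB free)
df -h .                  # free disk storage (at least 10 GB)
docker info > /dev/null && echo "Docker daemon is running"
docker-compose --version
docker pull hello-world  # confirms access to the public docker registry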

If you have multiple machines and you intend to use this solution in a production environment, we recommend setting up an orchestrated solution that runs on several machines.

In that case, Docker Swarm may be the easiest way, as it is configurable via docker-compose.yaml files.
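
For illustration only, a sketch of deploying the stack to a single-node Swarm; the stack name smdrm is a hypothetical choice. Note that docker stack deploy does not build images, so run the Build step first:

docker swarm init
docker stack deploy -c docker-compose.yaml smdrm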

‼️ Execute all bash commands from the project root directory

Build

Build the application components

☕ Building the app for the first time can take several minutes to complete

docker-compose --profile pipelines build

Run

Start the application

☕ Although the command exits successfully, the app still takes several minutes to be up and running.

docker-compose up
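
While you wait, a couple of standard docker-compose commands can show progress; run them from a second terminal:

docker-compose ps        # list services and their current state
docker-compose logs -f   # follow the logs until the services report ready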

Usage

‼️ Ensure your input data has the expected format and does not exceed 64 MB after compression. For more details, read the Input Data section.

The application waits for you to upload your zipfile data and start an Airflow workflow.
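
For example, a minimal sketch of packaging the input; datapoints.ndjson is a hypothetical file name, see the Input Data section for the expected format:

zip mydata.zip datapoints.ndjson
ls -lh mydata.zip        # verify the archive stays under 64 MB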

Go to the SMDRM API Swagger UI and select the uploads/upload endpoint to upload a zipfile.

‼️ Leave the default values for the dag_id and collection_id fields.

A response with {"status": "queued"} in the payload indicates that Airflow has received the request and triggered a workflow to process your input zipfile.
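
The upload can also be scripted. The sketch below assumes the API is reachable at localhost:7000 and that the form field is named zipfile; both are assumptions, so verify the actual host, port, and field names in the Swagger UI:

# hypothetical host, port, and field name; check the Swagger UI
curl -X POST "http://localhost:7000/uploads/upload" -F "zipfile=@mydata.zip"
# expected payload on success: {"status": "queued"}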

You can check the progress of the workflow in the Airflow UI: click on the Twitter DAG.
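
Alternatively, a sketch using the Airflow CLI from inside the container; the service name airflow and the DAG id twitter are assumptions, check docker-compose.yaml and the Airflow UI for the actual values:

docker-compose exec airflow airflow dags list-runs -d twitter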

Once the workflow is successful, you can use the Kibana Dashboard to interactively visualize your enriched data.

Extras

For further documentation resources, check the docs directory.

Credits

Licence

European Union Public Licence (EUPL) V1.2