Social Media Disaster Risk Management, SMDRM for short, is a Python-based data pipeline application that processes social media data points.
The goal of SMDRM is to provide you with an enriched version of your input data that you can further analyse and visualize through a powerful dashboard.
The SMDRM application is based on Docker Compose. A running Docker daemon and the docker-compose software are required.
The current configuration is intended to run on a single machine. Ensure your machine meets the minimum requirements:
- 8 CPUs
- 12 GB free memory
- 10 GB free disk storage
- Access to public docker registry
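The CPU and disk requirements above can be checked before building. The following is a minimal sketch, not part of SMDRM: the names `find_problems` and `check_host` are hypothetical, a GB is assumed to be 1024**3 bytes, and free memory and registry access are not checked because the Python standard library has no portable call for them.

```python
import os
import shutil

# Thresholds copied from the requirements list above.
MIN_CPUS = 8
MIN_FREE_DISK_GB = 10


def find_problems(cpus, free_disk_gb):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"found {cpus} CPUs, need at least {MIN_CPUS}")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"found {free_disk_gb:.1f} GB free disk, need at least {MIN_FREE_DISK_GB}"
        )
    return problems


def check_host(path="."):
    """Measure the current machine and compare it against the thresholds."""
    cpus = os.cpu_count() or 0  # os.cpu_count() may return None
    # Assumption: 1 GB = 1024**3 bytes.
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    return find_problems(cpus, free_disk_gb)


if __name__ == "__main__":
    for problem in check_host():
        print("WARNING:", problem)
```

Running the script from the project root directory prints one warning per unmet requirement and nothing if both checks pass.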
If you have multiple machines and intend to use this solution in a production environment, we recommend setting up an orchestrated solution that runs on several machines.
In that case, Docker Swarm may be the easiest option, as it is configurable via docker-compose.yaml files.
‼️ Execute all bash commands from the project root directory
Build the application components
☕ Building the app for the first time can take several minutes to complete
docker-compose --profile pipelines build
Start the application
☕ Although the command exits successfully, the app still takes several minutes to be up and running.
docker-compose up
‼️ Ensure your input data has the expected format and does not exceed 64 MB after compression. For more details, read the Input Data section.
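The size limit can be verified locally before uploading. This is a minimal sketch under stated assumptions: `validate_upload` is a hypothetical helper, not an SMDRM function, and "64 MB" is interpreted as 64 * 1024 * 1024 bytes. It does not validate the format of the data inside the archive, which is described in the Input Data section.

```python
import os
import zipfile

# Assumption: the 64 MB limit means 64 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 64 * 1024 * 1024


def validate_upload(path):
    """Return the archive size in bytes, or raise ValueError if it cannot be uploaded."""
    # is_zipfile() returns False for missing files and for non-zip content.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a readable zip archive")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes, the limit is {MAX_UPLOAD_BYTES}")
    return size
```

Calling `validate_upload("my_data.zip")` (a placeholder file name) either returns the compressed size or explains why the file would be rejected by this check.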
The application waits for you to upload your zipfile data and start an Airflow workflow.
Go to the SMDRM API Swagger UI and select the uploads/upload endpoint to upload a zipfile.
‼️ Leave the default values for the dag_id and collection_id fields.
A response with {"status": "queued"} in the payload indicates that Airflow has received the request and has triggered a workflow to process your input zipfile.
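If you call the upload endpoint from a script rather than the Swagger UI, the same check can be automated. The sketch below only assumes what is stated above, namely that the response body is JSON containing a "status" field; `is_queued` is a hypothetical helper name, and any other fields in the payload are ignored.

```python
import json


def is_queued(response_body):
    """Return True if the JSON response body reports that the workflow was queued."""
    payload = json.loads(response_body)
    # Only the "status" field is inspected; other fields are ignored.
    return isinstance(payload, dict) and payload.get("status") == "queued"
```

Any other status value, or a body without a "status" field, makes the function return False.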
You can check the progress of the workflow: go to the Airflow UI and click on the Twitter DAG.
Once the workflow is successful, you can use the Kibana Dashboard to interactively visualize your enriched data.
For further documentation, check the docs directory.