DiscreETLy is an add-on dashboard service on top of Apache Airflow. It is a user-friendly UI showing the status of particular DAGs. Moreover, it allows users to map tasks within a particular DAG to tables available in any system (relational or non-relational) via a friendly YAML definition. DiscreETLy provides functionality for monitoring DAG status as well as optional integration with services such as Prometheus or InfluxDB.
The minimal setup to run the dashboard requires Docker. You can find installation instructions on the official Docker website.
The minimal setup also requires access to the Airflow MySQL instance (MySQL version should be >= 8 and support analytical functions).
Before running or deploying DiscreETLy, a configuration file needs to be provided. A template can be found in settings.py.template. Configuration is provided as a standard Python file, which makes it easy to define and change using Python APIs. The bare minimum configuration needed for the app to run is a secret key (a stub is provided in the template) and connection details for the Airflow database (currently only MySQL is supported).
Configuration options for InfluxDB and Prometheus are optional. If those services are not defined in configuration file they will be simply ignored while running the app.
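As an illustration, a minimal settings.py might look like the following sketch. The option names (SECRET_KEY, AIRFLOW_DB, INFLUXDB, PROMETHEUS) and connection fields are assumptions for illustration only; check settings.py.template for the actual names.

```python
# Hypothetical minimal settings.py -- option names are illustrative
# and should be verified against settings.py.template.

SECRET_KEY = "change-me"  # stub; replace with a real secret

# Connection details for the Airflow metadata database (MySQL >= 8).
AIRFLOW_DB = {
    "host": "airflow-mysql.example.com",
    "port": 3306,
    "user": "airflow",
    "password": "airflow",
    "database": "airflow",
}

# Optional services; if left undefined, they are simply ignored.
# INFLUXDB = {"host": "influxdb.example.com", "port": 8086}
# PROMETHEUS = {"url": "http://prometheus.example.com:9090"}
```

Since the file is plain Python, values can also be computed, e.g. read from `os.environ`, instead of hardcoded.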
If no environment is specified, the application runs in DEBUG mode, so any errors will be reported on the dashboard UI. If the environment variable ENV_VAR_PREFIX is set to PROD, or the appropriate option is changed in the settings.py file, the application will serve 500 errors as defined in the dashboard template.
Configuring dashboard views
The basic configuration file is enough to run the dashboard; however, in order to take full advantage of the dashboard's features and functionality, some additional steps need to be performed.
The dashboard allows users to monitor the progress of particular Airflow DAGs and tasks; moreover, it can also show the status of tasks in relation to the tables they populate with data.
However, Airflow does not contain the mapping between tasks and tables (Hive, database), which is why a mapping needs to be provided to discreETLy. A mapping can be defined in the tables.yaml file available in the config folder. Each mapping consists of a few pieces of information:
- `name`: the name of the table in a database
- `db`: a database definition (can be a namespace)
- `uses`: a table that provides data for the task populating the currently described table (helps if there are dependencies between tables)
- `dag_id`: id of the DAG that contains the task required for the mapping
- `task_id`: id of the task that populates the table
The application will automatically ingest the definition of the tables and map them to particular tasks.
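Using the fields above, a tables.yaml entry might look like this; the table, database, DAG and task names below are made up for illustration:

```yaml
# Illustrative mapping -- table, database, DAG and task names are examples.
- name: fact_pageviews
  db: analytics
  uses: raw_pageviews
  dag_id: pageviews_daily
  task_id: populate_fact_pageviews
```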
The dashboard also allows monitoring sets of tables that constitute a report maintained by the DE team or stakeholders.
The definition of the set of tables and general report metadata can be provided in the reports.yaml file available in the config folder.
Please refer to report.yaml.template to learn more about the particular options that need to be provided.
Not all ETLs are defined through Airflow. The data engineering field is rich and complex and requires a plethora of tools to ensure high quality of operations. Should there be any ETLs defined outside of Airflow that need to be maintained and monitored, they can be defined in the extra_etl.yaml file available in the config folder. Because it is impossible to predict the availability of metadata for those definitions, separate links need to be provided to point to external monitoring systems.
In order to display information related to any external ETLs, some additional steps are required:
- setting the `EXTRA_ACTIVE` option in the configuration to true,
- providing a custom Jinja template in dashboard/blueprints/extra/templates (an example of such a template is available here),
- providing custom logic for data processing in extra_etl to process and enrich the data.
The table descriptions tab displays table and column descriptions (comments). This can help stakeholders better understand your data structure and search for particular information. The default implementation reads them from AWS Glue, which stores comments added during table creation. Since not every table has comments, this tab is fully optional.
It is possible to set a custom data provider that reads table descriptions from a source other than AWS Glue. This is controlled by the TABLE_DESCRIPTION_SERVICE setting.
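The exact interface expected by the TABLE_DESCRIPTION_SERVICE setting is not documented here; as a rough sketch under that caveat, a custom provider could look like the following, assuming it only needs to return table and column comments (check the default AWS Glue implementation in the source for the real contract):

```python
# Hypothetical custom table-description provider -- the actual interface
# expected by TABLE_DESCRIPTION_SERVICE may differ; this is a sketch only.

class StaticTableDescriptionService:
    """Serves table and column descriptions from an in-memory dict
    instead of AWS Glue."""

    def __init__(self, descriptions):
        # descriptions: {table_name: {"comment": str,
        #                             "columns": {column_name: comment}}}
        self._descriptions = descriptions

    def get_table_description(self, table_name):
        # Return the table-level comment, or an empty string if unknown.
        return self._descriptions.get(table_name, {}).get("comment", "")

    def get_column_descriptions(self, table_name):
        # Return a {column: comment} dict, empty if the table is unknown.
        return self._descriptions.get(table_name, {}).get("columns", {})


# Usage sketch with made-up table metadata:
service = StaticTableDescriptionService({
    "fact_pageviews": {
        "comment": "Daily aggregated pageviews",
        "columns": {"wiki_id": "Identifier of the wiki"},
    }
})
```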
Before running the container with the app, we first need to build it so it becomes available in the local Docker repository. Run the following command from the project's root directory:
docker build -t <image_name>:<image_version> .
Once the image is built, the application can be started by running:
docker run -e <env_name>=<env_value> --rm --name <container_name> -v <project_root_folder>:/app -p 8000:8000 <docker_image_name>:<image_version>
Let's dissect this command option by option:
- the `-e` flag sets the environment variables required to, e.g., configure the app. Most of these options can be hardcoded in the configuration file; however, passing them through the environment is recommended. For more details see the configuration section of this README.
- `--rm` removes the container after it is stopped. This ensures that there is always a fresh version of the configuration and other features when running the app.
- `-v` maps the folder containing the application from the local environment into the container. In development mode this ensures that changes applied to files on the local file system are immediately reflected in the container.
- `-p` maps a port from the container to the host.
If some of the configuration options are already available through the settings.py file, the command for running the application can be significantly abbreviated (from the project root folder):
docker run --rm -v $(pwd):/app -p 8000:8000 fandom/discreetly:latest
Remember to use the Docker image name and version provided during the build step.
Once the container is up and running, navigate to localhost:8000 in a browser and enjoy.
In order to run the tests, a Docker image needs to be built first. The Dockerfile is available in the dashboard/tests/ folder. To build the image, run the following command from the project's root directory:
docker build -t dashboard:tests -f dashboard/tests/Dockerfile .
Once the image is built, the tests can be performed by running:
docker run --rm dashboard:tests
The output of this command shows nicely formatted information about the number of tests performed and the success ratio (all tests are performed using
When working iteratively, rebuilding the image every time changes are made would be cumbersome. To avoid that, one can pass an additional parameter to subsequent runs (mapping a local project folder to the container destination):
docker run --rm -v <absolute_path_to_project_root_directory>:/tmp/dashboard/ dashboard:tests