Skip to content

-----===== Airflow ROADMAP =====----- #417

@mistercrunch

Description

@mistercrunch

This github issue is meant to track the big roadmap items ahead of use and for the community to provide ideas, feedback, steer and comment

Lineage Annotations

Airflow knows a lot about your job dependencies, but how much does it know about your data objects (databases, tables, reports, ...)? Lineage annotations would expose a way for users to define lineage between data objects explicitly and tie that to tasks and/or DAG. The framework may include hooks for programatic inference (HiveOperator could introspect the HQL and guess table names and lineage), and a way to override or complement these. The framework will most likely be fairly agnostic about your data objects, letting you namespace them however you want, and simply consider them as an array of parent/child relationship between strings. It may be nice to use dot notation and reserve the first part of the expression to object_type, allowing for color coding in a graph view and tying actions (links) and the like. This will course ship with nice graph visualization, browsing and navigation features.

Picklin'

stateless/codeless web servers and workers

Backfilll UI

Trigger a backfill from the UI

REST API

Essentially offering features similar to what is available in the CLI through a REST API, there may be some automagic solutions here that can figure out how to build REST specs from argparse, it would also insure consistency between the two

[done!]

Continuous integration with Travis-CI

Systematically run all unit tests against CDH and HDP on Python 2.7 and 3.5

Externally Triggered DAGs

Airflow currently assumes that you run your workflows on a fixed schedule interval. This is perfect for hourly, daily and weekly jobs. When thinking about "analytics as a service" and "analysis automation", many use cases are more of the "on demand" nature. For instance if you have a workflow that processes someone's genome when ordered to, or a workflow that builds a dataset on demand for data scientists based on parameters they provide, .... This requires a new class of DAGs that are triggered externally, not on a fixed schedule interval.

Metadata

Metadata

Assignees

No one assigned

    Labels

    RoadmapThe issues /PR that are big changes on the roadmap

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions