DACRepair/airflow-repoman

Airflow DAG Repo Manager

A plugin for Apache Airflow.

## Setup:

At the moment, I have not set up binary builds, so installing it with pip requires a "git+<url>" entry.

### Prerequisites:

apache-airflow
click
cryptography
gitpython

# pip install apache-airflow click cryptography gitpython

NOTE: Airflow must be installed and configured prior to plugin install.

### Installation:

NOTE: This will also install any needed prerequisites.

pip install git+https://github.com/DACRepair/airflow-repoman.git

Once this has completed, verify that your dags_folder setting points to where you want it. Then open a command line and run airflow-repoman init. If the command is not found, verify that the install placed a binary in your bin folder.
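
One way to check the configured DAG folder before running the init step is to read it straight from the Airflow config file. The sketch below is only an illustration, not part of the plugin: `read_dags_folder` is a hypothetical helper name, and it assumes the setting lives under `[core]` as `dags_folder` in an `airflow.cfg`-style file. It does not account for an `AIRFLOW__CORE__DAGS_FOLDER` environment variable, which Airflow would give precedence.

```python
import configparser
from pathlib import Path


def read_dags_folder(cfg_path):
    """Return the [core] dags_folder value from an airflow.cfg-style file.

    Hypothetical helper for illustration; the plugin itself may resolve
    the folder differently.
    """
    # interpolation=None: config values may contain literal '%' characters,
    # which the default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    with open(cfg_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return Path(parser.get("core", "dags_folder")).expanduser()
```

Calling `read_dags_folder` on the `airflow.cfg` inside your Airflow home directory should return the folder that managed repos would be placed under.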

Usage:

### Airflow Webserver

To access the DAG Repos, simply navigate to Admin -> DAG Repos (this applies to both flask_admin and F.A.B).

Once you are on the list page, press + or Create to begin generating a record.

| Field | Description |
| --- | --- |
| Repo Name | The name of the repo (used to name the folder and for organizational use). |
| Repo Enabled | Enable or disable sync. |
| Repo URL | The URL of the git repo. |
| Repo Branch | The branch you would like to pull your DAGs from. |
| Repo Username | The username used to access the git repo. |
| Repo Password | The password used to access the git repo. (This field is encrypted in the database, much like Connection passwords.) |
| Refresh Interval | The rate at which the app checks for changes (in seconds). |
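
To make the roles of Repo Enabled and Refresh Interval concrete, here is a minimal sketch of how such a record could gate a sync. Everything in it is an assumption for illustration: the `DagRepo` class, its attribute names, the default values, and the `last_synced` bookkeeping field are hypothetical and do not reflect the plugin's actual database model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DagRepo:
    """Hypothetical mirror of the form fields; not the plugin's real model."""

    name: str
    url: str
    branch: str = "master"       # assumed default, for illustration only
    enabled: bool = True
    username: Optional[str] = None
    password: Optional[str] = None
    refresh_interval: int = 300  # seconds; assumed default
    last_synced: Optional[float] = None  # epoch seconds; hypothetical field

    def is_due(self, now: float) -> bool:
        """Return True if an enabled repo has waited out its interval."""
        if not self.enabled:
            return False  # disabled repos are never synced automatically
        if self.last_synced is None:
            return True   # never synced yet
        return now - self.last_synced >= self.refresh_interval
```

With these assumed defaults, a repo last synced 100 seconds ago is not due yet, while one last synced 300 or more seconds ago is.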

Note about SSH repos:

Officially, this plugin does not support SSH repos, but that does not mean they do not work. You will need password-less login to the git server, or the proper keys set up in the OS. Make sure the "Repo Username" field is set and the URL is a proper SSH URL; the password field is not used for this at all. Functionally, the plugin just passes this information to the 'git' command, so you can flex it however you want, but it is not something the maintainer wants to support.
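
As a rough illustration of what "passing information to the 'git' command" can look like, the sketch below builds a remote URL from the form fields. It is an assumption about one possible approach, not the plugin's confirmed behavior: `with_credentials` is a hypothetical helper, and it simply ignores the password for `ssh://` URLs, where key-based login is expected instead.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, username=None, password=None):
    """Return url with user info embedded (hypothetical helper).

    http(s) URLs get user and password; ssh:// URLs get the user only.
    Anything else (for example scp-style git@host:path) is left untouched.
    """
    parts = urlsplit(url)
    if not username or parts.scheme not in ("http", "https", "ssh"):
        return url
    # Drop any user info already present, keeping host and optional port.
    hostport = parts.netloc.rpartition("@")[2]
    userinfo = quote(username, safe="")
    if password and parts.scheme != "ssh":
        # Percent-encode so characters such as '@' or ':' stay unambiguous.
        userinfo += ":" + quote(password, safe="")
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{hostport}", parts.path, parts.query, parts.fragment)
    )
```

Note that a URL carrying a password can show up in process listings and git configuration, which is one reason key-based SSH access is often preferred.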

Cron job / Reposync worker

To run the repo sync job, run airflow-reposync reposync. This runs the sync once, which suits a cron job or a one-shot sync. If you want to run it continuously (i.e., like a daemon), pass the --continuous flag at the end of the command.
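
The difference between the one-shot and continuous modes can be sketched as a small loop. This is an illustration under stated assumptions rather than the worker's real code: `run_sync`, `sync_once`, `poll_seconds`, and `max_runs` are hypothetical names, and the actual polling cadence of the worker is not documented here.

```python
import time


def run_sync(sync_once, continuous=False, poll_seconds=60,
             sleep=time.sleep, max_runs=None):
    """Call sync_once a single time, or repeatedly when continuous is True.

    max_runs is a hypothetical safety valve for testing; a daemon-style
    run would leave it as None and loop until the process is stopped.
    Returns the number of sync passes performed.
    """
    runs = 0
    while True:
        sync_once()
        runs += 1
        if not continuous or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(poll_seconds)  # wait before the next pass
```

With cron, the scheduler plays the role of the loop: each invocation is a single pass, so the cron schedule rather than a flag decides how often syncing happens.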

Additional Notes:

The airflow-reposync job runner will only touch files in your DAG folder that were created by the reposync. Manually adding other, unmanaged DAGs is fine. Enabling or disabling a repo does not delete its folder in the DAG folder, nor does it affect the DAGs' ability to run (it only turns automatic updates on or off). However, if you remove a repo from the web panel, it WILL delete the DAG folder it generated.

Note about git:

Since this syncs DAGs via git, you will need to treat that DAG folder as read-only. If you generate flat files there (CSVs, blobs, etc.), make sure you add an exception for them to your .gitignore file.

About

A repo manager for airflow DAG/Plugin repos.
