Skip to content

Conversation

@ldct
Copy link
Contributor

@ldct ldct commented Jul 28, 2016

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

  • (to be replaced with a link to AIRFLOW-X)

Testing Done:

  • TBD

Changes

  • Assuming that DAG_FOLDER is a git repository, on dag parsing, the created dag objects will have the git revision (sha) of DAG_FOLDER as a property
  • Task instances now have the git sha of their parent DAG as a property.
    • This is first set when the scheduler creates them.
    • It is stored in the database.
  • An executor and worker that executes a task instance now check a copy of the DAG_FOLDER out at the revision of the task instance

TODO:

  • Check that changing the version changes values used in derived files as well (eg .hql)
  • Limit resources consumed by git checkout folders
  • Unit tests
  • Test on real cluster, deploy changes to the dag folder while server and workers are running

@criccomini
Copy link
Contributor

What is this? Are you assuming that the DAGs dir is going to be a git repo?

@ldct
Copy link
Contributor Author

ldct commented Jul 28, 2016

Are you assuming that the DAGs dir is going to be a git repo?

Yes, as stated in the first entry in "changes".

@criccomini
Copy link
Contributor

Is that a valid assumption? Thus far, I think Airflow has worked if the dags directory doesn't have a .git dir in it (or some parent).

@criccomini
Copy link
Contributor

I see this, though: Opt-in via config file. So this is the escape hatch to make it backwards compatible?

@ldct
Copy link
Contributor Author

ldct commented Jul 28, 2016

I see this, though: Opt-in via config file. So this is the escape hatch to make it backwards compatible?

Yeah, the idea I'm working towards is for the user to be able to choose their method of version control via the config file, with the default being no version control (ie same behaviour as current master). Sorry if this was not clear.

I'm trying to isolate the git-specific logic to 2-3 functions, as in https://github.com/zodiac/incubator-airflow/blob/3e3715d5a15d09351e1944bdb56b5fe91bec16cf/airflow/version_control/__init__.py#L11-L12, so a new version control system (eg mercurial) can be implemented by writing those 2-3 functions.

It is not completely backwards-compatible, though, because I need to add a column to task_instance.

@criccomini
Copy link
Contributor

It is not completely backwards-compatible, though, because I need to add a column to task_instance.

Should be fine, since we have airflow upgradedb

@plypaul
Copy link
Contributor

plypaul commented Aug 1, 2016

@artwr If you have a moment to take a look at this, that would be great!

subdir = os.path.abspath(os.path.expanduser(subdir))
return subdir

if "DAGS_FOLDER" in subdir:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if subdir is None?

@r39132
Copy link
Contributor

r39132 commented Oct 27, 2016

@zodiac Please rebase this commit and let me know your intention on whether you plan to get this across the line. I will wait 5 days to hear back on next steps.

@asfgit asfgit closed this in 0b6ac66 Nov 2, 2016
@r39132
Copy link
Contributor

r39132 commented Nov 2, 2016

Pls reopen when ready for review, test, and merge. Also, pls create a JIRA at that time.

alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants