Improve pre-commit to generate Airflow diagrams as a code #36333

Merged
potiuk merged 1 commit into apache:main from potiuk:distribute-diagram-generation on Dec 20, 2023

@potiuk
Member

@potiuk potiuk commented Dec 20, 2023

Since we are getting more diagrams generated in Airflow using the "diagram as a code" approach, this PR improves the pre-commit to better support generating images that come from different sources, are placed in different directories, and are generated independently. The whole process becomes more distributed, and it is easy for whoever creates diagrams to add their own.

The changes implemented in this PR:

  • the code that generates each diagram now lives next to the diagram it generates. It has the same name as the diagram but with the .py extension, so the source of each diagram is immediately visible (right next to the diagram)

  • each of the .py diagram files is runnable on its own. You can regenerate a diagram by running the corresponding Python file, or automate this with a "save" action that runs the Python code every time the file is saved. That makes a very nice workflow for iterating on each diagram independently of the others

  • the pre-commit script is given a set of folders to scan; it finds and runs the diagram sources on pre-commit. It also creates and verifies the md5sum hash of the source Python file separately for each diagram, and only runs diagram generation when the source file has changed since the hash was last saved and committed. The hash is stored next to the image and its source, with the .md5sum extension
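
The hash check described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function names, the `diagram_*.py` glob pattern, and the exact naming of the hash file (source name with the suffix swapped to `.md5sum`) are assumptions made for the example.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def source_hash(source: Path) -> str:
    """Return the MD5 hex digest of a diagram source file."""
    return hashlib.md5(source.read_bytes()).hexdigest()


def needs_regeneration(source: Path) -> bool:
    """True when the stored hash is missing or differs from the current one."""
    hash_file = source.with_suffix(".md5sum")  # assumed naming convention
    if not hash_file.exists():
        return True
    return hash_file.read_text().strip() != source_hash(source)


def _run_with_python(source: Path) -> None:
    """Execute a diagram source file with the current interpreter."""
    subprocess.run([sys.executable, str(source)], check=True)


def regenerate_changed(folders, run=_run_with_python):
    """Run every changed diagram source found in `folders`.

    Returns the list of sources that were run. The hash file is only
    rewritten after `run` succeeds, so a failed run is retried next time.
    """
    regenerated = []
    for folder in folders:
        # "diagram_*.py" is a hypothetical pattern for locating sources.
        for source in sorted(Path(folder).glob("diagram_*.py")):
            if needs_regeneration(source):
                run(source)
                hash_file = source.with_suffix(".md5sum")
                hash_file.write_text(source_hash(source) + "\n")
                regenerated.append(source)
    return regenerated
```

Called as `regenerate_changed([Path("docs/img")])` (a hypothetical folder), a second call with no source changes would run nothing, because the stored hashes already match.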

Also updated the documentation in CONTRIBUTING.rst to explain how to generate the diagrams and how the generation mechanism works.



@potiuk
Member Author

potiuk commented Dec 20, 2023

This one should set us on a path where we can easily convert pretty much all our architecture-like diagrams to the "diagram as a code" approach.

It also makes it very easy to iterate on the diagrams, and it explains how to set up a diagram "live preview" that instantly regenerates each diagram separately using "save actions".
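
A diagram source that regenerates its own image when run (for example from an editor's save action) might be laid out roughly like this. It is only a sketch: the helper names and the `.svg` extension are assumptions, and the drawing step is a standard-library placeholder standing in for whatever diagram library a real file would call.

```python
from pathlib import Path

# Placeholder image content; a real diagram file would draw with its
# diagram library instead of formatting a string.
SVG_TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="240" height="60">'
    '<text x="10" y="35">{label}</text></svg>\n'
)


def image_path_for(source_file: str, extension: str = ".svg") -> Path:
    """Return the image path next to `source_file`, with the same stem."""
    return Path(source_file).resolve().with_suffix(extension)


def generate_diagram(output: Path, label: str = "scheduler and worker") -> None:
    """Write the (placeholder) diagram image to `output`."""
    output.write_text(SVG_TEMPLATE.format(label=label))


if __name__ == "__main__":
    # Running the file directly regenerates the image that sits beside it.
    target = image_path_for(__file__)
    generate_diagram(target)
    print(f"Generated {target}")
```

Because the output path is derived from the file's own location, an editor can be configured to run the file on save without passing any arguments.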

Contributor

@josh-fell josh-fell left a comment

This split is nice and more scalable indeed.

@potiuk potiuk force-pushed the distribute-diagram-generation branch from 85b6d9f to 4e972c4 Compare December 20, 2023 16:25
@vincbeck
Contributor

Feel free to merge without my approval (I am now away until next year)

@potiuk potiuk force-pushed the distribute-diagram-generation branch from 4e972c4 to 7f8dcaf Compare December 20, 2023 17:24
@potiuk potiuk force-pushed the distribute-diagram-generation branch from 7f8dcaf to f56d37d Compare December 20, 2023 17:28
@potiuk
Member Author

potiuk commented Dec 20, 2023

> Feel free to merge without my approval (I am now away until next year)

Have good holidays :)

@potiuk potiuk merged commit b35b08e into apache:main Dec 20, 2023
@potiuk potiuk deleted the distribute-diagram-generation branch December 20, 2023 18:44
potiuk added a commit that referenced this pull request Dec 30, 2023
(cherry picked from commit b35b08e)
@potiuk potiuk added this to the Airflow 2.8.1 milestone Dec 30, 2023
@potiuk potiuk added the changelog:skip label Dec 30, 2023
Labels

area:dev-tools, area:providers, changelog:skip, kind:documentation

3 participants