
New Breeze command start-airflow, it replaces the previous flag #11157

Merged
potiuk merged 11 commits into apache:master from francescomucio:master
Sep 27, 2020


@francescomucio
Contributor

After a few PRs, ideas, and exchanges, we reached consensus to make start-airflow a full-fledged Breeze command with some specific flags.

./breeze start-airflow starts the Airflow scheduler and webserver, then leaves the user in a tmux session where they can watch for errors and use the Breeze shell. From the tmux shell it is possible to run ./stop_airflow.sh to stop the Airflow components and close tmux.

By default, start-airflow does not create the default connections or load the example DAGs. Two flags can be used if these objects are needed: --load-default-connections and --load-example-dags.
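A minimal sketch of how these opt-in flags compose, assuming you want to script Breeze invocations from Python. The `start_airflow_cmd` helper is hypothetical and not part of Breeze; it only uses the subcommand and the two flags named above, and it only builds the argument list rather than running anything.

```python
from typing import List


def start_airflow_cmd(load_example_dags: bool = False,
                      load_default_connections: bool = False) -> List[str]:
    """Build a hypothetical `./breeze start-airflow` invocation.

    By default neither the example DAGs nor the default connections are
    loaded; each keyword argument opts in to one of them.
    """
    cmd = ["./breeze", "start-airflow"]
    if load_default_connections:
        cmd.append("--load-default-connections")
    if load_example_dags:
        cmd.append("--load-example-dags")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; from the root of an Airflow
    # source tree, subprocess.run(cmd, check=True) would start Breeze for real.
    print(" ".join(start_airflow_cmd(load_example_dags=True)))
```

Running it prints `./breeze start-airflow --load-example-dags`. Keeping the defaults as "off" mirrors the behaviour described in this PR: you only pay for the extra objects when you ask for them.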

In the previous implementation as a flag, the init script was executed twice; to avoid this, the tmux session is now started at a later stage by the script scripts/in_container/run_tmux.sh.

Previous PRs, for the history of this feature:
Starting breeze will run an init script after the environment is setup #11029
Introducing flags to skip example dags and default connections #11099
Revert "Introducing flags to skip example dags and default connection… #11110



Member

@potiuk potiuk left a comment


Fantastic! Just two small nits and I am super-happy to merge :)

Member

@potiuk potiuk left a comment


This is soo cool. Thanks @francescomucio !

@potiuk potiuk merged commit 0db7a30 into apache:master Sep 27, 2020
@potiuk potiuk added this to the Airflow 1.10.13 milestone Sep 27, 2020
potiuk pushed a commit that referenced this pull request Sep 27, 2020
@potiuk
Member

potiuk commented Sep 27, 2020

Cherry-picked it to 1.10.13 as well - this way it will be a ..... Breeze .... to test the new release when we VOTE on it.

@francescomucio
Contributor Author

That's great, happy to be helpful :)

@ashb
Member

ashb commented Sep 28, 2020

What version of Airflow would this use by default? Does this fall foul of the "users should only be directed to releases, not head" point of the Apache release process?

@potiuk
Member

potiuk commented Sep 28, 2020

I believe Breeze has so far been meant as a development tool, for people who have access to the Airflow sources, and it is only advertised on the devlist and in dev talks about contributing to Airflow. I don't think we ever advertised it to "users" of Airflow. And it is nowhere near running Airflow in any "production"-like environment - it is purely a development setup where automation conveniently replaces typing a few commands.

However, since you mentioned it, you have given birth to an idea: start advertising it. And I really like the idea.

Breeze is already (officially) part of each release, and if someone uses Breeze from those released sources, it will install the very version of Airflow it is part of. So if you run Breeze from the sources of 1.10.13, it will install 1.10.13 by default (all technically sound - signed, released, and published in SVN).

I believe this is perfectly sound with regard to the Apache release policies. It will use the released sources and, optionally, the convenience binary images to rebuild the CI image and run Airflow. Unless I missed something, this is perfectly OK with the release policy.

And with the addition of the start-airflow command, Breeze actually becomes a very useful tool for users to test their DAGs as well, so I think it might be a good time to think about a "How to test your DAGs" section in the documentation that points to Breeze (the officially released one in the sources). We could even write some instructions for users on how to run their "DAG testing" environment easily. It literally requires just an explanation in the documentation and nothing else - everything else is already in 1.10.13 when we release it.

Also, we can mention that the released 1.10.13 version can be used to test any other Airflow release. When you run start-airflow you can also add --install-airflow-version X.Y.Z (for example 1.10.13rc1, if we have it) to install Airflow from PyPI, or --install-airflow-reference 1.10.12rc1 to install it from a GitHub tag (or any other Git reference, for that matter). Such versions are not auto-completed or listed as available options in --help (only the officially released versions are). You have to pass them explicitly, knowing the version/tag you want, similarly to pip install apache-airflow==1.10.13rc1.

This is pretty much what I was doing to test whether the release candidates work (and we described it in dev/README). I got the Airflow sources from the .tar.gz source package in SVN, but then installed Airflow using --install-airflow-version 1.10.12rc3.

Now, with the start-airflow command, it's even better, because I always did exactly this: started a tmux session, initialized the DB, created users, and started the webserver and scheduler. Now all of this happens automatically, which is cool. Also, the 1.10.13 version has an extra --use-rbac option that allows starting either an RBAC or a non-RBAC version of Airflow this way, which I also used during testing.
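The release-candidate workflow above can be sketched as a small command matrix, assuming you script it from Python. The helpers `rc_test_cmd` and `rc_matrix` are hypothetical; they only assemble the flags mentioned in this thread (--install-airflow-version, --install-airflow-reference, --use-rbac), and rejecting a version and a reference together is a choice of this sketch, not documented Breeze behaviour. The RBAC flag name may also change (the thread later suggests --no-rbac).

```python
from itertools import product
from typing import List, Optional


def rc_test_cmd(version: Optional[str] = None,
                reference: Optional[str] = None,
                use_rbac: bool = False) -> List[str]:
    """Build a hypothetical `./breeze start-airflow` call for testing a release.

    `version` maps to --install-airflow-version (install from PyPI) and
    `reference` to --install-airflow-reference (install from a Git tag or
    other Git reference). This sketch treats them as alternatives.
    """
    if version is not None and reference is not None:
        raise ValueError("pass either version or reference, not both")
    cmd = ["./breeze", "start-airflow"]
    if version is not None:
        cmd += ["--install-airflow-version", version]
    if reference is not None:
        cmd += ["--install-airflow-reference", reference]
    if use_rbac:
        # --use-rbac as it exists in the 1.10.13 line at the time of this thread.
        cmd.append("--use-rbac")
    return cmd


def rc_matrix(versions: List[str]) -> List[List[str]]:
    """One command per (version, RBAC off/on) combination."""
    return [rc_test_cmd(version=v, use_rbac=rbac)
            for v, rbac in product(versions, (False, True))]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for cmd in rc_matrix(["1.10.12rc3"]):
        print(" ".join(cmd))
```

With a single candidate this prints two lines, `./breeze start-airflow --install-airflow-version 1.10.12rc3` and the same command followed by `--use-rbac`; adding more versions to the list multiplies the combinations accordingly.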

@ashb, do you think there is any danger of violating the policies if we do that and announce it as a tool for users to test their DAGs? Since everything is released in sources and can be built by users with the right "platform and tools", I think this sounds pretty much OK?

Do you have any concerns if we start announcing it to the users? I think it is a really handy tool for them now.

@ashb
Member

ashb commented Sep 28, 2020

The "not advertising it outside of dev list" answers my question, yes.

We should make RBAC the default (change --use-rbac into --no-rbac)

That said, a lot of what Breeze does is around mounting files, taking care of rebuilds, caching, and the like - none of which is strictly necessary when running and installing releases from PyPI. For users wanting to run Airflow, a much simpler solution is a single Docker Compose file - all we have to worry about mounting for them is the AIRFLOW_HOME folder. Something like https://github.com/puckel/docker-airflow/blob/master/docker-compose-LocalExecutor.yml

@potiuk
Member

potiuk commented Sep 28, 2020

> The "not advertising it outside of dev list" answers my question, yes

> We should make RBAC the default (change --use-rbac into --no-rbac)

Feel free :). It's a v1-10-test-only feature and can be changed any time :)

> That said, a lot of what Breeze does is around mounting files, taking care of rebuilds, caching, and the like - none of which is strictly necessary when running and installing releases from PyPI. For users wanting to run Airflow, a much simpler solution is a single Docker Compose file - all we have to worry about mounting for them is the AIRFLOW_HOME folder. Something like https://github.com/puckel/docker-airflow/blob/master/docker-compose-LocalExecutor.yml

Absolutely - there is an umbrella issue for that, #8548, which links to some good examples that users contributed, and it would be great if it gets implemented. A few people have volunteered for it at different times. This docker-compose file should be both development- and production-friendly. However, I believe it matters more for production deployments, especially if the Compose Specification https://www.docker.com/blog/announcing-the-compose-specification/ reaches a "released" stage. It is based on docker-compose to start with, but it will become an open standard (docker-compose is not).

For development environments, I think it is much more important to have devcontainer.json integration https://code.visualstudio.com/docs/remote/devcontainerjson-reference, which will allow running the full development environment in GitHub once Codespaces goes out of beta (in Q4, so quite soon). I looked at it when developing Breeze and made sure it will be very easy to integrate the two. Then those who do not want Codespaces will continue to have the "agnostic" Breeze, and for those who want point-and-run in the web, Codespaces will also work using the same "engine" under the hood (it's mostly about automatically building and updating the right Docker image and starting integrations if needed).

@ashb
Member

ashb commented Sep 28, 2020

https://xkcd.com/927/ (devcontainer.json doesn't help with PyCharm or other IDEs.)

@potiuk
Member

potiuk commented Sep 28, 2020

> https://xkcd.com/927/ (devcontainer.json doesn't help with PyCharm or other IDEs.)

There are already signs and rumours that IntelliJ is working on its own integration with cloud dev machines based on similar principles. I'd love to integrate Breeze with it too when it comes out. I think they will have no choice once Codespaces goes out of beta. Everyone I talked to who used Codespaces loves the simplicity of being able to work from anywhere without setting anything up. I think 2021 will be when we switch to cloud development for non-power users (i.e. casual contributors).

Hopefully, those tools will converge eventually.

I see Breeze as a common denominator, used by those who do not want to be tied to a particular IDE, and as a helper tool to automate tedious tasks. Thanks to it, I can test 5-10 different combinations of an RC with different RBAC/non-RBAC, database, and Python versions in a matter of minutes (and with the start-airflow command it will be much faster).

> The "not advertising it outside of dev list" answers my question, yes

Do you have anything against me adding to the Airflow docs, at some point, that Breeze is one of the ways of testing DAGs locally? We already have it; it's just a matter of pointing to the right (released) sources and docs.

@ashb
Member

ashb commented Sep 28, 2020

I'm wondering whether some form of "airflow cookiecutter template" might be the way to solve most of these. We could create docker-compose, devcontainer.json, etc.

> Do you have anything against me adding to the Airflow docs, at some point, that Breeze is one of the ways of testing DAGs locally? We already have it; it's just a matter of pointing to the right (released) sources and docs.

Nope, no problem, assuming Breeze is obtained from a released source in some way and defaults to that released version.

@potiuk
Member

potiuk commented Sep 28, 2020

> I'm wondering whether some form of "airflow cookiecutter template" might be the way to solve most of these. We could create docker-compose, devcontainer.json, etc.

It depends on what form the IntelliJ one takes, where the Compose Specification goes, and whether they will diverge. There are certain areas (precisely the ones that Breeze solves: source file mounting, caching images for rebuilds, and rebuilding images in general) that are specific to a development environment and not addressed at all by docker-compose yet. Making it fit is not straightforward, because docker-compose has inherent limitations here (it was not created as a development-friendly tool). This is the main reason Breeze exists at all. I don't think it makes sense to try to build this kind of cookie-cutter template.

Actually, since you already mentioned it, this would be a perfect manifestation of https://xkcd.com/927/

RaviTezu pushed a commit to RaviTezu/airflow that referenced this pull request Oct 25, 2020
kaxil pushed a commit that referenced this pull request Nov 12, 2020
@potiuk potiuk added the type:misc/internal Changelog: Misc changes that should appear in change log label Nov 14, 2020
potiuk pushed a commit that referenced this pull request Nov 16, 2020
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021

Labels

area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log


3 participants