[AIRFLOW-571] Airflow CLI: add gunicorn_config param and refactor webserver cli function #4174
Codecov Report
@@ Coverage Diff @@
## master #4174 +/- ##
=========================================
+ Coverage 77.7% 77.7% +<.01%
=========================================
Files 199 199
Lines 16309 16319 +10
=========================================
+ Hits 12673 12681 +8
- Misses 3636 3638 +2
|
|
I would probably take a look at https://docs.python.org/2/library/argparse.html#partial-parsing instead of using it as a single string:
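As an illustration of the partial parsing mentioned above (a minimal sketch, not the snippet from the original comment): `parse_known_args()` returns a tuple of the parsed `Namespace` and a list of the arguments it did not recognize. The parser name `airflow-sketch` and the `--workers` option here are assumptions made up for the example.

```python
import argparse

# Hypothetical stand-in for the Airflow webserver parser.
parser = argparse.ArgumentParser(prog="airflow-sketch")
parser.add_argument("-w", "--workers", type=int, default=4)

# parse_known_args() returns (Namespace, list) instead of a bare Namespace;
# anything the parser does not know about ends up in the list.
known, extra = parser.parse_known_args(["-w", "2", "--graceful-timeout", "60"])

print(known.workers)  # the option the parser knows about
print(extra)          # the leftovers that could be forwarded to Gunicorn
```

The leftover list keeps the arguments in their original order, so it could be appended to a Gunicorn command line unchanged.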
|
|
@ashb, I understand what you mean, and it's a good approach, but it will affect the whole Airflow CLI. I think we need more opinions on it. If we agree on parse_known_args(), I will change the implementation to use it. But I think we need to keep the config option, so that such webserver run args can also be defined in the config. |
|
@ashb, I checked what is needed if we want to use this solution: we need to modify this - https://github.com/apache/incubator-airflow/blob/master/airflow/bin/airflow#L31 and for commands
so, for it to work correctly, because the whole CLI expects a Namespace object, not a tuple, I need to add: this way everything works and the user doesn't need to use -gc='' in the CLI. They can just put all the args at the end of the line, and all other code stays unchanged. @ashb, is that okay? |
|
Oh - Hmmm. Hmmmm I say! |
|
How much time do you have to put in to this? One possible option would be to switch from ArgParse to https://click.palletsprojects.com/en/6.x/ which is already a dep (although only a dev-time one). But that is a whole-other chunk of work, and should probably be done separately to adding gunicorn config opts. |
|
@ashb, adding the changes I described above with parse_known_args() takes about 5 minutes, so I can do it right now. About click: I'm ready to take this task. I will be glad to refactor the CLI and add tests, but it's a huge task, so I'd prefer to move it to a separate ticket, and I can take it after this one. In any case, I believe a CLI refactor needs to start from some kind of PoC for one command (without merging, of course, just for discussion) showing how it would be done for all commands and how it would be covered with tests. |
|
Cool - if you can make The click refactor isn't required, but would probably make the CLI code a bit nicer. Worth doing a PoC for one or two commands and then having a look before porting all of the commands. |
|
@ashb, sounds good, let's do it this way. I made the changes and checked them by hand; everything seems OK, but we need to wait for Travis.
airflow/bin/airflow
Do you need to use __setattr__ here? would args[0].gunicorn_config = args[1] not do the same, or is the Namespace object being difficult on us?
@ashb, you're right, it's not needed; it works fine without it. Changed.
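The point of this exchange can be shown with a small sketch: `argparse.Namespace` is a plain attribute container, so the leftover arguments can be attached with ordinary assignment. The `--port` option and the sample arguments are assumptions for illustration; only the `gunicorn_config` attribute name comes from the PR.

```python
import argparse

# Hypothetical parser standing in for the Airflow CLI.
parser = argparse.ArgumentParser()
parser.add_argument("-p", "--port", type=int, default=8080)

# The result is a (Namespace, list) tuple, as discussed in the review.
args = parser.parse_known_args(["-p", "9090", "--keep-alive", "5"])
namespace, leftovers = args

# Plain attribute assignment works; there is no need to call __setattr__.
namespace.gunicorn_config = leftovers

print(namespace.port, namespace.gunicorn_config)
```

After this step the rest of the CLI can keep receiving a single `Namespace` object rather than a tuple.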
|
@ashb, can you look at what's going on with Travis or ping somebody? One test fails on several PRs - [fail] 0.00% tests.contrib.hooks.test_redis_hook.TestRedisHook.test_get_conn: 0.0227s. The same issue is here - https://travis-ci.org/apache/incubator-airflow/jobs/455404767 (a PR with docs changes), and here this test is also one of the failures: https://travis-ci.org/apache/incubator-airflow/builds/455525726?utm_source=github_status&utm_medium=notification |
|
redis-py just released version 3.0.0 and it broke some things :( In this particular case it just broke the tests, but not anything beyond that. But it might :( |
|
@ashb, for now, maybe pin the dependency with the '~=' operator (https://www.python.org/dev/peps/pep-0440/#compatible-release) to fix it quickly, and then investigate what's new in 3.0.0? Can I help with it? |
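To make the effect of a compatible-release pin concrete, here is a rough sketch of the `~=` rule from PEP 440 for plain numeric versions only (no pre-releases, epochs, or local versions). The helper names and the pinned value `2.10` are assumptions for illustration; the thread does not say which version was actually pinned.

```python
def _parse(version):
    """Split a plain numeric version such as '2.10.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts, length):
    """Right-pad a release tuple with zeros, as PEP 440 does when comparing."""
    return parts + (0,) * (length - len(parts))


def is_compatible(candidate, spec):
    """Return True if `candidate` satisfies `~=spec` (numeric releases only)."""
    spec_parts = _parse(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two release segments")
    cand_parts = _parse(candidate)
    width = max(len(spec_parts), len(cand_parts))
    # `~=X.Y` means `>= X.Y` ...
    at_least = _pad(cand_parts, width) >= _pad(spec_parts, width)
    # ... and `== X.*`, i.e. everything but the last spec segment must match.
    prefix = spec_parts[:-1]
    same_prefix = _pad(cand_parts, len(prefix))[:len(prefix)] == prefix
    return at_least and same_prefix


print(is_compatible("2.10.6", "2.10"))  # a later 2.x release
print(is_compatible("3.0.0", "2.10"))   # the new major release
```

Under these assumptions, a pin of `~=2.10` keeps accepting later 2.x releases while excluding 3.0.0. For real dependency handling, a maintained version-parsing library is a safer choice than this hand-rolled check.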
|
Please! See also celery/kombu#946 |
|
@ashb, I added a version pin; waiting for Travis now. |
|
If it works that should probably be a separate PR - there's already a JIRA for that opened today. |
|
@ashb, test_get_conn (tests.contrib.hooks.test_redis_hook.TestRedisHook) ... passed in the current run, so it seems to work ) |
|
@ashb, all passed |
… possible to test it
|
Oh, I just found this in the Gunicorn docs http://docs.gunicorn.org/en/stable/settings.html#settings
Do we need special code in Airflow to handle this, or is it worth just adding something to our docs to mention this? |
|
If we do have this, I was expecting the usage to be like:
airflow webserver --do-handshake-on-connect=true --graceful-timeout 60 -w 2
I.e. being able to freely mix our args with Gunicorn's directly. Is this how your PR makes it work or not? |
|
@ashb, I know about this feature; it works through an environment variable by default. We don't need to do anything to support it, but the user has to set up environment variables, which is not always acceptable. |
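A minimal sketch of the environment-variable route mentioned here, assuming a Gunicorn version that reads extra command-line settings from `GUNICORN_CMD_ARGS` (that variable name comes from the Gunicorn settings documentation, not from this thread). The helper `gunicorn_env` is hypothetical and only builds the environment; it does not start any process.

```python
import os
import shlex


def gunicorn_env(extra_args, base_env=None):
    """Return a copy of the environment with GUNICORN_CMD_ARGS set.

    `extra_args` is a list of Gunicorn command-line arguments; each one is
    shell-quoted so that values containing spaces survive as a single token.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GUNICORN_CMD_ARGS"] = " ".join(shlex.quote(arg) for arg in extra_args)
    return env


env = gunicorn_env(
    ["--do-handshake-on-connect", "--graceful-timeout", "60"],
    base_env={},  # empty base so the example does not depend on the real env
)
print(env["GUNICORN_CMD_ARGS"])
```

Such a dictionary could then be passed as the `env` argument when launching the webserver from a wrapper script, which is the extra setup step the comment describes as not always acceptable.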
|
@ashb, but you need to use --do-handshake-on-connect without =true, because passing the flag already means true - http://docs.gunicorn.org/en/stable/settings.html#do-handshake-on-connect |
|
@ashb, all tests have passed. Do I need to add more tests? |
|
@ashb, any concerns or decisions about the PR? |
|
@kaxil, @criccomini, @davydov, hi guys! Could somebody else also review this PR? It covers 3 tasks: https://issues.apache.org/jira/browse/AIRFLOW-571 (main), and its subsets: https://issues.apache.org/jira/browse/AIRFLOW-1592, https://issues.apache.org/jira/browse/AIRFLOW-1822. Thank you in advance! |
|
Sorry - got waylaid by the release management of 1.10.1. I'll try to take another look at this tomorrow or Thursday |
|
@ashb, ? :) |
    parser = CLIFactory.get_parser()
    args = parser.parse_args()
    args.func(args)
    args = parser.parse_known_args()
Hmmmm.
So the problem with doing this is that now all arguments will accept and silently ignore unknown args, which is not great user behaviour.
Before:
airflow version --flibble --flux
[2018-12-10 08:49:38,495] {__init__.py:51} INFO - Using executor SequentialExecutor
usage: airflow [-h]
{kerberos,run,worker,task_failed_deps,scheduler,users,resetdb,upgradedb,trigger_dag,flower,unpause,webserver,variables,serve_logs,list_tasks,list_dags,delete_dag,pause,pool,version,render,sync_perm,test,dag_state,initdb,clear,backfill,connections,list_dag_runs,next_execution,task_state}
...
airflow: error: unrecognized arguments: --flibble --flux
After:
airflow version --flibble --flux
[2018-12-10 08:49:42,827] {__init__.py:51} INFO - Using executor SequentialExecutor
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
v2.0.0.dev0+incubating
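One way to address the concern above is to keep using `parse_known_args()` but reject leftovers for every subcommand except the one that forwards them. This is a minimal sketch under assumptions, not the PR's implementation: the parser layout, the `subcommand` dest, and the `gunicorn_args` attribute are made up for the example.

```python
import argparse


def build_parser():
    """Build a tiny stand-in for the Airflow CLI with two subcommands."""
    parser = argparse.ArgumentParser(prog="airflow-sketch")
    sub = parser.add_subparsers(dest="subcommand")
    sub.add_parser("version")
    web = sub.add_parser("webserver")
    web.add_argument("-w", "--workers", type=int, default=4)
    return parser


def parse_cli(argv):
    """Parse argv, allowing unknown arguments only for `webserver`."""
    parser = build_parser()
    args, unknown = parser.parse_known_args(argv)
    if args.subcommand == "webserver":
        # Only this subcommand forwards leftovers (e.g. to Gunicorn).
        args.gunicorn_args = unknown
    elif unknown:
        # Same failure mode as plain parse_args(): usage message and exit code 2.
        parser.error("unrecognized arguments: " + " ".join(unknown))
    return args


web_args = parse_cli(["webserver", "-w", "2", "--graceful-timeout", "60"])
print(web_args.workers, web_args.gunicorn_args)
```

With this shape, `parse_cli(["version", "--flibble", "--flux"])` would exit with an "unrecognized arguments" error, matching the "Before" behaviour shown above, while `webserver` still accepts extra arguments.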
    if isinstance(self.args.gunicorn_config, list):
        self.args.gunicorn_config = self.args.gunicorn_config[0]
    for arg in self.args.gunicorn_config.split():
        run_args.append(arg)
Is this used as:
airflow webserver '--gunicorn-opt-a --gunicorn-opt-b'
If so why not just take the list directly, make it multiple args not a single one, remove the need for lines 955-958
    run_args.append(arg)
    run_args += self.args.gunicorn_args
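The difference between the two approaches in this suggestion can be sketched as below. The function names and the sample `--access-logformat` value are assumptions for illustration; the point is that splitting a single string on whitespace breaks values that contain spaces, while a list of separate arguments is passed through untouched.

```python
def run_args_from_string(base_args, gunicorn_config):
    """Single-string approach: split the option string on whitespace."""
    return list(base_args) + gunicorn_config.split()


def run_args_from_list(base_args, gunicorn_args):
    """List approach: extend the command line with the args as given."""
    return list(base_args) + list(gunicorn_args)


base = ["gunicorn", "-w", "2"]

# A value containing a space is torn apart by str.split() ...
as_string = run_args_from_string(base, "--access-logformat '%(h)s %(r)s'")
# ... but stays a single argument when passed as a list element.
as_list = run_args_from_list(base, ["--access-logformat", "%(h)s %(r)s"])

print(as_string)
print(as_list)
```

Taking the list directly also removes the need for the `isinstance` check and the loop in the reviewed lines.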
    default=conf.get('webserver', 'ERROR_LOGFILE'),
    help="The logfile to store the webserver error log. Use '-' to print to "
         "stderr."),
    'gunicorn_config': Arg(
We don't need this and parse_known_args - just one is sufficient.
|
Why did this get closed? We are having difficulty running Airflow with HTTPS behind a load balancer, and may need to pass configuration to Gunicorn. Is there a different way, such as environment vars mentioned above? |
|
@brylie, hi! I closed it because I have no time to continue working on it. If you want, feel free to pick it up. |
|
Thanks @xnuinside. I think this should be supported by Airflow, because it opens up possibilities like running Airflow behind a load balancer. We ended up putting nginx in front of gunicorn to redirect HTTP to HTTPS, since our load balancer is handling SSL. |
Description
I added a gunicorn_config (-gc) param that makes it possible to pass any parameter available in the Gunicorn server without adding support for each of them as a separate arg in Airflow. I also refactored the webserver function while keeping full backward compatibility. This is necessary to make it possible to add tests.
I removed the lines that read vars from the config (https://github.com/apache/incubator-airflow/pull/4174/files#diff-1c2404a3a60f829127232842250ff406L839), because that is already done one step earlier, in the defaults: https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L1745