Skip to content

Conversation

@xnuinside
Copy link
Contributor

@xnuinside xnuinside commented Nov 10, 2018

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

I added gunicorn_config (-gc) param for get possible send all params what available in gunicorn server without adding support of them as separate args in Airflow. I also refactored webserver function with saving full backward compatibility. It's necessary to get possible to add tests.

I remove lines with getting vars from config (https://github.com/apache/incubator-airflow/pull/4174/files#diff-1c2404a3a60f829127232842250ff406L839), because it's already done one step before - in defaults https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L1745

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes flake8

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 10, 2018

@ashb, @Fokko, please review or say who can do this ) I also start adding more tests to cli and think about refactor it in more testable form, what do you think about it?

@codecov-io
Copy link

codecov-io commented Nov 11, 2018

Codecov Report

Merging #4174 into master will increase coverage by <.01%.
The diff coverage is 53.94%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4174      +/-   ##
=========================================
+ Coverage    77.7%   77.7%   +<.01%     
=========================================
  Files         199     199              
  Lines       16309   16319      +10     
=========================================
+ Hits        12673   12681       +8     
- Misses       3636    3638       +2
Impacted Files Coverage Δ
airflow/bin/cli.py 64.77% <53.94%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5035c1...b75ec80. Read the comment docs.

@ashb
Copy link
Member

ashb commented Nov 12, 2018

I would probably take a look at https://docs.python.org/2/library/argparse.html#partial-parsing instead of using it as a single string:

Sometimes a script may only parse a few of the command-line arguments, passing the remaining arguments on to another script or program. In these cases, the parse_known_args() method can be useful. It works much like parse_args() except that it does not produce an error when extra arguments are present. Instead, it returns a two item tuple containing the populated namespace and the list of remaining argument strings.

@xnuinside
Copy link
Contributor Author

@ashb, I understand about what you talk and it's good way, but, it will side all airflow cli. I think we need more opinions about it. If we will agree about parse_known_args() - I will change implementation to use it. But, I think need save config option, to get possible define such webserver run args in config

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 15, 2018

@ashb, I check what need to do if we want to use a solution, so, need to modify this - https://github.com/apache/incubator-airflow/blob/master/airflow/bin/airflow#L31 and for commans airflow webserver -p 9090 -op papap we will get args:

(Namespace(access_logfile='-', daemon=False, debug=False, error_logfile='-', func=<function webserver at 0x10c7dd5f0>, hostname='0.0.0.0', log_file=None, pid=None, port=9090, ssl_cert=None, ssl_key=None, stderr=None, stdout=None, subcommand='webserver', worker_timeout=120, workerclass='sync', workers=4), ['-op', 'papap'])

so, for correct work, because all cli wait for Namespace object, not tuple, I need to add:

args = parser.parse_known_args()
args[0].__setattr__('gunicorn_config', args[1])
args[0].func(args[0])

this way all works and user don't need to use -gc='' in cli. He can just put all args at the end of line, and all other code will be without changes

@ashb is it okay?

@ashb
Copy link
Member

ashb commented Nov 15, 2018

Oh - parse_args is called very early on inside airflow/bin/airflow - I see.

Hmmm. Hmmmm  I say!

@ashb
Copy link
Member

ashb commented Nov 15, 2018

How much time do you have to put in to this? One possible option would be to switch from ArgParse to https://click.palletsprojects.com/en/6.x/ which is already a dep (although only a dev-time one).

But that is a whole-other chunk of work, and should probably be done separately to adding gunicorn config opts.

@xnuinside
Copy link
Contributor Author

@ashb to add changes what I describe upper with parse_known_args() - 5 min ) I can do it right now. About click - I ready to take this task )) I will be glad to refactor cli and add tests, but it's a huge task, so I prefer to move it in a separate ticket - and I can take it after this task.

I believe what anyway refactor to cli need to start from some kind poc for one command (of course without merging, just for discussion) how will be it done for all commands and how it will be covered with tests.

@ashb
Copy link
Member

ashb commented Nov 15, 2018

Cool - if you can make parse_known_args work (and ideally have unknown args be ignored an error in other tasks still) go for that.

The click refactor isn't required, but would probably make the CLI code a bit nicer. Worth doing a PoC for one or two commands and then having a look before porting all of the commands.

@xnuinside
Copy link
Contributor Author

@ashb, sounds good, let's do it this way. I made changes and checked by hand, seems all ok, but need to wait for Travis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to use __setattr__ here? would args[0].gunicorn_config = args[1] not do the same, or is the Namespace object being difficult on us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashb, you right, it's not needed - works fine without it. Changed.

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 15, 2018

@ashb , can you look whats going on with Travis or ping somebody? One test fails on several PRs - [fail] 0.00% tests.contrib.hooks.test_redis_hook.TestRedisHook.test_get_conn: 0.0227s same issue here - https://travis-ci.org/apache/incubator-airflow/jobs/455404767 (pr with doc's changes) , here also this test one of failed https://travis-ci.org/apache/incubator-airflow/builds/455525726?utm_source=github_status&utm_medium=notification

@ashb
Copy link
Member

ashb commented Nov 15, 2018

redis-py 3.0.0 just released a new version and it broke some things :(

In this particular case it just broke the tests, but not anything beyond that. But it might :(

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 15, 2018

@ashb, for now, maybe set up dependency with https://www.python.org/dev/peps/pep-0440/#compatible-release '~=' to fix it quick and then investigate what's new with 3.0.0? Can I help with it?

@ashb
Copy link
Member

ashb commented Nov 15, 2018

Please!

See also celery/kombu#946

@xnuinside
Copy link
Contributor Author

@ashb , I added version pin, wait for Travis now

@ashb
Copy link
Member

ashb commented Nov 15, 2018

If it works that should probably be a separate PR - there's alredy a JIRA for that opened today.

@xnuinside
Copy link
Contributor Author

@ashb, test_get_conn (tests.contrib.hooks.test_redis_hook.TestRedisHook) ... passed in current run, seems it works )

@xnuinside
Copy link
Contributor Author

@ashb, all passed

@ashb
Copy link
Member

ashb commented Nov 15, 2018

Oh, I just found this in the Gunicorn docs http://docs.gunicorn.org/en/stable/settings.html#settings

Settings can be specified by using environment variable GUNICORN_CMD_ARGS. All available command line arguments can be used. For example, to specify the bind address and number of workers:

$ GUNICORN_CMD_ARGS="--bind=127.0.0.1 --workers=3" gunicorn app:app

Do we need special code in Airflow to handle this, or is it worth just adding someething to our docs to mention this?

@ashb
Copy link
Member

ashb commented Nov 15, 2018

If we do have this, I was expecting the usage to be like:

airflow webserver --do-handshake-on-connect=true --graceful-timeout 60 -w 2

I.e. being able to freely mix our args with Gunicorn's directly. Is this how your PR makes it work or not?

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 15, 2018

@ashb, I know about this feature, it works from env variable by default. We don't need to do something for the support it, but a user should set up env variables, what not always acceptable

@xnuinside
Copy link
Contributor Author

xnuinside commented Nov 15, 2018

@ashb
the second question about args - yes, you can mix it and all work ok, if you want - I can add mix args test case, it's not a big deal

but you need to use --do-handshake-on-connect without =true, because if you use this flag it's == true - http://docs.gunicorn.org/en/stable/settings.html#do-handshake-on-connect

@xnuinside
Copy link
Contributor Author

@ashb, all tests are passed, need I to add more tests?

@xnuinside
Copy link
Contributor Author

@ashb, any concerns/decisions about PR?

@xnuinside
Copy link
Contributor Author

@kaxil, @criccomini, @davydov, hi guys! maybe somebody else can also review this PR? it's covered 3 tasks: https://issues.apache.org/jira/browse/AIRFLOW-571 (main), subsets: https://issues.apache.org/jira/browse/AIRFLOW-1592, https://issues.apache.org/jira/browse/AIRFLOW-1822 thank you in advance!

@ashb
Copy link
Member

ashb commented Nov 27, 2018

Sorry - got way-laid by the release management of 1.10.1. I'll try and take another look at this tomorrow or Thursday

@xnuinside
Copy link
Contributor Author

@ashb, ? :)

parser = CLIFactory.get_parser()
args = parser.parse_args()
args.func(args)
args = parser.parse_known_args()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm.

So the problem with doing this is that now all arguments will accept and silently ignore unknown args, which is not great user behaviour.

Before:

airflow  version --flibble --flux
[2018-12-10 08:49:38,495] {__init__.py:51} INFO - Using executor SequentialExecutor
usage: airflow [-h]
               {kerberos,run,worker,task_failed_deps,scheduler,users,resetdb,upgradedb,trigger_dag,flower,unpause,webserver,variables,serve_logs,list_tasks,list_dags,delete_dag,pause,pool,version,render,sync_perm,test,dag_state,initdb,clear,backfill,connections,list_dag_runs,next_execution,task_state}
               ...
airflow: error: unrecognized arguments: --flibble --flux

After:

airflow  version --flibble --flux
[2018-12-10 08:49:42,827] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   v2.0.0.dev0+incubating

if isinstance(self.args.gunicorn_config, list):
self.args.gunicorn_config = self.args.gunicorn_config[0]
for arg in self.args.gunicorn_config.split():
run_args.append(arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used as:

airflow webserver '--gunicorn-opt-a --gunicorn-opt-b'

If so why not just take the list directly, make it multiple args not a single one, remove the need for lines 955-958

Suggested change
run_args.append(arg)
run_args += self.args.gunicorn_args

default=conf.get('webserver', 'ERROR_LOGFILE'),
help="The logfile to store the webserver error log. Use '-' to print to "
"stderr."),
'gunicorn_config': Arg(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this and parse_unknown_args - just one is sufficient.

@xnuinside xnuinside closed this Jan 25, 2019
@brylie
Copy link
Contributor

brylie commented Feb 28, 2019

Why did this get closed? We are having difficulty running Airflow with HTTPS behind a load balancer, and may need to pass configuration to Gunicorn. Is there a different way, such as environment vars mentioned above?

@xnuinside
Copy link
Contributor Author

@brylie , hi! I closed it because I have no time to continue work on it. If you want - you can pick it up freely.

@brylie
Copy link
Contributor

brylie commented Feb 28, 2019

Thanks @xnuinside.

I think this should be supported by Airflow, because it opens up possibilities like running Airflow behind a load balancer. We ended up putting nginx in front of gunicorn to redirect HTTP to HTTPS, since our load balancer is handling SSL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants