Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-2840] - add update connections cli option #3684

Closed
wants to merge 1 commit into from
Closed

[AIRFLOW-2840] - add update connections cli option #3684

wants to merge 1 commit into from

Conversation

caddac
Copy link
Contributor

@caddac caddac commented Aug 2, 2018

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Adds an Update Connection CLI option. This is useful for updating connections in a CI/CD environment where deleting and adding connections could cause DAGS to fail.

Tests

  • My PR adds the following unit tests:
    My PR adds a update connection step to an existing unit test in test/core.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@codecov-io
Copy link

codecov-io commented Aug 2, 2018

Codecov Report

Merging #3684 into master will increase coverage by 0.82%.
The diff coverage is 67.39%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #3684      +/-   ##
=========================================
+ Coverage   76.68%   77.5%   +0.82%     
=========================================
  Files         199     205       +6     
  Lines       16189   15817     -372     
=========================================
- Hits        12414   12259     -155     
+ Misses       3775    3558     -217
Impacted Files Coverage Δ
airflow/bin/cli.py 64.52% <67.39%> (-0.31%) ⬇️
airflow/operators/slack_operator.py 0% <0%> (-97.37%) ⬇️
airflow/sensors/s3_key_sensor.py 31.03% <0%> (-68.97%) ⬇️
airflow/sensors/s3_prefix_sensor.py 41.17% <0%> (-58.83%) ⬇️
airflow/utils/helpers.py 67.07% <0%> (-17.31%) ⬇️
airflow/example_dags/example_python_operator.py 78.94% <0%> (-15.79%) ⬇️
airflow/hooks/mysql_hook.py 78% <0%> (-12.39%) ⬇️
airflow/sensors/sql_sensor.py 90.47% <0%> (-9.53%) ⬇️
airflow/utils/sqlalchemy.py 73.91% <0%> (-7.52%) ⬇️
airflow/configuration.py 83.95% <0%> (-5.47%) ⬇️
... and 96 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64b94d8...757c1b5. Read the comment docs.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection ID is (by design) non-unique - its a mechanism of load balancing across multiple DBs for instance - you can have three conns with an ID of "mysql" and Airflow will pick a random one each time.

Can you explicitly handle/detect this case (probably by refusing to update any of them?) and add tests that cover it too? Given it's relatively infrequent to do this I think it is okay to just refuse to update any of the conns in this case.

@caddac
Copy link
Contributor Author

caddac commented Aug 4, 2018

Would updating all of the connections be acceptable? Maybe with a flag to indicate this behavior? Or is refusing to update preferable?

@caddac
Copy link
Contributor Author

caddac commented Aug 4, 2018

I've also noticed the cli option for adding connections doesn't support adding connections with duplicate conn_id, should it?

@bolkedebruin
Copy link
Contributor

bolkedebruin commented Aug 5, 2018

Please move the core of the code out of the cli (I.e. split between api/Commons and changes to the connection model) and make sure it works through the REST API as well. Direct access to the database should be limited in the cli as we will move away from this in the future.

@bolkedebruin
Copy link
Contributor

O sorry @ashb I think I accidentally removed the label

@ashb
Copy link
Member

ashb commented Aug 5, 2018

@bolkedeburin no worries on the label, I was just using them as a way of tracking in the PRs page which ones we've looked at - entirely manual for now.

I think the default should refuse to update multiple, with an --update-all flag

@caddac
Copy link
Contributor Author

caddac commented Aug 12, 2018

@ashb @bolkedebruin I've been working on refactoring the connection api and wanted to run the implementation by you before going too far. My thought is in api/common/experimental/ there will be 4 files: add_connection.py, list_connection.py, delete_connection.py, update_connection.py where the DB access will occur. These functions will take explicitly defined arguments based on what they are doing (not dicts or something) and return Connections (or in the case of delete, conn_id(s)).

Then each client (cli, local, json) would be responsible for parsing arguments and constructing meaningful print/return statements to the user (e.g. the cli client would formulate the message Successfully added conn_id={conn_id} : {uri} where as the json client would just return http 200 and the connection as json).

thoughts?

@ashb
Copy link
Member

ashb commented Aug 14, 2018

Something like that sounds sensible - removing duplication is good. Don't forget about the Flask web UI(s - right now we still have two)

@bolkedebruin
Copy link
Contributor

bolkedebruin commented Aug 14, 2018

I wouldn't put the CRUD operations in separate files. Just use api/common/experimental/connections.py for simplicity

@caddac
Copy link
Contributor Author

caddac commented Oct 30, 2018

Can you give some feedback on the json_api? I'm not really sure what I should be doing there or how to test it. Thanks!

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't had time to give this a full re-review yet, sorry.

airflow/api/client/local_client.py Show resolved Hide resolved
@ron819
Copy link
Contributor

ron819 commented Nov 3, 2018

Regarding @ashb first comment from Aug about multiple connections. There is also a bug to consider: https://issues.apache.org/jira/browse/AIRFLOW-2784 Your tests can't assume that all conn_id with same name are for the same conn type until this bug is fixed.

@ashb ashb mentioned this pull request Dec 1, 2018
6 tasks
@caddac
Copy link
Contributor Author

caddac commented Dec 2, 2018

@ron819 I'm going to address this bug in my PR.

@ashb I'm still looking for some clarity on what is going on with the json_client. Even looking at the pool examples that currently exist there, I'm still not sure what exactly I need to do. Seems like some kind of http handling, but I assumed that was being handled by the www and www_rbac code. Any guidance you can give would be great.

@caddac
Copy link
Contributor Author

caddac commented Dec 3, 2018

@ashb Think I got the json_client figured out, will try to get my changes committed tonight.

@ashb
Copy link
Member

ashb commented Dec 4, 2018

Great! Sorry I didn’t respond - paid work has been busy :)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small changes/comments. Looking good overall.

One thing that wasn't clear from a read of the code: does the API return the passwords on list or not? (I don't think it should, as right now none of the current methods of viewing connection in the CLI or WebUI give the value of the password)

return
conns = api_client.list_connections()
# format it for the cli
conns = list(map(lambda c: (c['conn_id'], c['conn_type'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: we don't need the list() here

try:
print(api_client.delete_connection(args.conn_id, args.delete_all))
except MissingArgument as ma:
print(ma)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should exit with a Non-zero code here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is raising the exception good enough? or is there a specific way to exit with non-zero code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising the ma exception would work but would provide a messy output (i.e. it would include the stack trace which is not useful here):

https://github.com/apache/airflow/blob/9f3eae5e82aef4550027e5c3163715dad67bdb57/airflow/bin/cli.py#L1295-L1296

        except MissingArgument as ma:
            raise SystemExit(ma)

Should probably do the trick (and not include any errant stack-traces. If that still includes a stack trace, or errors then SystemExit(str(ma)))

'--add flag and --conn_uri flag: {invalid!r}\n')
msg = msg.format(invalid=invalid_args)
try:
new_conn = api_client.add_connection(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to do new_conn = api_client.add_connection(vars(args))? (that does rely on the names from argparser matching the named args in the api client, but they do right now)

Not sure on this one.

conn_extra=args.conn_extra)

except MissingArgument as ma:
print(ma)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-zero exit code here too

tests/api/common/experimental/connection_tests.py Outdated Show resolved Hide resolved
tests/api/common/experimental/connection_tests.py Outdated Show resolved Hide resolved
password="airflow",
schema="airflow",
port="5432"
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look it appears that all the test cases use this connection - could we create it once in setUp() to reduce the amount of code in each test function?

tests/core.py Show resolved Hide resolved

session.query(models.Pool).delete()
session.query(models.Variable).delete()
session.query(models.Connection).delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing your test failures:

AirflowException: The conn_id sqlite_default isn't defined

from elsewhere in the tests.

Adds an Update Connection CLI option.\nThis is useful for updating connections in a CI/CD environment\n where deleting and adding connections\n could cause DAGS to fail. \nAlso exposes connection api via rest
@caddac
Copy link
Contributor Author

caddac commented Jan 12, 2019

I see 5 tests failing, 1 of which is my test_delete_connection test. This test succeeds locally (on docker) for both python 2.7 and 3.6. The other tests also fail locally for me (on docker) from master. Not sure where to go from here, any suggestions @ashb ?

"Successfully added `conn_id`=new3 : postgres://airflow:airflow@host:5432/airflow",
"Successfully added `conn_id`=new4 : postgres://airflow:airflow@host:5432/airflow",
"Successfully added `conn_id`=new5 : hive_metastore://airflow:airflow@host:9083/airflow",
"Successfully added `conn_id`=new6 : google_cloud_platform://:@:",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the "RFC 3986: Uniform Resource Identifier (URI): Generic Syntax" specification, this is not a valid address.
Schema part must be compatible with the format:

      scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

I created a issue https://issues.apache.org/jira/browse/AIRFLOW-3616 and working on PR: PolideaInternal#30
which deals with this topic.

Copy link
Member

@mik-laj mik-laj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please improve compatibility with RFC 3986 Section 3.1

:return: The updated Connection
"""
endpoint = '/api/experimental/connections/'
url = urljoin(self._api_base_url, endpoint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done in the _request method. Can you correct this or will we do it in a separate PR?

_log.error(err)
response = jsonify(error="{}".format(err))
response.status_code = err.status_code
return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code snippet is repeated in any method. Should we think about how to avoid repeating this piece of code? Maybe introducing a decorator?

@caddac
Copy link
Contributor Author

caddac commented Jan 26, 2019

will reopen when working

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants