
[AIRFLOW-3622] Add ability to pass hive_conf to HiveToMysqlTransfer #4424

Conversation

@aliceabe (Contributor) commented Jan 2, 2019

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Right now we cannot override the Hive queue, because hive_conf is not passed through to HiveToMySqlTransfer (see the sketch below).
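    A minimal sketch of how the new parameter could be used, assuming hive_conf is forwarded to the underlying Hive hook. The DAG, query, and table names are placeholders; 'mapreduce.job.queuename' is a common Hive/YARN queue setting:

    ```python
    from airflow.operators.hive_to_mysql import HiveToMySqlTransfer

    # hypothetical task: run a Hive query on a specific YARN queue and
    # load the result into MySQL
    hive_to_mysql = HiveToMySqlTransfer(
        task_id='hive_to_mysql',
        sql='SELECT name, count FROM hive_db.source_table',
        mysql_table='target_table',
        hive_conf={'mapreduce.job.queuename': 'etl_queue'},  # new parameter from this PR
        dag=dag,
    )
    ```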

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and classes in the PR contain docstrings that explain what they do

Code Quality

  • Passes flake8

@aliceabe aliceabe force-pushed the AIRFLOW-3622-add-ability-to-pass-hiveconf-to-hive-to-mysql-operator branch from ba870fd to 90aaa31 on January 2, 2019 21:40
jawang35 and others added 29 commits January 3, 2019 08:39
…che#4036)

Airflow users that wish to create plugins for the new www_rbac UI
cannot add plugin views or links. This PR fixes that by letting
a user specify their plugins for www_rbac, and maintains backwards
compatibility with the existing plugins system.
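
For illustration, a minimal sketch of a plugin exposing a view to the RBAC UI. The plugin and view names are placeholders; the `appbuilder_views` attribute follows the plugin interface this change describes:

```python
from airflow.plugins_manager import AirflowPlugin
from flask_appbuilder import BaseView, expose


class HelloView(BaseView):
    # placeholder Flask-AppBuilder view
    default_view = 'hello'

    @expose('/')
    def hello(self):
        return self.render_template('hello.html', content='Hello from a plugin')


class HelloPlugin(AirflowPlugin):
    name = 'hello_plugin'
    # views registered for the www_rbac (Flask-AppBuilder based) UI
    appbuilder_views = [{
        'name': 'Hello',
        'category': 'Plugins',
        'view': HelloView(),
    }]
```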
…pache#4092)

Flake8 3.6.0 was just released, and it introduced some new checks that
didn't exist before. As a result, all of our CI pipelines are now failing.

To avoid this happening in the future, we should pin the version of flake8
to 3.5.0 (currently specified in tox.ini and setup.py).
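
As a sketch, a pin like this in setup.py's development extras would hold flake8 at 3.5.x (the extras name and surrounding entries here are assumptions, not the exact file contents):

```python
# setup.py (excerpt, simplified)
devel = [
    'flake8>=3.5,<3.6',  # pin: flake8 3.6.0 introduced new checks that break CI
    'mock',
    'nose',
]
```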
- Remove `template_ext = ('.sql',)`
- Fix Docstrings (Incorrect connection name and indentation)
apache#4038)

Once the user has installed the Fernet package, the application enforces setting a valid Fernet key.
This change alters that behavior: an empty Fernet key or the special `no encryption` phrase is now accepted, and both cases are interpreted as meaning no encryption is desired.
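
A minimal sketch of the intended behavior, assuming a helper that falls back to a no-op "null" Fernet when no key is configured (the names here are illustrative, not the exact Airflow implementation):

```python
from cryptography.fernet import Fernet


class NullFernet:
    """No-op stand-in used when encryption is not desired."""
    is_encrypted = False

    def encrypt(self, data):
        return data

    def decrypt(self, data, ttl=None):
        return data


def get_fernet(fernet_key):
    # empty key or the special phrase mean: store values unencrypted
    if not fernet_key or fernet_key == 'no encryption':
        return NullFernet()
    return Fernet(fernet_key.encode('utf-8'))
```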
…#4110)

The MySQL hook does not support the "unix_socket" extra, which allows
specifying a location for the Unix socket other than the default one.
This is a blocker for tools like cloud-sql-proxy, which create
sockets in arbitrary places:
https://mysqlclient.readthedocs.io/user_guide.html
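
For example, a connection whose extra field carries the socket location might look like this (the socket path is a hypothetical cloud-sql-proxy path):

```python
from airflow.models import Connection

conn = Connection(
    conn_id='mysql_via_socket',
    conn_type='mysql',
    login='user',
    password='secret',
    schema='mydb',
    # the "unix_socket" extra tells the MySQL hook where the socket lives
    extra='{"unix_socket": "/cloudsql/my-project:us-central1:my-instance"}',
)
```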
)

Full URL decoding is now performed when parsing the different
components of a connection URI. This makes it possible to configure
socket paths containing special characters (for example ':'); so far,
only '/' (%2f) was hard-coded in the hostname. This change introduces
full decoding for all components of the URI.

Note that this is potentially a breaking change if someone uses
% in any of their AIRFLOW_CONN_-defined connections.
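
A sketch of what full decoding enables: percent-encoding a socket path into the host part of an AIRFLOW_CONN_ URI. The path and connection name are hypothetical; note that a literal % must now itself be encoded as %25:

```python
from urllib.parse import quote

socket_dir = '/cloudsql/my-project:us-central1:my-instance'  # hypothetical path
# encode every special character, including ':' (%3A) and '/' (%2F)
host = quote(socket_dir, safe='')
uri = 'mysql://user:pass@{host}/mydb'.format(host=host)
# e.g. export AIRFLOW_CONN_MY_DB="<uri>"
print(uri)
```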
This commit adds update, replace and delete operations for the Mongo
hook.
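A rough usage sketch, assuming method names that mirror pymongo's (update_one, replace_one, delete_one); the collection name and filters are placeholders:

```python
from airflow.contrib.hooks.mongo_hook import MongoHook

hook = MongoHook(conn_id='mongo_default')

# update: set a field on the first matching document
hook.update_one('jobs', {'_id': 42}, {'$set': {'state': 'done'}})

# replace: swap out the whole matching document
hook.replace_one('jobs', {'_id': 42}, {'_id': 42, 'state': 'archived'})

# delete: remove the matching document
hook.delete_one('jobs', {'_id': 42})
```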
DaskExecutor is not mentioned in docs/scheduler.rst,
while it's listed as one of the main executors in 
`airflow/config_templates/default_airflow.cfg`.
…at it does not crash (apache#3650)

If there is any issue with the DB connection, the rest of the functions handle the resulting exceptions, but there is no such handling in the scheduler's heartbeat.

The Airflow scheduler should not crash if a "transient" DB exception occurs during its heartbeat.
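
In spirit, the fix wraps the heartbeat call in the scheduler loop so a transient DB error is logged instead of fatal (a simplified sketch, not the exact jobs.py code):

```python
from sqlalchemy.exc import OperationalError


def _do_heartbeat(self):
    try:
        self.heartbeat()
    except OperationalError as err:
        # transient DB hiccup: log it and let the next loop iteration retry
        self.log.error("Scheduler heartbeat failed with a transient DB error: %s", err)
```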
…dition (apache#3994)

We were seeing an intermittent issue where the executor reported a task instance as finished while the task reported itself as queued. It was caused by a race condition: while the scheduler was clearing the event_buffer in the _process_executor_events method in jobs.py, the executor was about to mark a task's next retry (which had failed in the previous try) as running. So we added retry_number as a member of the TaskInstance key property, as sketched below.
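Conceptually, the key gains the attempt number, so an event buffered for a previous (failed) try can no longer be confused with the current attempt (a simplified sketch, not the exact implementation):

```python
@property
def key(self):
    # including try_number disambiguates attempts of the same task instance
    return self.dag_id, self.task_id, self.execution_date, self.try_number
```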
Add CloudSqlInstanceInsertOperator, CloudSqlInstancePatchOperator and CloudSqlInstanceDeleteOperator.

Each operator includes:
- core logic
- input params validation
- unit tests
- presence in the example DAG
- docstrings
- How-to and Integration documentation

Additionally, small improvements to GcpBodyFieldValidator were made (see the sketch after this commit message):
- added a simple list validation capability (type="list")
- introduced the parameter allow_empty, which can be set to False
  to test for non-emptiness of a string instead of specifying
  a regexp.

Co-authored-by: sprzedwojski <szymon.przedwojski@polidea.com>
Co-authored-by: potiuk <jarek.potiuk@polidea.com>
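A small sketch of the two validator improvements described above; the field names, body, and API version are placeholders, and the import path follows Airflow's contrib layout at the time:

```python
from airflow.contrib.utils.gcp_field_validator import GcpBodyFieldValidator

# allow_empty=False rejects empty strings without needing a regexp;
# type="list" is the new simple list validation
SPECIFICATION = [
    dict(name='name', allow_empty=False),
    dict(name='replicaNames', type='list', optional=True),
]

validator = GcpBodyFieldValidator(SPECIFICATION, api_version='v1beta4')
validator.validate({'name': 'my-instance', 'replicaNames': []})
```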
This re-works the SageMaker functionality in Airflow to be more complete, and more useful for the kinds of operations that SageMaker supports.

We removed some files and operators here, but these were only added after the last release so we don't need to worry about any sort of back-compat.
feluelle and others added 27 commits January 3, 2019 08:39
…pache#4371)

Fix TypeError on GoogleCloudStorageToS3Operator & S3ToGoogleCloudStorageOperator
* Remove DagStat usage

* Remove tests

* Remove dag_stat table from db

* Removed dagstat class

* Revert change

* Fixing test
* Refactor Kubernetes operator with git-sync

Currently the implementation of git-sync is broken because:
- git-sync clones the repository in /tmp and not in the airflow-dags volume
- git-sync adds a link pointing to the required revision, but it is not
taken into account in AIRFLOW__CORE__DAGS_FOLDER

A Dags/logs hostPath volume has been added (needed if Airflow runs in
Kubernetes in a local environment).

To avoid false positives in CI, `load_examples` is set to `False`;
otherwise DAGs from `airflow/example_dags` are always loaded. This
way it is possible to test `import` statements in DAGs.

Remove the `worker_dags_folder` config:
`worker_dags_folder` is redundant and can lead to confusion.
In WorkerConfiguration, `self.kube_config.dags_folder` defines the path of
the DAGs and can be set in the worker using airflow_configmap.
Refactor worker_configuration.py
Use a docker container to run setup.py
Compile web assets
Fix codecov application path

* Fix kube_config.dags_in_image
- adds the missing doc parameter destination_filepath
- adds a missing file close for the tmp file (through context manager usage)
- refactoring
* Removed Dagbag from delete dag

* delete when fileloc does not exist
* Remove dagbag from trigger call

* Adding fix to rbac

* empty commit

* Added create_dagrun to DagModel

* Adding testing to /trigger calls

* Make session a class var
When running integration tests on a k8s cluster vs. Minikube
I discovered that we were actually using an invalid permission
structure for our persistent volume. This commit fixes that.
…pache#4380)

'root' is not used anywhere in the `delete` method in either
www/views.py or www_rbac/views.py.

Having it in url_for("airflow.delete", dag_id=dag.dag_id, root=root)
in dag.html is meaningless.
…ce Flake8 test was broken (apache#4415)

The flake8 test in Travis CI had been broken since apache#4361
(apache@7a6acbf).

Some flake8 errors (code style/quality issues, found in 10 files) were introduced while the flake8 test was broken.
…k_stats (apache#4395)

These condition checks will always pass, no matter
whether "all_dags" is in filter_dag_ids.

They're not necessary.
To help move away from Minikube, we need to remove the dependency on
a local Docker registry and move towards a solution that can be used
in any Kubernetes cluster. Custom image names allow users to use
systems like Docker, Artifactory, and GCR.
- adds tests for hive_to_mysql operator
- refactoring
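One of the added tests presumably asserts that hive_conf reaches the Hive hook. A sketch of what such a test could look like; the exact call signature of get_records with hive_conf is an assumption:

```python
import unittest
from unittest import mock

from airflow.operators.hive_to_mysql import HiveToMySqlTransfer


class TestHiveToMySqlTransfer(unittest.TestCase):

    @mock.patch('airflow.operators.hive_to_mysql.MySqlHook')
    @mock.patch('airflow.operators.hive_to_mysql.HiveServer2Hook')
    def test_hive_conf_is_forwarded(self, mock_hive_hook, mock_mysql_hook):
        op = HiveToMySqlTransfer(
            task_id='hive_to_mysql',
            sql='SELECT 1',
            mysql_table='test_table',
            hive_conf={'mapreduce.job.queuename': 'etl_queue'},
        )
        op.execute(context={})
        # assumed: the operator forwards hive_conf to the hook's get_records
        mock_hive_hook.return_value.get_records.assert_called_once_with(
            'SELECT 1', hive_conf={'mapreduce.job.queuename': 'etl_queue'})
```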
* Fix TypeError for BigQueryOperator and support unicode objects.
* Add tests
…ache#4247)

* Support setting global k8s affinity and toleration configuration in the airflow config file.

* Copy annotations as dict, not list

* Update airflow/contrib/kubernetes/pod.py

Co-Authored-By: kppullin <kevin.pullin@gmail.com>
@aliceabe aliceabe force-pushed the AIRFLOW-3622-add-ability-to-pass-hiveconf-to-hive-to-mysql-operator branch from 90aaa31 to e385d9c on January 3, 2019 16:40
@aliceabe aliceabe closed this Jan 3, 2019
@aliceabe aliceabe deleted the AIRFLOW-3622-add-ability-to-pass-hiveconf-to-hive-to-mysql-operator branch January 3, 2019 16:43