
[AIRFLOW-3001] Add index 'ti_dag_date' to taskinstance #3885

Merged
merged 1 commit into apache:master from ubermen:master-add-index-taskinstance Oct 12, 2018

Conversation

5 participants
@ubermen
Contributor

ubermen commented Sep 12, 2018

[ Description ]
There was no index composed of dag_id and execution_date. So when the scheduler finds all TIs of a dagrun with a query like "select * from task_instance where dag_id = 'some_id' and execution_date = '2018-09-01 ...'", the query uses the ti_dag_state index (I tested this in MySQL Workbench; I expected 'ti_state_lkp', but that was not the case). There is probably no problem while the range of execution_date is small (under 1000 dagruns), but I experienced slow allocation of TIs once a dag had accumulated 1000+ dagruns. So I am now running Airflow with a new index ti_dag_date (dag_id, execution_date) added on the task_instance table. I have attached the results of my test :)

[ Test ] I tested using version 1.10

  1. Just run the scheduler with a past start_date and high concurrency (start_date 3 years ago, 10-minute interval).
  2. The scheduler then executes the backfill and the "select tis" query (call sequence below):
    models.py > DAG.run
    jobs.py > BaseJob.run
    jobs.py > BackfillJob._execute
    jobs.py > BackfillJob._execute_for_run_dates
    jobs.py > BackfillJob._task_instances_for_dag_run
    models.py > DagRun.get_task_instances
    tis = session.query(TI).filter(
        TI.dag_id == self.dag_id,
        TI.execution_date == self.execution_date,
    )
  3. Wait until enough dagruns have accumulated.
    Many slow-query entries then start appearing in the MySQL log file (queries like the sample below):
    "select * from task_instance where dag_id = 'some_id' and execution_date = '2018-09-01 ...'"

[ASIS] current
(attached screenshot)

[TOBE] after adding new index
(attached screenshot)

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-3001] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff
@Fokko

Contributor

Fokko commented Sep 12, 2018


Running flake8 on the diff in the range 1411245..33a63de (1 commit(s)):
--------------------------------------------------------------------------------
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:9:2: W291 trailing whitespace
#
 ^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:11:2: W291 trailing whitespace
#
 ^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:33:1: E402 module level import not at top of file
from alembic import op
^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:34:1: E402 module level import not at top of file
import sqlalchemy as sa
^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:34:1: F401 'sqlalchemy as sa' imported but unused
import sqlalchemy as sa
^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:38:91: E501 line too long (95 > 90 characters)
    op.create_index('ti_dag_date', 'task_instance', ['dag_id', 'execution_date'], unique=False)
                                                                                          ^
ERROR: InvocationError for command '/app/scripts/ci/flake8-diff.sh' (exited with code 1)
@Fokko

Contributor

Fokko commented Sep 15, 2018


Running flake8 on the diff in the range 1411245..b2a71de (2 commit(s)):
--------------------------------------------------------------------------------
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:9:2: W291 trailing whitespace
#
 ^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:11:2: W291 trailing whitespace
#
 ^
airflow/migrations/versions/bf00311e1990_add_index_to_taskinstance.py:33:1: E402 module level import not at top of file
from alembic import op
@ubermen

Contributor Author

ubermen commented Sep 21, 2018

I've resolved some CI issues. Please confirm or correct any problems. Thanks :)

@xnuinside

Contributor

xnuinside commented Oct 2, 2018

@ubermen, you need to squash all commits into one. @Fokko, any comments?

kjh3477
[AIRFLOW-3001] Add index 'ti_dag_date' to taskinstance

@ubermen ubermen closed this Oct 5, 2018

@ubermen ubermen deleted the ubermen:master-add-index-taskinstance branch Oct 5, 2018

@ubermen ubermen restored the ubermen:master-add-index-taskinstance branch Oct 5, 2018

@codecov-io


codecov-io commented Oct 5, 2018

Codecov Report

Merging #3885 into master will increase coverage by 60.27%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #3885       +/-   ##
==========================================
+ Coverage   15.22%   75.5%   +60.27%     
==========================================
  Files         199     199               
  Lines       15946   15947        +1     
==========================================
+ Hits         2428   12040     +9612     
+ Misses      13518    3907     -9611
Impacted Files Coverage Δ
airflow/models.py 91.71% <ø> (+63.94%) ⬆️
airflow/exceptions.py 100% <0%> (+2.85%) ⬆️
airflow/utils/operator_resources.py 86.95% <0%> (+4.34%) ⬆️
airflow/executors/__init__.py 55.76% <0%> (+5.76%) ⬆️
airflow/utils/decorators.py 91.66% <0%> (+14.58%) ⬆️
airflow/settings.py 81.15% <0%> (+15.21%) ⬆️
airflow/hooks/oracle_hook.py 15.47% <0%> (+15.47%) ⬆️
airflow/task/task_runner/__init__.py 63.63% <0%> (+18.18%) ⬆️
airflow/utils/db.py 33.6% <0%> (+18.4%) ⬆️
airflow/__init__.py 74.28% <0%> (+19.99%) ⬆️
... and 151 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8be322...4090057. Read the comment docs.

@ubermen ubermen reopened this Oct 5, 2018

@ubermen ubermen force-pushed the ubermen:master-add-index-taskinstance branch from 2eacb69 to 4090057 Oct 5, 2018

@ubermen

Contributor Author

ubermen commented Oct 5, 2018

All commits have been squashed into one. Thanks :)

@xnuinside

Contributor

xnuinside commented Oct 5, 2018

@ubermen, thank you! For the future: you can use git rebase -i HEAD~3 (where 3 is the number of commits to squash; it can be any number) and then push --force to your old branch. That avoids closing/reopening the PR and deleting branches :)

@ashb, @Fokko please review if you have time
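The squash-and-force-push workflow suggested above can be sketched in a throwaway repo; the GIT_SEQUENCE_EDITOR trick below is a non-interactive stand-in for editing the rebase todo list by hand (all paths, names, and commit messages are illustrative):

```shell
# Throwaway demo of squashing the last 3 commits into one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
for i in base 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
# Non-interactive stand-in for 'git rebase -i HEAD~3': mark the 2nd and 3rd
# todo entries as fixup so the last three commits collapse into one.
GIT_SEQUENCE_EDITOR='sed -i -e "2,\$s/^pick/fixup/"' git rebase -i HEAD~3
git log --oneline   # two commits remain: 'commit base' and the squashed one
# ...then, on a real PR branch: git push --force
```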

@Fokko Fokko merged commit 93406f4 into apache:master Oct 12, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@Fokko

Contributor

Fokko commented Oct 12, 2018

Thanks @ubermen. Looks good! 👍

wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Oct 12, 2018

wayne.morris
[AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators apache#3828
Fixed variable for deleting resources.

[AIRFLOW-XXX] Remove residual line in Changelog (apache#3814)

[AIRFLOW-2930] Fix celery executor scheduler crash (apache#3784)

Caused by an update in PR apache#3740.
execute_command.apply_async(args=command, ...)
- command is a list of short unicode strings, and the above code passes multiple
arguments to a function defined as taking only one argument.
- command = ["airflow", "run", "dag323", ...]
- args = command = ["airflow", "run", "dag323", ...]
- execute_command("airflow", "run", "dag323", ...) errors out and exits.
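A minimal sketch of the crash mechanism described in this commit message (function and DAG names are illustrative, not the actual Celery executor code):

```python
# Sketch of the bug: a list passed as args= is unpacked into multiple
# positional arguments for a one-argument task function.
def execute_command(command):
    """Stand-in task function, defined as taking ONE argument."""
    return command

command = ["airflow", "run", "dag323"]

# Buggy: args=command makes the list unpack into three positional args,
# i.e. execute_command("airflow", "run", "dag323") -> TypeError.
try:
    execute_command(*command)
except TypeError as err:
    print("crashes:", err)

# Fixed: args=[command] passes the whole list as a single argument.
result = execute_command(*[command])
print(result)  # → ['airflow', 'run', 'dag323']
```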

[AIRFLOW-2854] kubernetes_pod_operator add more configuration items (apache#3697)

* kubernetes_pod_operator add more configuration items
* fix test_kubernetes_pod_operator test_faulty_service_account failure case
* fix review comment issues
* pod_operator add hostnetwork config
* add doc example

[AIRFLOW-2994] Fix command status check in Qubole Check operator (apache#3790)

[AIRFLOW-2949] Add syntax highlight for single quote strings (apache#3795)

* AIRFLOW-2949: Add syntax highlight for single quote strings

* AIRFLOW-2949: Also updated new UI main.css

[AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (apache#3793)

There may be different combinations of arguments, and
some processing is done 'silently', while users
may not be fully aware of it.

For example:
- Users only need to provide either `ssh_hook`
  or `ssh_conn_id`, while this is not clear in the doc.
- If both are provided, `ssh_conn_id` will be ignored.
- If `remote_host` is provided, it will replace
  the `remote_host` that was defined in `ssh_hook`
  or predefined in the connection of `ssh_conn_id`.

These should be documented clearly to ensure it's
transparent to the users. log.info() should also be
used to remind users and provide clear logs.

In addition, add instance check for ssh_hook to ensure
it is of the correct type (SSHHook).

Tests are updated for this PR.

[AIRFLOW-XXX] Fix Broken Link in CONTRIBUTING.md

[AIRFLOW-2980] ReadTheDocs - Fix Missing API Reference

[AIRFLOW-2779] Make GHE auth third party licensed (apache#3803)

This reinstates the original license.

[AIRFLOW-XXX] Add Format to list of companies (apache#3824)

[AIRFLOW-2900] Show code for packaged DAGs (apache#3749)

[AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro (apache#3821)

[AIRFLOW-2974] Extended Databricks hook with clusters operation (apache#3817)

Add hooks for cluster start, restart and terminate.
Add unit tests for the added hooks.
Add cluster_id variable for performing cluster operation tests.

[AIRFLOW-2951] Update dag_run table end_date when state change (apache#3798)

The existing Airflow only changes the dag_run table's end_date value when
a user terminates a dag in the web UI. The end_date is not updated
when Airflow detects that a dag has finished and updates its state.

This commit adds an end_date update in DagRun's set_state function to
fix the problem mentioned above.

[AIRFLOW-2145] fix deadlock on clearing running TI (apache#3657)

A `shutdown` task is not considered to be `unfinished`, so a dag run can
deadlock when all `unfinished` downstreams are waiting on a task
that's in the `shutdown` state. Fix this by considering `shutdown` to
be `unfinished`, since it's not truly a terminal state.

[AIRFLOW-XXX] Fix typo in docstring of gcs_to_bq (apache#3833)

[AIRFLOW-2476] Allow tabulate up to 0.8.2 (apache#3835)

[AIRFLOW-XXX] Fix typos in faq.rst (apache#3837)

[AIRFLOW-2979] Make celery_result_backend conf Backwards compatible (apache#3832)

(apache#2806) Renamed `celery_result_backend` to `result_backend` and broke backwards compatibility.

[AIRFLOW-2866] Fix missing CSRF token head when using RBAC UI (apache#3804)

[AIRFLOW-491] Add feature to pass extra api configs to BQ Hook (apache#3733)

[AIRFLOW-3007] Update backfill example in Scheduler docs

The scheduler docs at https://airflow.apache.org/scheduler.html#backfill-and-catchup use a deprecated way of passing `schedule_interval`. `schedule_interval` should be passed to the DAG as a separate parameter and not as a default arg.
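A sketch of the corrected pattern this commit describes (names and dates are illustrative; the DAG construction is commented out so the sketch needs no Airflow install):

```python
# Sketch: schedule_interval is a DAG-level parameter, not a task default arg.
from datetime import datetime, timedelta

default_args = {
    "owner": "airflow",
    "start_date": datetime(2018, 9, 1),
    # deprecated: "schedule_interval" does not belong in default_args
}
schedule = timedelta(minutes=10)

# correct: pass it to the DAG directly as its own parameter
# dag = DAG("example", default_args=default_args, schedule_interval=schedule)
```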

[AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' (apache#3845)

[AIRFLOW-3002] Fix variable & tests in GoogleCloudBucketHelper (apache#3843)

[AIRFLOW-2991] Log path to driver output after Dataproc job (apache#3827)

[AIRFLOW-XXX] Fix python3 and flake8 errors in dev/airflow-jira

This is a script that checks if the Jira's marked as fixed in a release
are actually merged in - getting this working is helpful to me in
preparing 1.10.1

[AIRFLOW-2883] Add import and export for pool cli using JSON

[AIRFLOW-3021] Add Censys to who uses Airflow list

> Censys
> Find and analyze every reachable server and device on the Internet
> https://censys.io/

closes AIRFLOW-3021 https://issues.apache.org/jira/browse/AIRFLOW-3021

Add Branch to Company List

[AIRFLOW-3008] Move Kubernetes example DAGs to contrib

[AIRFLOW-2997] Support cluster fields in bigquery (apache#3838)

This adds a cluster_fields argument to the bigquery hook, GCS to
bigquery operator and bigquery query operators. This field requests that
bigquery store the result of the query/load operation sorted according
to the specified fields (the order of fields given is significant).

[AIRFLOW-XXX] Redirect FAQ `airflow[crypto]` to How-to Guides.

[AIRFLOW-XXX] Remove redundant space in Kerberos (apache#3866)

[AIRFLOW-3028] Update Text & Images in Readme.md

[AIRFLOW-1917] Trim extra newline and trailing whitespace from log (apache#3862)

[AIRFLOW-2985] Operators for S3 object copying/deleting (apache#3823)

1. Copying:
Under the hood, it's `boto3.client.copy_object()`.
It can only handle the situation in which the
S3 connection used can access both source and
destination bucket/key.

2. Deleting:
2.1 Under the hood, it's `boto3.client.delete_objects()`.
It supports either deleting one single object or
multiple objects.
2.2 If users try to delete a non-existent object, the
request will still succeed, but there will be an
entry 'Errors' in the response. There may also be
other reasons which may cause similar 'Errors' (
request itself would succeed without explicit
exception). So an argument `silent_on_errors` is added
to let users decide if this sort of 'Errors' should
fail the operator.

The corresponding methods are added into S3Hook, and
these two operators are 'wrappers' of these methods.

[AIRFLOW-3030] Fix CLI docs (apache#3872)

[AIRFLOW-XXX] Update kubernetes.rst docs (apache#3875)

Update kubernetes.rst with correct KubernetesPodOperator inputs
for the volumes.

[AIRFLOW-XXX] Add Enigma to list of companies

[AIRFLOW-2965] CLI tool to show the next execution datetime

Cover different cases

- schedule_interval is "@once" or None, then following_schedule
  method would always return None
- If dag is paused, print reminder
- If latest_execution_date is not found, print warning saying
  not applicable.

[AIRFLOW-XXX] Add Bombora Inc using Airflow

[AIRFLOW-XXX] Move Dag level access control out of 1.10 section (apache#3882)

It isn't in 1.10 (and wasn't in this section when the PR was created).

[AIRFLOW-3012] Fix Bug when passing emails for SLA

[AIRFLOW-2797] Create Google Dataproc cluster with custom image (apache#3871)

[AIRFLOW-XXX] Updated README  to include CAVA

[AIRFLOW-3035] Allow custom 'job_error_states' in dataproc ops (apache#3884)

Allow caller to pass in custom list of Dataproc job states into the
DataProc*Operator classes that should result in the
_DataProcJob.raise_error() method raising an Exception.

[AIRFLOW-3034]: Readme updates : Add Slack & Twitter, remove Gitter

[AIRFLOW-3056] Add happn to Airflow user list

[AIRFLOW-3052] Add logo options to Airflow (apache#3892)

[AIRFLOW-2524] Add SageMaker Batch Inference (apache#3767)

* Fix for comments
* Fix sensor test
* Update non_terminal_states and failed_states to static variables of SageMakerHook

Add SageMaker Transform Operator & Sensor
Co-authored-by: srrajeev-aws <srrajeev@amazon.com>

[AIRFLOW-XXX] Added Jeitto as one of happy Airflow users! (apache#3902)

[AIRFLOW-XXX] Add Jeitto as one happy Airflow user!

[AIRFLOW-3044] Dataflow operators accept templated job_name param (apache#3887)

* Default value of new job_name param is templated task_id, to match the
existing behavior as much as possible.
* Change expected value in test_mlengine_operator_utils.py to match
default for new job_name param.

[AIRFLOW-2707] Validate task_log_reader on upgrade from <=1.9 (apache#3881)

We changed the default logging config and config from 1.9 to 1.10, but
anyone who upgrades and has an existing airflow.cfg won't know they need
to change this value - instead they will get nothing displayed in the UI
(ajax request fails) and see "'NoneType' object has no attribute 'read'"
in the error log.

This validates that config section at start up, and seamlessly upgrades
the old previous value.

[AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator (apache#3860)

Enable specifying dns and dns_search options for DockerOperator

[AIRFLOW-1298] Clear UPSTREAM_FAILED using the clean cli (apache#3886)

* [AIRFLOW-1298] Fix 'clear only_failed'

* [AIRFLOW-1298] Fix 'clear only_failed'

[AIRFLOW-3059] Log how many rows are read from Postgres (apache#3905)

To know how much data is being read from Postgres, it is nice to log
this to the Airflow log.

Previously when there was no data, it would still create a single file.
This is not something that we want, and therefore we've changed this
behaviour.

Refactored the tests to make use of Postgres itself since we have it
running. This makes the tests more realistic, instead of mocking
everything.

[AIRFLOW-XXX] Fix typo in docs/timezone.rst (apache#3904)

[AIRFLOW-3068] Remove deprecated imports

[AIRFLOW-3036] Add relevant ECS options to ECS operator. (apache#3908)

The ECS operator currently supports only a subset of available options
for running ECS tasks. This patch adds all ECS options that could be
relevant to airflow; options that wouldn't make sense here, like
`count`, were skipped.

[AIRFLOW-1195] Add feature to clear tasks in Parent Dag (apache#3907)

[AIRFLOW-3073] Add note-Profiling feature not supported in new webserver (apache#3909)

Adhoc queries and Charts features are no longer supported in new
FAB-based webserver and UI. But this is not mentioned at all in the doc
"Data Profiling" (https://airflow.incubator.apache.org/profiling.html)

This commit adds a note to remind users for this.

[AIRFLOW-XXX] Fix SlackWebhookOperator docs (apache#3915)

The docs refer to `conn_id` while the actual argument is `http_conn_id`.

[AIRFLOW-1441] Fix inconsistent tutorial code (apache#2466)

[AIRFLOW-XXX] Add 90 Seconds to companies

[AIRFLOW-3096] Further reduce DaysUntilStale for probo/stale

[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (apache#3913)

[AIRFLOW-3090] Demote dag start/stop log messages to debug (apache#3920)

[AIRFLOW-2407] Use feature detection for reload() (apache#3298)

* [AIRFLOW-2407] Use feature detection for reload()

[Use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection) is a Python porting best practice that avoids a flake8 undefined name error...

flake8 testing of https://github.com/apache/incubator-airflow on Python 3.6.3
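The feature-detection pattern referred to above can be sketched as follows (try the modern location first and fall back, instead of branching on sys.version_info):

```python
# Feature detection for reload(): probe for the attribute rather than
# checking the interpreter version.
try:
    from importlib import reload  # Python 3.4+
except ImportError:
    try:
        from imp import reload    # Python 3.0-3.3
    except ImportError:
        pass                      # Python 2: reload() is a builtin

import json
reload(json)  # resolves on any of the interpreters above
```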

[AIRFLOW-XXX] Fix a wrong sample bash command, a display issue & a few typos (apache#3924)

[AIRFLOW-3090] Make No tasks to consider for execution debug (apache#3923)

During normal operation, it is not necessary to see the message. It is
only useful when debugging an issue.

AIRFLOW-2952 Fix Kubernetes CI (apache#3922)

The current dockerised CI pipeline doesn't run minikube and the
Kubernetes integration tests. This starts a Kubernetes cluster
using minikube and runs k8s integration tests using docker-compose.

[AIRFLOW-2918] Fix Flake8 violations (apache#3931)

[AIRFLOW-3076] Remove preloading of MySQL testdata (apache#3911)

One of the requirements for tests is that they be self-contained. This means
that they should not depend on anything external, such as loading data.

This PR uses setUp and tearDown to load the data into MySQL
and remove it afterwards. This removes the actual bash mysql commands
and will make it easier to dockerize the whole test suite in the future.

[AIRFLOW-2918] Remove unused imports

[AIRFLOW-3090] Specify path of key file in log message (apache#3921)

[AIRFLOW-3067] Display www_rbac Flask flash msg properly (apache#3903)

The Flask flash messages are not displayed properly.

When we don't give a category for a flash message, the default
value will be 'message'. In some cases, we specify the 'error'
category.

Using Flask-AppBuilder, the flash message will be given
a CSS class 'alert-[category]'. But we don't have
'alert-message' or 'alert-error' in the current
'bootstrap-theme.css' file.

This makes the flash messages in the www_rbac UI come with
no background color.

This commit addresses this issue by adding 'alert-message'
(using specs of existing CSS class 'alert-info') and
'alert-error' (using specs of existing CSS class 'alert-danger')
into 'bootstrap-theme.css'.

[AIRFLOW-3109] Bugfix to allow user/op roles to clear task instance via UI by default

add show statements to hql filtering.

[AIRFLOW-3051] Change CLI to make users ops similar to connections

The ability to manipulate users from the command line is a bit clunky. Currently there are 'airflow create_user', 'airflow delete_user' and 'airflow list_users'. It seems that these ought to be made more like connections, so that it becomes 'airflow users list ...', 'airflow users delete ...' and 'airflow users create ...'.

[AIRFLOW-3009] Import Hashable from collection.abc to fix Python 3.7 deprecation warning (apache#3849)

[AIRFLOW-3111] Fix instructions in UPDATING.md and remove comment (apache#3944)

artifacts in default_airflow.cfg

- fixed incorrect instructions in UPDATING.md regarding core.log_filename_template and elasticsearch.elasticsearch_log_id_template
- removed comments referencing "additional curly braces" from
default_airflow.cfg since they're irrelevant to the rendered airflow.cfg

[AIRFLOW-3117] Add instructions to allow GPL dependency (apache#3949)

The installation instructions failed to mention how to proceed with the GPL dependency. For those who are not concerned by GPL, it is useful to know how to proceed with it.

[AIRFLOW-XXX] Add Square to the companies lists

[AIRFLOW-XXX] Add Fathom Health to readme

[AIRFLOW-XXX] Pin Click to 6.7 to Fix CI (apache#3962)

[AIRFLOW-XXX] Fix SlackWebhookOperator execute method comment (apache#3963)

[AIRFLOW-3100][AIRFLOW-3101] Improve docker compose local testing (apache#3933)

[AIRFLOW-3127] Fix out-dated doc for Celery SSL (apache#3967)

Now in `airflow.cfg`, for Celery-SSL, the item names are
"ssl_active", "ssl_key", "ssl_cert", and "ssl_cacert".
(since PR https://github.com/apache/incubator-airflow/pull/2806/files)

But in the documentation
https://airflow.incubator.apache.org/security.html?highlight=celery
or
https://github.com/apache/incubator-airflow/blob/master/docs/security.rst,
it's "CELERY_SSL_ACTIVE", "CELERY_SSL_KEY", "CELERY_SSL_CERT", and
"CELERY_SSL_CACERT", which is out-dated and may confuse readers.

[AIRFLOW-XXX] Fix PythonVirtualenvOperator tests (apache#3968)

The recent update to the CI image changed the default
python from python2 to python3. The PythonVirtualenvOperator
tests expected python2 as default and fail due to
serialisation errors.

[AIRFLOW-2952] Fix Kubernetes CI (apache#3957)

- Update outdated cli command to create user
- Remove `airflow/example_dags_kubernetes` as the dag already exists in `contrib/example_dags/`
- Update the path to copy K8s dags

[AIRFLOW-3104] Add .airflowignore info into doc (apache#3939)

.airflowignore is a nice feature, but it was not mentioned at all in the documentation.

[AIRFLOW-XXX] Add Delete for CLI Example in UPDATING.md

[AIRFLOW-3123] Use a stack for DAG context management (apache#3956)

[AIRFLOW-3125] Monitor Task Instances creation rates (apache#3966)

Monitor Task Instances creation rates by Operator type.
These stats can provide some visibility on how much workload Airflow is
getting. They can be used for resource allocation in the long run (i.e.
to determine when we should scale up workers) and debugging in scenarios
like the creation rate of certain type of Task Instances spikes.

[AIRFLOW-3129] Backfill mysql hook unit tests. (apache#3970)

[AIRFLOW-3124] Fix RBAC webserver debug mode (apache#3958)

[AIRFLOW-XXX] Add Compass to companies list (apache#3972)

We're using Airflow at Compass now.

[AIRFLOW-XXX] Speed up DagBagTest cases (apache#3974)

I noticed that many of the tests of DagBags operate on a specific DAG
only, and don't need to load the example or test dags. By not loading
the dags when we don't need to, this shaves about 10-20s off test time.

[AIRFLOW-2912] Add Deploy and Delete operators for GCF (apache#3969)

Both Deploy and Delete operators interact with Google
Cloud Functions to manage functions. Both are idempotent
and make use of GcfHook - hook that encapsulates
communication with GCP over GCP API.

[AIRFLOW-1390] Update Alembic to 0.9 (apache#3935)

[AIRFLOW-2238] Update PR tool to remove outdated info (apache#3978)

[AIRFLOW-XXX] Don't spam test logs with "bad cron expression" messages (apache#3973)

We needed these test dags to check the behaviour of invalid cron
expressions, but by default we were loading them every time we create a
DagBag (which many, many tests do).

Instead we ignore these known-bad dags by default, and the test checking
those (tests/models.py:DagBagTest.test_process_file_cron_validity_check)
is already explicitly processing those DAGs directly, so it remains
tested.

[AIRFLOW-XXX] Fix undocumented params in S3_hook

Some function parameters were undocumented. Additional docstrings
were added for clarity.

[AIRFLOW-3079] Improve migration scripts to support MSSQL Server (apache#3964)

There were two problems for MSSQL.  First, 'timestamp' data type in MSSQL Server
is essentially a row-id, and not a timezone enabled date/time stamp. Second, alembic
creates invalid SQL when applying the 0/1 constraint to boolean values. MSSQL should
enforce this constraint by simply asserting a boolean value.

[AIRFLOW-XXX] Add DoorDash to README.md (apache#3980)

DoorDash uses Airflow https://softwareengineeringdaily.com/2018/09/28/doordash/

[AIRFLOW-3062] Add Qubole in integration docs (apache#3946)

[AIRFLOW-3129] Improve test coverage of airflow.models. (apache#3982)

[AIRFLOW-2574] Cope with '%' in SQLA DSN when running migrations (apache#3787)

Alembic uses a ConfigParser like Airflow does, and "%" is a special
value in there, so we need to escape it. As per the Alembic docs:

> Note that this value is passed to ConfigParser.set, which supports
> variable interpolation using pyformat (e.g. `%(some_value)s`). A raw
> percent sign not part of an interpolation symbol must therefore be
> escaped, e.g. `%%`
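The escaping rule quoted above can be demonstrated directly with the stdlib ConfigParser that both Airflow and Alembic build on (the DSN below is a made-up example):

```python
# Demonstration of the '%' escaping rule: a raw '%' is rejected by
# ConfigParser interpolation, while '%%' is stored and reads back as '%'.
from configparser import ConfigParser

dsn = "mysql://user:p%40ss@host/airflow"   # raw '%' in the password
cp = ConfigParser()
cp.add_section("alembic")

try:
    cp.set("alembic", "sqlalchemy.url", dsn)  # raw '%' -> invalid interpolation
except ValueError as err:
    print("rejected:", err)

cp.set("alembic", "sqlalchemy.url", dsn.replace("%", "%%"))  # escape as '%%'
assert cp.get("alembic", "sqlalchemy.url") == dsn            # reads back as '%'
```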

[AIRFLOW-3137] Make ProxyFix middleware optional. (apache#3983)

The ProxyFix middleware should only be used when airflow is running
behind a trusted proxy. This patch adds a `USE_PROXY_FIX` flag that
defaults to `False`.

[AIRFLOW-3004] Add config disabling scheduler cron (apache#3899)

[AIRFLOW-3103][AIRFLOW-3147] Update flask-appbuilder (apache#3937)

[AIRFLOW-XXX] Fixing the issue in Documentation (apache#3998)

Fixing the operator name from DataFlowOperation to DataFlowJavaOperator in Documentation

[AIRFLOW-3088] Include slack-compatible emoji image

[AIRFLOW-3161] fix TaskInstance log link in RBAC UI

[AIRFLOW-3148] Remove unnecessary arg "parameters" in RedshiftToS3Transfer (apache#3995)

"Parameters" are used to help render the SQL command.
But in this operator, only "schema" and "table" are needed.
There is no SQL command to render.

By checking the code, we can also find that the argument
"parameters" is never really used.

(Fix a minor issue in the docstring as well)

[AIRFLOW-3159] Update GCS logging docs for latest code (apache#3952)

[AIRFLOW-XXX] Fix  airflow.models.DAG docstring mistake

Closes apache#4004 from Sambeth/sambeth

[AIRFLOW-XXX] Adding Home Depot as users of Apache airflow (apache#4013)

* Adding Home Depot as users of Apache airflow

[AIRFLOW-XXX] Added ThoughtWorks as user of Airflow in README (apache#4012)

[AIRFLOW-XXX] Added DataCamp to list of companies in README (apache#4009)

[AIRFLOW-3165] Document interpolation of '%' and warn (apache#4007)

[AIRFLOW-3099] Complete list of optional airflow.cfg sections (apache#4002)

[AIRFLOW-3162] Fix HttpHook URL parse error when port is specified (apache#4001)

[AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook (apache#3894)

* [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook

[AIRFLOW-3141] Add missing sensor tests. (apache#3991)

Fixed string encoding error and updated with master.

[AIRFLOW-XXX] Fix wrong {{ next_ds }} description (apache#4017)

[AIRFLOW-XXX] Fix Typo in SFTPOperator docstring (apache#4016)

[AIRFLOW-XXX] Remove residual line in Changelog (apache#3814)

[AIRFLOW-2930] Fix celery excecutor scheduler crash (apache#3784)

Caused by an update in PR apache#3740.
execute_command.apply_async(args=command, ...)
-command is a list of short unicode strings and the above code pass multiple
arguments to a function defined as taking only one argument.
-command = ["airflow", "run", "dag323",...]
-args = command = ["airflow", "run", "dag323", ...]
-execute_command("airflow","run","dag3s3", ...) will be error and exit.

[AIRFLOW-2854] kubernetes_pod_operator add more configuration items (apache#3697)

* kubernetes_pod_operator add more configuration items
* fix test_kubernetes_pod_operator test_faulty_service_account failure case
* fix review comment issues
* pod_operator add hostnetwork config
* add doc example

[AIRFLOW-2994] Fix command status check in Qubole Check operator (apache#3790)

[AIRFLOW-2949] Add syntax highlight for single quote strings (apache#3795)

* AIRFLOW-2949: Add syntax highlight for single quote strings

* AIRFLOW-2949: Also updated new UI main.css

[AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (apache#3793)

There may be different combinations of arguments, and
some processings are being done 'silently', while users
may not be fully aware of them.

For example
- User only needs to provide either `ssh_hook`
  or `ssh_conn_id`, while this is not clear in doc
- if both provided, `ssh_conn_id` will be ignored.
- if `remote_host` is provided, it will replace
  the `remote_host` which wasndefined in `ssh_hook`
  or predefined in the connection of `ssh_conn_id`

These should be documented clearly to ensure it's
transparent to the users. log.info() should also be
used to remind users and provide clear logs.

In addition, add instance check for ssh_hook to ensure
it is of the correct type (SSHHook).

Tests are updated for this PR.

[AIRFLOW-XXX] Fix Broken Link in CONTRIBUTING.md

[AIRFLOW-2980] ReadTheDocs - Fix Missing API Reference

[AIRFLOW-2779] Make GHE auth third party licensed (apache#3803)

This reinstates the original license.

[AIRFLOW-XXX] Add Format to list of companies (apache#3824)

[AIRFLOW-2900] Show code for packaged DAGs (apache#3749)

[AIRFLOW-2974] Extended Databricks hook with clusters operation (apache#3817)

Add hooks for:
- cluster start,
- restart,
- terminate.
Add unit tests for the added hooks.
Add cluster_id variable for performing cluster operation tests.

[AIRFLOW-2951] Update dag_run table end_date when state change (apache#3798)

The existing airflow only changes the dag_run table's end_date value when
a user terminates a dag in the web UI. The end_date is not updated
when airflow detects that a dag finished and updates its state.

This commit adds an end_date update in DagRun's set_state function to
fix the problem mentioned above.

[AIRFLOW-2145] fix deadlock on clearing running TI (apache#3657)

A `shutdown` task is not considered to be `unfinished`, so a dag run can
deadlock when all `unfinished` downstreams are waiting on a task
that's in the `shutdown` state. Fix this by considering `shutdown` to
be `unfinished`, since it's not truly a terminal state.

[AIRFLOW-XXX] Fix typo in docstring of gcs_to_bq (apache#3833)

[AIRFLOW-2476] Allow tabulate up to 0.8.2 (apache#3835)

[AIRFLOW-XXX] Fix typos in faq.rst (apache#3837)

[AIRFLOW-2979] Make celery_result_backend conf Backwards compatible (apache#3832)

(apache#2806) Renamed `celery_result_backend` to `result_backend` and broke backwards compatibility.

[AIRFLOW-2866] Fix missing CSRF token head when using RBAC UI (apache#3804)

[AIRFLOW-3007] Update backfill example in Scheduler docs

The scheduler docs at https://airflow.apache.org/scheduler.html#backfill-and-catchup use a deprecated way of passing `schedule_interval`. `schedule_interval` should be passed to the DAG as a separate parameter and not as a default arg.

[AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' (apache#3845)

[AIRFLOW-3002] Fix variable & tests in GoogleCloudBucketHelper (apache#3843)

[AIRFLOW-2991] Log path to driver output after Dataproc job (apache#3827)

[AIRFLOW-XXX] Fix python3 and flake8 errors in dev/airflow-jira

This is a script that checks if the JIRA issues marked as fixed in a release
are actually merged in; getting this working is helpful in
preparing 1.10.1.

[AIRFLOW-2883] Add import and export for pool cli using JSON

[AIRFLOW-3021] Add Censys to who uses Airflow list

> Censys
> Find and analyze every reachable server and device on the Internet
> https://censys.io/

closes AIRFLOW-3021 https://issues.apache.org/jira/browse/AIRFLOW-3021

Add Branch to Company List

[AIRFLOW-3008] Move Kubernetes example DAGs to contrib

[AIRFLOW-2997] Support cluster fields in bigquery (apache#3838)

This adds a cluster_fields argument to the bigquery hook, GCS to
bigquery operator and bigquery query operators. This field requests that
bigquery store the result of the query/load operation sorted according
to the specified fields (the order of fields given is significant).

[AIRFLOW-XXX] Redirect FAQ `airflow[crypto]` to How-to Guides.

[AIRFLOW-XXX] Remove redundant space in Kerberos (apache#3866)

[AIRFLOW-3028] Update Text & Images in Readme.md

[AIRFLOW-1917] Trim extra newline and trailing whitespace from log (apache#3862)

[AIRFLOW-2985] Operators for S3 object copying/deleting (apache#3823)

1. Copying:
Under the hood, it's `boto3.client.copy_object()`.
It can only handle the situation in which the
S3 connection used can access both source and
destination bucket/key.

2. Deleting:
2.1 Under the hood, it's `boto3.client.delete_objects()`.
It supports either deleting one single object or
multiple objects.
2.2 If users try to delete a non-existent object, the
request will still succeed, but there will be an
entry 'Errors' in the response. There may also be
other reasons which may cause similar 'Errors' (
request itself would succeed without explicit
exception). So an argument `silent_on_errors` is added
to let users decide if this sort of 'Errors' should
fail the operator.

The corresponding methods are added into S3Hook, and
these two operators are 'wrappers' of these methods.
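The delete semantics described above can be sketched briefly. This is a hedged illustration: the response shapes mimic the output of `boto3.client("s3").delete_objects()`, while the helper and its error handling follow the commit message rather than the actual S3Hook code.

```python
# Hypothetical helper: delete_objects() returns HTTP 200 even when
# individual keys fail, reporting per-key failures under "Errors".
def check_delete_response(response, silent_on_errors=False):
    """Raise if the delete response carries per-key errors, unless silenced."""
    errors = response.get("Errors", [])
    if errors and not silent_on_errors:
        raise RuntimeError("S3 delete reported errors: %s" % errors)
    return len(response.get("Deleted", []))

# Responses shaped like boto3's delete_objects() output (made up here):
ok = {"Deleted": [{"Key": "a.csv"}, {"Key": "b.csv"}]}
bad = {"Deleted": [], "Errors": [{"Key": "missing.csv", "Code": "AccessDenied"}]}

print(check_delete_response(ok))                          # 2
print(check_delete_response(bad, silent_on_errors=True))  # 0
```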

[AIRFLOW-3030] Fix CLI docs (apache#3872)

[AIRFLOW-XXX] Update kubernetes.rst docs (apache#3875)

Update kubernetes.rst with correct KubernetesPodOperator inputs
for the volumes.

[AIRFLOW-XXX] Add Enigma to list of companies

[AIRFLOW-2965] CLI tool to show the next execution datetime

Cover different cases

- schedule_interval is "@once" or None, then following_schedule
  method would always return None
- If dag is paused, print reminder
- If latest_execution_date is not found, print warning saying
  not applicable.

[AIRFLOW-XXX] Add Bombora Inc using Airflow

[AIRFLOW-XXX] Move Dag level access control out of 1.10 section (apache#3882)

It isn't in 1.10 (and wasn't in this section when the PR was created).

[AIRFLOW-3012] Fix Bug when passing emails for SLA

[AIRFLOW-2797] Create Google Dataproc cluster with custom image (apache#3871)

[AIRFLOW-XXX] Updated README  to include CAVA

[AIRFLOW-3035] Allow custom 'job_error_states' in dataproc ops (apache#3884)

Allow caller to pass in custom list of Dataproc job states into the
DataProc*Operator classes that should result in the
_DataProcJob.raise_error() method raising an Exception.

[AIRFLOW-3034]: Readme updates : Add Slack & Twitter, remove Gitter

[AIRFLOW-3056] Add happn to Airflow user list

[AIRFLOW-3052] Add logo options to Airflow (apache#3892)

[AIRFLOW-2524] Add SageMaker Batch Inference (apache#3767)

* Fix for comments
* Fix sensor test
* Update non_terminal_states and failed_states to static variables of SageMakerHook

Add SageMaker Transform Operator & Sensor
Co-authored-by: srrajeev-aws <srrajeev@amazon.com>

[AIRFLOW-XXX] Added Jeitto as one of happy Airflow users! (apache#3902)

[AIRFLOW-XXX] Add Jeitto as one happy Airflow user!

[AIRFLOW-3044] Dataflow operators accept templated job_name param (apache#3887)

* Default value of new job_name param is templated task_id, to match the
existing behavior as much as possible.
* Change expected value in test_mlengine_operator_utils.py to match
default for new job_name param.

[AIRFLOW-2707] Validate task_log_reader on upgrade from <=1.9 (apache#3881)

We changed the default logging config and config from 1.9 to 1.10, but
anyone who upgrades and has an existing airflow.cfg won't know they need
to change this value - instead they will get nothing displayed in the UI
(ajax request fails) and see "'NoneType' object has no attribute 'read'"
in the error log.

This validates that config section at start up, and seamlessly upgrades
the old previous value.

[AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator (apache#3860)

Enable specifying dns and dns_search options for DockerOperator

[AIRFLOW-1298] Clear UPSTREAM_FAILED using the clean cli (apache#3886)

* [AIRFLOW-1298] Fix 'clear only_failed'

* [AIRFLOW-1298] Fix 'clear only_failed'

[AIRFLOW-3059] Log how many rows are read from Postgres (apache#3905)

To know how much data is being read from Postgres, it is nice to log
this to the Airflow log.

Previously when there was no data, it would still create a single file.
This is not something that we want, and therefore we've changed this
behaviour.

Refactored the tests to make use of Postgres itself since we have it
running. This makes the tests more realistic, instead of mocking
everything.

[AIRFLOW-XXX] Fix typo in docs/timezone.rst (apache#3904)

[AIRFLOW-3068] Remove deprecated imports

[AIRFLOW-3036] Add relevant ECS options to ECS operator. (apache#3908)

The ECS operator currently supports only a subset of available options
for running ECS tasks. This patch adds all ECS options that could be
relevant to airflow; options that wouldn't make sense here, like
`count`, were skipped.

[AIRFLOW-1195] Add feature to clear tasks in Parent Dag (apache#3907)

[AIRFLOW-3073] Add note-Profiling feature not supported in new webserver (apache#3909)

Adhoc queries and Charts features are no longer supported in the new
FAB-based webserver and UI. But this is not mentioned at all in the doc
"Data Profiling" (https://airflow.incubator.apache.org/profiling.html)

This commit adds a note to remind users of this.

[AIRFLOW-XXX] Fix SlackWebhookOperator docs (apache#3915)

The docs refer to `conn_id` while the actual argument is `http_conn_id`.

[AIRFLOW-1441] Fix inconsistent tutorial code (apache#2466)

[AIRFLOW-XXX] Add 90 Seconds to companies

[AIRFLOW-3096] Further reduce DaysUntilStale for probo/stale

[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (apache#3913)

[AIRFLOW-3090] Demote dag start/stop log messages to debug (apache#3920)

[AIRFLOW-2407] Use feature detection for reload() (apache#3298)

* [AIRFLOW-2407] Use feature detection for reload()

[Use feature detection instead of version detection](https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection) is a Python porting best practice that avoids a flake8 undefined name error...

flake8 testing of https://github.com/apache/incubator-airflow on Python 3.6.3

[AIRFLOW-XXX] Fix a wrong sample bash command, a display issue & a few typos (apache#3924)

[AIRFLOW-3090] Make No tasks to consider for execution debug (apache#3923)

During normal operation, it is not necessary to see the message.  This
can only be useful when debugging an issue.

[AIRFLOW-2952] Fix Kubernetes CI (apache#3922)

The current dockerised CI pipeline doesn't run minikube and the
Kubernetes integration tests. This starts a Kubernetes cluster
using minikube and runs k8s integration tests using docker-compose.

[AIRFLOW-2918] Fix Flake8 violations (apache#3931)

[AIRFLOW-3076] Remove preloading of MySQL testdata (apache#3911)

One of the requirements for tests is that they be self-contained. This means
they should not depend on anything external, such as loading data.

This PR uses setUp and tearDown to load the data into MySQL
and remove it afterwards. This removes the actual bash mysql commands
and will make it easier to dockerize the whole test suite in the future.

[AIRFLOW-2918] Remove unused imports

[AIRFLOW-3090] Specify path of key file in log message (apache#3921)

[AIRFLOW-3067] Display www_rbac Flask flash msg properly (apache#3903)

The Flask flash messages are not displayed properly.

When we don't give a category for a flash message, the default
value is 'message'. In some cases, we specify the 'error'
category.

Using Flask-AppBuilder, the flash message is given
a CSS class 'alert-[category]'. But we don't have
'alert-message' or 'alert-error' in the current
'bootstrap-theme.css' file.

This makes the flash messages in the www_rbac UI come with
no background color.

This commit addresses the issue by adding 'alert-message'
(using the specs of the existing CSS class 'alert-info') and
'alert-error' (using the specs of the existing CSS class 'alert-danger')
to 'bootstrap-theme.css'.

[AIRFLOW-3109] Bugfix to allow user/op roles to clear task instance via UI by default

Add show statements to HQL filtering.

[AIRFLOW-3051] Change CLI to make users ops similar to connections

The ability to manipulate users from the command line is a bit clunky. Currently there are 'airflow create_user', 'airflow delete_user' and 'airflow list_users'. These ought to be made more like connections, so that they become 'airflow users list ...', 'airflow users delete ...' and 'airflow users create ...'.

[AIRFLOW-3009] Import Hashable from collection.abc to fix Python 3.7 deprecation warning (apache#3849)

[AIRFLOW-3111] Fix instructions in UPDATING.md and remove comment artifacts in default_airflow.cfg (apache#3944)

- fixed incorrect instructions in UPDATING.md regarding core.log_filename_template and elasticsearch.elasticsearch_log_id_template
- removed comments referencing "additional curly braces" from
default_airflow.cfg since they're irrelevant to the rendered airflow.cfg

[AIRFLOW-3117] Add instructions to allow GPL dependency (apache#3949)

The installation instructions failed to mention how to proceed with the GPL dependency. For those who are not concerned by GPL, it is useful to know how to proceed with GPL dependency.

[AIRFLOW-XXX] Add Square to the companies lists

[AIRFLOW-XXX] Add Fathom Health to readme

[AIRFLOW-XXX] Pin Click to 6.7 to Fix CI (apache#3962)

[AIRFLOW-XXX] Fix SlackWebhookOperator execute method comment (apache#3963)

[AIRFLOW-3100][AIRFLOW-3101] Improve docker compose local testing (apache#3933)

[AIRFLOW-3127] Fix out-dated doc for Celery SSL (apache#3967)

Now in `airflow.cfg`, for Celery-SSL, the item names are
"ssl_active", "ssl_key", "ssl_cert", and "ssl_cacert".
(since PR https://github.com/apache/incubator-airflow/pull/2806/files)

But in the documentation
https://airflow.incubator.apache.org/security.html?highlight=celery
or
https://github.com/apache/incubator-airflow/blob/master/docs/security.rst,
it's "CELERY_SSL_ACTIVE", "CELERY_SSL_KEY", "CELERY_SSL_CERT", and
"CELERY_SSL_CACERT", which is outdated and may confuse readers.

[AIRFLOW-XXX] Fix PythonVirtualenvOperator tests (apache#3968)

The recent update to the CI image changed the default
python from python2 to python3. The PythonVirtualenvOperator
tests expected python2 as default and fail due to
serialisation errors.

[AIRFLOW-2952] Fix Kubernetes CI (apache#3957)

- Update outdated cli command to create user
- Remove `airflow/example_dags_kubernetes` as the dag already exists in `contrib/example_dags/`
- Update the path to copy K8s dags

[AIRFLOW-3104] Add .airflowignore info into doc (apache#3939)

.airflowignore is a nice feature, but it was not mentioned at all in the documentation.

[AIRFLOW-XXX] Add Delete for CLI Example in UPDATING.md

[AIRFLOW-3123] Use a stack for DAG context management (apache#3956)

[AIRFLOW-3125] Monitor Task Instances creation rates (apache#3966)

Monitor Task Instance creation rates by Operator type.
These stats can provide some visibility on how much workload Airflow is
getting. They can be used for resource allocation in the long run (i.e.
to determine when we should scale up workers) and for debugging scenarios
in which the creation rate of a certain type of Task Instance spikes.

[AIRFLOW-3129] Backfill mysql hook unit tests. (apache#3970)

[AIRFLOW-3124] Fix RBAC webserver debug mode (apache#3958)

[AIRFLOW-XXX] Add Compass to companies list (apache#3972)

We're using Airflow at Compass now.

[AIRFLOW-XXX] Speed up DagBagTest cases (apache#3974)

I noticed that many of the tests of DagBags operate on a specific DAG
only, and don't need to load the example or test dags. By not loading
the dags we don't need, this shaves about 10-20s off the test time.

[AIRFLOW-2912] Add Deploy and Delete operators for GCF (apache#3969)

Both Deploy and Delete operators interact with Google
Cloud Functions to manage functions. Both are idempotent
and make use of GcfHook - hook that encapsulates
communication with GCP over GCP API.

[AIRFLOW-1390] Update Alembic to 0.9 (apache#3935)

[AIRFLOW-2238] Update PR tool to remove outdated info (apache#3978)

[AIRFLOW-XXX] Don't spam test logs with "bad cron expression" messages (apache#3973)

We needed these test dags to check the behaviour of invalid cron
expressions, but by default we were loading them every time we create a
DagBag (which many, many tests do).

Instead we ignore these known-bad dags by default, and the test checking
those (tests/models.py:DagBagTest.test_process_file_cron_validity_check)
is already explicitly processing those DAGs directly, so it remains
tested.

[AIRFLOW-XXX] Fix undocumented params in S3_hook

Some function parameters were undocumented. Additional docstrings
were added for clarity.

[AIRFLOW-3079] Improve migration scripts to support MSSQL Server (apache#3964)

There were two problems for MSSQL.  First, 'timestamp' data type in MSSQL Server
is essentially a row-id, and not a timezone enabled date/time stamp. Second, alembic
creates invalid SQL when applying the 0/1 constraint to boolean values. MSSQL should
enforce this constraint by simply asserting a boolean value.

[AIRFLOW-XXX] Add DoorDash to README.md (apache#3980)

DoorDash uses Airflow https://softwareengineeringdaily.com/2018/09/28/doordash/

[AIRFLOW-3062] Add Qubole in integration docs (apache#3946)

[AIRFLOW-3129] Improve test coverage of airflow.models. (apache#3982)

[AIRFLOW-2574] Cope with '%' in SQLA DSN when running migrations (apache#3787)

Alembic uses a ConfigParser like Airflow does, and "%" is a special
value in there, so we need to escape it. As per the Alembic docs:

> Note that this value is passed to ConfigParser.set, which supports
> variable interpolation using pyformat (e.g. `%(some_value)s`). A raw
> percent sign not part of an interpolation symbol must therefore be
> escaped, e.g. `%%`
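The rule quoted above can be reproduced with the standard library's ConfigParser, which both Airflow and Alembic build on. The DSN below is made up for illustration.

```python
# A raw "%" in a stored value (e.g. inside a password in an SQLAlchemy DSN)
# must be doubled before ConfigParser.set(), or pyformat interpolation fails.
from configparser import ConfigParser

dsn = "mysql://user:p%ss@localhost/airflow"  # password contains a raw "%"

parser = ConfigParser()
parser.add_section("alembic")
# Double each raw percent sign so interpolation does not choke on it:
parser.set("alembic", "sqlalchemy.url", dsn.replace("%", "%%"))

# Reading the value interpolates "%%" back to "%":
print(parser.get("alembic", "sqlalchemy.url"))  # mysql://user:p%ss@localhost/airflow
```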

[AIRFLOW-3137] Make ProxyFix middleware optional. (apache#3983)

The ProxyFix middleware should only be used when airflow is running
behind a trusted proxy. This patch adds a `USE_PROXY_FIX` flag that
defaults to `False`.
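The opt-in behaviour can be sketched with a hand-rolled WSGI wrapper standing in for werkzeug's ProxyFix, so the snippet stays self-contained; `use_proxy_fix` mirrors the patch's `USE_PROXY_FIX` flag (default `False`).

```python
# Minimal stand-in for ProxyFix: promote the scheme from X-Forwarded-Proto.
def proxy_fix(app):
    """Trust X-Forwarded-Proto from a (trusted) fronting proxy."""
    def wrapped(environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO")
        if proto:
            environ["wsgi.url_scheme"] = proto
        return app(environ, start_response)
    return wrapped

def wrap_app(app, use_proxy_fix=False):
    # Only rewrite the scheme when explicitly enabled: trusting these
    # headers without a trusted proxy would let clients spoof https.
    return proxy_fix(app) if use_proxy_fix else app

def demo_app(environ, start_response):
    """Tiny WSGI app that reports the scheme it saw."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode()]

app = wrap_app(demo_app, use_proxy_fix=True)
body = app(
    {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"},
    lambda status, headers: None,
)
print(body)  # [b'https']
```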

[AIRFLOW-3004] Add config disabling scheduler cron (apache#3899)

[AIRFLOW-3103][AIRFLOW-3147] Update flask-appbuilder (apache#3937)

[AIRFLOW-XXX] Fix an issue in documentation (apache#3998)

Fix the operator name from DataFlowOperation to DataFlowJavaOperator in the documentation.

[AIRFLOW-3088] Include slack-compatible emoji image

[AIRFLOW-3161] Fix TaskInstance log link in RBAC UI

[AIRFLOW-3148] Remove unnecessary arg "parameters" in RedshiftToS3Transfer (apache#3995)

"Parameters" are used to help render the SQL command.
But in this operator, only "schema" and "table" are needed.
There is no SQL command to render.

By checking the code, we can also find that the
argument "parameters" is never actually used.

(Fix a minor issue in the docstring as well)

[AIRFLOW-3159] Update GCS logging docs for latest code (apache#3952)

[AIRFLOW-XXX] Fix  airflow.models.DAG docstring mistake

Closes apache#4004 from Sambeth/sambeth

[AIRFLOW-XXX] Adding Home Depot as users of Apache airflow (apache#4013)

* Adding Home Depot as users of Apache airflow

[AIRFLOW-XXX] Added ThoughtWorks as user of Airflow in README (apache#4012)

[AIRFLOW-XXX] Added DataCamp to list of companies in README (apache#4009)

[AIRFLOW-3165] Document interpolation of '%' and warn (apache#4007)

[AIRFLOW-3099] Complete list of optional airflow.cfg sections (apache#4002)

[AIRFLOW-3162] Fix HttpHook URL parse error when port is specified (apache#4001)

[AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook (apache#3894)

* [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook

[AIRFLOW-3141] Add missing sensor tests. (apache#3991)

[AIRFLOW-XXX] Fix wrong {{ next_ds }} description (apache#4017)

[AIRFLOW-XXX] Fix Typo in SFTPOperator docstring (apache#4016)

Addressed changes from comments made in the PR.

[AIRFLOW-3139] include parameters into log.info in SQL operators, if any (apache#3986)

For all SQL-operators based on DbApiHook, sql command itself is printed
into log.info. But if parameters are used for the sql command, the
parameters would not be included in the printing. This makes the log
less useful.

This commit ensures that the parameters are also printed into the
log.info, if any.
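The logging change can be sketched as follows; `run_sql` is a hypothetical helper standing in for the DbApiHook-based operators.

```python
# Hedged sketch: log the SQL and, when supplied, its bind parameters.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sql_operator")

def run_sql(sql, parameters=None):
    """Log the SQL command and, if given, the parameters used to render it."""
    messages = ["Executing: %s" % sql]
    if parameters is not None:  # only log parameters when they are given
        messages.append("Parameters: %s" % parameters)
    for message in messages:
        log.info(message)
    return messages

run_sql("SELECT * FROM task_instance WHERE dag_id = %s", ["example_dag"])
```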

[AIRFLOW-XXX] Include Danamica in list of companies using Airflow (apache#4019)

[AIRFLOW-XXX] Update manage-connections.rst (apache#4020)

Explain how to connect with MySQL

[AIRFLOW-XXX] Add CarLabs to companies list (apache#4021)

[AIRFLOW-3175] Fix docstring format in airflow/jobs.py (apache#4025)

These docstrings could not be parsed properly by Sphinx.

[AIRFLOW-3086] Add extras group for google auth to setup.py. (apache#3917)

To clarify installation instructions for the google auth backend, add an
install group to `setup.py` that installs the google auth dependencies via
`pip install apache-airflow[google_auth]`.

[AIRFLOW-XXX] Include Pagar.me in list of users of Airflow (apache#4026)

[AIRFLOW-3173] Add _cmd options for password config options (apache#4024)

There were a few more "password" config options added over the last few
months that didn't have _cmd options. Any config option that is a
password should be able to be provided via a _cmd version.

[AIRFLOW-3078] Basic operators for Google Compute Engine (apache#4022)

Add GceInstanceStartOperator, GceInstanceStopOperator and GceSetMachineTypeOperator.

Each operator includes:
- core logic
- input params validation
- unit tests
- presence in the example DAG
- docstrings
- How-to and Integration documentation

Additionally, error checking was added in GceHook for the case where the response is 200 OK:

Some types of errors are only visible in the response's "error" field
and the overall HTTP response is 200 OK.

That is why, apart from checking if the status is "done", we also check
that "error" is empty; if it is not, an exception is raised with the error
message extracted from the "error" field of the response.

In this commit we also separated out the Body Field Validator into a
separate module in tools. This way it can be reused across various GCP
operators; it has proven to be usable in at least two of them so far.

Co-authored-by: sprzedwojski <szymon.przedwojski@polidea.com>
Co-authored-by: potiuk <jarek.potiuk@polidea.com>
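The response check described above can be sketched in a few lines. The field names follow the commit description (a 200 OK response can still carry an "error" field); the helper itself is illustrative, not the actual GceHook code.

```python
# Hypothetical helper: a completed GCE operation is only successful when its
# "error" field is empty, even though the HTTP response was 200 OK.
def check_gce_operation(response):
    """Return True when done, False when pending; raise on embedded errors."""
    if response.get("status") != "DONE":
        return False
    error = response.get("error")
    if error:
        raise RuntimeError(error.get("message", "unknown GCE error"))
    return True

ok = {"status": "DONE"}
pending = {"status": "RUNNING"}
failed = {"status": "DONE", "error": {"message": "quota exceeded"}}

print(check_gce_operation(ok))       # True
print(check_gce_operation(pending))  # False
```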

[AIRFLOW-3168] More resilient database use in CI (apache#4014)

Make sure mysql is available before calling it in CI

[AIRFLOW-3177] Change scheduler_heartbeat from gauge to counter (apache#4027)

This updates the scheduler_heartbeat metric from a gauge to a counter to
better support the statsd_exporter for usage with Prometheus. A counter
allows users to track the rate of the heartbeat, and integrates with the
exporter better. A crashing or down scheduler will no longer emit the
metric, but the statsd_exporter will continue to show a 1 for the metric
value. This fixes that issue because a counter will continually change,
and the lack of change indicates an issue with the scheduler.

Add statsd change notice in UPDATING.md
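The gauge-versus-counter distinction can be illustrated with a toy in-memory stats client (not the real statsd API): after three heartbeats a gauge still reads 1, while a counter reads 3, so a stalled scheduler shows up as a flattening counter.

```python
# Toy stats client demonstrating why a counter is a better heartbeat signal.
class ToyStats:
    def __init__(self):
        self.gauges = {}
        self.counters = {}

    def gauge(self, name, value):
        self.gauges[name] = value  # overwritten on every heartbeat

    def incr(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1

stats = ToyStats()
for _ in range(3):  # three scheduler heartbeats
    stats.gauge("scheduler_heartbeat", 1)  # old behaviour
    stats.incr("scheduler_heartbeat")      # new behaviour

print(stats.gauges["scheduler_heartbeat"])    # 1: a stall is invisible
print(stats.counters["scheduler_heartbeat"])  # 3: a stall flattens the rate
```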

[AIRFLOW-2956] Add kubernetes tolerations (apache#3806)

[AIRFLOW-3183] Fix bug in DagFileProcessorManager.max_runs_reached() (apache#4031)

The condition is intended to ensure the function
will return False if any file's run_count is still smaller
than max_run. But the operator used here is "!=".
Instead, it should be "<".

This is because in DagFileProcessorManager
there is no statement limiting the upper
bound of run_count. It's possible that a
file's run_count will grow bigger than max_run,
in which case the max_runs_reached() method
fails its purpose.
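The comparison bug above can be reproduced minimally; both helpers are hypothetical stand-ins for DagFileProcessorManager.max_runs_reached().

```python
# With "!=", a file that overshot max_runs keeps the function returning
# False forever; with ">=" (i.e. holding back only counts still below the
# limit), completion is reported correctly.
def max_runs_reached_buggy(run_counts, max_runs):
    return all(count == max_runs for count in run_counts.values())

def max_runs_reached_fixed(run_counts, max_runs):
    return all(count >= max_runs for count in run_counts.values())

# a.py overshot the limit (3 > 2), yet every file has reached max_runs = 2:
counts = {"a.py": 3, "b.py": 2}
print(max_runs_reached_buggy(counts, 2))  # False: stuck forever
print(max_runs_reached_fixed(counts, 2))  # True
```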

[AIRFLOW-3099] Don't ever warn about missing sections of config (apache#4028)

Rather than looping through and setting each config variable
individually, and having to know which sections are optional and which
aren't, instead we can just call a single function on ConfigParser and
it will read the config from the dict, and more importantly here, never
error about missing sections - it will just create them as needed.
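The approach can be sketched with the standard library: ConfigParser.read_dict() creates missing sections on the fly instead of raising about them. The section and option names below are illustrative.

```python
# read_dict() creates any missing section in one call, so optional
# sections need no special-casing and never trigger warnings.
from configparser import ConfigParser

defaults = {
    "core": {"parallelism": "32"},
    "kubernetes": {"worker_container_repository": ""},  # an optional section
}

parser = ConfigParser()
parser.read_dict(defaults)  # sections are created as needed

print(parser.get("core", "parallelism"))  # 32
print(parser.has_section("kubernetes"))   # True
```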

[AIRFLOW-1837] Respect task start_date when different from dag's (apache#4010)

Currently task instances get created and scheduled based on the DAG's
start date rather than their own.  This commit adds a check before
creating a task instance to see that the start date is not after
the execution date.
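The check can be sketched as follows; this is an illustrative helper, not the actual scheduler code.

```python
# Only create a task instance when the task's own start_date is not after
# the run's execution date.
from datetime import datetime

def should_create_ti(task_start_date, execution_date):
    """A task whose start_date is after this run's date gets no TI."""
    return task_start_date is None or task_start_date <= execution_date

task_start = datetime(2018, 6, 1)  # task was added to the DAG later

print(should_create_ti(task_start, datetime(2018, 3, 1)))  # False
print(should_create_ti(task_start, datetime(2018, 7, 1)))  # True
print(should_create_ti(None, datetime(2018, 3, 1)))        # True
```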

[AIRFLOW-3089] Drop hard-coded url scheme in google auth redirect. (apache#3919)

The google auth provider hard-codes the `_scheme` in the callback url to
`https` so that airflow generates correct urls when run behind a proxy
that terminates tls. But this means that google auth can't be used when
running without https--for example, during local development. Also,
hard-coding `_scheme` isn't the correct solution to the problem of
running behind a proxy. Instead, the proxy should be configured to set
the `X-Forwarded-Proto` header to `https`; Flask interprets this header
and generates the appropriate callback url without hard-coding the
scheme.

[AIRFLOW-XXX] Add Grab to companies list (apache#4041)

[AIRFLOW-3178] Handle percents signs in configs for airflow run (apache#4029)

* [AIRFLOW-3178] Don't mask defaults() function from ConfigParser

ConfigParser (the base class for AirflowConfigParser) expects defaults()
to be a function - so when we re-assign it to be a property some of the
methods from ConfigParser no longer work.

* [AIRFLOW-3178] Correctly escape percent signs when creating temp config

Otherwise we have a problem when we come to use those values.

* [AIRFLOW-3178] Use os.chmod instead of shelling out

There's no need to run another process for a built in Python function.

This also removes a possible race condition that would make the temporary
config file readable by more than the airflow or run-as user.
The exact behaviour would depend on the umask we run under and the
primary group of our user; most likely this would mean the file was readable
by members of the airflow group (which in most cases would be just the
airflow user). To remove any such possibility we chmod the file
before we write to it.
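The in-process approach can be sketched as follows, assuming a temp-file workflow similar to the one described; the file contents below are illustrative.

```python
# Set the restrictive mode with os.chmod *before* writing any secrets,
# instead of shelling out to `chmod` after the fact.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.chmod(path, 0o600)  # owner read/write only, applied before writing
    with os.fdopen(fd, "w") as f:
        f.write("[core]\nsql_alchemy_conn = mysql://...\n")
    mode = os.stat(path).st_mode & 0o777
    print(oct(mode))
finally:
    os.remove(path)
```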

[AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters (apache#4011)

Use profile for AWS hook if S3 config file provided in
aws_default connection extra parameters
Add test to validate profile set

[AIRFLOW-3001] Add index 'ti_dag_date' to taskinstance (apache#3885)

To optimize query performance
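The effect of the new index can be demonstrated with stdlib sqlite3 (the real change is a migration on the metadata database; the schema here is simplified). Without ti_dag_date, the per-dagrun lookup cannot use the primary key, since task_id is its leading column.

```python
# Create a simplified task_instance table, add the composite index the PR
# introduces, and inspect the plan for the scheduler's per-dagrun query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT, dag_id TEXT, execution_date TIMESTAMP, state TEXT,
        PRIMARY KEY (task_id, dag_id, execution_date)
    )
""")
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM task_instance
    WHERE dag_id = 'some_id' AND execution_date = '2018-09-01 00:00:00'
""").fetchall()
print(plan)  # the plan should mention a search using index ti_dag_date
```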

[AIRFLOW-2794] Add WasbDeleteBlobOperator (apache#3961)

Deleting Azure blob is now supported. Either single blobs can be
deleted, or one can choose to supply a prefix, in which case one
can match multiple blobs to be deleted.

[AIRFLOW-3138] Use current data type for migrations (apache#3985)

* Use timestamp instead of timestamp with timezone for migration.

[AIRFLOW-393] Add callback for FTP downloads (apache#2372)

[AIRFLOW-3119] Enable debugging with Celery (apache#3950)

This will enable --loglevel when launching a
celery worker and inherit that LOGGING_LEVEL
setting from airflow.cfg

[AIRFLOW-3112] Make SFTP hook to inherit SSH hook (apache#3945)

This is to align the arguments of the SFTP hook with the SSH hook.

[AIRFLOW-3195] Log query and task_id in druid-hook (apache#4018)

Log query and task_id in druid-hook

[AIRFLOW-3187] Update airflow.gif file with a slower version (apache#4033)

[AIRFLOW-2789] Create single node DataProc cluster (apache#4015)

Create single node cluster - infer from num_workers

seelmann commented Jan 9, 2019

Thanks @ubermen for this fix. We experienced increasing load on our DB (Postgres 10, RDS m4.large, 1.7M rows in task_instance table) and slower task scheduling. After analysis this query was identified as cause. After creating the index load went down and tasks are scheduled fast again.
