Skip to content

merge invalid_kubernetes_config rule and pod_template_file rule#12510

Closed
dimberman wants to merge 211 commits intov1-10-stablefrom
merge-invalid-k8s
Closed

merge invalid_kubernetes_config rule and pod_template_file rule#12510
dimberman wants to merge 211 commits intov1-10-stablefrom
merge-invalid-k8s

Conversation

@dimberman
Copy link
Contributor

@dimberman dimberman commented Nov 20, 2020

Since these two rules do very similar things, it would make sense to merge them to create less confusion for users.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

kaxil and others added 30 commits November 16, 2020 16:11
yamllint: v1.24.2 -> v1.25.0
isort: 5.5.3 -> 5.5.4
(cherry picked from commit b7183de)
The script was previously placed in scripts/ci which caused
a bit of a problem in 1-10-test branch where PRs were using
scripts/ci from the v1-10-test HEAD but they were missing
the ci script from the PR.

(cherry picked from commit 2a1be3c8408df708b675adbbc5edc734aa1dfc1c)
Adds new rule that will be run during airflow upgrade-check
and will look for old clasess and imports incompatible with
Airflow 2.0.
Github just released Github Code Scanning:
https://github.blog/2020-09-30-code-scanning-is-now-available/

(cherry picked from commit 7c66936)
It has been raised quite a few times that workflow added in forked
repositories might be pretty invasive for the forks - especially
when it comes to scheduled workflows as they might eat quota
or at least jobs for those organisations/people who fork
repositories.

This is not strictly necessary because Recently GitHub recognized this as being
a problem and introduced new rules for scheduled workflows. But for people who
are already forked, it would be nice to not run those actions. It is enough
that the CodeQL check is done when PR is opened to the "apache/airflow"
repository.

Quote from the emails received by Github (no public URL explaining it yet):

> Scheduled workflows will be disabled by default in forks of public repos and in
public repos with no activity for 60 consecutive days.  We’re making two
changes to the usage policy for GitHub Actions. These changes will enable
GitHub Actions to scale with the incredible adoption we’ve seen from the GitHub
community. Here’s a quick overview:

> * Starting today, scheduled workflows will be disabled by default in new forks of
public repositories.
> * Scheduled workflows will be disabled in public repos with
no activity for 60 consecutive days.

(cherry picked from commit 1b9e59c)
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
(cherry picked from commit a4478f5)
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.

(cherry picked from commit a33a919)
Github Actions deprecated the set-env action due to moderate security
vulnerability they found.

https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

This commit replaces set-env with env file as explained in

https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#environment-files
(cherry picked from commit 975558b)
Co-authored-by: Ben Nadler <jbennadler@gmail.com>
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with --no-rbac-ui flag.

This is useful to test different variants of 1.10 when testing
release candidataes in connection with the 'start-airflow'
command.

(cherry picked from commit 22c6a84)
Seems that the k8s cache for virtualenv got broken during the
recent problems. This commits bumps the cache version to make
it afresh

(cherry picked from commit 666e81a)
This PR needs to be merged first in order to handle the #11385
which requires .pypirc to be created before dockerfile gets build.

This means that the script change needs to be merged to master
first in this PR.

(cherry picked from commit e198077)
)

* Constraints and PIP packages can be installed from local sources

This is the final part of implementing #11171 based on feedback
from enterprise customers we worked with. They want to have
a capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that besides the official APT sources the Dockerfile build should
not needd GitHub, nor any other external files pulled from outside
including PIP repository.

This change also includes documentation on how to prepare set of
such binaries ready for inspection and review by security teams
in Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.

Fixes: #11171

* Update docs/production-deployment.rst

(cherry picked from commit 0497390)
The push and schedule builds should not be cancelled even if
they are duplicates. By seing which of the master merges
failed, we have better visibility on which merge caused
a problem and we can trace it's origin faster even if the builds
will take longer overall.

Scheduled builds also serve it's purpose and they should
be always run to completion.

(cherry picked from commit 401a579)
Wrong if query in the GitHub action caused upgrade to latest
constraints did not work for a while.

(cherry picked from commit a34f5ee)
A problem was introduced in #11397 where a bit too many "Build Image"
jobs is being cancelled by subsequent Build Image run. For now it
cancels all the Build Image jobs that are running :(.

(cherry picked from commit 076fe88)
We have started to experience "unknown_blob" errors intermittently
recently with GitHub Docker registry. We might eventually need
to migrate to GCR (which eventually is going to replace the
Docker Registry for GitHub:

The ticket is opened to the Apache Infrastructure to enable
access to the GCR and to make some statements about Access
Rights management for GCR https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20959
Also a ticket to GitHub Support has been raised about it
https://support.github.com/ticket/personal/0/861667 as we
cannot delete our public images in Docker registry.

But until this happens, the workaround might help us
to handle the situations where we got intermittent errors
while pushing to the registry. This seems to be a common
error, when NGINX proxy is used to proxy Github Registry so
it is likely that retrying will workaround the issue.

(cherry picked from commit f9dddd5)
* Add capability of customising PyPI sources

This change adds capability of customising installation of PyPI
modules via custom .pypirc file. This might allow to install
dependencies from in-house, vetted registry of PyPI

(cherry picked from commit 45d33db)
The SHA of cancel-workflow-action in #11397 was pointing to previous
(3.1) version of the action. This PR fixes it to point to the
right (3.2) version.

(cherry picked from commit 4de8f85)
* Modify helm chart to use pod_template_file

Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor

* fix tests

* one more nit

* fix dag command

* fix pylint

(cherry picked from commit 56bd9b7)
This decreases scheduler delay between tasks by about 20% for larger DAGs,
sometimes more for larger or more complex DAGs.

The delay between tasks can be a major issue, especially when we have dags with
many subdags, figures out that the scheduling process spends plenty of time in
dependency checking, we took the trigger rule dependency which calls the db for
each task instance, we made it call the db just once for each dag_run

(cherry picked from commit 50efda5)
If you used context from git repo, the .piprc file was missing and
COPY in Dockerfile is not conditional.

This change copies the .pypirc conditionally from the
docker-context-files folder instead.

Also it was needlessly copied in the main image where it is not
needed and it was even dangerous to do so.

(cherry picked from commit 53e5d8f)
rchojn and others added 21 commits November 16, 2020 16:15
In secured cluster there is a need to run this job with specific user id

(cherry picked from commit 1266b29)
So that it can pull the specified image from a private registry.

(cherry picked from commit 473f506)
Co-authored-by: Flavien Dereume-Hancart <flavien@LL-PC0BE1K9-1.goiba.net>
(cherry picked from commit 5f403a8)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
(cherry picked from commit 48ce4bd)
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
(cherry picked from commit 0faa06e)
For Kubernetes tests all tests can be executed in the same python
version - default one - no matter which PYTHON_MAJOR_MINOR is
used. This is because we are testing Airflow which is deployed
via production image. Thanks to that we can fix the python version
to be default and avoid any python version problems (this is
especially important for cherry-picking to 1.10 where we have
python 2.7 and 3.5.

(cherry-picked from commit cbd6daf)
When building tagged image on DockerHub the build has been
failing as it was trying to pull cached version of prod image
but the tagged image should be built from scratch so cache should
be disabled.

Fixes #12263

(cherry picked from commit 0038660)
We've observed the tests for last couple of weeks and it seems
most of the tests marked with "quarantine" marker are succeeding
in a stable way (#10118)
The removed tests have success ratio of > 95% (20 runs without
problems) and this has been verified a week ago as well,
so it seems they are rather stable.

There are literally few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed for last few weeks and added the
tests that are causing the build to hang.

Seems that stability has improved - which might be casued
by some temporary problems when we marked the quarantined builds
or too "generous" way of marking test as quarantined, or
maybe improvement comes from the #10368 as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds) so
it might be that resource usage is decreased. Another reason
might be Github Actions stability improvements.

Or simply those tests are more stable when run isolation.

We might still add failing tests back as soon we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to "heisentests" category
Those testst run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentest can be fixed but for
now we are going to run them in isolation.

Also - since those quarantined tests are failing more often
the "num runs" to track for those has been decreased to 10
to keep track of 10 last runs only.

(cherry picked from commit b746f33)
We've been mocking UUID badly and this cause tests with
new version of sentry to fail. This change fixes both - fixes the
mocking as well as removes the upper constraint on sentry.
By doing it this way we can add new features to upgrade_check, such as
the adding a flag to make changes automatically, without having to
release a new version of Airflow.
Adding a rule to check for undefined jinja variables when upgrading to Airflow2.0
When testing out the ugprade check command on 1.10 with the default
SQLite backend I got the following error:

```
[2020-11-20 12:10:28,248] {base.py:1372} ERROR - Error closing cursor
Traceback (most recent call last):
  File "/home/ash/.virtualenvs/airflow-1-10/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1284, in fetchall
    l = self.process_rows(self._fetchall_impl())
  File "/home/ash/.virtualenvs/airflow-1-10/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1230, in _fetchall_impl
    return self.cursor.fetchall()
sqlite3.ProgrammingError: Cannot operate on a closed database.
```

This was caused because the `@provide_session` decorator closed the
connection when the function returned, and since we were using a
generator expression, we hadn't yet fetched all the rows.

This changes it so the rows are fetched before returning.
The intent here is this content will be visible on the pypi page for
apache-airflow-upgrade-check (once it is published.)

This is far from perfect, but the _first_ step is the hardest :)
By default, the logs would appear in the middle of the status stream,
which makes it slightly harder to parse the output.

Before:

```
============================================= STATUS =============================================

Legacy UI is deprecated by default......................................................SUCCESS
Users must set a kubernetes.pod_template_file value.....................................FAIL
Changes in import paths of hooks, operators, sensors and others.........................FAIL
Remove airflow.AirflowMacroPlugin class.................................................SUCCESS
[2020-11-20 14:26:04,083] {__init__.py:50} INFO - Using executor SequentialExecutor
[2020-11-20 14:26:04,083] {dagbag.py:417} INFO - Filling up the DagBag from /home/ash/airflow/dags
Jinja Template Variables cannot be undefined............................................SUCCESS
```

After:

```
============================================= STATUS =============================================

Legacy UI is deprecated by default......................................................SUCCESS
Users must set a kubernetes.pod_template_file value.....................................FAIL
Changes in import paths of hooks, operators, sensors and others.........................FAIL
Remove airflow.AirflowMacroPlugin class.................................................SUCCESS
Jinja Template Variables cannot be undefined............................................SUCCESS
```
…11528)

* add upgrade rule to check for mesos config and flag to remove it.

* change from checking the mesos config section to core/executor config

* remove leading new line and indent in desc
(cherry picked from commit 5fac1e15c702faa9606c22390bf9b143193dfc79)
@github-actions
Copy link

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 20, 2020
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@dimberman dimberman closed this Dec 14, 2020
@dimberman dimberman deleted the merge-invalid-k8s branch December 14, 2020 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.