Wabi uses Airflow!#14069
Closed
levihernan wants to merge 2971 commits into
Closed
Conversation
The cancel-workflow-runs action in version 4.1 got the capability of cancelling not only duplicates of own run (including future, queued duplicates) but also cancelling all duplicates from all running worklfows - regardless if they were triggered by my own PR or some other PRs. This will be even more helpful with handling the queues and optimising our builds, because in case ANY of the build image workflows starts to run, it will cancel ALL duplicates immediately. (cherry picked from commit 21350aa)
(cherry picked from commit 72f3fcb)
We had a lot of problems recently about the queues in Github Actions. This documentations explains the motivation and approach we have taken for optimizing our PR workflow. (cherry picked from commit d85a31f)
* Add contributor-targeted description of the PR workflow. Simpler/shorter description of the PR workflow targeted for new contributors and contributors who did not follow the recent changes in the PR workflow. * Update CONTRIBUTING.rst Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com> * Update CONTRIBUTING.rst Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com> Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com> (cherry picked from commit 6c8c308)
This change adds even more aggressive cancelling of duplicates of 'Build Image' jobs. it's not an obvious task to know which Build Image jobs are duplicates, we are matching those duplicates based on specially crafted "build-info" job names. We add Event, Branch, Repo to the job names and assume that two runs with the same event + branch + repo are duplicates. It also disables self-preservation for this step because it is perfectly ok to cancel itself in case there is a newer in-progress Build Image job. Unfortunately even this will not work perfectly well. Those job names are resolved only for the jobs that are runnning rather than the queued ones, so in case we have several duplicates of the same build image job in the queue, they will not be found/cancelled. The cancelling will only happen if both duplicates are already running. It's good enough for now and we cannot do much more until there is a missing feature added to GitHub API that allows to link the workflow_run with the run that triggered it. This issue has been raised to GitHub Support and internal engineering ticket has been apparently opened to add this feature. More detailed status for the missing feature is kept at #11294 (cherry picked from commit 1d14e74)
(cherry picked from commit 77ae32e)
The action to cancel workflow switched from deprecated method of retrieving jobs to a 'better' one but it caused some unexpected failures as some of the job data is not iterable and failures in 'failedJobs" matching Version 4.6 fixed the problem. (cherry picked from commit aa5213b)
The previous update for 4.3 version of the action also broke CodeQL cancelling. This PR fixes it. (cherry picked from commit 5f9792c)
Sometimes (quite often really) when PR gets approved, the PR gets merged rather quickly, without waiting for result of this action. Or a new PR gets pushed quickly. In those cases PR will not be found. But this is usually not a problem then and rather than failing, we should simply print a warning and exit. (cherry picked from commit a7a7cf2)
There was a problem that documentation-only checks triggered selective checks without docs build (they resulted in basic-checks-only and no images being built. This occured for example in #12025 This PR fixes it by adding image-build and docs-build as two separate outputs. (cherry picked from commit adbf764)
DockerHub uses `hooks/build` to build the image and it passes DOCKER_TAG variable when the script is called. This PR makes the DOCKER_TAG to provide the default valuei for tag that is calculated from sources (taking the default branch and python version). Since it is only set in the DockerHub build, it should be safe. Fixes #11937 (cherry picked from commit 5c199fb)
(cherry picked from commit d8aa24b)
The `exit` and `quit` functions are actually `site.Quitter` objects and are loaded, at interpreter start up, from site.py. However, if the interpreter is started with the `-S` flag, or a custom site.py is used then exit and quit may not be present. It is recommended to use `sys.exit()` which is built into the interpreter and is guaranteed to be present. (cherry picked from commit bec9f3b)
The SHA in check was not working for PRs from forks. (cherry picked from commit d559da1)
After Debian 9 and according to the manual https://manpages.debian.org/stretch/apt/apt-key.8.en.html, after Debian 9 instead of using "apt-key add" a keyring should be placed directly in the /etc/apt/trusted.gpg.d/ directory with a descriptive name and either "gpg" or "asc" as file extension. Also added better redirection on the apt-key list command. (cherry picked from commit ded3dbb)
* Add Kubernetes files to selective checks There are multiple kubernetes-related files that require running the k8s integration tests. This PR adds those to the run_selective_tests * Update scripts/ci/selective_ci_checks.sh Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> * Update scripts/ci/selective_ci_checks.sh Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Update scripts/ci/selective_ci_checks.sh Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Update scripts/ci/selective_ci_checks.sh Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Update scripts/ci/selective_ci_checks.sh Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> (cherry picked from commit 054de07)
Those variables are defined in GitHub environment so when they were recently addded it was not obvious that they will fail when running kubernetes tests locally. This PR fixes that. (cherry picked from commit 5351f0d)
Switch out deprecated helm repo for new stable repo. - https://www.cncf.io/blog/2020/11/05/helm-chart-repository-deprecation-update/ - https://helm.sh/docs/faq/#i-am-getting-a-warning-about-unable-to-get-an-update-from-the-stable-chart-repository (cherry picked from commit 128c991)
There are few more variables that (if not defined) prevent from using the CI image directly without breeze or the CI scripts. With this change you can run: `docker run -it apache/airflow:master-python3.6-ci` and enter the image without errors. (cherry picked from commit c7f3410)
The flag was broken - bad cache parameter value was passed. This PR fixes it. (cherry picked from commit 5c60157)
When new Python version is released (bugfixes), we rebuild the CI image and replace it with the new one, however releasing of the python image and CI image is often hours or even days apart (we only release the CI image when tests pass in master with the new python image). We already use a better approach for Github - we simply push the new python image to our registry together with the CI image and the CI jobs are always pulling them from our registry knowing that the two - python and CI image are in sync. This PR introduces the same approach. We not only push CI image but also the corresponding Python image to our registry. This has no ill effect - DockerHub handles it automatically and reuses the layers of the image directly from the Python one so it is merely a label that is stored in our registry that points to the exact Python image that was used by the last pushed CI image. (cherry picked from commit 75bdfae)
The selective checks are run in "workflow_run" because they need to be able to set label and make comments, however status of those checks are not displayed in GitHub and in cases of small PRs the "merge" button might be green before the status complete. This PR adds additional check that is always completed after the "worfklow_run" finishes it's job. This will prevent accidental merges before the check completes. (cherry picked from commit 63ac07d)
The test_core.py has been used as example in Breeze and it's location changed to tests/core folder. This PR fixes references to the changed location. (cherry picked from commit 57b273a)
(cherry picked from commit e77867a)
(cherry picked from commit d8abee6)
The "tmp" directory is mounted from the host (from tmp folder in the source airflow directory). This is needed to get some of our docker-in-docker tools (such as gcloud/aws/java) and get them working on demand. Thanks to that we do not have to increase the size of CI image unnecessarily. Those tools were introduced and made to work in #9376 However this causes some of the standard tools (such as apt-get) to not work inside the container unless the mounted /tmp folder has write permission for groups/other. This PR fixes it. (cherry picked from commit a42bbe2)
(cherry picked from commit 08d67ad)
PIP upgrades itself after eager update, and since we (for now) stick with the 20.2.4 version we want to reset PIP to that version after eager upgrade. (cherry picked from commit c44092f)
For image build the python version is passed via PYTHON_MAJOR_MINOR_VERSION but there is a part of the build (preparing airflow package) that uses python installed on host. This is fine for Master/2.0 to use same version as the image but it should be unified (and in 1.10 when trying to build 2.7 image it would fail). (cherry picked from commit ba1111a)
(cherry picked from commit f5c4b24)
(cherry picked from commit 23a4787)
(cherry picked from commit a9d562e)
(cherry picked from commit a3cc78d)
* Allow webserver to read pod logs directly For users who are testing the KubernetesExecutor, allows users to read pod logs via the Kubernetes API. Worth noting that these logs will only be accessible while the worker is running. * fix tests (cherry picked from commit 9f28e41)
There was a change in Policy of ASF that only "Made by GitHub" actions and actions residing in Apache-owned repositories are allowed to be used for ASF projects. This was in response to a security incident. More details: Policy: * https://infra.apache.org/github-actions-secrets.html Discussion builds@apache.org: * https://lists.apache.org/thread.html/r435c45dfc28ec74e28314aa9db8a216a2b45ff7f27b15932035d3f65%40%3Cbuilds.apache.org%3E Discussion users@infra.apache.org: * https://lists.apache.org/thread.html/r900f8f9a874006ed8121bdc901a0d1acccbb340882c1f94dad61a5e9%40%3Cusers.infra.apache.org%3E (cherry picked from commit c6d66cd)
This PR disables persisting credentials in Github Actions checkout. This is a result of discussion in builds@apache.org https://lists.apache.org/thread.html/r435c45dfc28ec74e28314aa9db8a216a2b45ff7f27b15932035d3f65%40%3Cbuilds.apache.org%3E It turns out that contrary to the documentation actios (specifically checkout action) can use GITHUB_TOKEN without specifying it as input in the yaml file and the GitHub checkout action leaves the repository with credentials stored locally that enable pushing to Github Repository by any step in the same job. This was thought to be forbidden initially (and the documentation clearly says that the action must have the GITHUB_TOKEN passed to it in .yaml workflow in order to use it). But apparently it behaves differently. This leaves open an attack vector where for example any PIP package installed in the following steps could push any changes to GitHub Repository of Apache Airflow. Security incidents have been reported to both GitHub and Apache Security team, but in the meantime we add configuration to remove credentials after checkout step. https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#using-the-github_token-in-a-workflow > Using the GITHUB_TOKEN in a workflow > To use the GITHUB_TOKEN secret, you *must* reference it in your workflow file. Using a token might include passing the token as an input to an action that requires it, or making authenticated GitHub API calls. (cherry picked from commit d079b91)
Rather than having to mirror all the repos we can instead use git submodules to pull in the third party actions we want to use - with recent(ish) changes in review for submodules on GitHub we still get the same "review/audit" visibility for changes, but this way we don't have to either "pollute" our repo with the actions code, nor do we have to maintain a fork of the third party action. (cherry picked from commit f115983)
This rule is problematic because it hangs randomly - both in test and production. This PR removes the rule.
Master is failing on docs build because of the following error: ``` Failed to fetch inventory: https://api.mongodb.com/python/current/objects.inv ``` This is because the URL is changed to https://pymongo.readthedocs.io/en/stable/objects.inv (cherry picked from commit 7a5aafc)
This PR implements an upgrade check for SparkJDBCOperator. In Airflow 1.10.x, the default name for SparkJDBCOperator class conn_id is 'spark-default'. From Airflow 2.0, it has been changed to 'spark_default' to conform with the naming conventions of all other connection names. closes: #12918
This upgrade check inspects the version of the supported database backend (PostgreSQL, MySQL, and SQLite) so as to verify if the version is supported in Airflow 2.0 This is ease upgrade to Airflow 2.0 closes: #13850
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
|
Signed-off-by: Lord-Y <Lord-Y@users.noreply.github.com>
Member
|
@levihernan Can you create a new PR with This PR tries to merge v1-10-stable to master only. I guess you forgot there might be a mistake. :) |
Member
|
You can take a look at #13995 for an example -- let me know if you need any help |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://www.wabicasa.com/