Support for replacing table in BigQueryCreateEmptyTableOperator #12051
Closed
shaneikennedy wants to merge 195 commits into apache:v1-10-test from
Conversation
apache#10732) * Create a script to migrate KubernetesExecutor airflow.cfg configs to pod_template_file * fix help for command * add test * address comments * pass for 2.7
Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com> Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Fix breaking changes in Pod conversion for 1.10.13 * fix tests * fix flake8 * fix test * fix image secrets * Update airflow/kubernetes/pod_launcher.py Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
(cherry picked from commit f76936f)
* More fancy environment checking * fixup! More fancy environment checking (cherry picked from commit 88e5c35)
* Add redbubble link to Airflow merch * Update README.md Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> (cherry picked from commit 558be73)
…ache#10360) (cherry picked from commit b51de98)
(cherry picked from commit 485ecc4)
Part of apache#10368 (cherry picked from commit a32e90a)
…#10387) Extracted from apache#10368 (cherry picked from commit 306a666)
…pache#10158) (cherry picked from commit 3f972a3)
The output of pre-commit builds, both on CI and locally, is now limited to only show errors, unless the verbose variable is set. We utilise aliases where possible, but pre-commits run in a non-interactive shell where aliases do not work as expected, so we have to run a few functions directly in order to show the spinner. Extracted from apache#10368 (cherry picked from commit 77a635e)
The EMBEDDED dags were only really useful for testing, but they required customising the built production image (running with an extra --build-arg flag). This is not needed, as it is better to extend the image with FROM and add dags afterwards. This way you do not have to rebuild the image while iterating on it. (cherry picked from commit e179853)
This allows for all the kinds of verbosity we want, including writing outputs to output files, and it also works out-of-the-box in git-commit non-interactive shell scripts. Also as a side effect we have mocked tools in bats tests, which will allow us to write more comprehensive unit tests for the bash scripts of ours (this is a long overdue task). Part of apache#10368 (cherry picked from commit db446f2)
Part of apache#10368 (cherry picked from commit 08fe5c4)
…ache#10344) (cherry picked from commit 8fcb93b)
Previously it was failing with `unbound variable AIRFLOW_PROD_BASE_TAG` and failing because it could not find the "kind" binary (cherry picked from commit 5739ba2)
* CI Images are now pre-built and stored in registry

With this change we utilise the latest pull_request_target event type from GitHub Actions, and we build the CI image only once (per version) for the entire run. This saves from 2 to 10 minutes per job (!) depending on how much of the Docker image needs to be rebuilt. It works in such a way that the image is built only in the build-or-wait step. In case of direct push or scheduled runs, the build-or-wait step builds and pushes the CI image to the GitHub registry. In case of pull_request runs, the build-or-wait step waits until the separate build-ci-image.yml workflow builds and pushes the image, and only moves forward once the image is ready. This has numerous advantages:

1) Each job that requires the CI image is much faster, because instead of pulling + rebuilding the image it only pulls the image that was built once. This saves around 2 minutes per job in regular builds, but in case of Python patch-level updates or adding new requirements it can save up to 10 minutes per job (!)
2) While the images are being rebuilt we only block one job waiting for all the images. The tests start running in parallel only when all images are ready, so we are not blocking other runs from running.
3) The whole run uses THE SAME image. Previously we could have some variations, because the images were built at different times and releases of dependencies in-between several jobs could make different jobs in the same run use slightly different images. This is not happening any more.
4) When we push an image to GitHub or DockerHub we push the very same image that was built and tested. Previously it could happen that the image pushed was slightly different than the one that was used for testing (for the same reason).
5) Similar is the case with the production images. We now build and push consistently the same images across the board.
6) Documentation building is split into two parallel jobs - docs building and spell checking - which decreases elapsed time for the docs build.
7) Last but not least - we keep the history of all the images - those images contain the SHA of the commit. This means we can simply download and run the image locally to reproduce any problem that anyone had in their PR (!). This is super useful to be able to help others to test their problems.

* fixup! CI Images are now pre-build and stored in registry * fixup! fixup! CI Images are now pre-build and stored in registry * fixup! fixup! fixup! CI Images are now pre-build and stored in registry * fixup! fixup! fixup! CI Images are now pre-build and stored in registry (cherry picked from commit de7500d)
(cherry picked from commit e92d50e)
Recent releases of FAB and Celery caused our installation to fail. Luckily we have protection so that regular PRs are not affected, however we need to update the setup.py to exclude those dependencies that cause the problem. Those are: * vine - which is used by Celery Sensor (via kombu) - 5.0.0 version breaks celery-vine feature * Flask-OauthLib and flask-login - combination of the current requirements caused a conflict by forcing flask login to be 0.5.0 which is not compatible with Flask Application Builder (cherry picked from commit f76ab1f)
Snakebite's kerberos support relied on python-krbV, which has been removed from PyPI. It did not work completely anyway, due to snakebite not being officially supported in python3 (snakebite-py3 did not work with SSL, which made Kerberos pretty much unusable). This commit removes snakebite's kerberos support from setup.py, so that you can still install kerberos as an extra for other uses. (cherry picked from commit 35840ff)
(cherry picked from commit 86d8e34)
…he#10377) This cleans up the document building process and replaces it with breeze-only. The original instructions with `pip install -e .[doc]` stopped working so there is no point keeping them. Extracted from apache#10368 (cherry picked from commit 9228bf2)
Follow up after apache#10368 (cherry picked from commit 2c3ce8e)
Breeze failed after apache#10368 (cherry picked from commit dc27a2a)
A wrong if condition in the GitHub Action meant that the upgrade to latest constraints did not work for a while. (cherry picked from commit a34f5ee)
A problem was introduced in apache#11397 where a bit too many "Build Image" jobs are being cancelled by a subsequent Build Image run. For now it cancels all the Build Image jobs that are running :(. (cherry picked from commit 076fe88)
We have started to experience intermittent "unknown_blob" errors recently with the GitHub Docker registry. We might eventually need to migrate to GCR (which is eventually going to replace the Docker Registry for GitHub). A ticket is opened to the Apache Infrastructure to enable access to GCR and to make some statements about access rights management for GCR: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20959. A ticket to GitHub Support has also been raised about it, https://support.github.com/ticket/personal/0/861667, as we cannot delete our public images in the Docker registry. But until this happens, the workaround might help us handle the situations where we get intermittent errors while pushing to the registry. This seems to be a common error when an NGINX proxy is used to proxy the GitHub Registry, so it is likely that retrying will work around the issue. (cherry picked from commit f9dddd5)
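The retry-on-intermittent-failure workaround described above can be sketched as a generic helper. This is an illustrative sketch only — the `flaky_push` stand-in and the retry/backoff parameters are hypothetical, not the actual CI scripts:

```python
import time

def retry(action, attempts=3, delay=1.0, backoff=2.0, retry_on=(RuntimeError,)):
    """Run `action` up to `attempts` times, sleeping between tries.

    Re-raises the last error if every attempt fails -- the same pattern
    used to work around intermittent "unknown_blob" errors when pushing
    to the GitHub Docker registry.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff

# Hypothetical usage: a push that fails twice, then succeeds.
calls = {"n": 0}

def flaky_push():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("unknown_blob")
    return "pushed"

result = retry(flaky_push, attempts=5, delay=0.0)
```

With exponential backoff each wait doubles, which is usually enough for a proxy hiccup to clear without stalling the whole job.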
* Add capability of customising PyPI sources This change adds the capability of customising the installation of PyPI modules via a custom .pypirc file. This allows installing dependencies from an in-house, vetted registry of PyPI packages. (cherry picked from commit 45d33db)
The SHA of cancel-workflow-action in apache#11397 was pointing to previous (3.1) version of the action. This PR fixes it to point to the right (3.2) version. (cherry picked from commit 4de8f85)
* Modify helm chart to use pod_template_file Since we are deprecating most k8sexecutor arguments we should use the pod_template_file when launching airflow using the KubernetesExecutor * fix tests * one more nit * fix dag command * fix pylint (cherry picked from commit 56bd9b7)
…che#4751) This decreases scheduler delay between tasks by about 20%, sometimes more for larger or more complex DAGs. The delay between tasks can be a major issue, especially when we have DAGs with many subdags. We found that the scheduling process spends plenty of time in dependency checking: the trigger rule dependency used to call the db for each task instance, and we made it call the db just once for each dag_run. (cherry picked from commit 50efda5)
If you used the build context from the git repo, the .pypirc file was missing, and COPY in the Dockerfile is not conditional. This change copies the .pypirc file conditionally from the docker-context-files folder instead. It was also needlessly copied into the main image, where it is not needed, and it was even dangerous to do so. (cherry picked from commit 53e5d8f)
apache#11911 This makes for an easier idempotent create-empty-table workflow
Member
@shaneikennedy please work on the master branch. We do not maintain operators in the contrib module. See:
Force-pushed from c4c1cab to 91a1305
Force-pushed from 8bdd442 to 0122893
Contributor
Was this ever released in any other PR?
Member
I think even the eldest do not know after 3 years. But you have all the sources to look for it.
This PR addresses #11911 and adds support for replacing an existing table when using the BigQueryCreateEmptyTableOperator.
It is currently a WIP. Things that still need to be addressed:
There are two commits for now but once the points above are resolved I will squash everything to a single commit 👍
@manesioz I know you wanted to work on this one too so please feel free to help out and we can co-author this one!
@turbaszek I tried to follow the suggestion you left on the related issue, let me know what you think!
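A minimal sketch of the idempotent create-or-replace behaviour this PR is after, assuming a hypothetical `replace` flag — the `FakeBigQueryClient` and all method names here are illustrative stand-ins, not the operator's actual API:

```python
class FakeBigQueryClient:
    """In-memory stand-in for a BigQuery client, for illustration only."""

    def __init__(self):
        self.tables = {}

    def table_exists(self, table_id):
        return table_id in self.tables

    def delete_table(self, table_id):
        del self.tables[table_id]

    def create_table(self, table_id, schema):
        if table_id in self.tables:
            raise ValueError(f"table {table_id} already exists")
        self.tables[table_id] = {"schema": schema, "rows": []}

def create_empty_table(client, table_id, schema, replace=False):
    """Create an empty table; with replace=True, drop and recreate it,
    so re-running the task is safe (idempotent) instead of erroring."""
    if client.table_exists(table_id):
        if not replace:
            raise ValueError(f"table {table_id} already exists")
        client.delete_table(table_id)
    client.create_table(table_id, schema)

client = FakeBigQueryClient()
schema = [{"name": "id", "type": "INTEGER"}]
create_empty_table(client, "ds.t", schema)
# Re-running with replace=True succeeds instead of failing.
create_empty_table(client, "ds.t", schema, replace=True)
```

Without the flag, a second run of the same task fails on the existing table; with it, the operator can be retried or backfilled freely.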