Conversation

@potiuk potiuk commented May 6, 2021

<!--

Thank you for contributing! Please make sure that your code changes
are covered with tests. And in case of new features or big changes
remember to adjust the documentation.

Feel free to ping committers for the review!

In case of existing issue, reference it using one of the following:

closes: #ISSUE
related: #ISSUE

How to write a good git commit message:
http://chris.beams.io/posts/git-commit/
-->


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:dev-tools provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers area:Scheduler including HA (high availability) scheduler provider:Apache labels May 6, 2021
Daniel Standish and others added 25 commits May 6, 2021 16:10
)

After PROD images were added, some of the flags had two meanings.

These behaved differently in the PROD image and the CI image and were
a source of confusion, especially when the start-airflow command was
used.

For the PROD image, the image can be customized during image building,
and packages can be installed from .whl or .sdist packages available
in `docker-context-files`. This is used at CI and DockerHub build time
to produce an image from packages that were prepared using local
sources.

The CI image is always built from local sources, but Airflow can be
removed and re-installed at runtime from PyPI. Both Airflow and
provider packages can be installed from .whl or .sdist packages
available in the dist folder. This is used in CI to test current
provider packages against an older released Airflow (2.0.0) and to
test provider packages locally.

After the change we have two sets of flags/variables:

PROD image (building image):

* install-airflow-version, install-airflow-reference,
  install-from-docker-context-files

CI image (runtime):

* use-airflow-version, use-packages-from-dist

That should avoid confusion and failures of commands such as
`start-airflow`, which is used to test provider packages and
Airflow itself.
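
For illustration, a hedged sketch of how the two sets of flags might be
used with Breeze; the flag names come from the description above, while
the exact commands and version numbers are illustrative:

```
# PROD image: flags applied while building the image (illustrative)
./breeze build-image --production-image --install-airflow-version 2.0.2
./breeze build-image --production-image --install-from-docker-context-files

# CI image: flags applied at runtime, e.g. when testing providers
./breeze start-airflow --use-airflow-version 2.0.2 --use-packages-from-dist
```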

(cherry picked from commit 36ba5b6)
This build is not really needed any more; gathering stats
about quarantined builds was not a very successful experiment.

(cherry picked from commit 63bec6f)
Since 2.0.2 was released yesterday, our guides and Breeze should point
to that.

(cherry picked from commit b314c71)
* Fixes constraint generation for PyPI providers

The constraints generated from the PyPI version of providers missed
the core requirements of Airflow, so the constraints were not
consistent with the setup.py core requirements.
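
For context, a sketch of how the generated constraints are typically
consumed when installing Airflow; the constraints URL pattern follows
the standard installation docs, and the Python version in the file name
is illustrative:

```
# Install Airflow (and providers) pinned to the published constraints
pip install "apache-airflow==2.0.2" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.8.txt"
```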

Fixes: apache#15463
(cherry picked from commit 5da74f6)
There are a number of places where we want the current Airflow version
to appear in the docs, and Sphinx has this built in: `|version|`.

But sadly that only works for "inline text"; it doesn't work in code
blocks or inline code. This PR also adds two custom plugins that make
this work, inspired by
https://github.com/adamtheturtle/sphinx-substitution-extensions (but
entirely re-written, as that module Just Didn't Work).

(cherry picked from commit 4c8a32c)
`ou` -> `you`

(cherry picked from commit 150f225)
…che#15438)

The Dockerfile is more "packed" and certain ARGs/ENVs are in separate
parts of it, but we save minutes in certain scenarios when the images
are built (especially when they are built in parallel, the
difference might be significant).

This change also removes some of the old, already unused CASS_DRIVER
ARGs and ENVs. They are not needed any more, as the Cassandra drivers
no longer require CPython compilation.

(cherry picked from commit 043a88d)
In most cases these are the same -- the one exception is when
(re)opening an issue, in which case the actor is going to be someone
with commit rights to a repo, and we don't want the mere act of
re-opening to cause a PR to run on self-hosted infrastructure as that
would be surprising (and potentially unsafe)

(cherry picked from commit be8d2b1)
* Use Pip 21.* to install airflow officially

PIP 20.2.4 was so far the only officially supported installation
mechanism for Airflow, as there were some problems with conflicting
dependencies (which were ignored by previous versions of PIP).

This change attempts to solve this by removing the [gcp] extra
from `apache-beam`, which turns out to be the major source of
the problem, as it contains requirements for old versions of the
Google client libraries (but is apparently only used for tests).

The "apache-beam" provider might, however, need the [gcp] extra
for other components, so in order not to break backwards
compatibility, another approach is used.

Instead of adding [gcp] as an extra on the apache-beam requirement,
the apache.beam provider's [google] extra is extended with an
additional 'apache-beam[gcp]' requirement, so that whenever the
provider is installed with that extra, apache-beam with the [gcp]
extra is installed as well.
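
A hedged illustration of the resulting behaviour; the distribution name
follows the standard apache-airflow-providers-* naming and the exact
version pins are omitted:

```
# Installing the apache.beam provider with its [google] extra now also
# pulls in apache-beam together with its [gcp] extra
pip install "apache-airflow-providers-apache-beam[google]"
```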

* Update airflow/providers/apache/beam/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/apache/beam/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/google/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Update airflow/providers/google/CHANGELOG.rst

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit e229f35)
)

The release manager, when reviewing providers to release, can make
interactive decisions about what to do:

1) mark a certain provider change as 'doc-only'
2) decide whether to generate documentation for the provider

In case the provider change is marked as 'doc-only', the next time
providers are checked the doc-only change is not seen as a
'change' and the provider is automatically skipped.

This saves time when preparing subsequent releases of providers,
as all the "doc-only changes" from the previous release do not
have to be re-reviewed (unless there are some new changes).

(cherry picked from commit 40a2476)
Newer versions of hadolint hint about more Docker problems:

* consecutive RUN operations
* invalid labels

This PR fixes all the problems reported in our Dockerfiles
by the latest hadolint and refreshes all our images used in CI
and the chart so that corrected label names are included (one of
the errors in all our Dockerfiles turned out to be camel-case
and '-' characters in label keys, which are not valid according to
the Docker label key specification).
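
A minimal sketch of running hadolint locally; the file names are the
conventional ones in the Airflow repository and may differ:

```
# Lint the PROD and CI Dockerfiles with the latest hadolint
hadolint Dockerfile Dockerfile.ci
```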

Fixes: apache#15544
(cherry picked from commit 6580a2c)
This change improves the process of image preparation in DockerHub
and manual version of it, in case the DockerHub automation does
not work. It introduces the following changes:

* The "nightly-master" builds were failing because they tried
  to prepare packages without the "dev" suffix (such packages
  are skipped now in case package with the same version has
  already been released). The "dev" suffix forces the packages
  to be build.

* The VERBOSE_COMMAND variable is removed to get more readable output
  of the script.

* Image verification is now part of the process. The images are
  automatically tested after they are built and the scripts
  will not push the images if the images do not pass the
  verification.

* Documentation is updated for both RC and final image preparation
  (the previous update did not cover the RC image preparation).

* Documentation is added to explain how to manually refresh the
  images in DockerHub in case the nightly builds are not running
  for a long time.

(cherry picked from commit 7f6ddda)
Image tagging is now fully automated within the DockerHub build
script, including the :<VERSION> and :latest tags.

(cherry picked from commit 3d227f2)
When building images for production we use docker-context-files,
where we build packages to install. However, if those context files
are not cleaned up, they unnecessarily increase the size and time
needed to build the image and they invalidate the `COPY .` layer of
the image.

This PR checks that the docker-context-files folder contains just the
README when the Breeze build-image command is run (for cases where
images are not built from docker-context-files). Conversely, it
also checks that there are some files in case the image is
built with the --install-from-docker-context-files switch.

This PR also adds a --cleanup-docker-context-files switch to
clean up the folder automatically. The error messages also
instruct the user what to do.
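
A sketch of using the new switch, assuming the flag spellings quoted
above; other build options are omitted:

```
# Clean up leftover packages in docker-context-files before building
./breeze build-image --production-image --cleanup-docker-context-files
```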

(cherry picked from commit bf81d2e)
…he#15592)

* Better description of UID/GID behaviour in image and quickstart

Following the discussion in
apache#15579
it seems that the AIRFLOW_UID/GID parameters were not clearly
explained in the Docker Quick-start guide and some users could
find them confusing.

This PR attempts to clarify it.
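
For illustration, a hedged sketch of setting these parameters for the
docker-compose quick start; the .env approach mirrors the quick-start
guide and AIRFLOW_GID=0 is the commonly documented default:

```
# Create a .env file next to docker-compose.yaml so that files created
# in mounted volumes are owned by the host user
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
```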

* fixup! Better description of UID/GID behaviour in image and quickstart

(cherry picked from commit 4226f64)
Error because `webpack` is not installed, since `yarn install --frozen-lockfile` is not run:

```
root@f5fc5cfc9a43:/opt/airflow# cd /opt/airflow/airflow/www/; yarn dev
yarn run v1.22.5
$ NODE_ENV=dev webpack --watch --colors --progress --debug --output-pathinfo --devtool eval-cheap-source-map -
-mode development
/bin/sh: 1: webpack: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
root@f5fc5cfc9a43:/opt/airflow/airflow/www#
```

This commit adds `yarn install --frozen-lockfile` to the command, which fixes it.

This was missed in https://github.com/apache/airflow/pull/13313/files
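
A sketch of the fixed sequence, using the same paths as in the error
output above:

```
# Install the pinned JS dependencies first, then start webpack in watch mode
cd /opt/airflow/airflow/www
yarn install --frozen-lockfile
yarn dev
```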

(cherry picked from commit 60a3da6)
…nks` (apache#15673)

Without this change it is impossible for one of the providers to depend
upon the "dev"/current version of Airflow -- pip would instead try to
go out to PyPI to find the version (which almost certainly won't exist,
as it hasn't been released yet).

(cherry picked from commit 13faa69)
* Rename nteract-scrapbook to scrapbook

* fixup! Rename nteract-scrapbook to scrapbook

* Remove version pin given it's the minimal version

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>

Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>
(cherry picked from commit 9ba467b)
Deprecated provider aliases (e.g. kubernetes -> cncf.kubernetes) should
install the provider package (e.g. apache-airflow-providers-cncf-kubernetes)
by default, not the requirements for the provider package. This behavior
was accidentally broken.
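
A hedged illustration of the intended behaviour of the deprecated alias
extra; the package and extra names follow the standard
apache-airflow-providers-* conventions:

```
# The deprecated 'kubernetes' extra should pull in the cncf.kubernetes
# provider package, equivalent to the new-style extra below
pip install "apache-airflow[kubernetes]"
pip install "apache-airflow[cncf.kubernetes]"
```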

(cherry picked from commit fdea622)
It seems that the < 20.0 limit for gunicorn was added at some point
in time without an actual reason. We are already using gunicorn in
the 1.10 line of Airflow, so it should not be a problem to bump the
version of gunicorn, especially since the 19.x line is somewhat
deprecated already.

This change came after the discussion in apache#15570.

(cherry picked from commit d7a14a8)
@potiuk potiuk force-pushed the prepare-2.0.3-release branch from 361869f to bf2f4ce Compare May 6, 2021 14:11
@potiuk potiuk force-pushed the prepare-2.0.3-release branch from 953eac1 to 7276123 Compare May 6, 2021 19:01
@potiuk potiuk closed this May 6, 2021
@potiuk potiuk reopened this May 6, 2021
@potiuk potiuk closed this May 6, 2021
@potiuk potiuk deleted the prepare-2.0.3-release branch July 29, 2022 20:08
