Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude yarn.lock from release packages #16577

Merged
merged 1 commit into from
Jun 22, 2021

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Jun 22, 2021

Same as #16494 - However that PR had to be reverted in #16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel and tar.gz is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail.

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in to exclude from both tar.gz and wheel -- source and binary distributions


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@kaxil kaxil requested a review from ashb June 22, 2021 00:43
@boring-cyborg boring-cyborg bot added area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues labels Jun 22, 2021
@uranusjr
Copy link
Member

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in

This would also exclude the file from .tar.gz (aka sdist). Is this OK to do?

@uranusjr
Copy link
Member

I'm also curious why excluding the file from MANIFEST.in changes the wheel content. That file is for sdist inclusion and doesn't generally affect the wheel. There is something in the build steps I'm not understanding, probably.

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

@uraunsjr is right. MANIFEST.in is for sdist. For wheel package, we are including everything in the packages (yarn.lock is placed in airflow.www package (init.py file is in the same folder as yarn.lock). Therefore we need to exclude the file via exclude_package_data.

Closing in favour of #16580

@potiuk potiuk closed this Jun 22, 2021
@ashb ashb reopened this Jun 22, 2021
@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

Sorry for closing it. I believe #16580 is the right fix (including prod Dockerfile fix), feel free to either approve that one or incorporate here. I don't mind either (or if you find a better way of excluding the lock file only for whl package I am happy as well).

@kaxil kaxil changed the title Exclude yarn.lock from built Python wheel file Exclude yarn.lock from release packages Jun 22, 2021
@kaxil
Copy link
Member Author

kaxil commented Jun 22, 2021

@potiuk @uranusjr Did you guys actually try out? I had, unless I missed something (which is also possible :) )

The same link that you used Jarek explain the three options:

https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html

image

If a package has include_package_data as True, MANIFEST.in is also used. Also summarized somewhat in pypa/setuptools#511 (comment)

@potiuk I would want to continue with this PR and you can optimize it in yours. I have updated the PR title and description to say I want to exclude yarn.lock file from both tar.gz and .whl

airflow/www/compile_assets.sh Outdated Show resolved Hide resolved
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

@potiuk @uranusjr Did you guys actually try out? I had, unless I missed something (which is also possible :) )

Yep. I tried. I think the main point (and @urunsjr I believe agrees with me on that) is that we should NOT exclude yarn.lock from the sdist package. It's quite useful to have it ithere, because this will allow the users to repeatably build the package on their own using sources.

UPDATE: Stupid me. Mixed sources with sdist :(

@ashb
Copy link
Member

ashb commented Jun 22, 2021

If we include the compiled assets, then we should not include the yarn.lock.

Users don't rebuild form the sdist -- they rebuild from the source.

@kaxil
Copy link
Member Author

kaxil commented Jun 22, 2021

@potiuk @uranusjr Did you guys actually try out? I had, unless I missed something (which is also possible :) )

Yep. I tried. I think the main point (and @urunsjr I believe agrees with me on that) is that we should NOT exclude yarn.lock from the sdist package. It's quite useful to have it ithere, because this will allow the users to repeatably build the package on their own using sources.

Well that defeats the purpose of what this PR is solving, clair and trivy and other scanners will find it.

For ASF project, if someone wants to build from source they can use apache-airflow-VERSION-source.tar.gz, example https://dist.apache.org/repos/dist/release/airflow/2.1.0/apache-airflow-2.1.0-source.tar.gz

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

If we include the compiled assets, then we should not include the yarn.lock.

Users don't rebuild form the sdist -- they rebuild from the source.

For ASF project, if someone wants to build from source they can use apache-airflow-VERSION-source.tar.gz, example https://dist.apache.org/repos/dist/release/airflow/2.1.0/apache-airflow-2.1.0-source.tar.gz

Ah you are right. I forgot that in airflow sdist and sources are separate packages (in providers they are the same). You are right I withdraw my comment and close PR :)

@kaxil kaxil merged commit aa79bfe into apache:main Jun 22, 2021
@kaxil kaxil deleted the exclud-yarn-lock-file branch June 22, 2021 11:33
@uranusjr
Copy link
Member

uranusjr commented Jun 22, 2021

Thanks for the clarification. Excluding the file from sdist sounds reasonable to me from the justification above, so +1 to this PR on its own. The only remaining issue I have is why compile_assets.sh is currently included in wheels right now (it shouldn’t since it’s not listed in package_data), and why this PR can successfully exclude the file (it shouldn’t since MANIFEST.in is not supposed to affect wheel content without include_package_data set). There are probably some weird setuptools mechanisms I’m not understanding correctly, I will need to dig into it more.

Edit: Oh wait we do have include_package_data set. I missed that since that line was declared far away from package_data, and it is generally problematic to declare both package_data and include_package_data together so I naively assumed we didn’t have include_package_data = true when I saw package_data 🤦

I will probably work on cleaning up the configs then.

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

FYI the compile_assets.sh was not included. It was run during image build after the package has been installed. But it's now fixed.

@uranusjr
Copy link
Member

FYI the compile_assets.sh was not included.

Could you clarify this a bit? It’s not clear where compile_assets.sh was not included in from your sentence. (And compile_assets.sh is definitely included in wheels currently, I just downloaded the 2.1.0 wheel from PyPI.org, it contains airflow/www/compile_assets.sh.)

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

The error we had when running the compile assets was from the sources of airlflow. When we build the image from sources, we also run the compile script after running pip install .. This is only when you install airflow from sources (which happens in CI image and in Prod image when you build locally in breeze).

In CI, the images are build from packages generated using PR sources (because we want to run K8S tests using airflow production image build from PR sources AND we want to install prod image using packages similarly when we release the image).

So in CI those things happen:

  1. first I build airflow package (using sources from the incoming PR) - here assets are compiled during preparing the package
  2. then I build all provider packages (again using sources from the incoming PR)
  3. then PROD image is built using those locally built packages rather than PyPI ones (they are copied to docker-context-files and installed from there).

The problem was that during step 3, there was an extra step run to compile the assets again (that was a mistake - fixed in this PR). This was done using compile_assets scripts from docker folder: https://github.com/apache/airflow/blob/main/scripts/docker/compile_www_assets.sh . The "scripts/docker" folder contains all the scripts that are used by the Dockerfile build. It is not part of the package, and should not be added. So possibly this folder is added there by mistake :D. I don't think any of the scripts are needed in sdist package.

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

I hope it's clearer now :).

There are many things happening in our CI - and things are pretty complex: on one hand we want to use PR sources to verify if they work, on the other we want to build airflow from packages.

But when you want to test or run the PROD/CI image locally during development (for your tests) you cannot really go through the extra hoop of preparing the packages to install them because you want to mount the sources directly to the image in order to be able to modify the sources. Having this possibility of building the images either from packages or from sources makes it much easier to iterate over your tests inside the image without having to rebuild it every time - this is basically what breeze does - builds CI/PROD image locally in the way that you can mount the local sources and continue developing locally with mounted sources.

This saves A TON of time for iterations over your tests especially when you work with the integrations of Airflow (Kerberos/mongo/redis whatever) but also when you work with errors that manifest only for specific database versions - ability of having all that up-and-running and tests running with exactly the same configuration as on CI makes it super easy to reproduce any errors you see in CI as well (and you can even add --github-image-id parameter when your CI build fails and have the exact replication of what happened in CI locally - and you can iterate and reproduce the problems there.

@potiuk
Copy link
Member

potiuk commented Jun 22, 2021

Could you clarify this a bit? It’s not clear where compile_assets.sh was not included in from your sentence. (And compile_assets.sh is definitely included in wheels currently, I just downloaded the 2.1.0 wheel from PyPI.org, it contains airflow/www/compile_assets.sh.)

And yeah - this is a different 'compile assets" script - that's why it is included. This one is used when you are inside the docker container or in your local environment when you iterate over the .js files and want to recompile them. I tend to forget about this one because I almost never do that. For that much better option is start-airflow command in Breeze because it starts the development mode of the server which reloads and recompiles stuff dynamically when it's changed I believe (It was added by James some time ago I think).

It should probably be indeed excluded.

ashb pushed a commit that referenced this pull request Jun 22, 2021
Same as #16494 - However that PR had to be reverted in #16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
@ashb ashb added this to the Airflow 2.1.1 milestone Jun 22, 2021
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 22, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 23, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 8fcc68d)
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 23, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 8fcc68d)
(cherry picked from commit 99d5ebd)
Jorricks pushed a commit to Jorricks/airflow that referenced this pull request Jun 24, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
kaxil added a commit to astronomer/airflow that referenced this pull request Jul 14, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
potiuk added a commit to potiuk/airflow that referenced this pull request Jul 19, 2021
The apache#16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes apache#16939
potiuk added a commit that referenced this pull request Jul 19, 2021
The #16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes #16939
kaxil added a commit to astronomer/airflow that referenced this pull request Jul 20, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 6c80e3f)
kaxil added a commit to astronomer/airflow that referenced this pull request Jul 21, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 6c80e3f)
(cherry picked from commit 25d46e4)
kaxil added a commit to astronomer/airflow that referenced this pull request Jul 21, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 6c80e3f)
(cherry picked from commit 25d46e4)
kaxil added a commit to astronomer/airflow that referenced this pull request Jul 21, 2021
Same as apache#16494 - However that PR had to be reverted in apache#16518 as it failed building of PROD image, this PR/commit will fix it.

PROBLEM: Currently the airflow wheel is built with the yarn.lock which is not actually used by the airflow itself. Having this file in the docker image causes the clair and trivy scanners to fail

FIX: The fix is to exclude the yarn.lock by specifying it in the manifest.in
(cherry picked from commit aa79bfe)
(cherry picked from commit 6c80e3f)
(cherry picked from commit 25d46e4)
potiuk added a commit to potiuk/airflow that referenced this pull request Aug 2, 2021
…e#17086)

The apache#16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes apache#16939

(cherry picked from commit 660027f)
tzanko-matev pushed a commit to santiment/airflow that referenced this pull request Aug 17, 2021
…e#17086)

The apache#16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes apache#16939
kaxil pushed a commit that referenced this pull request Aug 17, 2021
The #16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes #16939

(cherry picked from commit 660027f)
jhtimmins pushed a commit that referenced this pull request Aug 17, 2021
The #16577 change removed yarn.lock from installed packages
and it removed the possibility of preparing assets after the
package is installed - so far that was the way it was done in
the PROD image built from sources. The asset compilation
was supposed to work after the change but it was not
performed in this case.

The change fixes it by:

* detecting properly if the PROD image is built from sources
  (INSTALLATION_METHOD)
* compiling the assets from sources, not from package
* installing airflow from sources AFTER assets were compiled

Fixes #16939

(cherry picked from commit 660027f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants