Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Opt-in Layer Caching #99

Merged
merged 3 commits into from
Aug 11, 2020
Merged

RFC: Opt-in Layer Caching #99

merged 3 commits into from
Aug 11, 2020

Conversation

sclevine
Copy link
Member

@sclevine sclevine commented Jul 27, 2020

Signed-off-by: Stephen Levine <stephen.levine@gmail.com>
@sclevine sclevine requested a review from a team as a code owner July 27, 2020 04:43
text/0000-opt-in-layer-caching.md Outdated Show resolved Hide resolved
[alternatives]: #alternatives

- Keep current behavior.
- Rename the layer directory to `<layers>/<layer>.restore/` in addition to renaming the `<layers>/<layer>.toml` files to `<layers>/<layer>.toml.restore`. This would allow the lifecycle build phase to ignore the layers instead of deleting them, (which may be desirable for performance). The rename would have to apply to both files, given that `launch = true` + `cache = false` layers don't have a `<layers>/<layer>` directory that is restored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the alternative toml filename meant to be this instead?

Suggested change
- Rename the layer directory to `<layers>/<layer>.restore/` in addition to renaming the `<layers>/<layer>.toml` files to `<layers>/<layer>.toml.restore`. This would allow the lifecycle build phase to ignore the layers instead of deleting them, (which may be desirable for performance). The rename would have to apply to both files, given that `launch = true` + `cache = false` layers don't have a `<layers>/<layer>` directory that is restored.
- Rename the layer directory to `<layers>/<layer>.restore/` in addition to renaming the `<layers>/<layer>.toml` files to `<layers>/<layer>.restore.toml`. This would allow the lifecycle build phase to ignore the layers instead of deleting them, (which may be desirable for performance). The rename would have to apply to both files, given that `launch = true` + `cache = false` layers don't have a `<layers>/<layer>` directory that is restored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I have a strong opinion, but it seems slightly more complex to rename .restore.toml to .toml vs. removing the .restore suffix.

Maybe the right question is: do we intended for the file to be restored before it's used as a TOML file, or read/modified before it's restored?

Copy link
Contributor

@edmorley edmorley Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sclevine Ah to clarify, I'd interpreted the "in addition" in:

Rename the layer directory to <layers>/<layer>.restore/ in addition to renaming the

...to mean "this alternative differs in two ways from the original proposal" (both renaming the layer directory and renaming the toml file), rather than "this alternative differs in one way, and the rest is the same".

And so thought the <layers>/<layer>.toml.restore was a typo.

Reading it back now I think the current wording is perhaps fine, I'd just misinterpreted it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Stephen Levine <stephen.levine@gmail.com>

Co-authored-by: Ed Morley <501702+edmorley@users.noreply.github.com>
[alternatives]: #alternatives

- Keep current behavior.
- Rename the layer directory to `<layers>/<layer>.restore/` in addition to renaming the `<layers>/<layer>.toml` files to `<layers>/<layer>.toml.restore`. This would allow the lifecycle build phase to ignore the layers instead of deleting them, (which may be desirable for performance). The rename would have to apply to both files, given that `launch = true` + `cache = false` layers don't have a `<layers>/<layer>` directory that is restored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the builder ignore these layers based on the name of the toml file <layers>/<layer>.toml.restore vs <layers>/<layer>.toml without require a rename of the directory as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nebhale
Copy link
Contributor

nebhale commented Jul 27, 2020

An alternative strategy: all <layer>.toml restoration happens as it always has with the exception that none of launch, build, or cache keys is ever restored. This would imply that any untouched <layer>.toml file, behaves as the rest of the spec expects; with none of those keys set, the layer is ignored from all persistence. Opting to keep the layer could be a one-liner ($ printf "\nlaunch = true\n >> <layer>.toml). The usual suspect libraries likely already do this (re-write the metadata with the proper keys) and if not, can be update to hide this behavior from users.

@nebhale nebhale requested a review from a team July 27, 2020 17:24
@sclevine
Copy link
Member Author

@nebhale I like that idea, seems very clean. The builder would still have to remove layers though, due to: https://github.com/buildpacks/rfcs/pull/99/files#diff-3ce8e159528f9d13f76a83a12221fed0R29

@nebhale
Copy link
Contributor

nebhale commented Jul 27, 2020

Agreed on the deletion. If not persisting, delete it instead rather than just moving on as a no-op.

@sclevine
Copy link
Member Author

@nebhale Idea: what if the builder moved <layers>/<layer> to <layers>/<layer>.disabled when build = false? That would be maximally performant and solve all linkage edge cases (including some edge cases that exist today).

@nebhale
Copy link
Contributor

nebhale commented Jul 27, 2020

@sclevine I mean, if it's spec'd someone will exploit it, but I get the point and it seems like a good idea.

@sclevine
Copy link
Member Author

Not worried about intentional exploitation of anything, just edge cases.

For example, imagine you have a build = true + launch = true layer foo that sends a path to a subsequent buildpack via an environment variable, and the subsequent buildpack encodes that path in a layer bar that depends on foo.
If foo becomes build = false + launch = false on the next build, bar would still appear to work during build, even though it won't have access to the layer at launch. Renaming foo to foo.disabled during build would prevent this edge case.

Signed-off-by: Stephen Levine <stephen.levine@gmail.com>
@sclevine
Copy link
Member Author

Proposal updated to reflect @nebhale's suggested UX. I'm convinced that it's the best path forward, especially because it forces the buildpack to restate exactly how the layer should be cached/exposed/exported each time.

@nebhale
Copy link
Contributor

nebhale commented Jul 28, 2020

@sclevine My point was that I'm pretty sure someone will eventually call /layers/paketo-buildpacks_bellsoft-liberica/jdk.disabled/bin/javac because it's specified and predictable. This is not a reason to not do it, just one of those things I'm sure we'll see 🙄

@nebhale nebhale requested a review from a team July 28, 2020 15:02
@nebhale nebhale requested a review from a team July 28, 2020 15:04
@nebhale nebhale requested a review from a team August 3, 2020 14:37
@nebhale
Copy link
Contributor

nebhale commented Aug 3, 2020

Final Comment Period with merge disposition, closing on August 10, 2020.

@hone
Copy link
Member

hone commented Aug 5, 2020

I understand why we want this, but it definitely feels like an advanced feature for people who maintain more complex buildpacks. The current default is probably what most people expect. If we move through with this, it would definitely require doc changes.

@nebhale
Copy link
Contributor

nebhale commented Aug 5, 2020

@hone I wonder about the default being what they want though. In my experience proper generating and comparing enough metadata to properly determine if a layer is reused is surprisingly difficult. For example if you wanted to determine if you could re-use a compiled Java application, you'd have to compare

  1. Fingerprints of all source files
  2. The version of Java
  3. The version of Maven
  4. The arguments passed to Maven
  5. The environment variables that Maven is exposed to during execution
  6. The fingerprint of a settings.xml file that might optionally be provided.

What I suspect is happening is that many builds are re-using layers that shouldn't be reused out in the edge cases and users, rather than opening issue about too-aggressive caching just clear the application's entire cache and lose all the value of caching for the bits that were working correctly. Given that opt-in, even for simple bash-based buildpacks is a one-liner ($ printf "\nlaunch = true\n >> <layer>.toml) I think the value is pretty stacked towards the upside.

nebhale added a commit that referenced this pull request Aug 11, 2020
[#99]

Signed-off-by: Ben Hale <bhale@vmware.com>
@nebhale nebhale merged commit 69d685b into main Aug 11, 2020
@nebhale nebhale deleted the opt-in-layer-caching branch August 11, 2020 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants