Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for .packignore file #210

Closed
neerfri opened this issue Jun 10, 2019 · 13 comments
Closed

Add support for .packignore file #210

neerfri opened this issue Jun 10, 2019 · 13 comments
Labels
status/not-planned Issue or PR that is not planned to be supported (yet).

Comments

@neerfri
Copy link
Contributor

neerfri commented Jun 10, 2019

Hi,

I'm interested in implementing a feature of .packignore file similar to .gitignore, .dockerignore, etc.

While researching for existing solutions I found:

  1. https://github.com/zabawaba99/go-gitignore - which doesn't seem to be very maintained nor popular. Don't want to bet on this.
  2. Docker's implementation in here and here. Which is not as easy to pull and use as I'd hoped, also it turns out .dockerignore is VERY different from .gitignore. Well, at least I learned something new. but seriously, I'm not so sure which one is better for pack to go with. It sits right in the middle between taking your code from git and sending it to docker :). Some people don't use git. Some people don't use docker too. If I'm thinking that pack is the direct continuation of "git-push-deploy" then the .gitignore should probably win.
  3. I've read git's implementation of .gitignore mostly here, here, here, and a few more, all in the same file. It doesn't seem like rocket science to implement. will require some quiet nights though.
  4. There's helm's implementation, it's kind of to lose from both ends, cause it's not 100% compatible with either git or docker, but it's probably for the simplicity of the implementation, so it might be a good call. It's also the one that is easiest to just use via import

I'd really appreciate input here to know where to spend my time.
Thanks in advance!

@sclevine
Copy link
Member

Hi @neerfri,

While I believe all the core contributors agree that this functionality is necessary, there are some things we need to consider before it's implemented.

  1. pack is a just a CLI for building on a local workstation that has Docker installed. The core tool used to implement CNB (the lifecycle) doesn't require a Docker daemon (or privileges, or nested containers) when run on a container platform like k8s. Any solution to this should not be pack-specific, and "pack" should not be in the filename.

    However, the functionality needs to be implemented outside of the lifecycle, as the lifecycle is a series of unprivileged processes that run entirely inside the container, and the app bits need to be available before the lifecycle is executed. I'd suggest implementing this as a cross-platform Go package in the lifecycle repository that creates a tar stream excluding the files specified in the ignore file(s). This binary would be careful to avoid copying the app files locally before streaming them. The pack CLI could import this library, but we should also offer a binary version for other platforms.

  2. We've discussed adding a project descriptor file (here and here). Does the ignore list belong in that file? Maybe not if we intend to support the file in subdirectories?

  3. Cloud Foundry's .cfignore and Heroku's .slugignore formats are also potentially worth evaluating.

  4. What files should we ignore by default? Example.

@neerfri
Copy link
Contributor Author

neerfri commented Jun 10, 2019

@sclevine thanks for your input.

pack is a just a CLI for building on a local workstation

This is actually why I thought that the .packignore is only relevant to pack. I assume that in a k8s the code gets to the builder in a volume, probably checked out to the volume from version control or archives, so .gitignore & friends already did their job. Is that assumption correct?

I'd suggest implementing this as a cross-platform Go package in the lifecycle repository that creates a tar stream excluding the files specified in the ignore file(s).

Sounds like a good idea.

  1. We've discussed adding a project descriptor file (here and here). Does the ignore list belong in that file? Maybe not if we intend to support the file in subdirectories?

I don't know how fast these are going to be implemented, I see no reason why they won't play nice together in the future. The app.toml can be used instead of the --ignore-file and maybe in that case change it's behavior to be based of the project.root.

  1. Cloud Foundry's .cfignore and Heroku's .slugignore formats are also potentially worth evaluating.

the .cfignore implementation looks very simple. which is great, From first glance it should be trivial to implement using this.
The idea of a generic .slugignore for all types of software that builds artifacts from whole directories is a great idea too. heroku's .slugignore does not support negated patterns, so it's probably not enough for pack's use case. heroku's use case was designed to go through git's ignore logic first, and then ignore more on top of it.

I guess that what we can learn from heroku's .slugignore is that .gitignore itself is not enough. For example, teams are checking in their tests to git, but might not want to add these files to the built artifact.

  1. What files should we ignore by default? Example.

I believe the default behavior should be:

  • if a .packignore (or other) file exists in build root: use it.
  • if the user specified a --ignore-file=path/to/ignore/from/cwd/or/absolute: use it.
  • if the user did not specify --ignore-file and .packignore does not exist: print warning

If our implementation is similar to git we can also default to pick up .gitignore by default.
In fact, as heroku's .slugignore shows, it's possible that many users will want to use both .gitignore and .slugignore/.packignore file together to prevent unneeded files from reaching the final image while keeping them in version control.

It's now more obvious why we should not be using the .dockerignore format, cause the whole point is that pack users are not using docker to build, so they don't need a .dockerignore

Summing it up, I think the best course of action is to start with a .packignore file using the implementation of .cfignore.

Another very simple idea that can spare us the fear of decision is to annotate the ignore file using:
# packignore v1 in the first block of comments in the file (before any patterns started).
This will allow us to introduce changes to the ignore format while maintaining backward compatibility.

Any objections?

/cc @djoyahoy @ekcasey @ameyer-pivotal

@sclevine
Copy link
Member

This is actually why I thought that the .packignore is only relevant to pack. I assume that in a k8s the code gets to the builder in a volume, probably checked out to the volume from version control or archives, so .gitignore & friends already did their job. Is that assumption correct?

I'm not sure if this assumption is correct. Consider that some cloud platforms support pushing source code from a workstation or from CI, without using VCS. For example, the v2 Java buildpack on CF requires you to provide a JAR. I think a unified ignore file (maybe .cnbignore) that ignores the same files on different platforms that support the same buildpacks could be more ideal (but I'm not sure).

I don't know how fast these are going to be implemented, I see no reason why they won't play nice together in the future. The app.toml can be used instead of the --ignore-file and maybe in that case change it's behavior to be based of the project.root.

Personally, I'm okay defining a separate ignore file that is unrelated to the app descriptor. We'll see how other core contributors feel.

I believe the default behavior should be:

  • if a .packignore (or other) file exists in build root: use it.
  • if the user specified a --ignore-file=path/to/ignore/from/cwd/or/absolute: use it.
  • if the user did not specify --ignore-file and .packignore does not exist: print warning

If our implementation is similar to git we can also default to pick up .gitignore by default.

I was really trying to ask what files should be ignored even when they aren't specified in .cnbignore. E.g., https://github.com/cloudfoundry/cli/blob/master/cf/appfiles/cf_ignore.go#L81-L86 are ignored by default, even if they aren't in .cfignore. I think there's a set of filenames we can safely ignore in all cases (and perhaps force users to whitelist if they really need them).

For an MVP of this feature, I'd just expect .cnbignore in the app directory or any subdirectory to be respected (no need for a flag).

Another very simple idea that can spare us the fear of decision is to annotate the ignore file using:
# packignore v1
in the first block of comments in the file (before any patterns started).
This will allow us to introduce changes to the ignore format while maintaining backward compatibility.

Not sure if this is necessary given the simplicity of the file format (standard .gitignore syntax), but if we can think of cases where the behavior of syntactically valid .cnbignore files could change, then definitely worth considering.

@neerfri
Copy link
Contributor Author

neerfri commented Jun 11, 2019

@sclevine

I like the .cnbignore idea.

Regarding the default values I tend to favor explicit solutions over easy-to-onboard ones and then make them easy to onboard. For this we can introduce a one-time call to pack init that will generate a default .cnbignore file. This will also come handy when buildpack.toml in the app folder happens.

I think it's nice to warn if I'm not using any ignore file. Many docker beginners I meet work without a .dockerignore file for a long time and don't understand why it's so slow. This helps users learn gradually they are doing something wrong.

The reason I got to this issue is that as part of our development we have a tmp/webserver.sock file in the project that was failing pack from archiving the directory (see #208). While investigating it I ran into additional problems, for example if a build log is being written to the app root directory (on the host running pack) the tar archive will also fail because the reported size from the os.Stat is smaller than the actual data read (new log lines are written all the time).

I'd just expect .cnbignore in the app directory or any subdirectory

Agree. If the file is there with the proper name, we just use it. we don't have to implement the --ignore-file right now.

Any subdirectory? this might make this a bit more complex than I imagined...

given the simplicity of the file format (standard .gitignore syntax)

After reviewing 7 different implementations of .xignores, non of them is the same 😄
We can always introduce a way to detect a newer version of the ignore format when we introduce the chnage.

@neerfri
Copy link
Contributor Author

neerfri commented Jun 11, 2019

Including git's documentation about .gitignore: https://git-scm.com/docs/gitignore

Also see an example of an issue from helm due to the fact .helmignore does not work like .gitignore: helm/helm#3622
It seems many users are looking for a whitelist solution, for better lockdown of files leaking into production images, something in the lines of:

/*
!/src/*
!/vendor/*
!/Depsfile.lock
...

The git documentation has a similar pattern explained.

@ekcasey
Copy link
Member

ekcasey commented Jun 11, 2019

@neerfri @sclevine

I think I am in favor of implementing .cnbignore as a separate file that follows .gitignore syntax b/c it seems like a familiar and battle hardened format. It is unfortunate that nobody has a easily importable version of this and that there is so much sprawl among .xignore implementations.

On the topic of printing a warning when there is no .cnbignore file, I wouldn't expect every user to want or need a .cnbignore file and therefore I am not inclined to print a warning. I understand the argument around discoverability but it is annoying to receive a bunch of warning when the tool is behaving as expected and has received valid inputs.

I am also in favor of having directories like .git ignored by default (it's hard to think of a use case where these should be included) so that the simplest case of pack build does a simple but sane thing by default and users can learn about features like the .cnbignore and the app descriptor file as they become relevant.

@neerfri
Copy link
Contributor Author

neerfri commented Jun 11, 2019

@ekcasey thanks for the feedback. I share your thoughts on the .xignore.
I'll try to work on a sane implementation that mimics the code in git's codebase and use the tests there to drive the implementation.
Regarding the warning, I think that's a UX/DX issue that can also be left for a future improvement in case users will have an issue discovering the .cnbignore file functionality. If they will discover this too late after their secrets are publish in public docker repositories I assume they will ask for a warning. Git for comparison shows you each an every file that goes in to the commit, so these surprises can also happen to fat finger scenarios.

Where we disagree is about the defaults. I feel very strongly that pack nor .cnbignore should have no defaults.

I think the main goal here is to make pack viable & sane. As of today pack will build my rails app locally along with its temporary cache files, the test.log and development.log and node_modules (each about 500MB). These are way worse than .git and can not be considered sane defaults.

The way around it right now is to do git archive | tar xv and than pack build. Doable, but not fun.

As a counter example to the one you provided, using the .git dir during build is a very common thing, at least among beginners & small projects. It is normally used to provide runtime information on the currently running code and sometimes the branch it was built from. Some even use the git command during runtime to answer all sorts of introspection questions about the current running code (for example last N commit messages).

Defaults bring many new problems, such as how do I undo the defaults? how do I know that there are defaults and what are they? Is addition/removal of a default ignore pattern a breaking change (because it is...)? What is the downside to have your .git dir in your build when you don't need it?

I think these defaults are better left for a future improvement after we see which issues are created due to the lack of defaults.

As of today Linus has found no need to add any defaults to .gitignore and I tend to value his choices :-)

@sclevine
Copy link
Member

sclevine commented Jun 11, 2019

As of today pack will build my rails app locally along with its temporary cache files, the test.log and development.log and node_modules (each about 500MB). These are way worse than .git and can not be considered sane defaults.

Developers who build in offline containers often intentionally push their node_modules or even npm cache directory. To me, this is about user expectations for how the tool should behave, not a language-specific set of defaults.

I feel like users expect VCS artifacts not to be included in their build. I could certainly be wrong about this, but as far as I'm aware, excluding those artifact has never caused confusion on Cloud Foundry. Additionally, .git can be very large if you have lots of history, if you vendor your dependencies, or if you use git to track binary artifacts.

how do I undo the defaults?

Whitelist with !.

What is the downside to have your .git dir in your build when you don't need it?

In addition to the size, you could accidentally leak private source code into a public image.

As of today Linus has found no need to add any defaults to .gitignore and I tend to value his choices :-)

Except .git, of course 😄

Curious about what @hone, @jkutner, and @nebhale think.

@neerfri
Copy link
Contributor Author

neerfri commented Jun 12, 2019

Developers who build in offline containers often intentionally push their node_modules or even npm cache directory. To me, this is about user expectations for how the tool should behave, not a language-specific set of defaults.

Agree 100%. This is exactly what I was trying to say. This is also the reason why I think every codebase using pack should have a .cnbignore file. In the same way that every codebases that builds with docker should have a .dockerignore (for a sane ADD . .). This is also why I think pack init is a good direction. it shows me (via git status) that I should see this file. It allows pack to suggest sane defaults by using a common template embedded in the pack binary and refer to common ignore patterns via commented out lines.

In addition to the size, you could accidentally leak private source code into a public image.

That's true! You certainly convinced me it should be a pattern in the .cnbignore file generated by pack init :-D

Except .git, of course 😄

I know it was a juke. But it's a sneaky one cause .git is not really ignored. It is the singularity :-D

@jkutner
Copy link
Member

jkutner commented Jun 12, 2019

There’s a lot going on here, and it’s hard to distill what is being proposed. I’d like to see an RFC that formalizes things and includes the rational.

fwiw, I’d prefer ignore patterns in {app|project|project}.toml over a new file.

@jromero jromero added this to Icebox in Planning Board Aug 27, 2019
@josegonzalez
Copy link

There seems to be a golang-based implementation of gitignore here, which could be useful if implementing this.

I also agree that this should be in an existing toml file. Platforms such as Heroku or CF might generate that toml file based on other files in the app, and would likely include support for the existing .cfignore or.slugignore files as part of their own platform. Thats at least the tactic that Dokku will take.

@jkutner
Copy link
Member

jkutner commented Oct 23, 2019

Our hope is to resolve this by implementing buildpacks/rfcs#25. If that RFC is accepted and implemented, then I suspect we will not adopt a .packignore file.

@jromero jromero added the status/triage Issue or PR that requires contributor attention. label Feb 5, 2020
@natalieparellano natalieparellano added status/not-planned Issue or PR that is not planned to be supported (yet). and removed status/triage Issue or PR that requires contributor attention. labels Feb 5, 2020
@natalieparellano
Copy link
Member

Closing this issue as the pain point should be addressed with buildpacks/rfcs#25.

Planning Board automation moved this from Icebox to Ready to Accept Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/not-planned Issue or PR that is not planned to be supported (yet).
Projects
No open projects
Planning Board
  
Ready to Accept
Development

No branches or pull requests

7 participants