Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to composer.json to remove files on production #5367

Closed
mlocati opened this issue May 23, 2016 · 56 comments
Closed

Add an option to composer.json to remove files on production #5367

mlocati opened this issue May 23, 2016 · 56 comments

Comments

@mlocati
Copy link
Contributor

mlocati commented May 23, 2016

Archives automatically generated by GIT (eg https://github.com/symfony/symfony/archive/v3.0.6.zip ) contain a copy of the repository.

These archives may contain tests/examples/docs that are not necessary on production machines (and could potentially lead to security issues).

Developers may tell git to exclude files/folders via an export-ignore option in the .gitattributes file.
BTW, someone may still want docs/tests included in packages during the development phase (that's for instance the main reason why big projects like Symfony are keeping tests/docs in packages).

What about improving Composer to handle this problem?

I mean, what I'm thinking about is to add something like this to to composer.json:

{
    "name": "symfony/class-loader",
    "type": "library",
    "devFiles": [
        "/Tests",
        "/CHANGELOG.md",
        "/phpunit.xml.dist",
        ".git*"
    ]
}

We could then add to composer a new option (eg --no-dev-files): if this option is specified, Composer should delete the files/directories that satisfy the rules specified in the devFiles section of composer.json.

This approach would make everyone happy...

@alcohol
Copy link
Member

alcohol commented May 23, 2016

First of all, composer should not be run on production (note, should not, not cannot).

Second, there have been many requests for options similar to .gitattributes export-ignore, etc. Composer is not going to provide this. This should be a part of your packaging process/job.

@alcohol alcohol closed this as completed May 23, 2016
@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

@alcohol @kamilsk I think you misunderstood what I mean.

Let me explain the origin of this issue.

concrete5 uses a lot of composer packages. When building a new release archive to be downloaded by end-users, we run a composer install.

Doing so, we have a lot of unnecessary files (tests, docs, examples, ...) that increase the size of the archive, and it may results in some problems (security issues, version disclosures, ...).
So, since currently there's no automatic way to determine the production-only files, we need to manually parse all the files and directories of all the composer packages, trying to understand which are the unnecessary ones (see our cleanup-script).
This manual operation must be done every time a composer package gets updated.
Furthermore this is somehow dangerous: we need to deeply analyze every package, to determine the files to be removed.

Ideally, package maintainers could add export-ignore to the files that are not required for production, but many package developers don't want to do it because using git clone may be really slow and because they want tests to be run even with a composer install.

So, at the moment there's basically no way to tell which are the production-only files.

What I'm suggesting here is a solution of this problem.

Do you see any alternative?

@kamilsk
Copy link

kamilsk commented May 23, 2016

@mlocati, you can create plugin and use extra for this purpose. In any case, you will request the vendors of the packages to add a list of dev files. Whether it's devFiles or extra:devfiles, this list will need to fill out.

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

you can create plugin and use extra for this purpose

I think that if composer would officially adopt a solution like this, the maintainers of all the packages would be more positive about a request to keep such devFiles entries updated.

That's an issue that many have... For instance, here's a list of Symphony-related stuff:

As you can see, it's a very recurring issue, and an "official" solution could be appreciated by a lot of people...

@alcohol
Copy link
Member

alcohol commented May 23, 2016

It's only an issue to people making an issue out of it :-)

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

It's only an issue to people making an issue out of it :-)

...as everything 😉

@kamilsk
Copy link

kamilsk commented May 23, 2016

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

One NoToast exception may be fake, many of them could be a signal that something should be taken in account

@kamilsk
Copy link

kamilsk commented May 23, 2016

I think that if composer would officially adopt a solution like this

I don't think so. After that we have two options to resolve similar problem: .gitattributes and composer.json:devFiles. But first like "pre-hook" (remove before archiving), and second like "post-hook" (remove after delivering).

Do you know that the symfony maintainers ready to support this?

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

After that we have two options to resolve similar problem: .gitattributes and composer.json:devFiles. But first like "pre-hook" (remove before archiving), and second like "post-hook" (remove after delivering).

.gitattributes has already been taken in account, and discarded.

Do you know that the symfony maintainers ready to support this?

I don't know... Let's see what @fabpot @webmozart @nicolas-grekas @Tobion think about this devFiles alternative to .gitattributes...

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

PS: composer itself too has the need to strip out development stuff (see here)... In the case of composer it's easy to keep the list of production-useless stuff up to date.
In other projects it's quite hard (for concrete5 we have 100+ composer packages).

@alcohol
Copy link
Member

alcohol commented May 23, 2016

@mlocati but that is part of our packaging process/job (or "compile" step, if you prefer). It is maintained by us. This is not a Composer feature :-)

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

@mlocati but that is part of our packaging process/job (or "compile" step, if you prefer). It is maintained by us. This is not a Composer feature :-)

@alcohol That's the problem that my proposal tries to fix!

You (composer) have to keep the list of stuff to be excluded from .phar, I have to keep the list of stuff to be removed from our distribution .zip archive, ... It's a problem that every project has (or should have) when generating the final distribution files.

Imagine if you simply use something like php bin/composer install -q --no-dev --remove-dev here: it would simplify the packaging process a lot.
In your case it's simple (include .php files only, exclude tests dirs), in other big projects it's a hard, manual, recurring, time-expensive and error-proning process.

@alcohol
Copy link
Member

alcohol commented May 23, 2016

@mlocati what problem? I don't see a problem.

Composer has nothing to do with our compile/build step.

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

@mlocati what problem? I don't see a problem.

As I described above, the problem is to strip out files that are not needed for the production. When building composer, it's an easy step (keep only .php files, remove test dirs). For big projects it's a problem: we need to parse every package and determine the list of production-useless stuff.

Composer has nothing to do with our compile/build step.

In your compile step you have to keep the list of stuff to be excluded from .phar.
For concrete5 I have to keep the list of stuff to be removed from our distribution .zip archive.
What I'm suggesting here is a way to automatize this hard, manual, recurring, time-expensive and error-proning process.

@alcohol
Copy link
Member

alcohol commented May 23, 2016

I think you misunderstood. I don't consider it a problem, period.

Also, implementing your proposal in Composer would not change our compile/build process at all since we do not use composer for this.

I can understand that your use-case is more complicated. But I simply do not agree that this is actually an issue that warrants attention/time. Storage space is negligible in this day and age. The cost (time, resources) it would require to implement your suggestion (and get everyone onboard to use it) would far exceed any gains it might provide.

Anyway, unsubscribing from this topic as I (personally) have zero further interest in discussing this.

@mlocati
Copy link
Contributor Author

mlocati commented May 23, 2016

we do not use composer for this.

You do: https://github.com/composer/composer/blob/1.1.1/bin/compile#L14

@Seldaek
Copy link
Member

Seldaek commented May 24, 2016

@mlocati IMO if someone builds a plugin that works well it could well become a community standard that most packages accept and just add their data in the extra key.. We can't solve everything in composer itself, because not everybody agrees how things should be done, and because our time to maintain things is finite.

@kamilsk
Copy link

kamilsk commented May 24, 2016

@Seldaek, @mlocati I can try

@kamilsk
Copy link

kamilsk commented May 24, 2016

Potential syntax looks like this https://github.com/kamilsk/Common/blob/3.x/composer.yml#L75-L78
Project will be located here https://github.com/octolab/Cleaner/blob/master/package.meta

@mlocati, any suggestions?

@mlocati
Copy link
Contributor Author

mlocati commented May 24, 2016

@Seldaek Thanks for your clean and wise answer

@kamilsk Great!
For the dev-files option I'd adopt a pattern syntax compatible with the one of .gitattributes. It's well known and people that will want to switch from .gitattributes/export-ignore to composer.json/dev-files will have to do less work.
There's one more improvement I'm thinking of.
Let's take for instance doctrine/dbal.
It may be used via command line (executing the command in its bin directory) as well as a PHP-library only.
In some projects, its command line version may not be used (so it could be removed), in other projects this CLI command is required.
What about adding some details about the files that could be removed?
I mean, something like this:

{
    "devFiles": {
        "docs": ["/CHANGELOG.md"],
        "tests": ["/Tests", "/phpunit.xml.dist"],
        "bin": ["/bin"],
        "others": ".git*"
    }
}

That way, people that may want to keep the "bin" could delete all the files except the "bin" ones.
PS: It's just a raw and odd idea 😉

@kamilsk
Copy link

kamilsk commented May 24, 2016

@mlocati, I think you can close this issue, I open related here octolab/cleaner#1 and try to complete all works on MVP until 31 May https://github.com/octolab/Cleaner/milestones

@mlocati
Copy link
Contributor Author

mlocati commented May 24, 2016

@mlocati, I think you can close this issue

@alcohol closed this (even before understanding the issue)

@kamilsk
Copy link

kamilsk commented May 24, 2016

yep, my mistake

@alcohol
Copy link
Member

alcohol commented May 24, 2016

@mlocati I simply do not agree it is an issue. Please do not speak on my behalf. :-)

@mlocati
Copy link
Contributor Author

mlocati commented May 24, 2016

@mlocati I simply do not agree it is an issue.

Maybe we have some language barrier here (obviously on my side): for me a GitHub "issue" is not the same as "bug", it's also something that's worth discussing about.

Please do not speak on my behalf. :-)

Sorry if I somehow offended you. The problem is that seeing an issue closed before a deeper discussion (like we've had after you closed this issue) is somehow hurting. It's like if you told me "shut out, don't say stupid things". GitHub is a collaborative place, where we all (love to) speak about software and the ways to improve it, isn't it?

@alcohol
Copy link
Member

alcohol commented May 24, 2016

@mlocati I am simply saying that I do not agree that there is a "problem" to solve here. Composer is a dependency manager. Not a packaging tool.

@ivastly
Copy link

ivastly commented Jun 6, 2017

@alcohol could you please clarify (provide a link or keyword), how exactly it is possible as a part of integration pipeline?

@alcohol
Copy link
Member

alcohol commented Jun 6, 2017

Pretty sure you can run a simple set of find X --delete recursively. After cleanup, create your deployable archive (tar, gz, etc) and voila, it's as small as you aim it to be.

@mlocati
Copy link
Contributor Author

mlocati commented Jun 6, 2017

find X --delete

What should we use for X?

@ivastly
Copy link

ivastly commented Jun 6, 2017

@alcohol Yes it could work, but only if i know X. I do know X for my own codebase, but i dont know and i can not know X for my 9000 dependencies. That is exactly the purpose of this feature request!

@alcohol
Copy link
Member

alcohol commented Jun 7, 2017

Le sigh. I'm unsubscribing from this thread.

@dan-da
Copy link

dan-da commented Jul 16, 2018

+1 for this feature.

@kamilsk
Copy link

kamilsk commented Jul 16, 2018

@dan-da, you can contribute to octolab/cleaner instead of continuing this thread. The composer is flexible and can be extended by plugins - thanks to the authors.

@danon
Copy link

danon commented Oct 20, 2019

I can see @alcohol stated numerously that he didn't consider this issue a real issue, without 0 explaination to why he thinks so :/

The only real argument I found was:

Composer is a dependency manager. Not a packaging tool.

Yes, I agree. So why can't dependencies' authors decide what's part of a "dependecy", and what parts are unrelated noice that's required inside VCS for other reasons?

@alcohol
Copy link
Member

alcohol commented Oct 21, 2019

Because it's just that, noise. If your OCD can't handle that, then deal with it in your packaging process. But there is no reason for the Composer team to waste time on building nor maintaining a feature that provides zero actual gain.

@danon
Copy link

danon commented Oct 21, 2019

@alcohol Ok, perhaps a map of excluded files is indeed hard to implement, test and maintain, I can agree on that.

But how about, instead of specifying a map of excluded files, let's specify a baseDirectory in composer.json. We could set the directory to src, and composer would only download that one folder, and ignore the rest. That sounds like an easy feature to implement, right? Certainly simpler than a map of exluded files.

@stof
Copy link
Contributor

stof commented Oct 21, 2019

We could set the directory to src, and composer would only download that one folder, and ignore the rest. That sounds like an easy feature to implement, right? Certainly simpler than a map of exluded files.

How do you download only a subfolder ? If you use source install, you cannot do it (well, with SVN you can do it, but most of the packages are using Git today, not SVN, and Git cannot do it). And for dist install, we can only download what the dist archive contains. We cannot magically download a trimmed-down archive (and most open-source packages rely on downloads endpoints of Github and Gitlab for instance, which don't have such sub-folder download endpoints)

@danon
Copy link

danon commented Oct 21, 2019

@stof There is git clone --filter flag, that allows for downloading only a specified directory.

For dist install, we can handle it with .gitattributes, so for dist, composer could still download whole package.

@stof
Copy link
Contributor

stof commented Oct 21, 2019

@danon and supporting gitattributes or no is not something done by composer at all, but by the packaging tool (github respects it for instance when building the download archive, and many packages already uses it)

@alcohol
Copy link
Member

alcohol commented Oct 21, 2019

It all comes down to value in the end. There is no value here. This discussion itself has already cost more than you would ever gain from it.

@danon
Copy link

danon commented Oct 21, 2019

@stof So why can't we have baseDirectory that composer could use, utilizing git clone --filter, to install only one folder in the dependency downloaders?

@ivastly
Copy link

ivastly commented Oct 21, 2019

@danon what's wrong with https://github.com/octolab/cleaner? Looks exactly what we are looking for.
Note about your baseDirectory proposal - sadly the usage of it would be very limited, because a huge amount of libraries, frameworks, etc. use more than one folder which can not be deleted on prod (folders like src, config, templates, etc.) So, baseDirectory does not exist most of the time.

@danon
Copy link

danon commented Oct 22, 2019

@danon what's wrong with https://github.com/octolab/cleaner? Looks exactly what we are looking for.

@ivastly I don't care about my local or hosted environment, i could delete noise files on my computer with ease 🗡

I care about users of my dependencies. I have test, ReadMe.md, other unrelated files in my VCS, and users of my package will have those files downloaded by composer.

I don't want to tell users to use octolab/cleaner or to add --prefer-dist when require'ing. I would just like to specify myself what parts should be in packagist, and what not.

Note about your baseDirectory proposal - sadly the usage of it would be very limited, because a huge amount of libraries, frameworks, etc. use more than one folder which can not be deleted on prod (folders like src, config, templates, etc.) So, baseDirectory does not exist most of the time.

True, it wouldn't, but at least it would work for some projects (ones that have a single directory).

PS: I'm sure you could use git clone --filter to clone more than one directory, so perhaps composer could allow for a flat array of folders that should be downloaded?

@alcohol
Copy link
Member

alcohol commented Oct 22, 2019

I would just like to specify myself what parts should be in packagist

Packagist does not store any of your files, only metadata.

You could remove these "noise" files from your repository in your tags. That way, no matter if the user downloads dist or source, they will not get these "noise" files.

@alcohol
Copy link
Member

alcohol commented Oct 22, 2019

But seriously, as I said before, there is absolutely no value here to be gained. The amount of time spend discussing this has cost more in terms of $$ than you could ever gain back from removing these files.

@mlocati
Copy link
Contributor Author

mlocati commented Oct 22, 2019

You could remove these "noise" files from your repository in your tags

Or you can add those files in your .gitattributes files (marking them with export-ignore)

@danon
Copy link

danon commented Oct 22, 2019

You could remove these "noise" files from your repository in your tags

Or you can add those files in your .gitattributes files (marking them with export-ignore)

That requires users to use --pref-dist, and not everyone knows that

@stof
Copy link
Contributor

stof commented Oct 22, 2019

@danon using dists is the default behavior of composer for releases (only dev branches are installed from source by default).

@danon
Copy link

danon commented Oct 22, 2019

Oh.

Nice!

What does --pref-dist flag do then?

@ivastly
Copy link

ivastly commented Oct 22, 2019

@mlocati okay, export-ignore sounds good, so why you have created the issue then? 😃 What is missing in export-ignore approach, which you want to implement in composer? Is it advanced regex syntax?

@mlocati
Copy link
Contributor Author

mlocati commented Oct 23, 2019

What is missing in export-ignore approach

@ivastly Someone may need all the files in the git repository without using git clone (which is the only way to get the whole archive contents in case you use export-ignore in .gitattributes) - see for example maxmind/MaxMind-DB-Reader-php#55

@liborm85
Copy link

liborm85 commented Nov 2, 2019

Exists several plugins to remove unwanted files, but none works very universally. That's why I started to develop new plugin https://github.com/liborm85/composer-vendor-cleaner , that cleans files and directories in vendor directory by glob pattern syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants