New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chores: editorconfig and dockerfile #30
Conversation
Hey |
No. I say we should actually remove the dockerfile changes from this commit. First I have already implemented the removal of apt cache in my own PR and the rest actually conflicts with my own changes. Second, I disagree with the approach on reducing layers. The goal should not be to minimize layers at the expense of readability and also development needs. For example, in this commit, if anything in the code base changes the dockerfile will now copy the changes and then reinstall everything including things that are not dependent on code changes because we merged it into one line. As a comparison please take a look at what i did in my own PR. The only thing I grouped into a single layer was the apt stuff because that logically only needs to occur once. The rest, including the copying of files, is split into individual lines. So now if I change a file, only that and related files will be added because docker will now reuse previous layers that were unchanged. If we group it all into a single line then even if a part of a layer changes it will recompute the entire layer. This is a waste of resources and will actually INCREASE build time for developers which is the only build time that matters because users will only need to built it once whereas devs will build multiple times. Also, docker automatically will recompute any unchanged layers if they occur after a changed layer, thus order matters too and ideally copy commands should as late as possible. This why I copy the large .pbmm and .scorer files first and THEN the code from this project. This way for a minor kb sized change it won't copy the gigabyte length .pbmm file. Please remove the dockerfile from this commit and instead just use the dockerfile that I published. Basically, there is no point trying to optimize that which Docker is already trying to optimize for us. There is a benefit from having more layers not less. The only time fewer layers can potentially make sense is when you publish the image for others and you want to simplify the amount they have to download and process, in which case there are other tools available that will just "squash" multiple layers into one after the fact. But for a development utility, there is no point trying to marginally minimize layers like that. |
Just going to put my rational forward.
The image isn't just for development needs, it can be used by users.
However, you should then consider spacing out your image in development, not the final image that's pushed and going to be used by users in production.
I don't believe this Dockerfile is just for a "development utility".
Have fewer layers can be helpful, especially for future reference. The final image retains all changes of the layers before. As the build process gets larger, the more unnecessary changes get brought along to the final image. I'm the one who opened an issue suggesting adding a |
I understand what you are saying and if one chooses to follow your style of development then your changes certainly make sense. I just disagree with that style. To me end users should just expect to run a docker pull autosub command and get what they want. Users have no business building their own images. In which case, dockerfiles should be made for the developers since we are the ones running the build command every time we want to test a change. So, in my personal opinion Dockerfiles are a development utility exclusively. Docker images are for the user. Also, I don't understand what you are saying in that last part there? Who cares what extraneous things are included in intermediate layers as long as it saves me time during the development build process? And you said yourself the size differences here were marginal, so no problem the way I see it. |
The extra things in intermediate layers impact the size of the final image. (Including what is pushed to Docker Hub.) In this particular image, the different was marginal, I agree. I'd still consider it a good practice in general as for many images it can save a lot more. Rather than experimenting with "Is it worth it?" before deciding, it's faster to just reduce layers in the first place. For example, the change here reduces the image from 236 MB to 160 MB: This reduces the image from 2.91 GB to 1.96 GB: This reduces the image from 2.37 GB to 1.61 GB: For all of them, the main change is merging the |
I like what you were trying to achieve as optimizations should be made where possible and image size is great to reduce, but this type of optimization is just plain wrong ... in my opinion. You are entitled to yours. Take it or leave it. I personally don't like having to download an entire gigabyte file and reinstall entire packages that never changed every time I add a comma to one of the python files. |
Fixes extraneous colon character in SRT/VTT files, split run-on sentences, 3 digit milliseconds This is merge was done manually based on pull request abhirooptalasila#32, after removing PR abhirooptalasila#30 which was merged before it was ready. abhirooptalasila#30 contained many unnecessary format changes caused a problems when merging other branches, it's easier to remove the single commit from abhirooptalasila#30, and just re-implement the best parts as a commit on top. See [1] for more details. Rewriting public history is generally frowned upon, but it's the path of least resistance here. [1] abhirooptalasila#32 (comment)
Adds configuration file for EditorConfig [1] This commit was split out by @shasheene from the monolithic commit in PR abhirooptalasila#30, which contained .editorconfig changes, some unrelated Dockerfile changes and a large amount of formatting changes all in a single commit. [1] https://editorconfig.org/
This merge simply contains the .editorconfig changes from abhirooptalasila#30. The merge was done manually based on the commit in abhirooptalasila#30, after first removing it. abhirooptalasila#30 which was merged before it was ready, and contained a single monolithic commit with format changes causing when merging other more substantive branches. This commit was split out by @shasheene from the monolithic commit in PR abhirooptalasila#30, which contained .editorconfig changes, The unrelated Dockerfile changes and the formatting changes can be applied as separate commits on top, after other work has been merged. It was easier to remove the single commit from abhirooptalasila#30, and just re-implement the best parts as a commit on top. See [1] for more details. Rewriting public history is generally frowned upon, but it's the path of least resistance here. [1] abhirooptalasila#32 (comment)
OK, so as briefly discussed in this comment thread, this pull request was merged before it was ready and is causing some headaches, so it's been reverted and re-applyed. Basically, it's 3 logically distinct commits squashed into one:
It's the formatting/whitespace/layout changes have caused a huge number of conflicts (and wasted time for me) as I try to merge the several Pull Requests and forks from the past several months. The Dockerfile changes here are relatively trivial to reconstruct compared to @yash-fn's branch that adds GPU Dockerfile changes. So what I've done is I've split the commits into the .editorconfig commit (which I've applied already) and the Dockerfile changes. There is debate on the merits of the changes above, but the key point is the actual changes being done in the Dockerfile can be pretty trivially reconstructed on top of origin/master after @yash-fn has been merged. Also applying the EditorConfig formatting changes across the whole codebase can be done as a commit on top too, if necessary. I am fully aware that rewriting public git history is deeply frowned upon, but in this case it's the path of least resistance to save time resolving necessary git conflicts. |
Revert EditorConfig and clean up #30
Applies the .editorconfig auto-formatter updates now that all existing Pull Requests and tasks have been resolved. This was done done using Pycharm: for each Python file, add character, save (which applies the .editorconfig automatically), then delete the added character and save again. Ref abhirooptalasila#30
Applies the Jetbrain Pycharm feature of autoformatting whitespace now that all existing PR's have been merged. Ref abhirooptalasila#30
Uses Pycharm to remove unused imports (and re-order existing ones), which can be cleanly applied now that all existing PR's have been merged. Ref abhirooptalasila#30
I was just exploring the code but wanted to tidy up a bit while I was at it.
EditorConfig
Adds an EditorConfig file, so contributors can use consistent code styles with the rest of the project.
This makes it more convenient to work on the project when a developer's editor settings aren't the same as yours.
Optimizes the Dockerfile
This reduces the number of intermediate layers, and makes the build marginally faster.
Its good practice to minimize layers in a Docker image to make the final image smaller, especially if there's a lot of file system changes between layers. In this case, the difference in size is negligible.
Optimize apt-get
/var/lib/apt/lists
. (Reduces image size by 0.02 GB.)More info:
Before
After