Improve the criteria for adding multistage build #16

@duaraghav8

Currently, if the Dockerfile being analysed is single-stage and AI is enabled for Dockershrink (DS), DS adopts multistage by creating a final stage.
The original premise was that it is always good to have a "final" stage in the Dockerfile that contains only what is required at runtime, i.e. the Node.js runtime, dependencies, and app code.

We need to add another condition:

Only adopt multistage if the resultant Dockerfile is doing something meaningful during its build stage.

eg 1:

Original Dockerfile

FROM node:22-alpine

EXPOSE 5000

WORKDIR /app

RUN npm install --omit=dev

CMD ["npm", "start"]

There is no benefit in adding a new "final" stage to this Dockerfile.
It already uses a light base image and essentially only installs dependencies and runs the app.

eg 2:

Original Dockerfile

FROM node:22-alpine

EXPOSE 5000

WORKDIR /app

RUN npm install && npm run build

CMD ["npm", "start"]

This Dockerfile would genuinely benefit from multistage. The build stage would install all dependencies and run the build process.
The final stage would perform a fresh dependency install that excludes devDependencies, and would only run the application.

To sum up, the additional check is:

If the Dockerfile is single-stage and it only installs dependencies and runs the application, then DO NOT adopt multistage.
But if additional tasks are being performed (e.g. build, test, lint, format, merge/minify code, etc.), then keep these in the `build` stage and put the prod dependencies and app run commands into the final stage.
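
The check above could be expressed as a simple rule, for example as a pre-filter before calling the LLM. The following is only a rough sketch, not how DS implements it: the function name `should_adopt_multistage`, the `BUILD_LIKE_SCRIPTS` keyword list, and the regex are all assumptions made for illustration, and it only looks at `npm` commands inside `RUN` instructions.

```python
import re

# Hypothetical list of npm script names that count as "additional tasks".
BUILD_LIKE_SCRIPTS = {"build", "test", "lint", "format", "minify"}

# Captures the npm subcommand or script name, e.g. "install" or "build".
NPM_COMMAND = re.compile(r"\bnpm\s+(?:run(?:-script)?\s+)?([\w:.-]+)")


def should_adopt_multistage(dockerfile: str) -> bool:
    """Return True only for a single-stage Dockerfile whose RUN
    instructions invoke a build-like npm script."""
    # Join backslash-continued lines so each instruction is on one line.
    lines = [ln.strip() for ln in dockerfile.replace("\\\n", " ").splitlines()]

    # Already multistage (or no FROM at all): nothing to adopt.
    if sum(ln.upper().startswith("FROM ") for ln in lines) != 1:
        return False

    for ln in lines:
        if not ln.upper().startswith("RUN "):
            continue
        for match in NPM_COMMAND.finditer(ln):
            # Treat "build:prod" the same as "build".
            script = match.group(1).split(":")[0]
            if script in BUILD_LIKE_SCRIPTS:
                return True
    return False


EG1 = """FROM node:22-alpine
EXPOSE 5000
WORKDIR /app
RUN npm install --omit=dev
CMD ["npm", "start"]
"""

EG2 = """FROM node:22-alpine
EXPOSE 5000
WORKDIR /app
RUN npm install && npm run build
CMD ["npm", "start"]
"""

print(should_adopt_multistage(EG1))  # False
print(should_adopt_multistage(EG2))  # True
```

A keyword list like this will miss custom script names and other package managers, so it is probably better as a guard around the LLM step than as a replacement for it.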

Projects analysed:

  1. Raneto
  2. Haste
  3. stiolabs
    (and many more projects! This is a very common pattern)

NOTE: Running DS over these projects without AI produced good results, in line with expectations.


The ideal solution would be to modify the prompt and tell the LLM to follow this rule.
After some experimentation, it seems that gpt-4o doesn't follow it but o1 does.
Also, DS currently uses the gpt-4o 08-06 model version. Upgrading to the Nov release badly broke the whole multistage change, so further tests are needed.
Consider enhancing the prompt with examples. Input: ..., Output: ...
See branch with changes
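
As a rough idea of what few-shot examples in the prompt might look like, here is a sketch that renders Input/Output pairs from the two Dockerfiles in this issue. Everything here is an assumption for illustration: `FewShotExample` and `render_examples` are hypothetical names, this is not DS's actual prompt format, and the expected outputs (including the `/app/dist` path in the second one) are guesses at a sensible result, not what DS produces.

```python
from dataclasses import dataclass


@dataclass
class FewShotExample:
    dockerfile: str  # input shown to the model
    expected: str    # output the model should produce


def render_examples(examples: list[FewShotExample]) -> str:
    """Format examples as numbered Input/Output pairs for a prompt."""
    blocks = []
    for i, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {i}\n"
            f"Input:\n{ex.dockerfile.strip()}\n"
            f"Output:\n{ex.expected.strip()}"
        )
    return "\n\n".join(blocks)


EG1 = """FROM node:22-alpine
EXPOSE 5000
WORKDIR /app
RUN npm install --omit=dev
CMD ["npm", "start"]
"""

EG2 = """FROM node:22-alpine
EXPOSE 5000
WORKDIR /app
RUN npm install && npm run build
CMD ["npm", "start"]
"""

# Assumed target for eg 2; "/app/dist" is a hypothetical build output path.
EG2_MULTISTAGE = """FROM node:22-alpine AS build
WORKDIR /app
RUN npm install && npm run build

FROM node:22-alpine
EXPOSE 5000
WORKDIR /app
COPY --from=build /app/dist ./dist
RUN npm install --omit=dev
CMD ["npm", "start"]
"""

EXAMPLES = [
    FewShotExample(EG1, EG1),             # only installs + runs: leave unchanged
    FewShotExample(EG2, EG2_MULTISTAGE),  # has a build step: split into stages
]

print(render_examples(EXAMPLES))
```

Including one "leave unchanged" example next to one "adopt multistage" example is the point of the sketch; whether this is enough for gpt-4o to follow the rule would still need testing.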

Other solutions (more effort involved):
Rule engine or custom ML model to classify the stage as either "needs multistage" or "doesn't need multistage".

Labels

enhancement (New feature or request), priority (High priority issue)