Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assets: Handle transformation timeout errors #19684

Merged

Conversation

joselcvarela
Copy link
Member

Motivation

When requesting a asset transformation and if the transformation takes longer than ASSETS_TRANSFORM_TIMEOUT the response for the request will be 500 Internal Server Error.

Also, if the asset transformation fails, the transformation file is not deleted from storage ending with a 0 byte file.
This means if the transformation failed, for example, due to timeout and then if we increase the timeout and request the same transformation, it will always return the 0 byte transformation file.

Solution

Since we are using sharp module to handle assets transformations and since this is using libvips which is a C/C++ library to manipulate images, the option ASSETS_TRANSFORM_TIMEOUT is passed to this libvips library and when the timeout is reached, this library sends an error to sharp and at Directus level we just receive an Error object with message property equal to timeout: NN% complete\nVipsImage: killed for image YYYYY, where NN is the percentage at it was aborted and YYYYY is the temporary file name that libvips gave to the current file.

With that being said, the solution was just simply a try/catch block around the transformation pipe and respond with a 408 Request Timeout (which seemed the best suited status code).
Since 408 requests are retried, this can make the next request successful if server has more free resources for the new request.

Plus, we are deleting the transformation file in case the transformation went wrong and a 0 byte file was saved.
This way, server will try to do another transformation instead of returning an empty file.

@changeset-bot
Copy link

changeset-bot bot commented Sep 14, 2023

🦋 Changeset detected

Latest commit: e8f0ed4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@directus/api Patch
directus Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@joselcvarela
Copy link
Member Author

In order to test this I have used these configurations, so I could test the transformation timeout:

docker-compose.yml
  directus-debugger:
    build:
      context: .
      dockerfile: Dockerfile-test

    ports:
      - 8055:8055
      - 9229:9229
    volumes:
      - ".:/app"
      - "./docker_node_modules:/app/node_modules"
    deploy:
      resources:
        limits:
          cpus: '0.4'
          memory: 512M
        reservations:
          cpus: '0.3'
          memory: 256M
    working_dir: /app/api
    command: node --inspect=0.0.0.0 dist/cli/run.js start
Dockerfile-test
FROM node:18-alpine

RUN apk add python3 make g++ vips-dev

RUN npm i -g pnpm@8.6.10

Copy link
Contributor

@azrikahar azrikahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm the try/catch works for me!

I'm just curious on others' thoughts on whether returning HTTP status code 408 is correct here?

I'm interpreting it as a status code to return when the user's request took too long to completely be sent to the server, especially since 4xx codes are generally for client side errors.

However in this transformation timeout case, it's technically the server, or rather libvips, that is timing out since it was not able to process the file-to-be-transformed in time.

That being said, I could be misinterpreting the status code entirely!

@paescuj
Copy link
Member

paescuj commented Sep 14, 2023

Can confirm the try/catch works for me!

I'm just curious on others' thoughts on whether returning HTTP status code 408 is correct here?

I'm interpreting it as a status code to return when the user's request took too long to completely be sent to the server, especially since 4xx codes are generally for client side errors.

However in this transformation timeout case, it's technically the server, or rather libvips, that is timing out since it was not able to process the file-to-be-transformed in time.

That being said, I could be misinterpreting the status code entirely!

Absolutely! If we do want to return another status code here, I guess it should be 503 Service Unavailable.

Copy link
Member

@br41nslug br41nslug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm it works 😄

However i do have my doubts about this 408 error. Browser do try to repeat the request which can be both a blessing and a curse 😬

  • If the timeout is a one-off then you want it to retry because the chance is high it will succeed the next attempt
  • If the timeout is due to constant highload/not enough resources then retrying the request will only make the problem worse for the server

I tend to agree with @azrikahar that 4xx errors are client errors and what we're throwing here is a server error.

@joselcvarela
Copy link
Member Author

After reading the comments and check the reference again, you guys are right.
I've dropped the RequestTimeoutError in favor of ServiceUnavailableError

api/src/services/assets.ts Outdated Show resolved Hide resolved
api/src/services/assets.ts Outdated Show resolved Hide resolved
api/src/services/assets.ts Outdated Show resolved Hide resolved
api/src/services/assets.ts Outdated Show resolved Hide resolved
@paescuj paescuj enabled auto-merge (squash) September 15, 2023 13:20
@paescuj paescuj merged commit 9819049 into directus:main Sep 15, 2023
7 checks passed
@github-actions github-actions bot added this to the Next Release milestone Sep 15, 2023
@joselcvarela joselcvarela deleted the assets/improve-transformation-timeout-error branch September 15, 2023 14:16
joselcvarela added a commit that referenced this pull request Sep 18, 2023
Co-authored-by: Azri Kahar <42867097+azrikahar@users.noreply.github.com>
Co-authored-by: Pascal Jufer <pascal-jufer@bluewin.ch>
Co-authored-by: ian <licitdev@gmail.com>
br-rafaelbarros pushed a commit to personal-forks/directus-source that referenced this pull request Nov 7, 2023
Co-authored-by: Azri Kahar <42867097+azrikahar@users.noreply.github.com>
Co-authored-by: Pascal Jufer <pascal-jufer@bluewin.ch>
Co-authored-by: ian <licitdev@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants