Cache related service degradation. (Flows stop working, updates start failing with "Maximum call stack size exceeded") #22386

Open
a14e opened this issue May 4, 2024 · 1 comment


a14e commented May 4, 2024

Describe the Bug

What is happening?

Under a specific small load (described in the "To Reproduce" section), the following symptoms are observed:

  1. Increasing response timeouts over time.
    image
  2. Significant CPU usage increase without a corresponding change in memory usage.
    image
  3. After approximately 20-30 minutes under load, flows stop being triggered.
  4. About 10 minutes later, "under pressure" errors start appearing.
  5. After another 10 minutes, the following error is logged:
{"level":50,"time":1714684803087,"pid":28,"hostname":"9e4d1f5243de","err":{
"type":"GraphQLError",
"message":"Maximum call stack size exceeded",
"stack":"""RangeError: Maximum call stack size exceeded\n   
    at validatePayload (file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1271:25)\n
    at file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1277:16\n
    at Array.map (<anonymous>)\n
    at validatePayload (file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1276:21)\n
    at file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1277:16\n
    at Array.map (<anonymous>)\n
    at validatePayload (file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1276:21)\n
    at file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1277:16\n
    at Array.map (<anonymous>)\n
    at validatePayload (file:///directus/node_modules/.pnpm/file+packages+utils_vue@3.4.23/node_modules/@directus/utils/dist/shared/index.js:1276:21)""",
"path":["update_Single_Answer_Task_item"],"locations":[{"line":24,"column":5}],"extensions":{}},
"msg":"Maximum call stack size exceeded"
}
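
The trace alternates between validatePayload and Array.map, which is the shape of a validator recursing through nested arrays. As a minimal sketch only (written in Python, not the actual @directus/utils code, with a hypothetical `validate_nested` walker and a hypothetical `_and` nesting key), the following shows how such a recursive walk fails once its input is nested deeper than the runtime's stack allows. Python reports this as RecursionError, where Node.js reports "RangeError: Maximum call stack size exceeded". Whether the input in this report is actually that deep (or cyclic) is not established here.

```python
import sys


def validate_nested(node):
    """Hypothetical recursive walker: counts this node plus every node under "_and"."""
    children = node.get("_and", [])
    # Each nesting level adds stack frames, similar in spirit to
    # validatePayload -> Array.map -> validatePayload in the trace above.
    return 1 + sum(validate_nested(child) for child in children)


def build_chain(depth):
    """Build {"_and": [{"_and": [...]}]} iteratively so construction cannot overflow."""
    node = {}
    for _ in range(depth):
        node = {"_and": [node]}
    return node


def try_validate(depth):
    """Return the node count, or a marker string if the stack was exhausted."""
    try:
        return validate_nested(build_chain(depth))
    except RecursionError:
        return "stack exhausted"


shallow = try_validate(10)
deep = try_validate(sys.getrecursionlimit() * 10)
```

The shallow input validates normally, while the deep one exhausts the stack even though both go through the same code path.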

Workaround

  1. Setting GRAPHQL_SCHEMA_CACHE_CAPACITY: 1 resolves the timeouts and errors. CPU and memory usage remain relatively unchanged. (GRAPHQL_SCHEMA_CACHE_CAPACITY had not been set before this.)
  2. The following chart illustrates changes after setting this flag:
    image
  3. I tried various settings, and you can see that none of my previous attempts had any effect. However, after setting GRAPHQL_SCHEMA_CACHE_CAPACITY: 1, the system became more stable, and the errors were resolved.
  4. Attempts to disable caches and flows showed no significant effect.
  5. Enabling flows at 3:54 caused an increase in latencies.
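
To make the workaround concrete, here is a minimal sketch of a capacity-bounded cache, assuming GRAPHQL_SCHEMA_CACHE_CAPACITY limits how many generated GraphQL schemas are kept in memory. The `BoundedCache` class, its key names, and its least-recently-used eviction are illustrative assumptions, not Directus's implementation. The point is only that with a capacity of 1, each new entry evicts the previous one.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache that never holds more than `capacity` entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict least recently used entries until we are within capacity.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)


cache = BoundedCache(capacity=1)
cache.set("schema-for-user-a", "A")
cache.set("schema-for-user-b", "B")  # evicts the entry for user a
```

If the degradation comes from something accumulating in or around cached schemas, a capacity of 1 would keep that accumulation small, which would be consistent with the behaviour described above. That remains a guess.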

To Reproduce

Load type

  1. I am running a script for initial data loading without any other load.
  2. I send requests sequentially (no parallelism).
  3. There are approximately 10,000 entities. For each, I attempt to find it first; if not found, I then create or update the entity.
  4. Currently, I am not using batch operations.
  5. The error has been reproduced with update requests. I have not attempted to reproduce it with create operations.
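
The load pattern above can be sketched roughly as follows. This is Python rather than the Rust script actually used, and it assumes "create or update" means "create when the lookup finds nothing, otherwise update". The `find`, `create`, and `update` callables are hypothetical stand-ins for the GraphQL requests shown under "My requests".

```python
def sync_entities(entities, find, create, update):
    """Process entities one at a time (no parallelism, no batching)."""
    counts = {"created": 0, "updated": 0}
    for entity in entities:
        existing = find(entity["id"])  # one lookup request per entity
        if existing is None:
            create(entity)
            counts["created"] += 1
        else:
            update(entity["id"], entity)
            counts["updated"] += 1
    return counts


# Usage with an in-memory dict standing in for the server.
store = {"a": {"id": "a", "status": "draft"}}
result = sync_entities(
    [{"id": "a", "status": "published"}, {"id": "b", "status": "published"}],
    find=store.get,
    create=lambda entity: store.__setitem__(entity["id"], entity),
    update=lambda item_id, entity: store[item_id].update(entity),
)
```

With roughly 10,000 entities this amounts to about two sequential HTTP requests per entity.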

My data

  1. The data structure is nested with a reference to a parent object. The script user has access to the parent object, which is visible to GraphQL. I do not update the relation in update requests.
  2. It includes nested JSON objects.
  3. Most fields are not included in the update requests.

My requests

  1. I use GraphQL to send requests, with only one query or mutation per HTTP request.
  2. I use Rust and the graphql-client library for making requests.
  3. Each request looks like
    3.1 for search
{
  "variables": {
    "id": "aca441bc-3e76-4409-9f96-fa112d37f506"
  },
  "query": """
       query FindSingleAnswerTask($id: ID!) {
           Single_Answer_Task_by_id(id: $id) {
               id
               status
               user_created
               date_created
               user_updated
               date_updated
               task_type
               description
               solution
               test_options
               suggestions
               source
               description_variants
               priority
               unit_id
           }
       }""",
  "operationName": "FindSingleAnswerTask"
}

    3.2 for update

{
  "variables": {
    "id": "aca441bc-3e76-4409-9f96-fa112d37f506",
    "data": {
      "status": "published",
      "description": "I want to ${...} a professional football player in the future.",
      "solution": "be",
      "test_options": [
        {
          "value": "be"
        },
        {
          "value": "bee"
        },
        ...
      ],
      "source": "..."
    }
  },
  "query": """
       mutation UpdateSingleAnswerTask($id: ID!, $data: update_Single_Answer_Task_input!) {
           update_Single_Answer_Task_item(id: $id, data: $data) {
               id
               status
               user_created
               date_created
               user_updated
               date_updated
               task_type
               description
               solution
               test_options
               suggestions
               source
               description_variants
               priority
               unit_id
           }
       }""",
  "operationName": "UpdateSingleAnswerTask"
}
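
The triple-quoted query strings above are written that way for readability and are not valid JSON. On the wire the query is an ordinary JSON string with escaped newlines. As a minimal sketch (in Python, not the Rust graphql-client code actually used, with a hypothetical `build_update_body` helper and a shortened selection set), the request body could be built like this:

```python
import json

# Selection set shortened for brevity; the real request lists all fields shown above.
UPDATE_MUTATION = """
mutation UpdateSingleAnswerTask($id: ID!, $data: update_Single_Answer_Task_input!) {
    update_Single_Answer_Task_item(id: $id, data: $data) {
        id
        status
    }
}
"""


def build_update_body(item_id, data):
    """Serialize one mutation per HTTP request as a JSON body."""
    return json.dumps(
        {
            "operationName": "UpdateSingleAnswerTask",
            "query": UPDATE_MUTATION,
            "variables": {"id": item_id, "data": data},
        }
    )


body = build_update_body(
    "aca441bc-3e76-4409-9f96-fa112d37f506",
    {"status": "published", "solution": "be", "test_options": [{"value": "be"}]},
)
decoded = json.loads(body)
```

Only the fields being changed go into `data`, matching the note that most fields are not included in the update requests.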

Directus Version

10.10.7

Hosting Strategy

Self-Hosted (Docker Image)

a14e changed the title on May 4, 2024, from: Cache related service degradation. (Flows stop working, updates start fails with "Maximum call stack size exceeded") to: Cache related service degradation. (Flows stop working, updates start failing with "Maximum call stack size exceeded")

br41nslug (Member) commented

Feels related to #19664

Projects
Status: 🆕 Needs Triage