Improve logging experience #1840

bnjjj · 2022-09-20T09:20:01Z

When you need to debug the router or find why you received an error from the router it's often a pain and can take time to discover what's going on. For example if you have to debug an error reported by the router and just wants to know what was the request sent and raw response received by a subgraph, for now it needs to add logs directly in the router codebase.

Another example, sometimes we throw an error in the graphql errors array with less context and cannot find the equivalent error with more context in our logs.

What I would like is to have a simple mechanism to find related logs to a GraphQL error just by grepping our logs. We could add a unique identifier generated for a specific error and provide this ID in extensions

Example:

{
  "errors": [
    {
      "message": "HTTP fetch failed from 'inventory': service 'inventory' response was malformed: service 'inventory' response was malformed: invalid type: string \"/\", expected a sequence",
      "path": [
        "allProducts",
        "@"
      ],
      "extensions": {
        "error_id": "123e4567-e89b-12d3-a456-426614174000", // This error code can be used to grep into our logs
        "type": "SubrequestHttpError",
        "service": "inventory",
        "reason": "service 'inventory' response was malformed: service 'inventory' response was malformed: invalid type: string \"/\", expected a sequence"
      }
    }
  ]
}

2022-09-20T09:11:22.481856Z ERROR an error occured due to ... error_id="123e4567-e89b-12d3-a456-426614174000"

in logs error_id="123e4567-e89b-12d3-a456-426614174000"

Also we could add a logging section in telemetry to configure if we want to log requests and raw responses from subgraphs, if we want to add some redacting to not leak any sensible information.

telemetry:
  logging:
    format: pretty # Could be json or pretty
    on_graphql_error: true # Display a log when we return a GraphQL error
    on_http_error: true # Display a log if status code >200
    every_requests: false # If enabled we log every requests
    subgraphs:
      all:
        request: 
          enabled: true # Log all requests for subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        response: 
          enabled: true # Log all responses from subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        
      my_subgraph:
        request: 
          enabled: true # Log all requests for subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        response:
          enabled: true # Log all requests for subgraphs

Sub tasks:

Request id Add id in logs in order to identify all the logs related to an error, to a request #1981
Support same kind of formatResponse from the gateway
Enable/disable specific logs (cf example of yaml file) Logging configuration to enable/disable specific logs #1998
Would love to have a log format feature like this

The text was updated successfully, but these errors were encountered:

abernix · 2022-09-20T09:28:11Z

This is great! A few things come to mind immediately:

We have enough other logging issues that I created a component/logging label for it. 🏷️
I don't know that we want to couple telemetry and logging coupled together, even if it is true that logs are attached to telemetry spans (which is very nice!). I suspect we might want to build a top-level concept around logging in our configuration. 🪵
Perhaps in a similar vein, we might want to have a top-level concept around data redaction, which could cover various avenues in a consistent way: telemetry, logging, studio, etc. 💼

abernix · 2022-09-20T09:28:59Z

Oh, could the title of this issue be a bit more specific?: maybe "subgraph request/response logging"?

bnjjj · 2022-09-20T09:37:28Z

It's not specific to subgraph request/response logging. It's more global, but yes indeed in this description I was mainly focus on that topic. But for example the field log_on_graphql_error is not related, also the error id is a global thing. I think it's still ok to not specify subgraph in the title.

abernix · 2022-09-20T09:40:53Z

Ok, fair! Let's leave it how it is for now and then consider breaking this into individual issues in the future.

(In the general sense "Improve logging" is, I think, too generic and I wouldn't want this to turn into an issue we can't concretely close just because it touches on too many things.)

garypen · 2022-09-20T15:28:52Z

The error_id is an obvious winner. With JSON, arguably better than grep since:

router <blah> | jq '.errors[] | select ( .extensions.error_id == "123e4567-e89b-12d3-a456-426614174000")'

will filter out all the errors with that exact ID.

piclemx · 2023-06-16T18:51:28Z

Will be great also to add other patterns for logging. Currently this goes directly inside stdout and for us that's not a viable option currently since we are using multi-tenant k8s cluster. We decided to pipe the log to a file but that doesn't take care of the file rotation and everything.

BrynCooke · 2023-06-20T10:51:24Z

Going to close this in favor of #3226

bnjjj added the enhancement An enhancement to an existing feature label Sep 20, 2022

abernix added the component/logging label Sep 20, 2022

This was referenced Oct 21, 2022

Add id in logs in order to identify all the logs related to an error, to a request #1981

Closed

Logging configuration to enable/disable specific logs #1998

Closed

BrynCooke assigned bnjjj Nov 18, 2022

BrynCooke mentioned this issue Nov 18, 2022

extend configuration to specify optional logging behaviour #1732

Closed

yanns mentioned this issue Feb 27, 2023

Personal data logged out in the json format #2695

Closed

abernix mentioned this issue Apr 25, 2023

define & document recommendations for common logging configurations #2997

Open

BrynCooke mentioned this issue Jun 6, 2023

Improve / formalize configuration of telemetry / logging / observability / metrics / tracing #3226

Open

28 tasks

BrynCooke closed this as not planned Won't fix, can't repro, duplicate, stale Jun 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve logging experience #1840

Improve logging experience #1840

bnjjj commented Sep 20, 2022 •

edited

Loading

abernix commented Sep 20, 2022

abernix commented Sep 20, 2022 •

edited

Loading

bnjjj commented Sep 20, 2022

abernix commented Sep 20, 2022

garypen commented Sep 20, 2022

piclemx commented Jun 16, 2023

BrynCooke commented Jun 20, 2023

Improve logging experience #1840

Improve logging experience #1840

Comments

bnjjj commented Sep 20, 2022 • edited Loading

abernix commented Sep 20, 2022

abernix commented Sep 20, 2022 • edited Loading

bnjjj commented Sep 20, 2022

abernix commented Sep 20, 2022

garypen commented Sep 20, 2022

piclemx commented Jun 16, 2023

BrynCooke commented Jun 20, 2023

bnjjj commented Sep 20, 2022 •

edited

Loading

abernix commented Sep 20, 2022 •

edited

Loading