Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logging experience #1840

Closed
2 of 4 tasks
bnjjj opened this issue Sep 20, 2022 · 7 comments
Closed
2 of 4 tasks

Improve logging experience #1840

bnjjj opened this issue Sep 20, 2022 · 7 comments
Assignees
Labels
component/logging enhancement An enhancement to an existing feature

Comments

@bnjjj
Copy link
Contributor

bnjjj commented Sep 20, 2022

When you need to debug the router or find why you received an error from the router it's often a pain and can take time to discover what's going on. For example if you have to debug an error reported by the router and just wants to know what was the request sent and raw response received by a subgraph, for now it needs to add logs directly in the router codebase.

Another example, sometimes we throw an error in the graphql errors array with less context and cannot find the equivalent error with more context in our logs.

What I would like is to have a simple mechanism to find related logs to a GraphQL error just by grepping our logs. We could add a unique identifier generated for a specific error and provide this ID in extensions

Example:

{
  "errors": [
    {
      "message": "HTTP fetch failed from 'inventory': service 'inventory' response was malformed: service 'inventory' response was malformed: invalid type: string \"/\", expected a sequence",
      "path": [
        "allProducts",
        "@"
      ],
      "extensions": {
        "error_id": "123e4567-e89b-12d3-a456-426614174000", // This error code can be used to grep into our logs
        "type": "SubrequestHttpError",
        "service": "inventory",
        "reason": "service 'inventory' response was malformed: service 'inventory' response was malformed: invalid type: string \"/\", expected a sequence"
      }
    }
  ]
}
2022-09-20T09:11:22.481856Z ERROR an error occured due to ... error_id="123e4567-e89b-12d3-a456-426614174000"

in logs error_id="123e4567-e89b-12d3-a456-426614174000"

Also we could add a logging section in telemetry to configure if we want to log requests and raw responses from subgraphs, if we want to add some redacting to not leak any sensible information.

telemetry:
  logging:
    format: pretty # Could be json or pretty
    on_graphql_error: true # Display a log when we return a GraphQL error
    on_http_error: true # Display a log if status code >200
    every_requests: false # If enabled we log every requests
    subgraphs:
      all:
        request: 
          enabled: true # Log all requests for subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        response: 
          enabled: true # Log all responses from subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        
      my_subgraph:
        request: 
          enabled: true # Log all requests for subgraphs
          redact: # Redact some part of the log
          - match: secret*
            to: REDACTED
        response:
          enabled: true # Log all requests for subgraphs

Sub tasks:

@bnjjj bnjjj added the enhancement An enhancement to an existing feature label Sep 20, 2022
@abernix
Copy link
Member

abernix commented Sep 20, 2022

This is great! A few things come to mind immediately:

  1. We have enough other logging issues that I created a component/logging label for it. 🏷️
  2. I don't know that we want to couple telemetry and logging coupled together, even if it is true that logs are attached to telemetry spans (which is very nice!). I suspect we might want to build a top-level concept around logging in our configuration. 🪵
  3. Perhaps in a similar vein, we might want to have a top-level concept around data redaction, which could cover various avenues in a consistent way: telemetry, logging, studio, etc. 💼

@abernix
Copy link
Member

abernix commented Sep 20, 2022

Oh, could the title of this issue be a bit more specific?: maybe "subgraph request/response logging"?

@bnjjj
Copy link
Contributor Author

bnjjj commented Sep 20, 2022

It's not specific to subgraph request/response logging. It's more global, but yes indeed in this description I was mainly focus on that topic. But for example the field log_on_graphql_error is not related, also the error id is a global thing. I think it's still ok to not specify subgraph in the title.

@abernix
Copy link
Member

abernix commented Sep 20, 2022

Ok, fair! Let's leave it how it is for now and then consider breaking this into individual issues in the future.

(In the general sense "Improve logging" is, I think, too generic and I wouldn't want this to turn into an issue we can't concretely close just because it touches on too many things.)

@garypen
Copy link
Contributor

garypen commented Sep 20, 2022

The error_id is an obvious winner. With JSON, arguably better than grep since:

router <blah> | jq '.errors[] | select ( .extensions.error_id == "123e4567-e89b-12d3-a456-426614174000")'

will filter out all the errors with that exact ID.

@piclemx
Copy link

piclemx commented Jun 16, 2023

Will be great also to add other patterns for logging. Currently this goes directly inside stdout and for us that's not a viable option currently since we are using multi-tenant k8s cluster. We decided to pipe the log to a file but that doesn't take care of the file rotation and everything.

@BrynCooke
Copy link
Contributor

Going to close this in favor of #3226

@BrynCooke BrynCooke closed this as not planned Won't fix, can't repro, duplicate, stale Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/logging enhancement An enhancement to an existing feature
Projects
None yet
Development

No branches or pull requests

5 participants