feature request: add parquet file name to PARQUET_SCAN trace in the profiler #2913

Closed
hamilton opened this issue Jan 12, 2022 · 2 comments
@hamilton
Contributor

When profiling a query, DuckDB reports which table or entity is being scanned for SEQ_SCAN, but provides no comparable information for PARQUET_SCAN, which makes it difficult to inspect the query's sources.

The trace for SEQ_SCAN has this:

    {
      "name": "SEQ_SCAN",
      "timing": 0.000004,
      "cardinality": 1,
      "extra_info": "tmp\n[INFOSEPARATOR]\na\nb",
      "timings": [],
      "children": []
    }

I can easily split the extra_info field on [INFOSEPARATOR] to get the table name.
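
As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes the table name is whatever precedes the first [INFOSEPARATOR] in extra_info, as in the SEQ_SCAN trace above; the helper name `scan_source` is made up for this example and is not part of DuckDB.

```python
import json

INFO_SEPARATOR = "[INFOSEPARATOR]"


def scan_source(node):
    """Return the text before the first [INFOSEPARATOR] in extra_info.

    Returns None when the separator is absent, which is the case for the
    PARQUET_SCAN trace shown in this issue.
    """
    extra_info = node.get("extra_info", "")
    if INFO_SEPARATOR not in extra_info:
        return None
    head, _, _ = extra_info.partition(INFO_SEPARATOR)
    return head.strip()


# The SEQ_SCAN node from the trace above, decoded from JSON.
seq_scan = json.loads(r"""
{
  "name": "SEQ_SCAN",
  "timing": 0.000004,
  "cardinality": 1,
  "extra_info": "tmp\n[INFOSEPARATOR]\na\nb",
  "timings": [],
  "children": []
}
""")

table_name = scan_source(seq_scan)
print(table_name)
```

Returning None rather than the first line when no separator is present avoids mistaking a column name for a table name.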

The PARQUET_SCAN entry does not, however, include the parquet file name that is being scanned:

        {
          "name": "PARQUET_SCAN",
          "timing": 0.05395,
          "cardinality": 2048,
          "extra_info": "unique_key\ncreated_date\nclosed_date",
          "timings": [],
          "children": []
        }

Ideally we could add the file name to the extra_info entry like this:

        {
          "name": "PARQUET_SCAN",
          "timing": 0.05395,
          "cardinality": 2048,
          "extra_info": "'./data/download.parquet'\n[INFOSEPARATOR]\nunique_key\ncreated_date\nclosed_date",
          "timings": [],
          "children": []
        }
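
If extra_info followed the format proposed above, the sources of a whole profile could be collected in a single pass. The sketch below is only an assumption about how that might look: the proposed PARQUET_SCAN layout is not implemented, the enclosing PROJECTION node is invented for the example, and `collect_scan_sources` is a hypothetical helper.

```python
INFO_SEPARATOR = "[INFOSEPARATOR]"


def collect_scan_sources(node, found=None):
    """Walk a profile tree and collect (operator name, source) pairs.

    Assumes every *_SCAN node puts its source before the first
    [INFOSEPARATOR], which is the format proposed in this issue.
    """
    if found is None:
        found = []
    name = node.get("name", "")
    extra_info = node.get("extra_info", "")
    if name.endswith("_SCAN") and INFO_SEPARATOR in extra_info:
        source = extra_info.split(INFO_SEPARATOR, 1)[0].strip()
        # The proposed format wraps the file name in single quotes.
        found.append((name, source.strip("'")))
    for child in node.get("children", []):
        collect_scan_sources(child, found)
    return found


# Hypothetical profile tree: the PROJECTION wrapper is invented; the two
# scan nodes reuse the extra_info values shown in this issue.
profile = {
    "name": "PROJECTION",
    "extra_info": "",
    "children": [
        {
            "name": "PARQUET_SCAN",
            "extra_info": "'./data/download.parquet'\n[INFOSEPARATOR]\n"
                          "unique_key\ncreated_date\nclosed_date",
            "children": [],
        },
        {
            "name": "SEQ_SCAN",
            "extra_info": "tmp\n[INFOSEPARATOR]\na\nb",
            "children": [],
        },
    ],
}

sources = collect_scan_sources(profile)
for operator, source in sources:
    print(operator, source)
```
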

github-actions bot commented Aug 1, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label Aug 1, 2023

github-actions bot commented Sep 1, 2023

This issue was closed because it has been stale for 30 days with no activity.

github-actions bot closed this as not planned Sep 1, 2023