
Add columnMappings to explain plan output #14187

Merged
zachjsh merged 12 commits into apache:master from zachjsh:explain-plan-column-mappings on May 4, 2023

Contributor

@zachjsh zachjsh commented Apr 29, 2023

This change adds a columnMappings field to the explain plan output in native explain query mode. The field is a list that maps each queryColumn name to its outputColumn name.

For example, take this rollup query (from the MSQ docs):

INSERT INTO "kttm_rollup"

WITH kttm_data AS (
SELECT * FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"]}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"agent_category","type":"string"},{"name":"agent_type","type":"string"},{"name":"browser","type":"string"},{"name":"browser_version","type":"string"},{"name":"city","type":"string"},{"name":"continent","type":"string"},{"name":"country","type":"string"},{"name":"version","type":"string"},{"name":"event_type","type":"string"},{"name":"event_subtype","type":"string"},{"name":"loaded_image","type":"string"},{"name":"adblock_list","type":"string"},{"name":"forwarded_for","type":"string"},{"name":"language","type":"string"},{"name":"number","type":"long"},{"name":"os","type":"string"},{"name":"path","type":"string"},{"name":"platform","type":"string"},{"name":"referrer","type":"string"},{"name":"referrer_host","type":"string"},{"name":"region","type":"string"},{"name":"remote_address","type":"string"},{"name":"screen","type":"string"},{"name":"session","type":"string"},{"name":"session_length","type":"long"},{"name":"timezone","type":"string"},{"name":"timezone_offset","type":"long"},{"name":"window","type":"string"}]'
  )
))

SELECT
  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
  session,
  agent_category,
  agent_type,
  browser,
  browser_version,
  MV_TO_ARRAY("language") AS "language", -- Multi-value string dimension
  os,
  city,
  country,
  forwarded_for AS ip_address,

  COUNT(*) AS "cnt",
  SUM(session_length) AS session_length,
  APPROX_COUNT_DISTINCT_DS_HLL(event_type) AS unique_event_types
FROM kttm_data
WHERE os = 'iOS'
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
PARTITIONED BY HOUR
CLUSTERED BY browser, session

The plan, with the added columnMappings field, is:

[
  {
    "query": {
      "queryType": "groupBy",
      "dataSource": {
        "type": "external",
        "inputSource": {
          "type": "http",
          "uris": [
            "https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"
          ]
        },
        "inputFormat": {
          "type": "json",
          "keepNullColumns": false,
          "assumeNewlineDelimited": false,
          "useJsonNodeReader": false
        },
        "signature": [
          {
            "name": "timestamp",
            "type": "STRING"
          },
          {
            "name": "agent_category",
            "type": "STRING"
          },
          {
            "name": "agent_type",
            "type": "STRING"
          },
          {
            "name": "browser",
            "type": "STRING"
          },
          {
            "name": "browser_version",
            "type": "STRING"
          },
          {
            "name": "city",
            "type": "STRING"
          },
          {
            "name": "continent",
            "type": "STRING"
          },
          {
            "name": "country",
            "type": "STRING"
          },
          {
            "name": "version",
            "type": "STRING"
          },
          {
            "name": "event_type",
            "type": "STRING"
          },
          {
            "name": "event_subtype",
            "type": "STRING"
          },
          {
            "name": "loaded_image",
            "type": "STRING"
          },
          {
            "name": "adblock_list",
            "type": "STRING"
          },
          {
            "name": "forwarded_for",
            "type": "STRING"
          },
          {
            "name": "language",
            "type": "STRING"
          },
          {
            "name": "number",
            "type": "LONG"
          },
          {
            "name": "os",
            "type": "STRING"
          },
          {
            "name": "path",
            "type": "STRING"
          },
          {
            "name": "platform",
            "type": "STRING"
          },
          {
            "name": "referrer",
            "type": "STRING"
          },
          {
            "name": "referrer_host",
            "type": "STRING"
          },
          {
            "name": "region",
            "type": "STRING"
          },
          {
            "name": "remote_address",
            "type": "STRING"
          },
          {
            "name": "screen",
            "type": "STRING"
          },
          {
            "name": "session",
            "type": "STRING"
          },
          {
            "name": "session_length",
            "type": "LONG"
          },
          {
            "name": "timezone",
            "type": "STRING"
          },
          {
            "name": "timezone_offset",
            "type": "LONG"
          },
          {
            "name": "window",
            "type": "STRING"
          }
        ]
      },
      "intervals": {
        "type": "intervals",
        "intervals": [
          "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
        ]
      },
      "virtualColumns": [
        {
          "type": "expression",
          "name": "v0",
          "expression": "timestamp_floor(timestamp_parse(\"timestamp\",null,'UTC'),'PT1M',null,'UTC')",
          "outputType": "LONG"
        },
        {
          "type": "expression",
          "name": "v1",
          "expression": "mv_to_array(\"language\")",
          "outputType": "ARRAY<STRING>"
        },
        {
          "type": "expression",
          "name": "v2",
          "expression": "'iOS'",
          "outputType": "STRING"
        }
      ],
      "filter": {
        "type": "selector",
        "dimension": "os",
        "value": "iOS"
      },
      "granularity": {
        "type": "all"
      },
      "dimensions": [
        {
          "type": "default",
          "dimension": "v0",
          "outputName": "d0",
          "outputType": "LONG"
        },
        {
          "type": "default",
          "dimension": "session",
          "outputName": "d1",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "agent_category",
          "outputName": "d2",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "agent_type",
          "outputName": "d3",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "browser",
          "outputName": "d4",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "browser_version",
          "outputName": "d5",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "v1",
          "outputName": "d6",
          "outputType": "ARRAY<STRING>"
        },
        {
          "type": "default",
          "dimension": "v2",
          "outputName": "d7",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "city",
          "outputName": "d8",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "country",
          "outputName": "d9",
          "outputType": "STRING"
        },
        {
          "type": "default",
          "dimension": "forwarded_for",
          "outputName": "d10",
          "outputType": "STRING"
        }
      ],
      "aggregations": [
        {
          "type": "count",
          "name": "a0"
        },
        {
          "type": "longSum",
          "name": "a1",
          "fieldName": "session_length"
        },
        {
          "type": "HLLSketchBuild",
          "name": "a2",
          "fieldName": "event_type",
          "lgK": 12,
          "tgtHllType": "HLL_4",
          "round": true
        }
      ],
      "limitSpec": {
        "type": "default",
        "columns": [
          {
            "dimension": "d4",
            "direction": "ascending",
            "dimensionOrder": {
              "type": "lexicographic"
            }
          },
          {
            "dimension": "d1",
            "direction": "ascending",
            "dimensionOrder": {
              "type": "lexicographic"
            }
          }
        ]
      },
      "context": {
        "finalizeAggregations": false,
        "groupByEnableMultiValueUnnesting": false,
        "queryId": "665106f9-325a-4799-85a8-cc16855bca10",
        "sqlInsertSegmentGranularity": "\"HOUR\"",
        "sqlQueryId": "665106f9-325a-4799-85a8-cc16855bca10"
      }
    },
    "signature": [
      {
        "name": "d0",
        "type": "LONG"
      },
      {
        "name": "d1",
        "type": "STRING"
      },
      {
        "name": "d2",
        "type": "STRING"
      },
      {
        "name": "d3",
        "type": "STRING"
      },
      {
        "name": "d4",
        "type": "STRING"
      },
      {
        "name": "d5",
        "type": "STRING"
      },
      {
        "name": "d6",
        "type": "ARRAY<STRING>"
      },
      {
        "name": "d7",
        "type": "STRING"
      },
      {
        "name": "d8",
        "type": "STRING"
      },
      {
        "name": "d9",
        "type": "STRING"
      },
      {
        "name": "d10",
        "type": "STRING"
      },
      {
        "name": "a0",
        "type": "LONG"
      },
      {
        "name": "a1",
        "type": "LONG"
      },
      {
        "name": "a2",
        "type": "LONG"
      }
    ],
    "columnMappings": [
      {
        "queryColumn": "d0",
        "outputColumn": "__time"
      },
      {
        "queryColumn": "d1",
        "outputColumn": "session"
      },
      {
        "queryColumn": "d2",
        "outputColumn": "agent_category"
      },
      {
        "queryColumn": "d3",
        "outputColumn": "agent_type"
      },
      {
        "queryColumn": "d4",
        "outputColumn": "browser"
      },
      {
        "queryColumn": "d5",
        "outputColumn": "browser_version"
      },
      {
        "queryColumn": "d6",
        "outputColumn": "language"
      },
      {
        "queryColumn": "d7",
        "outputColumn": "os"
      },
      {
        "queryColumn": "d8",
        "outputColumn": "city"
      },
      {
        "queryColumn": "d9",
        "outputColumn": "country"
      },
      {
        "queryColumn": "d10",
        "outputColumn": "ip_address"
      },
      {
        "queryColumn": "a0",
        "outputColumn": "cnt"
      },
      {
        "queryColumn": "a1",
        "outputColumn": "session_length"
      },
      {
        "queryColumn": "a2",
        "outputColumn": "unique_event_types"
      }
    ]
  }
]
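As a rough illustration of how a client might consume the new field, the sketch below joins columnMappings against signature to recover the output column names and their types. The plan is a trimmed copy of the one above (three columns only), and the helper name `output_schema` is hypothetical, not part of Druid.

```python
import json

# Trimmed copy of the plan above: two dimensions and one aggregator only.
plan_json = """
[
  {
    "query": {"queryType": "groupBy"},
    "signature": [
      {"name": "d0", "type": "LONG"},
      {"name": "d10", "type": "STRING"},
      {"name": "a2", "type": "LONG"}
    ],
    "columnMappings": [
      {"queryColumn": "d0", "outputColumn": "__time"},
      {"queryColumn": "d10", "outputColumn": "ip_address"},
      {"queryColumn": "a2", "outputColumn": "unique_event_types"}
    ]
  }
]
"""


def output_schema(plan_entry):
    """Return (outputColumn, type) pairs in columnMappings order.

    Hypothetical helper: looks up each queryColumn in the entry's signature.
    """
    types = {col["name"]: col["type"] for col in plan_entry["signature"]}
    return [
        (mapping["outputColumn"], types[mapping["queryColumn"]])
        for mapping in plan_entry["columnMappings"]
    ]


plan = json.loads(plan_json)
schema = output_schema(plan[0])
for name, col_type in schema:
    print(f"{name}: {col_type}")
```

Without columnMappings, a client would only see the internal names (d0, d10, a2) and would have to infer the SQL aliases from the query text.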

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@zachjsh zachjsh requested a review from gianm April 29, 2023 03:24
Comment on lines +435 to +437
objectNode.put(
"columnMappings",
jsonMapper.convertValue(QueryUtils.buildColumnMappings(relRoot.fields, druidQuery), ArrayNode.class));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking ObjectNode.put should be avoided because it has been deprecated.
* add tests
@zachjsh zachjsh marked this pull request as ready for review May 1, 2023 17:46
Contributor

@abhishekrb19 abhishekrb19 left a comment

LGTM overall. A few questions and comments:

  1. Will the new columnMappings be backwards compatible with existing clients? Currently, this info is added as part of PLAN column along with queries and signature -- I think this should be ok as it's a new map entry, but would like to double check.
  2. Can we update the EXPLAIN PLAN sample output in the following README to include the new info: https://github.com/apache/druid/blob/master/docs/querying/sql-translation.md#interpreting-explain-plan-output?
  3. Since columnMappings info is only available in the native explain query mode (and not in the legacy Calcite mode), it'd be worth clarifying that in the PR summary or title, so it's evident in the next release notes.
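To make the third point concrete, here is a minimal sketch of a request body that asks for the native explain format. It assumes the `useNativeQueryExplain` query context parameter selects the explain mode and that the body follows the usual `query` plus `context` shape of Druid's SQL API; the helper name `build_explain_request` is hypothetical, and the sketch only builds the payload without sending it.

```python
import json


def build_explain_request(sql, native=True):
    """Wrap a SQL statement in EXPLAIN PLAN FOR and set the explain mode."""
    return {
        "query": "EXPLAIN PLAN FOR " + sql,
        # Assumed context key: true requests the native (JSON) plan format,
        # which is the only mode where columnMappings would appear.
        "context": {"useNativeQueryExplain": native},
    }


request_body = json.dumps(build_explain_request('SELECT COUNT(*) FROM "kttm_rollup"'))
print(request_body)
```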

--type jacoco \
--line-coverage 50 \
--branch-coverage 50 \
--line-coverage 1 \
Contributor

Is this change intentional?

Contributor

Please do revert this back to 50, 50.

Contributor Author

Yes, this was intentional, to get around the code coverage check blocking the ITs from running. I will revert it for sure.

Contributor

If this change is meant to get around the coverage failure and run everything, I believe you could achieve that by running the workflows directly from the source repo's branch, i.e., your fork.

Contributor

@gianm gianm left a comment

The basic idea LGTM. I generally agree with @abhishekrb19's review comments.

--type jacoco \
--line-coverage 50 \
--branch-coverage 50 \
--line-coverage 1 \
Contributor

Please do revert this back to 50, 50.

Contributor Author

zachjsh commented May 3, 2023

> LGTM overall. A few questions and comments:
>
> 1. Will the new `columnMappings` be backwards compatible with existing clients? Currently, this info is added as part of `PLAN` column along with `queries` and `signature` -- I think this should be ok as it's a new map entry, but would like to double check.
>
> 2. Can we update the `EXPLAIN PLAN` sample output in the following `README` to include the new info: https://github.com/apache/druid/blob/master/docs/querying/sql-translation.md#interpreting-explain-plan-output?
>
> 3. Since `columnMappings` info is only available in the native explain query mode (and not in the legacy Calcite mode), it'd be worth clarifying that in the PR summary or title, so it's evident in the next release notes.

Thanks @abhishekrb19, I addressed your concerns here. Adding a new property to the existing returned object should not break existing clients, unless they are bad clients 👿
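The backwards-compatibility argument can be sketched from the client side: a reader that looks up only the keys it needs, and treats columnMappings as optional, handles plan entries both with and without the new field. The function name `read_plan_entry` and the two sample entries are hypothetical; they only mirror the shape of the plan in the PR description.

```python
import json


def read_plan_entry(entry):
    """Return (queryType, column names), tolerating a missing columnMappings."""
    signature = entry["signature"]
    mappings = entry.get("columnMappings")  # absent in output from older servers
    if mappings is None:
        # Fall back to the internal query column names.
        names = [col["name"] for col in signature]
    else:
        names = [mapping["outputColumn"] for mapping in mappings]
    return entry["query"]["queryType"], names


# Entry as it looked before this change (no columnMappings key).
old_entry = json.loads(
    '{"query": {"queryType": "groupBy"},'
    ' "signature": [{"name": "d0", "type": "LONG"}]}'
)
# Entry with the new field added alongside the existing keys.
new_entry = json.loads(
    '{"query": {"queryType": "groupBy"},'
    ' "signature": [{"name": "d0", "type": "LONG"}],'
    ' "columnMappings": [{"queryColumn": "d0", "outputColumn": "__time"}]}'
)

print(read_plan_entry(old_entry))
print(read_plan_entry(new_entry))
```

A client would break only if it rejected unknown keys outright, for example through strict schema validation of each plan entry.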

Contributor

@abhishekrb19 abhishekrb19 left a comment

LGTM, thanks!

Contributor Author

zachjsh commented May 3, 2023

Don't merge until I revert the code coverage threshold. I will do so once all tests pass.

@zachjsh zachjsh merged commit 48cde23 into apache:master May 4, 2023
@zachjsh zachjsh deleted the explain-plan-column-mappings branch May 4, 2023 17:36
@zachjsh zachjsh restored the explain-plan-column-mappings branch May 4, 2023 18:03
@zachjsh zachjsh deleted the explain-plan-column-mappings branch May 4, 2023 18:27
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023

4 participants