DruidSegmentReader should work if timestamp is specified as a dimension#9530

Merged
ccaominh merged 4 commits into apache:master from suneet-s:timestamp-dim
Mar 25, 2020
Conversation

@suneet-s (Contributor) commented Mar 17, 2020

DruidInputSource does not support a dimension or metric named "timestamp", but this was supported by the ingestSegment firehose, which was deprecated in favor of the DruidInputSource. If you try, you will see an exception like:

org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS. java.lang.ClassCastException: java.lang.String cannot be cast to org.joda.time.DateTime

This change makes it possible to re-index and compact datasources when a column is explicitly named "timestamp", and adds integration tests for both cases.
This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.
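To illustrate the failure mode described above, here is a standalone sketch (not Druid's actual code; `java.util.Date` stands in for Joda-Time's `DateTime`): when a segment contains a dimension literally named "timestamp", its values are plain Strings, so blindly casting the looked-up value to a date-time type throws the ClassCastException shown in the task log.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampCastDemo {
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        // A segment being re-indexed has a *dimension* literally named
        // "timestamp", whose values are plain Strings.
        row.put("timestamp", "2013-08-31T01:02:33Z");

        try {
            // Treating that dimension's value as the row's time column fails,
            // analogous to the ClassCastException in the task log above.
            java.util.Date ts = (java.util.Date) row.get("timestamp");
            System.out.println(ts);
        } catch (ClassCastException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```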

Tests for compaction and re-indexing a datasource with the timestamp column

/*
 * Timestamp is added last because we expect that the time column will always be a date time object.
 * If it is added earlier, it can be overwritten by metrics or dimensions with the same name.
 *
 * If a user names a metric or dimension `__time` it will be overwritten. This case should be rare since

jihoonson (Contributor) commented:

Could you change this to single-line comments? We don't usually use multi-line comments.

suneet-s (Contributor, Author) replied:

Done

jihoonson (Contributor) commented:

Hmm, I think this overwriting should never happen, but it could happen for some reason in practice, e.g., a user mistake. How about logging a warning if there are duplicate column names? The docs could say that warning messages may be printed if there are duplicates.

suneet-s (Contributor, Author) replied:

I'm worried about log explosion, since this is done per row. I'd have to add explicit checking outside of this next block. Maybe in the constructor? Would that be visible enough in the logs?

jihoonson (Contributor) replied:

Ah, good point. Now I think we need some schema validation for ingestion, which could probably be done in DataSchema. But that would be a larger issue than the bug this PR fixes, and I'm OK with adding it later.
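The ordering the comment describes can be sketched as follows, using hypothetical names (an illustration of the technique, not the actual Druid code): dimensions and metrics are copied into the row first, and the time column value is put last, so a duplicate column name cannot clobber the real timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowMergeOrder {
    // Hypothetical helper: builds the flattened row used for re-indexing.
    // The time column is put last so it always wins over any dimension or
    // metric that happens to share its name.
    static Map<String, Object> buildRow(Map<String, Object> dimensionsAndMetrics,
                                        String timeColumn,
                                        long timestampMillis) {
        Map<String, Object> row = new LinkedHashMap<>(dimensionsAndMetrics);
        row.put(timeColumn, timestampMillis); // added last: overwrites any duplicate
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> dims = new LinkedHashMap<>();
        dims.put("timestamp", "2013-08-31T01:02:33Z"); // a String dimension named "timestamp"
        dims.put("__time", "oops");                    // pathological duplicate of the time column

        Map<String, Object> row = buildRow(dims, "__time", 1377910953000L);

        System.out.println(row.get("__time"));    // 1377910953000
        System.out.println(row.get("timestamp")); // 2013-08-31T01:02:33Z
    }
}
```

Had the time column been put first, the `"oops"` String would have replaced it, which is exactly the overwriting case the comment warns about.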

private static final String INDEX_DATASOURCE = "wikipedia_index_test";

private static final String INDEX_WITH_TIMESTAMP_TASK = "/indexer/wikipedia_with_timestamp_index_task.json";
// TODO: add queries that validate timestamp is different from the __time column since it is a dimension
jihoonson (Contributor) commented:

Would you open an issue for this instead of a TODO?

suneet-s (Contributor, Author) replied:

Done

jihoonson (Contributor) left a review:

+1 after CI.

@ccaominh ccaominh merged commit 55c08e0 into apache:master Mar 25, 2020
suneet-s added a commit to suneet-s/druid that referenced this pull request Mar 25, 2020
DruidSegmentReader should work if timestamp is specified as a dimension (apache#9530)

* DruidSegmentReader should work if timestamp is specified as a dimension

* Add integration tests

Tests for compaction and re-indexing a datasource with the timestamp column

* Instructions to run integration tests against quickstart

* address pr
suneet-s added a commit to suneet-s/druid that referenced this pull request Mar 25, 2020
DruidSegmentReader should work if timestamp is specified as a dimension (apache#9530)

* DruidSegmentReader should work if timestamp is specified as a dimension

* Add integration tests

Tests for compaction and re-indexing a datasource with the timestamp column

* Instructions to run integration tests against quickstart

* address pr
@jihoonson jihoonson added this to the 0.18.0 milestone Mar 25, 2020
@jihoonson jihoonson added the Bug label Mar 25, 2020
jihoonson pushed a commit that referenced this pull request Mar 26, 2020
DruidSegmentReader should work if timestamp is specified as a dimension (#9530) (#9566)

* DruidSegmentReader should work if timestamp is specified as a dimension

* Add integration tests

Tests for compaction and re-indexing a datasource with the timestamp column

* Instructions to run integration tests against quickstart

* address pr
@suneet-s suneet-s deleted the timestamp-dim branch March 26, 2020 00:17
jihoonson added a commit to implydata/druid-public that referenced this pull request Mar 26, 2020
DruidSegmentReader should work if timestamp is specified as a dimension (apache#9530)