Add is_overshadowed column to sys.segments table #7425
Conversation
Does it make sense to add this to the web console within this PR?

I think it'd be better to do as a separate PR, which should be merged after this one.

Can you update the PR description so that it refers to it?

Yes, missed here, done.
*/
public class SegmentWithOvershadowedStatus implements Comparable<SegmentWithOvershadowedStatus>
{
  private final boolean isOvershadowed;

nit: suggest calling this overshadowed

done
docs/content/querying/sql.md
@@ -609,6 +609,7 @@ Note that a segment can be served by more than one stream ingestion tasks or His
|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 represents this segment has been published to the metadata store with `used=1`|
|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any process(Historical or realtime)|
|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is being served on any type of realtime tasks|
|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is published and is overshadowed by some other published segments. Currently, is_overshadowed is always false for unpublished segments, although this may change in the future. You can filter for segments that "should be published" by filtering for `is_published = 1 AND is_overshadowed = 0`. Segments can briefly be both published and overshadowed if they were recently replaced, but have not been unpublished yet.|
> 1 if this segment is published and is overshadowed by some other published segments.

I think this should mention that this returns 1 only for fully overshadowed segments

yes, i think it's important to mention that, changed.
final Set<SegmentId> overshadowedSegments = findOvershadowedSegments(druidDataSources);
//transform DataSegment to SegmentWithOvershadowedStatus objects
final Stream<SegmentWithOvershadowedStatus> segmentsWithOvershadowedStatus = metadataSegments.map(segment -> {
  if (overshadowedSegments.contains(segment.getId())) {

This block could be

return new SegmentWithOvershadowedStatus(
    segment,
    overshadowedSegments.contains(segment.getId())
);

yes, that looks nicer, thanks.
@@ -92,6 +93,9 @@
  private static final String SERVER_SEGMENTS_TABLE = "server_segments";
  private static final String TASKS_TABLE = "tasks";

  private static final long AVAILABLE_IS_OVERSHADOWED_VALUE = 0L;

hm, if we use these, I think IS_OVERSHADOWED_FALSE would be clearer, and it should have IS_OVERSHADOWED_TRUE as well

ok, yeah, i was having a hard time coming up with good constant names for these.
isAvailable,
isRealtime,
val.isOvershadowed() ? 1L : 0L,

this should use the constants instead of 1L and 0L

done
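For illustration, a minimal sketch of what the suggested constants could look like; the constant names follow the review comment above, while the small wrapper class and helper are assumptions, not the code in this PR:

```java
final class SegmentsTableConstants
{
  // Boolean-as-long convention used by the sys.segments table.
  static final long IS_OVERSHADOWED_TRUE = 1L;
  static final long IS_OVERSHADOWED_FALSE = 0L;

  // Maps a segment's overshadowed flag to the long representation used in the table row.
  static long toLong(boolean overshadowed)
  {
    return overshadowed ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE;
  }
}
```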
@@ -295,6 +302,7 @@ public TableType getJdbcTableType()
  val.getValue().isPublished(),
  val.getValue().isAvailable(),
  val.getValue().isRealtime(),
  AVAILABLE_IS_OVERSHADOWED_VALUE,

i think this should have a comment that this assumes unpublished segments are never overshadowed.

added comment
Response.ResponseBuilder builder = Response.status(Response.Status.OK);
return builder.entity(stream).build();
final Function<DataSegment, Iterable<ResourceAction>> raGenerator = segment -> Collections.singletonList(
    AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR.apply(segment.getDataSource()));

Not formatted properly

Not addressed

manually changed
}

/**
 * find fully overshadowed segments

Please write proper sentences in Javadocs.

sure, changed.
 *
 * @return set of overshadowed segments
 */
private Set<SegmentId> findOvershadowedSegments(Collection<ImmutableDruidDataSource> druidDataSources)

How is this method similar to or different from findOvershadowed()?

Does this method belong in MetadataResource, or is there a better place for it?

You mean VersionedIntervalTimeline#findOvershadowed()? I think that one finds partially overshadowed segments as well, while this method only looks for fully overshadowed segments. Also, that method returns a TimelineObjectHolder.

As for whether this method belongs here: yeah, I also thought about it and considered adding it to VersionedIntervalTimeline, but since that's in the core package, there was a dependency issue with taking ImmutableDruidDataSource as an argument. Maybe we can work around that by passing DataSegment objects instead, if it makes sense to move this to VersionedIntervalTimeline and it's going to be used by other code.

Maybe it could be a static method on ImmutableDruidDataSource that accepts a collection, or an instance method on ImmutableDruidDataSource like getFullyOvershadowedSegments()

thanks, moved this method to ImmutableDruidDataSource as an instance method.
 */
private Set<SegmentId> findOvershadowedSegments(Collection<ImmutableDruidDataSource> druidDataSources)
{
  final Stream<DataSegment> segmentStream = druidDataSources

What's the point of extracting the segmentStream variable?

it was used for building timelines, and is still used to pass to the new method i created in VersionedIntervalTimeline
final Stream<DataSegment> segmentStream = druidDataSources
    .stream()
    .flatMap(t -> t.getSegments().stream());
final Set<DataSegment> usedSegments = segmentStream.collect(Collectors.toSet());

What's the point of creating this collection instead of iterating the existing ImmutableDruidDataSource objects?

i can get rid of this one now.
segment -> new SegmentWithOvershadowedStatus(
    segment,
    overshadowedSegments.contains(segment.getId())
)).collect(Collectors.toList()).stream();

Not formatted properly

Why is the .collect(Collectors.toList()).stream() part needed?

fixed formatting, and that part is actually not needed.
@@ -155,7 +158,8 @@ public Response getDatabaseSegmentDataSource(@PathParam("dataSourceName") final
@Produces(MediaType.APPLICATION_JSON)
public Response getDatabaseSegments(

I think this method is too big now, it should be split into smaller methods.

Not addressed

I looked at this; not sure of the best way to split it, so I took the parts that find and authorize SegmentWithOvershadowedStatus out into a helper method.
@@ -73,8 +73,10 @@
  private final BrokerSegmentWatcherConfig segmentWatcherConfig;

  private final boolean isCacheEnabled;
  // Use ConcurrentSkipListMap so that the order of segments is deterministic and

Use Javadocs for commenting fields

changed to javadocs
    AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR.apply(segment.getDataSource()));
if (includeOvershadowedStatus != null) {
  final Set<SegmentId> overshadowedSegments = findOvershadowedSegments(druidDataSources);
  //transform DataSegment to SegmentWithOvershadowedStatus objects

This comment doesn't add meaning

removed the comment
 *
 * SegmentWithOvershadowedStatus's {@link #compareTo} method considers only the {@link SegmentId} of the DataSegment object.
 */
public class SegmentWithOvershadowedStatus implements Comparable<SegmentWithOvershadowedStatus>

Did you explore the possibility for this class to extend DataSegment for memory saving purposes?

Yes, in fact I started with extends DataSegment, but in order to call super(), I had to pass the DataSegment reference to SegmentWithOvershadowedStatus so that I could get the properties for the super constructor call, something like this:
@JsonCreator
public SegmentWithOvershadowedStatus(
    @JsonProperty("dataSegment") DataSegment segment,
    @JsonProperty("overshadowed") boolean overshadowed
)
{
  super(
      segment.getDataSource(),
      segment.getInterval(),
      segment.getVersion(),
      segment.getLoadSpec(),
      segment.getDimensions(),
      segment.getMetrics(),
      segment.getShardSpec(),
      segment.getBinaryVersion(),
      segment.getSize()
  );
  this.dataSegment = segment;
  this.overshadowed = overshadowed;
}
which didn't seem correct to me, as I am both extending the class and passing a reference to the same class into the subclass, so I decided to just keep DataSegment as a member of this class. Is there a better way of doing this?
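For reference, a minimal sketch of the composition-based shape that was kept instead. This is an illustration only, not the exact class from the PR: equals/hashCode are omitted, the DataSegment package is assumed, and SegmentId is assumed to be Comparable.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.Objects;
import org.apache.druid.timeline.DataSegment;

/**
 * DataSegment plus its overshadowed status, kept by composition rather than inheritance.
 */
public class SegmentWithOvershadowedStatus implements Comparable<SegmentWithOvershadowedStatus>
{
  private final DataSegment dataSegment;
  private final boolean overshadowed;

  @JsonCreator
  public SegmentWithOvershadowedStatus(
      @JsonProperty("dataSegment") DataSegment dataSegment,
      @JsonProperty("overshadowed") boolean overshadowed
  )
  {
    this.dataSegment = Objects.requireNonNull(dataSegment, "dataSegment");
    this.overshadowed = overshadowed;
  }

  @JsonProperty
  public DataSegment getDataSegment()
  {
    return dataSegment;
  }

  @JsonProperty
  public boolean isOvershadowed()
  {
    return overshadowed;
  }

  @Override
  public int compareTo(SegmentWithOvershadowedStatus o)
  {
    // Ordering considers only the segment id, as the Javadoc above states.
    return dataSegment.getId().compareTo(o.dataSegment.getId());
  }
}
```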
I don't see why SegmentWithOvershadowedStatus should have a "dataSegment" field rather than all fields deconstructed. In fact, it would save a little serialization/deserialization work and reduce the number of bytes sent over the network as well.
If I deconstruct the DataSegment object, then we might save a few bytes on a reference to DataSegment, but then the memory savings in the broker, where the interned DataSegment is used, would be lost. (Whether to get rid of interning is another issue, which should be addressed outside of this PR.) If the concern is bytes sent over the network, then moving to the Smile format instead of JSON can provide a considerable reduction in the number of bytes transferred, which I plan to do in a follow-up PR later.
Is SegmentWithOvershadowedStatus stored somewhere for a long time? Interning an object upon deserialization and throwing it away soon doesn't make a lot of sense.

And even if the overshadowed status should be kept around on some node for a long time, you would be better off applying mapping techniques such as described in #7395 instead of using plain Guava interners. When you do this, you can insert the "overshadowed" flag wherever you want, or have something like ConcurrentHashMap<DataSegment, SegmentWithOvershadowedStatus> for storage, etc.
The coordinator API is evolvable, and is already evolving in this patch via request parameters: its response structure is different based on whether or not the includeOvershadowedStatus parameter is provided. If it needs to evolve further, then that would be okay and doable. (Although if all we do is switch to Smile, I don't think structural evolution is needed, since I imagine we would do that switch by making various APIs support both JSON and Smile based on a client header.)

By the way, we could get rid of the old formats after a few releases if we want, by deprecating them and then introducing a hard barrier that rolling updates cannot cross. We usually try to avoid doing this too often but it can be done.
BTW, I think we should have something like a @ClusterInternalAPI annotation for this.
sounds like a good idea to add an annotation for all internal APIs in a separate PR

thanks
LGTM
@@ -195,7 +204,7 @@ private void poll()
    sb.append("datasources=").append(ds).append("&");
  }
  sb.setLength(sb.length() - 1);
  query = "/druid/coordinator/v1/metadata/segments?" + sb;
  query = "/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus?" + sb;

I'm not sure if doing two ? in the same URL works, but even if it does, it's poor form; the second one should be a &.

yes, didn't realize i have two ?, will change
// timestamp is used to filter deleted segments
publishedSegments.put(interned, timestamp);

This introduces a bug: since there are two possible SegmentWithOvershadowedStatus for each underlying DataSegment, now the same segment can be in publishedSegments twice for a period of time. There are a few ways to deal with this:

1. Make publishedSegments a TreeSet<SegmentWithOvershadowedStatus> and update the entire map atomically. This is a super clean solution but would burst to higher memory usage (it would need to keep two entire copies of the map in memory when replacing them).
2. Make publishedSegments a ConcurrentSkipListMap<DataSegment, CachedSegmentInfo>, where CachedSegmentInfo is some static class, defined in this file, containing the updated timestamp and the overshadowed boolean. If you do this, the SegmentWithOvershadowedStatus won't be stored long term anymore. You could minimize the memory footprint of CachedSegmentInfo, if you want, by making the timestamp a long rather than a DateTime.
3. Make publishedSegments a ConcurrentSkipListSet<SegmentWithOvershadowedStatus>, make SegmentWithOvershadowedStatus mutable (in a thread-safe way), make its equals, hashCode, and compareTo methods based only on the dataSegment field, let its overshadowed field be modified, and add a timestamp field to it. When syncing the cache, get the current object and mutate the overshadowed field if necessary. Btw, a ConcurrentSkipListSet uses a ConcurrentSkipListMap under the hood, so the memory footprint savings of this aren't as much as you might expect relative to (2).

(2) is the variant that's closest to what the code was doing before this patch (a sketch of it follows below). One thing I don't love about it is that it is racey: it means that if a set of segments is overshadowed all at once, callers will not necessarily see a consistent view, because the map is being concurrently updated. They'll see the overshadowed flag get set for the underlying segments one at a time. But the same problem existed in the old code, so fixing it could be considered out of scope for this patch.
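A minimal sketch of option (2), using hypothetical names (CachedSegmentInfo and the wrapper class are assumptions drawn from the discussion, not the code that ended up in the PR):

```java
import java.util.concurrent.ConcurrentSkipListMap;
import org.apache.druid.timeline.DataSegment;

class PublishedSegmentCacheSketch
{
  // Per-segment cache metadata; the segment itself is the map key, so each underlying
  // DataSegment can appear at most once.
  private static class CachedSegmentInfo
  {
    // Millis timestamp of the poll that last saw this segment; used to filter deleted segments.
    final long updatedTimestampMillis;
    final boolean overshadowed;

    CachedSegmentInfo(long updatedTimestampMillis, boolean overshadowed)
    {
      this.updatedTimestampMillis = updatedTimestampMillis;
      this.overshadowed = overshadowed;
    }
  }

  // Sorted, concurrent map keyed by the (interned) DataSegment; the ordering is assumed to
  // come from the same natural ordering or Comparator the existing cache already relies on.
  private final ConcurrentSkipListMap<DataSegment, CachedSegmentInfo> publishedSegments =
      new ConcurrentSkipListMap<>();

  // Called while syncing the cache: overwrites the entry instead of adding a second one.
  void onSegmentSeen(DataSegment interned, long nowMillis, boolean overshadowed)
  {
    publishedSegments.put(interned, new CachedSegmentInfo(nowMillis, overshadowed));
  }
}
```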
If I were you I would probably do (2) and then consider whether there's a way to address the raciness in a future patch.
omg, thanks for catching this issue. Actually I feel like going with option (1); I didn't do it initially because of the memory bump it'd cause, like you said, but it would have avoided such bugs and updates the cache atomically. I tried option (2) as well; one thing it would cause is that the return type of getPublishedSegments() changes, so CachedSegmentInfo cannot be private to this class. getPublishedSegments() might need to be split into two methods, getPublishedSegments() and getCachedPublishedSegments(), or we'd create yet another wrapper class and return that. Another potential ugliness is that clients need to know whether the cache is enabled and call the right method to get published segments.
okay, i added changes to do (2) now as we discussed.
I don't have numbers, but I'm concerned about a ConcurrentSkipListMap of all-segments-in-system cardinality. Maybe take Gian's approach (1), but instead of a TreeMap use sorted arrays of DataSegment objects (or Guava's ImmutableSortedMap, which uses the same approach underneath).

ConcurrentSkipListMap: 36
ImmutableSortedMap: 8

So I'm pretty sure even temporarily having two ImmutableSortedMap in memory will well beat one ConcurrentSkipListMap.

Using sorted arrays directly, even materializing those two maps in memory can be avoided.
Option 1 (what you went with) sounds good to me, especially in light of ConcurrentSkipListMap's additional overhead.
 *
 * @return set of overshadowed segments
 */
public Set<SegmentId> getFullyOvershadowedSegments()

The name of an expensive computation method should not start with get-, which usually implies a cheap, no-allocation method in Java. It can be compute-, determine-, or find-.

renamed to determineOvershadowedSegments
@@ -109,6 +112,27 @@ public long getTotalSizeOfSegments()
    return totalSizeOfSegments;
  }

  /**
   * This method finds the fully overshadowed segments in this datasource

What does the prefix "fully" mean in this context? Can a segment be "just" overshadowed?

A segment can be partially overshadowed.
The term "overshadowed" is used throughout the codebase in the fully overshadowed meaning. If the partially overshadowed concept is ever used in the codebase, it should be called partiallyOvershadowed. Compare with the determineOvershadowedSegments() method. We should either have "FullyOvershadowed" everywhere or nowhere. I think we should have it nowhere.

Sure, I think it makes sense to just use the word "overshadowed" here.

renamed "fully overshadowed" -> "overshadowed"
{
  final Collection<DataSegment> segments = this.getSegments();
  final Map<String, VersionedIntervalTimeline<String, DataSegment>> timelines = VersionedIntervalTimeline.buildTimelines(
      segments);

Not formatted properly.

not sure if I am missing something; I reimported druid_intellij_formatting.xml again, and this is what I get if I reformat the code. I tried to add a manual line break and reformat, see if it looks ok now?
}

final Map<String, VersionedIntervalTimeline<String, DataSegment>> timelines = VersionedIntervalTimeline.buildTimelines(
    params.getAvailableSegments());

Not formatted properly

same here, reformat doesn't change the formatting
@@ -141,13 +139,8 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)

private Set<DataSegment> determineOvershadowedSegments(DruidCoordinatorRuntimeParams params)

Looks like this method is the same as getFullyOvershadowedSegments(), can they be merged?

yeah, it's similar, except for the input args and return type. I think getFullyOvershadowedSegments would be useful as it's public and can be used by anyone with a datasource object; i don't see how i can use that one in here though.

It seems to me there can be a static method determineOvershadowedSegments(Iterable<DataSegment> segments). DataSource's method can delegate to that static method for the convenience of the API. DruidCoordinatorRuleRunner's can be removed and determineOvershadowedSegments(params.getAvailableSegments()) used instead.
ok, made determineOvershadowedSegments static in ImmutableDruidDataSource and removed the one from here
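As an illustration only, a static helper with that shape might look roughly like the following. This is a sketch under assumptions: the per-datasource grouping, and the use of VersionedIntervalTimeline.forSegments and a two-argument isOvershadowed(interval, version), are guesses about the timeline API of that era, not a quote of the PR's code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import org.apache.druid.timeline.DataSegment;
import org.apache.druid.timeline.VersionedIntervalTimeline;

class OvershadowedSegmentsSketch
{
  // Determines the segments that are fully overshadowed by other segments in the input.
  static Set<DataSegment> determineOvershadowedSegments(Iterable<DataSegment> segments)
  {
    // Group by datasource and build one timeline per datasource, since a
    // VersionedIntervalTimeline describes a single datasource.
    final Map<String, List<DataSegment>> byDataSource = StreamSupport
        .stream(segments.spliterator(), false)
        .collect(Collectors.groupingBy(DataSegment::getDataSource));
    final Map<String, VersionedIntervalTimeline<String, DataSegment>> timelines = byDataSource
        .entrySet()
        .stream()
        .collect(Collectors.toMap(Map.Entry::getKey, e -> VersionedIntervalTimeline.forSegments(e.getValue())));

    // Only a small fraction of segments is expected to be overshadowed, so one set is fine.
    final Set<DataSegment> overshadowed = new HashSet<>();
    for (DataSegment segment : segments) {
      final VersionedIntervalTimeline<String, DataSegment> timeline = timelines.get(segment.getDataSource());
      // isOvershadowed(interval, version) is assumed to report fully overshadowed chunks only.
      if (timeline != null && timeline.isOvershadowed(segment.getInterval(), segment.getVersion())) {
        overshadowed.add(segment);
      }
    }
    return overshadowed;
  }
}
```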
final Iterable<DataSegment> authorizedSegments =
    AuthorizationUtils.filterAuthorizedResources(req, metadataSegments::iterator, raGenerator, authorizerMapper);
final Function<SegmentWithOvershadowedStatus, Iterable<ResourceAction>> raGenerator = segment -> Collections.singletonList(
    AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR.apply(segment.getDataSegment().getDataSource()));

Not addressed
Response.ResponseBuilder builder = Response.status(Response.Status.OK);
return builder.entity(stream).build();
final Function<DataSegment, Iterable<ResourceAction>> raGenerator = segment -> Collections.singletonList(
    AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR.apply(segment.getDataSource()));

Not addressed
@@ -155,7 +158,8 @@ public Response getDatabaseSegmentDataSource(@PathParam("dataSourceName") final
@Produces(MediaType.APPLICATION_JSON)
public Response getDatabaseSegments(

Not addressed
thanks @leventov for the review, I believe I have addressed all the comments which are in scope for this PR, let me know if you have more comments.
/**
 * DataSegment object plus the overshadowed status for the segment. An immutable object.
 *
 * SegmentWithOvershadowedStatus's {@link #compareTo} method considers only the {@link SegmentId} of the DataSegment object.

Line longer than 120 cols

fixed
  Stream<DataSegment> metadataSegments
)
{
  final Set<SegmentId> overshadowedSegments = new HashSet<>();

Please add a comment like the following:

> This is fine to add all overshadowed segments to a single collection because only
> a small fraction of the segments in the cluster are expected to be overshadowed,
> so building this collection shouldn't generate a lot of garbage.

sure, added the comment
@@ -73,8 +72,12 @@
  private final BrokerSegmentWatcherConfig segmentWatcherConfig;

  private final boolean isCacheEnabled;
  /**
   * Use {@link ConcurrentSkipListMap} so that the order of segments is deterministic and sys.segments queries return the segments in sorted order based on segmentId

Line longer than 120 cols

fixed
@@ -195,7 +206,7 @@ private void poll()
    sb.append("datasources=").append(ds).append("&");
  }
  sb.setLength(sb.length() - 1);
  query = "/druid/coordinator/v1/metadata/segments?" + sb;
  query = "/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&" + sb;

End & intended?

yes, i think so. This is only used if there is a non-empty watchedDataSources set (let's say it contains a datasource name "dummy"); then the URL would look like /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&datasources=dummy etc.
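For illustration, one way to assemble the query string so that it never ends up with a second `?` or a dangling `&` is sketched below; the helper class, method, and watchedDataSources parameter are assumptions, not the poll() code from this PR:

```java
import java.util.List;

class PollUrlSketch
{
  static String buildQuery(List<String> watchedDataSources)
  {
    // Start with the flag parameter, then append each datasource with '&' separators.
    final StringBuilder sb = new StringBuilder("/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus");
    for (String ds : watchedDataSources) {
      sb.append("&datasources=").append(ds);
    }
    // e.g. /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&datasources=dummy
    return sb.toString();
  }
}
```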
Is this patch ready to merge? Anything I can help with?
@@ -109,6 +109,15 @@ public static void addSegments(
  );
}

public static Map<String, VersionedIntervalTimeline<String, DataSegment>> buildTimelines(Iterable<DataSegment> segments)

This interface looks somewhat less intuitive because, as this method also implies, a VersionedIntervalTimeline is per dataSource. I think it would be better to first group segments by their dataSources and then call VersionedIntervalTimeline.forSegments per dataSource. What do you think?

you're right, it does not belong here; moved it to ImmutableDruidDataSource as that's the only place it's used from.
{
  if (isCacheEnabled) {
    Preconditions.checkState(
        lifecycleLock.awaitStarted(1, TimeUnit.MILLISECONDS) && cachePopulated.get(),
        "hold on, still syncing published segments"
    );
    return publishedSegments.keySet().iterator();
    synchronized (lock) {

This lock is not needed. If publishedSegments was a mutable collection, such a lock wouldn't prevent a race: https://github.com/code-review-checklists/java-concurrency#unsafe-concurrent-iteration. But since it's immutable, you don't need a lock at all. You can just make the publishedSegments field volatile if you wish.

removed lock and made publishedSegments volatile
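A minimal sketch of that volatile-immutable-snapshot pattern, with assumed field and method names (imports for SegmentWithOvershadowedStatus, the class from this PR, are omitted):

```java
import com.google.common.collect.ImmutableSortedSet;
import java.util.Iterator;

class PublishedSegmentsSnapshotSketch
{
  // Immutable snapshot replaced wholesale on every poll; volatile makes the swap visible
  // to readers without any locking. Assumes SegmentWithOvershadowedStatus is Comparable.
  private volatile ImmutableSortedSet<SegmentWithOvershadowedStatus> publishedSegments =
      ImmutableSortedSet.of();

  // Writer side (poll thread): build a fresh snapshot, then publish it with a single write.
  void updateSnapshot(Iterable<SegmentWithOvershadowedStatus> latest)
  {
    publishedSegments = ImmutableSortedSet.copyOf(latest);
  }

  // Reader side: iterate whatever snapshot is current; no lock is needed because the
  // referenced collection never changes after publication.
  Iterator<SegmentWithOvershadowedStatus> getPublishedSegments()
  {
    return publishedSegments.iterator();
  }
}
```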
  cachePopulated.set(true);
}

public Iterator<DataSegment> getPublishedSegments()
public Iterator<SegmentWithOvershadowedStatus> getPublishedSegments()
{
  if (isCacheEnabled) {
    Preconditions.checkState(
        lifecycleLock.awaitStarted(1, TimeUnit.MILLISECONDS) && cachePopulated.get(),

This type of "wait" which always throws IllegalStateException intimidates me. Can you replace cachePopulated with a CountDownLatch(1), call cachePopulated.countDown() instead of cachePopulated.set(true), and call Uninterruptibles.awaitUninterruptibly(cachePopulated) here before accessing publishedSegments.iterator(), exception-free?

replaced AtomicBoolean with CountDownLatch here, no exception on wait.
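A small sketch of the suggested latch-based hand-off, using Guava's Uninterruptibles and hypothetical method names for the surrounding class:

```java
import com.google.common.util.concurrent.Uninterruptibles;
import java.util.concurrent.CountDownLatch;

class CachePopulationLatchSketch
{
  // Starts at 1 and drops to 0 exactly once, after the first successful cache population.
  private final CountDownLatch cachePopulated = new CountDownLatch(1);

  // Poll thread: called after the first snapshot has been stored.
  void markCachePopulated()
  {
    cachePopulated.countDown();
  }

  // Reader threads: block until the first poll has completed instead of failing fast
  // with an IllegalStateException.
  void awaitCachePopulated()
  {
    Uninterruptibles.awaitUninterruptibly(cachePopulated);
  }
}
```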
/**
 * Use {@link ImmutableSortedSet} so that the order of segments is deterministic and
 * sys.segments queries return the segments in sorted order based on segmentId
 */
@Nullable

Please annotate @MonotonicNonNull instead of @Nullable

ok, good to know about this annotation.
// so building this collection shouldn't generate a lot of garbage.
final Set<DataSegment> overshadowedSegments = new HashSet<>();
for (ImmutableDruidDataSource dataSource : druidDataSources) {
  overshadowedSegments.addAll(ImmutableDruidDataSource.determineOvershadowedSegments(dataSource.getSegments()));

What if there are 20 brokers querying this endpoint on the Coordinator? They all recompute the overshadowed status (which is expensive and memory-intensive, because it requires building a VersionedIntervalTimeline) again and again.

I suggest the following (a sketch of the lazy, once-per-snapshot computation from the third item follows after this list):

- isOvershadowed becomes a non-final field of the DataSegment object itself, not participating in equals() and hashCode().
- Add interface SegmentsAccess { ImmutableDruidDataSource prepare(String dataSource); Iterable<DataSegment> iterateAll(); } (strawman naming).
- Add a DataSourceAccess computeOvershadowed() method to SQLSegmentMetadataManager, which performs this computation for every snapshot of SQLSegmentMetadataManager.dataSources (which is updated in poll()) at most once, lazily.
- Both endpoints in MetadataResource and the Coordinator balancing logic (which currently computes the isOvershadowed status on its own, too) use this API.
- On the side of MetadataSegmentView, maintain something like a Map<DataSegment, DataSegment> and update the overshadowed status like map.get(segmentFromCoordinator).setOvershadowed(segmentFromCoordinator.isOvershadowed()).

Result: we don't do any repetitive computations of overshadowed segments for every SQLSegmentMetadataManager.dataSegments snapshot whatsoever.
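A tiny sketch of the "at most once per snapshot, lazily" idea, using hypothetical names (the real SQLSegmentMetadataManager fields are not shown) and Guava's Suppliers.memoize as one way to get compute-once behavior; the determineOvershadowedSegments helper is a placeholder for the timeline-based computation discussed earlier:

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import java.util.Collections;
import java.util.Set;
import org.apache.druid.timeline.DataSegment;

class LazyOvershadowedViewSketch
{
  // Replaced on every poll(); every reader of the same snapshot shares one lazy computation.
  private volatile Supplier<Set<DataSegment>> overshadowedSupplier =
      Suppliers.ofInstance(Collections.<DataSegment>emptySet());

  // Called at the end of poll(), after the datasource snapshot has been swapped in.
  void resetOvershadowedSupplier(Iterable<DataSegment> snapshot)
  {
    // Suppliers.memoize runs the expensive timeline computation at most once, lazily,
    // however many brokers hit the endpoint before the next poll.
    overshadowedSupplier = Suppliers.memoize(() -> determineOvershadowedSegments(snapshot));
  }

  // Used by the MetadataResource endpoints (and potentially the Coordinator balancing logic).
  Set<DataSegment> getOvershadowedSegments()
  {
    return overshadowedSupplier.get();
  }

  private static Set<DataSegment> determineOvershadowedSegments(Iterable<DataSegment> segments)
  {
    // Placeholder for the timeline-based computation sketched earlier in this conversation.
    return Collections.emptySet();
  }
}
```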
Thanks for your suggestion, but this code change is not contained in this method and would affect other places in the code, some of which are not part of the original PR (e.g. the coordinator balancing logic). I would prefer to do this change separately, as it suggests changing the DataSegment object and adding new interfaces, so there may be more follow-up discussions. Created #7571 to address this comment.
> this code change is not contained in this method and would affect other places in the code, some of which are not part of original PR( eg coordinator balancing logic

I don't see how the files touched in the original PR are special. If you had implemented the above suggestion from the beginning, those files would be part of the "original" PR. Touching relatively unrelated files is normal when you do refactoring; in fact, that's one of the objectives of refactoring - to gather functionality that happens to be scattered across unrelated places into a single place.

> I would prefer to do this change separately as it suggests changing DataSegment object and adding new interfaces, so there may be more follow-up discussions.

I won't block this PR from merging if other reviewers of this PR (@gianm @jihoonson @jon-wei) agree with that design on a high level (or propose another solution that solves the same problem) and it's implemented just after this PR, because the current design doesn't seem reasonable to me at this point. (So there won't be much difference from just doing the implementation right in this PR, but if you wish you can separate it into two PRs.)
@leventov I am working on #7571. I agree the current API is not the most efficient and I acknowledge your concern. While I am not yet sure what the most appropriate way to avoid recalculating overshadowed segments is, I looked at the suggested changes and I have some questions, which I have asked in #7571. Could we agree to discuss the design there? I think it'll make it easier for you, me, and others to review those changes, as this PR is getting crowded and we may miss some parts of the new changes as they get mixed up with existing changes.

I haven't really formed an opinion on DataSegment mutability presently, but I think @leventov's suggestion for lazily computing the overshadowed view at most once per SQLSegmentMetadataManager poll() and sharing that view with the metadata retrieval APIs and the coordinator balancing logic makes a lot of sense.

> Because the current design doesn't seem reasonable to me at this point. (So there won't be much difference from as if you just do the implementation right in this PR, but if you wish you can separate in two PRs.)

I agree with making the adjustment to the overshadowed view computation an immediate follow-on; I think a separate PR is a bit better:

- The coordinator balancing logic is a pretty "core" part of the system, and I feel like it would be better to change that in a separate PR that calls attention more explicitly to that change and isolates it more
- This PR is getting a bit long, a little tedious to navigate
Thanks
Addresses #7233

This PR adds a column `is_overshadowed` to the `sys.segments` table:

- Add a new class `SegmentWithOvershadowedStatus` to capture overshadowed info for segments in the coordinator.
- Add an optional new query param `/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus` to the coordinator API.
- Modify tests and docs.