Improve concurrency between DruidSchema and BrokerServerView #11457
Conversation
This pull request fixes 1 alert when merging 1efbe13 into 8729b40 - view on LGTM.com

This pull request fixes 1 alert when merging 700f31a into 8729b40 - view on LGTM.com
if (segmentsMap.remove(segment.getId()) == null) {
  log.warn("Unknown segment[%s] was removed from the cluster. Ignoring this event.", segment.getId());
}
totalSegments--;
is it better to change the count only when it's a known segment?
Good catch. I will fix it.
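The agreed fix, decrementing the counter only for known segments, can be sketched roughly as follows. This is a minimal model with made-up names (`SegmentCounter`, plain `String` values); it is not the actual `DruidSchema` code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: only change totalSegments when the segment was
// actually known, instead of decrementing unconditionally.
class SegmentCounter
{
  private final Map<String, String> segmentsMap = new ConcurrentHashMap<>();
  private int totalSegments = 0;

  void addSegment(String id, String metadata)
  {
    if (segmentsMap.put(id, metadata) == null) {
      totalSegments++;
    }
  }

  void removeSegment(String id)
  {
    if (segmentsMap.remove(id) == null) {
      // Unknown segment: log and ignore; do NOT touch the counter.
      System.out.println("Unknown segment[" + id + "] was removed from the cluster. Ignoring this event.");
    } else {
      totalSegments--;
    }
  }

  int getTotalSegments()
  {
    return totalSegments;
  }
}
```

With this shape, a spurious removal event for an unknown segment no longer drives the counter negative.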
This pull request fixes 1 alert when merging 5facb33 into 280c080 - view on LGTM.com
👍
final Optional<DruidServerMetadata> historicalServer = servers
    .stream()
    .filter(metadata -> metadata.getType().equals(ServerType.HISTORICAL))
    .filter(metadata -> metadata.getType().equals(ServerType.HISTORICAL)
            || metadata.getType().equals(ServerType.BROKER))
This is correct, but currently broker segments are not tracked in segment metadata, on the assumption that any segment a broker has will also be on some historical; if it isn't on any historical, then either it will be soon, or it will be dropped from the broker soon. It doesn't look like that behavior has changed in this PR.
Ignoring them isn't great, but it does save the potentially complicated logic of issuing segment metadata queries only to other brokers and not to ourselves (we would probably want to get that information in a different way locally?).
Anyway, it might be worth adding a comment about this, even though it is mentioned in a few other places, like addSegment.
Yeah, I knew the current behavior but thought this could be better. On second thought, though, it's probably better to keep the logic consistent, so I reverted this change and added a comment.
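For illustration, the reverted behavior (pick only historicals for segment metadata queries and deliberately ignore broker-served segments) might be sketched like this, with a simplified `ServerType` enum standing in for the real `DruidServerMetadata`:

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-in for Druid's ServerType; only the values used here.
enum ServerType { HISTORICAL, BROKER, REALTIME }

class ServerPicker
{
  // Hypothetical sketch: choose a server to issue segment metadata queries to.
  // Brokers are intentionally filtered out; any segment a broker serves is
  // assumed to also appear on (or soon be dropped from) a historical.
  static Optional<ServerType> pickMetadataServer(List<ServerType> servers)
  {
    return servers.stream()
                  .filter(type -> type == ServerType.HISTORICAL)
                  .findAny();
  }
}
```

A segment served only by a broker yields an empty `Optional`, i.e. no metadata query is issued for it, matching the pre-existing behavior the reviewer described.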
protected DruidTable buildDruidTable(final String dataSource)

/**
 * This is a helper method for unit tests to emulate heavy work done with {@link #lock}.
 * It must be used only in unit tests.
 */
nit: maybe nice to move all of these 'only for testing' methods to the end of this file or something to get them out of the way.
@GuardedBy("lock")
private boolean isServerViewInitialized = false;

private int totalSegments = 0;
At first I was wondering if this should be volatile, but I don't think it matters since it is only read when a sys.segments scan runs, not in a loop or anywhere else where it would matter. Maybe we should add a comment that it is OK for it to be neither guarded by the lock, volatile, nor a concurrent type?
Sure, added.
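A comment along the lines the reviewer asked for might look like this. This is a hypothetical sketch, not the exact code added in the PR:

```java
// Hypothetical sketch of the counter field with the suggested comment.
class SegmentCounts
{
  // A plain int is fine here: this counter is only read when a
  // sys.segments scan runs, so it does not need to be guarded by
  // the lock, declared volatile, or stored in a concurrent type.
  // Slightly stale values are acceptable for that use.
  private int totalSegments = 0;

  void increment()
  {
    totalSegments++;
  }

  int get()
  {
    return totalSegments;
  }
}
```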
This pull request fixes 1 alert when merging 5c3d8a3 into 257bc5c - view on LGTM.com
@clintropolis thank you for the review!
Description
A concurrency issue was recently found between `DruidSchema` and `BrokerServerView`: refreshing a `DruidTable` in `DruidSchema` can block query processing. This can happen in the scenario described below.

1. The `DruidSchema-Cache` thread locks `DruidSchema#lock` and calls `DruidSchema.buildDruidTable()` to rebuild the rowSignature of a druidTable.
2. A new segment is added to `BrokerServerView`. The `BrokerServerView` thread locks `BrokerServerView#lock`, processes the new segment, and calls the timelineCallbacks of `DruidSchema`.
3. The timelineCallback of `DruidSchema` is executed by the same thread used in step 2. This thread waits until `DruidSchema.buildDruidTable()` from step 1 is done and `DruidSchema#lock` is released.
4. A query calls `BrokerServerView.getTimeline()`. This call waits until the timelineCallbacks from step 2 are done and `BrokerServerView#lock` is released.

The following flame graphs show what those threads were doing when this happened in our cluster. The metrics for these flame graphs were collected for 30 seconds.
`DruidSchema` was calling `refreshSegmentsForDataSource()` and `buildDruidTable()`. `refreshSegmentsForDataSource` issues a `SegmentMetadataQuery` per segment and locks `DruidSchema#lock` per row in the result to update segment metadata in memory. `buildDruidTable` locks `DruidSchema#lock` while it iterates over all columns of all segments in the datasource. When the datasource has 481200 segments with 774 columns per segment, `buildDruidTable` took about 25 seconds (!!) on my desktop.

`BrokerServerView` was blocked in the `DruidSchema.addSegment` callback.

Finally, there were 2 timeseries queries that were blocked in `BrokerServerView.getTimeline()` for 30 seconds.

Currently, this can happen whenever
`DruidSchema` needs to refresh a `DruidTable`, which is whenever a new segment is added to the cluster or a segment is completely removed from the cluster. Moving segments should not cause this issue because a new segment is always loaded on the new server before it is removed from the previous server. As a result, moving segments does not require updating the `RowSignature` of a `DruidTable`.

To fix this issue, this PR improves the concurrency of `DruidSchema` by not holding `DruidSchema#lock` while processing expensive operations such as refreshing `DruidTable`s.

- A new executor, `DruidSchema-Callback`, is added in `DruidSchema` to asynchronously process the timeline callbacks.
- The `segmentMetadataInfo` map is changed to `ConcurrentHashMap<String, ConcurrentSkipListMap<SegmentId, AvailableSegmentMetadata>>`, so that updating the map no longer has to hold `DruidSchema#lock`. Instead, concurrency control is delegated to the `ConcurrentMap`s. This could also make querying the segments table faster because `getSegmentMetadataSnapshot()` no longer needs to acquire `DruidSchema#lock`.

Finally, this PR does not fix the expensive logic in `buildDruidTable()`. Incremental updates on `DruidTable` would be better, but they require that, for each column, we be able to fall back to the column type in the second most recent segment when the most recent segment disappears. How to track those column types of segments efficiently can be researched as a follow-up.

Key changed/added classes in this PR
- `DruidSchema`

This PR has:
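The `segmentMetadataInfo` change described above can be sketched as follows. This is a simplified model, with `String` standing in for both `SegmentId` and `AvailableSegmentMetadata`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of delegating concurrency control to the concurrent maps themselves,
// so neither writers nor snapshot readers need an outer lock.
class SegmentMetadataInfoSketch
{
  private final ConcurrentMap<String, ConcurrentSkipListMap<String, String>> segmentMetadataInfo =
      new ConcurrentHashMap<>();

  void addSegment(String dataSource, String segmentId, String metadata)
  {
    // computeIfAbsent and put are each atomic on the concurrent maps,
    // so no outer lock is required for the update.
    segmentMetadataInfo
        .computeIfAbsent(dataSource, ds -> new ConcurrentSkipListMap<>())
        .put(segmentId, metadata);
  }

  int snapshotSize()
  {
    // Snapshot-style reads can also proceed without any outer lock.
    return segmentMetadataInfo.values()
                              .stream()
                              .mapToInt(ConcurrentSkipListMap::size)
                              .sum();
  }
}
```

The inner `ConcurrentSkipListMap` keeps segments sorted by key, which matches the per-datasource ordering the real map maintains by `SegmentId`.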