fix DruidSchema incorrectly listing tables with no segments #10660
Conversation
// Add missing segments back to the refresh list.
segmentsNeedingRefresh.addAll(Sets.difference(segmentsToRefresh, refreshed));

// remove any tables which have been completely dropped before refresh
This looks like it's undoing a mistake in removeSegment — the datasource should never have been added to dataSourcesNeedingRebuild in the first place. IMO it'd be better to make the change there. Doing it here works, though, so I'm OK with it if you don't want to move the logic around.
Ah, the removal of segments from the broker could also mean that a datasource is no longer global/joinable, and so could also require a rebuild, so I'm not sure we can remove it without rethinking/reworking some other stuff. Since there are multiple places where we add to dataSourcesNeedingRebuild, but only one place where we add to tables, it seemed there was less room for error to do the filtering here, in the only place we populate it.
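The "filter in the one place we populate it" idea can be sketched roughly as follows. This is a minimal standalone sketch, not Druid's actual DruidSchema code: `RebuildFilterSketch`, `segmentsByDataSource`, and `dataSourcesToRebuild` are hypothetical names standing in for the real fields and methods.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: rather than policing every producer that adds to
// dataSourcesNeedingRebuild, filter out fully-dropped datasources at the single
// point where the rebuild set is consumed.
class RebuildFilterSketch
{
  final Map<String, Set<String>> segmentsByDataSource = new ConcurrentHashMap<>();
  final Set<String> dataSourcesNeedingRebuild = new TreeSet<>();

  Set<String> dataSourcesToRebuild()
  {
    // remove any tables which have been completely dropped before refresh
    dataSourcesNeedingRebuild.removeIf(ds -> !segmentsByDataSource.containsKey(ds));
    return dataSourcesNeedingRebuild;
  }
}
```

Filtering at the single consumption point means a datasource that loses its last segment between being flagged and being refreshed simply never gets rebuilt, regardless of which code path flagged it.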
removeSegment is only called when the last replica of a segment is gone, though.
Oh, good point: the broker case happens in removeServerSegment, not removeSegment. OK, I'll remove the dataSourcesNeedingRebuild.add from removeSegment 👍
I think it should be there, but only called if there's some other segments still remaining.
Looks like one of the integration tests failed after the latest modifications, so more than just moving dataSourcesNeedingRebuild.add to the else branch is needed. I suspect the culprit is removeServerSegment, which should probably also check whether any segments remain before adding to the list, in case it is triggered by the broker (broker segments are not currently tracked in segmentMetadataInfo, since they will always currently be duplicated on a historical).
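The "only flag a rebuild when segments remain" shape being discussed here could look something like the sketch below. This is a simplified stand-in, not the actual DruidSchema code: `RemovalSketch` and its fields are hypothetical names, and the real class tracks richer metadata than plain strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the removal path: when the last segment of a datasource
// goes away the table is dropped outright; only when segments remain is the
// datasource flagged for a schema rebuild.
class RemovalSketch
{
  final Map<String, Set<String>> segmentsByDataSource = new ConcurrentHashMap<>();
  final Set<String> tables = new TreeSet<>();
  final Set<String> dataSourcesNeedingRebuild = new TreeSet<>();

  void removeSegment(String dataSource, String segmentId)
  {
    Set<String> known = segmentsByDataSource.get(dataSource);
    if (known == null) {
      return;
    }
    known.remove(segmentId);
    if (known.isEmpty()) {
      // last replica of the last segment: drop the table, don't schedule a rebuild
      segmentsByDataSource.remove(dataSource);
      tables.remove(dataSource);
      dataSourcesNeedingRebuild.remove(dataSource);
    } else {
      // other segments remain, so the datasource schema may have changed
      dataSourcesNeedingRebuild.add(dataSource);
    }
  }
}
```

The key design choice is that the empty branch removes the datasource from every structure at once, so no later refresh pass can observe a flagged-but-empty table.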
    .build();
knownSegments.put(segment.getId(), metadataWithNumReplicas);
final Map<SegmentId, AvailableSegmentMetadata> knownSegments = segmentMetadataInfo.get(segment.getDataSource());
if (knownSegments != null && !knownSegments.isEmpty()) {
How could knownSegments be null or empty here? If removeServerSegment is being called, doesn't that suggest we should have previously had an addSegment call, and therefore the segment should be known?
Hmm, I couldn't convince myself that it would definitively be there, so I added this check as a precaution. Note that segments on brokers are not added to segmentMetadataInfo (or currently stored in the broker server view timeline), so my fear was a case where the segment is dropped from all historicals before the broker.
That said, I don't really like this if statement either; let me see if I can determine for sure whether this can happen.
Oh, I see. I was assuming that segments get added to segmentMetadataInfo no matter where they're served from.
Do you know why that's not the case? I mean, why not add segments served by Brokers to segmentMetadataInfo?
At any rate, if it's not the case, then I think knownSegments could be null here if it's dropped from historicals before the broker, so it makes sense to check for it. There should be a comment about why this can happen.
Do you know why that's not the case? I mean, why not add segments served by Brokers to segmentMetadataInfo?
Hmm, trying to remember exactly. I know BrokerServerView doesn't track broker-served segments on the timeline, because doing so would require changes to the query server selection logic to either filter out brokers, or make it smart enough to query locally through the broker's segment manager whenever it picked itself (so that it doesn't create an infinite query loop by submitting another query to itself, and so on). Side note: fixing this would be a cool improvement, because it would allow brokers to query other brokers, which with some other work could potentially allow for cool broker trees where not every broker has to track the segments of every historical but could still query every segment.
I think DruidSchema probably could track them, though, since it doesn't use the BrokerServerView timelines directly, just the events fired from the timeline (which still happen for segments served by brokers, to allow things like DruidSchema to act on them). I suspect I skipped broker segment tracking simply because it wasn't really necessary to compute the schema itself, only to determine whether the datasource is globally available, but it doesn't seem like it would be harmful to add them. I'll look into making the change, because that would be less magical behavior.
At any rate, if it's not the case, then I think knownSegments could be null here if it's dropped from historicals before the broker, so it makes sense to check for it. There should be a comment about why this can happen.
I guess the null check really only belongs inside the if (server.getType().equals(ServerType.BROKER)) clause, since that is the only case where it can be null. If changing to just track broker segments is problematic for some reason, I'll make this change and add a comment.
Oh, segmentRemoved does not get triggered for broker segments by BrokerServerView, since it ignores broker segments (segmentRemoved is fired only when a segment is completely removed from the timeline, which doesn't contain broker segments).
I'll just keep it this way for now, I think, and add some comments about why the null check is there and how DruidSchema could someday consider tracking broker segments once BrokerServerView is modified to track broker-served segments.
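The resulting shape of the guard might look something like the sketch below. This is a hypothetical simplification: the `ServerType` enum, `segmentMetadataInfo` value types, and the method signature are stand-ins for the real Druid classes, with replica counts reduced to plain integers.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of removeServerSegment with the broker-only null check.
class ServerSegmentSketch
{
  enum ServerType { BROKER, HISTORICAL }

  final Map<String, Map<String, Integer>> segmentMetadataInfo = new ConcurrentHashMap<>();
  final Set<String> dataSourcesNeedingRebuild = new TreeSet<>();

  void removeServerSegment(ServerType serverType, String dataSource, String segmentId)
  {
    Map<String, Integer> knownSegments = segmentMetadataInfo.get(dataSource);
    if (serverType == ServerType.BROKER) {
      // Broker-served segments are not tracked in segmentMetadataInfo, so if the
      // segment was dropped from all historicals before the broker, nothing may be
      // known here; this is the only case where knownSegments can be null. Someday
      // DruidSchema could consider tracking broker segments, once BrokerServerView
      // is modified to track broker-served segments on the timeline.
      if (knownSegments == null || knownSegments.isEmpty()) {
        return;
      }
      // the datasource may no longer be global/joinable, so flag a rebuild
      dataSourcesNeedingRebuild.add(dataSource);
    } else if (knownSegments != null) {
      // on historicals, just decrement the replica count for this segment
      knownSegments.computeIfPresent(segmentId, (id, replicas) -> replicas - 1);
    }
  }
}
```

Scoping the null check to the BROKER branch documents, in code, that broker events are the only path where the metadata lookup can legitimately miss.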
This pull request introduces 1 alert when merging 32ecbf7 into 753fa6b (view on LGTM.com).
…0660)

* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild
* rework to see if it passes analysis
* more better
* maybe this
* re-arrange and comments
Description
This PR fixes an issue where tables with all of their segments dropped could still incorrectly show up in the INFORMATION_SCHEMA tables when queried with SQL. The underlying issue was a race condition between DruidSchema.tables being populated when rebuilding the tables, segment removal handlers removing datasources from tables when no other segments were left (but not from dataSourcesNeedingRebuild), and DruidSchema.dataSourcesNeedingRebuild not considering whether the table had been removed from tables due to all of its segments being dropped in the time between being added to the refresh list and actually doing the refresh.

This PR adds an additional step to the ITBroadcastJoinQueryTest integration test that would fail prior to the changes in DruidSchema; this test was chosen in particular to cover the most surface area, since there are segments on both the historical and the broker.

This PR has:
Key changed/added classes in this PR
DruidSchema