
fix DruidSchema incorrectly listing tables with no segments#10660

Merged
clintropolis merged 5 commits into apache:master from clintropolis:fix-information-schema-table-removal
Dec 11, 2020
Conversation

@clintropolis
Member

Description

This PR fixes an issue where tables with all of their segments dropped could still incorrectly show up in the INFORMATION_SCHEMA tables when queried with SQL. The underlying issue was a race condition between three things: DruidSchema.tables being populated when the tables are rebuilt; the segment removal handlers removing a datasource from tables (but not from dataSourcesNeedingRebuild) when its last segment was dropped; and DruidSchema.dataSourcesNeedingRebuild not considering whether a datasource had lost all of its segments, and so been removed from tables, in the time between being added to the refresh list and the refresh actually running.
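As a runnable sketch of the race (simplified, hypothetical names standing in for the real DruidSchema state; not the actual code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the pre-fix race: a datasource is queued for
// rebuild, every segment is then dropped (removing it from 'tables'),
// yet the refresh still rebuilds it from the stale rebuild list and
// resurrects the empty table.
public class RebuildRaceSketch
{
  static final Map<String, Set<String>> segmentsByDataSource = new HashMap<>();
  static final Set<String> tables = new HashSet<>();
  static final Set<String> dataSourcesNeedingRebuild = new HashSet<>();

  public static void main(String[] args)
  {
    // 1) datasource "wiki" has one segment and a table entry
    segmentsByDataSource.computeIfAbsent("wiki", k -> new HashSet<>()).add("seg1");
    tables.add("wiki");

    // 2) a segment change queues it for rebuild
    dataSourcesNeedingRebuild.add("wiki");

    // 3) the last segment is dropped; the removal handler drops the table,
    //    but (pre-fix) leaves the rebuild list untouched
    segmentsByDataSource.get("wiki").remove("seg1");
    if (segmentsByDataSource.get("wiki").isEmpty()) {
      tables.remove("wiki");
    }

    // 4) the refresh runs: without checking whether any segments remain,
    //    it re-lists a table that has no segments
    for (String ds : dataSourcesNeedingRebuild) {
      tables.add(ds);
    }
    System.out.println(tables.contains("wiki")); // prints "true" (the bug)
  }
}
```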

This PR also adds a step to the ITBroadcastJoinQueryTest integration test that fails without the DruidSchema changes. I chose this test in particular to cover the most surface area, since it has segments on both the historical and the broker.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • DruidSchema

// Add missing segments back to the refresh list.
segmentsNeedingRefresh.addAll(Sets.difference(segmentsToRefresh, refreshed));

// remove any tables which have been completely dropped before refresh
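The comment above marks where the fix lands in the refresh; a hedged, runnable sketch of the idea (simplified, hypothetical names rather than the actual DruidSchema code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real DruidSchema: before rebuilding,
// unlist any datasource whose segments were all dropped while it sat
// on the rebuild list, so a fully-dropped table is not resurrected.
public class RefreshFilterSketch
{
  static final Map<String, Set<String>> segmentsByDataSource = new HashMap<>();
  static final Set<String> tables = new HashSet<>();
  static final Set<String> dataSourcesNeedingRebuild = new HashSet<>();

  static void rebuildTables()
  {
    for (String dataSource : dataSourcesNeedingRebuild) {
      Set<String> known = segmentsByDataSource.get(dataSource);
      if (known == null || known.isEmpty()) {
        // fully dropped between being queued and the refresh: do not resurrect
        tables.remove(dataSource);
      } else {
        // in the real code, this would rebuild the table's schema
        tables.add(dataSource);
      }
    }
    dataSourcesNeedingRebuild.clear();
  }

  public static void main(String[] args)
  {
    segmentsByDataSource.put("live", new HashSet<>(Set.of("seg1")));
    segmentsByDataSource.put("dropped", new HashSet<>());
    tables.add("dropped"); // stale entry left over from before the drop
    dataSourcesNeedingRebuild.add("live");
    dataSourcesNeedingRebuild.add("dropped");

    rebuildTables();
    System.out.println(tables); // only "live" remains listed
  }
}
```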
Contributor

This looks like it's undoing a mistake in removeSegment — the datasource should never have been added to dataSourcesNeedingRebuild in the first place. IMO it'd be better to make the change there. Doing it here works, though, so I'm OK with it if you don't want to move the logic around.

Member Author

Ah, the removal of segments from the broker could also mean that a datasource is no longer global/joinable, and so could also require a rebuild, so I'm not sure we can remove it without rethinking/reworking some other stuff. Since there were multiple places where we add to dataSourcesNeedingRebuild, but only one place where we add to tables, there seemed to be less room for error doing the filtering here, in the only place we populate it.

Contributor

removeSegment is only called when the last replica of a segment is gone, though.

Member Author

Oh, good point: the broker case happens in removeServerSegment, not removeSegment. OK, I'll remove the dataSourcesNeedingRebuild.add from removeSegment 👍

Contributor

I think it should be there, but only called if there are some other segments still remaining.

Member Author

Looks like one of the integration tests failed after the latest modifications, so more than just moving dataSourcesNeedingRebuild.add into the else branch is needed. I suspect the culprit is removeServerSegment, which should probably also check whether any segments remain before adding to the list, in case it is triggered by the broker (broker segments are not currently tracked in segmentMetadataInfo, since they will always currently be duplicated on a historical).
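Putting the suggestion above into a runnable sketch (hypothetical, simplified names; not the actual DruidSchema code): a rebuild is queued only while segments remain, and the table is dropped outright when the last segment goes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the reworked removal handler: queue a rebuild
// only in the else branch, i.e. only when segments remain.
public class RemoveSegmentSketch
{
  static final Map<String, Set<String>> segmentsByDataSource = new HashMap<>();
  static final Set<String> tables = new HashSet<>();
  static final Set<String> dataSourcesNeedingRebuild = new HashSet<>();

  static void removeSegment(String dataSource, String segmentId)
  {
    Set<String> known = segmentsByDataSource.get(dataSource);
    if (known == null) {
      return;
    }
    known.remove(segmentId);
    if (known.isEmpty()) {
      // last segment gone: drop everything, no rebuild needed
      segmentsByDataSource.remove(dataSource);
      tables.remove(dataSource);
      dataSourcesNeedingRebuild.remove(dataSource);
    } else {
      // segments remain, so the schema may have changed: queue a rebuild
      dataSourcesNeedingRebuild.add(dataSource);
    }
  }

  public static void main(String[] args)
  {
    segmentsByDataSource.put("wiki", new HashSet<>(Set.of("seg1", "seg2")));
    tables.add("wiki");

    removeSegment("wiki", "seg1");
    System.out.println(dataSourcesNeedingRebuild); // [wiki]: rebuild queued

    removeSegment("wiki", "seg2");
    System.out.println(tables); // []: table dropped with its last segment
  }
}
```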

Contributor

@gianm gianm left a comment

LGTM

.build();
knownSegments.put(segment.getId(), metadataWithNumReplicas);

final Map<SegmentId, AvailableSegmentMetadata> knownSegments = segmentMetadataInfo.get(segment.getDataSource());
if (knownSegments != null && !knownSegments.isEmpty()) {
Contributor

How could knownSegments be null or empty here? If removeServerSegment is being called, doesn't that suggest we should have previously had an addSegment call, and therefore the segment should be known?

Member Author

Hmm, I couldn't convince myself that it would definitively be there, so I added this check as a precaution. Note that segments on brokers are not added to segmentMetadataInfo (or currently stored in the broker server view timeline), so my fear was the case where the segment is dropped from all historicals before the broker.

That said, I don't really like this if statement either; let me see if I can determine for sure whether this can happen.

Contributor

Oh, I see. I was assuming that segments get added to segmentMetadataInfo no matter where they're served from.

Do you know why that's not the case? I mean, why not add segments served by Brokers to segmentMetadataInfo?

At any rate, if it's not the case, then I think knownSegments could be null here if the segment is dropped from historicals before the broker, so it makes sense to check for it. There should be a comment about why this can happen.

Member Author

Do you know why that's not the case? I mean, why not add segments served by Brokers to segmentMetadataInfo?

Hmm, trying to remember exactly. I know BrokerServerView doesn't track broker-served segments on the timeline because it would require changes to the query server selection logic to either filter out brokers, or make it smart enough to query locally through the broker's segment manager directly whenever it picks itself (so that it doesn't create an infinite query loop by submitting another query to itself, and so on). Side note: fixing this would be a cool improvement, because it would allow brokers to query other brokers, which with some other work could potentially allow for broker trees where not every broker has to track every historical's segments but could still query every segment.

I think that DruidSchema could probably track them though, since it doesn't use the broker server view's timelines, just the events fired from the timeline (which still fire for segments served by brokers, to allow things like DruidSchema to act on them). I suspect that I skipped broker segment tracking simply because it wasn't really necessary to compute the schema itself, only to determine whether the datasource is globally available, but it doesn't seem like it would be harmful to add them. I'll look into making the change, because that would be less magical behavior.

At any rate, if it's not the case, then I think knownSegments could be null here if it's dropped from historicals before the broker, so it makes sense to check for it. There should be a comment about why this can happen.

I guess the null check really only belongs inside the if (server.getType().equals(ServerType.BROKER)) clause, since that is the only case where it can be null. If changing to just track broker segments is problematic for some reason, I'll make this change and add a comment.

Member Author

Oh, segmentRemoved does not get triggered for broker segments by BrokerServerView, since it ignores them (segmentRemoved fires only when a segment is completely removed from the timeline, which doesn't contain broker segments).

I'll just keep it this way for now, and add some comments about why the null check exists and how DruidSchema could someday track broker segments once BrokerServerView is modified to track broker-served segments.
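The conclusion above can be sketched as runnable code (hypothetical, simplified names and types; not the actual removeServerSegment):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why the null/empty check stays: broker-served
// segments are not tracked in segmentMetadataInfo, so a broker callback
// can arrive after every historical has already dropped the segment and
// the datasource's entry is gone.
public class RemoveServerSegmentSketch
{
  enum ServerType { HISTORICAL, BROKER }

  static final Map<String, Map<String, Integer>> segmentMetadataInfo = new HashMap<>();
  static String lastOutcome = "";

  static void removeServerSegment(ServerType server, String dataSource, String segmentId)
  {
    Map<String, Integer> knownSegments = segmentMetadataInfo.get(dataSource);
    // Null/empty is possible when the caller is a broker whose segments
    // were never added here; ignore the stale callback instead of NPE-ing.
    if (knownSegments == null || knownSegments.isEmpty()) {
      lastOutcome = "ignored stale callback from " + server;
      return;
    }
    knownSegments.remove(segmentId);
    lastOutcome = "updated replica count for " + segmentId;
  }

  public static void main(String[] args)
  {
    // historicals drop first, wiping the datasource entry...
    segmentMetadataInfo.remove("wiki");
    // ...then the broker's callback fires for the same segment
    removeServerSegment(ServerType.BROKER, "wiki", "seg1");
    System.out.println(lastOutcome);
  }
}
```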

@lgtm-com

lgtm-com bot commented Dec 10, 2020

This pull request introduces 1 alert when merging 32ecbf7 into 753fa6b - view on LGTM.com

new alerts:

  • 1 for Dereferenced variable may be null

@clintropolis clintropolis merged commit 64f97e7 into apache:master Dec 11, 2020
@clintropolis clintropolis deleted the fix-information-schema-table-removal branch December 11, 2020 22:14
harinirajendran pushed a commit to harinirajendran/druid that referenced this pull request Dec 15, 2020
…0660)

* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild

* rework to see if it passes analysis

* more better

* maybe this

* re-arrange and comments
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021

4 participants