Update DimTableSegmentAssignment to include both OFFLINE and REALTIME servers #6530
Conversation
LGTM.
(Not related to this PR) For easier management of the cluster, I would recommend having a separate tenant for the dimension table so that the dimension table can be shared by multiple tenants. You can configure a server with multiple tags, and the server can then belong to multiple tenants.
Since this is the first time we have queries across tables, I think it's a good time to discuss the policy. There are two options: join tables within the tenant, or join tables across tenants. Personally I prefer a default constraint that the tables to join are within the same tenant, for better isolation. But given the broadcast-join nature, the dimension table is in fact copied to all tenants. Nevertheless, I feel it's good to have this high-level consideration.
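For context, here is a hedged sketch of what "a server with multiple tags" could look like at the Helix level. The ZooKeeper address, cluster, instance, and tag names are illustrative placeholders, not taken from this PR:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class MultiTenantTagSketch {
  public static void main(String[] args) {
    // Illustrative addresses/names only.
    HelixAdmin helixAdmin = new ZKHelixAdmin("localhost:2181");
    String cluster = "PinotCluster";
    String instance = "Server_pinot-server-0_8098";
    // A Helix instance can carry multiple tags; Pinot derives tenant membership from them,
    // so the same server can serve both the fact table's tenant and a shared dim-table tenant.
    helixAdmin.addInstanceTag(cluster, instance, "FactTenant_OFFLINE");
    helixAdmin.addInstanceTag(cluster, instance, "DimTenant_OFFLINE");
  }
}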
LGTM
}

@Override
public Map<String, Map<String, String>> rebalanceTable(Map<String, Map<String, String>> currentAssignment,
    Map<InstancePartitionsType, InstancePartitions> instancePartitionsMap, @Nullable List<Tier> sortedTiers,
    @Nullable Map<String, InstancePartitions> tierInstancePartitionsMap, Configuration config) {
  String serverTag = TagNameUtils.extractOfflineServerTag(_tenantConfig);
  List<String> instances = HelixHelper.getInstancesWithTag(_helixManager, serverTag);
empty line
Removed.
Currently there is no broadcast happening, right? I thought the design was to have the full dimension table colocated with the fact table servers, so that each server does a local join between its part of the fact table and the full dimension table. For the eventual implementation of fact-dimension joins using broadcast, I don't think we should have cross-tenant joins, since the data movement will take longer. We can probably allow it, but the recommendation should be joins within the same tenant. Joins across tenants essentially become a cross-cluster join, which is likely to be expensive (though it depends on the topology, I guess).
Thanks for the reviews, @Jackie-Jiang, @yupeng9.
@yupeng9, a small correction on the last part: currently dimension tables are not copied to all tenants, but to all servers in a single tenant. I also agree with your opinion here to default to restricting joins to in-tenant only. Otherwise, besides isolation, unnecessary memory usage would be another issue, since dim tables are loaded directly into heap. I think @Jackie-Jiang's tag-based cross-tenant solution could be useful for cases where folks have set up different tenants for the REALTIME and OFFLINE servers of the same table and would like to share a dimension table.
You're right, there is no data movement at query time. Dimension tables are distributed only at segment placement time, according to the segment assignment policy as updated in this PR.
Yes, that's right: all servers of the same tenant.
I think we are on the same page: the dimension table is distributed (i.e. broadcast) to all servers, though not at query time.
@Jackie-Jiang, the current design is based on pre-placement of the dimension table on the fact table servers, so there is no broadcast at query time. With this setup, how can we have the dimension table in a separate tenant? On a related note, we can possibly have the dimension table assigned to multiple tags/tenants. So for joining a hybrid fact table with an offline dimension table, the dim table would be present on both the offline-tenant and realtime-tenant servers of the fact table. There would then be one tenant-local join on the offline side and one on the realtime side. I believe this is what you mean by saying the dimension table can be shared by multiple tenants?
@siddharthteotia if we have 5 tables on different tenants, all of which need to join with the same dimension table, and we always colocate them using the same tenant, we need to store 5 copies of the dimension table in the deep store. If we use a separate tenant for the dimension table, we only need to store one copy. Each server can be configured with multiple tenants, so the fact table and the dimension table are still colocated.
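One way such a layout could be expressed is sketched below. This is not part of the PR: the tenant and table names are made up, and builder method names such as setIsDimTable are assumptions based on Pinot's table-config builder API and may differ by version:

import org.apache.pinot.spi.config.table.TableConfig;
import org.apache.pinot.spi.config.table.TableType;
import org.apache.pinot.spi.utils.builder.TableConfigBuilder;

public class SharedDimTenantSketch {
  // The dimension table declares a shared server tenant, while each fact table keeps its own;
  // servers tagged into both tenants keep the tables colocated without extra deep-store copies.
  public static TableConfig dimTableConfig() {
    return new TableConfigBuilder(TableType.OFFLINE)
        .setTableName("dimCustomers")
        .setIsDimTable(true)              // assumed flag marking a dimension table
        .setServerTenant("DimTenant")     // shared tenant; fact-table servers also carry this tag
        .setBrokerTenant("DefaultTenant")
        .build();
  }
}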
Agreed that it's outside the scope of this PR. But for better cluster management, I suggest we make 5 copies (though they can be soft copies) for better isolation. In particular, I suggest we consider using
Description
Fixes a bug in the DimTableSegmentAssignment logic where REALTIME servers were skipped when distributing segments.
We want dimension tables to be distributed to REALTIME servers as well as OFFLINE servers to enable "Realtime Fact Table to Dimension Table Joins".
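A minimal sketch of the intended behavior, assuming the fields and helpers visible in the review snippet above (TagNameUtils, HelixHelper); the realtime-tag extractor and the import paths are assumptions based on the Pinot codebase of that era, and this is not the exact diff:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import org.apache.helix.HelixManager;
import org.apache.pinot.common.utils.config.TagNameUtils;
import org.apache.pinot.common.utils.helix.HelixHelper;
import org.apache.pinot.spi.config.table.TenantConfig;

public class DimTableInstanceSelectionSketch {
  // Collect every server of the tenant, OFFLINE and REALTIME alike, so dimension-table
  // segments are placed on all of them and a local join is always possible.
  public static List<String> allTenantServers(HelixManager helixManager, TenantConfig tenantConfig) {
    Set<String> instances = new TreeSet<>();
    instances.addAll(HelixHelper.getInstancesWithTag(helixManager, TagNameUtils.extractOfflineServerTag(tenantConfig)));
    instances.addAll(HelixHelper.getInstancesWithTag(helixManager, TagNameUtils.extractRealtimeServerTag(tenantConfig)));
    return new ArrayList<>(instances);
  }
}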
Testing
Unit tests are updated accordingly.