HDDS-8874. Use OM Service ID instead of OM Node ID for container ownership #4922
Conversation
Hi @fapifta @ChenSammi, could you help take a look and let me know what you think? Thank you.
I think the omID needs to be the same for all om instances in a HA group. There should not be a need to close all containers on SCM just because there is an OM leader switch. I am also not sure about the need to have different containers for different OM HA groups sharing a single SCM. OM just deals with blocks, SCM groups blocks into containers. I am not sure if we need to be sure that the blocks from OM group 1 do not mix with blocks from OM group 2 in the same containers. We don't have storage policies in Ozone at the moment, and even when we do, there is going to be a mix of "hot" and "cold" data in the same containers. Having blocks from two different OM HA groups in the same container doesn't sound much different to that.
Hi @ivandika3, thanks for working on this. For example, in a 3-node OM HA setup, does the omId change when the leader changes?
@sodonnel Thank you for your input.
I agree. Maybe instead of coming up with a new unique id I can find a way to synchronize the omID instead and use it as the OM HA service identifier. Please let me know what you think.
I also have doubts regarding the necessity of this. Since I'm not really well-versed in the SCM side yet, I decided to take the conservative approach and try to maximize resource isolation (i.e. distinct containers for OM HA groups). I did some manual tests in my cluster, and there was no apparent issue. Anyway, I think a single unique ID for a single OM HA group is still a worthwhile endeavor.
@guohao-rosicky Thank you for addressing this.
Do you mean the omId passed to the
As things stand, EC containers will not use the owner passed in when selecting a pipeline. @guohao-rosicky has a PR up to address it, but I feel the solution is more complex, as EC pipelines and containers are 1 to 1, and essentially the same thing. With EC it would mean maintaining a complete set of new pipelines per owner and per EC scheme.
@sodonnel I see, currently our cluster does not use EC yet. Therefore, I'm not really familiar with it. So you mean that one EC pipeline only maps to one EC container? Does that mean that the owner should be on the pipeline, instead of the container level? |
Yes, a pipeline is created and a container assigned immediately. When that container is full, the pipeline and container are both closed. EC pipeline creation is cheap, as it's really just picking a set of nodes to use. There are also limits on the number of EC pipelines. An EC key for 3-2 cannot go into the same container as a 6-3 key, so there is a group of pipelines per EC scheme. If we add owner, then there will need to be a group of pipelines per owner and per EC scheme, so each owner would double the EC pipelines and potentially the open containers in the system. That is why we should question if the separation is really needed before jumping into doing it.
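The multiplication described above can be sketched with a toy calculation. The scheme names, the counting model, and the class are illustrative only, not Ozone's actual SCM pipeline logic:

```java
import java.util.List;

// Toy model: SCM needs one EC pipeline group per distinct
// (EC scheme, owner) pair, because keys with different EC schemes
// cannot share a container, and per-owner separation would add
// another axis to that partitioning.
public class EcPipelineGroups {

    static int pipelineGroups(List<String> ecSchemes, List<String> owners) {
        // Each additional owner multiplies the number of groups.
        return ecSchemes.size() * Math.max(1, owners.size());
    }

    public static void main(String[] args) {
        List<String> schemes = List.of("rs-3-2-1024k", "rs-6-3-1024k");
        // Single shared owner (today): one group per EC scheme.
        System.out.println(pipelineGroups(schemes, List.of("serviceA")));
        // Two OM HA groups as distinct owners: groups (and potentially
        // open containers) double.
        System.out.println(pipelineGroups(schemes, List.of("service1", "service2")));
    }
}
```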
@sodonnel Thank you for the info regarding EC containers and pipelines. Yes, I don't think moving the owner to pipeline level should be done for now.
Coming back to this, currently omID (UUID) is used to uniquely differentiate between the OMs as certificate clients. Therefore, I don't think this should be set to the same value across all the OMs. This might require us to have a new UUID representing the OM HA Service. The way I see it, there are two possible issues we are trying to resolve with regard to SCM container owners:
1. The container owner is not guaranteed to be unique across OM deployments sharing an SCM (configured node IDs can collide), so blocks from different OMs can mix in one container.
2. Within a single OM HA group, the owner changes whenever the leader changes, so SCM allocates new containers after a leader switch.
I believe that by changing the owner to omID (UUID), we will resolve the first issue. If we only want to resolve the second issue, the easiest way is to use the OM Service ID for the container owner. If we want to resolve both, I think it requires a new unique UUID to represent the OM HA Service, which requires coming up with a more involved solution (e.g. how to synchronize the UUID between OMs, etc.). We can limit the scope of this patch to achieve the first goal. @sodonnel @guohao-rosicky Please let me know what you think.
From the code comment regarding Container Owners in
It seems the intent is to have one owner per name service, which is comparable to the OM Service ID, rather than the OM Node ID. We can change it to OM Service ID instead. In our cluster, we finally decided to use OM Service ID instead of OM ID (UUID) for a few reasons (only with respect to Ratis containers):
@ivandika3 @sodonnel what is the next step for this PR? Do we need more reviews, a design doc, test coverage, upgrade strategy, or something else? |
I am not too sure about the use cases around multiple OMs sharing a single SCM. However, from the discussion here it sounds as though the owner passed from OM to SCM is wrong even for a single HA OM group using an SCM, as SCM will see a different ID for each OM in the HA group. What we really want is a single ID shared by all 3 OMs in a HA group. Therefore we should probably fix that part in this PR.
@adoroszlai @sodonnel I think OM Service ID should suffice as the container owner for now. FYI, our cluster has two OM services due to stability issues when sharing a single OM service among different workloads, which affected the stability of one critical use case. Therefore, we spun up a new OM service to isolate that critical use case from the other workloads. Container owner is just a possible contention point we identified when analyzing the safety of having two OM services share a single SCM. Regarding the plan to make EC containers honor the owner, I currently do not have enough prerequisite EC knowledge or bandwidth to address it. It can be addressed in a different ticket after this is merged, or we can close this ticket and change the container owner once the EC-owner solution is finalized. I'll leave it to your discretion @adoroszlai @sodonnel @guohao-rosicky.
The ability for multiple OM replication groups to use the same SCM and datanode set is exciting and is a good step towards sharding OM. OM Service ID should be a good proxy for the replicated OM set. I will go over the PR in the coming days. |
LGTM. I think using OM service ID as the container owner makes sense. There's been a fair amount of discussion here, so I'll leave it open if others have further comments. Just one minor comment, in |
Thank you for the review @errose28.
Updated. |
Thanks @ivandika3 for updating the patch. There is a checkstyle error:
https://github.com/ivandika3/ozone/actions/runs/6597636911/job/17924816077#step:7:11 |
@adoroszlai Apologies, I have fixed the checkstyle failure. Thanks for checking it. |
Thanks @ivandika3 for the patch, @errose28, @guohao-rosicky, @kerneltime, @sodonnel for the review. |
What changes were proposed in this pull request?
Currently, OM passes OzoneManager#getOmNodeId as the owner argument in OMKeyRequest#allocateBlock. This means that the created container's owner is based on the OM HA leader's Node ID specified in the ozone configuration (i.e. ozone.om.node.id).
In the case where multiple OMs use the same SCM, if the configured Node IDs are not unique across the SCM cluster, a single container may end up containing blocks owned by multiple OMs. This should be fine for now, but if in the future we want to separate containers based on Storage Type, this collision might raise tricky situations.
For example, consider a configuration like the following (omitting OM addresses):
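The original configuration snippet is not preserved here, but a hypothetical ozone-site.xml fragment of the kind being described might look like this (key names follow Ozone's OM HA convention; the service and node ID values are illustrative):

```xml
<!-- Two OM HA services sharing one SCM; both reuse node IDs om1..om3. -->
<property>
  <name>ozone.om.service.ids</name>
  <value>service1,service2</value>
</property>
<property>
  <name>ozone.om.nodes.service1</name>
  <value>om1,om2,om3</value>
</property>
<property>
  <name>ozone.om.nodes.service2</name>
  <value>om1,om2,om3</value>
</property>
```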
If the OM leaders of both service1 and service2 have the same node ID (e.g. om1), both OMs will use the configured node ID as the owner when allocating blocks, and SCM will assume that the requests come from the same owner and allocate the same container for the blocks.
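A minimal sketch of the collision, assuming a toy SCM that keeps one open container per owner string (the class, method, and counting scheme are illustrative, not Ozone's actual SCM API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy SCM-side view: block allocation requests are routed to one
// open container per distinct owner string.
public class OwnerCollisionDemo {
    private final Map<String, Integer> openContainerByOwner = new HashMap<>();
    private int nextContainerId = 1;

    // Returns the container ID that blocks for this owner land in.
    int allocateBlock(String owner) {
        return openContainerByOwner.computeIfAbsent(owner, o -> nextContainerId++);
    }

    public static void main(String[] args) {
        OwnerCollisionDemo scm = new OwnerCollisionDemo();
        // Both services pass their leader's node ID, which is "om1" in
        // both: SCM sees a single owner and mixes their blocks into the
        // same container.
        System.out.println(scm.allocateBlock("om1") == scm.allocateBlock("om1"));

        // With the OM Service ID as owner, the two services get distinct
        // containers, and a leader switch within one service keeps the
        // same owner string (so no new container is needed).
        System.out.println(scm.allocateBlock("service1") != scm.allocateBlock("service2"));
    }
}
```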
We should find a way to pick a unique ID to pass to the allocateBlock call. Currently a valid option is to use omUuid, since the chance of collision is smaller. This patch uses omUuid as the owner of the SCM container.
However, a better option for the container owner is to pick a unique ID per OM HA service (instead of per node). This has possible additional advantages:
- When the OM leader changes, SCM does not need to create a new container for the new blocks
- Fewer OPEN containers, and fewer containers in SCM overall
Unfortunately, I have not found such an ID available. The candidates I have considered:
Another consideration is migrating the existing containers' owners to the new naming convention. This might be done via a container task. This issue can be handled in another ticket.
Any further suggestions are greatly appreciated.
Edit: We decided to use the OM Service ID for container ownership, since service IDs are supposed to be unique, albeit not fully enforced.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8874
How was this patch tested?
Existing unit tests.