Skip to content

Set hosting goals to ALWAYS in getDiskUsage test#3393

Closed
jmark99 wants to merge 1 commit intoapache:elasticityfrom
jmark99:diskusage
Closed

Set hosting goals to ALWAYS in getDiskUsage test#3393
jmark99 wants to merge 1 commit intoapache:elasticityfrom
jmark99:diskusage

Conversation

@jmark99
Copy link
Contributor

@jmark99 jmark99 commented May 11, 2023

Ran into an issue where the getDiskUsage test was timing out in the TableOperationsIT class. Time to load the table from ONDEMAND to being fully loaded was taking too much time and resulting in the test randomly failing.

This PR sets the hosting goal to ALWAYS for the tables within the getDiskUsage test. The update allows the test to complete within the default timeout value.

Ran into an issue where the getDiskUsage test was timing out in
TableOperationsIT class. Time to transition from ONDEMAND to being fully
loaded was causing the test to randomly fail. Setting the hosting goal
to ALWAYS allows the compactions to complete within the default timeout
time.
@dlmarion
Copy link
Contributor

dlmarion commented May 11, 2023

We did not fix the ITs when merging the OnDemand changes because we thought that it made sense that they continue to fail until we moved the tablet management code from the TabletServer to the Manager. For getting disk usage, TableDiskUsage.getDiskUsage scans either the Root or Metadata tables, both of which should always be hosted. While this fix makes the test pass, I'm not sure that it's the right one. Do you know where in the codepath it's waiting for the tables to be hosted? I think that's what we want to fix.

Edit: Nevermind, I see now that it's not the getDiskUsage, it's the compact that is happening in the test. I think it would be good to set them back to ONDEMAND and wait for them to unload before continuing with the getDiskUsage call, to confirm that the operation does work with tablets that are not hosted. See this and this for an example on how to do that.

@jmark99
Copy link
Contributor Author

jmark99 commented May 11, 2023

@dlmarion gotcha! I haven't looked into to it too deeply at this point. I know the timeout occurs when the compaction of the cloned table is called.

I will close this PR as not going to complete as we wait for other updates to be made in the Elasticity code.

@EdColeman
Copy link
Contributor

EdColeman commented May 11, 2023

Alternatively - would it be worth while to have test for both ALWAYS host and another for EVENTUALLY?

@dlmarion
Copy link
Contributor

Alternatively - would it be worth while to have test for both ALWAYS host and another for EVENTUALLY?

I don't think so. When we move the compaction logic from the TabletServer to the Manager, then this should not time out. It should just work regardless of tablet hosting state.

@jmark99
Copy link
Contributor Author

jmark99 commented May 12, 2023

Closing as not going to fix.

@jmark99 jmark99 closed this May 12, 2023
@jmark99
Copy link
Contributor Author

jmark99 commented May 12, 2023

Not going to fix at this time as this functionality will be moving to the manager and be addressed at that time.

@jmark99 jmark99 deleted the diskusage branch May 12, 2023 13:06
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants