CLOUDSTACK-9796 - Fix NPE in VirtualMachineManagerImpl.java #1956
Thanks for the code that prevents the null pointer exception.
@nathanejohnson I have a question here before we proceed.
_workDao.updateStep(work, step);
Step previousStep = null;
if (work != null) {
    previousStep = work.getStep();
Can work.getStep() return null?
I see that you add a check at line 757, previousStep != null. Why would we need that check there, and not need it here (line 750)?
@rafaelweingartner if work is null, previousStep will stay null. Maybe not the clearest way to handle this, but this prevents a null work from being passed down below. In other words, if work is null, previousStep will be guaranteed null, and if previousStep is not null, then work is guaranteed to be not null.
Ok, now I think I am starting to get it.
But I am still not sure about some things here; would you mind continuing the discussion?
If the work is not null, you get the previous step (let’s assume it is not null) and call the method _workDao.updateStep(work, step). After this, you call stateTransitTo(vm, event, hostId). Why do we need to call _workDao.updateStep(work, previousStep) again at line 758, which is executed when the method finishes? The previousStep continues to be the same.
Most of this code I didn't write, but I can make some guesses:
The _workDao.updateStep(work, previousStep) line is in the finally block, which will execute even if an exception is thrown in stateTransitTo (like NoTransitionException, for instance). So if stateTransitTo a) returns false, or b) throws an exception, then result will be false, and line 758 will run. So if something happens and the state isn't transitioned, someone wanted the work reverted to its previous step value. Sort of a rollback, maybe?
In the case of the VM hung in starting, my desired side effect is that stateTransitTo gets called and sets the state to Stopped, i.e., Event.AgentReportStopped -> State.Stopped. The work has already expired at this point, so it is null. I was trying to preserve the same behavior as before when work was not null.
Sorry if this wasn't very clear.
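For readers following the thread, the pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the CloudStack code: Work, Step, and changeState are hypothetical stand-ins, a field assignment plays the role of _workDao.updateStep, and a BooleanSupplier stands in for stateTransitTo(vm, event, hostId).

```java
import java.util.function.BooleanSupplier;

final class RollbackSketch {
    enum Step { Prepare, Starting }

    // Hypothetical stand-in for the work item; not the CloudStack class.
    static final class Work {
        Step step;
        Work(Step step) { this.step = step; }
    }

    // 'transition' stands in for stateTransitTo(vm, event, hostId).
    static boolean changeState(Work work, Step step, BooleanSupplier transition) {
        Step previousStep = null;
        if (work != null) {
            previousStep = work.step;
            work.step = step; // plays the role of _workDao.updateStep(work, step)
        }
        boolean result = false;
        try {
            result = transition.getAsBoolean();
            return result;
        } finally {
            // Runs on a false return and on an exception alike.
            if (!result && previousStep != null) {
                work.step = previousStep; // roll the step back
            }
        }
    }

    public static void main(String[] args) {
        Work work = new Work(Step.Prepare);
        changeState(work, Step.Starting, () -> false);
        System.out.println(work.step); // prints "Prepare": the step was rolled back

        // An expired (null) work item still reaches the transition without an NPE.
        System.out.println(changeState(null, Step.Starting, () -> true)); // prints "true"
    }
}
```

The second call mirrors the hung-in-starting case: the work is null, so nothing is updated or rolled back, but the transition itself still runs.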
Oh, and to your point earlier, I don't think getStep() should ever return null, because the step column in the op_it_work table is marked not null.
I am not asking to make a distinction between exception or not. What I tried to say is that, if the purpose of the finally block was only to revert the step to a previous state when exceptions occur, we could do that using a catch block. I think the finally here is meant to revert the state of the work step even if an exception does not happen, for instance when stateTransitTo returns false.
I think you already answered my doubt when you said that the previousStep is most likely never null. I thought we could have cases where previousStep == null, and then, if stateTransitTo returns false, with the newly added check at line 757 we would not update the step back to null for these cases.
Do you think something like:
if (!result && work != null) {
would be better? Even if work.getStep() did return a null, that should have the same effect as before. Maybe it would be more readable too?
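To make the difference between the two guards concrete, here is a small comparison sketch. Work and Step are hypothetical stand-ins, and the null-step case is included only for illustration, since the step column is reportedly marked not null.

```java
final class GuardComparison {
    enum Step { Prepare, Starting }

    // Hypothetical stand-in for the work item; not the CloudStack class.
    static final class Work {
        final Step step;
        Work(Step step) { this.step = step; }
    }

    // Guard from the first revision: roll back only if a previous step was captured.
    static boolean rollsBackWithStepCheck(boolean result, Work work) {
        Step previousStep = (work != null) ? work.step : null;
        return !result && previousStep != null;
    }

    // Guard suggested in the review: roll back whenever there is a work item.
    static boolean rollsBackWithWorkCheck(boolean result, Work work) {
        return !result && work != null;
    }

    public static void main(String[] args) {
        Work nullStep = new Work(null);
        // The two guards differ only when work exists but its step is null.
        System.out.println(rollsBackWithStepCheck(false, nullStep)); // prints "false"
        System.out.println(rollsBackWithWorkCheck(false, nullStep)); // prints "true"
    }
}
```

With a null work, both guards skip the rollback; with a work whose step is set, both perform it. Only the work != null form restores a null step, which matches the behavior before the patch.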
Yes, I think this is more readable.
Updated, thanks for the input
I thank you
This checks the work variable for NULL in all cases where it is used. Fixes CLOUDSTACK-9796.
7f62924 to 91bfedd
LGTM
LGTM by Rafael in commit review above (for acspr).
@nathanejohnson thanks for this bug fix
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: ✖centos6 ✔centos7 ✔debian. JID-571
@blueorangutan package
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-572
@blueorangutan test
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Trillian test result (tid-936)
LGTM based on test results and code review
Great, this one looks ready to merge.
@GabrielBrascher the privategw_acl and vpc_redundant test failures have been a pain for a while; we couldn't manage to fix those tests. It would be really good if someone could spend time fixing them. The test_snapshot failure is already fixed; in fact, we just merged the fix in Trillian as well, and the next run should pass. False positives should also be addressed with negative assertions within the tests. A "Failure" test result should be considered a failed test case. Just my 2 cents.
Got it @borisstoyanov. Thanks!
LGTM. /cc @karuturi