CLOUDSTACK-9796 - Fix NPE in VirtualMachineManagerImpl.java #1956

Merged
merged 1 commit into from
Apr 22, 2017
@@ -744,14 +744,17 @@ public Ternary<VMInstanceVO, ReservationContext, ItWorkVO> doInTransaction(final

     protected <T extends VMInstanceVO> boolean changeState(final T vm, final Event event, final Long hostId, final ItWorkVO work, final Step step) throws NoTransitionException {
         // FIXME: We should do this better.
-        final Step previousStep = work.getStep();
-        _workDao.updateStep(work, step);
+        Step previousStep = null;
+        if (work != null) {
+            previousStep = work.getStep();

Member

Can work.getStep() return null?
I see that you added a check, previousStep != null, at line 757. Why do we need that check there but not here (line 750)?

Member Author

@rafaelweingartner if work is null, previousStep stays null. It may not be the clearest way to handle this, but it prevents a null work from being used further down. In other words, if work is null, previousStep is guaranteed to be null, and if previousStep is not null, work is guaranteed to be non-null.

Member

OK, now I think I am starting to get it.
But I am still not sure about a few things here. Would you mind continuing the discussion?

If work is not null, you get the previous step (let's assume it is not null) and call _workDao.updateStep(work, step). After that, you call stateTransitTo(vm, event, hostId). Why do we need to call _workDao.updateStep(work, previousStep) again at line 758, which executes when the method finishes? previousStep is still the same value.

Member Author

I didn't write most of this code, but I can make some guesses:

The _workDao.updateStep(work, previousStep) line is in the finally block, which executes even if stateTransitTo throws an exception (NoTransitionException, for instance). So if stateTransitTo a) returns false or b) throws an exception, result will be false and line 758 will run. If the state transition does not happen, someone wanted the work reverted to its previous step value. A sort of rollback, maybe?

In the case of the VM hung in starting, the side effect I want is for stateTransitTo to be called and set the state to Stopped, i.e., Event.AgentReportStopped -> State.Stopped. The work has already expired at this point, so it is null. I was trying to preserve the same behavior as before for the case where work is not null.

Sorry if this wasn't very clear.
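
For readers following the thread, here is a minimal, self-contained sketch of the rollback-in-finally behavior described above. It is not the CloudStack code: WorkItem, Step, and the BooleanSupplier standing in for stateTransitTo are hypothetical stand-ins, and the step is held in memory instead of going through _workDao.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-ins for ItWorkVO and Step; not the CloudStack classes.
class FinallyRollbackSketch {
    enum Step { Prepare, Starting, Done }

    static final class WorkItem {
        private Step step;
        WorkItem(Step step) { this.step = step; }
        Step getStep() { return step; }
        void setStep(Step step) { this.step = step; }
    }

    /**
     * Moves the work item (if any) to newStep, runs the transition, and
     * restores the previous step when the transition returns false or throws.
     */
    static boolean changeState(WorkItem work, Step newStep, BooleanSupplier transition) {
        Step previousStep = null;
        if (work != null) {
            previousStep = work.getStep();
            work.setStep(newStep);
        }
        boolean result = false;
        try {
            result = transition.getAsBoolean();
            return result;
        } finally {
            // Runs on normal return and on exception; result is still false if the transition threw.
            if (!result && work != null) {
                work.setStep(previousStep);
            }
        }
    }

    public static void main(String[] args) {
        WorkItem ok = new WorkItem(Step.Starting);
        changeState(ok, Step.Done, () -> true);
        System.out.println("success:   " + ok.getStep());

        WorkItem refused = new WorkItem(Step.Starting);
        changeState(refused, Step.Done, () -> false);
        System.out.println("refused:   " + refused.getStep());

        WorkItem failed = new WorkItem(Step.Starting);
        try {
            changeState(failed, Step.Done, () -> { throw new IllegalStateException("no transition"); });
        } catch (IllegalStateException e) {
            System.out.println("exception: " + failed.getStep());
        }

        // Expired work item: the transition still runs and nothing is dereferenced.
        System.out.println("null work: " + changeState(null, Step.Done, () -> true));
    }
}
```

In this sketch the step ends at Done only in the success case. The refused and exception cases both end back at Starting, and the null-work call returns the transition result without touching any work item.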

Member Author

Oh, and to your earlier point: I don't think getStep() should generally ever return null, because the step column in the op_it_work table is marked not null.

Member

I am not asking to make a distinction between the exception and non-exception cases. What I tried to say is that if the purpose of the finally block were only to revert the step when an exception occurs, we could do that with a catch block. I think the finally here is meant to revert the work step even when no exception happens, for instance when stateTransitTo returns false.

I think you already answered my question when you said that previousStep is most likely never null. I thought we could have cases where previousStep == null; then, if stateTransitTo returned false, the newly added check at line 757 would stop us from updating the step back to null.
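
The catch-versus-finally point can be illustrated with a small sketch. Everything here is hypothetical: Work is a bare mutable holder, the transition is a BooleanSupplier, and a RuntimeException stands in for the checked NoTransitionException of the real method. It only shows that a revert placed in a catch block is skipped when the transition returns false.

```java
import java.util.function.BooleanSupplier;

// Illustrative only; Work is a hypothetical holder, not ItWorkVO.
class CatchVersusFinallySketch {
    static final class Work {
        String step;
        Work(String step) { this.step = step; }
    }

    // Revert placed in a catch block: only reached when the transition throws.
    static boolean revertInCatch(Work work, String newStep, BooleanSupplier transition) {
        String previous = work.step;
        work.step = newStep;
        try {
            return transition.getAsBoolean();
        } catch (RuntimeException e) {
            work.step = previous;
            throw e;
        }
    }

    // Revert placed in a finally block: reached on exception and on a false return.
    static boolean revertInFinally(Work work, String newStep, BooleanSupplier transition) {
        String previous = work.step;
        work.step = newStep;
        boolean result = false;
        try {
            result = transition.getAsBoolean();
            return result;
        } finally {
            if (!result) {
                work.step = previous;
            }
        }
    }

    public static void main(String[] args) {
        Work a = new Work("Starting");
        revertInCatch(a, "Done", () -> false);
        System.out.println("catch only, transition refused: " + a.step);

        Work b = new Work("Starting");
        revertInFinally(b, "Done", () -> false);
        System.out.println("finally, transition refused:    " + b.step);
    }
}
```

With a refused transition, the catch-only variant leaves the step at Done, while the finally variant restores Starting.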

Member Author

Do you think something like:

if (!result && work != null) {

would be better? Even if work.getStep() did return null, that would have the same effect as before. Maybe it would be more readable too?
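
A rough sketch of where the two guards discussed in this thread differ. The classes and method names are hypothetical, and the transition outcome is passed in as a boolean instead of calling stateTransitTo. The only case that separates the guards is a non-null work item whose step was already null, which the thread suggests should not normally occur.

```java
// Illustrative comparison of the two guards; Work is a hypothetical holder, not ItWorkVO.
class GuardComparisonSketch {
    static final class Work {
        String step;
        Work(String step) { this.step = step; }
    }

    // Guard on previousStep: skips the rollback when the old step was null.
    static void withPreviousStepGuard(Work work, String newStep, boolean transitionResult) {
        String previousStep = null;
        if (work != null) {
            previousStep = work.step;
            work.step = newStep;
        }
        if (!transitionResult && previousStep != null) {
            work.step = previousStep;
        }
    }

    // Guard on work: rolls back whenever a work item exists, even to a null step.
    static void withWorkGuard(Work work, String newStep, boolean transitionResult) {
        String previousStep = null;
        if (work != null) {
            previousStep = work.step;
            work.step = newStep;
        }
        if (!transitionResult && work != null) {
            work.step = previousStep;
        }
    }

    public static void main(String[] args) {
        Work a = new Work(null);
        withPreviousStepGuard(a, "Done", false);
        System.out.println("previousStep guard, old step null: " + a.step);

        Work b = new Work(null);
        withWorkGuard(b, "Done", false);
        System.out.println("work guard, old step null:         " + b.step);

        // Neither variant dereferences a null work item.
        withPreviousStepGuard(null, "Done", false);
        withWorkGuard(null, "Done", false);
    }
}
```

Here the previousStep guard leaves the step at Done after a failed transition, while the work guard restores the original null, matching what the unguarded code did before the change.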

Member

Yes, I think this is more readable.

Member Author

Updated, thanks for the input

Member

I thank you

+            _workDao.updateStep(work, step);
+        }
         boolean result = false;
         try {
             result = stateTransitTo(vm, event, hostId);
             return result;
         } finally {
-            if (!result) {
+            if (!result && work != null) {
                 _workDao.updateStep(work, previousStep);
             }
         }
@@ -1507,12 +1510,13 @@ private void advanceStop(final VMInstanceVO vm, final boolean cleanUpEvenIfUnabl
         if (doCleanup) {
             if (cleanup(vmGuru, new VirtualMachineProfileImpl(vm), work, Event.StopRequested, cleanUpEvenIfUnableToStop)) {
                 try {
-                    if (s_logger.isDebugEnabled()) {
+                    if (s_logger.isDebugEnabled() && work != null) {
                         s_logger.debug("Updating work item to Done, id:" + work.getId());
                     }
                     if (!changeState(vm, Event.AgentReportStopped, null, work, Step.Done)) {
                         throw new CloudRuntimeException("Unable to stop " + vm);
                     }
+
                 } catch (final NoTransitionException e) {
                     s_logger.warn("Unable to cleanup " + vm);
                     throw new CloudRuntimeException("Unable to stop " + vm, e);