Cleanup build #77
```diff
 fingerprint, err = ImportImage(image, unitParams.PlatformService.Name, bh.Remote)
-if err != nil {
+if err != nil || abortFlag {
 return errors.New("failed to import image: " + err.Error())
```
Interrupting the build during image import makes this line panic:
```bash
⠧ Importing mlbase-1.0.tar.gz ^CInterrupting build and cleaning artefacts
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4528ded]

goroutine 1 [running]:
github.com/bravetools/bravetools/platform.(*BraveHost).InitUnit(0x4bf7d20, {0x486e930, 0xc0002780f0}, 0xc0001c5540)
	/Users/ignat/go/src/github.com/beringresearch/bravetools/platform/host_api.go:820 +0x2ad
github.com/bravetools/bravetools/commands.deploy(0x4bed440?, {0x4c28890, 0x0, 0x0?})
	/Users/ignat/go/src/github.com/beringresearch/bravetools/commands/deploy.go:98 +0x35f
github.com/spf13/cobra.(*Command).execute(0x4bed440, {0x4c28890, 0x0, 0x0})
	/Users/ignat/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0x4becf40)
	/Users/ignat/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/ignat/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
main.main()
	/Users/ignat/go/src/github.com/beringresearch/bravetools/main.go:15 +0x25
```
I think this is because `err` is actually nil here: the condition is also true when `abortFlag` is set, and the `if` body then calls `err.Error()` on a nil error.
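To illustrate the diagnosis, here is a minimal, self-contained sketch. `importImage`, `buggyImport` and `fixedImport` are hypothetical stand-ins, not bravetools functions. With `abortFlag` set and `err` nil, the combined condition is true and `err.Error()` is called on a nil interface, which is the kind of nil pointer dereference shown in the trace. Handling the interrupt as its own case avoids it.

```go
package main

import (
	"errors"
	"fmt"
)

// importImage is a hypothetical stand-in for ImportImage that always succeeds.
func importImage() (string, error) {
	return "abc123", nil
}

// buggyImport reproduces the reviewed pattern: when abortFlag is true and err
// is nil, err.Error() is called on a nil interface and panics.
func buggyImport(abortFlag bool) (fingerprint string, err error) {
	fingerprint, err = importImage()
	if err != nil || abortFlag {
		return "", errors.New("failed to import image: " + err.Error())
	}
	return fingerprint, nil
}

// fixedImport treats a real error and an interrupt as separate cases, so it
// never dereferences a nil error.
func fixedImport(abortFlag bool) (string, error) {
	fingerprint, err := importImage()
	if err != nil {
		return "", fmt.Errorf("failed to import image: %w", err)
	}
	if abortFlag {
		return "", errors.New("image import interrupted")
	}
	return fingerprint, nil
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(panics(func() { buggyImport(true) })) // true
	_, err := fixedImport(true)
	fmt.Println(err) // image import interrupted
}
```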
That makes sense. I'll have to take a different approach to error checking, most likely delegating it to a helper function that can be reused throughout.
This issue should now be dealt with. The shared utility function `CollectErrors` returns an error if there is one, so there is no longer a case where a nil error is unexpectedly used.
Force-pushed from 7ffb389 to e26a10b.
When unit deployment is interrupted while the POSTDEPLOY commands are running, the unit record is left behind in the database.
Force-pushed from a6f8b37 to 8de5bdc.
Nice catch. I reordered the calls in 8de5bdc so that inserting the unit into the database is the last thing to happen. This should address the issue of units being left in the database even though the actual container has been cleaned up.
Could we at least have a list of test cases that need to be checked?
…nup after errors encountered during the image import process. Slightly improved the program's ability to clean up after SIGTERM, provided the unit/image already exists when SIGTERM is received.
…ine intercepting OS interrupt signals into `BuildImage`; it flips `abortFlag` and cancels the context. Between the two, builds can be safely interrupted at non-critical junctures and cleanup can be carried out properly. The cost is that there may be some delay between SIGINT and bravetools shutting down.
…t` (deployment). A goroutine intercepts SIGINT, sets the flag and cancels the context. The flag is checked throughout.
…nching unit from it. Allows for cleaning up just the images generated during the build process.
…ents attempts to reference a nil error later by ensuring an error value is present.
…low for cleanup of image by fingerprint
…ortant step of unit initialization; the two belong together. It may still be useful to execute Postdeploy separately (e.g. on an already running container), so it has been left as a separate public function.
…duce code duplication
…was encountered during the build
…ecessary short function for build cleanup, using an anonymous function instead to make clear what happens. Fix the scoping of some `err` short variable declarations to avoid overwriting more important errors.
…to allow for interrupting of commands.
…s handled by the function callers
…n. This avoids the case where interrupts or errors in postdeploy would leave a record of a unit in the database even though the actual container had been cleaned up.
…s code indicates an error.
Force-pushed from 30967c0 to 7f04260.
In terms of automated tests, the happy path should be testable using the tests in #92. It's a bit tricky to test the SIGINT capturing and cleanup automatically, so I've been testing interrupts manually. I then make sure no LXD images or containers are left on the system by running another build; if any remain, it fails with an error. What I've been checking is whether bravetools catches interrupts and exits after cleaning up anything created during build/deploy:
For deploy, it's a similar story:
In my manual testing of the cases above I found no issues; cleanup appears to be very reliable.
Currently, if a build is interrupted, LXD images imported during the build process are often left on the server. These images cause a conflict the next time a build is attempted and, in the meantime, needlessly use up space. It would be better if bravetools cleaned up after itself.
These changes aim to ensure that interrupted builds are correctly cleaned up.
The following steps are taken to ensure this:
This change does its best to ensure that builds are only cancelled when safe to do so - for example, cancellation is not allowed during image publishing.
For now, diffing the image fingerprints mimics the existing behavior bravetools uses to retrieve a fingerprint. I have some ideas on how to make the fingerprint check more accurate later, which would allow more granular cleanup of just the images created during the build.