add 'continue' functionality (as opposed to the already available 'stop') #203

Open
boegel opened this issue Aug 29, 2012 · 8 comments

@boegel
Member

boegel commented Aug 29, 2012

(old internal ticket 241)

Since we have a way to stop at a certain step, it would be nice if we then could continue also.

This way you can debug each step of the configure, build, install and create module process without having to do all of the previous steps again and again.

@JensTimmerman
Contributor

I think this should be solved by creating the -devel modulefile after each step?

The argument against a continue function is that eventually the .eb file will be committed to the repo and the software installed, but there is no way of knowing what actually happened in between, and as such the build is not reproducible.

see issue #109

@boegel
Member Author

boegel commented Aug 29, 2012

Agreed. The step-wise devel modules would offer almost everything a continue function would implement.

Not completely though: devel modules can only set environment variables, not do things like create/adjust files, create directories, run commands, etc.

Nevertheless, you have a good argument not to implement it: we don't want people to be able to implement easyblocks that still require human intervention and are thus incomplete. Keeping this closed.

@fgeorgatos
Collaborator

A continue option may be feasible if the intermediate states of a build sequence are saved in a "tar" file or something similar; that would allow a clean restart without side effects (i.e. it is a form of checkpointing of the build process).
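
To make the tar idea concrete, here is a minimal sketch that archives a build directory after a step and unpacks it again later. It is not EasyBuild code: `snapshot_builddir`, `restore_builddir`, and the directory layout are hypothetical names chosen for illustration, and only Python's standard `tarfile` module is used.

```python
import os
import tarfile


def snapshot_builddir(builddir, snapshot_dir, step_name):
    """Archive builddir as <snapshot_dir>/<step_name>.tar.gz and return the archive path.

    Hypothetical helper; snapshot_dir is assumed to live outside builddir.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    archive = os.path.join(snapshot_dir, step_name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores member paths relative to the build directory
        tar.add(builddir, arcname=".")
    return archive


def restore_builddir(archive, builddir):
    """Unpack a snapshot archive into builddir (created if missing)."""
    os.makedirs(builddir, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(builddir)
    return builddir
```

A snapshot like this captures files only. Environment variables and in-memory state of the build tool would have to be saved separately.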

@boegel
Member Author

boegel commented Dec 3, 2013

Reopening this, since @wpoely86 was asking for this.

Are devel modules a solution in case one wants to use eb --continue after fixing a bug in the install step, i.e. continue without redoing the build step?

@boegel boegel reopened this Dec 3, 2013
@wpoely86
Member

wpoely86 commented Dec 3, 2013

@fgeorgatos That would require catching the exceptions.

The use case in which I would like continue functionality is writing an easyblock for highly non-standard build systems. It's annoying that you have to start again from scratch if something doesn't work in the install_step.

@fgeorgatos
Collaborator

@wpoely86:
If I understand correctly what you described, a CRIU checkpoint [1] right before the install step might help.
The question is then how to allow modifications in the install step independently of the rest... I'm puzzled by this.

[1] http://en.wikipedia.org/wiki/CRIU # v1.0 just came out, but the need for a 3.11 kernel is not encouraging; OK, maybe we'll find a better direction...

@wpoely86
Member

wpoely86 commented Dec 4, 2013

Yeah, something like that but not that complex. I think CRIU is a bit overkill and it's not intended for our purposes. I see no way to change the Python script after a checkpoint.

Anyway, what I want is not that complex (Don't shoot me if it turns out to be very complex 😉):
Restart in the step that failed, reusing the results of all previous successful steps. That would mean: keep the current builddir, keep all mktemp-generated paths and files, and store all current variables in the easyblock and the easyconfig. That would do it, no?

So, we would need to store the current status and all file paths etc. in a file before executing a step in the block.
I would only do this if a certain option has been activated (--debug maybe?) and print the checkpoint file/dir at the beginning of the log output.

What do you think? Totally crazy?
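
The proposal above could be sketched roughly as follows. This is an illustration only, not how EasyBuild works: the step names in `STEPS`, the `run_steps` and `load_checkpoint` helpers, and the JSON checkpoint layout are all assumptions made for the example. The sketch writes the checkpoint after each successful step rather than before each step, which records the same information, and it assumes the shared variables are JSON-serializable.

```python
import json
from pathlib import Path

# Hypothetical step names, for illustration only
STEPS = ["configure", "build", "install", "module"]


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint file exists."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "vars": {}}


def run_steps(step_funcs, checkpoint_path):
    """Run the steps in order, skipping those the checkpoint marks as done.

    step_funcs maps a step name to a callable that receives the shared
    variables dict and may update it in place.
    """
    state = load_checkpoint(checkpoint_path)
    for name in STEPS:
        if name in state["done"]:
            continue  # completed in an earlier run
        # If this raises, the checkpoint on disk still describes the last good step
        step_funcs[name](state["vars"])
        state["done"].append(name)
        Path(checkpoint_path).write_text(json.dumps(state, indent=2))
    return state
```

Calling `run_steps` again with the same checkpoint file after fixing a failing step resumes at that step. Files a failed step already wrote to the build directory are not rolled back, so this only helps when the failing step can safely be rerun on top of its own partial output.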

@boegel
Member Author

boegel commented Dec 31, 2013

@wpoely86: I could see some use for that, but making it work might involve quite a bit of work here and there. The current codebase is totally unaware of this restart feature, so you might need to make sure stuff sticks around rather than being cleaned up, etc.
It's not totally crazy imho, but it will involve quite a bit of work I think, and the usefulness may be limited (i.e. it will likely only work in a couple of specific cases, etc.)
