Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{bio}[foss/2019b,fosscuda/2019b] RELION v3.1.0 #10965

Closed

Conversation

lexming
Copy link
Contributor

@lexming lexming commented Jul 12, 2020

(created using eb --new-pr)

Depends on #10963, #10964 and easybuilders/easybuild-framework#3386

I used #9096 as reference and made some changes to it

  • added templates to integrate with Torque
  • modified templates for Slurm to just rely on the extra variables support in RELION (this avoids the need to use additional patches, it's not as neat but it's easier to maintain and can be easily adapted by the users)
  • fixed configopts for CUDA capabilities in RELION-3.1.0-fosscuda-2019b.eb

I cannot test the integration with Slurm, somebody with it (@akesandgren ) please test.

Update: made an easyblock to ease the configuration of RELION. This PR depends on easybuilders/easybuild-easyblocks#2274

@lexming lexming added the update label Jul 12, 2020
@lexming lexming added this to the 4.x milestone Jul 12, 2020
@lexming
Copy link
Contributor Author

lexming commented Jul 12, 2020

Test report by @lexming
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node379.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/f27340e6e6a7c72307a3b25a7f840ca6 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 12, 2020

Test report by @lexming
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node125.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/630ce08721f58b8e9bf8ac53363f2cc8 for a full test report.

@akesandgren
Copy link
Contributor

One drawback with separate cpu and gpu submit files is that the user will have to explicitly change the path to the submit file template in the gui when switching from one to the other. And if one is installed with foss and the other with fosscuda, they have to know how to do this with the full path. This is one of the major reasons for the patch that added the extra code for this.

Also, the removal of "which" around all the commands was done since they caused problems with the generated submit files. Don't quite remember what at the moment.

@lexming lexming force-pushed the 20200712020923_new_pr_RELION310 branch from a5534af to f626345 Compare July 13, 2020 12:24
@easybuilders easybuilders deleted a comment from boegelbot Jul 13, 2020
@lexming
Copy link
Contributor Author

lexming commented Jul 13, 2020

@akesandgren I did not bother with the which commands because even though it's a dumb thing to do, it does not cause issues on my side. I propose to remove them with a single sed instead of a patch, which is a pain to maintain. See f890a37

Regarding the job templates, switching between the non-gpu and gpu jobs is transparent for our users. Once they load the module, the gui of relion automatically takes the corresponding job template and users can directly access it from the command line through the env var $RELION_QSUB_TEMPLATE, regardless if they are using the foss or fosscuda modules. This has the advantage that users loading the foss module cannot submit jobs to GPU nodes, which we do not want.

@lexming lexming force-pushed the 20200712020923_new_pr_RELION310 branch from a859a51 to f79e662 Compare July 13, 2020 13:30
@akesandgren
Copy link
Contributor

The template problem comes when using the GUI. The path to the template file is saved in the session files for the relion setup.
So when they go from foss to fosscuda the info still points to the cpu-only submit file.
(Or am I mixing this up with Scipion??)

@lexming
Copy link
Contributor Author

lexming commented Jul 13, 2020

@akesandgren you are right, relion does not check the environment for existing projects and all parameters stay unchanged. In our case, if the job template would work in both modules, we would face the issue of users inadvertently submitting jobs with wrong parameters regarding number of nodes, threads, gpus, etc... So I personally prefer that switching modules does not allow submitting jobs out of the box.

However, the templates I submitted with this PR are just examples to be adapted on installation. So I do not mind using the gpu template on both easyconfigs and just set the gpu option to zero in foss. If that would be suitable for you let me know and I'll update the PR.

By the way, can anybody (eg @boegel ) clarify why the test do not like my perfectly working sed command? are regex with backreferences not allowed for some reason?

@akesandgren
Copy link
Contributor

akesandgren commented Jul 13, 2020

Remember that the preconfigopts gets shipped to /bin/sh, you need to escape them correctly.

@lexming
Copy link
Contributor Author

lexming commented Jul 13, 2020

The preconfigopts is properly escaped as far as I can tell and the config steps proceeds as intended in my local machine. I think that it has something to do with the tests, I have the impression that escaping a parenthesis \( is protected in the tests but something like \1 is not and gets parsed. But its just an impression.

#SBATCH -c XXXthreadsXXX
#SBATCH -e XXXerrfileXXX
#SBATCH -o XXXoutfileXXX
#SBATCH -A XXXextra2XXX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If XXXextra2XXX is empty this will generate a strange error to the user, and so will the #SBATCH -t XXXextra1XXX:00:00 if extra1 is unset.
And by strange I meant really strange ones. Slurm isn't the best at producing good error messages at all times.

That's why I made those two explicitly include -A/-t in the variable, i.e. XXXaccountXXX and XXXtimelimitXXX
Then the user will at least get a good and understandable error.

Also, for us at least, the --gres part either has to be complete or not there at all. So I made it XXXgpuspecXXX. That way we can use a single cpu/gpu common template file.
It's something that may or may not be a complicated thing to specify and is highly site dependent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated those non-positional extra parameters so that they are not empty by default. This will avoid triggering unintended errors. However, I expect sysadmins installing this software to adapt the settings of easyconfig to their needs.

I agree that your patch does a much better job at providing custom parameters for Slurm, but I really think that it is not feasible to achieve an easyconfig of RELION that works out of the box for all sites. For instance, your patch won't work with Torque and I think that it is not guaranteed that it will work on all systems using Slurm. I also had to adapt the template for Torque to make it work in our systems, but in my opinion that should be expected, as simple things as queue names or accounting do vary. That's why my goal is to have something that provides a good starting point for EB users and that can be easily adapted.

@akesandgren
Copy link
Contributor

So, we will stick to using the add_hpc2n_submit_file_args.patch.

@akesandgren
Copy link
Contributor

Also note that the sed cmd you're using doesn't quite match what the patch is doing.

@easybuilders easybuilders deleted a comment from boegelbot Jul 14, 2020
@lexming
Copy link
Contributor Author

lexming commented Jul 14, 2020

@akesandgren thanks for pointing out the missing changes in sed, those got lost at some point. I also unified the templates for cpu and gpu to ease switching between foss and fosscuda modules as you suggested.

@easybuilders easybuilders deleted a comment from boegelbot Jul 14, 2020
@easybuilders easybuilders deleted a comment from boegelbot Jul 16, 2020
@easybuilders easybuilders deleted a comment from boegelbot Jul 16, 2020
@boegel boegel closed this Aug 14, 2020
@boegel boegel reopened this Aug 14, 2020
@boegel
Copy link
Member

boegel commented Aug 14, 2020

Oh wait, it's not merged yet... 🤦‍♂️

@lexming
Copy link
Contributor Author

lexming commented Aug 14, 2020

@boegel What do you mean? easybuilders/easybuild-framework#3386 is indeed merged

@lexming
Copy link
Contributor Author

lexming commented Aug 29, 2020

@boegel All tests finally passed on this one. It is ready for review.

@lexming
Copy link
Contributor Author

lexming commented Aug 29, 2020

Test report by @lexming
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node312.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/7b711308b0125c021bf56204544f4e16 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Aug 29, 2020

Test report by @lexming
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node115.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/7640ef52f5e532746702a63524e4cb19 for a full test report.

@easybuilders easybuilders deleted a comment from boegelbot Sep 18, 2020
@boegel
Copy link
Member

boegel commented Sep 18, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node3404.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/7b6e38f760d4e9ace2dee3bab8a31940 for a full test report.

@boegel
Copy link
Member

boegel commented Sep 18, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node2464.golett.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/d952e160dc86e0ec01b5b338ab9ea9b8 for a full test report.

@boegel
Copy link
Member

boegel commented Sep 18, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node3302.joltik.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), Python 3.6.8
See https://gist.github.com/ae2962cf12708fdc62de3c9f88b44d4e for a full test report.

@boegel
Copy link
Member

boegel commented Oct 20, 2020

@lexming Since this requires customizations specific to the site, I think it makes sense to add a README in the RELION subdir to explain that?

Longer term (but not necessarily a blocker for this PR), it probably makes sense to create a custom easyblock for RELION that allows to "inject" the required info and avoid having to manually tweak the easyconfig file?

@boegel
Copy link
Member

boegel commented Dec 7, 2020

@lexming Ping on adding some more documentation in a README?

@lexming
Copy link
Contributor Author

lexming commented Dec 9, 2020

@boegel I explored the option with a README file but it turned out to be too long and still not very straightforward. Main issue is how the extra job parameters have to be declared as environment variables and also added in the job script template, but with different names. So, I went ahead and made an easyblock to simplify this installation. Please check again with easybuilders/easybuild-easyblocks#2274

@boegelbot
Copy link
Collaborator

@lexming: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/410373490
Output from first failing test suite run:

Failed to determine merge base (ec: 1, output: ''), falling back to specifying target branch develop
Failed to determine merge base (ec: 1, output: ''), falling back to specifying target branch develop

List of added easyconfig files in this PR: RELION-3.1.0-foss-2019b.eb
RELION-3.1.0-fosscuda-2019b.eb
(skipped conflicts test)
(skipped dep graph test)
======================================================================
ERROR: test__parse_easyconfig_RELION-3.1.0-foss-2019b.eb (test.easyconfigs.easyconfigs.EasyConfigTest)
Test for parsing of easyconfig RELION-3.1.0-foss-2019b.eb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 1041, in innertest
    template_easyconfig_test(self, spec_path)
  File "test/easyconfigs/easyconfigs.py", line 837, in template_easyconfig_test
    ecs = process_easyconfig(spec)
  File "/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/easybuild/framework/easyconfig/easyconfig.py", line 1977, in process_easyconfig
    raise EasyBuildError("Failed to process easyconfig %s: %s", spec, err.msg)
EasyBuildError: "Failed to process easyconfig /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/r/RELION/RELION-3.1.0-foss-2019b.eb: No software-specific easyblock 'EB_RELION' found for RELION"

======================================================================
ERROR: test__parse_easyconfig_RELION-3.1.0-fosscuda-2019b.eb (test.easyconfigs.easyconfigs.EasyConfigTest)
Test for parsing of easyconfig RELION-3.1.0-fosscuda-2019b.eb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 1041, in innertest
    template_easyconfig_test(self, spec_path)
  File "test/easyconfigs/easyconfigs.py", line 837, in template_easyconfig_test
    ecs = process_easyconfig(spec)
  File "/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/easybuild/framework/easyconfig/easyconfig.py", line 1977, in process_easyconfig
    raise EasyBuildError("Failed to process easyconfig %s: %s", spec, err.msg)
EasyBuildError: "Failed to process easyconfig /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/r/RELION/RELION-3.1.0-fosscuda-2019b.eb: No software-specific easyblock 'EB_RELION' found for RELION"

======================================================================
FAIL: test_changed_files_pull_request (test.easyconfigs.easyconfigs.EasyConfigTest)
Specific checks only done for the (easyconfig) files that were changed in a pull request.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 805, in test_changed_files_pull_request
    self.assertTrue(False, error_msg)
AssertionError: Failed to find parsed easyconfig for RELION-3.1.0-foss-2019b.eb (and could not isolate it in easyconfigs archive either)

----------------------------------------------------------------------
Ran 11049 tests in 766.607s

FAILED (failures=1, errors=2)
ERROR: Not all tests were successful.

bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Please talk to my owner @boegel if you notice you me acting stupid),
or submit a pull request to https://github.com/boegel/boegelbot fix the problem.

@JackPerdue
Copy link
Contributor

So... where are we on this one? We got a new toy[1] and I'm trying to get it loaded up for the users.

[1] https://hprc.tamu.edu/wiki/Grace:Intro

@lexming
Copy link
Contributor Author

lexming commented Jan 6, 2021

@JackPerdue flexing with your new toys is not nice, now I'm jealous!

Regarding this PR, if you want to test it, you just need to use the easyblock for RELION in easybuilders/easybuild-easyblocks#2274. I just opened easybuilders/easybuild-easyblocks#2302 to fix the failing tests, but that is not really needed to use the easyblock.

@lexming
Copy link
Contributor Author

lexming commented Jan 6, 2021

Test report by @lexming
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2274
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node366.hydra.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/ae7233dbc840b39071270d2ae6fcae93 for a full test report.

@JackPerdue
Copy link
Contributor

@lexming Thanks for the info. It installed without problem on the "old" cluster and the new. I'll leave for the user to test and provide us some feedback.

Any desire whatsoever to add support for IBM's LSF? Our "older" cluster which is due to head out the door in the next semester or two used it (it came with the IBM cluster... wasn't our choice). Looks like it would be pretty easy to add to your easyblock if there was any interest.

@verdurin
Copy link
Member

verdurin commented Jun 9, 2021

@lexming any interest in pushing this one along?

@lexming
Copy link
Contributor Author

lexming commented Jun 9, 2021

@verdurin yes, I do. I'll go through the comments and see if I can get the easyblock merged 🙂

Copy link
Member

@jfgrimm jfgrimm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As decided in issue #16330, we have deprecated the use of True to signify a system-toolchain dependency (#16384), in favour of the more intuitive SYSTEM template constant. Due to the change in the test suite, please run eb --sync-pr-with-develop 10965 and update the PR to use SYSTEM instead.

@jfgrimm
Copy link
Member

jfgrimm commented Jun 21, 2023

closing since RELION 4.0 is out and there are open PRs for that (e.g. #16527)

@jfgrimm jfgrimm closed this Jun 21, 2023
@lexming lexming deleted the 20200712020923_new_pr_RELION310 branch June 21, 2023 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants