Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy all run and out dirs at once, not in for-loop #3025

Merged
merged 55 commits into from
Jan 21, 2023
Merged

Conversation

Aariq
Copy link
Collaborator

@Aariq Aariq commented Aug 31, 2022

Addresses #3019 with changes to start_model_runs()

Description

  • Moves creation of directories on the remote host outside of for loop to do it all at once
  • Moves calls to remote.copy.to outside of for loop to copy over all /out and /run directories at once.
  • Moves call to remote.copy.from outside of for loop to copy all outputs back to local at once.
  • Adds notes for future improvments
  • Add informative errors to remote.copy.to() and remote.copy.from()

There aren't any tests for start_model_runs() and I'm not sure how to write them, but I did install this PR and test it with my own setup.

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@Aariq Aariq changed the title Copy all runs at once, not in for-loop WIP: Copy all runs at once, not in for-loop Aug 31, 2022
@Aariq Aariq marked this pull request as ready for review August 31, 2022 15:09
@Aariq Aariq changed the title WIP: Copy all runs at once, not in for-loop Copy all run and out dirs at once, not in for-loop Aug 31, 2022
@Aariq Aariq requested a review from infotroph August 31, 2022 15:49
@Aariq Aariq marked this pull request as draft August 31, 2022 16:37
@Aariq Aariq marked this pull request as ready for review August 31, 2022 22:34
@Aariq Aariq requested a review from dlebauer August 31, 2022 22:35
@Aariq Aariq linked an issue Sep 1, 2022 that may be closed by this pull request
@Aariq Aariq requested review from robkooper and removed request for infotroph September 1, 2022 15:42
@Aariq Aariq marked this pull request as draft September 7, 2022 13:39
@Aariq Aariq marked this pull request as ready for review September 7, 2022 13:53
@Aariq Aariq marked this pull request as draft September 9, 2022 16:38
@Aariq Aariq marked this pull request as draft November 17, 2022 16:28
@Aariq
Copy link
Collaborator Author

Aariq commented Nov 17, 2022

Marked as draft again because I'm getting close (hopefully) to really fixing this.

@Aariq
Copy link
Collaborator Author

Aariq commented Nov 18, 2022

Ok, the remaining bug has nothing to do with PEcAn or with R for that matter. I've been able to reproduce it with just rsync. It also seems to be specific to our server as I can't reproduce it from my laptop. Going to open this back up for review. One possible safeguard is to wrap the remote.copy.to commands in PEcAn.utils::retry.func() but I'll wait to get feedback before doing this.

@Aariq Aariq marked this pull request as ready for review November 18, 2022 18:20
@mdietze
Copy link
Member

mdietze commented Jan 11, 2023

@robkooper could you look at the changes you requested so we can figure out whether this is ready to pull?

@@ -45,5 +45,37 @@ remote.copy.from <- function(host, src, dst, options = NULL, delete = FALSE, std
}
}
PEcAn.logger::logger.debug("rsync", shQuote(args))
system2("rsync", shQuote(args), stdout = TRUE, stderr = as.logical(stderr))
out <-
system2("rsync", args, stdout = "", stderr = as.logical(stderr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is removing the shQuote around args intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was intentional, but now that you bring it up I'm not sure it's correct. system2() always shQuotes command, but I guess it doesn't do that for args. I'll add it back and make sure everything still works.

out <-
system2("rsync", args, stdout = "", stderr = as.logical(stderr))

# Informative errors from rsync man page
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the desire for clarity, but I'd rather send people to the rsync docs for these error messages. We use whatever rsync version is installed on the host system, so hard-coding values here feels like an invitation for them to get out of sync.

I assume rsync doesn't often change the meaning of exit codes between versions, but if they add a new code that's not on this list then this switch will return an empty string instead of the numeric value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes sense. I'm going to keep errors (anything other than 0) as logger.severe() though.

Copy link
Member

@robkooper robkooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested with rabbitmq. please address issues pointed out by @infotroph

@Aariq Aariq requested a review from infotroph January 20, 2023 22:32
@mdietze mdietze merged commit 795b998 into develop Jan 21, 2023
@Aariq Aariq deleted the remote-copy branch January 21, 2023 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some ensemble run directories don't get copied over to HPC
5 participants