Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the sites with opportunistic resources in WMAgent.secrets #11966

Merged
merged 4 commits into from
Apr 17, 2024

Conversation

anpicci
Copy link
Contributor

@anpicci anpicci commented Apr 11, 2024

Fixes #11722

Status

Ready

Description

With this PR, HPC sites are directly reported in the WMAgent.secrets template, both for production and testbed, in order to remove the hardcoded references in the script initializing the Agent in the containerized deployment model.
For each HPC site, an associative bash array is defined to collect information about the name, number of total available running slots, and number of total available pending slots in a human-readable format.
This setting is made in the sight of fetching this information with a proper manipulation leveraging bash in this CMSKubernetes script. The modification to this script is reported in this PR

Is it backward compatible (if not, which system it affects?)

NO

Related PRs

None

External dependencies / deployment changes

Addition of the opportunistic resources in the initialization script of the Docker container: dmwm/CMSKubernetes#1462

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15011/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15012/artifact/artifacts/PullRequestReport.html

@anpicci
Copy link
Contributor Author

anpicci commented Apr 12, 2024

@todor-ivanov @amaltaro ready to be revised

Copy link
Contributor

@todor-ivanov todor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anpicci I've two minor comments inline. But do not rush to resolve them. We should hear also from @amaltaro what his opinion on the matter would be.

RESOURCE_OPP6=([name]=T3_US_Anvil [run]=1000 [pend]=250 [state]=normal)
RESOURCE_OPP7=([name]=T3_US_Lancium [run]=5000 [pend]=1250 [state]=normal)
RESOURCE_OPP8=([name]=T3_ES_PIC_BSC [run]=5000 [pend]=3700 [state]=normal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do like to see all the opportunistic sites here, but since these two files (and the one below) are actually considred empty templates, I am not sure if we should put specific numbers. Maybe we can add those lines and let them sit in these two secrets templates commented out. This way we will know what is expected to be done in order to configure an opportunistic resource at the agent, but we would not hard code default values which we do not know if they will hold in the future.

@anpicci
Copy link
Contributor Author

anpicci commented Apr 16, 2024

@amaltaro looking forward to receiving your feedback

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anpicci apologies for missing this review.

I do like the flexibility given by this PR, even though it comes with extra complexity that perhaps isn't really needed.

I see this PR deviates from the current number of slots defined for those opportunistic sites. Was it intentional? Did you consult Dirk and/or P&R to have adapted these numbers?

Lastly, please do mention the CMSKubernetes PR as a dependency for this PR (in the initial PR description). Thanks

@anpicci
Copy link
Contributor Author

anpicci commented Apr 16, 2024

@amaltaro thank you for your feedback.

As extra complexity, are you referring to how the information about opportunistic resources are stored?

Regarding the number of slots, I derived them with:
$manage execute-agent wmagent-resource-control --site-name <site-name> -p
I haven't cross-checked with P&R and/or Dirk. Even though I would be in favor of cross-checking with them and then validating these numbers, I would propose including in WMAgent.secrets the threshold reported in these lines, move on with the parameters, and update them later on after proper discussion with P&R and/or Dirk. What do you think?

@todor-ivanov

@amaltaro
Copy link
Contributor

As extra complexity, are you referring to how the information about opportunistic resources are stored?

Yes, the secrets file no longer look straight forward to me, making it easier to mistype things. If I were to implement it, I think I would add 3 new variables:
OPPORTUNISTIC_SITES=T3_US_A,T3_US_B,T3_ES_C,etc
OPPORTUNISTIC_RUNNING_SLOTS=3000
OPPORTUNISTIC_PENDING_SLOTS=2000

and leave "coding" to outside the secrets file. However, given that these changes have already been implemented and tested, I don't think my argument is strong enough for a refactoring, so let's just leave it as is, if you agree.

For the slots, I agree with you, let's set them according to the current deployment script (also mentioned above within this comment) and in a later stage we can revisit these. It has been working fine with no requests to update them, no need to worry with those right now.

@anpicci
Copy link
Contributor Author

anpicci commented Apr 16, 2024

@amaltaro sounds good. The idea was to make the definition of these env variables as flexible as possible in bash, but I see your points.

If you and @todor-ivanov it works, we can merge the PR. I will address the comments on CMSKubernetes tomorrow morning

@todor-ivanov
Copy link
Contributor

todor-ivanov commented Apr 17, 2024

hi @amaltaro

and leave "coding" to outside the secrets file

this:

RESOURCE_OPP6=([name]=T3_US_Anvil [run]=1000 [pend]=250 [state]=normal)

Is not coding. This is a standard way to define an associative array in bash. :

$ man bash
...
 Associative arrays are created using declare -A name.

       Arrays  are assigned to using compound assignments of the form name=(value1 ... valuen), where each value is of the form [subscript]=string.
...

In addition. If we were to take this approach:

if I were to implement it, I think I would add 3 new variables:
OPPORTUNISTIC_SITES=T3_US_A,T3_US_B,T3_ES_C,etc
OPPORTUNISTIC_RUNNING_SLOTS=3000
OPPORTUNISTIC_PENDING_SLOTS=2000

This means, we would have had only a list of opportunistic resource names and two hard values to be applied to all of them.... regardless of the fact that different opportunistic resources may have different capacity, state, etc.

@anpicci I think we can merge this one here and continue in CMSKubernetes.

@todor-ivanov todor-ivanov self-requested a review April 17, 2024 04:59
Copy link
Contributor

@todor-ivanov todor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@anpicci anpicci merged commit 3468edd into dmwm:master Apr 17, 2024
@amaltaro
Copy link
Contributor

I feel like there was still discussion on going in this PR and I don't understand why we rushed with merging these changes in. In addition, it would have been better to squash these commits before merging them in (reason why I labelled this yesterday). I am happy to chat about it today, if you want. @anpicci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WMAgent: Implement adding of opportunistic resources with the new initialization model
4 participants