Fix empty userDN for MSPileupTask update of transition record #11921

Merged 2 commits on Mar 12, 2024

Contributor

@vkuznet vkuznet commented Mar 5, 2024

Fixes #11920

Status

ready

Description

Fixes an empty userDN during the MSPileupTask cycle that updates an MSPileup record. The code has been refactored to provide a new updateTransitionRecord function. Since it is now a stand-alone function, I added a separate unit test for it.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

External dependencies / deployment changes

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 4 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 4 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14945/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 11 warnings
    • 14 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14946/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 11 warnings
    • 13 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14947/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet requested a review from amaltaro March 5, 2024 18:59
Contributor

@amaltaro amaltaro left a comment

Before any actual review, please improve the PR title and description.

Please revisit https://github.com/dmwm/WMCore/blob/master/CONTRIBUTING.rst#contributing to avoid wasting time in the future.

@vkuznet vkuznet changed the title from "Fix issue 11920" to "Fix issues during MSPileup integration tests" Mar 5, 2024
@vkuznet vkuznet changed the title from "Fix issues during MSPileup integration tests" to "Fix empty userDN for MSPileupTask update of transition record" Mar 5, 2024
@vkuznet vkuznet requested a review from amaltaro March 5, 2024 21:32
Contributor

@amaltaro amaltaro left a comment

Valentin, please find some comments along the code.

if prevTranRecord['containerFraction'] != fraction:
    customName = customDID(prevTranRecord['customDID'])
    # preserve previous container fraction
    transitionRecord = {'containerFraction': prevTranRecord['containerFraction'],
Contributor

This line can potentially create another transition record with the same containerFraction. This is the second bug that we discussed today which is expected to be addressed with this PR.

Contributor Author

I intentionally didn't put it here since after thinking it through I think it is not relevant. Here is my logic:

  • look at step 5 record content in gist
  • there we have two transition records with the same container fraction, but both differ from the container fraction of the MSPileup record itself
  • we used an artificial use case in which we provide a transition record via the updatePileupObjects .py script, while the code itself should never accept that since it is not designed for such a use case. Its logic, when someone wants to update the document, is to create the transition record on its own.

Contributor

That's a good point! We cannot have that check against the previous to the last record indeed.

Bottom line (that we cannot ever forget):

  • MSPileup REST call/action creates a transition record for us, with the containerFraction value before the actual update (e.g. containerFraction != transition/containerFraction). The only exception is when a new pileup document is created, where containerFraction == transition/containerFraction.
  • MSPileupTasks (daemon) does NOT create any transition, it simply updates the last one (containerFraction).
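
The two rules above can be sketched as follows. This is only an illustration under stated assumptions, not the WMCore implementation: the function names restUpdateFraction and daemonUpdateLastRecord are hypothetical, and of the record fields only containerFraction is taken from this discussion (updateTime and DN are assumed for the example).

```python
import copy
import time


def restUpdateFraction(doc, newFraction, userDN):
    """REST side (sketch): append a record holding the fraction in effect
    before the update, then apply the new fraction to the document."""
    newDoc = copy.deepcopy(doc)
    newDoc.setdefault('transition', []).append({
        'containerFraction': doc['containerFraction'],  # value before the update
        'updateTime': int(time.time()),                 # assumed field name
        'DN': userDN,                                   # assumed field name
    })
    newDoc['containerFraction'] = newFraction
    return newDoc


def daemonUpdateLastRecord(doc):
    """Daemon side (sketch): never append, only align the last record."""
    newDoc = copy.deepcopy(doc)
    if newDoc.get('transition'):
        newDoc['transition'][-1]['containerFraction'] = newDoc['containerFraction']
    return newDoc


# A freshly created document: the one case where both fractions are equal.
created = {'pileupName': '/a/b/PREMIX', 'containerFraction': 1.0,
           'transition': [{'containerFraction': 1.0, 'updateTime': 0,
                           'DN': 'creator-dn'}]}
afterRest = restUpdateFraction(created, 0.5, 'user-dn')
afterDaemon = daemonUpdateLastRecord(afterRest)
```

After the REST step the document holds two records and the last one still carries the old fraction; after the daemon step the record count is unchanged and the last record matches the document.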

doc['transition'] = transition
self.logger.info("Added transition record for pileup %s", pname)
# update transition record if necessary
updateTransitionRecord(doc, userDN, self.logger)
Contributor

Given that you pass the logger object, I would suggest converting updateTransitionRecord to a class method instead. I think it is clearer than making it a function.

Contributor Author

I disagree: functions are easier to test (that's what I did), and except for the logger (which is literally print) nothing in this function needs class data. I would rather keep it as a function.

Contributor

That's okay. I agree that it's much easier to test a function, in general. However, a class that implements and consolidates the required logic for its data looks tidier and more readable to me. No need to make any changes here then.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 11 warnings
    • 13 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14948/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet requested a review from amaltaro March 6, 2024 14:18
Contributor

@amaltaro amaltaro left a comment

Valentin, these changes are looking good to me. But I still suspect we might have some misbehavior either when creating or updating a pileup configuration.

Can you apply this pull request (patch) to your dev k8s cluster and test things out before we merge it? If you prefer, go ahead and patch MSPileup in testbed (but ideally we should stick with our dev clusters).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 11 warnings
    • 13 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14949/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented Mar 7, 2024

Alan, I tested mspileup on my test10 cluster using the following script:

#!/bin/bash

echo "inject data"
scurl -X POST -H "Content-type: application/json" -d@./pileup.json https://cmsweb-test10.cern.ch/ms-pileup/data/pileup

echo "list docs in MSPileup"
scurl -s https://cmsweb-test10.cern.ch/ms-pileup/data/pileup

echo "change fraction"
scurl -X PUT -H "Content-type: application/json" -d@./pileup-fraction.json https://cmsweb-test10.cern.ch/ms-pileup/data/pileup

echo "list docs in MSPileup"
scurl -s https://cmsweb-test10.cern.ch/ms-pileup/data/pileup

#echo "delete document"
#scurl -X DELETE -H "Content-type: application/json" -d@./pileup-delete.json https://cmsweb-test10.cern.ch/ms-pileup/data/pileup

# patch the code, login to the pod
# curl -ksLO https://github.com/dmwm/WMCore/pull/11921.patch
# cd /usr/local/lib/python3.8/site-packages
# sudo patch -p3 < /data/11921.patch

It revealed one problem with the fraction value assignment, which I fixed; I provided the corresponding commits (f35e49e for the code fix and b181383 for the unit test that covers it). Please review them.

@vkuznet vkuznet requested a review from amaltaro March 7, 2024 13:18
Contributor

@amaltaro amaltaro left a comment

Valentin, revisiting the code we have implemented so far for the MSPileupData.updatePileup() method, I think we are implementing this logic in the wrong place.

This method should be as generic as possible, leaving the pileup document logic to the other modules interacting with it. To be more concrete:

  1. MSPileupTasks has multiple calls to this method. IMO that module should be passing the final document to be updated in the database, where MSPileupData.updatePileup would simply persist that (or validate + persist).
  2. then it's also used for REST API calls - through MSPileup module - where again MSPileupData.updatePileup should simply persist the data (or validate + persist).

if prevTranRecord['containerFraction'] != fraction:
    customName = customDID(prevTranRecord['customDID'])
    # preserve previous container fraction
    transitionRecord = {'containerFraction': fraction,
Contributor

We should only set it to the container fraction IF we have succeeded in resizing the pileup container and completing all the data placement logic for the custom container.

In case we hit an exception while executing the partial pileup logic, this different containerFraction is the only hint that we have to re-execute the same logic in the next cycle. In other words, containerFraction == transition/containerFraction tells the service that nothing needs to be done in terms of partial pileup.
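
A minimal sketch of this retry behaviour, assuming the mismatch between the two fractions is the only state used: needsPartialPileupWork, runCycle, and the resizeContainer callable are hypothetical names introduced for illustration and do not come from the WMCore code.

```python
def needsPartialPileupWork(doc):
    """True while the last transition record disagrees with the document."""
    transition = doc.get('transition') or []
    if not transition:
        return False
    return transition[-1]['containerFraction'] != doc['containerFraction']


def runCycle(doc, resizeContainer):
    """One daemon cycle (sketch): align the record only after success."""
    if not needsPartialPileupWork(doc):
        return 'nothing-to-do'
    try:
        resizeContainer(doc)  # hypothetical callable: resize + data placement
    except Exception:  # deliberately broad in this sketch
        # The mismatch is left untouched, so the next cycle retries.
        return 'retry-next-cycle'
    doc['transition'][-1]['containerFraction'] = doc['containerFraction']
    return 'done'


def failingResize(doc):
    """Simulates a failure in the partial pileup logic."""
    raise RuntimeError("placement failed")


doc = {'containerFraction': 0.5, 'transition': [{'containerFraction': 1.0}]}
results = [runCycle(doc, failingResize),   # fails, mismatch kept
           runCycle(doc, lambda d: None),  # succeeds, record aligned
           runCycle(doc, lambda d: None)]  # fractions equal, skipped
```

Writing the new fraction into the record before the work succeeds would erase the mismatch, and the failed cycle would never be retried.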

Contributor Author

Alan, I'm lost on your logic here. First, you wrote in your #11921 (review) that we should have the update function as generic as possible, and this is what is implemented. But in this comment you write that we should update if we have succeeded in resizing, which is not the logic of this function and in my view is part of the upstream code logic. Please don't get me wrong, I agree that we should update if we succeed with data placement, etc., but it is not in the scope of this function. Please clarify your comment or resolve it somehow.

Contributor

But in this comment you write that we should update if we have succeeded resizing which ... part of upstream code logic

Yes, I fully agree on that. Hence my concern here.

Bottom line, this method is fairly generic so far and it should not do anything other than:

  1. validating the document content (if required - or we can refactor it in the future with the proper separation)
  2. persisting the document in the backend database.

Meaning, there should be no key/value addition and or assignment in here.
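
As a rough illustration of such a "validate, then persist" method: the sketch below uses a plain dict in place of the backend database, and both function names and the list of required keys are assumptions made for the example, not the actual MSPileupData API.

```python
import copy


def validatePileupDocument(doc):
    """Hypothetical validator: raise if a required key is missing."""
    for key in ('pileupName', 'containerFraction'):
        if key not in doc:
            raise ValueError(f"missing required key: {key}")


def persistPileupDocument(doc, storage, validate=True):
    """Validate (optionally) and persist; never add or reassign keys."""
    if validate:
        validatePileupDocument(doc)
    # The dict `storage` stands in for the backend database.
    storage[doc['pileupName']] = copy.deepcopy(doc)


storage = {}
doc = {'pileupName': '/a/b/PREMIX', 'containerFraction': 0.5, 'transition': []}
persistPileupDocument(doc, storage)
```

What ends up in storage is exactly what the caller passed in, so any transition handling has to happen before this call.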

Contributor Author

OK, that implies that the update of transition records should be done only by MSPileupTasks, which performs the full business logic. It also implies that the REST API will only update the record in the DB, even if the user provides a new containerFraction value. So when performing a new REST API call (with a new container fraction) we will not have any update of the transition record at all. This will ONLY happen after MSPileupTasks kicks in to perform the full logic.

Contributor

ok, that implies that update of transition records should be done only by MSPileupTasks

Update of the transition record, yes. But not the creation of a record.

Reviewing the gist, we create a new transition record in two scenarios:

  • upon creation of a new pileup document
  • upon update of the containerFraction

and we should keep it like that, as we have already exhaustively discussed it.

Now thinking about the interactions with updatePileup, we have:

  1. pileup document update through MSPileup (REST), without changing containerFraction. The method should only persist the document - and validate, if required.
  2. pileup document update through MSPileup (REST), with changes to containerFraction. A transition record must be created somewhere between the REST/Data layer. Then the document is persisted - and validated, if required.
  3. pileup document update through MSPileupTasks. Regardless of containerFraction changes or not, it should only persist the document - and validate, if required.

Am I missing any other interactions? Maybe what we need is to have a hook such that we can identify when a document is provided by the end user or by the service daemon (maybe a method argument).
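
One way to picture the three interactions and the suggested hook is the sketch below. It is an assumption-laden illustration only: handlePileupUpdate and its fromDaemon argument are hypothetical, a dict stands in for the database, the DN field name is assumed, and carrying over the stored transition list on REST calls is a design choice of this example rather than something confirmed by the PR.

```python
import copy


def handlePileupUpdate(doc, storage, userDN='', fromDaemon=False):
    """Route an update (sketch); `storage` is a dict standing in for the DB."""
    name = doc['pileupName']
    stored = storage[name]  # KeyError if the document does not exist yet
    newDoc = copy.deepcopy(doc)
    if not fromDaemon:
        # Cases 1 and 2 (REST): keep the stored transition history rather
        # than trusting whatever the end user sent.
        transition = copy.deepcopy(stored.get('transition', []))
        if stored['containerFraction'] != doc['containerFraction']:
            # Case 2: record the fraction in effect before this update.
            transition.append({'containerFraction': stored['containerFraction'],
                               'DN': userDN})
        newDoc['transition'] = transition
    # Case 3 (daemon) falls through: persist the document as provided.
    storage[name] = newDoc
    return newDoc


name = '/a/b/PREMIX'
storage = {name: {'pileupName': name, 'containerFraction': 1.0,
                  'transition': [{'containerFraction': 1.0, 'DN': 'creator-dn'}]}}
case1 = handlePileupUpdate({'pileupName': name, 'containerFraction': 1.0},
                           storage, userDN='user-dn')
case2 = handlePileupUpdate({'pileupName': name, 'containerFraction': 0.5},
                           storage, userDN='user-dn')
daemonDoc = copy.deepcopy(case2)
daemonDoc['transition'][-1]['containerFraction'] = 0.5  # daemon aligns the last record
case3 = handlePileupUpdate(daemonDoc, storage, fromDaemon=True)
```

Only case 2 grows the transition list; the daemon path needs no userDN at all, which is consistent with the empty userDN seen during the MSPileupTask cycle.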

@vkuznet
Contributor Author

vkuznet commented Mar 7, 2024

Alan, I addressed your concern about the update API in this commit: 6289df9. Please have a look: with the new method updatePileupDocumentInDatabase, MSPileupTasks now only calls it to update the record in the DB.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 14 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14957/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Yes, Valentin. I feel much more comfortable with these changes now. I did leave a few extra comments along the code.

In addition, given all the confusion about the functionality/behavior of this service so far (hoping I am not alone here!), I think it's important to update the MSPileup documentation with a summary of the relevant information and expected behavior that we have been discussing in this PR and over the past weeks.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 14 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14958/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

@vkuznet Valentin, I left a comment in the code that you can disregard if you will. Otherwise, please squash changes accordingly and I will cut a new release and push it to testbed before we enter the weekend. Thanks

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 tests added
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14959/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented Mar 11, 2024

Alan, I updated this PR with new code where I moved the validation part into updatePileupDocumentInDatabase. With this change (see b664717), every call that updates an MSPileup doc can choose whether or not to validate the document. It is up to the upstream code to provide this parameter (see b6eab94).

I want you to review these changes one more time, and then, if you are satisfied, I'll squash them.

@vkuznet vkuznet requested a review from amaltaro March 11, 2024 12:02
Contributor

@amaltaro amaltaro left a comment

Valentin, please find a couple of comments/questions along the code.

@amaltaro
Contributor

Valentin, once you are done and have performed basic validation of these changes, please squash the commits accordingly.

@vkuznet
Contributor Author

vkuznet commented Mar 12, 2024

Alan, I reran my test on the test10 cluster and the documents look fine to me. Therefore, I squashed all commits, and once Jenkins is back you can have a final look before merging this PR.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14964/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14965/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14966/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented Mar 12, 2024

During integration tests I found that we can't use -V1 (numeric suffixes) in custom name. Please see and follow this post: https://mattermost.web.cern.ch/cms-o-and-c/pl/st3d1sd8hbf19y4jw69q4rrgza

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14968/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

@vkuznet Valentin, everything is looking good, except for one change that would be very beneficial - see comment along the code. Feel free to squash commits.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14970/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Thanks Valentin. Please squash the commits and it's good to go.

@vkuznet
Contributor Author

vkuznet commented Mar 12, 2024

Alan, commits are squashed. Please merge at your convenience.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 15 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14971/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro merged commit 403a0cd into dmwm:master Mar 12, 2024
2 of 4 checks passed
Successfully merging this pull request may close these issues.

Fix issues with MSPIleup integration tests
3 participants