-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix empty userDN for MSPileupTask update of transition record #11921
Conversation
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before any actual review, please improve the PR title and description.
Please revisit https://github.com/dmwm/WMCore/blob/master/CONTRIBUTING.rst#contributing to avoid wasting time in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, please find some comments along the code.
if prevTranRecord['containerFraction'] != fraction: | ||
customName = customDID(prevTranRecord['customDID']) | ||
# preserve previous container fraction | ||
transitionRecord = {'containerFraction': prevTranRecord['containerFraction'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can potentially create another transition record with the same containerFraction
. This is the second bug that we discussed today which is expected to be addressed with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally didn't put it here since after thinking it through I think it is not relevant. Here is my logic:
- look at step 5 record content in gist
- over there we have two transition records with the same container fractions but both have different container fraction then fraction of the MSPileup record itself
- we used an artificial use-case when we provide via
updatePileupObjects .py
script a transition record while the code itself should never accept that since it is not designed for such use-case. Its logic when someone wants to update the document is to create transition record on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! We cannot have that check against the previous to the last record indeed.
Bottom line (that we cannot ever forget):
- MSPileup REST call/action creates a transition record for us, with the containerFraction value before the actual update (e..g: containerFraction != transition/containerFraction). Only exception here is when a new pileup document is created, where containerFraction == transition/containerFraction.
- MSPileupTasks (daemon) does NOT create any transition, it simply updates the last one (containerFraction).
doc['transition'] = transition | ||
self.logger.info("Added transition record for pileup %s", pname) | ||
# update transition record if necessary | ||
updateTransitionRecord(doc, userDN, self.logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you pass the logger object, I would suggest to convert updateTransitionRecord
to a class method instead. I think it is more clear them making it a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, functions are easier to tests (that's what I did), and except logger (which is literally print) nothing in this function needs class data. I rather prefer to keep it as function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's okay. I agree that it's much easier to test a function, in general. However, a class with implementing and consolidating the required logic for its data looks tidy to me and more readable. No need to make any changes in here then.
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, these changes are looking good to me. But I still suspect we might have some misbehavior either when creating or updating a pileup configuration.
Can you apply this pull request (patch) to your dev k8s cluster and test things out before we merge it? If you prefer, go ahead and patch MSPileup in testbed (but ideally we should stick with our dev clusters).
if prevTranRecord['containerFraction'] != fraction: | ||
customName = customDID(prevTranRecord['customDID']) | ||
# preserve previous container fraction | ||
transitionRecord = {'containerFraction': prevTranRecord['containerFraction'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! We cannot have that check against the previous to the last record indeed.
Bottom line (that we cannot ever forget):
- MSPileup REST call/action creates a transition record for us, with the containerFraction value before the actual update (e..g: containerFraction != transition/containerFraction). Only exception here is when a new pileup document is created, where containerFraction == transition/containerFraction.
- MSPileupTasks (daemon) does NOT create any transition, it simply updates the last one (containerFraction).
doc['transition'] = transition | ||
self.logger.info("Added transition record for pileup %s", pname) | ||
# update transition record if necessary | ||
updateTransitionRecord(doc, userDN, self.logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's okay. I agree that it's much easier to test a function, in general. However, a class with implementing and consolidating the required logic for its data looks tidy to me and more readable. No need to make any changes in here then.
Jenkins results:
|
Alan, I tested mspileup on my test10 cluster using the following script:
It reveals one problem with fraction value assignment which I fixed and provide corresponding commits (one for code fix f35e49e and another for unit test to test it b181383). Please review them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, revisiting the code we have implemented so far for the MSPileupData.updatePileup()
method, I think we are implementing this logic in the wrong place.
This method should be as generic as possible, while pileup document logic to the other modules interacting with it. Being more concrete:
- MSPileupTasks has multiple calls to this method. IMO that module should be passing the final document to be updated in the database, where MSPileupData.updatePileup would simply persist that (or validate + persist).
- then it's also used for REST API calls - through MSPileup module - where again MSPileupData.updatePileup should simply persist the data (or validate + persist).
if prevTranRecord['containerFraction'] != fraction: | ||
customName = customDID(prevTranRecord['customDID']) | ||
# preserve previous container fraction | ||
transitionRecord = {'containerFraction': fraction, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only set it to the container fraction IF we have succeeded resizing the pileup container and making all the data placement logic for the custom container.
In case we hit an exception while executing the partial pileup logic, this different containerFraction is the only hint that we have to re-execute the same logic in the next cycle. In other words, containerFraction == transition/containerFraction tells the service that nothing needs to be done in terms of partial pileup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, I'm lost on your logic here. First, you wrote in your #11921 (review) that we should have update function as generic as possible, and this is what is implemented. But in this comment you write that we should update if we have succeeded resizing which is not logic of this function and in my view is part of upstream code logic. Please don't take me wrong, I agree that we should update if we succeed data placement, etc., but it is not the scope of this function. Please clarify your comment or resolved it somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this comment you write that we should update if we have succeeded resizing which ... part of upstream code logic
Yes, I fully agree on that. Hence my concern here.
Bottom line, this method is fairly generic so far and it should not do anything other than:
- validating the document content (if required - or we can refactor it in the future with the proper separation)
- persisting the document in the backend database.
Meaning, there should be no key/value addition and or assignment in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that implies that update of transition records should be done only by MSPileupTasks
which performs full business logic. It also implies that that REST API will only update record in DB, even if user provide new containerFraction
value. This implies that when performing new REST API call (with new container fraction) we will not have update on transition record at all. This will ONLY happen after MSPileupTasks will kick in to perform full logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that implies that update of transition records should be done only by MSPileupTasks
Update of the transition record, yes. But not the creation of a record.
Reviewing the gist, we create a new transition record in two scenarios:
- upon creation of a new pileup document
- upon update of the containerFraction
and we should keep it like that, as we have already exhaustively discussed it.
Now thinking about the interactions with updatePileup
, we have:
- pileup document update through MSPileup (REST), without changing
containerFraction
. The method should only persist the document - and validate, if required. - pileup document update through MSPileup (REST), with changes to
containerFraction
. A transition record must be created somewhere between the REST/Data layer. Then the document is persisted - and validated, if required. - pileup document update through MSPileupTasks. Regardless of
containerFraction
changes or not, it should only persist the document - and validate, if required.
Am I missing any other interactions? Maybe what we need is to have a hook such that we can identify when a document is provided by the end user or by the service daemon (maybe a method argument).
Alan, I addressed your concern about update API in this commit: 6289df9 Please have a look, now with new method |
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Valentin. I feel much more comfortable with these changes in now. I did leave a few extra comments along the code.
In addition, giving all the confusion on the functionality/behavior of this service so far (hoping I am not alone here!), I think it's important to update the MSPileup documentation with a summary of the relevant information and expected behavior that we have been discussing in this PR and over the past weeks.
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuznet Valentin, I left a comment in the code that you can disregard if you will. Otherwise, please squash changes accordingly and I will cut a new release and push it to testbed before we enter the weekend. Thanks
Jenkins results:
|
Alan, I updated this PR with new code where I moved validation part into I want you to review these changes one more time and then if you satisfied, I'll squash them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, please find a couple of comments/questions along the code.
Valentin, once you are done and have performed basic validation of these changes, please squash the commits accordingly. |
c6d77b3
to
b8cbfbb
Compare
Alan, I rerun my test in test10 cluster and documents looks fine to me. Therefore, I squashed all commits and once jenkins are back you can have a final look before merging this PR. |
Jenkins results:
|
b8cbfbb
to
60ec3d2
Compare
Jenkins results:
|
60ec3d2
to
05a5ec5
Compare
Jenkins results:
|
05a5ec5
to
4ec365b
Compare
During integration tests I found that we can't use |
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuznet Valentin, everything is looking good, except for one change that would be very beneficial - see comment along the code. Feel free to squash commits.
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Valentin. Please squash the commits and it's good to go.
48b4a03
to
0e59aa1
Compare
Alan, commits are squashed. Please merge at your convenience. |
Jenkins results:
|
Fixes #11920
Status
ready
Description
Fix issue with empty userDN during MSPileupTask cycle of updating MSPileup record. The code has been refactored to provide new
updateTransitionRecord
function. With new stand-alone function I added separate unit test to test its functionality.Is it backward compatible (if not, which system it affects?)
YES
Related PRs
External dependencies / deployment changes