Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with partial block in the parentage fix - change to API #11757

Merged
merged 3 commits into from
Oct 24, 2023

Conversation

amaltaro
Copy link
Contributor

@amaltaro amaltaro commented Sep 29, 2023

Fixes #11715

Status

ready

Description

The following is provided with this PR:

  • create a separate method in DBS3Reader to resolve the list of child/parent file ids (making it easier to code unit tests for)
  • whenever a given run/lumi does not have a match in the parent file, skip that file id tuple, set the parent file to -1
  • calculate the number of missing parents (child files with no parentage relationship) and provide it to the dbs3-client insertFileParents API
  • lastly on DBS3Reader, make the list of child/parent ids unique before passing it over to the DBS server (through a new Utils function called makeListElementsUnique)
  • added some verbose logging records (at DEBUG level)

In addition, a couple of extra data is mocked in DBS and the json dump has been updated as well.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

Changes are being made to the DBS Server: dmwm/dbs2go#101

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 3 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 1 warnings
    • 91 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 22 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14539/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 91 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 22 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14547/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 96 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 24 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14569/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 96 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 24 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14570/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 2 tests added
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 96 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 25 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14576/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

I should have requested the PR review yesterday evening, but I forgot it. So here we go. Unit test failure is unstable and unrelated.

@@ -50,3 +51,15 @@ def convertFromUnicodeToBytes(data):
return type(data)(list(map(convertFromUnicodeToBytes, data)))
else:
return data


def noDupListOfLists(listObj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that logic of this function cover all use cases, e.g. what if you have overlap slices like this:

[[1,10], [4,12]]

In fact, here is what python says about it:

>>> from itertools import islice, chain, groupby
>>> obj=[[1,10], [4,12]]
>>> obj.sort()
>>> list(k for k, _ in groupby(obj))
[[1, 10], [4, 12]]

Therefore, you either should change the docstring and describe the use-case correctly or change the logic of the function if we need to cover such use-cases.

I also suggest to change the function name from being to something like removeDuplicateListRanges which is what it is doing right now, and change its docstring accordingly to reflect its logic. It removes duplicate list ranges but it allows duplicate list entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valentin, perhaps I could rename it to something like renameDuplicateListElements?

I don't understand your first comment though. The use case is clear and it's described in the docstring (and covered by unit tests). I don't understand what "overlap" you mean with [[1,10], [4,12]], please clarify.

The use case is simple, you have a list where each element is another list (list of lists). The goal of this function is to make the list of elements unique (so maybe saying remove duplicates isn't great either). E.g., if you have:

[[1,10], [4,12], [1,10]]

the output of this function should be (regardless of the order of the elements):

[[1,10], [4,12]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe named it as makeListElementsUnique would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few things here:

  • if you call them tuples then why do you use lists, the list of tuples would be [(1,10), (4,12)]
  • if your lists represent range, e.g. [1,10] is short cut for [1,2,3,4,5,6,7,8,9,10], then you have overlap in ranges

In either way I suggest to make proper clarification either in docstring or logic.

Said that, makeListElementsUnique seems more appropriate but if you treat it as pairs, i.e. tuples, but in this case it is better to call them pairs or tuples and use tuple data types.

@@ -916,29 +924,11 @@ def fixMissingParentageDatasets(self, childDataset, insertFlag=True):
self.logger.info("Found %d blocks without parentage information", len(blocks))
for blockName in blocks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is sequential logic which will grow significantly with number of processed blocks. My advise to make it concurrent and process blocks in parallel, but it will require changing logic to use async or rely on queue and multiprocessing modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beyond the scope of these changes, but it can definitely be changed at some point. Feel free to open a GH issue for that.

self.assertListEqual(expRes, noDupListOfLists(inputTest))

# trying a different data type
expRes = [[2, 20], [4, 40], [4, 41], [5, 40]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why you have overlapping lumi ranges. the [4,40], [4,41] and [5.40], is it valid use case? If so, then noDupListOfLists should not be called no duplicate list since they list are overlapping and contain duplicates upon unfolding them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not lumi ranges. This is a tuple of 2 integers. I guess by overlap you mean tuple with the same values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is tuples then why not to use tuple data-type, i.e. change from [[1,10], ...] to [(1,10), ...] and then call the items as pairs.

parentFileId = parentRunLumi[runLumi]
listChildParent.append([childFileId, parentFileId])

# then add all run lumi pairs which are missing at the parent Dataset by appending None as parentage information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is misleading as logic below does not add None, instead it adds -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks for catching it!

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 5 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 95 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 24 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14580/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro requested a review from vkuznet October 24, 2023 19:47
Identify missing files and use the new missing_files dbs parameter

Do not append file pair when it is missing parent; make list unique

Provide function to remove duplicates in a list of lists

Add tuple with -1 parent file id

Convert AMR debugging records to logger.debug level

apply Valentins change

rename noDupListOfLists to makeListElementsUnique
update unit test

IteratorTools unit test

fix unit tests

use new function name in unit tests
@amaltaro
Copy link
Contributor Author

Given that I updated the code we were discussing, I can no longer comment in thread. So here goes my reply to your points, @vkuznet:

  • in no moment I said those were ranges, so don't mix things up please.
  • that function works for a list object, regardless of the internal object type.

Nonetheless, I updated the docstring saying that it expects a list of lists OR a list of tuples. It has examples in the function docstring as well, in addition to extra tests in the unit tests.

I hope you don't mind, but I already squashed all those changes in their respective commits. Please have another look still today, if possible :-D

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 5 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 95 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 24 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14581/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@vkuznet vkuznet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alan, thanks for addressing my questions and clarifications. I have no furthe rcomments.

@amaltaro
Copy link
Contributor Author

Awesome! Much appreciated Valentin.

@amaltaro amaltaro merged commit 88489ee into dmwm:master Oct 24, 2023
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StepChainParentageFixTask ReqMgr2 thread failing to resolve dataset with incomplete parentage information
3 participants