Script to create a HipPy input dataset file #15886

Merged
merged 4 commits into from Sep 20, 2016

Conversation

hroskes
Copy link
Contributor

@hroskes hroskes commented Sep 16, 2016

Can also add a similar function for MP if that's needed, but I'm not familiar with the syntax.

Also make the AIO tool dataset class a bit more general by splitting up the biggest function. No point in rewriting another module to do the same thing.

Also, remove some inefficiency in validation by skipping files that have the run number in their filename if they are not in the selected run range.

And a little fix in the geometry comparison.

- filter the file list when possible to avoid unnecessary opening and closing
- add function for HipPy file list
(the root file has never been called that as far back as I can tell, and it's copied later anyway https://github.com/cms-sw/cmssw/blob/813e5a/Alignment/OfflineValidation/python/TkAlAllInOneTool/configTemplates.py#L104)
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @hroskes (Heshy Roskes) for CMSSW_8_1_X.

It involves the following packages:

Alignment/HIPAlignmentAlgorithm
Alignment/OfflineValidation

@ghellwig, @cerminar, @cmsbuild, @franzoni, @mmusich, @davidlange6 can you please review it and eventually sign? Thanks.
@mschrode, @ghellwig, @mmusich, @tocheng, @tlampen this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@ghellwig
Copy link

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 19, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/15258/console

Copy link

@ghellwig ghellwig left a comment

@hroskes looks fine to me, except for the minor comments I made
In general, I think it would be good to migrate the Dataset class to Alignment/CommonAlignment/python/tools, but this could be done in the next round of updates. On the MillePede side there is a script that does a similar job, but with some code duplication. The goal of this script (mps_create_file_lists.py) is to create statistically independent datasets for alignment and validation. I think it would be good if we could unify this somehow, to get common validation datasets and create file lists for the remaining data in each of the formats required by the two alignment algorithms.
What do you think?

def getrunnumberfromfilename(filename):
    parts = filename.split("/")
    result = error = None
    if parts[0] != "" or parts[1] != "store":

@hroskes shouldn't it be `and` instead of `or`?

Contributor Author

I don't think so... this way catches "something/store" and also "/something", whereas `and` would not.

you are right!
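
To make the `or` versus `and` point concrete, here is a minimal sketch (Python 3) that applies both versions of the check to three made-up paths. The helper names `rejected_with_or` and `rejected_with_and` and the example paths are illustrative assumptions, not code from the PR.

```python
def rejected_with_or(filename):
    # Condition as written in the PR: reject unless the path starts with "/store".
    parts = filename.split("/")
    return parts[0] != "" or parts[1] != "store"


def rejected_with_and(filename):
    # Hypothetical alternative raised in the review: reject only if BOTH tests fail.
    parts = filename.split("/")
    return parts[0] != "" and parts[1] != "store"


# Made-up example paths, for illustration only.
examples = [
    "/store/data/file.root",      # well formed
    "something/store/file.root",  # no leading slash
    "/something/file.root",       # leading slash, but not under /store
]

# Maps each path to (rejected by `or`, rejected by `and`).
results = {name: (rejected_with_or(name), rejected_with_and(name))
           for name in examples}

for name, (with_or, with_and) in results.items():
    print(name, "or:", with_or, "and:", with_and)
```

Only the `or` form rejects both malformed paths; the `and` form lets each of them through because one of its two tests passes.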

        error = "does not start with /store"
    elif parts[2] in ["mc", "relval"]:
        result = 1
    elif parts[-2] != "00000" or not parts[-1].endswith(".root"):

@hroskes is this nomenclature for file names defined or required somewhere?

Contributor Author

Good question... I didn't know about it until @usarica told me. It's true for many datasets but not for this one. Actually Ulascan mentioned that in that case there are multiple runs in the same data file.

Basically at this point I was trying to be as strict as possible, and not remove the filename if it's not in the exact pattern that seems to be satisfied for most datasets.

If you have a better way of figuring out the run number from the filename that would be great. This way is more efficient than using a lumi filter which requires opening and closing all the files. It's particularly important for HipPy because we loop through the files multiple times, but since I was implementing it anyway I figured we might as well use it for validation too.

I think your way works in case of prompt data, but not for rereco (as the dataset that you linked). Since we usually run alignments on prompt reco, it is fine this way.
I was just curious, if you know about a place where this convention is defined.
In MillePede, we have a similar guessing mechanism, but I would not call it "a better way".
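
For readers following the thread, the kind of strict filename check discussed above could look roughly like the sketch below (Python 3). It is not the PR's implementation. The first three tests mirror the snippet under review. The last step assumes that the three directories before `00000` are three-digit pieces of the run number, which is not stated in this conversation. The function name `guess_run_number` and the example path in the usage note are hypothetical.

```python
def guess_run_number(filename):
    """Return (run_number, error); exactly one of the two is None."""
    parts = filename.split("/")
    if len(parts) < 3 or parts[0] != "" or parts[1] != "store":
        return None, "does not start with /store"
    if parts[2] in ("mc", "relval"):
        # Simulation: the snippet under review uses run number 1 here.
        return 1, None
    if len(parts) < 6 or parts[-2] != "00000" or not parts[-1].endswith(".root"):
        return None, "does not end with 00000/something.root"
    # ASSUMPTION: directories such as 000/273/158 spell out run 273158.
    run_dirs = parts[-5:-2]
    if not all(len(d) == 3 and d.isdigit() for d in run_dirs):
        return None, "run directories are not three 3-digit numbers"
    return int("".join(run_dirs)), None
```

With a made-up prompt-reco style path such as `/store/data/Run2016B/Cosmics/ALCARECO/TkAlCosmics0T-PromptReco-v2/000/273/158/00000/abc.root`, this sketch returns run 273158. A path whose second-to-last directory is not `00000` (the rereco case mentioned above) returns an error instead of a guess, so the caller can keep the file rather than drop it wrongly.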

@@ -252,9 +252,6 @@ def createScript(self, path):
resultingFile = os.path.expandvars( resultingFile )
resultingFile = os.path.abspath( resultingFile )
resultingFile = "root://eoscms//eos/cms" + resultingFile #needs to be AFTER abspath so that it doesn't eat the //
repMap["runComparisonScripts"] += \
("xrdcp -f OUTPUT_comparison.root %s\n"
%resultingFile)

@hroskes can you elaborate what the exact effect of this fix is?

Contributor Author

The effect is basically to remove a bash error because OUTPUT_comparison.root does not exist.

I think this line was supposed to copy the output of makeArrowPlots("comparison.root", "...")

But actually the first argument to makeArrowPlots is something else, so this file is never created and it just gives a warning.

The actual output file contains .oO[name]Oo. so it gets copied in this step.

ok, thanks for the explanation!

@hroskes
Copy link
Contributor Author

hroskes commented Sep 19, 2016

@ghellwig Unifying sounds great. Where is mps_create_file_lists.py? I don't see it in Alignment/MillePedeAlignmentAlgorithm/scripts.

@ghellwig
Copy link

@hroskes I just realized, that the PR containing this script is not yet merged, but I linked the code in one of my comments.

        fileList.remove(filename)
    except AllInOneError as e:
        if forcerunselection: raise
        print e.message

@hroskes just one question: in the All-in-One tool forcerunselection is set to False, right?
But in case one runs a validation using rereco data (for whatever reason), one gets quite a lot of stdout from this line, right?

Contributor Author

Yes, you're right. I pushed and fixed this now.

(I guess it's still a lot of output in the case where most but not all of the files fall into this category, but I would be surprised if that ever happens, and in that case you would want to know what's going on.)

@hroskes looks good now!
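
One way to keep the output short when many filenames do not match the pattern is to collect the failures and print a single summary. The sketch below (Python 3) illustrates that idea only and is not necessarily how the fix pushed to this PR works. The names `filter_by_run` and `toy_guess_run`, the `force` flag (standing in for `forcerunselection`) and the use of `ValueError` instead of `AllInOneError` are all assumptions made for this example.

```python
def toy_guess_run(filename):
    # Hypothetical stand-in parser: expects names like "run273158.root".
    stem = filename.rsplit("/", 1)[-1]
    if stem.startswith("run") and stem.endswith(".root") and stem[3:-5].isdigit():
        return int(stem[3:-5]), None
    return None, "no run number in file name"


def filter_by_run(file_list, first_run, last_run, guess_run, force=False):
    """Drop files whose run number is known to be outside [first_run, last_run]."""
    kept, errors = [], []
    for filename in file_list:
        run, error = guess_run(filename)
        if error is not None:
            if force:
                raise ValueError("{}: {}".format(filename, error))
            errors.append(error)
            kept.append(filename)  # run unknown: keep the file to be safe
        elif first_run <= run <= last_run:
            kept.append(filename)
    if errors:
        # One summary line instead of one message per file.
        print("Could not determine the run number for {} of {} files; "
              "kept them.".format(len(errors), len(file_list)))
    return kept


files = ["run100.root", "run200.root", "weird.root"]
selected = filter_by_run(files, 150, 250, toy_guess_run)
```

Here `selected` keeps `run200.root` and the unparseable `weird.root`, and only one summary line is printed. With `force=True` the same call raises on `weird.root` instead.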

@cmsbuild
Copy link
Contributor

Pull request #15886 was updated. @ghellwig, @cerminar, @cmsbuild, @franzoni, @mmusich, @davidlange6 can you please check and sign again.

@ghellwig
Copy link

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 19, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/15268/console

@ghellwig
Copy link

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

@davidlange6
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit a0e3a37 into cms-sw:CMSSW_8_1_X Sep 20, 2016
@hroskes hroskes deleted the hippy-dataset branch October 4, 2016 23:59

4 participants