Backport StoredMergeableRunProductMetadata #25214

Merged
DataFormats/Provenance/interface/StoredMergeableRunProductMetadata.h (152 additions, 0 deletions)
@@ -0,0 +1,152 @@
#ifndef DataFormats_Provenance_StoredMergeableRunProductMetadata_h
#define DataFormats_Provenance_StoredMergeableRunProductMetadata_h

/** \class edm::StoredMergeableRunProductMetadata

This class holds information used to decide how to merge together
run products when multiple run entries with the same run number
and ProcessHistoryID are read from input files contiguously. This
class is persistent and stores the information that needs to be
remembered from one process to the next. Most of the work related
to this decision is performed by the class MergeableRunProductMetadata.
The main purpose of this class is to hold the information that
needs to be persistently stored. PoolSource and PoolOutputModule
interface with this class to read and write it.

Note that the information is not stored for each product.
The information is stored for each run entry in the Run TTree
in the input file and also for each process in which at least
one mergeable run product was selected to be written to the
output file. It is not necessary to save information
for each product individually; it will be the same for every
product created in the same process and in the same run entry.

The main piece of information stored is the list of luminosity
block numbers processed when the product was created. Often,
this list can be obtained from the IndexIntoFile, so as an
optimization we do not duplicate it here. There are also cases
where we can detect that the merging has created invalid run
products in which part of the content has probably been double
counted. We save a flag to record this problem.

To improve performance, the data structure has been flattened
into four vectors instead of a vector of vectors of vectors.

When the user of this class fails to find a run entry with a
particular process, the assumption should be made that the lumi
numbers are in IndexIntoFile and valid.

Another optimization is that if in all cases the lumi numbers
can be obtained from IndexIntoFile and are valid, then all
the vectors are cleared and a boolean value is set to indicate
this.

\author W. David Dagenhart, created 23 May, 2018

*/

#include "DataFormats/Provenance/interface/RunLumiEventNumber.h"

#include <string>
#include <vector>

namespace edm {

  class StoredMergeableRunProductMetadata {
  public:

    // This constructor exists for ROOT I/O
    StoredMergeableRunProductMetadata();

    // This constructor is used when creating a new object
    // each time an output file is opened.
    StoredMergeableRunProductMetadata(std::vector<std::string> const& processesWithMergeableRunProducts);

    std::vector<std::string> const& processesWithMergeableRunProducts() const {
      return processesWithMergeableRunProducts_;
    }

    class SingleRunEntry {
    public:

      SingleRunEntry();
      SingleRunEntry(unsigned long long iBeginProcess, unsigned long long iEndProcess);

      unsigned long long beginProcess() const { return beginProcess_; }
      unsigned long long endProcess() const { return endProcess_; }

    private:

      // indexes into singleRunEntryAndProcesses_ for a single run entry
      unsigned long long beginProcess_;
      unsigned long long endProcess_;
    };

    class SingleRunEntryAndProcess {
    public:

      SingleRunEntryAndProcess();
      SingleRunEntryAndProcess(unsigned long long iBeginLumi,
                               unsigned long long iEndLumi,
                               unsigned int iProcess,
                               bool iValid,
                               bool iUseIndexIntoFile);

      unsigned long long beginLumi() const { return beginLumi_; }
      unsigned long long endLumi() const { return endLumi_; }

      unsigned int process() const { return process_; }

      bool valid() const { return valid_; }
      bool useIndexIntoFile() const { return useIndexIntoFile_; }

    private:

      // indexes into lumis_ for products created in one process and
      // written into a single run entry.
      unsigned long long beginLumi_;
      unsigned long long endLumi_;

      // index into processesWithMergeableRunProducts_
      unsigned int process_;

      // If false this indicates the way files were split and merged
      // has created run products that are invalid and probably
      // double count some of their content.
      bool valid_;

      // If true the lumi numbers can be obtained from IndexIntoFile
      // and are not stored in the vector named lumis_
      bool useIndexIntoFile_;
    };

    // These four functions are called by MergeableRunProductMetadata which
    // fills the vectors.
    std::vector<SingleRunEntry>& singleRunEntries() { return singleRunEntries_; }
    std::vector<SingleRunEntryAndProcess>& singleRunEntryAndProcesses() { return singleRunEntryAndProcesses_; }
    std::vector<LuminosityBlockNumber_t>& lumis() { return lumis_; }
    bool& allValidAndUseIndexIntoFile() { return allValidAndUseIndexIntoFile_; }

    // Called by RootOutputFile immediately before writing the object
    // when an output file is closed.
    void optimizeBeforeWrite();

    bool getLumiContent(unsigned long long runEntry,
                        std::string const& process,
                        bool& valid,
                        std::vector<LuminosityBlockNumber_t>::const_iterator & lumisBegin,
                        std::vector<LuminosityBlockNumber_t>::const_iterator & lumisEnd) const;

  private:

    std::vector<std::string> processesWithMergeableRunProducts_;
    std::vector<SingleRunEntry> singleRunEntries_; // index is the run entry
    std::vector<SingleRunEntryAndProcess> singleRunEntryAndProcesses_;
    std::vector<LuminosityBlockNumber_t> lumis_;
    bool allValidAndUseIndexIntoFile_;
  };
}
#endif
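The header comment explains that what would naturally be a vector of run entries, each holding a vector of processes, each holding a vector of lumi numbers, is flattened into separate vectors linked by begin/end indexes. The following is a minimal, self-contained C++17 sketch of that layout under stated assumptions, not the CMSSW code: `RunEntryRange`, `RunEntryProcessRange`, `FlatMetadata`, the `LumiNumber` alias (standing in for `LuminosityBlockNumber_t`), and the helpers `addRunEntry` and `lumisFor` are all hypothetical names, and the `valid`/`useIndexIntoFile` flags are left out to keep the index walking visible.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the nested classes; not the CMSSW types.
using LumiNumber = unsigned int;  // assumption: plays the role of LuminosityBlockNumber_t

// Like SingleRunEntry: a slice [beginProcess, endProcess) of entryProcesses.
struct RunEntryRange {
  unsigned long long beginProcess;
  unsigned long long endProcess;
};

// Like SingleRunEntryAndProcess (valid/useIndexIntoFile omitted):
// a slice [beginLumi, endLumi) of lumis plus an index into processes.
struct RunEntryProcessRange {
  unsigned long long beginLumi;
  unsigned long long endLumi;
  unsigned int process;
};

struct FlatMetadata {
  std::vector<std::string> processes;
  std::vector<RunEntryRange> runEntries;  // index is the run entry
  std::vector<RunEntryProcessRange> entryProcesses;
  std::vector<LumiNumber> lumis;

  // Appends one run entry; each pair is (process index, lumis for that process).
  void addRunEntry(std::vector<std::pair<unsigned int, std::vector<LumiNumber>>> const& perProcess) {
    RunEntryRange entry{entryProcesses.size(), entryProcesses.size()};
    for (auto const& [process, lumiList] : perProcess) {
      unsigned long long beginLumi = lumis.size();
      lumis.insert(lumis.end(), lumiList.begin(), lumiList.end());
      entryProcesses.push_back({beginLumi, lumis.size(), process});
    }
    entry.endProcess = entryProcesses.size();
    runEntries.push_back(entry);
  }

  // Walks the two levels of index ranges (as getLumiContent does) and copies
  // out the lumis for (runEntry, process). Returns an empty vector if the
  // process has no slice in that run entry.
  std::vector<LumiNumber> lumisFor(unsigned long long runEntry, std::string const& process) const {
    RunEntryRange const& entry = runEntries.at(runEntry);
    for (unsigned long long j = entry.beginProcess; j < entry.endProcess; ++j) {
      RunEntryProcessRange const& ep = entryProcesses.at(j);
      if (processes.at(ep.process) == process) {
        assert(ep.beginLumi <= ep.endLumi && ep.endLumi <= lumis.size());
        return std::vector<LumiNumber>(lumis.begin() + ep.beginLumi, lumis.begin() + ep.endLumi);
      }
    }
    return {};
  }
};
```

In this sketch each run entry's processes occupy one contiguous slice of `entryProcesses`, and each process's lumis one contiguous slice of `lumis`, so the whole structure is four flat vectors. The trade-off is that entries must be appended in order, because every range refers to positions that have already been written.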
DataFormats/Provenance/src/StoredMergeableRunProductMetadata.cc (81 additions, 0 deletions)
@@ -0,0 +1,81 @@
#include "DataFormats/Provenance/interface/StoredMergeableRunProductMetadata.h"

namespace edm {

  StoredMergeableRunProductMetadata::StoredMergeableRunProductMetadata() :
    allValidAndUseIndexIntoFile_(true) { }

  StoredMergeableRunProductMetadata::
  StoredMergeableRunProductMetadata(std::vector<std::string> const& processesWithMergeableRunProducts):
    processesWithMergeableRunProducts_(processesWithMergeableRunProducts),
    allValidAndUseIndexIntoFile_(true) { }

  StoredMergeableRunProductMetadata::SingleRunEntry::SingleRunEntry() :
    beginProcess_(0),
    endProcess_(0) { }

  StoredMergeableRunProductMetadata::SingleRunEntry::SingleRunEntry(unsigned long long iBeginProcess,
                                                                    unsigned long long iEndProcess) :
    beginProcess_(iBeginProcess),
    endProcess_(iEndProcess) { }

  StoredMergeableRunProductMetadata::SingleRunEntryAndProcess::SingleRunEntryAndProcess() :
    beginLumi_(0),
    endLumi_(0),
    process_(0),
    valid_(false),
    useIndexIntoFile_(false) { }

  StoredMergeableRunProductMetadata::SingleRunEntryAndProcess::
  SingleRunEntryAndProcess(unsigned long long iBeginLumi,
                           unsigned long long iEndLumi,
                           unsigned int iProcess,
                           bool iValid,
                           bool iUseIndexIntoFile) :
    beginLumi_(iBeginLumi),
    endLumi_(iEndLumi),
    process_(iProcess),
    valid_(iValid),
    useIndexIntoFile_(iUseIndexIntoFile) { }

  void StoredMergeableRunProductMetadata::optimizeBeforeWrite() {
    if (allValidAndUseIndexIntoFile_) {
      processesWithMergeableRunProducts_.clear();
      singleRunEntries_.clear();
      singleRunEntryAndProcesses_.clear();
      lumis_.clear();
    }
  }

  bool StoredMergeableRunProductMetadata::
  getLumiContent(unsigned long long runEntry,
                 std::string const& process,
                 bool& valid,
                 std::vector<LuminosityBlockNumber_t>::const_iterator & lumisBegin,
                 std::vector<LuminosityBlockNumber_t>::const_iterator & lumisEnd) const {

    valid = true;
    if (allValidAndUseIndexIntoFile_) {
      return false;
    }

    SingleRunEntry const& singleRunEntry = singleRunEntries_.at(runEntry);
    for (unsigned long long j = singleRunEntry.beginProcess(); j < singleRunEntry.endProcess(); ++j) {
      SingleRunEntryAndProcess const& singleRunEntryAndProcess = singleRunEntryAndProcesses_.at(j);
      // This string comparison could be optimized away by storing an index mapping in
      // MergeableRunProductMetadata that gets recalculated each time a new input
      // file is opened
      if (processesWithMergeableRunProducts_.at(singleRunEntryAndProcess.process()) == process) {
        valid = singleRunEntryAndProcess.valid();
        if (singleRunEntryAndProcess.useIndexIntoFile()) {
          return false;
        } else {
          lumisBegin = lumis_.begin() + singleRunEntryAndProcess.beginLumi();
          lumisEnd = lumis_.begin() + singleRunEntryAndProcess.endLumi();
          return true;
        }
      }
    }
    return false;
  }
}
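The comment inside `getLumiContent` suggests replacing the per-entry string comparison with an index mapping that is recalculated whenever a new input file is opened. The following is a minimal C++17 sketch of that idea, not the CMSSW implementation: `ProcessIndexMap`, `kNotFound`, and `entryHasProcess` are hypothetical names, and the map is assumed to be built from the `processesWithMergeableRunProducts()` list read from each input file, so that a process name is resolved to an integer once and the loop over a run entry's processes compares only integers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of CMSSW: built once per input file from the
// process names stored in that file (assumed to come from
// processesWithMergeableRunProducts()).
class ProcessIndexMap {
public:
  static constexpr unsigned int kNotFound = static_cast<unsigned int>(-1);

  explicit ProcessIndexMap(std::vector<std::string> const& storedProcesses) {
    for (unsigned int i = 0; i < storedProcesses.size(); ++i) {
      indexOf_.emplace(storedProcesses[i], i);
    }
  }

  // Resolves a process name to its stored index, or kNotFound if absent.
  unsigned int indexOf(std::string const& process) const {
    auto it = indexOf_.find(process);
    return it == indexOf_.end() ? kNotFound : it->second;
  }

private:
  std::unordered_map<std::string, unsigned int> indexOf_;
};

// Hypothetical: mirrors the loop in getLumiContent over the slice
// [begin, end) of per-entry process indexes, but compares integers.
bool entryHasProcess(std::vector<unsigned int> const& processIndexes,
                     unsigned long long begin,
                     unsigned long long end,
                     unsigned int wanted) {
  if (wanted == ProcessIndexMap::kNotFound) {
    return false;
  }
  assert(begin <= end && end <= processIndexes.size());
  for (unsigned long long j = begin; j < end; ++j) {
    if (processIndexes[j] == wanted) {
      return true;
    }
  }
  return false;
}
```

With this arrangement the name lookup costs one hash per input file and process, rather than one string comparison per process slot in every run entry; whether the saving matters in practice depends on how many run entries and processes a file holds.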
DataFormats/Provenance/src/classes.h (1 addition, 0 deletions)
@@ -25,6 +25,7 @@
#include "DataFormats/Provenance/interface/ProcessHistoryID.h"
#include "DataFormats/Provenance/interface/ProductID.h"
#include "DataFormats/Provenance/interface/ProductProvenance.h"
#include "DataFormats/Provenance/interface/StoredMergeableRunProductMetadata.h"
#include "DataFormats/Provenance/interface/StoredProductProvenance.h"
#include "DataFormats/Provenance/interface/ProductRegistry.h"
#include "DataFormats/Provenance/interface/RunAuxiliary.h"
DataFormats/Provenance/src/classes_def.xml (11 additions, 0 deletions)
@@ -131,6 +131,17 @@
    <version ClassVersion="10" checksum="262935904"/>
  </class>
  <class name="edm::IndexIntoFile::Transients" ClassVersion="0"/>
  <class name="edm::StoredMergeableRunProductMetadata::SingleRunEntryAndProcess" ClassVersion="3">
    <version ClassVersion="3" checksum="3004083460"/>
  </class>
  <class name="std::vector<edm::StoredMergeableRunProductMetadata::SingleRunEntryAndProcess>"/>
  <class name="edm::StoredMergeableRunProductMetadata::SingleRunEntry" ClassVersion="3">
    <version ClassVersion="3" checksum="294451415"/>
  </class>
  <class name="std::vector<edm::StoredMergeableRunProductMetadata::SingleRunEntry>"/>
  <class name="edm::StoredMergeableRunProductMetadata" ClassVersion="3">
    <version ClassVersion="3" checksum="2293482595"/>
  </class>
  <class name="edm::ProcessHistoryID"/>
  <class name="std::set<edm::ProcessHistoryID >"/>
  <class name="std::vector<edm::ProcessHistory>"/>