-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moving RDataSource closer to Podio/EDM4hep #593
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kjvbrt. I wasn't aware this was possible in this generality. Very nice. I am probably still missing quite a few things, but I tried to summarize my current understanding below and added and a few comments / questions inline
For my general understanding, what this does is basically:
- Create a reader for each thread / slot
- Keep a vector of Frames, again one per thread / slot
- Populate a vector of collections per thread / slot, where
- each collection always is at the same position, so that we can simply index into it
I am wondering whether we could do with just one reader instead of multiple, but I suppose then we would probably have to do our own locking around that, because the RDataSource
doesn't do that for us?
If I understand correctly, we could in the end even replace the current ROOTReader
with the Reader
that is introduced in #522 and effectively support all backends + RDataFrame?
include/podio/ROOTDataSource.h
Outdated
template<typename T> | ||
std::vector<T**> GetColumnReaders(std::string_view columnName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the documentation of RDataSource
correctly, we don't need this at all. Also this is not overriding anything from the RDataSource
base class. I think all of the "magic" is happening in the GetColumnReadersImpl
, right?
src/ROOTDataSource.cc
Outdated
m_frames.emplace_back( | ||
std::make_unique<podio::Frame>( | ||
podio::Frame(m_podioReaders[i]->readEntry("events", 0)))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we reading the same event nSlots
times this way? As far as I can see we read the correct entry in any case later in SetEntry
. Can we not just simply put an empty Frame here? Or even just populate the vector with a bunch of nullptr
?
The sanitizer workflows are most likely failing because the tests that produce the input files are not running. You can disable them by adding the new tests also to the list in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid the hardcoded strings :)
src/ROOTDataSource.cc
Outdated
// Get collections stored in the files | ||
std::vector<std::string> collNames = frame.getAvailableCollections(); | ||
// std::cout << "podio::ROOTDataSource: Found following collections:\n"; | ||
for (auto& collName : collNames) { | ||
const podio::CollectionBase* coll = frame.get(collName); | ||
if (coll->isValid()) { | ||
m_columnNames.emplace_back(collName); | ||
m_columnTypes.emplace_back(coll->getValueTypeName()); | ||
// std::cout << " - " << collName << "\n"; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also make the parameters of the frame available as columns, e.g. to access the cross section? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few new minor suggestions for using the reader interface, now that it's available. I think I got all the places, but I might have missed some.
src/CMakeLists.txt
Outdated
target_link_libraries(podioRootIO PUBLIC podio::podio | ||
ROOT::Core | ||
ROOT::RIO | ||
ROOT::Tree | ||
ROOT::ROOTVecOps | ||
ROOT::ROOTDataFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am contemplating whether we should move the ROOTDataSource
into a separate library instead of making it part of the podioRootIO
one. The main argument against it is that we will have one more library. However, if we really make use of the Reader
interface internally, then we would need to link podioRootIO
against podioIO
(making a cyclic lib dependency) and potentially also against podioSioIO
. Then we would have no separation of the libraries at all any longer.
Since it is probably easier to merge libraries down the line, than splitting them apart, I would suggest to leave the podioRootIO
library untouched and instead make a new podioRDataSource
library that links against podio::podioIO
(and all the necessary things from ROOT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had to link against podio::podioIO
in couple of places. Will create separate library...
I continue with testing the changes made here in FCCAnalyses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be an issue with the installation now, because the EDM4hep tests are failing. As far as I can tell it has to do with header files that are no longer installed. Otherwise it's mainly minor comments.
# Export compile commands --- used by the tools from clang family | ||
set(CMAKE_EXPORT_COMPILE_COMMANDS ON) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this in an alias for cmake
. I don't think we want to change a default here and rather let the users decide whether they want this or not.
include/podio/ROOTDataSource.h
Outdated
std::vector<void*> GetColumnReadersImpl(std::string_view name, const std::type_info& typeInfo) override; | ||
|
||
std::string AsString() override { | ||
return "Podio data source"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are spaces a problem here? Or the fact that this is not the same as the class name?
endif() | ||
if (NOT ENABLE_RNTUPLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (NOT ENABLE_RNTUPLE) | |
if (NOT ENABLE_DATASOURCE) |
At least that would make sense for the files that get filtered.
file(GLOB headers_necessary | ||
RELATIVE ${PROJECT_SOURCE_DIR} | ||
"${PROJECT_SOURCE_DIR}/install/podio/*.h") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is fishy with this construct here or the filtering below, because no headers are installed now: https://github.com/AIDASoft/podio/actions/runs/10090563011/job/27900250138?pr=593#step:6:600
unsigned int nEventsInFiles = 0; | ||
auto podioReader = podio::makeReader(m_filePathList); | ||
nEventsInFiles = podioReader.getEntries(podio::Category::Event); | ||
frame = podio::Frame(podioReader.readFrame(podio::Category::Event, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frame = podio::Frame(podioReader.readFrame(podio::Category::Event, 0)); | |
frame = podioReader.readFrame(podio::Category::Event, 0); |
The Reader
interface already gives us a fully usable Frame, we don't need to construct it any longer.
} | ||
|
||
for (size_t i = 0; i < m_nSlots; ++i) { | ||
m_frames.emplace_back(std::make_unique<podio::Frame>(podio::Frame{})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_frames.emplace_back(std::make_unique<podio::Frame>(podio::Frame{})); | |
m_frames.emplace_back(std::make_unique<podio::Frame>()); |
make_unique forwards to c'tor, so there is no need to first create a frame and then make make_unique call the copy c'tor ?
}; | ||
|
||
/// | ||
/// Not used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not used, can you explain why it is needed to be here?
std::vector<std::string> m_filePathList = {}; | ||
|
||
/// Total number of events | ||
unsigned int m_nEvents = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned int m_nEvents = 0; | |
ULong64_t m_nEvents = 0; |
Shouldn't this match with the ranges defined in the next lines?
BEGINRELEASENOTES
ENDRELEASENOTES