Store "architecture" in event provenance #30044
assign core
New categories assigned: core. @Dr15Jones, @smuzaffar, @makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
Is the decision to use a GPU to produce products with a certain branch name one that applies to an entire process? Or can the decision be different for different Events in the same process? Can the solution we are designing assume this is always true? Is there a simple toy test configuration that will produce a product on a GPU? I have not actually worked with these GPU issues yet. I don't know much about it, although it is interesting. Is there any documentation about the current state of things and how the decision is made and recorded? Is this in the SwitchProducer-related code?
Just to summarize our discussion, at this point the plan is to store only process-level quantities: …
For point 1, the main information source would be the configuration (e.g. SwitchProducer makes its decision in Python and communicates that to the C++ code via the configuration), so this connects to #31760. For point 2, the same information goes (or should eventually go for GPUs) into the framework job report (likely from CUDAService in the case of NVIDIA GPUs), and we should think about how to deliver the information for both in a unified way.
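For concreteness, the kind of process-level record being discussed might look roughly like the following sketch. All names here are illustrative, not actual CMSSW classes:

```cpp
#include <string>
#include <vector>

// Illustrative sketch only: a process-level data product holding the
// "architecture" information under discussion. Field names are made up.
struct AcceleratorProvenance {
  std::string cpuVectorArchitecture;   // the scram "vector architecture" the libraries were built for
  std::vector<std::string> cpuModels;  // CPU model name(s), e.g. as reported by the CPU service
  std::vector<std::string> gpuModels;  // GPU model name(s), e.g. from CUDAService; empty if no GPU
};
```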
I made a first pass at this. I know it is not done and there are issues that need to be resolved before we even think about merging this, but my working branch is here if you want to take a look: https://github.com/wddgit/cmssw/tree/acceleratorProvenance
One thing we had discussed was making the module that produces this similar to TriggerResults, in that it is not scheduled in the configuration and just always runs. The code for that is in the branch above, but I noticed a number of problems with this when I tried to run existing unit tests. Possibly this idea is a mistake?
We could just make this a standard producer that you have to configure if you want this product. I'm starting to think that might be better. Doing it like TriggerResults also introduces special code in the middle of the Framework (tolerable, but not really very nice for maintainability). Another alternative would be a configuration parameter that allows one to turn it off. I noticed the PR you submitted. Can I get the configured accelerator from this variable in the Options parameter set?
Should one of the enumeration values be assigned to the "auto" case? Can I use "availableAccelerators" for the list of hardware accelerator chips? Or does that contain different information?
I agree that including this special producer in e.g. merge jobs would be bad. But I'm not sure if the condition should be the same as for the TriggerResults producer. If I read the code correctly, the TriggerResults is produced if the process has any (trigger) Paths, did I get that right? For this case a somewhat natural condition would be "does the process have any modules that produce data products" (i.e. any EDProducers or EDFilters). I'd think a case where all EDProducers are in Tasks and an OutputModule in an EndPath drives which of the EDProducers get run would then not lead to TriggerResults being produced, but would benefit from "AcceleratorProvenance".
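A minimal sketch of that condition, with hypothetical types standing in for the real framework machinery:

```cpp
#include <vector>

// Hypothetical stand-in for the framework's module bookkeeping.
struct ModuleDescription {
  bool isProducerOrFilter;  // true for an EDProducer or EDFilter, whether on a Path or in a Task
};

// Produce "AcceleratorProvenance" only if any module produces data products.
bool needsAcceleratorProvenance(std::vector<ModuleDescription> const& modules) {
  for (auto const& m : modules)
    if (m.isProducerOrFilter)
      return true;
  return false;
}
```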
A standard producer approach would work nicely with #36699 (in the leading order, at least), but this infrastructure should also cover the CPU vector architecture (for the strings storing the details of the architecture, the enum wouldn't necessarily need …). Maybe we should discuss more about this point (also with @Dr15Jones).
Yes, that was my idea.
That is not needed, because the … One question that I'm thinking about is whether we want to interpret the "high level class" of the accelerator for the enum (e.g. …).
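For illustration, the enum-plus-string split being debated might look like this (purely a sketch; neither type exists in CMSSW):

```cpp
#include <string>

// Coarse "high level class" of the accelerator; new detailed architectures
// would not require new enum values because the details live in the string.
enum class AcceleratorClass { CPU, GPU, Other };

struct AcceleratorInfo {
  AcceleratorClass highLevelClass = AcceleratorClass::CPU;
  std::string details;  // e.g. CPU vector architecture or GPU model string
};
```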
I agree. I will try to modify it to only run when any other product is produced. That is a better criterion than associating it with TriggerResults or the existence of a path.
Is there anything that already exists in CMSSW to get the hardware chip names from, or should I just start from scratch and try to figure out how to get this from the OS or environment? Maybe we can discuss this again next week.
We should discuss this aspect. There are already services that dig up hardware information. The CPU service itself fills the JobReport (cmssw/FWCore/Services/plugins/CPU.cc, lines 153 to 154 in 2472dee) and is used by the CondorStatusService (cmssw/FWCore/Services/plugins/CondorStatusUpdater.cc, lines 230 to 236 in 2472dee).
The information in CUDAService is not currently propagated to any monitoring (something to be addressed). I suspect the current approach (with the CPU service) won't really scale to many accelerator architectures, and I am wondering whether we should somehow harmonize the information delivery (e.g. the CPU, CUDA, etc. services fill some central information piece that the other Services/whatnot can ask it from). To leading order I'd expect this aspect to be decoupled from the actual storage and access part, as long as there is a sensible hook/place to fill the actual information, after we figure out a feasible way for that.
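One way to picture the "central information piece" idea is the hedged sketch below — a hypothetical class, not existing CMSSW code:

```cpp
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Each service (CPU, CUDA, ...) registers what it found; other services
// and modules can then query by category instead of talking to each
// hardware-specific service directly.
class HardwareInfoRegistry {
public:
  void add(std::string const& category, std::string const& description) {
    std::lock_guard<std::mutex> lock(mutex_);
    info_[category].push_back(description);
  }

  std::vector<std::string> get(std::string const& category) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = info_.find(category);
    return it != info_.end() ? it->second : std::vector<std::string>{};
  }

private:
  mutable std::mutex mutex_;
  std::map<std::string, std::vector<std::string>> info_;
};
```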
@fwyzard What do you think, would it be useful to have the architecture/accelerator information stored by the framework (similar to provenance) for the HLT? @smorovic @emeschi (motivated by the question above, for considering implementation options) What are the constraints for evolving the streamer file format (e.g. for adding support for ProcessBlock data products)? Are these something that could be done e.g. in YETS during Run 3, or something better left for LS3, or something more relaxed or constrained?
I guess it could be useful, yes. I assume that this information should be equivalent to untracked parameters, to avoid affecting the process hash?
I'm not very familiar with the streamer format, but one point to be kept in mind is how this would affect (hierarchically) merging streamer files with different information.
@makortel, streamer is used as a temporary format to ship data in P5 and to Tier-0, and there are no constraints in terms of backwards compatibility (i.e. all clients should use the same major_minor version of CMSSW).
A streamer file consists of a header (serialized registry) and event payload (concatenated for each event), but the variant of the format we use until the last step is missing the header, so we can merge files and as the final step prepend the header, all done each lumisection.
Do you intend to have only one block globally (constant information per run), one (constant) block per process, or a unique one per event?
For the first case, it is like with the current header: only one copy is taken and the others are discarded, so we need to expand the header format to add another block. This kind of change could be done even during the year (hypothetical example: moving from 12_1 to 12_2).
The second option is that it is part of the file header, but in DAQ we need to merge them instead of discarding, and add all blocks together as part of the merging process (including growing it when processes crash/restart etc.). I would put that as a task for EYETS since it is significant for the HLT and merging system. An alternative to this is to ship it from each process each lumisection as a block contained within the event payloads, thus avoiding a change to the merging logic (assuming reasonably small size compared to events). Probably doable during the year.
If it's shipped with each event (assuming also that it's small) it could accompany the event payload, either as a separate block, or as an event product like trigger results. For the latter, probably no changes are needed in the format. This also can be done during the year.
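The merge-by-concatenation scheme described above can be pictured with a small sketch (the function and file names are hypothetical; the real DAQ merger is more involved):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Prepend a single INI header (the others are assumed identical and
// discarded), then append the events-only files from each HLT process.
void mergeStreamerFiles(std::string const& iniFile,
                        std::vector<std::string> const& eventFiles,
                        std::string const& outFile) {
  std::ofstream out(outFile, std::ios::binary);
  std::ifstream header(iniFile, std::ios::binary);
  out << header.rdbuf();  // one header per merged file
  for (auto const& f : eventFiles) {
    std::ifstream in(f, std::ios::binary);
    out << in.rdbuf();  // simple byte-for-byte concatenation of events
  }
}
```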
Not quite sure I grasp all the implications but I’ll give my two cents.
According to Andrea, for statistical purposes, this information is already collected by the DQM (for the case where one would want to know fractions of events that went through a particular arch/accelerator, and perhaps do timing analysis separately, etc.).
So the only added value would be to have the information on an event-by-event basis, right?
This is assuming that in the future we might have multiple arch/accelerator in the farm (which is not the case today).
Then the only thing that makes sense is to have this information as an event product so that it is preserved through the merging process and easily accessible to an analysis job. I assume this is really small (there aren't many different cases and one can encode the information efficiently) and then it is transparent for the streamer format and easy to use downstream.
Am I missing something?
e.
Yes, the idea would be to have this effectively per-event, so we can check offline if there are problems related to accelerator-specific implementations. So, storing the information in a per-Event collection would seem the easiest implementation, but it's not clear if it may cause problems downstream. Since I'm also not familiar with the streamer format, …
@fwyzard @smorovic @emeschi Thanks for your input! At this point we're still considering various implementation options. The idea would be to store more information than just "was a GPU used or not"; right now this would include the CPU and GPU model information (note that the starting point of this issue was the CPU vector architecture). The main use cases would be monitoring and debugging after the fact.
For now our idea was to specifically store only process-wide information, since that's what we have, and in principle storing it in some per-process way should be the most efficient way to do it. If it turns out that we need to start deciding accelerator usage event-by-event, it should be straightforward to extend the model with an event product that stores only the information that can vary.
I agree that just storing the information in the event would be easiest to do, and in the end even the space needed for the CPU and GPU model names should be rather small. We'll digest the details you gave and think about this further.
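If that per-event extension were ever needed, it could plausibly store just an index into the process-level record, e.g. (illustrative only, not a proposed class):

```cpp
#include <cstdint>

// Hypothetical event product storing only the information that can vary
// event-by-event: an index into the process-level list of accelerators.
struct EventAcceleratorRecord {
  std::uint32_t acceleratorIndex;
};
```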
By the way, if the streamer format eventually gains the possibility of storing ProcessBlock data products, …
FYI: a step forward in addressing this issue, #37831, was just submitted and is waiting for review comments from anyone interested. The information collected by the new service is the information that we propose be saved persistently. That PR does not address persistence, although that is coming in a future PR.
I spent yesterday afternoon and this morning looking at this issue and I am not sure how to proceed with this and the related PR. But here are some comments related to the streamer format, based on comments received so far.
In HLT, several processes run on different machines (which could be different hardware, chips). Each produces a streamer-format file without a file header, just events. Different events are processed on different machines. Later, in a separate process, these streamer files get merged by simple concatenation and a single header is added. There is currently no provision for complex logic here to do things other than simple concatenation of events and appending that to a header.
Unless we dramatically change how this works, the only way to store information about the machine that processes an individual event is in the event. I see no way to store this in ProcessBlock; the only place to put the information is in the events. Unless there is special code to merge the event information, I cannot even see how to create a ProcessBlock out of the information. And this special code would have to run as part of the code that writes the file header... Not sure how to do that. This is far outside the context of the current ProcessBlock code we have now. I'm not saying it cannot be done or that I cannot do it, but it is outside the code I usually work on, outside my expertise, and outside code that Core usually controls... I suppose the process that creates the file header could read the events, merge the relevant information in some way, and create a block in the header. Or the process that converts from streamer format to edm format could do the same thing and then really create a ProcessBlock object (although I think currently this is a simple merge process with no producers)...
Earlier comments: from Emilio back in January, "Then the only thing that makes sense is to have this information as an event..." (#30044 (comment)); from Andrea back in January, "Yes, the idea would be to have this effectively per-event, so we can check offline if there are problems related to accelerator-specific implementations." (#30044 (comment)).
So I am lost in trying to understand how to make a streamer ProcessBlock for this and what information we would put in there. Probably after this week we should put this on hold while Matti is away until July... unless we can resolve something this week that I can implement. I also need to focus on finishing the unrelated run concurrency PR, which is nearing completion; some uninterrupted time on that would help.
I was wondering how the ParameterSetRegistry is handled (since we were thinking to exploit that for the storage). I guess this code (cmssw/IOPool/Streamer/src/StreamerOutputModuleBase.cc, lines 26 to 42 in 6a9931b) means it is stored as part of the INI files.
If I understood #37831 (comment) correctly, only one of the INI files is used in the merging, and the rest are discarded as all of them are (assumed to be) identical. To me the assumption that "all HLT jobs have identical architecture provenance" sounds like something I would not bet on. Even if it is the situation today, and cases where some node(s) are unable to use the GPU or have a different driver version would probably be very unlikely, the assumption sounds rather restrictive (towards the HLT farm evolution and the architecture information we'd include here). So probably we need to re-think the storage approach.
Hi @makortel, correct. We only keep one copy (from one process) and discard the others, which isn't sufficient for a heterogeneous farm. The first option, already discussed, is to store it in the event payload. If the ProcessBlock is tiny, this wouldn't add much size overhead and is the simplest option. A second option is storing this at the file level (which is the lumisection level in DAQ), as the output module in the HLT process opens a new streamer output file for each LS (without the INI part). This would require a third serialized object type for the streamer format (see, for example, https://github.com/cms-sw/cmssw/blob/master/IOPool/Streamer/interface/EventMessage.h).
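Following the pattern of the existing streamer message headers (see EventMessage.h linked above), a third serialized object type might carry a small fixed header along these lines. The framing is purely illustrative, not a format proposal:

```cpp
#include <cstdint>

// Hypothetical message header for a new streamer object type; the real
// streamer headers carry similar fields and are followed by the
// serialized payload (here, the per-process architecture block).
struct ArchInfoMsgHeader {
  std::uint32_t code;  // message type code, distinct from the INI/Event codes
  std::uint32_t size;  // total message size in bytes, header included
  std::uint32_t protocolVersion;
};
```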
The last time we discussed this issue, one question that came up was whether our design for storing the parentage of products could be useful for storing the information we are discussing in this issue. I agreed to review that design and summarize it.
In memory, the Parentage objects are in a ParentageRegistry. That is basically a map using the ParentageID (a big hash) to reference its contents. ProductProvenance objects in memory hold one of those ParentageIDs (big hash) along with a BranchID (32 bits) to identify the product.
For persistent storage, we have a separate Parentage TTree with a single branch whose elements are Parentage objects. All unique Parentage objects that we save are written once into the TTree. The order is important. StoredProductProvenance objects do not contain the ParentageID. Instead they contain an unsigned int that is an index into the Parentage TTree; the index is the entry number of the Parentage object in the TTree. The Parentage TTree is written once per file. A StoredProductProvenance object contains the index (a 32-bit unsigned int) and also a single 32-bit BranchID to identify the product. It is a per-event object written into a branch in the Events TTree. The indexes have small values and compress very well. The ParentageID hashes are not saved persistently, which is good because they are large.
Also note that the output module only saves Parentage objects for products which are kept, their ancestors, and the original product in the case of an EDAlias. The rest are dropped and don't take space in the output file. A job can be configured to drop some or all of them.
I have not thought through yet how this applies to "accelerator provenance" or whether this design is useful for that case.
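The index-based scheme described above, reduced to a sketch (field names follow the description, but this is not the real class definition):

```cpp
#include <cstdint>

// Per-event object: small integers that compress well, instead of the
// large ParentageID hash, which is not saved persistently.
struct StoredProductProvenance {
  std::uint32_t branchID;        // identifies the product
  std::uint32_t parentageIndex;  // entry number in the per-file Parentage TTree
};
```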
Now that scram gained the ability to build (and load) individual libraries and externals for different "vector architectures" (the exact meaning of "architecture" will likely evolve), we should find a way to track that "architecture" in the event provenance.
After we have figured out the storage model, we need to decide how to communicate the "architecture" from scram to cmsRun.
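One conceivable mechanism, shown only as an assumption-laden sketch: scram exports the selected architecture in an environment variable that cmsRun reads at startup. The variable name here is made up for illustration and is not an existing scram variable:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical: "SCRAM_VECTOR_ARCH" is not an existing scram variable.
std::string vectorArchFromEnvironment() {
  char const* arch = std::getenv("SCRAM_VECTOR_ARCH");
  return arch ? std::string(arch) : std::string("default");
}
```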