Skip to content
This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Hw access optimization #327

Merged
merged 15 commits into from Apr 6, 2020

Conversation

mexanick
Copy link
Contributor

@mexanick mexanick commented Mar 19, 2020

Description

This PR brings:

  • Complete removal of uhal dependencies everywhere in gemhardware except amc13-related code
  • Conversion of all the RPC calls in gemhardware to templated ones
  • Partial removal of obsolete functionality where it is obvious
    • hardware monitoring classes are removed as not used
  • Cleaning of the FM transitions

The quality of the code is not up to industrial standards, but this is one more technical stop on the way to a better structured clean project. A number of issues were spotted during the work on this PR which have to be fixed before we can advance.

Notably:

  • namespaces accross the and has to be uniformized
  • Compile-time definition for GEM_VARIANT added to makefiles of gemhardware/managers and gemhardware/devices
    • must be fixed with new build system. Eventually has to be converted into a runtime constant as we want the same backend software to work with different generation front-ends
  • lots of obsolete functions which require interface redesign
  • argument passing between and has to be uniformized - see New Feature: Adding setAllChannelRegisters to HwVFAT #173 of ctp7_modules

Cleaning the FSM transition, getting rid of uhal library.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context

How Has This Been Tested?

Not yet tested on the hardware. Help is needed on building/deployment instructions for ctp7_modules. So far build system is unclear. However the code compiles and in general should run. Would also prefer a code review before stepping into testing phase.

Screenshots (if appropriate):

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Copy link
Contributor

@jsturdy jsturdy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few quick comments before it's ready for reveiw:

  • would be better to replace all the // Deprecated comments with [[deprecated]] or __attribute__((__deprecated__)) (if the functions themselves aren't going to be removed in this rework)
  • why are we reintroducing the duplication of remote vs local function calls? if the method in the CTP7 modules is desired outside of the modules themselves, then why not just export it directly, i.e., why duplicate readReg with the readRemoteReg wrapper, when the templating was introduced with one of the intended goals of removing this duplication?

I think everything else can wait for proper review

@lpetre-ulb
Copy link
Contributor

* would be better to replace all the `// Deprecated` comments with `[[deprecated]]` or `__attribute__((__deprecated__))` (if the functions themselves aren't going to be removed in this rework)

👍 I would even remove the deprecated placeholder functions since they can lead to unexpected misbehavior (if cmsgemos still compiles without them).

* why are we reintroducing the duplication of remote vs local function calls? if the method in the CTP7 modules is desired outside of the modules themselves, then why not just export it directly, i.e., why duplicate `readReg` with the `readRemoteReg` wrapper, when the templating was introduced with one of the intended goals of removing this duplication?

Wouldn't these remote RPC methods be only temporary until we no longer need raw register accesses from cmsgemos? And replacing all function calls by functor calls in the ctp7_modules is not going to be pretty...

@jsturdy
Copy link
Contributor

jsturdy commented Mar 20, 2020

  • why are we reintroducing the duplication of remote vs local function calls? if the method in the CTP7 modules is desired outside of the modules themselves, then why not just export it directly, i.e., why duplicate readReg with the readRemoteReg wrapper, when the templating was introduced with one of the intended goals of removing this duplication?

Wouldn't these remote RPC methods be only temporary until we no longer need raw register accesses from cmsgemos? And replacing all function calls by functor calls in the ctp7_modules is not going to be pretty...

I don't know, either there's a use case for having these low level functions exposed on on the PC side, in which case I'd just export them as is and do a refactor in ctp7_modules now (it's a pretty simple regex replace to get it all done), or we don't want them outside, in which case why not just make that change now, i.e., create the ctp7_modules functions that replace the ones that are using these one-off calls. That's what I did with the HwGenericAMC -> amc module (although I didn't replace the calls in cmsgemos at the time).
The third alternative is to have the exported name the same as the "local" name, as they'll be in a different namespace, right?

@lpetre-ulb
Copy link
Contributor

I don't know, either there's a use case for having these low level functions exposed on on the PC side, in which case I'd just export them as is and do a refactor in ctp7_modules now (it's a pretty simple regex replace to get it all done), or we don't want them outside, in which case why not just make that change now, i.e., create the ctp7_modules functions that replace the ones that are using these one-off calls. That's what I did with the HwGenericAMC -> amc module (although I didn't replace the calls in cmsgemos at the time).

I think the use case is only a temporary compatibility layer for migration purposes. I prefer to see multiple easily reviewable small PR rather than one huge PR with many changes. The mid-term goal could be like:

  1. Get rid of uHAL (this PR)
  2. Create RPC methods that replace the direct read/write accesses (one or more PR)
  3. Create a RPC module for monitoring purposes (one or more PR)

The third alternative is to have the exported name the same as the "local" name, as they'll be in a different namespace, right?

For now, at least, they are in the same namespace, in this case, utils. But, the function would hide the class, so the behavior would be the one you expect, i.e. the function available for the ctp7_modules and the functor available for remote calls. That's terrible design though.

…package.

Lots of dirty hacks which won't be fixed in the current PR in order to keep it compact.
Notably:
  * namespaces accross the  and  has to be uniformized
  * Compile-time definition for  added to makefiles of  and
    * must be fixed with new build system. Eventually has to be converted into a runtime constant as we want the same backend software to work with different generation front-ends
  * lots of obsolete functions which require interface redesign
  * argument passing between  and  has to be uniformized - see cms-gem-daq-project#173 of
Code compiles and this is IMHO enough for the current PR
@mexanick
Copy link
Contributor Author

please check updated PR description

Copy link
Contributor

@lpetre-ulb lpetre-ulb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing outstanding, can be compiled. Should be good for hardware tests. I would be careful where and how the INPUT_ENABLE_MASK is set.


// FIXME make me more streamlined
// this maybe shouldn't be done? Commenting out for now, but needs testing
//info.slotID = slot+1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be tested. A changelog, or announcement, is required if the configuration has to be changed somehow.

Comment on lines -194 to -195
// FIXME should not need this here?
amc->disableDAQLink();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be tested, since it changes the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but the main purpose if this PR is to have something which uses the templated RPC framework and can cycle through the FSM states. Proper implementation of the state transitions IMHO deserves another PR (and at the moment it has to be also chained with the corresponding changes of the ctp7_modules). As a lot of things are candidates for refactoring, simply making it work "as is but without uHAL" doesn't appeal too much to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I believe that regressions should not be allowed in PR. However, this PR as well as the next round of PR are kind of special due to the huge refactoring in progress. But what about the other developments which happen in parallel? Once this PR gets merged, how would the calibration suite be tested, for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say there's more to the calibration suite tests... The system also can't be easily recovered from cold boot state. But if you insist, I can retrofit this back. However I can't test the real data taking until we fix the cooling

gemhardware/devices/include/gem/hw/devices/GEMHwDevice.h Outdated Show resolved Hide resolved
gemhardware/devices/include/gem/hw/devices/GEMHwDevice.h Outdated Show resolved Hide resolved
gemhardware/devices/src/common/amc/HwGenericAMC.cc Outdated Show resolved Hide resolved
gemhardware/devices/src/common/optohybrid/HwOptoHybrid.cc Outdated Show resolved Hide resolved
gemhardware/devices/src/common/optohybrid/HwOptoHybrid.cc Outdated Show resolved Hide resolved
gemhardware/devices/src/common/optohybrid/HwOptoHybrid.cc Outdated Show resolved Hide resolved
…the cmsgemos side, I make it compliant with the corresponding method of ctp7_modules
@mexanick mexanick linked an issue Apr 1, 2020 that may be closed by this pull request
3 tasks
…faces (candidates for refactoring in base class) and linking issues against xhal-base (temporary hacks)
@mexanick
Copy link
Contributor Author

mexanick commented Apr 2, 2020

Test report:

  • templated RPC calls could be successfully executed from the cmsgemos
  • linking problem against xhal-base has been discovered, which prevented correct exceptions treatment and subsequent code crash. Issue reported in xhal repository, suggested there temporary fix is implemented.
  • discovered quite a few attempts to read registers which are not leaf nodes and do not have "read" permissions. This part of the code wasn't changed in the current PR and I suspect that the problem was hidden by ipbus transactions ignoring register permissions. Since these bugs don't affect state transitions, I propose to address them in subsequent PR
  • FSM transition between "Initial" and "Halted" states is executed correctly. If the HW is present, the system goes to "Halted" state, if HW is not accessible, the system goes to "Error"
  • FSM transition between "Halted" and "Configured" state resulted in "Error" due to HW problem
  • "Reset" action from "Halted" state hangs up indefinitely

Following this I'd propose to merge this PR and address remaining and future issues in separate PRs. As a proof of concept, the system can work without uhal and with templated rpc.

jsturdy
jsturdy previously requested changes Apr 2, 2020
@@ -55,7 +59,8 @@ LibraryDirs+=$(BUILD_HOME)/gemutils/lib/$(XDAQ_OS)/$(XDAQ_PLATFORM)
LibraryDirs+=$(uHALROOT)/lib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still needed?

gemhardware/devices/Makefile Outdated Show resolved Hide resolved
@@ -44,6 +47,7 @@ IncludeDirs+=$(BUILD_HOME)/gemhardware/utils/include
IncludeDirs+=$(BUILD_HOME)/gemutils/include
IncludeDirs+=$(uHALROOT)/include
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, still refs uhal

}

uint32_t gem::hw::HwGenericAMC::getTTCCounter(AMCTTCCommandT const& cmd)
uint32_t gem::hw::amc::HwGenericAMC::getTTCCounter(AMCTTCCommandT const& cmd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in cases like this, where there is much more than just a single (read|write)Reg it would have made sense to use the ctp7_modules functor call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but this is out of the scope of the current PR

}

uint32_t gem::hw::HwGenericAMC::getOptoHybridTriggerLinkCount(uint8_t const& oh, uint8_t const& link, AMCOHLinkCountT const& count)
uint32_t gem::hw::amc::HwGenericAMC::getOptoHybridTriggerLinkCount(uint8_t const& oh, uint8_t const& link, AMCOHLinkCountT const& count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise here i guess not, as i see this one was not ported over...

gemhardware/managers/Makefile Show resolved Hide resolved
@mexanick
Copy link
Contributor Author

mexanick commented Apr 2, 2020

Test update
With a cool enough front end all the FSM transitions are executed as before

@mexanick mexanick requested a review from jsturdy April 2, 2020 16:00
Copy link
Contributor

@lpetre-ulb lpetre-ulb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved as a technical PR. I prefer to see all new developments use the templated RPC and run into easily fixable issues rather than waiting months for the perfect PR; all issues from such a large change cannot be discovered by simple testing.

@mexanick mexanick dismissed jsturdy’s stale review April 6, 2020 13:31

Changes implemented

@mexanick mexanick changed the title WIP: Hw access optimization Hw access optimization Apr 6, 2020
@mexanick mexanick merged commit ba2bc52 into cms-gem-daq-project:develop Apr 6, 2020
@mexanick mexanick deleted the hw_access_optimization branch April 6, 2020 13:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FSM actions refactoring
3 participants