User file fails with NAN in predicted load #4557

Closed
mjwitte opened this issue Oct 22, 2014 · 20 comments · Fixed by #4761
Labels: PriorityLow (This defect or feature has been declared low priority), SeverityHigh (This defect is high severity, generally indicating a hard crash without a workaround)

Comments

@mjwitte
Contributor

mjwitte commented Oct 22, 2014

Helpdesk ticket 9913
This ticket has various issues going on, but after fixing some earlier problems, the file now fails with this error:

** Severe ** DualSetPointWithDeadBand: Unanticipated combination of heating and cooling loads - report to EnergyPlus Development Team
** ~~~ ** occurs in Zone=OFFICES During Warmup, Environment=SUMMER DESIGN DAY IN BAD AXE MI JUL, at Simulation time=07/15 05:03 - 05:05
** ~~~ ** LoadToHeatingSetPoint=NAN, LoadToCoolingSetPoint=NAN
** ~~~ ** Zone Heating Set-point=21.70
** ~~~ ** Zone Cooling Set-point=23.30
** ~~~ ** Zone TempDepZnLd=NAN
** ~~~ ** Zone TempIndZnLd=NAN
** ~~~ ** Zone ThermostatSetPoint=0.00
** Fatal ** Program terminates due to above conditions.

Running v8.2.0 64-bit on Win 8.1

@mjwitte added the EnergyPlus, PriorityLow, and SeverityHigh labels on Oct 22, 2014
@rraustad
Contributor

It looks like it starts here. It's looking at the zone inlet node temp?
[screenshot: node value shows nan]

@rraustad
Contributor

Using this breakpoint:
!EnergyPlus::DataGlobals::DoingSizing & EnergyPlus::DataGlobals::CurrentTime > 5 & EnergyPlus::DataGlobals::CurrentTime < 6 & ZoneNum ==4

Not sure what set that node to NAN. Maybe this code isn't the issue and something (Airloop?) is setting that node to NAN for some reason.

@rraustad
Contributor

Of course the UnitarySys with the multi-speed coil is connected to that node.

AirLoopHVAC:UnitarySystem,
AirLoop_UnitCool1_Offices, !- Name

The outlet node went to NAN before the other issues occurred.

@rraustad
Contributor

It's a choice of operating air flow for set point based control of multispeed coils. This "choice" occurs in SetOnOffMassFlowRate and considers a "zone" heating, cooling, or no-load case. However, set point based control is not based on zone load. Setting the load variables (heating or cooling) may help if not already done, or it may help to rethink what to do in the "NoLoad" case (which refers to zone load even though zone load is not used here).

@mjwitte
Contributor Author

mjwitte commented Oct 23, 2014

@kbenne @Myoldmopar I was surprised to see NAN in the error message here. Are the release build settings such that exceptions such as divide by zero do not halt execution immediately?

@EnergyArchmage
Contributor

I agree, builds that carry NANs along are going to be much more difficult to debug and can defeat the utility of a traceback.

@Myoldmopar
Member

Agreed. We haven't enabled any flags such as "fast-math" equivalents that would disable NaNs, so I don't know how it's getting through calculations. @DeadParrot @kbenne Any ideas?

@kbenne
Contributor

kbenne commented Oct 23, 2014

It appears floating point exceptions are disabled by default on MSVC.

http://msdn.microsoft.com/en-us/library/aa289157%28VS.71%29.aspx#floapoint_topic8

Looks like you can turn them on with

_clearfp();
_controlfp( _EM_ZERODIVIDE, _MCW_EM );

@DeadParrot
Contributor

@Myoldmopar NAN detection is not inherent but can be controlled by the code. In main.cc we have this:

// Enable floating point exceptions
#ifndef NDEBUG
#ifdef __unix__
    feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
#endif
#endif

so in debug Linux builds those events will get trapped. FE_INVALID should halt on a NAN caused by bugs other than divide by zero (uninitialized variables, stomping on or reading garbage memory via a bounds error, etc.), which otherwise do not cause an immediate abort. The reason this isn't on for release builds is the performance impact.

The best debugging approach would appear to be running the case with a debug Linux build. There are Intel C++ switches, and probably VC switches as well, that can do something similar on Windows (_controlfp may work there instead of feenableexcept), and setting those up for the Windows debug builds is probably a good idea.

If that doesn't identify the problem then it needs good old-fashioned debugging. That is hard work: more efficient/proper/logical would be to first fix the known uses of uninitialized variables and all the "possibly uninitialized" compiler warnings.
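[Editor's illustration, not EnergyPlus code or a commenter's snippet: the trapping approach above aborts the process, but the same facility can be used non-fatally to check whether a NaN-producing operation occurred, via the C99/C++11 sticky status flags.]

```cpp
// Sketch (hypothetical, not EnergyPlus code): detecting an invalid FP
// operation with the <cfenv> sticky flags instead of trapping via SIGFPE.
#include <cfenv>
#include <cmath>

// Returns true if computing 0.0/0.0 produced a NaN and raised FE_INVALID.
bool nan_raises_fe_invalid() {
    std::feclearexcept(FE_ALL_EXCEPT);   // clear any previously raised flags
    volatile double zero = 0.0;          // volatile prevents constant folding
    volatile double x = zero / zero;     // 0/0 yields NaN and raises FE_INVALID
    return std::isnan(x) && std::fetestexcept(FE_INVALID) != 0;
}
```

Unlike feenableexcept, checking fetestexcept after a suspect calculation does not abort the run, so it could even be used in release builds to emit a diagnostic when a NaN first appears.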

@mjwitte
Contributor Author

mjwitte commented Oct 23, 2014

@DeadParrot What kind of performance impact is there? Without enabling these exceptions in release builds, it seems possible that bad values could slip through without getting caught in a user's simulation. It's nice to say we'll fix all the possible uninitialized warnings, but there will always be risks of divide by zero from unforeseen issues.


@DeadParrot
Contributor

@mjwitte the performance impact depends on the extent to which the hot spots of your code are floating point operations. For EnergyPlus this may be "not much" now, but as we reduce the bookkeeping overhead it should, ideally, increase. My understanding is that the FP state flags are set in hardware and thus are not themselves a performance impact: it is the checking of that state that takes some clock cycles. More seriously, enabling floating point exceptions can cause some operations that could be done in parallel to be serialized, which is not the direction we want to move in. But we could always enable the exceptions and measure the performance impact for release builds.

I suggest that the correct approach is to treat any NAN appearing as a programming bug that should be caught during testing (of debug builds with FP exceptions enabled). All divisions should be protected: by logic if a zero is possible, and by an assert if it should never happen. Until we are able to catch up on our QA and coding practices, having FP exceptions caught in release code could be beneficial in that it prevents the user from being presented wrong results, but usually the NANs propagate through to the results such that the problem is clear. I think of the FP exception information as diagnostic for the developer, not the end user: it doesn't fix anything; it just helps you localize the bug and prevents the program from continuing with invalid quantities.

In my debug builds I always enable initialization by odd/NAN values so that you are more likely to catch use of uninitialized variables during testing. The arrays do bounds checking in debug builds too. With that plus the FP exceptions and good test case coverage we should be able to eliminate these issues in the code.
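[Editor's illustration of the two division-protection styles described above, with hypothetical names; this is a sketch, not the EnergyPlus implementation.]

```cpp
#include <cassert>

// Case 1: a zero denominator is genuinely possible -> protect by logic.
// Here "no flow" is a legitimate operating state, so return a defined value.
double specificEnthalpyDelta(double totalCapacity, double massFlow) {
    if (massFlow <= 0.0) return 0.0;   // coil off / no flow: no enthalpy change
    return totalCapacity / massFlow;
}

// Case 2: a zero denominator "should never happen" -> document with an assert
// so a debug build aborts at the faulty call site instead of propagating NaN.
double averageOverSteps(double total, int numSteps) {
    assert(numSteps > 0);              // a programming error if this fires
    return total / numSteps;
}
```

In a release build the assert compiles away (with NDEBUG), so only the logically-guarded divisions carry a runtime cost.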

@rraustad
Contributor

...I always enable initialization by odd/NAN values...

I thought CMake was setting the switches?


@DeadParrot
Contributor

@rraustad Yes, CMake is setting the switches for the EnergyPlus builds. I was referring to what I do for projects in general. I checked the CMake compiler switches and it looks like all of the settings I used wrt initialization are in there. GFortran has some nice -finit options for scalars that g++ doesn't, unfortunately, so we don't have full coverage of all variables, which is why fixing all the uninitialized warnings is important.

@mjwitte
Contributor Author

mjwitte commented Feb 26, 2015

Also helpdesk ticket 10301, a user file showing a similar symptom, but that file does not use AirLoopHVAC:UnitarySystem or a multispeed coil - it's just ideal loads, so probably a different bug.

@mjwitte
Contributor Author

mjwitte commented Feb 27, 2015

@rraustad So, the divide by zero error occurs in DXCoils.cc CalcMultiSpeedDXCoilCooling (currently line 9807): hDelta = TotCapLS / AirMassFlow;
It happens because the incoming mass flows are not consistent. AirMassFlow is >0 when this routine is entered, but MSHPMassFlowRateLow and MSHPMassFlowRateHigh are both zero. At the same time CompOn=0 but CycRatio=1.0. With CycRatio > 0.0, for this particular case AirMassFlow gets set to MSHPMassFlowRateLow (which is zero) and is then used to check airflow/capacity and as a denominator.

So it seems we should trap this right at the top of CalcMultiSpeedDXCoilCooling: if AirMassFlow>0 but MSHPMassFlowRateLow and MSHPMassFlowRateHigh are zero, then stop, leave, or set MSHPMassFlowRateLow=AirMassFlow and MSHPMassFlowRateHigh=AirMassFlow?

I realize this doesn't really solve the issue of what it means to use setpoint based controls with a multispeed coil (is this even a valid combination - how do you determine flow rate?), but it protects against divide by zero.
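[Editor's sketch of the guard proposed above. The variable names mirror the ones discussed in this thread (AirMassFlow, MSHPMassFlowRateLow/High), but the struct and function are hypothetical; this is not the actual EnergyPlus code or the merged fix.]

```cpp
// Hypothetical flow state for a multispeed coil call.
struct MSFlowState {
    double AirMassFlow;          // inlet mass flow, set by the parent
    double MSHPMassFlowRateLow;  // low-speed flow, expected from the parent
    double MSHPMassFlowRateHigh; // high-speed flow, expected from the parent
};

// Ensure the speed-level flow rates are consistent with the inlet flow
// before they are used as denominators (e.g. hDelta = TotCapLS / AirMassFlow).
void guardMSHPFlows(MSFlowState &s) {
    if (s.AirMassFlow > 0.0 &&
        s.MSHPMassFlowRateLow == 0.0 && s.MSHPMassFlowRateHigh == 0.0) {
        // Parent failed to set the speed flows: fall back to the inlet flow
        // so later divisions are well-defined.
        s.MSHPMassFlowRateLow  = s.AirMassFlow;
        s.MSHPMassFlowRateHigh = s.AirMassFlow;
    }
}
```

As the next comment notes, the real fix also needs to address why the parent (SetOnOffMassFlowRate or similar) left these at zero in the first place; this guard only prevents the divide by zero.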

@rraustad
Contributor

MSHPMassFlowRateLow and High need to be set by the parent each time the child is called if the speed changes. If these are 0 while flow is non-zero, that's a bug. This is done in SetOnOffMassFlowRate or some similar routine (e.g. SetVSHPAirFlow in Furnaces).

@rraustad
Contributor

CompOn = 0? This subroutine doesn't act on CompOn? It should have bypassed the calculations to a coil-off condition if CompOn is actually used for this child.

mjwitte added a commit that referenced this issue Mar 1, 2015
… assignments for MSHPMassFlowRateLow and MSHPMassFlowRateHigh to prevent divide by zero in CalcMultiSpeedDXCoilCooling/Heating.
@rraustad
Contributor

rraustad commented Mar 3, 2015

How did you find this? Somehow break when NAN is first found? I'd hate to think it was single stepping through the code.

@mjwitte
Contributor Author

mjwitte commented Mar 3, 2015

@rraustad Well, you had already started pointing in the right direction, but yes it took a while. There was a lot of single-stepping, adding watches, and adding breaks and conditional breaks, then working back up the logic to add earlier breaks. I wish I knew how to break on NAN - is that possible?

@DeadParrot
Contributor

@mjwitte The debug build on Linux should be trapping NANs with the feenableexcept setup and the -fsignaling-nans. That should let you backtrace in gdb to find the cause.

If you are on VC then someone needs to add the equivalent _controlfp setup.
