Geof Sawaya edited this page May 21, 2014 · 4 revisions

This page tells the story of the evolving performance of EnergyPlus.

Here we'll describe the internal performance characteristics of simulations at various commit points in the develop branch.

We can consider various models, especially those that highlight a feature of EnergyPlus that we're working to make faster.

The details included are profiler output and raw execution time.


This section describes the performance of the model prj10. This model takes considerable time to simulate and has many surfaces (60k). It is especially interesting with regard to interior radiant exchange and solar shading.

Code

This version is where the following optimizations are applied to the initial C++ release:

  • a rewrite of WriteSurfaceShadowing, using techniques of loop combining and eliminating redundant checks
  • a rewrite of CalcScriptF, replacing the CalcMatrixInverse method for solving the linear system with an external library call (which uses an LU decomposition solving method)
  • parallelization of CalcScriptF (including calls to CalcScriptF)
  • some data structure optimization (for cache performance improvement) -- extracting certain data items from existing (large) structures to form smaller structures, and promoting others to their own arrays (i.e. AOS, or array of structures, to SOA, or structure of arrays)
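
The AOS-to-SOA change in the last bullet can be illustrated with a minimal sketch. The record type and field names below (`SurfaceAoS`, `area`, `temperature`) are hypothetical and are not taken from the EnergyPlus source; the sketch only shows the general idea of promoting frequently used fields from a large structure into their own contiguous arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical surface record (array-of-structures layout). The field
// names are illustrative only, not EnergyPlus data structures.
struct SurfaceAoS {
    std::string name;     // rarely needed in inner loops
    double area;          // used in the hot loop
    double temperature;   // used in the hot loop
    double vertices[12];  // rarely needed; makes each record large
};

// AOS traversal: each iteration pulls a whole large record through the
// cache even though only two doubles are read.
double weightedTempAoS(const std::vector<SurfaceAoS>& surfaces) {
    double sum = 0.0;
    for (const SurfaceAoS& s : surfaces) sum += s.area * s.temperature;
    return sum;
}

// SOA layout: the hot fields are promoted to their own contiguous arrays.
struct SurfacesSoA {
    std::vector<double> area;
    std::vector<double> temperature;
};

// Extract the hot fields from the existing records.
SurfacesSoA toSoA(const std::vector<SurfaceAoS>& surfaces) {
    SurfacesSoA out;
    out.area.reserve(surfaces.size());
    out.temperature.reserve(surfaces.size());
    for (const SurfaceAoS& s : surfaces) {
        out.area.push_back(s.area);
        out.temperature.push_back(s.temperature);
    }
    return out;
}

// SOA traversal: only the two arrays that are actually used are streamed.
double weightedTempSoA(const SurfacesSoA& s) {
    assert(s.area.size() == s.temperature.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < s.area.size(); ++i)
        sum += s.area[i] * s.temperature[i];
    return sum;
}
```

Both traversals compute the same result; the SOA version simply touches less memory per surface, which is where a cache benefit would be expected for a model with tens of thousands of surfaces.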

gprof dot call graph

Here's the gprof data:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 38.67   2290.72  2290.72 6847570739     0.00     0.00  ObjexxFCL::Fstring::has_prefix(ObjexxFCL::Fstring const&, bool) const
 13.14   3069.16   778.44  2414833     0.00     0.00  EnergyPlus::HVACDXHeatPumpSystem::GetDXHeatPumpSystemInput()
  9.95   3658.58   589.42 13105005332     0.00     0.00  EnergyPlus::SolarShading::ComputeIntSolarAbsorpFactors()
  4.49   3924.61   266.03  2974154     0.00     0.00  EnergyPlus::SolarShading::DeterminePolygonOverlap(int, int, int)
  4.11   4168.05   243.44                             EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
  3.53   4377.29   209.24 7918841234     0.00     0.00  ObjexxFCL::Fstring::lstrip_whitespace()
  3.24   4569.48   192.19   703666     0.00     0.00  EnergyPlus::HVACStandAloneERV::GetStandAloneERV()
  2.82   4736.80   167.32 7839344897     0.00     0.00  ObjexxFCL::Fstring::Fstring(unsigned long, ObjexxFCL::Fstring const&)
  2.23   4868.90   132.10 3534730154     0.00     0.00  ObjexxFCL::operator<=(ObjexxFCL::Fstring const&, std::string const&)
  1.33   4947.61    78.71 12656954825     0.00     0.00  ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::AirflowNetworkBalanceManager::AirflowNetworkReportVars>, double, 1>::~MArrayR()
  1.32   5025.74    78.13                             EnergyPlus::SolarShading::HTRANS(int, int, int)
  0.98   5084.03    58.29                             EnergyPlus::InputProcessor::GetNumSectionsinInput()
  0.97   5141.56    57.53 3972206288     0.00     0.00  ObjexxFCL::FArray1D<EnergyPlus::IceThermalStorage::DetailedIceStorageData>::clear()
  0.88   5193.97    52.41       14     0.00     0.01  EnergyPlus::InputProcessor::GetListofSectionsinInput(ObjexxFCL::FArray1S<ObjexxFCL::Fstring>, int&)
  0.84   5243.52    49.55                             ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::DataAirflowNetwork::AirflowNetworkLinkSimuData>, double, 1>::~MArrayR()
  0.78   5289.47    45.95 3660157438     0.00     0.00  EnergyPlus::SolarShading::CLIP(int, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
  0.71   5331.28    41.81 509168817     0.00     0.00  ObjexxFCL::operator<(ObjexxFCL::Fstring const&, std::string const&)
  0.67   5370.98    39.70 18894861285     0.00     0.00  ObjexxFCL::FArray1D<int>::operator=(ObjexxFCL::FArray1D<int> const&)
  0.65   5409.22    38.24 365951889     0.00     0.00  ObjexxFCL::FArray1D<EnergyPlus::SolarReflectionManager::SolReflRecSurfData>::clear()
  0.60   5444.58    35.37                             ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::DataAirflowNetwork::AirflowNetworkLinkReportData>, double, 1>::~MArrayR()
  0.57   5478.33    33.75 3146933334     0.00     0.00  EnergyPlus::SolarShading::CHKGSS(int, int, double, bool&)
  0.57   5511.92    33.59 3660161018     0.00     0.00  EnergyPlus::SolarShading::CalcInteriorSolarDistribution()
  0.52   5542.61    30.69        1     0.03     2.27  EnergyPlus::SolarShadi

As this profile shows, WriteSurfaceShadowing and CalcInteriorRadExchange have dropped out of the top of the flat profile; string operations and SolarShading now dominate the execution time.

This gave a 1.57x improvement in elapsed time over the initial C++ code.

  • Execution time: 2:55:36 real, 12586.21 user, 11.37 sys
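
The speedup figures on this page appear to be ratios of elapsed (wall-clock) times. As a small check, the sketch below converts the `h:mm:ss` values reported on this page to seconds and divides them. The helper names `toSeconds` and `speedup` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Convert an elapsed time written as "h:mm:ss" into seconds.
int toSeconds(const std::string& hms) {
    int h = 0, m = 0, s = 0;
    const int fields = std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s);
    assert(fields == 3);  // expect exactly hours, minutes, seconds
    (void)fields;         // avoid an unused-variable warning under NDEBUG
    return h * 3600 + m * 60 + s;
}

// Speedup of a run relative to a reference run (both as "h:mm:ss").
double speedup(const std::string& reference, const std::string& run) {
    return static_cast<double>(toSeconds(reference)) / toSeconds(run);
}
```

With the initial C++ translation at 4:35:45 (16545 s) and this version at 2:55:36 (10536 s), `speedup("4:35:45", "2:55:36")` is about 1.57. Note that the user time of this run (12586.21 s) exceeds its elapsed time, which is consistent with the parallelization listed above, so comparing user times would give a different ratio.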

Code

This milestone is where std::string was introduced to replace the fixed length string objects.

gprof dot call graph

Top of the gprof output file:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 25.36   1538.32  1538.32        1     1.54     1.54  EnergyPlus::OutputReportTabular::WriteSurfaceShadowing()
 13.58   2361.93   823.61      320     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcMatrixInverse(ObjexxFCL::FArray2S<double>, ObjexxFCL::FArray2S<double>)
 12.47   3118.25   756.32     4859     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcInteriorRadExchange(ObjexxFCL::FArray1S<double>, int, ObjexxFCL::FArray1S<double>, ObjexxFCL::Optional<int const, void>, ObjexxFCL::Optional<std::string, void>)
  8.41   3628.06   509.81 112717320     0.00     0.00  ObjexxFCL::FArray1S<double>& ObjexxFCL::FArray1S<double>::operator-=<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
  7.67   4093.20   465.14      400     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcScriptF(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>)
  6.19   4468.64   375.44 3660157438     0.00     0.00  EnergyPlus::SolarShading::CLIPPOLY(int, int, int, int, int&)
  4.24   4725.95   257.31  1861134     0.00     0.00  EnergyPlus::SolarShading::SHDGSS(int, int, int, int, int, int)
  3.50   4938.40   212.45 3149207309     0.00     0.00  EnergyPlus::SolarShading::HTRANS1(int, int)
  3.00   5120.31   181.91 3148800670     0.00     0.00  EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
  3.00   5302.18   181.87 6806997353     0.00     0.00  EnergyPlus::SolarShading::HTRANS0(int, int)
  1.95   5420.25   118.07     1158     0.00     0.00  EnergyPlus::CalcHeatBalanceInsideSurf(ObjexxFCL::Optional<int const, void>)
  1.24   5495.30    75.05 299621174     0.00     0.00  EnergyPlus::HeatBalanceMovableInsulation::EvalInsideMovableInsulation(int, double&, double&)
  1.17   5566.02    70.72 3146933334     0.00     0.00  EnergyPlus::SolarShading::CLIP(int, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
  0.83   5616.27    50.25 978279264     0.00     0.00  EnergyPlus::InputProcessor::SameString(std::string const&, std::string const&)
  0.64   5654.88    38.61 18887203420     0.00     0.00  ObjexxFCL::FArray1S<double>::FArray1S<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
  0.63   5692.99    38.11 3660161018     0.00     0.00  EnergyPlus::SolarShading::DeterminePolygonOverlap(int, int, int)
  0.56   5727.20    34.21 365708252     0.00     0.00  EnergyPlus::SolarShading::CHKGSS(int, int, double, bool&)
  0.51   5758.21    31.01    45850     0.00     0.00  EnergyPlus::InputProcessor::GetObjectItem(std::string const&, int, ObjexxFCL::FArray1S<std::string>, int&, ObjexxFCL::FArray1S<double>, int&, int&, ObjexxFCL::Optional<ObjexxFCL::FArray1<bool>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<bool>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<std::string>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<std::string>, void>)
  0.46   5786.10    27.89        1     0.03     0.06  EnergyPlus::SolarShading::DetermineShadowingCombinations()
  0.44   5812.51    26.41     1158     0.00     0.00  EnergyPlus::HeatBalanceSurfaceManager::UpdateThermalHistories()
  0.30   5830.94    18.43        1     0.02     0.06  EnergyPlus::SurfaceGeometry::GetSurfaceData(bool&)
  0.25   5845.95    15.01       80     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::FixViewFactors(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, int, double&, double&, double&, int&, double&)

Just scanning the gprof output shows that the len_trim problem has gone away. The run is now much closer to the baseline, both in its profile and in execution time.

The improvement over the initial C++ version is 2.13x!

  • Execution time: 2:09:39 real, 7751.81 user, 6.80 sys

Code

This is the first code from the C++ translation.

gprof dot call graph

Top of the gprof output file:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 27.56   2549.74  2549.74 6841883584     0.00     0.00  ObjexxFCL::Fstring::len_trim() const
 12.39   3695.44  1145.70        1     1.15     1.18  EnergyPlus::OutputReportTabular::WriteSurfaceShadowing()
  8.90   4518.48   823.04      320     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcMatrixInverse(ObjexxFCL::FArray2S<double>, ObjexxFCL::FArray2S<double>)
  8.45   5300.38   781.90     4859     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcInteriorRadExchange(ObjexxFCL::FArray1S<double>, int, ObjexxFCL::FArray1S<double>, ObjexxFCL::Optional<int const, void>, ObjexxFCL::Optional<ObjexxFCL::Fstring, void>)
  5.53   5811.70   511.32 112717320     0.00     0.00  ObjexxFCL::FArray1S<double>& ObjexxFCL::FArray1S<double>::operator-=<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
  5.02   6275.70   464.00      400     0.00     0.00  EnergyPlus::HeatBalanceIntRadExchange::CalcScriptF(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>)
  4.07   6652.27   376.57 3660157438     0.00     0.00  EnergyPlus::SolarShading::CLIPPOLY(int, int, int, int, int&)
  3.07   6935.94   283.67  1861134     0.00     0.00  EnergyPlus::SolarShading::SHDGSS(int, int, int, int, int, int)
  2.41   7158.72   222.78 3149207309     0.00     0.00  EnergyPlus::SolarShading::HTRANS1(int, int)
  2.19   7361.41   202.69 7918649184     0.00     0.00  ObjexxFCL::operator==(ObjexxFCL::Fstring const&, ObjexxFCL::Fstring const&)
  2.16   7561.37   199.96     1158     0.00     0.00  EnergyPlus::CalcHeatBalanceInsideSurf(ObjexxFCL::Optional<int const, void>)
  2.01   7747.50   186.13 3148800670     0.00     0.00  EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
  1.92   7925.36   177.86 6806997353     0.00     0.00  EnergyPlus::SolarShading::HTRANS0(int, int)
  1.89   8100.30   174.94 3954143273     0.00     0.00  EnergyPlus::InputProcessor::MakeUPPERCase(ObjexxFCL::Fstring const&)
  1.80   8266.52   166.22 3528901221     0.00     0.00  ObjexxFCL::Fstring::operator()(unsigned long, unsigned long) const
  1.61   8415.59   149.07 12650164270     0.00     0.00  ObjexxFCL::Fstring::~Fstring()
  1.49   8553.05   137.46 7838745677     0.00     0.00  ObjexxFCL::Fstring::reassign(ObjexxFCL::Fstring const&)
  0.65   8613.07    60.02 3146933334     0.00     0.00  EnergyPlus::SolarShading

The profile is very similar to the baseline (Fortran 8.0), but with a few extra entries. The major issue is the first item in the flat profile, ObjexxFCL::Fstring::len_trim(). This demonstrates the problem with fixed-length strings. They also had a performance impact in the Fortran version, because most operations on fixed-length strings require searching for their length (sans padding spaces).
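
To make the cost concrete, here is a minimal sketch of a blank-padded fixed-length string. `FixedString` and `equalFixed` are hypothetical stand-ins written for this page, not the ObjexxFCL `Fstring` implementation; they only illustrate why every comparison ends up scanning for the trimmed length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a Fortran-style fixed-length string: the
// buffer always holds exactly `width` characters, padded with blanks.
struct FixedString {
    std::string buf;

    FixedString(std::size_t width, const std::string& text)
        : buf(text.substr(0, width)) {
        buf.resize(width, ' ');  // pad to the fixed width
    }

    // The logical length is not stored, so it has to be rediscovered by
    // scanning backwards over the padding: O(width) on every call.
    std::size_t len_trim() const {
        std::size_t n = buf.size();
        while (n > 0 && buf[n - 1] == ' ') --n;
        return n;
    }
};

// Comparing two fixed-length strings ignores trailing padding, so each
// comparison pays for two len_trim scans before looking at any content.
bool equalFixed(const FixedString& a, const FixedString& b) {
    const std::size_t la = a.len_trim();
    const std::size_t lb = b.len_trim();
    return la == lb && a.buf.compare(0, la, b.buf, 0, lb) == 0;
}
```

A `std::string` stores its length, so `size()` is constant time and `operator==` can reject strings of different lengths without scanning any padding. With billions of calls, as in the profile above, the repeated scans add up.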

You can also see that the execution time is over double that of the baseline.

  • Execution time: 4:35:45 real, 16486.17 user, 10.09 sys

This run serves as a performance baseline.
gprof dot call graph

Here are the top lines from the gprof flat profile:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ks/call  Ks/call  name    
 31.77   1685.43  1685.43        1     1.69     1.69  __outputreporttabular_MOD_writesurfaceshadowing
 17.91   2635.64   950.21      320     0.00     0.00  __heatbalanceintradexchange_MOD_calcmatrixinverse
 10.97   3217.37   581.73     4859     0.00     0.00  __heatbalanceintradexchange_MOD_calcinteriorradexchange
  9.58   3725.51   508.14 3660157438     0.00     0.00  __solarshading_MOD_clippoly
  5.15   3998.57   273.06  1861134     0.00     0.00  __solarshading_MOD_shdgss
  3.79   4199.41   200.84 3148800670     0.00     0.00  __solarshading_MOD_ctrans
  3.62   4391.70   192.29     1158     0.00     0.00  calcheatbalanceinsidesurf_
  2.82   4541.42   149.72 3149207309     0.00     0.00  __solarshading_MOD_htrans1
  2.66   4682.26   140.84 6806997353     0.00     0.00  __solarshading_MOD_htrans0
  2.09   4793.23   110.97 2637023953     0.00     0.00  __inputprocessor_MOD_makeuppercase
  1.22   4858.17    64.94   242717     0.00     0.00  __outputreportpredefined_MOD_incrementtableentry
  1.20   4921.93    63.76 3146933334     0.00     0.00  __solarshading_MOD_clip
  0.88   4968.41    46.48        1     0.05     0.08  __solarshading_MOD_determineshadowingcombinations
  0.76   5008.89    40.48 3660161018     0.00     0.00  __solarshading_MOD_determinepolygonoverlap

You can see from the profile that the top processor cycle consumers are WriteSurfaceShadowing, CalcInteriorRadExchange, and CalcMatrixInverse (which is called from the routine CalcScriptF).
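
CalcMatrixInverse is the routine that the optimized version at the top of this page replaces with an LU-based solve from an external library. The sketch below is a hand-written stand-in for that idea, not the library call and not the EnergyPlus code: it solves A x = b directly by LU decomposition with partial pivoting instead of forming the inverse of A first. The names `Matrix` and `solveLU` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b by LU decomposition with partial pivoting.
// A and b are taken by value and overwritten: after elimination, the
// strict lower triangle of `a` holds L (unit diagonal implied) and the
// upper triangle holds U.
std::vector<double> solveLU(Matrix a, std::vector<double> b) {
    const std::size_t n = a.size();
    assert(b.size() == n);

    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: pick the largest entry in column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i][k]) > std::fabs(a[p][k])) p = i;
        assert(a[p][k] != 0.0);  // this sketch does not handle singular A
        std::swap(a[k], a[p]);
        std::swap(b[k], b[p]);

        // Eliminate below the pivot, storing the multipliers in place.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i][k] /= a[k][k];
            for (std::size_t j = k + 1; j < n; ++j)
                a[i][j] -= a[i][k] * a[k][j];
        }
    }

    // Forward substitution: L y = P b.
    for (std::size_t i = 1; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j) b[i] -= a[i][j] * b[j];

    // Back substitution: U x = y.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j) b[i] -= a[i][j] * b[j];
        b[i] /= a[i][i];
    }
    return b;
}
```

For example, solving 2x + y = 3 and x + 3y = 5 with `solveLU({{2, 1}, {1, 3}}, {3, 5})` gives x = 0.8 and y = 1.4 up to rounding. Solving directly avoids computing and then multiplying by a full inverse, which is typically both slower and less accurate for large systems.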

  • Execution time: 2:01:54 real, 7252.78 user, 30.16 sys