Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trial tracemalloc as memory tracker (replacement v2 PR) #5946

Closed
wants to merge 4 commits into from

Conversation

pp-mo
Copy link
Member

@pp-mo pp-mo commented May 9, 2024

WIP DO NOT MERGE THIS, EVER

Some modified code for memory benchmarks, evaluating "tracemalloc" against our existing Linux process-RSS measuring technique.

Additional context : sample output from a desktop run

$ asv run -b Memcheck* --quick -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python 
Couldn't load asv.plugins._mamba_helpers because
No module named 'libmambapy'
· Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[12.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                                 ok
[12.50%] ··· ========= ========== ========= =========
             --                   nworkers           
             --------- ------------------------------
              nblocks      1          4         9    
             ========= ========== ========= =========
                 1      568±0ms    543±0ms   514±0ms 
                 4      144±0ms    181±0ms   151±0ms 
                 9      55.9±0ms   116±0ms   109±0ms 
             ========= ========== ========= =========

[25.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                                         ok
[25.00%] ··· ========= ======= ======= =======
             --                nworkers       
             --------- -----------------------
              nblocks     1       4       9   
             ========= ======= ======= =======
                 1      305.3   305.3   305.3 
                 4       19.2    76.4    76.4 
                 9       4.0     15.2    33.9 
             ========= ======= ======= =======

[37.50%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                                        ok
[37.50%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc       91.2±0ms          228±0ms            40.2±0ms            82.4±0ms     
                  rss           87.0±0ms          222±0ms            45.5±0ms            81.5±0ms     
             ============= ================= ================= =================== ===================

[50.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                                ok
[50.00%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc         17.0              67.9               0.2                 0.2        
                  rss             9.1               34.4               0.9                 0.9        
             ============= ================= ================= =================== ===================

@pp-mo
Copy link
Member Author

pp-mo commented May 9, 2024

What I think we can deduce from the benchmark results

  1. neither "rss" or "tracemalloc" tracks memory in subprocesses
    • compare MemcheckRunstyles.track_addmem_calc , any "threads" with "processes"
  2. "rss" method is not picking up on memory used in multiple threads, but "tracemalloc" is
    • compare MemcheckRunstyles.track_addmem_calc , "threads" results ..
    • .. these are ~2 times as big for "tracemalloc" as for "rss" (this for nblocks=6, nworkers=4)

@pp-mo pp-mo reopened this May 9, 2024
Copy link

codecov bot commented May 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.78%. Comparing base (a56f939) to head (84ff4ad).
Report is 9 commits behind head on main.

Current head 84ff4ad differs from pull request most recent head 8f7166e

Please upload reports for the commit 8f7166e to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5946   +/-   ##
=======================================
  Coverage   89.78%   89.78%           
=======================================
  Files          93       93           
  Lines       23007    23007           
  Branches     5017     5017           
=======================================
  Hits        20657    20657           
  Misses       1620     1620           
  Partials      730      730           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pp-mo
Copy link
Member Author

pp-mo commented May 9, 2024

Updated ideas

@trexfeathers pointed out that the above were run on a machine with only 4 CPUs (hyperthread), hence nworkers=4/9 can be expected to be questionable.
Also a bit difficult to interpret -- questioned whether the nworkers/nblocks axes were correct !

Hence changes in d039ad0

  • n_workers defaults to 3
  • n_workers = [1, 2, 3, 4] in BlocksAndWorkers testing
  • MemcheckBlocksAndWorkers_Rss repeats the n/n matrix with the Rss measure

New results

typical run ...

$ asv run -b Memcheck --quick -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
---
    Couldn't load asv.plugins._mamba_helpers because
    No module named 'libmambapy'
    · Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
    · Discovering benchmarks
    · Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
    [ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
    [ 0.00%] ··· Importing benchmark suite produced output:
    [ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
                  warn(message)

    [ 8.33%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                                 ok
    [ 8.33%] ··· ========= ========== ========== ========= =========
                 --                         nworkers
                 --------- -----------------------------------------
                  nblocks      1          2          3         4
                 ========= ========== ========== ========= =========
                     1      546±0ms    508±0ms    518±0ms   517±0ms
                     4      120±0ms    156±0ms    151±0ms   139±0ms
                     9      54.6±0ms   71.2±0ms   141±0ms   122±0ms
                 ========= ========== ========== ========= =========

    [16.67%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                                         ok
    [16.67%] ··· ========= ======= ======= ======= =======
                 --                    nworkers
                 --------- -------------------------------
                  nblocks     1       2       3       4
                 ========= ======= ======= ======= =======
                     1      305.3   305.3   305.3   305.3
                     4       19.2    38.3    57.3    76.4
                     9       4.0     7.7     11.5    15.2
                 ========= ======= ======= ======= =======

    [25.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc                                                                                                                             ok
    [25.00%] ··· ========= ========== ========== ========== ==========
                 --                          nworkers
                 --------- -------------------------------------------
                  nblocks      1          2          3          4
                 ========= ========== ========== ========== ==========
                     1      501±0ms    518±0ms    570±0ms    597±0ms
                     4      121±0ms    150±0ms    132±0ms    161±0ms
                     9      52.4±0ms   67.5±0ms   64.3±0ms   64.0±0ms
                 ========= ========== ========== ========== ==========

    [33.33%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc                                                                                                                     ok
    [33.33%] ··· ========= ======= ======= ======= =======
                 --                    nworkers
                 --------- -------------------------------
                  nblocks     1       2       3       4
                 ========= ======= ======= ======= =======
                     1      305.8   305.8   305.8   305.8
                     4       19.8    38.9    20.0    20.8
                     9       4.6     8.0     11.6    15.3
                 ========= ======= ======= ======= =======

    [41.67%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                                        ok
    [41.67%] ··· ============= ================= ================= =================== ===================
                 --                                          runtype / ysize
                 ------------- ---------------------------------------------------------------------------
                    measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
                 ============= ================= ================= =================== ===================
                  tracemalloc       97.5±0ms          208±0ms            39.9±0ms            94.1±0ms
                      rss           49.5±0ms          203±0ms            51.5±0ms            91.6±0ms
                 ============= ================= ================= =================== ===================

    [50.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                                ok
    [50.00%] ··· ============= ================= ================= =================== ===================
                 --                                          runtype / ysize
                 ------------- ---------------------------------------------------------------------------
                    measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
                 ============= ================= ================= =================== ===================
                  tracemalloc         12.8              51.0               0.2                 0.2
                      rss             13.2              51.1               0.8                 0.8
                 ============= ================= ================= =================== ===================

Some interpretation

RunStyle checks are now more understandable.

MemcheckRunstyles.time_calc (run **WITHOUT** --quick)
      --                                          runtype / ysize
      ------------- ---------------------------------------------------------------------------
         measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
      ============= ================= ================= =================== ===================
       tracemalloc       58.2±8ms          179±5ms            44.3±1ms            86.4±3ms
           rss           71.1±6ms          171±8ms            43.7±3ms            91.2±3ms
      ============= ================= ================= =================== ===================

BUT N.B. results are still not very stable from time to time

      ============= ================= ================= =================== ===================
       tracemalloc       62.9±5ms          172±10ms           41.8±4ms            89.1±6ms
           rss           62.3±7ms          173±9ms            49.1±4ms           91.3±0.9ms
      ============= ================= ================= =================== ===================
       tracemalloc       58.2±3ms          152±20ms           44.9±5ms            91.8±5ms
           rss          58.7±10ms          170±7ms            46.1±2ms            91.6±7ms
      ============= ================= ================= =================== ===================

((but generally, ARE about the same for both methods, as we would expect))
conclude :

  • ~same for methods
  • processes slightly quicker than threads generally (??odd?)

MemcheckBlocksAndWorkers

General picture

memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc
     --                         nworkers
     --------- -----------------------------------------
      nblocks      1          2          3         4
     ========= ========== ========== ========= =========
         1      546±0ms    508±0ms    518±0ms   517±0ms
         4      120±0ms    156±0ms    151±0ms   139±0ms
         9      54.6±0ms   71.2±0ms   141±0ms   122±0ms
     ========= ========== ========== ========= =========

memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc
     --                    nworkers
     --------- -------------------------------
      nblocks     1       2       3       4
     ========= ======= ======= ======= =======
         1      305.3   305.3   305.3   305.3
         4       19.2    38.3    57.3    76.4
         9       4.0     7.7     11.5    15.2
     ========= ======= ======= ======= =======

MEMORY now makes more sense : can see memory reducing with nblocks, and increasing with nworkers
TIME is still weird -- always seems to take longer with more workers ??

compare methods

memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc
     --                    nworkers
     --------- -------------------------------
      nblocks     1       2       3       4
     ========= ======= ======= ======= =======
         1      305.8   305.8   305.8   305.8
         4       19.8    38.9    20.0    20.8
         9       4.6     8.0     11.6    15.3
     ========= ======= ======= ======= =======

conclude

  • results are mostly similar,
  • ... except RSS tends to under-read ..
  • .. and EXCEPT for nblocks=4
    * TYPICAL (see e.g. below)
/nworkers 1 2 3 4
tracemalloc 19.2 38.3 57.3 76.4
rss 19.8 38.9 20.0 20.8
  • POSSIBLY
    • RSS shows re-use of blocks within the operation ?
    • .. or RSS "misses" some pages ?
  • generally, it is NOT like RSS is only tracking 1 worker
    • since, with many blocks, mem-usage generally grows with nworkers, just as for tracemalloc

@pp-mo
Copy link
Member Author

pp-mo commented May 9, 2024

Outcome

  • methods are ~equivalent in scope,
    • both can measure across threads
    • neither can measure across processes
  • but tracemalloc is probably more reliable
    • rss results are less consistent on repeat
    • rss seems to under-read in some cases

@pp-mo pp-mo closed this May 9, 2024
@pp-mo pp-mo mentioned this pull request May 10, 2024
nblocks = params["nblocks"]
nworkers = params["nworkers"]

nyfull = ysize // nblocks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're applying this division twice, I think that's why adding blocks has such an exagerated effect on performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes I see !
Hopefully that may explain the peculiar behaviour of the timings too.

I will fix this and re-investigate ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

84ff4ad fixes, I think

@pp-mo pp-mo reopened this May 16, 2024
@pp-mo pp-mo added the benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts label May 16, 2024
@pp-mo
Copy link
Member Author

pp-mo commented May 16, 2024

Full results, from new code introduced in 84ff4ad
from $ asv run -b "Memcheck*" -e -E existing:<iris-test-python-path>/bin/python
also with export DATA_GEN_PYTHON=<iris-test-python-path>/bin/python
and all the numpy threading control settings given here in the Iris 'Dask Best Practice' section

$ asv run -b "Memcheck*" -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
· Discovering benchmarks
· Running 8 total benchmarks (1 commits * 1 environments * 8 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /tmp/persistent/git/iris/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /tmp/persistent/git/iris/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[ 6.25%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc--).
[18.75%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc--).
[31.25%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc--)..
[56.25%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                       ok
[56.25%] ··· ========= ========== ========== ========= =========
             --                         nworkers                
             --------- -----------------------------------------
              nblocks      1          2          3         4    
             ========= ========== ========== ========= =========
                 1      531±5ms    538±8ms    532±7ms   531±6ms 
                 4      535±6ms    518±9ms    518±8ms   512±8ms 
                 9      539±10ms   514±10ms   515±7ms   520±7ms 
             ========= ========== ========== ========= =========

[62.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                               ok
[62.50%] ··· ========= ======= ======= ======= =======
             --                    nworkers           
             --------- -------------------------------
              nblocks     1       2       3       4   
             ========= ======= ======= ======= =======
                 1      305.3   305.3   305.3   305.3 
                 4       76.4   152.7   229.0   305.3 
                 9       34.1    68.0   101.9   135.8 
             ========= ======= ======= ======= =======

[68.75%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc                                                                                                                   ok
[68.75%] ··· ========= ========== ========== ========== ==========
             --                          nworkers                 
             --------- -------------------------------------------
              nblocks      1          2          3          4     
             ========= ========== ========== ========== ==========
                 1      547±20ms   545±20ms   563±20ms   555±20ms 
                 4      562±20ms   528±20ms   526±20ms   524±20ms 
                 9      561±20ms   528±20ms   529±20ms   516±9ms  
             ========= ========== ========== ========== ==========

[75.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc                                                                                                           ok
[75.00%] ··· ========= ======= ======= ======= =======
             --                    nworkers           
             --------- -------------------------------
              nblocks     1       2       3       4   
             ========= ======= ======= ======= =======
                 1      305.8   305.8   305.8   305.8 
                 4       77.0    83.6    82.3    82.3 
                 9       34.5    37.1    36.9    36.9 
             ========= ======= ======= ======= =======

[81.25%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc                                                                                                             ok
[81.25%] ··· ========= ========= ========= ========= =========
             --                        nworkers               
             --------- ---------------------------------------
              nblocks      1         2         3         4    
             ========= ========= ========= ========= =========
                 1      546±8ms   546±9ms   550±6ms   554±7ms 
                 4      544±5ms   283±2ms   287±3ms   178±4ms 
                 9      547±4ms   313±2ms   203±5ms   208±3ms 
             ========= ========= ========= ========= =========

[87.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc                                                                                                     ok
[87.50%] ··· ========= ===== ===== ===== =====
             --                nworkers       
             --------- -----------------------
              nblocks    1     2     3     4  
             ========= ===== ===== ===== =====
                 1      0.1   0.1   0.1   0.1 
                 4      0.2   0.2   0.2   0.2 
                 9      0.3   0.3   0.3   0.3 
             ========= ===== ===== ===== =====

[93.75%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                              ok
[93.75%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc       190±4ms          1.03±0.02s          117±2ms             389±4ms      
                  rss           190±4ms           1.04±0s            118±1ms             385±5ms      
             ============= ================= ================= =================== ===================

[100.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                      ok
[100.00%] ··· ============= ================= ================= =================== ===================
              --                                          runtype / ysize                              
              ------------- ---------------------------------------------------------------------------
                 measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
              ============= ================= ================= =================== ===================
               tracemalloc         76.4             305.3               0.2                 0.2        
                   rss             76.6             109.3               0.8                 0.4        
              ============= ================= ================= =================== ===================

@pp-mo
Copy link
Member Author

pp-mo commented May 16, 2024

Conclusions:
Since we fixed the task-size scaling (and added a process-based BlocksAndWorkers), the results are now roughly :

  • Runstyle checks show that old + new memory schemes deliver roughly the same numbers
    • both methods can track memory in thread workers, but not subprocesses
    • (we find that tracemalloc delivers more stable answers -- but this was not re-retested)
  • threaded workers : space required reduces with nblocks + increases with nworkers, as you would expect
  • threaded workers : time taken is ~constant : this seems to indicate that due to Python GIL requirements, they don't actually run in parallel
  • parallel workers : time taken scales as expected (they do run in parallel, and task division gives speedups)

Copy link
Contributor

⏱️ Performance Benchmark Report: a3a20b5

Performance shifts

Full benchmark results

Benchmarks that have stayed the same:

| Change   | Before [95b7ffe5]    | After [a3a20b56]    | Ratio   | Benchmark (Parameter)                                                                                |
|----------|----------------------|---------------------|---------|------------------------------------------------------------------------------------------------------|
|          | 1.12±0.01ms          | 1.12±0.02ms         | 1.00    | cube.CubeCreation.time_create(False, 'construct')                                                    |
|          | 408±5μs              | 409±7μs             | 1.00    | cube.CubeCreation.time_create(False, 'instantiate')                                                  |
|          | 949±10μs             | 960±10μs            | 1.01    | cube.CubeCreation.time_create(True, 'construct')                                                     |
|          | 583±10μs             | 580±9μs             | 0.99    | cube.CubeCreation.time_create(True, 'instantiate')                                                   |
|          | 223±4ms              | 219±4ms             | 0.98    | cube.CubeEquality.time_equality(False, False, 'all_equal')                                           |
|          | 111±2ms              | 111±1ms             | 1.00    | cube.CubeEquality.time_equality(False, False, 'coord_inequality')                                    |
|          | 230±3ms              | 231±2ms             | 1.01    | cube.CubeEquality.time_equality(False, False, 'data_inequality')                                     |
|          | 16.5±0.2μs           | 17.2±0.4μs          | 1.05    | cube.CubeEquality.time_equality(False, False, 'metadata_inequality')                                 |
|          | 306±2ms              | 298±3ms             | 0.97    | cube.CubeEquality.time_equality(False, True, 'all_equal')                                            |
|          | 195±2ms              | 195±2ms             | 1.00    | cube.CubeEquality.time_equality(False, True, 'coord_inequality')                                     |
|          | 316±3ms              | 316±3ms             | 1.00    | cube.CubeEquality.time_equality(False, True, 'data_inequality')                                      |
|          | 17.0±0.3μs           | 17.0±0.2μs          | 1.00    | cube.CubeEquality.time_equality(False, True, 'metadata_inequality')                                  |
|          | 220±3ms              | 222±2ms             | 1.01    | cube.CubeEquality.time_equality(True, False, 'all_equal')                                            |
|          | 111±0.9ms            | 111±2ms             | 1.00    | cube.CubeEquality.time_equality(True, False, 'coord_inequality')                                     |
|          | 232±5ms              | 232±3ms             | 1.00    | cube.CubeEquality.time_equality(True, False, 'data_inequality')                                      |
|          | 54.2±0.9μs           | 53.7±0.3μs          | 0.99    | cube.CubeEquality.time_equality(True, False, 'metadata_inequality')                                  |
|          | 305±4ms              | 305±5ms             | 1.00    | cube.CubeEquality.time_equality(True, True, 'all_equal')                                             |
|          | 195±3ms              | 196±2ms             | 1.01    | cube.CubeEquality.time_equality(True, True, 'coord_inequality')                                      |
|          | 315±3ms              | 314±3ms             | 1.00    | cube.CubeEquality.time_equality(True, True, 'data_inequality')                                       |
|          | 55.0±0.9μs           | 55.4±0.7μs          | 1.01    | cube.CubeEquality.time_equality(True, True, 'metadata_inequality')                                   |
|          | 401±3ns              | 416±3ns             | 1.04    | experimental.ugrid.regions_combine.CombineRegionsComputeRealData.time_compute_data(50)               |
|          | 257±3ms              | 260±3ms             | 1.01    | experimental.ugrid.regions_combine.CombineRegionsComputeRealData.time_compute_data(500)              |
|          | 14.5±0.08ms          | 14.4±0.2ms          | 0.99    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.time_create_combined_cube(50)            |
|          | 16.4±0.3ms           | 16.2±0.1ms          | 0.98    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.time_create_combined_cube(500)           |
|          | 5.0                  | 5.0                 | 1.00    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.track_addedmem_create_combined_cube(50)  |
|          | 5.0                  | 5.0                 | 1.00    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.track_addedmem_create_combined_cube(500) |
|          | 104±0.7ms            | 103±1ms             | 0.99    | experimental.ugrid.regions_combine.CombineRegionsFileStreamedCalc.time_stream_file2file(50)          |
|          | 704±7ms              | 703±1ms             | 1.00    | experimental.ugrid.regions_combine.CombineRegionsFileStreamedCalc.time_stream_file2file(500)         |
|          | 66.1±2ms             | 66.3±0.8ms          | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.time_save(50)                              |
|          | 657±8ms              | 657±6ms             | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.time_save(500)                             |
|          | 2.1752849999999997   | 2.1752849999999997  | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.track_filesize_saved(50)                   |
|          | 216.01528499999998   | 216.01528499999998  | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.track_filesize_saved(500)                  |
|          | 660±4μs              | 660±9μs             | 1.00    | import_iris.Iris.time__concatenate                                                                   |
|          | 179±1μs              | 184±4μs             | 1.03    | import_iris.Iris.time__constraints                                                                   |
|          | 110±0.7μs            | 110±0.9μs           | 1.00    | import_iris.Iris.time__data_manager                                                                  |
|          | 94.3±0.5μs           | 94.0±0.8μs          | 1.00    | import_iris.Iris.time__deprecation                                                                   |
|          | 137±2μs              | 136±0.5μs           | 0.99    | import_iris.Iris.time__lazy_data                                                                     |
|          | 900±5μs              | 899±10μs            | 1.00    | import_iris.Iris.time__merge                                                                         |
|          | 78.4±0.6μs           | 78.3±0.6μs          | 1.00    | import_iris.Iris.time__representation                                                                |
|          | 491±3μs              | 487±2μs             | 0.99    | import_iris.Iris.time_analysis                                                                       |
|          | 142±2μs              | 144±2μs             | 1.01    | import_iris.Iris.time_analysis__area_weighted                                                        |
|          | 110±3μs              | 111±0.6μs           | 1.01    | import_iris.Iris.time_analysis__grid_angles                                                          |
|          | 243±2μs              | 245±1μs             | 1.01    | import_iris.Iris.time_analysis__interpolation                                                        |
|          | 185±2μs              | 189±2μs             | 1.02    | import_iris.Iris.time_analysis__regrid                                                               |
|          | 112±1μs              | 111±0.9μs           | 0.99    | import_iris.Iris.time_analysis__scipy_interpolate                                                    |
|          | 140±1μs              | 140±1μs             | 1.00    | import_iris.Iris.time_analysis_calculus                                                              |
|          | 331±4μs              | 331±2μs             | 1.00    | import_iris.Iris.time_analysis_cartography                                                           |
|          | 94.9±0.3μs           | 95.1±0.5μs          | 1.00    | import_iris.Iris.time_analysis_geomerty                                                              |
|          | 221±3μs              | 222±3μs             | 1.00    | import_iris.Iris.time_analysis_maths                                                                 |
|          | 98.3±0.4μs           | 98.5±1μs            | 1.00    | import_iris.Iris.time_analysis_stats                                                                 |
|          | 176±2μs              | 177±1μs             | 1.00    | import_iris.Iris.time_analysis_trajectory                                                            |
|          | 307±6μs              | 300±3μs             | 0.98    | import_iris.Iris.time_aux_factory                                                                    |
|          | 85.1±1μs             | 85.4±1μs            | 1.00    | import_iris.Iris.time_common                                                                         |
|          | 162±2μs              | 165±4μs             | 1.02    | import_iris.Iris.time_common_lenient                                                                 |
|          | 987±10μs             | 986±9μs             | 1.00    | import_iris.Iris.time_common_metadata                                                                |
|          | 133±1μs              | 133±1μs             | 1.00    | import_iris.Iris.time_common_mixin                                                                   |
|          | 1.17±0.01ms          | 1.18±0.01ms         | 1.00    | import_iris.Iris.time_common_resolve                                                                 |
|          | 201±2μs              | 202±2μs             | 1.01    | import_iris.Iris.time_config                                                                         |
|          | 116±0.9μs            | 116±0.7μs           | 1.00    | import_iris.Iris.time_coord_categorisation                                                           |
|          | 355±5μs              | 352±4μs             | 0.99    | import_iris.Iris.time_coord_systems                                                                  |
|          | 737±7μs              | 733±5μs             | 0.99    | import_iris.Iris.time_coords                                                                         |
|          | 658±10μs             | 656±8μs             | 1.00    | import_iris.Iris.time_cube                                                                           |
|          | 225±1μs              | 226±3μs             | 1.01    | import_iris.Iris.time_exceptions                                                                     |
|          | 78.8±0.5μs           | 77.8±1μs            | 0.99    | import_iris.Iris.time_experimental                                                                   |
|          | 189±1μs              | 189±1μs             | 1.00    | import_iris.Iris.time_fileformats                                                                    |
|          | 251±2μs              | 249±2μs             | 0.99    | import_iris.Iris.time_fileformats__ff                                                                |
|          | 2.71±0.02ms          | 2.69±0.02ms         | 0.99    | import_iris.Iris.time_fileformats__ff_cross_references                                               |
|          | 79.1±1μs             | 79.7±1μs            | 1.01    | import_iris.Iris.time_fileformats__pp_lbproc_pairs                                                   |
|          | 114±1μs              | 115±0.9μs           | 1.00    | import_iris.Iris.time_fileformats_abf                                                                |
|          | 348±4μs              | 349±2μs             | 1.00    | import_iris.Iris.time_fileformats_cf                                                                 |
|          | 5.31±0.06ms          | 5.40±0.06ms         | 1.02    | import_iris.Iris.time_fileformats_dot                                                                |
|          | 75.2±0.9μs           | 75.6±2μs            | 1.01    | import_iris.Iris.time_fileformats_name                                                               |
|          | 260±2μs              | 261±2μs             | 1.01    | import_iris.Iris.time_fileformats_name_loaders                                                       |
|          | 120±2μs              | 120±1μs             | 1.00    | import_iris.Iris.time_fileformats_netcdf                                                             |
|          | 121±1μs              | 122±1μs             | 1.01    | import_iris.Iris.time_fileformats_nimrod                                                             |
|          | 208±5μs              | 208±4μs             | 1.00    | import_iris.Iris.time_fileformats_nimrod_load_rules                                                  |
|          | 769±7μs              | 765±7μs             | 1.00    | import_iris.Iris.time_fileformats_pp                                                                 |
|          | 181±4μs              | 180±4μs             | 1.00    | import_iris.Iris.time_fileformats_pp_load_rules                                                      |
|          | 134±2μs              | 134±1μs             | 1.00    | import_iris.Iris.time_fileformats_pp_save_rules                                                      |
|          | 520±7μs              | 513±2μs             | 0.99    | import_iris.Iris.time_fileformats_rules                                                              |
|          | 221±2μs              | 219±1μs             | 0.99    | import_iris.Iris.time_fileformats_structured_array_identification                                    |
|          | 84.4±0.7μs           | 84.2±0.4μs          | 1.00    | import_iris.Iris.time_fileformats_um                                                                 |
|          | 161±0.7μs            | 161±2μs             | 1.00    | import_iris.Iris.time_fileformats_um__fast_load                                                      |
|          | 136±1μs              | 137±1μs             | 1.01    | import_iris.Iris.time_fileformats_um__fast_load_structured_fields                                    |
|          | 76.7±0.5μs           | 77.6±0.8μs          | 1.01    | import_iris.Iris.time_fileformats_um__ff_replacement                                                 |
|          | 82.7±1μs             | 82.4±2μs            | 1.00    | import_iris.Iris.time_fileformats_um__optimal_array_structuring                                      |
|          | 986±10μs             | 987±7μs             | 1.00    | import_iris.Iris.time_fileformats_um_cf_map                                                          |
|          | 137±2μs              | 137±0.9μs           | 1.00    | import_iris.Iris.time_io                                                                             |
|          | 171±2μs              | 172±2μs             | 1.01    | import_iris.Iris.time_io_format_picker                                                               |
|          | 203±3μs              | 209±2μs             | 1.03    | import_iris.Iris.time_iris                                                                           |
|          | 129±2μs              | 129±1μs             | 1.01    | import_iris.Iris.time_iterate                                                                        |
|          | 8.52±0.1ms           | 8.55±0.07ms         | 1.00    | import_iris.Iris.time_palette                                                                        |
|          | 339±6μs              | 342±2μs             | 1.01    | import_iris.Iris.time_plot                                                                           |
|          | 106±1μs              | 104±1μs             | 0.99    | import_iris.Iris.time_quickplot                                                                      |
|          | 2.07±0.01ms          | 2.08±0.01ms         | 1.00    | import_iris.Iris.time_std_names                                                                      |
|          | 1.78±0.01ms          | 1.77±0.01ms         | 0.99    | import_iris.Iris.time_symbols                                                                        |
|          | 22.6±1ms             | 23.1±1ms            | 1.03    | import_iris.Iris.time_tests                                                                          |
|          | 230±4μs              | 228±4μs             | 0.99    | import_iris.Iris.time_third_party_cartopy                                                            |
|          | 4.83±0.04ms          | 4.83±0.02ms         | 1.00    | import_iris.Iris.time_third_party_cf_units                                                           |
|          | 108±1μs              | 107±0.4μs           | 0.99    | import_iris.Iris.time_third_party_cftime                                                             |
|          | 2.82±0.01ms          | 2.81±0.01ms         | 1.00    | import_iris.Iris.time_third_party_matplotlib                                                         |
|          | 1.06±0ms             | 1.06±0.01ms         | 1.00    | import_iris.Iris.time_third_party_numpy                                                              |
|          | 159±2μs              | 158±2μs             | 1.00    | import_iris.Iris.time_third_party_scipy                                                              |
|          | 100±0.6μs            | 101±2μs             | 1.00    | import_iris.Iris.time_time                                                                           |
|          | 319±2μs              | 317±3μs             | 0.99    | import_iris.Iris.time_util                                                                           |
|          | 75.4±2μs             | 73.4±1μs            | 0.97    | iterate.IZip.time_izip                                                                               |
|          | 8.03±0.2ms           | 8.04±0.06ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'FF')                                           |
|          | 23.8±0.6ms           | 23.3±0.6ms          | 0.98    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'NetCDF')                                       |
|          | 9.08±0.1ms           | 8.79±0.2ms          | 0.97    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'PP')                                           |
|          | 8.06±0.1ms           | 8.10±0.04ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'FF')                                            |
|          | 20.9±0.7ms           | 20.8±0.2ms          | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'NetCDF')                                        |
|          | 8.84±0.2ms           | 8.84±0.06ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'PP')                                            |
|          | 1.37±0.01s           | 1.34±0.01s          | 0.98    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'FF')                                             |
|          | 20.5±0.2ms           | 20.7±0.09ms         | 1.01    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'NetCDF')                                         |
|          | 1.49±0.02s           | 1.51±0.02s          | 1.01    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'PP')                                             |
|          | 1.35±0.01s           | 1.35±0.01s          | 1.00    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'FF')                                              |
|          | 20.8±0.5ms           | 20.6±0.1ms          | 0.99    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'NetCDF')                                          |
|          | 1.50±0.02s           | 1.49±0.01s          | 1.00    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'PP')                                              |
|          | 3.91±0.03ms          | 3.91±0.03ms         | 1.00    | load.LoadAndRealise.time_load((50, 50, 2), False, 'FF')                                              |
|          | 20.0±0.4ms           | 19.6±0.2ms          | 0.98    | load.LoadAndRealise.time_load((50, 50, 2), False, 'NetCDF')                                          |
|          | 4.16±0.02ms          | 4.17±0.01ms         | 1.00    | load.LoadAndRealise.time_load((50, 50, 2), False, 'PP')                                              |
|          | 3.87±0.02ms          | 3.91±0.04ms         | 1.01    | load.LoadAndRealise.time_load((50, 50, 2), True, 'FF')                                               |
|          | 20.3±0.4ms           | 19.5±0.2ms          | 0.96    | load.LoadAndRealise.time_load((50, 50, 2), True, 'NetCDF')                                           |
|          | 4.16±0.02ms          | 4.20±0.03ms         | 1.01    | load.LoadAndRealise.time_load((50, 50, 2), True, 'PP')                                               |
|          | 32.7±2ms             | 32.2±3ms            | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'FF')                                        |
|          | 19.5±0.5ms           | 19.2±0.2ms          | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'NetCDF')                                    |
|          | 14.1±2ms             | 14.0±2ms            | 0.99    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'PP')                                        |
|          | 25.7±2ms             | 25.4±1ms            | 0.99    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'FF')                                         |
|          | 70.6±2ms             | 70.7±2ms            | 1.00    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'NetCDF')                                     |
|          | 25.7±1ms             | 25.3±2ms            | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'PP')                                         |
|          | 437±3ms              | 435±2ms             | 0.99    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'FF')                                          |
|          | 2.75±0.1ms           | 2.78±0.1ms          | 1.01    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'NetCDF')                                      |
|          | 444±3ms              | 440±3ms             | 0.99    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'PP')                                          |
|          | 442±5ms              | 442±3ms             | 1.00    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'FF')                                           |
|          | 2.82±0.07ms          | 2.73±0.07ms         | 0.97    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'NetCDF')                                       |
|          | 449±4ms              | 447±4ms             | 1.00    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'PP')                                           |
|          | 1.55±0.07ms          | 1.56±0.08ms         | 1.01    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'FF')                                           |
|          | 2.82±0.09ms          | 2.76±0.05ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'NetCDF')                                       |
|          | 1.56±0.1ms           | 1.59±0.09ms         | 1.02    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'PP')                                           |
|          | 1.62±0.08ms          | 1.58±0.06ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'FF')                                            |
|          | 2.88±0.08ms          | 2.84±0.07ms         | 0.99    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'NetCDF')                                        |
|          | 1.59±0.06ms          | 1.56±0.07ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'PP')                                            |
|          | 357±3ms              | 357±6ms             | 1.00    | load.ManyVars.time_many_var_load                                                                     |
|          | 8.12±0.02ms          | 8.12±0.04ms         | 1.00    | load.STASHConstraint.time_stash_constraint((1280, 960, 5), 'FF')                                     |
|          | 8.95±0.08ms          | 9.00±0.07ms         | 1.01    | load.STASHConstraint.time_stash_constraint((1280, 960, 5), 'PP')                                     |
|          | 1.35±0.01s           | 1.36±0.01s          | 1.01    | load.STASHConstraint.time_stash_constraint((2, 2, 1000), 'FF')                                       |
|          | 1.52±0.02s           | 1.52±0.01s          | 1.00    | load.STASHConstraint.time_stash_constraint((2, 2, 1000), 'PP')                                       |
|          | 3.95±0.02ms          | 3.98±0.02ms         | 1.01    | load.STASHConstraint.time_stash_constraint((2, 2, 2), 'FF')                                          |
|          | 4.23±0.03ms          | 4.23±0.02ms         | 1.00    | load.STASHConstraint.time_stash_constraint((2, 2, 2), 'PP')                                          |
|          | 8.06±0.03ms          | 8.08±0.06ms         | 1.00    | load.StructuredFF.time_structured_load((1280, 960, 5), False)                                        |
|          | 4.74±0.03ms          | 4.73±0.05ms         | 1.00    | load.StructuredFF.time_structured_load((1280, 960, 5), True)                                         |
|          | 1.34±0.02s           | 1.32±0.01s          | 0.98    | load.StructuredFF.time_structured_load((2, 2, 1000), False)                                          |
|          | 372±6ms              | 368±2ms             | 0.99    | load.StructuredFF.time_structured_load((2, 2, 1000), True)                                           |
|          | 3.92±0.01ms          | 3.90±0.03ms         | 0.99    | load.StructuredFF.time_structured_load((2, 2, 2), False)                                             |
|          | 3.53±0.02ms          | 3.54±0.03ms         | 1.00    | load.StructuredFF.time_structured_load((2, 2, 2), True)                                              |
|          | 148±0.6ms            | 148±0.5ms           | 1.00    | load.TimeConstraint.time_time_constraint(20, 'FF')                                                   |
|          | 23.1±0.3ms           | 22.9±0.2ms          | 0.99    | load.TimeConstraint.time_time_constraint(20, 'NetCDF')                                               |
|          | 166±3ms              | 162±0.9ms           | 0.98    | load.TimeConstraint.time_time_constraint(20, 'PP')                                                   |
|          | 29.5±0.3ms           | 29.3±0.2ms          | 0.99    | load.TimeConstraint.time_time_constraint(3, 'FF')                                                    |
|          | 22.9±0.3ms           | 22.9±0.1ms          | 1.00    | load.TimeConstraint.time_time_constraint(3, 'NetCDF')                                                |
|          | 31.5±0.1ms           | 31.5±0.5ms          | 1.00    | load.TimeConstraint.time_time_constraint(3, 'PP')                                                    |
|          | 17.2±0.2ms           | 17.2±0.4ms          | 1.00    | load.ugrid.BasicLoading.time_load_file(1)                                                            |
|          | 40.9±0.3ms           | 41.0±0.6ms          | 1.00    | load.ugrid.BasicLoading.time_load_file(200000)                                                       |
|          | 14.0±0.1ms           | 14.1±0.4ms          | 1.00    | load.ugrid.BasicLoading.time_load_mesh(1)                                                            |
|          | 21.5±0.3ms           | 21.8±0.3ms          | 1.02    | load.ugrid.BasicLoading.time_load_mesh(200000)                                                       |
|          | 17.3±0.3ms           | 17.2±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_file(1)                                                        |
|          | 19.9±0.4ms           | 20.2±0.6ms          | 1.01    | load.ugrid.BasicLoadingTime.time_load_file(200000)                                                   |
|          | 14.1±0.2ms           | 14.0±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_mesh(1)                                                        |
|          | 16.7±0.3ms           | 16.6±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_mesh(200000)                                                   |
|          | 18.3±0.2ms           | 18.6±0.4ms          | 1.02    | load.ugrid.Callback.time_load_file_callback(1)                                                       |
|          | 50.2±0.7ms           | 50.5±0.4ms          | 1.01    | load.ugrid.Callback.time_load_file_callback(200000)                                                  |
|          | 18.2±0.3ms           | 18.1±0.2ms          | 0.99    | load.ugrid.CallbackTime.time_load_file_callback(1)                                                   |
|          | 21.9±0.8ms           | 21.9±0.4ms          | 1.00    | load.ugrid.CallbackTime.time_load_file_callback(200000)                                              |
|          | 2.75±0.2ms           | 2.73±0.1ms          | 0.99    | load.ugrid.DataRealisation.time_realise_data(10000)                                                  |
|          | 5.36±0.1ms           | 3.62±1ms            | ~0.67   | load.ugrid.DataRealisation.time_realise_data(200000)                                                 |
|          | 39.3±4ms             | 39.5±2ms            | 1.01    | load.ugrid.DataRealisationTime.time_realise_data(10000)                                              |
|          | 827±10ms             | 821±8ms             | 0.99    | load.ugrid.DataRealisationTime.time_realise_data(200000)                                             |
|          | 276±0.8ms            | 276±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 1)                                         |
|          | 276±3ms              | 275±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 2)                                         |
|          | 277±1ms              | 278±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 3)                                         |
|          | 277±1ms              | 278±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 4)                                         |
|          | 274±2ms              | 274±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 1)                                         |
|          | 267±0.7ms            | 266±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 2)                                         |
|          | 269±1ms              | 268±0.6ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 3)                                         |
|          | 270±0.8ms            | 270±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 4)                                         |
|          | 280±2ms              | 279±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 1)                                         |
|          | 271±1ms              | 269±0.7ms           | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 2)                                         |
|          | 272±2ms              | 273±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 3)                                         |
|          | 273±2ms              | 272±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 4)                                         |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 1)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 2)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 3)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 4)                                 |
|          | 76.4                 | 76.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 1)                                 |
|          | 152.7                | 152.7               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 2)                                 |
|          | 229.0                | 229.0               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 3)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 4)                                 |
|          | 34.1                 | 34.1                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 1)                                 |
|          | 68.0                 | 68.0                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 2)                                 |
|          | 101.9                | 101.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 3)                                 |
|          | 135.8                | 135.8               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 4)                                 |
|          | 275±1ms              | 276±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 1)                                     |
|          | 276±1ms              | 276±0.8ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 2)                                     |
|          | 276±1ms              | 277±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 3)                                     |
|          | 276±1ms              | 278±2ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 4)                                     |
|          | 275±2ms              | 276±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 1)                                     |
|          | 267±2ms              | 266±0.5ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 2)                                     |
|          | 268±1ms              | 269±0.9ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 3)                                     |
|          | 271±1ms              | 270±0.7ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 4)                                     |
|          | 277±1ms              | 276±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 1)                                     |
|          | 271±2ms              | 270±0.5ms           | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 2)                                     |
|          | 272±1ms              | 273±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 3)                                     |
|          | 272±1ms              | 272±0.8ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 4)                                     |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 1)                             |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 2)                             |
|          | 307.9                | 307.8               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 3)                             |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 4)                             |
|          | 79.1                 | 79.1                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 1)                             |
|          | 83.4                 | 83.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 2)                             |
|          | 83.5                 | 83.5                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 3)                             |
|          | 83.4                 | 83.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 4)                             |
|          | 36.7                 | 36.8                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 1)                             |
|          | 38.0                 | 37.8                | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 2)                             |
|          | 38.0                 | 37.9                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 3)                             |
|          | 38.2                 | 37.8                | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 4)                             |
|          | 286±1ms              | 286±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 1)                               |
|          | 292±1ms              | 291±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 2)                               |
|          | 297±1ms              | 297±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 3)                               |
|          | 302±2ms              | 303±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 4)                               |
|          | 286±2ms              | 286±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 1)                               |
|          | 158±1ms              | 161±7ms             | 1.02    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 2)                               |
|          | 162±1ms              | 162±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 3)                               |
|          | 157±2ms              | 157±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 4)                               |
|          | 289±2ms              | 290±3ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 1)                               |
|          | 175±2ms              | 177±7ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 2)                               |
|          | 171±0.9ms            | 171±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 3)                               |
|          | 176±2ms              | 179±2ms             | 1.02    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 4)                               |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 1)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 2)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 3)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 4)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 1)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 2)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 3)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 4)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 1)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 2)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 3)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 4)                       |
|          | 107±5ms              | 109±3ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'processes', 10000)                           |
|          | 308±5ms              | 311±6ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'processes', 40000)                           |
|          | 121±1ms              | 121±0.4ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'threads', 10000)                             |
|          | 537±1ms              | 536±0.9ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'threads', 40000)                             |
|          | 108±4ms              | 108±4ms             | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'processes', 10000)                   |
|          | 309±3ms              | 310±4ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'processes', 40000)                   |
|          | 121±0.7ms            | 121±0.5ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'threads', 10000)                     |
|          | 537±2ms              | 538±2ms             | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'threads', 40000)                     |
|          | 2.5                  | 2.5                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'processes', 10000)                   |
|          | 2.6                  | 2.5                 | 0.96    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'processes', 40000)                   |
|          | 79.0                 | 79.0                | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'threads', 10000)                     |
|          | 110.3                | 110.3               | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'threads', 40000)                     |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'processes', 10000)           |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'processes', 40000)           |
|          | 76.4                 | 76.4                | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'threads', 10000)             |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'threads', 40000)             |
|          | 183±2ms              | 185±2ms             | 1.01    | merge_concat.Concatenate.time_concatenate                                                            |
|          | 45.4±0.5ms           | 45.6±1ms            | 1.00    | merge_concat.Merge.time_merge                                                                        |
|          | 6.67±0.08ms          | 6.53±0.03ms         | 0.98    | plot.AuxSort.time_aux_sort                                                                           |
|          | 76.1±3ms             | 79.0±2ms            | 1.04    | regridding.CurvilinearRegridding.time_regrid_pic                                                     |
|          | 98.1±0.5ms           | 97.5±0.6ms          | 0.99    | regridding.HorizontalChunkedRegridding.time_regrid_area_w                                            |
|          | 48.6±2ms             | 49.0±2ms            | 1.01    | regridding.HorizontalChunkedRegridding.time_regrid_area_w_new_grid                                   |
|          | 4.03±0.05ms          | 4.04±0.03ms         | 1.00    | save.NetcdfSave.time_netcdf_save_cube(50, False)                                                     |
|          | 70.9±0.3ms           | 70.6±0.4ms          | 0.99    | save.NetcdfSave.time_netcdf_save_cube(50, True)                                                      |
|          | 52.5±2ms             | 52.6±1ms            | 1.00    | save.NetcdfSave.time_netcdf_save_cube(600, False)                                                    |
|          | 568±6ms              | 567±4ms             | 1.00    | save.NetcdfSave.time_netcdf_save_cube(600, True)                                                     |
|          | 90.3±0.5ns           | 91.4±0.7ns          | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(50, False)                                                     |
|          | 54.6±0.8ms           | 54.1±0.4ms          | 0.99    | save.NetcdfSave.time_netcdf_save_mesh(50, True)                                                      |
|          | 90.1±0.8ns           | 90.9±2ns            | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(600, False)                                                    |
|          | 498±4ms              | 501±2ms             | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(600, True)                                                     |
|          | 42.0±0.7ms           | 42.2±1ms            | 1.00    | stats.PearsonR.time_lazy                                                                             |
|          | 19.1±0.3ms           | 18.9±0.2ms          | 0.99    | stats.PearsonR.time_real                                                                             |
|          | 23.0±0.8ms           | 23.0±0.7ms          | 1.00    | trajectory.TrajectoryInterpolation.time_trajectory_linear                                            |
|          | 57.4±0.5ms           | 57.8±0.6ms          | 1.01    | trajectory.TrajectoryInterpolation.time_trajectory_nearest                                           |

Generated by GHA run 9113925886

@pp-mo
Copy link
Member Author

pp-mo commented May 16, 2024

Note: the results reported in the whole-benchmark runs behave a little differently.

  • the parameter tables are laid out differently in some cases, which is odd but maybe due to some additional config in the GHA action ?
  • the memory scaling for threaded workers is different -- I suspect the number of available CPUs is different

@pp-mo pp-mo closed this May 16, 2024
@pp-mo pp-mo reopened this May 20, 2024
@pp-mo
Copy link
Member Author

pp-mo commented May 20, 2024

Added
Investigate repeated ops with simulated memory 'leak'.

Just for reference.
A sample output ...

$ python bm_runner.py custom run -- -b "MemcheckTaskRepeats" -e --quick -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
    BM_RUNNER DEBUG: asv run -b MemcheckTaskRepeats -e --quick -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
· Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[100.00%] ··· memtrace_evaluation.MemcheckTaskRepeats.track_mem                                                                                                      ok
[100.00%] ··· ======= =======
               nreps         
              ------- -------
                 1      76.8 
                 2      77.1 
                 3     102.9 
                 4     103.3 
              ======= =======

Some combined results

                   nreps
                  ------- -------
                     1      76.8,  76.8,  76.9,  76.8,  76.8,  76.8,
                     2      77.1,  77.0,  77.1,  77.0,  77.0,  77.1,
                     3     102.8, 103.0, 103.0, 103.0, 103.0, 102.9,
                     4     103.3, 103.2, 103.5, 103.4, 103.2, 103.3,
                  ======= =======
(( incremental differences, from above numbers ))
                     1-2     0.3   0.2   0.2   0.2   0.3
                     2-3    25.7  26.0  25.9  26.0  25.8
                     3-4     0.5   0.2   0.5   0.4   0.4
                  ======= =======

Conclusions

The allocation pattern is a trifle hard to explain.
Sizes don't relate very directly to the actual allocations and are irregular
- this presumably show the python heap memory handling in action
This raises some doubts how stable results may be.

@pp-mo pp-mo closed this May 20, 2024
@pp-mo pp-mo mentioned this pull request May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants