Release 2.5.4

awsbuild released this 01 Jun 12:15

OasisLMF Changelog - 2.5.4

  • #1940 - enhancement/profile check script
  • #1942 - Fix brittle PolNumber backfill in IL input preparation
  • #1947 - enhancement/conversion_tool_speed
  • #1955 - improved quadratic interpolation so it evaluates in a way that's robu…
  • #1957 - Stochastic hazard dynamic footprint
  • #1963 - Update API client for OIDC M2M
  • #1964 - perf(gulmc): replace numba dicts with precomputed array-backed structures
  • #1967 - Fix for stalled runs on V2 workers
  • #1968 - improve numerical stability in variance calculations and add unit tests
  • #1969 - Improved bash error detection
  • #1971 - Fix IL merge failure when layers sharing a CondTag mix %-TIV and flat terms
  • #1973 - Add portfolio complexity metrics to oasislmf exposure run
  • #1974 - Improve rtree builtin
  • #1975 - Fix platform checks for external PRs
  • #1979 - Feature/hazard selection dynamic
  • #1980 - Round progress bar down
  • #1985 - port receiving data from non-Oasis source won't crash
  • #1987 - fix/pytools-empty-inputs
  • #1992 - Speed up summarypy read_buffer
  • #1993 - fix broken docs link
  • #1994 - fix summarypy missing dtypes
  • #1997 - Fix for CI error
  • #1999 - fix/input_gen_status

OasisLMF Notes

fixes to default profile + tests - (PR #1940)

  • Removed non-OED fields:
    • PolLimit
    • CondNumber
  • Fixed Cyber names
  • Created tests that check default_acc_profile and default_loc_profile as follows:
    • each field is in the OEDSpec (with exceptions for BI Type fields and Cyber TIV fields)
    • each ProfileElementName matches its key (except for Cyber TIV fields)
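
A minimal sketch of the two profile checks, assuming a profile is a dict mapping each key to an entry with a ProfileElementName field and that the OED field names are available as a set. The function name check_profile and the exemption parameters are hypothetical; this is not the actual oasislmf test code.

```python
def check_profile(profile, oed_fields, not_in_spec_ok=(), name_mismatch_ok=()):
    """Return a list of problems found in a profile dict (empty list = pass).

    profile          -- mapping of key -> {"ProfileElementName": ..., ...}
    oed_fields       -- set of field names defined by the OED spec
    not_in_spec_ok   -- keys allowed to be absent from the OED spec
    name_mismatch_ok -- keys allowed to have a different ProfileElementName
    """
    problems = []
    for key, entry in profile.items():
        # Check 1: the field must exist in the OED spec unless exempted.
        if key not in oed_fields and key not in not_in_spec_ok:
            problems.append(f"{key}: not in OED spec")
        # Check 2: ProfileElementName must match the key unless exempted.
        element_name = entry.get("ProfileElementName")
        if element_name != key and key not in name_mismatch_ok:
            problems.append(f"{key}: ProfileElementName is {element_name!r}")
    return problems
```

In a test suite, the exemption sets would hold the BI Type and Cyber TIV keys, and the test would assert that the returned list is empty.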

enhancement/conversion_tool_speed - (PR #1947)

Update conversion tools for speed

Rewrites the Python converter implementations (csvtobin, bintocsv, bintoparquet, parquettobin) to reduce peak memory and improve throughput. Changes apply across all converter directions.

What changed:

The core change across all converters is chunked processing: CSV is read in fixed DEFAULT_BUFFER_SIZE chunks via iter_csv_as_ndarray(), binary output is written through pre-allocated batch buffers (_BATCH_ROWS), and parquet I/O streams via PyArrow's native ParquetWriter/iter_batches(). Binary inputs switch from np.fromfile to np.memmap. Hot-path encoding in fm, gul, and summarycalc csvtobin uses Numba JIT to build the binary stream format per chunk; validation state is carried across chunk boundaries as scalars rather than accumulating full-file structures.
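
The chunked pattern can be sketched in pure Python under stated assumptions: a hypothetical two-column CSV (event_id, value), a hypothetical binary record of int32 + float32, and an ordering check standing in for the real validation. The actual converters use iter_csv_as_ndarray(), NumPy and Numba, none of which is reproduced here. The sketch shows that only one chunk and one scalar (the last event_id seen) are held in memory, yet an out-of-order row straddling a chunk boundary is still caught.

```python
import csv
import struct
from itertools import islice

# Hypothetical record layout: little-endian int32 event_id + float32 value.
ROW = struct.Struct("<if")


def csv_to_bin_chunked(text_stream, out_stream, chunk_rows=10_000):
    """Convert CSV rows to fixed-size binary records, one chunk at a time.

    Returns the number of rows written. Raises ValueError if event_id
    decreases anywhere in the file, including across chunk boundaries.
    """
    reader = csv.reader(text_stream)
    next(reader)  # skip the header line
    last_event_id = None  # validation state carried between chunks as a scalar
    total = 0
    while True:
        # Only chunk_rows parsed rows are held in memory at any time.
        chunk = list(islice(reader, chunk_rows))
        if not chunk:
            break
        buf = bytearray(ROW.size * len(chunk))  # batch buffer for this chunk
        for i, (event_id, value) in enumerate(chunk):
            event_id = int(event_id)
            if last_event_id is not None and event_id < last_event_id:
                raise ValueError(f"event_id {event_id} follows {last_event_id}")
            last_event_id = event_id
            ROW.pack_into(buf, i * ROW.size, event_id, float(value))
        out_stream.write(buf)
        total += len(chunk)
    return total
```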

Behaviour changes worth noting:

  • Vulnerability csvtobin: three validation checks are removed: damage_bin_id contiguity, damage_bin_id starting at 1, and intensity_bin_id contiguity within each vulnerability. These no longer run even when no_validation=False. The global intensity-bin consistency check (run when suppress_int_bin_checks=False) is also replaced by a rolling per-vulnerability check, so file-wide inconsistencies between non-adjacent vulnerabilities are no longer caught.
  • Footprint: new decompressed_size flag writes the uncompressed size into zip .idx files; bintocsv zip path reuses a single pre-allocated decompression buffer when the field is present
  • Occurrence csvtobin: no_date_alg=True path now validates period_no ≤ no_of_periods (previously unchecked)

Affected converters:

  • csvtobin: amplifications, coverages, damagebin, fm, footprint, gul, lossfactors, occurrence, summarycalc, vulnerability
  • bintocsv: amplifications, coverages, cdf, footprint, lossfactors, occurrence, vulnerability
  • bintoparquet / parquettobin: default handler (aal, melt, periods, items, correlations)

Tests: parametrised round-trip coverage for all converter types, no_validation paths, and decompressed_size index format.


Benchmark results

Best of 3 repeats. Memory via tracemalloc (Python heap + NumPy; excludes Numba-JIT internals).

csvtobin

| Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|
| fm | 40k ev × 5 items × 100 samples | 10.1x (25.2s → 2.5s) | 610 MB → 89 MB (6.9x) |
| gul | 40k ev × 5 items × 100 samples | 10.4x (28.0s → 2.7s) | 610 MB → 89 MB (6.9x) |
| summarycalc | 20k ev × 3 summaries × 100 samples | 8.3x (10.4s → 1.3s) | 229 MB → 108 MB (2.1x) |
| lossfactors | 200k ev × 10 amp | 6.5x (12.1s → 1.9s) | 221 MB → 52 MB (4.3x) |
| footprint | 15k ev × 100 ap × 2 ib | 105x (44.5s → 0.42s) | 261 MB → 78 MB (3.3x) |
| footprint (zip) | 15k ev × 100 ap × 2 ib | 32x (48.1s → 1.5s) | 261 MB → 67 MB (3.9x) |
| vulnerability + idx | 15k v × 50 ib × 10 db | 96x (233s → 2.4s) | 229 MB → 57 MB (4.0x) |
| vulnerability + zip + idx | 15k v × 50 ib × 10 db | 53x (232s → 4.4s) | 229 MB → 57 MB (4.0x) |
| occurrence | 10M events | 1.1x (2.2s → 2.1s) | 382 MB → 96 MB (4.0x) |
| coverages | 5M coverages | 1.4x (0.63s → 0.47s) | 76 MB → 12 MB (6.2x) |
| amplifications | 5M items | 1.1x (0.43s → 0.40s) | 76 MB → 28 MB (2.8x) |
| damagebin | 5M bins | 1.1x (1.19s → 1.13s) | 219 MB → 83 MB (2.6x) |

bintocsv

| Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|
| footprint | 15k ev × 100 ap × 2 ib | 2.2x (0.33s → 0.15s) | 39 MB → 707 KB (56.7x) |
| footprint (zip) | 15k ev × 100 ap × 2 ib | 1.2x | 13 MB → 93 KB (144.8x) |
| vulnerability + idx | 15k v × 50 ib × 10 db | 1.4x | 200 MB → 692 KB (296.5x) |
| vulnerability + zip + idx | 15k v × 50 ib × 10 db | 1.0x | 153 MB → 80 KB (1960x) |
| lossfactors | 200k ev × 10 amp | 1.9x (3.1s → 1.6s) | 244 MB → 337 KB (740.6x) |
| cdf | 3k ev × 30 ap × 2 vuln × 10 bins | 21.2x (5.5s → 0.26s) | ~384 KB → ~883 KB |
| occurrence | 10M events | 2.6x (1.8s → 0.69s) | 59 MB → 66 MB (~) |
| amplifications | 5M items | 1.6x (0.15s → 0.09s) | 169 MB → 23 MB (7.4x) |
| coverages | 5M coverages | 1.3x (0.31s → 0.23s) | 245 MB → 49 MB (5.0x) |

bintoparquet / parquettobin (default handler: aal, melt, periods, items, correlations)

| Direction | Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|---|
| bintoparquet | aal | 5M rows | 1.2x (0.39s → 0.32s) | 229 MB → 30 MB (7.5x) |
| bintoparquet | melt | 5M rows | 1.2x (1.27s → 1.06s) | 629 MB → 84 MB (7.5x) |
| parquettobin | aal | 5M rows | 2.1x (0.17s → 0.08s) | 153 MB → 43 MB (3.5x) |
| parquettobin | melt | 5M rows | 1.7x (0.52s → 0.30s) | 420 MB → 126 MB (3.3x) |
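
The benchmark methodology stated above (best of 3 repeats, peak memory via tracemalloc) can be reproduced with a small harness like the one below. This is a sketch, not the script used for these numbers; in particular, reporting the largest peak across repeats is an assumption, since the notes do not say how peaks from different repeats were combined.

```python
import time
import tracemalloc


def benchmark(fn, *args, repeats=3):
    """Run fn(*args) `repeats` times; return (best_seconds, peak_bytes).

    Time is the best (minimum) of the repeats. Peak memory is the largest
    tracemalloc peak seen in any repeat (an assumption, see above).
    tracemalloc tracks Python and NumPy allocations; per the notes, memory
    held by Numba-JIT internals is not included.
    """
    best = float("inf")
    peak = 0
    for _ in range(repeats):
        tracemalloc.start()
        try:
            start = time.perf_counter()
            fn(*args)
            elapsed = time.perf_counter() - start
            _, run_peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()  # always stop tracing, even if fn raises
        best = min(best, elapsed)
        peak = max(peak, run_peak)
    return best, peak
```

Tracing slows allocation-heavy code, so a more careful harness might time and measure memory in separate runs.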

closes #1944

Update API client for OIDC M2M - (PR #1963)

Added a new auth_mode, m2m, which uses the client_credentials grant directly against the IdP. Added three new flags to the API client CLI to support this:

  --auth-type {simple,oidc,m2m}
                        Authentication type: simple (username/password JWT), oidc (client credentials via platform),
                        m2m (client credentials direct to IdP)
  --oidc-token-url OIDC_TOKEN_URL
                        Token endpoint URL for m2m client_credentials grant (e.g.
                        https://idp.example.com/oauth2/token)
  --oidc-scope OIDC_SCOPE
                        OAuth2 scope to request when fetching an m2m token (e.g. oasis/m2m)
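
For reference, a client_credentials token request (RFC 6749, section 4.4) can be built with the Python standard library as below. This is a generic sketch, not the oasislmf API client's code: the function name is hypothetical, and it authenticates with HTTP Basic, one of the client authentication methods the RFC allows; some IdPs instead expect client_id and client_secret in the form body.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth2 client_credentials token request.

    The client authenticates with HTTP Basic and POSTs a form-encoded body
    with grant_type=client_credentials (plus an optional scope) to the
    IdP's token endpoint.
    """
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending the request with urllib.request.urlopen and parsing the JSON response yields an access_token, which would then typically be sent to the API as a Bearer token.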

perf(gulmc): replace numba dicts with precomputed array-backed structures - (PR #1964)

  • Ground-up loss (gulmc) now runs ~45% faster end-to-end and uses ~30% less peak memory on representative workloads, by replacing numba dicts with precomputed array-backed structures.
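
As a rough illustration of the array-backed idea (not the actual gulmc structures, which this note does not describe), a {key: list of values} mapping can be flattened CSR-style into a sorted key array, an offsets array and a flat values array. A lookup becomes a binary search plus a slice, and plain NumPy arrays like these can be passed straight into Numba @njit functions. build_csr and lookup are hypothetical names.

```python
import numpy as np


def build_csr(mapping):
    """Flatten {int key: list of floats} into (keys, offsets, values) arrays.

    The values for keys[i] are values[offsets[i]:offsets[i + 1]].
    """
    sorted_keys = sorted(mapping)
    keys = np.array(sorted_keys, dtype=np.int64)
    offsets = np.zeros(len(sorted_keys) + 1, dtype=np.int64)
    offsets[1:] = np.cumsum([len(mapping[k]) for k in sorted_keys])
    values = np.array([v for k in sorted_keys for v in mapping[k]], dtype=np.float64)
    return keys, offsets, values


def lookup(keys, offsets, values, key):
    """Return the values for `key` (an empty array if absent) via binary search."""
    i = np.searchsorted(keys, key)
    if i == len(keys) or keys[i] != key:
        return values[:0]
    return values[offsets[i]:offsets[i + 1]]
```

The arrays are built once up front, so the hot loop does no hashing or per-entry allocation.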

Fix for stalled runs on V2 workers - (PR #1967)

Fixed an issue where one run script matched and deleted another chunk's FIFO queues, causing that chunk's events to stall.

Improved bash error detection - (PR #1969)

  • Bash script generation now checks that the bash version supports it and adds -p var to wait calls; the script then checks the exit code of each tracked background process and kills the script if one errors.
  • Moved the bash tracing support check into Python.
  • Added a check that all expected named pipes exist and are FIFOs (not regular files). The check runs before the main execution starts (see #1967).
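
A sketch of such a named-pipe pre-check in Python, using os.stat and stat.S_ISFIFO. The function name is hypothetical and the real check's interface is not described in these notes. os.stat does not open the pipe, so the check cannot block waiting for a reader or writer.

```python
import os
import stat


def check_named_pipes(paths):
    """Return {path: problem} for each expected pipe that is missing or not a FIFO."""
    problems = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            problems[path] = "missing"
            continue
        if not stat.S_ISFIFO(mode):
            # e.g. a regular file was created where a FIFO was expected
            problems[path] = "not a FIFO"
    return problems
```

An empty result means execution can proceed; otherwise the caller can report every problem at once before any process starts.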

Fix IL merge failure when layers sharing a CondTag mix %-TIV and flat terms - (PR #1971)

Fixes a crash in IL input preparation (pandas.errors.MergeError: "Merge keys are not unique in right dataset; not a many-to-one merge") when multiple policies/layers on the same account share a CondTag and at least one declares a %-of-TIV (or BI) financial term while another declares a flat or non-TIV-dependent term.
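
A simplified reproduction of this class of error, using toy tables whose column names are illustrative, not the real IL schema. With pandas' validate="many_to_one", merging on CondTag alone fails as soon as two right-hand rows share a CondTag. Adding a column that distinguishes the layers to the join key is one generic way to restore uniqueness; it is not necessarily how the PR fixed the bug.

```python
import pandas as pd

# Toy tables (hypothetical columns): two layers on one account share CondTag 1,
# one with a %-of-TIV term and one with a flat term.
layers = pd.DataFrame({"PolNumber": ["P1", "P2"], "CondTag": [1, 1]})
cond_terms = pd.DataFrame({
    "CondTag": [1, 1],
    "PolNumber": ["P1", "P2"],
    "term_type": ["%TIV", "flat"],
})


def attach_terms(layers, cond_terms, keys):
    """Left-merge terms onto layers, requiring each layer to match at most one row."""
    return layers.merge(cond_terms, on=keys, how="left", validate="many_to_one")


# attach_terms(layers, cond_terms, ["CondTag"]) raises pandas.errors.MergeError
# because CondTag 1 appears twice in cond_terms.
merged = attach_terms(layers, cond_terms, ["CondTag", "PolNumber"])
print(merged)
```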

Improvements to rtree lookup builtin - (PR #1974)

  • Improve performance by using vectorised operations.
  • Rename parameter from nearest_neighbor_min_distance to nearest_neighbor_max_distance to correctly reflect that this is the greatest distance at which a point will be associated with a geometry. The former parameter name is still accepted but logs a deprecation warning.
  • Hide the warning about distances being incorrect when using a geographical coordinate system (this is not ideal but can still function as a rough threshold).
  • Add comments explaining that the distance is the Euclidean distance, not the more accurate spherical or ellipsoidal approximation.
  • Add tests.
  • Remove references in code and parameter names to "area peril" since this is a generic function that can be used for other purposes.
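
The parameter rename can be handled with a small fallback, sketched below. resolve_max_distance and the plain-dict parameter container are hypothetical; only the two parameter names come from the note above. If both names are given, the new one wins.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_max_distance(params, default=None):
    """Read the nearest-neighbour distance threshold from a parameter dict.

    Prefers nearest_neighbor_max_distance; falls back to the deprecated
    nearest_neighbor_min_distance, logging a deprecation warning.
    """
    if "nearest_neighbor_max_distance" in params:
        return params["nearest_neighbor_max_distance"]
    if "nearest_neighbor_min_distance" in params:
        logger.warning(
            "nearest_neighbor_min_distance is deprecated; "
            "use nearest_neighbor_max_distance instead"
        )
        return params["nearest_neighbor_min_distance"]
    return default
```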

Fix platform checks for external PRs - (PR #1975)

Fixed platform checks so that they work on external PRs.

fix/pytools-empty-inputs - (PR #1987)

Updates the elt, plt, aal, lec, kat and join-summary-info code and tests to handle empty input files.

closes #1986

fix/input_gen_status - (PR #1999)

Adds an OasisExceptionNoKeys error to generate files.

closes OasisLMF/OasisPlatform#974