OasisLMF Changelog - 2.5.4
- #1940 - enhancement/profile check script
- #1942 - Fix brittle PolNumber backfill in IL input preparation
- #1947 - enhancement/conversion_tool_speed
- #1955 - improved quadratic interpolation so it evaluates in a way that's robu…
- #1957 - Stochastic hazard dynamic footprint
- #1963 - Update API client for OIDC M2M
- #1964 - perf(gulmc): replace numba dicts with precomputed array-backed structures
- #1967 - Fix for stalled runs on V2 workers
- #1968 - improve numerical stability in variance calculations and add unit tests
- #1969 - Improved bash error detection
- #1971 - Fix IL merge failure when layers sharing a CondTag mix %-TIV and flat terms
- #1973 - Add portfolio complexity metrics to oasislmf exposure run
- #1974 - Improve rtree builtin
- #1975 - Fix platform checks for external PRs
- #1979 - Feature/hazard selection dynamic
- #1980 - Round progress bar down
- #1985 - port receiving data from non oasis source wont crash
- #1987 - fix/pytools-empty-inputs
- #1992 - Speed up summarypy read_buffer
- #1993 - fix broken docs link
- #1994 - fix summarypy missing dtypes
- #1997 - Fix for ci error
- #1999 - fix/input_gen_status
OasisLMF Notes
fixes to default profile + tests - (PR #1940)
- removed non OED fields
  - PolLimitCondNumber
- Fixed Cyber names
- created tests to check default_acc_profile and default_loc_profile with the following checks:
  - check field is in OEDSpec (exceptions for BI Type fields and Cyber TIV fields)
  - check ProfileElementName matches key (except Cyber TIV field)
enhancement/conversion_tool_speed - (PR #1947)
Update conversion tools for speed
Rewrites the Python converter implementations (csvtobin, bintocsv, bintoparquet, parquettobin) to reduce peak memory and improve throughput. Changes apply across all converter directions.
What changed:
The core change across all converters is chunked processing: CSV is read in fixed DEFAULT_BUFFER_SIZE chunks via iter_csv_as_ndarray(), binary output is written through pre-allocated batch buffers (_BATCH_ROWS), and parquet I/O streams via PyArrow's native ParquetWriter/iter_batches(). Binary inputs switch from np.fromfile to np.memmap. Hot-path encoding in fm, gul, and summarycalc csvtobin uses Numba JIT to build the binary stream format per chunk; validation state is carried across chunk boundaries as scalars rather than accumulating full-file structures.
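As a rough illustration of the chunked pattern described above (not the OasisLMF implementation), the sketch below reads a CSV in fixed-size row chunks and packs each chunk into a single pre-allocated buffer that is reused between writes. The function name `csv_to_bin_chunked`, the `CHUNK_ROWS` constant and the two-column int32/float32 record layout are all assumptions made for this example.

```python
import csv
import io
import struct
from itertools import islice

# Hypothetical record layout for illustration: int32 id, float32 value (little-endian).
RECORD = struct.Struct("<if")
CHUNK_ROWS = 4  # assumed chunk size; a real converter would use a far larger value


def csv_to_bin_chunked(csv_file, bin_file, chunk_rows=CHUNK_ROWS):
    """Stream CSV rows to packed binary, holding at most one chunk in memory."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    buffer = bytearray(RECORD.size * chunk_rows)  # allocated once, reused per chunk
    total = 0
    while True:
        rows = list(islice(reader, chunk_rows))
        if not rows:
            break
        for i, (item_id, value) in enumerate(rows):
            RECORD.pack_into(buffer, i * RECORD.size, int(item_id), float(value))
        # Write only the filled part of the buffer (the last chunk may be short).
        bin_file.write(memoryview(buffer)[: len(rows) * RECORD.size])
        total += len(rows)
    return total


# Ten rows with a chunk size of four: two full chunks and one short chunk.
source = io.StringIO("item_id,value\n" + "".join(f"{i},{i * 0.5}\n" for i in range(1, 11)))
sink = io.BytesIO()
rows_written = csv_to_bin_chunked(source, sink)
```

Peak memory in this pattern is bounded by the chunk size rather than the file size, which is the same trade-off the benchmark tables below measure.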
Behaviour changes worth noting:
- Vulnerability csvtobin: three validation checks removed: damage_bin_id contiguity, damage_bin_id starts at 1, and intensity_bin_id contiguity within each vulnerability. These no longer run even when no_validation=False. The suppress_int_bin_checks=False global intensity-bin consistency check is also replaced by a rolling per-vulnerability check, so cross-file inconsistencies between non-adjacent vulnerabilities are no longer caught.
- Footprint: new decompressed_size flag writes the uncompressed size into zip .idx files; bintocsv zip path reuses a single pre-allocated decompression buffer when the field is present
- Occurrence csvtobin: no_date_alg=True path now validates period_no ≤ no_of_periods (previously unchecked)
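Because the three vulnerability checks no longer run inside csvtobin, a pipeline that relied on them may want to validate its input separately. The sketch below is one possible standalone check, not code from OasisLMF. It assumes rows are (vulnerability_id, intensity_bin_id, damage_bin_id) tuples and treats the damage_bin_id checks as applying across the whole file; both the function name and those choices are assumptions.

```python
from collections import defaultdict


def check_vulnerability_bins(rows):
    """Return a list of problems found in (vulnerability_id, intensity_bin_id, damage_bin_id) rows.

    Simplified versions of the three checks listed above: damage_bin_id starts
    at 1, damage_bin_id is contiguous, and intensity_bin_id is contiguous
    within each vulnerability.
    """
    problems = []
    damage_bins = sorted({damage_bin for _, _, damage_bin in rows})
    if damage_bins and damage_bins[0] != 1:
        problems.append("damage_bin_id does not start at 1")
    if damage_bins and damage_bins != list(range(damage_bins[0], damage_bins[-1] + 1)):
        problems.append("damage_bin_id is not contiguous")

    intensity_by_vuln = defaultdict(set)
    for vuln_id, intensity_bin, _ in rows:
        intensity_by_vuln[vuln_id].add(intensity_bin)
    for vuln_id, bins in sorted(intensity_by_vuln.items()):
        ordered = sorted(bins)
        if ordered != list(range(ordered[0], ordered[-1] + 1)):
            problems.append(f"intensity_bin_id is not contiguous for vulnerability {vuln_id}")
    return problems


good = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2)]
bad = [(1, 1, 2), (1, 3, 4)]  # damage bins {2, 4}; intensity bins {1, 3}
```

Running this on the CSV before conversion restores the earlier guarantees at the cost of holding the distinct bin ids in memory.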
Affected converters:
- csvtobin: amplifications, coverages, damagebin, fm, footprint, gul, lossfactors, occurrence, summarycalc, vulnerability
- bintocsv: amplifications, coverages, cdf, footprint, lossfactors, occurrence, vulnerability
- bintoparquet/parquettobin: default handler (aal, melt, periods, items, correlations)
Tests: parametrised round-trip coverage for all converter types, no_validation paths, and decompressed_size index format.
Benchmark results
Best of 3 repeats. Memory via tracemalloc (Python heap + NumPy; excludes Numba-JIT internals).
csvtobin
| Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|
| fm | 40k ev × 5 items × 100 samples | 10.1x (25.2s → 2.5s) | 610 MB → 89 MB (6.9x) |
| gul | 40k ev × 5 items × 100 samples | 10.4x (28.0s → 2.7s) | 610 MB → 89 MB (6.9x) |
| summarycalc | 20k ev × 3 summaries × 100 samples | 8.3x (10.4s → 1.3s) | 229 MB → 108 MB (2.1x) |
| lossfactors | 200k ev × 10 amp | 6.5x (12.1s → 1.9s) | 221 MB → 52 MB (4.3x) |
| footprint | 15k ev × 100 ap × 2 ib | 105x (44.5s → 0.42s) | 261 MB → 78 MB (3.3x) |
| footprint (zip) | 15k ev × 100 ap × 2 ib | 32x (48.1s → 1.5s) | 261 MB → 67 MB (3.9x) |
| vulnerability + idx | 15k v × 50 ib × 10 db | 96x (233s → 2.4s) | 229 MB → 57 MB (4.0x) |
| vulnerability + zip + idx | 15k v × 50 ib × 10 db | 53x (232s → 4.4s) | 229 MB → 57 MB (4.0x) |
| occurrence | 10M events | 1.1x (2.2s → 2.1s) | 382 MB → 96 MB (4.0x) |
| coverages | 5M coverages | 1.4x (0.63s → 0.47s) | 76 MB → 12 MB (6.2x) |
| amplifications | 5M items | 1.1x (0.43s → 0.40s) | 76 MB → 28 MB (2.8x) |
| damagebin | 5M bins | 1.1x (1.19s → 1.13s) | 219 MB → 83 MB (2.6x) |
bintocsv
| Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|
| footprint | 15k ev × 100 ap × 2 ib | 2.2x (0.33s → 0.15s) | 39 MB → 707 KB (56.7x) |
| footprint (zip) | 15k ev × 100 ap × 2 ib | 1.2x | 13 MB → 93 KB (144.8x) |
| vulnerability + idx | 15k v × 50 ib × 10 db | 1.4x | 200 MB → 692 KB (296.5x) |
| vulnerability + zip + idx | 15k v × 50 ib × 10 db | 1.0x | 153 MB → 80 KB (1960x) |
| lossfactors | 200k ev × 10 amp | 1.9x (3.1s → 1.6s) | 244 MB → 337 KB (740.6x) |
| cdf | 3k ev × 30 ap × 2 vuln × 10 bins | 21.2x (5.5s → 0.26s) | ~384 KB → ~883 KB |
| occurrence | 10M events | 2.6x (1.8s → 0.69s) | 59 MB → 66 MB (~) |
| amplifications | 5M items | 1.6x (0.15s → 0.09s) | 169 MB → 23 MB (7.4x) |
| coverages | 5M coverages | 1.3x (0.31s → 0.23s) | 245 MB → 49 MB (5.0x) |
bintoparquet / parquettobin (default handler: aal, melt, periods, items, correlations)
| Direction | Converter | Dataset | Speedup | Peak mem: orig → new |
|---|---|---|---|---|
| bintoparquet | aal | 5M rows | 1.2x (0.39s → 0.32s) | 229 MB → 30 MB (7.5x) |
| bintoparquet | melt | 5M rows | 1.2x (1.27s → 1.06s) | 629 MB → 84 MB (7.5x) |
| parquettobin | aal | 5M rows | 2.1x (0.17s → 0.08s) | 153 MB → 43 MB (3.5x) |
| parquettobin | melt | 5M rows | 1.7x (0.52s → 0.30s) | 420 MB → 126 MB (3.3x) |
closes #1944
Update API client for OIDC M2M - (PR #1963)
Added a new auth_mode, m2m, which uses the client_credentials grant direct to the IdP. Added three new flags to the API client CLI to support this:
--auth-type {simple,oidc,m2m}
Authentication type: simple (username/password JWT), oidc (client credentials via platform),
m2m (client credentials direct to IdP)
--oidc-token-url OIDC_TOKEN_URL
Token endpoint URL for m2m client_credentials grant (e.g.
https://idp.example.com/oauth2/token)
--oidc-scope OIDC_SCOPE
OAuth2 scope to request when fetching an m2m token (e.g. oasis/m2m)
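For readers unfamiliar with the client_credentials grant, the sketch below builds the kind of token request that an m2m flow typically sends to an IdP. It is a generic OAuth2 example using only the standard library, not the OasisLMF API client code; the function name, client id and secret are hypothetical, and some IdPs expect the credentials in the form body rather than in an HTTP Basic header.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth2 client_credentials token request."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    # Client authentication via HTTP Basic (an assumption; IdPs differ here).
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(form).encode(),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values taken from the help text above; the client id/secret are made up.
request = build_m2m_token_request(
    "https://idp.example.com/oauth2/token", "my-client", "my-secret", scope="oasis/m2m"
)
```

Sending the request with `urllib.request.urlopen(request)` would normally return a JSON body containing an `access_token` field, which is then presented as a bearer token.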
perf(gulmc): replace numba dicts with precomputed array-backed structures - (PR #1964)
- Ground-up loss (gulmc) now runs ~45% faster end-to-end and uses ~30% less peak memory on representative workloads, by replacing numba dicts with precomputed array-backed structures.
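The general idea of swapping a dict for an array-backed structure can be sketched as follows. This is not the gulmc data layout: it assumes keys are small non-negative integers, so that the key itself can serve as the array index, and the names `build_dense_lookup` and `item_to_row` are hypothetical.

```python
from array import array


def build_dense_lookup(mapping, missing=-1):
    """Precompute an array where position == key, so a lookup is a plain index."""
    size = max(mapping) + 1 if mapping else 0
    table = array("q", [missing]) * size  # int64 slots, pre-filled with the sentinel
    for key, value in mapping.items():
        table[key] = value
    return table


# Hypothetical mapping from item_id to the row index of its data.
item_to_row = {1: 0, 2: 1, 5: 2, 7: 3}
lookup = build_dense_lookup(item_to_row)

# Lookups become index operations; absent keys return the sentinel.
row_for_item_5 = lookup[5]
row_for_item_3 = lookup[3]
```

The trade-off is memory proportional to the largest key rather than the number of keys, in exchange for avoiding hashing on the hot path.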
Fix for stalled runs on V2 workers - (PR #1967)
Fixed an issue where one run script matched and deleted another chunk's FIFO queues, causing that chunk of events to stall.
Improved bash error detection - (PR #1969)
- Bash script generation checks bash version support and adds -p var to wait calls; this checks the exit code of tracked background processes and kills the script if one errors.
- Moved the bash tracing support check into Python.
- Added a check to ensure all expected named pipes exist and are FIFOs (and not regular files); the check happens before the main execution starts. See: #1967
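The named-pipe check in the last bullet can be approximated in a few lines of Python. The sketch below is an illustration of the idea rather than the code added in the PR; the function name and the pipe file names are hypothetical, and `os.mkfifo` is only available on Unix-like systems.

```python
import os
import stat
import tempfile


def find_bad_fifos(paths):
    """Return {path: reason} for every path that is missing or is not a FIFO."""
    problems = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            problems[path] = "missing"
            continue
        if not stat.S_ISFIFO(mode):
            problems[path] = "not a FIFO"
    return problems


with tempfile.TemporaryDirectory() as workdir:
    fifo = os.path.join(workdir, "gul_P1")     # a real named pipe
    regular = os.path.join(workdir, "gul_P2")  # a regular file with a pipe-like name
    missing = os.path.join(workdir, "gul_P3")  # never created
    os.mkfifo(fifo)
    open(regular, "w").close()
    result = find_bad_fifos([fifo, regular, missing])
    expected = {regular: "not a FIFO", missing: "missing"}
```

Failing fast on a non-empty result avoids the situation where a process blocks on, or silently writes to, a regular file that should have been a pipe.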
Fix IL merge failure when layers sharing a CondTag mix %-TIV and flat terms - (PR #1971)
Fixes a crash in IL input preparation (pandas.errors.MergeError: Merge keys are not unique in right dataset; not a many-to-one merge) that occurred when multiple policies/layers on the same account share a CondTag and at least one declares a %-of-TIV (or BI) financial term while another declares a flat or non-TIV-dependent term.
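The error message comes from pandas' merge validation: a merge declared as many-to-one fails when the right-hand frame has duplicate keys. The sketch below reproduces that general condition with made-up frames. It does not reproduce the OasisLMF code path, and the column values and the `term_type` column are illustrative assumptions.

```python
import pandas as pd

# Hypothetical, simplified frames: two layers on one account sharing a CondTag.
policies = pd.DataFrame(
    {"AccNumber": ["A1", "A1"], "CondTag": ["C1", "C1"], "layer_id": [1, 2]}
)
# Two term rows for the same (AccNumber, CondTag): one %-of-TIV, one flat.
cond_terms = pd.DataFrame(
    {
        "AccNumber": ["A1", "A1"],
        "CondTag": ["C1", "C1"],
        "term_type": ["pct_tiv", "flat"],
    }
)

try:
    # validate="many_to_one" requires the merge keys to be unique in the right frame.
    policies.merge(cond_terms, on=["AccNumber", "CondTag"], validate="many_to_one")
    error_message = None
except pd.errors.MergeError as exc:
    error_message = str(exc)
```

Checking `cond_terms.duplicated(["AccNumber", "CondTag"]).any()` before merging is one way to detect the condition without relying on the exception.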
Improvements to rtree lookup builtin - (PR #1974)
- Improve performance by using vectorised operations.
- Rename parameter from nearest_neighbor_min_distance to nearest_neighbor_max_distance to correctly reflect that this is the greatest distance at which a point will be associated with a geometry. Former parameter is still accepted but will log a deprecation warning.
- Hide the warning about distances being incorrect when using a geographical coordinate system (this is not ideal but can still function as a rough threshold).
- Add comments explaining that the distance is the Euclidean distance, not the more accurate spherical or ellipsoidal approximation.
- Add tests.
- Remove references in code and parameter names to "area peril" since this is a generic function that can be used for other purposes.
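To show why a Euclidean distance in a geographical coordinate system is only a rough threshold, the sketch below compares it with a great-circle (haversine) distance. It is a standalone illustration, not part of the rtree builtin; the function names and the 6371 km mean Earth radius are assumptions of this example.

```python
import math


def euclidean_degrees(lon1, lat1, lon2, lat2):
    """Straight-line distance computed directly on lon/lat values, in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance on a sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60°N: the Euclidean distance
# in degrees is identical, but the distance on the ground is not.
equator = (euclidean_degrees(0, 0, 1, 0), haversine_km(0, 0, 1, 0))
north = (euclidean_degrees(0, 60, 1, 60), haversine_km(0, 60, 1, 60))
```

Both pairs are exactly 1 degree apart in Euclidean terms, yet the ground distance at 60°N is roughly half that at the equator, so a single nearest_neighbor_max_distance in degrees covers different real-world distances depending on latitude.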
Fix platform checks for external PRs - (PR #1975)
Fix so that platform checks work on external PRs.
fix/pytools-empty-inputs - (PR #1987)
Updates elt, plt, aal, lec, kat and join-summary-info code and tests to handle empty input files.
closes #1986
fix/input_gen_status - (PR #1999)
Adds OasisExceptionNoKeys error to generate files
closes OasisLMF/OasisPlatform#974