v2025.7.0

@ikrommyd ikrommyd released this 14 Jul 23:42
1a9dba2

Important announcement

This release changes the default behavior of coffea. We are now focusing on doing analysis with awkward's newly developed "virtual arrays" as the main backend.
For more information on virtual arrays, see this talk at PyHEP.dev 2025.
For examples of virtual array usage, see the following example repositories:
https://github.com/ikrommyd/coffea-virtual-array-tests
https://github.com/ikrommyd/coffea-virtual-array-demo
https://github.com/iris-hep/calver-coffea-agc-demo/blob/2025_IRISHEP_Training/agc-coffea-2025-virtual-arrays-and-executors.ipynb
https://github.com/ikrommyd/virtual-array-agc

The default behavior of NanoEventsFactory.from_root() has changed. It now reads the input ROOT file using virtual arrays by default.
The backend is chosen with the method's mode argument, which can be set to "eager", "virtual", or "dask".
The new default is "virtual", and the delayed argument has been removed.
The old delayed=True is now equivalent to mode="dask". The old delayed=False is now equivalent to mode="eager".
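
The mapping from the removed delayed flag to the new mode argument can be sketched as a small migration helper. This is only an illustration: legacy_delayed_to_mode and open_events are hypothetical names, not part of coffea, and the from_root call assumes the usual {path: treepath} file specification and the schemaclass keyword.

```python
def legacy_delayed_to_mode(delayed=None):
    """Translate the removed `delayed` flag into a `mode` string.

    Hypothetical helper for migrating old call sites; not part of coffea.
    """
    if delayed is None:
        return "virtual"  # the new default when nothing was specified
    return "dask" if delayed else "eager"


def open_events(path, treepath="Events", delayed=None):
    """Open one file with the backend that matches a legacy `delayed` value."""
    # Imported lazily so the mapping above works without coffea installed.
    from coffea.nanoevents import NanoAODSchema, NanoEventsFactory

    factory = NanoEventsFactory.from_root(
        {path: treepath},  # assumption: file given as {path: tree name}
        schemaclass=NanoAODSchema,
        mode=legacy_delayed_to_mode(delayed),
    )
    return factory.events()
```

New code should pass mode directly; the helper is only useful while old call sites that still carry a delayed value are being updated.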

At the same time, the coffea 0.7 processors and executors have been revived, and analysis can be done using coffea 0.7-like syntax:

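from dask.distributed import Client

from coffea.nanoevents import NanoAODSchema
from coffea.processor import ProcessorABC, Runner, DaskExecutor

class MyProcessor(ProcessorABC):
    def process(self, events):
        ...  # per-chunk analysis code goes here

    def postprocess(self, accumulator):
        ...  # optional final step on the merged output

# Assumption: a local Dask cluster; connect to your own scheduler instead if you have one.
client = Client()

run = Runner(
    DaskExecutor(client=client, compression=None),
    chunksize=250_000,
    skipbadfiles=True,
    schema=NanoAODSchema,
    savemetrics=True,
)

# fileset (dataset names mapped to input files) is assumed to be defined elsewhere.
out, report = run(fileset, processor_instance=MyProcessor())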

Analyses still using coffea 0.7 can and should seamlessly transition to this new release.

If you still want to use the dask interface (which creates task graphs), pass mode="dask" to NanoEventsFactory.from_root() when working on a single file.
For scaling, you can still use the dataset_tools, as in the following:

from coffea.dataset_tools import apply_to_fileset

apply_to_fileset(MyProcessor(), fileset, uproot_options={"allow_read_errors_with_report": True})
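
The snippet above passes a fileset without showing its shape. Below is a minimal sketch of the layout commonly used with coffea.dataset_tools, where each dataset has a "files" mapping from file path to tree name plus optional metadata. The dataset names, paths, and the list_inputs helper are hypothetical and only serve the illustration.

```python
# Hypothetical dataset names and file locations; replace them with your own.
fileset = {
    "ZJets": {
        "files": {"root://example.org//store/zjets_1.root": "Events"},
        "metadata": {"is_mc": True},
    },
    "Data": {
        "files": {"root://example.org//store/data_1.root": "Events"},
        "metadata": {"is_mc": False},
    },
}


def list_inputs(fileset):
    """Flatten a fileset into (dataset, file path, tree name) triples."""
    return [
        (dataset, path, tree)
        for dataset, spec in fileset.items()
        for path, tree in spec["files"].items()
    ]


inputs = list_inputs(fileset)
```

Flattening the fileset like this is a quick way to check what will be read before any work is submitted.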

It is recommended to convert all analyses to use the new virtual arrays feature of awkward2 rather than stay on packages that have been unmaintained for 3 years (coffea 0.7, which still uses awkward1).
Please reach out for any help and to report problems.

New features

  • feat: EDM4HEPSchema and Newstyle FCCSchema by @prayagyadav in #1245
  • feat: add virtual arrays by @pfackeldey in #1277
  • feat: bring back iterative, futures and dask executors by @ikrommyd in #1323
  • feat: 0.7 style processor/executor model using ak2 virtual arrays. by @lgray in #1309
  • feat: bring back parsl executor by @ikrommyd in #1325
  • feat: add @original_array attr to events in virtual mode by @ikrommyd in #1327
  • feat: make column_accumulator support awkward arrays and add accumulator tests by @ikrommyd in #1352
  • feat: taskvine executor for new coffea by @btovar in #1360
  • feat: systematics handling for dask mode by @lgray in #786
  • feat: make max_chunks return the first N chunks per dataset (not per file per dataset like it is now) by @ikrommyd in #1359
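
The max_chunks change in the last item can be pictured with a plain-Python sketch. This is not coffea's implementation: first_chunks_per_dataset and the (file, entry_start, entry_stop) chunk tuples are hypothetical, and the sketch only shows what "first N chunks per dataset" means when a dataset spans several files.

```python
from itertools import islice


def first_chunks_per_dataset(chunks_by_dataset, max_chunks=None):
    """Keep only the first `max_chunks` chunks of each dataset.

    The limit counts chunks across all files of a dataset, not per file.
    With max_chunks=None every chunk is kept.
    """
    return {
        dataset: list(islice(chunks, max_chunks))
        for dataset, chunks in chunks_by_dataset.items()
    }


# Hypothetical chunks: (file, entry_start, entry_stop)
chunks = {
    "ZJets": [("a.root", 0, 100), ("a.root", 100, 200), ("b.root", 0, 100)],
    "Data": [("c.root", 0, 100)],
}
selected = first_chunks_per_dataset(chunks, max_chunks=2)
```

With a limit of 2, the "ZJets" dataset keeps its first two chunks (both from a.root) and b.root is not touched, whereas a per-file limit would have kept chunks from both files.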

Bug-fixes and performance

  • fix: properly support older numba/numpy mixtures by @lgray in #1298
  • fix: do not error or return None when calling min/max over length zero chunks in weight statistics, return infinities instead by @ikrommyd in #1328
  • fix: skip OSErrors when skipping bad files using executors by @ikrommyd in #1333
  • fix: executor's preprocess requires treename as input argument when there should be a default by @ikrommyd in #1334
  • fix: make NanoAODSchema the default in executors for consistency with apply_to_fileset by @ikrommyd in #1335
  • fix: use awkward for min and max in Weights to avoid inconsistencies between eager/virtual and dask mode by @ikrommyd in #1337
  • fix: make nanoevents properly copiable and do not store the @original_array attribute as that will get copied by @ikrommyd in #1346
  • perf: use numpy only in eager weight statistics by @ikrommyd in #1351
  • fix: print the original processor error with the "failed processing file" exception by @ikrommyd in #1353
  • fix: _lazywhere was removed from scipy, use apply_where from scipy._lib.array_api_extra by @lgray in #1356
  • fix: make offsets start at zero for ListOffsetArray coming from uproot (fix physlite entry start problem) by @ikrommyd in #1363
  • fix: clarify function names in systematics by @lgray in #1366
  • fix: Fix bugs in CorrectedMETFactory.build in coffea 202x by @ikrommyd in #1342
  • fix: error when delayed kwarg is used in nanoevents by @ikrommyd in #1367
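
The weight-statistics fix for length-zero chunks (#1328) can be illustrated in plain Python. This is a sketch, not coffea's code: safe_min_max is a hypothetical name, and the sign convention (+inf for an empty minimum, -inf for an empty maximum) is an assumption, chosen because merging such values with min/max leaves the results from non-empty chunks unchanged.

```python
import math


def safe_min_max(values):
    """Return (min, max) of `values`, falling back to infinities when empty."""
    # Assumed convention: the identity elements of min and max.
    lowest = min(values, default=math.inf)
    highest = max(values, default=-math.inf)
    return lowest, highest


empty_stats = safe_min_max([])
chunk_stats = safe_min_max([0.5, 2.0, 1.25])

# Merging an empty chunk with a non-empty one keeps the non-empty result.
merged = (min(empty_stats[0], chunk_stats[0]), max(empty_stats[1], chunk_stats[1]))
```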

Other

Full Changelog: v2025.3.0...v2025.7.0