Actually fix MET cycles #487

Merged: 1 commit, Apr 7, 2021

nsmith- (Member) commented Apr 7, 2021

Lesson learned: don't let a virtual array generator return a high-level virtual array that uses the same cache as the parent. The cache then stores an array that holds a strong reference to that same cache, which forms a cycle. Worse, if the strong reference lives inside a generator, it's invisible to Python's garbage collector and will never be cleaned up.
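The basic cycle can be reproduced with plain Python objects. In this sketch a hypothetical Entry object (not part of awkward) is stored in a dict cache and keeps a strong reference back to that dict, so reference counting alone never frees either one. Because every link here is an ordinary Python attribute, gc.collect() can still break the cycle; as described above, in the awkward case the back-reference sits inside a generator the collector cannot see, so nothing reclaims it.

```python
import gc
import weakref


class Entry:
    """Hypothetical cache entry that keeps a strong reference to its cache."""

    def __init__(self, cache):
        self.cache = cache  # back-reference: cache -> entry -> cache


def fill_cache():
    cache = {}
    cache["key"] = Entry(cache)
    # Hand back only a weak reference, so the caller holds nothing strong.
    return weakref.ref(cache["key"])


gc.disable()  # leave reference counting as the only automatic cleanup
try:
    entry_ref = fill_cache()
    survives_refcounting = entry_ref() is not None  # the cycle keeps it alive
    gc.collect()  # the cyclic collector can still see this pure-Python cycle
    freed_by_gc = entry_ref() is None
finally:
    gc.enable()

print(survives_refcounting, freed_by_gc)  # True True
```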

A simpler example:

```python
import numpy as np
import awkward as ak
import weakref

form = ak.forms.Form.fromjson('"int64"')
cache = {}

def make(name):
    out = np.arange(5)
    # Print a message when this buffer is freed
    weakref.finalize(out, print, f"all done with {name}")
    return ak.virtual(lambda: out, length=5, form=form, cache=cache)

a = ak.zip({"x": make("x"), "y": make("y")}, depth_limit=1)
# b's generator consumes a.x and a.y and shares the same cache
b = ak.virtual(
    lambda x, y: ak.zip({"sum": x + y, "diff": x - y}, depth_limit=1),
    args=(a.x, a.y),
    length=5,
    form=ak.forms.RecordForm({"sum": form, "diff": form}),
    cache=cache,
)
del a
print("made b")

def getdiff_good(thing):
    # Return concrete (materialized) data
    return ak.materialized(thing.diff)

def getdiff_bad(thing):
    # Return a high-level virtual array backed by the same cache (creates the cycle)
    return thing.diff

# Swap in getdiff_bad here to reproduce the leak
c = ak.virtual(getdiff_good, args=(b,), length=len(b), form=form, cache=cache)
del b
print("made c")
print(f"cache size: {len(cache)}")

print(c)
del c
del cache
print("all done")
```

With getdiff_good, running this produces:

```
made b
made c
cache size: 0
[0, 0, 0, 0, 0]
all done with x
all done with y
all done
```

With getdiff_bad, we get:

```
made b
made c
cache size: 0
[0, 0, 0, 0, 0]
all done
all done with y
all done with x
```

Notice that x and y are only finalized at program exit.
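The difference between getdiff_good and getdiff_bad can be mimicked without awkward. This sketch uses hypothetical Lazy and Data classes as stand-ins for a virtual array and its buffer (they are not awkward's implementation), with a weakref.finalize callback in place of the print. When the outer generator returns a lazy object that shares the cache, the cache ends up holding that object, which holds the cache, so the payload is only released once the cyclic collector runs. Automatic gc is disabled so the check reports whether reference counting alone freed the payload.

```python
import gc
import weakref


class Lazy:
    """Hypothetical stand-in for a virtual array: a generator plus a cache."""

    def __init__(self, generate, cache, key):
        self.generate = generate
        self.cache = cache
        self.key = key

    def materialize(self):
        # Run the generator once and memoize its result in the cache.
        if self.key not in self.cache:
            self.cache[self.key] = self.generate()
        return self.cache[self.key]


class Data:
    """Plain payload object so a finalizer can be attached to it."""


def make_leaf(cache, events):
    out = Data()
    weakref.finalize(out, events.append, "leaf freed")
    return Lazy(lambda: out, cache, "leaf")


def scenario(good, events):
    cache = {}
    leaf = make_leaf(cache, events)
    if good:
        # Like getdiff_good: the generator returns materialized data.
        top = Lazy(lambda: leaf.materialize(), cache, "top")
    else:
        # Like getdiff_bad: the generator returns a lazy object sharing the
        # cache, so cache["top"] -> leaf -> cache forms a cycle.
        top = Lazy(lambda: leaf, cache, "top")
    top.materialize()


def check(good):
    """Return (freed by refcounting alone, freed after gc.collect())."""
    events = []
    gc.disable()
    try:
        scenario(good, events)
        freed_now = "leaf freed" in events
        gc.collect()
        freed_after_gc = "leaf freed" in events
    finally:
        gc.enable()
    return freed_now, freed_after_gc


print(check(good=True))   # (True, True)
print(check(good=False))  # (False, True)
```

In this pure-Python model gc.collect() still reclaims the bad case; the point of the original report is that the real references live inside generators the collector cannot traverse, which is why x and y survived until exit.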

nsmith- (Member, Author) commented Apr 7, 2021

One easy workaround is to use a separate cache. For example, this works OK:

```python
c = ak.virtual(getdiff_bad, args=(b,), length=len(b), form=form, cache={})
```

In practice, this means calling something like met_factory.build(met, corrected_jets, lazy_cache={}) rather than reusing the cache that was used to build the jets.
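Passing a fresh cache breaks the cycle because the outer object no longer caches into the same dict that the inner lazy object points to. This self-contained sketch uses a hypothetical Lazy stand-in (not awkward's implementation), keeps the "bad" generator that returns the lazy object itself, and changes only which dict the outer object caches into.

```python
import gc
import weakref


class Lazy:
    """Hypothetical stand-in for a virtual array: a generator plus a cache."""

    def __init__(self, generate, cache, key):
        self.generate, self.cache, self.key = generate, cache, key

    def materialize(self):
        # Run the generator once and memoize its result in the cache.
        if self.key not in self.cache:
            self.cache[self.key] = self.generate()
        return self.cache[self.key]


class Data:
    """Plain payload object so a finalizer can be attached to it."""


def build(events, separate_top_cache):
    shared = {}
    payload = Data()
    weakref.finalize(payload, events.append, "payload freed")
    leaf = Lazy(lambda: payload, shared, "leaf")
    # The outer generator returns the lazy leaf itself (the getdiff_bad pattern).
    top_cache = {} if separate_top_cache else shared
    top = Lazy(lambda: leaf, top_cache, "top")
    top.materialize()


def freed_by_refcounting(separate_top_cache):
    events = []
    gc.disable()  # rely on reference counting only
    try:
        build(events, separate_top_cache)
        freed = bool(events)
    finally:
        gc.collect()  # clean up any cycle left behind
        gc.enable()
    return freed


print(freed_by_refcounting(separate_top_cache=False))  # False
print(freed_by_refcounting(separate_top_cache=True))   # True
```

With a separate cache the references form a chain (top cache -> leaf -> shared cache) instead of a loop, so everything is freed as soon as the last outside reference goes away.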

lgray merged commit 84e8057 into master on Apr 7, 2021
lgray deleted the metcycles2 branch on November 17, 2021 13:29